Gradient Descent

  • Gradient Descent is an algorithm that can be used to try to minimize any function.

Gradient Descent outline

  • Start with some initial values of the parameters w, b
  • Keep changing w,b to reduce J(w, b)
  • Until we settle at or near a minimum

Gradient Descent algorithm

$$ w = w - \alpha \frac{\partial}{\partial w} J(w, b) $$

$$ b = b - \alpha \frac{\partial}{\partial b} J(w, b) $$

  • $\alpha$ is the learning rate, a small positive number (typically between 0 and 1) that controls the size of each step.
  • The derivative term $\frac{\partial}{\partial w}J(w, b)$ determines the direction of the step and, together with $\alpha$, its size. See the sketch after this list.
  • Correct implementation: simultaneously update w and b, i.e. compute both derivative terms before assigning the new values.
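As a concrete illustration of these updates, here is a minimal Python sketch, assuming the squared-error cost $J(w, b) = \frac{1}{2m}\sum_{i=1}^{m}(wx^{(i)} + b - y^{(i)})^2$ of univariate linear regression; the function name, starting values, and defaults are illustrative, not from the lecture.

```python
# Minimal sketch of the update rule above, assuming the squared-error cost
# J(w, b) = 1/(2m) * sum((w*x[i] + b - y[i])**2) for univariate linear regression.

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    m = len(x)
    w, b = 0.0, 0.0                      # start with some initial w, b
    for _ in range(num_iters):
        # Derivative terms dJ/dw and dJ/db, averaged over all m examples
        dj_dw = sum((w * x[i] + b - y[i]) * x[i] for i in range(m)) / m
        dj_db = sum((w * x[i] + b - y[i]) for i in range(m)) / m
        # Simultaneous update: both derivatives are computed before either assignment
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b
```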

The principle of the Gradient Descent algorithm

How to choose $\alpha$ (the learning rate)

  • If the learning rate is too small, gradient descent will still work, but it will be slow.
  • By contrast, if the learning rate is too large, gradient descent may overshoot the minimum and never reach it, or even diverge, as the sketch below illustrates.
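One way to see both failure modes is to run the sketch above with different values of $\alpha$ on a toy dataset (the data and learning-rate values below are made up for illustration):

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]                 # y = 2x, so the minimum is near w=2, b=0

for alpha in (0.001, 0.1, 1.0):
    w, b = gradient_descent(x, y, alpha=alpha, num_iters=200)
    print(f"alpha={alpha}: w={w:.3f}, b={b:.3f}")

# A very small alpha makes progress but is slow (w is still well short of 2 after
# 200 steps); a moderate alpha converges close to w=2, b=0; a too-large alpha
# overshoots on every step and w, b blow up instead of settling at the minimum.
```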

Running gradient descent

“Batch” Gradient descent

  • Batch: each step of gradient descent uses all of the training examples, as the sketch below shows.
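As a sketch of what “uses all of the training examples” means in code, the derivative terms at each step can be written as averages over the full training set; this NumPy version assumes the same squared-error cost as the sketches above:

```python
import numpy as np

def compute_batch_gradient(x, y, w, b):
    """Gradient of the squared-error cost, averaged over ALL m training examples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    err = w * x + b - y          # prediction error for every example in the set
    dj_dw = np.mean(err * x)     # dJ/dw uses the whole training set
    dj_db = np.mean(err)         # dJ/db uses the whole training set
    return dj_dw, dj_db
```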

Feature scaling