From Scratch: Linear Regression
I implemented linear regression (gradient descent) using only numpy and matplotlib (the latter only for visualization). I made two implementations: a 1-feature version (the simplest case) and a generalized version (demonstrated with 2 features so the results can still be visualized). Here I'll describe the general implementation.
Technical Details
Sample data. For easier validation, I generated noisy data based on a predetermined target plane, $y = 4 + 30x_1 + 200x_2$. Here, the original feature matrix $X$ is defined as $[x_1 \; x_2]$.
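A minimal sketch of how such data could be generated (the sample size, feature range, noise level, and seed below are my own assumptions for illustration, not values from the original code):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
m = 100                                        # number of data points (assumed)
X = rng.uniform(0, 10, size=(m, 2))            # raw feature matrix [x1 x2]
noise = rng.normal(0, 20, size=m)              # Gaussian noise (level assumed)
y = 4 + 30 * X[:, 0] + 200 * X[:, 1] + noise   # noisy targets around the plane
```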
Feature scaling and adding bias. To speed up convergence, all feature columns $x_j \in X$ were normalized to z-scores:
$$x_j := \frac{x_j - \mu_j}{\sigma_j}$$
After that, a bias column is added:
$$X_b := [\,1 \;\; x_1 \;\; \dots \;\; x_n\,]$$
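In numpy, continuing from the snippet above, this step reduces to a couple of vectorized operations (the variable names mu, sigma, X_b are mine):

```python
# Z-score each feature column, then prepend a column of ones for the bias term.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma                              # x_j := (x_j - mu_j) / sigma_j
X_b = np.hstack([np.ones((X_norm.shape[0], 1)), X_norm])  # X_b = [1  x1  x2]
```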
Regression setup. Given the normalized feature matrix with bias column $X_b$ ($m$ data points) and the vector of regression coefficients $\theta = [\theta_0 \; \theta_1 \; \theta_2]^T$ ($\theta_0$ represents the bias), the hypothesis in matrix notation is written as below. In numpy, all of these are np.array objects.
$$h_\theta(X_b) = X_b \cdot \theta$$
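The hypothesis is a single matrix-vector product. A sketch, continuing from the previous snippets; the helper name predict and the zero initialization of theta are my own choices:

```python
def predict(X_b, theta):
    """Hypothesis h_theta(X_b) = X_b @ theta for all samples at once."""
    return X_b @ theta

theta = np.zeros(X_b.shape[1])   # [theta_0, theta_1, theta_2], starting at zero
```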
Loss function. The prediction error was measured with a 1/2-scaled mean squared error:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2$$
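A vectorized version of this cost, again with a hypothetical helper name (compute_cost) rather than the original one:

```python
def compute_cost(X_b, y, theta):
    """1/(2m)-scaled sum of squared prediction errors."""
    errors = predict(X_b, theta) - y
    return (errors @ errors) / (2 * len(y))
```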
Gradient descent (regression). For each parameter $\theta_j \in \theta$, the gradient-descent update (with learning rate $\alpha$) is defined as:
$$\theta_j := \theta_j - \alpha\,\frac{\partial J}{\partial \theta_j}$$
This is implemented in matrix form as:
$$\theta := \theta - \alpha\,\nabla J(\theta) = \theta - \frac{\alpha}{m}\,X_b^T\!\left(h_\theta(X_b) - y\right)$$
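Putting the pieces together, the training loop can be as short as the sketch below. The function name, the cost history, and the default arguments are my own choices; the defaults simply mirror the $\alpha$ and iteration count reported next:

```python
def gradient_descent(X_b, y, theta, alpha=0.01, n_iters=1000):
    """Vectorized gradient descent: update the whole theta vector each step."""
    m = len(y)
    history = []
    for _ in range(n_iters):
        gradient = X_b.T @ (predict(X_b, theta) - y) / m
        theta = theta - alpha * gradient
        history.append(compute_cost(X_b, y, theta))   # track convergence
    return theta, history

theta, history = gradient_descent(X_b, y, theta)
```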
Results. With $\alpha = 0.01$, after 1000 iterations of gradient descent, the final parameters were:
$$\theta = [\,5312.86566202,\; 17.05242068,\; 2766.87065479\,]$$