Régression Linéaire : Calcul Matriciel

On reprend le dataset de la partie précédente.

m nombre d’échantillons dans le dataset ( ici 50 )
n nombre de variables dans le dataset ( ici 1 )

Le Modèle sous forme matricielle

$\Theta$ est une matrice contenant a et b :

\Theta = \left[ \begin{matrix} a \\ b \\ \end{matrix} \right]

X est une matrice contenant :

l’ensemble des coordonnées x des échantillons
une colonne de biais ( 1 )

X = \left[ \begin{matrix} x^0 & 1 \\ x^1 & 1 \\ . \\ . \\ x^{m-1} & 1 \\ \end{matrix} \right]

On considère : $\boxed{ F = X.\Theta }$

En effet :

X.\Theta = \left[ \begin{matrix} x^0 & 1 \\ x^1 & 1 \\ . \\ . \\ x^{m-1} & 1 \\ \end{matrix} \right] . \left[ \begin{matrix} a \\ b \\ \end{matrix} \right]

X.\Theta = \left[ \begin{matrix} a.x^0 + b \\ a.x^1 + b \\ .. \\ a.x^{m-1}+b\end{matrix} \right]

X.\Theta = \left[ \begin{matrix} a.x^0 + b \\ a.x^1 + b \\ .. \\ a.x^{m-1}+b\end{matrix} \right] = \left[ \begin{matrix} f(x^0)\\ f(x^1)\\ . \\ . \\ f(x^{m-1}) \\ \end{matrix} \right] =F

Je retrouve donc bien, pour chaque i, l’expression de l’équation du modèle : $f(x^i)=a.x^i+b$

REMARQUE : Dimension des matrices :

dim(F) = [m x 1]
dim(X) = [m*(n + 1)]
dim( $\Theta$ )= [(n + 1) x 1]

# Matrice X  
X = np.hstack((x,np.ones(x.shape)))   # collage de 2 vecteurs l'un à côté de l'autre ( horizontal stack )

# THETA = [a,b] , ce qu'on cherche, y=ax+b
THETA = np .random.randn(2,1) # valeurs initiales aléatoires  

print('THETA =', THETA)
MODEL = X.dot(THETA)

plt.scatter(x, y, label="Nuage de points", color="blue", alpha=0.6)
plt.plot(x, MODEL, label="Courbe de tendance y = ax + b", color="red", linewidth=2)

# Ajout des barres entre les points réels et la courbe de tendance
for i in range(len(x)):
    plt.plot([x[i][0], x[i][0]], [y[i][0], MODEL[i][0]], color="gray", linestyle="--", linewidth=0.8)
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Nuage de points et courbe de tendance avec erreurs")
plt.legend()
plt.grid(True)
plt.show()

THETA =  
[[ 0.76672273]
 [-0.02675862]]

png

Fonction Coût

J(a,b) =\frac{1}{2.m}\sum_{i=0}^{m-1}\left(a.x^{(i)}+b-y^{(i)}\right)^2

J est un scalaire ( moyenne de toutes les erreurs ), dim(J) = [1 x 1]

Y est une matrice contenant l’ensemble des coordonnées y des échantillons:

Y = \begin{bmatrix} y^0 \\ y^1 \\ . \\ . \\ y^{m-1} \\ \end{bmatrix}

J devient alors :

J(a,b) =\frac{1}{2.m}\sum_{i=0}^{m-1}\left(X \Theta-Y\right)^2

\boxed{ J(\Theta) =\frac{1}{2.m}\sum_{i=0}^{m-1}\left(X \Theta-Y\right)^2 }

Calcul du Gradient

Lors de la présentation de la régression linéaire, nous avions :

$\frac{\partial J(a,b) }{\partial a} = \frac{1}{m} \sum_{i=0}^{m-1}\left( x^i.(a.x^i+b-y^i) \right)$
$\frac{\partial J(a,b) }{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1}\left( a.x^i+b-y^i \right)$

Nous pouvons alors considérer la matrice :

\frac{\partial J(\Theta) }{\partial \Theta} = \begin{bmatrix} \frac{\partial J(a,b) }{\partial a} \\ \frac{\partial J(a,b) }{\partial b} \end{bmatrix} = {\color{Magenta} \begin{bmatrix} \frac{1}{m} \sum_{i=0}^{m-1}\left( x^i.(a.x^i+b-y^i) \right) \\ \frac{1}{m} \sum_{i=0}^{m-1}\left( a.x^i+b-y^i \right) \end{bmatrix} }

Reprenons X :

X = \left[ \begin{matrix} x^0 & 1 \\ x^1 & 1 \\ . \\ . \\ x^{m-1} & 1 \\ \end{matrix} \right] \Rightarrow {\color{DarkGreen}X^T=\left[ \begin{matrix} x^0 & x^1 & . & . & x^{m-1} \\ 1 & 1 & . & . & 1 \\ \end{matrix} \right]}

X.\Theta = \left[ \begin{matrix} a.x^0 + b \\ a.x^1 + b \\ .. \\ a.x^{m-1}+b\end{matrix} \right]

{\color{Red} X.\Theta - Y = \left[ \begin{matrix} a.x^0 + b -y^0\\ a.x^1 + b -y^1\\ .. \\ a.x^{m-1}+b-y^{m-1}\end{matrix} \right] }

\frac{1}{m}{\color{DarkGreen}X^T}{\color{Red} \left(X.\Theta - Y\right)} = \frac{1}{m} {\color{DarkGreen}\left[ \begin{matrix} x^0 & x^1 & . & . & x^{m-1} \\ 1 & 1 & . & . & 1 \\ \end{matrix} \right]} {\color{Red} \left[ \begin{matrix} a.x^0 + b -y^0\\ a.x^1 + b -y^1\\ .. \\ a.x^{m-1}+b-y^{m-1}\end{matrix} \right] }

\frac{1}{m}{\color{DarkGreen}X^T} {\color{Red} \left(X.\Theta - Y\right) } = \frac{1}{m} \left[ \begin{matrix} {\color{DarkGreen}x^0}{\color{Red} (a.x^0+b-y^0)} + {\color{DarkGreen}x^1}{\color{Red} (a.x^1+b-y^1)} + ... + {\color{DarkGreen}x^{m-1}}{\color{Red} (a.x^{m-1}+b-y^{m-1})} \\ {\color{Red} (a.x^0+b-y^0)} + {\color{Red} (a.x^1+b-y^1)} + ... + {\color{Red} (a.x^{m-1}+b-y^{m-1})} \\ \end{matrix} \right]

\frac{1}{m}{\color{DarkGreen}X^T}{\color{Red} \left(X.\Theta - Y\right)} = {\color{Magenta} \left[ \begin{matrix} \frac{1}{m} \sum_{i=0}^{m-1}\left( x^i.(a.x^i+b-y^i) \right) \\ \frac{1}{m} \sum_{i=0}^{m-1}\left( a.x^i+b-y^i \right) \\ \end{matrix} \right]}

\boxed{ \frac{\partial J(\Theta) }{\partial \Theta} = \frac{1}{m}.\mathrm{X}^T.(X.\Theta-Y) }

Algorithme de descente du gradient

a_{i+1}=a_i-\alpha.\frac{\partial J(a_i) }{\partial a}

b_{i+1}=b_i-\alpha.\frac{\partial J(b_i) }{\partial b}

Devient :

\begin{bmatrix} a_{i+1} \\ b_{i+1} \end{bmatrix} = \begin{bmatrix} a_{i} \\ b_{i} \end{bmatrix} - \alpha \begin{bmatrix} \frac{\partial J(a_i) }{\partial a} \\ \frac{\partial J(b_i) }{\partial b} \end{bmatrix}

\boxed{ \Theta = \Theta - \alpha.\frac{\partial J(\Theta) }{\partial \Theta} }

#-----------------------------------------------------------------
def model( X, theta ):
    return X.dot(theta)
#-----------------------------------------------------------------
# Fonction coût
def J(X, y, theta):
    m=len(y)
    return 1/(2*m) * np.sum((model(X,theta) - y)**2)
#-----------------------------------------------------------------
# Calcul du Gradient
def grad(X,y ,theta):
    m = len(y)
    return (1/m) * X.T.dot( model(X, theta) - y)
#-----------------------------------------------------------------
def gradient_descent(X, y, theta, alpha, n_iterations):
    
    J_history = np.zeros(n_iterations) 
    
    for i in range(0, n_iterations):
        theta = theta - alpha * grad(X, y, theta) 
        J_history[i] = J(X, y, theta)
        
    return theta, J_history
#-----------------------------------------------------------------
#-----------------------------------------------------------------
n_iterations = 1000
alpha = 0.01 # learning rate

THETA_final, J_hist = gradient_descent(X, y, THETA, alpha, n_iterations)

print('THETA_final = ',THETA_final)

plt.figure(figsize=(10, 6))
plt.scatter(x, y, label="Nuage de points", color="blue", alpha=0.6)
plt.plot(x,model(X,THETA_final), label="Courbe de tendance y = ax + b", color="red", linewidth=2)
# Ajout des barres entre les points réels et la courbe de tendance
for i in range(len(x)):
    plt.plot([x[i][0], x[i][0]], [y[i][0], model(X,THETA_final)[i][0]], color="gray", linestyle="--", linewidth=0.8)
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Modèle déterminé par l'algorithme de la descente du gradient")
plt.legend()
plt.grid(True)
plt.show()

THETA_final =   
[[ 56.55848319]
 [-38.5156578 ]]

png

plt.figure()
plt.plot(J_hist)
plt.xlabel("n_iterations")
plt.ylabel("J")
plt.title("Evolution de J")
plt.grid(True)
plt.show()

png