Logistic Regression from Scratch
Implementing logistic regression in Python using the NumPy library
Introduction
Logistic regression is a statistical method for predicting a qualitative response using one or more independent variables. Instead of predicting the response directly, logistic regression models the probability that the response belongs to a particular category. To do this, it uses the logistic function.
The logistic function always produces an S-shaped curve, so the predicted probability $p(X)$ always lies between 0 and 1. The logistic function is also known as the sigmoid function.
$$ \log\Big(\frac{p(X)}{1-p(X)}\Big) = w^TX + c $$
$$z = w^TX + c $$
$$ p(X) = \frac{1}{1 + e^{-z}} $$
#collapse-hide
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
#collapse-hide
def sigmoid(z):
    '''Map a real value z to a probability between 0 and 1'''
    return 1/(1 + np.exp(-z))
x = np.linspace(-10,10,100)
y = sigmoid(x)
plt.plot(x,y)
plt.title("Sigmoid Function")
plt.show()
A method called maximum likelihood is used to estimate the unknown coefficients. The idea behind this method is to choose the coefficients such that the predicted probability $\hat{p}(x_i)$ for each observation corresponds as closely as possible to the observed value of the response variable for that observation. In other words, we estimate the coefficients such that $p(X)$ yields a number close to 1 for observations with actual response value 1 and a number close to 0 for observations with actual response value 0. The mathematical equation for the likelihood function is
$$ \ell(W) = \prod_{i:\, y_i = 1} p(x_i) \prod_{i:\, y_i = 0} \big( 1 - p(x_i)\big) $$
The $W$ estimates are chosen to maximize the likelihood function.
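To make the connection to the loss concrete, here is a minimal sketch with made-up toy labels and probabilities showing that maximizing the likelihood is equivalent to minimizing the average negative log-likelihood, which is exactly the cross entropy derived next.
# Toy example (assumed values, for illustration only)
y_toy = np.array([1, 1, 0, 1, 0])            # observed responses
p_toy = np.array([0.9, 0.8, 0.2, 0.7, 0.1])  # predicted probabilities p(x_i)

# Likelihood: product of p(x_i) where y_i = 1 and (1 - p(x_i)) where y_i = 0
likelihood = np.prod(p_toy[y_toy == 1]) * np.prod(1 - p_toy[y_toy == 0])

# The average negative log-likelihood equals the cross entropy loss
neg_log_likelihood = -np.log(likelihood) / y_toy.size
cross_entropy = -np.mean(y_toy*np.log(p_toy) + (1 - y_toy)*np.log(1 - p_toy))

print(likelihood, neg_log_likelihood, cross_entropy)  # the last two match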
Taking the negative log of the likelihood and averaging over the $N$ observations, we arrive at the cross entropy loss for the binary classification case. \begin{equation} L(y,\hat{y}) = -\frac{1}{N} \sum_{i=1}^N \big[ y_i \log(\hat{y_i}) + (1-y_i)\log(1-\hat{y_i}) \big] \end{equation}
def cross_entropy_loss(w, x, y):
    '''Average cross entropy loss over all observations'''
    z = np.dot(x, w)
    h = sigmoid(z)
    return np.mean(-y*np.log(h) - (1 - y)*np.log(1 - h))
Use the chain rule to calculate the gradient of the loss with respect to the weights:
\begin{equation} \frac{\partial{L}}{\partial{w}} = \frac{\partial{L}}{\partial{\hat{y}}} \frac{\partial{\hat{y}}}{\partial{z}} \frac{\partial{z}}{\partial{w}} \end{equation}
Examining each factor in turn for a single observation $i$:
\begin{equation} \frac{\partial{L}}{\partial{\hat{y_i}}} = \frac{-y_i}{\hat{y_i}} + \frac{1-y_i}{1-\hat{y_i}} = \frac{\hat{y_i}-y_i}{\hat{y_i}(1-\hat{y_i})} \end{equation}
\begin{equation} \frac{\partial{\hat{y_i}}}{\partial{z_i}} = \hat{y_i}(1-\hat{y_i}) \end{equation}
\begin{equation} \frac{\partial{z_i}}{\partial{w}} = x_i \end{equation}
Multiplying all the individual terms, the contribution of observation $i$ to the gradient is
\begin{equation} \frac{\partial{L_i}}{\partial{w}} = (\hat{y_i}-y_i)x_i \end{equation}
Summing these contributions over all observations gives the gradient computed below.
def gradient(w, x, y):
    '''Gradient of the cross entropy loss with respect to the weights'''
    z = np.dot(x, w)
    h = sigmoid(z)
    # The 1/N factor from the average loss is omitted here; it only rescales
    # the gradient and is absorbed by the learning rate.
    return np.dot(x.T, h - y)
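As a sanity check on this derivation, here is a minimal sketch using small made-up toy arrays that compares the analytic gradient above against a numerical finite-difference gradient of cross_entropy_loss.
# Numerical check of the analytic gradient (toy data, for illustration only)
rng = np.random.RandomState(0)
x_toy = rng.rand(5, 3)
y_toy = np.array([1, 0, 1, 1, 0])
w_toy = rng.rand(3)

eps = 1e-6
numerical = np.zeros_like(w_toy)
for j in range(w_toy.size):
    w_plus, w_minus = w_toy.copy(), w_toy.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    numerical[j] = (cross_entropy_loss(w_plus, x_toy, y_toy)
                    - cross_entropy_loss(w_minus, x_toy, y_toy)) / (2 * eps)

# Divide by N because gradient() omits the 1/N factor of the average loss
analytic = gradient(w_toy, x_toy, y_toy) / y_toy.size
print(np.allclose(numerical, analytic, atol=1e-6))  # expect True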
We will use an algorithm called batch gradient descent to find the optimal weights.
$$ \Delta w_{i} = -\eta \frac{\partial L}{\partial w_{i}} $$
$$ w_{i} := w_{i} + \Delta w_{i} $$
def update_weights(x, y, learning_rate=0.01, n_iters=1000, loss_threshold=0.001):
    '''Batch gradient descent on the cross entropy loss'''
    # Initialize the weights with random values
    w = np.random.rand(x.shape[1])
    for epoch in range(n_iters):
        loss = cross_entropy_loss(w, x, y)
        grad = gradient(w, x, y)
        w = w - learning_rate * grad
        if epoch % 1000 == 0:
            print(f"Loss after iteration {epoch} is {round(loss, 2)}")
        if loss < loss_threshold:
            break
    return w
def accuracy(true, probs, threshold = 0.5):
    '''Classification accuracy (%) at the given probability threshold'''
    predicted = (probs > threshold).astype(int)
    return 100*(predicted == true).mean()
def predict(w, x):
    '''Predicted probability of the positive class'''
    return sigmoid(np.dot(x, w))
Let's load a binary classification dataset from sklearn.datasets and split it into training and test sets.
cancer_data = load_breast_cancer()
x, x_test, y, y_test = train_test_split(cancer_data.data,
                                        cancer_data.target,
                                        test_size=0.25,
                                        random_state=0)
# Scale each training sample to unit L2 norm
x = normalize(x)
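Note that sklearn's normalize rescales each sample (row) to unit L2 norm by default, rather than standardizing each feature; a quick check:
# Every row of the normalized training matrix has unit L2 norm
print(np.allclose(np.linalg.norm(x, axis=1), 1.0))  # expect True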
Let's call the update_weights function to find the weights that minimize the cross entropy loss.
w = update_weights(x, y, learning_rate=0.05, n_iters=10000, loss_threshold=0.001)
Let's use the model to generate predictions for the test data and see what accuracy we reach.
preds = predict(w,normalize(x_test))
acc = accuracy(y_test, preds)
print(f"Accuracy on test data is {round(acc,3)}")