Supervised Learning: What if we use the raw sigmoid function as part of our cost function to optimize logistic regression?
Abstract
A common question, once a machine learning student grasps the concept of regression, is whether the mean squared error can simply be reused with the sigmoid function to set up the optimization problem for logistic regression.
This post came to mind during a recent flight from Bogota to Los Angeles, on which I started thinking about how to demonstrate, in a simple way, the non-convexity of this cost function.
Development
Let's begin with the question: what if we use the raw sigmoid function as part of our cost function to optimize logistic regression?
To make the demonstration intuitive and keep things simple, a dataset was created on purpose in which the presence of a malignant tumor is determined by the tumor size (a sketch of how such a toy dataset could be generated follows the sample below).
For example:
Tumor Size, Malign
1.0, 0
1.1, 0
1.2, 0
1.3, 0
1.4, 0
1.5, 0
...
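The exact dataset is not essential. Here is a minimal sketch of how such a toy file could be generated; the 2.5 threshold, the size range, and the file name Tumor.csv are assumptions chosen for illustration (the column names match the ones read by the script further below):

import numpy as np
import pandas as pd

# Hypothetical toy dataset: tumors larger than an assumed 2.5 threshold are labeled malignant.
sizes = np.round(np.arange(1.0, 4.0, 0.1), 1)
labels = (sizes > 2.5).astype(int)
pd.DataFrame({"TumorSize": sizes, "Malign": labels}).to_csv("Tumor.csv", index=False)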
Now, let's define our hypothesis function, the sigmoid: hθ(x) = 1 / (1 + e^(−(θ0 + θ1·x))).
Then, let's plot the dataset in 2D, with the y-axis representing the presence of a malignant tumor and the x-axis representing tumor size (shown in blue). Now, using the sigmoid function, let's draw a sample prediction for a specific pair of parameters θ (θ0 = -40 and θ1 = 12); this θ came up after several iterations in which the sigmoid function fit the data appropriately (trial and error in Python). For example, with these parameters a tumor of size 3.5 gives hθ(3.5) = 1 / (1 + e^(−2)) ≈ 0.88, i.e. predicted malignant.
Now, let's define the classic cost function, just as the linear regression model does: J(θ0, θ1) = (1/m) · Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))².
Using Python, a surface graph of our cost function in terms of θ is created:
The contour map is:
Visually speaking, we can notice that the surface is non-convex and hence has the drawback of multiple local minima and flat regions into which the gradient descent algorithm could converge or stall.
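To make this drawback concrete, here is a minimal sketch (not part of the original code) that runs plain gradient descent on this squared-error cost from two different starting points. The toy data, starting points, learning rate, and iteration count are assumptions chosen for illustration: from a start on the flat plateau of the surface (for example θ0 = -40, θ1 = 0) the gradient is nearly zero and the cost barely improves, while a start closer to the well-fitting region keeps improving.

import numpy as np

def sigmoid(x, theta0, theta1):
    return 1 / (1 + np.exp(-(theta0 + theta1 * x)))

def mse_cost(x, y, theta0, theta1):
    return np.mean((y - sigmoid(x, theta0, theta1)) ** 2)

def gradient_descent(x, y, theta0, theta1, lr=0.5, iters=5000):
    # Plain gradient descent on the squared-error cost with a sigmoid hypothesis.
    for _ in range(iters):
        h = sigmoid(x, theta0, theta1)
        # d/dtheta of mean((y - h)^2), using dh/dz = h * (1 - h)
        common = -2 * (y - h) * h * (1 - h)
        theta0 -= lr * np.mean(common)
        theta1 -= lr * np.mean(common * x)
    return theta0, theta1

# Toy data in the same spirit as the post's dataset (assumed 2.5 threshold).
x = np.arange(1.0, 4.0, 0.1)
y = (x > 2.5).astype(float)

# Two starting points: one on the flat plateau of the surface, one near the good fit.
for start in [(-40.0, 0.0), (-30.0, 10.0)]:
    t0, t1 = gradient_descent(x, y, *start)
    print("start", start, "-> final cost", round(mse_cost(x, y, t0, t1), 4))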
Even though this is a simplification of how to visualize the non-convexity, it helps to understand the reason for designing a new cost function, known as:
J(θ) = −(1/m) · Σ_{i=1}^{m} [ y^(i)·log(hθ(x^(i))) + (1 − y^(i))·log(1 − hθ(x^(i))) ]
The Python code is:
"""
Created on Fri Apr 20 15:53:38 2018
@author: Eduardo Toledo
"""
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection)
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import numpy as np
import pandas as pd
# basic functions for data operations
def sigmoid(x, theta0, theta1):
    """Hypothesis h_theta(x) = 1 / (1 + exp(-(theta0 + theta1 * x)))."""
    return 1 / (1 + np.exp(-(theta0 + theta1 * x)))

def MSE(real, prediction):
    """Mean squared error (squared error loss) between labels and predictions."""
    return np.mean((real - prediction) ** 2)
# Load data
tumor_data = pd.read_csv('Tumor.csv')
tumor_size = tumor_data.TumorSize.values
malignant = tumor_data.Malign.values
# Parameters found by trial and error: theta0 = -40, theta1 = 14
prediction = np.array(sigmoid(tumor_size, -40, 14))
# Alternative (poor) parameters for comparison:
# prediction = np.array(sigmoid(tumor_size, 100, 0))
mse1 = MSE(malignant, prediction)
# plot the data
fig = plt.figure()
plt.plot(tumor_size, malignant,'X',color='blue',label="Real Data")
plt.plot(tumor_size, prediction,'+',color='red',label="Prediction")
plt.title("Presence of Malignant Tumor")
plt.legend()
plt.show()
# Evaluate the MSE cost over a grid of (theta0, theta1) values
theta0_vals = np.linspace(-42, -38, 60)
theta1_vals = np.linspace(0, 20, 50)
mse = []
for i in range(theta0_vals.shape[0]):
    for j in range(theta1_vals.shape[0]):
        # Use the grid values themselves, not the loop indices
        prediction = np.array(sigmoid(tumor_size, theta0_vals[i], theta1_vals[j]))
        mse.append(MSE(malignant, prediction))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
Theta0, Theta1 = np.meshgrid(theta0_vals, theta1_vals)  # shape (50, 60)
MSE1 = np.reshape(mse, (60, 50)).T                      # transpose to match the meshgrid orientation
ax.plot_wireframe(Theta0, Theta1, MSE1, rstride=5, cstride=5)
ax.set_zlabel("J(θ0,θ1)")
ax.set_xlim(-42, -38)
ax.set_ylim(0, 20)
ax.xaxis.set_major_formatter(FormatStrFormatter('%.0f'))
ax.yaxis.set_major_formatter(FormatStrFormatter('%.0f'))
plt.xlabel("θ0")
plt.ylabel("θ1")
plt.show()
# 3D contour view of the same cost surface
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.yaxis.set_major_formatter(FormatStrFormatter('%.0f'))
ax.xaxis.set_major_formatter(FormatStrFormatter('%.0f'))
ax.set_zlabel("J(θ0,θ1)")
cset = ax.contour(Theta0, Theta1, MSE1)
plt.xlabel("θ0")
plt.ylabel("θ1")
plt.show()
# 2D contour map of the cost surface
plt.style.use('seaborn-white')  # on matplotlib >= 3.6 use 'seaborn-v0_8-white'
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis.set_major_formatter(FormatStrFormatter('%.0f'))
ax.yaxis.set_major_formatter(FormatStrFormatter('%.0f'))
cset = ax.contour(Theta0, Theta1, MSE1)
plt.xlabel("θ0")
plt.ylabel("θ1")
plt.show()
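For comparison, here is a minimal sketch (not part of the original script) that evaluates the cross-entropy cost J(θ) above over the same (θ0, θ1) grid; plotting this surface instead of the squared error yields a convex, bowl-shaped cost with no multiple local minima. The data loading mirrors the script above, and the eps clipping inside cross_entropy is an implementation detail added to avoid log(0):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection)

def sigmoid(x, theta0, theta1):
    return 1 / (1 + np.exp(-(theta0 + theta1 * x)))

def cross_entropy(y, h, eps=1e-12):
    """Logistic-regression cost J(theta); eps guards against log(0)."""
    h = np.clip(h, eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

tumor_data = pd.read_csv('Tumor.csv')  # same file as in the script above
tumor_size = tumor_data.TumorSize.values
malignant = tumor_data.Malign.values

theta0_grid = np.linspace(-42, -38, 60)
theta1_grid = np.linspace(0, 20, 50)
Theta0, Theta1 = np.meshgrid(theta0_grid, theta1_grid)

# Cost surface over the grid, indexed [theta1, theta0] to match the meshgrid shape
cost = np.array([[cross_entropy(malignant, sigmoid(tumor_size, t0, t1))
                  for t0 in theta0_grid] for t1 in theta1_grid])

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(Theta0, Theta1, cost, rstride=5, cstride=5)
ax.set_zlabel("J(θ0,θ1)")
plt.xlabel("θ0")
plt.ylabel("θ1")
plt.show()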