Linear Regression: Theory and Code

Step 1: model hypothesis, i.e., how to choose a model; here we pick a one-dimensional linear model
$y = w \cdot x + b$

Step 2: model evaluation, i.e., how to judge which model is better; this is usually done with a loss function
$L(w,b)=\sum_{i=1}^{n}\left(\widehat{y_{i}}-(w \cdot x_{i}+b)\right)^2$, where $\widehat{y_{i}}$ is the true value of the $i$-th training example
Step 3: model optimization, i.e., how to find the best model; gradient descent is one such method and is the one used here
$\frac{\partial L(w,b)}{\partial w}=\sum_{i=1}^{n}2\left(\widehat{y_{i}}-(w \cdot x_{i}+b)\right)\cdot(-x_{i})$
$\frac{\partial L(w,b)}{\partial b}=\sum_{i=1}^{n}2\left(\widehat{y_{i}}-(w \cdot x_{i}+b)\right)\cdot(-1)$
$w^{j+1}\leftarrow w^{j}-\eta\left.\frac{\partial L(w,b)}{\partial w}\right|_{w=w^{j},\,b=b^{j}}$
$b^{j+1}\leftarrow b^{j}-\eta\left.\frac{\partial L(w,b)}{\partial b}\right|_{w=w^{j},\,b=b^{j}}$
Step 4: further adjustments, as needed
1. Combine multiple models.
2. Use more input features, e.g., the input x goes from one dimension to d dimensions; the Step 1 hypothesis then becomes $y=\sum_{i=1}^{d}w_{i}\cdot x_{ji}+b$, where a single input is $x_{j}=[x_{j1},x_{j2},\dots,x_{jd}]$.
3. Add regularization. For the multi-dimensional linear model, L1 regularization changes the Step 2 loss to $L(w,b)=\sum_{j}\left(\widehat{y_{j}}-(\sum_{i} w_{i}\cdot x_{ji}+b)\right)^2+\lambda\sum_{i}\vert w_{i}\vert$,
and L2 regularization changes it to $L(w,b)=\sum_{j}\left(\widehat{y_{j}}-(\sum_{i} w_{i}\cdot x_{ji}+b)\right)^2+\lambda\sum_{i} w_{i}^2$.
4. Optionally use adaptive-gradient methods (e.g., Adagrad) or stochastic gradient descent.
5. With multi-dimensional features, scale the features to a common range (feature scaling), e.g., $x_{j}^{i}\leftarrow \frac{x_{j}^{i}-m_{i}}{\sigma_{i}}$, where $x_{j}^{i}$ is the $i$-th feature of the $j$-th training example, $m_{i}$ is the mean of the $i$-th feature over all training data, and $\sigma_{i}$ is its standard deviation. Points 2, 3, and 5 are illustrated in the sketch after this list.
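
As a rough illustration of points 2, 3, and 5, here is a minimal NumPy sketch of multi-dimensional linear regression trained by gradient descent with feature standardization and an L2 penalty. The toy data, the `lam` value, and the variable names are assumptions made for this sketch only; they are not part of the demos below.

#! /usr/bin/python
# -*- coding:utf-8 -*-

import numpy as np

# toy multi-dimensional data, made up for illustration only
np.random.seed(0)
x_data = np.random.rand(100, 3) * np.array([1.0, 10.0, 100.0])  # 100 examples, 3 features on very different scales
true_weight = np.array([2.0, -0.5, 0.03])
y_data = x_data.dot(true_weight) + 1.5 + 0.1 * np.random.randn(100)

# point 5: feature scaling, x <- (x - mean) / std, computed per feature
mean = x_data.mean(axis=0)
std = x_data.std(axis=0)
x_scaled = (x_data - mean) / std

# points 2 and 3: multi-dimensional model y = sum_i(w_i * x_i) + b, trained by
# gradient descent on the L2-regularized squared-error loss
weight = np.zeros(3)
bias = 0.0
learning_rate = 0.01
iteration = 5000
lam = 0.1  # regularization strength, arbitrary value for the sketch

for i in range(iteration):
    delt = y_data - (x_scaled.dot(weight) + bias)
    # gradients of sum(delt^2) + lam * sum(w^2), averaged over the examples
    weight_grad = (-2.0 * x_scaled.T.dot(delt) + 2.0 * lam * weight) / len(y_data)
    bias_grad = -2.0 * np.sum(delt) / len(y_data)
    weight -= learning_rate * weight_grad
    bias -= learning_rate * bias_grad

print(weight, bias)

For mini-batch stochastic gradient descent (point 4), the only change would be to compute `delt` and the gradients on a randomly sampled subset of the rows at each iteration instead of the full data.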

Taking the first three steps as an example, the code is as follows.

Raw Python demo

#! /usr/bin/python
# -*- coding:utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt
from pylab import mpl

# to support chinese
plt.rcParams['font.sans-serif'] = ['SimHei']
# make '-' show correct
plt.rcParams['axes.unicode_minus'] = False


# train data
x_data = [338., 333., 328., 207., 226., 25., 179., 60., 208., 606.]
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]

# init
bias = -120
weight = -4
learning_rate = 1
iteration = 10000

# for show
bias_history = []
weight_history = []

# Adagrad-style accumulators: running sums of squared gradients,
# used below for per-parameter adaptive learning rates
learning_rate_bias = 0
learning_rate_weight = 0

# train
for i in range(iteration):
    bias_grad = 0.0
    weight_grad = 0.0
    for j in range(len(x_data)):
        bias_grad += 2.0 * (y_data[j] - weight * x_data[j] - bias) * (-1.0)
        weight_grad += 2.0 * (y_data[j] - weight * x_data[j] - bias) * (-1.0 * x_data[j])

    # accumulate squared gradients (Adagrad)
    learning_rate_bias += bias_grad ** 2
    learning_rate_weight += weight_grad ** 2

    # update parameters with adaptive step sizes
    bias -= learning_rate / np.sqrt(learning_rate_bias) * bias_grad
    weight -= learning_rate / np.sqrt(learning_rate_weight) * weight_grad

    # add to history for showing
    bias_history.append(bias)
    weight_history.append(weight)

# for show
x = np.arange(-200, -100, 1)
y = np.arange(-5, 5, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.zeros((len(y), len(x)))
# fill Z with the average squared error over the (bias, weight) grid,
# so the contour actually shows the loss surface
for m in range(len(x)):
    for n in range(len(y)):
        b_try = x[m]
        w_try = y[n]
        Z[n][m] = 0.0
        for k in range(len(x_data)):
            Z[n][m] += (y_data[k] - w_try * x_data[k] - b_try) ** 2
        Z[n][m] /= len(x_data)

plt.contourf(X, Y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, mew=3, color='red')
plt.plot(bias_history, weight_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$bias$')
plt.ylabel(r'$weight$')
plt.title("Linear Regression")
plt.show()

NumPy demo

#! /usr/bin/python
# -*- coding:utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt
from pylab import mpl

# to support chinese
plt.rcParams['font.sans-serif'] = ['SimHei']
# make '-' show correct
plt.rcParams['axes.unicode_minus'] = False


# train data
x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

# init
bias = -120
weight = -4
learning_rate = 1
iteration = 10000

# for show
bias_history = []
weight_history = []

# Adagrad-style accumulators: running sums of squared gradients,
# used below for per-parameter adaptive learning rates
learning_rate_bias = 0
learning_rate_weight = 0

# train
for i in range(iteration):
    delt = y_data - (weight * x_data + bias)
    loss = np.dot(delt, delt)  # current squared-error loss (not used below, kept for reference)
    bias_grad = -2.0 * np.sum(delt)
    weight_grad = -2.0 * np.dot(delt, x_data)

    # accumulate squared gradients (Adagrad)
    learning_rate_bias += bias_grad ** 2
    learning_rate_weight += weight_grad ** 2

    # update parameters with adaptive step sizes
    bias -= learning_rate / np.sqrt(learning_rate_bias) * bias_grad
    weight -= learning_rate / np.sqrt(learning_rate_weight) * weight_grad

    # add to history for showing
    bias_history.append(bias)
    weight_history.append(weight)

# for show
x = np.arange(-200, -100, 1)
y = np.arange(-5, 5, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.zeros((len(y), len(x)))
# fill Z with the average squared error over the (bias, weight) grid,
# so the contour actually shows the loss surface
for m in range(len(x)):
    for n in range(len(y)):
        Z[n][m] = np.mean((y_data - y[n] * x_data - x[m]) ** 2)

plt.contourf(X, Y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, mew=3, color='red')
plt.plot(bias_history, weight_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$bias$')
plt.ylabel(r'$weight$')
plt.title("Linear Regression")
plt.show()

TensorFlow demo (TensorFlow 1.x API)

import numpy as np
import tensorflow as tf

# train data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.3 + 0.2

# create tf structure
weight = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
bias = tf.Variable(tf.zeros([1]))
learning_rate = 0.5
iteration = 1000

# model
y = weight * x_data + bias

loss = tf.reduce_mean(tf.square(y_data-y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(loss)

# init
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for step in range(iteration):
        sess.run(train)

        if step % 20 == 0:
            print(step, sess.run(weight), sess.run(bias))
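
The demo above uses the TensorFlow 1.x graph API (tf.Session, tf.train.GradientDescentOptimizer), which is not available by default in TensorFlow 2.x. As a minimal sketch, assuming a TensorFlow 2.x environment, the same fit can be written with eager execution and tf.GradientTape roughly as follows; this is an assumed translation, not part of the original demo.

import numpy as np
import tensorflow as tf

# train data, same as the 1.x demo
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.3 + 0.2

weight = tf.Variable(tf.random.uniform([1], -1.0, 1.0))
bias = tf.Variable(tf.zeros([1]))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.5)

for step in range(1000):
    with tf.GradientTape() as tape:
        y = weight * x_data + bias
        loss = tf.reduce_mean(tf.square(y_data - y))
    # compute d(loss)/d(weight), d(loss)/d(bias) and apply one SGD step
    grads = tape.gradient(loss, [weight, bias])
    optimizer.apply_gradients(zip(grads, [weight, bias]))

    if step % 20 == 0:
        print(step, weight.numpy(), bias.numpy())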
