Theory and Code: Probabilistic Classification Models

Step 1: model assumption (how to choose a model). Here we choose a probabilistic model, taking binary classification as the example: $p(c_{1}|x)=\frac{p(x|c_{1})p(c_{1})}{p(x|c_{1})p(c_{1}) + p(x|c_{2})p(c_{2})}$
If $p(c_{1}|x) > 0.5$, then $x\in c_{1}$; otherwise $x \in c_{2}$.
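The decision rule above can be sketched in a few lines (the likelihood and prior values below are hypothetical, chosen only to illustrate the computation):

```python
import numpy as np

# Hypothetical class-conditional likelihoods and priors for one sample x
p_x_c1, p_x_c2 = 0.20, 0.05   # p(x|c1), p(x|c2)
p_c1, p_c2 = 0.5, 0.5         # class priors p(c1), p(c2)

# Bayes' rule: posterior probability of class c1 given x
p_c1_x = p_x_c1 * p_c1 / (p_x_c1 * p_c1 + p_x_c2 * p_c2)

# Threshold at 0.5 to classify
label = "c1" if p_c1_x > 0.5 else "c2"
print(p_c1_x, label)  # 0.8 c1
```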

Step 2: model evaluation (how to judge which model is better). This is usually done through a loss function. Assume $p(x|c_{1})$ follows a Gaussian distribution with mean $\mu$ and covariance $\Sigma$. The loss function can then be defined via the likelihood to be maximized: $L(\mu, \Sigma)=\prod_{i=1}^{n}f_{\mu, \Sigma}(x_{i})$, where $f_{\mu, \Sigma}(x)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}=p(x|c)$, $n$ is the number of samples in $c_{1}$, $D$ is the dimension of the feature vector, $\Sigma^{-1}$ is the inverse of $\Sigma$, and $\Sigma$ is a $D \times D$ symmetric matrix. The same applies to $p(x|c_{2})$.
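As a sketch, the density $f_{\mu,\Sigma}(x)$ can be evaluated directly with NumPy (the `gaussian_pdf` helper and the example parameters are illustrative, not from the original):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density f_{mu,Sigma}(x); D is the feature dimension."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

# Hypothetical parameters for a 2-D class
mu = np.array([0.0, 0.0])
sigma = np.eye(2)

# At x = mu the exponent is 0, so the density is 1/(2*pi) for D=2, |Sigma|=1
print(gaussian_pdf(np.array([0.0, 0.0]), mu, sigma))  # ~0.15915
```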
Step 3: model optimization (how to pick the best model). Here we use maximum likelihood estimation: $\mu^{\ast},\Sigma^{\ast}=\arg\max L(\mu, \Sigma)$, which yields $\mu^{\ast}=\frac{1}{n}\sum_{i=1}^{n}x_{i}, \Sigma^{\ast}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\mu^{\ast})(x_{i}-\mu^{\ast})^{T}$
Notes:
1. Once $p(x|c_{1})$ and $p(x|c_{2})$ are obtained, $p(c_{1})$ and $p(c_{2})$ follow from class frequency counts, so the value of $p(c_{1}|x)$ can be computed.
2. Classes $C_{1}$ and $C_{2}$ will have different $(\mu, \Sigma)$ depending on their samples; each can be estimated separately following Steps 2 and 3.
3. $x_{i}$ denotes a single sample. For example, if the samples are a series of 2-D coordinate points, then $x_{i}=(x^{i}, y^{i})$, its x-axis and y-axis coordinates.
4. Regarding Step 2: for naive Bayes, $p(x|c)$ is instead estimated directly from counts.
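Note 4 can be illustrated with a toy count-based estimate (the discrete feature values and the class data below are made up for illustration):

```python
from collections import Counter

# Hypothetical discrete feature observed for the samples of class c1
samples_c1 = ["sunny", "sunny", "rainy", "sunny"]

# Naive-Bayes-style estimate: p(x = "sunny" | c1) is just a relative frequency
counts = Counter(samples_c1)
p_sunny_given_c1 = counts["sunny"] / len(samples_c1)
print(p_sunny_given_c1)  # 0.75
```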

Suppose class $c_{1}$ contains the 2-D samples (1, 2), (3, 4), (5, 6), (7, 8), and class $c_{2}$ contains the 2-D samples (0, -1), (1, 0.5), (2, 1.5), (3, 2). As an example we compute only Step 3. A previous article covered the covariance matrix computation, which is essentially the same as here.

```python
#!/usr/bin/python
# -*- coding:utf-8 -*-

import numpy as np

# c1 points: (1, 2), (3, 4), (5, 6), (7, 8)
# c2 points: (0, -1), (1, 0.5), (2, 1.5), (3, 2)
c1 = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
c2 = np.array([[0, -1], [1, 0.5], [2, 1.5], [3, 2]])

# Maximum-likelihood estimates: sample mean and (biased, 1/n) sample covariance
mu_c1 = np.mean(c1, axis=0)
mu_c2 = np.mean(c2, axis=0)
print(mu_c1)
print(mu_c2)

sigma_c1 = np.dot((c1 - mu_c1).T, (c1 - mu_c1)) / len(c1)
sigma_c2 = np.dot((c2 - mu_c2).T, (c2 - mu_c2)) / len(c2)
print(sigma_c1)
print(sigma_c2)
```

Next, let us derive the connection between this probabilistic model and logistic regression.
$p(c_{1}|x)=\frac{p(x|c_{1})p(c_{1})}{p(x|c_{1})p(c_{1})+p(x|c_{2})p(c_{2})}$
$p(c_{1}|x)=\frac{1}{1+\frac{p(x|c_{2})p(c_{2})}{p(x|c_{1})p(c_{1})}}$
Let $e^{-z}=\frac{p(x|c_{2})p(c_{2})}{p(x|c_{1})p(c_{1})}$; then:
$p(c_{1}|x)=\frac{1}{1+e^{-z}}$
$z = ln(\frac{p(x|c_{1})p(c_{1})}{p(x|c_{2})p(c_{2})})$
$z = ln(\frac{p(x|c_{1})}{p(x|c_{2})})+ln(\frac{p(c_{1})}{p(c_{2})})$
If $p(x|c_{1})$ and $p(x|c_{2})$ follow Gaussian distributions with parameters $(\mu_{1}, \Sigma_{1})$ and $(\mu_{2}, \Sigma_{2})$ respectively, so that $p(x|c)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}$, and if $\Sigma_{1}=\Sigma_{2}=\Sigma$, then substituting into the derivation gives:
$z=(\mu_{1}-\mu_{2})^{T}\Sigma^{-1}x-\frac{1}{2}\mu_{1}^{T}\Sigma^{-1}\mu_{1}+\frac{1}{2}\mu_{2}^{T}\Sigma^{-1}\mu_{2}+ln(\frac{p(c_{1})}{p(c_{2})})$
Letting $w=(\mu_{1}-\mu_{2})^{T}\Sigma^{-1},b=-\frac{1}{2}\mu_{1}^{T}\Sigma^{-1}\mu_{1}+\frac{1}{2}\mu_{2}^{T}\Sigma^{-1}\mu_{2}+ln(\frac{p(c_{1})}{p(c_{2})})$, we get:
$z=wx+b$
Putting it together:
$p(c_{1}|x)=\frac{1}{1+e^{-(wx+b)}}$, so it suffices to find the parameters $w$ and $b$; this is exactly logistic regression.
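Under the shared-covariance assumption, $w$ and $b$ can be computed directly from the estimated Gaussian parameters. A minimal sketch (the means reuse the worked example's values; the shared covariance and the equal priors are assumed for illustration):

```python
import numpy as np

mu1 = np.array([4.0, 5.0])                   # mean of c1 (from the worked example)
mu2 = np.array([1.5, 0.75])                  # mean of c2 (from the worked example)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed shared covariance
p_c1, p_c2 = 0.5, 0.5                        # assumed equal priors

# w = (mu1 - mu2)^T Sigma^{-1},  b = -1/2 mu1^T S^{-1} mu1 + 1/2 mu2^T S^{-1} mu2 + ln(p1/p2)
sigma_inv = np.linalg.inv(sigma)
w = (mu1 - mu2) @ sigma_inv
b = (-0.5 * mu1 @ sigma_inv @ mu1
     + 0.5 * mu2 @ sigma_inv @ mu2
     + np.log(p_c1 / p_c2))

def posterior_c1(x):
    """p(c1|x) as a sigmoid of z = wx + b, i.e. the logistic regression form."""
    z = w @ x + b
    return 1.0 / (1.0 + np.exp(-z))

# A point at mu1 should be classified as c1, a point at mu2 as c2
print(posterior_c1(mu1), posterior_c1(mu2))
```

With equal priors, $z$ evaluated at $\mu_{1}$ reduces to half the Mahalanobis distance between the means, so the posterior at $\mu_{1}$ exceeds 0.5 and the posterior at $\mu_{2}$ falls below it.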

Committed to original technical sharing. Your support will encourage me to keep creating!