
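Probability Theory: The Gaussian Distribution and Similarity Measures

1. Gaussian distribution

The Gaussian distribution is also called the normal distribution. If a random variable X follows a normal distribution with expectation \small \mu and variance \small \sigma^2, we write \small X\sim N(\mu,\sigma^2). The expectation \small \mu determines where the probability density function is centered, and the standard deviation \small \sigma determines how widely it spreads. The normal distribution with \mu=0 and \sigma=1 is the standard normal distribution. The probability density function is: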

\large f(x)=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
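As a quick sanity check on the density above, here is a minimal sketch in plain Python. The helper names `gaussian_pdf` and `integrate` are made up for this illustration, and the midpoint rule is just one simple way to approximate the integral. It checks that the standard normal peaks at 1/\sqrt{2\pi} and that the density integrates to roughly 1.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, following f(x) above."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Peak height of the standard normal: 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0)

# The density of N(1, 2^2) integrated over mu +/- 10 sigma should be close to 1.
total = integrate(lambda x: gaussian_pdf(x, mu=1.0, sigma=2.0), -19.0, 21.0)

print(peak, total)
```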

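2. KL divergence

Given a discrete event x and two distributions p and q over it, we have the following definitions:

  • Probability: p(x), q(x)
  • Information: take the logarithm of p(x) and negate it to get a non-negative value: I(p)=-log\,p(x). The higher the probability, the less information the event carries, because the outcome is nearly certain. Conversely, the lower the probability, the more information it carries, because the outcome is more uncertain.
  • Shannon entropy: the average of I(p) under p: H(p)=\mathbb{E}_{x\sim P}[I(p)]=\sum p(x)I(p)=-\sum p(x)log \, p(x). Entropy is average information; intuitively, Shannon entropy averages the information of a distribution under that same distribution.
  • Cross-entropy: the average of I(q) under p: H(p,q)=\mathbb{E}_{x\sim P}[I(q)]=\sum p(x)I(q)=-\sum p(x)log \, q(x). Intuitively, cross-entropy averages the information of one distribution under a different distribution.
  • KL divergence (relative entropy): relative entropy = cross-entropy - Shannon entropy. It is asymmetric, D_{KL}(p||q)\neq D_{KL}(q||p) in general, and it does not satisfy the triangle inequality, so it is not a distance in the metric sense.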

D_{KL}(p||q)=H(p,q)-H(p)=-\sum p(x)log\,q(x)+\sum p(x)log \,p(x)

=-\sum p(x)log\, \frac{q(x)}{p(x)}

=\sum p(x)log\,\frac{p(x)}{q(x)}
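The discrete definitions above can be sketched in a few lines of plain Python. The function names and the two example distributions are chosen only for this illustration, and the sketch assumes q(x) > 0 wherever p(x) > 0 so that every logarithm is finite.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p(x) log p(x), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum p(x) log q(x), in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL divergence D_KL(p || q) = sum p(x) log(p(x) / q(x)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two illustrative distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)

# D_KL(p || q) should equal H(p, q) - H(p) up to rounding.
print(kl_pq, cross_entropy(p, q) - entropy(p))
# Swapping the arguments generally gives a different value.
print(kl_qp)
```

Comparing `kl_pq` with `kl_qp` shows the asymmetry directly, which is why the KL divergence is not treated as a distance.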

For continuous events:

  • Shannon entropy: H(p)=E_{x\sim P}[I(p)]=\int p(x)I(p)dx=-\int p(x)log\,p(x)dx
  • Cross-entropy: H(p,q)=E_{x\sim P}[I(q)]=E_{x\sim P}[-log\,q(x)]=-\int p(x)log\,q(x)dx
  • Relative entropy: D_{KL}(p||q)=\int p(x)log\,\frac{p(x)}{q(x)}dx

3. Measuring the similarity of two Gaussian distributions with KL divergence

The Gaussian distribution is continuous, so

D_{KL}(p||q)=\int p(x)\left [ log\,p(x)-log\,q(x) \right ] dx

Let p(x)=N(\mu_1,\sigma_1^2)=\frac{1}{\sigma_1 \sqrt{2\pi}}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} and q(x)=N(\mu_2,\sigma_2^2)=\frac{1}{\sigma_2 \sqrt{2\pi}}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}

Then:

D_{KL}(p||q)=\int p(x)\left [log\left ( \frac{1}{\sigma_1 \sqrt{2\pi}}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \right )-log\left (\frac{1}{ \sigma_2 \sqrt{2\pi}}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \right )\right ]dx

=\int p(x)\left [{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}+log\left ( \frac{1}{\sigma_1 \sqrt{2\pi}} \right )-log\left (\frac{1}{ \sigma_2 \sqrt{2\pi}}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \right )\right ]dx

=\int p(x)\left [{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}-log(\sigma_1)-log(\sqrt{2\pi})-log\left (\frac{1}{ \sigma_2 \sqrt{2\pi}}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \right )\right ]dx

=\int p(x)\left [-\frac{1}{2}\left (\frac{x-\mu_1}{\sigma_1} \right )^2-log(\sigma_1)-\frac{1}{2}log(2\pi)-log\left (\frac{1}{ \sigma_2 \sqrt{2\pi}}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}} \right )\right ]dx

=\int p(x)\left [-\frac{1}{2}\left (\frac{x-\mu_1}{\sigma_1} \right )^2-log(\sigma_1)-\frac{1}{2}log(2\pi)+\frac{1}{2}\left (\frac{x-\mu_2}{\sigma_2} \right )^2+log(\sigma_2)+\frac{1}{2}log(2\pi)\right ]dx

=\int p(x)\left [-\frac{1}{2}\left (\frac{x-\mu_1}{\sigma_1} \right )^2-log(\sigma_1)+\frac{1}{2}\left (\frac{x-\mu_2}{\sigma_2} \right )^2+log(\sigma_2)\right ]dx

=\int p(x)\left [\frac{1}{2}\left [\left (\frac{x-\mu_2}{\sigma_2} \right )^2 -\left (\frac{x-\mu_1}{\sigma_1} \right )^2 \right ]+log(\frac{\sigma_2}{\sigma_1})\right ]dx

Writing the integral as an expectation under p:

=\mathbb{E}_{x\sim P}\left [\frac{1}{2}\left [\left (\frac{x-\mu_2}{\sigma_2} \right )^2 -\left (\frac{x-\mu_1}{\sigma_1} \right )^2 \right ]+log(\frac{\sigma_2}{\sigma_1})\right ]

=log(\frac{\sigma_2}{\sigma_1})+\frac{1}{2\sigma_2^2}\mathbb{E}_{x\sim P}\left [\left (x-\mu_2\right )^2 \right ]-\frac{1}{2\sigma_1^2}\mathbb{E}_{x\sim P}\left [\left (x-\mu_1 \right )^2 \right ]

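Since \sigma_1^2 = \mathbb{E}_{x\sim P}\left [(x-\mu_1)^2 \right ] by the definition of the variance of p, the last term equals \frac{1}{2}: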

=log(\frac{\sigma_2}{\sigma_1})+\frac{1}{2\sigma_2^2}\mathbb{E}_{x\sim P}\left [\left (x-\mu_2\right )^2 \right ]-\frac{1}{2}

Since (x-\mu_2)^2=(x-\mu_1+\mu_1-\mu_2)^2= (x-\mu_1)^2+2(x-\mu_1)(\mu_1-\mu_2)+(\mu_1-\mu_2)^2:

=log(\frac{\sigma_2}{\sigma_1})+\frac{1}{2\sigma_2^2}\mathbb{E}_{x\sim P}\left [(x-\mu_1)^2+2(x-\mu_1)(\mu_1-\mu_2)+(\mu_1-\mu_2)^2 \right ]-\frac{1}{2}

=log(\frac{\sigma_2}{\sigma_1})+\frac{1}{2\sigma_2^2}\left [ \mathbb{E}_{x\sim P}\left [ (x-\mu_1)^2 \right ]+2(\mu_1-\mu_2)\mathbb{E}_{x\sim P}\left [ x-\mu_1 \right ]+(\mu_1-\mu_2)^2 \right ]-\frac{1}{2}

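Since \mathbb{E}_{x\sim P}\left [ x-\mu_1 \right ]=0 (deviations from the mean average to zero) and \mathbb{E}_{x\sim P}\left [ (x-\mu_1)^2 \right ]=\sigma_1^2: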

=log(\frac{\sigma_2}{\sigma_1})+\frac{1}{2\sigma_2^2}\left [ \sigma_1^2+2(\mu_1-\mu_2)*0+(\mu_1-\mu_2)^2 \right ]-\frac{1}{2}

D_{KL}(p||q)=log(\frac{\sigma_2}{\sigma_1})+\frac{\sigma_1^2+(\mu_1-\mu_2)^2 }{2\sigma_2^2}-\frac{1}{2}

In the code below, p and q are torch.distributions.Normal objects, where loc is the mean \mu and scale is the standard deviation \sigma.

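def _kl_normal_normal(p, q):
    # var_ratio = sigma_1^2 / sigma_2^2
    var_ratio = (p.scale / q.scale).pow(2)
    # t1 = (mu_1 - mu_2)^2 / sigma_2^2
    t1 = ((p.loc - q.loc) / q.scale).pow(2)
    # -0.5 * log(var_ratio) equals log(sigma_2 / sigma_1)
    return 0.5 * (var_ratio + t1 - 1 - var_ratio.log())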
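The closed form derived in this section can be cross-checked against a direct numerical evaluation of \int p(x)[log\,p(x)-log\,q(x)]dx. The following is a minimal sketch in plain Python rather than PyTorch. The function names, the integration width of 12 standard deviations, and the test parameters are all assumptions made for the illustration.

```python
import math

def normal_log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x (log form avoids log(0) in the tails)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def kl_closed_form(mu1, sigma1, mu2, sigma2):
    """log(sigma2/sigma1) + (sigma1^2 + (mu1-mu2)^2) / (2 sigma2^2) - 1/2."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

def kl_numeric(mu1, sigma1, mu2, sigma2, n=100000, width=12.0):
    """Midpoint-rule estimate of the KL integral over mu1 +/- width * sigma1."""
    lo = mu1 - width * sigma1
    h = 2.0 * width * sigma1 / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        log_p = normal_log_pdf(x, mu1, sigma1)
        log_q = normal_log_pdf(x, mu2, sigma2)
        total += math.exp(log_p) * (log_p - log_q)
    return total * h

closed = kl_closed_form(0.0, 1.0, 1.0, 2.0)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
print(closed, numeric)
```

The two printed values should agree to several decimal places, and the closed form returns 0 when both distributions have the same mean and standard deviation.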

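4. Wasserstein distance

The 2-Wasserstein distance between two distributions \mu and \nu (here, two multivariate Gaussians) is defined as:

W_2(\mu,\nu):=\inf\left (\mathbb{E}\left [||X-Y||_2^2 \right ] \right )^{\frac{1}{2}}

where the infimum is taken over all joint distributions of (X,Y) whose marginals are X\sim\mu and Y\sim\nu.

With the Euclidean distance as the ground distance, the squared 2-Wasserstein distance between N(m_1,\Sigma_1) and N(m_2,\Sigma_2) is: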

d^2=||m_1-m_2||_2^2+Tr\left (\Sigma_1+\Sigma_2-2\left ( \Sigma_1^{\frac{1}{2}}\Sigma_2\Sigma_1^{\frac{1}{2}} \right ) ^{\frac{1}{2}} \right )

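When the covariance matrices commute, \Sigma_1\Sigma_2=\Sigma_2\Sigma_1, the trace term simplifies. By linearity of the trace:

Tr\left (\Sigma_1+\Sigma_2-2\left ( \Sigma_1^{\frac{1}{2}}\Sigma_2\Sigma_1^{\frac{1}{2}} \right ) ^{\frac{1}{2}} \right )=Tr\left ( \Sigma_1 \right )+Tr\left ( \Sigma_2 \right )-2Tr\left ( \left ( \Sigma_1^{\frac{1}{2}}\Sigma_2\Sigma_1^{\frac{1}{2}} \right ) ^{\frac{1}{2}} \right )

When \Sigma_1 and \Sigma_2 are symmetric positive semidefinite and commute, \Sigma_1^{\frac{1}{2}} also commutes with \Sigma_2, so \Sigma_1^{\frac{1}{2}}\Sigma_2\Sigma_1^{\frac{1}{2}}=\Sigma_1\Sigma_2 and \left ( \Sigma_1\Sigma_2 \right )^{\frac{1}{2}}=\Sigma_1^{\frac{1}{2}}\Sigma_2^{\frac{1}{2}}. Hence:

Tr\left ( \Sigma_1 \right )+Tr\left ( \Sigma_2 \right )-2Tr\left ( \left ( \Sigma_1^{\frac{1}{2}}\Sigma_2\Sigma_1^{\frac{1}{2}} \right ) ^{\frac{1}{2}} \right )=Tr\left ( \Sigma_1 \right )+Tr\left ( \Sigma_2 \right )-2Tr\left ( \Sigma_1^{\frac{1}{2}}\Sigma_2^{\frac{1}{2}} \right )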

=Tr\left ( \left ( \Sigma_1^{\frac{1}{2}} -\Sigma_2^{\frac{1}{2}} \right ) ^2\right )

\large =||\Sigma_1^{\frac{1}{2}}-\Sigma_2^{\frac{1}{2}}||_{Frobenius}^2

In this case:

W_2^2(N(m_1,\Sigma_1),N(m_2,\Sigma_2))=||m_1-m_2||_2^2+||\Sigma_1^{\frac{1}{2}}-\Sigma_2^{\frac{1}{2}}||_{Frobenius}^2

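In the code below, p and q are torch.distributions.Normal objects with shape (batch, dim). Each dimension is independent, so the covariance is diagonal and scale (the standard deviation) already plays the role of \Sigma^{\frac{1}{2}}; no further square root is taken.

import torch

def ws_normal_normal(p, q):
    # Squared Euclidean distance between the means, summed over dimension 1.
    p1 = torch.sum(torch.pow(p.loc - q.loc, 2), 1)
    # Squared Frobenius norm of Sigma_1^(1/2) - Sigma_2^(1/2) for diagonal covariances.
    p2 = torch.sum(torch.pow(p.scale - q.scale, 2), 1)
    # Squared 2-Wasserstein distance, averaged over the batch.
    return (p1 + p2).mean()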
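For one-dimensional Gaussians the formula reduces to (m_1-m_2)^2+(\sigma_1-\sigma_2)^2, with \sigma denoting the standard deviation. The sketch below checks this numerically in plain Python. It relies on a standard one-dimensional identity that is not derived in this article, namely that the squared 2-Wasserstein distance equals the integral over u in (0, 1) of the squared difference of the two quantile functions. The function names and test parameters are assumptions made for the illustration.

```python
from statistics import NormalDist

def w2_squared_closed_form(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def w2_squared_quantile(m1, s1, m2, s2, n=50000):
    """Midpoint-rule estimate of the integral of (F1^-1(u) - F2^-1(u))^2 over (0, 1)."""
    p = NormalDist(m1, s1)
    q = NormalDist(m2, s2)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h  # midpoints stay strictly inside (0, 1)
        total += (p.inv_cdf(u) - q.inv_cdf(u)) ** 2
    return total * h

closed = w2_squared_closed_form(0.0, 1.0, 3.0, 2.0)
numeric = w2_squared_quantile(0.0, 1.0, 3.0, 2.0)
print(closed, numeric)
```

Unlike the KL divergence, this quantity is symmetric in its two arguments, which can be confirmed by swapping the parameters.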

Original article: https://www.jb51.cc/wenti/3280746.html
