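I have two lists representing two different probability density functions (pdfs), p(x) and p(y). I know the two variables are correlated, and I want to build the joint distribution p(x,y) so that I can compute their mutual information.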
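Once the joint distribution is available on a grid, the mutual information follows directly from I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))]; the marginals alone are not enough. Here is a minimal sketch, assuming the joint is given as a 2-D table of probabilities (rows = x bins, columns = y bins). The function name `mutual_information_from_joint` is hypothetical:

```python
import numpy as np


def mutual_information_from_joint(p_xy):
    """Mutual information (in nats) of a discrete joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_xy = p_xy / p_xy.sum()                 # renormalise in case of rounding
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of X (sum over columns)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of Y (sum over rows)
    mask = p_xy > 0                          # convention: 0 * log 0 = 0
    ratio = p_xy[mask] / (p_x * p_y)[mask]   # p(x,y) / (p(x) p(y))
    return float(np.sum(p_xy[mask] * np.log(ratio)))


# Independent variables give 0; a perfectly dependent 4x4 table gives log(4).
print(mutual_information_from_joint(np.outer([0.2, 0.8], [0.5, 0.5])))
print(mutual_information_from_joint(np.eye(4) / 4))
```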
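I have done some research and found copula theory in statistics, which apparently is the way to solve my problem. However, even after fitting the copula, I don't know how to generate p(x,y). I have tried two packages, "copulalib" and "copula", and so far the best result I have achieved is with "ambhas" (although I had to modify that class a bit). Here is my code: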
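For reference, the standard relations behind the copula approach can be written as follows, where F_X, F_Y are the marginal CDFs, C is the copula and c its density (Sklar's theorem and its density form, assuming continuous marginals). The last identity, obtained by substituting u = F_X(x), v = F_Y(y), shows that the mutual information depends only on the copula density:

```latex
F_{XY}(x,y) = C\bigl(F_X(x),\,F_Y(y)\bigr)

p(x,y) = c\bigl(F_X(x),\,F_Y(y)\bigr)\,p(x)\,p(y),
\qquad c(u,v) = \frac{\partial^2 C(u,v)}{\partial u\,\partial v}

I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy
       = \int_0^1\!\!\int_0^1 c(u,v)\,\log c(u,v)\,du\,dv
```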
import scipy as sp
import scipy.interpolate
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import rcParams
plt.rcParams.update({'font.size': 10})
rcParams.update({'figure.autolayout': True})
from ambhas.errlib import rmse, correlation
from ambhas.copula import Copula
import seaborn as sns
def log_interp1d(xx, yy, kind='linear'):
    # interpolate in log10-log10 space (piecewise power law between the points)
    logx = np.log10(xx)
    logy = np.log10(yy)
    lin_interp = sp.interpolate.interp1d(logx, logy, kind=kind)
    log_interp = lambda zz: np.power(10.0, lin_interp(np.log10(zz)))
    return log_interp
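def derivada(f, x):
    # forward-difference derivative on a uniform grid (same as np.diff(f) / dx);
    # returns len(f) - 1 values, each located between x[i-1] and x[i]
    dx = x[1] - x[0]
    flinha = []
    for i in range(1, len(f)):
        flinha.append((f[i] - f[i-1]) / dx)
    return flinha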
# interpolate the logarithmic data and generate the cumulative distributions (cdfs)
xx = [1e-10, 0.00014, 0.00042, 0.0014, 0.0042, 0.014, 0.07]
yy = [1e-20, 0.125, 0.275, 0.4711, 0.775, 0.875, 1]
f = log_interp1d(xx, yy)
xnew = np.linspace(xx[0], xx[-1], num=10000, endpoint=True)
fda_cerc = f(xnew)  # cdf
x_cerc = xnew
xx = [1e-10, 2.1e-6, 2.1e-5, 2.1e-4, 2.1e-3, 0.007, 0.021, 0.049, 0.07]
yy = [1e-20, 0.0583, 0.1, 0.1083, 0.5083, 0.7583, 0.9917, 0.9917, 1]
f = log_interp1d(xx, yy)
fda_imida = f(xnew)  # cdf, on the same x grid as above
x_imida = xnew
# generate p(x) and p(y)
pdf_imida = np.array(derivada(fda_imida, x_imida))  # p(x)
pdf_cerconil = np.array(derivada(fda_cerc, x_cerc))  # p(y)
c = correlation(pdf_imida, pdf_cerconil)
foo = Copula(pdf_imida, pdf_cerconil, 'frank')
u, v = foo.generate_uv(9999)
plt.plot(v)
h = sns.jointplot(x=u, y=v, kind='kde')  # recent seaborn versions require x=/y= as keywords
plt.savefig('copulas.jpg', dpi=400)
The resulting plot is consistent with copula theory, but what do I need to do to generate p(x,y)? Is there a simple way?
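One possible route is Sklar's theorem in density form: evaluate the copula density at the marginal CDF values and multiply by the marginal pdfs, p(x,y) = c(F(x), G(y)) p(x) p(y). The sketch below assumes a Frank copula with an already known parameter `theta` (for example, whatever value your fit produced; I have not checked which attribute ambhas uses to expose it) and marginals sampled on grids. The function names are hypothetical and not part of ambhas. Because `derivada` returns one value fewer than the CDF array, you would pair `pdf_imida` with midpoint CDF values such as `0.5 * (fda_imida[1:] + fda_imida[:-1])`.

```python
import numpy as np


def frank_copula_density(u, v, theta):
    """Density c(u, v) = d^2 C / du dv of the Frank copula (theta != 0)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    a = 1.0 - np.exp(-theta)
    num = theta * a * np.exp(-theta * (u + v))
    den = (a - (1.0 - np.exp(-theta * u)) * (1.0 - np.exp(-theta * v))) ** 2
    return num / den


def joint_pdf_from_copula(cdf_x, pdf_x, cdf_y, pdf_y, theta):
    """Sklar's theorem on a grid: p(x, y) = c(F(x), G(y)) * p(x) * p(y).

    cdf_x/pdf_x (and cdf_y/pdf_y) must be sampled at the same grid points.
    Returns the joint density (rows = x, columns = y) and the copula density.
    """
    U, V = np.meshgrid(cdf_x, cdf_y, indexing="ij")
    c = frank_copula_density(U, V, theta)
    return c * np.outer(pdf_x, pdf_y), c


def mutual_information_grid(p_xy, c, dx, dy):
    """I(X;Y) as the grid sum of p(x, y) * log c(F(x), G(y)) * dx * dy, in nats."""
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(c[mask])) * dx * dy)


# Check with uniform marginals on [0, 1] (F(x) = x, p(x) = 1) on a midpoint grid.
n = 400
grid = (np.arange(n) + 0.5) / n
ones = np.ones(n)
p_xy, c = joint_pdf_from_copula(grid, ones, grid, ones, theta=5.0)
mass = p_xy.sum() / n**2   # grid integral of p(x, y); should be close to 1
mi = mutual_information_grid(p_xy, c, 1.0 / n, 1.0 / n)
print(mass, mi)
```

The uniform marginals are only a sanity check: in that case p(x,y) equals c(u,v), so its grid integral should be close to 1. Since p(x,y) / (p(x) p(y)) = c(F(x), G(y)), the mutual information computed this way depends only on the copula and its parameter, not on the shape of the marginals; that is a property of the copula decomposition rather than of this particular code.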