Question

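I have two datasets in which two values were measured. I am interested in the difference between the values and in the difference between their standard deviations. I made a histogram that I would like to fit with two normal distributions, and then compute the difference between the maxima. I would also like to assess the effect of having much less data for one of the values in the dataset. I have already looked at this link, but it is not really what I need: Python: finding the intersection point of two gaussian curves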

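import os
import numpy as np
import matplotlib.pyplot as plt

# filepath is assumed to be defined earlier (the folder containing Mappe1.txt);
# os.path.join avoids the invalid '\M' escape in the original string literal
file = os.path.join(filepath, 'Mappe1.txt')
data_all = np.loadtxt(file, delimiter='\t', skiprows=1)  # load the file once, not on every iteration

for ii in range(2, 8):
    # Kanal (channel) = ii - 1
    data = data_all[:, ii]
    plt.figure()  # open a new figure before plotting, so no empty figure is left at the end
    plt.hist(data, bins=100)
    plt.xlabel("bins")
    plt.ylabel("Counts")
    plt.grid()
    plt.tight_layout()

plt.show()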

Answer 1

A quick-and-dirty fit can easily be done with scipy:

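import numpy as np
from scipy.optimize import curve_fit  # non-linear least-squares curve fitting
from matplotlib import pyplot as plt

def func2fit(x, m_1, m_2, std_1, std_2, height1, height2):
    # sum of two Gaussian curves evaluated on the same x values
    # (curve_fit passes a single x array, so the function takes one x, not x1 and x2)
    return (height1 * np.exp(-(x - m_1)**2 / (2 * std_1**2))
            + height2 * np.exp(-(x - m_2)**2 / (2 * std_2**2)))

# ydata = histogram counts, xdata = bin centres of one data column from the question's loop
ydata, edges = np.histogram(data, bins=100)
xdata = (edges[:-1] + edges[1:]) / 2

init_guess = (-.3, .3, .5, .5, 3000, 3000)
# initial guesses for the parameters (m_1, m_2, std_1, std_2, height1, height2), read off your first figure

# do the fitting
fit_pars, pcov = curve_fit(func2fit, xdata, ydata, p0=init_guess)
# fit_pars contains the means, the SD values and the heights; pcov contains the estimated covariance of these parameters

plt.plot(xdata, func2fit(xdata, *fit_pars), label='fit')  # plot the fit
plt.legend()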
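A self-contained sketch of the same approach, run on synthetic data, that also turns the fit into the "difference between the maxima" asked for in the question. It assumes the fitted means m_1 and m_2 are the peak positions of interest and uses first-order error propagation from pcov: var(m_2 - m_1) = var(m_1) + var(m_2) - 2 cov(m_1, m_2). The helper names `two_gauss` and `fit_peak_difference`, the sample sizes, means and widths are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gauss(x, m_1, m_2, std_1, std_2, height1, height2):
    # hypothetical helper: sum of two Gaussian peaks on the same x axis
    return (height1 * np.exp(-(x - m_1)**2 / (2 * std_1**2))
            + height2 * np.exp(-(x - m_2)**2 / (2 * std_2**2)))

def fit_peak_difference(data, bins=100, init_guess=(-.3, .3, .5, .5, 3000, 3000)):
    # hypothetical helper: histogram the raw values and fit at the bin centres
    counts, edges = np.histogram(data, bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2
    pars, pcov = curve_fit(two_gauss, centres, counts, p0=init_guess)
    # distance between the two fitted peak positions (the means)
    diff = pars[1] - pars[0]
    # first-order error propagation using the variances and covariance of m_1 and m_2
    diff_err = np.sqrt(pcov[0, 0] + pcov[1, 1] - 2 * pcov[0, 1])
    return pars, diff, diff_err

# made-up example data: two overlapping normal populations of unequal size
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-0.3, 0.2, 50000),
                       rng.normal(0.4, 0.2, 20000)])
pars, diff, diff_err = fit_peak_difference(data, init_guess=(-.3, .3, .2, .2, 2000, 800))
print(f"peak difference = {diff:.3f} +/- {diff_err:.3f}")
```

Note that the uncertainty reported this way comes from the residuals of an unweighted histogram fit, so it depends on the bin count; it is a rough estimate rather than a rigorous one.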

For further reference, see the scipy manual page: curve_fit

Answer 2

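Assuming the two samples are independent, you do not need curve fitting for this problem; it is basic statistics. Here is some code that does the required calculations, with the source credited in the comments.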
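Written out, the calculation that follows pools the variances of the two samples (which assumes both populations share the same variance) and divides by the harmonic mean n_h of the sample sizes. The second form of the standard error shows how a much smaller sample enters the result: the 1/n term of the smaller sample dominates.

$$
\mathrm{MSE} = \frac{\sum_{i=1}^{n_1} (x_{1,i} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2,j} - \bar{x}_2)^2}{(n_1 - 1) + (n_2 - 1)}, \qquad n_h = \frac{2}{1/n_1 + 1/n_2}
$$

$$
s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{2\,\mathrm{MSE}}{n_h}} = \sqrt{\mathrm{MSE}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
$$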

## adapted from http://onlinestatbook.com/2/estimation/difference_means.html

from random import gauss
from numpy import sqrt

# two independent samples of different size
sample_1 = [gauss(0, 1) for _ in range(10)]
sample_2 = [gauss(1, .5) for _ in range(20)]

n_1 = len(sample_1)
n_2 = len(sample_2)

mean_1 = sum(sample_1) / n_1
mean_2 = sum(sample_2) / n_2

# pooled sum of squared errors and its degrees of freedom
SSE = sum((x - mean_1)**2 for x in sample_1) + sum((x - mean_2)**2 for x in sample_2)
df = (n_1 - 1) + (n_2 - 1)
MSE = SSE / df

# harmonic mean of the two sample sizes
n_h = 2 / (1/n_1 + 1/n_2)
s_mean_diff = sqrt(2 * MSE / n_h)

# difference of the means (the original printed abs(n_1 - n_2), the difference of the sample sizes)
print('difference between means', abs(mean_1 - mean_2))
print('std dev of this difference', s_mean_diff)
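The pooled MSE above assumes both populations have the same variance, which the example samples (SD 1 versus 0.5) do not. Below is a sketch comparing it with the unpooled (Welch) standard error, sqrt(s_1²/n_1 + s_2²/n_2); Welch's estimator is a different, commonly used technique swapped in here, not what the linked page describes. The helper name `mean_diff_and_se` and the sample values are made up for illustration.

```python
import statistics
from math import sqrt

def mean_diff_and_se(sample_1, sample_2, pooled=True):
    """Hypothetical helper: return (mean_1 - mean_2, standard error of that difference)."""
    n_1, n_2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.fmean(sample_1), statistics.fmean(sample_2)
    # sample variances with the n - 1 denominator
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    if pooled:
        # same estimate as the onlinestatbook code: pooled MSE * (1/n_1 + 1/n_2)
        mse = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2)
        se = sqrt(mse * (1 / n_1 + 1 / n_2))
    else:
        # Welch (unpooled) estimate: does not assume equal variances
        se = sqrt(var_1 / n_1 + var_2 / n_2)
    return mean_1 - mean_2, se

# made-up samples: a small, wide one and a larger, narrow one
sample_1 = [-1.2, 0.3, 0.8, -0.5, 1.1, 0.0, -0.7, 0.4, 1.5, -0.9]
sample_2 = [1.1, 0.6, 1.4, 0.9, 1.2, 0.7, 1.3, 0.8, 1.0, 1.2,
            0.5, 1.6, 0.9, 1.1, 0.8, 1.3, 1.0, 0.7, 1.2, 1.4]
print(mean_diff_and_se(sample_1, sample_2))
print(mean_diff_and_se(sample_1, sample_2, pooled=False))
```

In the Welch form the effect of a small sample is explicit: the var_1 / n_1 term grows as n_1 shrinks, and when the smaller sample also has the larger variance the pooled estimate understates the uncertainty.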

Python: two normal distributions
