How do I calculate (statistical) power vs. sample size in Python?

Time: 2017-11-15 05:13:49

Tags: python statistics t-test sample-size

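How can this be done in Python?

  1. Calculate the sample size for a given power and alpha?
  2. Calculate the power for a given sample size and alpha?
  3. Note: I am totally confused :( by the functions Python offers for (statistical) power calculations.

    Can someone help me put some order into this?

    There are two functions under statsmodels: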

    from statsmodels.stats.power import ttest_power, tt_ind_solve_power
    

    We have:

    tt_ind_solve_power(effect_size=effect_size, alpha=alpha, power=0.8, ratio=1, alternative='two-sided')
    

    We also have:

    ttest_power(0.2, nobs=sampleSize, alpha=alpha, alternative='two-sided')
    

    There is also this piece of code:

    import statsmodels.stats.api as sms
    es = sms.proportion_effectsize(prop1, prop2, method='normal')
    n = sms.NormalIndPower().solve_power(es, power=0.9, alpha=0.05, ratio=2)
    

    I found this example somewhere, but it does not explain what prop1 and prop2 are!

    Each of these gives me a different value.

    Thanks

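1 Answer:

Answer 0 (score: 3)

As an answer to the question above, I wrote this function, which calculates power vs. sample size.

When calling tt_ind_solve_power, you need to leave exactly one parameter as None so that it gets computed. In the example below, I leave power as None.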
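As a minimal sketch of that rule: the same function covers both cases from the question, depending on which argument is left as None. The effect size of 0.5 and the group size of 64 are illustrative assumptions, not values taken from the question.

```python
from statsmodels.stats.power import tt_ind_solve_power

effect_size = 0.5   # assumed Cohen's d (mean difference / std)
alpha = 0.05

# Question 1: sample size for a given power and alpha.
# nobs1 is left as None, so the size of the first group is solved for;
# ratio=1.0 means the second group has the same size.
n1 = tt_ind_solve_power(effect_size=effect_size, nobs1=None, alpha=alpha,
                        power=0.8, ratio=1.0, alternative='two-sided')

# Question 2: power for a given sample size and alpha.
# power is left as None, so it is the value that gets computed.
power_at_64 = tt_ind_solve_power(effect_size=effect_size, nobs1=64, alpha=alpha,
                                 power=None, ratio=1.0, alternative='two-sided')

print('required observations per group: {:.2f}'.format(float(n1)))
print('power with 64 observations per group: {:.4f}'.format(float(power_at_64)))
```

With these assumed inputs the first call should land at roughly 64 observations per group, which you would round up in practice.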

I hope someone finds it useful; any improvements are welcome.

import numpy as np  # needed for np.arange below
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from statsmodels.stats.power import tt_ind_solve_power

def test_ttest_power_diff(mean, std, sample1_size=None, alpha=0.05, desired_power=0.8, mean_diff_percentages=[0.1, 0.05]):
    '''
    calculates the power function for a given mean and std. the function plots a graph showing the comparison between desired mean differences
    :param mean: the desired mean
    :param std: the std value
    :param sample1_size: if None, it is assumed that both samples (first and second) will have same size. The function then will
    walk through possible sample sizes (up to 100, hardcoded).
    If this value is not None, the function will check different alternatives for sample 2 sizes up to sample 1 size.
    :param alpha: alpha default value is 0.05
    :param desired_power: will use this value in order to mark on the graph
    :param mean_diff_percentages: iterable list of percentages. A line per value will be calculated and plotted.
    :return: None
    '''
    fig, ax = plt.subplots()
    for mean_diff_percent in mean_diff_percentages:
        mean_diff = mean_diff_percent * mean
        effect_size = mean_diff / std

        print('Mean diff: ', mean_diff)
        print('Effect size: ', effect_size)

        powers = []

        max_size  = sample1_size
        if sample1_size is None:
            max_size = 100

        sizes = np.arange(1, max_size, 2)
        for sample2_size in sizes:
            if sample1_size is None:
                # equal group sizes: both groups have sample2_size observations
                power = tt_ind_solve_power(effect_size=effect_size, nobs1=sample2_size, alpha=alpha, ratio=1.0, alternative='two-sided')
                print('tt_ind_solve_power(alpha=', alpha, 'size of *each* group=', sample2_size, '): power: {:.5f}'.format(power))
            else:
                # first group is fixed; ratio = nobs2 / nobs1 sets the size of the second group
                power = tt_ind_solve_power(effect_size=effect_size, nobs1=sample1_size, alpha=alpha, ratio=(1.0*sample2_size/sample1_size), alternative='two-sided')
                print('tt_ind_solve_power(alpha=', alpha, 'sample2_size=', sample2_size, '): power: {:.5f}'.format(power))

            powers.append(power)

        try: # mark the desired power on the graph
            z1 = interp1d(powers, sizes)
            results = z1(desired_power)

            plt.plot([results], [desired_power], 'gD')
        except Exception as e:
            print("Error: ", e)
            #ignore

        plt.title('Power vs. Sample Size')
        plt.xlabel('Sample Size')
        plt.ylabel('Power')

        plt.plot(sizes, powers, label='diff={:2.0f}%'.format(100*mean_diff_percent)) #, '-gD')

    plt.legend()
    plt.show()

For example, if you call this function with mean = 10 and std = 2, you get this plot:

Power vs. Sample Size (avg=10, std=2)
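
On the other snippets from the question, as far as I can tell from the statsmodels documentation: ttest_power is meant for a one-sample (or paired) t-test, where nobs is the number of observations, while tt_ind_solve_power is meant for two independent samples, so the two are not expected to return the same value. In the NormalIndPower example, prop1 and prop2 are the two proportions being compared (for instance two conversion rates), and proportion_effectsize turns them into an effect size. A rough sketch, where the proportions 0.12 and 0.10 and the sample size of 64 are made-up values for illustration:

```python
from statsmodels.stats.power import NormalIndPower, ttest_power, tt_ind_solve_power
from statsmodels.stats.proportion import proportion_effectsize

# One-sample (or paired) t-test with 64 observations in total.
one_sample_power = ttest_power(0.5, nobs=64, alpha=0.05, alternative='two-sided')

# Two independent samples with 64 observations in each group.
two_sample_power = tt_ind_solve_power(effect_size=0.5, nobs1=64, alpha=0.05,
                                      power=None, ratio=1.0, alternative='two-sided')

print('one-sample power: {:.4f}'.format(float(one_sample_power)))
print('two-sample power: {:.4f}'.format(float(two_sample_power)))

# Comparing two proportions (hypothetical values).
prop1, prop2 = 0.12, 0.10
es = proportion_effectsize(prop1, prop2, method='normal')

# nobs1 is left unset, so the size of the first group is solved for;
# ratio=2 means the second group is twice as large as the first.
n1 = NormalIndPower().solve_power(es, power=0.9, alpha=0.05, ratio=2)

print('effect size: {:.4f}'.format(float(es)))
print('observations needed in group 1: {:.1f}'.format(float(n1)))
```

The same effect size and nobs give a noticeably higher power in the one-sample case, which is one reason the functions in the question disagree.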