如何根据两组三次重复值在python中生成p值?

时间:2013-12-11 15:37:42

标签: python statistics

我对两个不同的样本有一系列三重测量,我想知道差异是否具有统计学意义。由于样本量小,我不能使用学生T检验。一位同事使用名为limma(http://bioinf.wehi.edu.au/limma/)的R包,但我不想调用R。

1 个答案:

答案 0 :(得分:0)

Here是我在Udacity论坛中发布的一篇文章,其中展示了如何获得适合各种样本量的t统计量。

这应该给你一个t统计值,类似于this维基百科文章底部表格中显示的值。

如果第一个链接发生任何事情,请输入以下代码:

# -*- coding: utf-8 -*-
from __future__ import division
import math
from scipy.stats import t

def mean(lst):
    # μ = 1/N Σ(xi)
    return sum(lst) / float(len(lst))

def variance(lst):
    """
    Uses standard variance formula (sum of each (data point - mean) squared)
    all divided by number of data points
    """
    # σ² = 1/N Σ((xi-μ)²)
    mu = mean(lst)
    return 1.0/len(lst) * sum([(i-mu)**2 for i in lst])

def get_tstat(probability, degrees_of_freedom, tails=2):
    """get the t-statistic required for confidence intetval calculations"""
    if tails not in [1,2]:
        sys.exit('invalid tails parameter (1 or 2 valid)')
    inv_p = 1 - ((1 - probability) / tails)
    return t.ppf(inv_p, degrees_of_freedom)

def conf_int(lst=None, p=0, n=0, perc_conf=95, tails=2):
    """
    Confidence interval - supply a list OR a probability (p) and sample
    size (n) e.g. if you want to know the confidence interval for 1000
    coin tosses (0.5 p i.e. a fair coin) then call with (None, 0.5, 1000).

    If a list id provided then the relevant stats are calculated on the
    list from within the function so no p or n value is required.

    The result gives a figure that you can be confident to conf (e.g
    95% certain for 0.95) that the result will always be within this
    amount +/- from the mean.

    e.g. 1000 coin tosses returns ~0.03 on a fair coin (with 0.95 conf)
    which means that you can be 95% confident of a getting within 3%
    of the expected no of heads if you did this experiement.
    """
    if lst:
        n, v = len(lst), variance(lst)
        t = get_tstat(perc_conf/100, n-1)
        return math.sqrt(v/n) * t
    else:
        if not 0 < p < 1:
            sys.exit('p parameter must be >0 and <1 if lst not given')
        if n == 0:
            sys.exit('n parameter must be >0 if lst not given')
        t = get_tstat(perc_conf/100, n-1)
        return t*(math.sqrt(p*(1-p)) / math.sqrt(n))

################################################################################
# Example: 1000 coin tosses on a fair coin. What is the range that I can be 95%
#          confident the result will fall within.
################################################################################

# get confidence interval
sample_size, probability, perc_conf_req = 1000, 0.5, 95
c_int = conf_int(n=sample_size, p=probability, perc_conf=perc_conf_req)

# show results
exp_heads = probability * sample_size
x = round(sample_size * c_int, 0)

print 'I can be '+str(perc_conf_req)+'% confident that the result of '+ \
      str(sample_size)+' coin flips will be within +/- '+ \
      str(round(c_int*100,2))+'% of '+str(int(exp_heads)) + \
      ' i.e. between '+str(int(exp_heads-x))+' and '+str(int(exp_heads+x))+ \
      ' heads (assuming a probability of '+str(probability)+' for each flip).'