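How to implement R's p.adjust in Python

Time: 2011-09-16 22:18:41

Tags: python r statistics rpy2

I have a list of p-values and I would like to calculate the adjusted p-values for multiple comparisons, controlling the FDR. In R, I can use: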

pval <- read.csv("my_file.txt",header=F,sep="\t")
pval <- pval[,1]
FDR <- p.adjust(pval, method= "BH")
print(length(pval[FDR<0.1]))
write.table(cbind(pval, FDR),"pval_FDR.txt",row.names=F,sep="\t",quote=F )

How can I implement this code in Python? Here is my attempt in Python, put together with help from Google:

pvalue_list [2.26717873145e-10, 1.36209234286e-11 , 0.684342083821...] # my pvalues
pvalue_lst = [v.r['p.value'] for v in pvalue_list]
p_adjust = R.r['p.adjust'](R.FloatVector(pvalue_lst),method='BH')
for v in p_adjust:
    print v

The code above raises the error AttributeError: 'float' object has no attribute 'r'. Can anyone help point out what I am doing wrong? Thanks in advance for your help!

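7 answers:

Answer 0: (score: 16)

If you want to be sure of what you are getting from R, you can also state explicitly that you want the function from the R package 'stats':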

from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import FloatVector

stats = importr('stats')

p_adjust = stats.p_adjust(FloatVector(pvalue_list), method = 'BH')

Answer 1: (score: 15)

Answer 2: (score: 9)

Here is an in-house function I use:

def correct_pvalues_for_multiple_testing(pvalues, correction_type="Benjamini-Hochberg"):
    """
    consistent with R - print(correct_pvalues_for_multiple_testing([0.0, 0.01, 0.029, 0.03, 0.031, 0.05, 0.069, 0.07, 0.071, 0.09, 0.1]))
    """
    from numpy import array, empty
    pvalues = array(pvalues, dtype=float)
    n = pvalues.shape[0]  # number of tests; kept as an int for empty() and range()
    new_pvalues = empty(n)
    if correction_type == "Bonferroni":
        new_pvalues = n * pvalues
    elif correction_type == "Bonferroni-Holm":
        # sort ascending, remembering each p-value's original position
        values = [(pvalue, i) for i, pvalue in enumerate(pvalues)]
        values.sort()
        for rank, vals in enumerate(values):
            pvalue, i = vals
            new_pvalues[i] = (n - rank) * pvalue
    elif correction_type == "Benjamini-Hochberg":
        # sort descending, so position 0 holds the largest p-value
        values = [(pvalue, i) for i, pvalue in enumerate(pvalues)]
        values.sort()
        values.reverse()
        new_values = []
        for i, vals in enumerate(values):
            rank = n - i
            pvalue, index = vals
            new_values.append((float(n) / rank) * pvalue)
        # enforce monotonicity: an adjusted value may not exceed the one before it
        for i in range(0, n - 1):
            if new_values[i] < new_values[i + 1]:
                new_values[i + 1] = new_values[i]
        # write the adjusted values back in the original input order
        for i, vals in enumerate(values):
            pvalue, index = vals
            new_pvalues[index] = new_values[i]
    return new_pvalues
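
R's `p.adjust` additionally caps every adjusted value at 1 and, for Holm, takes a running maximum so the adjusted values stay monotone. A minimal NumPy sketch of those two corrections under that behaviour (the function names `bonferroni_adjust` and `holm_adjust` are illustrative, not from the answer, and missing values are not handled):

```python
import numpy as np


def bonferroni_adjust(pvalues):
    """Bonferroni: multiply by n, then cap at 1 (R: pmin(1, n * p))."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(1.0, p.size * p)


def holm_adjust(pvalues):
    """Holm step-down (R: pmin(1, cummax((n - i + 1) * p[o]))[ro])."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                    # indices of ascending p-values
    scaled = (n - np.arange(n)) * p[order]   # factors n, n-1, ..., 1
    stepped = np.maximum.accumulate(scaled)  # running maximum keeps values monotone
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(1.0, stepped)  # scatter back to input order
    return adjusted


print(bonferroni_adjust([0.01, 0.04, 0.5]))
print(holm_adjust([0.01, 0.04, 0.03]))
```

Writing through `adjusted[order] = ...` restores the input order without computing a second inverse permutation.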

Answer 3: (score: 5)

Using Python's numpy library, with no call to R at all, here is a reasonably efficient implementation of the BH method:

(Based on the R code posted by BondedDust.)
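
A minimal NumPy sketch of the BH adjustment, following the steps of R's `p.adjust(method = "BH")`. The function name `bh_adjust` and the sample p-values are illustrative; this is not the answerer's own code, and missing values are not handled:

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    descending = np.argsort(p)[::-1]         # indices from largest to smallest p
    ranks = np.arange(n, 0, -1)              # n, n-1, ..., 1 for that ordering
    scaled = p[descending] * n / ranks       # p * n / rank
    stepped = np.minimum.accumulate(scaled)  # running minimum (R's cummin)
    adjusted = np.empty(n)
    adjusted[descending] = np.minimum(1.0, stepped)  # cap at 1, restore input order
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # made-up example values
fdr = bh_adjust(pvals)
print(fdr)
print(int((fdr < 0.1).sum()))  # number of tests with adjusted p-value below 0.1
```

The last two lines mirror the `length(pval[FDR<0.1])` count from the R script in the question.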

Answer 4: (score: 2)

(I know this is not an answer... just trying to be helpful.) The BH code in R's p.adjust is just:

BH = {
        i <- lp:1L   # lp is the number of p-values
        o <- order(p, decreasing = TRUE) # "o" will reverse sort the p-values
        ro <- order(o)
        pmin(1, cummin(n/i * p[o]))[ro]  # n is also the number of p-values
      }
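
The `ro <- order(o)` line is what puts the adjusted values back in input order: `order(o)` is the inverse of the permutation `o`. A small NumPy illustration with made-up p-values, assuming `argsort` as the stand-in for R's `order` (with 0-based rather than 1-based indices):

```python
import numpy as np

p = np.array([0.03, 0.2, 0.001, 0.05])  # made-up example values

o = np.argsort(-p)   # like order(p, decreasing = TRUE), but 0-based
ro = np.argsort(o)   # like order(o): the inverse permutation of o

sorted_desc = p[o]          # p-values from largest to smallest
restored = sorted_desc[ro]  # indexing with ro undoes the sort

print(o, ro)
print(np.array_equal(restored, p))
```

Any per-element transformation applied to `p[o]`, such as the `cummin` step, can therefore be mapped back to the original positions by indexing with `ro`.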

Answer 5: (score: 1)

Old question, but here is a translation of R's FDR code into Python (it is probably quite inefficient):

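def FDR(x):
    """
    Assumes a list or numpy array x which contains p-values for multiple tests
    Copied from p.adjust function from R
    """
    n = len(x)
    # o: indices of x ordered by p-value, largest first; ro: inverse of that ordering
    o = [i[0] for i in sorted(enumerate(x), key=lambda v: v[1], reverse=True)]
    ro = [i[0] for i in sorted(enumerate(o), key=lambda v: v[1])]
    # q = sum(1/i) is the factor R uses for method="BY"; with q = 1.0 this is "BH"
    q = sum(1.0 / i for i in range(1, n + 1))
    l = [q * n / i * x[j] for i, j in zip(range(n, 0, -1), o)]
    # running minimum from the largest p-value downwards (R's cummin)
    for k in range(1, n):
        l[k] = min(l[k], l[k - 1])
    # cap at 1 and return the values in the original input order
    return [min(l[k], 1.0) for k in ro]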

Answer 6: (score: 0)

Well, to get your code working, I would guess something like this will do:

import rpy2.robjects as R

pvalue_list = [2.26717873145e-10, 1.36209234286e-11 , 0.684342083821...] # my pvalues
# R functions are looked up through the embedded R instance, R.r
p_adjust = R.r['p.adjust'](R.FloatVector(pvalue_list), method='BH')
for v in p_adjust:
    print(v)

If p.adjust is simple enough, you could write it in Python and avoid calling R altogether. If you want to use it a lot, you could make a simple Python wrapper:

def adjust_pvalues(pvalues, method='BH'):
    # look up p.adjust through the embedded R instance and pass the values as an R vector
    return R.r['p.adjust'](R.FloatVector(pvalues), method=method)