How to fill in numpy.random.hypergeometric when calculating overlaps with a gene database

Date: 2018-08-13 19:27:27

Tags: python database numpy statistics

According to this description: http://software.broadinstitute.org/gsea/msigdb/help_annotations.jsp#overlap

I want to fill in k, K, n and N based on the following definitions:

k is the number of genes in the intersection of the query set with a set from MSigDB
K is the number of genes in the set from MSigDB
N is the total number of genes in the universe (all known human gene symbols)
n is the number of genes in the query set
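For reference, assuming the MSigDB overlap p-value is the upper tail of the hypergeometric distribution (the probability of seeing at least k shared genes by chance when n genes are drawn from a universe of N that contains K set members), these four quantities enter as follows. The second form is why k-1 appears in the calls below.

```latex
P(X \ge k) = \sum_{i=k}^{\min(K,\,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
           = 1 - P(X \le k-1)
```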

My sample data, as printed by my code:

Nr. Gene Sets in Collections (K):  5924
Nr. Overlaps (k):  5
Nr. Genes in Universe (N):  45956
Nr. of Genes in query set n:  31736
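A minimal stdlib-only sketch of the upper-tail p-value P(X ≥ k) for these four quantities, assuming that is the statistic MSigDB reports. It uses exact integer arithmetic (`math.comb`, Python 3.8+) so tiny tail probabilities do not underflow before the final conversion, and it sums whichever tail has fewer terms. The name `overlap_pvalue_exact` is hypothetical.

```python
from fractions import Fraction
from math import comb


def overlap_pvalue_exact(k: int, K: int, n: int, N: int) -> float:
    """Exact P(X >= k) for X ~ Hypergeometric(N genes, K set members, n drawn).

    k: observed overlap, K: genes in the MSigDB set,
    n: genes in the query set, N: genes in the universe.
    """
    lo = max(0, n - (N - K))  # smallest overlap that is possible
    hi = min(K, n)            # largest overlap that is possible
    if k <= lo:
        return 1.0
    if k > hi:
        return 0.0

    def weight(i: int) -> int:
        # count of size-n subsets of the universe sharing exactly i genes with the set
        return comb(K, i) * comb(N - K, n - i)

    total = comb(N, n)
    if k - lo <= hi - k + 1:
        # lower tail is shorter: P(X >= k) = 1 - P(X <= k - 1)
        tail = 1 - Fraction(sum(weight(i) for i in range(lo, k)), total)
    else:
        # upper tail is shorter: sum P(X = i) for i = k..hi directly
        tail = Fraction(sum(weight(i) for i in range(k, hi + 1)), total)
    return float(tail)


# Sample numbers from the question
print(overlap_pvalue_exact(k=5, K=5924, n=31736, N=45956))  # 1.0
```

With the posted numbers the expected overlap is n·K/N ≈ 4091, so an observed overlap of 5 is far below chance and the p-value is 1.0. If K is really a count of gene sets (as the "Nr. Gene Sets in Collections" label suggests) rather than the size of a single MSigDB set, that would explain the implausible result; this is an inference from the labels only.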

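If you click the link (http://software.broadinstitute.org/gsea/msigdb/gene_families.jsp) and then the "Tumor suppressors" chart in the first row, you are forwarded to a new page where you can select "C1, C2, C3" to compute the overlaps and then the p-values. Alternatively, you can use https://keisan.casio.com/exec/system/1180573201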

I tried different argument orders:

np.random.hypergeometric(k-1,N,K,n) 
> array([0, 1, 1, ..., 0, 0, 0])

np.random.hypergeometric(k-1,n, N, K)
> *** ValueError: ngood + nbad < nsample

np.random.hypergeometric(k-1, n-k, K, size=N)
> [0 1 0 ... 1 0 0]
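`np.random.hypergeometric(ngood, nbad, nsample, size)` does not compute a p-value; it draws random overlaps. That is why the first and third calls return arrays of small integers, and why the second fails: it requests nsample = N = 45956 draws from an urn holding only ngood + nbad = 4 + 31736 items. Mapped onto the definitions above, a plausible assignment is ngood = K, nbad = N - K, nsample = n, so each draw is one simulated overlap, and P(X ≥ k) can be estimated as the fraction of draws that are ≥ k. The sketch below assumes NumPy ≥ 1.17 for `default_rng` (the legacy `np.random.hypergeometric` takes the same first three arguments); `simulated_overlap_pvalue` is a hypothetical name.

```python
import numpy as np


def simulated_overlap_pvalue(k, K, n, N, draws=100_000, seed=0):
    """Monte Carlo estimate of P(X >= k), X ~ Hypergeometric(N, K, n)."""
    rng = np.random.default_rng(seed)
    # ngood = genes in the MSigDB set, nbad = rest of the universe,
    # nsample = size of the query set; each value is one random overlap
    overlaps = rng.hypergeometric(K, N - K, n, size=draws)
    return float(np.mean(overlaps >= k))


# Sample numbers from the question: simulated overlaps centre near n*K/N ~ 4091
rng = np.random.default_rng(1)
print(rng.hypergeometric(5924, 45956 - 5924, 31736, size=5))
print(simulated_overlap_pvalue(k=5, K=5924, n=31736, N=45956))
```

A simulation cannot resolve p-values much smaller than 1/draws, so for enrichment testing an exact tail probability is usually preferable; sampling is mainly useful as a sanity check.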

Can someone tell me how to correctly fill in the arguments of np.random.hypergeometric(k-1,n, N, K) according to the description above?


Edit:

I found that I was using the wrong function. This would be a better choice:

What are equivalents to R's "phyper" function in Python?
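A common SciPy counterpart of R's `phyper` is `scipy.stats.hypergeom`. Its parameters are ordered (M, n, N) = (population size, number of successes, number of draws), which clash with the letters used in this question, so the mapping is M = N, n = K, N = n. A minimal sketch, assuming the MSigDB overlap p-value is the upper tail: `sf(k - 1, ...)` gives P(X ≥ k), which should match R's `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`. The name `overlap_pvalue_scipy` is hypothetical.

```python
from scipy.stats import hypergeom


def overlap_pvalue_scipy(k, K, n, N):
    """P(X >= k) for an observed overlap of k genes.

    SciPy's argument order is hypergeom(M, n, N):
    M = population size (N here), n = successes (K here), N = draws (n here).
    R equivalent: phyper(k - 1, K, N - K, n, lower.tail = FALSE)
    """
    # sf(x) = P(X > x), so sf(k - 1) = P(X >= k)
    return float(hypergeom.sf(k - 1, N, K, n))


print(overlap_pvalue_scipy(k=5, K=5924, n=31736, N=45956))
```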
