Question

Based on this description: http://software.broadinstitute.org/gsea/msigdb/help_annotations.jsp#overlap

I want to fill in k, K, n, and N according to the following definitions:

k is the number of genes in the intersection of the query set with a set from MSigDB
K is the number of genes in the set from MSigDB
N is the total number of genes in the gene universe (all known human gene symbols)
n is the number of genes in the query set
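
For reference, the four quantities above fit the usual hypergeometric setup. Assuming the overlap p-value is the upper tail of that distribution (the probability of seeing at least k shared genes by chance), it can be written as:

```latex
\[
P(X \ge k) \;=\; \sum_{i=k}^{\min(K,\,n)}
  \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
\]
```

Here X is the size of the overlap when n genes are drawn without replacement from a universe of N genes, K of which belong to the MSigDB set.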

My example data, as printed by my code:

Nr. Gene Sets in Collections (K):  5924
Nr. Overlaps (k):  5
Nr. Genes in Universe (N):  45956
Nr. of Genes in query set n:  31736

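If you follow the link (http://software.broadinstitute.org/gsea/msigdb/gene_families.jsp) and click the "tumor suppressors" entry in the first row, you are forwarded to a new page where you can select "C1, C2, C3" to compute the overlaps and then the p-value. Alternatively, you can use https://keisan.casio.com/exec/system/1180573201.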

I tried several variants:

np.random.hypergeometric(k-1,N,K,n) 
> array([0, 1, 1, ..., 0, 0, 0])

np.random.hypergeometric(k-1,n, N, K)
> *** ValueError: ngood + nbad < nsample

np.random.hypergeometric(k-1, n-k, K, size=N)
> [0 1 0 ... 1 0 0]

Can someone tell me how to fill in np.random.hypergeometric(k-1,n, N, K) correctly, based on the description above?
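
One possible mapping, sketched under the assumption that the MSigDB set plays the role of the "good" items: np.random.hypergeometric takes (ngood, nbad, nsample, size), so ngood = K, nbad = N - K and nsample = n. Note that the function draws random overlap counts rather than returning a probability, so k only enters afterwards, when the draws are compared against it. The sample size of 100,000 is an arbitrary choice for illustration.

```python
import numpy as np

# Values from the question
k, K, N, n = 5, 5924, 45956, 31736

# ngood = K (genes in the MSigDB set), nbad = N - K (all other genes),
# nsample = n (size of the query set). Each draw is one simulated overlap count.
draws = np.random.hypergeometric(K, N - K, n, size=100_000)

# Monte Carlo estimate of P(overlap >= k); this is only an approximation
p_estimate = np.mean(draws >= k)
print(p_estimate)
```

Because this is a simulation, very small p-values cannot be resolved this way; an exact tail probability needs a distribution function instead of a sampler.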

Edit:

I found out that I was using the wrong function. This would be the better choice:

What are equivalents to R's "phyper" function in Python?
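
As a rough illustration of what such an equivalent computes, here is a minimal standard-library sketch of the upper-tail probability P(X >= k). The function name overlap_p_value is hypothetical, and the sketch assumes Python 3.8+ for math.comb. With SciPy installed, scipy.stats.hypergeom.sf(k - 1, N, K, n) should give the same value and is the usual counterpart of R's phyper(k - 1, K, N - K, n, lower.tail = FALSE).

```python
from fractions import Fraction
from math import comb  # requires Python 3.8+


def overlap_p_value(k, K, N, n):
    """Return P(X >= k) for a hypergeometric overlap (hypothetical helper).

    k: observed overlap, K: size of the MSigDB set,
    N: size of the gene universe, n: size of the query set.
    """
    lowest = max(0, n - (N - K))   # smallest overlap that can occur
    highest = min(K, n)            # largest overlap that can occur
    total = comb(N, n)             # number of possible query sets
    # P(X <= k - 1), summed exactly with integer arithmetic
    lower_tail = sum(
        Fraction(comb(K, i) * comb(N - K, n - i), total)
        for i in range(lowest, min(k, highest + 1))
    )
    return float(1 - lower_tail)


# Values from the question
p = overlap_p_value(k=5, K=5924, N=45956, n=31736)
print(p)
```

The exact summation is only practical when k is small, as it is here; for large k a library routine working in floating point would be the more sensible choice.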

How to fill in numpy.random.hypergeometric when computing the overlap with a gene database

0 answers: