According to this description: http://software.broadinstitute.org/gsea/msigdb/help_annotations.jsp#overlap
I want to fill in k, K, n and N according to the following definitions:
k is the number of genes in the intersection of the query set with a set from MSigDB
K is the number of genes in the set from MSigDB
N is the total number of genes in the universe (all known human gene symbols)
n is the number of genes in the query set
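Written out as a formula, and assuming the reported p-value is the upper tail P(X ≥ k) (the usual convention for overlap enrichment; the MSigDB page linked above is the authority on what it actually reports), the test with these four quantities would be:

```latex
P(X \ge k) = \sum_{i=k}^{\min(K,\,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
```

Here X is the overlap between a random query set of n genes, drawn without replacement from the N-gene universe, and the fixed MSigDB set of K genes.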
My example data, taken from the code:
Nr. Gene Sets in Collections (K): 5924
Nr. Overlaps (k): 5
Nr. Genes in Universe (N): 45956
Nr. of Genes in query set n: 31736
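A minimal stdlib-only sketch of that upper-tail p-value, assuming the P(X ≥ k) convention; the function name `overlap_pvalue` is hypothetical. It uses exact integer arithmetic and sums the (short) lower tail, then takes the complement. If SciPy is installed, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same value; note that SciPy's parameter order is (x, M = population size, n = successes in the population, N = number of draws), which is easy to confuse with the letters used here.

```python
from math import comb


def overlap_pvalue(k: int, K: int, n: int, N: int) -> float:
    """Exact upper-tail hypergeometric p-value P(X >= k).

    X is the overlap between a query set of n genes, drawn without
    replacement from a universe of N genes, and a fixed set of K genes.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    # Use the complement P(X >= k) = 1 - P(X <= k - 1): the lower tail
    # has at most k terms, which is cheap when k is small.
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlap sizes contribute nothing.
    lower = sum(comb(K, i) * comb(N - K, n - i) for i in range(min(k, n + 1)))
    # Exact integers throughout; true division of Python ints is
    # correctly rounded, so converting to float happens only here.
    return (total - lower) / total


# Numbers from the question above.
k, K, n, N = 5, 5924, 31736, 45956
print(overlap_pvalue(k, K, n, N))
```

With these inputs the expected overlap K·n/N is roughly 4091, so an observed k of 5 gives an upper-tail p-value of essentially 1.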
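If you click the link (http://software.broadinstitute.org/gsea/msigdb/gene_families.jsp) and then the "Tumor suppressors" entry in the first row, you are forwarded to a new page where you can select "C1, C2, C3" to compute the overlaps and then the p-values. Alternatively, you can use https://keisan.casio.com/exec/system/1180573201.
I tried different variants: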
np.random.hypergeometric(k-1,N,K,n)
> array([0, 1, 1, ..., 0, 0, 0])
np.random.hypergeometric(k-1,n, N, K)
> *** ValueError: ngood + nbad < nsample
np.random.hypergeometric(k-1, n-k, K, size=N)
> [0 1 0 ... 1 0 0]
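The arrays above are expected: NumPy's hypergeometric function draws random overlap sizes rather than computing a probability. Its signature is hypergeometric(ngood, nbad, nsample, size), so the third argument is the number of draws and must not exceed ngood + nbad; in the second attempt ngood + nbad = 4 + 31736 is smaller than nsample = 45956, hence the ValueError. A hedged sketch of how the sampler could be used, assuming the mapping ngood = K, nbad = N - K, nsample = n, is a Monte Carlo estimate of P(X ≥ k); `overlap_pvalue_mc` is a hypothetical name.

```python
import numpy as np


def overlap_pvalue_mc(k, K, n, N, draws=100_000, seed=0):
    """Monte Carlo estimate of P(X >= k) with NumPy's hypergeometric sampler.

    Assumed parameter mapping:
      ngood   = K      (genes in the MSigDB set)
      nbad    = N - K  (universe genes not in the set)
      nsample = n      (genes in the query set)
    Each sampled value is one simulated overlap size, not a p-value.
    """
    rng = np.random.default_rng(seed)
    overlaps = rng.hypergeometric(K, N - K, n, size=draws)
    # Fraction of simulated overlaps at least as large as the observed k.
    return float(np.mean(overlaps >= k))


print(overlap_pvalue_mc(5, 5924, 31736, 45956))
```

This only approximates the p-value and cannot resolve values much smaller than 1/draws, so an exact tail computation is preferable for reporting.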
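Can someone tell me how to correctly fill in np.random.hypergeometric(k-1,n, N, K) according to the description above?
Edit:
I found that I was using the wrong function. This would be a better choice: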