I have a dataframe (p4p5_merge) that currently looks like this:
SampleID expr Gene Period tag \
1 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
2 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
3 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
4 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
5 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
6 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
7 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
8 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
9 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
10 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157
11 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157
12 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157
14 HSB152 5.062444 ENSG00000188157 4 HSB152|ENSG00000188157
15 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
16 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
17 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
18 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
19 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
20 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
21 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
22 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
23 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
Consequence
1 upstream_gene_variant
2 upstream_gene_variant
3 upstream_gene_variant
4 upstream_gene_variant
5 upstream_gene_variant
6 upstream_gene_variant
7 upstream_gene_variant
8 upstream_gene_variant
9 upstream_gene_variant
10 upstream_gene_variant
11 upstream_gene_variant
12 upstream_gene_variant
14 upstream_gene_variant
15 upstream_gene_variant
16 upstream_gene_variant
17 upstream_gene_variant
18 upstream_gene_variant
19 upstream_gene_variant
20 upstream_gene_variant
21 upstream_gene_variant
22 upstream_gene_variant
23 intron_variant
I now want to group by Gene, sort by expr in descending order, and then filter the dataframe down to the rows that fall in the bottom 10% of expr values PER Gene group (the 10th percentile). So I do the following:
1) Sort in descending order (SUCCEEDS)
p4p5_sort = p4p5_merge.sort_values(['expr', 'Gene'],
                                   ascending=[False, True]).reset_index(drop=True)
2) Group by gene and filter down to the bottom 10% of expr per gene (FAILS)
p4p5_bottom10 = (p4p5_sort[p4p5_sort.groupby('Gene')['expr'].
                 apply(lambda x: x < x.quantile(0.1))])
Step 1 works as expected, but when I run step 2, I only get the following response:
sys:1: DtypeWarning: Columns (15,16,22,36,37,38,39) have mixed types. Specify dtype option on import or set low_memory=False.
Empty DataFrame
Columns: [SampleID, expr, Gene, Period, tag, Consequence]
Index: []
If it helps, the R equivalent of what I'm trying to accomplish is:
Answer 0 (score: 0)
You can apply quantile directly to the groupby as follows:
p4p5_bottom10 = pd.DataFrame(p4p5_sort.groupby(['Gene'])['expr'].quantile(0.1))
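We have to wrap the result in pd.DataFrame() to convert it to a DataFrame, because quantile on a grouped column returns a Series indexed by Gene.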
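Note that the line above gives one threshold per gene rather than the filtered rows. If the goal is to keep the rows at or below each gene's 10th percentile, one possible approach is to broadcast the threshold back onto the rows with transform and compare. The sketch below is only an illustration under assumptions: the toy frame `df` is rebuilt by hand from the sample rows in the question (SampleID, expr, and Gene only), and the names `threshold`, `strict`, and `inclusive` are made up for this example. On that sample the duplicated rows make every gene's 10th percentile equal to its minimum, so a strict `<` keeps nothing, which may be one reason for the empty result; whether the same holds for the full data is not something the sample can confirm.

```python
import pandas as pd

# Toy frame rebuilt from the sample rows shown in the question (an assumption:
# only the three columns needed for the filter are reproduced here).
df = pd.DataFrame({
    "SampleID": ["HSB666"] * 4 + ["HSB651"] * 5 + ["HSB195"] * 3
                + ["HSB152"] + ["HSB627"] * 9,
    "expr": [3.663308] * 4 + [3.207474] * 5 + [0.214731] * 3
            + [5.062444] + [2.062444] * 9,
    "Gene": ["ENSG00000147996"] * 4 + ["ENSG00000174749"] * 5
            + ["ENSG00000188157"] * 4 + ["ENSG00000174749"] * 9,
})

# Per-gene 10th percentile, repeated on every row of its group so that it
# lines up with df's own index and can be used in a row-wise comparison.
threshold = df.groupby("Gene")["expr"].transform(lambda s: s.quantile(0.1))

# Strict comparison: drops rows that are exactly equal to the threshold.
strict = df[df["expr"] < threshold]

# Inclusive comparison: keeps rows equal to the threshold as well.
inclusive = df[df["expr"] <= threshold]

print(len(strict), len(inclusive))
```

Because transform returns a Series aligned to the original index, the boolean mask can be used directly for indexing, without reshaping the grouped result.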