Drop rows from a MultiIndex DataFrame based on a column value, removing all rows within the level

Asked: 2016-09-15 21:06:21

Tags: python pandas dataframe multi-index

I am trying to filter a DataFrame based on one or more values. Here is a sample CSV:

AlignmentId,TranscriptId,classifier,value
ENSMUST00000025010-1,ENSMUST00000025010,AlnCoverage,0.99612
ENSMUST00000025010-1,ENSMUST00000025010,AlnIdentity,0.93553
ENSMUST00000025010-1,ENSMUST00000025010,Badness,0.06749
ENSMUST00000025014-1,ENSMUST00000025014,AlnCoverage,1.0
ENSMUST00000025014-1,ENSMUST00000025014,AlnIdentity,0.96382
ENSMUST00000025014-1,ENSMUST00000025014,Badness,0.03618

When loaded:

>>> import pandas as pd
>>> df = pd.read_csv('tmp.csv', index_col=['AlignmentId', 'TranscriptId'])
>>> df
                                          classifier    value
AlignmentId          TranscriptId
ENSMUST00000025010-1 ENSMUST00000025010  AlnCoverage  0.99612
                     ENSMUST00000025010  AlnIdentity  0.93553
                     ENSMUST00000025010      Badness  0.06749
ENSMUST00000025014-1 ENSMUST00000025014  AlnCoverage  1.00000
                     ENSMUST00000025014  AlnIdentity  0.96382
                     ENSMUST00000025014      Badness  0.03618

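I want to drop every AlignmentId group that fails a set of classifier checks. For this example, suppose I want to drop ENSMUST00000025010 because its AlnCoverage is < 1.0. I would therefore like to end up with this DataFrame: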

ENSMUST00000025014-1 ENSMUST00000025014  AlnCoverage  1.00000
                     ENSMUST00000025014  AlnIdentity  0.96382
                     ENSMUST00000025014      Badness  0.03618

How can I do this?

1 Answer:

Answer 0: (score: 2)

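Try this. Passing the index of the failing rows to drop removes every row that shares those index labels, not just the failing row itself: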

In [169]: df = df.drop(df[(df.classifier == 'AlnCoverage') & (df.value < 1)].index)

In [170]: df
Out[170]:
                                          classifier    value
AlignmentId          TranscriptId
ENSMUST00000025014-1 ENSMUST00000025014  AlnCoverage  1.00000
                     ENSMUST00000025014  AlnIdentity  0.96382
                     ENSMUST00000025014      Badness  0.03618
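
The drop call above matches on the full (AlignmentId, TranscriptId) label pair. If you would rather key the removal on the AlignmentId level alone, the following is a minimal sketch of one possible alternative, not the answerer's method. It assumes the same sample data, loaded from an inline string instead of tmp.csv; the names csv_text, failed, bad_ids and filtered are made up for illustration.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """AlignmentId,TranscriptId,classifier,value
ENSMUST00000025010-1,ENSMUST00000025010,AlnCoverage,0.99612
ENSMUST00000025010-1,ENSMUST00000025010,AlnIdentity,0.93553
ENSMUST00000025010-1,ENSMUST00000025010,Badness,0.06749
ENSMUST00000025014-1,ENSMUST00000025014,AlnCoverage,1.0
ENSMUST00000025014-1,ENSMUST00000025014,AlnIdentity,0.96382
ENSMUST00000025014-1,ENSMUST00000025014,Badness,0.03618
"""
df = pd.read_csv(io.StringIO(csv_text),
                 index_col=['AlignmentId', 'TranscriptId'])

# Boolean mask marking the rows that fail the check.
failed = (df['classifier'] == 'AlnCoverage') & (df['value'] < 1.0)

# AlignmentId labels that own at least one failing row.
bad_ids = df.loc[failed].index.get_level_values('AlignmentId').unique()

# Keep only the rows whose AlignmentId is not in the failing set.
keep = ~df.index.get_level_values('AlignmentId').isin(bad_ids)
filtered = df[keep]

print(filtered)
```

Because the mask is built separately from the removal step, additional classifier conditions can be combined into failed with | before the failing IDs are collected.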