Question

I have a count matrix of % abundance values, with samples as columns and observations as rows, for example:

#OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99   123.BGd_312  
OTU_200 0.016806723  0.23862789   0.148210883  0.6783      0.126310471  
OTU_54  0.253542133  0.169383866  0            0.113679432 0.173943294
OTU_2   0.033613445  16.58463833  19.66970146  16.06669119 20.92537833

I am trying to filter the DataFrame with pandas so that it keeps only the rows in which at least one value exceeds 0.5%. I initially found this:

df = df[(df > 0.5).sum(axis=1) >= 1]

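I thought this would do the trick, but as far as I can tell it keeps the rows whose sum is greater than 0.5. How can I modify it to do what I want?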

Thanks!

Answer 1

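I think a simpler solution is to compare the numeric columns against the threshold to get a boolean DataFrame, then use any to check that each row has at least one True, and finally filter by boolean indexing: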

print (df.drop('#OTUId',axis=1) > 0.5)
   101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0        False        False        False       True        False
1        False        False        False      False        False
2        False         True         True       True         True

print ((df.drop('#OTUId',axis=1) > 0.5).any(axis=1))
0     True
1    False
2     True
dtype: bool

df = df[(df.drop('#OTUId',axis=1) > 0.5).any(axis=1)]
print (df)
    #OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0  OTU_200     0.016807     0.238628     0.148211   0.678300     0.126310
2    OTU_2     0.033613    16.584638    19.669701  16.066691    20.925378
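
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the table from the question has been loaded with `#OTUId` as an ordinary column; the variable names `mask` and `filtered` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "#OTUId": ["OTU_200", "OTU_54", "OTU_2"],
    "101.BGd_295": [0.016806723, 0.253542133, 0.033613445],
    "103.BGd_309": [0.23862789, 0.169383866, 16.58463833],
    "105.BGd_310": [0.148210883, 0.0, 19.66970146],
    "11.BGd_99": [0.6783, 0.113679432, 16.06669119],
    "123.BGd_312": [0.126310471, 0.173943294, 20.92537833],
})

# Compare only the abundance columns, then flag rows with at least one True.
mask = (df.drop("#OTUId", axis=1) > 0.5).any(axis=1)

# Boolean indexing keeps the flagged rows, ID column included.
filtered = df[mask]
print(filtered["#OTUId"].tolist())  # ['OTU_200', 'OTU_2']
```

Because the mask is built without the ID column but applied to the full DataFrame, the `#OTUId` labels are still there in the result.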

Your code:

df = df[(df > 0.5).sum(axis=1) >= 1]

#boolean mask
print (df > 0.5)
   #OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0    True        False        False        False       True        False
1    True        False        False        False      False        False
2    True        False         True         True       True         True

#count True values per row
print ((df > 0.5).sum(axis=1))
0    2
1    1
2    5
dtype: int64

#check values by condition
print ((df > 0.5).sum(axis=1) >= 1)
0    True
1    True
2    True
dtype: bool

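Here the comparison runs over all columns, including #OTUId, which evaluates to True in every row, so each row has at least one True and nothing is filtered out.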
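
The original `sum(axis=1) >= 1` approach should also work once the ID column is kept out of the comparison. The sketch below uses `select_dtypes` for that, which is one option among several (setting `#OTUId` as the index would be another); the names `numeric`, `counts` and `kept` are illustrative. Note also that on Python 3 with a recent pandas, comparing a string column with a number may raise a `TypeError` instead of returning True, so excluding that column is needed either way.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "#OTUId": ["OTU_200", "OTU_54", "OTU_2"],
    "101.BGd_295": [0.016806723, 0.253542133, 0.033613445],
    "103.BGd_309": [0.23862789, 0.169383866, 16.58463833],
    "105.BGd_310": [0.148210883, 0.0, 19.66970146],
    "11.BGd_99": [0.6783, 0.113679432, 16.06669119],
    "123.BGd_312": [0.126310471, 0.173943294, 20.92537833],
})

# Restrict the comparison to numeric columns so '#OTUId' is not counted.
numeric = df.select_dtypes(include="number")

# Number of values above the threshold in each row.
counts = (numeric > 0.5).sum(axis=1)
print(counts.tolist())  # [1, 0, 4]

# Keep rows with at least one value above 0.5.
kept = df[counts >= 1]
print(kept["#OTUId"].tolist())  # ['OTU_200', 'OTU_2']
```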
