Question

Given a pandas DataFrame such as:

import pandas as pd
df = pd.DataFrame({'a': [1,0,0], 'b': [1,0,0]})

I used the answer from "Pandas: sum DataFrame rows for given columns" to sum the two columns:

foo = df[['a', 'b']].sum(axis=1)

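What I am struggling with now is how to filter the rows assigned to foo. For example, I only want values greater than 0 to appear in the result stored in foo. Does anyone know the best way to do this?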

Answer 1

Use:

foo = df[['a', 'b']]

mask = foo.gt(0).all(axis=1)

out = foo[mask].sum(axis=1)
print (out)
0    2
dtype: int64

Details:

Compare with DataFrame.gt (the method form of >) to find the values greater than 0:

print (foo.gt(0))
       a      b
0   True   True
1  False  False
2  False  False

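Then use DataFrame.all to test whether all values in each row are True. If you only need at least one True per row, meaning at least one value in the row is greater than 0, use DataFrame.any instead: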

print (foo.gt(0).all(axis=1))
0     True
1    False
2    False
dtype: bool
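
To make the difference between all and any concrete, here is a small sketch. The frame demo, with one mixed row, is made-up data for illustration and is not part of the question.

```python
import pandas as pd

# Hypothetical data: row 1 has only one value greater than 0
demo = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 2, 0]})

positive = demo.gt(0)

# all: every value in the row must be greater than 0
all_mask = positive.all(axis=1)
# any: at least one value in the row must be greater than 0
any_mask = positive.any(axis=1)

print(demo[all_mask])  # keeps row 0 only
print(demo[any_mask])  # keeps rows 0 and 1
```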

But if you want to filter by foo, use boolean indexing. Because foo and df share the same index, you can create the mask from foo and filter the original DataFrame:

foo = df[['a', 'b']].sum(axis=1)

df = df[foo.gt(0)]
print (df)
   a  b
0  1  1
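
If the goal is to filter foo itself rather than df, the same kind of boolean mask can be applied to the Series directly. A minimal sketch, assuming the df from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 0, 0]})
foo = df[['a', 'b']].sum(axis=1)

# Keep only the row sums that are greater than 0
foo = foo[foo.gt(0)]
print(foo)
```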

Answer 2

Stick to the basics. You can use pandas fundamentals such as conditional selection and dropna.

import pandas as pd

df = pd.DataFrame({'a': [1,0,0], 'b': [1,0,0]})
foo = df[['a', 'b']].sum(axis=1)
foo = pd.DataFrame(foo)  # Convert the Series foo into a DataFrame
foo = foo[foo > 0]  # Keep values greater than 0; the rest become NaN
foo.dropna(axis=0, inplace=True)  # Drop the rows that contain NaN
foo.columns = ['Result']  # Rename the column
foo

Output

    Result
0     2.0
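
Note that the result is 2.0 rather than 2: masking a DataFrame with foo[foo > 0] replaces the non-matching values with NaN, which forces a float dtype before dropna removes those rows. One possible variant, sketched here under the assumption that an integer result is preferred, filters the Series first and then converts it with Series.to_frame. The column name Result is taken from the code above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 0, 0]})
foo = df[['a', 'b']].sum(axis=1)

# Filter the Series first (no NaN is introduced), then name the column
result = foo[foo > 0].to_frame('Result')
print(result)
```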

Hope this helps.

Pandas: filtering the result of a column sum
