Question

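I want to select rows in a DataFrame based on a condition on the sum of one of its columns. For example, I want the indexes of the first rows of the DataFrame for which the sum of column B is less than 3: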

import pandas as pd
df = pd.DataFrame({'A': ['z', 'y', 'x', 'w'], 'B': [1, 1, 1, 1]})

The only solution I have come up with uses a separate DataFrame and a while loop:

df2 = pd.DataFrame({'A':[], 'B':[]})
index = 0
while df2['B'].sum() < 3:
    # DataFrame.append was removed in pandas 2.0, so this only runs on older versions
    df2 = df2.append(df.loc[index])
    index += 1

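The logic gets me where I need to be, but it seems unnecessarily inefficient. Does anyone have a creative way to filter a DataFrame with Pandas based on a condition on a column's sum?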

Answer 1

What you are describing is a cumulative sum (cumsum).

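Appending rows to a DataFrame in a loop is horribly inefficient, because every iteration copies the entire DataFrame just to add a small amount of extra data. Instead, slice the original DataFrame with a Boolean mask; in this case, check where the cumsum is less than 3.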

df2 = df[df['B'].cumsum().lt(3)]

#   A  B
#0  z  1
#1  y  1

df['B'].cumsum()
#0    1
#1    2
#2    3
#3    4

df['B'].cumsum().lt(3)
#0     True     <- Slicing with this Boolean Series
#1     True     <- keeps only these True rows
#2    False
#3    False
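
Since the question asks for the indexes of the matching rows, the same mask can be wrapped in a small helper, roughly as sketched below. The function names `rows_while_cumsum_below` and `leading_rows_while_cumsum_below` and the `mixed` example frame are made up for illustration and are not part of pandas or of the original post. The second helper is an assumption about what you might want when the column can contain negative values: there the running total can drop back under the limit, and `cummin()` on the mask keeps only the unbroken run of rows at the top.

```python
import pandas as pd


def rows_while_cumsum_below(frame, column, limit):
    """Return every row whose running total of `column` is below `limit`."""
    mask = frame[column].cumsum().lt(limit)
    return frame[mask]


def leading_rows_while_cumsum_below(frame, column, limit):
    """Return only the leading rows, stopping at the first row that fails."""
    mask = frame[column].cumsum().lt(limit)
    # cummin() turns everything after the first False into False
    return frame[mask.cummin()]


df = pd.DataFrame({'A': ['z', 'y', 'x', 'w'], 'B': [1, 1, 1, 1]})
selected = rows_while_cumsum_below(df, 'B', 3)
selected_index = selected.index.tolist()

# Hypothetical data with a negative value: the running total is 1, 2, 4, 1, 2
mixed = pd.DataFrame({'B': [1, 1, 2, -3, 1]})
all_matches = rows_while_cumsum_below(mixed, 'B', 3).index.tolist()
leading_only = leading_rows_while_cumsum_below(mixed, 'B', 3).index.tolist()
```

With the all-ones column from the question both helpers give the same result; they only differ when the cumulative sum is not monotonically increasing.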

Filter rows on a column sum using Pandas
