Question

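I have a pandas DataFrame X with two columns, 'report_me' and 'n'. I want to get a list (or Series) that contains, for each row of X where report_me is True, the sum of the n values of the two preceding rows of the DataFrame (regardless of their report_me values). For example, if the DataFrame is: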

X = pd.DataFrame({"report_me":[False,False,False,True,False,
                               False,True,False,False,False],
                  "n":range(10)})

then I want the result (3, 9).

One way to do this is:

sums = X['n'].shift(1) + X['n'].shift(2)
display(sums[X["report_me"]])

But this is slow, because it computes the sum for every index, not just the ones to be reported. You could also try filtering on report_me first:

reported = X[X["report_me"]]
display(reported["n"].shift(1) + reported["n"].shift(2))

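But this gives the wrong answer, because you have now thrown away the preceding values that are needed to compute the sums. Is there a way to do this without doing unnecessary work?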

Answer 1

If report_me is sparse, you can gain some speed with a numpy solution, as follows:

# find the index where report_me is True
idx = np.where(X.report_me.values)
# find previous two indices when report_me is True, subset the value from n, and sum
X.n.values[idx - np.arange(1,3)[:,None]].sum(axis=0)

You may need some extra logic to handle edge cases, as mentioned in the comments.
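One edge case that can be identified from the indexing itself: if report_me is True in row 0 or row 1, the offsets idx - 1 and idx - 2 become negative and numpy wraps them around to the end of the array. The sketch below is one possible way to guard against that. The helper name sum_previous_two and the choice to skip such rows (rather than return NaN for them) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"report_me": [False, False, False, True, False,
                                False, True, False, False, False],
                  "n": range(10)})


def sum_previous_two(frame):
    """Sum the two preceding n values for each row where report_me is True.

    Hypothetical helper: rows flagged at position 0 or 1 are skipped,
    because they do not have two predecessors.
    """
    n = frame["n"].to_numpy()
    # positions (not labels) of the flagged rows
    idx = np.flatnonzero(frame["report_me"].to_numpy())
    # assumption: drop flagged rows that lack two preceding rows,
    # otherwise the negative offsets would wrap around to the end
    idx = idx[idx >= 2]
    # shape (2, k): row 0 holds idx - 1, row 1 holds idx - 2
    prev = idx - np.arange(1, 3)[:, None]
    return pd.Series(n[prev].sum(axis=0), index=frame.index[idx])


result = sum_previous_two(X)
print(result.tolist())  # [3, 9]
```

Only the flagged positions are ever used to index n, so the work still scales with the number of True rows rather than with the length of the frame.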

Answer 2

X['report_sum'] = (X.loc[X.report_me]
                    .apply(lambda x: X.iloc[[x.name-1, x.name-2]].n.sum(), 
                           axis=1))

   n  report_me  report_sum
0  0      False         NaN
1  1      False         NaN
2  2      False         NaN
3  3       True         3.0
4  4      False         NaN
5  5      False         NaN
6  6       True         9.0
7  7      False         NaN
8  8      False         NaN
9  9      False         NaN

If you only want the non-NaN values, take .values from the right-hand side of the assignment statement.
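As a minimal sketch of that last remark, the right-hand side can be evaluated on its own and its .values taken, without assigning back to X. The variable names report_sum and values are hypothetical, and the code assumes X has the default integer index, so that the label x.name is also the row position expected by iloc.

```python
import pandas as pd

X = pd.DataFrame({"report_me": [False, False, False, True, False,
                                False, True, False, False, False],
                  "n": range(10)})

# Apply only to the flagged rows; x.name is the row label, which equals
# the row position here because X uses the default integer index.
report_sum = X.loc[X["report_me"]].apply(
    lambda x: X["n"].iloc[[x.name - 1, x.name - 2]].sum(),
    axis=1)

# A plain numpy array holding one sum per flagged row (3 and 9 here)
values = report_sum.values
print(values)
```

Because nothing is aligned back onto the full index of X, no NaN entries are created for the rows that are not flagged.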

In pandas, how do I run a computation on only a specific subset of indices?

2 answers: