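I am trying to count, for each sliding window over this data, how many times each ID occurs (the s_*_count columns show the desired result):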
DATE ID s_2_count s_3_count s_5_count
2017-05-17 15:49:51 s_2 2 0 1
2017-05-17 15:49:52 s_5 1 1 1
2017-05-17 15:49:55 s_2 1 1 1
2017-05-17 15:49:56 s_3 0 1 2
2017-05-17 15:49:58 s_5 NaN NaN NaN
2017-05-17 15:49:59 s_5 NaN NaN NaN
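The windows are rolling windows of size 3 that overlap one another, and I want the number of occurrences of each ID inside every window. The answer should look like the count columns above.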
Answer 0 (score: 2)
Use str.get_dummies, rolling, sum, shift, and add_suffix:
df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count')
Output:
s_2_count s_3_count s_5_count
DATE
2017-05-17 15:49:51 2.0 0.0 1.0
2017-05-17 15:49:52 1.0 1.0 1.0
2017-05-17 15:49:55 1.0 1.0 1.0
2017-05-17 15:49:56 0.0 1.0 2.0
2017-05-17 15:49:58 NaN NaN NaN
2017-05-17 15:49:59 NaN NaN NaN
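To make the one-liner easier to follow, here is a minimal sketch that rebuilds the sample frame from the question and applies each step separately. The intermediate names (dummies, backward, forward, counts) are my own and are not part of the original answer; the sketch also assumes DATE is the index, as in the output above.

```python
import pandas as pd

# Sample data copied from the question; DATE is assumed to be the index.
df = pd.DataFrame(
    {"ID": ["s_2", "s_5", "s_2", "s_3", "s_5", "s_5"]},
    index=pd.to_datetime(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ]
    ),
)
df.index.name = "DATE"

# Step 1: one 0/1 indicator column per distinct ID.
dummies = df["ID"].str.get_dummies()

# Step 2: rolling(3).sum() looks backward, so each row sums itself
# and the two rows before it; the first two rows are NaN.
backward = dummies.rolling(3).sum()

# Step 3: shift(-2) moves every result up two rows, so each row now
# describes the window made of itself and the two rows after it.
forward = backward.shift(-2)

# Step 4: rename the columns to s_2_count, s_3_count, s_5_count.
counts = forward.add_suffix("_count")

print(counts)
```

The last two rows stay NaN because there are fewer than three rows left to fill a complete window at that point.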
Let's assign it back to the DataFrame:
df.assign(**df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))
Or use join:
df.join(df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))
Output:
ID s_2_count s_3_count s_5_count
DATE
2017-05-17 15:49:51 s_2 2.0 0.0 1.0
2017-05-17 15:49:52 s_5 1.0 1.0 1.0
2017-05-17 15:49:55 s_2 1.0 1.0 1.0
2017-05-17 15:49:56 s_3 0.0 1.0 2.0
2017-05-17 15:49:58 s_5 NaN NaN NaN
2017-05-17 15:49:59 s_5 NaN NaN NaN
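As a quick check that the assign and join variants attach the same columns, the sketch below builds both results from the sample data and compares them. The variable names (counts, assigned, joined) are illustrative only, and DATE is again assumed to be the index.

```python
import pandas as pd

# Sample data copied from the question; DATE is assumed to be the index.
df = pd.DataFrame(
    {"ID": ["s_2", "s_5", "s_2", "s_3", "s_5", "s_5"]},
    index=pd.to_datetime(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ]
    ),
)
df.index.name = "DATE"

counts = (
    df["ID"].str.get_dummies().rolling(3).sum().shift(-2).add_suffix("_count")
)

# assign: ** unpacks the count columns as keyword arguments,
# and each column is aligned to df by index label.
assigned = df.assign(**counts)

# join: a left join on the index, which is shared by df and counts.
joined = df.join(counts)

# Both frames should carry the same columns in the same order.
print(list(assigned.columns))
print(list(joined.columns))
```

Neither call modifies df in place; each returns a new frame, so the result has to be bound to a name if it is needed later.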
Option 2, using pd.crosstab:
df.assign(**pd.crosstab(df.index,df.ID).rolling(3).sum().shift(-2))
Or use join:
df.join(pd.crosstab(df.index,df.ID).rolling(3).sum().shift(-2))
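A self-contained sketch of the crosstab variant follows, using the sample data from the question with DATE assumed to be the index. The add_suffix call and the name ct_counts are my additions, included only so the column names match the first option; the answer's own crosstab lines leave the columns named s_2, s_3, and s_5.

```python
import pandas as pd

# Sample data copied from the question; DATE is assumed to be the index.
df = pd.DataFrame(
    {"ID": ["s_2", "s_5", "s_2", "s_3", "s_5", "s_5"]},
    index=pd.to_datetime(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ]
    ),
)
df.index.name = "DATE"

# crosstab counts how often each ID appears at each timestamp,
# giving one row per timestamp and one column per ID.
table = pd.crosstab(df.index, df["ID"])

# Same rolling/shift steps as in the first option; the suffix is
# added here (not in the original answer) for matching column names.
ct_counts = table.rolling(3).sum().shift(-2).add_suffix("_count")

result = df.join(ct_counts)
print(result)
```

Because crosstab groups rows by their labels, this variant relies on the timestamps being unique: rows that share a timestamp would be merged into a single crosstab row, which changes what a window of 3 rows covers.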