I have a DataFrame with a column like this:
pct_change
0 NaN
1 -0.029767
2 0.039884 # period of one
3 -0.026398
4 0.044498 # period of two
5 0.061383 # period of two
6 -0.006618
7 0.028240 # period of one
8 -0.009859
9 -0.012233
10 0.035714 # period of three
11 0.042547 # period of three
12 0.027874 # period of three
13 -0.008823
14 -0.000131
15 0.044907 # period of one
I want to collect the length of every period in which pct_change is positive into a list, so for the example column that would be:
raise_periods = [1,2,1,3,1]
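To pin down what counts as a "period" here, the plain-Python loop below is a minimal sketch. It assumes a period is a run of consecutive strictly positive values and that NaN, zero and negative values all end a run (the question does not say how a zero change should be treated). The helper name `rise_periods` is hypothetical.

```python
def rise_periods(values):
    """Return the lengths of consecutive runs of strictly positive values."""
    periods = []
    run = 0
    for v in values:
        # NaN > 0 is False, so a NaN ends a run just like a zero or negative value
        if v > 0:
            run += 1
        elif run:
            periods.append(run)
            run = 0
    if run:  # close a run that reaches the end of the data
        periods.append(run)
    return periods


pct_change = [float('nan'), -0.029767, 0.039884, -0.026398, 0.044498, 0.061383,
              -0.006618, 0.028240, -0.009859, -0.012233, 0.035714, 0.042547,
              0.027874, -0.008823, -0.000131, 0.044907]
print(rise_periods(pct_change))  # [1, 2, 1, 3, 1]
```

This is useful mainly as a reference to check a vectorized version against.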
Answer 0 (score: 1)
Assuming the DataFrame column is a Series named y that holds the pct_change values, the code below gives a vectorized, loop-free solution.
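y = df['pct_change']
is_up = y > 0  # NaN and 0 count as "not rising", so they end a run
run_id = (~is_up).cumsum()[is_up]  # rows of the same rising run share one label
raise_periods = run_id.groupby(run_id).count().tolist()  # [1, 2, 1, 3, 1]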
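To see why the cumsum trick works, the sketch below rebuilds the question's column and prints the intermediate run labels. It assumes pandas and NumPy are installed. It uses `~(y > 0)` rather than `y < 0` as the run separator so that a zero change also ends a run; with `y < 0`, a sequence such as +, 0, + would be counted as a single period of length 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'pct_change': [np.nan, -0.029767, 0.039884, -0.026398, 0.044498,
                                  0.061383, -0.006618, 0.028240, -0.009859, -0.012233,
                                  0.035714, 0.042547, 0.027874, -0.008823, -0.000131,
                                  0.044907]})
y = df['pct_change']
is_up = y > 0  # NaN > 0 is False
# every non-rising row bumps the counter, so all rows of one rising run share a label
run_id = (~is_up).cumsum()
labels = run_id[is_up]
print(labels.tolist())  # [2, 3, 3, 4, 6, 6, 6, 8]
# labels increase down the rows, so groupby's sorted keys keep the runs in order
raise_periods = labels.groupby(labels).count().tolist()
print(raise_periods)  # [1, 2, 1, 3, 1]
```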
Answer 1 (score: 0)
In the end, the answer from @gioxc88 didn't get me exactly where I wanted, but it did point me in the right direction.
This is what I ended up doing:
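import itertools
import numpy as np

# COMPOUND_DIFF, NEWS_COMPOUND and CONSECUTIVE_COMPOUND are column-name constants defined elsewhere
def get_rise_avg_period(cls, df):  # cls suggests this was a classmethod; it is unused here
    df[COMPOUND_DIFF] = df[NEWS_COMPOUND].diff()
    # 1 where the value rose versus the previous row, else 0 (the leading NaN diff becomes 0)
    df[CONSECUTIVE_COMPOUND] = (df[COMPOUND_DIFF] > 0).astype(int)
    # group together the periods of rise and down changes
    unfiltered_periods = [list(group) for key, group in itertools.groupby(df[CONSECUTIVE_COMPOUND].tolist())]
    # filter out only the rise periods
    positive_periods = [li for li in unfiltered_periods if 0 not in li]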
I wanted the average length of these positive periods, so I added this at the end:
    period = round(np.mean([len(p) for p in positive_periods]))
    return period
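The same idea can be written without building and filtering the intermediate list of 0/1 groups: `itertools.groupby` yields each run's key, so rising runs can be kept by key and measured directly. The sketch below applies it to the question's `pct_change` values (rather than to a `diff()` of a news-compound column, as in the answer above) and assumes an empty result should yield `None` instead of averaging an empty list. The function name `rise_period_stats` is hypothetical.

```python
import itertools


def rise_period_stats(values):
    """Return (lengths of rising runs, rounded mean length or None)."""
    # the key is True for strictly positive values; NaN > 0 is False
    lengths = [sum(1 for _ in group)
               for rising, group in itertools.groupby(values, key=lambda v: v > 0)
               if rising]
    # guard against an empty list, where a mean is undefined
    avg = round(sum(lengths) / len(lengths)) if lengths else None
    return lengths, avg


pct_change = [float('nan'), -0.029767, 0.039884, -0.026398, 0.044498, 0.061383,
              -0.006618, 0.028240, -0.009859, -0.012233, 0.035714, 0.042547,
              0.027874, -0.008823, -0.000131, 0.044907]
lengths, avg = rise_period_stats(pct_change)
print(lengths)  # [1, 2, 1, 3, 1]
print(avg)      # 2  (mean is 1.6)
```

Note that Python's `round` uses round-half-to-even, so a mean of exactly 2.5 would round to 2.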