Question

I have a DataFrame with a column like this:

      pct_change
0           NaN
1     -0.029767
2      0.039884 # period of one
3     -0.026398
4      0.044498 # period of two
5      0.061383 # period of two
6     -0.006618 
7      0.028240 # period of one
8     -0.009859
9     -0.012233
10     0.035714 # period of three
11     0.042547 # period of three
12     0.027874 # period of three
13    -0.008823
14    -0.000131
15     0.044907 # period of one

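I want to collect the length of every period in which pct_change is positive into a list, so for the example column the result would be: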

raise_periods = [1,2,1,3,1]
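
A plain-loop reference version makes the definition concrete: a rising period is a maximal run of consecutive rows with a strictly positive change, and its length is the number of rows in that run. The sketch below rebuilds the sample column by hand (values copied from the table above) and assumes that a NaN or a non-positive value ends a run; the function name `rise_periods_loop` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample column copied from the question
df = pd.DataFrame({"pct_change": [
    np.nan, -0.029767, 0.039884, -0.026398, 0.044498, 0.061383,
    -0.006618, 0.028240, -0.009859, -0.012233, 0.035714, 0.042547,
    0.027874, -0.008823, -0.000131, 0.044907,
]})


def rise_periods_loop(values):
    """Return the lengths of consecutive runs of strictly positive values."""
    periods = []
    run = 0
    for v in values:
        if v > 0:          # NaN > 0 is False, so NaN also ends a run
            run += 1
        elif run:          # a non-positive value closes the current run
            periods.append(run)
            run = 0
    if run:                # close a run that reaches the last row
        periods.append(run)
    return periods


raise_periods = rise_periods_loop(df["pct_change"])
print(raise_periods)  # [1, 2, 1, 3, 1]
```

This is easy to read but runs in a Python-level loop, which is what the vectorized answers try to avoid on long columns.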

Answer 1

Assuming the DataFrame column is a Series named y that holds the pct_changes, the code below gives a vectorized solution with no loops.

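y = df['pct_change']
# running count of non-positive values: all rows of one rising run share the same count
raise_periods = (y <= 0).cumsum()[y > 0]
# the size of each group is the length of one rising period
raise_periods = raise_periods.groupby(raise_periods).count().tolist()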
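
To see why the cumsum trick works, here is a self-contained version run on the sample data (values copied from the question). The cumulative count of non-positive values only increases at a row that is not a rise, so every row inside one rising run carries the same label; keeping only the rising rows and taking the size of each label group yields the run lengths. The sketch treats any non-positive value (`y <= 0`) as the end of a run, so a change of exactly zero also splits two rising periods; that choice is an assumption about what should count as a rise.

```python
import numpy as np
import pandas as pd

y = pd.Series([
    np.nan, -0.029767, 0.039884, -0.026398, 0.044498, 0.061383,
    -0.006618, 0.028240, -0.009859, -0.012233, 0.035714, 0.042547,
    0.027874, -0.008823, -0.000131, 0.044907,
], name="pct_change")

# Each non-positive value bumps the label; rows of one rising run share a label.
labels = (y <= 0).cumsum()
rising_labels = labels[y > 0]

# groupby sorts the keys, which keeps chronological order here because the
# labels never decrease; size() counts the rows in each run.
raise_periods = rising_labels.groupby(rising_labels).size().tolist()
print(raise_periods)  # [1, 2, 1, 3, 1]
```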

Answer 2

In the end, the answer from @gioxc88 didn't get me quite where I wanted, but it did point me in the right direction.

What I ended up doing was this:

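    @classmethod
    def get_rise_avg_period(cls, df):
        # requires `import itertools` at module level; COMPOUND_DIFF, NEWS_COMPOUND
        # and CONSECUTIVE_COMPOUND are column-name constants defined elsewhere
        df[COMPOUND_DIFF] = df[NEWS_COMPOUND].diff()
        # 1 where the value rose compared with the previous row, else 0 (NaN -> 0)
        df[CONSECUTIVE_COMPOUND] = (df[COMPOUND_DIFF] > 0).astype(int)
        # group together the consecutive periods of rises and non-rises
        unfiltered_periods = [list(group) for _, group in
                              itertools.groupby(df[CONSECUTIVE_COMPOUND].tolist())]

        # keep only the rise periods
        positive_periods = [li for li in unfiltered_periods if 0 not in li]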
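
Stripped of the class and the column-name constants, which belong to the author's own code, the `itertools.groupby` idea reduces to the sketch below. It assumes the input is already a sequence of changes (for example the `pct_change` column) rather than raw values, so no `diff()` is applied; the function name `rise_period_lengths` is hypothetical.

```python
import itertools

import numpy as np
import pandas as pd


def rise_period_lengths(changes):
    """Lengths of consecutive runs where the change is strictly positive."""
    # Map each change to True (rise) or False (no rise); NaN > 0 is False.
    rising = (pd.Series(changes, dtype=float) > 0).tolist()
    # groupby yields one (key, group) pair per run of equal keys;
    # keep only the runs whose key is True and measure their length.
    return [len(list(group)) for key, group in itertools.groupby(rising) if key]


changes = [np.nan, -0.029767, 0.039884, -0.026398, 0.044498, 0.061383,
           -0.006618, 0.028240, -0.009859, -0.012233, 0.035714, 0.042547,
           0.027874, -0.008823, -0.000131, 0.044907]
print(rise_period_lengths(changes))  # [1, 2, 1, 3, 1]
```

Filtering on the group key avoids materializing the non-rise runs and replaces the `0 not in li` check.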

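I wanted the average length of these positive periods, so I added this at the end:

positive_periods_lens = [len(li) for li in positive_periods]
period = round(np.mean(positive_periods_lens))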
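
The average can also be computed straight from the run lengths without building lists of lists. This sketch reuses the cumsum labelling from Answer 1 on the sample column, again assuming that a non-positive change (`<= 0`) ends a rise: five runs totalling 8 rows give 8 / 5 = 1.6, which rounds to 2. Note that both Python's `round` and NumPy round halves to the nearest even number, so an average of exactly 2.5 would become 2, not 3.

```python
import numpy as np
import pandas as pd

y = pd.Series([
    np.nan, -0.029767, 0.039884, -0.026398, 0.044498, 0.061383,
    -0.006618, 0.028240, -0.009859, -0.012233, 0.035714, 0.042547,
    0.027874, -0.008823, -0.000131, 0.044907,
])

# Label rising rows by the number of non-positive values seen so far.
labels = (y <= 0).cumsum()[y > 0]
period_lengths = labels.groupby(labels).size()   # one entry per rising run

avg_period = period_lengths.mean()               # 8 rows / 5 runs
avg_rounded = int(round(avg_period))
print(avg_period)   # 1.6
print(avg_rounded)  # 2
```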

Pandas: get the average length of pct_change periods
