>>> df.head()
            Open   High    Low  Close  Volume
2004-08-19  49.96  51.98  47.93  50.12     NaN
2004-08-20  50.69  54.49  50.20  54.10     NaN
2004-08-23  55.32  56.68  54.47  54.65     NaN
2004-08-24  55.56  55.74  51.73  52.38     NaN
2004-08-25  52.43  53.95  51.89  52.95     NaN

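For the example above, I'd like another column, df['RDA'], to increase by one for each consecutive day that the Open column is above 50. For each consecutive day it is below 50, I'd like a second column, df['RDB'], to increase and df['RDA'] to reset to 0. I have tried if/then logic, but it doesn't like that and gives me a value error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I sort this out? The desired output looks like this: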


>>> df.head()
            Open   High    Low  Close  Volume    RDA   RDB
2004-08-19  51.96  51.98  47.93  50.12     NaN    1      0
2004-08-20  50.69  54.49  50.20  54.10     NaN    2      0
2004-08-23  55.32  56.68  54.47  54.65     NaN    3      0
2004-08-24  45.56  55.74  51.73  52.38     NaN    0      1
2004-08-25  42.43  53.95  51.89  52.95     NaN    0      2
2004-08-26  41.96  51.98  47.93  50.12     NaN    0      3
2004-08-27  40.69  54.49  50.20  54.10     NaN    0      4
2004-08-28  55.32  56.68  54.47  54.65     NaN    1      0
2004-08-29  55.56  55.74  51.73  52.38     NaN    2      0
2004-08-30  52.43  53.95  51.89  52.95     NaN    3      0

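Is this something pandas can do? I know you can count the values in a column, but so far I haven't been able to find a way to count consecutive values. An if/then statement with 2 variables would work, but as I mentioned above, I get a value error when I try it. Any help would be appreciated.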

5 Answers:

Answer 0 (score: 2)

  • I would use np.sign on the difference between Open and 50. It is -1 when Open is less than 50, 0 when it equals 50, and 1 when it is greater than 50.
  • Next, I would use np.diff to identify when the sign switches from one value to another (the code below does the same thing by comparing adjacent elements).
  • Then I would use cumsum to define the groups of consecutive signs.
  • Next, I would use cumcount to get the counts within each group.
  • Finally, I would use np.where to split the cumcounts into the two columns.

import numpy as np

o = df.Open.values - 50
signs = np.sign(o)
changes = np.append(False, signs[:-1] != signs[1:])
g = changes.cumsum()
cumcounts = df.groupby(g).cumcount() + 1
a = np.where(signs == 1, cumcounts, 0)
b = np.where(signs == -1, cumcounts, 0)
df.assign(RDA=a, RDB=b)

             Open   High    Low  Close  Volume  RDA  RDB
Date
2004-08-19  51.96  51.98  47.93  50.12     NaN    1    0
2004-08-20  50.69  54.49  50.20  54.10     NaN    2    0
2004-08-23  55.32  56.68  54.47  54.65     NaN    3    0
2004-08-24  45.56  55.74  51.73  52.38     NaN    0    1
2004-08-25  42.43  53.95  51.89  52.95     NaN    0    2
2004-08-26  41.96  51.98  47.93  50.12     NaN    0    3
2004-08-27  40.69  54.49  50.20  54.10     NaN    0    4
2004-08-28  55.32  56.68  54.47  54.65     NaN    1    0
2004-08-29  55.56  55.74  51.73  52.38     NaN    2    0
2004-08-30  52.43  53.95  51.89  52.95     NaN    3    0
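
To make the intermediate arrays visible, here is a small self-contained sketch of the same sign/cumsum/cumcount approach. The Open values are copied from the expected output in the question; the frame name `demo`, its default integer index, and the variable names `groups` and `run_length` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Open values copied from the question's expected output (hypothetical frame name).
demo = pd.DataFrame({"Open": [51.96, 50.69, 55.32, 45.56, 42.43,
                              41.96, 40.69, 55.32, 55.56, 52.43]})

# -1 below 50, 0 at exactly 50, 1 above 50
signs = np.sign(demo["Open"].to_numpy() - 50)
# True wherever the sign differs from the previous row (first row is never a change)
changes = np.append(False, signs[:-1] != signs[1:])
# Running total of changes: one label per run of equal signs
groups = changes.cumsum()
# Position inside each run, starting at 1
run_length = demo.groupby(groups).cumcount() + 1

demo["RDA"] = np.where(signs == 1, run_length, 0)
demo["RDB"] = np.where(signs == -1, run_length, 0)

print(groups.tolist())
print(demo)
```

With this approach, a day on which Open is exactly 50 has sign 0, so both columns are 0 for that day and the following run starts again from 1.
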
Answer 1 (score: 2)


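First, add a boolean flag column that is True where Open is greater than the target price.

Then you can use the compare-cumsum-groupby pattern to identify the consecutive runs of this flag, and apply cumsum to each such run.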


Finally, we drop the flag column (I used .iloc[:, :-1] to drop it because I added it as the last column) and append the new RDA and RDB columns.

target_price = 50
df = df.assign(flag=df.Open.gt(target_price))  # True if `Open` greater than `target_price`, otherwise False.

rda = df.groupby((df['flag'] != df['flag'].shift()).cumsum()).flag.cumsum()
df['flag'] = ~df['flag']  # Invert flag for RDB.
rdb = df.groupby((df['flag'] != df['flag'].shift()).cumsum()).flag.cumsum()

df = df.iloc[:, :-1].assign(RDA=rda, RDB=rdb)
>>> df
      Date   Open   High    Low  Close  Volume  RDA  RDB
0  8/19/04  51.96  51.98  47.93  50.12     NaN    1    0
1  8/20/04  50.69  54.49  50.20  54.10     NaN    2    0
2  8/23/04  55.32  56.68  54.47  54.65     NaN    3    0
3  8/24/04  45.56  55.74  51.73  52.38     NaN    0    1
4  8/25/04  42.43  53.95  51.89  52.95     NaN    0    2
5  8/26/04  41.96  51.98  47.93  50.12     NaN    0    3
6  8/27/04  40.69  54.49  50.20  54.10     NaN    0    4
7  8/28/04  55.32  56.68  54.47  54.65     NaN    1    0
8  8/29/04  55.56  55.74  51.73  52.38     NaN    2    0
9  8/30/04  52.43  53.95  51.89  52.95     NaN    3    0
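
For readers who have not met the compare-cumsum-groupby pattern before, the following minimal sketch shows each step on a toy boolean Series. The names `flag`, `starts`, `run_id` and `streak` are hypothetical, and the flag is cast to int before the grouped cumsum so that the result is an integer count regardless of pandas version.

```python
import pandas as pd

flag = pd.Series([True, True, False, False, False, True])

# Compare each value with its predecessor; True marks the start of a new run.
starts = flag != flag.shift()
# The running total of run starts gives every run its own label.
run_id = starts.cumsum()
# Summing the 0/1 flag within each run counts consecutive True values.
streak = flag.astype(int).groupby(run_id).cumsum()

print(run_id.tolist())   # [1, 1, 2, 2, 2, 3]
print(streak.tolist())   # [1, 2, 0, 0, 0, 1]
```

Inverting the flag and repeating the same three steps gives the count for the False runs, which is what the answer does for RDB.
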

Answer 2 (score: 1)


target = df.Open > 50

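This is the series you will later pass to functools.reduce. reduce is similar to map, but it carries an accumulated value from one element of the list to the next, and that is what makes it usable here.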



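This takes a little finesse: the initializer is set to a list containing a single 0, [0], so that on the first pass the function can take the "last" element and do something with it instead of raising an error.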


Your RDB column works exactly the same way, except that you want to count the entries in the target list that are not True, which only requires adding not to the conditional:

import functools

# Create a boolean series of your Open column
target = df.Open > 50

# For every item in your boolean series add a 1 to the previous value if it's over 50, otherwise reset
df['RDA'] = functools.reduce(lambda x, y: x + ([x[-1] + 1] if y else [0]), target, [0])[1:]

# Repeat, but for every `False` value in the series
df['RDB'] = functools.reduce(lambda x, y: x + ([x[-1] + 1] if not y else [0]), target, [0])[1:]

>>> df.head()
             Open   High    Low  Close  Volume  RDA  RDB
Date
2004-08-19  49.96  51.98  47.93  50.12     NaN    0    1
2004-08-20  50.69  54.49  50.20  54.10     NaN    1    0
2004-08-23  55.32  56.68  54.47  54.65     NaN    2    0
2004-08-24  55.56  55.74  51.73  52.38     NaN    3    0
2004-08-25  52.43  53.95  51.89  52.95     NaN    4    0
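
The accumulate-into-a-list idea may be easier to follow on a plain Python list, without pandas. This is only a sketch: the names `flags`, `step` and `streaks` are made up for the example, and the lambda from the answer is written out as a named function.

```python
import functools

flags = [False, True, True, True, False, True]

def step(acc, is_above):
    # Extend the accumulated list: previous count + 1 while the flag holds, else 0.
    return acc + [acc[-1] + 1 if is_above else 0]

# The initializer [0] gives the first step a "previous" count to read;
# the trailing [1:] drops that seed value again.
streaks = functools.reduce(step, flags, [0])[1:]
print(streaks)  # [0, 1, 2, 3, 0, 1]
```

Because every step builds a new list with +, the work grows roughly quadratically with the length of the series, so on long series the vectorised approaches in the other answers are likely to be faster.
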



Answer 3 (score: 0)

Answer 4 (score: 0)


In [226]: def increment(row):
     ...:     global rda
     ...:     global rdb
     ...:     if row.Open > 50:
     ...:         row.RDA = int(next(rda))
     ...:         rdb = count()
     ...:     else:
     ...:         row.RDB = next(rdb)
     ...:         rda = count()  # restart the RDA counter
     ...:     return row
In [227]: df['RDA'] = 0
In [228]: df['RDB'] = 0
In [229]: df.apply(increment, axis=1)
             Open   High    Low  Close  Volume  RDA  RDB
2004-08-19  49.96  51.98  47.93  50.12     NaN  0.0  1.0
2004-08-20  50.69  54.49  50.20  54.10     NaN  0.0  0.0
2004-08-23  55.32  56.68  54.47  54.65     NaN  1.0  0.0
2004-08-24  55.56  55.74  51.73  52.38     NaN  2.0  0.0
2004-08-25  52.43  53.95  51.89  52.95     NaN  3.0  0.0

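I don't know why they show up as floats in the columns; I guess pandas thinks that's what you want. The values coming from count (presumably itertools.count) are ints originally. I don't generally like global variables, but DataFrame.apply can't access the variables when they live outside of the increment function.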
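
The floats most likely appear because, with axis=1, each row is handed to the function as a single Series, and a row that mixes the float price columns with the counters is held as floats. One possible way to avoid both the globals and the float conversion is an ordinary loop that keeps the two counters as local variables. This is a sketch, not the answer author's method: the function name `streaks`, its `threshold` parameter and the frame `demo` are hypothetical, and the Open values are the first five rows from the question.

```python
import pandas as pd

def streaks(values, threshold=50):
    """Return (above, below) lists of consecutive-day counts."""
    above, below = [], []
    a = b = 0  # local counters replace the global iterators
    for v in values:
        if v > threshold:
            a, b = a + 1, 0   # extend the "above" run, reset the other
        else:
            a, b = 0, b + 1   # extend the "not above" run, reset the other
        above.append(a)
        below.append(b)
    return above, below

demo = pd.DataFrame({"Open": [49.96, 50.69, 55.32, 55.56, 52.43]})
demo["RDA"], demo["RDB"] = streaks(demo["Open"])
print(demo)
```

As in the increment function above, a value of exactly 50 falls into the else branch and is counted in RDB.
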