Compare values in a pandas DataFrame and return a new value

Time: 2020-08-12 14:59:54

Tags: python pandas

I have a pandas DataFrame like this:

         date        id      tier
0      2020-06-02    23      3
1      2020-06-02    23      2
2      2020-06-02    23      1
3      2020-06-02    7       3
23026  2020-06-20     7      3
41740  2020-07-07    9       3

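I want to create a new column from "tier" that is 0 if the previous value equals the current value or there is no previous value, 1 if the previous value is greater than the current value, and -1 in every other case. For example: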

         date        id      tier  move
0      2020-06-02    23      3      0
1      2020-06-02    23      2      1
2      2020-06-02    23      1      1
3      2020-06-02    7       3      -1
23026  2020-06-20     7      3       0
41740  2020-07-07    9       3       0

Based on other answers I found, I have mainly tried .shift(), but to no avail. When I do this:

if df['tier'].shift() < df['tier']:
  df['Movement'] = -1
elif df['tier'].shift() == df['tier']:
  df['Movement'] = 0
else:
  df['Movement'] = 1

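This fails with a shape mismatch: 'ValueError: operands could not be broadcast together with shapes (78792,) (385,2)'. Only one df is in use, so I don't know whether my code is bad or where (385,2) comes from. Thanks!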

4 answers:

Answer 0: (score: 2)

Use numpy.select

import numpy as np
conditions=[df['tier'].shift().fillna(df['tier']).eq(df['tier']),
            df['tier'].shift().fillna(df['tier']).gt(df['tier'])]
choices=[0,1]

df['move']=np.select(conditions, choices, default=-1)

Output:

df
             date  id  tier  move
0      2020-06-02  23     3     0
1      2020-06-02  23     2     1
2      2020-06-02  23     1     1
3      2020-06-02   7     3    -1
23026  2020-06-20   7     3     0
41740  2020-07-07   9     3     0
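
The two conditions above repeat the same shift expression. As a minimal self-contained sketch, the same approach can be written with that expression pulled into a variable. The frame is rebuilt by hand from the question's sample, and the name `prev` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (original index labels kept).
df = pd.DataFrame(
    {
        "date": ["2020-06-02"] * 4 + ["2020-06-20", "2020-07-07"],
        "id": [23, 23, 23, 7, 7, 9],
        "tier": [3, 2, 1, 3, 3, 3],
    },
    index=[0, 1, 2, 3, 23026, 41740],
)

# Previous row's tier. The first row has no predecessor, so it is filled
# with its own value and therefore compares as "equal".
prev = df["tier"].shift().fillna(df["tier"])

# np.select takes the first condition that matches in each row.
conditions = [prev.eq(df["tier"]), prev.gt(df["tier"])]
choices = [0, 1]

# Rows matching neither condition (previous < current) get the default, -1.
df["move"] = np.select(conditions, choices, default=-1)
print(df)
```

Note that this compares each row with the row directly above it, regardless of id, which is what the expected output in the question appears to do.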

Answer 1: (score: 1)

You can use Series.diff and Series.clip

>>> df.assign(move= (-df.tier.diff(1)).fillna(0).clip(-1,1).astype(int))
             date  id  tier  move
0      2020-06-02  23     3     0
1      2020-06-02  23     2     1
2      2020-06-02  23     1     1
3      2020-06-02   7     3    -1
23026  2020-06-20   7     3     0
41740  2020-07-07   9     3     0
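
To make the one-liner easier to follow, here is a rough step-by-step sketch of the same chain on the tier values alone. The variable names `step` and `move` are introduced for illustration and the series is rebuilt by hand from the question's sample.

```python
import pandas as pd

# Tier values from the question, with the original index labels.
tier = pd.Series([3, 2, 1, 3, 3, 3], index=[0, 1, 2, 3, 23026, 41740], name="tier")

step = tier.diff()      # current - previous; NaN for the first row
step = -step            # previous - current, so a drop in tier is positive
step = step.fillna(0)   # no previous value is treated as "no change"

# Clipping to [-1, 1] squashes larger jumps (such as 1 -> 3) down to -1 or 1.
move = step.clip(-1, 1).astype(int)
print(move.tolist())
```

One caveat, assuming tier could ever hold non-integer values: clipping only behaves like a sign function when every non-zero difference has magnitude of at least 1, which holds for integer tiers. A difference of 0.5 would stay 0.5 and then truncate to 0 in the integer cast.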

Answer 2: (score: 1)

You can apply np.sign to the negated diff

import numpy as np

df['move'] = np.sign(-df['tier'].diff().fillna(0))
             date  id  tier  move
0      2020-06-02  23     3   0.0
1      2020-06-02  23     2   1.0
2      2020-06-02  23     1   1.0
3      2020-06-02   7     3  -1.0
23026  2020-06-20   7     3   0.0
41740  2020-07-07   9     3   0.0
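
The move column comes out as floats here because diff() produces NaN for the first row, which forces a float dtype. If integers are wanted, as in the question's expected output, one possible tweak is to cast at the end. This is a small sketch on a hand-built copy of the tier values, not part of the answer as posted.

```python
import numpy as np
import pandas as pd

# Tier values from the question, with the original index labels.
tier = pd.Series([3, 2, 1, 3, 3, 3], index=[0, 1, 2, 3, 23026, 41740], name="tier")

# Same expression as above, followed by an integer cast.
# The NaN from diff() is already replaced by 0, so the cast is safe.
move = np.sign(-tier.diff().fillna(0)).astype(int)
print(move.tolist())
```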

Answer 3: (score: 0)

A vectorized np.where works well

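import numpy as np
import pandas as pd

# Sample data from the question; "rid" names the original index column.
data = """rid         date        id      tier
0      2020-06-02    23      3
1      2020-06-02    23      2
2      2020-06-02    23      1
3      2020-06-02    7       3
23026  2020-06-20     7      3
41740  2020-07-07    9       3"""
# Split each line on spaces and drop the empty tokens left by the padding.
a = [[t for t in l.split(" ") if t!=""]  for l in data.split("\n")]
df = ( pd.DataFrame(a[1:], columns=a[0])
 # Recent pandas versions require an explicit unit such as "datetime64[ns]".
 .astype({"id":"int64","rid":"int64","tier":"int64","date":"datetime64[ns]"})
 .set_index("rid")
)
# -1 where tier went up, 1 where it went down, 0 otherwise (including the first row).
df.assign(move=lambda dfa: 
          np.where(dfa["tier"].shift()<dfa["tier"], -1, 
                  np.where(dfa["tier"].shift()>dfa["tier"], 1, 0))
         )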
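
As a hedged aside on why the vectorized form is needed at all: comparing two Series yields one boolean per row, while a Python if statement needs a single True or False. The sketch below shows that failure on a hand-built copy of the tier values and then the np.where equivalent. It does not try to reproduce the broadcasting error quoted in the question, whose (385,2) shape cannot be explained from the code shown there. The names `mask` and `error_message` are illustrative.

```python
import numpy as np
import pandas as pd

tier = pd.Series([3, 2, 1, 3, 3, 3], name="tier")

# One boolean per row, not a single True/False.
mask = tier.shift() < tier

error_message = None
try:
    if mask:  # asks pandas to reduce the whole Series to one truth value
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# np.where picks a value per row instead of branching once for the whole column.
move = np.where(mask, -1, np.where(tier.shift() > tier, 1, 0))
print(move.tolist())
```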