I have a pandas DataFrame like this:
date id tier
0 2020-06-02 23 3
1 2020-06-02 23 2
2 2020-06-02 23 1
3 2020-06-02 7 3
23026 2020-06-20 7 3
41740 2020-07-07 9 3
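I want to create a new column from 'tier' whose value is 0 if the previous value is the same as the current value or there is no previous value, 1 if the previous value is greater than the current value, and -1 in every other case, for example: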
date id tier move
0 2020-06-02 23 3 0
1 2020-06-02 23 2 1
2 2020-06-02 23 1 1
3 2020-06-02 7 3 -1
23026 2020-06-20 7 3 0
41740 2020-07-07 9 3 0
Based on other answers I found, I mostly tried .shift(), but to no avail. When I do this:
if df['tier'].shift() < df['tier']:
    df['Movement'] = -1
elif df['tier'].shift() == df['tier']:
    df['Movement'] = 0
else:
    df['Movement'] = 1
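This fails with an error about mismatched shapes: 'ValueError: operands could not be broadcast together with shapes (78792,) (385,2)'. But only one df is in use. I don't know whether my code is bad, or where the (385,2) comes from. Thanks!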
Answer 0 (score: 2)
Use numpy.select:
import numpy as np
conditions=[df['tier'].shift().fillna(df['tier']).eq(df['tier']),
df['tier'].shift().fillna(df['tier']).gt(df['tier'])]
choices=[0,1]
df['move']=np.select(conditions, choices, default=-1)
Output:
df
date id tier move
0 2020-06-02 23 3 0
1 2020-06-02 23 2 1
2 2020-06-02 23 1 1
3 2020-06-02 7 3 -1
23026 2020-06-20 7 3 0
41740 2020-07-07 9 3 0
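For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same numpy.select idea. The frame is rebuilt by hand from the sample in the question, and the intermediate name `prev` is my own addition so that the shift is only computed once.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; index values kept as-is.
df = pd.DataFrame(
    {
        "date": ["2020-06-02"] * 4 + ["2020-06-20", "2020-07-07"],
        "id": [23, 23, 23, 7, 7, 9],
        "tier": [3, 2, 1, 3, 3, 3],
    },
    index=[0, 1, 2, 3, 23026, 41740],
)

# Previous tier. The first row has no predecessor, so it falls back to
# its own value, which makes it compare as "equal" and yields 0.
prev = df["tier"].shift().fillna(df["tier"])

# Conditions are checked in order; rows matching neither get the default.
conditions = [prev.eq(df["tier"]), prev.gt(df["tier"])]
choices = [0, 1]
df["move"] = np.select(conditions, choices, default=-1)

print(df)
```

Note that the comparison runs over the whole column in row order and does not restart per id, which matches the expected output in the question.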
Answer 1 (score: 1)
You can use Series.diff and Series.clip:
>>> df.assign(move= (-df.tier.diff(1)).fillna(0).clip(-1,1).astype(int))
date id tier move
0 2020-06-02 23 3 0
1 2020-06-02 23 2 1
2 2020-06-02 23 1 1
3 2020-06-02 7 3 -1
23026 2020-06-20 7 3 0
41740 2020-07-07 9 3 0
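To see why the one-liner works, it may help to split it into steps. This is only a sketch on a bare Series holding the tier values from the question; the names `step` and `move` are mine.

```python
import pandas as pd

tier = pd.Series([3, 2, 1, 3, 3, 3], name="tier")

# current minus previous: NaN, -1, -1, 2, 0, 0
step = tier.diff()

# Negate so that a drop in tier becomes positive, treat the missing first
# difference as "no change", then cap the magnitude at 1.
move = (-step).fillna(0).clip(-1, 1).astype(int)

print(move.tolist())
```

One caveat: clipping only yields exactly -1, 0 or 1 because the tiers are whole numbers. If the column could hold fractional values, a difference such as 0.5 would survive the clip and then be truncated to 0 by astype(int), so a sign-based approach would be safer there.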
Answer 2 (score: 1)
You can use np.sign on the negated diff:
df['move'] = np.sign(-df['tier'].diff().fillna(0))
date id tier move
0 2020-06-02 23 3 0.0
1 2020-06-02 23 2 1.0
2 2020-06-02 23 1 1.0
3 2020-06-02 7 3 -1.0
23026 2020-06-20 7 3 0.0
41740 2020-07-07 9 3 0.0
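As the output shows, np.sign returns floats here because diff introduces a NaN. If integer values are wanted, as in the question, a cast at the end should be enough. A small sketch on just the tier values:

```python
import numpy as np
import pandas as pd

tier = pd.Series([3, 2, 1, 3, 3, 3], name="tier")

# fillna(0) removes the NaN from the first row, so the cast to int is safe.
move = np.sign(-tier.diff().fillna(0)).astype(int)

print(move.tolist())
```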
Answer 3 (score: 0)
A vectorized np.where works well:
import numpy as np
import pandas as pd
data = """rid date id tier
0 2020-06-02 23 3
1 2020-06-02 23 2
2 2020-06-02 23 1
3 2020-06-02 7 3
23026 2020-06-20 7 3
41740 2020-07-07 9 3"""
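# Split each line on spaces, dropping empty tokens
a = [[t for t in l.split(" ") if t!=""] for l in data.split("\n")]
df = ( pd.DataFrame(a[1:], columns=a[0])
    # recent pandas versions require an explicit unit on datetime64
    .astype({"id":"int64","rid":"int64","tier":"int64","date":"datetime64[ns]"})
    .set_index("rid")
)
# Comparisons against the NaN in the first shifted row are False, so it gets 0
df.assign(move=lambda dfa:
    np.where(dfa["tier"].shift()<dfa["tier"], -1,
        np.where(dfa["tier"].shift()>dfa["tier"], 1, 0))
)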
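A short sketch of how this differs from the if/elif in the question, using only the tier values. A plain `if` asks Python for a single True/False, but comparing two Series gives one boolean per row, and pandas refuses to collapse that with a ValueError. I can't tell from the snippet alone where the (385,2) in the quoted broadcast message comes from, so treat this as an illustration of the general problem rather than a diagnosis of that exact error.

```python
import numpy as np
import pandas as pd

tier = pd.Series([3, 2, 1, 3, 3, 3], index=[0, 1, 2, 3, 23026, 41740])
prev = tier.shift()

# `if` needs one bool, but `prev < tier` is a whole Series of bools.
try:
    if prev < tier:
        pass
    raised = False
except ValueError:
    raised = True

# np.where chooses a value per row instead. Any comparison with the NaN
# in the first row is False, so that row falls through to 0.
move = np.where(prev < tier, -1, np.where(prev > tier, 1, 0))

print(raised, move.tolist())
```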