import pandas as pd

df1 = pd.DataFrame({'id': [44, 44, 44, 88, 88, 90, 95],
                    'Old Status': ['Draft', 'Submit', 'Return', 'Submit', 'Accept',
                                   'Draft', 'Draft'],
                    'New Status': ['Submit', 'Return', 'Reject', 'Accept', 'Develop',
                                   'Submit', 'Reject'],
                    'Datetime': ['2018-10-24 08:12:02',
                                 '2018-10-24 18:12:02', '2018-11-24 08:56:02',
                                 '2018-10-24 10:12:02', '2018-10-29 13:17:02',
                                 '2018-12-30 08:43:12', '2019-01-24 06:12:02']
                    }, columns=['id', 'Old Status', 'New Status', 'Datetime'])
# Parse the timestamp strings into real datetime64 values
df1['Datetime'] = pd.to_datetime(df1['Datetime'])
df1
   id Old Status New Status            Datetime
0  44      Draft     Submit 2018-10-24 08:12:02
1  44     Submit     Return 2018-10-24 18:12:02
2  44     Return     Reject 2018-11-24 08:56:02
3  88     Submit     Accept 2018-10-24 10:12:02
4  88     Accept    Develop 2018-10-29 13:17:02
5  90      Draft     Submit 2018-12-30 08:43:12
6  95      Draft     Reject 2019-01-24 06:12:02
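I have a dataframe in the format above, but to make the data easier to visualize I need two date columns, "Status In" and "Status Out". For any id, "Status In" will equal Datetime.loc[n], and the Status Out column will equal Datetime.loc[n+1].
When the next row has a new id, New Status can be assumed to be the current status, so Status Out will be null.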
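I have been searching but cannot find any question that covers this. So I started writing a loop, but it feels ugly, and I know there must be a more "pandas" way to do it.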
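This is what I have so far. I then planned to add conditions to handle the change of id and convert the result to a dataframe, but it feels very wrong: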
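df = df1.copy()  # work on a copy of the frame built above
df['Status In'] = df['Datetime']
# One slot per row except the last, which has no following row
s_out = [0]*(df['Status In'].count()-1)
for el in range(0, df['Status In'].count()-1):
    # Status Out for row el is the timestamp of row el+1 (id changes not handled yet)
    s_out[el] = df['Status In'].iloc[el+1]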
The end result should look something like this:
   id Old Status New Status           Status In          Status Out
0  44      Draft     Submit 2018-10-24 08:12:02 2018-10-24 18:12:02
1  44     Submit     Return 2018-10-24 18:12:02 2018-11-24 08:56:02
2  44     Return     Reject 2018-11-24 08:56:02                 NaN
3  88     Submit     Accept 2018-10-24 10:12:02 2018-10-29 13:17:02
4  88     Accept    Develop 2018-10-29 13:17:02                 NaN
5  90      Draft     Submit 2018-12-30 08:43:12                 NaN
6  95      Draft     Reject 2019-01-24 06:12:02                 NaN
Is there a better, cleaner way to do this in Python / Pandas without for loops and if statements?
Answer 0 (score: 1)
First use shift within each id group, then use Series.where with a mask built by eq:
# Select the columns with a list; tuple-style selection on a groupby fails in recent pandas
shifted = df1.groupby('id')[['Datetime', 'Old Status']].shift(-1)
print (shifted)
             Datetime Old Status
0 2018-10-24 18:12:02     Submit
1 2018-11-24 08:56:02     Return
2                 NaT        NaN
3 2018-10-29 13:17:02     Accept
4                 NaT        NaN
5                 NaT        NaN
6                 NaT        NaN
df1['Status Out'] = shifted['Datetime'].where(df1['New Status'].eq(shifted['Old Status']))
print (df1)
   id Old Status New Status            Datetime          Status Out
0  44      Draft     Submit 2018-10-24 08:12:02 2018-10-24 18:12:02
1  44     Submit     Return 2018-10-24 18:12:02 2018-11-24 08:56:02
2  44     Return     Reject 2018-11-24 08:56:02                 NaT
3  88     Submit     Accept 2018-10-24 10:12:02 2018-10-29 13:17:02
4  88     Accept    Develop 2018-10-29 13:17:02                 NaT
5  90      Draft     Submit 2018-12-30 08:43:12                 NaT
6  95      Draft     Reject 2019-01-24 06:12:02                 NaT
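The output above keeps the Datetime column name, whereas the question asked for a column called Status In. The following is a minimal end-to-end sketch of the same shift / where / eq approach that also renames the column. It assumes a pandas version where groupby columns must be selected with a list, and the name result is only illustrative; it is not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question
df1 = pd.DataFrame({
    'id': [44, 44, 44, 88, 88, 90, 95],
    'Old Status': ['Draft', 'Submit', 'Return', 'Submit', 'Accept', 'Draft', 'Draft'],
    'New Status': ['Submit', 'Return', 'Reject', 'Accept', 'Develop', 'Submit', 'Reject'],
    'Datetime': pd.to_datetime([
        '2018-10-24 08:12:02', '2018-10-24 18:12:02', '2018-11-24 08:56:02',
        '2018-10-24 10:12:02', '2018-10-29 13:17:02',
        '2018-12-30 08:43:12', '2019-01-24 06:12:02']),
})

# Values of the next row within each id; the last row of every id gets NaT / NaN
shifted = df1.groupby('id')[['Datetime', 'Old Status']].shift(-1)

# Keep the next timestamp only where this row's New Status is the next row's Old Status
status_out = shifted['Datetime'].where(df1['New Status'].eq(shifted['Old Status']))

# 'result' is an illustrative name; df1 itself is left unchanged
result = df1.rename(columns={'Datetime': 'Status In'})
result['Status Out'] = status_out
print(result)
```

Rows that are the last entry for their id end up with NaT in Status Out, which is the datetime equivalent of the NaN shown in the question's desired output.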