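A group of friends drives from state A to state H, passing through states B, C, D, E, F and G on the way. They swap drivers in state C and state F; these states are called "interchanges". I have the elapsed time between each pair of consecutive states. From that data I have worked out the time between each state and the final destination, and the time between each state and the next one. What I still need is the time between interchanges, i.e. the time between state C and state F. I need to do all three of these for individual trips in a large dataset.
My original plan for the time between interchanges was to subtract the ETA of the second interchange from the ETA of the first. So, within a larger groupby object, how do I find the difference between the last row of one group and the last row of the next group? That is, within a Trip_Key, how do I find the difference between the last A1 ETA and the last A3 ETA? Thanks!
Here is the code that generates my dataframe: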
import pandas as pd

# Two identical sample trips ('A' and 'B'); type 'I' marks an interchange state.
user_dict2 = {'A': {'A': {'eta': 0, 'type': 'T'}, 'B': {'eta': 1, 'type': 'T'}, 'C': {'eta': 2, 'type': 'I'}, 'D': {'eta': 3, 'type': 'T'},
                    'E': {'eta': 4, 'type': 'T'}, 'F': {'eta': 5, 'type': 'I'}, 'G': {'eta': 6, 'type': 'T'}, 'H': {'eta': 7, 'type': 'T'}},
              'B': {'A': {'eta': 0, 'type': 'T'}, 'B': {'eta': 1, 'type': 'T'}, 'C': {'eta': 2, 'type': 'I'}, 'D': {'eta': 3, 'type': 'T'},
                    'E': {'eta': 4, 'type': 'T'}, 'F': {'eta': 5, 'type': 'I'}, 'G': {'eta': 6, 'type': 'T'}, 'H': {'eta': 7, 'type': 'T'}}}
# Flatten to one row per (trip, state) pair.
d = pd.DataFrame.from_dict({(i, j): user_dict2[i][j]
                            for i in user_dict2.keys()
                            for j in user_dict2[i].keys()},
                           orient='index')
d = d.reset_index()
d['Trip_Key'] = d['level_0']
d['State'] = d['level_1']
del d['level_0']
del d['level_1']
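# Group by Trip_Key and label the segments delimited by changes in 'type'
d["e3"] = d.groupby('Trip_Key')["type"].shift(1)  # previous row's type within the trip
d["e4"] = d["type"] != d["e3"]                    # True where the type changes
d["e5"] = d.groupby('Trip_Key')["e4"].cumsum()    # running count of changes per trip
# Give each interchange row the counter of the row before it, so it closes the preceding segment
d.loc[d['type'] == 'I', 'e5'] = d['e5'].shift(1)
d['Inter_Key'] = d['Trip_Key'] + d['e5'].map(int).map(str)
del d['e3']
del d['e4']
del d['e5']
df = d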
# ETA of the next state within the same trip
df['ETA_Shift'] = df.groupby('Trip_Key')['eta'].transform(lambda x: x.shift(-1))
df.fillna(0)  # result is not assigned, so the NaNs remain (hence the nan entries in the output)
# Reverse cumulative sum: total of the shifted ETAs from this row to the end of the trip
df['ETA_Sum'] = df.iloc[::-1].groupby('Trip_Key')['ETA_Shift'].cumsum()[::-1]
# Attach the last state of each trip as Destination
g = df.groupby('Trip_Key').last().reset_index()
df = df.merge(g[['Trip_Key','State']],on=['Trip_Key'],how='outer')
#df['Pair'] = '('+df['SPLC_x']+', '+df['SPLC_y']+')'
df = df.rename(columns={'State_x':'State',
                        'State_y':'Destination'})
# Same two steps again, this time per Inter_Key segment
df['ETI_Shift'] = df.groupby('Inter_Key')['eta'].transform(lambda x: x.shift(-1))
df.fillna(0)  # again not assigned, so the NaNs remain
df['ETI_Sum'] = df.iloc[::-1].groupby('Inter_Key')['ETI_Shift'].cumsum()[::-1]
# Attach the last state of each segment as Inter_Dest
g2 = df.groupby('Inter_Key').last().reset_index()
df = df.merge(g2[['Inter_Key','State']],on='Inter_Key',how='outer')
#df['Pair'] = '('+df['State_x']+', '+df['State_y']+')'
df = df.rename(columns={'State_x':'Origin',
                        'State_y':'Inter_Dest'})
del df['ETA_Shift']
del df['ETI_Shift']
This is what it looks like:
+----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------+
| | eta | type | Trip_Key | Origin | Inter_Key | ETA_Sum | Destination | ETI_Sum | Inter_Dest |
|----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------|
| 0 | 0 | T | A | A | A1 | 28 | H | 3 | C |
| 1 | 1 | T | A | B | A1 | 27 | H | 2 | C |
| 2 | 2 | I | A | C | A1 | 25 | H | nan | C |
| 3 | 3 | T | A | D | A3 | 22 | H | 9 | F |
| 4 | 4 | T | A | E | A3 | 18 | H | 5 | F |
| 5 | 5 | I | A | F | A3 | 13 | H | nan | F |
| 6 | 6 | T | A | G | A5 | 7 | H | 7 | H |
| 7 | 7 | T | A | H | A5 | nan | H | nan | H |
| 8 | 0 | T | B | A | B1 | 28 | H | 3 | C |
| 9 | 1 | T | B | B | B1 | 27 | H | 2 | C |
| 10 | 2 | I | B | C | B1 | 25 | H | nan | C |
| 11 | 3 | T | B | D | B3 | 22 | H | 9 | F |
| 12 | 4 | T | B | E | B3 | 18 | H | 5 | F |
| 13 | 5 | I | B | F | B3 | 13 | H | nan | F |
| 14 | 6 | T | B | G | B5 | 7 | H | 7 | H |
| 15 | 7 | T | B | H | B5 | nan | H | nan | H |
+----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------+
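As an aside, the segment-labelling step in the code above (the temporary e3/e4/e5 columns) could probably be written more compactly. The following is only a sketch under some assumptions: every row right after an interchange row starts a new segment, and the rows of each trip are already in travel order. The names starts_new and seg_no are made up for illustration, and the resulting labels are numbered 0, 1, 2 rather than 1, 3, 5, so they would not match the Inter_Key values in the table exactly.

```python
import pandas as pd

# Minimal stand-in for the question's frame: two trips, interchanges at C and F.
d = pd.DataFrame({
    "Trip_Key": ["A"] * 8 + ["B"] * 8,
    "type": list("TTITTITT") * 2,
})

# True on rows that come directly after an interchange row of the same trip.
starts_new = (
    d["type"].eq("I")
    .groupby(d["Trip_Key"])
    .shift(1, fill_value=False)
    .astype(bool)
)

# Running count of segment starts within each trip: 0, 0, 0, 1, 1, 1, 2, 2.
seg_no = starts_new.astype(int).groupby(d["Trip_Key"]).cumsum()

# Hypothetical labels A0, A1, A2, ... (the question's code yields A1, A3, A5).
d["Inter_Key"] = d["Trip_Key"] + seg_no.astype(str)
print(d)
```

Because each interchange row is counted only after it has passed, it stays in the segment it closes, which is the same grouping the original code produces.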
Edit:
The expected output is:
+----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------+
| | type | eta | Trip_Key | Origin | Inter_Key | ETA_Sum | Destination | ETI_Sum | Inter_Dest | Inter_Time | Index ETA_Sum Subtraction |
|----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------|
| 0 | T | 0 | A | A | A1 | 28 | H | 3 | C | 0 | 0 |
| 1 | T | 1 | A | B | A1 | 27 | H | 2 | C | 0 | 0 |
| 2 | I | 2 | A | C | A1 | 25 | H | 0 | C | 12 | 2-5 |
| 3 | T | 3 | A | D | A3 | 22 | H | 9 | F | 0 | 0 |
| 4 | T | 4 | A | E | A3 | 18 | H | 5 | F | 0 | 0 |
| 5 | I | 5 | A | F | A3 | 13 | H | 0 | F | 13 | 5-7 |
| 6 | T | 6 | A | G | A5 | 7 | H | 7 | H | 0 | 0 |
| 7 | T | 7 | A | H | A5 | 0 | H | 0 | H | 0 | 0 |
| 8 | T | 0 | B | A | B1 | 28 | H | 3 | C | 0 | 0 |
| 9 | T | 1 | B | B | B1 | 27 | H | 2 | C | 0 | 0 |
| 10 | I | 2 | B | C | B1 | 25 | H | 0 | C | 12 | 10-13 |
| 11 | T | 3 | B | D | B3 | 22 | H | 9 | F | 0 | 0 |
| 12 | T | 4 | B | E | B3 | 18 | H | 5 | F | 0 | 0 |
| 13 | I | 5 | B | F | B3 | 13 | H | 0 | F | 13 | 13-15 |
| 14 | T | 6 | B | G | B5 | 7 | H | 7 | H | 0 | 0 |
| 15 | T | 7 | B | H | B5 | 0 | H | 0 | H | 0 | 0 |
+----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------+
Note: the column labelled "Index ETA_Sum Subtraction" is for illustration only.
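One possible way to get the Inter_Time column shown above is to take the last row of every Inter_Key segment and subtract the ETA_Sum of the next segment's last row within the same trip. This is a sketch, not a tested solution for the real dataset. It assumes that the NaN values in ETA_Sum have already been replaced with 0 (as in the expected output), that the rows of each Inter_Key are contiguous and in travel order, and that the final segment of a trip should get 0. The helper names is_last, last_rows and next_eta are illustrative only.

```python
import pandas as pd

# Only the columns needed for the calculation, with values from the expected output.
df = pd.DataFrame({
    "Trip_Key": ["A"] * 8 + ["B"] * 8,
    "Inter_Key": ["A1"] * 3 + ["A3"] * 3 + ["A5"] * 2
                 + ["B1"] * 3 + ["B3"] * 3 + ["B5"] * 2,
    "ETA_Sum": [28, 27, 25, 22, 18, 13, 7, 0] * 2,
})

# Mark the last row of every Inter_Key segment (rows 2, 5, 7, 10, 13, 15 here).
is_last = ~df.duplicated("Inter_Key", keep="last")
last_rows = df.loc[is_last]

# ETA_Sum at the end of the next segment of the same trip (NaN for the final segment).
next_eta = last_rows.groupby("Trip_Key")["ETA_Sum"].shift(-1)

# Difference between consecutive segment ends; the final segment of a trip gets 0.
inter_time = (last_rows["ETA_Sum"] - next_eta).fillna(0)

# Write the values back onto the full frame; every other row gets 0.
df["Inter_Time"] = inter_time.reindex(df.index, fill_value=0)
print(df)
```

With this sample data, rows 2 and 10 get 25 - 13 = 12 and rows 5 and 13 get 13 - 0 = 13, matching the "Index ETA_Sum Subtraction" column.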
Answer 0 (score: 0)
If I understand your question, you could do something like
In [37]: df.groupby('Trip_Key').apply(lambda x: x[x.Inter_Key.str[-1] == '1'].iloc[-1].ETA_Sum - x[x.Inter_Key.str[-1] == '3'].iloc[-1].ETA_Sum)
Out[37]:
Trip_Key
A 12.0
B 12.0
dtype: float64
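The snippet above hard-codes the Inter_Key suffixes '1' and '3', so it only covers the first pair of interchanges. A variant that handles any number of segments per trip might look like the sketch below. It assumes ETA_Sum has no missing values (GroupBy.last skips missing values by default, so a NaN in the final row of a segment would silently be replaced by an earlier value), and the names seg_end and gap are made up for illustration.

```python
import pandas as pd

# Only the columns needed here, with NaN already replaced by 0.
df = pd.DataFrame({
    "Trip_Key": ["A"] * 8 + ["B"] * 8,
    "Inter_Key": ["A1"] * 3 + ["A3"] * 3 + ["A5"] * 2
                 + ["B1"] * 3 + ["B3"] * 3 + ["B5"] * 2,
    "ETA_Sum": [28, 27, 25, 22, 18, 13, 7, 0] * 2,
})

# ETA_Sum on the last row of each segment, kept in order of appearance.
seg_end = df.groupby(["Trip_Key", "Inter_Key"], sort=False)["ETA_Sum"].last()

# Subtract the next segment's value within the same trip.
# The final segment of each trip has no successor and stays NaN.
gap = seg_end - seg_end.groupby(level="Trip_Key").shift(-1)
print(gap)
```

For the sample data, the entries for A1 and B1 come out as 12.0, in line with the output above, and the entries for A3 and B3 as 13.0.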