Calculating the difference between the last rows of groups in a pandas groupby object

Time: 2018-01-12 19:58:44

Tags: python pandas

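Friends are driving from state A to state H, passing through states B, C, D, E, F and G. They swap drivers in state C and state F; these states are called "interchanges". I have the elapsed time between each pair of consecutive states. From this data I have found the time from each state to the final destination and the time from each state to the next interchange. What I now need to find is the time between interchanges, i.e. the time between state C and state F. I need to do all three of these things for individual trips in a large dataset.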

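My initial plan for finding the time between interchanges was to subtract the second interchange's ETA from the first interchange's ETA. So, how do I find the difference between the last row of one group and the last row of the next group within a larger groupby object? That is, within a Trip_Key, how do I find the difference between the last A1 ETA and the last A3 ETA? Thanks!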

Here is the code that generates my dataframe:

import pandas as pd

user_dict2 = {'A': {('A'):{'eta':0,'type':'T'},('B'):{'eta':1,'type':'T'},('C'):{'eta':2,'type':'I'},('D'):{'eta':3,'type':'T'},
             ('E'):{'eta':4,'type':'T'},('F'):{'eta':5,'type':'I'},('G'):{'eta':6,'type':'T'},('H'):{'eta':7,'type':'T'}},
             'B':{('A'):{'eta':0,'type':'T'},('B'):{'eta':1,'type':'T'},('C'):{'eta':2,'type':'I'},('D'):{'eta':3,'type':'T'},
             ('E'):{'eta':4,'type':'T'},('F'):{'eta':5,'type':'I'},('G'):{'eta':6,'type':'T'},('H'):{'eta':7,'type':'T'}}}

d = pd.DataFrame.from_dict({(i,j): user_dict2[i][j] 
                           for i in user_dict2.keys() 
                           for j in user_dict2[i].keys()},
                       orient='index')
d = d.reset_index()
d['Trip_Key'] = d['level_0']
d['State'] = d['level_1']
del d['level_0']
del d['level_1']

# Group by Trip_Key and label the rows where 'type' changes
d["e3"] = d.groupby('Trip_Key')["type"].shift(1)
d["e4"] = d["type"] != d["e3"]
d["e5"] = d.groupby('Trip_Key')["e4"].cumsum()
d.loc[d['type'] == 'I', 'e5'] = d['e5'].shift(1)

d['Inter_Key'] = d['Trip_Key'] + d['e5'].map(int).map(str)
del d['e3']
del d['e4']
del d['e5']

df = d
df['ETA_Shift'] = df.groupby('Trip_Key')['eta'].transform(lambda x: x.shift(-1))
df.fillna(0)  # note: the result is not assigned, so df is left unchanged

df['ETA_Sum'] = df.iloc[::-1].groupby('Trip_Key')['ETA_Shift'].cumsum()[::-1]
g = df.groupby('Trip_Key').last().reset_index()
df = df.merge(g[['Trip_Key','State']],on=['Trip_Key'],how='outer')
#df['Pair'] = '('+df['SPLC_x']+', '+df['SPLC_y']+')'
df = df.rename(columns={'State_x':'State',
                        'State_y':'Destination'})

df['ETI_Shift'] = df.groupby('Inter_Key')['eta'].transform(lambda x: x.shift(-1))
df.fillna(0)  # note: the result is not assigned, so df is left unchanged

df['ETI_Sum'] = df.iloc[::-1].groupby('Inter_Key')['ETI_Shift'].cumsum()[::-1]
g2 = df.groupby('Inter_Key').last().reset_index()
df = df.merge(g2[['Inter_Key','State']],on='Inter_Key',how='outer')
#df['Pair'] = '('+df['State_x']+', '+df['State_y']+')'
df = df.rename(columns={'State_x':'Origin',
                        'State_y':'Inter_Dest'})

del df['ETA_Shift']
del df['ETI_Shift'] 

This is what it looks like:

+----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------+
|    |   eta | type   | Trip_Key   | Origin   | Inter_Key   |   ETA_Sum | Destination   |   ETI_Sum | Inter_Dest   |
|----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------|
|  0 |     0 | T      | A          | A        | A1          |        28 | H             |         3 | C            |
|  1 |     1 | T      | A          | B        | A1          |        27 | H             |         2 | C            |
|  2 |     2 | I      | A          | C        | A1          |        25 | H             |       nan | C            |
|  3 |     3 | T      | A          | D        | A3          |        22 | H             |         9 | F            |
|  4 |     4 | T      | A          | E        | A3          |        18 | H             |         5 | F            |
|  5 |     5 | I      | A          | F        | A3          |        13 | H             |       nan | F            |
|  6 |     6 | T      | A          | G        | A5          |         7 | H             |         7 | H            |
|  7 |     7 | T      | A          | H        | A5          |       nan | H             |       nan | H            |
|  8 |     0 | T      | B          | A        | B1          |        28 | H             |         3 | C            |
|  9 |     1 | T      | B          | B        | B1          |        27 | H             |         2 | C            |
| 10 |     2 | I      | B          | C        | B1          |        25 | H             |       nan | C            |
| 11 |     3 | T      | B          | D        | B3          |        22 | H             |         9 | F            |
| 12 |     4 | T      | B          | E        | B3          |        18 | H             |         5 | F            |
| 13 |     5 | I      | B          | F        | B3          |        13 | H             |       nan | F            |
| 14 |     6 | T      | B          | G        | B5          |         7 | H             |         7 | H            |
| 15 |     7 | T      | B          | H        | B5          |       nan | H             |       nan | H            |
+----+-------+--------+------------+----------+-------------+-----------+---------------+-----------+--------------+

Edit:

The expected output is:

+----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------+
|    | type   |   eta | Trip_Key   | Origin   | Inter_Key   |   ETA_Sum | Destination   |   ETI_Sum | Inter_Dest   |   Inter_Time | Index ETA_Sum Subtraction   |
|----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------|
|  0 | T      |     0 | A          | A        | A1          |        28 | H             |         3 | C            |            0 | 0                           |
|  1 | T      |     1 | A          | B        | A1          |        27 | H             |         2 | C            |            0 | 0                           |
|  2 | I      |     2 | A          | C        | A1          |        25 | H             |         0 | C            |           12 | 2-5                         |
|  3 | T      |     3 | A          | D        | A3          |        22 | H             |         9 | F            |            0 | 0                           |
|  4 | T      |     4 | A          | E        | A3          |        18 | H             |         5 | F            |            0 | 0                           |
|  5 | I      |     5 | A          | F        | A3          |        13 | H             |         0 | F            |           13 | 5-7                         |
|  6 | T      |     6 | A          | G        | A5          |         7 | H             |         7 | H            |            0 | 0                           |
|  7 | T      |     7 | A          | H        | A5          |         0 | H             |         0 | H            |            0 | 0                           |
|  8 | T      |     0 | B          | A        | B1          |        28 | H             |         3 | C            |            0 | 0                           |
|  9 | T      |     1 | B          | B        | B1          |        27 | H             |         2 | C            |            0 | 0                           |
| 10 | I      |     2 | B          | C        | B1          |        25 | H             |         0 | C            |           12 | 10-13                       |
| 11 | T      |     3 | B          | D        | B3          |        22 | H             |         9 | F            |            0 | 0                           |
| 12 | T      |     4 | B          | E        | B3          |        18 | H             |         5 | F            |            0 | 0                           |
| 13 | I      |     5 | B          | F        | B3          |        13 | H             |         0 | F            |           13 | 13-15                       |
| 14 | T      |     6 | B          | G        | B5          |         7 | H             |         7 | H            |            0 | 0                           |
| 15 | T      |     7 | B          | H        | B5          |         0 | H             |         0 | H            |            0 | 0                           |
+----+--------+-------+------------+----------+-------------+-----------+---------------+-----------+--------------+--------------+-----------------------------+

Note: the column labeled "Index ETA_Sum Subtraction" is for illustration only.

1 Answer:

Answer 0 (score: 0)

If I understand your question correctly, you can do something like this:

In [37]: df.groupby('Trip_Key').apply(lambda x: x[x.Inter_Key.str[-1] == '1'].iloc[-1].ETA_Sum - x[x.Inter_Key.str[-1] == '3'].iloc[-1].ETA_Sum)
Out[37]:
Trip_Key
A    12.0
B    12.0
dtype: float64
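
The one-liner above only compares the A1 and A3 legs. As a rough sketch of how the same idea might be extended to fill an Inter_Time column for every interchange row, one option is to take the last ETA_Sum of each leg and subtract the last ETA_Sum of the following leg in the same trip. The miniature frame and the names leg_last, next_last and leg_diff are illustrative assumptions, not part of the original post. The sketch also assumes the NaN in ETA_Sum has already been replaced by 0, as in the expected output.

```python
import pandas as pd

# Hypothetical one-trip miniature of the question's frame. ETA_Sum is assumed
# to have its NaN already replaced by 0, as in the expected output.
df = pd.DataFrame({
    "Trip_Key":  ["A"] * 8,
    "Inter_Key": ["A1", "A1", "A1", "A3", "A3", "A3", "A5", "A5"],
    "type":      ["T", "T", "I", "T", "T", "I", "T", "T"],
    "ETA_Sum":   [28, 27, 25, 22, 18, 13, 7, 0],
})

# Last ETA_Sum of each leg, keeping the legs in order of appearance.
leg_last = df.groupby(["Trip_Key", "Inter_Key"], sort=False)["ETA_Sum"].last()

# Last ETA_Sum of the following leg within the same trip (NaN for the final leg).
next_last = leg_last.groupby(level="Trip_Key").shift(-1)

# Difference per leg, attached back to every row of that leg.
leg_diff = (leg_last - next_last).rename("Inter_Time").reset_index()
out = df.merge(leg_diff, on=["Trip_Key", "Inter_Key"], how="left")

# Keep the value only on interchange rows; all other rows get 0.
out["Inter_Time"] = out["Inter_Time"].where(out["type"] == "I", 0)

print(out[["Inter_Key", "type", "ETA_Sum", "Inter_Time"]])
```

On the full frame from the question, ETA_Sum would probably need to be filled first (for example `df["ETA_Sum"] = df["ETA_Sum"].fillna(0)`), because groupby `last()` skips NaN by default and would otherwise pick 7 rather than the trailing missing value for the final leg.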