I have the following data:
id   point 1    point 2    count  Time
018  Paris      London     01     2016-05-20 10:50:00
015  Paris      London     01     2016-05-19 11:50:00
002  Prague     Munich     15     2016-05-18 17:55:00
003  Frankfurt  London     01     2016-05-17 21:15:00
015  London     Paris      08     2016-05-21 13:50:00
003  Barcelona  Vienna     15     2016-05-19 03:20:00
003  London     Barcelona  01     2016-05-18 06:45:00
002  Vienna     Prague     15     2016-05-19 02:45:00
First I want to sort them by id and time:
df = df.sort_values(['id','Time'])
which gives these results:
id   point 1    point 2    count  Time
002  Vienna     Prague     15     2016-05-18 02:45:00
002  Prague     Munich     15     2016-05-18 17:55:00
003  Frankfurt  London     01     2016-05-17 21:15:00
003  London     Barcelona  01     2016-05-18 06:45:00
003  Barcelona  Vienna     15     2016-05-19 03:20:00
015  Paris      London     01     2016-05-19 11:50:00
015  London     Paris      08     2016-05-21 13:50:00
018  Paris      London     01     2016-05-20 10:50:00
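To make the sorting step reproducible, here is a minimal sketch that builds the frame and sorts it. The values are copied from the sorted table above. Keeping id and count as strings (to preserve the leading zeros) and parsing Time with pd.to_datetime are assumptions, not something stated in the post.

```python
import pandas as pd

columns = ["id", "point 1", "point 2", "count", "Time"]

# Rows in arbitrary order; values taken from the sorted table in the post.
# id and count are kept as strings (assumption) so "002" keeps its zeros.
rows = [
    ("018", "Paris", "London", "01", "2016-05-20 10:50:00"),
    ("015", "Paris", "London", "01", "2016-05-19 11:50:00"),
    ("002", "Prague", "Munich", "15", "2016-05-18 17:55:00"),
    ("003", "Frankfurt", "London", "01", "2016-05-17 21:15:00"),
    ("015", "London", "Paris", "08", "2016-05-21 13:50:00"),
    ("003", "Barcelona", "Vienna", "15", "2016-05-19 03:20:00"),
    ("003", "London", "Barcelona", "01", "2016-05-18 06:45:00"),
    ("002", "Vienna", "Prague", "15", "2016-05-18 02:45:00"),
]
df = pd.DataFrame(rows, columns=columns)

# Parse timestamps (assumption) so the sort is chronological, not textual.
df["Time"] = pd.to_datetime(df["Time"])

# Sort by id first, then by time within each id.
df_sorted = df.sort_values(["id", "Time"])
print(df_sorted)
```

Note that the column is named 'Time' with a capital T, so sort_values needs exactly that spelling.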
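If point 2 of the first row is the same as point 1 of the second row, then Start is point 1 of the first row and End is point 2 of the second row. [id 002]
However, if point 2 of the first row is the same as point 1 of the second row and point 1 of the first row is the same as point 2 of the second row (a round trip), then Start and End do not change. [id 015]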
Expected result:
id   point 1    point 2    count  Time                 Start      End
002  Vienna     Prague     15     2016-05-19 02:45:00  Vienna     Munich
002  Prague     Munich     15     2016-05-18 17:55:00  Vienna     Munich
003  Frankfurt  London     01     2016-05-17 21:15:00  Frankfurt  Vienna
003  London     Barcelona  01     2016-05-18 06:45:00  Frankfurt  Vienna
003  Barcelona  Vienna     15     2016-05-19 03:20:00  Frankfurt  Vienna
015  Paris      London     01     2016-05-19 11:50:00  Paris      London
015  London     Paris      08     2016-05-21 13:50:00  London     Paris
018  Paris      London     01     2016-05-20 10:50:00  Paris      London
I tried the first condition using:
df = df.assign(start = np.where(df['point2'] == df['point1'].shift(),df.shift(1).point2,df.point1))
Answer 0 (score: 0)
I think you need a custom function with numpy.roll:
import numpy as np

# sort values first
df = df.sort_values(['id','Time'])
# create new columns, defaulting to each row's own points
df['Start'] = df['point 1']
df['End'] = df['point 2']

def f(x):
    # roll point 2 by one row and compare with point 1 within the group;
    # if every row matches, the group is a pure round trip (e.g. id 015),
    # so any() is True only when at least one row does not match
    m = (np.roll(x['point 2'].values, -1) != x['point 1']).any()
    if m:
        # not a round trip: assign first point 1 and last point 2
        x['Start'] = x['point 1'].iat[0]
        x['End'] = x['point 2'].iat[-1]
    return x

# apply custom function per id, keeping the original index
df = df.groupby('id', group_keys=False).apply(f)
print (df)
    id   point 1    point 2    count  Time                 Start      End
7   002  Vienna     Prague     15     2016-05-19 02:45:00  Vienna     Munich
2   002  Prague     Munich     15     2016-05-19 17:55:00  Vienna     Munich
3   003  Frankfurt  London     01     2016-05-17 21:15:00  Frankfurt  Vienna
6   003  London     Barcelona  01     2016-05-18 06:45:00  Frankfurt  Vienna
5   003  Barcelona  Vienna     15     2016-05-19 03:20:00  Frankfurt  Vienna
1   015  Paris      London     01     2016-05-19 11:50:00  Paris      London
4   015  London     Paris      08     2016-05-21 13:50:00  London     Paris
0   018  Paris      London     01     2016-05-20 10:50:00  Paris      London
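As a possible alternative that avoids a custom function, the same columns can be sketched with groupby.transform. This is only a sketch under a simplifying assumption: a group counts as a round trip when its first point 1 equals its last point 2, which matches the sample ids but is not a rule stated in the question. The names first_p1, last_p2 and is_chain are illustrative, and the sample values follow the sorted table in the question.

```python
import pandas as pd

columns = ["id", "point 1", "point 2", "count", "Time"]
rows = [
    ("018", "Paris", "London", "01", "2016-05-20 10:50:00"),
    ("015", "Paris", "London", "01", "2016-05-19 11:50:00"),
    ("002", "Prague", "Munich", "15", "2016-05-18 17:55:00"),
    ("003", "Frankfurt", "London", "01", "2016-05-17 21:15:00"),
    ("015", "London", "Paris", "08", "2016-05-21 13:50:00"),
    ("003", "Barcelona", "Vienna", "15", "2016-05-19 03:20:00"),
    ("003", "London", "Barcelona", "01", "2016-05-18 06:45:00"),
    ("002", "Vienna", "Prague", "15", "2016-05-18 02:45:00"),
]
df = pd.DataFrame(rows, columns=columns)
df["Time"] = pd.to_datetime(df["Time"])
df = df.sort_values(["id", "Time"])

g = df.groupby("id")
# First departure and last arrival of each id, broadcast to every row.
first_p1 = g["point 1"].transform("first")
last_p2 = g["point 2"].transform("last")

# Assumption: an id is a chain unless it ends where it started.
is_chain = first_p1 != last_p2

# For chains use the group-level endpoints, otherwise keep the row's own.
df["Start"] = first_p1.where(is_chain, df["point 1"])
df["End"] = last_p2.where(is_chain, df["point 2"])
print(df)
```

Because transform returns a result aligned with the original index, no group_keys handling is needed and the rows keep their original labels.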