I have a DataFrame that looks like this:
import pandas as pd

test1 = pd.DataFrame({
    "ROUTE": ["MIA-ORD", "MIA-AUA", "ORD-MIA", "MIA-HOU", "MIA-JFK", "JFK-MIA", "JFK-YYZ"],
    "TICKET": ["123", "345", "123", "678", "456", "345", "456"],
    "COUPON": [1, 4, 2, 1, 1, 3, 2],
    "PAX": ["Jessica", "Alex", "Jessica", "Jamanica", "Ernest", "Alex", "Ernest"],
    "PAID": [100.00, 200.00, 100.00, 100.00, 200.00, 200.00, 200.00],
})
This gives me:
ROUTE TICKET COUPON PAX PAID
0 MIA-ORD 123 1 Jessica 100.0
1 MIA-AUA 345 4 Alex 200.0
2 ORD-MIA 123 2 Jessica 100.0
3 MIA-HOU 678 1 Jamanica 100.0
4 MIA-JFK 456 1 Ernest 200.0
5 JFK-MIA 345 3 Alex 200.0
6 JFK-YYZ 456 2 Ernest 200.0
What I want to do is combine the ROUTE and COUPON data per ticket, like this:
ROUTE TICKET COUPON PAX PAID
0 MIA-ORD-ORD-MIA 123 1-2 Jessica 100.0
1 JFK-MIA-MIA-AUA 345 3-4 Alex 200.0
2 MIA-HOU 678 1 Jamanica 100.0
3 MIA-JFK-JFK-YYZ 456 1-2 Ernest 200.0
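So far I have been able to group by TICKET, since it is an obvious common identifier, and to sort by COUPON within each ticket, because the flights for "Alex" appear in reverse order: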
rs1 = test1.groupby(['TICKET']).apply(pd.DataFrame.sort_values,'COUPON')
This results in:
ROUTE TICKET COUPON PAX PAID
TICKET
123 0 MIA-ORD 123 1 Jessica 100.0
2 ORD-MIA 123 2 Jessica 100.0
345 5 JFK-MIA 345 3 Alex 200.0
1 MIA-AUA 345 4 Alex 200.0
456 4 MIA-JFK 456 1 Ernest 200.0
6 JFK-YYZ 456 2 Ernest 200.0
678 3 MIA-HOU 678 1 Jamanica 100.0
But from here I cannot work out how to combine ROUTE and COUPON.
I tried:
st1=test1.groupby('TICKET').apply(lambda group: ','.join(group['ROUTE']))
However, this only returns the joined routes (in their original, unsorted row order) as a Series, without the rest of the data:
TICKET
123 MIA-ORD,ORD-MIA
345 MIA-AUA,JFK-MIA
456 MIA-JFK,JFK-YYZ
678 MIA-HOU
dtype: object
Any ideas?
Answer 0 (score: 3)
We can use groupby together with agg, and then apply '-'.join() to the ROUTE and COUPON columns (after converting COUPON to strings):
test1['COUPON']=test1['COUPON'].astype(str)
final = test1.groupby(['TICKET', 'PAX', 'PAID']).agg({'ROUTE':'-'.join,
'COUPON':'-'.join}).reset_index()
print(final)
TICKET PAX PAID ROUTE COUPON
0 123 Jessica 100.0 MIA-ORD-ORD-MIA 1-2
1 345 Alex 200.0 MIA-AUA-JFK-MIA 4-3
2 456 Ernest 200.0 MIA-JFK-JFK-YYZ 1-2
3 678 Jamanica 100.0 MIA-HOU 1
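Note that in the output above, ticket 345 comes out as MIA-AUA-JFK-MIA with coupons 4-3, because the rows are joined in their original order rather than in coupon order. A minimal sketch of one way to get the order requested in the question is to sort by COUPON before grouping. The names `ordered` and `final_sorted` are only illustrative, and taking the first PAX and PAID value per ticket assumes those values are constant within a ticket, as they are in the sample data:

```python
import pandas as pd

test1 = pd.DataFrame({
    "ROUTE": ["MIA-ORD", "MIA-AUA", "ORD-MIA", "MIA-HOU", "MIA-JFK", "JFK-MIA", "JFK-YYZ"],
    "TICKET": ["123", "345", "123", "678", "456", "345", "456"],
    "COUPON": [1, 4, 2, 1, 1, 3, 2],
    "PAX": ["Jessica", "Alex", "Jessica", "Jamanica", "Ernest", "Alex", "Ernest"],
    "PAID": [100.00, 200.00, 100.00, 100.00, 200.00, 200.00, 200.00],
})

# Sort by coupon number within each ticket so the joined strings follow travel order.
ordered = test1.sort_values(["TICKET", "COUPON"])

final_sorted = (
    ordered.groupby("TICKET", as_index=False)
    .agg({
        "ROUTE": "-".join,                            # join route legs in coupon order
        "COUPON": lambda s: "-".join(s.astype(str)),  # coupons are ints, so cast first
        "PAX": "first",                               # assumption: one passenger per ticket
        "PAID": "first",                              # assumption: one fare per ticket
    })
)
print(final_sorted)
```

With this ordering, ticket 345 is joined as JFK-MIA-MIA-AUA with coupons 3-4. Converting COUPON inside the lambda also leaves the original `test1['COUPON']` column as integers, whereas the `astype(str)` assignment in the answer changes it in place.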