Given:
import pandas as pd
lis1= ('baseball', 'basketball', 'baseball', 'hockey', 'hockey', 'basketball')
lis2= ('I had lots of fun', 'This was the most boring sport', "I hit the ball hard", 'the puck went too fast', 'I scored a goal', 'the basket was broken')
df = pd.DataFrame({'topic': lis1, 'review': lis2})
df
        topic                          review
0    baseball               I had lots of fun
1  basketball  This was the most boring sport
2    baseball             I hit the ball hard
3      hockey          the puck went too fast
4      hockey                 I scored a goal
5  basketball           the basket was broken
I need to turn this into the following pd.DataFrame:
lis1= ('baseball', 'basketball', 'hockey')
lis2= ("I had lots of fun, I hit the ball hard", "This was the most boring sport, the basket was broken","the puck went too fast I scored a goal")
pd.DataFrame({'topic':lis1, 'review':lis2})
        topic                                             review
0    baseball             I had lots of fun, I hit the ball hard
1  basketball  This was the most boring sport, the basket was...
2      hockey             the puck went too fast I scored a goal
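I'm confused because the column I want to group by is a string column, and I want to combine the review strings within each group. The strings don't have to be separated by commas.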
Answer 0 (score: 2)
Use groupby and aggregate the strings with str.join:
df.groupby('topic', as_index=False).agg({'review' : ', '.join})
        topic                                             review
0    baseball             I had lots of fun, I hit the ball hard
1  basketball  This was the most boring sport, the basket was...
2      hockey            the puck went too fast, I scored a goal
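For anyone who wants to run this end to end, here is a minimal self-contained sketch of the groupby plus str.join approach. The variable names topics, reviews and combined are my own and not from the question; the data is the sample data from the question.

```python
import pandas as pd

# Sample data from the question (variable names are illustrative).
topics = ('baseball', 'basketball', 'baseball', 'hockey', 'hockey', 'basketball')
reviews = ('I had lots of fun', 'This was the most boring sport',
           'I hit the ball hard', 'the puck went too fast',
           'I scored a goal', 'the basket was broken')
df = pd.DataFrame({'topic': topics, 'review': reviews})

# Group rows by topic and join each group's review strings with ', '.
# as_index=False keeps 'topic' as a regular column instead of the index.
combined = df.groupby('topic', as_index=False).agg({'review': ', '.join})

# to_string() prints the full review text instead of truncating it with '...'.
print(combined.to_string())
```

Within each topic the reviews are joined in the order they appear in the original frame, and the topics themselves come out sorted because groupby sorts the group keys by default.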
Alternatively, use groupby and call apply, with slightly different syntax:
df.groupby('topic')['review'].apply(', '.join).reset_index()
        topic                                             review
0    baseball             I had lots of fun, I hit the ball hard
1  basketball  This was the most boring sport, the basket was...
2      hockey            the puck went too fast, I scored a goal
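Since the question says the strings don't have to be comma-separated, the separator can be treated as a parameter. The sketch below wraps the apply form in a small helper; combine_reviews is a hypothetical name I made up for illustration, not a pandas function, and it assumes the columns are called topic and review as in the question.

```python
import pandas as pd

def combine_reviews(frame, sep=' '):
    """Hypothetical helper: one row per topic, reviews joined with `sep`."""
    # sep.join is called once per group with that group's review strings.
    return frame.groupby('topic')['review'].apply(sep.join).reset_index()

df = pd.DataFrame({
    'topic': ('baseball', 'basketball', 'baseball', 'hockey', 'hockey', 'basketball'),
    'review': ('I had lots of fun', 'This was the most boring sport',
               'I hit the ball hard', 'the puck went too fast',
               'I scored a goal', 'the basket was broken'),
})

# Join with a single space instead of ', '.
spaced = combine_reviews(df, sep=' ')

# The agg form with the same separator, for comparison.
via_agg = df.groupby('topic', as_index=False).agg({'review': ' '.join})

print(spaced.to_string())
```

Both forms should give the same joined strings here, so which one to use is mostly a matter of taste.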