I have the following data structure.
+-----------------------+-------+
| test1                 | test2 |
+-----------------------+-------+
| https://test.com/123  | st1   |
| https://test.com/123  | st2   |
| https://test.com/1234 | st3   |
| https://test.com/1234 | st4   |
+-----------------------+-------+
I want to merge the test2 values of all rows that share the same test1 value.
I tried the following code:
import pandas as pd

test = 'test.xlsx'
df1 = pd.read_excel(test)
df_isnull_have_keywords = df1.groupby(by='test1').apply(
    lambda x: [','.join('%s' % key for key in x['test2'])])
df_isnull_have_keywords.to_excel('test.xlsx')
However, in the output the test2 column shows up as 0.
I can't figure out why. Please help.
Answer 0 (score: 1)
Select test2 after grouping, aggregate it, and reset the index:
df1.groupby('test1')['test2'].agg(list).reset_index()
Output:
                   test1       test2
0   https://test.com/123  [st1, st2]
1  https://test.com/1234  [st3, st4]
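
If the goal is a comma-separated string per group, as the join in the question suggests, rather than a Python list, a variant along these lines may be closer to what is wanted. This is only a sketch: the DataFrame is built inline from the sample rows instead of being read from test.xlsx, and the output filename merged.xlsx is a made-up placeholder. The 0 header in the original attempt most likely comes from groupby(...).apply(...) returning an unnamed Series, which gets the default column label 0 when written out; selecting ['test2'] before aggregating keeps the column name.

```python
import pandas as pd

# Sample rows from the question, built inline (assumption: the real data
# would come from pd.read_excel('test.xlsx') instead).
df1 = pd.DataFrame({
    "test1": [
        "https://test.com/123",
        "https://test.com/123",
        "https://test.com/1234",
        "https://test.com/1234",
    ],
    "test2": ["st1", "st2", "st3", "st4"],
})

# Select test2 after grouping so the result keeps that column name,
# then join each group's values into one comma-separated string.
merged = (
    df1.groupby("test1")["test2"]
    .agg(lambda s: ",".join(str(v) for v in s))
    .reset_index()
)

print(merged)

# To save the result, something like the line below should work;
# 'merged.xlsx' is a hypothetical filename, and index=False leaves out
# the 0, 1, ... row labels.
# merged.to_excel("merged.xlsx", index=False)
```

Writing to a different file than the input also avoids overwriting the source spreadsheet, which the original attempt does.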