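Example df:
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3', 'answer4', 'answer1', 'answer2', 'answer3', 'answer4', 'answer5', 'answer6']
})
I want to group it by id and then turn each pair of answers into one row (the number of answers in a group is always even), but I can't figure out how to do it. Desired result: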
id phrase1 phrase2
1 answer1 answer2
2 answer1 answer2
2 answer3 answer4
3 answer1 answer2
3 answer3 answer4
3 answer5 answer6
Answer 0 (score: 3)
You can try:
(df.set_index(['id', df.index // 2, (df.index % 2) + 1])['dialog']
   .unstack()
   .add_prefix('phrase')
   .reset_index(level=1, drop=True))
Output:
phrase1 phrase2
id
1 answer1 answer2
2 answer1 answer2
2 answer3 answer4
3 answer1 answer2
3 answer3 answer4
3 answer5 answer6
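To see why this works, it can help to look at the two helper keys separately. The sketch below uses a shortened copy of the sample data; the names pair_no, slot, stacked and result are only for illustration and are not part of the answer. It assumes df still has its default 0..n-1 integer index, which is what df.index // 2 relies on.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "1", "2", "2", "2", "2"],
    "dialog": ["answer1", "answer2", "answer1", "answer2", "answer3", "answer4"],
})

# Which pair a row belongs to: 0, 0, 1, 1, 2, 2
pair_no = df.index // 2
# Position of the row inside its pair: 1, 2, 1, 2, 1, 2
slot = df.index % 2 + 1

# A Series keyed by (id, pair number, slot)
stacked = df.set_index(["id", pair_no, slot])["dialog"]

# unstack() moves the innermost level (slot) into the columns,
# add_prefix() turns 1/2 into phrase1/phrase2,
# and the pair number is dropped from the index at the end.
result = stacked.unstack().add_prefix("phrase").reset_index(level=1, drop=True)
print(result)
```

Because the pair number comes from the row labels and not from the position inside each id group, this only lines up when every group has an even number of rows and the index is the default one.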
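Answer 1 (score: 2)
Since the number of answers is always even, you can simply concatenate the even-position and odd-position rows with slicing: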
df = df.set_index("id")
print(pd.concat([df.iloc[::2], df.iloc[1::2]], ignore_index=True, axis=1)
        .rename(columns={0: "phrase1", 1: "phrase2"}))
phrase1 phrase2
id
1 answer1 answer2
2 answer1 answer2
2 answer3 answer4
3 answer1 answer2
3 answer3 answer4
3 answer5 answer6
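The same even/odd pairing can also be written as a reshape, which is not what the answer does but may make the idea easier to see. This is a minimal sketch, assuming the rows are already ordered by id and every group has an even length; the names values and out are illustrative, and the sample is shortened.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "1", "2", "2", "2", "2"],
    "dialog": ["answer1", "answer2", "answer1", "answer2", "answer3", "answer4"],
})

# Every two consecutive dialog values become one row of a 2-column array.
values = df["dialog"].to_numpy().reshape(-1, 2)

# The id of each pair is the id of its first row (every second row of df).
out = pd.DataFrame(
    values,
    columns=["phrase1", "phrase2"],
    index=pd.Index(df["id"].to_numpy()[::2], name="id"),
)
print(out)
```

Like the slicing version, this pairs rows purely by position, so it would mix answers from different ids if any group had an odd number of rows.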
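For a df where the number of answers is odd (starting again from the original df with its default integer index, since df.index // 2 needs integer labels):
s = df.groupby(["id", df.index // 2], as_index=False).agg(list)
print(pd.concat([s, pd.DataFrame(s["dialog"].tolist())], axis=1).drop(columns="dialog"))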
id 0 1
0 1 answer1 answer2
1 2 answer1 answer2
2 2 answer3 answer4
3 3 answer1 answer2
4 3 answer3 answer4
5 3 answer5 answer6
6 3 answer7 None
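One caveat: df.index // 2 pairs rows by their global position, so it only stays aligned when the odd-sized group is the last one, as in the output above. If an odd-sized group can appear earlier, numbering the rows inside each group with groupby(...).cumcount() is one possible way around it. The following is a sketch under that assumption, with made-up sample data and illustrative names (pos, out), not the answer's own code:

```python
import pandas as pd

# id "1" has three answers, so the odd-sized group is NOT the last one.
df = pd.DataFrame({
    "id": ["1", "1", "1", "2", "2"],
    "dialog": ["answer1", "answer2", "answer3", "answer1", "answer2"],
})

# Row number inside each id group: 0, 1, 2, 0, 1
pos = df.groupby("id").cumcount()

out = (
    df.set_index(["id", pos // 2, pos % 2 + 1])["dialog"]
      .unstack()                        # slot (1 or 2) becomes the columns
      .add_prefix("phrase")             # 1 -> phrase1, 2 -> phrase2
      .reset_index(level=1, drop=True)  # drop the per-group pair counter
      .reset_index()                    # turn id back into a regular column
)
print(out)
```

The unpaired answer ends up with a missing value (NaN) in phrase2, and the pairs of id 2 stay together because the counter restarts for each id.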