Pandas: convert groups to rows

Time: 2020-09-04 14:10:57

Tags: python pandas group-by

Example df:

import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3', 'answer4', 'answer1', 'answer2', 'answer3', 'answer4', 'answer5', 'answer6']
})

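I want to group it by id and then turn each pair of answers into a row (the number of answers in a group is always even), but I don't know how to do it. Desired output: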

id phrase1 phrase2
1  answer1 answer2
2  answer1 answer2
2  answer3 answer4
3  answer1 answer2
3  answer3 answer4
3  answer5 answer6

2 answers:

Answer 0 (score: 3)

You can try:

(df.set_index(['id', df.index // 2, (df.index % 2) + 1])['dialog']
   .unstack()
   .add_prefix('phrase')
   .reset_index(level=1, drop=True))

Output:

    phrase1  phrase2
id                  
1   answer1  answer2
2   answer1  answer2
2   answer3  answer4
3   answer1  answer2
3   answer3  answer4
3   answer5  answer6
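
The snippet above pairs rows by their position in the whole frame (df.index // 2), which assumes the default 0..n-1 integer index. As a minimal sketch of the same idea that does not depend on the frame's index, the position can be computed inside each id group with groupby().cumcount(). The variable names pos and result are illustrative only and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3', 'answer4',
               'answer1', 'answer2', 'answer3', 'answer4', 'answer5', 'answer6'],
})

# Position of each row inside its own id group: 0, 1, 2, ...
pos = df.groupby('id').cumcount()

result = (
    # Index levels: id, pair number within the group, slot (1 or 2) within the pair
    df.set_index(['id', pos // 2, pos % 2 + 1])['dialog']
      .unstack()                          # slot level becomes the columns 1 and 2
      .add_prefix('phrase')               # columns become phrase1, phrase2
      .reset_index(level=1, drop=True)    # drop the helper pair-number level
)
print(result)
```

Because pos restarts at 0 for every id, this version should also work if the frame has been filtered or carries a non-default index.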

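Answer 1 (score: 2)

Since the number of answers is always even, you can simply slice out every other row and concatenate the two slices side by side: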

df = df.set_index("id")

print (pd.concat([df.iloc[::2],df.iloc[1::2]],ignore_index=True, axis=1)
         .rename(columns={0:"phrase1",1:"phrase2"}))

    phrase1  phrase2
id                  
1   answer1  answer2
2   answer1  answer2
2   answer3  answer4
3   answer1  answer2
3   answer3  answer4
3   answer5  answer6
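
A closely related sketch, not taken from the answer itself: when every group has an even number of rows and the rows are already ordered by id, the dialog column can be reshaped into two columns with NumPy. The names pairs and result are illustrative, and the approach relies on those two assumptions holding.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3', 'answer4',
               'answer1', 'answer2', 'answer3', 'answer4', 'answer5', 'answer6'],
})

# Consecutive rows form a pair, so reshape the 12 values into 6 rows of 2
pairs = df['dialog'].to_numpy().reshape(-1, 2)

# Each pair keeps the id of its first row
result = pd.DataFrame(
    pairs,
    columns=['phrase1', 'phrase2'],
    index=pd.Index(df['id'].iloc[::2], name='id'),
)
print(result)
```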

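For an uneven df (a group with an odd number of answers):

# df here is the original frame with its default integer index, not the one re-indexed by id
s = df.groupby(["id", df.index//2], as_index=False).agg(list)

# Expand each list into its own columns, then drop the list column
print (pd.concat([s, pd.DataFrame(s["dialog"].tolist())], axis=1).drop(columns="dialog"))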

  id        0        1
0  1  answer1  answer2
1  2  answer1  answer2
2  2  answer3  answer4
3  3  answer1  answer2
4  3  answer3  answer4
5  3  answer5  answer6
6  3  answer7     None
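
Pairing on df.index // 2 counts positions across the whole frame, so a group with an odd number of rows that is not the last one would shift the pairing for every later group. A hedged sketch that pairs within each group instead, using groupby().cumcount() and pivot (list-valued index in pivot needs pandas 1.1 or later). The sample data and the helper column names pair and slot are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the first group has an odd number of answers
df = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2'],
    'dialog': ['answer1', 'answer2', 'answer3', 'answer1', 'answer2'],
})

# Position of each row inside its own id group
pos = df.groupby('id').cumcount()

result = (
    df.assign(pair=pos // 2,                                   # which pair within the group
              slot='phrase' + (pos % 2 + 1).astype(str))       # phrase1 or phrase2
      .pivot(index=['id', 'pair'], columns='slot', values='dialog')
      .reset_index('pair', drop=True)                          # drop the helper pair level
      .rename_axis(columns=None)                               # remove the 'slot' columns label
      .reset_index()                                           # turn id back into a column
)
print(result)
```

The unpaired last answer of an odd group ends up with a missing value in phrase2.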