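Merging multiple rows in a pandas DataFrame, grouped by columns

Time: 2020-03-04 23:25:02

Tags: python pandas-groupby

This is what my pandas DataFrame looks like. I need to combine the Utterances column for each User_Type, ordered by Chat_Sequence_Number and grouped by Case_ID and Interaction_ID.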

    Case_ID  Interaction_ID  Chat_Sequence_Number  User_Type  Utterances
    1        123             3                     Person1    are
    1        123             4                     Person1    you
    1        123             1                     Person1    Hello,
    1        123             2                     Person1    how
    1        123             5                     Person1    feeling?
    1        123             6                     Person2    I'm
    1        123             6                     Person2    fine.

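Is there a way to create a new DataFrame that meets these requirements? My final output should look like this:

    Case_ID  Interaction_ID  User_Type  Utterances
    1        123             Person1    Hello, how are you feeling?
    1        123             Person2    I'm fine.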

1 answer:

Answer 0 (score: 0)

You can do this in the following steps:

  1. Sort by Chat_Sequence_Number
  2. Group by Case_ID, Interaction_ID and User_Type
  3. Join the strings with .apply()
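
The three steps above can also be written one statement at a time, which makes each intermediate result easy to inspect. This is only a minimal sketch: the sample rows are copied from the question, except that the last sequence number is assumed to be 7 so that the order is unambiguous, and the variable names `ordered`, `grouped` and `merged` are illustrative.

```python
import pandas as pd

# Sample rows from the question; the last sequence number is 7 here (assumption)
df = pd.DataFrame({
    "Case_ID": [1] * 7,
    "Interaction_ID": [123] * 7,
    "Chat_Sequence_Number": [3, 4, 1, 2, 5, 6, 7],
    "User_Type": ["Person1"] * 5 + ["Person2"] * 2,
    "Utterances": ["are", "you", "Hello,", "how", "feeling?", "I'm", "fine."],
})

# Step 1: put the rows in conversation order
ordered = df.sort_values("Chat_Sequence_Number")

# Step 2: group by case, interaction and speaker, keeping only the text column
grouped = ordered.groupby(["Case_ID", "Interaction_ID", "User_Type"])["Utterances"]

# Step 3: join each group's words with a space, then flatten the index into columns
merged = grouped.apply(" ".join).reset_index()
print(merged)
```

Because groupby keeps the rows of each group in the order they had in the sorted frame, the sort has to happen before the grouping.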

All of this is done in a single chained expression, shown below.

Code:

import pandas as pd

# Create the dataframe with the sample rows
df = pd.DataFrame({
    'Case_ID': [1] * 7,
    'Interaction_ID': [123] * 7,
    'Chat_Sequence_Number': [3, 4, 1, 2, 5, 6, 7],
    'User_Type': ['Person1'] * 5 + ['Person2'] * 2,
    'Utterances': ['are', 'you', 'Hello', 'how', 'feeling?', "I'm", 'fine.'],
})

# Do the grouping: sort by sequence number, group, then join each group's words
output = (
    df.sort_values('Chat_Sequence_Number')
      .groupby(['Case_ID', 'Interaction_ID', 'User_Type'])['Utterances']
      .apply(' '.join)
      .reset_index()
)
print(output)
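
One caveat: in the question's sample data both Person2 rows share Chat_Sequence_Number 6, whereas the example above uses 6 and 7. When sort keys are tied, the default sorting algorithm (quicksort) is not guaranteed to keep the original row order, so a stable sort may be safer. The following is a minimal sketch, assuming that tied rows should keep the order in which they appear in the frame; `.agg` is used here in place of `.apply` and should give the same result for this case.

```python
import pandas as pd

# Sample rows exactly as in the question, including the tied sequence number 6
df = pd.DataFrame({
    "Case_ID": [1] * 7,
    "Interaction_ID": [123] * 7,
    "Chat_Sequence_Number": [3, 4, 1, 2, 5, 6, 6],
    "User_Type": ["Person1"] * 5 + ["Person2"] * 2,
    "Utterances": ["are", "you", "Hello,", "how", "feeling?", "I'm", "fine."],
})

output = (
    # mergesort is a stable algorithm, so tied rows keep their original order
    df.sort_values("Chat_Sequence_Number", kind="mergesort")
      .groupby(["Case_ID", "Interaction_ID", "User_Type"])["Utterances"]
      .agg(" ".join)
      .reset_index()
)
print(output)
```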