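This is what my pandas dataframe looks like. My requirement is to combine the Utterances column per User_Type, ordered by Chat_Sequence_Number, and grouped by Case_ID and Interaction_ID.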
Case_ID Interaction_ID Chat_Sequence_Number User_Type Utterances
1 123 3 Person1 are
1 123 4 Person1 you
1 123 1 Person1 Hello,
1 123 2 Person1 how
1 123 5 Person1 feeling?
1 123 6 Person2 I'm
1 123 6 Person2 fine.
Is there a way to create a new dataframe that meets the requirements above? My final output should look like this:
Case_ID Interaction_ID User_Type Utterances
1 123 Person1 Hello, how are you feeling?
1 123 Person2 I'm fine.
Answer 0 (score: 0)
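You can do this in three steps: sort the rows by Chat_Sequence_Number, group them by Case_ID, Interaction_ID and User_Type, and join the Utterances of each group with spaces.
All of this is done in a single line in the code below.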
Code:
import pandas as pd

# Create the dataframe (sample data from the question)
df = pd.DataFrame({
    'Case_ID': 1,
    'Interaction_ID': 123,
    'Chat_Sequence_Number': [3, 4, 1, 2, 5, 6, 7],
    'User_Type': ['Person1'] * 5 + ['Person2'] * 2,
    'Utterances': ['are', 'you', 'Hello,', 'how', 'feeling?', "I'm", 'fine.'],
})

# Sort by sequence number, group, and join each group's utterances with spaces
output = (
    df.sort_values('Chat_Sequence_Number')
      .groupby(['Case_ID', 'Interaction_ID', 'User_Type'])['Utterances']
      .apply(' '.join)
      .reset_index()
)
print(output)
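One caveat: in the question's data both Person2 rows share Chat_Sequence_Number 6, whereas the answer above numbers them 6 and 7. With tied sequence numbers the default sort is not guaranteed to keep the original row order, so "I'm" and "fine." could in principle swap. The sketch below is one possible variation, not part of the original answer: it uses the question's exact data, breaks the one-liner into its three steps, and asks for a stable sort via `kind="mergesort"`. The names `ordered`, `grouped` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; note the two rows with sequence number 6.
df = pd.DataFrame({
    "Case_ID": [1] * 7,
    "Interaction_ID": [123] * 7,
    "Chat_Sequence_Number": [3, 4, 1, 2, 5, 6, 6],
    "User_Type": ["Person1"] * 5 + ["Person2"] * 2,
    "Utterances": ["are", "you", "Hello,", "how", "feeling?", "I'm", "fine."],
})

# Step 1: stable sort, so rows with equal sequence numbers keep their original order.
ordered = df.sort_values("Chat_Sequence_Number", kind="mergesort")

# Step 2: group by case, interaction and speaker, keeping only the Utterances column.
grouped = ordered.groupby(["Case_ID", "Interaction_ID", "User_Type"])["Utterances"]

# Step 3: join each group's words with a space and turn the group keys back into columns.
result = grouped.apply(" ".join).reset_index()
print(result)
```

This assumes the row order in the input is the intended order for tied sequence numbers; if it is not, the data needs an additional tie-breaking column to sort on.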