I am using Python 3 and pandas version '0.19.2'.
I have a pandas DataFrame df as follows:
chat_id line
1 'Hi.'
1 'Hi, how are you?.'
1 'I'm well, thanks.'
2 'Is it going to rain?.'
2 'No, I don't think so.'
I want to group by 'chat_id' and then do something like a rolling sum on 'line' to get the following:
chat_id line conversation
1 'Hi.' 'Hi.'
1 'Hi, how are you?.' 'Hi. Hi, how are you?.'
1 'I'm well, thanks.' 'Hi. Hi, how are you?. I'm well, thanks.'
2 'Is it going to rain?.' 'Is it going to rain?.'
2 'No, I don't think so.' 'Is it going to rain?. No, I don't think so.'
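I believe df.groupby('chat_id')['line'].cumsum() only works on numeric columns.
I have also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in each complete conversation, but I can't figure out how to unpack that list to create the 'rolling sum' style conversation column.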
Answer 0 (score: 1)
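For me, apply works together with Series.cumsum; if you need a separator, add a space to each line before summing and strip the trailing one afterwards: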
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: (x + ' ').cumsum().str.strip())
print (df)
chat_id line new
0 1 Hi. Hi.
1 1 Hi, how are you?. Hi. Hi, how are you?.
2 1 I'm well, thanks. Hi. Hi, how are you?. I'm well, thanks.
3 2 Is it going to rain?. Is it going to rain?.
4 2 No, I don't think so. Is it going to rain?. No, I don't think so.
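As a hedged variant of the same idea: on newer pandas releases, as far as I know, groupby(...).apply can prepend the group keys to the result index, which may make the assignment back to df fail. A minimal sketch that sidesteps this uses transform, which returns a result aligned to the original index. The frame below is rebuilt from the question's data without the surrounding quotes, and this has not been checked against 0.19.2 specifically.

```python
import pandas as pd

# Sample data from the question, without the surrounding quote characters.
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})

# transform keeps the original row index, so the result can be assigned directly.
# Within each chat: append a space to every line, concatenate cumulatively,
# then strip the trailing space.
df["new"] = df.groupby("chat_id")["line"].transform(
    lambda x: (x + " ").cumsum().str.strip()
)

print(df["new"].tolist())
```

The string concatenation itself is unchanged from the answer; only the groupby method differs.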
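If the values in 'line' themselves contain the surrounding quotes, strip them first and add them back around the joined text:
df['line'] = df['line'].str.strip("'")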
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print (df)
chat_id line \
0 1 Hi.
1 1 Hi, how are you?.
2 1 I'm well, thanks.
3 2 Is it going to rain?.
4 2 No, I don't think so.
new
0 'Hi.'
1 'Hi. Hi, how are you?.'
2 'Hi. Hi, how are you?. I'm well, thanks.'
3 'Is it going to rain?.'
4 'Is it going to rain?. No, I don't think so.'
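If you would rather not rely on cumsum working on a string column, here is a minimal sketch of an alternative that builds the running conversation with itertools.accumulate from the standard library. The helper name running_join and the column name conversation are illustrative choices, not part of the answer above, and the separator is assumed to be a single space.

```python
from itertools import accumulate

import pandas as pd


def running_join(lines, sep=" "):
    """Hypothetical helper: cumulative join of the strings in one group."""
    joined = accumulate(lines, lambda acc, cur: acc + sep + cur)
    # Reuse the group's index so the result lines up with the original rows.
    return pd.Series(list(joined), index=lines.index)


df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})

# Apply the helper per chat; transform returns a column aligned to df.
df["conversation"] = df.groupby("chat_id")["line"].transform(running_join)

print(df["conversation"].tolist())
```

Because the separator is only placed between lines, no trailing strip is needed, so whitespace that is part of a line is left untouched.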