I have a Pandas DataFrame like this:
    text                                               is_from_me
0   Happy birthday bud!!!                              1
1   Thanks man!                                        0
2   Definitely would've come back had I thought ab...  1
3   Your good                                          0
4   Okay haha                                          1
5   Have a good one                                    1
6   Yea you too. What are you up to?                   0
7   No hw like I'm doing all day                       1
8   Just got up                                        1
9   Same here. I went to the football game last...     0
10  I think I saw that in your story                   1
11  Win?                                               1
12  Lost in last second                                0
13  Aw, that sucks                                     1
14  Means it was a good game tho?                      1
15  Really good game. They were on the 1/2 yard li...  0
16  Dang                                               1
I am trying to produce the following:
    input                                              output
0   Happy birthday bud!!!                              Thanks man!
2   Thanks man!                                        Definitely would've come back had I thought ab...
3   Definitely would've come back had I thought ab...  Your good
4   Your good                                          Okay haha\nHave a good one
6   Okay haha\nHave a good one                         Yea you too. What are you up to?
7   Yea you too. What are you up to?                   No hw like I'm doing all day\nJust got up
9   No hw like I'm doing all day\nJust got up          Same here. I went to the football game last...
10  Same here. I went to the football game last...     I think I saw that in your story\nWin?
12  I think I saw that in your story\nWin?             Lost in last second
13  Lost in last second                                Aw, that sucks\nMeans it was a good game tho?
15  Aw, that sucks\nMeans it was a good game tho?      Really good game. They were on the 1/2 yard li...
16  Really good game. They were on the 1/2 yard li...  Dang
I can get partway there with the following code:
pd.concat([df['text'].reset_index(drop=True), df['text'].shift(-1).reset_index(drop=True)], axis=1)
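However, this does not merge the text based on is_from_me: consecutive messages from the same sender should be grouped into one row, joined with a newline character (to keep the original strings separate). This is a simplified example; more than two rows may need to be grouped into one.
I tried to find a simple way to define this grouping, but all I could manage was a convoluted for loop that sorta gets the job done in a hacky way. Can I write an aggregation function to do this for me?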
Answer 0 (score: 1)
Use:
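# Label each run of consecutive rows with the same is_from_me value, then join each run's text with newlines
input_ = df.groupby((df.is_from_me != df.is_from_me.shift()).cumsum())['text'].apply(lambda x: '\n'.join(x))
# The reply to each run is the run that follows it
output = input_.shift(-1)
pd.concat([input_, output], axis=1)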
Output
            text                                               text
is_from_me
1           Happy birthday bud!!!                              Thanks man!
2           Thanks man!                                        Definitely would've come back had I thought ab...
3           Definitely would've come back had I thought ab...  Your good
4           Your good                                          Okay haha\nHave a good one
5           Okay haha\nHave a good one                         Yea you too. What are you up to?
6           Yea you too. What are you up to?                   No hw like I'm doing all day\nJust got up
7           No hw like I'm doing all day\nJust got up          Same here. I went to the football game last...
8           Same here. I went to the football game last...     I think I saw that in your story\nWin?
9           I think I saw that in your story\nWin?             Lost in last second
10          Lost in last second                                Aw, that sucks\nMeans it was a good game tho?
11          Aw, that sucks\nMeans it was a good game tho?      Really good game. They were on the 1/2 yard li...
12          Really good game. They were on the 1/2 yard li...  Dang
13          Dang                                               NaN
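To see why the `(df.is_from_me != df.is_from_me.shift()).cumsum()` key works, it may help to look at the intermediate run labels. The sketch below uses six rows taken from the question's data. The names `changed`, `run_id`, `turns`, and `pairs` are illustrative only, and dropping the final unpaired row with `dropna()` is an assumption based on the desired output in the question, which has no row without a reply.

```python
import pandas as pd

# Six rows taken from the question's data (rows 3 to 8).
df = pd.DataFrame({
    "text": [
        "Your good",
        "Okay haha",
        "Have a good one",
        "Yea you too. What are you up to?",
        "No hw like I'm doing all day",
        "Just got up",
    ],
    "is_from_me": [0, 1, 1, 0, 1, 1],
})

# True wherever the sender differs from the previous row
# (the first row compares against NaN, so it is always True).
changed = df["is_from_me"] != df["is_from_me"].shift()

# The cumulative sum of the change flags gives each consecutive run its own label.
run_id = changed.cumsum().rename("run")

# Join the text of each run with newlines: one string per conversational turn.
turns = df.groupby(run_id)["text"].agg("\n".join)

# Pair each turn with the next one; the last turn has no reply, so drop it.
pairs = (
    pd.DataFrame({"input": turns, "output": turns.shift(-1)})
    .dropna()
    .reset_index(drop=True)
)

print(run_id.tolist())
print(pairs)
```

Here `run_id` comes out as `[1, 2, 2, 3, 4, 4]`: rows 1 and 2 share a label because the sender did not change between them, which is what lets `groupby` merge them into a single turn.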
Answer 1 (score: 1)
You can use DataFrame.groupby. The output looks ugly, but it should be what you need.
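# Group consecutive rows that share the same is_from_me value and collect each run into a tuple
a = df.groupby([df.is_from_me.diff().ne(0).cumsum()]).agg(lambda x: tuple(x))
a['output'] = a['text']
# The input for each run is the run that precedes it
a['input'] = a.shift()['text']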
Output
            input  \
is_from_me
1           NaN
2           (Happy birthday bud!!!,)
3           (Thanks man!,)
4           (Definitely would've come back had I thought a...
5           (Your good,)
6           (Okay haha, Have a good one)
7           (Yea you too. What are you up to?,)
8           (No hw like I'm doing all day, Just got up)
9           (Same here. I went to the football game last...,)
10          (I think I saw that in your story, Win?)
11          (Lost in last second,)
12          (Aw, that sucks, Means it was a good game tho?)
13          (Really good game. They were on the 1/2 yard l...

            output
is_from_me
1           (Happy birthday bud!!!,)
2           (Thanks man!,)
3           (Definitely would've come back had I thought a...
4           (Your good,)
5           (Okay haha, Have a good one)
6           (Yea you too. What are you up to?,)
7           (No hw like I'm doing all day, Just got up)
8           (Same here. I went to the football game last...,)
9           (I think I saw that in your story, Win?)
10          (Lost in last second,)
11          (Aw, that sucks, Means it was a good game tho?)
12          (Really good game. They were on the 1/2 yard l...
13          (Dang,)
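If newline-joined strings are wanted rather than tuples, the same run-based grouping can also be sketched in plain Python. This is not what either answer does: it swaps in `itertools.groupby` from the standard library, which groups only adjacent items that share a key. The helper name `pair_turns` is hypothetical, and the sample rows are taken from the question's data.

```python
from itertools import groupby


def pair_turns(messages):
    """Turn (text, is_from_me) rows into (input, output) pairs of consecutive turns."""
    # itertools.groupby merges only adjacent items with equal keys,
    # so each group is one run of messages from the same sender.
    turns = [
        "\n".join(text for text, _ in run)
        for _, run in groupby(messages, key=lambda row: row[1])
    ]
    # Pair every turn with the one that follows it; the last turn has no reply.
    return list(zip(turns, turns[1:]))


# Rows 10 to 14 of the question's data.
messages = [
    ("I think I saw that in your story", 1),
    ("Win?", 1),
    ("Lost in last second", 0),
    ("Aw, that sucks", 1),
    ("Means it was a good game tho?", 1),
]

for input_text, output_text in pair_turns(messages):
    print(repr(input_text), "->", repr(output_text))
```

With a DataFrame, the rows could be fed in as `pair_turns(zip(df['text'], df['is_from_me']))`. The pandas versions above are likely the better fit when the result needs to stay in a DataFrame.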