Merge adjacent rows based on a condition

Time: 2019-12-05 13:13:25

Tags: python pandas pandas-groupby

I have tried other posts on this topic, but I can't seem to find the right solution.

I have a DataFrame that describes a conversation, split by speaker:

import pandas as pd
data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],[2, 'i am well'], [2, 'how are you?']] 
df = pd.DataFrame(data, columns = ['speaker', 'turn']) 

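What I want to do is merge adjacent rows that have the same speaker label. In other words, I want to be able to merge the last two rows, since both should count as the same conversational turn.

data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'], [2, 'i am well', 'how are you?']]

I suspect the answer involves the groupby function, but so far my attempts to make it work have failed.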

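3 answers:

Answer 0 (score: 3)

Pandas does not handle strings well; these operations may look vectorized, but they are not. In any case, at this stage you only want to aggregate lists, and that format is a poor fit for a df, where you would expect scalar values. Use itertools.groupby: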

import itertools

from operator import itemgetter


data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],[2, 'i am well'], 
        [2, 'how are you?']] 

rebuilt_list = []
for speaker, comment_group in itertools.groupby(data, itemgetter(0)):

    comments = [speaker] # To make sure you have the speaker id as first value

    for comment in comment_group:
        comments.extend(comment[1:])

    rebuilt_list.append(comments)
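
If the data already lives in the DataFrame from the question, the same idea can be applied to its rows and the result put back into a frame. The following is only a sketch under that assumption: the column name `turns` and the variable names are made up for illustration, and each merged turn is stored as a list of strings.

```python
import itertools
from operator import itemgetter

import pandas as pd

data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'], [2, 'i am well'],
        [2, 'how are you?']]
df = pd.DataFrame(data, columns=['speaker', 'turn'])

# itertools.groupby only groups *consecutive* items with the same key,
# which is exactly the "adjacent rows" behaviour asked for.
pairs = zip(df['speaker'], df['turn'])
merged_rows = [
    (speaker, [turn for _, turn in group])
    for speaker, group in itertools.groupby(pairs, key=itemgetter(0))
]

# 'turns' is a hypothetical column name; each cell holds a list of strings.
merged = pd.DataFrame(merged_rows, columns=['speaker', 'turns'])
print(merged)
```

Because the grouping happens in plain Python, the order of the conversation is preserved without any extra sorting.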

Answer 1 (score: 2)

Another way to do it with pandas:
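
One common pandas idiom for this kind of problem, shown here as a sketch and not necessarily the code this answer originally contained, is to build a run identifier with `shift` and `cumsum` and group on that. The names `run_id` and `merged` are illustrative, and collecting each turn into a list is an assumption based on the output the question asks for.

```python
import pandas as pd

data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'], [2, 'i am well'],
        [2, 'how are you?']]
df = pd.DataFrame(data, columns=['speaker', 'turn'])

# A new run starts wherever the speaker differs from the previous row;
# the cumulative sum of those change points gives each run its own id.
run_id = df['speaker'].ne(df['speaker'].shift()).cumsum().rename('run')

# Group on the run id (not on the speaker), so only adjacent rows are merged.
merged = (
    df.groupby(run_id)
      .agg(speaker=('speaker', 'first'), turn=('turn', list))
      .reset_index(drop=True)
)
print(merged)
```

Grouping on the run id keeps separate runs by the same speaker apart and leaves the rows in conversation order.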


Answer 2 (score: 1)

IIUC,

# Flag every row whose speaker matches the row directly above or below it.
s = df['speaker'].eq(df['speaker'].shift()) | df['speaker'].eq(df['speaker'].shift(-1))
# Keep the unflagged rows as they are, join the flagged rows per speaker,
# and concatenate the two pieces.
print(
    pd.concat(
        [
            df.loc[~s],
            df.loc[s]
            .groupby("speaker")["turn"]
            .apply(lambda x: ",".join(x))
            .reset_index(),
        ]
    ).reset_index(drop=True)  # drop=True keeps the old index out of the output
)
Result
     speaker                  turn
0        1                   hello
1        2                hi there
2        1            how are you?
3        2  i am well,how are you?

You can edit the apply to match your desired output (for example, if you want a space after the comma).
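
As a small illustration of that tweak, changing only the separator string is enough. This sketch uses just the two rows that get merged in the result above; the frame name `flagged` is made up for the example.

```python
import pandas as pd

# Only the two adjacent rows spoken by speaker 2.
flagged = pd.DataFrame(
    {"speaker": [2, 2], "turn": ["i am well", "how are you?"]}
)

# Same groupby/apply as in the answer, with ", " instead of "," as separator.
with_space = (
    flagged.groupby("speaker")["turn"]
    .apply(", ".join)
    .reset_index()
)
print(with_space)
```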

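Because it uses apply, this is not well suited to large datasets. Note also that grouping by speaker joins all flagged rows of a given speaker, so two separate runs by the same speaker would end up merged into one row.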