Question

I've tried other posts on this topic but can't seem to find the right solution.

I have a DataFrame that describes a conversation, split up by speaker:

import pandas as pd
data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],[2, 'i am well'], [2, 'how are you?']] 
df = pd.DataFrame(data, columns = ['speaker', 'turn'])

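What I want to do is merge adjacent rows that have the same speaker label. In other words, I'd like to merge the last two rows, since they should count as a single conversational turn.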

data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'], [2, 'i am well', 'how are you?']]

I suspect the answer involves the groupby function, but so far my attempts to make it work have failed.

Answer 1

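Strings are handled poorly in pandas; the operations may look vectorized, but they really aren't. In any case, at this stage you only want to aggregate lists, and that format doesn't suit a DataFrame either, where you would expect scalar values. Use itertools.groupby: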

import itertools

from operator import itemgetter


data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],[2, 'i am well'], 
        [2, 'how are you?']] 

rebuilt_list = []
for speaker, comment_group in itertools.groupby(data, itemgetter(0)):

    comments = [speaker] # To make sure you have the speaker id as first value

    for comment in comment_group:
        comments.extend(comment[1:])

    rebuilt_list.append(comments)
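
To check the result against the output the question asks for, the same loop can be wrapped in a small function. This is only a sketch: the name `merge_adjacent` is hypothetical, and it assumes every row is a `[speaker, turn]` pair.

```python
import itertools
from operator import itemgetter


def merge_adjacent(rows):
    """Collapse consecutive rows that share the same speaker id (first item)."""
    merged = []
    for speaker, group in itertools.groupby(rows, key=itemgetter(0)):
        # Keep the speaker id first, followed by every turn in this run.
        merged.append([speaker] + [turn for _, turn in group])
    return merged


data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],
        [2, 'i am well'], [2, 'how are you?']]
merged = merge_adjacent(data)
print(merged)
```

Because itertools.groupby only groups consecutive items, speaker 1's two separate turns stay separate, while the last two rows from speaker 2 end up in one list.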

Answer 2

Another way to do it in pandas:

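
A minimal sketch of a pandas-only approach (an illustration, not necessarily what this answer originally showed): label each run of consecutive identical speakers with `shift` and `cumsum`, then group on that label. The names `run_id` and `merged` are made up for this example, and joining with `', '` is an assumption about the desired output.

```python
import pandas as pd

data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],
        [2, 'i am well'], [2, 'how are you?']]
df = pd.DataFrame(data, columns=['speaker', 'turn'])

# A new run starts wherever the speaker differs from the previous row;
# the cumulative sum of those change points gives each run its own id.
run_id = df['speaker'].ne(df['speaker'].shift()).cumsum().rename('run')

merged = (
    df.groupby(run_id)
      .agg(speaker=('speaker', 'first'), turn=('turn', ', '.join))
      .reset_index(drop=True)
)
print(merged)
```

Grouping on the run id rather than on `speaker` itself keeps non-adjacent turns by the same speaker apart, which is what the question asks for.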

Answer 3

IIUC，

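# Mark rows whose speaker matches the row directly above or below.
s = df['speaker'].eq(df['speaker'].shift()) | df['speaker'].eq(df['speaker'].shift(-1))
# Keep the unmarked rows as they are and append the marked rows, joined per speaker.
# Note: grouping by "speaker" alone joins all marked rows of a speaker,
# even when they come from separate runs.
print(
    pd.concat(
        [
            df.loc[~s],
            df.loc[s]
            .groupby("speaker")["turn"]
            .apply(lambda x: ",".join(x))
            .reset_index(),
        ]
    ).reset_index(drop=True)  # drop=True avoids adding an extra "index" column
)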

Result

     speaker                  turn
0        1                   hello
1        2                hi there
2        1            how are you?
3        2  i am well,how are you?

You can edit the apply to match your desired result (for example, if you want a space after the comma).

Because it relies on apply, this is likely to be slow on large datasets.
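
As a sketch of the separator edit mentioned above, the same approach can take the separator as a parameter and pass the bound `str.join` to `agg` instead of a lambda. The helper name `join_marked` and its `sep` argument are hypothetical. The join still runs in Python once per group, so treat this as a tidier variant and not as a guaranteed speedup.

```python
import pandas as pd


def join_marked(df, sep=", "):
    """Same approach as above with a configurable separator (hypothetical helper)."""
    speaker = df["speaker"]
    # True where the row shares its speaker with the row above or below.
    s = speaker.eq(speaker.shift()) | speaker.eq(speaker.shift(-1))
    joined = df.loc[s].groupby("speaker")["turn"].agg(sep.join).reset_index()
    # Unmarked rows first, then the joined rows, with a fresh 0..n-1 index.
    return pd.concat([df.loc[~s], joined], ignore_index=True)


data = [[1, 'hello'], [2, 'hi there'], [1, 'how are you?'],
        [2, 'i am well'], [2, 'how are you?']]
df = pd.DataFrame(data, columns=['speaker', 'turn'])
result = join_marked(df)
print(result)
```

Like the code it is based on, this groups the marked rows by speaker only, so separate runs by the same speaker would be joined together and placed after the unmarked rows.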

Merging adjacent rows based on a condition

3 answers: