Joining strings with a space using the pandas aggregate function

Time: 2019-05-31 00:52:59

Tags: python pandas dataframe

I have a CSV file that contains rows sharing the same ID. I figured a dataframe was a good way to combine them, and I found code for doing so in this post.

Sample CSV file:

    id               messages
0   11  I am not driving home
1   11      Please pick me up
2   11     I don't have money
3  103   The car already park
4  103     No need for ticket
5  104       I will buy a car
6  104       I will buy a car

Expected output:

id   messages
11   I am not driving home Please pick me up I don't have money
103  The car already park No need for ticket
104  I will buy a car

My code so far is:

aggregation_functions = {'messages': 'sum'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)

The output I get from this code is:

id   messages
11   I am not driving homePlease pick me upI don't have money
103  The car already parkNo need for ticket
104  I will buy a car

I only want a space between the joined messages (e.g. "homePlease" > "home Please"), and I want to avoid duplicates, such as I will buy a car appearing twice.

I have also checked a second post, but could not find the answer there.

I also need to use .reindex(columns=df.columns) after aggregate(aggregation_functions)

Like this:

df_new = df.groupby(df['id']).aggregate(aggregation_functions).reindex(columns=df.columns)
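A note on the reindex step: after groupby(...).aggregate(...), id becomes the index rather than a column, so reindex(columns=df.columns) would create an id column filled with NaN. The following is a minimal sketch, assuming the goal is simply to get id back as a regular column in the original column order. The sample frame is rebuilt from the table above, and as_index=False is one possible way to do it, not the only one.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [11, 11, 11, 103, 103, 104, 104],
    "messages": [
        "I am not driving home",
        "Please pick me up",
        "I don't have money",
        "The car already park",
        "No need for ticket",
        "I will buy a car",
        "I will buy a car",
    ],
})

# Grouping by the column name with as_index=False keeps 'id' as a regular
# column, so the result has the same columns as df.
df_new = df.groupby("id", as_index=False).agg({"messages": " ".join})

# Selecting df.columns enforces the original column order explicitly.
df_new = df_new[df.columns]
```

This version still repeats the duplicated message for id 104; it only addresses the spacing and the column layout.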

3 Answers:

Answer 0: (score: 2)

You are better off using apply together with join:

>>> df
    id               messages
0   11  I am not driving home
1   11      Please pick me up
2   11     I don't have money
3  103   The car already park
4  103     No need for ticket
5  104       I will buy a car
6  104       I will buy a car

>>> df.groupby('id')['messages'].apply(lambda x: ' '.join(x))
id
11     I am not driving home Please pick me up I don'...
103              The car already park No need for ticket
104                    I will buy a car I will buy a car
Name: messages, dtype: object
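For context, the 'sum' aggregation in the question concatenates the strings directly, which behaves like joining with an empty separator. Passing ' '.join straight to agg is a shorter spelling of the lambda above. Here is a small sketch on the same sample data (the variable names are only for illustration):

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [11, 11, 11, 103, 103, 104, 104],
    "messages": [
        "I am not driving home",
        "Please pick me up",
        "I don't have money",
        "The car already park",
        "No need for ticket",
        "I will buy a car",
        "I will buy a car",
    ],
})

grouped = df.groupby("id")["messages"]

# Joining with an empty separator reproduces the "homePlease" output.
glued = grouped.agg("".join)

# Joining with a single space gives the spacing the question asks for.
spaced = grouped.agg(" ".join)
```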

Answer 1: (score: 2)

To remove the redundancy, I suggest GroupBy.unique followed by str.join:

df.groupby('id')['messages'].unique().str.join(' ')

Alternatively, use GroupBy.agg with set + ' '.join:

df.groupby('id')['messages'].agg(lambda x: ' '.join(set(x)))

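Both drop the duplicate message. unique keeps the original order, while set does not guarantee any order, so the set version may print, for example: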

# id
# 11     I don't have money I am not driving home Pleas...
# 103              No need for ticket The car already park
# 104                                     I will buy a car
# Name: messages, dtype: object

To return a DataFrame, call reset_index at the end:

df.groupby('id')['messages'].unique().str.join(' ').reset_index()

#     id                                           messages
# 0   11  I am not driving home Please pick me up I don'...
# 1  103            The car already park No need for ticket
# 2  104                                   I will buy a car
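If message order matters and you still want a single agg call, dict.fromkeys is an order-preserving way to drop duplicates, unlike set. This is a sketch on the sample data, rebuilt here so the snippet runs on its own; the names by_unique and by_dict are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [11, 11, 11, 103, 103, 104, 104],
    "messages": [
        "I am not driving home",
        "Please pick me up",
        "I don't have money",
        "The car already park",
        "No need for ticket",
        "I will buy a car",
        "I will buy a car",
    ],
})

# unique() keeps messages in order of first appearance within each group.
by_unique = df.groupby("id")["messages"].unique().str.join(" ")

# dict.fromkeys drops duplicates while preserving insertion order
# (guaranteed for dicts since Python 3.7).
by_dict = df.groupby("id")["messages"].agg(
    lambda s: " ".join(dict.fromkeys(s))
)
```

Both results keep the messages in their original order, so they should match the expected output in the question.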

Answer 2: (score: 2)

So first drop_duplicates, then agg with join:

df.drop_duplicates().groupby('id',as_index=False).messages.agg(' '.join)
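One caveat: drop_duplicates() with no arguments compares entire rows, so a repeated message is only removed when every other column matches as well. The sketch below assumes a hypothetical extra column ts (not in the original data) to show when passing subset may be needed.

```python
import pandas as pd

# 'ts' is a hypothetical extra column, added only to illustrate the caveat.
df = pd.DataFrame({
    "id": [103, 103, 104, 104],
    "messages": [
        "The car already park",
        "No need for ticket",
        "I will buy a car",
        "I will buy a car",
    ],
    "ts": [1, 2, 3, 4],
})

# Without subset, the two id-104 rows differ in 'ts' and both survive.
all_columns = df.drop_duplicates()

# Restricting the comparison to id and messages removes the repeat.
deduped = df.drop_duplicates(subset=["id", "messages"])

# Join the remaining messages per id and turn 'id' back into a column.
result = deduped.groupby("id")["messages"].agg(" ".join).reset_index()
```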