Question

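So, I'm new to Python, and I have this dataframe with company names, country information, and activity descriptions. I'm trying to group all of this information by name and concatenate the country and activity strings.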

First, I did something like this:

df3_['Country'] = df3_.groupby(['Name', 'Activity'])['Country'].transform(lambda x: ','.join(x))

df4_ = df3_.drop_duplicates()

df4_['Activity'] = df4_.groupby(['Name', 'Country'])['Activity'].transform(lambda x: ','.join(x))

With this, I got a "SettingWithCopyWarning", so I read up on this warning and tried copying the dataframe before applying the functions (didn't work) and using .loc (didn't work either):

df3_.loc[:, 'Country'] = df3_.groupby(['Name', 'Activity'])['Country'].transform(lambda x: ','.join(x))

Any idea how to fix this?

Edit: I was asked to post a sample of my data. The first image is what I have; the second is what it should look like.

Answer 1

The following should work:

import pandas as pd

data = {
    'Country Code': ['HK','US','SG','US','','US'],
    'Company Name': ['A','A','A','A','B','B'],
    'Activity': ['External services','Commerce','Transfer','Others','Others','External services'],
}

df = pd.DataFrame(data)

#grouping
grp = df.groupby('Company Name')

# custom function: join the non-empty values with commas
def str_replace(ser):
  # skipping empty strings avoids stray commas anywhere in the result
  # (and an IndexError on s[0] when every value in the group is empty)
  return ','.join(v for v in ser.values if v)

#using agg functions
res = grp.agg({'Country Code':str_replace,'Activity':str_replace}).reset_index()
res

Output:

  Company Name  Country Code                                    Activity
0            A   HK,US,SG,US  External services,Commerce,Transfer,Others
1            B            US                    Others,External services

Answer 2

You want to group by the company name and then apply an aggregation function to the other columns, for example:

df.groupby('Company Name').agg({'Country Code':', '.join, 'Activity':', '.join})

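You were doing it the other way around. Note that empty string values ('') make the result of this aggregation ugly, so you can make the aggregation a bit more elaborate, like this: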

df.groupby('Company Name').agg({'Country Code':lambda x: ', '.join(filter(None,x)), 'Activity':', '.join})
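
To see the aggregation end to end, here is a minimal, self-contained sketch that applies it to the sample data from Answer 1. The helper name `join_non_empty` is made up for illustration, and filtering empty strings in both columns is an assumption (the one-liner above only filters the country codes).

```python
import pandas as pd

# Sample data borrowed from Answer 1; note the empty country code for company B
df = pd.DataFrame({
    'Country Code': ['HK', 'US', 'SG', 'US', '', 'US'],
    'Company Name': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Activity': ['External services', 'Commerce', 'Transfer',
                 'Others', 'Others', 'External services'],
})

def join_non_empty(values):
    # Hypothetical helper: join the values with ', ' and skip empty strings
    return ', '.join(v for v in values if v)

# One row per company; reset_index() turns 'Company Name' back into a column
result = (
    df.groupby('Company Name')
      .agg({'Country Code': join_non_empty, 'Activity': join_non_empty})
      .reset_index()
)
print(result)
```

Because the grouping builds a new frame instead of assigning into an existing one, this route should not touch the code path that raised the warning in the question.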

Answer 3

Another way, this time using transform():

# group the companies and concatenate the activities 
df['Activities'] = df.groupby(['Company Name'])['Activity'] \
  .transform(lambda x : ', '.join(x))

# group the companies and concatenate the country codes
df['Country Codes'] = df.groupby(['Company Name'])['Country Code'] \
  .transform(lambda x : ', '.join([i for i in x if i != '']))
# the list comprehension deals with missing country codes (that have the value '')


# take this, drop the original columns and remove all the duplicates
result = df.drop(['Activity', 'Country Code'], axis=1) \
  .drop_duplicates().reset_index(drop=True)
# reset index isn't really necessary

The result is:

  Company Name                                     Activities   Country Codes
0            A  External services, Commerce, Transfer, Others  HK, US, SG, US
1            B                      Others, External services              US
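
On the warning itself: one possible reading of the question is that the SettingWithCopyWarning came from assigning a column into the result of drop_duplicates(), which some pandas versions may treat as derived from the original frame. Under that assumption, a sketch of the usual workaround is to call .copy() on the deduplicated frame (rather than on the original one) before assigning. The variable name `deduped` is hypothetical, and the data is again the sample from Answer 1.

```python
import pandas as pd

df = pd.DataFrame({
    'Country Code': ['HK', 'US', 'SG', 'US', '', 'US'],
    'Company Name': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Activity': ['External services', 'Commerce', 'Transfer',
                 'Others', 'Others', 'External services'],
})

# Assumption: copying here makes `deduped` an independent frame,
# so the column assignment below cannot be mistaken for a write to `df`
deduped = df.drop_duplicates().copy()

# Same transform as above, but assigned into the explicit copy
deduped['Activities'] = (
    deduped.groupby('Company Name')['Activity']
           .transform(lambda s: ', '.join(s))
)
print(deduped)
```

The original `df` is left unchanged; only `deduped` gains the new column.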

Error when trying to concatenate strings with groupby in Python

3 answers: