Question

I have a DataFrame, and I group it with groupby as shown below:

Name      Nationality    age
Peter     UK             28
John      US             29 
Wiley     UK             28 
Aster     US             29 

grouped = self_ex_df.groupby(['Nationality', 'age'])

Now I want to attach a unique ID to each value.

I am trying the following, though I'm not sure whether it works:

uniqueID = 'ID_'+ grouped.groups.keys().astype(str)

    uniqueID    Name      Nationality    age
     ID_UK28    Peter       UK             28
     ID_US29    John        US             29 
     ID_UK28    Wiley       UK             28 
     ID_US29    Aster       US             29

I would then like to combine this into a new DataFrame, like this:

 uniqueID   Nationality    age   Text
  ID_UK28     UK           28    Peter and Wiley have a combined age of 56
  ID_US29     US           29    John and Aster have a combined age of 58

How can I achieve this?

Answer 1

Hopefully this is close enough; I couldn't get the average age:

import pandas as pd

#create dataframe
df = pd.DataFrame({'Name': ['Peter', 'John', 'Wiley', 'Aster'], 'Nationality': ['UK', 'US', 'UK', 'US'], 'age': [28, 29, 28, 29]})

#make uniqueID
df['uniqueID'] = 'ID_' + df['Nationality'] + df['age'].astype(str)

#groupby has an agg method that can take a dict and perform multiple aggregations
df = df.groupby(['uniqueID', 'Nationality']).agg({'age': 'sum', 'Name': lambda x: ' and '.join(x)})

#to get text you just combine new Name and sum of age
df['Text'] = df['Name'] + ' have a combined age of ' + df['age'].astype(str)
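
Note that in the snippet above the group keys end up in the index and `age` holds the group sum rather than the original age. A minimal sketch of one way to get the flat layout from the question, assuming every row in a group shares the same age (the `combined_age` column name is only an illustrative choice, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Peter', 'John', 'Wiley', 'Aster'],
    'Nationality': ['UK', 'US', 'UK', 'US'],
    'age': [28, 29, 28, 29],
})
df['uniqueID'] = 'ID_' + df['Nationality'] + df['age'].astype(str)

# as_index=False keeps the group keys as ordinary columns;
# named aggregation lets 'age' be aggregated twice under different output names
result = df.groupby(['uniqueID', 'Nationality'], as_index=False).agg(
    age=('age', 'first'),            # original age, identical within a group
    combined_age=('age', 'sum'),     # hypothetical helper column for the text
    Name=('Name', lambda s: ' and '.join(s)),
)

result['Text'] = (result['Name'] + ' have a combined age of '
                  + result['combined_age'].astype(str))
result = result[['uniqueID', 'Nationality', 'age', 'Text']]
print(result)
```

Keeping the sum in a separate column means `age` still shows 28 and 29, as in the desired output.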

Answer 2

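You don't need groupby to create the uniqueID; you can group by that uniqueID afterwards to get the groups by age and nationality. I use a custom function to build the text string. This is one way to do it.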

df1 = df.assign(uniqueID='ID_'+df.Nationality+df.age.astype(str))

def myText(x):
    # build the sentence for one group; `text` avoids shadowing the built-in `str`
    text = ' and '.join(x.Name)
    text += ' have a combined age of {}.'.format(x.age.sum())
    return text

df2 = df1.groupby(['uniqueID', 'Nationality','age']).apply(myText).reset_index().rename(columns={0:'Text'})
print(df2)

Output:

  uniqueID Nationality  age                                        Text
0  ID_UK28          UK   28  Peter and Wiley have a combined age of 56.
1  ID_US29          US   29   John and Aster have a combined age of 58.
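
If the ID should come from the grouping itself, as the question attempted with grouped.groups.keys(), one possible sketch is groupby(...).ngroup(), which numbers each group. This assumes a readable ID such as ID_UK28 is not required; the numeric suffix only identifies the group:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Peter', 'John', 'Wiley', 'Aster'],
    'Nationality': ['UK', 'US', 'UK', 'US'],
    'age': [28, 29, 28, 29],
})

# ngroup() returns, for every row, the integer number of the group it belongs to
group_number = df.groupby(['Nationality', 'age']).ngroup()
df['uniqueID'] = 'ID_' + group_number.astype(str)
print(df)
```

Rows that share the same Nationality and age get the same ID, so the result can be grouped on uniqueID exactly as in the answer above.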
