I have a DataFrame, and I group it with groupby as shown below:
Name Nationality age
Peter UK 28
John US 29
Wiley UK 28
Aster US 29
grouped = self_ex_df.groupby(['Nationality', 'age'])
I am trying the following, though I am not sure whether it works:
uniqueID = 'ID_'+ grouped.groups.keys().astype(str)
uniqueID Name Nationality age
ID_UK28 Peter UK 28
ID_US29 John US 29
ID_UK28 Wiley UK 28
ID_US29 Aster US 29
I now want to combine it into a new DataFrame, like this:
uniqueID Nationality age Text
ID_UK28 UK 28 Peter and Wiley have a combined age of 56
ID_US29 US 29 John and Aster have a combined age of 58
How can I achieve the above?
Answer 0: (score: 1)
Hopefully this is close enough; I couldn't get the average age:
import pandas as pd
#create dataframe
df = pd.DataFrame({'Name': ['Peter', 'John', 'Wiley', 'Aster'], 'Nationality': ['UK', 'US', 'UK', 'US'], 'age': [28, 29, 28, 29]})
#make uniqueID
df['uniqueID'] = 'ID_' + df['Nationality'] + df['age'].astype(str)
#groupby has an agg method that can take a dict and perform multiple aggregations
df = df.groupby(['uniqueID', 'Nationality']).agg({'age': 'sum', 'Name': lambda x: ' and '.join(x)})
#to get text you just combine new Name and sum of age
df['Text'] = df['Name'] + ' have a combined age of ' + df['age'].astype(str)
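The snippet above leaves uniqueID and Nationality in the index and overwrites age with the group sum. If the exact column layout from the question is wanted, a minimal sketch using named aggregation could look like the following. The intermediate column names `combined_age` and `names` are my own choice for illustration and are not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'Name': ['Peter', 'John', 'Wiley', 'Aster'],
    'Nationality': ['UK', 'US', 'UK', 'US'],
    'age': [28, 29, 28, 29],
})
df['uniqueID'] = 'ID_' + df['Nationality'] + df['age'].astype(str)

# Named aggregation: keep the per-person age, and compute the sum and the
# joined names under separate (hypothetical) column names.
result = (
    df.groupby(['uniqueID', 'Nationality'], as_index=False)
      .agg(
          age=('age', 'first'),
          combined_age=('age', 'sum'),
          names=('Name', lambda s: ' and '.join(s)),
      )
)

# Build the sentence, then keep only the columns the question asked for
result['Text'] = (
    result['names'] + ' have a combined age of ' + result['combined_age'].astype(str)
)
result = result[['uniqueID', 'Nationality', 'age', 'Text']]
print(result)
```

Taking `first` for age is safe here only because every row in a group shares the same age by construction of uniqueID.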
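Answer 1: (score: 1)
You don't need groupby to create the uniqueID; you can group on that uniqueID afterwards to get the groups by age and nationality. I use a custom function to build the text string. Here is one way to do it.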
df1 = df.assign(uniqueID='ID_'+df.Nationality+df.age.astype(str))
def myText(x):
    # x is the sub-DataFrame for one group; avoid shadowing the built-in str
    text = ' and '.join(x.Name)
    text += ' have a combined age of {}.'.format(x.age.sum())
    return text
df2 = df1.groupby(['uniqueID', 'Nationality','age']).apply(myText).reset_index().rename(columns={0:'Text'})
print(df2)
Output:
uniqueID Nationality age Text
0 ID_UK28 UK 28 Peter and Wiley have a combined age of 56.
1 ID_US29 US 29 John and Aster have a combined age of 58.
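One caveat: as far as I know, recent pandas releases warn when the function passed to groupby().apply reads a grouping column (here, age). If that is a concern, a rough sketch of the same idea that iterates over the groups instead is shown below. It rebuilds the sample data so that it runs on its own, and the `rows` list is just a helper I introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Peter', 'John', 'Wiley', 'Aster'],
    'Nationality': ['UK', 'US', 'UK', 'US'],
    'age': [28, 29, 28, 29],
})
df1 = df.assign(uniqueID='ID_' + df.Nationality + df.age.astype(str))

# Iterating a groupby yields (key tuple, sub-DataFrame) pairs; the
# sub-DataFrame still contains every column, including the keys.
rows = []
for (uid, nat, age), grp in df1.groupby(['uniqueID', 'Nationality', 'age']):
    text = '{} have a combined age of {}.'.format(
        ' and '.join(grp['Name']), grp['age'].sum()
    )
    rows.append({'uniqueID': uid, 'Nationality': nat, 'age': age, 'Text': text})

df2 = pd.DataFrame(rows)
print(df2)
```

This trades the one-liner for an explicit loop, which is fine for a handful of groups but will be slower than a vectorised aggregation on large data.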