I have a customer database that I can't share because it's confidential, but here is an example:
ID | Name | Sales | Email | etc. |
---|---|---|---|---|
01 | Pablo | $1000 | pablo@pablo.com | ------ |
02 | Pablo | $1000 | pablo@pablo.com | ------ |
03 | John | $1000 | john@john.com | ------ |
04 | Edward | $1000 | edward@edward.com | ------ |
05 | John | $1000 | john@john.com | ------ |
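To make the example reproducible, the sample table can be rebuilt as a pandas DataFrame. This is a sketch under some assumptions: the column names `ID`, `Sales` and `Email` come from the code further down, `Name` is assumed from the table header, the "etc." columns are left out, and `Sales` is assumed to be stored as text such as `'$1000'`. It is converted to a number first, because summing the raw strings would concatenate them instead of adding them.

```python
import pandas as pd

# Sample rows from the table above; the "etc." columns are omitted
df = pd.DataFrame({
    'ID': ['01', '02', '03', '04', '05'],
    'Name': ['Pablo', 'Pablo', 'John', 'Edward', 'John'],
    'Sales': ['$1000', '$1000', '$1000', '$1000', '$1000'],
    'Email': ['pablo@pablo.com', 'pablo@pablo.com', 'john@john.com',
              'edward@edward.com', 'john@john.com'],
})

# Assumption: Sales is text like '$1000'; strip the dollar sign so it sums numerically
df['Sales'] = df['Sales'].str.replace('$', '', regex=False).astype(int)
```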
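What I'm looking for is a function that merges the rows with duplicate emails, adding up their sales, so that the output looks like this: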
ID | Name | Sales | Email | etc. |
---|---|---|---|---|
01 | Pablo | $2000 | pablo@pablo.com | ------ |
02 | John | $2000 | john@john.com | ------ |
03 | Edward | $1000 | edward@edward.com | ------ |
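One way to get exactly this shape, sketched under the same assumptions (a numeric `Sales` column and columns named `ID`, `Name`, `Sales`, `Email`), is to group by `Email` with `sort=False` so emails keep their order of first appearance, take the first `Name` in each group, sum `Sales`, and then renumber `ID`. The function name `merge_duplicate_emails` and the zero-padded ID format are hypothetical, chosen to mirror the desired output.

```python
import pandas as pd

def merge_duplicate_emails(df):
    """Hypothetical helper: one row per email, Sales summed, IDs renumbered."""
    merged = (
        df.groupby('Email', sort=False)  # sort=False keeps first-appearance order
          .agg(Name=('Name', 'first'), Sales=('Sales', 'sum'))
          .reset_index()                 # turn Email back into a regular column
    )
    # Renumber IDs as 01, 02, 03, ... to match the desired output
    merged.insert(0, 'ID', [f'{i:02d}' for i in range(1, len(merged) + 1)])
    return merged[['ID', 'Name', 'Sales', 'Email']]

df = pd.DataFrame({
    'ID': ['01', '02', '03', '04', '05'],
    'Name': ['Pablo', 'Pablo', 'John', 'Edward', 'John'],
    'Sales': [1000, 1000, 1000, 1000, 1000],
    'Email': ['pablo@pablo.com', 'pablo@pablo.com', 'john@john.com',
              'edward@edward.com', 'john@john.com'],
})
print(merge_duplicate_emails(df))
```

Columns not listed in `agg` (such as the "etc." columns) are dropped; add an entry like `Other=('Other', 'first')` for each one you want to keep.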
I tried the following, but I can't figure out where the problem is:
```python
def unificate (df):
    for i in df:
        for x in df:
            if i['Email'] == x['Email']:
                i['Sales'] =+ x['Sales']
                ID = x['ID']
                index = df[df['ID'] == ID].index
                df.drop(index, inplace = True)
    return df
```
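A few things go wrong in that loop. Iterating over a DataFrame with `for i in df` yields the column labels, not the rows, so `i['Email']` tries to index a string. `i['Sales'] =+ x['Sales']` parses as `i['Sales'] = +x['Sales']` (plain assignment of a unary plus), not `+=`. Even with row iteration, the inner loop would also match every row with itself, and dropping rows from `df` while looping over it is fragile. If you want to keep an explicit loop, a sketch that accumulates totals in a dict keyed by email, instead of dropping rows, could look like this. It assumes a numeric `Sales` column and a `Name` column; `unificate_loop` is a hypothetical name.

```python
import pandas as pd

def unificate_loop(df):
    """Hypothetical loop-based fix: sum Sales per email, keep the first row's other fields."""
    merged = {}  # email -> copy of the first row seen for that email (dicts keep insertion order)
    for row in df.to_dict('records'):  # iterate over rows, not column labels
        email = row['Email']
        if email in merged:
            merged[email]['Sales'] += row['Sales']  # += adds; =+ would just reassign
        else:
            merged[email] = dict(row)  # copy so the input DataFrame is left untouched
    result = pd.DataFrame(list(merged.values()))
    # Renumber IDs in order of first appearance: 01, 02, 03, ...
    result['ID'] = [f'{i:02d}' for i in range(1, len(result) + 1)]
    return result

df = pd.DataFrame({
    'ID': ['01', '02', '03', '04', '05'],
    'Name': ['Pablo', 'Pablo', 'John', 'Edward', 'John'],
    'Sales': [1000, 1000, 1000, 1000, 1000],
    'Email': ['pablo@pablo.com', 'pablo@pablo.com', 'john@john.com',
              'edward@edward.com', 'john@john.com'],
})
print([label for label in df])  # ['ID', 'Name', 'Sales', 'Email']: labels, not rows
print(unificate_loop(df))
```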
Thanks in advance!!
Answer (score: 0)

I figured it out:
```python
def groupby_sum(df, group_vars, agg_var='Total', sort_var='Total'):
    '''
    Return: a pandas DataFrame whose rows have been grouped by a given list of columns (categorical variables),
    with agg_var summed within each group.
    The result is sorted in descending order of sort_var and its index is reset.
    Input parameters:
    - df -> pandas DataFrame: a DataFrame with categorical variables and an aggregation variable.
    - group_vars -> list: the names of the categorical columns to group by (e.g.: ['Sexo', 'Edad']).
    - agg_var -> str: the name of the column to sum. Defaults to 'Total'.
    - sort_var -> str: the name of the column to sort by; must be agg_var or one of group_vars. Defaults to 'Total'.
    '''
    # One row per unique combination of group_vars, with agg_var summed in each group
    df = df.groupby(group_vars, as_index=False).agg({agg_var: 'sum'})
    # Largest totals first
    df = df.sort_values(by=sort_var, ascending=False)
    # Renumber the index 0..n-1 and discard the old one
    return df.reset_index(drop=True)
```
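Applied to the question's data, the defaults need overriding because there is no `Total` column: pass `agg_var='Sales'` and `sort_var='Sales'`. Grouping by `['Email', 'Name']` rather than just `['Email']` keeps the name in the result, while `ID` and the "etc." columns are dropped. The snippet below is a self-contained sketch of the same steps, assuming a numeric `Sales` column.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['01', '02', '03', '04', '05'],
    'Name': ['Pablo', 'Pablo', 'John', 'Edward', 'John'],
    'Sales': [1000, 1000, 1000, 1000, 1000],
    'Email': ['pablo@pablo.com', 'pablo@pablo.com', 'john@john.com',
              'edward@edward.com', 'john@john.com'],
})

# Same steps as groupby_sum(df, ['Email', 'Name'], agg_var='Sales', sort_var='Sales')
result = (
    df.groupby(['Email', 'Name'], as_index=False)
      .agg({'Sales': 'sum'})
      .sort_values(by='Sales', ascending=False)
      .reset_index(drop=True)
)
print(result)
```

Pablo and John both total $2000, and the default sort is not guaranteed to be stable, so their relative order may vary; pass `kind='stable'` to `sort_values` if the order of ties matters.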