Question

I have a customer database that I can't share for confidentiality reasons, but here is an example:

ID	Name	Sales	Email	Etc
01	Pablo	$1000	pablo@pablo.com	------
02	Pablo	$1000	pablo@pablo.com	------
03	John	$1000	john@john.com	------
04	Edward	$1000	edward@edward.com	------
05	John	$1000	john@john.com	------

What I'm looking for is a function that merges the rows with duplicate emails and sums their sales, so the output looks like this:

ID	Name	Sales	Email	Etc
01	Pablo	$2000	pablo@pablo.com	------
02	John	$2000	john@john.com	------
03	Edward	$1000	edward@edward.com	------

I tried this, but I can't figure out where the problem is:

def unificate(df):
    for i in df:
        for x in df:
            if i['Email'] == x['Email']:
                i['Sales'] =+ x['Sales']
                ID = x['ID']
                index = df[df['ID'] == ID].index
                df.drop(index, inplace = True)
                return df

Thanks in advance!!

Answer 1

I figured it out:

def groupby_sum(df, group_vars, agg_var='Total', sort_var='Total'):
    '''
    Return: a Pandas dataframe object where rows have been grouped by a given group of columns (categorical variables).
            The resulting dataframe is sorted by sort_var in descending order (highest to lowest number of deaths by default) and the index column is reset.
    Input parameters:
        - df -> Pandas dataframe object: a dataframe with categorical variables and an aggregation variable.
        - group_vars -> list object: a list with the names of the categorical variables to group by (e.g.: ['Sexo', 'Edad']).
        - agg_var -> string: the name of the variable to be aggregated. In this case the variable 'Total' (number of deaths) is set as default.
        - sort_var -> string: the name of the variable to sort the dataframe by. In this case the variable 'Total' (number of deaths) is set as default.
    '''
    # Sum agg_var within each group; as_index=False keeps the group keys as regular columns.
    df = df.groupby(group_vars, as_index=False).agg({agg_var: 'sum'})
    # Sort from highest to lowest, then renumber the rows from 0.
    df = df.sort_values(by=sort_var, ascending=False)
    return df.reset_index(drop=True)
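
For the table in the question, the same groupby idea could be applied roughly as follows. This is only a sketch under a few assumptions: the columns are named ID, Name, Sales and Email, the Sales column already holds numbers rather than strings like "$1000", and the helper name merge_duplicate_customers is made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question. Assumption: Sales is numeric,
# not a string such as "$1000" (strings would need to be parsed first).
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04", "05"],
    "Name": ["Pablo", "Pablo", "John", "Edward", "John"],
    "Sales": [1000, 1000, 1000, 1000, 1000],
    "Email": ["pablo@pablo.com", "pablo@pablo.com", "john@john.com",
              "edward@edward.com", "john@john.com"],
})

def merge_duplicate_customers(df):
    """Hypothetical helper: one row per Email, with Sales summed."""
    merged = (
        # sort=False keeps customers in order of first appearance.
        df.groupby("Email", as_index=False, sort=False)
          .agg(Name=("Name", "first"), Sales=("Sales", "sum"))
    )
    # Renumber the surviving rows 01, 02, ... as in the desired output.
    merged.insert(0, "ID", [f"{i:02d}" for i in range(1, len(merged) + 1)])
    return merged[["ID", "Name", "Sales", "Email"]]

result = merge_duplicate_customers(df)
print(result)
```

Any other columns that should survive the merge would need their own entry in agg (for example "first"), since groupby only keeps the columns it is told how to aggregate.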

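How can I write a function that removes duplicate customers from a database while summing their sales column, keeping each customer unique with the combined sales?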
