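How do I write a function that removes duplicate customers from a database while summing their Sales column? Keeping customers unique, with their sales combined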

Time: 2021-07-07 15:56:02

Tags: python pandas duplicates

I have a customer database that I can't share for confidentiality reasons, but here is an example:

ID  Name    Sales  Email
01  Pablo   $1000  pablo@pablo.com
02  Pablo   $1000  pablo@pablo.com
03  John    $1000  john@john.com
04  Edward  $1000  edward@edward.com
05  John    $1000  john@john.com

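What I'm looking for is a function that merges customers who share the same email and adds up their sales, producing output like this: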

ID  Name    Sales  Email
01  Pablo   $2000  pablo@pablo.com
02  John    $2000  john@john.com
03  Edward  $1000  edward@edward.com

I tried the following, but I can't figure out what the problem is:

def unificate (df):
    for i in df:
        for x in df:
            if i['Email'] == x['Email']:
                i['Sales'] =+ x['Sales']
                ID = x['ID']
                index = df[df['ID'] == ID].index
                df.drop(index, inplace = True)
                return df

Thanks in advance!!
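Several things in the loop above are likely behind the problem. `for i in df` iterates over the column labels (strings such as 'ID' and 'Email'), not the rows, so `i['Email']` raises a `TypeError`. `=+` parses as `= +x['Sales']`, a plain assignment rather than `+=`. In the inner loop every row also matches itself, rows are dropped from `df` while it is being iterated, and `return df` sits inside the `if`, so the function exits after the first match. Below is a minimal loop-based sketch that keeps the original idea, assuming `Sales` is already numeric (strings like "$1000" would need converting first), that the first `Name` seen for each email is kept, and that renumbering the merged rows from 1 is acceptable:

```python
import pandas as pd


def unificate(df: pd.DataFrame) -> pd.DataFrame:
    """Merge rows that share an Email, summing their Sales.

    Assumes Sales is numeric and keeps the first Name seen for each Email.
    """
    totals = {}  # Email -> {"Name": ..., "Sales": ...}; dicts preserve insertion order
    for row in df.itertuples(index=False):  # iterate over rows, not column labels
        if row.Email in totals:
            totals[row.Email]["Sales"] += row.Sales  # `+=`, not `=+`
        else:
            totals[row.Email] = {"Name": row.Name, "Sales": row.Sales}
    # Build the result once after the loop instead of dropping rows mid-iteration.
    result = pd.DataFrame(
        [{"Name": v["Name"], "Sales": v["Sales"], "Email": email}
         for email, v in totals.items()]
    )
    # Renumber the merged customers 1, 2, 3, ...
    result.insert(0, "ID", list(range(1, len(result) + 1)))
    return result


customers = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": ["Pablo", "Pablo", "John", "Edward", "John"],
    "Sales": [1000, 1000, 1000, 1000, 1000],
    "Email": ["pablo@pablo.com", "pablo@pablo.com", "john@john.com",
              "edward@edward.com", "john@john.com"],
})
print(unificate(customers))
```

With the example data this should give three rows in order of first appearance: Pablo 2000, John 2000, Edward 1000. In pandas, a `groupby` does the same merge without an explicit loop.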

1 Answer

Answer 0 (score: 0)

I figured it out:

def groupby_sum(df, group_vars, agg_var='Total', sort_var='Total'):
    '''
    Return: a pandas DataFrame whose rows have been grouped by the given columns (categorical variables),
            with agg_var summed within each group.
            The result is sorted in descending order of sort_var and its index is reset.
    Input parameters:
        - df -> pandas DataFrame: a dataframe with categorical variables and an aggregation variable.
        - group_vars -> list: the names of the columns to group by (e.g.: ['Sexo', 'Edad']).
        - agg_var -> str: the name of the column to sum. Defaults to 'Total'.
        - sort_var -> str: the name of the column to sort by. Defaults to 'Total'.
    '''
    df = df.groupby(group_vars, as_index=False).agg({agg_var: 'sum'})
    df = df.sort_values(by=sort_var, ascending=False)
    return df.reset_index(drop=True)
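Applied to the question's data, the function above could be called roughly as follows. This is a sketch under a few assumptions: `Sales` arrives as text such as "$1000" and is converted to numbers before summing; `Name` is passed as a second grouping column so that it survives the aggregation (the function keeps only the grouping columns and `agg_var`); and the original `ID` column is not carried over.

```python
import pandas as pd


def groupby_sum(df, group_vars, agg_var="Total", sort_var="Total"):
    """Same logic as the answer: group, sum agg_var, sort descending, reset the index."""
    df = df.groupby(group_vars, as_index=False).agg({agg_var: "sum"})
    df = df.sort_values(by=sort_var, ascending=False)
    return df.reset_index(drop=True)


customers = pd.DataFrame({
    "ID": ["01", "02", "03", "04", "05"],
    "Name": ["Pablo", "Pablo", "John", "Edward", "John"],
    "Sales": ["$1000"] * 5,
    "Email": ["pablo@pablo.com", "pablo@pablo.com", "john@john.com",
              "edward@edward.com", "john@john.com"],
})

# Assumption: Sales is stored as text like "$1000"; strip "$" and "," so it can be summed.
customers["Sales"] = pd.to_numeric(
    customers["Sales"].str.replace(r"[$,]", "", regex=True)
)

# Group on Email and Name so that Name is kept in the output; ID is dropped.
result = groupby_sum(customers, ["Email", "Name"], agg_var="Sales", sort_var="Sales")
print(result)
```

Because the output is sorted by total sales rather than by first appearance, Pablo and John (both 2000) are not guaranteed to come out in a particular order. If sequential IDs like in the desired output are needed, they would have to be added afterwards.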