根据熊猫中特殊字符分隔列中的每个项目进行汇总

时间:2018-09-06 08:40:32

标签: python pandas csv aggregate

我有以下输入数据

Date        Investment Type                                    Medium
1/1/2000    Mutual Fund, Stocks, Fixed Deposit, Real Estate    Own, Online,Through Agent
1/2/2000    Mutual Fund, Stocks, Real Estate                   Own
1/3/2000    Fixed Deposit                                      Online
1/3/2000    Mutual Fund, Fixed Deposit, Real Estate            Through Agent
1/2/2000    Stocks                                             Own, Online,                            Through Agent

我函数的输入是Medium。它可以是列表的单个值。我想根据中度输入来搜索数据,然后按如下所示汇总数据。对于“中”中的值,签出什么投资类型,然后汇总每种投资类型的数据

Medium                                Investment Type           Date
Own,Online                            Mutual Fund               1/1/2000,1/2/2000 
Own,Online                            Stocks                    1/1/2000,1/2/2000
Own,Online                            Fixed Deposit             1/1/2000,1/3/2000
Own,Online                            Real Estate               1/1/2000

1 个答案:

答案 0 :(得分:2)

您可以使用:

L = ['Online','Own']
pat = '|'.join(r"\b{}\b".format(x) for x in L)
df['New_Medium'] = df.pop('Medium').str.findall('('+ pat + ')').str.join(', ')
#remove rows with empty values
df = df[df['New_Medium'].astype(bool)]

from  itertools import product
df1 = pd.DataFrame([j for i in df.apply(lambda x: x.str.split(',\s*')).values 
                      for j in product(*i)], columns=df.columns)
print (df1)
        Date Investment Type New_Medium
0   1/1/2000     Mutual Fund        Own
1   1/1/2000     Mutual Fund     Online
2   1/1/2000          Stocks        Own
3   1/1/2000          Stocks     Online
4   1/1/2000   Fixed Deposit        Own
5   1/1/2000   Fixed Deposit     Online
6   1/1/2000     Real Estate        Own
7   1/1/2000     Real Estate     Online
8   1/2/2000     Mutual Fund        Own
9   1/2/2000          Stocks        Own
10  1/2/2000     Real Estate        Own
11  1/3/2000   Fixed Deposit     Online
12  1/2/2000          Stocks        Own
13  1/2/2000          Stocks     Online

#get all combinations and aggregate join by unique values
df = df1.groupby('Investment Type').agg(lambda x: ', '.join(x.unique())).reset_index()
print (df)
  Investment Type                Date   New_Medium
0   Fixed Deposit  1/1/2000, 1/3/2000  Own, Online
1     Mutual Fund  1/1/2000, 1/2/2000  Own, Online
2     Real Estate  1/1/2000, 1/2/2000  Own, Online
3          Stocks  1/1/2000, 1/2/2000  Own, Online