I have the following input data:
Date      Investment Type                                  Medium
1/1/2000  Mutual Fund, Stocks, Fixed Deposit, Real Estate  Own, Online, Through Agent
1/2/2000  Mutual Fund, Stocks, Real Estate                 Own
1/3/2000  Fixed Deposit                                    Online
1/3/2000  Mutual Fund, Fixed Deposit, Real Estate          Through Agent
1/2/2000  Stocks                                           Own, Online, Through Agent
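The input to my function is Medium. It can be a list or a single value. I want to search the data based on the Medium input and then summarize it as shown below: for the values given in Medium, check which investment types they occur with, then aggregate the data for each investment type.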
Medium       Investment Type  Date
Own, Online  Mutual Fund      1/1/2000, 1/2/2000
Own, Online  Stocks           1/1/2000, 1/2/2000
Own, Online  Fixed Deposit    1/1/2000, 1/3/2000
Own, Online  Real Estate      1/1/2000
Answer 0 (score: 2)
You can use:
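from itertools import product

import pandas as pd

# media to search for
L = ['Online', 'Own']
# whole-word pattern: \bOnline\b|\bOwn\b
pat = '|'.join(r"\b{}\b".format(x) for x in L)
# keep only the matched media, joined back into one string
df['New_Medium'] = df.pop('Medium').str.findall('(' + pat + ')').str.join(', ')
# remove rows with empty values
df = df[df['New_Medium'].astype(bool)]
# split every cell on commas and build one row per combination
df1 = pd.DataFrame([j for i in df.apply(lambda x: x.str.split(r',\s*')).values
                    for j in product(*i)], columns=df.columns)
print(df1)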
        Date  Investment Type  New_Medium
0   1/1/2000      Mutual Fund         Own
1   1/1/2000      Mutual Fund      Online
2   1/1/2000           Stocks         Own
3   1/1/2000           Stocks      Online
4   1/1/2000    Fixed Deposit         Own
5   1/1/2000    Fixed Deposit      Online
6   1/1/2000      Real Estate         Own
7   1/1/2000      Real Estate      Online
8   1/2/2000      Mutual Fund         Own
9   1/2/2000           Stocks         Own
10  1/2/2000      Real Estate         Own
11  1/3/2000    Fixed Deposit      Online
12  1/2/2000           Stocks         Own
13  1/2/2000           Stocks      Online
# group by investment type and join the unique values of each remaining column
df = df1.groupby('Investment Type').agg(lambda x: ', '.join(x.unique())).reset_index()
print (df)
   Investment Type                Date   New_Medium
0    Fixed Deposit  1/1/2000, 1/3/2000  Own, Online
1      Mutual Fund  1/1/2000, 1/2/2000  Own, Online
2      Real Estate  1/1/2000, 1/2/2000  Own, Online
3           Stocks  1/1/2000, 1/2/2000  Own, Online
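Since the question asks for a function that accepts either a list or a single value, the same idea can be wrapped up roughly as below. This is only a sketch, not the code from the answer: the function name `summarize_by_medium` is made up for illustration, it swaps the regex and `itertools.product` steps for `DataFrame.explode` with an exact match via `isin`, and it assumes a pandas version that provides `explode` (0.25 or later) and that all three columns hold strings.

```python
import pandas as pd


def summarize_by_medium(df, mediums):
    """Hypothetical helper: dates per investment type for the given medium(s)."""
    # Accept either a single value or a list of values.
    if isinstance(mediums, str):
        mediums = [mediums]

    out = df.copy()
    # Split the comma-separated cells, then expand to one row per value.
    for col in ["Investment Type", "Medium"]:
        out[col] = out[col].str.split(r"\s*,\s*")
        out = out.explode(col).reset_index(drop=True)

    # Keep only the rows whose medium was requested (exact match).
    out = out[out["Medium"].isin(mediums)]

    # Join the unique values of the other columns per investment type,
    # keeping the investment types in order of first appearance.
    return (
        out.groupby("Investment Type", sort=False)
        .agg(lambda s: ", ".join(s.unique()))
        .reset_index()
    )


# The sample data from the question.
data = pd.DataFrame({
    "Date": ["1/1/2000", "1/2/2000", "1/3/2000", "1/3/2000", "1/2/2000"],
    "Investment Type": [
        "Mutual Fund, Stocks, Fixed Deposit, Real Estate",
        "Mutual Fund, Stocks, Real Estate",
        "Fixed Deposit",
        "Mutual Fund, Fixed Deposit, Real Estate",
        "Stocks",
    ],
    "Medium": [
        "Own, Online,Through Agent",
        "Own",
        "Online",
        "Through Agent",
        "Own, Online, Through Agent",
    ],
})

result = summarize_by_medium(data, ["Own", "Online"])
print(result)
```

Calling `summarize_by_medium(data, "Online")` with a single string works the same way. Unlike the regex version, the `Medium` column here lists only the media that actually occur with each investment type, and a requested value must match a cell entry exactly.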