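I have a problem: I want to collapse duplicate rows while retaining the data they carry, and add the valuable information to new columns in the DataFrame.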
import pandas as pd

data = {'id': [1, 2, 2, 3], 'company': [1, 2, 2, 1], 'bank': ['a', 'x', 'y', 'a'],
        'round': ['seed', 'seed', 'seed', 'series a'], 'funding': [100, 200, 200, 300],
        'date': ['2006-12-01', '2004-09-01', '2004-09-01', '2007-05-01']}
df = pd.DataFrame(data, columns=['id', 'company', 'round', 'bank', 'funding', 'date'])
print(df)
This yields:
   id  company     round bank  funding        date
0   1        1      seed    a      100  2006-12-01
1   2        2      seed    x      200  2004-09-01
2   2        2      seed    y      200  2004-09-01
3   3        1  series a    a      300  2007-05-01
Desired output:
   company round_0  bank_0  funding_0      date_0   round_1 bank_1 funding_1      date_1
0        1    seed       a        100  2006-12-01  series a      a       300  2007-05-01
1        2    seed  [x, y]        200  2004-09-01      None   None      None        None
I think pivot/melt might work here, perhaps combined with groupby(['company', 'round'])?
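To make the groupby idea concrete, here is a rough sketch of how it might look. It rests on a few assumptions that are not guaranteed by the data: duplicate rows for the same round share the same funding and date, rounds are numbered per company in date order, and the names `rounds`, `fields`, `wide` and the helper column `n` are made up for illustration.

```python
import pandas as pd

data = {'id': [1, 2, 2, 3], 'company': [1, 2, 2, 1],
        'bank': ['a', 'x', 'y', 'a'],
        'round': ['seed', 'seed', 'seed', 'series a'],
        'funding': [100, 200, 200, 300],
        'date': ['2006-12-01', '2004-09-01', '2004-09-01', '2007-05-01']}
df = pd.DataFrame(data)

# Collapse duplicates: one row per funding round, with its banks gathered in a list.
# Assumption: duplicate rows agree on funding and date.
rounds = (df.groupby(['company', 'round', 'funding', 'date'], sort=False)['bank']
            .apply(list)
            .reset_index())

# Keep a plain value when a round has a single bank, a list otherwise.
rounds['bank'] = rounds['bank'].map(lambda b: b[0] if len(b) == 1 else b)

# Number the rounds within each company (0, 1, ...) in date order.
# ISO-formatted date strings sort correctly as text.
rounds = rounds.sort_values(['company', 'date'])
rounds['n'] = rounds.groupby('company').cumcount()

# Spread the numbered rounds across columns, then flatten the column labels.
fields = ['round', 'bank', 'funding', 'date']
wide = rounds.set_index(['company', 'n'])[fields].unstack('n')
wide.columns = [f'{field}_{n}' for field, n in wide.columns]

# Reorder so that all fields of round 0 come first, then round 1, and so on.
ordered = [f'{field}_{n}' for n in range(rounds['n'].max() + 1) for field in fields]
wide = wide[ordered].reset_index()

print(wide)
```

With this approach, rounds that a company never had come out as NaN rather than None, and the funding columns may be upcast to float because of those missing values.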
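Also, I am open to changing the dtype of the new bank lists (e.g. to a dictionary). The integer suffixes could also be replaced with the values from the 'round' column, producing these columns: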
company, bank_seed, funding_seed, date_seed, bank_series_a, funding_series_a, date_series_a
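For this second layout, a pivot-style reshape keyed on the round name might look roughly like the sketch below. It assumes that duplicate rows for a round agree on funding and date, that each company has at most one round of each name, and that spaces in round names should become underscores. The names `rounds`, `fields`, `round_order` and `wide` are only illustrative.

```python
import pandas as pd

data = {'id': [1, 2, 2, 3], 'company': [1, 2, 2, 1],
        'bank': ['a', 'x', 'y', 'a'],
        'round': ['seed', 'seed', 'seed', 'series a'],
        'funding': [100, 200, 200, 300],
        'date': ['2006-12-01', '2004-09-01', '2004-09-01', '2007-05-01']}
df = pd.DataFrame(data)

# One row per (company, round); banks are always kept as a list here.
# Assumption: duplicate rows agree on funding and date.
rounds = (df.groupby(['company', 'round', 'funding', 'date'], sort=False)['bank']
            .apply(list)
            .reset_index())

# Make the round names usable as column suffixes: 'series a' -> 'series_a'.
rounds['round'] = rounds['round'].str.replace(' ', '_')
round_order = list(rounds['round'].unique())  # order of first appearance

# Move the round names into the columns, then flatten the column labels.
fields = ['bank', 'funding', 'date']
wide = rounds.set_index(['company', 'round'])[fields].unstack('round')
wide.columns = [f'{field}_{rnd}' for field, rnd in wide.columns]

# Group the columns by round: bank_seed, funding_seed, date_seed, bank_series_a, ...
ordered = [f'{field}_{rnd}' for rnd in round_order for field in fields]
wide = wide[ordered].reset_index()

print(wide)
```

Keeping every bank entry as a list (even for a single bank) gives the column a uniform type, at the cost of differing slightly from the mixed scalar/list layout in the desired output above.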