I'm new to Python and I'm struggling with a task. I hope my question isn't too stupid.
I exported a CSV file in which the data is organized like the example below:
Company City  Company Country  Accelerator $  Accelerator Date  Angel $  Angel Date  Seed $  Seed Date  Series A $  Series A Date
              United Kingdom   0              7/3/2017          0        1/0/1900    0.0     1/0/1900   0.0         1/0/1900
Roubaix       France           0.02           9/1/2016          0        1/0/1900    0.0     1/0/1900   2.15        11/2/2015
Montpellier   France           0              12/4/2014         0        1/0/1900    0.0     1/0/1900   0.0         1/0/1900
Beijing       China            0              1/0/1900          0        1/0/1900    0.0     1/0/1900   16.0        2/7/2018
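To make the example reproducible without the exported file, the four sample rows above can be loaded from an inline string. This is a minimal sketch: `SAMPLE_CSV` is a hypothetical name, and it assumes the export is a plain comma-separated file with exactly the header shown (the first row's city is left empty, as in the table).

```python
import io

import pandas as pd

# The four sample rows from the question as CSV text (hypothetical stand-in
# for the exported file). The first row has an empty "Company City" field.
SAMPLE_CSV = """Company City,Company Country,Accelerator $,Accelerator Date,Angel $,Angel Date,Seed $,Seed Date,Series A $,Series A Date
,United Kingdom,0,7/3/2017,0,1/0/1900,0.0,1/0/1900,0.0,1/0/1900
Roubaix,France,0.02,9/1/2016,0,1/0/1900,0.0,1/0/1900,2.15,11/2/2015
Montpellier,France,0,12/4/2014,0,1/0/1900,0.0,1/0/1900,0.0,1/0/1900
Beijing,China,0,1/0/1900,0,1/0/1900,0.0,1/0/1900,16.0,2/7/2018
"""

# With the real file you would call pd.read_csv on its path instead.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df)
```

The dates stay as strings (`read_csv` does not parse them by default), which is what the year extraction in the answer relies on.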
I need to reorganize the data like this:
                  2014          2015          2016          2017
Angel       $4,690,000    $4,150,000   $16,683,000    $6,520,000
Seed       $17,890,000   $35,590,000   $53,860,000   $24,700,000
Series A   $49,500,000  $123,430,000  $110,810,000  $123,220,000
I'd be very grateful if you could help me!
Answer 0 (score: 0)
You can use:
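import pandas as pd

#read the exported file ('file.csv' is a placeholder name)
df = pd.read_csv('file.csv')
#create MultiIndex from columns
df = df.set_index(['Company City','Company Country'])
#or remove columns (then the stacked level below is named 'level_1', not 'level_2')
#df = df.drop(['Company City','Company Country'], axis=1)
#create MultiIndex in columns by splitting from the right at the last whitespace,
#e.g. 'Series A $' -> ('Series A', '$')
df.columns = df.columns.str.rsplit(n=1, expand=True)
#reshape: move the round names into the index, leaving 2 columns ('$' and 'Date')
df = df.stack(0)
#extract the year from the last 4 characters of the date
df['Date'] = df['Date'].str[-4:].astype(int)
#pivoting: rounds as rows, years as columns, amounts summed
df = df.reset_index().pivot_table(index='level_2', columns='Date', values='$', aggfunc='sum')
print(df)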
Date         1900  2014  2015  2016  2017  2018
level_2
Accelerator   0.0   0.0   NaN  0.02   0.0   NaN
Angel         0.0   NaN   NaN   NaN   NaN   NaN
Seed          0.0   NaN   NaN   NaN   NaN   NaN
Series A      0.0   NaN  2.15   NaN   NaN  16.0
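To go from this pivot table to the layout in the question, one possible post-processing step is to keep only the Angel, Seed and Series A rows and the years 2014 to 2017, fill missing combinations with 0 and format the amounts as dollars. This is a sketch under two assumptions the question does not confirm: that `1/0/1900` is a placeholder for "no date" (it is what Excel displays for date serial 0), and that the `$` amounts are in millions. `UNIT`, `pivot`, `report` and `formatted` are hypothetical names, and the pivot table is rebuilt by hand from the printed output so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# The pivot table printed above, rebuilt by hand so this snippet is standalone.
pivot = pd.DataFrame(
    {
        1900: [0.0, 0.0, 0.0, 0.0],
        2014: [0.0, np.nan, np.nan, np.nan],
        2015: [np.nan, np.nan, np.nan, 2.15],
        2016: [0.02, np.nan, np.nan, np.nan],
        2017: [0.0, np.nan, np.nan, np.nan],
        2018: [np.nan, np.nan, np.nan, 16.0],
    },
    index=pd.Index(["Accelerator", "Angel", "Seed", "Series A"], name="level_2"),
)

# ASSUMPTION: the "$" amounts in the CSV are in millions of dollars.
UNIT = 1_000_000

# reindex keeps only the wanted rounds and the years 2014-2017, which also
# drops the 1900 placeholder column and 2018; missing cells become 0.
report = (
    pivot.reindex(index=["Angel", "Seed", "Series A"], columns=range(2014, 2018))
    .fillna(0)
    .mul(UNIT)
)

# Format every cell as a dollar string such as "$2,150,000".
formatted = report.apply(lambda col: col.map(lambda v: f"${v:,.0f}"))
print(formatted)
```

With only the four sample rows most cells come out as `$0`; on the full export the same steps should produce the totals shown in the question, provided the unit assumption holds. Keep `report` (numeric) for further calculations and use `formatted` only for display.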