I have a CSV data file in the following format. I want to turn rows into columns, but the conversion has to be done per stock and per date.
Ticker,Indicator,Date,Value
STOCK A,ACCRUALS,3/31/2005,-10.44
STOCK A,ACCRUALS,3/31/2006,0.44
STOCK A,AE,3/31/2005,3.97
STOCK A,AE,3/31/2006,3.67
STOCK A,ASETTO,3/31/2005,0.762
STOCK A,ASETTO,3/31/2006,0.9099
Output
Ticker,Date,ACCRUALS,AE,ASETTO
STOCK A,3/31/2005,-10.44,3.97,0.762
STOCK A,3/31/2006,0.44,3.67,0.9099
Answer 0 (score: 0)
Ticker,Indicator,Date,Value
STOCK A,ACCRUALS,3/31/2005,-10.44
STOCK A,ACCRUALS,3/31/2006,0.44
STOCK A,AE,3/31/2005,3.97
STOCK A,AE,3/31/2006,3.67
STOCK A,ASETTO,3/31/2005,0.762
STOCK A,ASETTO,3/31/2006,0.9099
Let's say your data is in a DataFrame named df:
>>> import pandas as pd
>>> # index by Date so each indicator's values line up on the date
>>> df = df.set_index(df['Date'])
>>> for ind in set(df['Indicator']):
...     filtered_df = df[df['Indicator'] == ind]
...     df[ind] = filtered_df['Value']  # new column named after the indicator
...
>>> cols_to_keep = ['Ticker', 'Date'] + sorted(set(df['Indicator']))
>>> trimmed_df = df[cols_to_keep]
>>> trimmed_df = trimmed_df.drop_duplicates()
>>> trimmed_df
            Ticker       Date  ACCRUALS    AE  ASETTO
Date
3/31/2005  STOCK A  3/31/2005    -10.44  3.97  0.7620
3/31/2006  STOCK A  3/31/2006      0.44  3.67  0.9099
This takes each unique value in df['Indicator'] and creates a new column holding the df['Value'] entries for that particular indicator.
You can use reset_index() to set the DataFrame's index back to a plain one starting at zero:
>>> trimmed_df.reset_index(drop = True)
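Also, instead of using cols_to_keep, you could drop the columns you no longer need. Value has to go along with Indicator, otherwise drop_duplicates() still leaves one row per indicator:
>>> trimmed_df = df.drop(["Indicator", "Value"], axis=1).drop_duplicates()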
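The loop above aligns values on Date alone, so it is only safe when the file holds a single ticker. As a hedged alternative (not part of the original answer), here is a minimal sketch that reshapes by Ticker and Date together using set_index() and unstack(). It assumes each (Ticker, Date, Indicator) combination appears at most once, since unstack() raises on duplicates. The csv_text variable and the name wide are just for illustration; in practice you would pass your file path to pd.read_csv.

```python
import io

import pandas as pd

# Sample data copied from the question; replace with pd.read_csv("your_file.csv").
csv_text = """Ticker,Indicator,Date,Value
STOCK A,ACCRUALS,3/31/2005,-10.44
STOCK A,ACCRUALS,3/31/2006,0.44
STOCK A,AE,3/31/2005,3.97
STOCK A,AE,3/31/2006,3.67
STOCK A,ASETTO,3/31/2005,0.762
STOCK A,ASETTO,3/31/2006,0.9099
"""

df = pd.read_csv(io.StringIO(csv_text))

# Move the three key columns into the index, then pivot the Indicator
# level out into columns; each remaining row is one (Ticker, Date) pair.
wide = (
    df.set_index(["Ticker", "Date", "Indicator"])["Value"]
      .unstack("Indicator")
      .reset_index()
)
wide.columns.name = None  # drop the leftover "Indicator" label on the columns

print(wide.to_csv(index=False))
```

Note that Date is kept as a plain string here, so rows sort lexicographically. If you need chronological order across many dates, converting the column with pd.to_datetime first is probably the safer route.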