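I am trying to restructure a pandas DataFrame. It has one column per ticker symbol ("SPY", "JPM", "AAPL", "GLD"), each holding adjusted close prices, and it is indexed by date. I want to create a MultiIndex DataFrame with the symbol as the first level and the date as the second level. I have accomplished this in an ugly, roundabout way, but I am curious whether it can be done with pivot or some other method. I have been looking through the pandas general functions and the DataFrame reshaping docs, but I have not made any headway on this problem.
Here is how I accomplished it. It looks dirty, and I would like to know whether there is a cleaner way.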
>>> sym_df = get_data(symbol, pd.date_range(sd, ed))  # automatically adds SPY
>>> print(sym_df)
SPY JPM AAPL GLD
2010-01-04 108.27 40.87 213.10 109.80
2010-01-05 108.56 41.67 213.46 109.70
2010-01-06 108.64 41.89 210.07 111.51
2010-01-07 109.10 42.72 209.68 110.82
2010-01-08 109.46 42.62 211.07 111.37
... ... ... ... ...
2011-12-23 125.19 32.84 401.61 156.31
2011-12-27 125.29 32.31 404.79 154.91
2011-12-28 123.64 31.94 400.92 151.03
2011-12-29 124.92 32.69 403.39 150.34
2011-12-30 124.31 32.53 403.27 151.99
[504 rows x 4 columns]
>>> data = {}
>>> for sym in sym_df.columns:
...     sym_df = sym_df.rename(columns={sym: 'Adj_Close_Price'})
...     data[sym] = sym_df['Adj_Close_Price']
...     sym_df = sym_df.drop(['Adj_Close_Price'], axis=1)
...
>>> df = pd.concat(data.values(), keys=data.keys())
>>> df = df.reset_index()
>>> df = df.rename(columns={'level_0': 'Symbol', 'level_1': "Date"})
>>> df.set_index(['Symbol', 'Date'], inplace=True)
>>> df.sort_index(inplace=True)
>>> df = df.fillna(method='ffill')
>>> df = df.fillna(method='bfill')
>>> print(df)
Adj_Close_Price
Symbol Date
AAPL 2010-01-04 213.10
2010-01-05 213.46
2010-01-06 210.07
2010-01-07 209.68
2010-01-08 211.07
... ...
SPY 2011-12-23 125.19
2011-12-27 125.29
2011-12-28 123.64
2011-12-29 124.92
2011-12-30 124.31
[2016 rows x 1 columns]
Answer 0 (score: 0)
You can try:
import pandas as pd
import io
data_string = """DATE;SPY;JPM;AAPL;GLD
2010-01-04;108.27;40.87;213.10;109.80
2010-01-04;108.56;41.67;213.46;109.70
2010-01-05;108.64;41.89;210.07;111.51
2010-01-05;109.10;42.72;209.68;110.82
2010-01-06;109.46;42.62;211.07;111.37
2011-12-23;125.19;32.84;401.61;156.31
2011-12-23;125.29;32.31;404.79;154.91
2011-12-28;123.64;31.94;400.92;151.03
2011-12-29;124.92;32.69;403.39;150.34
2011-12-30;124.31;32.53;403.27;151.99
"""
data = io.StringIO(data_string)
df = pd.read_csv(data, sep=";")
df = pd.melt(df, id_vars=['DATE'], value_vars=['SPY', 'JPM', 'AAPL', 'GLD'], var_name='STOCK', value_name='CLOSE')
df['DATE'] = pd.DatetimeIndex(df['DATE'])
df.set_index(['STOCK', 'DATE'], inplace=True)
print(df)
Result:
CLOSE
STOCK DATE
SPY 2010-01-04 108.27
2010-01-04 108.56
2010-01-05 108.64
2010-01-05 109.10
2010-01-06 109.46
2011-12-23 125.19
2011-12-23 125.29
2011-12-28 123.64
2011-12-29 124.92
2011-12-30 124.31
JPM 2010-01-04 40.87
2010-01-04 41.67
2010-01-05 41.89
2010-01-05 42.72
2010-01-06 42.62
2011-12-23 32.84
2011-12-23 32.31
2011-12-28 31.94
2011-12-29 32.69
2011-12-30 32.53
AAPL 2010-01-04 213.10
2010-01-04 213.46
2010-01-05 210.07
2010-01-05 209.68
2010-01-06 211.07
2011-12-23 401.61
2011-12-23 404.79
2011-12-28 400.92
2011-12-29 403.39
2011-12-30 403.27
GLD 2010-01-04 109.80
2010-01-04 109.70
2010-01-05 111.51
2010-01-05 110.82
2010-01-06 111.37
2011-12-23 156.31
2011-12-23 154.91
2011-12-28 151.03
2011-12-29 150.34
2011-12-30 151.99
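Note that the result above keeps the symbols in their original column order, whereas the output in the question is sorted by symbol. A minimal sketch of sorting the melted frame and then pulling out one symbol follows; the two-row `wide` frame is a made-up sample for illustration, not the answer's data.

```python
import pandas as pd

# Hypothetical two-day sample in the same wide layout as the answer's CSV.
wide = pd.DataFrame(
    {
        "DATE": ["2010-01-04", "2010-01-05"],
        "SPY": [108.27, 108.56],
        "AAPL": [213.10, 213.46],
    }
)

# Melt to long form; with value_vars omitted, every non-id column is melted.
long_df = pd.melt(wide, id_vars=["DATE"], var_name="STOCK", value_name="CLOSE")
long_df["DATE"] = pd.to_datetime(long_df["DATE"])

# Build the (STOCK, DATE) MultiIndex and sort it so symbols are alphabetical.
long_df = long_df.set_index(["STOCK", "DATE"]).sort_index()

# Selecting on the first level returns that symbol's rows indexed by DATE.
aapl = long_df.loc["AAPL"]
print(aapl)
```

Sorting the index also tends to make label-based slicing on a MultiIndex better behaved, which is likely why the question's code calls `sort_index` as well.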
Answer 1 (score: 0)
Assuming "data" is your initial df (numeric values, with dates on the index):
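# Copy the date index into a regular column so melt can use it as an id
data['Date'] = data.index
# Drop the old index; keeping it would fail if the index is already named 'Date'
data = data.reset_index(drop=True)
# Wide to long: one row per (Date, symbol); symbols land in the 'variable' column
data = pd.melt(data, id_vars = ['Date'], value_vars=['SPY', 'JPM', 'AAPL', 'GLD'])
data = data.set_index(['variable', 'Date'])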
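Since the question asks about built-in reshaping methods, here is a hedged sketch of another route: calling `DataFrame.unstack()` on a frame with a plain (non-MultiIndex) index returns a Series indexed by (column, index), which is already the symbol-then-date layout. The small `sym_df` below is rebuilt from the first two rows shown in the question, and the level names `Symbol` and `Date` and the column name `Adj_Close_Price` are taken from the question's own code.

```python
import pandas as pd

# Stand-in for the question's sym_df: first two rows, dates on the index.
sym_df = pd.DataFrame(
    {
        "SPY": [108.27, 108.56],
        "JPM": [40.87, 41.67],
        "AAPL": [213.10, 213.46],
        "GLD": [109.80, 109.70],
    },
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
)

df = (
    sym_df.unstack()                    # Series indexed by (column, date)
    .rename_axis(["Symbol", "Date"])    # name the two index levels
    .to_frame("Adj_Close_Price")        # back to a one-column DataFrame
    .sort_index()                       # sort by symbol, then date
)
print(df)
```

This avoids the temporary `Date` column and the `melt` call entirely; whether it reads more cleanly is a matter of taste.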