Restructuring a pandas DataFrame

Time: 2020-04-29 18:16:59

Tags: python pandas dataframe multi-index

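I'm trying to restructure a pandas DataFrame. I have columns named after the ticker symbols "SPY", "JPM", "AAPL", and "GLD", and each column holds adjusted close data. The frame is indexed by date. I want to create a multi-index DataFrame with the symbol name as the first level and the date as the second level. I got there in an ugly, roundabout way, but I'm curious whether I could do it with pivot or some other method. I've been going through the pandas general functions and the DataFrame reshaping docs, but I can't connect any of it to this problem.

Here is how I did it. It looks messy, and I'd like to know whether there is a cleaner way.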

>>> sym_df = get_data(symbol, pd.date_range(sd, ed))  # automatically adds SPY
>>> print(sym_df)

               SPY    JPM    AAPL     GLD
2010-01-04  108.27  40.87  213.10  109.80
2010-01-05  108.56  41.67  213.46  109.70
2010-01-06  108.64  41.89  210.07  111.51
2010-01-07  109.10  42.72  209.68  110.82
2010-01-08  109.46  42.62  211.07  111.37
...            ...    ...     ...     ...
2011-12-23  125.19  32.84  401.61  156.31
2011-12-27  125.29  32.31  404.79  154.91
2011-12-28  123.64  31.94  400.92  151.03
2011-12-29  124.92  32.69  403.39  150.34
2011-12-30  124.31  32.53  403.27  151.99

[504 rows x 4 columns]

>>> data = {}
>>> for sym in sym_df.columns:
...     sym_df = sym_df.rename(columns={sym: 'Adj_Close_Price'})
...     data[sym] = sym_df['Adj_Close_Price']
...     sym_df = sym_df.drop(['Adj_Close_Price'], axis=1)
...
>>> df = pd.concat(data.values(), keys=data.keys())
>>> df = df.reset_index()
>>> df = df.rename(columns={'level_0': 'Symbol', 'level_1': "Date"})
>>> df.set_index(['Symbol', 'Date'], inplace=True)
>>> df.sort_index(inplace=True)

>>> df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas
>>> df = df.bfill()
>>> print(df)
                   Adj_Close_Price
Symbol Date
AAPL   2010-01-04           213.10
       2010-01-05           213.46
       2010-01-06           210.07
       2010-01-07           209.68
       2010-01-08           211.07
...                            ...
SPY    2011-12-23           125.19
       2011-12-27           125.29
       2011-12-28           123.64
       2011-12-29           124.92
       2011-12-30           124.31

[2016 rows x 1 columns] 

2 answers:

Answer 0 (score: 0)

You can try:

import pandas as pd
import io

data_string = """DATE;SPY;JPM;AAPL;GLD
2010-01-04;108.27;40.87;213.10;109.80
2010-01-04;108.56;41.67;213.46;109.70
2010-01-05;108.64;41.89;210.07;111.51
2010-01-05;109.10;42.72;209.68;110.82
2010-01-06;109.46;42.62;211.07;111.37
2011-12-23;125.19;32.84;401.61;156.31
2011-12-23;125.29;32.31;404.79;154.91
2011-12-28;123.64;31.94;400.92;151.03
2011-12-29;124.92;32.69;403.39;150.34
2011-12-30;124.31;32.53;403.27;151.99
"""

data = io.StringIO(data_string)
df = pd.read_csv(data, sep=";")

df = pd.melt(df, id_vars=['DATE'], value_vars=['SPY', 'JPM', 'AAPL', 'GLD'], var_name='STOCK', value_name='CLOSE')
df['DATE'] = pd.DatetimeIndex(df['DATE'])
df.set_index(['STOCK', 'DATE'], inplace=True)

print(df)

Result:

                   CLOSE
STOCK DATE              
SPY   2010-01-04  108.27
      2010-01-04  108.56
      2010-01-05  108.64
      2010-01-05  109.10
      2010-01-06  109.46
      2011-12-23  125.19
      2011-12-23  125.29
      2011-12-28  123.64
      2011-12-29  124.92
      2011-12-30  124.31
JPM   2010-01-04   40.87
      2010-01-04   41.67
      2010-01-05   41.89
      2010-01-05   42.72
      2010-01-06   42.62
      2011-12-23   32.84
      2011-12-23   32.31
      2011-12-28   31.94
      2011-12-29   32.69
      2011-12-30   32.53
AAPL  2010-01-04  213.10
      2010-01-04  213.46
      2010-01-05  210.07
      2010-01-05  209.68
      2010-01-06  211.07
      2011-12-23  401.61
      2011-12-23  404.79
      2011-12-28  400.92
      2011-12-29  403.39
      2011-12-30  403.27
GLD   2010-01-04  109.80
      2010-01-04  109.70
      2010-01-05  111.51
      2010-01-05  110.82
      2010-01-06  111.37
      2011-12-23  156.31
      2011-12-23  154.91
      2011-12-28  151.03
      2011-12-29  150.34
      2011-12-30  151.99
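
Note that the melted result keeps the symbols in column order (SPY, JPM, AAPL, GLD), while the output in the question is sorted by symbol. A minimal sketch of the same melt with a final `sort_index()` follows. The two-symbol sample and the names `csv_text`, `wide`, `long_df`, and `jpm` are made up for illustration. The sample string in this answer also repeats some dates, so checking `index.is_unique` after the reshape may be worthwhile.

```python
import io

import pandas as pd

# Hypothetical, shortened sample with one row per date.
csv_text = """DATE;SPY;JPM
2010-01-04;108.27;40.87
2010-01-05;108.56;41.67
2010-01-06;108.64;41.89
"""

wide = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["DATE"])

long_df = (
    wide.melt(id_vars=["DATE"], var_name="STOCK", value_name="CLOSE")
        .set_index(["STOCK", "DATE"])
        .sort_index()  # orders by symbol, then by date
)

# With a sorted index, selecting one symbol returns a frame indexed by date.
jpm = long_df.loc["JPM"]
print(long_df.index.is_unique)
print(jpm)
```

Sorting is optional for the reshape itself, but a sorted MultiIndex tends to make label-based selection and slicing more predictable.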

Answer 1 (score: 0)

Assuming `data` is your initial df (numeric columns with the dates on the index):

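# Copy the dates into a regular column, then discard the old index.
data['Date'] = data.index
data = data.reset_index(drop=True)

# Reshape wide to long: one row per (symbol, date) pair.
data = pd.melt(data, id_vars=['Date'], value_vars=['SPY', 'JPM', 'AAPL', 'GLD'])

# melt names the symbol column 'variable' by default.
data = data.set_index(['variable', 'Date'])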
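
As another option, the same layout can probably be reached without melt by stacking the columns into the index. The following is only a sketch under assumptions: the small `sym_df` below is a made-up stand-in for the frame in the question, and the names `Symbol`, `Date`, and `Adj_Close_Price` are borrowed from the question's desired output.

```python
import pandas as pd

# Hypothetical stand-in for the question's date-indexed frame.
dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"])
sym_df = pd.DataFrame(
    {"SPY": [108.27, 108.56, 108.64], "JPM": [40.87, 41.67, 41.89]},
    index=dates,
)

stacked = (
    sym_df.rename_axis(index="Date", columns="Symbol")  # name both axes first
          .stack()        # columns move into the index: (Date, Symbol)
          .swaplevel()    # reorder the levels to (Symbol, Date)
          .sort_index()
          .to_frame("Adj_Close_Price")
)

print(stacked)
```

Naming the axes before stacking avoids the `level_0` / `level_1` renaming step from the question.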