I have a DataFrame created from an Excel file in the following format:
Ticker 0 Ticker 1 Ticker 2 Delta 0 ... Gamma 1 Gamma 2 IL Var
2019-01-01 -0.0 -1.0 -1.0 0.0 ... -3.0 2.0 10 5
2019-01-02 0.0 -0.0 -1.0 -1.0 ... 0.0 0.0 10 5
2019-01-03 2.0 -1.0 1.0 0.0 ... -0.0 -2.0 10 5
2019-01-04 1.0 0.0 0.0 -1.0 ... -0.0 -1.0 10 5
2019-01-05 1.0 -1.0 -0.0 -1.0 ... -0.0 -1.0 10 5
2019-01-06 2.0 1.0 1.0 -1.0 ... 0.0 0.0 10 5
Since, on each date, the data under Ticker i corresponds to the data under Delta i and Gamma i, I would like to produce a table of the following form:
Date Ticker Delta Gamma IL Var
2019-01-01 NaN NaN NaN 10 5
2019-01-01 NaN NaN NaN 10 5
2019-01-01 NaN NaN NaN 10 5
2019-01-01 NaN NaN NaN 10 5
2019-01-01 NaN NaN NaN 10 5
2019-01-01 NaN NaN NaN 10 5
2019-01-02 NaN NaN NaN 10 5
2019-01-02 NaN NaN NaN 10 5
.
.
.
2019-01-03 NaN NaN NaN 10 5
.
.
.
.
2019-01-04 NaN NaN NaN 10 5
2019-01-05 NaN NaN NaN 10 5
2019-01-06 NaN NaN NaN 10 5
I tried using pd.melt(), but I don't know how to make each date appear multiple times...
To recreate a similar DataFrame, I used this code:
import pandas as pd
import numpy as np

# Build the column names: Ticker 0..2, Delta 0..2, Gamma 0..2
l = []
for i in range(3):
    l.append('Ticker ' + str(i))
for i in range(3):
    l.append('Delta ' + str(i))
for i in range(3):
    l.append('Gamma ' + str(i))

# Six daily rows of random values, rounded to whole numbers
dates = pd.date_range('20190101', periods=6)
data = np.random.randn(6, len(l))
df = pd.DataFrame(data.round(0), index=dates, columns=l)

# Constant columns that carry no numeric suffix
df['IL'] = 10
df['Var'] = 5
df
Out[9]:
Ticker 0 Ticker 1 Ticker 2 Delta 0 ... Gamma 1 Gamma 2 IL Var
2019-01-01 -0.0 -1.0 -1.0 0.0 ... -3.0 2.0 10 5
2019-01-02 0.0 -0.0 -1.0 -1.0 ... 0.0 0.0 10 5
2019-01-03 2.0 -1.0 1.0 0.0 ... -0.0 -2.0 10 5
2019-01-04 1.0 0.0 0.0 -1.0 ... -0.0 -1.0 10 5
2019-01-05 1.0 -1.0 -0.0 -1.0 ... -0.0 -1.0 10 5
2019-01-06 2.0 1.0 1.0 -1.0 ... 0.0 0.0 10 5
[6 rows x 11 columns]
Thank you very much for your help.
Answer 0 (score: 1)
It looks like you are converting from wide to long format. Try
df.reset_index(inplace = True)
df = pd.wide_to_long(df, ['Ticker', 'Delta', 'Gamma'], i = 'index', j = 'timepoint', sep = " ")
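Here the stub names of the variables are ['Ticker', 'Delta', 'Gamma'], the rows are identified by their date (i = 'index'), and the numeric suffixes 0, 1, 2 become the timepoint level (j = 'timepoint').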
Out[19]:
Var IL Ticker Delta Gamma
index timepoint
2019-01-01 0 5 10 -2.0 -1.0 -0.0
2019-01-02 0 5 10 0.0 -0.0 1.0
2019-01-03 0 5 10 -1.0 -0.0 -2.0
2019-01-04 0 5 10 1.0 -0.0 -1.0
2019-01-05 0 5 10 -1.0 -1.0 -1.0
2019-01-06 0 5 10 2.0 -1.0 -1.0
2019-01-01 1 5 10 0.0 1.0 -1.0
2019-01-02 1 5 10 1.0 -1.0 2.0
2019-01-03 1 5 10 -1.0 -0.0 -0.0
2019-01-04 1 5 10 0.0 1.0 0.0
2019-01-05 1 5 10 0.0 1.0 2.0
2019-01-06 1 5 10 1.0 1.0 -0.0
2019-01-01 2 5 10 -0.0 -2.0 0.0
2019-01-02 2 5 10 -1.0 -2.0 -0.0
2019-01-03 2 5 10 -1.0 1.0 -1.0
2019-01-04 2 5 10 0.0 2.0 -1.0
2019-01-05 2 5 10 -0.0 2.0 1.0
2019-01-06 2 5 10 -2.0 1.0 1.0
Append
df.sort_values(by=['index', 'timepoint']).reset_index()
to sort by date and then by timepoint; reset_index() then moves both index levels back into columns.
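The steps in this answer can be put together into one runnable sketch. It assumes the frame from the question (seeded here so the run is repeatable), and it names the index Date before resetting it so that the result matches the column layout the question asks for; the names wide and long_df are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (seeded for repeatability).
cols = [f"{stub} {i}" for stub in ("Ticker", "Delta", "Gamma") for i in range(3)]
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((6, len(cols))).round(0),
    index=pd.date_range("20190101", periods=6),
    columns=cols,
)
df["IL"] = 10
df["Var"] = 5

# Name the index so reset_index() yields a "Date" column instead of "index".
wide = df.rename_axis("Date").reset_index()

long_df = (
    pd.wide_to_long(
        wide,
        stubnames=["Ticker", "Delta", "Gamma"],
        i="Date",          # column that identifies each wide row
        j="timepoint",     # new level holding the numeric suffix 0/1/2
        sep=" ",           # separator between stub name and suffix
    )
    .sort_index()                  # order by Date, then by timepoint
    .reset_index()                 # turn Date and timepoint into columns
    .drop(columns="timepoint")     # the question's target table has no suffix column
    [["Date", "Ticker", "Delta", "Gamma", "IL", "Var"]]
)
print(long_df.head())
```

Each date now appears once per suffix (three times here), with IL and Var repeated alongside. Keep the timepoint column instead of dropping it if you need to know which Ticker i a row came from.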
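Answer 1 (score: 0)
The main issue is that the column labels Ticker 0 Ticker 1 Ticker 2 Delta 0 ... Gamma 1 Gamma 2 are essentially a multi-index disguised as strings. Ticker 0 carries two labels, Ticker and 0, which need to be separated. See the code below: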
df2 = df.set_index(['IL', 'Var'], append=True)  # IL and Var are not part of the multi-index, so move them into the row index first
df2.columns = df2.columns.str.split(' ', expand=True)  # Split Ticker/Gamma/Delta from 0/1/2
df2.stack().reset_index(['IL', 'Var'])  # Melt the 0/1/2 level into rows, then restore IL and Var as columns
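As a rough end-to-end sketch of this approach, the block below rebuilds the question's frame (seeded, which is an assumption made for repeatability) and reshapes it into the Date / Ticker / Delta / Gamma / IL / Var layout. The explicit sort_index() and final column selection are additions, included because row and column order after stack() can differ between pandas versions; recent pandas 2.x releases may also emit a FutureWarning about the stack implementation.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (seeded for repeatability).
cols = [f"{stub} {i}" for stub in ("Ticker", "Delta", "Gamma") for i in range(3)]
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((6, len(cols))).round(0),
    index=pd.date_range("20190101", periods=6),
    columns=cols,
)
df["IL"] = 10
df["Var"] = 5

# Park the suffix-free columns in the row index so only "<stub> <n>" columns remain.
df2 = df.rename_axis("Date").set_index(["IL", "Var"], append=True)

# "Ticker 0" -> ("Ticker", "0"): the columns become a two-level MultiIndex.
df2.columns = df2.columns.str.split(" ", expand=True)

long_df = (
    df2.stack()                    # move the innermost level ("0"/"1"/"2") into the rows
    .sort_index()                  # Date-major order regardless of pandas version
    .reset_index(["IL", "Var"])    # restore IL and Var as ordinary columns
    .droplevel(-1)                 # discard the suffix level, leaving Date as the index
    .reset_index()                 # turn Date into a column
    [["Date", "Ticker", "Delta", "Gamma", "IL", "Var"]]
)
print(long_df.head())
```

Note that the split leaves the suffixes as strings ("0", "1", "2"); skip the droplevel(-1) step if you want to keep them as an identifier.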