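I'm having trouble deleting the first row of the log_returns matrix. Basically, I want to get rid of the first row because it contains NaN values. I tried isnan() with no luck and finally landed on numpy.delete(), which sounded the most promising, but it still doesn't do the job.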
import pandas as pd
from pandas_datareader import data as web
import numpy as np
symbols = ['XOM', 'CVX', 'SLB', 'PXD', 'EOG', 'OXY', 'HAL', 'KMI', 'SE', 'PSX', 'VLO','COP','APC','TSO','WMB','BHI','APA','COG','DVN','MPC','NBL','CXO','NOV','HES','MRO','EQT','XEC','FTI','RRC','OKE','SWN','NFX','HP','MUR','CHK','RIG','DO']
try:
    # Reuse the cached prices if the HDF5 store already exists
    h9 = pd.HDFStore('port.h9')
    data = h9['norm']
    h9.close()
except:
    # Otherwise download the adjusted closes and cache them
    data = pd.DataFrame()
    for sym in symbols:
        data[sym] = web.DataReader(sym, data_source='yahoo',
                                   start='1/1/2010')['Adj Close']
    data = data.dropna()
    h9 = pd.HDFStore('port.h9')
    h9['norm'] = data
    h9.close()
data.info()
log_returns = np.log(data / data.shift(1))
log_returns.head()
np.delete(log_returns, 0, 0)
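The last line above (the delete) throws the following exception. That makes no sense to me, because row = 0, location = 0 is certainly not out of range for the log_returns matrix, whose shape is (1116, 37).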
ValueError: Shape of passed values is (37, 1115), indices imply (37, 1116)
Answer 0 (score: 0)
Demo:
In [202]: from pandas_datareader import data as web
In [218]: df = web.DataReader('XOM', 'yahoo', start='1/1/2010')['Adj Close']
In [219]: pd.options.display.max_rows = 10
In [220]: df
Out[220]:
Date
2010-01-04 57.203028
2010-01-05 57.426378
2010-01-06 57.922715
2010-01-07 57.740730
2010-01-08 57.509100
...
2016-09-12 87.290001
2016-09-13 85.209999
2016-09-14 84.599998
2016-09-15 85.080002
2016-09-16 84.029999
Name: Adj Close, dtype: float64
In [221]: np.log(df.head(10).pct_change() + 1)
Out[221]:
Date
2010-01-04 NaN
2010-01-05 0.003897
2010-01-06 0.008606
2010-01-07 -0.003147
2010-01-08 -0.004020
2010-01-11 0.011157
2010-01-12 -0.004991
2010-01-13 -0.004011
2010-01-14 0.000144
2010-01-15 -0.008214
Name: Adj Close, dtype: float64
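The demo computes log returns as np.log(df.pct_change() + 1), while the question uses np.log(data / data.shift(1)). A minimal sketch of why the two are interchangeable, using made-up prices rather than real quotes (the variable names below are illustrative only):

```python
import numpy as np
import pandas as pd

# Synthetic prices, invented for illustration (not real quotes)
prices = pd.Series([100.0, 101.0, 99.5, 102.0], name="Adj Close")

# Form used in the question: ratio of each price to the previous one
via_shift = np.log(prices / prices.shift(1))

# Form used in the demo: pct_change() is p_t / p_{t-1} - 1, so adding 1 gives the same ratio
via_pct_change = np.log(prices.pct_change() + 1)

# Both series start with NaN (the first price has no predecessor)
print(via_shift.isna().tolist())

# ...and agree on every later row
print(np.allclose(via_shift.iloc[1:], via_pct_change.iloc[1:]))
```

Either way the first row is NaN by construction, which is the row the question wants to drop.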
Solution:
In [224]: np.log(df.pct_change() + 1).dropna()
Out[224]:
Date
2010-01-05 0.003897
2010-01-06 0.008606
2010-01-07 -0.003147
2010-01-08 -0.004020
2010-01-11 0.011157
...
2016-09-12 0.005169
2016-09-13 -0.024117
2016-09-14 -0.007185
2016-09-15 0.005658
2016-09-16 -0.012418
Name: Adj Close, dtype: float64
Or:
In [225]: np.log(df.pct_change() + 1).iloc[1:]
Out[225]:
Date
2010-01-05 0.003897
2010-01-06 0.008606
2010-01-07 -0.003147
2010-01-08 -0.004020
2010-01-11 0.011157
...
2016-09-12 0.005169
2016-09-13 -0.024117
2016-09-14 -0.007185
2016-09-15 0.005658
2016-09-16 -0.012418
Name: Adj Close, dtype: float64
Or:
In [227]: np.log(df.pct_change() + 1).drop(df.index[0])
Out[227]:
Date
2010-01-05 0.003897
2010-01-06 0.008606
2010-01-07 -0.003147
2010-01-08 -0.004020
2010-01-11 0.011157
...
2016-09-12 0.005169
2016-09-13 -0.024117
2016-09-14 -0.007185
2016-09-15 0.005658
2016-09-16 -0.012418
Name: Adj Close, dtype: float64
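The examples above use a single Series, whereas log_returns in the question is a DataFrame with one column per symbol. The same ideas should carry over; here is a small sketch on a synthetic two-column frame (the tickers AAA and BBB and their prices are made up). It also shows that np.delete behaves as expected once it is given a plain NumPy array instead of the DataFrame itself, at the cost of losing the index and column labels.

```python
import numpy as np
import pandas as pd

# Synthetic price table, invented for illustration (hypothetical tickers)
prices = pd.DataFrame(
    {"AAA": [10.0, 10.5, 10.2, 10.8], "BBB": [20.0, 19.5, 19.9, 20.4]},
    index=pd.date_range("2010-01-04", periods=4, freq="B"),
)
log_returns = np.log(prices / prices.shift(1))

# Positional slice: drops exactly the first row, labels are kept
trimmed = log_returns.iloc[1:]

# dropna(): drops every row that contains a NaN in any column
cleaned = log_returns.dropna()

# np.delete on the underlying array: returns an ndarray without labels
as_array = np.delete(log_returns.to_numpy(), 0, axis=0)

print(trimmed.shape, cleaned.shape, as_array.shape)
```

Note that dropna() and iloc[1:] only coincide when the first row is the only one with missing values. If other rows may contain NaN and should be kept, iloc[1:] is the safer choice.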