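I would like to know which change in pandas between versions 0.15.2 and 0.17 affects the behavior of assigning DataFrame columns. The following code used to work: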
import numpy as np
import pandas as pd
initial_capital = 10000
short_window = 30
long_window = 60
symbol = 'DIA'
bars = pd.read_csv(r'http://www.google.com/finance/getprices?i=60&p=10d&f=d,o,h,l,c,v&df=cpct&q=DIA', skiprows=7, header=None, names=['Date', 'Close', 'High', 'Low', 'Open', 'Volume'])
signals = pd.DataFrame(index=bars.index)
signals['signal'] = 0.0
signals['short_mavg'] = pd.ewma(bars['Close'], span=short_window, min_periods=1)
signals['long_mavg'] = pd.ewma(bars['Close'], span=long_window, min_periods=1)
signals['signal'][short_window:] = np.where(signals['short_mavg'][short_window:] > signals['long_mavg'][short_window:], 1.0, -1.0)
signals['positions'] = signals['signal'].diff()
positions = pd.DataFrame(index=signals.index).fillna(0.0)
positions[symbol] = signals['signal']*10
portfolio = positions*bars.Close
pos_diff = positions.diff()
portfolio['holdings'] = positions*bars.Close
portfolio['cash'] = initial_capital - (pos_diff*bars.Close).cumsum()
portfolio['total'] = portfolio['cash'] + portfolio['holdings']
portfolio['returns'] = portfolio['total'].pct_change()
After updating to pandas 0.17, however, it raises ValueError: Wrong number of items passed 3391, placement implies 1
(The number of items passed varies, because the data feed updates minute by minute between 9:30 AM and 4:00 PM EST on weekdays.)
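As far as I can tell, the failure can be reproduced without the live data feed. The sketch below uses made-up prices and a three-row frame (both are assumptions for illustration, not my real data) and was written against a recent pandas release, where the exact error message may differ from the 0.17 wording above:

```python
import pandas as pd

# Synthetic stand-ins for the real data (values are made up for illustration).
bars = pd.DataFrame({'Close': [100.0, 101.0, 102.0]})
positions = pd.DataFrame({'DIA': [10.0, -10.0, 10.0]}, index=bars.index)

# The operator form matches the Series index labels (0, 1, 2) against the
# DataFrame's column labels ('DIA'), not against its row index.
product = positions * bars.Close
print(product.shape)               # one column per distinct label
print(product.isna().all().all())  # no label matched, so every cell is NaN

# Assigning that multi-column frame to a single column is what fails.
portfolio = pd.DataFrame(index=bars.index)
try:
    portfolio['holdings'] = product
    assignment_failed = False
except ValueError as exc:
    assignment_failed = True
    print('ValueError:', exc)
```

If this sketch is representative, the number in the error message is the column count of the product, which would explain why it grows as the feed adds rows.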
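Following Stefan's suggestion in another post, I have been able to correct the code and get it running again. The working code is the same as above, with the following changes starting at line 21 (the portfolio = positions*bars.Close assignment):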
portfolio = positions.mul(bars.Close, axis=0)
pos_diff = positions.diff()
portfolio['holdings'] = positions.mul(bars.Close, axis=0)
portfolio['cash'] = initial_capital - (pos_diff.mul(bars.Close, axis=0)).cumsum()
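From what I can gather from the release notes, there was a long-standing issue with broadcasting arithmetic operations along a datetime index (see here), and that behavior has finally been removed. The new behavior recommends replacing df <op> df.A with df.<op>(df.A, axis=0 or 'index'). This looks similar to my problem, but the index of my bars DataFrame is not set to any column, and the 'Date' column is read as object dtype because of its format.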
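To make the two spellings from the release notes concrete, here is a small sketch on a hypothetical two-column frame with a default integer index, so no datetime index is involved. The column names and values are assumptions for illustration, and it was written against a recent pandas release:

```python
import pandas as pd

# Hypothetical frame with a default integer index (no datetime index).
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# Method form: align df.A with the row index, so each row is
# multiplied by its own A value.
by_rows = df.mul(df.A, axis=0)
same_by_name = df.mul(df.A, axis='index')

# Operator form: df.A's index labels (0, 1, 2) are matched against the
# column labels ('A', 'B'), so nothing lines up.
by_columns = df * df.A

print(by_rows)
print(by_columns.shape)
```

In this sketch only the method form with axis=0 (or the equivalent axis='index') produces the row-wise product. The operator form returns a wider frame filled with NaN.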
Can someone explain why I need the new df.<op>(df.A, axis=0) form to perform arithmetic on my DataFrame successfully, even though it has no datetime index?
Thanks for any insight!