Python Pandas DataFrame: length of index does not match - df['column'] = ndarray

Date: 2018-04-09 18:14:38

Tags: python pandas dataframe time-series valueerror

I have a pandas DataFrame containing EOD (end-of-day) financial data (OHLC) for analysis.

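I use the https://github.com/cirla/tulipy library to generate technical indicator values, which take a time period as an option. For example, ADX with period=5 gives the ADX over the past 5 days.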

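Because of this period, the generated array of indicator values is always shorter than the DataFrame: the prices of the first 5 days are used to produce the ADX on day 6.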
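To reproduce the mismatch without installing tulipy, here is a minimal sketch that uses a plain NumPy trailing moving average (`np.convolve` in `'valid'` mode) as a stand-in for a tulipy indicator; like tulipy, it only returns values once a full window is available. The prices, the `sma_5` column name and the `mismatch_error` variable are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices (any 1-D float array works)
close = np.array([81.59, 81.06, 82.87, 83.00, 83.61, 83.15, 82.84, 83.99])
period = 5

# Trailing simple moving average: 'valid' mode only emits full windows,
# so the result has len(close) - period + 1 elements, like a tulipy output
sma = np.convolve(close, np.ones(period) / period, mode="valid")
print(len(close), len(sma))  # 8 4

df = pd.DataFrame({"close": close})
mismatch_error = None
try:
    df["sma_5"] = sma  # only 4 values for 8 rows
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```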

    import tulipy as ti

    # highData, lowData, closeData: float64 price arrays from the OHLC data
    pdi14, mdi14 = ti.di(
        high=highData, low=lowData, close=closeData, period=14)

    df['mdi_14'] = mdi14
    df['pdi_14'] = pdi14
    # ValueError: Length of values does not match length of index

Unfortunately, unlike TA-Lib, the tulipy library does not return NaN values for those first days...

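Is there an easy way to add these NaNs to the ndarray? Or to insert into the df at a certain index and have it create NaNs for the remaining rows automatically?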

Thanks in advance, I have been researching this for days!

3 answers:

Answer 0 (score: 0)

A complete MCVE:

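import numpy as np
import pandas as pd

df = pd.DataFrame(1, range(10), list('ABC'))

# Indicator values exist only for the last len(df) - 6 rows
a = np.full((len(df) - 6, df.shape[1]), 2)
# NaN padding for the first 6 rows, which have no value
b = np.full((6, df.shape[1]), np.nan)

# Stack the padding on top so the result has exactly len(df) rows
c = np.vstack([b, a])

d = pd.DataFrame(c, df.index, df.columns)
d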

     A    B    C
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
4  NaN  NaN  NaN
5  NaN  NaN  NaN
6  2.0  2.0  2.0
7  2.0  2.0  2.0
8  2.0  2.0  2.0
9  2.0  2.0  2.0
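The same idea applies directly to the 1-D arrays in the question: prepend as many NaNs as the indicator output is short. A minimal sketch; `pad_front` is a hypothetical helper name, and the DataFrame and `mdi14` array are synthetic stand-ins for the question's data and for the output of `ti.di` with period=14 (13 rows short).

```python
import numpy as np
import pandas as pd


def pad_front(values, length):
    """Prepend NaNs so a shorter indicator array matches `length` rows."""
    values = np.asarray(values, dtype=float)
    missing = length - len(values)
    if missing < 0:
        raise ValueError("indicator output is longer than the DataFrame")
    return np.concatenate([np.full(missing, np.nan), values])


# Hypothetical stand-ins: 20 rows of data and a 7-value indicator output
df = pd.DataFrame({"close": np.arange(20, dtype=float)})
mdi14 = np.linspace(10.0, 20.0, 7)

df["mdi_14"] = pad_front(mdi14, len(df))
print(df["mdi_14"].isna().sum())  # 13
```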

Answer 1 (score: 0)

Maybe just handle the offset yourself in code?

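import numpy as np
import tulipy as ti

period = 14
pdi14, mdi14 = ti.di(
    high=highData, low=lowData, close=closeData, period=period
)

# The first (period - 1) rows have no DI value, so they stay NaN
df['mdi_14'] = np.nan
df.iloc[period - 1:, df.columns.get_loc('mdi_14')] = mdi14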

I hope they will pad the first values with NaN in the library in the future. Keeping time-series data like this without any labels is dangerous.
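The question's second idea (let pandas create the NaNs) also works through index alignment: give the shorter array the labels of the last rows of `df` and assign it as a `Series`. A minimal sketch, assuming the indicator output lines up with the most recent rows (tulip expects data ordered oldest to newest); the dates and values are synthetic, and `mdi_series` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical daily DataFrame and an indicator output that is 13 rows short
dates = pd.date_range("2018-01-01", periods=20, freq="D")
df = pd.DataFrame({"close": np.arange(20, dtype=float)}, index=dates)
mdi14 = np.linspace(10.0, 20.0, 7)

# Label the indicator with the last len(mdi14) dates of df, since the
# missing values are at the start (oldest end) of the series
mdi_series = pd.Series(mdi14, index=df.index[-len(mdi14):])

# Assigning a Series aligns on the index; dates without a value become NaN
df["mdi_14"] = mdi_series
print(df["mdi_14"].first_valid_index().date())  # 2018-01-14
```

Unlike positional slicing, this keeps working if the DataFrame has a non-default index such as dates.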

Answer 2 (score: 0)

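The C version of the tulip library includes a start function for each indicator (ref: https://tulipindicators.org/usage) that can be used to determine the output length of an indicator for a given set of input options: the output length is the input length minus the start value. Unfortunately, the Python bindings library, tulipy, does not appear to include this function. Instead, you have to resort to dynamically reassigning the index values so that the output lines up with the original DataFrame.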
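Without the start function, the offset can still be recovered from the lengths: `start = len(inputs) - len(output)`. The sketch below wraps that in a hypothetical helper, `attach_outputs`, which accepts either a single NumPy array or a tuple of arrays (tulipy appears to return a tuple for multi-output indicators such as `bbands` and `di`). It assumes every output ends at the last input row; the data are synthetic.

```python
import numpy as np
import pandas as pd


def attach_outputs(df, outputs, names):
    """Store indicator outputs in new columns of df, NaN-padding the front.

    Hypothetical helper. The offset that tulip's C start function would report
    is inferred as len(df) - len(output), assuming every output ends at the last
    (newest) input row.
    """
    if isinstance(outputs, np.ndarray):  # a single-output indicator
        outputs = (outputs,)
    if len(outputs) != len(names):
        raise ValueError("need exactly one column name per output")
    for name, values in zip(names, outputs):
        start = len(df) - len(values)
        if start < 0:
            raise ValueError(f"output {name!r} is longer than the DataFrame")
        column = np.full(len(df), np.nan)
        column[start:] = values
        df[name] = column
    return df


# Synthetic stand-ins for a two-output indicator such as ti.di with period=14
df = pd.DataFrame({"close": np.arange(20, dtype=float)})
pdi14 = np.full(7, 25.0)
mdi14 = np.full(7, 15.0)

attach_outputs(df, (pdi14, mdi14), ["pdi_14", "mdi_14"])
print(df[["pdi_14", "mdi_14"]].notna().sum().tolist())  # [7, 7]
```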

Here is an example using the price series from the tulip documentation:

import numpy as np
import pandas as pd
import tulipy as ti

# Create the DataFrame with close prices (a list, not a set: a set does not keep the order)
prices = pd.DataFrame(data=[81.59, 81.06, 82.87, 83.00, 83.61, 83.15, 82.84, 83.99, 84.55,
                            84.36, 85.53, 86.54, 86.89, 87.77, 87.29], columns=['close'])

# Compute the technical indicator using tulipy and save the result in a DataFrame;
# ti.bbands returns (lower, middle, upper), so transpose to get one column per band
bbands = pd.DataFrame(data=np.transpose(ti.bbands(real=prices['close'].to_numpy(), period=5, stddev=2)))

# Dynamically realign the index; per the tulip library documentation, price/volume data is
# expected to be ordered "oldest to newest (index 0 is oldest)", so the output matches the last rows
bbands.index += prices.index.max() - bbands.index.max()

# Put the indicator values into the original DataFrame (rows without a value become NaN)
prices[['BBANDS_5_2_low', 'BBANDS_5_2_mid', 'BBANDS_5_2_up']] = bbands
prices.head(15)

    close  BBANDS_5_2_low  BBANDS_5_2_mid  BBANDS_5_2_up
0   81.59             NaN             NaN            NaN
1   81.06             NaN             NaN            NaN
2   82.87             NaN             NaN            NaN
3   83.00             NaN             NaN            NaN
4   83.61       80.530042          82.426      84.321958
5   83.15       80.987142          82.738      84.488858
6   82.84       82.533343          83.094      83.654657
7   83.99       82.471983          83.318      84.164017
8   84.55       82.417750          83.628      84.838250
9   84.36       82.435203          83.778      85.120797
10  85.53       82.511331          84.254      85.996669
11  86.54       83.142618          84.994      86.845382
12  86.89       83.536488          85.574      87.611512
13  87.77       83.870324          86.218      88.565676
14  87.29       85.288871          86.804      88.319129