Question

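I have a pandas DataFrame containing EOD financial data (OHLC) for analysis.

I'm using the https://github.com/cirla/tulipy library to generate technical indicator values, which take a time period as an option. For example, ADX with period=5 shows the ADX over the last 5 days.

Because of this period, the generated array of indicator values is always shorter than the DataFrame, since the prices of the first 5 days are used to generate the ADX on day 6: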

    pdi14, mdi14 = ti.di(
    high=highData, low=lowData, close=closeData, period=14)

    df['mdi_14'] = mdi14
    df['pdi_14'] = pdi14
    >> ValueError: Length of values does not match length of index
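
For anyone who wants to reproduce the mismatch without installing tulipy, here is a minimal sketch. It assumes a simple moving average built with `np.convolve` as a stand-in for the indicator; the column name `sma_5` and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": np.arange(10, dtype=float)})
period = 5

# Stand-in for an indicator: a simple moving average over `period` rows.
# mode="valid" only returns values where the full window is available.
sma = np.convolve(df["close"].to_numpy(), np.ones(period) / period, mode="valid")

print(len(df), len(sma))  # 10 rows in, 10 - (5 - 1) = 6 values out

error_message = None
try:
    # A bare ndarray carries no index, so pandas cannot align it.
    df["sma_5"] = sma
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The assignment fails because pandas requires a plain array to have exactly one value per row.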

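Unfortunately, unlike TA-Lib, this tulip library does not provide NaN values for those first days...

Is there an easy way to prepend these NaNs to the ndarray? Or to insert the values into the df at a certain index and have it automatically create NaNs for the rows before that?

Thanks in advance, I've been researching this for days!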

Answer 1

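A complete MCVE

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(1, range(10), list('ABC'))

    # a: stand-in for the shorter indicator output; b: the missing leading rows
    a = np.full((len(df) - 6, df.shape[1]), 2)
    b = np.full((6, df.shape[1]), np.nan)

    # Stack the NaN block on top of the values (np.row_stack is a deprecated alias of np.vstack)
    c = np.vstack([b, a])

    d = pd.DataFrame(c, df.index, df.columns)
    d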

     A    B    C
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
4  NaN  NaN  NaN
5  NaN  NaN  NaN
6  2.0  2.0  2.0
7  2.0  2.0  2.0
8  2.0  2.0  2.0
9  2.0  2.0  2.0
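
For a single 1-D indicator array, the same idea can be wrapped in a small helper. This is only a sketch: `pad_front_nan` is a hypothetical name, not something tulipy or pandas provides.

```python
import numpy as np

def pad_front_nan(values, length):
    """Left-pad a 1-D array with NaN so that it has `length` elements."""
    values = np.asarray(values, dtype=float)
    missing = length - len(values)
    if missing < 0:
        raise ValueError("values is longer than the target length")
    # NaN block first, then the real values, so the data stays right-aligned.
    return np.concatenate([np.full(missing, np.nan), values])

padded = pad_front_nan([1.0, 2.0, 3.0], 5)
print(padded)
```

With that in place, something like `df['mdi_14'] = pad_front_nan(mdi14, len(df))` should work regardless of how many leading rows the indicator consumed.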

Answer 2

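Maybe do the shift yourself in the code?

    import numpy as np

    period = 14
    pdi14, mdi14 = ti.di(
        high=highData, low=lowData, close=closeData, period=period
    )

    # Pre-fill the column with NaN, then write the values into the trailing rows
    # (the first period - 1 rows have no indicator value). A single .iloc
    # assignment avoids the chained indexing of df['mdi_14'][period - 1:] = ...
    df['mdi_14'] = np.nan
    df.iloc[period - 1:, df.columns.get_loc('mdi_14')] = mdi14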

I hope they will pad the leading values with NaN in the library in the future. Leaving time series data like this without any labels is dangerous.
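
One way to attach labels is to let pandas do the alignment: wrap the array in a Series indexed by the last rows of the DataFrame, and the missing leading rows become NaN on assignment. A rough sketch, assuming a made-up date index and a hard-coded array in place of the real indicator output:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the dates and values are for illustration only.
df = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Stand-in for an indicator output that is shorter than the DataFrame.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Label the values with the trailing index entries; assignment then aligns
# on the index and fills the unmatched leading rows with NaN.
tail = df.index[len(df) - len(values):]
df["indicator"] = pd.Series(values, index=tail)
print(df)
```

This does not depend on knowing the exact offset for each indicator, only on the output being right-aligned with the input. It does assume the index has no duplicate labels.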

Answer 3

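The C version of the Tulip library includes a start function for each indicator (reference: https://tulipindicators.org/usage) that can be used to determine the output length of an indicator given a set of input options. Unfortunately, the Python binding library, tulipy, does not appear to include this functionality. Instead, you have to resort to dynamically reassigning the index values to align the output with the original DataFrame.

Here is an example that uses the price series from the Tulip documentation: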

    import numpy as np
    import pandas as pd
    import tulipy as ti

    # Create the DataFrame with close prices (a list, not a set, so the order is preserved)
    prices = pd.DataFrame(data=[81.59, 81.06, 82.87, 83, 83.61, 83.15, 82.84, 83.99, 84.55,
                                84.36, 85.53, 86.54, 86.89, 87.77, 87.29], columns=['close'])

    # Compute the technical indicator using tulipy and save the result in a DataFrame
    bbands = pd.DataFrame(data=np.transpose(ti.bbands(real=prices['close'].to_numpy(), period=5, stddev=2)))

    # Dynamically realign the index; note from the Tulip documentation that the
    # price/volume data is expected to be ordered "oldest to newest (index 0 is oldest)"
    bbands.index += prices.index.max() - bbands.index.max()

    # Put the indicator values alongside the original DataFrame
    prices[['BBANDS_5_2_low', 'BBANDS_5_2_mid', 'BBANDS_5_2_up']] = bbands
    prices.head(15)

        close  BBANDS_5_2_low  BBANDS_5_2_mid  BBANDS_5_2_up
    0   81.59             NaN             NaN            NaN
    1   81.06             NaN             NaN            NaN
    2   82.87             NaN             NaN            NaN
    3   83.00             NaN             NaN            NaN
    4   83.61       80.530042          82.426      84.321958
    5   83.15       80.987142          82.738      84.488858
    6   82.84       82.533343          83.094      83.654657
    7   83.99       82.471983          83.318      84.164017
    8   84.55       82.417750          83.628      84.838250
    9   84.36       82.435203          83.778      85.120797
    10  85.53       82.511331          84.254      85.996669
    11  86.54       83.142618          84.994      86.845382
    12  86.89       83.536488          85.574      87.611512
    13  87.77       83.870324          86.218      88.565676
    14  87.29       85.288871          86.804      88.319129
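
The index arithmetic above relies on a default integer index. With a date index, the offset that the C library's start function would report can instead be inferred from the lengths. The sketch below makes several assumptions: `rolling_bands` is a hypothetical NumPy stand-in for `ti.bbands` (mean plus or minus a multiple of the population standard deviation), and the dates and short column names are invented.

```python
import numpy as np
import pandas as pd

def rolling_bands(close, period, stddev):
    """Hypothetical stand-in for ti.bbands: returns (lower, middle, upper),
    each period - 1 elements shorter than the input."""
    windows = np.lib.stride_tricks.sliding_window_view(close, period)
    middle = windows.mean(axis=1)
    spread = stddev * windows.std(axis=1)  # population standard deviation
    return middle - spread, middle, middle + spread

prices = pd.DataFrame(
    {"close": [81.59, 81.06, 82.87, 83.00, 83.61, 83.15, 82.84, 83.99]},
    index=pd.date_range("2024-01-01", periods=8, freq="D"),  # made-up dates
)

outputs = rolling_bands(prices["close"].to_numpy(), period=5, stddev=2)

# Number of leading rows consumed by the lookback, inferred from the lengths.
offset = len(prices) - len(outputs[0])

# Label the outputs with the trailing part of the original index, then join;
# the first `offset` rows get NaN in the new columns.
bands = pd.DataFrame(
    np.column_stack(outputs),
    index=prices.index[offset:],
    columns=["low", "mid", "up"],
)
prices = prices.join(bands)
print(prices)
```

Swapping `rolling_bands` for the real `ti.bbands` call should leave the rest unchanged, since only the lengths of the outputs are used.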

Python Pandas DataFrame: length of index does not match - df['column'] = ndarray

3 answers: