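I ran into some strange behavior while doing calculations on a large amount of data in a pandas DataFrame stored in an HDF file. The dataset looks like this: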
import pandas as pd

# Open the HDF5 store and load the DataFrame saved under the key 'rs'
with pd.HDFStore('datasets/eurusd.h5') as store:
    df = store['rs']
print(df)
ask_close ask_open ask_high ask_low bid_close bid_open bid_high bid_low
2011-01-02 17:00:12 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:13 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:14 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:15 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:16 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:17 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:18 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:19 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:20 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:21 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:22 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:23 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:24 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:25 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:26 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:27 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:28 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:29 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:30 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:31 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:32 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:33 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:34 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:35 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:36 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:37 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:38 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:39 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:40 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
2011-01-02 17:00:41 1.3345 1.3345 1.3345 1.3345 1.3348 1.3348 1.3348 1.3348
... ... ... ... ... ... ... ... ...
2018-06-08 16:59:21 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:22 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:23 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:24 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:25 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:26 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:27 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:28 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:29 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:30 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:31 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:32 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:33 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:34 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:35 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:36 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:37 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:38 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:39 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:40 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:41 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:42 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:43 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:44 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:45 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:46 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:47 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:48 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:49 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
2018-06-08 16:59:50 1.1767 1.1767 1.1767 1.1767 1.1771 1.1771 1.1771 1.1771
[234489580 rows x 8 columns]
Because the data is large, we will take only the first 10000 rows. We will also drop every column except the closing prices.
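# Take the first 10000 rows (this requires `store` to still be open)
df = store['rs'][:10000].copy()
# Names of the non-close columns: ask_open, ask_high, ask_low, bid_open, bid_high, bid_low
li = [x + '_' + y for x in ['ask', 'bid'] for y in ['open', 'high', 'low']]
df.drop(li, axis=1, inplace=True)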
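For readers without the HDF file, the same row slicing and column dropping can be tried on a small synthetic frame. This is only a sketch: the constant prices, the 50-row length, and the names `full`, `small`, and `to_drop` are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for store['rs']: 8 price columns on a 1-second index.
cols = [f"{side}_{kind}" for side in ("ask", "bid")
        for kind in ("close", "open", "high", "low")]
idx = pd.to_datetime("2011-01-02 17:00:12") + pd.to_timedelta(np.arange(50), unit="s")
full = pd.DataFrame(1.3345, index=idx, columns=cols)

# Keep the first 10 rows, then drop every column that is not a closing price.
small = full.iloc[:10].copy()
to_drop = [c for c in small.columns if not c.endswith("_close")]
small = small.drop(columns=to_drop)

print(small.columns.tolist())
```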
Then we will create 3 new columns.
df['askbid'] = df.ask_close / df.bid_close
df['ask'] = df.ask_close.pct_change()
df['bid'] = df.bid_close.pct_change()
print(df)
ask_close bid_close askbid ask
2011-01-02 17:00:12 1.3345 1.3348 0.999775 NaN
2011-01-02 17:00:13 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:14 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:15 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:16 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:17 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:18 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:19 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:20 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:21 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:22 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:23 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:24 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:25 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:26 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:27 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:28 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:29 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:30 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:31 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:32 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:33 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:34 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:35 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:36 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:37 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:38 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:39 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:40 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:41 1.3345 1.3348 0.999775 0.000000
Here is the problem: only 2 of the columns were created.
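One way to tell a column that is really missing from one that simply was not shown in the printed output is to inspect `df.columns` directly, or to render the frame with `to_string()`, which does not wrap or hide columns. The sketch below uses a tiny synthetic frame with made-up prices; whether display is the cause in this case is an assumption, not something established here.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame with the two closing-price columns.
idx = pd.to_datetime("2011-01-02 17:00:12") + pd.to_timedelta(np.arange(5), unit="s")
df = pd.DataFrame(
    {"ask_close": [1.3345, 1.3345, 1.3346, 1.3346, 1.3345],
     "bid_close": [1.3348, 1.3348, 1.3349, 1.3349, 1.3348]},
    index=idx,
)

df["askbid"] = df["ask_close"] / df["bid_close"]
df["ask"] = df["ask_close"].pct_change()
df["bid"] = df["bid_close"].pct_change()

# List the column labels that actually exist, independent of display settings.
column_names = df.columns.tolist()
print(column_names)

# Render all columns without wrapping.
print(df.to_string())
```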
Let's try the same code with the line that creates the df.ask column commented out.
df['askbid'] = df.ask_close / df.bid_close
#df['ask'] = df.ask_close.pct_change()
df['bid'] = df.bid_close.pct_change()
print(df)
ask_close bid_close askbid bid
2011-01-02 17:00:12 1.3345 1.3348 0.999775 NaN
2011-01-02 17:00:13 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:14 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:15 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:16 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:17 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:18 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:19 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:20 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:21 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:22 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:23 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:24 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:25 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:26 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:27 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:28 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:29 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:30 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:31 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:32 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:33 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:34 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:35 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:36 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:37 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:38 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:39 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:40 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:41 1.3345 1.3348 0.999775 0.000000
... ... ... ... ...
2011-01-02 19:46:22 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:23 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:24 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:25 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:26 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:27 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:28 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:29 1.3321 1.3324 0.999775 0.000075
2011-01-02 19:46:30 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:31 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:32 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:33 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:34 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:35 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:36 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:37 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:38 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:39 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:40 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:41 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:42 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:43 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:44 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:45 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:46 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:47 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:48 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:49 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:50 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:51 1.3321 1.3324 0.999775 0.000000
This appears to work correctly.
I thought the pct_change() function might be the problem, so I tried this:
df['askbid'] = df.ask_close / df.bid_close
# Manual percentage change; like pct_change(), divide by the previous value
df['ask'] = (df.ask_close - df.ask_close.shift(1)) / df.ask_close.shift(1)
df['bid'] = (df.bid_close - df.bid_close.shift(1)) / df.bid_close.shift(1)
print(df)
ask_close bid_close askbid ask
2011-01-02 17:00:12 1.3345 1.3348 0.999775 NaN
2011-01-02 17:00:13 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:14 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:15 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:16 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:17 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:18 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:19 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:20 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:21 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:22 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:23 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:24 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:25 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:26 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:27 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:28 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:29 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:30 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:31 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:32 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:33 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:34 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:35 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:36 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:37 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:38 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:39 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:40 1.3345 1.3348 0.999775 0.000000
2011-01-02 17:00:41 1.3345 1.3348 0.999775 0.000000
... ... ... ... ...
2011-01-02 19:46:22 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:23 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:24 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:25 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:26 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:27 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:28 1.3320 1.3323 0.999775 0.000000
2011-01-02 19:46:29 1.3321 1.3324 0.999775 0.000075
2011-01-02 19:46:30 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:31 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:32 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:33 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:34 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:35 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:36 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:37 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:38 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:39 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:40 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:41 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:42 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:43 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:44 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:45 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:46 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:47 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:48 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:49 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:50 1.3321 1.3324 0.999775 0.000000
2011-01-02 19:46:51 1.3321 1.3324 0.999775 0.000000
The same problem persists.
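For reference, `pct_change()` computes the change relative to the previous value, so a manual version built on `shift(1)` should divide by the shifted series. The following sketch compares the two on a short synthetic series; the prices and the names `builtin`, `manual`, and `matches` are illustrative only.

```python
import pandas as pd

# Hypothetical closing prices.
prices = pd.Series([1.3320, 1.3320, 1.3321, 1.3321])

builtin = prices.pct_change()
# Manual equivalent: (current - previous) / previous
manual = (prices - prices.shift(1)) / prices.shift(1)

# Compare with a small tolerance; the first row is NaN in both results.
matches = bool(((builtin - manual).abs().fillna(0) < 1e-12).all())
print(matches)
```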
I have no idea why this is happening.