Strange pandas behavior with HDFStore and pct_change()

Date: 2018-06-17 14:52:21

Tags: python pandas hdf5

I ran into some strange behavior while running calculations on a large pandas DataFrame stored in an HDF file. The dataset looks like this:

import pandas as pd

with pd.HDFStore('datasets/eurusd.h5') as store:
    df = store['rs']
    print(df)


                      ask_close ask_open ask_high ask_low bid_close bid_open bid_high bid_low

2011-01-02 17:00:12    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:13    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:14    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:15    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:16    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:17    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:18    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:19    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:20    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:21    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:22    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:23    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:24    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:25    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:26    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:27    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:28    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:29    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:30    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:31    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:32    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:33    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:34    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:35    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:36    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:37    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:38    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:39    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:40    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
2011-01-02 17:00:41    1.3345   1.3345   1.3345  1.3345    1.3348   1.3348   1.3348  1.3348
...                       ...      ...      ...     ...       ...      ...      ...     ...
2018-06-08 16:59:21    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:22    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:23    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:24    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:25    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:26    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:27    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:28    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:29    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:30    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:31    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:32    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:33    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:34    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:35    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:36    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:37    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:38    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:39    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:40    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:41    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:42    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:43    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:44    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:45    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:46    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:47    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:48    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:49    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771
2018-06-08 16:59:50    1.1767   1.1767   1.1767  1.1767    1.1771   1.1771   1.1771  1.1771

[234489580 rows x 8 columns]

Because the dataset is large, we'll work with only the first 10,000 rows, and we'll drop every column except the close prices.

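with pd.HDFStore('datasets/eurusd.h5') as store:
    df = store['rs'][:10000].copy()

# Columns to drop: everything except ask_close and bid_close
li = [x + '_' + y for x in ['ask', 'bid'] for y in ['open', 'high', 'low']]
df.drop(li, axis=1, inplace=True)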
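As an aside, the same result can be reached by selecting the columns to keep instead of listing the ones to drop. The sketch below uses a small made-up frame with the same eight column names as the real data; the values are placeholders, not EUR/USD quotes.

```python
import pandas as pd

# Made-up frame with the same column names as the dataset in the question.
cols = [f"{side}_{field}"
        for side in ["ask", "bid"]
        for field in ["close", "open", "high", "low"]]
demo = pd.DataFrame([[1.0] * len(cols)] * 3, columns=cols)

# Keep only the columns whose name contains "_close".
closes = demo.filter(like="_close").copy()
kept = list(closes.columns)
print(kept)
```

Writing `demo[["ask_close", "bid_close"]].copy()` would do the same thing with the names spelled out.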

Next, we create three new columns.

df['askbid'] = df.ask_close / df.bid_close
df['ask'] = df.ask_close.pct_change()
df['bid'] = df.bid_close.pct_change()
print(df)

                     ask_close bid_close    askbid      ask

2011-01-02 17:00:12    1.3345    1.3348  0.999775       NaN
2011-01-02 17:00:13    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:14    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:15    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:16    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:17    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:18    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:19    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:20    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:21    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:22    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:23    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:24    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:25    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:26    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:27    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:28    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:29    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:30    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:31    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:32    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:33    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:34    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:35    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:36    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:37    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:38    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:39    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:40    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:41    1.3345    1.3348  0.999775  0.000000

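The problem: only two of the three columns appear to have been created. The bid column is missing from the output.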
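One way to tell whether a column is really missing, or merely not shown by print, is to inspect df.columns directly. The post does not confirm which of the two is happening here, so the following is only a diagnostic sketch on made-up prices, not on the EUR/USD file.

```python
import pandas as pd

# Synthetic stand-in for the first rows of the real data (values are made up).
idx = pd.date_range("2011-01-02 17:00:12", periods=5, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame(
    {
        "ask_close": [1.3345, 1.3345, 1.3346, 1.3346, 1.3345],
        "bid_close": [1.3348, 1.3348, 1.3349, 1.3349, 1.3348],
    },
    index=idx,
)

df["askbid"] = df.ask_close / df.bid_close
df["ask"] = df.ask_close.pct_change()
df["bid"] = df.bid_close.pct_change()

# The column index does not depend on how print() lays out the frame.
columns = list(df.columns)
print(columns)

# Print without wrapping or hiding columns, whatever the terminal width.
with pd.option_context("display.expand_frame_repr", False,
                       "display.max_columns", None):
    print(df)
```

If "bid" is listed in df.columns on the real data, the column exists and only the printed representation is incomplete.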

Let's try again with the line that creates the df.ask column commented out.

df['askbid'] = df.ask_close / df.bid_close
#df['ask'] = df.ask_close.pct_change()
df['bid'] = df.bid_close.pct_change()
print(df)

                    ask_close bid_close    askbid       bid

2011-01-02 17:00:12    1.3345    1.3348  0.999775       NaN
2011-01-02 17:00:13    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:14    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:15    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:16    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:17    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:18    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:19    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:20    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:21    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:22    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:23    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:24    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:25    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:26    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:27    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:28    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:29    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:30    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:31    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:32    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:33    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:34    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:35    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:36    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:37    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:38    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:39    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:40    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:41    1.3345    1.3348  0.999775  0.000000
...                       ...       ...       ...       ...
2011-01-02 19:46:22    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:23    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:24    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:25    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:26    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:27    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:28    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:29    1.3321    1.3324  0.999775  0.000075
2011-01-02 19:46:30    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:31    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:32    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:33    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:34    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:35    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:36    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:37    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:38    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:39    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:40    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:41    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:42    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:43    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:44    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:45    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:46    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:47    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:48    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:49    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:50    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:51    1.3321    1.3324  0.999775  0.000000

This appears to work as expected.

I suspected the pct_change() function might be the problem, so I tried this instead:

df['askbid'] = df.ask_close / df.bid_close
df['ask'] = (df.ask_close - df.ask_close.shift(1)) / df.ask_close
df['bid'] = (df.bid_close - df.bid_close.shift(1)) / df.bid_close
print(df)

                    ask_close bid_close    askbid       ask

2011-01-02 17:00:12    1.3345    1.3348  0.999775       NaN
2011-01-02 17:00:13    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:14    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:15    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:16    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:17    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:18    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:19    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:20    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:21    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:22    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:23    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:24    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:25    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:26    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:27    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:28    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:29    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:30    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:31    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:32    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:33    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:34    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:35    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:36    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:37    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:38    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:39    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:40    1.3345    1.3348  0.999775  0.000000
2011-01-02 17:00:41    1.3345    1.3348  0.999775  0.000000
...                       ...       ...       ...       ...
2011-01-02 19:46:22    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:23    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:24    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:25    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:26    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:27    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:28    1.3320    1.3323  0.999775  0.000000
2011-01-02 19:46:29    1.3321    1.3324  0.999775  0.000075
2011-01-02 19:46:30    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:31    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:32    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:33    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:34    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:35    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:36    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:37    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:38    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:39    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:40    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:41    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:42    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:43    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:44    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:45    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:46    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:47    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:48    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:49    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:50    1.3321    1.3324  0.999775  0.000000
2011-01-02 19:46:51    1.3321    1.3324  0.999775  0.000000

The same problem persists.
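A side note on the manual formula: pct_change() divides the change by the previous value, while the expression (x - x.shift(1)) / x divides by the current value, so the two are close but not identical. The sketch below illustrates the difference on a made-up series with deliberately large moves. It does not address the missing column itself.

```python
import numpy as np
import pandas as pd

# Made-up prices with large moves so the difference is easy to see.
s = pd.Series([100.0, 110.0, 99.0])

builtin = s.pct_change()
by_previous = (s - s.shift(1)) / s.shift(1)  # divides by the previous value
by_current = (s - s.shift(1)) / s            # divides by the current value, as in the post

same_as_previous = np.allclose(builtin, by_previous, equal_nan=True)
same_as_current = np.allclose(builtin, by_current, equal_nan=True)
print(same_as_previous, same_as_current)
```

On second-by-second quotes that move by one pip at a time, the two formulas differ only far beyond the displayed digits, which is why the printed values look the same.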

I have no idea why this happens.

0 answers:

No answers yet