pandas: nondeterministic broadcast failure in variable assignment

Time: 2018-02-07 19:00:48

Tags: python pandas

This is a reduced version of a program that converts trade data to OHLCV format.

import pandas as pd

data = pd.DataFrame({ 'time' : [pd.Timestamp('2017-12-26 16:01:04.628431600')], 'price': [100.0], 'size': [0.06] })
data.set_index('time', inplace=True)
data = data.resample('1s').apply({ 'price' : 'ohlc', 'size': 'sum' })

I get the following error:

Traceback (most recent call last):
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/common.py", line 1404, in _asarray_tuplesafe
    result[:] = values
ValueError: could not broadcast input array from shape (4) into shape (1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/tmp.py", line 5, in <module>
    data = data.resample('1s').apply({ 'price' : 'ohlc', 'size': 'sum' })
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/tseries/resample.py", line 293, in aggregate
    result, how = self._aggregate(arg, *args, **kwargs)
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/base.py", line 560, in _aggregate
    result = DataFrame(result)
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 224, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 360, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 5236, in _arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/frame.py", line 5546, in _homogenize
    raise_cast_failure=False)
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/series.py", line 2922, in _sanitize_array
    subarr = _asarray_tuplesafe(data, dtype=dtype)
  File "/home/jun/.local/lib/python3.5/site-packages/pandas/core/common.py", line 1407, in _asarray_tuplesafe
    result[:] = [tuple(x) for x in values]
ValueError: cannot copy sequence with size 4 to array axis with dimension 1

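This makes no sense. IIUC, the assignment on line 5 just binds a new DataFrame to a variable, so no broadcasting should be needed. Stranger still, the failure is nondeterministic: running the script sometimes raises the error and sometimes does not.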

Am I doing something wrong, or is this a pandas bug? Where does the nondeterminism in this program come from? I am using Python 3.5.3 and pandas 0.18.1.
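For comparison, here is a minimal sketch that builds the same OHLCV row without passing a mixed dict to apply: it aggregates each column separately and joins the results. The names resampled, ohlc, volume and ohlcv are only illustrative. One unconfirmed guess behind it: on Python 3.5 the iteration order of a dict with string keys can differ between interpreter runs because of hash randomization, so the 'ohlc' entry (four output columns) and the 'sum' entry (one output column) may be combined in a different order each time. Whether that is really what triggers the traceback in pandas 0.18.1 is an assumption, not something verified here.

```python
import pandas as pd

data = pd.DataFrame({
    'time': [pd.Timestamp('2017-12-26 16:01:04.628431600')],
    'price': [100.0],
    'size': [0.06],
}).set_index('time')

resampled = data.resample('1s')

# Aggregate each column on its own so no dict ordering is involved.
ohlc = resampled['price'].ohlc()                    # columns: open, high, low, close
volume = resampled['size'].sum().rename('volume')   # one summed column

# Join the two results on the shared 1-second index.
ohlcv = pd.concat([ohlc, volume], axis=1)
print(ohlcv)
```

If this version runs reliably while the dict version still fails intermittently, that would point toward the dict aggregation path rather than the assignment itself.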

0 answers:

No answers