pandas DataFrame error "tuple index out of range"

Time: 2017-04-02 11:41:32

Tags: python pandas numpy dataframe

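I am having trouble backfilling a numpy vector of dates with the current version of pandas. The same code works with an earlier version. Here is my problem: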

The old version (0.7.3) works:

C:\WINDOWS\system32>pip show pandas
Name: pandas
Version: 0.7.3
Summary: Powerful data structures for data analysis and statistics
Home-page: http://pandas.pydata.org
Author: The PyData Development Team
Author-email: pydata@googlegroups.com
License: BSD
Location: c:\program files\python\python27\lib\site-packages
Requires: python-dateutil, numpy

C:\WINDOWS\system32>python
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime as dt
>>> import numpy as np
>>> from pandas import DataFrame
>>> d=np.array([None, None, None, None, dt.now(), None])
>>> b = DataFrame(d)
>>> b.fillna(method='backfill')
                            0
0  2017-04-02 12:21:18.175000
1  2017-04-02 12:21:18.175000
2  2017-04-02 12:21:18.175000
3  2017-04-02 12:21:18.175000
4  2017-04-02 12:21:18.175000
5                        None
>>>

The current version (0.19.2) does not work:

C:\WINDOWS\system32>pip show pandas
Name: pandas
Version: 0.19.2
Summary: Powerful data structures for data analysis, time series,and statistics
Home-page: http://pandas.pydata.org
Author: The PyData Development Team
Author-email: pydata@googlegroups.com
License: BSD
Location: c:\program files\python\python27\lib\site-packages
Requires: pytz, python-dateutil, numpy


C:\WINDOWS\system32>python
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime as dt
>>> import numpy as np
>>> from pandas import DataFrame
>>> d=np.array([None, None, None, None, dt.now(), None])
>>> b = DataFrame(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python\Python27\lib\site-packages\pandas\core\frame.py", line 297, in __init__
    copy=copy)
  File "C:\Program Files\Python\Python27\lib\site-packages\pandas\core\frame.py", line 474, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "C:\Program Files\Python\Python27\lib\site-packages\pandas\core\internals.py", line 4256, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Program Files\Python\Python27\lib\site-packages\pandas\core\internals.py", line 4230, in construction_error
    if block_shape[0] == 0:
IndexError: tuple index out of range
>>>

Am I doing something wrong, or is this a bug in pandas? If it is a bug, how do I report it?

Edit: this was filed as a bug report with pandas and will be fixed in the next minor release (0.19.3).

2 answers:

Answer 0 (score: 2)

DataFrame(d) fails, and I am not sure why, but Series(d) works, so you can do this:

pd.DataFrame({0:d})

That is, explicitly tell pandas that d is a Series (column) named 0, which is what the ancient 0.7 version did implicitly.
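As a rough sketch of this workaround followed by the backfill from the question: the fixed timestamp stands in for dt.now() so the result is reproducible, and bfill() is used as the method form of fillna(method='backfill'). Both choices are assumptions made for illustration, not part of the original answer.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Fixed timestamp (an assumption for reproducibility; the question uses dt.now()).
stamp = datetime(2017, 4, 2, 12, 21, 18)
d = np.array([None, None, None, None, stamp, None])

# A dict maps the column label 0 to the 1-D array, so pandas builds the
# column the same way Series(d) would instead of treating d as a raw block.
b = pd.DataFrame({0: d})

# Backfill: each missing value takes the next valid value below it.
filled = b.bfill()
print(filled)
```

The last row stays missing because there is no later value to fill it from, which matches the 0.7.3 output in the question.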

If you do want to report the bug, you can simply say that this works:

pd.DataFrame([None, None, datetime.datetime.now()])

but this fails:

pd.DataFrame([None, None, None, datetime.datetime.now()])
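To see whether a given installation is affected, the two constructions can be run side by side. This is only a sketch: try_construct is a hypothetical helper name, and it catches just the IndexError shown in the traceback above. On a version without the bug both cases should simply report "ok".

```python
from datetime import datetime

import pandas as pd


def try_construct(values):
    """Return 'ok' if DataFrame(values) succeeds, else the IndexError text."""
    try:
        pd.DataFrame(values)
    except IndexError as exc:
        return "IndexError: {}".format(exc)
    return "ok"


now = datetime.now()

# Two leading None values (reported to work) versus three (reported to fail).
results = {n: try_construct([None] * n + [now]) for n in (2, 3)}

print(pd.__version__, results)
```

Including the printed version string and both results in a bug report makes the failing case easy to reproduce.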

Answer 1 (score: 0)

Try specifying (or casting to) the dtype explicitly:

In [18]: d=np.array([None, None, None, None, pd.datetime.now(), None])

In [19]: b = DataFrame(d.astype('datetime64[ms]'))

In [20]: b
Out[20]:
                        0
0                     NaT
1                     NaT
2                     NaT
3                     NaT
4 2017-04-02 20:34:20.381
5                     NaT

In [21]: b.bfill()
Out[21]:
                        0
0 2017-04-02 20:34:20.381
1 2017-04-02 20:34:20.381
2 2017-04-02 20:34:20.381
3 2017-04-02 20:34:20.381
4 2017-04-02 20:34:20.381
5                     NaT
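The session above can be sketched as a standalone script. The fixed timestamp replaces pd.datetime.now() so the values are reproducible, which is an assumption for illustration; the cast and the bfill() call follow the answer.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Fixed timestamp (an assumption; the answer uses the current time).
stamp = datetime(2017, 4, 2, 20, 34, 20, 381000)
d = np.array([None, None, None, None, stamp, None])

# Casting the object array to a datetime dtype turns each None into NaT,
# so pandas receives a typed array rather than mixed Python objects.
typed = d.astype('datetime64[ms]')

b = pd.DataFrame(typed)
filled = b.bfill()

print(b[0].dtype)
print(filled)
```

Depending on the pandas version, the column may keep millisecond resolution or be converted to nanoseconds; the filled values are the same either way.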