I am getting the following error:
exportStore.append(key, hdfStoreLocal, index = False, data_columns = True)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 911, in append
**kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 1270, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3605, in write
**kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3293, in create_axes
raise e
ValueError: invalid itemsize in generic type tuple
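Any ideas on why this might be happening? This is a fairly large project, so I'm not sure what code I can provide, but it happens on the very first append. Any help is greatly appreciated.
EDIT:
show_versions output: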
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.14.1
nose: None
Cython: 0.20.2
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: 1.2.2
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.8
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None
info output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 61500 entries, 0 to 61499
Data columns (total 48 columns):
Sequential_Code_1 61500 non-null float64
Age_1 61500 non-null float64
Sex_1 61500 non-null object
Race_1 61500 non-null object
Ethnicity_1 61500 non-null object
Principal_Code_1 61500 non-null object
Admitting_Code_1 61500 non-null object
Principal_Code_2 61500 non-null object
Other_Codes_1 61500 non-null object
Other_Codes_2 61500 non-null object
Other_Codes_3 61500 non-null object
Other_Codes_4 61500 non-null object
Other_Codes_5 61500 non-null object
Other_Codes_6 61500 non-null object
Other_Codes_7 61500 non-null object
Other_Codes_8 61500 non-null object
Other_Codes_9 61500 non-null object
Other_Codes_10 61500 non-null object
Other_Codes_11 61500 non-null object
Other_Codes_12 61500 non-null object
Other_Codes_13 61500 non-null object
Other_Codes_14 61500 non-null object
Other_Codes_15 61500 non-null object
Other_Codes_16 61500 non-null object
Other_Codes_17 61500 non-null object
Other_Codes_18 61500 non-null object
Other_Codes_19 61500 non-null object
Other_Codes_20 61500 non-null object
Other_Codes_21 61500 non-null object
Other_Codes_22 61500 non-null object
Other_Codes_23 61500 non-null object
Other_Codes_24 61500 non-null object
External_Code_1 61500 non-null object
Place_Code_1 61500 non-null object
head output:
head Sequential_Number_1 Age_1 Sex_1 Race_1 \
1128 2.000000e+13 73 F 01
2185 2.000000e+13 52 M 01
2202 2.000000e+13 64 M 01
2283 2.000000e+13 72 F 01
4471 2.000000e+13 62 F 01
Answer 0 (score: 1)
The problem is that you need to specify min_itemsize; see the pandas documentation on storing string columns in an HDFStore.
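It controls the size of the column for string-like columns. By default, the size is taken from the maximum length of the values passed in. If none of the values in a column has any length (every string is empty), that computed size is zero and the write fails (this could probably use a better error message).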
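The reason for specifying it is that you may be appending multiple chunks. Chunk 2 could contain a longer string than anything in chunk 1, which means the column needs to be at least that wide, but looking at chunk 1 alone does not tell you that.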
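As a rough sketch of the multi-chunk case, the widths can be worked out across all chunks first and then passed as min_itemsize on every append. The helper names (string_itemsizes, append_chunks) and the sample values are made up for illustration; the append call mirrors the one in the question and needs PyTables installed. Lengths are counted in characters, which only matches the stored byte size for ASCII data.

```python
import pandas as pd


def string_itemsizes(chunks):
    """Return {column: longest string length} across all chunks (hypothetical helper)."""
    sizes = {}
    for chunk in chunks:
        for col in chunk.columns:
            if not pd.api.types.is_string_dtype(chunk[col]):
                continue  # numeric columns need no itemsize
            lengths = chunk[col].dropna().astype(str).str.len()
            if len(lengths):
                sizes[col] = max(sizes.get(col, 0), int(lengths.max()))
    return sizes


def append_chunks(path, key, chunks, min_itemsize):
    """Append every chunk with the same column widths (requires PyTables)."""
    with pd.HDFStore(path) as store:
        for chunk in chunks:
            store.append(key, chunk, index=False, data_columns=True,
                         min_itemsize=min_itemsize)


# Illustrative data: the second chunk holds a longer code than the first.
chunk1 = pd.DataFrame({"Sex_1": ["F", "M"], "Principal_Code_1": ["4019", "25000"]})
chunk2 = pd.DataFrame({"Sex_1": ["M"], "Principal_Code_1": ["V5861XX"]})

sizes = string_itemsizes([chunk1, chunk2])
print(sizes)
```

With the widths known up front, something like append_chunks("export.h5", "data", [chunk1, chunk2], sizes) would create the table wide enough for the second chunk. If the chunks cannot all be scanned in advance, a fixed upper bound per column serves the same purpose.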
In addition, pre-process the data so that it contains no 0-length strings; use np.nan as the missing value instead, which HDFStore/pandas handles correctly.
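A minimal sketch of that pre-processing step, assuming the empty values are exactly the empty string "" (whitespace-only strings would need a different match). The small frame below is invented for illustration and only borrows column names from the question.

```python
import numpy as np
import pandas as pd

# Illustrative frame: one column is entirely empty strings.
df = pd.DataFrame({
    "Age_1": [73.0, 52.0, 64.0],
    "Sex_1": ["F", "", "M"],
    "Other_Codes_24": ["", "", ""],
})

# Columns where every string has length 0; these are the likely trigger
# for the itemsize error described above.
all_empty = [
    col for col in df.columns
    if pd.api.types.is_string_dtype(df[col])
    and df[col].astype(str).str.len().max() == 0
]
print(all_empty)

# Replace zero-length strings with np.nan before appending to the store.
cleaned = df.replace("", np.nan)
print(cleaned.isna().sum())
```

The cleaned frame can then be appended as before, ideally together with an explicit min_itemsize so that an all-missing column in the first chunk still gets a sensible width.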