Loading a Stata file: Categorical categories must be unique

Time: 2015-08-03 08:04:05

Tags: python pandas

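I am trying to load the .dta file contained in this zip file into pandas, but I immediately get an error. I also have Stata available, but since the error message gives no further details, such as which column is at fault, I don't know what to do.

How can I load the file into pandas?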

>>> df = pd.read_stata('cepr_org_2014.dta')

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.15.2-py2.7-macosx-10.9-x86_64.egg/pandas/io/stata.py", line 69, in read_stata
    order_categoricals)
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.15.2-py2.7-macosx-10.9-x86_64.egg/pandas/io/stata.py", line 1315, in data
    cat_data.categories = categories
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.15.2-py2.7-macosx-10.9-x86_64.egg/pandas/core/categorical.py", line 442, in _set_categories
    categories = self._validate_categories(categories)
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.15.2-py2.7-macosx-10.9-x86_64.egg/pandas/core/categorical.py", line 437, in _validate_categories
    raise ValueError('Categorical categories must be unique')
ValueError: Categorical categories must be unique

1 answer:

Answer 0: (score: 3)

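Load the file with pandas.read_stata('cepr_org_2014.dta', convert_categoricals=False, convert_missing=True) and look at what the data looks like. With convert_categoricals=False, pandas keeps the raw numeric codes instead of applying the Stata value labels, so the uniqueness check that raises the error never runs. Optionally, debug with ipdb: as the error in the question states, the data contains duplicate categories.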
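
To find which label set is at fault without a debugger, one option is to read only the value-label definitions and look for label texts that are assigned to more than one code. The following is a minimal sketch, not part of the original answer. It assumes a reasonably recent pandas in which the reader returned by read_stata(..., iterator=True) can be used as a context manager and exposes value_labels(). The helper names find_duplicate_labels and load_value_labels, and the example label sets, are hypothetical.

```python
from collections import Counter


def find_duplicate_labels(value_labels):
    """Return {label_set_name: [label texts attached to more than one code]}."""
    duplicates = {}
    for name, mapping in value_labels.items():
        # Count how often each label text occurs within this label set.
        counts = Counter(mapping.values())
        repeated = sorted(text for text, n in counts.items() if n > 1)
        if repeated:
            duplicates[name] = repeated
    return duplicates


def load_value_labels(path):
    """Read only the value-label definitions from a .dta file (hypothetical helper)."""
    import pandas as pd  # imported here so the helper above runs without pandas

    with pd.read_stata(path, iterator=True) as reader:
        return reader.value_labels()


# Made-up label sets, shaped like the dict that value_labels() returns:
# {label_set_name: {numeric_code: label_text}}
example = {
    "state": {1: "Alabama", 2: "Alaska", 3: "Alabama"},
    "sex": {1: "Male", 2: "Female"},
}
print(find_duplicate_labels(example))  # {'state': ['Alabama']}
```

For the file in the question, find_duplicate_labels(load_value_labels('cepr_org_2014.dta')) should list the offending label sets. Note that the keys are Stata label-set names, which may differ from the names of the columns that use them.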