Pandas / Jupyter - Using .contains() - Can only use .str accessor with string values

Time: 2018-11-15 21:38:17

Tags: python pandas jupyter-notebook

So I'm doing an assignment with pandas in a Jupyter Notebook.

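The point is to adjust a DataFrame column that holds information about people's degrees. I need to replace the degree entries (strings) with numbers (1 = high school, 2 = technical, 3 = graduate (bachelor's), 4 = postgraduate).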
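For reference, here is a minimal sketch of that mapping as a pure function, assuming the regex patterns from the code below and a default of 0 for missing or unmatched entries (mirroring the `fillna(0)` call). `degree_level` and `LEVEL_PATTERNS` are hypothetical names. The sketch swaps the chain of `.loc` assignments for `numpy.select`, which picks the first condition that is true. The patterns keep the question's order (2, 1, 3, 4): on a first run of the question's code, a row that has already been set to a number no longer matches later patterns, so the first match wins there too, which is why "Undergraduate" ends up as 1 instead of matching "Graduate".

```python
import numpy as np
import pandas as pd

# Patterns copied from the question, in the same order; the first match wins.
# Assumption: the dots in B.S. are escaped here, because the unescaped
# 'B.S.' also matches text such as 'Business'.
LEVEL_PATTERNS = [
    (2, r"Tecnico|Curso T|Technical|Técnico|Technician|Minor|Technologist"),
    (1, r"Undergraduate|High School|Ensino Médio|Ensino Medio|Cursando|Under graduate"),
    (3, r"Bachelor|Bacharel|Licenciatura|B\.S\.|College|Engenheiro|Engenharia"
        r"|Graduate|Ciencia|Ciência|Science|Graduação"),
    (4, r"Master|MBA|Mestrado|Pós|Post|Especialista|Specialist|Specialization|Especialização"),
]


def degree_level(degree: pd.Series) -> pd.Series:
    """Map free-text degree names to levels; 0 when missing or nothing matches."""
    text = degree.astype("string")  # missing values become <NA>, not the text "nan"
    conditions = [
        text.str.contains(pattern, case=False, na=False).to_numpy(dtype=bool)
        for _, pattern in LEVEL_PATTERNS
    ]
    levels = [level for level, _ in LEVEL_PATTERNS]
    # Returns a new Series; the input column is never modified
    return pd.Series(np.select(conditions, levels, default=0), index=degree.index)


sample = pd.Series(["Técnico em Informática", "Undergraduate student",
                    "Bachelor of Science", "MBA", None, "Business Administration"])
print(degree_level(sample).tolist())  # [2, 1, 3, 4, 0, 0]
```

Because the function only reads the text column, something like `formacao['grau'] = degree_level(formacao['degree'])` could be run any number of times with the same result.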

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

formacao = pd.read_csv("bases/formacao.csv")
formacao['grau'] = formacao.degree
formacao.grau.fillna(0, inplace=True)
formacao.loc[formacao.grau.str.contains('Tecnico|Curso T|Technical|Técnico|Technician|Minor|Technologist',case=False,na=False)] = 2
formacao.loc[formacao.grau.str.contains('Undergraduate|High School|Ensino Médio|Ensino Medio|Cursando|Under graduate',case=False,na=False)] = 1
formacao.loc[formacao.grau.str.contains('Bachelor|Bacharel|Licenciatura|B.S.|College|Engenheiro|Engenharia|Graduate|Ciencia|Ciência|Science|Graduação',case=False,na=False)] = 3
formacao.loc[formacao.grau.str.contains('Master|MBA|Mestrado|Pós|Post|Especialista|Specialist|Specialization|Especialização',case=False,na=False)] = 4
formacao.grau.unique()

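That is my code, and the problem is that sometimes it works and sometimes it doesn't. I used this exact code earlier, when it did not yet cover all the results. Then I started adding new strings, and this error appeared: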

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
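The traceback shows the error being raised in the `.str` accessor's validation step (`StringMethods.__init__` calling `_validate`), so it fires as soon as `.str` is accessed, before any pattern is tried. A minimal reproduction with a made-up numeric Series (not the question's data):

```python
import pandas as pd

# A Series holding only numbers, like a degree column that has already been converted
levels = pd.Series([0, 2, 3])

raised = False
try:
    levels.str  # accessing the accessor itself validates the values
except AttributeError as err:
    raised = True
    print("AttributeError:", err)

# Casting to str first makes the .str methods usable again
matches = levels.astype(str).str.contains("2").tolist()
print(matches)  # [False, True, False]
```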

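I closed Jupyter and it worked again. I changed a single letter and got the same error. I know this makes no sense, but I can't hand in the assignment like this: not only because it isn't finished, but also because I can't tell whether my teacher will be able to run the code. What could be the problem?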
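One possible explanation, offered as an assumption rather than a confirmed diagnosis: `formacao.loc[mask] = 2` selects whole rows, so it writes the number into every column of the matching rows, including `degree`. The traceback shows the cell starting at `formacao['grau'] = formacao.degree`, so `read_csv` apparently runs in a separate cell. Re-running only this cell rebuilds `grau` from a `degree` column that already holds numbers, while restarting Jupyter re-reads the CSV. That would also fit the timing described: the error needs every previously matched row to be numeric and no text left over, which becomes likely once the patterns cover all entries. The sketch uses hypothetical helpers (`run_cell`, `run_cell_fixed`, `fresh_frame`), made-up data and only two patterns:

```python
import pandas as pd


def run_cell(df):
    """Hypothetical stand-in for the question's cell, cut down to two patterns."""
    df["grau"] = df["degree"].fillna(0)
    # Row-wide assignment: the number is written into EVERY column of the
    # matching rows, including the source column 'degree'
    df.loc[df["grau"].str.contains("Technician", case=False, na=False)] = 2
    df.loc[df["grau"].str.contains("Bachelor", case=False, na=False)] = 3


def run_cell_fixed(df):
    """Same logic, but only 'grau' is written, so 'degree' keeps its text."""
    df["grau"] = df["degree"].fillna(0)
    df.loc[df["grau"].str.contains("Technician", case=False, na=False), "grau"] = 2
    df.loc[df["grau"].str.contains("Bachelor", case=False, na=False), "grau"] = 3


def fresh_frame():
    # Made-up data; dtype=object mimics how older pandas read text columns from a CSV
    return pd.DataFrame({"degree": ["Technician", "Bachelor of Science"]}, dtype=object)


broken = fresh_frame()
run_cell(broken)                  # first run works: 'degree' still held strings
print(broken["degree"].tolist())  # [2, 3]: the original text is gone

second_run_failed = False
try:
    run_cell(broken)              # re-running the cell: 'grau' is now all numbers
except AttributeError as err:
    second_run_failed = True
    print("second run failed:", err)

fixed = fresh_frame()
run_cell_fixed(fixed)
run_cell_fixed(fixed)             # re-running is harmless
print(fixed["grau"].tolist())     # [2, 3]
```

In the question's code, the corresponding change would be to name the target column in each assignment, e.g. `formacao.loc[mask, 'grau'] = 2`, so that `degree` is never overwritten and the cell can be re-run without reloading the CSV.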

Here is the full traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-4bad11236a95> in <module>
      1 formacao['grau'] = formacao.degree
      2 formacao.grau.fillna(0, inplace=True)
----> 3 formacao.loc[formacao.grau.str.contains('Tecnico|Curso T|Technical|Técnico|Technician|Minor|Technologist',case=False,na=False)] = 2
      4 formacao.loc[formacao.grau.str.contains('Undergraduate|High School|Ensino Médio|Ensino Medio|Cursando|Under graduate',case=False,na=False)] = 1
      5 formacao.loc[formacao.grau.str.contains('Bachelor|Bacharel|Licenciatura|B.S.|College|Engenheiro|Engenharia|Graduate|Ciencia|Ciência|Science|Graduação',case=False,na=False)] = 3

c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   4370         if (name in self._internal_names_set or name in self._metadata or
   4371                 name in self._accessors):
-> 4372             return object.__getattribute__(self, name)
   4373         else:
   4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):

c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
    131             # we're accessing the attribute of the class, i.e., Dataset.geo
    132             return self._accessor
--> 133         accessor_obj = self._accessor(obj)
    134         # Replace the property with the accessor object. Inspired by:
    135         # http://www.pydanny.com/cached-property.html

c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\strings.py in __init__(self, data)
   1893 
   1894     def __init__(self, data):
-> 1895         self._validate(data)
   1896         self._is_categorical = is_categorical_dtype(data)
   1897 

c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\strings.py in _validate(data)
   1915             # (instead of test for object dtype), but that isn't practical for
   1916             # performance reasons until we have a str dtype (GH 9343)
-> 1917             raise AttributeError("Can only use .str accessor with string "
   1918                                  "values, which use np.object_ dtype in "
   1919                                  "pandas")

0 answers