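So I'm doing an assignment in a Jupyter Notebook using pandas.
The goal is to adjust a DataFrame column that holds people's degree information. I need to replace the degree entries (strings) with numbers (1 = high school, 2 = technical, 3 = bachelor's degree, 4 = postgraduate).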
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
formacao = pd.read_csv("bases/formacao.csv")
formacao['grau'] = formacao.degree
formacao.grau.fillna(0, inplace=True)
formacao.loc[formacao.grau.str.contains('Tecnico|Curso T|Technical|Técnico|Technician|Minor|Technologist',case=False,na=False)] = 2
formacao.loc[formacao.grau.str.contains('Undergraduate|High School|Ensino Médio|Ensino Medio|Cursando|Under graduate',case=False,na=False)] = 1
formacao.loc[formacao.grau.str.contains('Bachelor|Bacharel|Licenciatura|B.S.|College|Engenheiro|Engenharia|Graduate|Ciencia|Ciência|Science|Graduação',case=False,na=False)] = 3
formacao.loc[formacao.grau.str.contains('Master|MBA|Mestrado|Pós|Post|Especialista|Specialist|Specialization|Especialização',case=False,na=False)] = 4
formacao.grau.unique()
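One detail of the code above is worth checking. `formacao.loc[mask] = 2` uses only a row indexer, so pandas writes 2 into every column of the matching rows, including `degree` and any other column, not just `grau`. Restricting the assignment to one column needs the column label as the second `.loc` argument. The sketch below shows the difference on a small made-up DataFrame. The `name` column and the data are assumptions used only for illustration.

```python
import pandas as pd

# Made-up toy data; 'name' stands in for any other column in the real CSV.
df = pd.DataFrame({
    "name": ["Ana", "Bruno", "Carla"],
    "degree": ["Technical course", "Bachelor of Science", "MBA"],
})
df["grau"] = df["degree"]

# Row-only indexer: every column of the matching rows is overwritten with 2.
wrong = df.copy()
wrong.loc[wrong["grau"].str.contains("Technical", case=False, na=False)] = 2
print(wrong.loc[0].tolist())  # [2, 2, 2]

# Row mask plus column label: only 'grau' changes.
right = df.copy()
right.loc[right["grau"].str.contains("Technical", case=False, na=False), "grau"] = 2
print(right.loc[0].tolist())  # ['Ana', 'Technical course', 2]
```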
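That's my code. The problem is that sometimes it works and sometimes it doesn't. I used this exact code earlier, when it didn't cover all the values yet. Then I started adding new strings, and this error appeared: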
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
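I closed Jupyter and it worked again. Then I changed a single letter and got the same error. I know this makes no sense, but I can't hand in the assignment like this: not only because it isn't finished, but also because I have no way of knowing whether my teacher will be able to run the code. What could be wrong?
Here is the full traceback: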
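A likely explanation: in Jupyter all cells share one kernel, so `formacao` keeps whatever state the last run left it in. The traceback shows the failing cell starting at `formacao['grau'] = formacao.degree`, which suggests `read_csv` lives in a different cell. Re-running only this cell then operates on a DataFrame whose rows were already overwritten by the previous run (a row-only `.loc[mask] = 2` also writes into `degree`). Depending on the data and pandas version, the column can stop being object dtype, and `.str` then raises. Restarting the kernel re-reads the CSV, which is why it "works again". The sketch below avoids this by never modifying the source column; `encode_degree` and `PATTERNS` are hypothetical names, and the patterns are copied from the question.

```python
import pandas as pd

# Patterns copied from the question. Order matters: later matches overwrite
# earlier ones, just like the original sequence of .loc assignments.
PATTERNS = [
    (2, "Tecnico|Curso T|Technical|Técnico|Technician|Minor|Technologist"),
    (1, "Undergraduate|High School|Ensino Médio|Ensino Medio|Cursando|Under graduate"),
    (3, "Bachelor|Bacharel|Licenciatura|B.S.|College|Engenheiro|Engenharia|Graduate|Ciencia|Ciência|Science|Graduação"),
    (4, "Master|MBA|Mestrado|Pós|Post|Especialista|Specialist|Specialization|Especialização"),
]


def encode_degree(degree: pd.Series) -> pd.Series:
    """Return numeric degree codes (0 = no match) without modifying `degree`."""
    # Work on a string copy so .str is always valid, even if the input
    # already holds numbers or NaN.
    text = degree.fillna("").astype(str)
    codes = pd.Series(0, index=degree.index)
    for code, pattern in PATTERNS:
        mask = text.str.contains(pattern, case=False, na=False)
        codes[mask] = code
    return codes


raw = pd.Series(["Técnico em Informática", "Bachelor of Science", "MBA", None])
print(encode_degree(raw).tolist())  # [2, 3, 4, 0]
```

Used as `formacao['grau'] = encode_degree(formacao['degree'])`, the cell can be re-run any number of times because `degree` is only read. Note that with these patterns in this order, "Undergraduate" also matches "Graduate" in the third pattern and ends up as 3, not 1. Likewise, the dots in `B.S.` are regex wildcards (`B\.S\.` would match them literally).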
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-4bad11236a95> in <module>
1 formacao['grau'] = formacao.degree
2 formacao.grau.fillna(0, inplace=True)
----> 3 formacao.loc[formacao.grau.str.contains('Tecnico|Curso T|Technical|Técnico|Technician|Minor|Technologist',case=False,na=False)] = 2
4 formacao.loc[formacao.grau.str.contains('Undergraduate|High School|Ensino Médio|Ensino Medio|Cursando|Under graduate',case=False,na=False)] = 1
5 formacao.loc[formacao.grau.str.contains('Bachelor|Bacharel|Licenciatura|B.S.|College|Engenheiro|Engenharia|Graduate|Ciencia|Ciência|Science|Graduação',case=False,na=False)] = 3
c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
4370 if (name in self._internal_names_set or name in self._metadata or
4371 name in self._accessors):
-> 4372 return object.__getattribute__(self, name)
4373 else:
4374 if self._info_axis._can_hold_identifiers_and_holds_name(name):
c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
131 # we're accessing the attribute of the class, i.e., Dataset.geo
132 return self._accessor
--> 133 accessor_obj = self._accessor(obj)
134 # Replace the property with the accessor object. Inspired by:
135 # http://www.pydanny.com/cached-property.html
c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\strings.py in __init__(self, data)
1893
1894 def __init__(self, data):
-> 1895 self._validate(data)
1896 self._is_categorical = is_categorical_dtype(data)
1897
c:\users\user\appdata\local\programs\python\python36\lib\site-packages\pandas\core\strings.py in _validate(data)
1915 # (instead of test for object dtype), but that isn't practical for
1916 # performance reasons until we have a str dtype (GH 9343)
-> 1917 raise AttributeError("Can only use .str accessor with string "
1918 "values, which use np.object_ dtype in "
1919 "pandas")
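The last frame shows the check that fails: the `.str` accessor validates the Series and rejects one whose dtype is numeric (for example int64 or float64) rather than object/string. A quick reproduction, independent of the CSV, with made-up values:

```python
import pandas as pd

# A numeric Series, like a degree column that already holds codes.
numeric = pd.Series([0, 2, 3])

try:
    numeric.str.contains("Bachelor")
    raised = False
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)

# Casting to str first makes the .str accessor available again.
as_text = numeric.astype(str)
print(as_text.str.contains("2").tolist())  # [False, True, False]
```

The exact wording of the message differs between pandas versions, but the exception type is the same.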