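I am working with a large CSV file that contains almost nothing but strings. I want to run some statistical tests, such as defining clusters, but for that I need to convert the strings to int. (I am also new to Python, pandas, and scikit-learn.)
Here is my code: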
#replace str as int
df.WORK_TYPE[df.WORK_TYPE == 'aaa']=1
df.WORK_TYPE[df.WORK_TYPE == 'bbb']=2
df.WORK_TYPE[df.WORK_TYPE == 'ccc']=3
df.WORK_TYPE[df.WORK_TYPE == 'ddd']=4
print(df)
This is my error message:
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
This is separate from the ipykernel package so we can avoid doing imports until
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
after removing the cwd from sys.path.
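I don't understand why this error occurs. Could you tell me whether there is another way to convert the text, and whether converting it is mandatory if I want to do the analysis?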
Answer 0 (score: 0)
This looks like a warning, not an error. It is explained better than I could manage here: https://www.dataquest.io/blog/settingwithcopywarning/
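The warning most likely comes from the chained indexing in `df.WORK_TYPE[df.WORK_TYPE == 'aaa'] = 1`, where pandas cannot tell whether the assignment reaches the original DataFrame or a temporary copy. As a rough sketch of one way around it, the whole column can be converted in a single step with `Series.map`. The sample data and the `WORK_TYPE_CODE` column name below are made up for illustration; the mapping is taken from the four assignments in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV.
df = pd.DataFrame({"WORK_TYPE": ["aaa", "bbb", "ccc", "ddd", "aaa"]})

# Mapping copied from the four assignments in the question.
codes = {"aaa": 1, "bbb": 2, "ccc": 3, "ddd": 4}

# A single assignment to a column, with no chained indexing involved.
# WORK_TYPE_CODE is an illustrative name, not something from the question.
df["WORK_TYPE_CODE"] = df["WORK_TYPE"].map(codes)

print(df)
```

Any value that is missing from `codes` becomes NaN, so it is worth checking `df["WORK_TYPE_CODE"].isna().sum()` on the real data. Also keep in mind that the numbers 1 to 4 imply an order, which a clustering algorithm will treat as meaningful.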
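Since you seem to have only a few categories, would you consider using get_dummies? It takes a pd.Series containing categorical data and converts it into dummy variables (1 if the category is present, 0 otherwise). See here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html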
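A minimal sketch of what that could look like, assuming a small made-up DataFrame in place of your CSV. The `prefix` and `dtype=int` arguments are my own choices here; without `dtype=int`, recent pandas versions return True/False columns instead of 1/0.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV.
df = pd.DataFrame({"WORK_TYPE": ["aaa", "bbb", "ccc", "ddd", "aaa"]})

# One indicator column per category: 1 where the row has that category, else 0.
dummies = pd.get_dummies(df["WORK_TYPE"], prefix="WORK_TYPE", dtype=int)

# Swap the original string column for the indicator columns.
df_encoded = pd.concat([df.drop(columns="WORK_TYPE"), dummies], axis=1)

print(df_encoded)
```

Unlike replacing the strings with 1, 2, 3, 4, this does not impose an order on the categories, which usually suits distance-based clustering better.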