Question

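I am working with large CSV files that contain almost nothing but strings. I want to run some statistical analyses, such as defining clusters, but for that I need to convert the strings to int. (I am also new to Python, pandas, and scikit-learn.)

Here is my code: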

#replace str as int
df.WORK_TYPE[df.WORK_TYPE == 'aaa']=1
df.WORK_TYPE[df.WORK_TYPE == 'bbb']=2
df.WORK_TYPE[df.WORK_TYPE == 'ccc']=3
df.WORK_TYPE[df.WORK_TYPE == 'ddd']=4
print(df)

Here is my error message:

C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame 

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.

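I don't understand why I get this error. Could you tell me whether there is another way to convert the text, and/or whether converting it is mandatory if I want to do the analysis?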

Answer 1

This looks like a warning, not an error. It is explained here better than I could manage: https://www.dataquest.io/blog/settingwithcopywarning/
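The warning is triggered by the chained indexing in `df.WORK_TYPE[df.WORK_TYPE == 'aaa'] = 1`: pandas cannot tell whether the assignment lands on the original frame or on a temporary copy. One way to sidestep it, sketched below, is to build the integer codes in a single step with `Series.map`. The sample frame, the `codes` dictionary, and the `WORK_TYPE_CODE` column name are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV.
df = pd.DataFrame({"WORK_TYPE": ["aaa", "bbb", "ccc", "ddd", "aaa"]})

# Mapping taken from the values used in the question.
codes = {"aaa": 1, "bbb": 2, "ccc": 3, "ddd": 4}

# Write the result to a new column (name is an assumption) in one
# assignment on the frame itself, so no chained indexing is involved.
# Labels missing from `codes` would become NaN.
df["WORK_TYPE_CODE"] = df["WORK_TYPE"].map(codes)

print(df)
```

If the values must be changed in place row by row instead, a single `df.loc[mask, "WORK_TYPE"] = value` call selects rows and column together and avoids the chained form.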

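Since you seem to have only a few categories, would you consider using get_dummies? It takes a pd.Series containing categorical data and converts it into dummy variables (1 if the category is present, 0 otherwise). See: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html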
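A minimal sketch of the get_dummies suggestion, assuming a small made-up frame with the same `WORK_TYPE` labels as in the question. The `prefix` value and the `df_encoded` name are illustrative choices.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV.
df = pd.DataFrame({"WORK_TYPE": ["aaa", "bbb", "ccc", "ddd", "aaa"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(df["WORK_TYPE"], prefix="WORK_TYPE", dtype=int)

# Attach the indicator columns to the original frame.
df_encoded = df.join(dummies)

print(df_encoded)
```

Unlike the codes 1 to 4, dummy variables do not imply an ordering between categories, which tends to matter for distance-based methods such as clustering.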
