Question

I'm using Pandas to process a CSV file for machine learning.

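The CSV file contains a column of labels written in English, such as 'math' and 'literature'. I want to map these labels to integers, like 'math': 1, 'literature': 2. How can I do this with pandas?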

Answer 1

You can pass a dict with the strings as keys ('math', etc.) and the integers as values to the map method. For example:

>>> df
   x     subject
0  1        math
1  2  literature
2  3        math
3  4        math
4  5     science

>>> df['num'] = df.subject.map({'math':0,'literature':1,'science':2})
>>> df
   x     subject  num
0  1        math    0
1  2  literature    1
2  3        math    0
3  4        math    0
4  5     science    2
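
One thing to watch for with map: any label that is not a key in the dict comes back as NaN, which also turns the column into floats. Here is a minimal sketch of how you might detect and handle that, assuming the same column names as above. The incomplete mapping and the -1 sentinel are just illustrative choices, not something from your data:

```python
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "subject": ["math", "literature", "math", "math", "science"],
})

# Hypothetical mapping that deliberately leaves out 'science'.
mapping = {"math": 0, "literature": 1}

mapped = df["subject"].map(mapping)

# Labels missing from the dict show up as NaN; list them so they can be fixed.
unmapped = df.loc[mapped.isna(), "subject"].unique()
print(list(unmapped))

# One option (an assumption, pick whatever suits your model):
# use -1 as a sentinel for unknown labels and cast back to int.
df["num"] = mapped.fillna(-1).astype(int)
print(df)
```

If every label is supposed to be covered, checking that `unmapped` is empty before training is a cheap sanity test.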

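You can also use factorize to accomplish the same thing, but you won't control the mapping from strings to integers (although in this example it ends up being the same):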

>>> df['num'] = pd.factorize(df.subject)[0]
>>> df
   x     subject  num
0  1        math    0
1  2  literature    1
2  3        math    0
3  4        math    0
4  5     science    2
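
Since factorize picks the integers for you, it can help to keep the second element of its return value, which holds the unique labels in order of first appearance. A small sketch, assuming the same toy data as above (the name label_to_code is just for illustration), showing how you could recover the mapping it chose:

```python
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "subject": ["math", "literature", "math", "math", "science"],
})

# factorize returns (codes, uniques); codes[i] is the position of
# row i's label within uniques.
codes, uniques = pd.factorize(df["subject"])
df["num"] = codes

# Rebuild the label -> integer mapping that factorize assigned.
label_to_code = {label: code for code, label in enumerate(uniques)}
print(label_to_code)

# Going the other way: turn codes back into the original labels.
print(list(uniques[codes]))
```

Saving this mapping is useful if you later need to encode new data the same way, for example by passing label_to_code to map.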

Processing a column of string values in a CSV file with Pandas
