Question

I am trying to convert a column of a Pandas DataFrame to int values using a mapping like this (assuming a given DataFrame my_dataframe and a column target_column):

targets = my_dataframe[target_column].unique()
map_to_int = {name: n for n, name in enumerate(targets)}

Using Pandas with Python 3.6, I don't understand why

A)

my_dataframe['Integer-Column'] = map_to_int[my_dataframe[target_column]]

results in

TypeError: 'Series' objects are mutable, thus they cannot be hashed

while

B)

my_dataframe['Integer-Column'] = my_dataframe[target_column].replace(map_to_int)

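works fine.

I would like to understand why this is so. Is there some magic in replace that avoids raising the TypeError, or am I missing something else? I already know that dictionary keys must not be mutable. But I still have a hard time really understanding this, because of the following: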

    words = my_dataframe[target_column].unique()
    # words = ['car' 'bike' 'plain']

    foo = 'car'
    map_to_int[foo] = 0
    foo = 'bike'
    map_to_int["bike"] = 1

Any attempt to help me understand why B) does not run into the trouble that A) does is appreciated.

Answer 1

Your solution does not work because in map_to_int[my_dataframe[target_column]] you are trying to use a pd.Series object as a dictionary key.
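To make this concrete, here is a minimal sketch with made-up data (the "vehicle" column and its values are assumptions for illustration, not the asker's real data). It shows that the lookup in A) fails as soon as the dictionary tries to hash the Series, while looking up one element at a time works because each element is a plain string.

```python
import pandas as pd

# Made-up example data; the column name and values are assumptions.
my_dataframe = pd.DataFrame({"vehicle": ["car", "bike", "plane", "car"]})
target_column = "vehicle"

targets = my_dataframe[target_column].unique()
map_to_int = {name: n for n, name in enumerate(targets)}

# A dict lookup hashes its key first, and a Series refuses to be hashed.
try:
    map_to_int[my_dataframe[target_column]]
    lookup_error = None
except TypeError as exc:
    lookup_error = exc

print(type(lookup_error).__name__)

# Each element of the Series is a str, which is hashable, so this works.
element_wise = [map_to_int[value] for value in my_dataframe[target_column]]
print(element_wise)
```

The exact wording of the error message can differ between pandas versions, but it should be a TypeError in each case.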

In addition, I suggest you use pd.Series.replace only in specific situations; for dictionary mappings you should generally use pd.Series.map. For more details, see Replace values in a pandas series via dictionary efficiently.
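One visible difference between the two methods is how they treat values that are missing from the dictionary. The following is a small sketch with invented data (the names colors and partial_map are hypothetical):

```python
import pandas as pd

# Invented data; dtype=object keeps the behavior explicit for mixed results.
colors = pd.Series(["red", "green", "blue"], dtype=object)
partial_map = {"red": 0, "green": 1}  # "blue" is deliberately missing

mapped = colors.map(partial_map)        # a value without a mapping becomes NaN
replaced = colors.replace(partial_map)  # a value without a mapping is kept

print(mapped.tolist())
print(replaced.tolist())
```

So map suits the case where the dictionary is meant to cover every value, while replace suits the case where only some values should change.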

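That said, this functionality is already implemented in pandas as Categorical Data. I suggest you use categorical data, which is an efficient and syntactically clean way to map a series of items to integers.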

Here is an example of the dictionary mapping with map:

my_dataframe['Integer-Column'] = my_dataframe[target_column].map(map_to_int)
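For the categorical route mentioned above, a minimal sketch could look like the following. The data is made up, and the code_to_name helper is only there for illustration. Note that astype("category") sorts the categories, so the integer codes may not match the order produced by enumerate over unique(); pd.factorize keeps the order of first appearance instead.

```python
import pandas as pd

# Made-up example data; the column name and values are assumptions.
my_dataframe = pd.DataFrame({"vehicle": ["car", "bike", "plane", "car"]})
target_column = "vehicle"

# Convert to a categorical column and read off the integer codes.
as_category = my_dataframe[target_column].astype("category")
my_dataframe["Integer-Column"] = as_category.cat.codes

# Reverse lookup from integer code back to the original label.
code_to_name = dict(enumerate(as_category.cat.categories))

# Alternative that numbers the labels in order of first appearance.
codes_by_appearance, labels = pd.factorize(my_dataframe[target_column])

print(my_dataframe)
print(code_to_name)
print(list(codes_by_appearance), list(labels))
```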

Answer 2

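Apparently my_dataframe[target_column] is something that Python (3.6) considers mutable, and therefore unhashable. Using something unhashable as a dictionary key raises a TypeError. Hence indexing the dictionary as in map_to_int[my_dataframe[target_column]] raises the error.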

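In version B) the dictionary map_to_int is still used, but the Series is never passed to it as a key. The keys already in the dictionary are immutable representations of what is held in targets. So when the replace function (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html) uses the dictionary, it works with those immutable keys. There is therefore no reason to raise a TypeError, which matches what you observed.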
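The idea can be sketched with a hand-written stand-in for replace. This is not how pandas implements it internally; replace_with_dict is a hypothetical helper that only illustrates the point that the dictionary is walked key by key and the Series itself is never hashed.

```python
import pandas as pd


def replace_with_dict(series, mapping):
    """Hypothetical stand-in for Series.replace(mapping); not pandas' real code."""
    result = series.astype(object).copy()
    for old_value, new_value in mapping.items():
        # Compare against the original series so that a value that was
        # already replaced is never replaced a second time.
        result[series == old_value] = new_value
    return result


# Made-up example data.
vehicles = pd.Series(["car", "bike", "plane", "car"], dtype=object)
map_to_int = {"car": 0, "bike": 1, "plane": 2}

by_hand = replace_with_dict(vehicles, map_to_int)
built_in = vehicles.replace(map_to_int)

print(by_hand.tolist())
print(built_in.tolist())
```

Only the strings stored in the dictionary are ever hashed here; the Series is just compared element-wise against each key.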

TypeError when mapping a DataFrame column using a Python dictionary
