Question

我要替换的数据示例

数据具有以下属性

购买v-high，high，med，low Maint v高，高，中，低门2,3,4,5-更多人数2,4-更多 lug_boot小，中，大安全低，中高

这是我所做的

enter code here
#Buying price generalization 
df["Buying_Price"]=df["Buying_Price"].replace({"vhigh":4})
df["Buying_Price"]=df["Buying_Price"].replace({"high":3})
df["Buying_Price"]=df["Buying_Price"].replace({"med":2})
df["Buying_Price"]=df["Buying_Price"].replace({"low":1})

#Maintanace generalization 
df["Maintanance_price"]=df["Maintanance_price"].replace({"vhigh":4}) 
df["Maintanance_price"]=df["Maintanance_price"].replace({"high":3})   
df["Maintanance_price"]=df["Maintanance_price"].replace({"med":2})
df["Maintanance_price"]=df["Maintanance_price"].replace({"low":1})

#lug_boot generalization 
df["Lug_boot"]=df["Lug_boot"].replace({"small":1})
df["Lug_boot"]=df["Lug_boot"].replace({"med":2})
df["Lug_boot"]=df["Lug_boot"].replace({"big":3})

#Safety Generalization 
df["Safety"]=df["Safety"].replace({"low":1})
df["Safety"]=df["Safety"].replace({"med":2})
df["Safety"]=df["Safety"].replace({"big":3})

print(df.head())

在打印时显示“无法比较类型'ndarray（dtype = int64）'和'str'”

Answer 1

您传递的某个string值中有一个int值，实际上是ndarray个int64值。您在此列中只有int64( here actually ndarray(dtype=int64))类型的数据。请参阅文档pandas.Dataframe.replace()。 replace()尝试将它们与您传递的str值进行比较。

df["Buying_Price"]=df["Buying_Price"].replace({"vhigh":4})

找到所有"vhigh"值，然后与当前包含的值进行比较，将其替换为4。在进行比较时，由于尝试将str数据与int64 ('ndarray(dtype=int64)')

进行比较而失败

一个简单的例子来模拟这一点：

import pandas as pd
import numpy as np

a = np.array([1])
df = pd.DataFrame({"Maintanance_price": a})
df["Maintanance_price"] = df["Maintanance_price"].replace({"a":1})

print(df)

退出：

TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'str'

Answer 2

我面临着同样的问题，对我有用的是将特征的数据类型转换为对象类型。

   train['Some_feature']=train.Some_feature.astype(object)

希望有帮助。

Answer 3

您可以尝试以下代码：

df['Maintanance_price'].replace(to_replace = ['low', 'med','high','vhigh'], value =[1,2,3,4], inplace=True)
df.head()

此外，根据@ouiemboughrra的建议，以防重新运行单元格时检查值是否已转换为数字。

Answer 4

我遇到了这个帖子，因为我刚刚遇到了同样的问题。我试图将 lug_boot 列中的值从字符串 'small'、'med' 和 'big' 映射到整数 1、2 和 3。

这是我从 url 读取数据并将其存储在数据帧中的方法。请注意，网址中的文件格式为 csv。此外，我使用了一个函数来提取数据集的第一行，并将其作为列标题传递给 'names' 参数。

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data',
             header=None,
             names=[list of column names])

当我检查数据及其子集时，我意识到某些列包含 NaN。 NaN 被视为浮点值。所以，我意识到 ndarray(dtype=float64) 可能引用这些 NaN 值的部分。

我的解决方案是调整我的 read_csv，我添加了“dtype=str”作为最后一个参数。这是更新的版本：

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data',
             header=None,
             names=[list of column names], dtype = str)

然后我运行了我的映射函数并且它起作用了。这是最初不起作用的代码，但在将 dtype = str 添加到我的 read_csv 函数后起作用了。

mapping = {'small': 1, 'med':2, 'big': 3}
df.replace({'lug_boot': mapping})

请注意，在您的数据类型中将所有内容作为字符串导入可能并不总是有效。您可能想寻求其他解决方案。例如，您可能想先处理 NaN 或缺失值。

Answer 5

有一个类似的问题，但没有一个答案（到目前为止）对我有用 - 我找到的解决方案是将列转换为 'str' not type ' 对象'。

df['col'] = df['col'].astype('str')

Answer 6

你必须从两边删除空格然后进行替换

 #remove white space at both ends:
#'buying', 'maint','doors','persons','lug_boot','safety','class'
df.buying=df.buying.astype(str).str.strip()
df.maint=df.maint.astype(str).str.strip()
df.persons=df.persons.astype(str).str.strip()
df.lug_boot=df.lug_boot.astype(str).str.strip()
df.safety=df.safety.astype(str).str.strip()
df.class_account=df.class_account.astype(str).str.strip()

Answer 7

此错误的原因是由于您执行单元的次数。第一次执行单元格时，将发生转换。此时，如果您尝试重新运行同一单元格，则PhysicalEvidence和contact的数据类型现在是int而不是字符串。因此，当replace方法运行时，它会查找字符串，但找不到任何东西，因为所有值都已被整数替换。

如果发生这种情况，您只需要再次重新加载数据帧（df）并重新运行单元一次即可。

无法比较类型'ndarray（dtype = int64）'和'str'

7 个答案: