Stuck on TypeError: Cannot compare types 'ndarray(dtype=object)' and 'str'

Time: 2019-04-09 07:55:28

Tags: python python-3.x string pandas numpy

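I have a field stored with object dtype in a DataFrame, and I am trying to replace its low-frequency values with "other" using the following code: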

cols = ['Keyword']

for col in cols:
    val = df_ch[col].value_counts()
    y = val[val < 10000].index
    df_ch[col] = df_ch[col].replace({x: 'other' for x in y})

But I keep getting this error:

TypeError: Cannot compare types 'ndarray(dtype=object)' and 'str'

What am I missing?

This is what the field looks like:

df_ch['Keyword'].head(20)
Out[55]: 
0                 (not provie)
1                 (not provie)
2                    (not set)
3                    (not set)
4                 (not provie)
5                 (not provie)
6                    (not set)
7                    (not set)
8                     keyword1
9                 (not provie)
10                   (not set)
11                   (not set)
12                (not provie)
13                (not provie)
14                   (not set)
15                (not provie)
16                (not provie)
17                     display
18                (not provie)
19                (not provie)
Name: Keyword, dtype: object

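1 Answer:

Answer 0 (score: 0)

IIUC, you want to replace a group's name with Other when that group occurs fewer than a certain number of times.

In your approach, replace is not being used correctly. In this case, pass a dictionary that maps the column name to a dictionary of value replacements: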

df_ch.replace({'Keyword': {x:'other' for x in y}}, inplace=True)
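To make the nested-dictionary form concrete, here is a minimal, self-contained sketch. The frame `df_demo` is hypothetical sample data that mirrors the 20 rows shown in the question, and the threshold is scaled down to 5 so that the small sample has rare values; neither comes from the asker's real data.

```python
import pandas as pd

# Hypothetical stand-in for df_ch, built from the 20 rows shown in the question.
df_demo = pd.DataFrame({
    "Keyword": ["(not provie)"] * 11 + ["(not set)"] * 7 + ["keyword1", "display"]
})

counts = df_demo["Keyword"].value_counts()
rare = counts[counts < 5].index  # labels that occur fewer than 5 times

# Outer key: column name. Inner dict: old value -> new value.
df_demo = df_demo.replace({"Keyword": {label: "other" for label in rare}})

print(df_demo["Keyword"].value_counts())
```

Assigning the result back instead of using inplace=True is only a style choice in this sketch; both forms rely on the same column-to-mapping dictionary.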

Here is another approach that does not need a loop.

First, compute the counts:

x = df_ch['Keyword'].value_counts().reset_index()
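# Column names as produced by pandas < 2.0
# (pandas >= 2.0 names them 'Keyword' and 'count' instead):
#           index  Keyword
# 0  (not provie)       11
# 1     (not set)        7
# 2      keyword1        1
# 3       display        1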
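The column names that reset_index produces here depend on the pandas version, so code that looks them up by name can break after an upgrade. One way around that, sketched below under the assumption that explicit names are acceptable, is to name both columns yourself. The series `keywords` and the frame `counts_df` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample mirroring the column shown in the question.
keywords = pd.Series(
    ["(not provie)"] * 11 + ["(not set)"] * 7 + ["keyword1", "display"],
    name="Keyword",
)

counts_df = (
    keywords.value_counts()
    .rename_axis("Keyword")      # name the index (the distinct values) explicitly
    .reset_index(name="count")   # name the column holding the counts explicitly
)

print(list(counts_df.columns))   # ['Keyword', 'count']
```

With the names fixed this way, a later filter can refer to counts_df['Keyword'] and counts_df['count'] regardless of the installed version.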

Then assign Other to the groups whose count is below a given threshold (5 here).

df_ch.loc[df_ch['Keyword'].isin(x['index'].loc[x['Keyword']<5]), 'Keyword']='Other'
df_ch['Keyword']


0   (not provie)
1   (not provie)
2   (not set)
3   (not set)
4   (not provie)
5   (not provie)
6   (not set)
7   (not set)
8   Other
9   (not provie)
10  (not set)
11  (not set)
12  (not provie)
13  (not provie)
14  (not set)
15  (not provie)
16  (not provie)
17  Other
18  (not provie)
19  (not provie)
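The same result can also be reached without building an intermediate counts frame. The sketch below maps each row to the frequency of its own value and keeps only the frequent ones. The helper name `lump_rare` and its parameters are hypothetical, and leaving missing values untouched is an assumption of this sketch rather than something the question specifies.

```python
import pandas as pd

def lump_rare(series: pd.Series, min_count: int, other: str = "Other") -> pd.Series:
    """Return a copy of `series` where values seen fewer than `min_count` times become `other`."""
    # For every row, look up how often that row's value occurs in the whole series.
    freq = series.map(series.value_counts())
    # Keep frequent values and missing values; overwrite everything else.
    return series.where((freq >= min_count) | series.isna(), other)

# Hypothetical sample mirroring the column shown in the question.
keywords = pd.Series(
    ["(not provie)"] * 11 + ["(not set)"] * 7 + ["keyword1", "display"],
    name="Keyword",
)

result = lump_rare(keywords, min_count=5)
print(result.value_counts())
```

On the real data this would be applied as df_ch['Keyword'] = lump_rare(df_ch['Keyword'], 10000), assuming the 10000 threshold from the question is the one wanted.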