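I am using the code below to replace the string values in a DataFrame column. The replacement values come from a dictionary. The code is as follows: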
def ConvertColumnValuesToDict(df, colName):
    # Build two lookup dictionaries: value -> number and number -> value
    uniqueValues = df[colName].unique()
    colName2Num = {}
    num2ColName = {}
    count = 0
    for value in uniqueValues:
        colName2Num[str(value)] = str(count)
        num2ColName[str(count)] = str(value)
        count = count + 1
    return colName2Num, num2ColName

artifactId2Num, num2ArtifactId = ConvertColumnValuesToDict(Sessions, 'artifact_id')
Sessions['artifact_id'] = Sessions['artifact_id'].astype(str)
Sessions['artifact_id'].replace(artifactId2Num, inplace=True)
The final replace call raises the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-08fcc844456d> in <module>()
1 # Sessions['artifact_id'] = Sessions['artifact_id'].astype(int)
----> 2 Sessions['artifact_id'].replace(artifactId2Num, inplace=True)
/home/prateek/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in replace(self, to_replace, value, inplace, limit, regex, method, axis)
3723
3724 return self.replace(to_replace, value, inplace=inplace,
-> 3725 limit=limit, regex=regex)
3726 else:
3727
/home/prateek/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in replace(self, to_replace, value, inplace, limit, regex, method, axis)
3772 dest_list=value,
3773 inplace=inplace,
-> 3774 regex=regex)
3775
3776 else: # [NA, ''] -> 0
/home/prateek/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in replace_list(self, src_list, dest_list, inplace, regex, mgr)
3257 return block, val
3258
-> 3259 masks = [comp(s) for i, s in enumerate(src_list)]
3260
3261 result_blocks = []
/home/prateek/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in comp(s)
3245 if isnull(s):
3246 return isnull(values)
-> 3247 return _maybe_compare(values, getattr(s, 'asm8', s), operator.eq)
3248
3249 def _cast_scalar(block, scalar):
/home/prateek/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in _maybe_compare(a, b, op)
4617 type_names[1] = 'ndarray(dtype=%s)' % b.dtype
4618
-> 4619 raise TypeError("Cannot compare types %r and %r" % tuple(type_names))
4620 return result
4621
TypeError: Cannot compare types 'ndarray(dtype=object)' and 'str'
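Note: I explicitly convert the dictionary keys and values to str inside the function defined above. I also tried a variant without that explicit conversion, leaving the values as int. That is when I got the following error: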
TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'int64'
Here is my sample dataset:
Session_id artifact_id
A 234
A 123
B 123
B 678
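To make the sample easier to reproduce, it could be built as a DataFrame roughly like this. This is only a sketch: it assumes artifact_id is stored as integers (int64), which matches the second error message but may not match the real data.

```python
import pandas as pd

# Reconstruction of the sample data; artifact_id is assumed to be integer-typed.
Sessions = pd.DataFrame({
    "Session_id": ["A", "A", "B", "B"],
    "artifact_id": [234, 123, 123, 678],
})

print(Sessions)
print(Sessions.dtypes)
```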
The contents of the dictionary are as follows:
{'1':'234','2':'123','3':'678'}
I would like the final dataset to look like this:
Session_id artifact_id
A 1
A 2
B 2
B 3
How can I resolve this issue?
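For reference, one way the desired output might be produced is with Series.map instead of replace. This is a minimal sketch, not a confirmed fix for the error above: it assumes the sample data shown earlier, assumes the lookup dictionary is keyed by the original artifact_id values (the reverse direction of the dictionary printed above), and numbers the values from 1 in order of first appearance. The names artifact_id_to_num and num_to_artifact_id are illustrative only.

```python
import pandas as pd

# Sample data from the question; artifact_id is assumed to be integer-typed.
Sessions = pd.DataFrame({
    "Session_id": ["A", "A", "B", "B"],
    "artifact_id": [234, 123, 123, 678],
})

# Lookup keyed by the ORIGINAL value, numbered from 1 in order of first appearance.
unique_values = Sessions["artifact_id"].unique()
artifact_id_to_num = {value: i for i, value in enumerate(unique_values, start=1)}

# Reverse lookup, in case the original ids need to be recovered later.
num_to_artifact_id = {i: value for value, i in artifact_id_to_num.items()}

# map() looks each element up in the dictionary; unmatched values would become NaN.
Sessions["artifact_id"] = Sessions["artifact_id"].map(artifact_id_to_num)

print(Sessions)
```

pandas also provides pd.factorize, which returns zero-based integer codes together with the unique values, so it may serve the same purpose if zero-based numbering is acceptable.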