I am assigning a scalar value to a new column in pandas:
df[col] = srs[some_index]
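I have a warnings.simplefilter("error", UnicodeWarning) in place to catch what would otherwise be an emitted warning. (I may turn this off once I understand what is happening, if it turns out to be ignorable.)
Here is the traceback I get: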
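For reference, a minimal sketch of what that filter does: it promotes the warning to an exception, which is why a full traceback shows up instead of a one-line message. The function `emit_demo_warning` is a hypothetical stand-in for the pandas call, not part of any library.

```python
import warnings


def emit_demo_warning():
    # Hypothetical stand-in for the code path that emits the warning.
    warnings.warn("comparison failed", UnicodeWarning)
    return "finished"


with warnings.catch_warnings():
    # Same filter as in the question: turn UnicodeWarning into an exception.
    warnings.simplefilter("error", UnicodeWarning)
    try:
        emit_demo_warning()
        raised = False
    except UnicodeWarning:
        raised = True

# Outside the block the previous filters are restored, so the call completes.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UnicodeWarning)
    result = emit_demo_warning()
```

Using `warnings.catch_warnings()` keeps the stricter filter scoped to the block, so it can be limited to the suspect assignment rather than the whole program.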
File "/my_virtualenv/lib/python2.7/site-packages/pandas/core/frame.py", line 2299, in __setitem__
self._set_item(key, value)
File "/my_virtualenv/lib/python2.7/site-packages/pandas/core/frame.py", line 2367, in _set_item
NDFrame._set_item(self, key, value)
File "/my_virtualenv/lib/python2.7/site-packages/pandas/core/generic.py", line 1208, in _set_item
self._data.set(key, value)
File "/my_virtualenv/lib/python2.7/site-packages/pandas/core/internals.py", line 3331, in set
loc = self.items.get_loc(item)
File "/my_virtualenv/lib/python2.7/site-packages/pandas/core/index.py", line 1759, in get_loc
return self._engine.get_loc(key)
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
File "pandas/index.pyx", line 152, in pandas.index.IndexEngine.get_loc (pandas/index.c:3782)
File "pandas/index.pyx", line 178, in pandas.index.IndexEngine._get_loc_duplicates (pandas/index.c:4213)
File "pandas/index.pyx", line 195, in pandas.index.IndexEngine._maybe_get_bool_indexer (pandas/index.c:4469)
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
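FWIW, the data in both df and srs comes from Excel worksheets (loaded with pandas.read_excel()). Presumably (since it doesn't happen with every spreadsheet), there are non-ASCII Unicode characters somewhere in the data. Since I don't know exactly which data is blowing up, I'd like to make the code robust to this situation.
Any suggestions?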
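One way to narrow down which data is involved, sketched under the assumption that the culprit is a byte string containing non-ASCII bytes that gets compared against a unicode label (in Python 2 this is the situation the warning describes). The helper name `find_non_ascii_bytes` and the sample labels are hypothetical.

```python
def find_non_ascii_bytes(labels):
    """Return (position, label) pairs for byte strings that are not pure ASCII.

    Assumption: such labels are what the unicode comparison cannot convert.
    """
    suspects = []
    for pos, label in enumerate(labels):
        if isinstance(label, bytes):  # `bytes` is `str` on Python 2
            try:
                label.decode("ascii")
            except UnicodeDecodeError:
                suspects.append((pos, label))
    return suspects


# Made-up labels for illustration; with real data one could pass
# df.columns or srs.index instead.
labels = [u"price", b"caf\xc3\xa9", b"plain", 3]
suspects = find_non_ascii_bytes(labels)
```

Any label this reports is a candidate for decoding to unicode before it is used as a key.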
EDIT: Other things I've tried:
df[unicode(col)] = srs[unicode(some_index)]
LATER EDIT: Other manifestations that my provided solution does not fix (they produce roughly the same error):
df.ix[df.my_col.astype(unicode).eq(""), "my_col"] = 0.0
(Apparently I need a non-bulleted line here so that the code sample below displays correctly.)
ipdb> type(df.my_col.astype(unicode).iloc[0])
<type 'unicode'>
ipdb> bytes(df.my_col.astype(unicode).iloc[0])
'50000'
'50000'
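That is clearly not a unicode object, which would look like u'50000'.
If this is indeed the problem, is there a simple solution, or is it up to the developers of xlrd?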
Answer 0 (score: 0)
This gets rid of the error, but the solution feels incomplete, and I'd like to better understand the real problem:
df[str(col)] = srs[some_index]
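A possible alternative, offered as a sketch rather than a confirmed fix: instead of coercing only the new key with str(), normalize the existing labels to text so that every comparison is text against text. This assumes the offending labels are UTF-8 encoded byte strings, which the question does not confirm; `to_text` is a hypothetical helper.

```python
def to_text(label, encoding="utf-8"):
    """Decode byte-string labels to text; leave every other label unchanged.

    Assumption: byte-string labels are encoded as `encoding`.
    """
    if isinstance(label, bytes):  # `bytes` is `str` on Python 2
        return label.decode(encoding)
    return label


# Made-up labels for illustration. With a DataFrame the same idea would be:
#     df.columns = [to_text(c) for c in df.columns]
labels = [u"price", b"caf\xc3\xa9", 3]
normalized = [to_text(label) for label in labels]
```

If the labels turn out to use a different encoding, the decode step raises UnicodeDecodeError, which at least points at the exact label that is causing trouble.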