Question

I'm trying to get rid of any problematic strings in the column named Comment in my dataframe headlamp.

The reason is that I'm trying to export the dataframe to Excel using .to_excel(), and it always raises a Unicode error.

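I've read a lot about this online and on this site, but so far I haven't managed to fix it. I tried to solve it with encode(), as in the code below, but I still get the same error.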

import pandas as pd
# part_dataframe and ro are defined elsewhere in the notebook
headlamp = part_dataframe(ro, 'PN 3D', '921')
headlamp['Comment'] = headlamp.Comment.apply(lambda x: x.encode('ascii', 'ignore'))
headlamp['word'] = headlamp.Comment.str.split().apply(lambda x:
    pd.value_counts(x).to_dict())
len(headlamp)

Error:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-57-29454fde650e> in <module>()
  1 headlamp = part_dataframe(ro, 'PN 3D', '921')
----> 2 headlamp['Comment'] = headlamp.Comment.apply(lambda x: x.encode('ascii', 'ignore'))
  3 headlamp['word'] = headlamp.Comment.str.split().apply(lambda x: 
  4 pd.value_counts(x).to_dict())
  5 len(headlamp)

C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\series.pyc in apply(self, func, convert_dtype, args, **kwds)
2218         else:
2219             values = self.asobject
-> 2220             mapped = lib.map_infer(values, f, convert=convert_dtype)
2221 
2222         if len(mapped) and isinstance(mapped[0], Series):

pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:62658)()

<ipython-input-57-29454fde650e> in <lambda>(x)
  1 headlamp = part_dataframe(ro, 'PN 3D', '921')
----> 2 headlamp['Comment'] = headlamp.Comment.apply(lambda x: x.encode('ascii', 'ignore'))
  3 headlamp['word'] = headlamp.Comment.str.split().apply(lambda x: 
  4 pd.value_counts(x).to_dict())
  5 len(headlamp)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 71: ordinal not in range(128)
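A likely root cause, assuming this is Python 2 (the Anaconda2 path and the .pyc file suggest so): calling .encode() on a byte str makes Python 2 first decode it with the default ASCII codec, and that hidden decode is what raises a UnicodeDecodeError instead of an encode error. The Python 3 sketch below performs that decode step explicitly on a made-up comment containing byte 0xb4; the sample text and the helper name implicit_ascii_decode are illustrative only.

```python
# Python 3 sketch. In Python 2, byte_str.encode("ascii", "ignore") first runs
# an implicit byte_str.decode("ascii"); this helper performs that step explicitly.
def implicit_ascii_decode(data):
    """Hypothetical helper mimicking Python 2's hidden decode before .encode()."""
    return data.decode("ascii")

raw = b"Lamp housing can\xb4t be fixed"  # made-up comment containing byte 0xb4

error = None
try:
    implicit_ascii_decode(raw)
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; Python 3 unbinds `exc` after the except block
    print(exc)  # 'ascii' codec can't decode byte 0xb4 in position 16: ordinal not in range(128)
```

The 'ignore' argument only applies to the encoding step, so it cannot suppress a failure that happens during this implicit decode.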

I'm completely lost with this problem, so any help would be greatly appreciated.

I'm using Jupyter IPython.

Answer 1

0xb4 is the acute accent character (´, U+00B4): http://www.fileformat.info/info/unicode/char/00b4/index.htm
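A quick way to confirm what the byte is, assuming the text was saved as Windows-1252 or Latin-1. That is a guess: ASCII decoding stops at the first non-ASCII byte, and 0xB4 cannot start a UTF-8 sequence, so this value is probably not UTF-8. In both of those single-byte encodings, byte 0xB4 decodes to code point U+00B4. The helper decodes_as is hypothetical.

```python
import unicodedata

byte = b"\xb4"

def decodes_as(data, encoding):
    """Hypothetical helper: True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Both Latin-1 and Windows-1252 map byte 0xB4 to code point U+00B4.
latin1_char = byte.decode("latin-1")
cp1252_char = byte.decode("cp1252")

print(hex(ord(latin1_char)), unicodedata.name(latin1_char))  # 0xb4 ACUTE ACCENT
print(decodes_as(byte, "utf-8"))  # False: 0xB4 is a UTF-8 continuation byte
```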

It looks like your input contains non-ASCII characters. Try encoding it as UTF-8 and see if that helps.
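A minimal sketch of getting the Comment values into real Unicode text before exporting, assuming they are byte strings encoded as UTF-8 or, failing that, Windows-1252 (both guesses about the source data). The helper name to_text and the encoding order are hypothetical; values that are not bytes, such as NaN, pass through unchanged.

```python
def to_text(value, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode byte strings with the first codec that works."""
    if isinstance(value, bytes):
        for encoding in encodings:
            try:
                return value.decode(encoding)
            except UnicodeDecodeError:
                continue  # try the next candidate encoding
        # Latin-1 maps every possible byte to a code point, so this never fails.
        return value.decode("latin-1")
    return value  # already text (or NaN / numbers): leave it alone

print(to_text(b"can\xb4t"))        # can´t  (not valid UTF-8, decoded as cp1252)
print(to_text(b"caf\xc3\xa9"))     # café   (valid UTF-8)
```

With pandas this would be applied as headlamp['Comment'] = headlamp['Comment'].apply(to_text) before calling .to_excel(). Trying UTF-8 first is deliberate: genuine non-ASCII UTF-8 rarely decodes cleanly by accident, whereas Windows-1252 and Latin-1 accept almost any byte sequence and would silently mangle real UTF-8 text.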

If you still need it in ASCII, you can try this solution: Convert a Unicode string to a string in Python (containing extra symbols)
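If plain ASCII really is required, one common approach (not necessarily the one in the linked answer) is to decompose accented characters with Unicode NFKD normalization and then drop whatever is still non-ASCII. It needs Unicode text as input, so in Python 2 the byte strings have to be decoded first; to_ascii is a hypothetical helper name.

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: fold accented characters to ASCII, dropping the rest."""
    # NFKD splits e.g. "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with the "ignore" error handler discards combining marks and other non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("caf\u00e9"))    # cafe
print(to_ascii("can\u00b4t"))   # can t  (NFKD turns U+00B4 into a space plus U+0301)
```

Note that this is lossy: characters with no ASCII decomposition (for example CJK text) disappear entirely.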
