unicodecsv reader from a unicode string not working?

Date: 2014-01-31 11:59:15

Tags: python csv unicode

I'm having trouble reading a unicode CSV string into python-unicodecsv:

>>> import unicodecsv, StringIO
>>> f = StringIO.StringIO(u'é,é')
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I'm guessing the problem is in how I'm turning my unicode string into a StringIO file? The example on the python-unicodecsv GitHub page works fine:

>>> import unicodecsv
>>> from cStringIO import StringIO
>>> f = StringIO()
>>> w = unicodecsv.writer(f, encoding='utf-8')
>>> w.writerow((u'é', u'ñ'))
>>> f.seek(0)
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
>>> print row[0], row[1]
é ñ

Trying my code with cStringIO fails, because cStringIO can't accept unicode (why the example works, I have no idea!):

>>> from cStringIO import StringIO
>>> f = StringIO(u'é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I need to accept UTF-8 CSV-formatted input from a web textarea form field, so I can't just read it in from a file.

Any ideas?

1 Answer:

Answer 0 (score: 7)

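unicodecsv reads and decodes byte strings for you. You are passing it unicode strings instead. On output, it encodes your unicode values to byte strings for you, using the configured codec; that is why the GitHub example works, since the writer only ever puts encoded bytes into the cStringIO object.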
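To make the output side concrete: unicodecsv is a Python 2 library, so the following is only a hedged Python 3 analogy using the standard `csv` module, not unicodecsv itself. A text wrapper with `encoding="utf-8"` sits on top of a bytes buffer, so the CSV writer works with text while the buffer only ever receives encoded bytes. The helper name `write_csv_bytes` is hypothetical.

```python
import csv
import io


def write_csv_bytes(rows, encoding="utf-8"):
    """Hypothetical helper: write text rows as CSV and return the encoded bytes."""
    raw = io.BytesIO()  # byte buffer, the analogue of cStringIO.StringIO()
    # The wrapper encodes text to bytes on the way into `raw`;
    # newline="" leaves line endings to the csv module.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    csv.writer(text).writerows(rows)
    text.flush()   # push buffered text through the encoder into `raw`
    text.detach()  # release `raw` so it stays open after the wrapper is gone
    return raw.getvalue()


data = write_csv_bytes([["é", "ñ"]])
print(data)  # b'\xc3\xa9,\xc3\xb1\r\n'
```

The buffer holds UTF-8 bytes, never text, which mirrors what the unicodecsv writer does before the reader in the GitHub example decodes them again.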

In addition, `cStringIO.StringIO` can only handle encoded byte strings, while the pure-Python `StringIO.StringIO` class happily accepts unicode values as if they were byte strings.
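Python 3 makes the byte/text split explicit: `io.BytesIO` holds only bytes and `io.StringIO` holds only text. A rough, hedged Python 3 counterpart of "the reader decodes bytes for you" is to wrap the encoded bytes in `io.BytesIO` and let `io.TextIOWrapper` decode them before the standard `csv` module sees them. This is an analogy, not how unicodecsv is implemented, and `read_csv_bytes` is a hypothetical name.

```python
import csv
import io


def read_csv_bytes(data, encoding="utf-8"):
    """Hypothetical helper: parse CSV from an encoded byte string."""
    # BytesIO holds the raw bytes; TextIOWrapper decodes them with `encoding`.
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(text_stream))


# Encode first, just as the answer does with u'é,é'.encode('utf8').
rows = read_csv_bytes("é,é".encode("utf-8"))
print(rows)  # [['é', 'é']]
```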

The solution is to encode your unicode value before putting it into the StringIO object:

>>> import unicodecsv, StringIO, cStringIO
>>> f = StringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
>>> f = cStringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
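For the textarea use case on Python 3 (a hedged aside, since the question is about Python 2): a submitted form value there typically arrives as an already-decoded `str`, and the standard `csv` module reads text directly. No encode step or unicodecsv is needed; wrapping the string in `io.StringIO` is enough. `parse_textarea_csv` is a hypothetical helper name, and the CRLF line endings in the sample are an assumption about what the browser submits.

```python
import csv
import io


def parse_textarea_csv(text):
    """Hypothetical helper: parse CSV text that is already a decoded str."""
    # newline="" keeps "\r\n" intact so the csv module can treat it as a row break.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_textarea_csv("é,ñ\r\nü,ø\r\n")
print(rows)  # [['é', 'ñ'], ['ü', 'ø']]
```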