Question

When I try to run:

import csv

with open('data.csv', 'rU') as csvfile:
  reader = csv.DictReader(csvfile)
  for row in reader:
    pgd = Player.objects.get_or_create(
      player_name=row['Player'],
      team=row['Team'], 
      position=row['Position']
    )

Most of my data gets created in the database, except for one particular row. When my script reaches that row, I get this error:

ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str). 
It is highly recommended that you instead just switch your application to Unicode strings.

The specific row in the CSV that causes this error is:

>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}

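I've looked at other Stack Overflow threads with the same or similar issues, but most of them aren't specific to using SQLite with Django. Any suggestions?

If it matters, I'm running the script by entering the Django shell with python manage.py shell and copy-pasting it in, rather than calling the script from the command line.

This is the stack trace I get: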

Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte

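Edit: Based on Alastair McCormack's feedback, I decided to just import this entry into my database manually rather than trying to read it from my CSV:

Based on the output in your question, it looks like whoever made the CSV garbled the encoding - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You could try using Windows-1252 instead of utf-8, but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.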
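
The point in that comment can be checked directly. The following is only a minimal sketch, assuming Python 3 and taking the name bytes from the row shown in the question; the variable names are illustrative. It decodes the bytes as Windows-1252 (cp1252) and then tries UTF-8:

```python
# The bytes of the player name, copied from the row shown in the question.
raw = b'FR\xed\x8aD\xed\x8aRIC.ST-DENIS'

# Windows-1252 maps 0xED to "í" and 0x8A to "Š", so decoding succeeds
# but does not recover the intended "É".
as_cp1252 = raw.decode('cp1252')
print(as_cp1252)

# The same bytes are not valid UTF-8: 0xED starts a three-byte sequence,
# and the "D" that follows 0x8A is not a continuation byte.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```

Neither decoding yields É, which is consistent with the data having been damaged before it reached the CSV.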

Answer 1

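I suspect you're using Python 2 - open() returns str, which is just a byte string.

The error is telling you that you need to decode your text to Unicode strings before using it.

The simplest way is to decode each cell: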

with open('data.csv', 'r') as csvfile: # 'U' means Universal line mode and is not necessary
  reader = csv.DictReader(csvfile)
  for row in reader:
    pgd = Player.objects.get_or_create(
      player_name=row['Player'].decode('utf-8'),
      team=row['Team'].decode('utf-8'), 
      position=row['Position'].decode('utf-8')
    )

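This works, but it's ugly to add decodes everywhere, and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Py2.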
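
For comparison, here is a rough sketch of the Python 3 behaviour, assuming a correctly encoded UTF-8 file. The temporary file and its single sample row are made up for illustration and stand in for the real data.csv:

```python
import csv
import os
import tempfile

# Write a small UTF-8 sample so the sketch is self-contained; the row is
# illustrative ("\u00c9" is "É") and stands in for the real data.csv.
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w', encoding='utf-8', newline='') as f:
    f.write('Player,Team,Position\nFR\u00c9D\u00c9RIC.ST-DENIS,BOS,G\n')

# Python 3: open() decodes the bytes, so csv yields str (Unicode) values
# and no per-cell .decode() call is needed.
with open(path, 'r', encoding='utf-8', newline='') as csvfile:
    rows = list(csv.DictReader(csvfile))

os.remove(path)
print(rows[0]['Player'])
```

Passing newline='' follows the csv module's recommendation for file objects, so that the reader handles line endings itself.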

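To get the same functionality in Python 2, use the io module. This gives you an open() method which has an encoding option. Annoyingly, the Python 2.x CSV module is broken with Unicode, so you need to install a backported version: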

pip install backports.csv

To tidy up your code and future-proof it, do:

import io
from backports import csv 

with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
  reader = csv.DictReader(csvfile)
  for row in reader:
    # now every row is automatically decoded from UTF-8
    pgd = Player.objects.get_or_create(
      player_name=row['Player'],
      team=row['Team'], 
      position=row['Position']
    )

Answer 2

Encode the player name in utf-8 by calling .encode('utf-8') on it:

import csv

with open('data.csv', 'rU') as csvfile:
  reader = csv.DictReader(csvfile)
  for row in reader:
    pgd = Player.objects.get_or_create(
      player_name=row['Player'].encode('utf-8'),
      team=row['Team'], 
      position=row['Position']
    )

Answer 3

In Django, decode with latin-1: csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))). It will swallow all the special characters and all the comma exceptions you get with utf-8.
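
A small sketch of why the latin-1 decode never raises, assuming Python 3 and using an in-memory bytes value in place of csv_file.read(). The header line is an assumption based on the column names used in the question:

```python
import csv
import io

# Raw file contents as bytes; the name bytes are the ones from the question,
# and the header line is assumed from the column names used there.
data = b'Player,Team,Position\nFR\xed\x8aD\xed\x8aRIC.ST-DENIS,BOS,G\n'

# latin-1 maps each byte 0x00-0xFF to the code point of the same value,
# so this decode cannot raise UnicodeDecodeError.
text = data.decode('latin-1')

rows = list(csv.DictReader(io.StringIO(text)))
player = rows[0]['Player']
print(repr(player))
```

The decode always succeeds, but the result is only correct if the file really is latin-1. For the row in the question, the name still does not come out as FRÉDÉRIC.ST-DENIS.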

Django encoding error when reading from a CSV
