Question

I have the following code. I am using Python 2.7.

import csv
import sqlite3

conn = sqlite3.connect('torrents.db')
c = conn.cursor()

# Create table
c.execute('''DROP TABLE torrents''')
c.execute('''CREATE TABLE IF NOT EXISTS torrents
             (name text, size long, info_hash text, downloads_count long, 
             category_id text, seeders long, leechers long)''')

with open('torrents_mini.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='|')
    for row in spamreader:
        name = unicode(row[0])
        size = row[1]
        info_hash = unicode(row[2])
        downloads_count = row[3]
        category_id = unicode(row[4])
        seeders = row[5]
        leechers = row[6]
        c.execute('''INSERT INTO torrents (name, size, info_hash, downloads_count,
                   category_id, seeders, leechers) VALUES (?,?,?,?,?,?,?)''',
                   (name, size, info_hash, downloads_count, category_id, seeders, leechers))

conn.commit()
conn.close()

The error message I get is:

Traceback (most recent call last):
  File "db.py", line 15, in <module>
    name = unicode(row[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)

If I don't convert to unicode, the error I get is:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Adding name = row[0].decode('UTF-8') gives me a different error:

Traceback (most recent call last):
  File "db.py", line 27, in <module>
    for row in spamreader:
_csv.Error: line contains NULL byte

The data in the csv file is in the following format:

Tha Twilight New Moon DVDrip 2009 XviD-AMiABLE|694554360|2cae2fc76d110f35917d5d069282afd8335bc306|0|movies|0|1

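Edit: I eventually gave up and did the job with the sqlite3 command-line tool (which was easy). I still don't know what caused the errors, but when sqlite3 imported the csv file in question, it kept printing warnings about an "unescaped character", namely the quote character (").

Thanks to everyone who tried to help.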

Answer 1

Your data is not encoded as ASCII. Use the correct codec for your data.

You can tell Python which codec to use:

unicode(row[0], correct_codec)

Or use the str.decode() method:

row[0].decode(correct_codec)
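As a rough sketch of the decoding step above, assuming the file turns out to be UTF-8: that is only a guess, since byte 0xc3 from the traceback is a typical UTF-8 lead byte but appears in other encodings too. The sample line, ASSUMED_CODEC and decode_row are made up for illustration and are not from the question. The sketch splits raw bytes itself rather than going through csv.reader, so it ignores CSV quoting.

```python
import sqlite3

# Assumption: the file is UTF-8. Swap in whatever codec the file really uses.
ASSUMED_CODEC = 'utf-8'


def decode_row(raw_line, codec=ASSUMED_CODEC, delimiter=b'|'):
    """Split one raw byte line on the delimiter and decode every field.

    Hypothetical helper: a plain split, so CSV quoting rules are ignored.
    """
    fields = raw_line.rstrip(b'\r\n').split(delimiter)
    return [field.decode(codec) for field in fields]


# Made-up line in the question's format; \xc3\xa9 is UTF-8 for an accented e.
sample = (b'Caf\xc3\xa9 Noir 2009 XviD|694554360|'
          b'0000000000000000000000000000000000000000|0|movies|0|1\n')

# Decoding as ASCII fails just like unicode(row[0]) does in the question.
try:
    sample.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

row = decode_row(sample)

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE torrents (name text, size long, info_hash text, '
             'downloads_count long, category_id text, seeders long, '
             'leechers long)')
conn.execute('INSERT INTO torrents VALUES (?,?,?,?,?,?,?)', row)
stored_name = conn.execute('SELECT name FROM torrents').fetchone()[0]
conn.close()
```

Because every field is decoded to a Unicode string before it reaches sqlite3, the "8-bit bytestrings" ProgrammingError should not come up on this path.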

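We cannot tell you what the correct codec is. You will have to consult whoever or whatever you got the file from.

If you cannot figure out which encoding was used, you can use a package like chardet to make an educated guess, but bear in mind that such a library is not fail-proof.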
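A minimal sketch of the chardet approach, assuming the third-party chardet package is installed (the code falls back to "no guess" when it is not). guess_encoding and the sample text are hypothetical names and data for illustration. In real use you would pass in bytes read from the file opened in binary mode.

```python
try:
    import chardet  # third-party package: pip install chardet
except ImportError:
    chardet = None


def guess_encoding(raw_bytes):
    """Return (encoding, confidence) as guessed by chardet.

    Returns (None, 0.0) when chardet is not installed. The encoding can
    also be None when chardet has no guess for the input.
    """
    if chardet is None:
        return None, 0.0
    result = chardet.detect(raw_bytes)
    return result['encoding'], result['confidence']


# Made-up sample; longer inputs generally give chardet more to work with.
sample = u'Caf\xe9 Noir, \xfcber na\xefve r\xe9sum\xe9'.encode('utf-8')
encoding, confidence = guess_encoding(sample)

if encoding is None:
    print('No guess available')
else:
    print('Guessed %s with confidence %.2f' % (encoding, confidence))
```

Treat the result as a hint only: a low confidence value, or a guess that still raises UnicodeDecodeError when you decode with it, means you need to go back to the source of the file.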
