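I'm still learning Python, and as a small project I wrote a script that inserts values from a text file into a sqlite3 database. Some of the names contain odd letters (I guess you'd call them non-ASCII), and the script raises an error whenever they appear. Here is my little script (please also tell me if there's any way it could be more Pythonic):
import sqlite3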
f = open('complete', 'r')
fList = f.readlines()
conn = sqlite3.connect('tpb')
cur = conn.cursor()
for i in fList:
    exploaded = i.split('|')
    eList = (
        (exploaded[1], exploaded[5])
    )
    cur.execute('INSERT INTO magnets VALUES(?, ?)', eList)
conn.commit()
cur.close()
It produces this error:
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\sortinghat.py", line 13, in <module>
    cur.execute('INSERT INTO magnets VALUES(?, ?)', eList)
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
Answer 0 (score: 4)
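To turn the file's contents into unicode, you need to decode them using the encoding the file was written in.
It looks like you're on Windows, so cp1252 is a good bet.
If you got the file from somewhere else, all bets are off.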
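If you really don't know the encoding, one rough way to narrow it down is to try a few candidates in order. The sketch below is only an illustration: the helper name `decode_with_fallback` and the candidate list are my own assumptions, not something from the question. UTF-8 goes first because it is strict and rejects most text written in other encodings, while cp1252 accepts almost any byte and so only makes sense as a last resort.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_name) for the first candidate that decodes raw.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes, try the next
    raise ValueError("none of the candidate encodings could decode the data")


# 0xE9 is "e with acute accent" in cp1252, but it is not valid UTF-8
# when followed by an ASCII character such as "|".
sample = b"caf\xe9|magnet"
text, used = decode_with_fallback(sample)
```

A successful decode does not prove the guess is right, only that the bytes are legal in that encoding, so it is worth eyeballing a few of the decoded names before inserting everything.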
Once you've sorted out the encoding, an easy way to decode is to use the codecs module, for example:
import codecs
# ...
with codecs.open('complete', encoding='cp1252') as fin:  # or utf-8 or whatever
    for line in fin:
        fields = line.split('|')  # split once and reuse the result
        to_insert = (fields[1], fields[5])
        cur.execute('INSERT INTO magnets VALUES (?,?)', to_insert)
conn.commit()  # commit once, after all rows are inserted
# ...
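For anyone on Python 3, the built-in `open` takes an `encoding` argument directly, so the same idea works without `codecs`. Here is a minimal, self-contained sketch under some assumptions of mine: the function name `load_magnets`, the column names `name` and `link`, the in-memory database, and the sample line are all made up for the demo. Only the `magnets` table, the `|` separator, and field positions 1 and 5 come from the question.

```python
import os
import sqlite3
import tempfile


def load_magnets(path, conn, encoding="cp1252"):
    """Insert fields 1 and 5 of each '|'-separated line into magnets."""
    rows = []
    with open(path, encoding=encoding) as fin:  # decodes while reading
        for line in fin:
            fields = line.rstrip("\n").split("|")
            if len(fields) > 5:  # skip blank or short lines
                rows.append((fields[1], fields[5]))
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO magnets VALUES (?, ?)", rows)
    return len(rows)


# Demo with made-up data: an in-memory database and a temporary cp1252 file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE magnets (name TEXT, link TEXT)")  # assumed schema
with tempfile.NamedTemporaryFile(
    "w", encoding="cp1252", suffix=".txt", delete=False
) as tmp:
    tmp.write("0|Caf\u00e9|2|3|4|magnet:demo\n")
    sample_path = tmp.name

inserted = load_magnets(sample_path, conn)
os.remove(sample_path)
stored = conn.execute("SELECT name, link FROM magnets").fetchall()
```

Collecting the rows and calling `executemany` inside `with conn:` keeps the whole import in one transaction, which also answers the "more Pythonic" part of the question: either every line goes in or none does.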