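I'm trying to build a chatbot, and every time I run my code I get this:
line 107, in <module>
    for row in f:
  File "/Users/usr/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 102: ordinal not in range(128)
I was going to try adding: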
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
However, I was told that setdefaultencoding is unsafe to use, so I never tried it.
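For context: in Python 3 (the traceback points at python3.6), `reload` is not a builtin and `sys.setdefaultencoding` no longer exists, so the snippet above would fail there anyway. A minimal sketch of the usual alternative, passing an encoding to `open()` explicitly. The sample file and the helper name `read_lines` are hypothetical, and it assumes the data really is UTF-8 text:

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    # errors="replace" swaps undecodable bytes for U+FFFD rather than raising.
    with open(path, encoding=encoding, errors="replace") as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical sample file containing a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("sm\u00f8rrebr\u00f8d\n")
    lines = read_lines(sample)
    print(lines)
```

If the bytes in the file are not text at all, no choice of encoding will make the decoded lines meaningful, so this only helps when the file is what it is expected to be.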
Here is my code:
import sqlite3
import json
from datetime import datetime
timeframe = '2015-01'
sql_transaction = []
connection = sqlite3.connect('/Users/usr/Desktop/fileName/RC_{}'.format(timeframe))
c = connection.cursor()
def create_table():
    c.execute("""CREATE TABLE IF NOT EXISTS parent_reply
    (parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
    comment TEXT, subreddit TEXT, unix INT, score INT)""")
def format_data(data):
    data = data.replace("\n", " newlinechar ").replace("\r", " newlinechar ").replace('"', "'")
    return data
def find_existing_score(pid):
    try:
        sql = "SELECT score FROM parent_reply WHERE parent_id = '{}' LIMIT 1".format(pid)
        c.execute(sql)
        result = c.fetchone()
        if result is not None:
            return result[0]
        else:
            return False
    except Exception as e:
        # print("find_parent", e)
        return False
if __name__ == "__main__":
    create_table()
    row_counter = 0
    paired_rows = 0
    with open("/Users/usr/Desktop/fileName/RC_{}".format(timeframe), buffering=1000) as f:
        for row in f:
            row_counter += 1
            row = json.loads(row)
            parent_id = row['parent_id']
            body = format_data(row['body'])
            created_utc = row['created_utc']
            score = row['score']
            subreddit = row['subreddit']
            comment_id = row['name']
            parent_data = find_parent(parent_id)
            if score >= 2:
                if acceptable(body):
                    existing_comment_score = find_existing_score(parent_id)
                    if existing_comment_score:
                        if score > existing_comment_score:
                            sql_insert_replace_comment(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)
                    else:
                        if parent_data:
                            sql_insert_has_parent(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)
                            paired_rows += 1
                        else:
                            sql_insert_no_parent(comment_id, parent_id, body, subreddit, created_utc, score)
            if row_counter % 100000 == 0:
                print("Total rows read: {}, Paired rows: {}, Time: {}".format(row_counter, paired_rows, str(datetime.now())))
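A side note on `find_existing_score`: it builds its SQL with `str.format`, which breaks as soon as an id contains a quote. A sketch of the same lookup using sqlite3's `?` placeholders. It runs against an in-memory database with a made-up row, and takes the cursor as an argument only so the example stays self-contained:

```python
import sqlite3


def find_existing_score(cursor, pid):
    """Return the stored score for parent_id `pid`, or False if there is none."""
    cursor.execute(
        "SELECT score FROM parent_reply WHERE parent_id = ? LIMIT 1", (pid,)
    )
    result = cursor.fetchone()
    return result[0] if result is not None else False


# In-memory database with a made-up row, for illustration only.
connection = sqlite3.connect(":memory:")
c = connection.cursor()
c.execute(
    """CREATE TABLE IF NOT EXISTS parent_reply
    (parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
    comment TEXT, subreddit TEXT, unix INT, score INT)"""
)
c.execute(
    "INSERT INTO parent_reply (parent_id, comment_id, score) VALUES (?, ?, ?)",
    ("t3_abc", "t1_def", 5),
)
found = find_existing_score(c, "t3_abc")
missing = find_existing_score(c, "t3_o'brien")  # a quote in the id is harmless here
print(found, missing)
connection.close()
```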
RC_2015-01 was extracted from a compressed file named RC_2015-01.bz2. I'm not sure whether that is a problem.
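Extracting the archive first is fine in itself. For what it's worth, Python's standard `bz2` module can also read a `.bz2` file directly in text mode, one line at a time. A small sketch with a made-up two-line archive (the file name and the JSON fields are placeholders, not the real dump):

```python
import bz2
import json
import os
import tempfile

rows = []
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "sample.bz2")  # placeholder name

    # Write two JSON lines into a bz2-compressed file.
    with bz2.open(archive, "wt", encoding="utf-8") as f:
        f.write(json.dumps({"body": "hello", "score": 3}) + "\n")
        f.write(json.dumps({"body": "caf\u00e9", "score": 1}) + "\n")

    # Read it back line by line without extracting it first.
    with bz2.open(archive, "rt", encoding="utf-8") as f:
        for line in f:
            rows.append(json.loads(line))

print(rows)
```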
Answer 0 (score: 0)
I solved it!
timeframe = '2015-01'
sql_transaction = []
connection = sqlite3.connect('/Users/usr/Desktop/fileName/RC_{}.db'.format(timeframe))
c = connection.cursor()
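The problem was that sqlite3.connect was pointed at the same name as the file I was trying to decode, so it would create something that isn't a usable database under that name. With .db added, the database becomes a separate file with its own name.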
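One way to check which kind of file a path actually holds is to look at its first 16 bytes: an SQLite 3 database file starts with the header string `SQLite format 3` followed by a NUL byte. A minimal sketch, with temporary files standing in for the real paths; `is_sqlite_file` is a hypothetical helper, not part of `sqlite3`:

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Return True if the file at `path` begins with the SQLite 3 header."""
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, "RC_sample")    # stands in for the text dump
    db_path = os.path.join(tmp, "RC_sample.db")   # separate name for the database

    with open(dump_path, "w", encoding="utf-8") as f:
        f.write('{"body": "hello"}\n')

    connection = sqlite3.connect(db_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS parent_reply (parent_id TEXT PRIMARY KEY)"
    )
    connection.commit()
    connection.close()

    dump_is_db = is_sqlite_file(dump_path)
    db_is_db = is_sqlite_file(db_path)

print(dump_is_db, db_is_db)
```

Keeping the dump and the database under different names means the `open()` call in the read loop always sees the text file.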