How do I fix "UnicodeDecodeError"?

Time: 2019-05-28 00:27:45

Tags: python json sqlite decode

I'm trying to create a chatbot, and every time I run my code I get this:

  line 107, in <module>
    for row in f:
  File "/Users/usr/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 102: ordinal not in range(128)

I was going to try adding:

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

However, I was told that setdefaultencoding is unsafe to use, so I never tried it.

Here is my code:

import sqlite3
import json
from datetime import datetime

timeframe = '2015-01'
sql_transaction = []

connection = sqlite3.connect('/Users/usr/Desktop/fileName/RC_{}'.format(timeframe))
c = connection.cursor()


def create_table():
    c.execute("""CREATE TABLE IF NOT EXISTS parent_reply
    (parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
     comment TEXT, subreddit TEXT, unix INT, score INT)""")


def format_data(data):
    data = data.replace("\n", " newlinechar ").replace("\r", " newlinechar ").replace('"', "'")
    return data


def find_existing_score(pid):
    try:
        sql = "SELECT score FROM parent_reply WHERE parent_id = '{}' LIMIT 1".format(pid)
        c.execute(sql)
        result = c.fetchone()
        if result is not None:
            return result[0]
        else:
            return False
    except Exception as e:
        # print("find_parent", e)
        return False

if __name__ == "__main__":
    create_table()
    row_counter = 0
    paired_rows = 0


    with open("/Users/usr/Desktop/fileName/RC_{}".format(timeframe), buffering=1000) as f:
        for row in f:
            row_counter += 1
            row = json.loads(row)
            parent_id = row['parent_id']
            body = format_data(row['body'])
            created_utc = row['created_utc']
            score = row['score']
            subreddit = row['subreddit']
            comment_id = row['name']
            parent_data = find_parent(parent_id)

            if score >= 2:
                if acceptable(body):
                    existing_comment_score = find_existing_score(parent_id)
                    if existing_comment_score:
                        if score > existing_comment_score:
                            sql_insert_replace_comment(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)

                    else:
                        if parent_data:
                            sql_insert_has_parent(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)
                            paired_rows += 1
                        else:
                            sql_insert_no_parent(comment_id, parent_id, body, subreddit, created_utc, score)
            if row_counter % 100000 == 0:
                print("Total rows read: {}, Paired rows: {}, Time: {}".format(row_counter, paired_rows, str(datetime.now())))

RC_2015-01 was extracted from a compressed file named RC_2015-01.bz2. I'm not sure whether that is a problem.
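
On the .bz2 point: a .bz2 file is a bzip2-compressed stream rather than a zip archive, and extracting it is not a problem in itself. As a rough sketch, such a dump can also be read without extracting it, with an explicit text encoding so that the ASCII codec from the traceback is never used. The helper name iter_comments and the tiny sample file are made up for illustration, and the sketch assumes the dump is UTF-8 encoded with one JSON object per line.

```python
import bz2
import json
import os
import tempfile


def iter_comments(path, encoding="utf-8"):
    """Yield one parsed JSON object per non-empty line of a bz2-compressed file."""
    # "rt" opens the compressed stream in text mode; passing the encoding
    # explicitly avoids falling back to the platform's default codec.
    with bz2.open(path, "rt", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical sample standing in for RC_2015-01.bz2.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.bz2")
with bz2.open(sample_path, "wt", encoding="utf-8") as f:
    # \u00f8 is a non-ASCII letter, so this row would trip an ASCII decoder.
    f.write(json.dumps({"body": "sm\u00f8rrebr\u00f8d", "score": 3}, ensure_ascii=False) + "\n")

comments = list(iter_comments(sample_path))
print(comments[0]["score"])
```

If the real dump uses a different encoding, the encoding argument would need to change accordingly.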

1 Answer:

Answer 0: (score: 0)

I solved it!

timeframe = '2015-01'
sql_transaction = []

connection = sqlite3.connect('/Users/usr/Desktop/fileName/RC_{}.db'.format(timeframe))
c = connection.cursor()

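The problem was that sqlite3.connect was given the same name as the file I was trying to decode, so it would create a "database" that was not really a database. With .db added, the database becomes its own file with a different name.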
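
The idea can be sketched end to end as follows: keep the raw dump and the SQLite database at two different paths, and open the dump as text with an explicit encoding. This is only an illustration under assumptions. The temporary directory, the sample row, and the choice of UTF-8 are not part of the original setup; the table definition is copied from the question.

```python
import json
import os
import sqlite3
import tempfile

timeframe = "2015-01"
workdir = tempfile.mkdtemp()  # stand-in for /Users/usr/Desktop/fileName

# Two distinct paths: the raw JSON-lines dump and the SQLite database.
data_path = os.path.join(workdir, "RC_{}".format(timeframe))
db_path = os.path.join(workdir, "RC_{}.db".format(timeframe))

# Hypothetical one-row dump containing non-ASCII text (\u00e5 and \u00e6).
sample = {"name": "t1_a", "parent_id": "t3_b", "body": "bl\u00e5b\u00e6r", "score": 2}
with open(data_path, "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")

connection = sqlite3.connect(db_path)
c = connection.cursor()
c.execute("""CREATE TABLE IF NOT EXISTS parent_reply
    (parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT,
     comment TEXT, subreddit TEXT, unix INT, score INT)""")

# An explicit encoding keeps the read independent of the locale default.
with open(data_path, encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        # Placeholders let sqlite3 handle quoting of the values.
        c.execute(
            "INSERT INTO parent_reply (parent_id, comment_id, comment, score) "
            "VALUES (?, ?, ?, ?)",
            (row["parent_id"], row["name"], row["body"], row["score"]),
        )
connection.commit()

stored = c.execute("SELECT comment, score FROM parent_reply").fetchall()
connection.close()
print(len(stored))
```

Because the two paths differ, sqlite3 never touches the dump, and the dump is never opened as a database.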