Error: "Expecting property name enclosed in double quotes: line 2 column 1 (char 2)"

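Time: 2019-02-05 08:25:57

Tags: python json python-3.x

So I'm going to train this ChatBot on a month of reddit comments. The script I'm currently using creates a database and loads it with some of the data from a JSON file.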

When I run the code, the sqlite3 DB does get created, but the errors below are printed. Can anyone tell me how to fix this?

Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
 Extra data: line 1 column 16 (char 15)
 Extra data: line 1 column 8 (char 7)
 Extra data: line 1 column 11 (char 10)
 Extra data: line 1 column 8 (char 7)
 Extra data: line 1 column 9 (char 8)
 Extra data: line 1 column 15 (char 14)
 Extra data: line 1 column 9 (char 8)
 Extra data: line 1 column 10 (char 9)
 Extra data: line 1 column 17 (char 16)
 Extra data: line 1 column 6 (char 5)
 Extra data: line 1 column 12 (char 11)
 Extra data: line 1 column 13 (char 12)
 Extra data: line 1 column 13 (char 12)
 Extra data: line 1 column 26 (char 25)
 Extra data: line 1 column 21 (char 20)
 Extra data: line 1 column 10 (char 9)
 Extra data: line 1 column 16 (char 15)
 Extra data: line 1 column 7 (char 6)
 Extra data: line 1 column 20 (char 19)
 Extra data: line 1 column 16 (char 15)
 Extra data: line 1 column 10 (char 9)
 Expecting value: line 1 column 1 (char 0)

By the way, this is the whole code:

import sqlite3
import json
from datetime import datetime
import time
import ast

timeframe = '2015-01'
sql_transaction = []
start_row = 0
cleanup = 1000000

connection = sqlite3.connect('{}.db'.format(timeframe))
c = connection.cursor()


def create_table():
    c.execute("CREATE TABLE IF NOT EXISTS parent_reply(parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT, comment TEXT, subreddit TEXT, unix INT, score INT)")


def format_data(data):
    data = data.replace('\n', ' newlinechar ').replace('\r', ' newlinechar ').replace('"', "'")
    return data


def transaction_bldr(sql):
    global sql_transaction
    sql_transaction.append(sql)
    if len(sql_transaction) > 1000:
        c.execute('BEGIN TRANSACTION')
        for s in sql_transaction:
            try:
                c.execute(s)
            except:
                pass
        connection.commit()
        sql_transaction = []


def sql_insert_replace_comment(commentid, parentid, parent, comment, subreddit, time, score):
    try:
        sql = """UPDATE parent_reply SET parent_id = "{}", comment_id = "{}", parent = "{}", comment = "{}", subreddit = "{}", unix = {}, score = {} WHERE parent_id = "{}";""".format(
            parentid, commentid, parent, comment, subreddit, int(time), score, parentid)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def sql_insert_has_parent(commentid, parentid, parent, comment, subreddit, time, score):
    try:
        sql = """INSERT INTO parent_reply (parent_id, comment_id, parent, comment, subreddit, unix, score) VALUES ("{}","{}","{}","{}","{}",{},{});""".format(
            parentid, commentid, parent, comment, subreddit, int(time), score)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def sql_insert_no_parent(commentid, parentid, comment, subreddit, time, score):
    try:
        sql = """INSERT INTO parent_reply (parent_id, comment_id, comment, subreddit, unix, score) VALUES ("{}","{}","{}","{}",{},{});""".format(
            parentid, commentid, comment, subreddit, int(time), score)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def acceptable(data):
    if len(data.split(' ')) > 1000 or len(data) < 1:
        return False
    elif len(data) > 32000:
        return False
    elif data == '[deleted]':
        return False
    elif data == '[removed]':
        return False
    else:
        return True


def find_parent(pid):
    try:
        sql = "SELECT comment FROM parent_reply WHERE comment_id = '{}' LIMIT 1".format(pid)
        c.execute(sql)
        result = c.fetchone()
        if result != None:
            return result[0]
        else:
            return False
    except Exception as e:
        # print(str(e))
        return False


def find_existing_score(pid):
    try:
        sql = "SELECT score FROM parent_reply WHERE parent_id = '{}' LIMIT 1".format(pid)
        c.execute(sql)
        result = c.fetchone()
        if result != None:
            return result[0]
        else:
            return False
    except Exception as e:
        # print(str(e))
        return False


if __name__ == '__main__':
    create_table()
    row_counter = 0
    paired_rows = 0

    with open(r'C:\Users\hermans\Desktop\RedditBot\RC_2015-01.json', buffering=1000) as f:
        for row in f:
            # print(row)
            # time.sleep(555)
            row_counter += 1

            if row_counter > start_row:
                try:
                    row = json.loads(row)
                    parent_id = row['parent_id'].split('_')[1]
                    body = format_data(row['body'])
                    created_utc = row['created_utc']
                    score = row['score']

                    comment_id = row['id']

                    subreddit = row['subreddit']
                    parent_data = find_parent(parent_id)

                    existing_comment_score = find_existing_score(parent_id)
                    if existing_comment_score:
                        if score > existing_comment_score:
                            if acceptable(body):
                                sql_insert_replace_comment(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)

                    else:
                        if acceptable(body):
                            if parent_data:
                                if score >= 2:
                                    sql_insert_has_parent(comment_id, parent_id, parent_data, body, subreddit, created_utc, score)
                                    paired_rows += 1
                            else:
                                sql_insert_no_parent(comment_id, parent_id, body, subreddit, created_utc, score)
                except Exception as e:
                    print(str(e))

            if row_counter % 100000 == 0:
                print('Total Rows Read: {}, Paired Rows: {}, Time: {}'.format(row_counter, paired_rows, str(datetime.now())))

            #if row_counter > start_row:
            #    if row_counter % cleanup == 0:
            #        print("Cleanin up!")
            #        sql = "DELETE FROM parent_reply WHERE parent IS NULL"
            #        c.execute(sql)
            #        connection.commit()
            #        c.execute("VACUUM")
            #        connection.commit()

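Edit: I have now tried wrapping the code in try: except:, but I get a new error that I don't understand, one I had actually already run into earlier.

And the JSON file (it contains many more comments than this, but I didn't want to paste 200,000 lines...):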

{
    "score_hidden": false,
    "name": "t1_cnas8zv",
    "link_id": "t3_2qyr1a",
    "body": "Most of us have some family members like this. *Most* of my family is like this. ",
    "downs": 0,
    "created_utc": "1420070400",
    "score": 14,
    "author": "YoungModern",
    "distinguished": null,
    "id": "cnas8zv",
    "archived": false,
    "parent_id": "t3_2qyr1a",
    "subreddit": "exmormon",
    "author_flair_css_class": null,
    "author_flair_text": null,
    "gilded": 0,
    "retrieved_on": 1425124282,
    "ups": 14,
    "controversiality": 0,
    "subreddit_id": "t5_2r0gj",
    "edited": false
} {
    "distinguished": null,
    "id": "cnas8zw",
    "archived": false,
    "author": "RedCoatsForever",
    "score": 3,
    "created_utc": "1420070400",
    "downs": 0,
    "body": "But Mill's career was way better. Bentham is like, the Joseph Smith to Mill's Brigham Young.",
    "link_id": "t3_2qv6c6",
    "name": "t1_cnas8zw",
    "score_hidden": false,
    "controversiality": 0,
    "subreddit_id": "t5_2s4gt",
    "edited": false,
    "retrieved_on": 1425124282,
    "ups": 3,
    "author_flair_css_class": "on",
    "gilded": 0,
    "author_flair_text": "Ontario",
    "subreddit": "CanadaPolitics",
    "parent_id": "t1_cnas2b6"
}

1 Answer:

Answer 0: (score: 2)

  

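"And the JSON file (it contains many more comments than this, but I didn't want to paste 200,000 lines...):"

What you have shown is not valid JSON. Cutting out a bunch of the data lines, we can see the general problem: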

{
    "score_hidden": false,
} {
    "distinguished": null,
}

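The "} {" is there because your data contains multiple JSON objects (as the JSON standard calls them) one after another, rather than nesting them inside another layer (presumably a JSON array, again the standard term). It should look like this: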

[
    {
        "score_hidden": false,
    }, {
        "distinguished": null,
    }
]
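If rewriting the file as one array is not practical, the concatenated objects can also be read as they are. The following is a minimal sketch, not the asker's code: it uses json.JSONDecoder.raw_decode from the standard library, which parses one value and reports where it stopped. The helper name iter_json_objects and the shortened sample string are made up for illustration, and the sketch assumes the text fits in memory, which may not hold for a full month of comments.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: two objects back to back, as in the question's file.
sample = '{\n"score": 14,\n"id": "cnas8zv"\n} {\n"score": 3,\n"id": "cnas8zw"\n}'
comments = list(iter_json_objects(sample))
print([comment["id"] for comment in comments])
```

Each yielded value is an ordinary dict, so the rest of the loop body in the question could work on it unchanged.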

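The errors you are getting are the details of how the JSON parser fails to interpret the invalid JSON (because it is invalid). This becomes clear when you read the error message properly, that is, by looking at the exception traceback. However, the code you wrote prevents you from doing that, by printing only the exception message and then carrying on as if nothing bad had happened: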

try:
    row = json.loads(row)
    # lots more code not relevant to the reported error                    
except Exception as e:
    print(str(e))

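Don't do this. You are only making things harder for yourself. The way to solve problems in your code is to write less code at a time and make sure it works before moving on. This kind of exception handling does the opposite, and leads to posting a lot of code on SO that is unrelated to the problem, because you have lost the relevant guidance :)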
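In that spirit, the parsing step can be checked on its own, without the database code. The sketch below feeds json.loads single lines the way a "for row in f" loop would. The three sample lines are hypothetical: they assume a pretty-printed object with no indentation, which is what the column numbers in the question's error output suggest. The helper name parse_error is also made up.

```python
import json

# Hypothetical sample: lines of a pretty-printed object, unindented,
# as a `for row in f` loop would hand them to json.loads one at a time.
sample_lines = ['{\n', '"distinguished": null,\n', '}\n']


def parse_error(line):
    """Return the JSONDecodeError raised for one line, or None if it parses."""
    try:
        json.loads(line)
    except json.JSONDecodeError as err:
        return err
    return None


errors = [parse_error(line) for line in sample_lines]
for line, err in zip(sample_lines, errors):
    print(repr(line), "->", err)
```

None of these lines is a complete JSON document by itself, so each one fails, and the printed messages should have the same shape as the ones in the question.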

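If you left out that try/except block, your code would bail out as soon as it hit the first error, but it would show you something more useful. It would point to the line row = json.loads(row) and label the error as json.decoder.JSONDecodeError, which is a big hint. More importantly, code that keeps running after something has gone wrong, without actually trying to fix the problem (or at least properly determining that it can safely be ignored), has the chance to corrupt the data further. In the long run that will cause you a lot of pain and suffering, so this is a habit I am trying to break you of now :)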
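If some handling around the parse is still wanted, one option is to catch only the JSON error and re-raise it with context, so that the run stops and the traceback stays intact. This is a sketch under assumptions rather than a prescribed fix: parse_row and its line_number parameter are hypothetical names, and wrapping the error in ValueError is just one possible choice.

```python
import json


def parse_row(raw, line_number):
    """Parse one line of input, adding context if it is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # Re-raise instead of printing and continuing, so the run stops here
        # and the original JSONDecodeError stays attached as the cause.
        raise ValueError(
            "line {}: invalid JSON ({}) in {!r}".format(line_number, err.msg, raw)
        ) from err


row = parse_row('{"id": "cnas8zv", "score": 14}', 1)
print(row["score"])
```

Because only json.JSONDecodeError is caught, unrelated failures such as a missing key or a database error are not swallowed and surface with their own traceback.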