Why does my code produce a "UNIQUE constraint failed: parent_reply.parent_id" error when the code I am following does not?

Asked: 2019-05-02 12:49:06

Tags: python sql database sqlite

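I am currently following sentdex's tutorial on creating a deep learning chatbot with Python and TensorFlow. It uses a one-month dataset of Reddit comments and a sqlite3 database.

My problem is that whenever I try to run my code, I get: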
UNIQUE constraint failed: parent_reply.parent_id

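Although I understand what causes the error, I cannot find why the original code from the tutorial runs seamlessly while mine does not (I have gone through both my code and his from top to bottom and could not find any significant difference).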

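I have tried changing the SQL queries in the insert_parent_exists and insert_no_parent methods from INSERT INTO parent_reply to INSERT OR REPLACE and INSERT OR IGNORE, but both produced a database with False as the parent value (where applicable). I also noticed that commenting out either of those methods in the last else block lets the code run without any errors, but then no paired comments are produced (as far as I understand, if only one of the functions runs, the PRIMARY KEY is never violated, hence no error).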
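For reference, here is a minimal sketch of how the three INSERT variants mentioned above behave when a second row arrives with the same parent_id. It uses an in-memory database and a cut-down version of the parent_reply table; the sample IDs and comment texts are made up for illustration and are not taken from the Reddit dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Cut-down version of the table from the question (assumption: fewer columns).
cur.execute(
    "CREATE TABLE parent_reply (parent_id TEXT PRIMARY KEY, "
    "comment_id TEXT UNIQUE, comment TEXT, score INT)"
)

# The verb is filled in with str.format; the values are bound through "?".
sql = "{} INTO parent_reply (parent_id, comment_id, comment, score) VALUES (?, ?, ?, ?)"

cur.execute(sql.format("INSERT"), ("t1_a", "t1_b", "first reply", 3))

# A plain INSERT with an already used parent_id violates the PRIMARY KEY.
try:
    cur.execute(sql.format("INSERT"), ("t1_a", "t1_c", "second reply", 5))
    error_message = None
except sqlite3.IntegrityError as e:
    error_message = str(e)

# INSERT OR IGNORE silently skips the new row and keeps the existing one.
cur.execute(sql.format("INSERT OR IGNORE"), ("t1_a", "t1_c", "second reply", 5))
after_ignore = cur.execute(
    "SELECT comment_id FROM parent_reply WHERE parent_id = ?", ("t1_a",)
).fetchone()[0]

# INSERT OR REPLACE deletes the conflicting row and inserts the new one.
cur.execute(sql.format("INSERT OR REPLACE"), ("t1_a", "t1_c", "second reply", 5))
after_replace = cur.execute(
    "SELECT comment_id FROM parent_reply WHERE parent_id = ?", ("t1_a",)
).fetchone()[0]

print(error_message)
print(after_ignore, after_replace)
```

Neither variant raises, which is why switching to them hides the error: OR IGNORE keeps whichever reply arrived first, while OR REPLACE keeps whichever arrived last, regardless of score.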

Below I have included my code and sentdex's code (the main script loop and the methods used for database insertion).

My code:

import json
import sqlite3
from datetime import datetime

path = '/Users/MateuszGrzybek/Desktop/DL-Chatbot/data/RC_2015-01'
db_transaction = []


def db_connect(conn, cursor):
    """Create all the necessary tables."""
    try:
        cursor.execute('DROP TABLE IF EXISTS parent_reply;')
        # create table
        print('Creating tables...')
        cursor.execute(
            """CREATE TABLE IF NOT EXISTS parent_reply (parent_id TEXT PRIMARY
            KEY, comment_id TEXT UNIQUE, parent TEXT, comment TEXT,
            subreddit TEXT, unix INT, score INT);""")
    except Exception as error:
        print(error)
    finally:
        if conn is not None:
            print('Table created.')


def replace_comment(parent_id, comment_id, parent_data, body,
                    subreddit, created_utc, score):
    """Replace a comment if it doesn't fit."""
    try:
        query = """UPDATE parent_reply SET parent_id = '{}', comment_id = '{}',
        parent = '{}', comment = '{}', subreddit = '{}', unix = {},
        score = {} WHERE parent_id = '{}';""".format(parent_id, comment_id,
                                                     parent_data, body,
                                                     subreddit,
                                                     int(created_utc), score,
                                                     parent_id)
        transaction_builder(query)
    except Exception as e:
        print(str(e))


def insert_parent_exists(parent_id, comment_id, parent_data, body, subreddit,
                         created_utc, score):
    try:
        query = """INSERT INTO parent_reply (parent_id, comment_id, 
        parent, comment, subreddit, unix, score) VALUES ('{}', '{}',
        '{}', '{}', '{}', {}, {});""".format(parent_id, comment_id,
                                             parent_data, body, subreddit,
                                             int(created_utc), score)
        transaction_builder(query)
    except Exception as e:
        print(str(e))


def insert_no_parent(parent_id, comment_id, body, subreddit,
                     created_utc, score):
    try:
        query = """INSERT INTO parent_reply (parent_id, comment_id,
        comment, subreddit, unix, score) VALUES ('{}', '{}', '{}', '{}',
        {}, {});""".format(parent_id, comment_id, body, subreddit, int(created_utc),
                           score)
        transaction_builder(query)
    except Exception as e:
        print(str(e))


def transaction_builder(query):
    """Build a database transaction"""
    global db_transaction
    db_transaction.append(query)
    if len(db_transaction) > 1000:
        cursor.execute('BEGIN TRANSACTION;')
        for query in db_transaction:
            try:
                cursor.execute(query)
            except Exception as e:
                print(str(e))
        conn.commit()
        db_transaction = []


if __name__ == "__main__":
    conn = sqlite3.connect('2015-01-1.db')
    cursor = conn.cursor()
    db_connect(conn, cursor)
    row_count = 0
    paired_rows = 0

    with open(path, buffering=1000) as f:
        for row in f:
            row_count += 1
            row = json.loads(row)
            body = format_body(row['body'])
            parent_id = row['parent_id']
            score = row['score']
            subreddit = row['subreddit']
            comment_id = row['name']
            created_utc = row['created_utc']
            parent_data = find_parent(parent_id)

            if score >= 2:
                existing_comment_score = find_existing_score(parent_id)
                if existing_comment_score:
                    if score > existing_comment_score:
                        if acceptable_comment(body):
                            replace_comment(parent_id, comment_id,
                                            parent_data, body, subreddit,
                                            created_utc, score)
                else:
                    if acceptable_comment(body):
                        if parent_data:
                            insert_parent_exists(parent_id, comment_id,
                                                 parent_data, body,
                                                 subreddit, created_utc, score)
                            paired_rows += 1
                        else:
                            insert_no_parent(parent_id, comment_id, body,
                                             subreddit, created_utc, score)

            if row_count % 100000 == 0:
                print('Total rows analyzed: {}\nPaired Rows: {}\nTime: {}'.
                      format(row_count, paired_rows, str(datetime.now())))

Tutorial code:

import sqlite3
import json
from datetime import datetime

timeframe = '2015-01'
sql_transaction = []
path = '/Users/MateuszGrzybek/Desktop/DL-Chatbot/data/RC_2015-01'
connection = sqlite3.connect('sent(1).db')
c = connection.cursor()


def create_table():
    c.execute('DROP TABLE IF EXISTS parent_reply;')
    c.execute(
        """CREATE TABLE IF NOT EXISTS parent_reply(parent_id TEXT PRIMARY KEY,
        comment_id TEXT UNIQUE, parent TEXT, comment TEXT, subreddit TEXT,
        unix INT, score INT)""")


def transaction_bldr(sql):
    global sql_transaction
    sql_transaction.append(sql)
    if len(sql_transaction) > 1000:
        c.execute('BEGIN TRANSACTION')
        for s in sql_transaction:
            try:
                c.execute(s)
            except:
                pass
        connection.commit()
        sql_transaction = []


def sql_insert_replace_comment(commentid, parentid, parent, comment, subreddit,
                               time, score):
    try:
        sql = """UPDATE parent_reply SET parent_id = ?, comment_id = ?,
        parent = ?, comment = ?, subreddit = ?, unix = ?, score = ?
        WHERE parent_id = ?;""".format(parentid, commentid, parent, comment,
                                       subreddit, int(time), score, parentid)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def sql_insert_has_parent(commentid, parentid, parent, comment, subreddit,
                          time, score):
    try:
        sql = """INSERT INTO parent_reply (parent_id, comment_id, parent,
        comment, subreddit, unix, score) VALUES ("{}", "{}", "{}", "{}", "{}",
        {}, {});""".format(parentid, commentid, parent, comment, subreddit,
                           int(time), score)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


def sql_insert_no_parent(commentid, parentid, comment, subreddit, time, score):
    try:
        sql = """INSERT INTO parent_reply (parent_id, comment_id, comment,
        subreddit, unix, score) VALUES ("{}", "{}", "{}", "{}", {}, {});""".format(parentid, commentid, comment, subreddit, int(time), score)
        transaction_bldr(sql)
    except Exception as e:
        print('s0 insertion', str(e))


if __name__ == '__main__':
    create_table()
    row_counter = 0
    paired_rows = 0

    with open(path, buffering=1000) as f:
        for row in f:
            row_counter += 1
            row = json.loads(row)
            parent_id = row['parent_id']
            body = format_data(row['body'])
            created_utc = row['created_utc']
            score = row['score']
            comment_id = row['name']
            subreddit = row['subreddit']
            parent_data = find_parent(parent_id)
            if score >= 2:
                existing_comment_score = find_existing_score(parent_id)
                if existing_comment_score:
                    if score > existing_comment_score:
                        if acceptable(body):
                            sql_insert_replace_comment(comment_id, parent_id,
                                                       parent_data, body,
                                                       subreddit, created_utc,
                                                       score)

                else:
                    if acceptable(body):
                        if parent_data:
                            sql_insert_has_parent(comment_id, parent_id,
                                                  parent_data, body, subreddit,
                                                  created_utc, score)
                            paired_rows += 1
                        else:
                            sql_insert_no_parent(comment_id, parent_id, body,
                                                 subreddit, created_utc, score)

            if row_counter % 100000 == 0:
                print('Total Rows Read: {}, Paired Rows: {}, Time: {}'.format(
                    row_counter, paired_rows, str(datetime.now())))

I expect the output to be the result of the final print statement:

Total Rows Read: 100000
Paired Rows: 3718
Time: 2019-05-02 14:43:52.472389

and not the error that my code produces for some reason.

1 answer:

Answer 0 (score: 0)

Through trial and error, I found that in his code he does this:

def transaction_bldr(sql):
    for s in sql_transaction:
        try:
            c.execute(s)
        except:
            pass

Whereas I was doing this:

def transaction_builder(query):
    for query in db_transaction:
        try:
            cursor.execute(query)
        except Exception as e:
            print(str(e))

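He simply passes over any exception that might occur during the SQL transaction, which is why running his code shows no errors. Changing my except block to pass raised a different error in my code: "Incorrect number of bindings supplied. The current statement uses 8, and there are 0 supplied", coming from the replace_comment method. I solved this by changing the replacement fields from {} to ?. I would welcome some elaboration on why this matters so much.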
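On the binding error specifically: ? is a placeholder that sqlite3 fills in from the second argument of execute(), whereas {} is filled in by str.format before the statement ever reaches the database. Calling .format() on a string that contains only ? markers leaves it unchanged, so executing it without parameters fails. The sketch below shows that failure and one possible way to keep the batching approach while binding values properly. The queue and flush helpers and the pending buffer are hypothetical names introduced for illustration, not part of the tutorial, and the table is a cut-down version of parent_reply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Cut-down version of the table from the question (assumption: fewer columns).
cur.execute(
    "CREATE TABLE parent_reply (parent_id TEXT PRIMARY KEY, "
    "comment_id TEXT UNIQUE, comment TEXT, score INT)"
)

update_sql = (
    "UPDATE parent_reply SET comment_id = ?, comment = ?, score = ? "
    "WHERE parent_id = ?"
)

# str.format has nothing to substitute here, so the string comes back as is.
unchanged = update_sql.format("t1_c", "a better reply", 5, "t1_a") == update_sql

# Executing a statement with "?" markers but no parameters raises an error.
try:
    cur.execute(update_sql)
    binding_error = None
except sqlite3.Error as e:
    binding_error = e

pending = []  # hypothetical buffer of (sql, params) pairs


def flush():
    """Run every buffered statement with its parameters, then commit."""
    for sql, params in pending:
        try:
            cur.execute(sql, params)
        except sqlite3.IntegrityError as e:
            # Report constraint violations instead of silently passing.
            print("skipped:", e)
    conn.commit()
    pending.clear()


def queue(sql, params, batch_size=1000):
    """Buffer a statement and flush once the batch is full."""
    pending.append((sql, params))
    if len(pending) >= batch_size:
        flush()


insert_sql = (
    "INSERT INTO parent_reply (parent_id, comment_id, comment, score) "
    "VALUES (?, ?, ?, ?)"
)
# Bound values may contain quotes without breaking the statement.
queue(insert_sql, ("t1_a", "t1_b", "it's a 'quoted' reply", 3))
queue(update_sql, ("t1_c", "a better reply", 5, "t1_a"))
flush()

row = cur.execute("SELECT comment_id, comment, score FROM parent_reply").fetchone()
print(unchanged, binding_error)
print(row)
```

Binding values this way also avoids the quoting problems that come from pasting comment text into the SQL string with str.format, and catching only sqlite3.IntegrityError keeps other mistakes, such as a wrong number of bindings, visible instead of hidden by a bare except.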