Question

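I'm developing an application that will gather data over HTTP from several places, cache the data locally, and then serve it over HTTP.

So I was looking at the following. My application will first create several threads that gather data at a specified interval and cache that data locally in a SQLite database.

Then, in the main thread, it will start a CherryPy application that queries that SQLite database and serves the data.

My problem is: how do I handle connections to the SQLite database from my threads and from the CherryPy application?

If I open one connection to the database per thread, will I also be able to create and use an in-memory database?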

Answer 1

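Short answer: don't use Sqlite3 in a threaded application.

Sqlite3 databases scale well for size, but poorly for concurrency. You will be plagued with "database is locked" errors.

If you do, you will need a connection per thread, and you have to make sure those connections clean up after themselves. This is traditionally handled with thread-local sessions, and SQLAlchemy's ScopedSession (for example) does it rather well. I would use it if I were you, even if you aren't using the SQLAlchemy ORM features.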
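
The "one connection per thread" idea above can be sketched with the standard library alone. This is a minimal sketch, not SQLAlchemy's ScopedSession: the helper names `get_connection` and `close_connection`, the `cache` table, and the 5-second timeout are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
import threading

_local = threading.local()  # each thread sees its own attributes on this object


def get_connection(db_path):
    """Return this thread's connection, creating it on first use (hypothetical helper)."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # timeout: how long to wait on a locked database before raising
        conn = sqlite3.connect(db_path, timeout=5.0)
        _local.conn = conn
    return conn


def close_connection():
    """Close this thread's connection; call it before the thread exits."""
    conn = getattr(_local, "conn", None)
    if conn is not None:
        conn.close()
        _local.conn = None


def demo(n_threads=3):
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "cache.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE cache (source TEXT, value REAL)")
        setup.commit()
        setup.close()

        seen = []  # keeps the connection objects alive so they can be compared
        seen_lock = threading.Lock()

        def worker(i):
            conn = get_connection(db_path)
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO cache VALUES (?, ?)", (f"src{i}", float(i)))
            with seen_lock:
                seen.append(conn)
            close_connection()

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = sqlite3.connect(db_path)
        count = check.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
        check.close()
        # (rows written, number of distinct connection objects used)
        return count, len({id(c) for c in seen})
```

Each worker closes its own connection because, by default, the `sqlite3` module refuses to use a connection from a thread other than the one that created it.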

Answer 2

You can use something like that.

Answer 3

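"...create several threads that will gather data at a specified interval and cache that data locally into a sqlite database. Then in the main thread start a CherryPy app that will query that sqlite database and serve the data."

Don't waste a lot of time on threads. The things you're describing are simply OS processes. Just start ordinary processes to do the gathering and to run CherryPy.

You have no real use for concurrent threads in a single process. Gathering data at a specified interval, when done with simple OS processes, can be scheduled by the OS very simply. Cron, for example, does a great job of this.

A CherryPy app is also an OS process, not a single thread of some larger process.

Just use processes; threads won't help you.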
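
As a rough illustration of the process-based approach above, a collector can be a small one-shot script that a scheduler runs at each interval. This is only a sketch under assumptions: `fetch_sample` is a hypothetical stand-in for the real HTTP request, and the `cache` table layout and the script name mentioned below are invented for the example.

```python
import sqlite3
import sys
import time


def fetch_sample():
    """Hypothetical stand-in for an HTTP fetch; returns (source, value)."""
    return ("example-source", 42.0)


def collect_once(db_path):
    """Fetch one sample, store it, and return the number of cached rows."""
    # timeout: wait up to 10 s if another process holds the write lock
    conn = sqlite3.connect(db_path, timeout=10.0)
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS cache "
                "(source TEXT, fetched_at INTEGER, value REAL)"
            )
            source, value = fetch_sample()
            conn.execute(
                "INSERT INTO cache VALUES (?, ?, ?)",
                (source, int(time.time()), value),
            )
        return conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
    finally:
        conn.close()


# Runs only when invoked as a script with the database path as its one argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    collect_once(sys.argv[1])
```

Saved as, say, `collect.py` (a made-up name), a crontab entry starting with `*/15 * * * *` would run it every 15 minutes, while the CherryPy process opens its own connection to the same file for reads.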

Answer 4

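Depending on the application, the DB could be real overhead. If we are talking about volatile data, maybe you could skip the communication via the DB completely and share the data between the data-gathering process and the data-serving process(es) via IPC. This is not an option if the data has to be persisted, of course.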

Answer 5

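Depending on the data rate, sqlite could be exactly the right way to do this. The entire database is locked for each write, so you aren't going to scale to thousands of simultaneous writes per second. But if you only have a few, it is the safest way of ensuring that writers don't overwrite each other.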
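
The whole-database write lock described above can be observed with two connections to the same file. This is a small sketch, assuming SQLite's default rollback-journal mode; the table name and the short timeout are illustrative only.

```python
import os
import sqlite3
import tempfile


def demo_lock(timeout_seconds=0.1):
    """Return the error a second writer sees while the first holds the write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: autocommit mode, so transactions are controlled explicitly
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE t (x INTEGER)")
        holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
        holder.execute("INSERT INTO t VALUES (1)")

        # The second connection waits at most timeout_seconds for the lock.
        waiter = sqlite3.connect(path, timeout=timeout_seconds, isolation_level=None)
        message = None
        try:
            waiter.execute("INSERT INTO t VALUES (2)")
        except sqlite3.OperationalError as e:
            message = str(e)
        finally:
            holder.execute("COMMIT")
            holder.close()
            waiter.close()
        return message
```

With a handful of writers, a larger `timeout` on `sqlite3.connect` is usually enough for each writer to wait its turn instead of failing.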

Answer 6

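This test was done to determine the best way to write to and read from a SQLite database. We follow the 3 approaches below:

Read and write without any threads (the methods with the word normal in their names)
Read and write with threads
Read and write with processes

Our sample dataset is a dummy generated OHLC dataset with a symbol, a timestamp, and 6 fake values for ohlc and volumefrom, volumeto.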

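Reads

The normal method takes about 0.25 seconds to read
The threaded method takes about 10 seconds
The multiprocessing method takes about 0.25 seconds to read

Winner: multiprocessing and normal

Writes

The normal method takes about 1.5 seconds to write
The threaded method takes about 30 seconds
The multiprocessing method takes about 30 seconds

Winner: normal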

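Note: not all records are written by the threaded and multiprocessing write methods. Both obviously run into database locked errors as the writes queue up. SQLite only queues writes up to a certain threshold and then throws sqlite3.OperationalError indicating that the database is locked. The ideal approach would be to retry inserting the same chunk, but there is no point, because parallel insertion takes longer than a sequential write even without retrying the locked/failed inserts. Without retrying, 97% of the rows were written, and it still took 10x longer than a sequential write.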
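
For reference, the retry that the note describes (and deliberately leaves out of the benchmark) could look roughly like this. It is a sketch only: the helper name `insert_with_retry`, the attempt count, and the backoff delays are assumptions, not part of the original test.

```python
import sqlite3
import time


def insert_with_retry(conn, sql, rows, attempts=5, base_delay=0.05):
    """Insert rows in one transaction, retrying while the database is locked.

    Returns the number of attempts used; re-raises after the last attempt.
    """
    for attempt in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany(sql, rows)
            return attempt + 1
        except sqlite3.OperationalError as e:
            # Only lock errors are retried; anything else propagates at once.
            if "locked" not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (symbol TEXT, ts INTEGER)")
    used = insert_with_retry(
        conn, "INSERT INTO t VALUES (?, ?)", [("ABC", 1), ("ABC", 2)]
    )
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return used, count
```

Retrying would make the parallel writers complete, but as the note says, it would only add to a run time that is already far above the sequential write.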

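Strategies to take away:

Prefer reading from and writing to SQLite in the same thread
If you must parallelize, use multiprocessing for reads, which has more or less the same performance, and defer to single-threaded write operations
DO NOT USE THREADING for reads and writes, as it is 10x slower on both; you can thank the GIL for that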
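
One way to get the single-threaded writes recommended above while still collecting data concurrently is to funnel every write through one writer thread fed by a queue. The sketch below is not part of the benchmark; the names `writer_loop` and `run_demo` and the `samples` table are made up for illustration.

```python
import os
import queue
import sqlite3
import tempfile
import threading

_STOP = object()  # sentinel that tells the writer to finish


def writer_loop(db_path, jobs):
    """Own the only writing connection; apply batches taken from the queue."""
    conn = sqlite3.connect(db_path)  # created and used in this thread only
    conn.execute("CREATE TABLE IF NOT EXISTS samples (source TEXT, value REAL)")
    while True:
        batch = jobs.get()
        if batch is _STOP:
            break
        with conn:  # one transaction per batch
            conn.executemany("INSERT INTO samples VALUES (?, ?)", batch)
    conn.close()


def run_demo(collectors=4, rows_each=10):
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "samples.db")
        jobs = queue.Queue()
        writer = threading.Thread(target=writer_loop, args=(db_path, jobs))
        writer.start()

        def collect(name):
            # Collectors never touch the database; they only enqueue rows.
            jobs.put([(name, float(i)) for i in range(rows_each)])

        threads = [
            threading.Thread(target=collect, args=(f"src{i}",))
            for i in range(collectors)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        jobs.put(_STOP)  # queued after all batches, so nothing is lost
        writer.join()

        conn = sqlite3.connect(db_path)
        try:
            return conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
        finally:
            conn.close()
```

Because only one connection ever writes, the collectors cannot trigger "database is locked" errors among themselves.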

Here is the code for the complete test

import sqlite3
import time
import random
import string
import os
import timeit
from functools import wraps
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

database_file = os.path.realpath('../files/ohlc.db')

create_statement = 'CREATE TABLE IF NOT EXISTS database_threading_test (symbol TEXT, ts INTEGER, o REAL, h REAL, l REAL, c REAL, vf REAL, vt REAL, PRIMARY KEY(symbol, ts))'
insert_statement = 'INSERT INTO database_threading_test VALUES(?,?,?,?,?,?,?,?)'
select = 'SELECT * from database_threading_test'

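def time_stuff(some_function):
    # Decorator that prints the wall-clock time taken by the wrapped function.
    @wraps(some_function)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        value = some_function(*args, **kwargs)
        print(timeit.default_timer() - t0, 'seconds')
        return value
    return wrapper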

def generate_values(count=100):
    end = int(time.time()) - int(time.time()) % 900
    symbol = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10))
    ts = list(range(end - count * 900, end, 900))
    for i in range(count):
        yield (symbol, ts[i], random.random() * 1000, random.random() * 1000, random.random() * 1000, random.random() * 1000, random.random() * 1e9, random.random() * 1e5)

def generate_values_list(symbols=1000,count=100):
    values = []
    for _ in range(symbols):
        values.extend(generate_values(count))
    return values

@time_stuff
def sqlite_normal_read():
    """

    100k records in the database, 1000 symbols, 100 rows
    First run
    0.25139795300037804 seconds
    Second run

    Third run
    """
    conn = sqlite3.connect(os.path.realpath('../files/ohlc.db'))
    try:
        with conn:
            conn.execute(create_statement)
            results = conn.execute(select).fetchall()
            print(len(results))
    except sqlite3.OperationalError as e:
        print(e)

@time_stuff
def sqlite_normal_write():
    """
    1000 symbols, 100 rows
    First run
    2.279409104000024 seconds
    Second run
    2.3364172020001206 seconds
    Third run
    """
    l = generate_values_list()
    conn = sqlite3.connect(os.path.realpath('../files/ohlc.db'))
    try:
        with conn:
            conn.execute(create_statement)
            conn.executemany(insert_statement, l)

    except sqlite3.OperationalError as e:
        print(e)

@time_stuff
def sequential_batch_read():
    """
    We read all the rows for each symbol one after the other in sequence
    First run
    3.661222331999852 seconds
    Second run
    2.2836898810001003 seconds
    Third run
    0.24514851899994028 seconds
    Fourth run
    0.24082150699996419 seconds
    """
    conn = sqlite3.connect(os.path.realpath('../files/ohlc.db'))
    try:
        with conn:
            conn.execute(create_statement)
            symbols = conn.execute("SELECT DISTINCT symbol FROM database_threading_test").fetchall()
            for symbol in symbols:
                results = conn.execute("SELECT * FROM database_threading_test WHERE symbol=?", symbol).fetchall()
    except sqlite3.OperationalError as e:
        print(e)  



def sqlite_threaded_read_task(symbol):
    # Each worker opens its own connection instead of sharing one across threads.
    results = []
    conn = sqlite3.connect(database_file)
    try:
        with conn:
            results = conn.execute("SELECT * FROM database_threading_test WHERE symbol=?", symbol).fetchall()
    except sqlite3.OperationalError as e:
        print(e)
    finally:
        # Close here and return below: a return inside finally would swallow other exceptions.
        conn.close()
    return results

def sqlite_multiprocessed_read_task(symbol):
    # Same as the threaded task, but run in a worker process.
    results = []
    conn = sqlite3.connect(database_file)
    try:
        with conn:
            results = conn.execute("SELECT * FROM database_threading_test WHERE symbol=?", symbol).fetchall()
    except sqlite3.OperationalError as e:
        print(e)
    finally:
        conn.close()
    return results

@time_stuff
def sqlite_threaded_read():
    """
    1000 symbols, 100 rows per symbol
    First run
    9.429676861000189 seconds
    Second run
    10.18928106400017 seconds
    Third run
    10.382290903000467 seconds
    """
    conn = sqlite3.connect(os.path.realpath('../files/ohlc.db'))
    symbols = conn.execute("SELECT DISTINCT SYMBOL from database_threading_test").fetchall()
    with ThreadPoolExecutor(max_workers=8) as e:
        results = e.map(sqlite_threaded_read_task, symbols, chunksize=50)
        for result in results:
            pass

@time_stuff
def sqlite_multiprocessed_read():
    """
    1000 symbols, 100 rows
    First run
    0.2484774920012569 seconds!!!
    Second run
    0.24322178500005975 seconds
    Third run
    0.2863524549993599 seconds
    """
    conn = sqlite3.connect(os.path.realpath('../files/ohlc.db'))
    symbols = conn.execute("SELECT DISTINCT SYMBOL from database_threading_test").fetchall()
    with ProcessPoolExecutor(max_workers=8) as e:
        results = e.map(sqlite_multiprocessed_read_task, symbols, chunksize=50)
        for result in results:
            pass

def sqlite_threaded_write_task(n):
    """
    We ignore the database locked errors here. Ideal case would be to retry but there is no point writing code for that if it takes longer than a sequential write even without database locked errors
    """
    conn = sqlite3.connect(database_file)
    data = list(generate_values())
    try:
        with conn:
            conn.executemany(insert_statement, data)
    except sqlite3.OperationalError as e:
        print("Database locked", e)
    finally:
        conn.close()
    # Number of rows attempted; a locked batch is dropped, not retried.
    return len(data)

def sqlite_multiprocessed_write_task(n):
    """
    We ignore the database locked errors here. Ideal case would be to retry but there is no point writing code for that if it takes longer than a sequential write even without database locked errors
    """
    conn = sqlite3.connect(database_file)
    data = list(generate_values())
    try:
        with conn:
            conn.executemany(insert_statement, data)
    except sqlite3.OperationalError as e:
        print("Database locked", e)
    finally:
        conn.close()
    # Number of rows attempted; a locked batch is dropped, not retried.
    return len(data)

@time_stuff
def sqlite_threaded_write():
    """

    Did not write all the results but the outcome with 97400 rows written is still this...
    Takes 20x the amount of time as a normal write
    1000 symbols, 100 rows
    First run
    28.17819765000013 seconds
    Second run
    25.557972323000058 seconds
    Third run
    """
    symbols = [i for i in range(1000)]
    with ThreadPoolExecutor(max_workers=8) as e:
        results = e.map(sqlite_threaded_write_task, symbols, chunksize=50)
        for result in results:
            pass

@time_stuff
def sqlite_multiprocessed_write():
    """
    1000 symbols, 100 rows
    First run
    30.09209805699993 seconds
    Second run
    27.502465319000066 seconds
    Third run
    """
    symbols = [i for i in range(1000)]
    with ProcessPoolExecutor(max_workers=8) as e:
        results = e.map(sqlite_multiprocessed_write_task, symbols, chunksize=50)
        for result in results:
            pass


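if __name__ == '__main__':
    # The guard keeps ProcessPoolExecutor workers from re-running this call
    # on platforms that start workers by importing the module again.
    sqlite_normal_write()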

Python, SQLite, and threading

6 answers: