
Question

I'd like a unique dict (key/value) database to be accessible from multiple Python scripts running at the same time.

If script1.py updates d[2839], then script2.py should see the modified value when querying d[2839] a few seconds after.

I thought about using SQLite but it seems that concurrent write/read from multiple processes is not SQLite's strength (let's say script1.py has just modified d[2839], how would script2.py's SQLite connection know it has to reload this specific part of the database?)
I also thought about locking the file when I want to flush the modifications (but it's rather tricky to do), and use json.dump to serialize, then trying to detect the modifications, use json.load to reload if any modification, etc. ... oh no I'm reinventing the wheel, and reinventing a particularly inefficient key/value database!
Redis looked like a solution, but it does not officially support Windows; the same applies to LevelDB.
Multiple scripts might want to write at exactly the same time (even if this is a very rare event). Is there a way to let the DB system handle this, perhaps through a locking parameter? It seems that by default SQLite can't do this, because "SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time."

What would be a Pythonic solution for this?

Note: I'm on Windows, and the dict should have maximum 1M items (key and value both integers).

Answer 1

Most embedded data stores other than SQLite are not optimized for concurrent access. I was also curious about SQLite's concurrent performance, so I ran a benchmark:

import time
import sqlite3
import random
import multiprocessing


class Store():

    def __init__(self, filename='kv.db'):
        # timeout: seconds to wait for another process's lock before giving up
        self.conn = sqlite3.connect(filename, timeout=60)
        self.conn.execute('pragma journal_mode=wal')
        self.conn.execute('create table if not exists "kv" (key integer primary key, value integer) without rowid')
        self.conn.commit()

    def get(self, key):
        # fetchone() returns None when the key is absent
        row = self.conn.execute('select value from "kv" where key=?', (key,)).fetchone()
        if row is not None:
            return row[0]

    def set(self, key, value):
        self.conn.execute('replace into "kv" (key, value) values (?,?)', (key, value))
        self.conn.commit()


def worker(n):
    # each worker writes n random keys, then reads them back in random order
    d = [random.randint(0, 1<<31) for _ in range(n)]
    s = Store()
    for i in d:
        s.set(i, i)
    random.shuffle(d)
    for i in d:
        s.get(i)


def test(c):
    # run c worker processes concurrently and report throughput
    n = 5000
    start = time.time()
    ps = []
    for _ in range(c):
        p = multiprocessing.Process(target=worker, args=(n,))
        p.start()
        ps.append(p)
    while any(p.is_alive() for p in ps):
        time.sleep(0.01)
    cost = time.time() - start
    print(f'{c:<10d}\t{cost:<7.2f}\t{n/cost:<20.2f}\t{n*c/cost:<14.2f}')


def main():
    print('concurrency\ttime(s)\tper-process TPS(r/s)\ttotal TPS(r/s)')
    for c in range(1, 9):
        test(c)


if __name__ == '__main__':
    main()

Results on my 4-core macOS box, SSD volume:

concurrency  time(s)  per-process TPS(r/s)  total TPS(r/s)
1            0.65     7638.43               7638.43
2            1.30     3854.69               7709.38
3            1.83     2729.32               8187.97
4            2.43     2055.25               8221.01
5            3.07     1629.35               8146.74
6            3.87     1290.63               7743.78
7            4.80     1041.73               7292.13
8            5.37     931.27                7450.15

Results on an 8-core Windows Server 2012 cloud server, SSD volume:

concurrency  time(s)  per-process TPS(r/s)  total TPS(r/s)
1            4.12     1212.14               1212.14
2            7.87     634.93                1269.87
3            14.06    355.56                1066.69
4            15.84    315.59                1262.35
5            20.19    247.68                1238.41
6            24.52    203.96                1223.73
7            29.94    167.02                1169.12
8            34.98    142.92                1143.39

Overall throughput stays consistent regardless of concurrency, and SQLite is slower on Windows than on macOS. I hope this helps.

Because SQLite's write lock is database-wide, you can get more TPS by partitioning the data across multiple database files:

class MultiDBStore():

    def __init__(self, buckets=5):
        # one database file (and one connection) per bucket
        self.buckets = buckets
        self.conns = []
        for n in range(buckets):
            conn = sqlite3.connect(f'kv_{n}.db', timeout=60)
            conn.execute('pragma journal_mode=wal')
            conn.execute('create table if not exists "kv" (key integer primary key, value integer) without rowid')
            conn.commit()
            self.conns.append(conn)

    def _get_conn(self, key):
        # route each key to a bucket by its remainder
        assert isinstance(key, int)
        return self.conns[key % self.buckets]

    def get(self, key):
        row = self._get_conn(key).execute('select value from "kv" where key=?', (key,)).fetchone()
        if row is not None:
            return row[0]

    def set(self, key, value):
        conn = self._get_conn(key)
        conn.execute('replace into "kv" (key, value) values (?,?)', (key, value))
        conn.commit()

On my Mac with 20 partitions, total TPS is higher than with a single database file.
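
The question asks for d[2839]-style access. As a minimal sketch, assuming the same "kv" table as the benchmark above, a dict-like wrapper could look like the following. The names SQLiteDict and demo are hypothetical and not part of the original answer; the two instances opened on one file stand in for script1.py and script2.py.

```python
import os
import sqlite3
import tempfile
from collections.abc import MutableMapping


class SQLiteDict(MutableMapping):
    """Dict-like view of an integer key/value table in a SQLite file (hypothetical helper)."""

    def __init__(self, filename, timeout=60):
        # timeout: seconds to wait for another writer's lock before raising an error
        self.conn = sqlite3.connect(filename, timeout=timeout)
        self.conn.execute('pragma journal_mode=wal')
        self.conn.execute(
            'create table if not exists "kv" '
            '(key integer primary key, value integer) without rowid')
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            'select value from "kv" where key=?', (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self.conn.execute(
            'replace into "kv" (key, value) values (?,?)', (key, value))
        self.conn.commit()

    def __delitem__(self, key):
        cur = self.conn.execute('delete from "kv" where key=?', (key,))
        self.conn.commit()
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        for (key,) in self.conn.execute('select key from "kv"'):
            yield key

    def __len__(self):
        return self.conn.execute('select count(*) from "kv"').fetchone()[0]

    def close(self):
        self.conn.close()


def demo():
    # Two independent connections to one file, as two scripts would have.
    path = os.path.join(tempfile.mkdtemp(), 'kv.db')
    d1 = SQLiteDict(path)  # stands in for script1.py
    d2 = SQLiteDict(path)  # stands in for script2.py
    d1[2839] = 1
    first = d2[2839]       # d2 reads what d1 committed
    d1[2839] = 42
    second = d2[2839]      # d2 sees the update without reloading anything
    d1.close()
    d2.close()
    return first, second
```

Every read is a fresh query against the file, so there is nothing for the second script to "reload": it sees whatever the first script last committed.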

Answer 2

Before there was Redis, there was Memcached (which works on Windows). Here is a tutorial: https://realpython.com/blog/python/python-memcache-efficient-caching/

Answer 3

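I'd consider two options, both of them embedded databases.

SQLite

Answered here and here; it should be fine.

BerkeleyDB

link

Berkeley DB (BDB) is a software library intended to provide a high-performance embedded database for key/value data.

It was designed exactly for your purpose.

BDB can support thousands of simultaneous threads of control or concurrent processes manipulating databases as large as 256 terabytes, on a wide variety of operating systems including most Unix-like and Windows systems, and real-time operating systems.

It is robust and has been around for years, if not decades.

Bringing up Redis, Memcached, or any other full-fledged socket-based server that requires sysops involvement is, IMO, overhead for the task of exchanging data between two scripts running on the same box.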

Answer 4

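You can use a Python dictionary for this purpose.

Create a generic class or script named G that initializes a dictionary. G imports and runs script1.py and script2.py in the same process and passes the dictionary to both; in Python, dictionaries are passed by reference by default. This way a single dictionary stores the data, both scripts can modify its values, and the changes are visible in both scripts. I am assuming script1.py and script2.py are class-based. This approach does not guarantee persistence of the data. For persistence, you can store the data in a database every x interval.

Example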

script1.py

class SCRIPT1:

    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.dictionary.update({"a":"a"})
        print("SCRIPT1 : ", self.dictionary)

    def update(self):
        self.dictionary.update({"c":"c"})

script2.py

class SCRIPT2:
    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.dictionary.update({"b":"b"})
        print("SCRIPT 2 : " , self.dictionary)

main_script.py

import script1
import script2

x = {}

obj1 = script1.SCRIPT1(x) # output: SCRIPT1 :  {'a': 'a'}
obj2 = script2.SCRIPT2(x) # output: SCRIPT 2 :  {'a': 'a', 'b': 'b'}
obj1.update()
print("SCRIPT 1 dict: ", obj1.dictionary) # output: SCRIPT 1 dict:  {'a': 'a', 'b': 'b', 'c': 'c'}

print("SCRIPT 2 dict: ", obj2.dictionary) # output: SCRIPT 2 dict:  {'a': 'a', 'b': 'b', 'c': 'c'}

Also create an empty __init__.py file in the directory where the scripts run.
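
The persistence step mentioned in this answer (storing the dictionary in a database at intervals) is not shown in the example. A rough sketch of one way to do it with the standard sqlite3 module follows; the function names save_snapshot and load_snapshot, the table name, and the text column types are assumptions chosen to match the string keys and values used above.

```python
import sqlite3


def save_snapshot(dictionary, filename):
    """Replace the stored snapshot with the current contents of `dictionary`."""
    conn = sqlite3.connect(filename)
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute(
                'create table if not exists snapshot '
                '(key text primary key, value text)')
            conn.execute('delete from snapshot')
            conn.executemany(
                'insert into snapshot (key, value) values (?, ?)',
                dictionary.items())
    finally:
        conn.close()


def load_snapshot(filename):
    """Read the last stored snapshot back into a plain dict."""
    conn = sqlite3.connect(filename)
    try:
        return dict(conn.execute('select key, value from snapshot'))
    finally:
        conn.close()
```

main_script.py could call save_snapshot(x, 'snapshot.db') on whatever schedule suits it and load_snapshot('snapshot.db') at startup. Anything written between two snapshots is lost if the process dies.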

Another option is:

Redis

Answer 5

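You could use a document-based database manager. It may be too heavy for your system, but concurrent access is typically one of the reasons database management systems, and the APIs that connect to them, exist.

I have used MongoDB with Python and it works fine. The Python API documentation is quite good, and each document (an element of the database) is a dictionary that can be loaded into Python as such.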

Answer 6

I would use a pub/sub websocket framework, such as Autobahn/Python, with one script acting as a "server" that handles all the file communication. It depends on the scale, though; this might be overkill.

Answer 7

CodernityDB could be worth exploring, using the server version.

http://labs.codernity.com/codernitydb/

Server version: http://labs.codernity.com/codernitydb/server.html

Answer 8

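It sounds like what you really need is a database of some kind.

If Redis won't work on Windows, then I would look at MongoDB.

MongoDB works well from Python (via PyMongo) and can function similarly to Redis. Here are the installation docs for MongoDB on Windows: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/

Also, many people have brought up SQLite. I think you were concerned that it only allows one writer at a time, but that is not really a problem you need to worry about. I think what it means is that, if there are two writers, the second is blocked until the first has finished. That is probably fine for your situation.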
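
The blocking behaviour described here can be observed with a small sketch. It assumes Python's standard sqlite3 module, whose connect() takes a timeout giving how long a connection waits for a lock. The function names and the "kv" table are illustrative, and a thread with its own connection stands in for the first script.

```python
import os
import sqlite3
import tempfile
import threading
import time


def hold_write_lock(path, started, seconds):
    """First writer: take the write lock, keep it for `seconds`, then commit."""
    conn = sqlite3.connect(path, isolation_level=None)  # explicit transactions
    conn.execute('begin immediate')  # acquires the write lock right away
    conn.execute('replace into kv (key, value) values (1, 100)')
    started.set()                    # tell the second writer to go ahead
    time.sleep(seconds)
    conn.execute('commit')
    conn.close()


def demo(hold_seconds=0.5):
    path = os.path.join(tempfile.mkdtemp(), 'kv.db')
    setup = sqlite3.connect(path)
    setup.execute('create table kv (key integer primary key, value integer)')
    setup.commit()
    setup.close()

    started = threading.Event()
    first = threading.Thread(
        target=hold_write_lock, args=(path, started, hold_seconds))
    first.start()
    started.wait()

    # Second writer: timeout=10 means "wait up to 10 seconds for the lock".
    second = sqlite3.connect(path, timeout=10)
    t0 = time.monotonic()
    second.execute('replace into kv (key, value) values (2, 200)')
    second.commit()
    waited = time.monotonic() - t0
    first.join()

    rows = second.execute('select key, value from kv order by key').fetchall()
    second.close()
    return waited, rows
```

In this sketch the second write simply waits until the first transaction commits, and both rows end up in the table. With a timeout shorter than the time the lock is held, the second writer would raise sqlite3.OperationalError ("database is locked") instead.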

Share a dict with multiple Python scripts
