Load and dump to a file from multiple threads with Pickle and filelock - IOError: [Errno 13]

Date: 2017-07-06 11:49:40

Tags: python multithreading pickle filelock

I have a service that uses cPickle to load a Python 2.7 dict from a file and dump it back. Many users can call this service at the same time.

What approach lets cPickle read from and dump to a single file in a multithreaded context without the data going out of sync during the operation (e.g., a load happening while another process is dumping)?

I was thinking of using filelock, but I have not managed to get it working.

With my code below, cPickle.load(cache_file) always fails with IOError: [Errno 13] Permission denied on the file that init_cache() and update_cache() write to.
''' example of a dict dumped by pickle
{
    "version": "1499180895",
    "queries": {
        "001::id,name,age": "aBase64EncodedString==",
        "002::id,name,sex": "anotherBase64EncodedString=="
    }
}
'''


import base64
import json
import traceback
import zlib

import cPickle as pickle
import filelock
from os import path

self.cache_file_path = "\\\\serverDisk\\cache\\cache.pkl"
self.select_by_values = "001"
self.out_fields = ["id", "name", "age"]

def get_from_cache_fn(self):
    try:
        server_version = self.query_version()
        query_id = "{}::{}".format(self.select_by_values, ",".join(self.out_fields))
        if path.isfile(self.cache_file_path):
            cache_dict = self.load_cache(server_version, query_id)
            if cache_dict["version"] == server_version:
                if query_id in cache_dict["queries"]:
                    return cache_dict["queries"][query_id]
                else:
                    return self.update_cache(cache_dict, query_id)["queries"][query_id]
            else:
                return self.init_cache(server_version, query_id)["queries"][query_id]
        else:
            return self.init_cache(server_version, query_id)["queries"][query_id]
    except Exception:
        self.add_service_error(ERRORS["get_from_cache"][0], traceback.format_exc())


def load_cache(self, server_version, query_id):
    with open(self.cache_file_path, "rb") as cache_file:
        try:
            cache_dict = pickle.load(cache_file)
            return cache_dict
        except StandardError:
            return self.init_cache(server_version, query_id)


def init_cache(self, server_version, query_id):
    cache_dict = {
        "version" : server_version,
        "queries" : {
            query_id : base64.b64encode(zlib.compress(json.dumps(self.query_features())))
        }
    }
    lock = filelock.FileLock(self.cache_file_path)
    try:
        with lock.acquire(timeout=10):
            with open(self.cache_file_path, "wb") as cache_file:
                pickle.dump(cache_dict, cache_file)
                return cache_dict
    except filelock.Timeout:
        self.add_service_error("init_cache timeout", traceback.format_exc())


def update_cache(self, cache_dict, query_id):
    cache_dict["queries"][query_id] = base64.b64encode(zlib.compress(json.dumps(self.query_features())))
    lock = filelock.FileLock(self.cache_file_path)
    try:
        with lock.acquire(timeout=10):
            with open(self.cache_file_path, "wb") as cache_file:
                pickle.dump(cache_dict, cache_file)
                return cache_dict
    except filelock.Timeout:
        self.add_service_error("update_cache timeout", traceback.format_exc())

2 Answers:

Answer 0 (score: 1)

According to the filelock documentation, you should wrap lock.acquire in a try/except block; otherwise, a timed-out acquire can crash your application with an unhandled exception. See https://pypi.python.org/pypi/filelock
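A minimal sketch of that pattern, adapted from the filelock documentation (the lock-file name and the 10-second timeout here are placeholders):

import filelock

lock = filelock.FileLock("service.lock")  # placeholder lock-file name
try:
    # acquire() raises filelock.Timeout if the lock is not obtained in time
    with lock.acquire(timeout=10):
        pass  # read or write the shared file here
except filelock.Timeout:
    # handle the timeout instead of letting it crash the service
    print("could not acquire the lock within 10 seconds")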

Answer 1 (score: 1)

I found the solution to my problem.

It seems you have to give the lock a name different from the file you are opening, presumably because FileLock creates and holds open the lock file itself, which collides with opening that same path for writing.

Use lock = filelock.FileLock("{}.lock".format(self.cache_file_path)) instead of lock = filelock.FileLock(self.cache_file_path).

For example:

def update_cache(self, cache_dict, query_id):
    cache_dict["queries"][query_id] = base64.b64encode(zlib.compress(json.dumps(self.query_features())))
    lock = filelock.FileLock("{}.lock".format(self.cache_file_path))
    try:
        with lock.acquire(timeout=10):
            with open(self.cache_file_path, "wb") as cache_file:
                pickle.dump(cache_dict, cache_file)
                return cache_dict
    except filelock.Timeout:
        self.add_service_error("update_cache timeout", traceback.format_exc())
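For completeness, the same separate-lock-file pattern could also guard the read side, so a load never overlaps an in-progress dump. A minimal sketch reusing the question's method names (note that init_cache is called outside the lock, since it acquires its own FileLock on the same lock file):

def load_cache(self, server_version, query_id):
    # same ".lock" companion file as the writers, so readers and writers
    # exclude each other
    lock = filelock.FileLock("{}.lock".format(self.cache_file_path))
    try:
        with lock.acquire(timeout=10):
            with open(self.cache_file_path, "rb") as cache_file:
                return pickle.load(cache_file)
    except filelock.Timeout:
        self.add_service_error("load_cache timeout", traceback.format_exc())
    except StandardError:
        # unreadable or truncated cache: rebuild it
        return self.init_cache(server_version, query_id)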