I've run into a design problem: there is a global resource that cannot be accessed from multiple threads at once, so I need a lock to serialize access to it. However, Python's garbage collector can run __del__ methods while I'm holding the lock during some processing. If a destructor tries to access the resource, the result is a deadlock.

As an example, consider the following innocent-looking single-threaded code, which deadlocks if you run it:
import threading

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)

    def __del__(self):
        self.close()

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)
The resource can deal with several simultaneously open "handles", and I need to manage their lifetimes. Abstracting this into a Handle class and putting the cleanup in __del__ seemed like a clever move, but the problem above breaks it.

One way to deal with cleanup is to keep a list of "pending cleanup" handles: if the lock is held when __del__ runs, insert the handle into that list and clean the list up later.
The questions are:

Is there a thread-safe version of gc.disable() / gc.enable() that would solve this more cleanly?

Any other ideas for how to handle it?
Answer 0 (score: 1)
Python's garbage collector will not clean up circular dependencies that involve a "custom" __del__ method (this is CPython 2 behaviour; since Python 3.4, PEP 442 lets such cycles be collected).

Since you already have a __del__ method, all you need is a circular dependency to "disable" GC for these objects:
class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self
Now, this creates a memory leak, so how do we fix that?

When you are ready to release an object, simply remove the circular dependency:
import threading
import gc

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)

    def __del__(self):
        self.close()

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)

# Make sure the GC is up to date
gc.collect()
print "Length after work", len(gc.garbage)

# These are kept around due to our circular dependency
# If we remove them from garbage, they come back
del gc.garbage[:]
gc.collect()
print "Length now", len(gc.garbage)

# Let's break it
for handle in gc.garbage:
    handle._self = None

# Now, our objects don't come back
del gc.garbage[:]
gc.collect()
print "Length after breaking circular dependencies", len(gc.garbage)
Running this prints:
Length after work 999
Length now 999
Length after breaking circular dependencies 0
On another note: why do you need to access this complex library from cleanup code whose timing you cannot control? A cleaner solution here may be to do the cleanup in an explicit loop, breaking the circular dependency after each handle is cleaned up, so that the GC can then do its thing.

Here is an implementation:
import threading
import gc

class Handle(object):
    def __init__(self):
        self.handle = do_stuff("get")
        self._self = self

    def close(self):
        h = self.handle
        self.handle = None
        if h is not None:
            do_stuff("close %d" % h)
        del self._self

    def __del__(self):
        # DO NOT TOUCH THIS
        self._ = None

_resource_lock = threading.Lock()

def do_stuff(what):
    _resource_lock.acquire()
    try:
        # GC can be invoked here -> deadlock!
        for j in range(20):
            list()
        return 1234
    finally:
        _resource_lock.release()

for j in range(1000):
    xs = []
    b = Handle()
    xs.append(b)
    xs.append(xs)

# Make sure the GC is up to date
gc.collect()
print "Length after work", len(gc.garbage)

# These are kept around due to our circular dependency
# If we remove them from garbage, they come back
del gc.garbage[:]
gc.collect()
print "Length now", len(gc.garbage)

# Let's break it
for handle in gc.garbage:
    handle.close()

# Now, our objects don't come back
del gc.garbage[:]
gc.collect()
print "Length after breaking circular dependencies", len(gc.garbage)
The output shows that our circular dependency did indeed prevent collection:
Length after work 999
Length now 999
Length after breaking circular dependencies 0
Answer 1 (score: 0)
Circular references are not the crux of this problem. Suppose objects a and b reference each other, forming a cycle, and a.resource points to an object c that has a __del__ method. When a and b are collected (they have no __del__, so collecting them is safe), c is collected automatically and c.__del__ is called. This can happen anywhere in your code, and you have no control over when, so it can deadlock.
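A minimal illustration of that scenario (illustrative names; Python 3 shown, where the timing is the same):

```python
import gc

log = []

class Resource(object):
    def __del__(self):
        log.append("resource finalized")

class Node(object):
    pass

a, b = Node(), Node()
a.other, b.other = b, a      # a and b form a cycle (no __del__ on either)
a.resource = Resource()      # the resource hangs off the cycle

del a, b                     # only the cycle keeps everything alive now
gc.collect()                 # cycle collected -> Resource refcount hits
                             # zero and __del__ runs at this point
```

The destructor fires whenever the collector happens to run, not at any point the surrounding code chose.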
There are also Python implementations without reference counting (PyPy, for example). With those interpreters, objects are always reclaimed by the GC.
The only safe thing to do in a __del__ method is some atomic operation. Locks do not work: they either deadlock (threading.Lock) or fail to actually exclude (threading.RLock, since the same thread can simply re-acquire it). Since appending to a list is atomic in Python, you can push a flag (or a closure) onto a global list from __del__, and have another thread check that list and perform the "real destruction".
The new GC mode proposed for Python 3.7 may solve the problem: https://www.python.org/dev/peps/pep-0556/