I'm developing a primarily API-driven application, but it also has a multithreaded background job system that runs scheduled jobs as well as ad-hoc jobs for work that would take too long inside an immediate API response.
The app is forked 10 times by gunicorn. Any one of the forked processes can claim jobs to run, so job processing is balanced across the same processes that handle API requests.
My challenge is that each process keeps holding on to the maximum amount of memory its job processing has ever needed. Some jobs need 1.5 GB to 2 GB of memory.
Given enough time, all 10 processes will eventually have handled jobs of this kind, and each will hold onto 2 GB+ of memory, even though a process's average memory usage rarely exceeds 100 MB.
These intensive jobs are only ever run on a dedicated thread within a process.
Is there any mechanism to force Python to release the memory claimed specifically for a thread when that thread shuts down? Or any general mechanism to force a Python process to shrink its memory back to what it actively needs at the time?
Side note: I'm also exploring forking instead of threads, but so far that has introduced other problems that I'm not sure I can solve.
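For reference, one common fork-based pattern for this problem (a sketch, not the asker's actual code) is multiprocessing.Pool with maxtasksperchild: because the heavy job runs in a short-lived child process, all of its memory is returned to the OS when that worker is recycled, so the parent never balloons. The heavy_job function below is a hypothetical stand-in for a memory-hungry job.

```python
import multiprocessing

def heavy_job(n: int) -> int:
    # Stand-in for a memory-hungry job: build a large list, return a summary.
    data = list(range(n))
    return sum(data)

if __name__ == "__main__":
    # maxtasksperchild=1 recycles each worker process after a single job,
    # so any memory a job claimed is released back to the OS when the
    # worker exits, instead of being retained by a long-lived process.
    with multiprocessing.Pool(processes=2, maxtasksperchild=1) as pool:
        results = pool.map(heavy_job, [100_000, 200_000])
    print(results)  # → [4999950000, 19999900000]
```

The trade-off is per-job fork overhead, so this fits occasional heavy jobs better than a high-throughput queue.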
Answer 0: (score: 2)
Without a concrete example of what your API and your worker processes/threads are doing, it's hard to give a specific answer.
Python is a reference-counted language: when an object is no longer referenced by any other object, it becomes eligible for garbage collection. You can force the garbage collector to run (see https://docs.python.org/3/library/gc.html), but it is almost always best to let it do its thing.
When your worker thread exits, any objects created in that thread can be garbage collected; the exception is objects you have placed into some global data structure (though from your description it doesn't sound like you're doing that).
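To make the reference-counting point concrete, here is a minimal illustration (the Node class is hypothetical): objects whose refcount drops to zero are freed immediately, while objects caught in a reference cycle are only reclaimed by the cyclic collector, which gc.collect() forces to run.

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

# Build a reference cycle: each object keeps the other's refcount above zero.
a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b                  # the cycle is now unreachable, but not yet freed

collected = gc.collect()  # the cyclic collector finds and reclaims it
print(f"unreachable objects collected: {collected}")
```

Note that freeing objects returns memory to Python's internal allocator; whether the process's RSS shrinks is up to the allocator and the OS.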
Answer 1: (score: 0)
To demonstrate that a thread is destroyed after its job finishes, you can run the following code:
import random
import threading
import time

def job(o: dict):
    count = 1
    r = random.randrange(10, 20)
    while count < r:
        print(f"{o['name']}={count}/{r}")
        count += 1
        time.sleep(1)
    print(f"{o['name']} finished.")

def run_thread(o: dict):
    threading.Thread(target=job, args=(o,)).start()

if __name__ == '__main__':
    obj1 = {"name": "A"}
    run_thread(obj1)
    obj2 = {"name": "B"}
    run_thread(obj2)
    while True:
        time.sleep(1)
        print(f"THREADS: {len(threading.enumerate())}")
The output will look something like this:
A=1/14
B=1/10
THREADS: 3
B=2/10
A=2/14
THREADS: 3
...
B finished.
A=10/14
A=11/14
THREADS: 2
A=12/14
THREADS: 2
A=13/14
THREADS: 2
A finished.
THREADS: 1
THREADS: 1
THREADS: 1
As you can see, the total thread count decreases each time a thread ends.
Update:
OK, I hope this script gives you what you need:
from typing import List
import os
import random
import threading
import time

import psutil

def get_mem_usage():
    # Resident set size of this process, in KB
    return PROCESS.memory_info().rss // 1024

def show_mem_usage():
    global MAX_MEMORY
    while True:
        mem = get_mem_usage()
        print(f"Currently used memory={mem} KB")
        MAX_MEMORY = max(mem, MAX_MEMORY)
        time.sleep(5)

def job(name: str):
    print(f"{name} started.")
    job_memory: List[int] = []
    total_bit_length = 0
    while command['stop_thread'] is False:
        num = random.randrange(100000, 999999)
        job_memory.append(num)
        total_bit_length += int.bit_length(num)
        time.sleep(0.0000001)
        if len(job_memory) % 100000 == 0:
            print(f"{name} Memory={total_bit_length//1024} KB")
    print(f"{name} finished.")

def start_thread(name: str):
    threading.Thread(target=job, args=(name,), daemon=True).start()

if __name__ == '__main__':
    command = {'stop_thread': False}
    PROCESS = psutil.Process(os.getpid())
    mem_before_threads = get_mem_usage()
    MAX_MEMORY = 0
    print(f"Starting memory={mem_before_threads} KB")
    threading.Thread(target=show_mem_usage, daemon=True).start()
    input("Press enter to START threads...\n")
    for i in range(20):
        start_thread("Job" + str(i + 1))
    input("Press enter to STOP threads...\n")
    print("Stopping threads...")
    command['stop_thread'] = True
    time.sleep(2)  # give the threads some time to stop
    print("Threads stopped.")
    mem_after_threads = get_mem_usage()
    print(f"Memory before threads={mem_before_threads} KB")
    print(f"Max Memory while threads running={MAX_MEMORY} KB")
    print(f"Memory after threads stopped={mem_after_threads} KB")
    input("Press enter to exit.")
This is the output:
Starting memory=12980 KB
Currently used memory=13020 KB
Press enter to START threads...
Job1 started.
Job2 started.
Job3 started.
Job4 started.
Job5 started.
Job6 started.
Job7 started.
Job8 started.
Job9 started.
Job10 started.
Job11 started.
Job12 started.
Job13 started.
Job14 started.
Job15 started.
Job16 started.
Job17 started.
Job18 started.
Job19 started.
Job20 started.
Press enter to STOP threads...
Currently used memory=16740 KB
Currently used memory=19764 KB
Currently used memory=22516 KB
Currently used memory=25420 KB
Currently used memory=28340 KB
Stopping threads...
Job12 finished.
Job20 finished.
Job11 finished.
Job7 finished.
Job18 finished.
Job2 finished.
Job4 finished.
Job19 finished.
Job16 finished.
Job10 finished.
Job1 finished.
Job9 finished.
Job6 finished.
Job13 finished.
Job15 finished.
Job17 finished.
Job3 finished.
Job5 finished.
Job8 finished.
Job14 finished.
Threads stopped.
Memory before threads=12980 KB
Max Memory while threads running=28340 KB
Memory after threads stopped=13384 KB
Press enter to exit.
Currently used memory=13388 KB
I don't really know why there is a 408 KB difference, but it is probably overhead left over from the ~15 MB of memory the threads used at peak.
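If a small residue like this matters, one best-effort trick (a sketch; the release_memory_to_os name is mine, and malloc_trim is glibc-specific, so this is Linux-only and silently a no-op elsewhere) is to force a GC pass and then ask the C allocator to return its free arena pages to the OS:

```python
import ctypes
import ctypes.util
import gc

def release_memory_to_os() -> None:
    """Best-effort: run the GC, then ask glibc to return free heap pages."""
    gc.collect()
    libc_name = ctypes.util.find_library("c")
    if libc_name:
        try:
            # malloc_trim(0) releases free memory from the top of the heap
            # back to the OS; only glibc provides it.
            ctypes.CDLL(libc_name).malloc_trim(0)
        except AttributeError:
            pass  # not glibc (e.g. musl, macOS): nothing to do

release_memory_to_os()
```

Even so, allocators rarely return every freed byte, so a small gap between before/after RSS readings is normal.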