I have data in a text file. Each line is one computation to do. The file has about 100,000,000 lines.
First I load everything into RAM, then I have a method that performs the computation and returns its result:
def process(data_line):
    # do computation
    return result
Then I call it like this with packets of 2000 lines, and save the results to disk:
POOL_SIZE = 15  # nbcore - 1
PACKET_SIZE = 2000

pool = Pool(processes=POOL_SIZE)

data_lines = util.load_data_lines(to_be_computed_filename)
number_of_packets = int(number_of_lines / PACKET_SIZE)
for i in range(number_of_packets):
    lines_packet = data_lines[:PACKET_SIZE]
    data_lines = data_lines[PACKET_SIZE:]
    results = pool.map(process, lines_packet)
    save_computed_data_to_disk(to_be_computed_filename, results)

# process the last packet, which is smaller
results.extend(pool.map(process, data_lines))
save_computed_data_to_disk(to_be_computed_filename, results)
print("Done")
The problem is that while I'm writing to disk, my CPUs aren't computing anything, even though there are 8 cores. Watching the task manager, it looks like quite a lot of CPU time is being lost.
I have to write to disk after the computation is done because the results are 1000 times larger than the input. Either way, I will have to write to disk at some point; if the time isn't lost here, it will be lost later.
What could I do so that one core writes to disk while the others keep computing? Switch to C?
At this rate I can process 100 million lines in 75 hours, but I have 12 billion lines to process, so any improvement is welcome.
Example of timings:
Processing packet 2/15 953 of C:/processing/drop_zone\to_be_processed_txt_files\t_to_compute_303620.txt
Launching task and waiting for it to finish...
Task completed, Continuing
Packet was processed in 11.534576654434204 seconds
We are currently going at a rate of 0.002306915330886841 sec/words
Which is 433.47928145051293 words per seconds
Saving in temporary file
Printing writing 5000 computed line to disk took 0.04400920867919922 seconds
saving word to resume from : 06 20 25 00 00
Estimated time for processing the remaining packets is : 51:19:25
Answer 0 (score: 2)
Note: this SharedMemory is only for Python >= 3.8, since that is where it first appeared.
Start 3 kinds of processes: Reader, Processor(s), Writer.
Have the Reader process read the file incrementally, sharing what it reads via shared_memory plus a Queue.
Have the Processor(s) consume the queue, consume the shared_memory, and return the results via another queue, again as shared_memory.
Have the Writer process consume the second queue and write to the destination file.
Have them all communicate through, for example, some Events or a DictProxy, with the MainProcess acting as the orchestrator.
import time
import random
import hashlib
import multiprocessing as MP
from queue import Queue, Empty
# noinspection PyCompatibility
from multiprocessing.shared_memory import SharedMemory
from typing import Dict, List


def readerfunc(
    shm_arr: List[SharedMemory], q_out: Queue, procr_ready: Dict[str, bool]
):
    numshm = len(shm_arr)
    for batch in range(1, 6):
        print(f"Reading batch #{batch}")
        for shm in shm_arr:
            #### Simulated Reading ####
            for j in range(0, shm.size):
                shm.buf[j] = random.randint(0, 255)
            #### ####
            q_out.put((batch, shm))
        # Need to sync here because we're reusing the same SharedMemory,
        # so gotta wait until all processors are done before sending the
        # next batch
        while not q_out.empty() or not all(procr_ready.values()):
            time.sleep(1.0)


def processorfunc(
    q_in: Queue, q_out: Queue, suicide: type(MP.Event()), procr_ready: Dict[str, bool]
):
    pname = MP.current_process().name
    procr_ready[pname] = False
    while True:
        time.sleep(1.0)
        procr_ready[pname] = True
        if q_in.empty() and suicide.is_set():
            break
        try:
            batch, shm = q_in.get_nowait()
        except Empty:
            continue
        print(pname, "got batch", batch)
        procr_ready[pname] = False
        #### Simulated Processing ####
        h = hashlib.blake2b(shm.buf, digest_size=4, person=b"processor")
        time.sleep(random.uniform(5.0, 7.0))
        #### ####
        q_out.put((pname, h.hexdigest()))


def writerfunc(q_in: Queue, suicide: type(MP.Event())):
    while True:
        time.sleep(1.0)
        if q_in.empty() and suicide.is_set():
            break
        try:
            pname, digest = q_in.get_nowait()
        except Empty:
            continue
        print("Writing", pname, digest)
        #### Simulated Writing ####
        time.sleep(random.uniform(3.0, 6.0))
        #### ####
        print("Writing", pname, digest, "done")


def main():
    shm_arr = [
        SharedMemory(create=True, size=1024)
        for _ in range(0, 5)
    ]
    q_read = MP.Queue()
    q_write = MP.Queue()
    procr_ready = MP.Manager().dict()
    poison = MP.Event()
    poison.clear()

    reader = MP.Process(target=readerfunc, args=(shm_arr, q_read, procr_ready))
    procrs = []
    for n in range(0, 3):
        p = MP.Process(
            target=processorfunc, name=f"Proc{n}", args=(q_read, q_write, poison, procr_ready)
        )
        procrs.append(p)
    writer = MP.Process(target=writerfunc, args=(q_write, poison))

    reader.start()
    [p.start() for p in procrs]
    writer.start()

    reader.join()
    print("Reader has ended")

    while not all(procr_ready.values()):
        time.sleep(5.0)
    poison.set()
    [p.join() for p in procrs]
    print("Processors have ended")

    writer.join()
    print("Writer has ended")

    [shm.close() for shm in shm_arr]
    [shm.unlink() for shm in shm_arr]


if __name__ == '__main__':
    main()
Answer 1 (score: 0)
The first thing that comes to mind from your code is to run the saving function in a thread. That way we remove the bottleneck of waiting for the disk write. Like so:
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor, ALL_COMPLETED

saving_futures = []
executor = ThreadPoolExecutor(max_workers=2)

future = executor.submit(save_computed_data_to_disk, to_be_computed_filename, results)
saving_futures.append(future)
...
concurrent.futures.wait(saving_futures, return_when=ALL_COMPLETED)  # wait until everything is saved to disk after processing
print("Done")
Answer 2 (score: 0)
You say you have 8 cores, yet you have:
POOL_SIZE = 15 #nbcore - 1
Assuming you wanted to leave one processor free (presumably for the main process?), why isn't this number 7? But why would you even want to keep a processor free? You are making successive calls to map; while the main process is waiting for those calls to return, it needs essentially no CPU. That is why, if you do not specify a pool size when you instantiate the pool, it defaults to the number of CPUs you have, not that number minus one. I will have more to say about this below.
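For reference, a minimal check of that default, assuming nothing beyond the standard library:

import multiprocessing as mp

print(mp.cpu_count())   # e.g. 8 on the machine from the question

# Pool() with no `processes` argument starts cpu_count() workers,
# i.e. it is equivalent to the explicit form below, not cpu_count() - 1.
pool = mp.Pool(processes=mp.cpu_count())
pool.close()
pool.join()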
Since you have a very large in-memory list, it is possible that you are wasting cycles in your loop by rewriting this list on every iteration. Instead, you could simply take a slice of the list and pass that as the iterable argument to map:
POOL_SIZE = 15  # ????
PACKET_SIZE = 2000

data_lines = util.load_data_lines(to_be_computed_filename)
number_of_packets, remainder = divmod(number_of_lines, PACKET_SIZE)
with Pool(processes=POOL_SIZE) as pool:
    offset = 0
    for i in range(number_of_packets):
        results = pool.map(process, data_lines[offset:offset+PACKET_SIZE])
        offset += PACKET_SIZE
        save_computed_data_to_disk(to_be_computed_filename, results)
    if remainder:
        results = pool.map(process, data_lines[offset:offset+remainder])
        save_computed_data_to_disk(to_be_computed_filename, results)
print("Done")
Between each call to map, the main process is writing out the results to to_be_computed_filename. In the meantime, every process in the pool is sitting idle. This writing should be handed off to another process (actually, a thread running under the main process):
from multiprocessing import Pool
import queue
import threading

POOL_SIZE = 15  # ????
PACKET_SIZE = 2000

data_lines = util.load_data_lines(to_be_computed_filename)
number_of_packets, remainder = divmod(number_of_lines, PACKET_SIZE)


def save_data(q):
    while True:
        results = q.get()
        if results is None:
            return  # signal to terminate
        save_computed_data_to_disk(to_be_computed_filename, results)


q = queue.Queue()
t = threading.Thread(target=save_data, args=(q,))
t.start()
with Pool(processes=POOL_SIZE) as pool:
    offset = 0
    for i in range(number_of_packets):
        results = pool.map(process, data_lines[offset:offset+PACKET_SIZE])
        offset += PACKET_SIZE
        q.put(results)
    if remainder:
        results = pool.map(process, data_lines[offset:offset+remainder])
        q.put(results)
q.put(None)
t.join()  # wait for thread to terminate
print("Done")
I have chosen to run save_data in a thread of the main process. It could also be another process, in which case you would need to use a multiprocessing.Queue instance. But I figured the main-process thread is mostly waiting for map to complete and would not compete for the GIL. Now, if you do not leave a processor free for the thread job save_data, it may not get most of its saving done until after all the results have been created. You will need to experiment with this a bit.
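As an aside, a minimal sketch of that other-process alternative: the same save_data loop, but run as a separate process fed by a multiprocessing.Queue. save_computed_data_to_disk and to_be_computed_filename are the question's names; everything else is illustrative:

import multiprocessing


def save_data(q, filename):
    while True:
        results = q.get()
        if results is None:
            return  # sentinel: no more results are coming
        save_computed_data_to_disk(filename, results)


q = multiprocessing.Queue()
saver = multiprocessing.Process(target=save_data, args=(q, to_be_computed_filename))
saver.start()
# ... the same map loop as above, putting each packet's results on q ...
q.put(None)  # tell the saver process to finish
saver.join()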
Ideally, I would also modify the reading of the input file so that it does not all have to be read into memory first, but is instead read line by line, yielding 2000-line chunks and submitting those as jobs for map to process:
from multiprocessing import Pool
import queue
import threading

POOL_SIZE = 15  # ????
PACKET_SIZE = 2000


def save_data(q):
    while True:
        results = q.get()
        if results is None:
            return  # signal to terminate
        save_computed_data_to_disk(to_be_computed_filename, results)


def read_data():
    """
    yield lists of PACKET_SIZE lines
    """
    lines = []
    with open(some_file, 'r') as f:
        for line in iter(f.readline, ''):
            lines.append(line)
            if len(lines) == PACKET_SIZE:
                yield lines
                lines = []
    if lines:
        yield lines


q = queue.Queue()
t = threading.Thread(target=save_data, args=(q,))
t.start()
with Pool(processes=POOL_SIZE) as pool:
    for l in read_data():
        results = pool.map(process, l)
        q.put(results)
q.put(None)
t.join()  # wait for thread to terminate
print("Done")
Answer 3 (score: 0)
I am making two assumptions: that the writing is I/O-bound rather than CPU-bound, which means that throwing more cores at the writing would not improve performance, and that the process function contains some fairly heavy computation.
I would approach it differently:
Sample code below:
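What follows is only an illustrative sketch consistent with those two assumptions, built around pool.imap_unordered: the workers stay busy on the CPU-heavy process calls while the main process handles the I/O-bound writing. It assumes process returns a writable string per line, read_data() yields 2000-line chunks as in the previous answer, and output_filename is a placeholder:

from multiprocessing import Pool


def process_packet(packet):
    # the CPU-heavy part runs in the pool workers
    return [process(line) for line in packet]


if __name__ == '__main__':
    with Pool(processes=7) as pool, open(output_filename, 'w') as out:
        # imap_unordered streams finished packets back as they complete, so while
        # the main process is busy writing one packet's results, the workers are
        # already computing the next ones
        for results in pool.imap_unordered(process_packet, read_data()):
            out.writelines(results)
    print("Done")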