What I have: a byte file of up to 16 GB, with an offset (for example 100 bytes) at the beginning. What I need: the fastest way to run the processing function 'f' in the code below; I expect multiprocessing can help here.
I tried to implement the approach from http://effbot.org/zone/wide-finder.htm. The multithreaded Python solution from that article ran twice as slow as the original code for me. I could not implement a multiprocessing Python solution because my Python level is not good enough yet. I read the multiprocessing module documentation, but it did not help me; I ran into some problems with the code...
from time import perf_counter
from random import getrandbits


def create_byte_data(size):
    creation_start = perf_counter()
    my_by = bytes(getrandbits(8) for i in range(size))  # creates 50 MB of random byte data
    print('creation my_by time = %.1f' % (perf_counter() - creation_start))
    return my_by


def write_to_file(file, data, b):
    writing_start = perf_counter()
    with open(file, "wb") as f:  # binary file creation
        for a in range(offset):  # uses the global 'offset' from __main__ to write filler bytes
            f.write(b'0')
        # for n in range(b):  # for creating bigger files
        #     f.write(data)
        f.write(data)
    print('writing time = %.1f' % (perf_counter() - writing_start))


def abs_pixel(pixel: bytes) -> int:  # convert signed bytes to absolute values and sum them into "result"
    result = 0
    for a in pixel:
        if a > 127:
            result += 256 - a
        else:
            result += a
    return result


def f(file, offset, time):  # this function must be accelerated
    sum_list = list()
    with open(file, "rb") as f:
        f.seek(offset)
        while True:
            chunk = f.read(time)
            if not chunk:
                break
            sum_list.append(abs_pixel(chunk))
    return sum_list


if __name__ == '__main__':
    filename = 'bytes.file'
    offset = 100
    x = 512
    y = 512
    time = 200
    fs = 2  # file size in GBytes, for creating bigger files
    xyt = x * y * time
    b = fs * 1024 * 1024 * 1024 // xyt  # parameter for writing a data file of size 'fs'
    my_data = create_byte_data(xyt)  # not needed once the file has been created
    write_to_file(filename, my_data, b)  # not needed once the file has been created
    start = perf_counter()
    result = f(filename, offset, time)  # this function must be accelerated
    print('function time = %.1f' % (perf_counter() - start))
    print(result[:10])
Task: do some math on chunks (of length 'time') and collect the results into a list. The file can be large, so RAM must not be overloaded. The code above creates the random byte file (50 MB to start with, or more for further testing). Compared to the code above, I expect at least a 4x speed-up of the function 'f'. Right now it takes about 6 seconds for the 50 MB file on my PC, and about 240 seconds for a 2 GB file.
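To make the goal more concrete: the kind of multiprocessing split I have in mind would give every process its own byte range of the file, so nothing beyond the current chunks has to be held in RAM. This is only an untested sketch of the idea; the function names (f_ranges, range_worker), the workers parameter and the range bookkeeping are my own illustration, and it reuses abs_pixel from the script above:

from multiprocessing import Pool
from os import path


def range_worker(args):
    # each process reads only its own byte range of the file, in 'time'-sized chunks
    file, start, length, time = args
    sums = []
    with open(file, "rb") as fh:
        fh.seek(start)
        remaining = length
        while remaining > 0:
            chunk = fh.read(min(time, remaining))
            if not chunk:
                break
            remaining -= len(chunk)
            sums.append(abs_pixel(chunk))  # abs_pixel from the script above
    return sums


def f_ranges(file, offset, time, workers=4):
    total = path.getsize(file) - offset
    per_worker = (total // time // workers) * time  # byte range per process, aligned to the chunk size
    ranges, pos = [], offset
    for w in range(workers):
        length = per_worker if w < workers - 1 else total - (pos - offset)  # last process takes the rest
        ranges.append((file, pos, length, time))
        pos += length
    with Pool(workers) as p:
        parts = p.map(range_worker, ranges)
    return [s for part in parts for s in part]  # chunk order is preserved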
Answer 0 (score: 0)
I found out how to parallelize some parts of the code with multiprocessing, and how to make abs_pixel work faster. The code now runs about 2x faster (for example, on my PC 6.1 s vs. 11.9 s for a 100 MB file, and 119 s vs. 246 s for 2 GB).
from multiprocessing import Pool
from struct import unpack_from


def abs_pixel_2(pixel: bytes) -> int:  # integral abs: unpack signed bytes and sum their absolute values
    a = unpack_from('<%ib' % len(pixel), pixel)
    return sum(map(abs, a))


def f_mp(file, offset, time):  # read the file line by line; uses the globals x and y from __main__
    sum_list = list()
    p = Pool()
    with open(file, "rb") as f:
        f.seek(offset)
        for z in range(y * 2):  # (y*2) for a 100 MByte file (2 times the original 50 MBytes of created data)
            line = list()
            for i in range(x):  # read one line of pixels
                pixel = f.read(time)
                line.append(pixel)
            sums_line = p.map(abs_pixel_2, line)  # line with sums
            # sums_line = list(map(abs_pixel, line))  # line with sums, without using the Pool
            sum_list.append(sums_line)  # -> list of lines (lists)
    sum_list = [item for sublist in sum_list for item in sublist]  # flatten the list of lists
    p.close()
    p.join()
    return sum_list
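For reference, this is roughly how I call it from the __main__ block instead of the original f (x and y must stay module-level globals, as in the question's script; the file-creation part is unchanged and omitted here):

if __name__ == '__main__':
    filename = 'bytes.file'
    offset = 100
    x = 512
    y = 512
    time = 200
    start = perf_counter()
    result = f_mp(filename, offset, time)  # parallel version instead of f
    print('function time = %.1f' % (perf_counter() - start))
    print(result[:10])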
But I am still hoping to find further speed-ups.
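One direction I have not measured here, and which assumes numpy is acceptable (so treat it as a sketch), is to drop the Python-level byte loop entirely and let numpy compute the absolute sums:

import numpy as np


def abs_pixel_np(pixel: bytes) -> int:
    # view the raw bytes as signed 8-bit integers; cast up before abs so that -128 does not overflow
    a = np.frombuffer(pixel, dtype=np.int8)
    return int(np.abs(a.astype(np.int16)).sum())

It takes the same bytes argument and returns the same integer sum as abs_pixel_2, so it could be passed to the same Pool.map call.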