我正在编写一个简单的应用程序,它将一个大文本文件拆分成较小的文件,我编写了两个版本,一个使用列表,一个使用生成器。我使用memory_profiler模块对这两个版本进行了分析,它清楚地显示了生成器版本的更好的内存效率,但是当使用生成器的版本进行分析时,奇怪的是,它增加了执行时间。下面的演示解释了我的意思
使用列表的版本
Enter
使用探查器运行时
from memory_profiler import profile
@profile()
def main():
file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
input_file = open(file_name).readlines()
num_lines_orig = len(input_file)
parts = int(input("Enter the number of parts you want to split in: "))
output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
st = 0
p = int(num_lines_orig / parts)
ed = p
for i in range(parts-1):
with open(output_files[i], "w") as OF:
OF.writelines(input_file[st:ed])
st = ed
ed = st + p
with open(output_files[-1], "w") as OF:
OF.writelines(input_file[st:])
if __name__ == "__main__":
main()
在没有探查器的情况下运行
$ time py36 Splitting\ text\ files_BAD_usingLists.py
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3
Filename: Splitting text files_BAD_usingLists.py
Line # Mem usage Increment Line Contents
================================================
6 47.8 MiB 0.0 MiB @profile()
7 def main():
8 47.8 MiB 0.0 MiB file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
9 107.3 MiB 59.5 MiB input_file = open(file_name).readlines()
10 107.3 MiB 0.0 MiB num_lines_orig = len(input_file)
11 107.3 MiB 0.0 MiB parts = int(input("Enter the number of parts you want to split in: "))
12 107.3 MiB 0.0 MiB output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
13 107.3 MiB 0.0 MiB st = 0
14 107.3 MiB 0.0 MiB p = int(num_lines_orig / parts)
15 107.3 MiB 0.0 MiB ed = p
16 108.1 MiB 0.7 MiB for i in range(parts-1):
17 107.6 MiB -0.5 MiB with open(output_files[i], "w") as OF:
18 108.1 MiB 0.5 MiB OF.writelines(input_file[st:ed])
19 108.1 MiB 0.0 MiB st = ed
20 108.1 MiB 0.0 MiB ed = st + p
21
22 108.1 MiB 0.0 MiB with open(output_files[-1], "w") as OF:
23 108.1 MiB 0.0 MiB OF.writelines(input_file[st:])
real 0m6.115s
user 0m0.764s
sys 0m0.052s
现在使用发电机的那个
$ time py36 Splitting\ text\ files_BAD_usingLists.py
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3
real 0m5.916s
user 0m0.696s
sys 0m0.080s
使用探查器选项
运行时@profile()
def main():
file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
input_file = open(file_name)
num_lines_orig = sum(1 for _ in input_file)
input_file.seek(0)
parts = int(input("Enter the number of parts you want to split in: "))
output_files = ((file_name + str(i)) for i in range(1, parts + 1))
st = 0
p = int(num_lines_orig / parts)
ed = p
for i in range(parts-1):
file = next(output_files)
with open(file, "w") as OF:
for _ in range(st, ed):
OF.writelines(input_file.readline())
st = ed
ed = st + p
if num_lines_orig - ed < p:
ed = st + (num_lines_orig - ed) + p
else:
ed = st + p
file = next(output_files)
with open(file, "w") as OF:
for _ in range(st, ed):
OF.writelines(input_file.readline())
if __name__ == "__main__":
main()
在没有探查器的情况下运行
$ time py36 -m memory_profiler Splitting\ text\ files_GOOD_usingGenerators.py
Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
Enter the number of parts you want to split in: 3
Filename: Splitting text files_GOOD_usingGenerators.py
Line # Mem usage Increment Line Contents
================================================
4 47.988 MiB 0.000 MiB @profile()
5 def main():
6 47.988 MiB 0.000 MiB file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
7 47.988 MiB 0.000 MiB input_file = open(file_name)
8 47.988 MiB 0.000 MiB num_lines_orig = sum(1 for _ in input_file)
9 47.988 MiB 0.000 MiB input_file.seek(0)
10 47.988 MiB 0.000 MiB parts = int(input("Enter the number of parts you want to split in: "))
11 48.703 MiB 0.715 MiB output_files = ((file_name + str(i)) for i in range(1, parts + 1))
12 47.988 MiB -0.715 MiB st = 0
13 47.988 MiB 0.000 MiB p = int(num_lines_orig / parts)
14 47.988 MiB 0.000 MiB ed = p
15 48.703 MiB 0.715 MiB for i in range(parts-1):
16 48.703 MiB 0.000 MiB file = next(output_files)
17 48.703 MiB 0.000 MiB with open(file, "w") as OF:
18 48.703 MiB 0.000 MiB for _ in range(st, ed):
19 48.703 MiB 0.000 MiB OF.writelines(input_file.readline())
20
21 48.703 MiB 0.000 MiB st = ed
22 48.703 MiB 0.000 MiB ed = st + p
23 48.703 MiB 0.000 MiB if num_lines_orig - ed < p:
24 48.703 MiB 0.000 MiB ed = st + (num_lines_orig - ed) + p
25 else:
26 48.703 MiB 0.000 MiB ed = st + p
27
28 48.703 MiB 0.000 MiB file = next(output_files)
29 48.703 MiB 0.000 MiB with open(file, "w") as OF:
30 48.703 MiB 0.000 MiB for _ in range(st, ed):
31 48.703 MiB 0.000 MiB OF.writelines(input_file.readline())
real 1m48.071s
user 1m13.144s
sys 0m19.652s
那么为什么分析会让我的代码首先变慢?其次,如果在分析时影响执行速度,那么为什么这个效果没有显示在使用列表的代码版本上。
答案 0 :(得分:1)
我使用line_profiler cpu_profiled代码,这次我得到了答案,生成器版本需要更多时间的原因是因为下面的行
19 2 11126.0 5563.0 0.2 with open(file, "w") as OF:
20 379886 200418.0 0.5 3.0 for _ in range(st, ed):
21 379884 2348653.0 6.2 35.1 OF.writelines(input_file.readline())
为什么它不会减慢列表版本的速度是因为
19 2 9419.0 4709.5 0.4 with open(output_files[i], "w") as OF:
20 2 1654165.0 827082.5 65.1 OF.writelines(input_file[st:ed])
对于列表,只需通过对列表进行切片来获取列表的副本即可编写新文件,这实际上就是单个语句。但是对于生成器版本,通过逐行读取输入文件来填充新文件,这使得每一行的内存分析器配置文件相当于增加了cpu时间。