I have a program that reads 3 strings per line for 50000 lines. It then does other things. The part that reads the file and converts the strings to integers takes 80% of the total running time.
My code snippet is below:
import time

file = open('E:/temp/edges_big.txt').readlines()
start_time = time.time()
for line in file[1:]:
    label1, label2, edge = line.strip().split()
    # label1 = int(label1); label2 = int(label2); edge = float(edge)
    # Rest of the loop deleted
print('processing file took', time.time() - start_time, 'seconds')
The above takes about 0.84 seconds. Now, when I uncomment the line
label1 = int(label1); label2 = int(label2); edge = float(edge)
the running time goes up to 3.42 seconds.
The input file has the format: str1 str2 str3 on each line.
Are the functions int() and float() really that slow? How can I optimize this?
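A quick way to check this directly (a minimal sketch, assuming CPython's timeit module; the literal strings are just sample values in the file's format):

import timeit

# Time 50000 rounds of the three conversions in isolation.
n = 50000
t = timeit.timeit("int('150'); int('952'); float('0.355243621018')", number=n)
print('%d conversions took %.4f seconds' % (3 * n, t))

On a typical machine this finishes in well under a second, so the conversions alone are unlikely to account for several seconds.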
Answer 0 (score: 3)
If the file is in the OS cache, then parsing it takes milliseconds on my machine:
name                             time       ratio    comment
read_read                         145 usec     1.00  big.txt
read_readtxt                     2.07 msec    14.29  big.txt
read_readlines                   4.94 msec    34.11  big.txt
read_james_otigo                 29.3 msec   201.88  big.txt
read_james_otigo_with_int_float  82.9 msec   571.70  big.txt
read_map_local                   93.1 msec   642.23  big.txt
read_map                         95.6 msec   659.57  big.txt
read_numpy_loadtxt                321 msec  2213.66  big.txt
The read_*() functions are defined as:
def read_read(filename):
    # Raw bytes read: the baseline cost of getting the file off disk/cache.
    with open(filename, 'rb') as file:
        data = file.read()

def read_readtxt(filename):
    # Read as text in universal-newline mode.
    with open(filename, 'rU') as file:
        text = file.read()

def read_readlines(filename):
    with open(filename, 'rU') as file:
        lines = file.readlines()

def read_james_otigo(filename):
    # The question's loop, without the int()/float() conversions.
    file = open(filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()

def read_james_otigo_with_int_float(filename):
    # The question's loop, with the conversions uncommented.
    file = open(filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()
        label1 = int(label1); label2 = int(label2); edge = float(edge)

def read_map(filename):
    with open(filename) as file:
        L = [(int(l1), int(l2), float(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]

def read_map_local(filename, _i=int, _f=float):
    # Same as read_map, but int/float are bound to local names via default
    # arguments, which avoids repeated global name lookups inside the loop.
    with open(filename) as file:
        L = [(_i(l1), _i(l2), _f(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]

import numpy as np

def read_numpy_loadtxt(filename):
    a = np.loadtxt(filename, dtype=[('label1', 'i'),
                                    ('label2', 'i'),
                                    ('edge', 'f')])
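The timing harness itself is part of the linked code; a minimal stand-in, assuming timeit and the function names above, could look like this:

import timeit

# Hypothetical minimal harness; the real read-array.py is in the linked code.
funcs = [read_read, read_readtxt, read_readlines,
         read_james_otigo, read_james_otigo_with_int_float,
         read_map, read_map_local, read_numpy_loadtxt]
for f in funcs:
    # Best of 3 repeats of 10 calls each; the file should already be in the OS cache.
    t = min(timeit.repeat(lambda: f('big.txt'), number=10, repeat=3)) / 10
    print('%-33s %10.3f msec' % (f.__name__, t * 1e3))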
big.txt was generated with the following code:
#!/usr/bin/env python
import numpy as np
n = 50000
a = np.random.random_integers(low=0, high=1<<10, size=2*n).reshape(-1, 2)
np.savetxt('big.txt', np.c_[a, np.random.rand(n)], fmt='%i %i %s')
It produces 50000 lines like these:
150 952 0.355243621018
582 98 0.227592557278
478 409 0.546382780254
46 879 0.177980983303
...
To reproduce the results, download the code and run:
# write big.txt
python generate-file.py
# run benchmark
python read-array.py
Answer 1 (score: 3)
I get almost the same timings as you now. I think the problem was with how my code's running time was being measured:
First run:

name                             time       ratio   comment
read_read                         488 usec    1.00  big.txt
read_readtxt                     4.36 msec    8.95  big.txt
read_readlines                   9.24 msec   18.95  big.txt
read_james_otigo                   40 msec   82.13  big.txt
read_james_otigo_with_int_float   116 msec  238.64  big.txt
read_map_local                    131 msec  268.05  big.txt
read_map                          134 msec  274.87  big.txt
read_numpy_loadtxt                400 msec  819.42  big.txt
Second run:

name                             time       ratio   comment
read_read                         487 usec    1.00  big.txt
read_readtxt                     4.37 msec    8.96  big.txt
read_readlines                   9.21 msec   18.90  big.txt
read_james_otigo                 39.4 msec   80.81  big.txt
read_james_otigo_with_int_float   116 msec  238.51  big.txt
read_map_local                    131 msec  268.84  big.txt
read_map                          134 msec  275.11  big.txt
read_numpy_loadtxt                398 msec  816.71  big.txt
Answer 2 (score: 1)
I cannot reproduce this at all.

I generated a file with 50000 lines, each containing three random numbers (two integers, one float) separated by spaces.

Then I ran your script on that file. The original script finishes in 0.05 seconds on my three-year-old PC, and the script with the line uncommented takes 0.15 seconds. Converting the strings to int/float does take longer, of course, but certainly not on the scale of seconds. Unless your target machine is a toaster running embedded Windows CE.
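To double-check on any machine, here is a sketch (assuming the big.txt format above) that times the split alone versus split plus conversion over the same in-memory lines, so disk speed and OS caching cannot skew the comparison:

import time

lines = open('big.txt').readlines()

# Pass 1: split only.
start = time.time()
for line in lines:
    label1, label2, edge = line.strip().split()
print('split only:      %.3f s' % (time.time() - start))

# Pass 2: split plus int()/float() conversion.
start = time.time()
for line in lines:
    label1, label2, edge = line.strip().split()
    label1 = int(label1); label2 = int(label2); edge = float(edge)
print('split + convert: %.3f s' % (time.time() - start))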