My file looks as follows, where the first three numbers per line denote a triangle (a triplet), and the fourth number is a marker for that triangle:
1 2 3 1
5 6 7 0
300 10 11 5
0 14 15 9
I currently read it as follows:
import numpy as np

with open(fname, 'r') as file:
    lines = [x for x in file.readlines() if not x.startswith('#')]

n = ...  # number of lines to read
tri = np.empty([n, 3], dtype=int)    # array of triplets
tri_mark = np.empty([n], dtype=int)  # a marker for each triplet
for i in range(n):
    s = lines[i].split()
    tri[i, :] = [int(v) for v in s[:-1]]
    tri_mark[i] = int(s[-1])
When the number of lines goes into the millions, it turns out that the for loop is a serious bottleneck. An external program that I also use can read the same file very quickly, so I think it should be possible to read and convert it much faster.
Is there a faster way to convert the list of strings into an ndarray?
(Switching to a binary file is currently not an option.)
Answer (score: 3)
Use np.loadtxt to read in the whole file:
>>> import numpy as np
>>> arr = np.loadtxt(fname, dtype=int)
>>> arr
array([[  1,   2,   3,   1],
       [  5,   6,   7,   0],
       [300,  10,  11,   5],
       [  0,  14,  15,   9]])
and then slicing to get the appropriate subarrays:
>>> tri = arr[:, 0:3]
>>> tri
array([[  1,   2,   3],
       [  5,   6,   7],
       [300,  10,  11],
       [  0,  14,  15]])
>>> tri_mark = arr[:, 3]
>>> tri_mark
array([1, 0, 5, 9])
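Putting both steps together, a minimal drop-in sketch for the snippet in the question (assuming the file really is just whitespace-separated integer rows plus the '#'-prefixed comment lines) could look like this:

import numpy as np

# One vectorized parsing pass over the whole file; np.loadtxt skips
# lines starting with '#' by default (comments='#'), so the manual
# readlines()/startswith filtering is no longer needed.
arr = np.loadtxt(fname, dtype=int)

tri = arr[:, 0:3]     # the triplets, shape (n, 3)
tri_mark = arr[:, 3]  # one marker per triplet, shape (n,)

This replaces the per-line Python loop with a single parsing pass inside NumPy, which is where the speedup comes from.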