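I am trying to read tabular data with numpy.fromfile().
It is fast, but it only reads the first line. How can I read the whole table? I don't want to use pandas or numpy.loadtxt().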
np.fromfile('abc.txt', count=-1, sep=",")
Answer 0 (score: 0)
I can read a multi-line file that is delimited by whitespace:
In [312]: cat mytest.txt
0.26 0.63 0.97 1.01 0.42
1.66 1.54 1.07 2.13 1.44
2.57 2.73 2.45 2.47 2.29
3.75 3.91 3.37 3.32 4.32
4.27 4.33 4.05 4.21 4.48
0.37 0.58 0.07 0.59 0.48
2.17 1.99 1.61 1.30 2.09
2.82 2.08 2.39 2.48 2.51
3.12 3.36 2.76 3.62 3.25
4.24 4.97 4.51 4.25 4.65
0.42 0.03 0.29 0.10 0.46
1.11 2.05 1.40 1.86 1.36
2.07 2.16 2.81 2.47 2.37
3.65 3.25 3.60 3.23 3.80
4.23 3.75 4.67 4.34 4.78
In [313]: np.fromfile('mytest.txt',count=-1,dtype=float,sep=' ')
Out[313]:
array([ 0.26, 0.63, 0.97, 1.01, 0.42, 1.66, 1.54, 1.07, 2.13,
1.44, 2.57, 2.73, 2.45, 2.47, 2.29, 3.75, 3.91, 3.37,
3.32, 4.32, 4.27, 4.33, 4.05, 4.21, 4.48, 0.37, 0.58,
0.07, 0.59, 0.48, 2.17, 1.99, 1.61, 1.3 , 2.09, 2.82,
2.08, 2.39, 2.48, 2.51, 3.12, 3.36, 2.76, 3.62, 3.25,
4.24, 4.97, 4.51, 4.25, 4.65, 0.42, 0.03, 0.29, 0.1 ,
0.46, 1.11, 2.05, 1.4 , 1.86, 1.36, 2.07, 2.16, 2.81,
2.47, 2.37, 3.65, 3.25, 3.6 , 3.23, 3.8 , 4.23, 3.75,
4.67, 4.34, 4.78])
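The newline is treated as just another whitespace character, so the whole file comes back as one flat array.
But a `,`-separated file does not read across line boundaries; reading stops at the end of the first line: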
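Because the values come back flat, the table shape has to be restored afterwards. A minimal sketch, assuming every row has the same number of columns; the sample file, its name, and the way `ncols` is inferred from the first line are illustrative choices and not part of the session above:

```python
import os
import tempfile

import numpy as np

# Write a small whitespace-delimited sample file (first three rows of mytest.txt).
rows = [
    "0.26 0.63 0.97 1.01 0.42",
    "1.66 1.54 1.07 2.13 1.44",
    "2.57 2.73 2.45 2.47 2.29",
]
path = os.path.join(tempfile.mkdtemp(), "mytest_sample.txt")
with open(path, "w") as f:
    f.write("\n".join(rows) + "\n")

# Infer the column count from the first line (assumes a rectangular table).
with open(path) as f:
    ncols = len(f.readline().split())

# fromfile returns a 1-D array; reshape it back into rows and columns.
flat = np.fromfile(path, count=-1, dtype=float, sep=" ")
table = flat.reshape(-1, ncols)
print(table.shape)
```

If the rows are ragged, the reshape raises a ValueError, which is one reason loadtxt or genfromtxt is the safer choice for untrusted files.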
In [315]: cat test.txt
-0.22424938, 0.16117005, -0.39249256
-0.22424938, 0.16050598, -0.39249256
-0.22424938, 0.15984190, -0.39249256
0.09214371, -0.26184322, -0.39249256
0.09214371, -0.26250729, -0.39249256
0.09214371, -0.26317136, -0.39249256
In [316]: np.fromfile('test.txt',count=-1,dtype=float,sep=',')
Out[316]: array([-0.22424938, 0.16117005, -0.39249256])
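One possible workaround that avoids both pandas and loadtxt is to read the text yourself and turn the newlines into commas before parsing. This is only a sketch, assuming the file is small enough to hold in memory and every row has the same number of fields; it is plain Python string handling, so it will not match the speed of fromfile. The data is given inline here rather than read from test.txt:

```python
import numpy as np

# Inline stand-in for the contents of a comma-separated file.
text = """-0.22424938, 0.16117005, -0.39249256
-0.22424938, 0.16050598, -0.39249256
0.09214371, -0.26184322, -0.39249256
"""

lines = text.strip().splitlines()
ncols = len(lines[0].split(","))  # assumes a rectangular table

# Join the rows with commas so that every value is separated the same way,
# then convert each token; float() tolerates the surrounding spaces.
tokens = ",".join(lines).split(",")
flat = np.array([float(tok) for tok in tokens])
table = flat.reshape(-1, ncols)
print(table)
```

For a real file, replace the inline string with the result of `open(filename).read()`.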
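loadtxt and genfromtxt are designed for tabular data. Yes, they are slower, because they read the file line by line, but they are much more flexible. pandas has a faster csv reader.
Speed test on that whitespace-separated file: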
In [319]: timeit np.loadtxt('mytest.txt')
1000 loops, best of 3: 623 µs per loop
In [320]: timeit np.fromfile('mytest.txt',count=-1,dtype=float,sep=' ')
The slowest run took 4.90 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 174 µs per loop
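Outside IPython, the same comparison can be reproduced with the standard timeit module. A rough sketch, assuming a generated 15x5 file in place of mytest.txt; the file name, the random data, and the repeat count are arbitrary choices, and the absolute numbers will differ by machine and NumPy version:

```python
import os
import tempfile
import timeit

import numpy as np

# Generate a 15x5 whitespace-delimited file similar in size to mytest.txt.
path = os.path.join(tempfile.mkdtemp(), "ws_sample.txt")
rng = np.random.default_rng(0)
np.savetxt(path, rng.random((15, 5)), fmt="%.2f")

n = 200  # number of repetitions per measurement
t_loadtxt = timeit.timeit(lambda: np.loadtxt(path), number=n)
t_fromfile = timeit.timeit(
    lambda: np.fromfile(path, count=-1, dtype=float, sep=" "), number=n
)
print(f"loadtxt:  {t_loadtxt / n * 1e6:.1f} us per call")
print(f"fromfile: {t_fromfile / n * 1e6:.1f} us per call")

# Check that both readers yield the same values once fromfile is reshaped.
a = np.loadtxt(path)
b = np.fromfile(path, count=-1, dtype=float, sep=" ").reshape(a.shape)
print(np.allclose(a, b))
```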