Question

I am trying to process data stored in a text file, test.dat, which looks like this:

-1411.85 2.6888 -2.09945 -0.495947 0.835799 0.215353 0.695579
-1411.72 2.82683 -0.135555 0.928033 -0.196493 -0.183131 -0.865999
-1412.53 0.379297 -1.00048 -0.654541 -0.0906588 0.401206 0.44239
-1409.59 -0.0794765 -2.68794 -0.84847 0.931357 -0.31156 0.552622
-1401.63 -0.0235102 -1.05206 0.065747 -0.106863 -0.177157 -0.549252
....
....

However, the file is several GB in size, and I would very much like to read it in small blocks of rows. I would like to use numpy's loadtxt function, because it quickly converts everything to a numpy array. So far I have not managed to do this, because the function only seems to offer column selection, like this:

data = np.loadtxt("test.dat", delimiter=' ', skiprows=1, usecols=range(1,7))

Any ideas how to achieve this? If it is not possible with loadtxt, are there any other options available?

Answer 1

If you can use pandas, this becomes easier:

In [2]: import pandas as pd

In [3]: df = pd.read_table('test.dat', delimiter='  ', skiprows=1, usecols=range(1,7), nrows=3, header=None)

In [4]: df.values
Out[4]:
array([[ 2.82683  , -0.135555 ,  0.928033 , -0.196493 , -0.183131 ,
        -0.865999 ],
       [ 0.379297 , -1.00048  , -0.654541 , -0.0906588,  0.401206 ,
         0.44239  ],
       [-0.0794765, -2.68794  , -0.84847  ,  0.931357 , -0.31156  ,
         0.552622 ]])
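The same call can be wrapped so that the starting row and the number of rows become parameters. This is only a minimal sketch: the helper name `read_row_block` is hypothetical, and it assumes the columns are separated by whitespace (hence `sep=r"\s+"` instead of a fixed delimiter).

```python
import pandas as pd


def read_row_block(path, start, count, usecols=None):
    """Return `count` rows beginning at 0-based row `start` as a NumPy array.

    Hypothetical helper; assumes whitespace-separated columns and no header.
    """
    df = pd.read_csv(
        path,
        sep=r"\s+",      # assumption: any run of whitespace separates columns
        header=None,     # the file has no header row
        skiprows=start,  # number of leading lines to skip
        nrows=count,     # number of rows to parse after skipping
        usecols=usecols,
    )
    return df.to_numpy()
```

Calling `read_row_block('test.dat', 1, 3, usecols=range(1, 7))` should select the same rows and columns as the session above. Note that pandas still has to scan past the skipped lines, so for many consecutive blocks the chunksize approach is likely the better fit.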

Edit

If you want to read the file k rows at a time, you can specify chunksize. For example,

reader = pd.read_table('test.dat', delimiter=' ', usecols=range(1,7), header=None, chunksize=2)
for chunk in reader:
    print(chunk.values)

Output:

[[ 2.6888 -2.09945 -0.495947 0.835799 0.215353 0.695579]
 [ 2.82683 -0.135555 0.928033 -0.196493 -0.183131 -0.865999]]
[[ 0.379297 -1.00048 -0.654541 -0.0906588 0.401206 0.44239 ]
 [-0.0794765 -2.68794 -0.84847 0.931357 -0.31156 0.552622 ]]
[[-0.0235102 -1.05206 0.065747 -0.106863 -0.177157 -0.549252 ]]

How you store the chunks inside the for loop is up to you. Note that in this case reader is a TextFileReader, not a DataFrame, so you can iterate over it lazily.

The pandas documentation has more details.
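As one possible way of handling the chunks inside the loop, the sketch below accumulates per-column sums so that only one chunk is in memory at a time. The function name `column_means_in_chunks` and the choice of computing means are illustrative assumptions, not part of the original answer; it also assumes whitespace-separated columns.

```python
import numpy as np
import pandas as pd


def column_means_in_chunks(path, usecols, chunksize=100_000):
    """Compute per-column means while holding only one chunk in memory.

    Hypothetical helper; assumes whitespace-separated columns and no header.
    """
    total = np.zeros(len(usecols))
    n_rows = 0
    reader = pd.read_csv(
        path,
        sep=r"\s+",           # assumption: whitespace-separated columns
        header=None,
        usecols=usecols,
        chunksize=chunksize,  # makes read_csv return a lazy TextFileReader
    )
    for chunk in reader:
        values = chunk.to_numpy(dtype=float)
        total += values.sum(axis=0)  # accumulate per-column sums
        n_rows += values.shape[0]
    if n_rows == 0:
        raise ValueError("no data rows found")
    return total / n_rows
```

The same pattern applies to any reduction that can be updated incrementally (sums, minima, histograms); results that need every row at once would still have to be collected or written out chunk by chunk.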

Answer 2

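hpaulj pointed me in the right direction in his comment.

The following code works perfectly for me:

import itertools
import numpy as np

with open('test.dat') as f_in:
    # islice lazily yields lines 1 to 11 (0-based), so only those lines are parsed
    x = np.genfromtxt(itertools.islice(f_in, 1, 12, None), dtype=float)
    print(x[0, :])

Thanks a lot!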
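The islice idea can be extended to walk through the whole file block by block. The generator below is a sketch under assumptions: the name `iter_row_blocks` is hypothetical, columns are taken to be whitespace-separated, and np.loadtxt is used on each list of lines in place of genfromtxt.

```python
import itertools
import numpy as np


def iter_row_blocks(path, block_size, usecols=None):
    """Yield successive 2-D arrays holding at most `block_size` rows each.

    Hypothetical helper; assumes whitespace-separated numeric columns.
    """
    with open(path) as f_in:
        while True:
            # Pull the next block of raw lines without reading the rest of the file
            raw = list(itertools.islice(f_in, block_size))
            if not raw:
                break  # end of file
            lines = [ln for ln in raw if ln.strip()]  # drop blank lines
            if lines:
                # ndmin=2 keeps a single-row block two-dimensional
                yield np.loadtxt(lines, usecols=usecols, ndmin=2)
```

For example, `for block in iter_row_blocks('test.dat', 1000, usecols=range(1, 7)): ...` would process the file 1000 rows at a time. Because the file handle is advanced in place, each line is read only once.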

How do I read only specific rows from a text file?