Question

I want to fill a numpy array with floating-point values read from a file. The data is stored as follows:

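The first line gives the first and last indices, and the lines below it give the actual data. My current approach is to split each data line, apply float to each piece, and store the values in a preallocated array, slice by slice. This is how I do it now: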

import numpy as np

data_file = 'data.txt'
# Number of unneeded lines at the beginning of the file
skip_lines = 0

with open(data_file, 'r') as f:
    # Skip any lines if needed
    for _ in range(skip_lines):
        f.readline()
    # Get the data size and preallocate the numpy array
    first, last = map(int, f.readline().split())
    size = last - first + 1
    data = np.zeros(size)

    beg, end = 0, 0  # Keep track of where to fill the array
    for line in f:
        if end >= size:  # All requested values have been read
            break
        samples = line.split()
        beg = end
        end += len(samples)
        data[beg:end] = [float(s) for s in samples]

Is there a way in Python to read the data values one at a time?

import numpy as np
f = open('data.txt', 'r')
first, last = map(int, f.readline().split())
arr = np.zeros(last - first + 1)
for k in range(last - first + 1):
    data = f.read() # This does not work. Any idea?
    # In C++, it could be done this way: double data; cin >> data
    arr[k] = data

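EDIT: The only thing that is certain is that the first two numbers are the first and last indices, and that the last data line contains only the last numbers. There may be other content after the data numbers, so one cannot simply read all the lines after the "first, last" line.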

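EDIT 2: Added the (working) implementation of the initial approach (split each data line, apply float to each piece, and store the values in a preallocated array, slice by slice).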

Answer 1

Since your example has the same number of columns on every line (except the first), we can read it as a csv, for example with loadtxt:

In [1]: cat stack43307063.txt
0 11
5 6.2 4 6
2 5 3.2 6
7 1.4 5 11
In [2]: arr = np.loadtxt('stack43307063.txt', skiprows=1)
In [3]: arr
Out[3]: 
array([[  5. ,   6.2,   4. ,   6. ],
       [  2. ,   5. ,   3.2,   6. ],
       [  7. ,   1.4,   5. ,  11. ]])

This is easy to reshape and manipulate. If the number of columns is not consistent, we need to work line by line.

In [9]: alist = []
In [10]: with open('stack43307063.txt') as f:
    ...:     start, stop = [int(i) for i in f.readline().split()]
    ...:     print(start, stop)
    ...:     for line in f: # f.readline()
    ...:         print(line.split())
    ...:         alist.append([float(i) for i in line.split()])
    ...:         
0 11
['5', '6.2', '4', '6']
['2', '5', '3.2', '6']
['7', '1.4', '5', '11']
In [11]: alist
Out[11]: [[5.0, 6.2, 4.0, 6.0], [2.0, 5.0, 3.2, 6.0], [7.0, 1.4, 5.0, 11.0]]

Replace append with extend to collect the values in a flat list:

alist.extend([float(i) for i in line.split()])
[5.0, 6.2, 4.0, 6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0, 11.0]

C++ I/O usually works with streams. Python can do streaming too, but text files are more commonly read line by line.

In [15]: lines = open('stack43307063.txt').readlines()
In [16]: lines
Out[16]: ['0 11\n', '5 6.2 4 6\n', '2 5 3.2 6\n', '7 1.4 5 11\n']

This gives a list of lines that can be processed as above.
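The stream-style reading that C++ offers with cin >> data can be approximated with a generator over whitespace-separated tokens. The following is only a minimal sketch and not part of the original answer: iter_tokens and the sample text are hypothetical, and it assumes the requested values are all present before any trailing content.

```python
import io
import itertools

import numpy as np


def iter_tokens(stream):
    """Yield whitespace-separated tokens one at a time, reading line by line."""
    for line in stream:
        yield from line.split()


# Hypothetical sample: header line, data lines, then unrelated trailing text.
sample = io.StringIO("0 11\n5 6.2 4 6\n2 5 3.2 6\n7 1.4 5 11\nsome trailing text\n")

tokens = iter_tokens(sample)
first, last = int(next(tokens)), int(next(tokens))
size = last - first + 1

# islice stops after `size` tokens, so the trailing text is never converted.
values = (float(t) for t in itertools.islice(tokens, size))
arr = np.fromiter(values, dtype=float, count=size)
```

Each call to next(tokens) plays the role of one cin >> data, and only one line of the file is held in memory at a time.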

fromfile can also be used, except that it loses any row/column structure of the original:

In [20]: np.fromfile('stack43307063.txt',sep=' ')
Out[20]: 
array([  0. ,  11. ,   5. ,   6.2,   4. ,   6. ,   2. ,   5. ,   3.2,
         6. ,   7. ,   1.4,   5. ,  11. ])

This load includes the first line. We can skip it by opening the file and calling readline first.

In [21]: with open('stack43307063.txt') as f:
    ...:     start, stop = [int(i) for i in f.readline().split()]
    ...:     print(start, stop)
    ...:     arr = np.fromfile(f, sep=' ')        
0 11
In [22]: arr
Out[22]: 
array([  5. ,   6.2,   4. ,   6. ,   2. ,   5. ,   3.2,   6. ,   7. ,
         1.4,   5. ,  11. ])

fromfile also accepts a count parameter, which can be set from start and stop. But unless you only want to read a subset, it is not needed.
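When the file may contain other content after the data, as the question's edit describes, count is one way to stop at exactly the requested number of values. The sketch below is an illustration under assumptions: the file contents are made up, a temporary file stands in for the real one, and the file is opened in binary mode, which is a choice made here rather than something the answer prescribes.

```python
import os
import tempfile

import numpy as np

# Hypothetical file contents: header, data, then unrelated trailing text.
content = b"0 11\n5 6.2 4 6\n2 5 3.2 6\n7 1.4 5 11\nend of data\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

with open(path, 'rb') as f:
    # int() accepts the bytes tokens produced by splitting a binary line.
    start, stop = (int(i) for i in f.readline().split())
    # Read exactly stop - start + 1 values; anything after them is ignored.
    arr = np.fromfile(f, sep=' ', count=stop - start + 1)

os.remove(path)
```

With sep=' ', any whitespace (including newlines) separates the values, so the line structure of the data does not matter.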

Answer 2

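This assumes only that the first two numbers are the indices of the wanted values among the numbers that follow. Any number of values may appear on the first line or on the following lines. Tokens beyond last are not read.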

from io import StringIO
sample = StringIO('''3 11 5\n 6.2 4\n6 2 5 3.2 6 7\n1.4 5 11''')
from shlex import shlex
lexer = shlex(instream=sample, posix=False)
lexer.wordchars = r'0123456789.'
lexer.whitespace = ' \n'
lexer.whitespace_split = True

def oneToken():
    while True:
        token = lexer.get_token()
        if token:
            token = token.strip()
            if not token:
                return
        else:
            return
        token = token.replace('\n', '')
        yield token

tokens = oneToken()

first = int(next(tokens))
print (first)

last = int(next(tokens))
print (last)

all_available = [float(next(tokens)) for i in range(0, last)]
print (all_available)

data = all_available[first:last]
print (data)

Output:

3
11
[5.0, 6.2, 4.0, 6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0]
[6.0, 2.0, 5.0, 3.2, 6.0, 7.0, 1.4, 5.0]

Answer 3

f.read() gives you the remaining numbers as a single string. You have to split it and map the pieces to float.

import numpy as np
f = open('data.txt', 'r')
first, last = map(int, f.readline().split())
arr = np.zeros(last - first + 1)

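# list() is needed because map is lazy in Python 3
data = list(map(float, f.read().split()))
# Copy the requested number of values into the preallocated array
arr[:] = data[:arr.size]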

Answer 4

Python is fast at string processing. So you can approach this as a reading problem with two delimiters: reduce it to a single delimiter, then read (Python 3):

import numpy as np
from io import StringIO

data = np.loadtxt(StringIO(''.join(l.replace(' ', '\n') for l in open('tata.txt'))),delimiter=' ',skiprows=2)

https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html

By default, the data type is float.
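To make the single-delimiter idea concrete: once every space is replaced by a newline, each value sits on its own line, so skiprows=2 skips the two index values rather than two lines of the original file. The following sketch uses made-up in-memory text instead of a file, and adds max_rows (available in NumPy 1.16 and later) as an assumption of mine to stop before any trailing content.

```python
import io

import numpy as np

# Hypothetical file contents: header, data, then unrelated trailing text.
raw = "0 11\n5 6.2 4 6\n2 5 3.2 6\n7 1.4 5 11\nend of data\n"

# One token per line: the newline becomes the only delimiter.
one_per_line = raw.replace(' ', '\n')

# The first two tokens are the first and last indices.
first, last = (int(t) for t in one_per_line.split()[:2])

# Skip the two index rows, then read exactly the requested number of rows.
data = np.loadtxt(io.StringIO(one_per_line), skiprows=2,
                  max_rows=last - first + 1)
```

The result is a one-dimensional float array, since each row now holds a single value.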

Python read (floating-point) values one at a time
