Question

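I have an output file from a program I'm running (not a program I wrote myself), and some of the data I need is in the commented (leading #) lines of that file. The sections of the output file that I want always begin and end with the same lines, but their positions relative to the start of the file and to each other are not always the same.

Let's say my output file is named output.txt. My attempt at getting the lines I need from output.txt is as follows: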

data_file = open("output.txt", "r")
block = ""
found = False

for line in data_file:
    if found:
        block += line
        if line.strip() == "# This isn't the actual line either, but I want to stop here:": break
    else:
        if line.strip() == "# This isn't the actual line, but I'm making a working example:":
            found = True
            block = "# This isn't the actual line, but I'm making a working example:"

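This does get me the lines I want. However, it leaves me with something I'm not sure how to use. All I want are the numeric columns. I've considered using split(), but I don't want to break block up into strings... I'd like to keep the nice tab-delimited columns and put them into a NumPy array.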

# This isn't the actual line, but I'm making a working example:
# 
#    point     c[0]        c[1]        c[2]     
# -0.473359  7161.325229    -609.475403  49128.219132   
# -0.459864  7162.047233    -102.060363  1189.270542    
# -0.404065  7160.055198     467.778393 -23832.885052   
# -0.385952  7160.708981     0.675271    2.177786   
# 
# This isn't the actual line either, but I want to stop here:

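So what I ultimately need is:

1. A way to get the lines of output.txt that I want (if there is a better one than what I'm doing now);
2. A way to read only the lines of block that contain numeric data, so that I can put them into a NumPy array;
3. A way of accomplishing 1 & 2 that (if possible) doesn't involve strings.

One last note: I haven't used numpy.genfromtxt() so far, because this file also contains data that is not behind a comment (#).

Any suggestions would be appreciated.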

Answer 1

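Breaking the block up into strings isn't a big deal. In fact, that is exactly what you are already doing when you read the file line by line to find the start/end conditions. The real concern when reading large files is pulling the entire contents into memory before processing.

numpy.genfromtxt() can consume a generator, and because the target data is then loaded line by line, this is more efficient than reading everything up front. Here is a generator that discards lines until it finds the one you want, then feeds the following lines to numpy. It was written for Python 3, but also works in Python 2.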

import numpy

def block_reader(fp):
    for line in fp:
        if line.strip() == b"# This isn't the actual line, but I'm making a working example:":
            break
    for line in fp:
        if line.strip() == b"# This isn't the actual line either, but I want to stop here:":
            break
        line = line[2:].strip()
        if line:
            yield line

a = numpy.genfromtxt(block_reader(open('somefile.txt', 'rb')), skip_header=1)
print(a)
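
To try the idea without touching a real output file, here is a minimal, self-contained sketch of the same generator approach run against an in-memory sample. It is only an illustration under a few assumptions: the sample is built from the data in the question plus two made-up uncommented rows, the names START, STOP and sample are mine, and it reads in text mode (str) rather than the binary mode used above.

```python
import io
import numpy as np

# Marker lines copied from the question; START and STOP are illustrative names.
START = "# This isn't the actual line, but I'm making a working example:"
STOP = "# This isn't the actual line either, but I want to stop here:"

def block_reader(fp):
    """Yield the lines between START and STOP with the leading '#' removed."""
    for line in fp:                      # discard everything up to and including START
        if line.strip() == START:
            break
    for line in fp:                      # keep going on the same iterator
        if line.strip() == STOP:
            break
        line = line.lstrip("#").strip()  # drop the comment sign and padding
        if line:                         # skip the empty "# " lines
            yield line

# In-memory stand-in for output.txt (the uncommented rows are made up).
sample = io.StringIO("\n".join([
    "1.0 2.0 3.0",
    START,
    "# ",
    "#    point     c[0]        c[1]        c[2]     ",
    "# -0.473359  7161.325229    -609.475403  49128.219132   ",
    "# -0.459864  7162.047233    -102.060363  1189.270542    ",
    "# -0.404065  7160.055198     467.778393 -23832.885052   ",
    "# -0.385952  7160.708981     0.675271    2.177786   ",
    "# ",
    STOP,
    "4.0 5.0 6.0",
]) + "\n")

# skip_header=1 drops the "point c[0] c[1] c[2]" column-title line.
a = np.genfromtxt(block_reader(sample), skip_header=1)
print(a.shape)
```

The uncommented rows before and after the block never reach genfromtxt, which is why the mixed commented/uncommented layout of the file is not a problem here.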

Answer 2

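Following what you have already done, you can modify it as follows to get what you want. Turn your code into a function that, between the start marker and the end marker, yields every line that contains only numbers, with the '#' sign at the beginning of the line stripped off. For this I defined two helper functions: one recognizes a number, the other checks whether a line contains only numbers. Feed the output of the function to np.genfromtxt, see below.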

import numpy as np

is_number = lambda x: x.strip('-+').replace('.','',1).isdigit()
all_number = lambda x: all(is_number(var) for var in x.split())
def read_bloc(fileName):
    # Text mode, so that lines are str and can be compared with the str markers below.
    with open(fileName, "r") as data_file:
        found = False
        for line in data_file:
            if found:
                if line.strip() == "# This isn't the actual line either, but I want to stop here:": break
                cleaned = line.strip().strip('#')
                # Skip empty lines and the column-title line; keep purely numeric rows.
                if cleaned.strip() and all_number(cleaned):
                    yield cleaned
            else:
                if line.strip() == "# This isn't the actual line, but I'm making a working example:":
                    found = True

print(np.genfromtxt(read_bloc("output.txt")))

One catch is that, for signed numbers, there must be no space between the sign and the number itself.
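
A quick way to see that limitation is to run the two helpers on a well-formed row and on a row where the sign is separated from its number. This is just a small check of the helpers shown above; the second row is a made-up variant of a line from the question.

```python
# Same helpers as in the answer above.
is_number = lambda x: x.strip('-+').replace('.', '', 1).isdigit()
all_number = lambda x: all(is_number(var) for var in x.split())

# Row taken from the question: the sign is attached to the number.
ok = all_number("-0.385952  7160.708981     0.675271    2.177786")

# Made-up variant: a space after the sign makes "-" a token of its own,
# and "-".strip('-+') is the empty string, which is not a digit string.
bad = all_number("- 0.385952  7160.708981     0.675271    2.177786")

print(ok, bad)  # True False
```

The same check also rejects values in scientific notation such as 1e5, since isdigit() only accepts digit characters, so a row containing one would be skipped.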

Python - Inconveniently located data in an output file
