Question

I have a file.txt that looks like this:

testings 1
response 1-a
time 32s

testings 2
response 2-a
time 32s

testings 3
*blank*

testings 4
error

testings 5
response 5-a
time 26s

My code currently prints:

['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']
['response 1-a', 'response 2-a', 'response 5-a']
['time 32s', 'time 32s', 'time 26s']

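So what I have is a simple piece of code that opens the file, uses readlines() and looks for the keywords testings, response and time, then appends each string to one of 3 separate lists. As file.txt shows, some testings x blocks are *blank* or have an error instead of a response. My problem is that I need the lists to always have the same length, like this: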

 ['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']
 ['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']
 ['time 32s', 'time 32s', '*error*', '*error*', 'time 26s']

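So I was wondering whether it is possible to "read 3 lines at the same time" and have an if statement where all 3 lines need to have the correct keyword ("be True"), or else insert *error* in the response and time lists to keep the lengths right. Or is there a better way to keep the 3 lists the same length?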

Answer 1

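Text files are iterables, which means you can loop over them directly, or use the next() function to get another line from them. Whichever method you use, even when you mix the two, the file object always yields the next line in the file.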
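To make the mixing of a for loop and next() concrete, here is a minimal sketch. It assumes an io.StringIO object as a stand-in for an open text file, and the sample contents and the name seen are made up for illustration.

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo)
sample = io.StringIO("a\nb\nc\nd\n")

seen = []
for line in sample:
    if line.strip() == "a":
        # pull the following line from the same iterator inside the loop body
        follower = next(sample, "")
        seen.append(("a", follower.strip()))
    else:
        seen.append((line.strip(), None))

print(seen)
```

The line "b" is consumed by next(), so the for loop resumes at "c" rather than seeing "b" a second time.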

You can use this to pull in more lines inside a for loop:

# the three result lists must exist before the loop appends to them
test, response, time = [], [], []

with open("textfile.txt", 'r') as txt_file:
    for line in txt_file:
        line = line.strip()
        if line.startswith('testings'):
            # expect two more lines, response and time
            response_line = next(txt_file, '')
            if not response_line.startswith('response'):
                # not a valid block, scan forward to the next testings
                continue
            time_line = next(txt_file, '')
            if not time_line.startswith('time'):
                # not a valid block, scan forward to the next testings
                continue
            # valid block, we got our three elements
            test.append(line)
            response.append(response_line.strip())
            time.append(time_line.strip())

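So when a line starting with testings is found, the code pulls in the next line. If that line starts with response, another line is pulled in. If that one starts with time, all three lines are appended to your data structures. If either condition is not met, the outer for loop continues and keeps reading the file until it finds another testings line. Note that this skips invalid blocks entirely rather than recording *error* for them.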

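The added advantage is that the file is never read into memory in one go. File buffering keeps the reads efficient, and beyond that buffer you never need more memory than the final set of lists (the valid data) plus the three lines currently being tested.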

Side note: I'd strongly recommend against using three separate lists of equal length. You could use a single list of tuples instead:

test_data = []
# ... in the loop ...
test_data.append((line, response_line.strip(), time_line.strip()))

and then use that single list to keep each triplet of information together. You could even use a named tuple:

from collections import namedtuple

TestEntry = namedtuple('TestEntry', 'test response time')

# ... in the loop
test_data.append(TestEntry(line, response_line.strip(), time_line.strip()))

At this point each entry in the test_data list is an object with test, response and time attributes:

for entry in test_data:
    print(entry.test, entry.response, entry.time)
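If three parallel lists are still needed later, for instance for printing, they can be recovered from the single list. A minimal sketch, assuming test_data holds TestEntry tuples as above; the sample entries here are hypothetical:

```python
from collections import namedtuple

TestEntry = namedtuple('TestEntry', 'test response time')

# hypothetical sample entries, standing in for what the loop would collect
test_data = [
    TestEntry('testings 1', 'response 1-a', 'time 32s'),
    TestEntry('testings 2', 'response 2-a', 'time 32s'),
]

# zip(*rows) transposes rows into columns; this unpacking needs at least
# one entry, because an empty test_data yields no columns to unpack
tests, responses, times = (list(column) for column in zip(*test_data))

print(tests)
print(responses)
print(times)
```

Because every entry is built as one tuple, the three resulting lists cannot drift out of step with each other.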

Answer 2

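This snippet does what you asked. You can use next(txt_file, '') to retrieve the next line without first loading the whole file into memory. You look only for lines containing "testings" and then check the two lines that follow. Whenever "testings" is found, one string is always appended to each list, but if "response" or "time" is not found, *error* is inserted in the appropriate place. Here is the code, using the input you posted above.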

with open("textfile.txt", "r") as txt_file:
    test = []
    response = []
    time = []
    for line in txt_file:
        if "testings" in line:
            test_line = line.strip()
            response_line = next(txt_file, '').strip()
            time_line = next(txt_file, '').strip()
            test.append(test_line)
            if "response" in response_line:
                response.append(response_line)
            else:
                response.append("*error*")
            if "time" in time_line:
                time.append(time_line)
            else:
                time.append("*error*")

And the output:

In : test
Out: ['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']

In : response
Out: ['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']

In : time
Out: ['time 32s', 'time 32s', '*error*', '*error*', 'time 26s']

In : len(test), len(response), len(time)
Out: (5, 5, 5)
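One thing to be aware of: the snippet above always consumes two lines after each testings line, so if a block is cut short and immediately followed by another testings line, that line would be swallowed. A possible variant that never reads ahead is sketched below. The function name fill_lists and the marker parameter are made up for illustration, and io.StringIO stands in for the real file. Unlike the snippet above, this sketch keeps a time line even when its response line is missing.

```python
import io

def fill_lists(lines, marker="*error*"):
    """Return (test, response, time) lists that always have equal length."""
    test, response, time = [], [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith("testings"):
            # start a new entry with placeholders in the other two lists
            test.append(line)
            response.append(marker)
            time.append(marker)
        elif test and line.startswith("response"):
            # overwrite the placeholder of the current entry
            response[-1] = line
        elif test and line.startswith("time"):
            time[-1] = line
    return test, response, time

# in-memory copy of the file from the question (assumed contents)
sample = io.StringIO(
    "testings 1\nresponse 1-a\ntime 32s\n\n"
    "testings 2\nresponse 2-a\ntime 32s\n\n"
    "testings 3\n*blank*\n\n"
    "testings 4\nerror\n\n"
    "testings 5\nresponse 5-a\ntime 26s\n"
)

test, response, time = fill_lists(sample)
print(test)
print(response)
print(time)
```

Since a placeholder is appended to all three lists at the same moment, their lengths stay equal regardless of what follows each testings line.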

Answer 3

Taken from the answer here:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

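test, response, time = [], [], []

with open("textfile.txt", 'r') as txt_file:
    # iterate over the file object itself; fillvalue='' keeps a short
    # final batch free of None, which would break the `in` checks
    for batch in grouper(txt_file, 3, fillvalue=''):
        if "testings" in batch[0]:
            test.append(batch[0].strip())
        else:
            test.append('error')
        if "response" in batch[1]:
            response.append(batch[1].strip())
        else:
            response.append('error')
        if "time" in batch[2]:
            time.append(batch[2].strip())
        else:
            time.append('error')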

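This assumes the lines always come in the same order and that the file is always organised in batches of three lines, even if one of them is just an empty line. Since your input file actually appears to have a blank line between each group of 3, you may need to change grouper to read batches of 4.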
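Fixed-size batches only line up when every block has the same number of lines, which the sample in the question does not seem to guarantee. A different technique, swapped in here for illustration, is to split on the blank separator lines with itertools.groupby. This is a sketch under the assumption that blocks are separated by empty lines and that each block begins with its testings line; the helper name blocks and the in-memory sample are made up.

```python
import io
from itertools import groupby

def blocks(lines):
    """Yield lists of stripped, non-empty lines separated by blank lines."""
    stripped = (line.strip() for line in lines)
    # key=bool is False for empty strings, so runs of text and runs of
    # blank lines come out as alternating groups
    for has_text, group in groupby(stripped, key=bool):
        if has_text:
            yield list(group)

# in-memory stand-in for the file, with blocks of unequal length
sample = io.StringIO(
    "testings 1\nresponse 1-a\ntime 32s\n\n"
    "testings 3\n*blank*\n\n"
    "testings 4\nerror\n\n"
    "testings 5\nresponse 5-a\ntime 26s\n"
)

test, response, time = [], [], []
for block in blocks(sample):
    # a block counts as valid only if it has exactly the three expected lines
    ok = (len(block) == 3
          and block[0].startswith("testings")
          and block[1].startswith("response")
          and block[2].startswith("time"))
    test.append(block[0])  # assumes the block starts with its testings line
    response.append(block[1] if ok else "*error*")
    time.append(block[2] if ok else "*error*")

print(test)
print(response)
print(time)
```

This keeps the idea of processing the file in groups, but lets the blank lines decide where each group ends instead of a fixed count.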

Answer 4

A quick working solution that simply reads the file as one string and then manipulates that string to get the desired output.

# -*- coding: utf-8 -*-

file = "test.txt"

with open(file, "r") as f:
    data = f.read()

data = data.split("testings")

# The first element is empty because the file starts with "testings"
del data[0]

data = [elt.split("\n") for elt in data]

# Add the necessary errors.
for i, elt in enumerate(data):
    if "response" not in elt[1]:
        data[i][1] = '*error*'
        data[i][2] = '*error*'

    # Because of the blank lines between blocks, elt can hold more than 3 items. Keep only the first 3.
    data[i] = data[i][:3]

print(data)

"""
[[' 1', 'response 1-a', 'time 32s'], 
[' 2', 'response 2-a', 'time 32s'], 
[' 3', '*error*', '*error*'], 
[' 4', '*error*', '*error*'], 
[' 5', 'response 5-a', 'time 26s']]
"""

# Now list comprehension to get your output

ID = ["testings" + elt[0] for elt in data]
Response = [elt[1] for elt in data]
Value = [elt[2] for elt in data]

This is quite simple and accounts for every case you mentioned.

Output:

print (ID)
['testings 1', 'testings 2', 'testings 3', 'testings 4', 'testings 5']

print (Response)
['response 1-a', 'response 2-a', '*error*', '*error*', 'response 5-a']

print (Value)
['time 32s', 'time 32s', '*error*', '*error*', 'time 26s']

Python: read 3 lines of a text file at a time
