Reading a text file conditionally with a while loop

Date: 2017-10-04 19:45:46

Tags: python

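Keeping it simple [leaving out scaling and parallelization], I'm trying to read a text file. In that file, some entries run over several lines (the software that produces it has a character input limit per line). Here is what I have so far: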

#Iterating through the file
with open(fileName, 'r') as file:
    #Examining each line
    for line in file:
        #If the first three characters meet a condition
        if line[:3] == "aa ":
            #If the last character does not meet the closing condition
            if line.rstrip()[-1:] != "'":
                #Then this entry effectively runs onto *at least* the next line
                #Store the current line in a buffer for reuse
                temp = line

                #Here is my issue: I don't want to use a 'for line in file' again, as that would require me to write multiple nested "for" and "if" blocks to cover entries running over several lines
                #[Pseudocode]
                while line.rstrip()[-1:] in file != "'":
                    #Concatenate the entries to date
                    temp = temp + line

                #entry has completed
                list.append(temp)

            else:
                #Is a single line entry
                list.append(line)

However, it clearly doesn't like the while loop. I've looked around and haven't found anything. Any ideas? Thanks.

3 answers:

Answer 0 (score: 1)

This should work. I built my own sample input:

# Content of input.txt:
# This is a regular entry.
# aa 'This is an entry that
# continues on the next line
# and the one after that.'
# This is another regular entry.

entries = []
partial_entry = None  # We use this when we find an entry spanning multiple lines

with open('input.txt', 'r') as file:
    for line in file:
        # If this is a continuation of a previous entry
        if partial_entry is not None:
            partial_entry += line

            # If the entry is now complete
            if partial_entry.rstrip()[-1] == "'":
                entries.append(partial_entry)
                partial_entry = None
        else:
            # If this is an entry that will continue
            if line.startswith("aa ") and line.rstrip()[-1] != "'":
                partial_entry = line
            else:
                entries.append(line)

# If partial_entry is non-None here, we have some entry that never terminated
assert partial_entry is None

print(entries)

# Output:
# ['This is a regular entry.\n', "aa 'This is an entry that\ncontinues on the next line\nand the one after that.'\n", 'This is another regular entry.\n']

Edit

Following PM2Ring's suggestion above, here is a solution using next(file). (Same input and output as before.)

entries = []

with open('input.txt', 'r') as file:
    for line in file:
        if line.startswith("aa "):
            while not line.rstrip().endswith("'"):
                line += next(file)
        entries.append(line)

print(entries)
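One caveat with the next(file) version: if the file ends while an "aa " entry is still open, next(file) raises StopIteration, which escapes the loop as an uncaught exception. A minimal sketch of one way to handle that passes a default to next() so an unterminated entry is detected explicitly. The helper name read_entries and the choice to raise ValueError are illustrative assumptions, not part of the answer; accepting any iterable of lines lets it be tried on io.StringIO instead of a real file.

```python
import io


def read_entries(lines):
    """Hypothetical helper: group lines into entries.

    An entry that starts with "aa " continues until a line whose stripped
    text ends with "'". `lines` can be an open file, io.StringIO, or a list.
    """
    it = iter(lines)  # for a file object, iter() returns the file itself
    entries = []
    for line in it:
        if line.startswith("aa "):
            while not line.rstrip().endswith("'"):
                # next() with a default returns None instead of raising StopIteration
                nxt = next(it, None)
                if nxt is None:
                    raise ValueError("unterminated entry: %r" % line)
                line += nxt
        entries.append(line)
    return entries


sample = io.StringIO(
    "This is a regular entry.\n"
    "aa 'This is an entry that\n"
    "continues on the next line\n"
    "and the one after that.'\n"
    "This is another regular entry.\n"
)
print(read_entries(sample))
```

With the same sample input as above, this prints the same list as the original answer; with a real file you would call read_entries(file) inside the with block.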

Answer 1 (score: 0)

Calling next() on the file iterator simply fetches the next line, without breaking the surrounding for loop:

entries = []
#Iterating through the file
with open(fileName, 'r') as file:
    #Examining each line
    for line in file:
        #If the first three characters meet a condition
        if line[:3] == "aa ":
            #Keep pulling lines from the same file iterator until the entry closes
            while not line.rstrip().endswith("'"):
                line += next(file)

            #entry has completed
            entries.append(line)
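This works because the for loop and next() pull from the same iterator: any line consumed by next() inside the loop body is skipped by the loop itself. A small sketch with a plain list iterator (the letters are made-up data) shows the effect:

```python
# The for loop and next() share one iterator, so items consumed by next()
# inside the body are never seen by the loop itself.
it = iter(["a", "b", "c", "d", "e"])
seen = []
for item in it:
    if item == "b":
        # Consume "c" here; the loop then resumes at "d"
        item += next(it)
    seen.append(item)
print(seen)  # ['a', 'bc', 'd', 'e']
```

A file object behaves the same way, since iterating over it and calling next() on it both advance the same position.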

Answer 2 (score: 0)

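When you read a line that continues onto the next one, just store the partial result in a variable, let the loop move on to the next line, and concatenate the lines. For example: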

#Iterating through the file
result = []
with open(filename, 'r') as file:
    buffer = ''
    #Examining each line
    for line in file:
        #If the first three characters meet the condition, or we are inside a multi-line entry
        if buffer or line[:3] == "aa ":
            buffer += line
            #If the last character indicates that the line is NOT to be continued,
            #the entry is complete
            if line.rstrip()[-1:] == "'":
                result.append(buffer)
                buffer = ''
    if buffer:
        # Might want to warn that the last entry expected a continuation but no subsequent line was found
        result.append(buffer)
print(result)

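Note that if the file is very large, it may be better to use a yield statement to produce the resulting entries one at a time rather than accumulating them in a list.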
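A minimal sketch of the yield-based approach mentioned above, assuming the same "aa " prefix and trailing-quote convention as the question; the name iter_entries is hypothetical. It yields each entry as soon as it is complete, so only one entry is held in memory at a time. Unlike the loop above, it also passes through lines that are not part of an "aa " entry; drop the else branch to match that loop exactly.

```python
import io


def iter_entries(lines):
    """Hypothetical generator: yield entries one at a time from an iterable of lines.

    An entry starting with "aa " continues until a line whose stripped text
    ends with "'"; every other line is yielded as its own entry.
    """
    buffer = ''
    for line in lines:
        if buffer or line.startswith("aa "):
            buffer += line
            if line.rstrip().endswith("'"):
                yield buffer
                buffer = ''
        else:
            yield line
    if buffer:
        # The last entry was never terminated; yield what was collected so far
        yield buffer


# io.StringIO stands in for an open file here; a real file object works the same way
sample = io.StringIO(
    "aa 'one line'\n"
    "aa 'two\n"
    "lines'\n"
    "plain line\n"
)
for entry in iter_entries(sample):
    print(repr(entry))
```

With a real file you would write `for entry in iter_entries(file):` inside the with block and process each entry as it arrives.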