Question

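I'm learning Python, but I'm having some trouble with my script.

I have a file similar to the following:

I want to find the number pair 2-1 on two consecutive lines, looking only at column 2, and then print columns 1 and 2 of those lines. The result would look something like this: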

4 2 
5 1

I'm trying to do this in Python because my file has 4,000,000 data entries. Here is my script:

import linecache

final_lines = []
with open("file.dat") as f:
    for i, line in enumerate(f, 1):
        if "1" in line:
            if "2" in linecache.getline("file.dat", i-1):
                linestart = i - 1
                final_lines.append(linecache.getline("file.dat", linestart))
print(final_lines)

The result is:

['2\n', '2\n', '2\n']

What do I need to change in my script to get the desired result? Could you point me in the right direction? Thank you very much.

Answer 1

Use a for loop with enumerate and an if statement to test each line; if the condition is true, append the two lines to the list final_lines:

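final_lines = []
with open('file.dat') as f:
    lines = f.readlines()
# Stop one line early so lines[i+1] never runs past the end of the file.
for i, line in enumerate(lines[:-1]):
    cols, next_cols = line.split(), lines[i+1].split()
    # Compare column 2 of this line and the next; skip lines with fewer than two columns.
    if len(cols) > 1 and len(next_cols) > 1 and cols[1] == '2' and next_cols[1] == '1':
        final_lines.extend([line, lines[i+1]])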

final_lines will now contain the list you want.
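
To try this comparison without the real file, here is a minimal sketch that runs it on a small in-memory sample and keeps only columns 1 and 2, as the question asks. The function name find_pairs and the sample rows are made up for illustration, since the original data file is not shown in the question.

```python
def find_pairs(lines):
    """Return (col1, col2) tuples for each consecutive 2-then-1 pair in column 2."""
    rows = [line.split() for line in lines]
    result = []
    # zip stops at the shorter input, so the last row is never indexed out of range.
    for cur, nxt in zip(rows, rows[1:]):
        if len(cur) > 1 and len(nxt) > 1 and cur[1] == "2" and nxt[1] == "1":
            result.append((cur[0], cur[1]))
            result.append((nxt[0], nxt[1]))
    return result


# Hypothetical sample data; the real file.dat is not shown in the question.
sample = ["1 5\n", "2 12\n", "3 1\n", "4 2\n", "5 1\n", "6 2\n"]
for col1, col2 in find_pairs(sample):
    print(col1, col2)
```

Comparing the split field with == rather than using in means that a value such as 12 in column 2 is not mistaken for 2.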

Answer 2

I think this will work:

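import re
with open("info.dat") as f:
    # Anchor each line so that values such as 12 or 21 in column 2 do not match.
    pattern = r"^\d+[ \t]+2[ \t]*\n\d+[ \t]+1[ \t]*$"
    for match in re.findall(pattern, f.read(), re.MULTILINE):
        print(match)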

See also: https://repl.it/repls/TatteredViciousResources

Another option is:

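with open("info.dat") as f:
    lines = f.readlines()
# Pair every line with the one that follows it.
for line, nextline in zip(lines, lines[1:]):
    # Compare the last column exactly, so "12" is not mistaken for "2".
    if line.split()[-1:] == ["2"] and nextline.split()[-1:] == ["1"]:
        print(line + nextline)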
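
The zip(lines, lines[1:]) idiom needs the whole file in memory, plus a copy of the list. As a sketch of a lazier variant, itertools.tee can pair each line with its successor straight from the file iterator. The helper names consecutive_pairs and matching_pairs are hypothetical, and the demo uses an in-memory io.StringIO with invented rows in place of info.dat.

```python
import io
from itertools import tee


def consecutive_pairs(iterable):
    """Yield (item, next_item) for every adjacent pair, lazily."""
    first, second = tee(iterable)
    next(second, None)  # advance the second iterator by one item
    return zip(first, second)


def matching_pairs(lines):
    """Yield (line, next_line) where the last columns are "2" then "1"."""
    for line, nextline in consecutive_pairs(lines):
        if line.split()[-1:] == ["2"] and nextline.split()[-1:] == ["1"]:
            yield line.rstrip("\n"), nextline.rstrip("\n")


# Invented sample standing in for the file; a real file object works the same way.
demo = io.StringIO("1 5\n4 2\n5 1\n6 12\n7 1\n")
for line, nextline in matching_pairs(demo):
    print(line)
    print(nextline)
```

On Python 3.10 or later, itertools.pairwise provides the same pairing without the helper.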

Answer 3

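You're a Python beginner, which is great, so I'll take a more basic approach. This is a large file, so you're best off reading one line at a time and keeping only that line; but you actually need two lines to recognize the pattern, so keep two. Consider the following: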

    fp = open('file.dat')
    last_line = fp.readline()
    next_line = fp.readline()
    while next_line:
        # Logic goes here: split both lines into their
        # numbers, check whether last_line ends in 2 and
        # next_line ends in 1, and output the pair if so.
        last_line = next_line
        next_line = fp.readline()
    fp.close()

This follows a good, readable software pattern and uses minimal resources.
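
One way the placeholder comments in the loop might be filled in is sketched below; it still holds only two lines in memory at a time. The names second_column and print_pairs and the out parameter are hypothetical, and the demo feeds the function an in-memory io.StringIO with invented rows rather than the real file.dat.

```python
import io


def second_column(line):
    """Return the second whitespace-separated field, or None if absent."""
    fields = line.split()
    return fields[1] if len(fields) > 1 else None


def print_pairs(fp, out=None):
    """Print columns 1 and 2 of each consecutive 2-then-1 pair; return the pair count."""
    count = 0
    last_line = fp.readline()
    next_line = fp.readline()
    while next_line:
        if second_column(last_line) == "2" and second_column(next_line) == "1":
            for line in (last_line, next_line):
                col1, col2 = line.split()[:2]
                print(col1, col2, file=out)  # out=None means standard output
            count += 1
        last_line = next_line
        next_line = fp.readline()
    return count


# Invented rows with a third column, standing in for file.dat.
demo = io.StringIO("1 5 x\n4 2 y\n5 1 z\n6 2 w\n")
print_pairs(demo)
```

With a real file, the call would be wrapped in a with open('file.dat') as fp: block so the file is closed automatically.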

How do I find a number pattern in consecutive lines using Python?