Question

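I am new to Python. I am trying to extract specific lines, skipping the header lines, that repeat at periodic intervals in a text file, and write them to another file. I have been able to do this with the following code, but it is very slow.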

import random
import sys
import os

with open('test.txt', encoding ='latin1') as rf: 
    with open('test1.txt', 'w') as wf:
        for x, line in enumerate(rf): #reads the line number
            #nskip = 3 #number of headers to skip
            #nloop = 5 #number of loops in the file
            ndata = 7 #number of lines in each loop
            data = 4 #number of lines to be extracted 
            x+=1
            #print(x,line)

            for i in range(1,ndata+1):
                for j in range((ndata*i - data)+1, ndata*i+1):
                    if x == j:
                        #print(line)
                        wf.write(line)

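For example, from this code I get Line5, Line6, Line7, Line12, Line13, Line14, Line19, Line20, Line21 (if you consider a test file with lines like Line1, Line2, Line3, and so on, one per line), which is what I intended. The problem is that my real file is much larger, and this takes a lot of time and memory. There must be a faster and more efficient way to do this.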

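In addition, I would like to be able to add the loop number to the lines of each loop, i.e. every line from the first loop gets a 1 (somewhere in each line, perhaps Line5 1, Line6 1, Line7 1, Line12 2, Line13 2, Line14 2, Line19 3, and so on). What I ultimately want to do is more complicated than this, but this should pave the way. Thanks.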

Answer 1

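Since the headers and the records have fixed sizes, skip the number of header lines, then write the number of record lines, and repeat until the end of the file is reached.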

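n_header_lines = 25   # header lines at the start of each page
n_record_lines = 100  # record lines that follow each header
page_num = 0

# Open both files in a single with statement.
with open('test.txt', encoding='latin1') as rf, open('test1.txt', 'w') as wf:
    try:
        while True:
            page_num += 1
            # Skip the header lines of this page.
            for _ in range(n_header_lines):
                next(rf)
            # Copy the record lines, prefixed with line and page numbers.
            for line_num in range(1, n_record_lines + 1):
                prefix = 'Line {:3d} {:3d} '.format(line_num, page_num)
                wf.write(prefix + next(rf))
    except StopIteration:
        # next(rf) raises StopIteration at end of file, which ends the loop.
        pass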
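
If you would rather append the loop number to the end of each line, as the question describes, the same idea can be sketched with a single pass and a modulo test instead of nested loops. This is only a sketch under assumptions: the helper names `extract_records` and `copy_records` are made up for illustration, and the defaults assume 3 header lines followed by 4 data lines per 7-line block, which is what `ndata = 7` and `data = 4` in the question's code suggest. Adjust them to match the real file.

```python
def extract_records(lines, n_header_lines, n_record_lines):
    """Yield (loop_number, line) for every record line, skipping headers.

    Hypothetical helper for illustration; `lines` can be any iterable of
    strings, including an open file object.
    """
    block_size = n_header_lines + n_record_lines
    for index, line in enumerate(lines):
        # 0-based position of this line inside its block.
        offset = index % block_size
        if offset >= n_header_lines:
            # Blocks are numbered from 1.
            yield index // block_size + 1, line


def copy_records(src_path, dst_path, n_header_lines=3, n_record_lines=4):
    """Write record lines to dst_path with the loop number appended.

    The default sizes and the latin1 output encoding are assumptions.
    """
    with open(src_path, encoding='latin1') as rf, \
            open(dst_path, 'w', encoding='latin1') as wf:
        for loop_number, line in extract_records(rf, n_header_lines, n_record_lines):
            wf.write('{} {}\n'.format(line.rstrip('\n'), loop_number))
```

Calling `copy_records('test.txt', 'test1.txt')` reads the file one line at a time, so memory use stays small, and each line is checked once rather than compared against every candidate line number.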
