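How to print the line X lines before a line that satisfies an if statement

Time: 2018-04-24 21:35:27

Tags: python python-2.7 list tuples enumeration

I am very new to Python and only have the knowledge I've picked up from plenty of web pages.

That being said, I am trying to search through a file (~10k lines) for a set of filter-like criteria that I wrote, and then I want it to print the line that meets the criteria together with the line X lines before it.

I created the following script to open said file, iterate through it line by line, and print the lines that meet the filter criteria to an output file, but I am stumped on how to incorporate this into the current script.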

import os

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

numLines = 0
numWords = 0
numChrs = 0
numMes = 0

f1 = open(output_file, 'w')
print 'Output File has been Opened'

with open(filename, 'r') as file:
   for line in file:
      wordsList = line.split()
      numLines += 1
      numWords += len(wordsList)
      numChrs += len(line)

      if "X" in line and "Y" not in line and "Z" in line:
          numMes += 1
          print >>f1, line
          print 'Object found and Catalogued in Output.txt'                          

print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)

print "There are a total of %i thing in this file" % (numMes)
print >>f1, "There are a total of %i things in this file" % (numMes)

f1.close()

print 'Output Files have been Closed'

My first guess was to use line.enumeration, but I don't think I can write something like lines - 5 to print the line 5 lines before lines:

lines = f1.enumeration()
if "blah blah" in line and "so so" not in line:
    print >>f1, lines
    print >>f1, [lines - 5]

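The best part is yet to come, since I have to take the Output.txt file and compare it against another file to output the criteria that match in both files... but one step at a time, right?

- Also, feel free to add tidbits of "proper" technique... I'm sure this script could be written better, so please school me on anything I'm doing wrong.

Thanks in advance for your help!

Update: I have successfully implemented this fix, thanks to the help below: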

import os

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

numLines = 0
numWords = 0
numChrs = 0

numMulMes = 0

last5 = []

f1 = open(output_file, 'w')
print 'Output Files have been Opened'

with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChrs += len(line)
        last5[:] = last5[-5:]+[line] 
        if "X" in line and "Y" not in line and "Z" not in line:
            del last5[1:5]           ###the missing piece of the puzzle!
            numMulMes += 1
            print >>f1, last5
            print 'Object found and Catalogued in Output.txt'

print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)

print "There are a total of %i messages in this file" % (numMulMes)
print >>f1, "There are a total of %i messages in this file" % (numMulMes)

f1.close()

print 'Output Files have been Closed'

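I had been trying to modify the output file with a separate script, and for the longest time I was fighting with str operations and errors. I just decided to go back to the original script, threw it in there on a whim, and voilà.

Thanks for pushing me in the right direction, it was easy to figure out from there!

4 answers:

Answer 0: (score: 4)

You solved most of it yourself (counting words, lines, and so on). You can simply remember the last n lines while going through the file.

Example: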

t = """zero line
one line
two line
three line
four line 
five line 
six line
seven line 
eight line
""" 

last5 = [] # memory cell
for l in t.split("\n"):  # similar to your for line in file: 
    last5[:] = last5[-4:]+[l] # keep last 4 and add current line, inplace list mod 

    if "six" in l:
        print last5

You can also look at deque and specify a maximum length (you need to import it):

from collections import deque

last5 = deque(maxlen=5)
for l in t.split("\n"): 
    last5.append(l) # will automatically only keep 5 (maxlen)

    if "six" in l:
        print last5

Output:

 # list version
 ['two line', 'three line', 'four line ', 'five line ', 'six line'] 

 # deque version
 deque(['two line', 'three line', 'four line ', 'five line ', 'six line'], maxlen=5) 
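If only the matching line and the single line exactly X lines before it are wanted (which is how I read the question), the deque can be sized X + 1 so that its first element is that earlier line. This is a minimal sketch, not the asker's script: match_with_lookback and is_match are hypothetical names, and the sample list stands in for the file. It should run on both Python 2.7 and Python 3.

```python
from collections import deque


def match_with_lookback(lines, is_match, x):
    """Yield (line_x_before, matching_line) pairs.

    line_x_before is None when fewer than x lines precede the match.
    """
    window = deque(maxlen=x + 1)  # the current line plus the x lines before it
    for line in lines:
        window.append(line)
        if is_match(line):
            # window[0] is exactly x lines back only once the deque is full
            before = window[0] if len(window) == x + 1 else None
            yield before, line


sample = ["zero line", "one line", "two line", "three line",
          "four line", "five line", "six line", "seven line"]
pairs = list(match_with_lookback(sample, lambda l: "six" in l, 5))
early = list(match_with_lookback(sample, lambda l: "one" in l, 5))
print(pairs)  # [('one line', 'six line')]
print(early)  # [(None, 'one line')]
```

Passing an open file object instead of sample works the same way, since the function only iterates over its first argument.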

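Answer 1: (score: 2)

Here is the same solution as suggested by @PatricArtner, but using a ring buffer. It may (or may not, I haven't checked) work faster with big files.

The idea is very simple: we create a list of the required size (the number of lines you need to keep) and a counter cnt that holds the current write position. For each new line we increase cnt by 1 and take it modulo the size of the buffer, so cnt cycles through the list. For example, with a list size of 5, cnt = (cnt+1)%5 yields 0 1 2 3 4 0 1 2 and so on. At every step cnt points to the oldest data in the list, which is then replaced by the new data. Below is an example implementation.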

t = """"zero line
six line - surprize 
one line
two line
three line
four line 
five line 
six line
seven line 
eight line
""" 


last5 = [None,None,None,None,None]
cnt = 0
for l in t.split("\n"):
  last5[cnt]=l
  if 'six' in l:
    print last5[(cnt+1)%5]
    print last5[(cnt+2)%5]
    print last5[(cnt+3)%5]
    print last5[(cnt+4)%5]
    print last5[(cnt+0)%5]
    print
  cnt = (cnt+1)%5

The output is quite simple:

None
None
None
"zero line
six line - surprize 

two line
three line
four line 
five line 
six line
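The five hard-coded print statements above can be generalised to any buffer size. The following is only a sketch of the same ring-buffer idea wrapped in a small class; RingBuffer, push and in_order are hypothetical names that do not appear in the answer, and the word list stands in for the lines of a file.

```python
class RingBuffer(object):
    """Fixed-size buffer that overwrites its oldest entry (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.data = [None] * size
        self.cnt = 0  # index of the next slot to write, i.e. the oldest entry

    def push(self, item):
        self.data[self.cnt] = item
        self.cnt = (self.cnt + 1) % self.size

    def in_order(self):
        """Return the entries from oldest to newest."""
        return [self.data[(self.cnt + k) % self.size] for k in range(self.size)]


buf = RingBuffer(5)
found = []
for l in ["zero", "one", "two", "three", "four", "five", "six", "seven"]:
    buf.push(l)
    if "six" in l:
        found.append(buf.in_order())
print(found)  # [['two', 'three', 'four', 'five', 'six']]
```

As in the code above, slots that have not been written yet show up as None when a match happens near the start of the input.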

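Note: if you are reading from a file, the file is very big, the strings you need to keep are big (gene sequences, for example) and your condition does not trigger often, be smart and do not keep the strings in memory. Build a list of the positions in the file where the last few strings start, and re-read them when needed. Below is an example of how this can be implemented quickly...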

from numpy import random as rnd

print "Creating the file ...."
DNA=["G","C","T","A"]
with open("bigdatafile","w") as fd:
    for i in xrange(5000):
        fd.write("".join([ DNA[rnd.randint(4)] for x in xrange(2000)])+"\n")
print "DONE"
print
print "SEARCHING GGGGGGGGGGG"
last5, cnt = [0,0,0,0,0], 1
with open("bigdatafile","r") as fd:
    for l in iter(fd.readline, ""):  # one line at a time; readline() stays in sync with seek()/read(), and nothing is held in memory
        last5[cnt] = last5[(cnt+4)%5]+len(l)
        if "GGGGGGGGGGG" in l:
            print "FIND!"
            fd.seek(last5[(cnt+1)%5])
            print fd.read(last5[cnt]-last5[(cnt+1)%5])
        cnt = (cnt+1)%5
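The same offset idea can also be written with tell() instead of summing line lengths. This is a sketch under my own assumptions rather than the answer's code: context_by_offsets is a hypothetical name, the file is opened in binary mode so that byte offsets are exact (hence the bytes needle), and a small throwaway file replaces the generated DNA file.

```python
import os
import tempfile
from collections import deque


def context_by_offsets(path, needle, n):
    """For every line containing needle, re-read that line and up to n-1 lines before it."""
    results = []
    starts = deque(maxlen=n)  # byte offsets where the last n lines begin
    with open(path, "rb") as fd:  # binary mode keeps tell()/seek() exact
        while True:
            start = fd.tell()
            line = fd.readline()
            if not line:
                break
            starts.append(start)
            if needle in line:
                end = fd.tell()
                fd.seek(starts[0])
                # this read leaves the position at `end`, so scanning resumes correctly
                results.append(fd.read(end - starts[0]))
    return results


# Small demonstration on a throwaway file
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"AAA\nCCC\nTTT\nGGG\nACG\n")
chunks = context_by_offsets(tmp.name, b"GGG", 3)
os.remove(tmp.name)
print(chunks)
```

Only n integers are kept between matches, at the cost of one extra seek and read per match.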

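Answer 2: (score: 0)

Instead of writing to a file, I put the output into a dictionary. After the whole file has been processed, the summary dictionary is dumped to a file as JSON. This uses Artner's test file.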

import os
import json

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

#initiate output container
outDict = {}
for fields in ['numLines', 'numWords', 'numChrs', 'numMes']:
    outDict[fields] = 0

outDict['lineNum'] = []
outDict['lineList'] = []  # must exist before lines are appended to it below

with open(filename, 'r') as file:
    for line in file:
      wordsList = line.split()  # split on any whitespace; split("\s") would split on a literal backslash followed by s
      outDict['numLines'] += 1
      outDict['numWords'] += len(wordsList)
      outDict['numChrs'] += len(line)

      #find items in the line
      if "t" in line:
          outDict['numMes'] += 1
          #save line number
          outDict['lineNum'].append(outDict['numLines']) 
          #save line content
          outDict['lineList'].append(line)

#record output          
with open(output_file, 'w') as f1:
    f1.write(json.dumps(outDict))    

##print lines of desire
#x number of lines before
x=5    
with open(filename, 'r') as file:
    #start counting at 1 so that i matches the 1-based numbers stored in lineNum
    for i, line in enumerate(file, 1):
        #if line number is between a found line num minus x and that line num, print it once
        if any((n - x) <= i <= n for n in outDict['lineNum']):
            print(line)
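The second pass can also be expressed by collecting the wanted line numbers in a set first, which guarantees that overlapping windows never print a line twice. This is a rough sketch with a hypothetical helper name, lines_with_context, and it assumes the lines fit in memory, which seems reasonable for the roughly 10k-line file from the question.

```python
def lines_with_context(lines, is_match, x):
    """Return each matching line and the x lines before it, once, in file order."""
    lines = list(lines)
    wanted = set()
    for num, line in enumerate(lines):
        if is_match(line):
            # the match itself plus up to x lines before it
            wanted.update(range(max(0, num - x), num + 1))
    return [lines[num] for num in sorted(wanted)]


sample = ["a0", "b1", "t2", "c3", "t4", "d5"]
picked = lines_with_context(sample, lambda l: "t" in l, 1)
print(picked)  # ['b1', 't2', 'c3', 't4']
```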

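Answer 3: (score: 0)

Since I mentioned it in the comments, here is how to do the same thing on a *nix machine using grep's context-line options.

First, assume you have the following text file test.txt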

zero line
one line
two line
three line
four line 
five line 
six line
seven line 
eight line

If you want to get the N lines before a match, you can use the -B option. For example, the 5 lines before "six":

$ grep -B 5 six test.txt 
one line
two line
three line
four line 
five line 
six line

There is also the -A option to get the N lines after a match, and -C to get N lines both before and after.
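For anyone who wants this behaviour inside the Python script rather than from the shell, here is a rough approximation of -B and -A. It is only a sketch: grep_context is a hypothetical helper, it does plain substring matching instead of regular expressions, and it does not print the "--" separators that grep places between groups.

```python
from collections import deque


def grep_context(lines, needle, before=0, after=0):
    """Yield lines containing needle, with leading and trailing context lines."""
    history = deque(maxlen=before)  # lines seen but not yet emitted
    remaining_after = 0
    for line in lines:
        if needle in line:
            for old in history:
                yield old
            history.clear()
            yield line
            remaining_after = after
        elif remaining_after > 0:
            yield line  # trailing context of an earlier match
            remaining_after -= 1
        else:
            history.append(line)


sample = ["zero line", "one line", "two line", "three line", "four line",
          "five line", "six line", "seven line", "eight line"]
before_only = list(grep_context(sample, "six", before=5))
both_sides = list(grep_context(sample, "six", before=1, after=1))
print(before_only)
print(both_sides)  # ['five line', 'six line', 'seven line']
```

Setting before and after to the same value gives roughly what -C does.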