Question

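I have a text file and need to read lines based on the last match of a condition. For example, I want to read all lines after the last occurrence of a particular word or string, up to the end of the file.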

Sample file:

2016 Jun 01 13:48:46:590 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300006 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating 
2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:48:46:590 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300006 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating 
2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01 
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64  
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started

From the file above, I want to read all lines after the last occurrence of the string COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating.

Expected output:

2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01 
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64  
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started

Answer 1

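Since you posted this under the groovy tag, I assume you can run Groovy from the shell. I wrote the following script. It iterates over the file twice, but it works in a streaming fashion and won't blow up your memory: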

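// Input log file
f = new File("sample.txt")

// First pass: record the line number just after the last matching line.
// With a two-parameter closure, eachLine counts lines starting at 1.
def lastIndex
f.eachLine { line, index ->
    if (line.contains("GenerateComplexitySheet terminating")) {
        lastIndex = index + 1
    }
}

// Second pass: copy every line from lastIndex onward to out.txt.
new File("out.txt").with {
    write ""
    withWriter { writer ->
        f.eachLine { line, index ->
            if (index >= lastIndex) {
                writer.writeLine line
            }
        }
    }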

    assert text == '''2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01 
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64  
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started
'''
}

Answer 2

Try this:

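def file = new File("file.txt")
// Read the file once and search the in-memory list of lines
def lines = file.readLines()
// Index of the last matching line, or -1 if nothing matches
def index = lines.findLastIndexOf { it =~ "COMPLEXITY_CALCULATOR-GenerateComplexitySheet terminating" }
// drop() also copes with the match being the very last line
lines.drop(index + 1).each { println it }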

Output:

2016 Jun 01 13:50:47:692 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300001 Process Engine version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:702 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300009 BW Plugins: version 5.11.0, build V62_hotfix017, 2015-9-24 
2016 Jun 01 13:50:47:710 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300011 Java version: Java HotSpot(TM) 64-Bit Server VM 23.3-b01 
2016 Jun 01 13:50:47:711 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300012 OS version: amd64 Linux 2.6.32-573.3.1.el6.x86_64  
2016 Jun 01 13:50:51:776 GMT +0200 BW.COMPLEXITY_CALCULATOR-GenerateComplexitySheet Info [BW-Core] BWENGINE-300002 Engine COMPLEXITY_CALCULATOR-GenerateComplexitySheet started

Hope this helps. :)

Answer 3

Regardless of language, there are two algorithms that achieve this:

First:

initialise a temporary store (memory or temp file)
open input
while(read line) {
   if(line matches search pattern) {
        clear temp store
   }
   write line to temp store
}
copy temp store to output
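
As a rough illustration, this first algorithm could be sketched in shell with awk, keeping the temporary store in memory. The function name after_last_match is hypothetical, and the search pattern is treated as a fixed string rather than a regular expression. Unlike the pseudocode above, the sketch skips the matching line itself, so that the result lines up with the expected output in the question. Note that awk -v interprets backslash escapes in the value, so a search string containing backslashes would need extra care.

```shell
#!/bin/sh
# Sketch of the first algorithm: a single pass that buffers lines in memory.
# after_last_match is a hypothetical helper name; $1 is a fixed search string.
after_last_match() {
    awk -v pat="$1" '
        # A matching line empties the buffer and is itself skipped.
        index($0, pat) { split("", buf); n = 0; next }
        # Every other line is appended to the buffer.
        { buf[++n] = $0 }
        # At end of input, the buffer holds the lines after the last match.
        END { for (i = 1; i <= n; i++) print buf[i] }
    '
}
```

Because it reads standard input, it can sit at the end of a pipeline, for example: some_command | after_last_match 'GenerateComplexitySheet terminating'. If nothing matches, the whole input is printed.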

Second:

open input
while(read line) {
   if(line matches search pattern) {
       store line number in variable
   }
}
close input
open input again
read until stored line number
read / write until end
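
One possible way to sketch this second algorithm is to hand the same file to awk twice, so that the first read finds the line number and the second read prints what follows it. The function name after_last_match_file is hypothetical, and the pattern is again assumed to be a fixed string.

```shell
#!/bin/sh
# Sketch of the second algorithm: two passes over a file that can be reopened.
# after_last_match_file is a hypothetical helper; $1 = fixed string, $2 = file.
after_last_match_file() {
    awk -v pat="$1" '
        BEGIN { last = 0 }
        # Pass 1 (NR == FNR only while reading the first copy of the file):
        # remember the line number of the most recent match.
        NR == FNR { if (index($0, pat)) last = FNR; next }
        # Pass 2: print only the lines after the remembered line number.
        FNR > last
    ' "$2" "$2"
}
```

Only the current line and one number are kept in memory. If nothing matches, last stays 0 and the whole file is printed.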

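The first option has the advantage that it works on piped input, where you cannot reopen the input from the beginning. Its disadvantage is that you have to store the output lines in some temporary location until you reach the last line of input.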
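
When the buffered lines might not fit in memory, the temporary store can be a file instead. The following is only a sketch under a few assumptions: the helper name after_last_match_tmp is made up, mktemp is available (it is common but not part of POSIX), the pattern is a fixed string, and the matching line itself is not kept. Appending line by line in plain shell is slow on large inputs.

```shell
#!/bin/sh
# Sketch: first algorithm with a temp file as the store, for piped input.
# after_last_match_tmp is a hypothetical helper name; $1 is a fixed search string.
after_last_match_tmp() {
    store=$(mktemp) || return 1
    # IFS= and -r keep each line intact; the || clause keeps a final line
    # that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"$1"*) : > "$store" ;;                      # match: clear the store
            *)      printf '%s\n' "$line" >> "$store" ;; # otherwise: append
        esac
    done
    # Copy the store to the output, then clean up.
    cat "$store"
    rm -f "$store"
}
```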

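The second option has the advantage that it only ever holds one line of input in memory at a time. Its disadvantage is that it only works with input sources that can be reopened from the beginning.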

You should be able to implement either of these easily in Groovy or shell.

In shell, if the input is a file, you can cobble together a version of the second algorithm:

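 tail --lines=+$(( $(grep -n pattern input.txt | tail -1 | cut -d: -f1) + 1 )) input.txt

Here we use grep -n to find the matching lines (with their line numbers), tail -1 to select the last match, cut to extract the line number, and tail --lines=+N to write everything from line N onward to stdout. Since the matching line itself should not be printed, N is the match's line number plus one.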
