Question

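I'm currently trying to build an automated process that dynamically parses some particularly large log files (25 MB+) and returns them to the user through a Java Servlet.

Because of the size of these logs, I'm trying to run Linux parsing commands that retrieve the portions relevant to the user before anything is loaded into memory. These portions can be scattered throughout the log.

I'm still in the early stages of learning regular expressions and text-parsing tools such as sed, and I was hoping someone could point me in the right direction for my current problem.

I have a series of logs in which one line references a specific item (e.g. KEY1), followed by an unknown number of lines of information about that item.

The log then switches to the next item and repeats.

Is there any combination of Linux text commands that would take a file in this format: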

This is the first line and should not display
This is a section containing the text KEY1
Line 1
Line 2
Line 3
Line 4
This is a section containing the text KEY2
BadLine 1
BadLine 2
This is a second section containing the text KEY1
Line 5
Line 6
This is a section containing the text KEY3
BadLine 3
BadLine 4
BadLine 5
BadLine 6
This is a third section containing the text KEY1
Line 7
Line 8
Line 9
This is the last line

and return:

This is a section containing the text KEY1
Line 1
Line 2
Line 3
Line 4
This is a second section containing the text KEY1
Line 5
Line 6
This is a third section containing the text KEY1
Line 7
Line 8
Line 9
This is the last line

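The command

sed -n '/KEY1/,/KEY2/p' file

works for grabbing the first section, but I can't find a general way to extract everything I need.

Any help would be greatly appreciated.

Thanks

- EDIT -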

2013/06/20 03:10:01 PM| FINE |S9180 |[Device] [ID:128] 
foo
bar
foo
bar
------------------------------------------
foo
bar
------------------------------------------
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:132] 
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128] 
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar

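To clarify, this is the format I'm working with. I'm trying to get all the information for one specific device in the log: for example, all the text under the key [ID:128], while ignoring the sections under [ID:132] (or any ID other than 128). The devices appear in no particular order.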

Answer 1

Code for GNU sed, after some edits:

sed -rn '/\[ID:[0-9]+\]/{/\[ID:128\]/!{s/.*\B(\[ID:[0-9]+\])\B.*/\1/;H}};${x;s/\n//;s/\]\n\[/\\]|\\[/g;s@(.*)]@/\\[ID:128\\]/,/\\\1\\]/\{/\\\1\\]/!p\}@p}' file|sed -nrf - file

$ cat file
2013/06/20 03:10:01 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
foo
bar
------------------------------------------
foo
bar
------------------------------------------
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:132]
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:32]
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:132]
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:17]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar

$ sed -rn '/\[ID:[0-9]+\]/{/\[ID:128\]/!{s/.*\B(\[ID:[0-9]+\])\B.*/\1/;H}};${x;s/\n//;s/\]\n\[/\\]|\\[/g;s@(.*)]@/\\[ID:128\\]/,/\\\1\\]/\{/\\\1\\]/!p\}@p}' file | sed -nrf - file
2013/06/20 03:10:01 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
foo
bar
------------------------------------------
foo
bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar

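The first sed call "collects" every key that matches the regex /\[ID:[0-9]+\]/ except [ID:128] and turns the list into a small sed script. The second call runs that generated script, using the collected keys to filter out the unwanted sections.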
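
For comparison, the same section filtering can be sketched as a single pass with awk rather than two sed calls. This is a different technique from the sed pipeline above, not a rewrite of it. The function name filter_by_id is hypothetical, and the sketch assumes every section starts with a header line containing "[ID:<digits>]".

```shell
# Hypothetical helper: print only the sections whose header carries the wanted ID.
# Assumption: every section begins with a line containing "[ID:<digits>]".
filter_by_id() {
  # $1 = numeric device ID to keep (e.g. 128), $2 = log file
  awk -v want="[ID:$1]" '
    /\[ID:[0-9]+\]/ { keep = (index($0, want) > 0) }  # header line: switch printing on or off
    keep                                              # print every line while inside a wanted section
  ' "$2"
}
```

Calling `filter_by_id 128 file` should print the [ID:128] sections only. Because the closing bracket is part of the literal comparison, a header such as [ID:1280] is not mistaken for [ID:128].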

Answer 2

I think a more general approach is:

perl -ne 'print if /KEY1/../KEY(?!1)/' input.txt | perl -ne 'print unless /KEY(?!1)/'

and

perl -ne 'print if /ID:128/../ID:(?!128)/' file.txt | perl -ne 'print unless /ID:(?!128)/'

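There are a few important concepts here:

KEY(?!1) means "KEY not followed by 1"
"perl -ne" means "run the code on every input line, without printing by default"
So printing is enabled only for the range "a KEY1 line, any number of lines, then a line where KEY is not followed by 1"
The second perl call removes the lines containing KEY2 and KEY3; everything else is printed

I imagine there is a better way to remove the KEY2 and KEY3 lines, but I couldn't figure out how to do it: some perl guru may be able to help you further!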

Getting specific lines within a range pattern from a log file
