I have a file like the following:
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-29T17:27:53" NAME="Quit" CONNECTION_ID="12" STATUS="0"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="create table stamp like paper"/>
Each record starts with <AUDIT_RECORD and ends with "/>, and a record may be spread across multiple lines.
What I need is output like the following:
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="create table stamp like paper"/>
To do this, I used
sed -n "/Query/,/\/>/p" file.txt
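But it prints the entire file, including the record containing the string "Quit".
Can anyone help me with this? Please also tell me whether it is possible to match "Query" as a whole word (as with grep -w "Query").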
Answer 0 (score: 4)
With GNU awk, you can set RS to more than one character:
$ cat file
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query"
CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-29T17:27:53"
NAME="Quit" CONNECTION_ID="12" STATUS="0"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10"
STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29"
NAME="Query"
CONNECTION_ID="10"
STATUS="0"
SQLTEXT="create table stamp like paper"/>
$
$ gawk -v RS='\\/>\n' -v ORS= '/Query/{print $0 RT}' file
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query"
CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10"
STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29"
NAME="Query"
CONNECTION_ID="10"
STATUS="0"
SQLTEXT="create table stamp like paper"/>
$
$ gawk -v RS='\\/>\n' -v ORS= '/Query/{$1=$1; print $0 RT}' file
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="create table stamp like paper"/>
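If GNU awk is not available, roughly the same result can be sketched in plain awk by accumulating lines until one ends in />. This is only an illustration under some assumptions: the function name filter_query_records is made up, every record is assumed to close with /> at the end of a line, and the test is NAME="Query" rather than a bare Query so that other attributes mentioning the word are not matched by accident.

```shell
#!/bin/sh
# Hypothetical helper: print only the records whose NAME attribute is "Query",
# each joined onto a single line.
# Assumption: every record closes with "/>" at the end of a line.
filter_query_records() {
    awk '
        # Append the current line to the record being assembled.
        { rec = (rec == "" ? $0 : rec " " $0) }

        # A line ending in "/>" (optionally followed by blanks) closes the record.
        /\/>[ \t\r]*$/ {
            if (rec ~ /NAME="Query"/) print rec
            rec = ""
        }
    ' "$1"
}
```

Calling filter_query_records file should print one line per Query record, whether the record was on one line or several in the input.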
Answer 1 (score: 3)
I agree with @choroba that an XML parser is the right tool. But if none is available, you can try this awk script:
awk '/Query/{print RS" "$0}' RS='<AUDIT_RECORD' file
Answer 2 (score: 2)
The input is probably XML. Use a proper parser to process it, especially if records can span multiple lines. For example, with xsh:
open file.xml ;
remove //AUDIT_RECORD[not(@NAME="Query")] ;
save :b ;
Answer 3 (score: 2)
The sed solution I would suggest:
sed 's/<[^>]*\"Quit\"[^>]*>//' file.txt
For records that span multiple lines, try:
sed '{:q;N;s/\n/ /g;t q}' file.txt | sed 's/<[^>]*\"Quit\"[^>]*>//g'
To put a newline back after each record (the record separator):
... | sed 's|/>|/>\n|g'
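The steps above can also be combined into one pipeline. The following is a sketch rather than the exact commands from this answer: it swaps in tr to join the lines, writes the newline in the sed replacement as a backslash followed by a real line break so it does not depend on GNU sed, and keeps NAME="Query" records with grep instead of deleting the Quit ones. The function name filter_with_sed is made up for illustration, and the sketch assumes that /> never appears inside an attribute value.

```shell
#!/bin/sh
# Hypothetical helper: join the whole file into one line, split it again
# after every "/>", then keep only the records with NAME="Query".
# Assumption: "/>" never occurs inside an attribute value such as SQLTEXT.
filter_with_sed() {
    tr '\n' ' ' < "$1" |
        sed 's|/> *|/>\
|g' |
        grep 'NAME="Query"'
}
```

Because the whole file is joined into a single line first, this suits moderately sized logs; for very large files a record-at-a-time approach is likely a better fit.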