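I want to grep a file for a regex MR (main) and also get all the consecutive preceding lines that match a regex BR (before), as well as all the consecutive following lines that match a regex AR (after).
I.e. something like this: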
grep -B [BR] -A [AR] [MR] file
E.g., for the following excerpt (taken from the CHILDES project):
8|10|SUBJ 9|10|AUX 10|6|ROOT 11|10|PUNCT
*CHI: here .
%mor: adv|here .
%gra: 1|0|INCROOT 2|1|PUNCT
*URS: ask her (.) okay ?
%mor: v|ask pro:poss:det|her adj|okay ?
%gra: 1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS: ask her what she can eat .
%mor: v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra: 1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT
*URS: but what is it ?
%mor: conj|but pro:wh|what aux|be&3S pro|it ?
%gra: 1|3|LINK 2|3|OBJ 3|0|ROOT 4|3|OBJ 5|3|PUNCT
*CHI: it's peaches and pears .
the query
grep -B '^\*' -A '^%' '^%mor:\s+v' file
would return
*URS: ask her (.) okay ?
%mor: v|ask pro:poss:det|her adj|okay ?
%gra: 1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS: ask her what she can eat .
%mor: v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra: 1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT
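In other words, I'm looking for all utterances (lines starting with *) that begin with a verb, and each utterance should be followed by its dependent tiers (lines starting with %). Feel free to suggest command-line tools other than grep (e.g. awk).
Another example: the query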
grep -B '^[*%][gU]' -A '^%' '^%mor:\s+v' file
should return
%gra: 1|0|INCROOT 2|1|PUNCT
*URS: ask her (.) okay ?
%mor: v|ask pro:poss:det|her adj|okay ?
%gra: 1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS: ask her what she can eat .
%mor: v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra: 1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT
Answer 0 (score: 2)
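You can use awk:
awk -v br='^[*%][gU]' -v ar='^%' -v mr='^%mor:[[:blank:]]+v' '
# An AR line right after a match: print the buffered lines plus this line, then reset.
p && $0 ~ ar {
    print data RS $0
    p=0
    data=""
    next
}
# A BR line: start the buffer, or append to a run of BR lines.
$0 ~ br {
    data = (data=="" ? $0 : data RS $0)
    next
}
# The MR line: append it to the buffer and set the flag p.
$0 ~ mr {
    data = (data=="" ? $0 : data RS $0)   # no leading empty line when the buffer is empty
    p=1
    next
}
# Any other line breaks the run, so drop the buffer.
{
    data = ""
}' file
Output: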
%gra: 1|0|INCROOT 2|1|PUNCT
*URS: ask her (.) okay ?
%mor: v|ask pro:poss:det|her adj|okay ?
%gra: 1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS: ask her what she can eat .
%mor: v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra: 1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT
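To try the same rules with other patterns, the script can be wrapped in a small shell function. The following is only a sketch: the name ctx_grep and the temporary sample file are made up for illustration, and the sample lines are copied from the question. It runs the first query from the question, writing the BR pattern as '^[*]' because awk processes backslash escapes in -v values, which makes '^\*' fragile.

```shell
# ctx_grep BR AR MR FILE  (hypothetical helper name, not an existing tool)
# Buffers consecutive BR lines, adds the MR line, and prints the buffer
# together with the first AR line that follows the match.
ctx_grep() {
    awk -v br="$1" -v ar="$2" -v mr="$3" '
        p && $0 ~ ar { print data RS $0; p = 0; data = ""; next }
        $0 ~ br      { data = (data == "" ? $0 : data RS $0); next }
        # the guard avoids a leading empty line when no BR line precedes the match
        $0 ~ mr      { data = (data == "" ? $0 : data RS $0); p = 1; next }
                     { data = "" }
    ' "$4"
}

# Sample input copied from the question, stored in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
8|10|SUBJ 9|10|AUX 10|6|ROOT 11|10|PUNCT
*CHI: here .
%mor: adv|here .
%gra: 1|0|INCROOT 2|1|PUNCT
*URS: ask her (.) okay ?
%mor: v|ask pro:poss:det|her adj|okay ?
%gra: 1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS: ask her what she can eat .
%mor: v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra: 1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT
*URS: but what is it ?
%mor: conj|but pro:wh|what aux|be&3S pro|it ?
%gra: 1|3|LINK 2|3|OBJ 3|0|ROOT 4|3|OBJ 5|3|PUNCT
*CHI: it's peaches and pears .
EOF

# First query from the question: BR = utterance lines, AR = dependent tiers.
ctx_grep '^[*]' '^%' '^%mor:[[:blank:]]+v' "$sample"
```

On this sample the call should print the two verb-initial utterances, each followed by its %mor and %gra lines, which is the six-line result the question expects for the first query.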
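This awk script works as follows:
When br matches, it starts the variable data with that line, i.e. data=$0, or appends the line if data already holds earlier br lines.
When mr matches, it appends that line to data and sets the flag p=1.
When ar matches and the flag is set, it prints data followed by the current line, then resets the flag and clears data.
Any line that matches none of these clears data, so only consecutive br lines are kept.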
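One thing to watch for: the script above resets the flag as soon as the first ar line has been printed, so it returns a single following line per match and assumes that an ar line directly follows every match. The sketch below is one possible variant, not the answerer's code, that keeps printing for as long as consecutive lines match ar. The function name ctx_grep_all and the %com line in the sample are made up for illustration; the other sample lines come from the question.

```shell
# ctx_grep_all BR AR MR FILE  (hypothetical helper name)
# Prints each MR line with ALL consecutive BR lines before it
# and ALL consecutive AR lines after it.
ctx_grep_all() {
    awk -v br="$1" -v ar="$2" -v mr="$3" '
        # A match: flush the buffered BR lines, print the match itself,
        # and switch to "after" mode.
        $0 ~ mr { if (buf != "") print buf; print; buf = ""; after = 1; next }
        # In "after" mode, print every consecutive AR line.
        after && $0 ~ ar { print; next }
        # Any other line ends the run of AR lines.
        { after = 0 }
        # Collect consecutive BR lines; a non-BR line discards them.
        $0 ~ br { buf = (buf == "" ? $0 : buf RS $0); next }
        { buf = "" }
    ' "$4"
}

# Lines from the question plus one invented %com tier (an assumption,
# added only to show a match with two following AR lines).
sample=$(mktemp)
cat > "$sample" <<'EOF'
*CHI: here .
%mor: adv|here .
%gra: 1|0|INCROOT 2|1|PUNCT
*URS: ask her (.) okay ?
%mor: v|ask pro:poss:det|her adj|okay ?
%gra: 1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
%com: hypothetical extra tier
*URS: but what is it ?
%mor: conj|but pro:wh|what aux|be&3S pro|it ?
EOF

ctx_grep_all '^[*]' '^%' '^%mor:[[:blank:]]+v' "$sample"
```

Here the mr rule is tested first, so a line that matches both mr and br counts as a match; the original script tests br first. Which order is right depends on the patterns, so treat this as a starting point.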