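Grep with conditional context

Date: 2016-05-27 14:13:08

Tags: bash shell command-line grep

I want to grep a file for a regular expression MR (main) and also get all consecutive preceding lines that match a regular expression BR (before), as well as all consecutive following lines that match a regular expression AR (after),

i.e. something like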

grep -B [BR] -A [AR] [MR] file

E.g., for the following excerpt (taken from the CHILDES project):

8|10|SUBJ 9|10|AUX 10|6|ROOT 11|10|PUNCT
*CHI:   here .
%mor:   adv|here .
%gra:   1|0|INCROOT 2|1|PUNCT
*URS:   ask her (.) okay ?
%mor:   v|ask pro:poss:det|her adj|okay ?
%gra:   1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS:   ask her what she can eat .
%mor:   v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra:   1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT
*URS:   but what is it ?
%mor:   conj|but pro:wh|what aux|be&3S pro|it ?
%gra:   1|3|LINK 2|3|OBJ 3|0|ROOT 4|3|OBJ 5|3|PUNCT
*CHI:   it's peaches and pears . 

the query

grep -B '^\*' -A '^%' '^%mor:\s+v' file

would return

*URS:   ask her (.) okay ?
%mor:   v|ask pro:poss:det|her adj|okay ?
%gra:   1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS:   ask her what she can eat .
%mor:   v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra:   1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT

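In other words, I am looking for all utterances (lines starting with *) that start with a verb, and each utterance should be followed by its dependent tiers (lines starting with %). Feel free to suggest command-line tools other than grep (e.g. awk).

As another example, the query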

grep -B '^[*%][gU]' -A '^%' '^%mor:\s+v' file   

should return

%gra:   1|0|INCROOT 2|1|PUNCT
*URS:   ask her (.) okay ?
%mor:   v|ask pro:poss:det|her adj|okay ?
%gra:   1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS:   ask her what she can eat .
%mor:   v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra:   1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT

1 Answer:

Answer 0 (score: 2)

You can use awk:

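awk -v br='^[*%][gU]' -v ar='^%' -v mr='^%mor:[[:blank:]]+v' '
# Flag is set and the line matches ar: print the buffered block plus this line.
p && $0 ~ ar {
   print data RS $0
   p=0
   data=""
   next
}
# Before-context line: start the buffer or append to it.
$0 ~ br {
   data = (data=="" ? $0 : data RS $0)
   next
}
# Main match: add the line to the buffer and set the flag.
$0 ~ mr {
   data = (data=="" ? $0 : data RS $0)
   p=1
   next
}
# Any other line breaks the run of before-context lines.
{
   data = ""
}' file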

Output:


%gra:   1|0|INCROOT 2|1|PUNCT
*URS:   ask her (.) okay ?
%mor:   v|ask pro:poss:det|her adj|okay ?
%gra:   1|0|ROOT 2|3|MOD 3|1|OBJ 4|1|PUNCT
*URS:   ask her what she can eat .
%mor:   v|ask pro:obj|her pro:wh|what pro:sub|she mod|can v|eat .
%gra:   1|0|ROOT 2|1|OBJ 3|6|LINK 4|6|SUBJ 5|6|AUX 6|1|COMP 7|1|PUNCT

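This awk script works as follows:

  • When a line matches br, it is stored in the variable data (data=$0), or appended to data if earlier before-context lines are already buffered.
  • When a line matches mr, it is appended to data and the flag p=1 is set.
  • Finally, when a line matches ar and the flag is set, the script prints data followed by the current line, then resets the flag and clears data.
  • Any line that matches none of the patterns clears data, so only consecutive before-context lines are kept.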
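
The script above prints only the first line that follows the main match, while the question asks for all consecutive following lines that match AR. A minimal sketch of that variant is shown here, wrapped in a shell function. The name ctxgrep and its argument order are hypothetical and chosen only for illustration; this is not an existing grep option. Unlike the script above, the sketch prints a main match even when no AR line follows it, and once an after-context run has started, every line matching AR is treated as after-context, including a line that would also match MR.

```shell
#!/bin/sh
# ctxgrep BR AR MR FILE  (hypothetical helper, for illustration only)
# Prints every line matching MR, preceded by the run of consecutive lines
# matching BR and followed by the run of consecutive lines matching AR.
ctxgrep() {
  awk -v br="$1" -v ar="$2" -v mr="$3" '
    # Inside an after-context run: keep printing while lines match ar.
    after && $0 ~ ar { print; next }
    # The run (if any) ended with this line.
    { after = 0 }
    # Before-context candidate: buffer it until a main match shows up.
    $0 ~ br { buf = buf $0 ORS; next }
    # Main match: flush the buffered lines, print the match, start the run.
    $0 ~ mr { printf "%s", buf; print; buf = ""; after = 1; next }
    # Any other line breaks the run of before-context lines.
    { buf = "" }
  ' "$4"
}
```

With the excerpt from the question saved as file, ctxgrep '^[*%][gU]' '^%' '^%mor:[[:blank:]]+v' file should give the same seven lines as the output shown above. For the first query, the before pattern is better written as '^[*]' than '^\*', because awk processes backslash escapes in values passed with -v.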