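How can I join several consecutive lines using one pattern for the first line and another pattern for all following lines, preferably with sed?

Time: 2015-10-23 12:31:48

Tags: regex bash sed

I want to remove every "Derived words" section from this example (there are two of them). So far my idea has been to join the lines that follow the "Derived words:" line onto that line and then delete the result, but I can't simply join the next two lines, because the number of lines varies from article to article. So my plan is to check whether a line matches the pattern '^Derived words:', then check whether the next line matches the pattern '^[a-z]', and if it does, join them, check the next line, and so on. This sounds like a job tailor-made for Bash's if-then-else, but I would prefer a pure sed solution if possible.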

A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision... 
The police were swift to act. 
Syn:
quick
Derived words:
swiftly  The French have acted swiftly and decisively to protect their industries. 
swiftness  The secrecy and swiftness of the invasion shocked and amazed army officers. 
  Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright. 
Syn:
quick
Derived words:
swiftly  ...a swiftly flowing stream. 
swiftness  With incredible swiftness she ran down the passage. 
  A swift is a small bird with long curved wings.

Expected result

A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision... 
The police were swift to act. 
Syn:
quick
  Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright. 
Syn:
quick
  A swift is a small bird with long curved wings.

Thanks in advance.

3 answers:

Answer 0 (score: 1)

This might work for you (GNU sed):

sed -n '/^Derived words:/{:a;n;/^\w/ba};p' file

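This uses sed's grep-like -n flag to suppress automatic printing. On reaching a Derived words: line, it keeps reading the next line (the n command) for as long as that line starts with a word character. The first line that starts with a non-word character falls through to p and is printed, as is every line outside such a block.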
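To make the control flow easier to follow, here is the same script spelled out one command per line. This is only a sketch: the function name strip_derived and the sample input are made up for illustration, and it assumes GNU sed because of \w (on other seds, [[:alnum:]_] is the usual POSIX spelling of that class).

```shell
# Hypothetical wrapper around the one-liner; reads files given as
# arguments, or stdin when there are none. Assumes GNU sed (for \w).
strip_derived() {
  sed -n '
    # Header line: enter the skip loop.
    /^Derived words:/ {
      :a
      # Replace the pattern space with the next input line (no printing under -n).
      n
      # Still a continuation line (starts with a word character): keep skipping.
      /^\w/ ba
    }
    # Print every line that reaches this point.
    p
  ' "$@"
}

# Small made-up sample in the same shape as the question input.
printf 'quick\nDerived words:\nswiftly  acted\nswiftness  x\n  Something\nSyn:\n' |
  strip_derived
```

With the question's data you would call it as strip_derived file.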

Answer 1 (score: 0)

I find that when you want to process blocks of many lines, the best tool is often awk, for example:

awk '/^Derived words/{skip=1} /^ /{skip=0} 1{if(!skip)print}' input

A swift event or process happens very quickly or without delay.
Our task is to challenge the UN to make a swift decision...
The police were swift to act.
Syn:
quick
  Something that is swift moves very quickly.
With a swift movement, Matthew Jerrold sat upright.
Syn:
quick
  A swift is a small bird with long curved wings.
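The one-liner is a small two-state machine driven by a skip flag. A commented sketch of the same idea follows; the function name strip_derived_awk and the sample input are made up for illustration. Note that it relies on the line after each block starting with a space, as it does in the question's data.

```shell
# Hypothetical wrapper; reads files given as arguments, or stdin.
strip_derived_awk() {
  awk '
    # Header line: start skipping (the header itself is skipped too).
    /^Derived words/ { skip = 1 }
    # A line starting with a space ends the block: stop skipping.
    /^ /             { skip = 0 }
    # A bare pattern with no action prints the line when it is true.
    !skip
  ' "$@"
}

# Small made-up sample in the same shape as the question input.
printf 'quick\nDerived words:\nswiftly  acted\nswiftness  x\n  Something\nSyn:\n' |
  strip_derived_awk
```

The last rule, !skip, is shorthand for the 1{if(!skip)print} form used above.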

Answer 2 (score: 0)

This should work with regular (non-GNU) sed. There may be a way to avoid repeating the pattern, but I haven't figured it out.

sed -e :a -e '/^Derived words:/N;s/\n[a-z]//;ta' -e 's/^Derived words:.*\n//'

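Here is how it works:

  • You said you want to delete each "Derived words:" line and any lines following it that start with a letter (let's call them continuation lines).
  • So sed reads the input and echoes it to stdout line by line, as usual.
  • But when it encounters "Derived words:" at the beginning of a line, before echoing anything it reads the next line into the pattern space, appending it to the "Derived words:" line with a newline between them (the N command). Nothing has been echoed since it saw "Derived words:". Then it tries to delete that newline and the letter immediately after it (the first s command).

    • If it can, it must have found a continuation line, so it tries again by jumping to the start of the script (the t command, which conditionally jumps to the label "a" defined earlier with the colon command), where it appends the next line, and so on.
    • If it can't, the pattern space now holds the "Derived words:" line plus any continuation lines (with their newlines already removed) plus the next non-continuation line, separated from the rest by a newline.
  • If the pattern space begins with "Derived words:", sed deletes everything up to and including the newline (the second s command). That leaves the part after the newline, which is the next non-continuation line, and sed echoes it. Then it moves on to the next line of input.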
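The steps above can be lined up against the individual commands by writing the script one command per line. This is a sketch rather than a tested portable script: the function name strip_derived_posix and the sample input are made up for illustration, and it has only been reasoned through, not run on every sed implementation.

```shell
# Hypothetical wrapper; reads files given as arguments, or stdin.
strip_derived_posix() {
  sed '
    :a
    # Pattern space starts with the header: append the next input line.
    /^Derived words:/ N
    # Try to delete the embedded newline plus the letter that follows it.
    s/\n[a-z]//
    # The substitution worked, so that was a continuation line: loop again.
    ta
    # Otherwise drop the header and joined lines, up to and including the newline.
    s/^Derived words:.*\n//
  ' "$@"
}

# Small made-up sample in the same shape as the question input.
printf 'quick\nDerived words:\nswiftly  acted\nswiftness  x\n  Something\nSyn:\n' |
  strip_derived_posix
```

Lines that never match the header pass through untouched, because both substitutions fail and sed prints the pattern space at the end of the cycle.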