How can I extract specific paragraphs from text using Vim?

Time: 2012-02-15 09:55:07

Tags: vim text extract

I'm trying to extract several pieces of text from a large file that contains text in this format:

CL blahblahblah  
SP blahblahblah blahblahblah blahblahblah  
DE blahblahblahblahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah  
AB blahblahblah blahblahblah blahblahblah 
   blahblahblahblahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah    
   blahblahblah blahblahblah blahblahblah   
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   lahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
RP blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah  
EM blahblahblah blahblahblah blahblahblah blahblahblah  
NR blahblahblah blahblahblah blahblahblah blahblahblah  
TC blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah  
Z9 blahblahblah blahblahblah blahblahblah blahblahblah  
PU blahblahblah blahblahblah blahblahblah blahblahblah  
PI blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah  

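I'm only interested in the entries that start with C1, AB, or TI, but these entries sometimes span multiple lines, and the XX tag line that follows them is not always the same. Is there an easy way to keep only these entries? The text I'm left with should look like this: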

TI blahblahblah  
AB blahblahblah b lah blahblah blah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah     
C1 blahblahblah blahblahblah blahblahblah blahblahblah  
   blahblahblah blahblahblah blahblahblah blahblahblah  
   blahblahblah blahblahblah blahblahblah blahblahblah 
TI blah blah blah blah blah blah  
AB blahblahblah blahblahblah blahblahblah blahblahblahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblahblah blahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah  blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah 
C1 blahblahblah blahblahblah blahblahblah blahblahblahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 

And so on...

Thanks a lot!

5 answers:

Answer 0: (score: 3)

This should work:

:let @a="" | g/^\v<(C1|AB|TI)>/norm! "Ay/^\S^M

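Edit (Windows-specific): the line needs a literal Return at the end. Type the ^M as C-q Enter (or as C-v Enter if you are not on Windows or your vimrc does not set behave mswin).

This collects the lines in register "a. To replace the buffer with those lines: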

:%d | put a

Or, to put them into a new buffer:

:new | put a

Answer 1: (score: 3)

I would do:

:$put='X' | 1,$-1g/^\(\s\|C1\|AB\|TI\)\@!/   ,/^\S/-d
:$d

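This does the following:

  • Inserts a line containing "X" at the end
  • For every line except the last (1,$-1) that starts with a non-space character and does not start with C1, AB or TI (g/pattern/), deletes (d) from that line up to the line before the next one that starts with a non-space character (,/pattern/-, where - is short for -1)
  • Deletes the "X" line at the end

To try it in Gvim:

  • Copy this code to the clipboard
  • Run :@+ in Gvim (this executes the Ex commands held in the + register, which is linked to the clipboard).

What I got: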

AB blahblahblah blahblahblah blahblahblah 
   blahblahblahblahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah    
   blahblahblah blahblahblah blahblahblah   
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   lahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
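
The range-deletion steps in this answer can also be sketched outside Vim. The following Python is only an illustration of the same idea, not part of the original answer: the function name delete_unwanted_records and the helper starts_with_nonspace are made up for this sketch, and it assumes that "space" means a space or a tab, as Vim's \s does.

```python
TARGETS = ("C1", "AB", "TI")  # tags to keep, taken from the question


def starts_with_nonspace(line):
    # Rough stand-in for Vim's /^\S/ on a single line: the first
    # character exists and is neither a space nor a tab.
    return line[:1] not in ("", " ", "\t")


def delete_unwanted_records(lines):
    # Hypothetical helper: mirrors the sentinel + range-delete steps.
    work = list(lines) + ["X"]      # sentinel line, like :$put='X'
    kept = []
    i = 0
    while i < len(work) - 1:        # every line except the sentinel
        line = work[i]
        is_continuation = line[:1] in (" ", "\t")
        if is_continuation or line.startswith(TARGETS):
            kept.append(line)       # line is not matched by the :g pattern
            i += 1
            continue
        # Matched line: skip it and everything up to (but not including)
        # the next line that starts with a non-space character.
        i += 1
        while not starts_with_nonspace(work[i]):
            i += 1
    return kept                     # the sentinel is never copied, like :$d
```

The sentinel plays the same role as the "X" line: it guarantees that the inner search for the next non-space line always finds something, even for the last record in the file.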

Answer 2: (score: 3)

An awk solution:

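awk '
BEGIN {
    # Tags whose records should be kept; referencing an element creates it.
    tags["C1"]
    tags["AB"]
    tags["TI"]
}
{
    # A line that starts with a tag updates t; continuation lines
    # (leading spaces) leave t at the most recent tag.
    match($0, /^\w+/)    # \w needs GNU awk
    if (RSTART)
        t = substr($0, RSTART, RLENGTH)
}
# Print the line only while the current tag is one of the wanted tags.
t in tags' input.txt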

Translated into a Vim command:

:g/^/let t=matchstr(getline('.'), '^\w\+') | if !empty(t) | let tag=t | endif | if index(['C1', 'AB', 'TI'], tag)==-1 | d | endif
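
Both commands above rely on the same idea: remember the most recent tag and keep a line only while that tag is wanted. As a hedged illustration of that logic only, here is a small Python sketch. The name keep_tagged is hypothetical, and the sketch is not a replacement for either command.

```python
import re

TARGETS = {"C1", "AB", "TI"}  # tags to keep, taken from the question


def keep_tagged(lines, targets=TARGETS):
    # Hypothetical helper: yields the lines that belong to a wanted tag.
    current_tag = None
    for line in lines:
        # A leading run of word characters is treated as a tag;
        # continuation lines start with spaces and do not match.
        m = re.match(r"\w+", line)
        if m:
            current_tag = m.group(0)
        if current_tag in targets:
            yield line
```

Lines that appear before any tag has been seen are dropped, which matches the awk version, where t starts out empty.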

Answer 3: (score: 2)

This seems to work, but it leaves a blank line at the end of the file.

:%s/\v^(C1|AB|TI|\s)@!\_.{-}\n(C1|AB|TI|$)@=//

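This regex uses some tricky features, which I'll try to explain.

  • \v makes the pattern "very magic", which just lets us skip backslashes in several places.
  • ^(C1|AB|TI|\s)@! matches at the start of any line that does not begin with a target code or whitespace.
  • \_. matches any character, including a newline.
  • {-} matches as few of the preceding atom as possible (non-greedy).
  • \n matches the end of a line.
  • (C1|AB|TI|$)@= matches a target code or an end of line (for the final case), with zero width.

The result on the test input is: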

AB blahblahblah blahblahblah blahblahblah
   blahblahblahblahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
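
For readers more used to PCRE-style syntax, the same pattern can be approximated with Python's re module. This is a sketch under a few assumptions rather than an exact port: [ \t] stands in for Vim's \s (Python's \s would also match newlines), the input is assumed to end with a newline, and the names PATTERN and strip_unwanted are made up for the example.

```python
import re

PATTERN = re.compile(
    r"^(?!C1|AB|TI|[ \t])"  # line start not followed by a target tag or blank
    r".*?\n"                # as little as possible, up to a newline
    r"(?=C1|AB|TI|$)",      # next line starts a target tag, is empty, or input ends
    re.MULTILINE | re.DOTALL,  # ^/$ work per line; . also matches newlines
)


def strip_unwanted(text):
    # Hypothetical helper: removes every record that does not start
    # with one of the target tags.
    return PATTERN.sub("", text)
```

Here (?!...) corresponds to Vim's @!, (?=...) to @=, .*? to \_.{-}, and the DOTALL flag gives the dot the newline-matching behaviour of \_.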

Answer 4: (score: 0)

Another awk one-liner:

awk -F' |\t' '{if($1)f=$1~/TI|AB|C1/?1:0}f' yourFile