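Grab a block of text between two markers and, if the block contains certain marks, append the following lines

Time: 2014-08-02 23:07:07

Tags: regex sed grep

I want to grab a block of text between tags. The start tag is identified by a regular-expression match, and the end tag is static.

I did search for existing approaches, such as this one and this one, but I could not find a solution to my problem, because it has some more specific conditions. Let me give an example of the text file I have: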

<...text-to-ignore...>
tag_list_index
    tag1 ...................... 51
    tag2 .............. 54
    tagn ......... 243
    <...lots-of-text-to-ignore...>
        tag1
        headerA headerB headerC
        fieldx  description ...
        fieldy  description ... (a)
        fieldw  description ... 
        fieldz  description ... (c)
        fieldt  description ... (b)
                        Máx: 234+var
        (a) - Note1
        (b) - Note2
        (c) - Note3
        <...more-text-to-ignore...>
        tag2
        headerA headerB headerC
        fielda  description ...
        fieldj  description ... (a)
                        Max: 234+var
        (a) - Note1
        <...more-text-to-ignore...>
        tagn
        headerA headerB headerC
        fieldr  description ...
        fieldg  description ... 
                    Máx: 234+var
        <...more-text-to-ignore...>

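So the goal is to grab the text between a tagn line and the next line containing Máx: (or Max:), plus the notes on the lines immediately after that end marker, if the grabbed block refers to any, of course. In practice, the output would be: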

        tag1
        headerA headerB headerC
        fieldx  description ...
        fieldy  description ... (a)
        fieldw  description ... 
        fieldz  description ... (c)
        fieldt  description ... (b)
                        Máx: 214+var
        (a) - Note1
        (b) - Note2
        (c) - Note3
        tag2
        headerA headerB headerC
        fielda  description ...
        fieldj  description ... (a)
                        Max: 13
        (a) - Note1
        tagn
        headerA headerB headerC
        fieldr  description ...
        fieldg  description ... 
                        Máx: 23+var

Can you help me? There is no specific requirement regarding which tool to use.

2 Answers:

Answer 0 (score: 0)

sed -nr '/^ +tag[0-9n]+$/,/M[áa]x: /p;:A;s/^        \([a-z]\)/&/;tB;b;:B;p;n;bA' file.txt

Output:

    tag1
    headerA headerB headerC
    fieldx  description ...
    fieldy  description ... (a)
    fieldw  description ... 
    fieldz  description ... (c)
    fieldt  description ... (b)
                    Máx: 234+var
    (a) - Note1
    (b) - Note2
    (c) - Note3
    tag2
    headerA headerB headerC
    fielda  description ...
    fieldj  description ... (a)
                    Max: 234+var
    (a) - Note1
    tagn
    headerA headerB headerC
    fieldr  description ...
    fieldg  description ... 
                Máx: 234+var

Limitation: if a block is followed by one or more notes, there must be some <...more-text-to-ignore...> text before the next tag.
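
As far as I can tell, the limitation comes from the n command: the line it reads is only tested against the note pattern, so a tag line that directly follows the notes never reaches the range check. Below is a minimal sketch that drops the loop altogether, assuming notes only ever appear right after an end marker. The helper name grab_blocks_sed and the sample text are made up for illustration, and the á is matched with an alternation so that it also works in a byte-oriented locale.

```shell
# Hypothetical helper; reads the files given as arguments, or stdin.
grab_blocks_sed() {
  sed -n -E '
    # Print every line from a tag line through the next Max:/Máx: line.
    /^ +tag[0-9n]+$/,/M(á|a)x: /p
    # Print any note line such as "(a) - Note1", wherever it appears.
    /^ +\([a-z]\) - /p
  ' "$@"
}

# Made-up sample in which tag2 follows the note of tag1 directly.
sample='intro text to ignore
        tag1
        headerA headerB headerC
        fieldy  description ... (a)
                        Máx: 234+var
        (a) - Note1
        tag2
        headerA headerB headerC
        fielda  description ...
                        Max: 234+var
        trailing text to ignore'

printf '%s\n' "$sample" | grab_blocks_sed
```

Here tag2 comes right after Note1 and both blocks are still printed. The trade-off is that every line that looks like a note is printed, wherever it occurs in the file.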

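Answer 1 (score: 0)

Although the accepted answer does achieve the purpose, it was not clear to me at first how sed did the job. I investigated it further and rearranged the command so that it is easier for me to read. I am sharing it, along with a comment on each command, in case it is useful to anyone else.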

sed -nr '
 # Print the block of text delimited by START and END
 /START/,/END/p
 # Label A is declared
 :A
 # Substitute any leading note marker (a), (b), (c), ... with itself, so
 # the line is left unchanged. The substitution only serves as a test
 # that sets the flag checked by the next command.
 s/^ *\([a-z]\)/&/
 # t jumps to label B (labels are case sensitive) if a substitution
 # was performed.
 tB
 # A branch without a label means: go to the end of the script
 b
 # Label B is declared
 :B
 # Print the current line
 p
 # Read the next line (with -n, the current one is not printed again)
 n
 # Go back to label A
 bA
 ' file.txt
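
For readers who find the label and branch flow hard to follow, the same steps can be sketched as a small state machine in awk. This is only an illustration under assumptions: the function name grab_blocks_awk, the variable names inblock and innotes, and the sample text are hypothetical, and the patterns are taken from the first answer instead of the START and END placeholders.

```shell
# Hypothetical helper; reads the files given as arguments, or stdin.
grab_blocks_awk() {
  awk '
    # Start marker: a line holding only a tag name opens a block.
    /^ +tag[0-9n]+$/ { inblock = 1; innotes = 0 }
    # Inside a block, print every line.
    inblock { print }
    # End marker: close the block and start watching for notes.
    inblock && /M(á|a)x: / { inblock = 0; innotes = 1; next }
    # Notes directly after the end marker are printed too.
    innotes && /^ +\([a-z]\) - / { print; next }
    # Any other line ends the run of notes.
    { innotes = 0 }
  ' "$@"
}

# Made-up sample: one note right after a block, one stray note later on.
sample='intro text to ignore
        tag1
        headerA headerB headerC
        fieldy  description ... (a)
                        Máx: 234+var
        (a) - Note1
        tagn
        headerA headerB headerC
                    Máx: 234+var
        more text to ignore
        (z) - stray note, to ignore'

printf '%s\n' "$sample" | grab_blocks_awk
```

Unlike the sed loop, every line goes through the start-marker test, so a tag may follow the notes directly. Notes are printed only when they come immediately after an end marker, which is why the stray (z) line is skipped.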