Question

I want to grab a piece of text between two tags. The start tag is identified by a regular-expression match, and the end tag is static.

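I did search and found some approaches, such as this or this, but none of them solved my problem, because it has some more specific conditions. Let me give an example of the text file I have: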

<...text-to-ignore...>
tag_list_index
    tag1 ...................... 51
    tag2 .............. 54
    tagn ......... 243
    <...lots-of-text-to-ignore...>
        tag1
        headerA headerB headerC
        fieldx  description ...
        fieldy  description ... (a)
        fieldw  description ... 
        fieldz  description ... (c)
        fieldt  description ... (b)
                        Máx: 234+var
        (a) - Note1
        (b) - Note2
        (c) - Note3
        <...more-text-to-ignore...>
        tag2
        headerA headerB headerC
        fielda  description ...
        fieldj  description ... (a)
                        Max: 234+var
        (a) - Note1
        <...more-text-to-ignore...>
        tagn
        headerA headerB headerC
        fieldr  description ...
        fieldg  description ... 
                    Máx: 234+var
        <...more-text-to-ignore...>

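So the goal is to grab the text between tagn and the next line containing Máx: or Max:, plus the notes on the lines immediately after the end tag, if the grabbed block has any. In effect, the output would be: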

        tag1
        headerA headerB headerC
        fieldx  description ...
        fieldy  description ... (a)
        fieldw  description ... 
        fieldz  description ... (c)
        fieldt  description ... (b)
                        Máx: 234+var
        (a) - Note1
        (b) - Note2
        (c) - Note3
        tag2
        headerA headerB headerC
        fielda  description ...
        fieldj  description ... (a)
                        Max: 234+var
        (a) - Note1
        tagn
        headerA headerB headerC
        fieldr  description ...
        fieldg  description ... 
                        Máx: 234+var

Can you help me? There is no specific requirement on which tool to use.

Answer 1

sed -nr '/^ +tag[0-9n]+$/,/M[áa]x: /p;:A;s/^        \([a-z]\)/&/;tB;b;:B;p;n;bA' file.txt

Output:

    tag1
    headerA headerB headerC
    fieldx  description ...
    fieldy  description ... (a)
    fieldw  description ... 
    fieldz  description ... (c)
    fieldt  description ... (b)
                    Máx: 234+var
    (a) - Note1
    (b) - Note2
    (c) - Note3
    tag2
    headerA headerB headerC
    fielda  description ...
    fieldj  description ... (a)
                    Max: 234+var
    (a) - Note1
    tagn
    headerA headerB headerC
    fieldr  description ...
    fieldg  description ... 
                Máx: 234+var

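Limitation: if a block is followed by one or more notes, there must be some <...more-text-to-ignore...> before the next tag. The line that n reads after the last note is only tested against the note pattern, never against the start pattern, so a tag placed directly after the notes would be missed.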
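Since the question does not require a particular tool, here is a minimal awk sketch that should not have this limitation, because every line is tested against the tag pattern first. It is only an illustration: the file name sample.txt, the function name extract_blocks and the sample data are made up, and the patterns are adapted from the sed command above. The end pattern is written as M(á|a)x so that the match does not depend on the locale treating á as a single character.

```shell
#!/bin/sh
# Hypothetical sample: tag2 follows the note directly, with no ignored text between.
cat > sample.txt <<'EOF'
ignored preamble
        tag1
        headerA headerB headerC
        fieldy  description ... (a)
                        Máx: 234+var
        (a) - Note1
        tag2
        headerA headerB headerC
        fielda  description ...
                        Max: 234+var
        ignored trailer
EOF

# extract_blocks FILE: print each tag..Max block plus the notes right after it.
extract_blocks() {
    awk '
      # A tag line opens a block and ends any run of notes.
      /^ +tag[0-9n]+$/ { inblock = 1; innotes = 0 }
      # Inside a block every line is printed; the Max line closes the block.
      inblock {
          print
          if ($0 ~ /M(á|a)x: /) { inblock = 0; innotes = 1 }
          next
      }
      # Directly after a block, consecutive note lines are printed.
      innotes && /^ +\([a-z]\)/ { print; next }
      # Any other line ends the run of notes.
      { innotes = 0 }
    ' "$1"
}

extract_blocks sample.txt
```

On this sample it should print both blocks and Note1, and nothing from the ignored lines.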

Answer 2

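Although the accepted answer does achieve the goal, it was not clear to me at first how sed was doing the job. I investigated it further and reformatted the command so that it is easier for me to read. I am sharing it, together with a comment on each command, in case it is useful to anyone else.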

sed -nr '/START/,/END/p
 # The range command above prints the block of text delimited by START and END.
 # The commands below run on every line, so notes placed after END are seen too.
 # Label A is defined
 :A
 # Substitutes a leading note marker (a),(b),(c),... by itself. Meaning (a) is
 # substituted by (a), (b) by (b) and so on. The text is left unchanged.
 # This is only a trigger for the next command. With -r, \( and \) match
 # literal parentheses.
 s/^ +\([a-z]\)/&/
 # Command t jumps to label B if any substitution was performed.
 # Labels are case sensitive, so tb would not find :B.
 tB
 # A branch without a label says: go to the end of the script
 b
 # Label B is defined
 :B
 # Prints the note line
 p
 # Reads the next line (because of -n nothing is printed automatically)
 n
 # Go up to label A again
 bA
 ' file.txt
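The no-op substitution followed by t only asks "does this line look like a note?". As a sketch, assuming GNU sed, the same test can be written as an address on a command group, which makes label B unnecessary. The file name sample2.txt, the function names and the sample data below are made up for illustration; START and END are filled in with the patterns from Answer 1, with the end pattern written as M(á|a)x so that it does not depend on the locale.

```shell
#!/bin/sh
# Hypothetical sample with ignored text between the notes and the next tag.
cat > sample2.txt <<'EOF'
ignored preamble
        tag1
        headerA headerB headerC
        fieldy  description ... (a)
                        Máx: 234+var
        (a) - Note1
        (b) - Note2
        ignored text
        tag2
        headerA headerB headerC
                        Max: 234+var
        ignored trailer
EOF

# Form used in the answers: no-op substitution plus t to detect a note line.
with_noop_subst() {
    sed -nr '
/^ +tag[0-9n]+$/,/M(á|a)x: /p
:A
s/^ +\([a-z]\)/&/
tB
b
:B
p
n
bA
' "$1"
}

# Sketch of an equivalent form: the note pattern is an address on a group.
with_address() {
    sed -nr '
/^ +tag[0-9n]+$/,/M(á|a)x: /p
:A
/^ +\([a-z]\)/{
p
n
bA
}
' "$1"
}

with_noop_subst sample2.txt
echo '--- same input, address form ---'
with_address sample2.txt
```

Both functions should print the same lines for this sample: the two blocks and the two notes.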

Grab a block of text between two tags and, if the block contains certain marks, append the lines that follow
