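I want to grab a block of text between two markers. The start marker is identified by a regular expression match, and the end marker is static.
I did search for approaches, such as this or this, but I could not find one that solves my problem, since it has some more specific conditions. Here is an example of the text file I have: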
<...text-to-ignore...>
tag_list_index
tag1 ...................... 51
tag2 .............. 54
tagn ......... 243
<...lots-of-text-to-ignore...>
tag1
headerA headerB headerC
fieldx description ...
fieldy description ... (a)
fieldw description ...
fieldz description ... (c)
fieldt description ... (b)
Máx: 234+var
(a) - Note1
(b) - Note2
(c) - Note3
<...more-text-to-ignore...>
tag2
headerA headerB headerC
fielda description ...
fieldj description ... (a)
Max: 234+var
(a) - Note1
<...more-text-to-ignore...>
tagn
headerA headerB headerC
fieldr description ...
fieldg description ...
Máx: 234+var
<...more-text-to-ignore...>
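So the aim is to grab the text between tagn and the next line containing Máx: or Max:, plus the notes on the lines immediately after that end marker, if the grabbed block has any. The output would be: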
tag1
headerA headerB headerC
fieldx description ...
fieldy description ... (a)
fieldw description ...
fieldz description ... (c)
fieldt description ... (b)
Máx: 214+var
(a) - Note1
(b) - Note2
(c) - Note3
tag2
headerA headerB headerC
fielda description ...
fieldj description ... (a)
Max: 13
(a) - Note1
tagn
headerA headerB headerC
fieldr description ...
fieldg description ...
Máx: 23+var
Could you help me?
There is no specific requirement on which tool to use.
Answer 0: (score: 0)
sed -nr '/^ +tag[0-9n]+$/,/M[áa]x: /p;:A;s/^ \([a-z]\)/&/;tB;b;:B;p;n;bA' file.txt
Output:
tag1
headerA headerB headerC
fieldx description ...
fieldy description ... (a)
fieldw description ...
fieldz description ... (c)
fieldt description ... (b)
Máx: 234+var
(a) - Note1
(b) - Note2
(c) - Note3
tag2
headerA headerB headerC
fielda description ...
fieldj description ... (a)
Max: 234+var
(a) - Note1
tagn
headerA headerB headerC
fieldr description ...
fieldg description ...
Máx: 234+var
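Limitation: if a block is followed by one or more notes, the <...more-text-to-ignore...> before the next tag matters. The line that n reads right after the last note is never tested against the start pattern, so a tag placed directly after the notes would be missed.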
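One way around that limitation is to track state explicitly instead of reading ahead with n. The following is only a sketch in awk, not the sed approach above. It assumes that tag lines look like tag1, tag2, ..., tagn with optional leading spaces, that the end marker contains "Máx: " or "Max: ", and that note lines start with (a), (b), and so on. The function name extract_blocks is made up for illustration.

```shell
#!/bin/sh
# extract_blocks FILE  (hypothetical helper name, for illustration only)
extract_blocks() {
    awk '
        # A tag line opens a block, even directly after a note line.
        /^ *tag[0-9n]+$/ { inblock = 1 }
        inblock {
            print
            # The Max line closes the block; note lines may follow it.
            # (á|a) is used instead of [áa] so the match works byte-wise too.
            if ($0 ~ /M(á|a)x: /) { inblock = 0; notes = 1 }
            next
        }
        # Note lines immediately after the end marker are kept.
        notes && /^ *\([a-z]\)/ { print; next }
        # Any other line ends the run of notes.
        { notes = 0 }
    ' "$1"
}
```

It would be called as extract_blocks file.txt. Because every line is checked against the tag pattern first, a tag that directly follows the notes still opens a new block.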
Answer 1: (score: 0)
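Although the answer was accepted and does serve the purpose, it was not clear to me at first how sed did the job. I investigated it further and rearranged the command so that it is easier for me to read. I am sharing it, along with a comment on each command, in case it is useful to anyone else.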
sed -nr '
# Print the block of text delimited by START and END
/START/,/END/p
# Define label A
:A
# Substitute each note marker (a), (b), (c), ... with itself: (a) is
# replaced by (a), (b) by (b), and so on, so the line does not change.
# The substitution only serves as a trigger for the next command...
s/^\([a-z]\)/&/
# Command t jumps to label B (labels are case sensitive)
# if a substitution was performed.
tB
# A branch without a label means: go to the end of the script
b
# Define label B
:B
# Print the current line
p
# Read the next line (nothing is printed here, because of -n)
n
# Go back up to label A
bA
' file.txt
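To try the script, START and END can be replaced with concrete patterns. The sketch below assumes GNU sed (for -r) and borrows the patterns from the first answer, relaxed so that leading spaces are optional and written as M(á|a)x so the match does not depend on the locale. The wrapper name print_blocks is hypothetical.

```shell
#!/bin/sh
# print_blocks FILE  (hypothetical wrapper name; assumes GNU sed)
print_blocks() {
    sed -nr '
        # Print each block from a tag line to its Max line.
        /^ *tag[0-9n]+$/,/M(á|a)x: /p
        :A
        # No-op substitution: succeeds only on note lines such as "(a) - Note1".
        s/^ *\([a-z]\)/&/
        # On success jump to B, otherwise end this cycle.
        tB
        b
        :B
        # Print the note, read the next line, and test it again.
        p
        n
        bA
    ' "$1"
}
```

It would be called as print_blocks file.txt. Note that any line starting with a note marker is printed, so this relies on such lines appearing only right after a block.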