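I want to extract a substring from the string below.
Scenario:
Once wafers_starts is matched in the string, the substring should begin at the <ac:structured-macro that immediately precedes it. (The example below has two of them before that point; I only need the one right before wafers_starts.)
The substring should continue until wafers_ends is matched, plus the first closing tag </ac:structured-macro> that follows it.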
Sample code:
if ($null -ne $matches) { Remove-Variable -Name matches }
$confluenceHtml = "<h2>Description</h2><ac:structured-macro ac:macro-id=""77f3n751-39w7-4746-acd4-bee7586449ed"" ac:name=""warning"" ac:schema-version=""1""><ac:parameter ac:name=""title"">Compatibility</ac:parameter><ac:rich-text-body><p class=""auto-cursor-target""><br/></p><table class=""wrapped""><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout </p></td></tr></tbody></table><p class=""auto-cursor-target""><br/></p><p class=""auto-cursor-target""><br/></p><ac:structured-macro ac:macro-id=""4657sd53-e024-4ea3-a5e2-4586667542da"" ac:name=""excerpt"" ac:schema-version=""1""><ac:parameter ac:name=""hidden"">true</ac:parameter><ac:parameter ac:name=""atlassian-macro-output-type"">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id=""77f5e121-31d7-4576-awq4-bej57t6d39ed"" ac:name=""warning"" ac:schema-version=""1""><ac:parameter ac:name=""title"">Compatibility</ac:parameter> <ac:rich-text-body><p class=""auto-cursor-target""><br/></p><table class=""wrapped""><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class=""auto-cursor-target""><br/></p><ac:structured-macro ac:macro-id=""72d7h552-a5dd-44cc-a4re-6f3247574fbd"" ac:name=""excerpt"" ac:schema-version=""1""><ac:parameter ac:name=""hidden"">true</ac:parameter><ac:parameter ac:name=""atlassian-macro-output-type"">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro><p class=""auto-cursor-target""><br/></p></ac:structured-macro>"
if ($confluenceHtml -match '\<ac:structured-macro.+?wafers_starts([\s\S]*)wafers_ends.+?\<\/ac:structured-macro\>') {
$matches[0]
}
Output:
<ac:structured-macro ac:macro-id="77f3n751-39w7-4746-acd4-bee7586449ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter><ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout </p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="4657sd53-e024-4ea3-a5e2-4586667542da" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id="77f5e121-31d7-4576-awq4-bej57t6d39ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter> <ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="72d7h552-a5dd-44cc-a4re-6f3247574fbd" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro>
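Problem:
The end of the substring is fine. However, even after several attempts I cannot get the start right: the match begins at the first occurrence of <ac:structured-macro in the string and includes everything from there.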
Expected output:
I only want the substring below, which starts at the single <ac:structured-macro located right before the first matching wafers_starts.
<ac:structured-macro ac:macro-id="4657sd53-e024-4ea3-a5e2-4586667542da" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_starts</p></ac:rich-text-body></ac:structured-macro><h2>Deployment Notes</h2><ac:structured-macro ac:macro-id="77f5e121-31d7-4576-awq4-bej57t6d39ed" ac:name="warning" ac:schema-version="1"><ac:parameter ac:name="title">Compatibility</ac:parameter> <ac:rich-text-body><p class="auto-cursor-target"><br/></p><table class="wrapped"><colgroup> <col/> <col/> <col/> </colgroup><tbody><tr><th><p>Prerequisite</p><p>This needs a progressive rollout 2,3,4,5 and so on</p></td></tr></tbody></table><p class="auto-cursor-target"><br/></p><ac:structured-macro ac:macro-id="72d7h552-a5dd-44cc-a4re-6f3247574fbd" ac:name="excerpt" ac:schema-version="1"><ac:parameter ac:name="hidden">true</ac:parameter><ac:parameter ac:name="atlassian-macro-output-type">INLINE</ac:parameter><ac:rich-text-body><p>wafers_ends</p></ac:rich-text-body></ac:structured-macro>
Question:
I am looking for a correct and efficient regex pattern.
Answer 0 (score: 2)
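You need the regex below. It uses the tempered greedy token (?:(?!ac:structured-macro).)+ which consumes one character at a time only while that character does not start another ac:structured-macro. Because no further ac:structured-macro can appear between the opening tag and wafers_starts, the match is forced to begin at the last <ac:structured-macro before wafers_starts instead of the first one in the string.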
<ac:structured-macro(?:(?!ac:structured-macro).)+wafers_starts([\s\S]*)wafers_ends.+?<\/ac:structured-macro>
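As a rough check of the pattern above, here is a minimal sketch in PowerShell. The $sampleHtml string is a shortened, hypothetical stand-in for the Confluence markup in the question (same nesting, far fewer attributes), and the variable names $sampleHtml, $pattern and $extracted are assumptions made for this illustration only.

```powershell
# Shortened, hypothetical stand-in for the Confluence HTML in the question:
# an outer "warning" macro, an "excerpt" macro holding wafers_starts,
# and a later "excerpt" macro holding wafers_ends.
$sampleHtml = '<ac:structured-macro ac:name="warning"><p>intro</p>' +
    '<ac:structured-macro ac:name="excerpt"><p>wafers_starts</p></ac:structured-macro>' +
    '<h2>Deployment Notes</h2>' +
    '<ac:structured-macro ac:name="excerpt"><p>wafers_ends</p></ac:structured-macro>' +
    '<p>tail</p></ac:structured-macro>'

# The pattern from the answer. The tempered greedy token refuses to step over
# another "ac:structured-macro", so an attempt starting at the outer "warning"
# macro fails and the engine retries from the next opening tag.
$pattern = '<ac:structured-macro(?:(?!ac:structured-macro).)+wafers_starts([\s\S]*)wafers_ends.+?<\/ac:structured-macro>'

# -match fills the automatic $Matches variable; index 0 is the whole match.
$extracted = if ($sampleHtml -match $pattern) { $Matches[0] } else { $null }
$extracted
```

With this sample, the match should begin at the excerpt macro that contains wafers_starts and stop at the first </ac:structured-macro> after wafers_ends, leaving out the outer warning macro and the trailing <p>tail</p>. Note that . does not match newlines by default in .NET regex, so multi-line input would likely need [\s\S] in place of . or the (?s) inline option.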