I want to split a parent block into segments and, along with each segment's text, capture the nested tag that wraps it:
(?<tag>.)(?: href="(?<url>.+?)")?>(?<text>.+?)<
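It works, but for segments that are not wrapped in a tag I want "tag" to be empty. With the current regex, those segments capture the closing tag of the previous segment instead... :(
Live example: https://regex101.com/r/UEZAaw/3/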
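A minimal reproduction of the problem, as a sketch: the `sample` string is a shortened, hypothetical input (not the text behind the regex101 link), and the `g` flag is added only because `matchAll` requires it.

```javascript
// Original pattern from the question, with the `g` flag added for matchAll.
const original = /(?<tag>.)(?: href="(?<url>.+?)")?>(?<text>.+?)</g;

// Hypothetical, shortened input used only for illustration.
const sample =
  '<p>The <a href="https://example.com/act">UK Bribery Act</a>' +
  ' received Royal Assent <b>rest is history</b></p>';

// Collect the captured tag for every segment.
const tags = Array.from(sample.matchAll(original), (m) => m.groups.tag);

// The unwrapped segment " received Royal Assent " reports "a",
// because `(?<tag>.)` matches the "a" of the preceding "</a>".
console.log(tags);
```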
The result set I want to get. Note that items 2 and 4 should have a null tag:
{
  "0": {
    match: "p>The <",
    tag: "p",
    url: null,
    text: "The "
  },
  "1": {
    match: "a href=\"https://www.legislation.gov.uk/ukpga/2010/23/contents\">UK Bribery Act<",
    tag: "a",
    url: "https://www.legislation.gov.uk/ukpga/2010/23/contents",
    text: "UK Bribery Act"
  },
  "2": {
    match: "/a> (“the Act”) received Royal Assent in April 2010 and came into ... <",
    tag: null,
    url: null,
    text: " (“the Act”) received Royal Assent in April 2010 and came into ... "
  },
  "3": {
    match: "a href=\"http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf\">OECD anti-bribery Convention<",
    tag: "a",
    url: "http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf",
    text: "OECD anti-bribery Convention"
  },
  "4": {
    match: "/a>. The Act outlined four prime offences, including the introduction ... <",
    tag: null,
    url: null,
    text: ". The Act outlined four prime offences, including the introduction ... "
  },
  "5": {
    match: "b>rest is history<",
    tag: "b",
    url: null,
    text: "rest is history"
  }
  ...
}
I've spent several hours on this and still haven't figured it out, so I'd really appreciate any suggestions.
Answer 0 (score: 2)
Based on what I see in the MATCH INFORMATION box on regex101, I think this works:
/(?:(?<tag>(?<!\/).)|(?:\/.))(?: href="(?<url>.+?)")?>(?<text>.+?)</gm
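As a rough check of the pattern above, here is a minimal JavaScript sketch. The `html` string and the `segment` helper are hypothetical, introduced only for illustration; the sketch assumes an engine with lookbehind and named-group support (for example a recent Node.js). The pattern consumes a closing tag such as `/a` in a non-capturing branch, so `tag` stays unset for unwrapped segments, and the sketch maps that unset value to `null`.

```javascript
// Pattern from the answer: `(?<!\/).` captures an opening tag character,
// while the `\/.` branch swallows a closing tag without capturing it.
const pattern =
  /(?:(?<tag>(?<!\/).)|(?:\/.))(?: href="(?<url>.+?)")?>(?<text>.+?)</gm;

// Hypothetical, shortened input used only for illustration.
const html =
  '<p>The <a href="https://example.com/act">UK Bribery Act</a>' +
  ' received Royal Assent <b>rest is history</b></p>';

// Hypothetical helper: turn each match into the shape the question asks for,
// converting groups that did not participate (undefined) to null.
function segment(input) {
  return Array.from(input.matchAll(pattern), (m) => ({
    match: m[0],
    tag: m.groups.tag ?? null,
    url: m.groups.url ?? null,
    text: m.groups.text,
  }));
}

const segments = segment(html);
console.log(segments);
```

Because both tag branches match exactly one character, this only handles single-letter tag names such as `p`, `a`, and `b`; longer names like `em` would need the `.` widened, and a real HTML parser is likely the safer choice for arbitrary markup.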