I need help with a regex for Notepad++ that matches everything except the XML.
The regex I am using:
(!?\<.*\>) <-- I want the opposite of this (on the first three lines)
Sample text:
[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.
[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]
[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!
[20173003] But things like this make the regex fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>
Expected result: everything except the XML (the <Person>...</Person> elements) is removed.
Thanks in advance!
Answer 0 (score: 2)
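This is not perfect, but it should work with your input, which looks fairly simple and well structured.
If you only need to handle a single, non-nested <Person> tag, you can use the simple regex (<Person>.*?</Person>)|. (it matches a <Person> element and captures it into group 1, and otherwise matches any other single character) and replace with the conditional replacement pattern (?{1}$1\n:) (it reinserts the Person element followed by a newline, or replaces the match with an empty string):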
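For readers who want to try the idea outside Notepad++, here is a minimal sketch in Python. It is an approximation under stated assumptions: Python's re module has no (?{1}...:...) conditional replacement syntax, so the condition is emulated with a replacement function, and re.DOTALL stands in for the ". matches newline" option. The name keep_person_elements is hypothetical and only used for illustration.

```python
import re

# Group 1 captures a non-nested <Person> element; the bare "." alternative
# consumes every other character (re.DOTALL stands in for ". matches newline").
PATTERN = re.compile(r"(<Person>.*?</Person>)|.", re.DOTALL)


def keep_person_elements(text: str) -> str:
    """Keep each <Person> element on its own line and drop everything else."""

    def replace(match: re.Match) -> str:
        # Emulates the conditional replacement: if group 1 matched, keep its
        # contents plus a newline; otherwise replace with an empty string.
        if match.group(1) is not None:
            return match.group(1) + "\n"
        return ""

    return PATTERN.sub(replace, text)


sample = (
    "[20173003] This text is what I want to delete "
    "[<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.\n"
    "[20173003] But things like this make the regex to fail < "
    "[<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>\n"
)
result = keep_person_elements(sample)
print(result)
```

The stray < and > characters in the second sample line are consumed one at a time by the . alternative, so they do not interfere with the <Person> match.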
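To make it more generic, you can capture an opening XML tag together with its corresponding closing tag by using a recursion-based Boost regex and the matching conditional replacement pattern:
Find what: (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>)|.
Replace with: (?{1}$1\n:)
. matches newline: ON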
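Regex details:
- (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) - capturing group 1 (recursed later through the (?1) subroutine call), matching:
  - <(\w+)[^>]*> - any opening tag, with its name captured into group 2
  - (?:(?!</?\2\b).|(?1))* - zero or more occurrences of:
    - (?!</?\2\b). - any character (.) that does not start a < + tag name sequence (the name as a whole word), with an optional / before the name
    - | - or
    - (?1) - the whole group 1 subpattern, recursed (repeated)
  - </\2> - the corresponding closing tag
- | - or
- . - any single character.
Replacement pattern details:
- (?{1} - if group 1 matched:
  - $1\n - replace with its contents plus a newline
  - : - otherwise, replace with an empty string
- ) - end of the replacement pattern.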
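
The recursion in the pattern above exists to keep opening and closing tags of the same name balanced. Python's standard re module does not support the (?1) subroutine call, so the sketch below swaps in a different technique, a plain depth counter, to illustrate the same balancing idea. It is not the Boost regex itself, and the names TAG and extract_elements are hypothetical.

```python
import re

# An opening or closing tag: "<", optional "/", a tag name, then anything up to ">".
TAG = re.compile(r"<(?P<close>/)?(?P<name>\w+)[^>]*>")


def extract_elements(text: str) -> list[str]:
    """Return every outermost element whose same-name tags are balanced."""
    elements = []
    pos = 0
    while True:
        start = TAG.search(text, pos)
        if start is None:
            break
        if start.group("close"):
            # A closing tag with no opener: skip it.
            pos = start.end()
            continue
        name = start.group("name")
        # Only tags with the same name change the depth, which mirrors the
        # (?!</?\2\b) lookahead in the regex.
        same_name = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(name))
        depth = 1
        end = None
        for tag in same_name.finditer(text, start.end()):
            depth += -1 if tag.group(1) else 1
            if depth == 0:
                end = tag.end()
                break
        if end is None:
            # No balanced closing tag: treat this "<" as ordinary text.
            pos = start.start() + 1
        else:
            elements.append(text[start.start():end])
            pos = end
    return elements


nested = "x <a><a>1</a><b>2</b></a> y <c>3</c> z"
found = extract_elements(nested)
print("\n".join(found))
```

Joining the returned elements with newlines gives the same shape of output as the $1\n replacement. Unlike the regex, this sketch also accepts attributes inside a closing tag, which is a simplification.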