Question

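This is about content dimensions on a website. The link checker tool I am using supports Python regex. With the link checker, I want to get information about just one content dimension.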

I want to match everything except the string de_de (for the --no-follow-url option).

https://www.example.com/int_en
https://www.example.com/int_de
https://www.example.com/de_de  ##should not match or all others should match
https://www.example.com/be_de
https://www.example.com/fr_fr
https://www.example.com/gb_en
https://www.example.com/us_en
https://www.example.com/ch_de
https://www.example.com/ch_it
https://www.example.com/shop

I am stuck somewhere between these approaches:

https:\/\/www.example.com\/\bde\_de
https:\/\/www.example.com\/[^de]{2,3}[^de]
https:\/\/www.example.com\/[a-z]{2,3}\_[^d][^e]
https:\/\/www.example.com\/([a-z]{2,3}\_)(?!^de$)
https:\/\/www.example.com\/[a-z]{2,3}\_
https:\/\/www.example.com\/(?!^de\_de$)

How do I use a negative lookahead to match a string that contains a special character (an underscore)? Can I go with something like this?

(?!^de_de$)

I am new to regex, and any help or input is appreciated.

Answer 1

Use the following regex:

https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+

See the regex demo. If you also want to match http, add s? after http in the pattern: https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+.
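As a minimal sketch of the https? variant, the snippet below checks both schemes. The helper name is_allowed and the http:// sample URLs are assumptions made for illustration; they do not come from the link checker itself.

```python
import re

# Pattern from the answer with the optional "s" added after "http".
RX = re.compile(r"https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+")


def is_allowed(url):
    """Return True when the URL matches, i.e. it is not the de_de section.

    Hypothetical helper, used only for this illustration.
    """
    return RX.search(url) is not None


for url in (
    "http://www.example.com/fr_fr",
    "https://www.example.com/fr_fr",
    "http://www.example.com/de_de",
):
    print(url, "=>", "MATCHED" if is_allowed(url) else "NOT MATCHED")
```

Both fr_fr URLs should be reported as matched, and the de_de URL as not matched, regardless of the scheme.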

Note that you should escape the dots to match literal dots in the string. The (?!de_de(?:/|$))[a-z_]+ part matches one or more lowercase letters or underscores ([a-z_]+) that are not de_de followed by / or the end of the string (see the (?!de_de(?:/|$)) lookahead).

Python demo:

import re

# Sample URLs; only the de_de one should be rejected.
ex = [
    "https://www.example.com/int_en",
    "https://www.example.com/int_de",
    "https://www.example.com/de_de",
    "https://www.example.com/be_de",
    "https://www.example.com/de_en",
    "https://www.example.com/fr_en",
    "https://www.example.com/fr_fr",
    "https://www.example.com/gb_en",
    "https://www.example.com/us_en",
    "https://www.example.com/ch_de",
    "https://www.example.com/ch_it",
]
rx = r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+"
for s in ex:
    m = re.search(rx, s)
    if m:
        print("{} => MATCHED".format(s))
    else:
        print("{} => NOT MATCHED".format(s))

Output:

https://www.example.com/int_en => MATCHED
https://www.example.com/int_de => MATCHED
https://www.example.com/de_de => NOT MATCHED
https://www.example.com/be_de => MATCHED
https://www.example.com/de_en => MATCHED
https://www.example.com/fr_en => MATCHED
https://www.example.com/fr_fr => MATCHED
https://www.example.com/gb_en => MATCHED
https://www.example.com/us_en => MATCHED
https://www.example.com/ch_de => MATCHED
https://www.example.com/ch_it => MATCHED
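To illustrate why the (?!^de_de$) attempt from the question excludes nothing, here is a small sketch comparing it with the pattern from this answer. Without re.MULTILINE, ^ can only match at the very start of the string, so after the URL prefix the inner pattern never matches and the negative lookahead always succeeds. The de_de/products and de_de_old URLs are made-up examples added to show how the (?:/|$) boundary behaves.

```python
import re

PREFIX = r"https://www\.example\.com/"

# Attempt from the question: "^" cannot match after the prefix,
# so the negative lookahead always succeeds and excludes nothing.
attempt = re.compile(PREFIX + r"(?!^de_de$)")

# Pattern from the answer: de_de is rejected only when it is followed
# by "/" or by the end of the string.
fixed = re.compile(PREFIX + r"(?!de_de(?:/|$))[a-z_]+")

urls = [
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/products",  # hypothetical sub-page
    "https://www.example.com/de_de_old",       # hypothetical look-alike section
    "https://www.example.com/ch_de",
]

for url in urls:
    print(url, bool(attempt.search(url)), bool(fixed.search(url)))
```

The first pattern matches all four URLs. The second rejects de_de and anything below de_de/, while a different section that merely starts with de_de (such as the hypothetical de_de_old) still matches.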

Answer 2

You can try:

https:\/\/www.example.com\/.+?(?<!de_de)\b

This matches:

https://www.example.com/shop

But not:

https://www.example.com/de_de

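Pythex link here.

Explanation: here we use the negative lookbehind (?<!de_de) applied to a word boundary (\b). This means we have to find a word boundary that is not preceded by "de_de".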
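A quick way to try the lookbehind pattern in Python is sketched below, using the pattern exactly as written in this answer. The de_de/products URL is a made-up example. As far as I can tell from tracing it with Python's re module, a deeper path under de_de still matches, because a later word boundary (the one after the second slash) is not directly preceded by "de_de". That may or may not matter for the link checker.

```python
import re

# Pattern from this answer, kept verbatim (the dots are unescaped here).
rx = re.compile(r"https:\/\/www.example.com\/.+?(?<!de_de)\b")

urls = [
    "https://www.example.com/shop",
    "https://www.example.com/ch_de",
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/products",  # hypothetical sub-page
]

for url in urls:
    m = rx.search(url)
    print(url, "=>", "MATCHED" if m else "NOT MATCHED")
```

If sub-pages of de_de must be excluded as well, the lookahead pattern from the first answer is probably the safer choice.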

Python regex: negative lookahead for a string with special characters
