Python regex negative lookahead for a string with special characters

Date: 2017-11-02 10:04:57

Tags: python regex string lookahead

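This is about a content dimension on a website. This link checker tool supports Python regex. Using the link checker, I want to get information about just one content dimension.

I want to match everything except the string de_de (for the --no-follow-url option).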

https://www.example.com/int_en
https://www.example.com/int_de
https://www.example.com/de_de  ##should not match or all others should match
https://www.example.com/be_de
https://www.example.com/fr_fr
https://www.example.com/gb_en
https://www.example.com/us_en
https://www.example.com/ch_de
https://www.example.com/ch_it
https://www.example.com/shop

I am stuck somewhere between these approaches:

https:\/\/www.example.com\/\bde\_de
https:\/\/www.example.com\/[^de]{2,3}[^de]
https:\/\/www.example.com\/[a-z]{2,3}\_[^d][^e]
https:\/\/www.example.com\/([a-z]{2,3}\_)(?!^de$)
https:\/\/www.example.com\/[a-z]{2,3}\_
https:\/\/www.example.com\/(?!^de\_de$)

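How do I use a negative lookahead to match a string that contains a special character (the underscore)? Can I use something like this?

(?!^de_de$)

I'm new to regex, and any help or input is appreciated.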
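
For context on the attempts above, here is a minimal sketch of why the negated character classes misbehave: [^d][^e] excludes the single characters d and e at two positions, not the string de, so it also rejects other URLs that end in _de. The pattern is modelled on the third attempt, with the dots escaped. Using re.search is an assumption, since the question does not show how the link checker applies the pattern.

```python
import re

# Modelled on the third attempt in the question; dots escaped for clarity.
# [^d] and [^e] each exclude ONE character; they do not exclude the string "de".
attempt = re.compile(r"https://www\.example\.com/[a-z]{2,3}_[^d][^e]")

urls = [
    "https://www.example.com/int_de",
    "https://www.example.com/de_de",
    "https://www.example.com/be_de",
    "https://www.example.com/fr_fr",
    "https://www.example.com/ch_de",
    "https://www.example.com/shop",
]

# URLs that the character-class attempt fails to match.
rejected = [u for u in urls if attempt.search(u) is None]

for u in rejected:
    print("rejected:", u)
```

Only de_de was supposed to be excluded, yet int_de, be_de, ch_de and shop are rejected as well, which is why a lookahead is a better fit here.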

2 answers:

Answer 0: (score: 1)

Use the following regex:

https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+

See the regex demo. If you also want to match http, add s? after http in the pattern:

https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+
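
A short sketch of the optional-s variant, assuming the pattern is applied with re.search. The http:// URLs are hypothetical and only there to exercise the s? part.

```python
import re

# "s?" makes the "s" of "https" optional, so both schemes are accepted.
rx = re.compile(r"https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+")

urls = [
    "http://www.example.com/fr_fr",   # hypothetical http URL
    "https://www.example.com/fr_fr",
    "http://www.example.com/de_de",   # hypothetical http URL
    "https://www.example.com/de_de",
]

results = {u: bool(rx.search(u)) for u in urls}

for url, matched in results.items():
    print(url, "=>", "MATCHED" if matched else "NOT MATCHED")
```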

Note that you should escape the dots to match literal dots in the string. The (?!de_de(?:/|$))[a-z_]+ part matches one or more letters or underscores ([a-z_]+) that are not de_de followed by / or the end of the string (see the (?!de_de(?:/|$)) negative lookahead).

Python demo:

import re

ex = [
    "https://www.example.com/int_en",
    "https://www.example.com/int_de",
    "https://www.example.com/de_de",
    "https://www.example.com/be_de",
    "https://www.example.com/de_en",
    "https://www.example.com/fr_en",
    "https://www.example.com/fr_fr",
    "https://www.example.com/gb_en",
    "https://www.example.com/us_en",
    "https://www.example.com/ch_de",
    "https://www.example.com/ch_it",
]
rx = r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+"
for s in ex:
    m = re.search(rx, s)
    if m:
        print("{} => MATCHED".format(s))
    else:
        print("{} => NOT MATCHED".format(s))

Output:

https://www.example.com/int_en => MATCHED
https://www.example.com/int_de => MATCHED
https://www.example.com/de_de => NOT MATCHED
https://www.example.com/be_de => MATCHED
https://www.example.com/de_en => MATCHED
https://www.example.com/fr_en => MATCHED
https://www.example.com/fr_fr => MATCHED
https://www.example.com/gb_en => MATCHED
https://www.example.com/us_en => MATCHED
https://www.example.com/ch_de => MATCHED
https://www.example.com/ch_it => MATCHED
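
To illustrate what the lookahead in this answer does compared with the (?!^de_de$) idea from the question, here is a rough sketch that runs three variants side by side. The names anchored, bare and bounded are labels made up for this illustration, and the de_de/page and de_dex URLs are hypothetical. Inside the lookahead, ^ is tested in the middle of the string, where it can never match, so the anchored negative lookahead always succeeds and excludes nothing.

```python
import re

PREFIX = r"https://www\.example\.com/"

patterns = {
    # Based on the question's idea: ^ cannot match mid-string, so nothing is excluded.
    "anchored": PREFIX + r"(?!^de_de$)[a-z_]+",
    # Lookahead without a boundary: also excludes paths that merely start with de_de.
    "bare": PREFIX + r"(?!de_de)[a-z_]+",
    # Pattern from this answer: excludes de_de only when followed by "/" or the end.
    "bounded": PREFIX + r"(?!de_de(?:/|$))[a-z_]+",
}

urls = [
    "https://www.example.com/fr_fr",
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/page",  # hypothetical deeper URL
    "https://www.example.com/de_dex",      # hypothetical look-alike segment
]

# One row of booleans per pattern, in the same order as `urls`.
table = {
    name: [bool(re.search(rx, u)) for u in urls]
    for name, rx in patterns.items()
}

for name, row in table.items():
    print(name, row)
```

The (?:/|$) tail is what lets a look-alike segment such as de_dex through while still rejecting de_de itself and anything below it.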

Answer 1: (score: 0)

You can try:

https:\/\/www.example.com\/.+?(?<!de_de)\b

Matches:

https://www.example.com/shop

But not:

https://www.example.com/de_de

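Pythex link here

Explanation: here we use the negative lookbehind (?<!de_de) applied to a word boundary (\b). This means we have to find a word boundary that is not preceded by "de_de".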
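
A quick way to try this pattern locally, assuming it is applied with re.search. The de_de/page URL is hypothetical and is included because the lazy .+? can keep expanding past de_de to a later word boundary, so deeper URLs under de_de still match.

```python
import re

# Pattern exactly as given in this answer (the dots are left unescaped there).
rx = re.compile(r"https:\/\/www.example.com\/.+?(?<!de_de)\b")

urls = [
    "https://www.example.com/shop",
    "https://www.example.com/fr_fr",
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/page",  # hypothetical deeper URL
]

matched = {u: bool(rx.search(u)) for u in urls}

for url, ok in matched.items():
    print(url, "=>", "MATCHED" if ok else "NOT MATCHED")
```

If URLs below de_de must be excluded as well, the lookahead with (?:/|$) from the other answer covers that case.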