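This is about content dimensions on a website. The link checker tool supports Python regex. With the link checker, I want to get information about only one content dimension.
I want to match everything except the string de_de (for the --no-follow-url option).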
https://www.example.com/int_en
https://www.example.com/int_de
https://www.example.com/de_de ##should not match or all others should match
https://www.example.com/be_de
https://www.example.com/fr_fr
https://www.example.com/gb_en
https://www.example.com/us_en
https://www.example.com/ch_de
https://www.example.com/ch_it
https://www.example.com/shop
I am stuck somewhere between these approaches:
https:\/\/www.example.com\/\bde\_de
https:\/\/www.example.com\/[^de]{2,3}[^de]
https:\/\/www.example.com\/[a-z]{2,3}\_[^d][^e]
https:\/\/www.example.com\/([a-z]{2,3}\_)(?!^de$)
https:\/\/www.example.com\/[a-z]{2,3}\_
https:\/\/www.example.com\/(?!^de\_de$)
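How can I use a negative lookahead to match a string that contains a special character (an underscore)? Can I go with something like (?!^de_de$)?
I am new to regex, and any help or input is appreciated.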
Answer 0 (score: 1)
Use the following regex:
https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+
See the regex demo. If you also want to match http, add s? after http in the pattern: https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+
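As a quick check of the optional s, here is a minimal sketch. The helper name is_matched and the http:// sample URLs are assumptions made for illustration; they are not part of the question.

```python
import re

# Same pattern as above; "s?" makes the "s" of the scheme optional.
RX_ANY_SCHEME = re.compile(r"https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+")

def is_matched(url):
    """Return True if the pattern is found anywhere in url (hypothetical helper)."""
    return RX_ANY_SCHEME.search(url) is not None

for url in (
    "http://www.example.com/fr_fr",
    "https://www.example.com/fr_fr",
    "http://www.example.com/de_de",
    "https://www.example.com/de_de",
):
    print("{} => {}".format(url, "MATCHED" if is_matched(url) else "NOT MATCHED"))
```

Both schemes behave the same way: only the de_de URLs are rejected.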
Note that you should escape the dots to match actual literal dots in the string. The (?!de_de(?:/|$))[a-z_]+ part matches one or more lowercase letters or underscores ([a-z_]+), provided that the text at that position is not de_de followed by / or by the end of the string (see the Python demo):
import re

# Sample URLs from the question, plus de_en and fr_en
ex = [
    "https://www.example.com/int_en",
    "https://www.example.com/int_de",
    "https://www.example.com/de_de",
    "https://www.example.com/be_de",
    "https://www.example.com/de_en",
    "https://www.example.com/fr_en",
    "https://www.example.com/fr_fr",
    "https://www.example.com/gb_en",
    "https://www.example.com/us_en",
    "https://www.example.com/ch_de",
    "https://www.example.com/ch_it",
]
rx = r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+"
for s in ex:
    m = re.search(rx, s)
    if m:
        print("{} => MATCHED".format(s))
    else:
        print("{} => NOT MATCHED".format(s))
Output:
https://www.example.com/int_en => MATCHED
https://www.example.com/int_de => MATCHED
https://www.example.com/de_de => NOT MATCHED
https://www.example.com/be_de => MATCHED
https://www.example.com/de_en => MATCHED
https://www.example.com/fr_en => MATCHED
https://www.example.com/fr_fr => MATCHED
https://www.example.com/gb_en => MATCHED
https://www.example.com/us_en => MATCHED
https://www.example.com/ch_de => MATCHED
https://www.example.com/ch_it => MATCHED
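To see what the (?:/|$) part of the lookahead buys you, it may help to probe a few edge cases. This is only a sketch: every URL below except the plain de_de one is hypothetical and was chosen to exercise the lookahead.

```python
import re

rx = re.compile(r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+")

# Hypothetical URLs mapped to whether the pattern is expected to match.
expected = {
    "https://www.example.com/de_de": False,       # de_de at the end of the string
    "https://www.example.com/de_de/page": False,  # de_de followed by /
    "https://www.example.com/de_dex": True,       # de_de followed by another letter
    "https://www.example.com/de_en": True,        # different segment altogether
    "https://www.example.com/de_de?x=1": True,    # "?" is neither / nor end of string
}

results = {url: rx.search(url) is not None for url in expected}
for url, matched in results.items():
    print("{} => {}".format(url, "MATCHED" if matched else "NOT MATCHED"))
```

The last case shows a limit of the pattern as written: a de_de URL with a query string still matches. If such URLs can occur on your site, adding ? to the alternation inside the lookahead would be one option.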
Answer 1 (score: 0)
You can try:
https:\/\/www.example.com\/.+?(?<!de_de)\b
Matches:
https://www.example.com/shop
But not:
https://www.example.com/de_de
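Pythex link here
Explanation: Here we use a negative lookbehind, (?<!de_de), applied at a word boundary (\b). This means we must find a word boundary that is not preceded by "de_de".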
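The lookbehind approach can be tried out with a short sketch like the one below. The pattern is copied as posted, including its unescaped dots; the /de_de/page URL is a hypothetical addition that is not in the question.

```python
import re

# Pattern as posted in this answer (dots left unescaped, so they match any character).
rx = re.compile(r"https:\/\/www.example.com\/.+?(?<!de_de)\b")

urls = [
    "https://www.example.com/shop",
    "https://www.example.com/fr_fr",
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/page",  # hypothetical sub-page
]

results = {url: rx.search(url) is not None for url in urls}
for url, matched in results.items():
    print("{} => {}".format(url, "MATCHED" if matched else "NOT MATCHED"))
```

The plain de_de URL is rejected, but the hypothetical sub-page is matched: the word boundary between "/" and "page" is preceded by "e_de/", not "de_de", so the lookbehind passes there. If pages below de_de should be excluded too, this pattern alone will not do it.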