Question

I have a list of URLs like this:

http://www.toto.com/bags/handbags/test1/
http://www.toto.com/bags/handbags/smt1/
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
http://www.toto.com/bags/handbags/smt1/smt2/testing/
http://www.toto.com/bags/handbags/smt1/smt2/testing.html

I want to keep only URLs shaped like this one:

http://www.toto.com/something/else/again/more

That is, URLs limited to that depth; anything deeper should be skipped.

Can you help me? :)

Answer 1

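A suitable regular expression is (with the dots in the host name escaped so they match only a literal dot):

^http://www\.toto\.com/(\w+/){4}$

Filtering example:

>>> import re
>>> for line in lines:
...     if re.match(r'^http://www\.toto\.com/(\w+/){4}$', line):
...         print(line)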
... 
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
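
For a script rather than an interactive session, the same filter could be sketched as below. The names `FOUR_SEGMENTS`, `FOUR_SEGMENTS_LAX`, and `filter_urls` are made up for illustration. The second pattern is an assumption on my part: it also accepts a URL with no trailing slash, like the example given in the question.

```python
import re

# Exactly four path segments, each followed by a slash.
FOUR_SEGMENTS = re.compile(r'^http://www\.toto\.com/(\w+/){4}$')

# Assumed variant: four segments, trailing slash optional.
FOUR_SEGMENTS_LAX = re.compile(r'^http://www\.toto\.com/(\w+/){3}\w+/?$')


def filter_urls(urls, pattern=FOUR_SEGMENTS):
    """Return only the URLs that match the given compiled pattern."""
    return [url for url in urls if pattern.match(url)]


urls = [
    'http://www.toto.com/bags/handbags/test1/',
    'http://www.toto.com/bags/handbags/smt1/',
    'http://www.toto.com/bags/handbags/test1/test2/',
    'http://www.toto.com/bags/handbags/blabla1/blabla2/',
    'http://www.toto.com/bags/handbags/smt1/smt2/',
    'http://www.toto.com/bags/handbags/smt1/smt2/testing/',
    'http://www.toto.com/bags/handbags/smt1/smt2/testing.html',
]

for url in filter_urls(urls):
    print(url)
```

Because `\w` does not match a dot, `testing.html` is rejected as a fifth segment, and the deeper `testing/` URL fails because `{4}` allows exactly four segments.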

Answer 2

You can do it like this:

https://regex101.com/r/gK6hR3/1

but add $ at the end of the pattern. That is, change

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}

to this:

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$
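
To see what the trailing `$` changes, here is a small Python sketch. The names `BASE`, `UNANCHORED`, `ANCHORED`, and `short_url` are hypothetical, and `short_url` is an invented test input, not one of the URLs from the question. As I read it, this pattern only allows letters, dots, and hyphens, and at most two path parts, so treat the sketch as a demonstration of anchoring rather than a complete solution.

```python
import re

# The pattern from this answer, without and with the trailing "$".
BASE = r'http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}'
UNANCHORED = re.compile(BASE)
ANCHORED = re.compile(BASE + r'$')

short_url = 'http://www.toto.com/bags/handbags'        # invented example
long_url = 'http://www.toto.com/bags/handbags/test1/'  # from the question

for url in (short_url, long_url):
    # Without "$" a match on a prefix of the URL is enough;
    # with "$" the pattern must consume the URL up to its end.
    print(url,
          bool(UNANCHORED.match(url)),
          bool(ANCHORED.match(url)))
```

Without the anchor, the longer URL still counts as a match because its prefix satisfies the pattern; with `$`, it is rejected.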
