Question

I am trying to filter a list of links so that only those consisting of just a second-level domain match.

Success:

https://www.thingisawesome.anything
https://thingisawesome.anything
http://www.thingisawesome.anything
http://thingisawesome.anything
http://thingisawesome.anything/
https://www.thingisawesome.anything/

Fail:

http://thingisawesome.ventures/index.html
https://subdomain.geocities.com/
https://www.twitter.com/8288hs98ff

My attempt so far gets me close, but it does not reject the failing ones; it only matches part of each of them.

Answer 1

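Based on the examples (which do not discriminate on the TLD), which show a newline-delimited list of URLs (with no unencoded IDNs), and assuming you are not using the capture groups from your attempt later on, you want to match (in multiline mode):

start of line
http
an optional s
://
an optional www.
the 2LD and the TLD, separated by a dot, each of which can consist of letters, digits, and hyphens (but no hyphen in the initial position)
an optional slash
end of line

Putting those together gives: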

^https?://(?:www\.)?[a-zA-Z0-9][a-zA-Z0-9-]+\.[a-zA-Z0-9][a-zA-Z0-9-]+/?$
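
As a quick check, here is a minimal sketch of applying that pattern in multiline mode, assuming Python's `re` module and the sample URLs from the question. The names `PATTERN`, `urls`, and `matches` are only illustrative.

```python
import re

# The pattern from this answer, compiled with re.MULTILINE so that
# ^ and $ anchor at the start and end of every line, not just the string.
PATTERN = re.compile(
    r"^https?://(?:www\.)?[a-zA-Z0-9][a-zA-Z0-9-]+\.[a-zA-Z0-9][a-zA-Z0-9-]+/?$",
    re.MULTILINE,
)

# Sample input taken from the question: one URL per line.
urls = """\
https://www.thingisawesome.anything
https://thingisawesome.anything
http://www.thingisawesome.anything
http://thingisawesome.anything
http://thingisawesome.anything/
https://www.thingisawesome.anything/
http://thingisawesome.ventures/index.html
https://subdomain.geocities.com/
https://www.twitter.com/8288hs98ff
"""

# The only group in the pattern is non-capturing, so findall returns
# the full text of each matching line.
matches = PATTERN.findall(urls)

for url in matches:
    print(url)
```

With this input, only the six `thingisawesome.anything` lines should be printed. The URLs with a path or an extra subdomain are rejected as whole lines rather than partially matched, because both anchors must hold on the same line.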

