Question

I'd like to use a regular expression to extract "the-game" from URLs like these:

http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/
http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/

Answer 1

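// subject holds the URL to search (one of the example URLs)
var subject = "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/";
var result;
// Skip four "/"-terminated chunks ("http:", "", the host, "en"),
// then capture the next run of non-slash characters
var myregexp = /^(?:[^\/]*\/){4}([^\/]+)/;
var match = myregexp.exec(subject);
if (match !== null) {
    result = match[1]; // "the-game" for the example URLs
} else {
    result = "";
}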

This matches whatever lies between the fourth and fifth slashes and stores it in the variable result.
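For comparison, here is a minimal sketch of the same pattern using Python 3's re module. The helper name segment_after_fourth_slash is hypothetical; like the JavaScript above, it returns an empty string when there is no match.

```python
import re

# Skip four "/"-terminated chunks ("http:", "", the host, "en"),
# then capture the next run of non-slash characters.
FOURTH_SEGMENT = re.compile(r"^(?:[^/]*/){4}([^/]+)")


def segment_after_fourth_slash(url: str) -> str:
    """Return the path segment after the 4th slash, or "" if there is none."""
    match = FOURTH_SEGMENT.match(url)
    return match.group(1) if match else ""


urls = [
    "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/",
    "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/",
    "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/",
]
for url in urls:
    print(segment_after_fourth_slash(url))  # prints "the-game" each time
```

Because [^/]* cannot cross a slash, the {4} group always consumes exactly four slashes, so the capture is positional: it depends on the URL always having the same number of segments before the one you want.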

Answer 2

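Which parts of the URL can vary, and which stay the same? The following regex will always match whatever sits between the slashes after "/en/" in your examples, i.e. the-game.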

(?<=/en/).*?(?=/)
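A quick way to try this pattern is Python's re.search, which supports the fixed-width lookbehind (?<=/en/). A minimal sketch, assuming Python 3:

```python
import re

url = "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/"

# (?<=/en/)  the match must be preceded by "/en/"
# .*?        take as few characters as possible...
# (?=/)      ...up to (but not including) the next "/"
match = re.search(r"(?<=/en/).*?(?=/)", url)
print(match.group(0) if match else None)  # the-game
```

Note that the language code is hard-coded: a URL with "/fr/" instead of "/en/" will not match.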

Assuming the first path segment is a 2- or 3-character language code, this one matches the second path segment of any URL that contains "webdev".

(?<=.*?webdev.*?/.{2,3}/).*?(?=/)
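This pattern relies on variable-width lookbehind. Python's re module only accepts fixed-width lookbehind, so it raises re.error there. One workaround, sketched below under that assumption, is to match the context normally and capture the wanted segment with a group; the [^/] character classes are an added assumption that keeps each piece inside a single segment.

```python
import re

WEBDEV_PATTERN = r"(?<=.*?webdev.*?/.{2,3}/).*?(?=/)"

# Python's re rejects the variable-width lookbehind above.
try:
    re.compile(WEBDEV_PATTERN)
except re.error as exc:
    print("re.error:", exc)

# Same idea without lookbehind: "webdev" plus the rest of its segment,
# then a 2-3 character segment, then capture the following segment.
capture = re.compile(r"webdev[^/]*/[^/]{2,3}/([^/]+)/")

url = "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/"
match = capture.search(url)
print(match.group(1) if match else None)  # the-game
```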

Hopefully you can adapt these examples to do what you're looking for.

Answer 3

You should probably use some kind of URL-parsing library rather than a regex.

In Python:

# Python 3; in Python 2 this module was called urlparse
from urllib.parse import urlparse
url = urlparse('http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/')
print(url.path)

This prints:

/en/the-game/another-one/another-one/another-one/

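From there, you can do something simple like stripping /en/ from the start of the path. Otherwise you are bound to get something wrong with a regex. Don't reinvent the wheel!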
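Following that advice, here is a sketch using Python 3's urllib.parse: parse the URL, split the path into segments, and take the one after the language code. The helper name path_segment and the assumption that the language code is always the first segment are illustrative, not part of the answer.

```python
from urllib.parse import urlparse


def path_segment(url: str, index: int) -> str:
    """Return the index-th non-empty path segment of url, or "" if missing.

    Hypothetical helper: assumes segments are separated by "/".
    """
    segments = [part for part in urlparse(url).path.split("/") if part]
    return segments[index] if 0 <= index < len(segments) else ""


url = "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/"
# Segment 0 is the language code ("en"); segment 1 is the one we want.
print(path_segment(url, 1))  # the-game
```

Splitting the parsed path means the host name (however many dots it has) never interferes, which is the main advantage over counting slashes in the full URL.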

Regex: getting content from a URL
