I am trying to extract the blog name (stephania-bell) from a URL.
I implemented the following function to extract the expected value from the URL:
def getBlogName( def decodeUrl )
{
    def urlParams = this.paramsParser.parseURIToMap( URI.create( decodeUrl ) )
    def temp = decodeUrl.replace( "http://www.espn.com", "" )
        .replaceAll( "(/_/|\\?).*", "" )
        .replace( "/index", "" )
        .replace( "/insider", "" )
        .replace( "/post", "" )
        .replace( "/tag", "" )
        .replace( "/category", "" )
        .replace( "/", "" )
        .replace( "/blog/", "" )
    def blogName = temp.replace( "/", "" )
    return blogName
}
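But I am missing something: the value it returns is blogstephania-bell.
Can you help me understand what the function implementation is missing? Or is there perhaps a better way to do the same thing?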
Answer 0 (score: 2)
Not what you asked, but just for fun (I assumed this is what you originally wanted):
@Grab('org.jsoup:jsoup:1.11.3')
import static org.jsoup.Jsoup.connect
def name = connect('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2')
    .get()
    .select('.sticky-header h1 a')
    .text()
assert name == 'Stephania Bell Blog'
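Answer 1 (score: 1)
This kind of job is easy to handle with a regular expression. If we want to extract the part of the URL between http://www.espn.com/blog/ and the next /, we can use the following code: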
import java.util.regex.Pattern
def url = 'http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2'
def pattern = Pattern.compile('^https?://www\\.espn\\.com/blog/([^/]+)/.*$')
def (_, blog) = (url =~ pattern)[0]
assert blog == 'stephania-bell'
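The snippet above throws when the URL does not match, because indexing the matcher with [0] fails if there is no match. A minimal sketch of a more defensive variant follows. The helper name extractBlogName and the second and third test URLs are assumptions made for illustration, and the pattern is loosened slightly so that a trailing slash after the blog name is not required:

```groovy
// Hypothetical helper (not part of the original answer):
// returns the blog name, or null when the URL has no /blog/<name> part.
def extractBlogName(String url) {
    // Capture everything after /blog/ up to the next '/', '?' or '#'
    def matcher = url =~ '^https?://www\\.espn\\.com/blog/([^/?#]+)'
    // find() is true only when the pattern matched; group(1) is the captured name
    matcher.find() ? matcher.group(1) : null
}

assert extractBlogName('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2') == 'stephania-bell'
assert extractBlogName('https://www.espn.com/blog/stephania-bell') == 'stephania-bell'
assert extractBlogName('http://www.espn.com/nfl/story') == null
```

Returning null keeps the decision about how to handle an unexpected URL with the caller instead of failing inside the helper.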
Answer 2 (score: 1)
It may be more useful to treat the URL as a URL: extract the path, split it, and pick out the relevant path segment.
String plainText = "http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2"
def url = plainText.toURL()
def fullPath = url.getPath()
// The path starts with "/", so split("/") yields ["", "blog", "stephania-bell", ...]
def pathSegments = fullPath.split("/")
assert "stephania-bell" == pathSegments[2]
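The fixed index 2 relies on blog always being the first path segment. A sketch that looks for the segment following blog instead might look like this. The helper name blogNameFromPath and the second test URL are assumptions for illustration, and the input is assumed to be a well-formed hierarchical URL:

```groovy
// Hypothetical helper: returns the path segment right after "blog", or null.
def blogNameFromPath(String address) {
    // tokenize('/') drops empty tokens, so the leading "/" adds no blank entry
    def segments = new URI(address).path.tokenize('/')
    def i = segments.indexOf('blog')
    // Guard against "blog" being absent or being the last segment
    (i >= 0 && i + 1 < segments.size()) ? segments[i + 1] : null
}

assert blogNameFromPath('http://www.espn.com/blog/stephania-bell/post/_/id/3563/key-fantasy-football-injury-updates-for-week-4-2') == 'stephania-bell'
assert blogNameFromPath('http://www.espn.com/nfl/story') == null
```

Because only the path is inspected, the scheme, the host, and any query string do not affect the result.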