I have this HTML code. I want to replace the link placeholders found in three different attributes. This is what I have tried so far:
String texto2 = "url(\"primeiro url\")\n" +
"url('2 url')\n" +
"href=\"1 href\"\n" +
"src=\"1 src\"\n" +
"src='2 src'\n" +
"url('3 url')\n" +
"\n" +
".camera_target_content .camera_link {\n" +
" background: url(../images/blank.gif);\n" +
" display: block;\n" +
" height: 100%;\n" +
" text-decoration: none;\n" +
"}";
String exp = "(?:href|src)=[\"'](.+)[\"']+|(?:url)\\([\"']*(.*)[\"']*\\)";
// expression to grab the links from src and href
Pattern pattern = Pattern.compile(exp);
// preparing the expression
Matcher matcher = pattern.matcher(texto2);
// grabbing the URLs and printing each full match
while (matcher.find()) {
    System.out.println(texto2.substring(matcher.start(), matcher.end()));
}
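So far, so good. The search works, but I need to get just the clean link, like this:
img/image.gif
instead of:
href="img/image.gif"
src="img/image.gif"
url(img/image.gif)
I want to replace a placeholder with a variable; this is what I have tried so far: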
String texto2 = "url(\"primeiro url\")\n" +
"url('2 url')\n" +
"href=\"1 href\"\n" +
"src=\"1 src\"\n" +
"src='2 src'\n" +
"url('3 url')\n" +
"\n" +
".camera_target_content .camera_link {\n" +
" background: url(../images/blank.gif);\n" +
" display: block;\n" +
" height: 100%;\n" +
" text-decoration: none;\n" +
"}";
String exp = "(?:href|src)=[\"'](.+)[\"']+|(?:url)\\([\"']*(.*)[\"']*\\)";
// expression to grab the links from src and href
Pattern pattern = Pattern.compile(exp);
// preparing the expression
Matcher matcher = pattern.matcher(texto2);
// grabbing the URLs and printing capture group 2
while (matcher.find()) {
    String s = matcher.group(2);
    System.out.println(s);
}
It turns out this version does not work, even though the first one matches the URLs perfectly. Can someone help me spot the problem?
Answer 0: (score: 0)
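Use jsoup. Parse the HTML string into a DOM, and then you can use CSS selectors to extract the values, much as you would with jQuery in JavaScript. Note that this only works if you are actually working with HTML; the string at the top of your example is not HTML.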
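For input like the string in the question, which is not HTML, a regex can still do the job. The second attempt prints null for every href/src match because those values land in group 1, not group 2, and the greedy .* in the url(...) branch keeps the closing quote. Here is a minimal sketch of one way around both problems, assuming the goal is only to collect the bare link text. The class name LinkExtractor and the method extractLinks are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Group 2 holds an href/src value, group 4 holds a url(...) value.
    // Groups 1 and 3 remember the opening quote (possibly empty for url)
    // so that the same quote has to close the value.
    private static final Pattern LINK = Pattern.compile(
        "(?:href|src)\\s*=\\s*([\"'])(.*?)\\1"
        + "|url\\(\\s*([\"']?)(.*?)\\3\\s*\\)");

    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(text);
        while (matcher.find()) {
            // Only one alternative matches per hit; the other value group is null.
            String link = matcher.group(2) != null ? matcher.group(2) : matcher.group(4);
            links.add(link);
        }
        return links;
    }

    public static void main(String[] args) {
        String texto2 = "url(\"primeiro url\")\n" +
            "url('2 url')\n" +
            "href=\"1 href\"\n" +
            "src=\"1 src\"\n" +
            "src='2 src'\n" +
            "url('3 url')\n" +
            "\n" +
            ".camera_target_content .camera_link {\n" +
            " background: url(../images/blank.gif);\n" +
            "}";
        // Prints one clean link per line, without quotes or attribute names.
        extractLinks(texto2).forEach(System.out::println);
    }
}
```

The lazy (.*?) stops at the first matching closing quote instead of running to the end of the line, which is what keeps the quotes out of the captured value. For real HTML documents jsoup is still the more robust choice.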