I have a large string in the following format:
<a href="12345.html"><a href="12345.html"><a href="12345.html"><a href="12345.html">
<a href="12345.html"><a href="12345.html"><a href="12345.html"><a href="12345.html">
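I want to store every occurrence of the value that appears before .html, so the HTML above becomes something like 12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html
Do I need a regular expression, or is there some alternative approach?
Thanks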
Answer 0: (score: 1)
You can use an HTML parser such as Jsoup.
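import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(yourString);
Elements els = doc.select("a");
for (Element el : els) {
    String href = el.attr("href");
    // Do this only if you need the number without the ".html" suffix;
    // otherwise href itself is the value you want.
    if (href.contains(".html")) {
        // Cut at the literal ".html"; String.split(".html") would treat
        // the dot as a regex wildcard.
        System.out.println(href.substring(0, href.indexOf(".html")));
    }
}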
Don't use regular expressions to parse HTML.
Answer 1: (score: 1)
You don't actually need a regular expression, but you can use the underlying Matcher class:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String searchString = "12345.html";
final String txt =
    "<a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\">\n"
    + "<a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\"><a href=\"12345.html\">";
// Pattern.LITERAL makes the search string match literally, not as a regex
final Matcher matcher = Pattern.compile(searchString, Pattern.LITERAL).matcher(txt);
final StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    // Separate matches with commas, but not before the first one
    if (sb.length() > 0) sb.append(',');
    sb.append(matcher.group());
}
System.out.println(sb.toString());
Output:
12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html,12345.html
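The literal match above only works when the file name is known in advance. If the href values differ, one possible variation is a pattern with a capturing group. This is only a sketch: the class name HrefExtractor, the method extractHrefs, and the pattern itself are made up for illustration, and it assumes every href is double-quoted and ends in .html. For arbitrary HTML a real parser is still the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {
    // Assumed input shape: href="something.html" with double quotes.
    // Group 1 captures everything between the quotes.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]*\\.html)\"");

    // Returns every captured href value, in order of appearance.
    static List<String> extractHrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "<a href=\"12345.html\"><a href=\"678.html\">";
        // prints 12345.html,678.html
        System.out.println(String.join(",", extractHrefs(txt)));
    }
}
```

Collecting into a List rather than a StringBuilder leaves the choice of separator to the caller, for example via String.join.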
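Answer 2: (score: -1)
If you are accessing this string in Java code, you can split it on the "=" delimiter. That yields a bunch of strings. One string will look like “
So the steps are: 1. Split the string, which yields an array of strings. 2. Iterate over the resulting array and look for the pattern ">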
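The answer above is cut off, so the following is only a guess at how its two steps might fit together, not the answerer's actual code. It assumes each piece after a "=" starts with a double quote and that the value ends at the first `">`. The names SplitExtractor and extractBySplit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SplitExtractor {
    static List<String> extractBySplit(String html) {
        List<String> result = new ArrayList<>();
        // Step 1: split on '=' to get an array of pieces.
        String[] pieces = html.split("=");
        // Step 2: keep the pieces that look like "value"> and
        // take the text between the opening quote and the closing ">.
        for (String piece : pieces) {
            int end = piece.indexOf("\">");
            if (piece.startsWith("\"") && end > 1) {
                result.add(piece.substring(1, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "<a href=\"12345.html\"><a href=\"12345.html\">";
        // prints 12345.html,12345.html
        System.out.println(String.join(",", extractBySplit(txt)));
    }
}
```

This breaks as soon as an attribute value itself contains "=" or the markup uses single quotes, which is part of why the parser-based answer is more robust.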