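I want to get all h4 tags inside a specific div on a website and put them into a List(View). Here is a bit of code I tried. It does not use jsoup, and for reasons I don't understand it only picks up every second (or at least not every) h4 tag from the site: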

    Pattern firstNewsPattern = Pattern.compile("<h3><strong>Aktuelle Meldungen</strong></h3>(.*?)<hr />");
    Pattern newsPattern = Pattern.compile("<hr />(.*?)<hr />");
    Pattern newsHeaderPattern = Pattern.compile("<h4>(.*?)</h4>");
    Pattern hrefPattern = Pattern.compile("href=\"(.*?)\"");
    Matcher newsHeader = null;
    Matcher href = null;
    Matcher firstNews = firstNewsPattern.matcher(html);
    if(firstNews.find()) {
        String content = firstNews.group(1).replace("./", "http://www.muckendorf-wipfing.at/");
        href = hrefPattern.matcher(content);
        while(href.find()) {
            String url = href.group(1);
            if(!url.contains("/")) {
                content = content.replace("href=\"" + url + "\"", "href=\"" + "http://www.muckendorf-wipfing.at/" + url + "\"");
            }
        }
        newsHeader = newsHeaderPattern.matcher(content);
        if(newsHeader.find())
            ret.add(new News(newsHeader.group(1).replaceAll("<(.*?)>", "").replaceAll("&#\\d{4};", ""), content));
    }
    Matcher news = newsPattern.matcher(html);
    while(news.find()) {
        String content = news.group(1).replace("./", "http://www.muckendorf-wipfing.at/");
        href = hrefPattern.matcher(content);
        while(href.find()) {
            String url = href.group(1);
            if(!url.contains("/")) {
                content = content.replace("href=\"" + url + "\"", "href=\"" + "http://www.muckendorf-wipfing.at/" + url + "\"");
            }
        }
        newsHeader = newsHeaderPattern.matcher(content);
        if(newsHeader.find())
            ret.add(new News(newsHeader.group(1).replaceAll("<(.*?)>", "").replaceAll("&#\\d{4};", ""), content));
    }

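One possible explanation for the "every second" symptom, offered as a sketch rather than a diagnosis of the real page: `<hr />(.*?)<hr />` consumes the closing `<hr />` of each match, so the next search starts after it and the section in between is skipped. The class name `HrSectionDemo`, the sample HTML, and the use of `Pattern.DOTALL` are assumptions made for illustration. Without `DOTALL`, `.` also does not match line breaks, which could drop further sections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrSectionDemo {
    // Same shape as newsPattern above: each match swallows its closing <hr />.
    static List<String> consuming(String html) {
        return collect(Pattern.compile("<hr />(.*?)<hr />", Pattern.DOTALL), html);
    }

    // Variant with a lookahead: the closing <hr /> is checked but not consumed,
    // so it can serve as the opening delimiter of the next match.
    static List<String> lookahead(String html) {
        return collect(Pattern.compile("<hr />(.*?)(?=<hr />)", Pattern.DOTALL), html);
    }

    private static List<String> collect(Pattern pattern, String html) {
        List<String> sections = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            sections.add(m.group(1));
        }
        return sections;
    }

    public static void main(String[] args) {
        // Hypothetical input: four news sections separated by <hr />.
        String html = "<hr /><h4>A</h4><hr /><h4>B</h4><hr /><h4>C</h4><hr /><h4>D</h4><hr />";
        System.out.println(consuming(html)); // sections A and C only
        System.out.println(lookahead(html)); // sections A, B, C and D
    }
}
```

If this is indeed the cause, an HTML parser such as jsoup would avoid the delimiter problem altogether, since it selects elements instead of slicing the text between separators.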
Since the regex-based snippet wasn't written entirely by me and I don't fully understand it, I tried to rewrite it myself with jsoup to get it working:

    List<News> ret = new ArrayList();
    getSharedPref sharedPrefMethod = new getSharedPref();
    SharedPreferences sharedPref = sharedPrefMethod.getSharedPref();
    String result = "";
    try {
        String pattern = "(\\<h3>\\.\\<h3>) (\\</h3>)";
        Pattern r = Pattern.compile(pattern);
        String html1 = html0.replace(Pattern.quote("<em>Taxigutscheine !NEU! (zum Vergrößern auf das Bild klicken)</em>"), Matcher.quoteReplacement("<h4>Neue Taxigutscheine!</h4>"));
        String html2 = html1.replace(Pattern.quote("<h3>"), Matcher.quoteReplacement("<h4>"));
        String html3 = html2.replace(Pattern.quote("</h3>"), Matcher.quoteReplacement("</h4>"));
        String html4 = html3.replaceFirst(Pattern.quote("<h4>"), Matcher.quoteReplacement("<h3>"));
        String finalHTML = html4.replaceFirst(Pattern.quote("</h4>"), Matcher.quoteReplacement("</h3>"));
        Matcher m = r.matcher(finalHTML);
        if (m.find()) {
        } else {
        }
        result = finalHTML.substring(finalHTML.indexOf("<h3>") + 4, finalHTML.indexOf("</h3>"));
        SharedPreferences.Editor editor = sharedPref.edit();
        editor.putString("AktuelleMeldungenHeadline", result);
        editor.commit();
    }catch(Exception e){
    }
    result = sharedPref.getString("AktuelleMeldungenHeadline", "");
    ret.add(new News(result, result));

Can someone help me get this working, so that I get every h3 tag from the div#content of this website? Thanks!
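One detail in the second snippet that may be worth checking: `String.replace` treats both arguments as literal text, whereas `String.replaceFirst` and `String.replaceAll` take a regular expression. Wrapping the search text in `Pattern.quote` before passing it to `replace` therefore makes it look for the literal characters `\Q<h3>\E`, which presumably never occur in the page. The sketch below illustrates the difference; the class name `LiteralReplaceDemo` and the sample strings are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplaceDemo {
    // Mirrors the calls in the snippet: a quoted pattern handed to the literal replace().
    static String quotedWithReplace(String html) {
        return html.replace(Pattern.quote("<h3>"), Matcher.quoteReplacement("<h4>"));
    }

    // replace() already works on literal text, so no quoting is needed.
    static String plainReplace(String html) {
        return html.replace("<h3>", "<h4>");
    }

    // replaceFirst() interprets its first argument as a regex, so quoting fits here.
    static String quotedWithReplaceFirst(String html) {
        return html.replaceFirst(Pattern.quote("<h4>"), Matcher.quoteReplacement("<h3>"));
    }

    public static void main(String[] args) {
        System.out.println(quotedWithReplace("<h3>News</h3>"));               // input comes back unchanged
        System.out.println(plainReplace("<h3>News</h3>"));                    // opening tag becomes <h4>
        System.out.println(quotedWithReplaceFirst("<h4>a</h4><h4>b</h4>"));   // only the first <h4> changes
    }
}
```

Under that reading, the `replace` calls in the snippet leave the HTML untouched, and only the two `replaceFirst` calls have any effect.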