Get all h4 tags into a List(View)

Date: 2015-11-09 19:16:13

Tags: java android regex list jsoup

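I want to get all h4 tags inside a specific div of a website and put them all into a List(View). Here is a bit of code I tried. It doesn't use jsoup, and for some reason it only picks up every second h4 tag from the website (or at least not every one):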

// Block between the "Aktuelle Meldungen" heading and the first <hr />
Pattern firstNewsPattern = Pattern.compile("<h3><strong>Aktuelle Meldungen</strong></h3>(.*?)<hr />");
// Each later block between two <hr /> tags
Pattern newsPattern = Pattern.compile("<hr />(.*?)<hr />");
// Heading of one news item
Pattern newsHeaderPattern = Pattern.compile("<h4>(.*?)</h4>");
// Value of an href attribute
Pattern hrefPattern = Pattern.compile("href=\"(.*?)\"");
Matcher newsHeader = null;
Matcher href = null;

Matcher firstNews = firstNewsPattern.matcher(html);
if(firstNews.find()) {
    // Make "./" links absolute
    String content = firstNews.group(1).replace("./", "http://www.muckendorf-wipfing.at/");
    href = hrefPattern.matcher(content);
    while(href.find()) {
        String url = href.group(1);
        // Prefix bare file names (no "/") with the site URL
        if(!url.contains("/")) {
            content = content.replace("href=\"" + url + "\"", "href=\"" + "http://www.muckendorf-wipfing.at/" + url + "\"");
        }
    }
    newsHeader = newsHeaderPattern.matcher(content);
    // Strip inner tags and four-digit numeric entities (e.g. &#8211;) from the heading
    if(newsHeader.find())
        ret.add(new News(newsHeader.group(1).replaceAll("<(.*?)>", "").replaceAll("&#\\d{4};", ""), content));
}

Matcher news = newsPattern.matcher(html);
while(news.find()) {
    String content = news.group(1).replace("./", "http://www.muckendorf-wipfing.at/");
    href = hrefPattern.matcher(content);
    while(href.find()) {
        String url = href.group(1);
        if(!url.contains("/")) {
            content = content.replace("href=\"" + url + "\"", "href=\"" + "http://www.muckendorf-wipfing.at/" + url + "\"");
        }
    }
    newsHeader = newsHeaderPattern.matcher(content);
    if(newsHeader.find())
        ret.add(new News(newsHeader.group(1).replaceAll("<(.*?)>", "").replaceAll("&#\\d{4};", ""), content));
}
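A likely cause of the "every second h4" symptom: `newsPattern` (`<hr />(.*?)<hr />`) consumes the closing `<hr />` of each match, so the next `find()` starts after it, and the block that begins with that same `<hr />` is never matched. In addition, `.` does not match line breaks unless `Pattern.DOTALL` is set, so blocks spanning several lines would be skipped as well. The sketch below assumes, as the patterns above do, that news items are separated by `<hr />`; the class name `HrBlocks` and the sample HTML are made up for illustration. It uses a lookahead so the closing `<hr />` is left for the next match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrBlocks {
    // (?=<hr />) is a lookahead: it requires a following <hr /> but does not consume it,
    // so the next find() can start at that same <hr />.
    // DOTALL lets '.' also match line breaks inside a block.
    private static final Pattern NEWS =
            Pattern.compile("<hr />(.*?)(?=<hr />)", Pattern.DOTALL);
    private static final Pattern HEADER =
            Pattern.compile("<h4>(.*?)</h4>", Pattern.DOTALL);

    // Returns the <h4> text of every block that lies between two <hr /> tags.
    public static List<String> headers(String html) {
        List<String> result = new ArrayList<>();
        Matcher news = NEWS.matcher(html);
        while (news.find()) {
            Matcher header = HEADER.matcher(news.group(1));
            if (header.find()) {
                // Strip inner tags such as <strong>
                result.add(header.group(1).replaceAll("<[^>]*>", "").trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<hr /><h4>One</h4><p>a</p>"
                + "<hr /><h4><strong>Two</strong></h4>\n<p>b</p>"
                + "<hr /><h4>Three</h4><p>c</p><hr />";
        System.out.println(headers(html)); // prints [One, Two, Three]
    }
}
```

With the original pattern, the same input would yield only the first and third items, which matches the behaviour described in the question. Regex remains fragile for HTML, though; a parser such as jsoup avoids these pitfalls.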

Since this snippet wasn't written 100% by me and I don't even fully understand it, I tried to write it again myself with jsoup to get it working:

List<News> ret = new ArrayList<>();
getSharedPref sharedPrefMethod = new getSharedPref();
SharedPreferences sharedPref = sharedPrefMethod.getSharedPref();
String result = "";
try {
    // String.replace() is literal, so the arguments must not be wrapped in Pattern.quote()
    String html1 = html0.replace("<em>Taxigutscheine !NEU! (zum Vergrößern auf das Bild klicken)</em>", "<h4>Neue Taxigutscheine!</h4>");
    // Turn every <h3> into <h4> ...
    String html2 = html1.replace("<h3>", "<h4>");
    String html3 = html2.replace("</h3>", "</h4>");
    // ... then turn only the first heading back into <h3> (replaceFirst() takes a regex, so quoting is correct here)
    String html4 = html3.replaceFirst(Pattern.quote("<h4>"), Matcher.quoteReplacement("<h3>"));
    String finalHTML = html4.replaceFirst(Pattern.quote("</h4>"), Matcher.quoteReplacement("</h3>"));
    // Text between the first <h3> and </h3>
    result = finalHTML.substring(finalHTML.indexOf("<h3>") + 4, finalHTML.indexOf("</h3>"));
    SharedPreferences.Editor editor = sharedPref.edit();
    editor.putString("AktuelleMeldungenHeadline", result);
    editor.commit();
} catch (Exception e) {
    // Log instead of silently swallowing the exception
    android.util.Log.e("News", "Could not parse headline", e);
}
result = sharedPref.getString("AktuelleMeldungenHeadline", "");
ret.add(new News(result, result));
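A pitfall worth knowing for this kind of code: `String.replace(CharSequence, CharSequence)` replaces literal text, while `Pattern.quote()` wraps its argument in `\Q...\E` for use inside a regex. Passing `Pattern.quote("<h3>")` to `replace()` therefore searches for the literal string `\Q<h3>\E`, which never occurs in the page, and nothing is replaced. `replaceFirst()` and `replaceAll()` do take a regex, so quoting is appropriate there. A small demonstration (the class and method names are just for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Broken: replace() is literal, so it looks for the text "\Q<h3>\E"
    public static String quotedLiteral(String html) {
        return html.replace(Pattern.quote("<h3>"), Matcher.quoteReplacement("<h4>"));
    }

    // Works: plain literal replacement of every occurrence
    public static String plainLiteral(String html) {
        return html.replace("<h3>", "<h4>");
    }

    // Works: replaceFirst() takes a regex, so quoting the pattern is appropriate
    public static String quotedRegex(String html) {
        return html.replaceFirst(Pattern.quote("<h3>"), Matcher.quoteReplacement("<h4>"));
    }

    public static void main(String[] args) {
        String html = "<h3>A</h3><h3>B</h3>";
        System.out.println(quotedLiteral(html)); // prints <h3>A</h3><h3>B</h3>
        System.out.println(plainLiteral(html));  // prints <h4>A</h3><h4>B</h3>
        System.out.println(quotedRegex(html));   // prints <h4>A</h3><h3>B</h3>
    }
}
```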

Can someone help me get this working, so that I get every h3 tag from the div#content of this website? Thanks!
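With jsoup itself, the headings can be selected with a CSS selector instead of regex or string surgery. A minimal sketch, assuming jsoup is on the classpath and that the headings really sit inside `<div id="content">`; the class name `ContentHeadings` and the sample HTML are hypothetical, and the tag name is a parameter because the question mentions both h3 and h4:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ContentHeadings {
    // Returns the text of every <tag> element (e.g. "h3" or "h4") inside <div id="content">.
    public static List<String> headings(String html, String tag) {
        // The base URI lets Element.absUrl("href") resolve relative links like "./news.html"
        Document doc = Jsoup.parse(html, "http://www.muckendorf-wipfing.at/");
        List<String> result = new ArrayList<>();
        for (Element heading : doc.select("div#content " + tag)) {
            // text() drops inner tags like <strong> and decodes entities such as &#8211;
            result.add(heading.text());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div id=\"content\">"
                + "<h3><strong>Aktuelle Meldungen</strong></h3>"
                + "<h4>Erste Meldung</h4><hr /><h4>Zweite Meldung</h4>"
                + "</div><h4>Outside</h4>";
        System.out.println(headings(html, "h4")); // prints [Erste Meldung, Zweite Meldung]
        System.out.println(headings(html, "h3")); // prints [Aktuelle Meldungen]
    }
}
```

On Android the page itself could be fetched with `Jsoup.connect(url).get()`, but that network call has to run off the main thread. The resulting `List<String>` could then be handed to something like `new ArrayAdapter<>(context, android.R.layout.simple_list_item_1, headings)` and set on the `ListView`.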

0 answers