Question

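I am trying to extract some content (links, images, etc.) from a news web page and add it to my app. I am using the jsoup library for parsing. Below is the sample code that parses the data. The code usually works, but because every query relies on specific keywords, the app can crash whenever one of those keywords changes in the HTML. For example, if jsoup cannot connect to the site or the relevant attributes do not match, I want the app to skip that link and move on to the others. So how can I avoid app crashes caused by jsoup? Is there a better way to extract data from news web pages?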

    try {
        // Connect to the related page
        Document relatedNewsPage = Jsoup.connect(link)
                .userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
                .referrer("http://www.google.com")
                .get();

        // Parse the news title, image link and update time
        String title = relatedNewsPage.selectFirst("meta[property=og:title]").attr("content");
        String imageLink = relatedNewsPage.selectFirst("meta[property=og:image]").absUrl("content");
        String updateTime = relatedNewsPage.selectFirst("meta[property=article:modified_time]").attr("content");

        // Check whether the links and the title are valid
        if (Patterns.WEB_URL.matcher(link).matches() &&
                Patterns.WEB_URL.matcher(imageLink).matches() &&
                !TextUtils.isEmpty(title)) {

            // Add the news item to the list
            news.add(new NewsItem(getContext().getResources().getString(R.string.source_ntv),
                    link,
                    title,
                    imageLink,
                    updateTime));
        }
    } catch (IOException e) {
        Log.e(LOG_TAG, "Problem parsing the related news URL result", e);
    }
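One likely cause of the crash in the snippet above is that `selectFirst` returns `null` when no element matches, so the chained `.attr(...)` or `.absUrl(...)` call throws a `NullPointerException`, which the `catch (IOException e)` block does not handle. A minimal sketch of a null-safe variant follows, assuming a reasonably recent jsoup version. The names `SafeNewsExtractor`, `NewsMeta`, `metaContent`, `extract`, `fromHtml` and `fetch` are hypothetical and not part of the original code, and `NewsMeta` only stands in for the question's `NewsItem`. `java.util.Optional` is assumed to be available (on Android, API level 24 or later).

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper; none of these names come from the original post.
final class SafeNewsExtractor {

    /** Hypothetical value holder standing in for the question's NewsItem. */
    static final class NewsMeta {
        final String title;
        final String imageLink;
        final String updateTime;

        NewsMeta(String title, String imageLink, String updateTime) {
            this.title = title;
            this.imageLink = imageLink;
            this.updateTime = updateTime;
        }
    }

    /** Returns the "content" attribute of the first match, or "" when nothing matches. */
    static String metaContent(Document doc, String cssQuery, boolean absolute) {
        Element el = doc.selectFirst(cssQuery); // null when no element matches
        if (el == null) {
            return "";
        }
        return absolute ? el.absUrl("content") : el.attr("content");
    }

    /** Builds a NewsMeta, or returns empty when the title or image link is missing. */
    static Optional<NewsMeta> extract(Document doc) {
        String title = metaContent(doc, "meta[property=og:title]", false);
        String imageLink = metaContent(doc, "meta[property=og:image]", true);
        String updateTime = metaContent(doc, "meta[property=article:modified_time]", false);
        if (title.isEmpty() || imageLink.isEmpty()) {
            return Optional.empty(); // caller skips this page
        }
        return Optional.of(new NewsMeta(title, imageLink, updateTime));
    }

    /** Parses HTML that is already in memory; no network access involved. */
    static Optional<NewsMeta> fromHtml(String html, String baseUri) {
        return extract(Jsoup.parse(html, baseUri));
    }

    /** Fetches and parses one link; any failure yields an empty result instead of a crash. */
    static Optional<NewsMeta> fetch(String link) {
        try {
            Document doc = Jsoup.connect(link)
                    .userAgent("Mozilla/5.0")
                    .referrer("http://www.google.com")
                    .get();
            return extract(doc);
        } catch (IOException | RuntimeException e) {
            // Deliberately broad (an assumption of this sketch): covers connection
            // errors, HTTP errors and malformed URLs. Log it in a real app.
            return Optional.empty();
        }
    }
}
```

In a loop over links, something like `SafeNewsExtractor.fetch(link).ifPresent(meta -> ...)` would then add only the pages that parsed completely and silently skip the rest. The `Patterns.WEB_URL` check from the original code could still be applied to `meta.imageLink` before adding the item.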

How can I avoid app crashes when parsing HTML with jsoup?

0 answers: