Question

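I am writing an Android app that reads some information from a website and displays it on the app's screen. I am using the Jsoup library to get the information as a string. First, here is what the website's HTML looks like: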

<strong>
   Now is the time<br />
   For all good men<br />
   To come to the aid<br />
   Of their country<br />
</strong>

Here is how I am retrieving the text and trying to parse it:

Document document = Jsoup.connect(WEBSITE_URL).get();
resultAggregator = "";

Elements nodePhysDon = document.select("strong");

//check results
if (nodePhysDon.size()> 0) {
   //get value
   donateResult = nodePhysDon.get(0).text();
   resultAggregator = donateResult;
}

if (resultAggregator != "") {
   // split resultAggregator into an array breaking up with br /
   String donateItems[] = resultAggregator.split("<br />");
}

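But then donateItems[0] is not just "Now is the time"; it is all four strings put together. I have also tried it without the space between "br" and "/", and I get the same result. If I do resultAggregator.split("br"); then donateItems[0] is just the first word: "Now".

I suspect the problem is that the Jsoup select method is stripping the tags?

Any suggestions? I can't change the website's HTML. I have to work with it as is.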

Answer 1

Try this:

// check results
if (nodePhysDon.size() > 0) {
   // use toString() (the element's outer HTML) so the tags are kept;
   // text() returns only the text content, with no <br> left to split on
   donateResult = nodePhysDon.get(0).toString();
   resultAggregator = donateResult;
}

// compare string contents with isEmpty(); != "" only compares references
if (!resultAggregator.isEmpty()) {
   // remove the <strong> and </strong> tags
   resultAggregator = resultAggregator.replace("<strong>", "");
   resultAggregator = resultAggregator.replace("</strong>", "");
   // then split on <br>; each item may still need trim()
   String[] donateItems = resultAggregator.split("<br>");
}

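Be sure to split on <br> rather than <br />: in its default HTML output mode, Jsoup writes the tag as <br>, regardless of how the source page spelled it.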
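If you would rather not depend on exactly how the tag is spelled, one option is to split on a regular expression that accepts <br>, <br/> and <br /> alike, then trim each piece. The following is only a minimal sketch using the standard Java library: BrSplitter and splitOnBr are hypothetical names made up for this example, and the input is assumed to be an HTML fragment you have already obtained as a string (for instance from nodePhysDon.get(0).html(), which returns the inner HTML without the <strong> wrapper).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jsoup.
public class BrSplitter {
    // Matches <br>, <br/> and <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    /** Splits an HTML fragment on line-break tags and returns the trimmed, non-empty pieces. */
    public static List<String> splitOnBr(String html) {
        List<String> lines = new ArrayList<>();
        for (String part : BR.split(html)) {
            String trimmed = part.trim();   // drop indentation and newlines around each piece
            if (!trimmed.isEmpty()) {       // skip whitespace-only pieces, e.g. after the last <br>
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed input: the inner HTML of the <strong> element from the question.
        String fragment = "Now is the time<br />\n   For all good men<br>\n"
                + "   To come to the aid<br/>\n   Of their country<br />\n";
        for (String line : splitOnBr(fragment)) {
            System.out.println(line);
        }
    }
}
```

With this approach the <strong> tags never need to be removed by hand, and the result does not change if the output syntax of the serializer does.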
