JSoup - How to parse nested text?

Time: 2018-03-13 17:32:27

Tags: java parsing jsoup

I am using JSoup to parse a website's HTML. I want to parse this part:

<td class="lastpost">
This is a text 1<br>
<a href="post/13594">Website Page - 1</a>
</td>

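I want the leading text, the link text, and the link's href as separate values. How can I get the parts like this?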

1 Answer:

Answer 0: (score: 1)

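Your code only gets all of the text inside the td element you selected. If you want to store the texts in separate variables, you should grab each part separately, as in the code below. I added extra comments so you can see how and why each piece is obtained.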

// Get the first td element that has class="lastpost"
Element lastPost = document.select("td.lastpost").first();
// Get the first a element that is a child of the td
Element linkElement = lastPost.getElementsByTag("a").first();

// This text is the first child node of td; toString() returns that node's HTML source,
// so trim the whitespace around it
String text = lastPost.childNode(0).toString().trim();
// This is the text within the a (link) element
String textNo = linkElement.text();
// This text is the href attribute value of the a (link) element
String link = linkElement.attr("href");
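
For anyone who wants to try this end to end, here is a minimal runnable sketch of the same idea. It assumes the jsoup library is on the classpath. The class name LastPostParser, the SAMPLE_HTML constant, and the parse helper are hypothetical names made up for illustration. The sample markup is wrapped in a table and a row, since an HTML parser generally discards a td that is not inside a table. It also uses ownText() instead of childNode(0).toString(), which is a swapped-in alternative: it returns the td's own text without its child elements and does not depend on the text being the first child node.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; the names here are illustrative, not from the question.
class LastPostParser {
    // Markup from the question, wrapped in <table><tr> (an assumption) so the parser keeps the td.
    static final String SAMPLE_HTML =
        "<table><tr><td class=\"lastpost\">\n"
        + "This is a text 1<br>\n"
        + "<a href=\"post/13594\">Website Page - 1</a>\n"
        + "</td></tr></table>";

    // Returns {text, linkText, href}, or null if the td or its link is missing.
    static String[] parse(String html) {
        Document document = Jsoup.parse(html);

        // First td element that has class="lastpost"
        Element lastPost = document.select("td.lastpost").first();
        if (lastPost == null) {
            return null;
        }

        // First a element inside that td
        Element linkElement = lastPost.select("a").first();
        if (linkElement == null) {
            return null;
        }

        // Text that belongs to the td itself, excluding the text of child elements
        String text = lastPost.ownText();
        // Text within the a (link) element
        String linkText = linkElement.text();
        // Value of the href attribute of the a (link) element
        String href = linkElement.attr("href");

        return new String[] { text, linkText, href };
    }

    public static void main(String[] args) {
        String[] parts = parse(SAMPLE_HTML);
        if (parts == null) {
            System.out.println("td.lastpost or its link was not found");
            return;
        }
        System.out.println("text = " + parts[0]);
        System.out.println("textNo = " + parts[1]);
        System.out.println("link = " + parts[2]);
    }
}
```

The null checks are there because first() returns null when nothing matches the selector, which would otherwise surface as a NullPointerException on a page that lacks the expected cell.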