Question

I'm trying to find all the elements in HTML like this:

<body>
My text without tag
<br>Some title</br>
<img class="image" src="url">
My second text without tag
<p>Some Text</p>
<p class="MsoNormal">Some text</p>
<ul>
<li>1</li>
<li>2</li>
</ul>
</body>

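I need to get all the elements, including the parts that are not inside any tag. How can I get them?

P.S.: I need to end up with an array of "Element" objects, one for each element.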

Answer 1

I'm not sure whether you are asking how to retrieve all the text in the HTML. If so, you can simply do the following:

String html; // your html code
Document doc = Jsoup.parse(html); //parse the string
System.out.println(doc.text());   // get all the text from tags.

Output:

My text without tag Some title My second text without tag Some Text Some text 1 2
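
If the goal is the individual pieces rather than one flattened string, something along the following lines may help. It is only a minimal sketch, assuming jsoup is on the classpath: it walks the direct children of <body> with childNodes(), which returns untagged text as TextNode objects next to the regular Element objects. The class name BodyNodes and the method describe are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class BodyNodes {

    // Hypothetical helper: describes each direct child of <body>,
    // keeping untagged text and tagged elements in document order.
    static List<String> describe(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Node node : doc.body().childNodes()) {
            if (node instanceof TextNode) {
                TextNode text = (TextNode) node;
                if (!text.isBlank()) { // skip whitespace-only nodes between tags
                    result.add("text: " + text.text().trim());
                }
            } else if (node instanceof Element) {
                Element element = (Element) node;
                result.add("element <" + element.tagName() + ">: " + element.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the HTML from the question.
        String html = "<body>My text without tag<p>Some Text</p>"
                + "My second text without tag<ul><li>1</li><li>2</li></ul></body>";
        describe(html).forEach(System.out::println);
    }
}
```

If a real Element is required for every piece, as the P.S. asks, one option is to wrap each non-blank text node first, for example with wrap("<span></span>"), and then call children() on the body.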

Answer 2

If you are working with an HTML file, you can use the following code to retrieve each tag you need. The API is Jsoup. You can find more examples at http://jsoup.org/

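File input = new File(htmlFilePath);
InputStream is = new FileInputStream(input);
String html = IOUtils.toString(is);        // IOUtils comes from Apache Commons IO
Document htmlDoc = Jsoup.parse(html);      // parse the HTML string
Elements pElements = htmlDoc.select("p");  // select every <p> element
Element pElement1 = pElements.get(0);      // the first <p> element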
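
As a variation on the snippet above, jsoup can also read the file itself, so no stream has to be opened by hand. The sketch below rests on a few assumptions: the file is UTF-8 encoded, jsoup is on the classpath, and the names AllElementsFromFile and bodyElements are purely illustrative. It uses Jsoup.parse(File, String) and the selector "body *" to collect every element below <body> instead of only the <p> tags.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class AllElementsFromFile {

    // Hypothetical helper: parses the file (assumed to be UTF-8) and
    // returns every element nested under <body>, not <body> itself.
    static Elements bodyElements(File htmlFile) throws IOException {
        Document doc = Jsoup.parse(htmlFile, "UTF-8");
        return doc.select("body *");
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the sketch runs on its own.
        Path sample = Files.createTempFile("sample", ".html");
        Files.write(sample,
                "<body><p>Some Text</p><ul><li>1</li><li>2</li></ul></body>"
                        .getBytes(StandardCharsets.UTF_8));
        for (Element element : bodyElements(sample.toFile())) {
            // ownText() is the text directly inside this element only.
            System.out.println(element.tagName() + " -> " + element.ownText());
        }
        Files.delete(sample);
    }
}
```

Note that this returns elements only. Text that sits directly in <body> without a tag is not an Element, so it will not appear in the result.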

Get all elements using Jsoup
