I'm stuck on how to parse this data into key-value pairs. Please guide me.
<div class="content">
<div class="label">Company Name: </div>
Cartell Chemical Co., Ltd.
<br/>
<div class="label">Business Owner: </div>
Michael Chen
<br/>
<div class="label">Employees: </div>
210
<br/>
<div class="label">Main markets: </div>
North America, Europe, China, South Asia
<br/>
<div class="label">Business Type: </div>
Manufacturer
<br/>
</div>
I need the output in the following format. Please guide me, using Java and the Jsoup library.
Company Name:Cartell Chemical Co., Ltd.
Business Owner:Michael Chen
Employees:210
Main markets:North America, Europe, China, South Asia
Business Type:Manufacturer
Answer 0: (score: 4)
See the documentation.
Here is a working example:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class StackOverflow20973268 {
    private static String input = "<div class=\"content\">" +
        "<div class=\"label\">Company Name: </div>" +
        "Cartell Chemical Co., Ltd." +
        "<br/>" +
        "<div class=\"label\">Business Owner: </div>" +
        "Michael Chen" +
        "<br/>" +
        "<div class=\"label\">Employees: </div>" +
        "210" +
        "<br/>" +
        "<div class=\"label\">Main markets: </div>" +
        "North America, Europe, China, South Asia" +
        "<br/>" +
        "<div class=\"label\">Business Type: </div>" +
        "Manufacturer" +
        "<br/>" +
        "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(input);
        // Each label div is immediately followed by a text node holding its value.
        Elements labels = doc.select("div.content div.label");
        for (Element label : labels) {
            // label.text() already ends with ':', so no extra separator is added.
            System.out.println(String.format("%s%s", label.text().trim(),
                label.nextSibling().outerHtml().trim()));
        }
    }
}
Output:
Company Name:Cartell Chemical Co., Ltd.
Business Owner:Michael Chen
Employees:210
Main markets:North America, Europe, China, South Asia
Business Type:Manufacturer
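Since the question asks for key-value pairs, the same label-plus-next-sibling idea can also fill a map instead of printing directly. The following is only a minimal sketch, not part of the original answer: the class name `LabelMapSketch`, the method `parse`, and the shortened `SAMPLE` string are hypothetical names made up for illustration, and it assumes every value is a plain text node that directly follows its label div.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class; the name is illustrative only.
final class LabelMapSketch {

    // Shortened version of the HTML from the question (assumption: same structure).
    static final String SAMPLE =
        "<div class=\"content\">"
        + "<div class=\"label\">Company Name: </div>Cartell Chemical Co., Ltd.<br/>"
        + "<div class=\"label\">Employees: </div>210<br/>"
        + "</div>";

    // Returns label -> value in document order, with the trailing ':' removed from keys.
    static Map<String, String> parse(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> result = new LinkedHashMap<>();
        for (Element label : doc.select("div.content div.label")) {
            String key = label.text().trim();
            if (key.endsWith(":")) {
                key = key.substring(0, key.length() - 1).trim();
            }
            // nextSibling() may be null or a non-text node, so check before casting.
            Node next = label.nextSibling();
            String value = (next instanceof TextNode)
                ? ((TextNode) next).text().trim()
                : "";
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((k, v) -> System.out.println(k + ":" + v));
    }
}
```

A `LinkedHashMap` is used so the pairs come back in the order they appear on the page. A label with no following text node ends up with an empty string rather than causing a `NullPointerException`.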
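Answer 1: (score: -1)
The Jsoup library is well suited to parsing HTML. It lets you extract values by class/id name or by traversing the DOM tree. You basically get a div element and look at its child nodes, which can be either text nodes (containing the text to parse) or other elements with child nodes of their own. For example, you could do something like this (untested pseudocode):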
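// info is assumed to hold the HTML string from the question
Document doc = Jsoup.parse(info);
Elements divs = doc.body().getElementsByTag("div");
for (Element divElement : divs) {
    // text nodes directly inside this div: divElement.textNodes()
    // then
    // the node that follows this div: divElement.nextSibling()
}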
Basically, find the element and step to its text or to the next/previous element.
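The child-node traversal described in this answer could be sketched roughly as follows. This is an illustration under assumptions rather than the answerer's code: the class name `ChildNodeWalkSketch` and the method `parse` are hypothetical, the keys keep their trailing colon, and `selectFirst` assumes a reasonably recent Jsoup version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class; the name is illustrative only.
final class ChildNodeWalkSketch {

    // Walks the direct children of div.content: a label div sets the current key,
    // and any non-blank text node after it is stored as that key's value.
    static Map<String, String> parse(String html) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Element content = Jsoup.parse(html).selectFirst("div.content");
        if (content == null) {
            return pairs; // no matching container in the input
        }
        String currentKey = null;
        for (Node child : content.childNodes()) {
            if (child instanceof Element) {
                Element el = (Element) child;
                if (el.hasClass("label")) {
                    currentKey = el.text().trim();
                }
                // other elements such as <br> are skipped
            } else if (child instanceof TextNode && currentKey != null) {
                String text = ((TextNode) child).text().trim();
                if (!text.isEmpty()) {
                    // join the pieces if a value is split across several text nodes
                    pairs.merge(currentKey, text, (a, b) -> a + " " + b);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\">"
            + "<div class=\"label\">Employees: </div>210<br/>"
            + "<div class=\"label\">Business Type: </div>Manufacturer<br/>"
            + "</div>";
        parse(html).forEach((k, v) -> System.out.println(k + v));
    }
}
```

Compared with looking only at `nextSibling()`, walking all child nodes tolerates whitespace-only text nodes and values that are split by inline markup, at the cost of a little more state.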