I am using jsoup to extract information from a web page. My code looks like this:
doc = Jsoup.connect(myurl).get();
Elements newsHeadlines = doc.select(".myclass");
If I call System.out.println on newsHeadlines, I get this:
<span class="cmtComentario">
<span class="blaicon"></span>
<span class="blacoment"><span class="cmtHora" data-hora=""></span>
<span class="blathing" data-minutoPartido="93'"></span>
<span class="blado"></span>
<span class="blahave">
Oh yeah!<br/></span>
</span>
</span>
<span class="cmtComentario">
<span class="blaicon"></span>
<span class="blacoment"><span class="cmtHora" data-hora=""></span>
<span class="blathing" data-health="97'"></span>
<span class="blado"></span>
<span class="blahave">
This is my world</span>
</span>
</span>
How can I save each block like the following as a separate element of an array?
<span class="cmtComentario">
<span class="blaicon"></span>
<span class="blacoment"><span class="cmtHora" data-hora=""></span>
<span class="blathing" data-health="92'"></span>
<span class="blado"></span>
<span class="blahave">
This is my world</span>
</span>
</span>
Thanks a lot.
Answer 0 (score: 1)
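newsHeadlines is simply a list of Element objects, because Elements implements List<Element>. You can therefore iterate over newsHeadlines the same way you would iterate over any list: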
for (Element element : newsHeadlines) {
    System.out.println(element.toString());
}
If this is not what you need (I have not tested the code), you can try Element.children(). It again returns an Elements collection that you can iterate over.
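To get from iteration to the array the question asks for, a minimal sketch along these lines might work. It assumes jsoup is on the classpath. The class name `BlockCollector` and the helper `blocksAsArray` are hypothetical names introduced only for illustration, and the inline markup is a shortened stand-in for the page fetched with `connect`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class BlockCollector {

    // Hypothetical helper: returns the outer HTML of every element
    // matching the CSS query, one array slot per matched block.
    static String[] blocksAsArray(Document doc, String cssQuery) {
        Elements matches = doc.select(cssQuery);
        String[] blocks = new String[matches.size()];
        for (int i = 0; i < matches.size(); i++) {
            // outerHtml() includes the matched tag itself, not only its children.
            blocks[i] = matches.get(i).outerHtml();
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup shown in the question.
        String markup = "<div>"
                + "<span class=\"cmtComentario\"><span class=\"blahave\">Oh yeah!</span></span>"
                + "<span class=\"cmtComentario\"><span class=\"blahave\">This is my world</span></span>"
                + "</div>";
        Document doc = Jsoup.parse(markup);
        for (String block : blocksAsArray(doc, ".cmtComentario")) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

If you would rather keep working with the parsed nodes than with strings, storing the Element objects themselves (for example with `matches.toArray(new Element[0])`) is probably the more flexible choice.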
Answer 1 (score: 0)
You can also wrap each comment in a div tag and use some Java 8 syntactic sugar to collect the Element instances into a List:
Elements elements = Jsoup.parse(markup).getAllElements().select(".myclass");
List<Element> comments = elements.stream().collect(Collectors.<Element>toList());
for (Element comment : comments) {
    System.out.println(comment.html());
}
For testing I used parse rather than the connect method.
This prints:
<span class="cmtComentario"> <span class="blaicon">1</span>.......
<span class="cmtComentario"> <span class="blaicon">2</span>........
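The same stream approach can also map each comment to its visible text instead of its HTML, which may be handy if only the comment wording is needed. This is a sketch under assumptions: jsoup is on the classpath, and the class name `CommentTexts` and helper `textsOf` are hypothetical names that do not appear in the original answer.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.jsoup.Jsoup;

public class CommentTexts {

    // Hypothetical helper: parses the markup and returns the text content
    // of each element matching the CSS query, in document order.
    static List<String> textsOf(String html, String cssQuery) {
        return Jsoup.parse(html)
                .select(cssQuery)
                .stream()
                .map(element -> element.text())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Shortened stand-in for the test markup used in this answer.
        String html = "<div class=\"myclass\"><span class=\"blahave\">Oh yeah!<br/></span></div>"
                + "<div class=\"myclass\"><span class=\"blahave\">This is my world</span></div>";
        for (String text : textsOf(html, ".myclass")) {
            System.out.println(text);
        }
    }
}
```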
Test markup:
String markup = "" +
"<div class=\"myclass\">\n" +
"<span class=\"cmtComentario\">\n" +
"<span class=\"blaicon\">1</span>\n" +
"<span class=\"blacoment\"><span class=\"cmtHora\" data-hora=\"\"></span>\n" +
"<span class=\"blathing\" data-minutoPartido=\"93'\"></span>\n" +
"<span class=\"blado\"></span>\n" +
"<span class=\"blahave\">\n" +
"Oh yeah!<br/></span>\n" +
"</span>\n" +
"</span>\n" +
"</div>" +
"<div class=\"myclass\">\n" +
"<span class=\"cmtComentario\">\n" +
"<span class=\"blaicon\">2</span>\n" +
"<span class=\"blacoment\"><span class=\"cmtHora\" data-hora=\"\"></span>\n" +
"<span class=\"blathing\" data-health=\"97'\"></span>\n" +
"<span class=\"blado\"></span>\n" +
"<span class=\"blahave\">\n" +
"This is my world</span>\n" +
"</span>\n" +
"</span>" +
"</div>";
Hope it helps!