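I have the following string, which is the body of an HTTP response. Using Java, I need to count the occurrences of each individual HTML tag and sort the tags by how many times they occur.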
"<div><p><span class="lede">Today, the European </span>Space Agency's Rosetta spacecraft will engage its thrusters for one final maneuver: a suicidal plunge toward the comet it has been orbiting for two years and chasing for a decade. After Rosetta collides with comet 67P/Churyumov-Gerasimenko....."
Could someone please help? Thanks in advance.
Answer 0 (score: 1)
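Use a library such as JSoup to get all the document's elements and work with them as needed.
You can create a HashMap<String, Long> that maps each tag name to its number of occurrences. Then you can recursively iterate over all the elements of the JSoup Document, updating the map on each pass, and finally sort the map's entries by count (a HashMap itself has no defined order, so the sorting is done on a copy of its entries).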
Don't forget to escape the quotes inside the string with backslashes:
String html = "<div class=\"like-this\">div content</div>";
For example, something like this (untested):
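import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Map<String, Long> counts = new HashMap<>();
String html = " your html string goes here ";
Document doc = Jsoup.parse(html);
// Start from the direct children of <body>. select("*") already returns every
// descendant, so recursing over that list would count nested tags more than once.
List<Element> elements = doc.body().children();
recursiveWalk(elements, counts);
// counts now maps each tag name to its number of occurrences; sort it here

// Walks the element tree depth-first, incrementing the count for each tag name
private void recursiveWalk(List<Element> elements, Map<String, Long> counts) {
    for (Element el : elements) {
        String tag = el.tagName();
        long number = counts.getOrDefault(tag, 0L) + 1;
        counts.put(tag, number);
        // Recurse into this element's own children, not the whole list
        recursiveWalk(el.children(), counts);
    }
}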
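The answer leaves the final sorting step as a comment. As a rough sketch of one way to do it, using only the standard library: copy the map's entries into a list and sort that list by count, highest first. The class name TagCountSorter, the method name sortByCountDescending, and the alphabetical tie-break are assumptions made for illustration, and the counts in main are made-up sample values, not the real counts for the string in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of JSoup or of the original answer
class TagCountSorter {

    // Returns the entries of counts ordered by count, highest first.
    // Ties are broken alphabetically by tag name (an assumption) so the order is deterministic.
    static List<Map.Entry<String, Long>> sortByCountDescending(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        // Illustrative counts only; in practice this map would come from the tree walk
        Map<String, Long> counts = new HashMap<>();
        counts.put("div", 1L);
        counts.put("p", 3L);
        counts.put("span", 2L);

        for (Map.Entry<String, Long> entry : sortByCountDescending(counts)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Sorting a copied list leaves the original map untouched. If the result needs to stay a map, the sorted entries could be inserted into a LinkedHashMap, which preserves insertion order.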