How do I count the occurrences of individual HTML tags? (Java)

Time: 2018-04-15 15:56:39

Tags: java html count tags

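I have the string below, which is the body of an HTTP response. Using Java, I need to count how many times each individual HTML tag occurs and sort the tags by their number of occurrences.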

 "<div><p><span class="lede">Today, the European </span>Space Agency&apos;s Rosetta spacecraft will engage its thrusters for one final maneuver: a suicidal plunge toward the comet it has been orbiting for two years and chasing for a decade. After Rosetta collides with comet 67P/Churyumov-Gerasimenko....."

Could someone please help? Thanks in advance.

1 answer:

Answer 0: (score: 1)

Use a library such as JSoup to get all of the document's elements and process them however you need.

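You can create a HashMap<String, Long> that maps each tag name to its number of occurrences. Then recursively iterate over all the elements of the JSoup Document, updating the map on each pass, and finally sort the map's entries by count.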

Don't forget to escape the quotes inside the string with backslashes: String html = "<div class=\"like-this\">div content</div>";

For example (untested), something like:

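import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagCounter {
    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        String html = " your html string goes here ";

        Document doc = Jsoup.parse(html);

        // Start from the direct children of <body>; the recursion visits the rest.
        // (select("*") already returns every descendant, so recursing from it would double count.)
        recursiveWalk(doc.body().children(), counts);

        // your map here, sort it
    }

    // method to walk the document
    private static void recursiveWalk(List<Element> elements, Map<String, Long> counts) {
        for (Element el : elements) {
            String tag = el.tagName();
            long number = counts.getOrDefault(tag, 0L) + 1;
            counts.put(tag, number);
            // recurse into this element's own children, not the whole list
            recursiveWalk(el.children(), counts);
        }
    }
}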
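
The "sort it" step is left open in the snippet above. A HashMap has no ordering of its own, so one way to do it is to copy the entries into a list and sort that list by value. The sketch below uses only java.util; the class name TagCountSorter, the method name sortByCountDescending and the sample counts are made up for illustration, and breaking ties by tag name is an assumption, not something the question asks for.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TagCountSorter {

    // Returns the map's entries ordered by count, highest first.
    // Ties are broken alphabetically by tag name (an assumption for determinism).
    static List<Map.Entry<String, Long>> sortByCountDescending(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(
            Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical counts, standing in for the map filled by recursiveWalk.
        Map<String, Long> counts = new HashMap<>();
        counts.put("div", 1L);
        counts.put("p", 3L);
        counts.put("span", 2L);

        for (Map.Entry<String, Long> entry : sortByCountDescending(counts)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

If you need the result as a map rather than a list, you could copy the sorted entries into a LinkedHashMap, which keeps insertion order.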