How to get a text count from a String

Time: 2015-02-24 13:08:09

Tags: java html jsoup

I have the following string:

   Salary and Benefits <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>
Job Security <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>
Career Growth <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barnone"></span>
Work Environment <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>
CEO Rating <span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span><span class="read-barfull"></span>

I need to display the counts in the following format (the number of "read-barfull" spans on each line):

Salary and Benefits 5
Job Security 5
Career Growth 4
Work Environment 5
CEO Rating 5 

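Please help me produce this format. Thanks in advance.

3 answers:

Answer 0: (score: 3)

If the "marker" string you want to count is static (or at least predefined), you can do the following with Apache commons-lang: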

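import org.apache.commons.lang3.StringUtils; // commons-lang 3.x; in 2.x the package is org.apache.commons.lang

String str = "Salary and Benefits <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>";
String spanText = "<span class=\"read-barfull\"></span>";
// Counts the non-overlapping occurrences of spanText in str
int count = StringUtils.countMatches(str, spanText);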
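
If adding commons-lang is not an option, the same count can be sketched with plain JDK calls. This is only an illustration under assumptions: the names SpanCount, countOccurrences and summarize are made up here and do not come from any library, and the label is assumed to be everything before the first '<' on a line.

```java
class SpanCount {
    // Hypothetical helper: counts non-overlapping occurrences of marker in text.
    static int countOccurrences(String text, String marker) {
        if (text.isEmpty() || marker.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(marker, from)) != -1) {
            count++;
            from += marker.length(); // continue after the match just found
        }
        return count;
    }

    // Builds "label count" for one line; assumes the label precedes the first '<'.
    static String summarize(String line, String marker) {
        int tagStart = line.indexOf('<');
        String label = (tagStart == -1 ? line : line.substring(0, tagStart)).trim();
        return label + " " + countOccurrences(line, marker);
    }

    public static void main(String[] args) {
        String full = "<span class=\"read-barfull\"></span>";
        String none = "<span class=\"read-barnone\"></span>";
        String[] lines = {
            "Job Security " + full + full + full + full + full,
            "Career Growth " + full + full + full + full + none
        };
        for (String line : lines) {
            System.out.println(summarize(line, full));
        }
    }
}
```

Running main on these two sample lines prints "Job Security 5" and "Career Growth 4". Like the commons-lang version, this only matches the exact marker text, so it would miss spans written with different attribute order or spacing.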

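Answer 1: (score: 1)

Here is how to do it with Jsoup (since your question is tagged with it). The general idea is:

  • Read the HTML line by line
  • Get the text that this line of HTML represents
  • Select all <span class="read-barfull"></span> elements (whether or not they are empty, though you can change that if needed); a simple select("span.read-barfull") does this for us
  • Print the number of selected span elements (size() is handy here)

Code: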

String html = "Salary and Benefits <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>\r\n" + 
        "Job Security <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>\r\n" + 
        "Career Growth <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barnone\"></span>\r\n" + 
        "Work Environment <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>\r\n" + 
        "CEO Rating <span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span><span class=\"read-barfull\"></span>";

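// Requires: java.util.Scanner, org.jsoup.Jsoup, org.jsoup.nodes.Document
Scanner sc = new Scanner(html);
while (sc.hasNextLine()) {
    // Parse each line as its own small HTML document
    Document doc = Jsoup.parse(sc.nextLine());
    // doc.text() is the label; select(...).size() is the number of "read-barfull" spans
    System.out.println(doc.text() + " " + doc.select("span.read-barfull").size());
}
sc.close();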

Output:

Salary and Benefits 5
Job Security 5
Career Growth 4
Work Environment 5
CEO Rating 5

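Answer 2: (score: 0)

Break your logic into the following steps:

  1. Create a List<String> holding the lines.
  2. Iterate over the list and use a StringBuffer or split to search for the word, incrementing a counter on each match. (Write some simple logic to separate "read-barfull" from the "key" string, i.e. Salary and Benefits.)
  3. Take the count value from that.
  4. Create a Map<String,Integer> from key to count. That is all.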
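
The steps above can be sketched roughly as follows. This is one possible reading of the answer, not its author's code: the class name RatingCounter and method countFullBars are hypothetical, and the sketch assumes every tag on a line is written exactly as in the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RatingCounter {
    // Maps each label to the number of "read-barfull" spans on its line.
    static Map<String, Integer> countFullBars(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps input order
        for (String line : lines) {
            // Splitting on '<' leaves the label first, then one piece per tag.
            String[] pieces = line.split("<");
            String key = pieces[0].trim();
            int count = 0;
            for (int i = 1; i < pieces.length; i++) {
                // Closing tags start with "/span", so only opening tags can match.
                if (pieces[i].startsWith("span class=\"read-barfull\"")) {
                    count++;
                }
            }
            counts.put(key, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        String full = "<span class=\"read-barfull\"></span>";
        String none = "<span class=\"read-barnone\"></span>";
        List<String> lines = Arrays.asList(
            "Salary and Benefits " + full + full + full + full + full,
            "Career Growth " + full + full + full + full + none);
        for (Map.Entry<String, Integer> e : countFullBars(lines).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the two sample lines in main this prints "Salary and Benefits 5" and "Career Growth 4". A LinkedHashMap is used so the labels come out in the order they were read; a plain HashMap would work too if order does not matter.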