How do I count the number of ArrayList elements that contain a particular term/word?

Asked: 2016-08-05 15:05:45

Tags: java arraylist

I have an ArrayList of sentences, as follows:

List<String> allDocuments = new ArrayList<String>();
allDocuments.add("my name is john what is your name");
allDocuments.add("hello how are you");
allDocuments.add("no name entered");
allDocuments.add("who are you");

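As you can see, the words "name" and "you" each appear in two elements. How can I get the number of elements in which each word occurs? The final result would be

name = 2 elements

my = 1 element

you = 2 elements

So far I have only managed to count how many times each word occurs within a single element, not how many elements contain each word.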

List<String[]> list2 = new ArrayList<>();
for (String s : allDocuments) {
    list2.add(s.split(" "));
}

for (String[] s : list2) {
    Map<String, Integer> wordCounts = new LinkedHashMap<String, Integer>();

    for (String word : s) {
        Integer count = wordCounts.get(word);
        if (count == null) {
            count = 0;
        }
        wordCounts.put(word, count + 1);
    }

    for (String key : wordCounts.keySet()) {
        System.out.println(key + ": " + wordCounts.get(key));
    }
}

Any help is much appreciated, thanks!

6 Answers:

Answer 0: (score: 2)

// requires java.util.Arrays, java.util.HashMap and java.util.Map
Map<String, Integer> wordCounts = new HashMap<String, Integer>();

// Collect every distinct word
for (String s : allDocuments)
  for (String s2 : s.split(" "))
    if (!wordCounts.containsKey(s2))
        wordCounts.put(s2, 0);

// Count the strings that contain each word as a whole word
// (indexOf would also match substrings, e.g. "you" inside "your")
for (String k : wordCounts.keySet())
  for (String s : allDocuments)
    if (Arrays.asList(s.split(" ")).contains(k))
      wordCounts.put(k, wordCounts.get(k) + 1);

Answer 1: (score: 1)

I hope this helps. My code uses Java 8 syntax:

ArrayList<String> allDocuments = new ArrayList<String>();
allDocuments.add("my name is john");
allDocuments.add("hello how are you");
allDocuments.add("no name entered");
allDocuments.add("who are you");

HashMap<String, Integer> words = new HashMap<>();

for (String sentence : allDocuments) {
    String[] sentenceSplit = sentence.split(" ");
    for (String word : sentenceSplit) {
        // If the map already contains the word, add 1; otherwise insert it.
        // Note: this counts every occurrence, so a word repeated within one
        // sentence is counted more than once.
        if (words.containsKey(word)) {
            words.put(word, words.get(word) + 1);
        } else {
            words.put(word, 1);
        }
    }
}

// Print the result
for (String key : words.keySet()) {
    System.out.println(key + " : " + words.get(key) + " time(s)");
}
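For comparison, here is a minimal sketch of how the per-element count asked for in the question might look with Java 8 streams. The class name DocumentFrequency and the method name countSentencesPerWord are hypothetical and chosen only for this illustration; the sample data is the list from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class DocumentFrequency {

    // For each word, counts how many sentences contain it at least once.
    static Map<String, Long> countSentencesPerWord(List<String> sentences) {
        return sentences.stream()
                // distinct() drops repeats inside one sentence, so each
                // sentence contributes at most 1 to a word's count
                .flatMap(sentence -> Arrays.stream(sentence.split(" ")).distinct())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> allDocuments = Arrays.asList(
                "my name is john what is your name",
                "hello how are you",
                "no name entered",
                "who are you");
        Map<String, Long> counts = countSentencesPerWord(allDocuments);
        System.out.println("name = " + counts.get("name"));
        System.out.println("my = " + counts.get("my"));
        System.out.println("you = " + counts.get("you"));
    }
}
```

The counts come back as Long because Collectors.counting() produces long values; whether that is acceptable depends on how the result is used afterwards.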

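Answer 2: (score: 1)

If you would rather fix your code than rewrite it completely, proceed as follows:

First, store the words of each document in a Set rather than an array, to prevent duplicates: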

List<Set<String>> list2 = new ArrayList<>();
for (String s : allDocuments) {
    list2.add(new HashSet<>(Arrays.asList(s.split(" "))));
}

Then simply move the wordCounts declaration and the printing outside the loop, and change the loop to iterate over Set<String> instead of String[]:

Map<String, Integer> wordCounts = new LinkedHashMap<>();
for (Set<String> s : list2) {
    for (String word : s) {
        Integer count = wordCounts.get(word);
        if (count == null) {
            count = 0;
        }
        wordCounts.put(word, count + 1);
    }
}

for (String key : wordCounts.keySet()) {
    System.out.println(key + ": " + wordCounts.get(key));
}

The output is now correct:

what: 1
name: 2
is: 1
john: 1
your: 1
my: 1
how: 1
are: 2
hello: 1
you: 2
no: 1
entered: 1
who: 1

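In fact, you were not far from the solution ;-)

(Note that the iteration over wordCounts could be improved by iterating over entrySet(), but I did not want to modify your code too much.)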
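The entrySet() variant mentioned above might look roughly like the following sketch. The class name EntrySetPrinting and the helper format are hypothetical, and the two map entries are sample values rather than output from the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class EntrySetPrinting {

    // Builds one "key: value" line per entry by iterating over entrySet(),
    // which avoids the extra get(key) lookup needed when iterating keySet().
    static String format(Map<String, Integer> wordCounts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
            out.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values only; a LinkedHashMap keeps insertion order
        Map<String, Integer> wordCounts = new LinkedHashMap<>();
        wordCounts.put("name", 2);
        wordCounts.put("my", 1);
        System.out.print(format(wordCounts));
    }
}
```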

Answer 3: (score: 0)

Iterate over the list and split each sentence on spaces. Then loop over each word and check whether it matches what you are looking for.

List<String> allDocuments = new ArrayList<String>();
allDocuments.add("my name is john");
allDocuments.add("hello how are you");
allDocuments.add("no name entered");
allDocuments.add("who are you");

int name = 0, my = 0, you = 0;
for (String msg : allDocuments) {
    for (String word : msg.split(" ")) {
        // Compare contents with equals(); == compares object references
        if ("name".equals(word)) {
            name++;
        }
        if ("my".equals(word)) {
            my++;
        }
        if ("you".equals(word)) {
            you++;
        }
    }
}
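The same idea can be sketched as a helper for a single target word that counts each sentence at most once, which matters when a word repeats within one sentence, as in the question's first element. The class name SingleWordCount and the method name countSentencesContaining are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class SingleWordCount {

    // Returns how many sentences contain target as a whole word.
    static int countSentencesContaining(List<String> sentences, String target) {
        int count = 0;
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                // equals compares contents; == would compare object references
                if (word.equals(target)) {
                    count++;
                    break; // count this sentence once, then move on
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> allDocuments = Arrays.asList(
                "my name is john what is your name",
                "hello how are you",
                "no name entered",
                "who are you");
        System.out.println("name = " + countSentencesContaining(allDocuments, "name"));
        System.out.println("my = " + countSentencesContaining(allDocuments, "my"));
        System.out.println("you = " + countSentencesContaining(allDocuments, "you"));
    }
}
```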

Answer 4: (score: 0)

Create a map that records the number of coincidences (occurrences) for each word... something like a Map<String, Integer>

Example:

public static void main(String[] args) {
    List<String> list = new ArrayList<>();
    list.add("my name is john");
    list.add("hello how are you");
    list.add("no name entered");
    list.add("who are you");
    System.out.println();
    System.out.println(processList(list));
}

private static Map<String, Integer> processList(List<String> list) {
    Map<String, Integer> coincidences = new HashMap<>();
    for (String string : list) {
        String[] sp = string.split(" ");
        for (String string2 : sp) {
            if (coincidences.get(string2) == null) {
                coincidences.put(string2, 1);
            } else {
                coincidences.put(string2, coincidences.get(string2) + 1);
            }
        }
    }
    return coincidences;
}

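This will give a map like:

{how=1, no=1, are=2, name=2, is=1, john=1, hello=1, entered=1, my=1, you=2, who=1}

This is the best representation of the information you need.

Answer 5: (score: 0)

With the split you are doing, the List contains every instance of every word. I would therefore suggest using a Set to store the distinct words to be counted: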

Set<String> words = new HashSet<>();
for (String s : allDocuments) {
    words.addAll(Arrays.asList(s.split(" ")));
}

Then use this set and iterate over allDocuments for each entry:

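HashMap<String, Integer> wordcount = new HashMap<>();
for (String word : words) {
    int count = 0;
    for (String entry : allDocuments) {
        // Match whole words only; entry.contains(word) would also match
        // substrings, e.g. "you" inside "your"
        if (Arrays.asList(entry.split(" ")).contains(word)) {
            count++;
        }
    }
    wordcount.put(word, count);
}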

I cannot test this right now, but something along these lines should do the trick.
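One detail worth checking with this approach is the difference between substring matching and whole-word matching, since String.contains looks for any substring. The sketch below compares the two on a sample sentence; the class name SubstringVersusWord and the helper containsWord are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class SubstringVersusWord {

    // True only if word appears as a complete space-separated token.
    static boolean containsWord(String sentence, String word) {
        return Arrays.asList(sentence.split(" ")).contains(word);
    }

    public static void main(String[] args) {
        String sentence = "what is your name";
        // Substring match: "you" is found inside "your"
        System.out.println(sentence.contains("you"));      // prints true
        // Whole-word match: no token equals "you"
        System.out.println(containsWord(sentence, "you")); // prints false
    }
}
```

With the question's data, substring matching would report "you" in three elements instead of two, because the first sentence contains "your".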

Greetings