Counting the frequency of strings stored in a nested HashMap

Date: 2016-04-30 18:08:08

Tags: java hashmap

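I want to write code that stores strings in a hash map as they are read from a text file.

I have written the code below. It runs without errors, but the frequency of each string combination never changes: it is always 1.

I am looking for help so that, when a string combination occurs more than once in the text file, its frequency is incremented accordingly.

Here is my code: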

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

public class NgramBetaC {

    static String[] hashmapWord = null;
    public static Map<String, Map<String, Integer>> bigrams = new HashMap<>();

    public static void main(String[] args) {

        //prompt user input
        Scanner input = new Scanner(System.in);

        //read words from collected corpus; a number of .txt files
        File directory = new File("Corpus4");
        File[] listOfFiles = directory.listFiles(); //To read from all listed files in the "directory"

        //String bWord[] = null;
        int lineNumber = 0;
        String line;
        String files;
        String delimiters = "[\\s+,?!:;.]";
        int wordTracker = 0;

        //reading from a list of text files
        for (File file : listOfFiles) {
            if (file.isFile()) {
                files = file.getName();
                try {
                    if (files.endsWith(".txt") || files.endsWith(".TXT")) { //ensures a file being read is a text file

                        BufferedReader br = new BufferedReader(new FileReader(file));

                        while ((line = br.readLine()) != null) {
                            line = line.toLowerCase();
                            hashmapWord = line.split(delimiters);

                            for (int s = 0; s < hashmapWord.length - 2; s++) {

                                String read = hashmapWord[s];
                                String read1 = hashmapWord[s + 1];
                                final String read2 = hashmapWord[s + 2];

                                String readBigrams = read + " " + read1;

                                final Integer count = null;

                                //bigrams.put(readBigrams, new HashMap() {{ put (read2, (count == null)? 1 : count + 1);}});
                                bigrams.put(readBigrams, new HashMap<String, Integer>());
                                bigrams.get(readBigrams).put(read2, (count == null) ? 1 : count + 1);

                            }
                            br.close();
                        }
                    }
                } catch (NullPointerException | IOException e) {
                    e.printStackTrace();
                    System.out.println("Unable to read files: " + e);
                }
            }
        }
    }
}

The text file contains:

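1.i would like some ice cream. 2. I would like to be in dubai this december. 3. I love to eat pasta. 4. i love to prepare pasta myself. 5. who will be coming to see me today?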

The output I get when I print the contents of the hashmap is:

{coming to={see=1}, would like={to=1}, in dubai={this=1}, prepare pasta={myself=1}, to eat={pasta=1}, like to={be=1}, to prepare={pasta=1}, will be={coming=1}, love to={prepare=1}, some ice={cream=1}, be in={dubai=1}, be coming={to=1}, dubai this={december=1}, to be={in=1}, i love={to=1}, to see={me=1}, who will={be=1}, like some={ice=1}, i would={like=1}, see me={today=1}}

Please help! Some string combinations do not even appear.

My expected output when reading from the file is:

{coming to={see=1}, would like={to=1}, in dubai={this=1}, prepare pasta={myself=1}, to eat={pasta=1}, like to={be=1}, to prepare={pasta=1}, will be={coming=1}, love to={prepare=1}, some ice={cream=1}, be in={dubai=1}, be coming={to=1}, dubai this={december=1}, to be={in=1}, i love={to=1}, to see={me=1}, who will={be=1}, like some={ice=1}, i would={like=2}, see me={today=1}, love to={eat=1}, would like={some=1}, i would={love=1}, would love={to=1}}

1 Answer:

Answer 0 (score: 0)

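Update the existing inner map instead of overwriting it. Your loop calls bigrams.put(readBigrams, new HashMap<String, Integer>()) for every trigram, which throws away the map already stored for that word pair, and count is always null. As a result every frequency is 1 and only the last third word seen after each pair survives, which is also why some combinations are missing from the output.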

Replace

bigrams.put(readBigrams, new HashMap<String, Integer>());
bigrams.get(readBigrams).put(read2, (count == null) ? 1 : count+1);

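with the following, and delete the earlier line final Integer count = null; because count is now declared here: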

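// bigrams is declared as Map<String, Map<String, Integer>>, so get() returns a Map, not a HashMap
Map<String, Integer> counter = bigrams.get(readBigrams);
if (counter == null) {
    // first occurrence of this word pair: create its inner map once
    counter = new HashMap<String, Integer>();
    bigrams.put(readBigrams, counter);
}
// increment the count of the third word, starting at 1 if it is new
Integer count = counter.get(read2);
counter.put(read2, count == null ? 1 : count + 1);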
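On Java 8 or later, the same get-or-create-then-increment step can be written with `Map.computeIfAbsent` and `Map.merge`. The following is a minimal sketch rather than a drop-in replacement for the question's program: the class `TrigramCounter`, the helpers `addTrigram` and `countTrigrams`, and the sample sentence are illustrative names chosen here. It also assumes the delimiter class should match runs of separators (a trailing `+` quantifier), since the question's `"[\\s+,?!:;.]"` matches a single character and therefore splits `"cream. i"` into `"cream"`, `""` and `"i"`. When reading files rather than a string, note that the question's code calls `br.close()` inside the `while` loop, so the reader is closed after the first line; closing it once after the loop (for example with try-with-resources) avoids that.

```java
import java.util.HashMap;
import java.util.Map;

public class TrigramCounter {

    // The question's delimiter class with a "+" quantifier (an assumption), so a run
    // such as ". " counts as one separator instead of producing an empty token.
    private static final String DELIMITERS = "[\\s+,?!:;.]+";

    // Records one occurrence of the word "next" following the word pair "bigram".
    static void addTrigram(Map<String, Map<String, Integer>> bigrams,
                           String bigram, String next) {
        bigrams.computeIfAbsent(bigram, k -> new HashMap<>()) // reuse the inner map, or create it once
               .merge(next, 1, Integer::sum);                 // 1 if absent, otherwise old count + 1
    }

    // Builds "w1 w2" -> {w3 -> count} for every run of three consecutive words.
    static Map<String, Map<String, Integer>> countTrigrams(String text) {
        Map<String, Map<String, Integer>> bigrams = new HashMap<>();
        // Drop leading separators so the first token is a real word.
        String cleaned = text.toLowerCase().replaceFirst("^" + DELIMITERS, "");
        // split() discards trailing empty strings, so a final "." adds no empty token.
        String[] words = cleaned.split(DELIMITERS);
        for (int s = 0; s + 2 < words.length; s++) {
            addTrigram(bigrams, words[s] + " " + words[s + 1], words[s + 2]);
        }
        return bigrams;
    }

    public static void main(String[] args) {
        String line = "i would like some ice cream. i would like to be in dubai this december.";
        Map<String, Map<String, Integer>> counts = countTrigrams(line);
        System.out.println(counts.get("i would"));    // {like=2}
        System.out.println(counts.get("would like")); // some=1 and to=1, in unspecified order
    }
}
```

Because HashMap does not guarantee iteration order, printing the whole map may list entries in a different order from the question's output; a TreeMap could be used instead if sorted output is wanted.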