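I want to write code that stores strings in a hash map as they are read from text files.
I have written the code below and it runs without errors, but the frequency of each string combination never changes; it is always 1.
I am looking for help to make sure that if a string combination appears more than once in the text files, its frequency increases accordingly.
Here is my code: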
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.*;

    public class NgramBetaC {
        static String[] hashmapWord = null;
        public static Map<String, Map<String, Integer>> bigrams = new HashMap<>();

        public static void main(String[] args) {
            // prompt user input
            Scanner input = new Scanner(System.in);
            // read words from collected corpus; a number of .txt files
            File directory = new File("Corpus4");
            File[] listOfFiles = directory.listFiles(); // to read from all listed files in the "directory"
            //String bWord[] = null;
            int lineNumber = 0;
            String line;
            String files;
            String delimiters = "[\\s+,?!:;.]";
            int wordTracker = 0;
            // reading from a list of text files
            for (File file : listOfFiles) {
                if (file.isFile()) {
                    files = file.getName();
                    try {
                        if (files.endsWith(".txt") || files.endsWith(".TXT")) { // ensures a file being read is a text file
                            BufferedReader br = new BufferedReader(new FileReader(file));
                            while ((line = br.readLine()) != null) {
                                line = line.toLowerCase();
                                hashmapWord = line.split(delimiters);
                                for (int s = 0; s < hashmapWord.length - 2; s++) {
                                    String read = hashmapWord[s];
                                    String read1 = hashmapWord[s + 1];
                                    final String read2 = hashmapWord[s + 2];
                                    String readBigrams = read + " " + read1;
                                    final Integer count = null;
                                    //bigrams.put(readBigrams, new HashMap() {{ put (read2, (count == null)? 1 : count + 1);}});
                                    bigrams.put(readBigrams, new HashMap<String, Integer>());
                                    bigrams.get(readBigrams).put(read2, (count == null) ? 1 : count + 1);
                                }
                            }
                            br.close(); // close the reader only after every line has been read
                        }
                    } catch (NullPointerException | IOException e) {
                        e.printStackTrace();
                        System.out.println("Unable to read files: " + e);
                    }
                }
            }
        }
    }
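The lines contained in the text file are:
1. i would like some ice cream.
2. i would like to be in dubai this december.
3. i love to eat pasta.
4. i love to prepare pasta myself.
5. who will be coming to see me today?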
The output I get when printing the contents of the hash map is:
{coming to={see=1}, would like={to=1}, in dubai={this=1}, prepare pasta={myself=1}, to eat={pasta=1}, like to={be=1}, to prepare={pasta=1}, will be={coming=1}, love to={prepare=1}, some ice={cream=1}, be in={dubai=1}, be coming={to=1}, dubai this={december=1}, to be={in=1}, i love={to=1}, to see={me=1}, who will={be=1}, like some={ice=1}, i would={like=1}, see me={today=1}}
Please help! Some string combinations do not appear at all.
My expected output when reading from the file is:
{coming to={see=1}, would like={to=1}, in dubai={this=1}, prepare pasta={myself=1}, to eat={pasta=1}, like to={be=1}, to prepare={pasta=1}, will be={coming=1}, love to={prepare=1}, some ice={cream=1}, be in={dubai=1}, be coming={to=1}, dubai this={december=1}, to be={in=1}, i love={to=1}, to see={me=1}, who will={be=1}, like some={ice=1}, i would={like=2}, see me={today=1}, love to{eat=1}, would like{some=1}, i would{love=1}, would love{to=1}}
Answer 0 (score: 0)
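Just update the existing inner map instead of overwriting it. Each call to bigrams.put(readBigrams, new HashMap<String, Integer>()) replaces whatever was already stored for that key, and count is always null, so every frequency is reset to 1 and earlier follow-up words for the same key are lost.
Replace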
bigrams.put(readBigrams, new HashMap<String, Integer>());
bigrams.get(readBigrams).put(read2, (count == null) ? 1 : count+1);
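with the following (and delete the earlier line final Integer count = null;, since count is declared here):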
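    // bigrams is declared as Map<String, Map<String, Integer>>, so the inner type must be Map
    Map<String, Integer> counter = bigrams.get(readBigrams);
    if (null == counter) {
        // first time this two-word key is seen: create its inner map once
        counter = new HashMap<String, Integer>();
        bigrams.put(readBigrams, counter);
    }
    // look up the current count for the following word and increment it
    Integer count = counter.get(read2);
    counter.put(read2, count == null ? 1 : count + 1);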
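
For reference, the same update can be written more compactly on Java 8 or later with Map.computeIfAbsent and Map.merge. The following is only a minimal sketch under some assumptions: the class name BigramFollowerCounts and the method count are hypothetical names chosen for illustration, the input lines are passed in as a list rather than read from the Corpus4 directory, and the delimiter pattern is copied unchanged from the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original question's code.
final class BigramFollowerCounts {
    // Delimiter pattern copied unchanged from the question.
    private static final String DELIMITERS = "[\\s+,?!:;.]";

    /** Maps each two-word key to the counts of the words that follow it. */
    static Map<String, Map<String, Integer>> count(List<String> lines) {
        Map<String, Map<String, Integer>> bigrams = new HashMap<>();
        for (String line : lines) {
            String[] words = line.toLowerCase().split(DELIMITERS);
            for (int s = 0; s < words.length - 2; s++) {
                String key = words[s] + " " + words[s + 1];
                String next = words[s + 2];
                // Reuse the inner map if the key was seen before; create it only once.
                Map<String, Integer> counter = bigrams.computeIfAbsent(key, k -> new HashMap<>());
                // Store 1 for a new follow-up word, otherwise add 1 to the existing count.
                counter.merge(next, 1, Integer::sum);
            }
        }
        return bigrams;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> result = count(Arrays.asList(
                "i would like some ice cream.",
                "i would like to be in dubai this december."));
        System.out.println(result.get("i would")); // prints {like=2}
        System.out.println(result.get("would like")); // contains both some=1 and to=1
    }
}
```

Because the inner map is created only when the key is missing, a key such as "would like" keeps both of its follow-up words instead of retaining only the last one read.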