Question

I have a dictionary of terms, dictonery/AB.txt, and a large text file, dictonery/annotate.txt.

I want to find out which of the dictionary terms in AB.txt appear in annotate.txt.

Here is my code so far:

 String fileString = new String(Files.readAllBytes(Paths.get("dictonery/AB.txt")), StandardCharsets.UTF_8);

 Map<String, String> map = new HashMap<String, String>();

 String entireFileText = new Scanner(new File("dictonery/annotate.txt")).useDelimiter("\\A").next();

 map.put(fileString, "m");

 for (String key : map.keySet()) {
     if(fileString.contains(key)) {
         System.out.print(key);
     }
 }

At the moment the whole dictionary is returned. How can I get only the specific terms that occur in the annotate.txt file?

Answer 1

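A few things that may help:

Since you don't need the values in the Map, I would use a Set (specifically a HashSet).
Use Scanner.next() to read one word at a time instead of the whole file.
Your fileString.contains(key) check is very inefficient, and it also returns true for partial matches (if your dictionary contains the word "do", it will match "dog" as well). It will also print a matching word more than once.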
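
To make the partial-match point concrete, here is a minimal sketch that compares a substring test with a whole-word lookup. The class and method names (PartialMatchDemo, substringMatch, wholeWordMatch) are made up for illustration, and it assumes words are separated by whitespace only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PartialMatchDemo {
    // Substring test: true whenever the term occurs anywhere,
    // even inside a longer word.
    static boolean substringMatch(String text, String term) {
        return text.contains(term);
    }

    // Whole-word test: split the text on whitespace and look the term
    // up in a set (assumes whitespace-separated words, no punctuation).
    static boolean wholeWordMatch(String text, String term) {
        Set<String> words = new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
        return words.contains(term);
    }

    public static void main(String[] args) {
        String text = "the dog barked";
        System.out.println(substringMatch(text, "do"));  // substring of "dog"
        System.out.println(wholeWordMatch(text, "do"));  // "do" is not a word here
        System.out.println(wholeWordMatch(text, "dog"));
    }
}
```

A set lookup also avoids rescanning the whole text for every dictionary term, which is where the inefficiency of repeated contains() calls comes from.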

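Personally, I would create two sets, read both files in the same way, and then calculate their intersection. If you want sorted output (probably not a requirement, but usually nice to have), you can make the Set you iterate over a TreeSet.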
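
A rough sketch of that two-set approach might look like the following. The class and helper names (DictionaryIntersection, readWords, intersection) are hypothetical, and it assumes every dictionary term is a single whitespace-delimited word with no punctuation attached; multi-word terms, or words followed by a comma, would need extra handling.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class DictionaryIntersection {
    // Reads every whitespace-separated token of a file into a sorted set.
    static Set<String> readWords(Path path) throws IOException {
        Set<String> words = new TreeSet<>();
        try (Scanner scanner = new Scanner(path, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    // Returns the terms present in both sets; neither argument is modified.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b); // keep only the elements that are also in b
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the question.
        Set<String> terms = readWords(Paths.get("dictonery/AB.txt"));
        Set<String> textWords = readWords(Paths.get("dictonery/annotate.txt"));
        for (String term : intersection(terms, textWords)) {
            System.out.println(term);
        }
    }
}
```

Because the result is a set, each matching term is printed once, and the TreeSet gives the sorted order mentioned above.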

Answer 2

You really don't need a map.

Read annotate.txt in as fileString.

Read your AB.txt file in with a loop like this:

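import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// "data.txt" is a placeholder; in the question this would be dictonery/AB.txt
File file = new File("data.txt");

// try-with-resources closes the Scanner once the loop is done
try (Scanner scanner = new Scanner(file)) {
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        // do something like fileString.contains(line) here
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
}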

Inside the while loop, check whether fileString contains line (which should hold the token just read from the file).

This assumes you have one token per line.
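
Putting those steps together, a minimal sketch could look like this. The names LineByLineLookup and findTerms are invented for the example, and it assumes one term per line in AB.txt. Since it relies on String.contains, it still matches terms inside longer words, but it does handle terms that span several words.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class LineByLineLookup {
    // Returns the dictionary lines that occur somewhere in the text,
    // in the order they appear in the dictionary.
    static List<String> findTerms(List<String> dictionaryLines, String text) {
        List<String> found = new ArrayList<>();
        for (String line : dictionaryLines) {
            String term = line.trim();
            // Skip blank lines: contains("") is always true.
            if (!term.isEmpty() && text.contains(term)) {
                found.add(term);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the question.
        String fileString = new String(
                Files.readAllBytes(Paths.get("dictonery/annotate.txt")),
                StandardCharsets.UTF_8);
        List<String> lines = Files.readAllLines(
                Paths.get("dictonery/AB.txt"), StandardCharsets.UTF_8);
        for (String term : findTerms(lines, fileString)) {
            System.out.println(term);
        }
    }
}
```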
