I'm a beginner at tokenization. I have this code for reading a properties file that contains a slang dictionary:
File fileSlang = new File(slang);
FileReader fileReadSlang = new FileReader(fileSlang);
BufferedReader readBufferSlang = new BufferedReader(fileReadSlang);
Properties propertiesSlang = new Properties();
propertiesSlang.load(readBufferSlang);
HashMap<String, String> map = new HashMap<String, String>((Map)propertiesSlang);
for (String key : propertiesSlang.stringPropertyNames()){
    map.put(key, propertiesSlang.getProperty(key));
}
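For comparison, here is a minimal sketch of the same loading step. It assumes the file is text in standard `key=value` properties syntax. It uses try-with-resources so the reader is always closed, and it copies entries through `stringPropertyNames()` instead of casting `Properties` to a raw `Map`. The class and method names (`SlangLoader`, `loadSlang`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SlangLoader {

    // Hypothetical helper: reads a key=value properties file into a HashMap<String, String>.
    public static Map<String, String> loadSlang(Path file) throws IOException {
        Properties props = new Properties();
        // try-with-resources closes the reader even if load() throws
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            props.load(reader);
        }
        Map<String, String> map = new HashMap<>();
        // stringPropertyNames() yields only String keys, so no raw (Map) cast is needed
        for (String key : props.stringPropertyNames()) {
            map.put(key, props.getProperty(key));
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path to the slang dictionary file
        Map<String, String> slang = loadSlang(file);
        System.out.println(slang.size() + " entries loaded");
    }
}
```

`Files.newBufferedReader(Path)` decodes the file as UTF-8, which is usually what you want for a hand-edited dictionary; the two-argument `Properties.load(InputStream)` overload would instead assume ISO-8859-1.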
Here is some of the content of the slang dictionary (it has more than 5000 lines):
replacements.put("07734","hello");
replacements.put("0day","software illegally obtained before it was released");
replacements.put("0noe","oh no");
replacements.put("0vr","over");
...........
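One thing worth checking here: `Properties.load` expects lines such as `key=value` (or `key:value`, or `key value`), not Java statements. If the file literally contains lines like the ones above, the parser treats the whole `replacements.put(...)` text (up to the first whitespace, `=` or `:`) as the key, so a lookup for `07734` finds nothing. The sketch below contrasts the two forms by loading each from an in-memory string. It assumes the entries can be rewritten in properties syntax, and `FormatCheck` and `parse` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class FormatCheck {

    // Hypothetical helper: parses properties-format text held in a String.
    public static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Java source pasted into the file: the whole line becomes the key, with an empty value
        Properties wrong = parse("replacements.put(\"07734\",\"hello\");\n");
        System.out.println(wrong.getProperty("07734")); // prints "null"

        // Properties syntax: one key=value pair per line; spaces inside the value are kept
        Properties right = parse(
                "07734=hello\n"
                + "0day=software illegally obtained before it was released\n"
                + "0noe=oh no\n"
                + "0vr=over\n");
        System.out.println(right.getProperty("07734")); // prints "hello"
        System.out.println(right.getProperty("0day"));
    }
}
```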
Here is the code that replaces the tokens:
while (tokens.hasMoreTokens()) {
    msg = tokens.nextToken();
    String msgLower = msg.toLowerCase();
    String punctuationremove = punctuationRemover(msgLower);
    System.out.print(punctuationremove+" ");
    StringBuilder sb = new StringBuilder(punctuationremove);
    for (Map.Entry<String, String> replacement : replacements.entrySet()){
        int start = sb.indexOf(replacement.getKey(),0);
        while (start >= 0){
            int end = start + replacement.getKey().length();
            sb.replace(start, end, replacement.getValue());
            start = sb.indexOf(replacement.getKey(), start + replacement.getValue().length());
        }
    }
    numberOfTokens++;
}
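Since every dictionary key appears to be a whole slang token, an alternative to scanning each token's `StringBuilder` for all 5000+ keys is a single hash-map lookup per token. Note that this swaps substring replacement for whole-token matching, which behaves differently: with it, `0vr` inside a longer token is left alone. The sketch assumes the map loaded from the properties file is the one passed in. It uses a hypothetical `stripPunctuation` as a stand-in for `punctuationRemover`, whose implementation isn't shown. It also collects the token *after* the lookup, so the output reflects the replacement. `SlangTranslator` and `translate` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.StringTokenizer;

public class SlangTranslator {

    // Hypothetical stand-in for punctuationRemover(): drops ASCII punctuation characters.
    static String stripPunctuation(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // Replaces each whitespace-separated token with its dictionary value, if it has one.
    public static String translate(String text, Map<String, String> slang) {
        StringTokenizer tokens = new StringTokenizer(text);
        StringJoiner out = new StringJoiner(" ");
        while (tokens.hasMoreTokens()) {
            // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i)
            String token = stripPunctuation(tokens.nextToken().toLowerCase(Locale.ROOT));
            if (token.isEmpty()) {
                continue; // the token consisted only of punctuation
            }
            // One hash lookup per token; unknown tokens are kept unchanged
            out.add(slang.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> slang = new HashMap<>();
        slang.put("0noe", "oh no");
        slang.put("0vr", "over");
        System.out.println(translate("0vr the moon, 0NOE!", slang)); // prints "over the moon oh no"
    }
}
```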
And it doesn't work. What am I missing, or is my code complete nonsense?