Replacing tokens using HashMap properties in Java

Asked: 2015-03-12 14:53:58

Tags: java replace hashmap tokenize

I am a beginner at tokenizing. I have this code for reading a properties file that contains a slang dictionary:

        File fileSlang = new File(slang);
        FileReader fileReadSlang = new FileReader(fileSlang);
        BufferedReader readBufferSlang = new BufferedReader(fileReadSlang);

        Properties propertiesSlang = new Properties();
        propertiesSlang.load(readBufferSlang);

        // Copy each property into a String-to-String map.
        HashMap<String, String> map = new HashMap<String, String>();
        for (String key : propertiesSlang.stringPropertyNames()){
            map.put(key, propertiesSlang.getProperty(key));
        }

Here is some of the content of the slang dictionary; it has more than 5000 lines:

        replacements.put("07734","hello");
        replacements.put("0day","software illegally obtained before it was released");
        replacements.put("0noe","oh no");
        replacements.put("0vr","over");
        ...........
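
For comparison, `java.util.Properties.load` splits each line into a key and a value at the first unescaped `=`, `:`, or whitespace character. If the file really contains Java statements like the ones above, a line such as `replacements.put("07734","hello");` would be read as one long key with an empty value, so a lookup for `07734` would find nothing. Below is a minimal sketch of the loading step, assuming the dictionary were stored as `key=value` lines instead. The class name `SlangProperties`, the method `parse`, and the sample text are hypothetical and not part of the original code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; the name is an assumption made for this sketch.
final class SlangProperties {

    // Parses properties-format text and copies every entry into a String map.
    static Map<String, String> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        Map<String, String> map = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            map.put(key, props.getProperty(key));
        }
        return map;
    }

    public static void main(String[] args) {
        // Assumed file layout: one key=value pair per line.
        String text = "07734=hello\n0noe=oh no\n0vr=over\n";
        Map<String, String> map = parse(text);
        System.out.println(map.get("0noe")); // prints "oh no"
    }
}
```

With this layout the map holds the slang word as the key and its expansion as the value, which is the shape the replacement loop expects.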

Here is the code that replaces the tokens:

        while (tokens.hasMoreTokens()) {
            msg = tokens.nextToken();
            String msgLower = msg.toLowerCase();

            String punctuationremove = punctuationRemover(msgLower);
            System.out.print(punctuationremove+" ");

            StringBuilder sb = new StringBuilder(punctuationremove);
            for (Map.Entry<String, String> replacement : replacements.entrySet()){
                int start = sb.indexOf(replacement.getKey(),0);
                while (start >= 0){
                    int end = start + replacement.getKey().length();
                    sb.replace(start, end, replacement.getValue());
                    start = sb.indexOf(replacement.getKey(), start + replacement.getValue().length());
                }
            }
            numberOfTokens++;
        }

And it is not working. Am I missing something, or is my code complete nonsense?
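
Two things stand out in the loop as shown: the token is printed before any replacement happens, and the contents of `sb` are never printed or stored afterwards. In addition, `sb.indexOf` matches a key anywhere inside a token, so a short key could also be replaced inside a longer word. The following is a minimal sketch of a different approach that looks up each whole token in the map. It assumes whitespace-separated tokens. `SlangReplacer`, `expand`, and the punctuation-stripping regular expression are hypothetical stand-ins, since the original `punctuationRemover` is not shown.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical class; the name and method are assumptions made for this sketch.
final class SlangReplacer {

    // Replaces each whole token that is a key in the map with its value.
    static String expand(String message, Map<String, String> replacements) {
        StringBuilder out = new StringBuilder();
        StringTokenizer tokens = new StringTokenizer(message);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken().toLowerCase(Locale.ROOT);
            // Stand-in for punctuationRemover: keep only letters and digits.
            String cleaned = token.replaceAll("[^\\p{L}\\p{N}]", "");
            if (cleaned.isEmpty()) {
                continue; // the token was punctuation only
            }
            // Whole-token lookup; tokens without an entry are kept unchanged.
            String replaced = replacements.getOrDefault(cleaned, cleaned);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(replaced);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("07734", "hello");
        replacements.put("0noe", "oh no");
        replacements.put("0vr", "over");
        System.out.println(expand("07734, it is 0vr!", replacements)); // prints "hello it is over"
    }
}
```

A direct map lookup per token also avoids scanning all 5000 entries for every token, which the nested loop over `replacements.entrySet()` does.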

0 answers:

No answers