Word reader that parses text into a HashMap

Time: 2011-03-02 23:45:20

Tags: java string hashmap

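The goal of this program is to be a word reader of sorts. I want it to take out every word between any pair of </p> markers and store it in a HashMap. For example, </p> b.ob </p> should store the string b.ob in the HashMap. Any help or corrections would be greatly appreciated.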

public HashMap<String, List<String>> fillHashMap(String inputPath) {

    HashMap<String,List<String>> hash = new HashMap<String,List<String>>();  //creates hashmap
    CharacterFromFileReader reads = new CharacterFromFileReader(inputPath);

    String s = "";
    String p = "</p>";
    char ch;

    while(reads.hasNext()){       //hasnext returns true if the iteration has more elements
        ch = reads.next();        //next returns the next element in the iteration
        s = "" + ch;

        if(s.contains(p)){    //if(inputPath.indexOf("</p>") != -1){ original if statement
            int begin = s.indexOf(p);
            s = s.substring(begin);

            if(s.contains(p)){
                int end = s.indexOf(p);
                s = s.substring(begin,end);
                hash.put(s, null);
            }
        }
    }
    return hash;
}
} 
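
For reference, a minimal sketch of the requested behavior, assuming the whole file has already been read into a single String and that the markers come in pairs (the text between the 1st and 2nd </p>, the 3rd and 4th, and so on). The class and method names (`BetweenMarkers`, `extract`) are hypothetical, and the result is a Set rather than the question's HashMap; the strings it collects are the ones that would become the map's keys.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BetweenMarkers {

    /** Returns the trimmed text found between each pair of markers, in order of appearance. */
    static Set<String> extract(String text, String marker) {
        Set<String> words = new LinkedHashSet<>();
        int from = 0;
        while (true) {
            int open = text.indexOf(marker, from);      // first marker of the pair
            if (open < 0) {
                break;
            }
            int start = open + marker.length();
            int close = text.indexOf(marker, start);    // second marker of the pair
            if (close < 0) {
                break;                                  // unmatched trailing marker: ignore it
            }
            words.add(text.substring(start, close).trim());
            from = close + marker.length();             // continue after the closing marker
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [b.ob, alice]; " junk " lies between two pairs and is skipped
        System.out.println(extract("</p> b.ob </p> junk </p>alice</p>", "</p>"));
    }
}
```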

3 answers:

Answer 0 (score: 0)

You can use StringTokenizer:

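    import java.util.HashSet;
    import java.util.Set;
    import java.util.StringTokenizer;

    String input = readFromFile(); // hypothetical helper that returns the file contents
    Set<String> set = new HashSet<String>();
    // note: the second argument is a set of delimiter characters ('<', '/', 'p', '>'),
    // not the literal string "</p>"
    StringTokenizer st = new StringTokenizer(input, "</p>");
    while(st.hasMoreTokens()) {
        set.add(st.nextToken());
    }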

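Also, a Map is meant for storing key-value pairs; since only the words themselves are stored here, a Set is more appropriate.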
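
One caveat worth checking with the StringTokenizer approach: its second argument is treated as a set of single delimiter characters, so any word containing '<', '/', 'p' or '>' is split apart. The sketch below (class and method names are hypothetical) compares it with String.split, whose argument is a regular expression; "</p>" contains no regex metacharacters, so it matches only the complete marker. Like the answer's version, both methods keep every segment, including text that sits outside a pair of markers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerPitfall {

    // StringTokenizer: each character of "</p>" is an independent delimiter
    static List<String> withTokenizer(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, "</p>");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // String.split: the whole "</p>" sequence is the separator; blank pieces are dropped
    static List<String> withSplit(String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split("</p>")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                tokens.add(trimmed);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "</p>apple</p>";
        System.out.println(withTokenizer(input)); // [a, le]
        System.out.println(withSplit(input));     // [apple]
    }
}
```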

Answer 1 (score: 0)

Your problem is that your trimming logic runs on the first </p>, so you never read enough characters to see the next one.

Try something like this:

int indexOfFirstP = s.indexOf(p);
int indexOfLastP = s.lastIndexOf(p);

if (indexOfFirstP >= 0 && indexOfLastP >= 0 && indexOfFirstP != indexOfLastP) {
  // then you've found a string with two </p>'s
}
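
The underlying point is that the loop has to accumulate characters before it can ever see a complete </p>; in the question's code, `s = "" + ch` replaces `s` with a single character on every iteration. A minimal sketch of the accumulate-and-check idea follows. It assumes the question's `CharacterFromFileReader` behaves like an `Iterator<Character>` (an assumption, since that class is not shown), and the names `StreamingReader`, `fill` and `chars` are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;

public class StreamingReader {

    static final String P = "</p>";

    /** Reads one character at a time, appending to a buffer instead of overwriting it. */
    static HashMap<String, List<String>> fill(Iterator<Character> reads) {
        HashMap<String, List<String>> hash = new HashMap<>();
        StringBuilder buf = new StringBuilder();
        boolean inside = false;   // true after an opening </p>, until its closing </p>

        while (reads.hasNext()) {
            buf.append(reads.next());
            int len = buf.length();
            // a marker can only be completed by the newest character, so check the tail
            if (len >= P.length() && buf.substring(len - P.length()).equals(P)) {
                if (inside) {
                    // closing marker: everything before it is the word
                    hash.put(buf.substring(0, len - P.length()).trim(), null);
                }
                inside = !inside;
                buf.setLength(0);  // start a fresh buffer after every marker
            }
        }
        return hash;
    }

    // helper: turns a String into the kind of character iterator the loop expects
    static Iterator<Character> chars(String s) {
        return s.chars().mapToObj(c -> (char) c).iterator();
    }

    public static void main(String[] args) {
        System.out.println(fill(chars("x </p> b.ob </p> y")).keySet()); // [b.ob]
    }
}
```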

Answer 2 (score: 0)

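import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.List;
import java.util.Scanner;

// members of the enclosing class:
static final String REG = "</p>";

public HashMap<String, List<String>> fillHashMap(String inputPath) {
    final HashMap<String, List<String>> map = new HashMap<String, List<String>>();

    // try-with-resources closes the Scanner even if reading fails
    try (Scanner scanner = new Scanner(new File(inputPath))) {
        final StringBuilder fileContent = new StringBuilder();

        // read the whole file line by line, keeping the line breaks
        while (scanner.hasNextLine()) {
            fileContent.append(scanner.nextLine());
            fileContent.append("\n");
        }

        // split() takes a regex; "</p>" has no metacharacters, so it matches the literal marker
        final String[] entries = fileContent.toString().split(REG);

        for (int i = 0; i < entries.length; i++) {
            // text between a pair of markers lands at the odd indices (1, 3, 5, ...)
            if (i % 2 == 1) {
                // trim so that "</p> b.ob </p>" stores "b.ob" rather than " b.ob "
                map.put(entries[i].trim(), null);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    return map;
}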
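
The split-and-take-odd-indices idea can also be written with a regular expression that matches a whole pair at once. This swaps in a different technique (java.util.regex with a reluctant quantifier) and is only a sketch under the same pairing assumption; the class and method names are hypothetical. Because each match consumes both markers, text between two pairs (" outside " below) is skipped, just as the even indices are skipped above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPairs {

    // "(.*?)" is reluctant, so each match stops at the nearest closing </p>;
    // DOTALL lets the captured text span line breaks
    static final Pattern PAIR = Pattern.compile("</p>(.*?)</p>", Pattern.DOTALL);

    static HashMap<String, List<String>> fill(String content) {
        HashMap<String, List<String>> map = new HashMap<>();
        Matcher m = PAIR.matcher(content);
        while (m.find()) {                 // each find() consumes both markers of a pair
            map.put(m.group(1).trim(), null);
        }
        return map;
    }

    public static void main(String[] args) {
        String content = "</p> b.ob </p> outside </p>line\nbreak</p>";
        System.out.println(fill(content).keySet());
    }
}
```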