I need to implement a Pattern using regex = \w (or whatever matches all words).
When I run the program, the output should be:
a [1]
is [1]
test [1, 2]
But instead it is:
a [1]
e [2]
h [1]
i [1, 1]
s [1, 1, 2]
t [1, 2, 2]
The code responsible for scanning and pattern matching is as follows:
    public class DocumentIndex {
        private TreeMap<String, ArrayList<Integer>> map =
            new TreeMap<String, ArrayList<Integer>>(); // Stores words and their locations
        private String regex = "\\w"; //any word

        /**
         * A constructor that scans a document for words and their locations
         */
        public DocumentIndex(Scanner doc){
            Pattern p = Pattern.compile(regex); //Pattern class: matches words
            Integer location = 0; // the current line number
            // while the document has lines
            // set the Matcher to the current line
            while(doc.hasNextLine()){
                location++;
                Matcher m = p.matcher(doc.nextLine());
                // while there are value in the current line
                // check to see if they are words
                // and if so save them to the map
                while(m.find()){
                    if(map.containsKey(m.group())){
                        map.get(m.group()).add(location);
                    } else {
                        ArrayList<Integer> list = new ArrayList<Integer>();
                        list.add(location);
                        map.put(m.group(), list);
                    }
                }
            }
        }
        ...
    }
What is the best way to match whole words with the pattern?
Answer 0: (score: 2)
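You need to use \\w+ rather than \\w. The latter matches only a single character, whereas the former matches one or more characters. That is why every find() call in your loop returns one letter, and the map ends up indexed by letters instead of words.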
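As a rough illustration of that fix, here is a minimal sketch of the same indexing loop with the pattern changed to \\w+. The class name WordIndexSketch, the static index method, and the two-line sample input are made up for this example and are not taken from the question; computeIfAbsent is used only to shorten the containsKey/put branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch
class WordIndexSketch {
    // "\\w+" matches a run of one or more word characters, i.e. a whole word
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Maps each word to the 1-based line numbers on which it occurs
    static Map<String, List<Integer>> index(Scanner doc) {
        Map<String, List<Integer>> map = new TreeMap<>();
        int location = 0; // the current line number
        while (doc.hasNextLine()) {
            location++;
            Matcher m = WORD.matcher(doc.nextLine());
            while (m.find()) {
                // Create the list on first sight of the word, then record the line
                map.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(location);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch, not the asker's document
        Scanner doc = new Scanner("this is a test\nanother test");
        for (Map.Entry<String, List<Integer>> e : index(doc).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With that sample input, the loop prints whole words in sorted order, for example test [1, 2], rather than individual letters.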
Answer 1: (score: 0)
([^ ]+)+
Alternatively, you can use the StringTokenizer class.
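For comparison, a minimal sketch of the StringTokenizer approach might look like the following. The class name TokenizerSketch and the words helper are hypothetical. Note that the default delimiters are whitespace only, so punctuation stays attached to a token, unlike with \\w+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name, used only for this sketch
class TokenizerSketch {
    // Splits one line into tokens on the default delimiters (whitespace)
    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("this is a test"));  // prints [this, is, a, test]
        System.out.println(words("a test, really"));  // prints [a, test,, really]
    }
}
```

The second call shows the trade-off: the comma is kept as part of the token "test,", so the regex approach is probably the better fit if punctuation should not count as part of a word.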