I need to tokenize a text file, where tokens are defined by "[a-zA-Z]+". The following works:
Pattern WORD = Pattern.compile("[a-zA-Z]+");
File f = new File(...);
FileInputStream inputStream = new FileInputStream(f);
Scanner scanner = new Scanner(inputStream);
String word = null;
while( (word = scanner.findWithinHorizon(WORD, (int)f.length() )) != null ) {
// process the word
}
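The problem is that findWithinHorizon requires an int as the horizon, while the file length is a long.
What is a reasonable way to tokenize a large file using Scanner?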
Answer 0 (score: 3)
Use a delimiter that is the negation of the token pattern:
Scanner s = new Scanner(f).useDelimiter("[^a-zA-Z]+");
while(s.hasNext()) {
String token = s.next();
// do something with "token"
}
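For reference, here is a self-contained sketch of the same delimiter approach. The class name WordTokenizer and the method tokenize are made up for illustration, and the method collects tokens into a list only to make the result easy to inspect. For a genuinely large file you would probably process each token inside the loop instead of storing them all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question or answer.
class WordTokenizer {
    // Reads tokens matching [a-zA-Z]+ by treating every run of
    // non-letter characters as a delimiter. No horizon is needed,
    // so the int limit of findWithinHorizon never comes into play.
    static List<String> tokenize(Readable source) {
        List<String> tokens = new ArrayList<>();
        try (Scanner s = new Scanner(source).useDelimiter("[^a-zA-Z]+")) {
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        }
        return tokens;
    }
}
```

Called with new StringReader("Hello, world! 42 foo_bar"), this should yield the tokens Hello, world, foo and bar. The same method works with a FileReader or any other Readable, since Scanner reads the input incrementally rather than needing the total length up front.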