Question

I was working on UVa #494 and managed to solve it with the following code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class Main {    
    public static void main(String[] args) throws IOException{
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while((line = in.readLine()) != null){
            String words[] = line.split("[^a-zA-z]+");
            int cnt = words.length;
            // for some reason it is counting two words for 234234ddfdfd and words[0] is empty
            if(cnt != 0 && words[0].isEmpty()) cnt--; // ugly fix, if has words and the first is empty, reduce one word
            System.out.println(cnt);
        }
        System.exit(0);
    }
}

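I built the regex "[^a-zA-z]+" to split the words, so that, for example, the string abc..abc or abc432abc is split into ["abc", "abc"]. However, when I try the string 432abc, the result is ["", "abc"]: the first element of words[] is just an empty string, whereas I expected only ["abc"]. I can't figure out why this regex gives me "" as the first element.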

Answer 1

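Check the reference page for split:

"Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty. The following table provides examples."

Since you have several consecutive delimiters, you get empty array elements.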
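
In Java specifically, String.split(regex) with the default limit keeps an empty leading element when the input starts with a delimiter, but drops trailing empty strings. A run of delimiters matched by a single `+` counts as one delimiter. The following is a minimal sketch to check that behavior; the class and method names (SplitDemo, splitWords, countWords) are hypothetical, and the pattern uses A-Z as an assumption about what the question intended.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on runs of non-letters, as in the question
    // (assumption: A-Z is used here instead of the question's A-z).
    static String[] splitWords(String line) {
        return line.split("[^a-zA-Z]+");
    }

    // Counts words, skipping the empty first element that a
    // leading delimiter produces.
    static int countWords(String line) {
        String[] parts = splitWords(line);
        int count = parts.length;
        if (count > 0 && parts[0].isEmpty()) {
            count--;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("abc..abc"))); // [abc, abc]
        System.out.println(Arrays.toString(splitWords("432abc")));   // [, abc]
        System.out.println(Arrays.toString(splitWords("abc432")));   // [abc]
    }
}
```

So the check on words[0] in the question is compensating for exactly this leading-delimiter case; a trailing delimiter needs no such handling.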

Answer 2

Print the word count:

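import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        // Match runs of letters directly instead of splitting on non-letters.
        // A-Z, not A-z: the range A-z would also match [ \ ] ^ _ and `.
        Pattern pattern = Pattern.compile("[a-zA-Z]+");
        String line;
        while ((line = in.readLine()) != null) {
            Matcher matcher = pattern.matcher(line);
            int count = 0;
            while (matcher.find()) {
                count++;
                System.out.println(matcher.group()); // prints each word; drop this line if only the count is wanted
            }
            System.out.println(count);
        }
    }
}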
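
One thing to watch in the pattern from the question: the range A-z is not the same as A-Z. In ASCII it runs from 'A' (65) to 'z' (122), so it also covers the six characters between 'Z' and 'a', namely [ \ ] ^ _ and `. A minimal sketch comparing the two character classes with the same find() loop, assuming a word is meant to be a run of letters only; RangeDemo and countMatches are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeDemo {
    // Counts non-overlapping matches of regex in line using Matcher.find().
    static int countMatches(String regex, String line) {
        Matcher matcher = Pattern.compile(regex).matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // With A-z the underscore is treated as part of a word.
        System.out.println(countMatches("[a-zA-z]+", "foo_bar")); // 1
        // With A-Z the underscore separates two words.
        System.out.println(countMatches("[a-zA-Z]+", "foo_bar")); // 2
    }
}
```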

UVa #494 - regex [^a-zA-z]+ to split words using Java
