Remove punctuation, keep letters and spaces - Java Regex

Date: 2014-04-28 03:43:51

Tags: java regex string replaceall

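Tonight I'm trying to parse the words in a file. I want to remove all punctuation while keeping the upper- and lowercase words and the spaces between them.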

String alpha = word.replaceAll("[^a-zA-Z]", "");

This removes everything, including the spaces.

When I run it on a text file containing Testing, testing, 1, one, 2, two, 3, three. the output becomes TESTINGTESTINGONETWOTHREE. However, when I change it to

String alpha = word.replaceAll("[^a-zA-Z\\s]", "");

the output does not change.
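The regex itself is probably not the culprit: [^a-zA-Z\\s] does keep whitespace when it is applied to a whole line. The spaces are most likely lost earlier, because Scanner.next() splits the input on whitespace, so each token it returns contains no spaces to keep, and System.out.print then writes the cleaned tokens back to back. A minimal comparison, assuming the sample line is hardcoded as a string instead of read from file.txt (the class and method names are hypothetical):

```java
import java.util.Scanner;

public class WhitespaceDemo {

    // Applies the regex to the whole line at once: whitespace survives.
    static String cleanWholeLine(String line) {
        return line.replaceAll("[^a-zA-Z\\s]", "");
    }

    // Mimics the question's loop: Scanner.next() has already consumed the
    // whitespace between tokens, so the regex has no spaces left to keep.
    static String cleanTokenByToken(String line) {
        StringBuilder out = new StringBuilder();
        Scanner scan = new Scanner(line);
        while (scan.hasNext()) {
            out.append(scan.next().replaceAll("[^a-zA-Z\\s]", ""));
        }
        scan.close();
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Testing, testing, 1, one, 2, two, 3, three.";
        System.out.println(cleanWholeLine(line));    // Testing testing  one  two  three
        System.out.println(cleanTokenByToken(line)); // Testingtestingonetwothree
    }
}
```

The double spaces in the first line come from the removed ", 1," style fragments; the second line is the question's result before toUpperCase().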

Here is the complete code:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class UpperCaseScanner {

    public static void main(String[] args) throws FileNotFoundException {

        //First, define the filepath the program will look for. 
        String filename = "file.txt";   //Filename
        String targetFile = "";         
        String workingDir = System.getProperty("user.dir");

        targetFile = workingDir + File.separator + filename;   //Full filepath.

        //System.out.println(targetFile); //Debug code, prints the filepath. 

        Scanner fileScan = new Scanner(new File(targetFile)); 

        while(fileScan.hasNext()){
            String word = fileScan.next();
            //Remove every character that is not a letter or whitespace.
            String alpha = word.replaceAll("[^a-zA-Z\\s]", "");
            System.out.print(alpha.toUpperCase());
        }

        fileScan.close();

    }
}

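file.txt contains a single line: Testing, testing, 1, one, 2, two, 3, three. My goal is for the output to read Testing Testing One Two Three. Am I just doing something wrong in the regex, or is there something else I need to do? In case it's relevant, I'm using 32-bit Eclipse 2.0.2.2.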
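One possible way to get Testing Testing One Two Three while keeping the question's Scanner loop is to put the separating spaces back yourself: clean each token, skip tokens that end up empty (such as "1,"), capitalize the first letter, and join the words with single spaces. This is a sketch, not the asker's code; the class and method names are hypothetical, and the method takes a Scanner so it can be tried on a plain string as well as on file.txt.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.StringJoiner;

public class CapitalizedWords {

    // Reads whitespace-separated tokens, strips every non-letter character,
    // drops tokens that become empty (such as "1,"), capitalizes the first
    // letter of each remaining word and joins the words with single spaces.
    static String capitalizedWords(Scanner scan) {
        StringJoiner joiner = new StringJoiner(" ");
        while (scan.hasNext()) {
            String letters = scan.next().replaceAll("[^a-zA-Z]", "");
            if (letters.isEmpty()) {
                continue; // the token contained no letters at all
            }
            joiner.add(Character.toUpperCase(letters.charAt(0)) + letters.substring(1));
        }
        return joiner.toString();
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Same location as in the question: file.txt in the working directory.
        File file = new File(System.getProperty("user.dir"), "file.txt");
        try (Scanner fileScan = new Scanner(file)) {
            System.out.println(capitalizedWords(fileScan));
        }
    }
}
```

Since a token returned by Scanner.next() never contains whitespace, [^a-zA-Z] is sufficient inside the loop; the spaces in the output come from the StringJoiner rather than from the input.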

3 answers:

Answer 0 (score: 3)

System.out.println(str.replaceAll("\\p{P}", ""));         //Removes Unicode punctuation only (symbols such as $ + < > are kept)
System.out.println(str.replaceAll("[^a-zA-Z]", ""));      //Removes spaces, special characters and digits
System.out.println(str.replaceAll("[^a-zA-Z\\s]", ""));   //Removes special characters and digits
System.out.println(str.replaceAll("\\s+", ""));           //Removes whitespace only
System.out.println(str.replaceAll("\\p{Punct}", ""));     //Removes ASCII punctuation and symbols only
System.out.println(str.replaceAll("\\W", ""));            //Removes spaces and special characters, but not digits or underscores
System.out.println(str.replaceAll("\\p{Punct}+", ""));    //Removes ASCII punctuation and symbols only
System.out.println(str.replaceAll("\\p{Punct}|\\d", "")); //Removes ASCII punctuation, symbols and digits
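\\p{P} and \\p{Punct} are easy to treat as interchangeable, but they match different sets. \\p{Punct} is the POSIX class of the 32 ASCII punctuation characters, which includes symbols such as $, +, <, =, >, ^, |, and ~. \\p{P} is the Unicode general category "Punctuation", which leaves those symbols alone but also matches non-ASCII punctuation. A small comparison on made-up input strings (the class and method names are hypothetical):

```java
public class PunctuationClasses {

    // Unicode general category "Punctuation" (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    static String stripUnicodePunctuation(String s) {
        return s.replaceAll("\\p{P}", "");
    }

    // POSIX class: exactly the ASCII characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    static String stripAsciiPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        String symbols = "Price: $5 + tax!";
        System.out.println(stripUnicodePunctuation(symbols)); // Price $5 + tax
        System.out.println(stripAsciiPunct(symbols));         // Price 5  tax

        // Inverted question mark, then "Que" with an acute e, then "?".
        String spanish = "\u00BFQu\u00E9?";
        // \p{P} removes both question marks; \p{Punct} only removes the ASCII "?".
        System.out.println(stripUnicodePunctuation(spanish));
        System.out.println(stripAsciiPunct(spanish));
    }
}
```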

Answer 1 (score: 2)

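I was able to get the output you're looking for with the code below. I wasn't sure whether you need runs of spaces collapsed into a single space, so I added a second replaceAll call that converts multiple whitespace characters into one space.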

public class RemovePunctuation {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        System.out.println(alpha);
    }
}

This outputs:

Testing testing one two three

If you want the first character of each word capitalized (as shown in your question), you can do this:

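public class Foo {
    public static void main(String[] args) {
        String input = "Testing, testing, 1, one, 2, two, 3, three.";
        String alpha = input.replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
        System.out.println(alpha);

        StringBuilder upperCaseWords = new StringBuilder();
        // trim() drops a leading space (e.g. when the input starts with "1, "),
        // which would otherwise produce an empty first word.
        String[] words = alpha.trim().split("\\s");

        for(String word : words) {
            // Guard against an empty string (input with no letters at all),
            // because charAt(0) would throw StringIndexOutOfBoundsException.
            if (word.isEmpty()) {
                continue;
            }
            String upperCase = Character.toUpperCase(word.charAt(0)) + word.substring(1) + " ";
            upperCaseWords.append(upperCase);
        }
        // trim() removes the trailing space appended after the last word.
        System.out.println(upperCaseWords.toString().trim());
    }
}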

Which outputs:

Testing testing one two three
Testing Testing One Two Three

Answer 2 (score: 1)

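I believe Java supports

\p{Punct}

for removing punctuation. Inside a Java string literal the backslash has to be escaped, so the call is word.replaceAll("\\p{Punct}", ""). Note that \p{Punct} only matches ASCII punctuation and symbols, and it does not remove digits.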
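Applied to the sample line from the question, \\p{Punct} on its own strips the commas and the period but leaves the digits in place. Adding \\d to the same character class and then collapsing whitespace yields the cleaned words. A minimal sketch (the class and method names are hypothetical):

```java
public class PunctOnly {

    // Removes only the POSIX punctuation characters; digits survive.
    static String punctOnly(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // Removes punctuation and digits, then collapses runs of whitespace
    // into a single space and trims the ends.
    static String punctAndDigits(String s) {
        return s.replaceAll("[\\p{Punct}\\d]", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String line = "Testing, testing, 1, one, 2, two, 3, three.";
        System.out.println(punctOnly(line));      // Testing testing 1 one 2 two 3 three
        System.out.println(punctAndDigits(line)); // Testing testing one two three
    }
}
```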