Removing all numeric and alphanumeric characters from a text file

Time: 2014-08-16 06:38:03

Tags: java regex string replaceall

I have 2 text files:

File 1 - this file has the format user_id tweet_id tweet_text

File 1

60730027    6298443824  thank you echo park. you've changed A LOT, but as long as I'm getting paid to make you move, I'm still with it! 2009-12-03 02:54:10
60730027    6297282530  fat Albert Einstein goin in right now over here!!!  2009-12-03 01:35:22

File 2
This file has the format genome_id name ascii_name

4045417 Southwest Indent    Southwest Indent
4045418 Southeast Point     Southeast Point     

Here is the code snippet that reads file 1:

public void readfromFile() throws FileNotFoundException {
    Scanner inputStream;
    String source=null;
     FileInputStream file = new FileInputStream("file1.txt");   
        String regex = "/[a-zA-Z ]+/";
        Scanner fileScan = new Scanner(file); 

        while(fileScan.hasNextLine()){
            word = fileScan.nextLine();
            word = word.replaceAll(regex, "").toLowerCase();
            PrintWriter outputStreamName = new PrintWriter(new FileOutputStream("temp.txt"));
            outputStreamName.printf("%s",word);
}

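My goal is first to strip out the data in the user_id, tweet_id, and genome_id fields, and then to convert uppercase letters to lowercase. However, when this code processes file1, no changes are made to the text file, and I would like to understand why. When I print the result to the console instead, I do get output.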

Expected output:

thank you echo park youve changed a lot but as long as im getting paid to make you move im still with it

fat albert einstein goin in right now over here

3 answers:

Answer 0: (score: 1)

Based on the expected output, you want to remove everything except letters, dots, and the spaces between words.

[^a-zA-Z. ]+|(?<=\d)\s*(?=\d)|(?<=\D)\s*(?=\d)|(?<=\d)\s*(?=\D)

Here is an online demo.

Or try it without lookarounds:

[^a-zA-Z. ]+|\d\s+\d|\D\s+\d|\d\s+\D

Here \s matches any whitespace character: [ \t\n\x0B\f\r]

Sample code:

String regex = "[^a-zA-Z. ]+|(?<=\\d)\\s*(?=\\d)|(?<=\\D)\\s*(?=\\d)|(?<=\\d)\\s*(?=\\D)";
// Strings are immutable, so keep the value that replaceAll returns
str = str.replaceAll(regex, "");

Output:

thank you echo park. youve changed A LOT but as long as Im getting paid to make you move Im still with it
fat Albert Einstein goin in right now over here
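
A quick way to compare the two patterns above is to run both on a short string. The sketch below is only an illustration: the class name PatternComparison and the input "12  ab" are made up for the example. It suggests the variant without lookarounds is not a drop-in replacement, because \d and \D have to consume a character on each side of the whitespace, and here the digits have already been consumed by the first alternative.

```java
class PatternComparison {
    // Pattern from the answer, with lookarounds
    static final String WITH_LOOKAROUND =
        "[^a-zA-Z. ]+|(?<=\\d)\\s*(?=\\d)|(?<=\\D)\\s*(?=\\d)|(?<=\\d)\\s*(?=\\D)";

    // Variant from the answer, without lookarounds
    static final String WITHOUT_LOOKAROUND =
        "[^a-zA-Z. ]+|\\d\\s+\\d|\\D\\s+\\d|\\d\\s+\\D";

    // Remove every match of regex from input
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "12  ab"; // hypothetical test input
        // Brackets make leading whitespace visible in the output
        System.out.println("[" + strip(input, WITH_LOOKAROUND) + "]");    // prints [ab]
        System.out.println("[" + strip(input, WITHOUT_LOOKAROUND) + "]"); // prints [  ab]
    }
}
```

On this input the lookaround version removes the spaces that follow the digits, while the plain version leaves them in place.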

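To keep apostrophes in the output, use [^a-zA-Z.' ]+ as the first alternative; otherwise I'm and you've become Im and youve.

It is better to use [a-zA-Z']+ to pick out all the words. Here is a demo.

Sample code: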

String str = "60730027    6297282530  fat Albert Einstein goin in right now over here!!!  2009-12-03 01:35:22 ";
Pattern p = Pattern.compile("[a-zA-Z']+");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.print(m.group()+" ");
}

Output:

fat Albert Einstein goin in right now over here 
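
The loop above leaves a trailing space and keeps the original case. A minimal sketch, assuming the goal is lowercase words separated by single spaces (the names WordExtractor and extractWords are hypothetical, not from the answer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Same pattern as in the answer: runs of letters and apostrophes
    private static final Pattern WORD = Pattern.compile("[a-zA-Z']+");

    // Collect every word, lowercase it, and join with single spaces
    static String extractWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            words.add(m.group().toLowerCase(Locale.ROOT));
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String str = "60730027    6297282530  fat Albert Einstein goin in right now over here!!!  2009-12-03 01:35:22 ";
        System.out.println(extractWords(str)); // prints: fat albert einstein goin in right now over here
    }
}
```

Because the pattern includes the apostrophe, words such as I'm and you've come out as i'm and you've rather than im and youve.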

Note: your loop checks for the next line, so read a whole line rather than a single token.

Change:

source = inputStream.next();

To:

source = inputStream.nextLine();
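
As for why the file never changes: in Java the slashes in "/[a-zA-Z ]+/" are literal characters, so that pattern only matches letters enclosed in slashes, and the question's loop opens a new PrintWriter on every iteration without closing or flushing it, which would explain an empty temp.txt. The following is a sketch under those assumptions, not a definitive fix. TweetCleaner, clean, and cleanFile are names invented here, UTF-8 input is assumed, and dots and apostrophes are stripped to match the expected output in the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class TweetCleaner {
    // Keep only letters and whitespace, collapse whitespace runs, lowercase
    static String clean(String line) {
        return line.replaceAll("[^a-zA-Z\\s]", "")
                   .trim()
                   .replaceAll("\\s+", " ")
                   .toLowerCase(Locale.ROOT);
    }

    // Read every line of in, clean it, and write it to out.
    // try-with-resources closes (and therefore flushes) the writer.
    static void cleanFile(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(clean(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // File names taken from the question
        cleanFile(Paths.get("file1.txt"), Paths.get("temp.txt"));
    }
}
```

The writer is opened once, outside the loop, so earlier lines are not overwritten by later ones.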

Answer 1: (score: 0)

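public void readfromFile() throws Exception
{
    StringBuilder builder = new StringBuilder();

    // try-with-resources closes the stream even if read() throws
    try (FileInputStream file = new FileInputStream("file1.txt")) {
        int ch;
        // Read one byte at a time; the cast is only safe for single-byte (ASCII) text
        while ((ch = file.read()) != -1) {
            builder.append((char) ch);
        }
    }

    // Drop everything except letters and whitespace
    System.out.println(builder.toString().replaceAll("[^a-zA-Z\\s]", ""));
}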

Scanner discards the whitespace between tokens.

For example:

Scanner scanner = new Scanner("60730027    6298443824  thank");
while(scanner.hasNext())    // next() returns whitespace-delimited tokens, so the spaces are dropped
{
    System.out.print(scanner.next());
}

Output

607300276298443824thank

So we can't use Scanner.next() here.
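
The spaces disappear because next() returns whitespace-delimited tokens. As a small sketch (the class name ScannerLineDemo and the helper firstLine are made up here), reading with nextLine() hands back the line with its internal spacing intact, so Scanner remains usable when whole lines are needed:

```java
import java.util.Scanner;

class ScannerLineDemo {
    // Return the first line of text exactly as written, or "" if there is none
    static String firstLine(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.hasNextLine() ? scanner.nextLine() : "";
        }
    }

    public static void main(String[] args) {
        // Spacing between the fields is preserved
        System.out.println(firstLine("60730027    6298443824  thank"));
    }
}
```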

Answer 2: (score: 0)

Try this:

s = s.replaceAll("\\d+\\s+\\d+\\s+", "").replaceAll(" +\\S+ \\S+$", "");
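
A sketch of how this chain behaves, assuming the fields are separated by spaces and each line ends with a date and a time. The class name ColumnStripper and the extra trim(), punctuation removal, and lowercasing steps are additions for illustration, not part of the answer. On its own the chain only drops the ID and timestamp columns, and the trim() matters because $ will not anchor on the timestamp if the line has trailing whitespace.

```java
import java.util.Locale;

class ColumnStripper {
    // The answer's chain, with trim() added so that $ can anchor on the timestamp
    static String stripColumns(String s) {
        return s.trim()
                .replaceAll("\\d+\\s+\\d+\\s+", "")
                .replaceAll(" +\\S+ \\S+$", "");
    }

    // Extra step (assumption): drop punctuation and lowercase,
    // to match the expected output in the question
    static String normalize(String s) {
        return stripColumns(s)
                .replaceAll("[^a-zA-Z ]", "")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String s = "60730027    6297282530  fat Albert Einstein goin in right now over here!!!  2009-12-03 01:35:22 ";
        System.out.println(stripColumns(s)); // prints: fat Albert Einstein goin in right now over here!!!
        System.out.println(normalize(s));    // prints: fat albert einstein goin in right now over here
    }
}
```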