Remove all non-alphanumeric characters in Java

Posted: 2015-10-14 16:53:51

Tags: java regex

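This is a program that shows how many times each word occurs in a text file. The problem is that it also picks up characters such as ? and , but I only want it to pick up letters. Here is part of the output:

{"1"=1, "Cheers"=1, "Fanny"=1, "I=1", biscuits"=1, "chairman")=1, "cheeahz"=1, "crisps"=1, "jumpers"=1, ?=20, work:=1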
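The stray entries such as ?=20 and work:=1 come from the tokenizer: StringTokenizer with no delimiter argument splits only on whitespace (space, tab, newline, carriage return, form feed), so punctuation stays attached to the words. A minimal sketch showing this; the class name TokenizerDemo and the sample line are hypothetical, chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Collects the tokens StringTokenizer produces with its default
    // delimiters (whitespace only), exactly as the question's loop does.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer parser = new StringTokenizer(line);
        while (parser.hasMoreTokens()) {
            result.add(parser.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample line: punctuation remains part of each token.
        System.out.println(tokens("Cheers, Fanny? Back to work: crisps ?"));
        // prints [Cheers,, Fanny?, Back, to, work:, crisps, ?]
    }
}
```

Every token, punctuation included, becomes a map key unless it is cleaned before counting.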

import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.util.TreeMap;
import java.util.StringTokenizer;

public class Unigrammodel {

public static void main(String [] args){

    //Creating BufferedReader to accept the file name from the user
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

    String fileName = null;
    System.out.print("Please enter the file name with path: ");
    try{
        fileName = (String) br.readLine();

        //Creating the BufferedReader to read the file
        File textFile = new File(fileName);
        BufferedReader input = new BufferedReader(new FileReader(textFile));

        //Creating the Map to store the words and their occurrences
        TreeMap<String, Integer> frequencyMap = new TreeMap<String, Integer>();
        String currentLine = null;

        //Reading line by line from the text file
        while((currentLine = input.readLine()) != null){

            //Parsing the words from each line
            StringTokenizer parser = new StringTokenizer(currentLine); 
            while(parser.hasMoreTokens()){
                String currentWord = parser.nextToken();

                //remove all non-alphanumeric from this word
                currentWord.replaceAll(("[^A-Za-z0-9 ]"), "");

                Integer frequency = frequencyMap.get(currentWord); 
                if(frequency == null){
                    frequency = 0;                      
                }
                //Putting each word and its occurrence into Map 
                frequencyMap.put(currentWord, frequency + 1);
            }

        }

        //Displaying the Result

        System.out.println(frequencyMap +"\n");

    }catch(IOException ie){
        ie.printStackTrace();
        System.err.println("Your entered path is wrong");
    }       

}

}

1 Answer:

Answer 0 (score: 1)

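Strings are immutable, so replaceAll does not change currentWord; it returns a new string. You need to assign that result to a variable before adding it to the map:

String wordCleaned = currentWord.replaceAll("[^A-Za-z0-9]", "");
...
frequencyMap.put(wordCleaned, frequency + 1);

Use wordCleaned for the frequencyMap.get(...) lookup as well, so that the count is read and written under the same key.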
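Putting the answer's fix into the counting loop, a minimal sketch might look like the following. Assumptions, all hypothetical: the class WordCounter and method countWords are illustrative names, the lines come from a String array instead of a file so the example is self-contained, and Map.merge (Java 8+) is swapped in for the question's get/null-check/put sequence. It also skips tokens that are empty after cleaning, since a token made only of punctuation, such as ?, would otherwise be counted under the empty-string key.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCounter {
    // Counts cleaned words across the given lines. Every character outside
    // A-Z, a-z and 0-9 is removed from a token before it is counted.
    static Map<String, Integer> countWords(String... lines) {
        Map<String, Integer> frequencyMap = new TreeMap<>();
        for (String currentLine : lines) {
            StringTokenizer parser = new StringTokenizer(currentLine);
            while (parser.hasMoreTokens()) {
                // replaceAll returns a new String and leaves the token unchanged,
                // so the result must be stored in a variable.
                String wordCleaned = parser.nextToken().replaceAll("[^A-Za-z0-9]", "");
                // A punctuation-only token (e.g. "?") becomes ""; skip it.
                if (wordCleaned.isEmpty()) {
                    continue;
                }
                // merge stores 1 for a new word, otherwise adds 1 to the old count.
                frequencyMap.merge(wordCleaned, 1, Integer::sum);
            }
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        String original = "work:";
        String cleaned = original.replaceAll("[^A-Za-z0-9]", "");
        System.out.println(original + " -> " + cleaned); // prints work: -> work
        System.out.println(countWords("Cheers, Fanny?", "work: ? work"));
        // prints {Cheers=1, Fanny=1, work=2}
    }
}
```

The keys remain case-sensitive ("Cheers" and "cheers" are counted separately); if the question's "only letters" is meant strictly, the character class can be narrowed to [^A-Za-z].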