How to split lines from a BufferedReader into words

Time: 2017-07-31 20:55:55

Tags: java split bufferedreader

I need help writing code that splits each line of a file into words so that each word can then be spell-checked.

  public static void main(String [] args) throws IOException {
    Stem myStem = new Stem();

    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\Users\\lamrh\\IdeaProjects\\untitled1\\src\\bigON\\data.txt")));

    //String currentWord = String.valueOf(bufferedReader.readLine());
    Scanner scanner = new Scanner(bufferedReader.readLine());
    //byte[] data = new byte [currentWord.length()];
    String[] splitLines;
    //splitLines = splitLines.split(" ");


    String line;
    while((line = bufferedReader.readLine()) !=null  ){
        //splitLines = line.split(" ");
        String currentWord1 = formatWordGhizou ( line);
        System.out.println(""+ line+""+ ":"+ currentWord1);

    }
    bufferedReader.close();


}

The output looks like this:

سْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم

سْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم ِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ:سماللهالرحمنالرحيم

It should process the text word by word, not a whole line at a time. Any help is appreciated, thanks.

2 answers:

Answer 0: (score: 0)

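Inside the while loop, take each line string, split it with a regular expression to populate the String array splitLines, and then iterate over splitLines to send each element to standard output (adapted from a helpful tutorial at this link).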
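A minimal sketch of that approach, under a few assumptions: the class name `SplitLinesDemo` and the helper `readWords` are made up for illustration, a `StringReader` stands in for the question's data.txt, and `"\\s+"` (one or more whitespace characters) is one reasonable choice of regular expression, not necessarily the one the tutorial used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SplitLinesDemo {

    // Hypothetical helper: reads every line and returns the words in order.
    static List<String> readWords(BufferedReader reader) throws IOException {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // "\\s+" matches one or more whitespace characters,
            // so repeated spaces do not produce empty words.
            String[] splitLines = line.trim().split("\\s+");
            for (String word : splitLines) {
                // A blank line splits into a single empty string; skip it.
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // StringReader is an assumption standing in for the real input file.
        String sample = "one two  three\nfour\n\nfive six";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (String word : readWords(reader)) {
                System.out.println(word);
            }
        }
    }
}
```

Note that the question's code passes the result of the first `bufferedReader.readLine()` call to a `Scanner` that is never used, so the first line of the file is consumed before the loop starts. Dropping that statement lets the loop see every line.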


Answer 1: (score: -1)

// format the word by removing any punctuation, diacritics and non-letter characters
private static String formatWordGhizou ( String currentWord )
{
    StringBuffer modifiedWord = new StringBuffer ( );


    // remove any diacritics (short vowels)
    if ( removeDiacritics( currentWord, modifiedWord ) )
    {
        currentWord = modifiedWord.toString ( );
    }

    // remove any punctuation from the word
    if ( removePunctuation( currentWord, modifiedWord ) )
    {
        currentWord = modifiedWord.toString ( ) ;
    }

    // there could also be characters that aren't letters which should be removed
    if ( removeNonLetter ( currentWord, modifiedWord ) )
    {
        currentWord = modifiedWord.toString ( );
    }

    // check for strange words
    if( !checkStrangeWords ( currentWord ) )
        // check for stopwords
        if( !checkStopwords ( currentWord ) )
            currentWord = stemWord ( currentWord );

    return currentWord;
}
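
Since `formatWordGhizou` takes a single word, the loop in the question could call it once per token instead of once per line. The sketch below illustrates that idea only: `PerWordDemo`, `formatLine` and `formatWord` are hypothetical names, and `formatWord` is a stand-in that just strips non-letter characters, because the real helpers (`removeDiacritics`, `stemWord` and so on) are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class PerWordDemo {

    // Hypothetical stand-in for formatWordGhizou: keeps only letters.
    // "\\P{L}" matches any character that is not a Unicode letter.
    static String formatWord(String word) {
        return word.replaceAll("\\P{L}", "");
    }

    // Splits one line on whitespace and formats each word separately,
    // producing "original:formatted" pairs like the question's output.
    static List<String> formatLine(String line) {
        List<String> results = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank line yields one empty token
            }
            results.add(word + ":" + formatWord(word));
        }
        return results;
    }

    public static void main(String[] args) {
        for (String pair : formatLine("Hello, world!")) {
            System.out.println(pair);
        }
    }
}
```

In the question's `main`, the same effect would come from looping over `line.split("\\s+")` inside the while loop and passing each element to `formatWordGhizou`.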

//-----------------