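How to remove duplicate words from an array in Java

Time: 2013-04-01 19:21:52

Tags: java arrays duplicates

I am reading a file and then calling a String[] method that splits each line into individual words, adds each word to an array of unique words (no duplicates), and returns that array.

I can't figure out how to print each word only once, but this is what I have so far.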

static public String[ ] sortUnique( String [ ] unique, int count)
{
    String temp;
    for(int i = 1; i < count; i++) {
        temp = unique[i].replaceAll("([a-z]+)[,.?]*", "$1");
        int j;
        for(j = i - 1; j>= 0 && (temp.compareToIgnoreCase(unique[j]) < 0);j--) {
            unique[j+1] = unique[j];
        }
        unique[j+1] = temp;
    }
    return unique;
}

This is the data file.

    Is this a dagger which I see before me,
    The handle toward my hand? Come, let me clutch thee.
    I have thee not, and yet I see thee still.
    Art thou not, fatal vision, sensible
    To feeling as to sight? Or art thou but
    A dagger of the mind, a false creation,

Any help is greatly appreciated!

1 answer:

Answer 0: (score: 4)

To read the file and remove duplicate words:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.StreamTokenizer;
import java.util.Set;
import java.util.TreeSet;

public class WordReader {

   public static void main( String[] args ) throws Exception {
      Set< String > words = new TreeSet<>();                // {sorted,unique}
      // try-with-resources closes the reader even if reading fails
      try( BufferedReader br =
              new BufferedReader(
                 new FileReader( "F:/docs/Notes/Notes.txt" ))) {
         StreamTokenizer st = new StreamTokenizer( br );
         while( st.nextToken() != StreamTokenizer.TT_EOF ) {
            // keep word tokens only; numbers and punctuation are skipped
            if( st.ttype == StreamTokenizer.TT_WORD ) {
               words.add( st.sval );
            }
         }
      }
      System.out.println( words );
   }
}
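
Note that a plain TreeSet compares strings case-sensitively, so "The" and "the" would both be kept. If you need a method that takes lines and returns a String[] of unique words, as described in the question, something along these lines might work. This is only a sketch: the class name UniqueWords, the method name uniqueWords, and the choice to treat anything other than letters and apostrophes as a separator are my assumptions, not part of the original code.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class UniqueWords {

    // Hypothetical helper: splits every line into words and returns
    // them sorted and without duplicates.
    static String[] uniqueWords(String[] lines) {
        // CASE_INSENSITIVE_ORDER makes "The" and "the" count as one word;
        // the first spelling encountered is the one that is kept.
        Set<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            // Assumption: any run of characters that are not letters or
            // apostrophes separates two words.
            for (String word : line.split("[^\\p{L}']+")) {
                // split() yields an empty first element when the line
                // starts with a separator (e.g. leading spaces).
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] lines = {
            "Is this a dagger which I see before me,",
            "The handle toward my hand? Come, let me clutch thee.",
            "I have thee not, and yet I see thee still."
        };
        System.out.println(Arrays.toString(uniqueWords(lines)));
    }
}
```

The set does the duplicate check on every add, so no separate pass over the array is needed, and because a TreeSet keeps its elements ordered, the insertion sort from the question becomes unnecessary as well.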