Find the most common word in a text file

Time: 2018-06-26 10:19:23

Tags: java performance

I have created a console application that displays the most frequently used word in a text file. Please see the code below:

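import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main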
{

    public static void main(String[] args)
    {

        readTextFile();

    }


    private static void readTextFile()
    {
        final String path = "C://Users//Geffrey//IdeaProjects//test.txt";

        File file = new File(path);
        Map<String, Integer> wordMap = new HashMap<>();

        // try-with-resources closes the reader automatically, and close() is
        // never called on a null reader if the file cannot be opened.
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file)))
        {
            String inputLine;
            while ((inputLine = bufferedReader.readLine()) != null)
            {
                String[] words = inputLine.split("[.,;:!?(){}— \\s]");

                for (String word : words)
                {
                    String key = word.toLowerCase(); // remove .toLowerCase() for a case-sensitive result
                    if (key.length() > 0)
                    {
                        // Increment the count, inserting 1 if the word is new.
                        wordMap.merge(key, 1, Integer::sum);
                    }
                }
            }

            List<WordComparable> topOccurrence = findMaxOccurance(wordMap, 1);
            if (topOccurrence.isEmpty())
            {
                System.out.println("The file contains no words");
            } else
            {
                System.out.println("Most Frequent word: " + topOccurrence.get(0).wordFromFile
                        + " occurred " + topOccurrence.get(0).numberOfOccurrence + " times");
            }
        } catch (FileNotFoundException e)
        {
            System.out.println("File not found: " + path);
        } catch (IOException error)
        {
            System.out.println("Invalid File");
        }
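    }

    public static List<WordComparable> findMaxOccurance(Map<String, Integer> map, int n)
    {
        List<WordComparable> list = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : map.entrySet())
            list.add(new WordComparable(entry.getKey(), entry.getValue()));

        // Sort by descending count, then alphabetically (see WordComparable.compareTo).
        Collections.sort(list);
        // Keep only the n most frequent entries.
        return list.subList(0, Math.min(n, list.size()));
    }
}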

The WordComparable class:

public class WordComparable implements Comparable<WordComparable>
{
    public String wordFromFile;
    public int numberOfOccurrence;

    public WordComparable(String wordFromFile, int numberOfOccurrence)
    {
        super();
        this.wordFromFile = wordFromFile;
        this.numberOfOccurrence = numberOfOccurrence;
    }

    @Override
    public int compareTo(WordComparable arg0)
    {
        int wordCompare = Integer.compare(arg0.numberOfOccurrence, this.numberOfOccurrence);
        return wordCompare != 0 ? wordCompare : wordFromFile.compareTo(arg0.wordFromFile);

    }

    @Override
    public int hashCode()
    {
        final int uniqueNumber = 19;
        int wordResult = 9;
        wordResult = uniqueNumber * wordResult + numberOfOccurrence;
        wordResult = uniqueNumber * wordResult + ((wordFromFile == null) ? 0 : wordFromFile.hashCode());
        return wordResult;
    }

    @Override
    public boolean equals(Object obj)
    {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        WordComparable other = (WordComparable) obj;
        if (numberOfOccurrence != other.numberOfOccurrence)
            return false;
        if (wordFromFile == null)
        {
            if (other.wordFromFile != null)
                return false;
        } else if (!wordFromFile.equals(other.wordFromFile))
            return false;
        return true;
    }
}

My question is whether my solution is the most efficient way to solve this problem, and if not, what other changes I could make to improve the code.

1 answer:

Answer 0 (score: 0)

You could consider creating a max priority queue (a classic max-heap) instead of a HashMap.
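A minimal sketch of the max-heap idea using java.util.PriorityQueue, assuming the word counts have already been collected into a HashMap as in the question (a heap orders items but does not count them, so some map is still needed). The class name MaxHeapTopWords and the method topWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class MaxHeapTopWords
{
    // Hypothetical helper: returns the n most frequent words, highest count first.
    // Assumes the counts were already built, e.g. with a HashMap.
    static List<String> topWords(Map<String, Integer> counts, int n)
    {
        // Max-heap ordered by count: the entry with the largest value is at the head.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue().reversed());
        heap.addAll(counts.entrySet()); // M inserts, O(log M) each

        List<String> result = new ArrayList<>();
        while (result.size() < n && !heap.isEmpty())
        {
            result.add(heap.poll().getKey()); // each poll is O(log M)
        }
        return result;
    }

    public static void main(String[] args)
    {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : "the cat and the dog and the bird".split(" "))
        {
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(topWords(counts, 2)); // prints [the, and]
    }
}
```

Adding the entries one by one costs O(M log M) for M distinct words, the same order as the sort in findMaxOccurance; the heap pays off mainly when only the first few of many entries are needed.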

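If you use a max-heap, the most common word is always at the top, so once the heap has been built it can be read (peeked) in O(1) time.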
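The O(1) figure covers only reading the top of an already-built heap; the words still have to be counted first, which is O(N) for N words in the file, and building the heap has its own cost. When only the single most frequent word is needed, one linear pass over the map is enough. The sketch below is an alternative to the heap, not part of the answer; the class name MostFrequentWord and the method mostFrequent are hypothetical, and ties are broken alphabetically to match WordComparable.compareTo.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class MostFrequentWord
{
    // Higher count wins; on equal counts the alphabetically smaller word wins.
    private static final Comparator<Map.Entry<String, Integer>> BY_COUNT_THEN_WORD =
            Map.Entry.<String, Integer>comparingByValue()
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey().reversed());

    // Hypothetical helper: returns the most frequent word, or null if the map is empty.
    // One pass over the M distinct words: O(M), with no sorting and no heap.
    static String mostFrequent(Map<String, Integer> counts)
    {
        if (counts.isEmpty())
        {
            return null;
        }
        return Collections.max(counts.entrySet(), BY_COUNT_THEN_WORD).getKey();
    }

    public static void main(String[] args)
    {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : "b a b a c".split(" "))
        {
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println(mostFrequent(counts)); // prints a
    }
}
```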