Recursive binary search for a spell checker misses some words

Asked: 2015-02-13 04:34:21

Tags: java recursion binary-search-tree spell-checking

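I have a dictionary file containing a list of words, which I read into an ArrayList, sArray. I also have a book; I use a string parser to extract each word from the book and pass it to a binary search method, bSearch. bSearch uses a recursive binary search to determine whether the key is found in sArray, the list holding the dictionary. If the word is not found, it prints that the word is possibly misspelled.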

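My problem is that the output includes words that I know for a fact are in my dictionary array. I have confirmed that these words are read in correctly, so the problem comes down to how bSearch searches sArray. I am not sure what is wrong with the code. Some examples of the false positives are listed below.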

Here is a link to a paste dump of my dictionary; you should be able to search it for the words below and find them. https://paste.ee/p/wp3qh

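Example output:

The resulting output still contains false positives:

ebracteate is possibly mispelled

Phaca is possibly mispelled

holmberry is possibly mispelled

sraddha is possibly mispelled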

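import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Program2 {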

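private int mid;
// Counters used by bSearch and print below; they default to 0.
private int rec;
private int correctWords;
private int incorrectWords;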

public Program2() {
    mid = 0;
}

public static void main(String[] args) throws FileNotFoundException, IOException {
    File inf = new File("dictonary.txt");
    ArrayList<String> sArray = new ArrayList<>();
    Program2 a = new Program2();
    a.readDictonary(sArray);

    Collections.sort(sArray, String.CASE_INSENSITIVE_ORDER);

    int correctRec = 0;
    int incorrectRec = 0;
    int correctW = 0;
    int incorrectW = 0;

    FileInputStream infO = new FileInputStream(new File("oliver.txt"));
    char let;
    String str = "";
    int n = 0;
    while ((n = infO.read()) != -1) {
        let = (char) n;

        if (Character.isLetter(let)) {
            str += Character.toLowerCase(let);
        }

        if ((Character.isWhitespace(let) || let == '-') && !str.isEmpty()) {

            // Write code to insert str in to your tree here
            if (a.bSearch(sArray, str, 0, sArray.size()) >= 0) {
                correctRec++;
            } else {
                incorrectRec++;
            }

            str = "";
        }
    }
    infO.close();
    a.print(correctRec, incorrectRec);
}

public void print(int correctRec, int incorrectRec) {
    System.out.println("Out of total words " + (incorrectWords + correctWords));
    System.out.println("Correct " + correctWords);
    System.out.println("Incorrect " + incorrectWords);
    System.out.println("Total number of recursive steps is " + (correctRec + incorrectRec));
    System.out.println("The average number of comparisons for a word found = " + correctRec / correctWords);
    System.out.println("The average number of comparisons for a word not found = " + incorrectRec / incorrectWords);
}

public void readDictonary(ArrayList<String> sArray) {
    try {
        File f = new File("dictionary.txt");
        Scanner inf = new Scanner(f);
        while (inf.hasNext()) {
            sArray.add(inf.nextLine());
        }
    } catch (FileNotFoundException ex) {
        System.out.println("The dictonary file was not found");
    }
}

public int bSearch(ArrayList<String> sArray, String key, int lowIndex, int highIndex) {
    if (lowIndex > highIndex) {
        System.out.println(sArray.get(mid) + " is possibly mispelled");
        incorrectWords++;
        return rec * -1;
    }

    mid = (lowIndex + highIndex) / 2;

    if (sArray.get(mid).compareToIgnoreCase(key) == 0) {
        correctWords++;

        return rec;
    } else if (sArray.get(mid).compareToIgnoreCase(key) > 0) {
        rec++;
        return bSearch(sArray, key, lowIndex, mid - 1);
    } else {
        rec++;
        return bSearch(sArray, key, mid + 1, highIndex);
    }
}
}

1 Answer:

Answer 0 (score: 0)

The problem is probably not in your algorithm, which looks fine, but in your error message:

System.out.println(sArray.get(mid) + " is possibly mispelled");

You meant:

System.out.println(key + " is possibly mispelled");

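My only concern with your binary search is that your highIndex appears to be inclusive, but when you call the bSearch routine you pass sArray.size(), which is exclusive. I suspect it will crash if you search for a word that sorts lexicographically after every word in the dictionary, because mid can then reach sArray.size() and sArray.get(mid) goes out of bounds. When you call the binary search, you need to pass sArray.size() - 1 as highIndex.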
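To make both points concrete, here is a minimal sketch of a recursive search with inclusive bounds that reports the looked-up key rather than a dictionary entry. The class name InclusiveBinarySearch, the two-argument overload, and the four-word sample list are assumptions made for illustration; they are not part of the original program, and the recursion counters are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
final class InclusiveBinarySearch {

    // Recursive binary search over words[low..high], both bounds inclusive.
    // Returns the index of key, or -1 if it is absent.
    // Assumes words is sorted with String.CASE_INSENSITIVE_ORDER.
    static int search(List<String> words, String key, int low, int high) {
        if (low > high) {
            return -1; // empty range: key is not in the list
        }
        int mid = low + (high - low) / 2; // avoids int overflow on very large lists
        int cmp = words.get(mid).compareToIgnoreCase(key);
        if (cmp == 0) {
            return mid;
        } else if (cmp > 0) {
            return search(words, key, low, mid - 1);
        } else {
            return search(words, key, mid + 1, high);
        }
    }

    // Convenience overload that supplies the inclusive upper bound.
    static int search(List<String> words, String key) {
        return search(words, key, 0, words.size() - 1);
    }

    public static void main(String[] args) {
        // Tiny stand-in for the real dictionary file.
        List<String> words = new ArrayList<>(
                Arrays.asList("holmberry", "Phaca", "ebracteate", "sraddha"));
        words.sort(String.CASE_INSENSITIVE_ORDER);

        for (String key : new String[] {"phaca", "zzz", "sraddha"}) {
            if (search(words, key) < 0) {
                // Report the word that was looked up, not words.get(mid).
                System.out.println(key + " is possibly misspelled");
            }
        }
    }
}
```

With the inclusive upper bound, a key such as "zzz" that sorts after every entry ends the recursion with low > high instead of reading past the end of the list. Printing key also explains the apparent false positives: on a miss, sArray.get(mid) is whichever dictionary word the search last inspected, so the message names a word that really is in the dictionary.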