Question

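In this code:

I read the contents of the file ~/usr/share/dict/word and store them in an array.
Then I run a binary search on that array. The problem appears after I pass the array to the binary search function on line 62 and try to compare it with the key inside binary_search(string* dictionary, string key).
I found that it compares the key with this unknown string, "��tudes", and I don't know why.
I am sure that the array contains the correct data.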

Code:

#include <stdio.h>
#include <cs50.h>
#include <string.h>

#define MAX 99171

// Prototype //
int binary_search(string*, string);

int main(int argc, string argv[])
{
    // Attributes // 
    string dictionary[MAX];
    FILE* dictionaryFile = fopen("words", "r");
    char output[256];
    string key = argv[1];

    // Check if their is a problem while reading the file //
    if (dictionaryFile == NULL)
    {
        // If everything got fouled up then close the file // 
        fclose(dictionaryFile);
        printf("couldn't read the file!!!\n");
        return 1;
    }

    // storing the information into an array to make it easy to read //
    for(int i = 0; i < MAX; i++)
    { 
        fgets(output, sizeof(output), dictionaryFile); 
        dictionary[i] = output;
    }

    // Binary Search a word //
    if(binary_search(dictionary, key) == 1)
    {
        printf("word was found !!!\n");
    }
    else if(binary_search == 0)
    {
        printf("word was not found !!!\n");
    }

    // If Everything goes just fine close the file //
    fclose(dictionaryFile);
    return 0;
}


// implementing prototype //

/**
    @arag dictionary 
        a string of english words 

    @arg key 
        a key we looking for

    @return 
        0 if didn't find the key and 1 otherwise
*/
int binary_search(string* dictionary, string key)
{
    // pointer to the start and the end of the array //
    int start = 0;
    int end = MAX - 1;
    int mid;

    // while end is greater than the start //
    while (end > start)
    {
        // Get The Middle Element //
        mid = (start + end) / 2;
        printf("%s\n", dictionary[mid]);

        // Check if the middle elemenet //
        if (strcmp(key, dictionary[mid]) == 0)
        {
            return 1;
        }

        // Check the left half //
        else if(strcmp(key, dictionary[mid]) < 0)
        {
            end = mid - 1;
        }

        // Check the right half //
        else if (strcmp(key, dictionary[mid]) > 0)
        {
            start = mid + 1;
        }
    }
    // didn't find the key //
    return 0;

}

Note: the cs50.h library was made by Harvard as training wheels for beginners like me, and I use it in my code. Here is a link to its reference.

Answer 1

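"The cs50.h library was made by Harvard as training wheels for beginners."

If so, these training wheels are mounted upside down and don't touch the ground. I can't tell from your link, but I think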

typedef char *string;

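is part of the cs50 suite. But there are no strings in C; the term is used loosely to mean an array of char terminated by the null character '\0'.

The above definition of string has led you to believe that a string is a proper type whose memory is handled automatically. It isn't. There is exactly one place in your program that holds a string, namely the array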

char output[256];

The "strings" in your dictionary are just pointers; they must point to existing char arrays or be NULL. By assigning

dictionary[i] = output;

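you make all strings in the dictionary equal to the temporary buffer output. That buffer is overwritten with every line you read, so it contains only the last line you have read, probably "zulu".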

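You can confirm this by printing the dictionary after you have read it. To see the effect, print it in a separate loop, not in the same loop in which you read it.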
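The effect described above can be reproduced in isolation. The following is a minimal sketch, not the asker's program: the helper names fill_by_pointer and print_slots, the three sample words, and the WORDS constant are all made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WORDS 3  /* hypothetical, tiny stand-in for MAX */

/* Fills `slots` the way the question's loop does: every slot receives the
   address of the same buffer, so no text is ever copied. */
void fill_by_pointer(char *slots[WORDS], char *buffer, size_t size)
{
    const char *input[WORDS] = { "apple", "mango", "zulu" };

    assert(buffer != NULL && size > 0);
    for (int i = 0; i < WORDS; i++)
    {
        snprintf(buffer, size, "%s", input[i]);  /* overwrites the buffer */
        slots[i] = buffer;                       /* stores only the address */
    }
}

/* Prints the slots in a separate loop, after all "reading" is done. */
void print_slots(char *slots[WORDS])
{
    for (int i = 0; i < WORDS; i++)
    {
        printf("%d: %s\n", i, slots[i]);
    }
}
```

Calling fill_by_pointer and then print_slots prints zulu three times, because all three slots hold the address of the same buffer.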

You can fix this by declaring the dictionary as an array of arrays of char instead of an array of pointers:

char dictionary[MAX][LEN];

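where LEN is a suitable maximum length for a word, say 24. (The problem here might be the amount of memory: MAX * LEN bytes might not fit on the stack. In that case, you must allocate the memory on the heap with malloc. I'm not going to open that can of worms here. If you get a segmentation violation immediately, try reducing MAX, at the cost of reading only part of the dictionary.)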
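For readers who do run out of stack space, here is one possible way to put the same array of arrays on the heap. This is only a sketch under assumptions: the typedef dict_word and the helper alloc_dictionary are hypothetical names, LEN is the 24 suggested above, and calloc is used in place of malloc so that every word starts out as an empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX 99171  /* capacity taken from the question */
#define LEN 24     /* assumed maximum word length, including the '\0' */

/* One dictionary entry: a fixed-size char array (hypothetical name). */
typedef char dict_word[LEN];

/* Allocates `capacity` zero-filled entries in one heap block.
   Returns NULL on failure; the caller releases the block with free(). */
dict_word *alloc_dictionary(size_t capacity)
{
    assert(capacity > 0);
    return calloc(capacity, sizeof(dict_word));
}
```

The result can be indexed exactly like the stack version, for example dictionary[i][0], and can be passed to a function that expects char dictionary[][LEN].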

When you read the words, you must copy the contents:

fgets(output, sizeof(output), dictionaryFile); 
strncpy(dictionary[i], output, sizeof(dictionary[i]));

Or, even better, read the next word directly into the dictionary:

fgets(dictionary[i], sizeof(dictionary[i]), dictionaryFile);
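One caveat if the copying variant is used: strncpy does not write a terminating '\0' when the source is at least as long as the limit. A small sketch of a copy that always terminates, using a hypothetical helper copy_word and assuming LEN is 24:

```c
#include <assert.h>
#include <string.h>

#define LEN 24  /* assumed maximum word length, including the '\0' */

/* Copies at most LEN - 1 characters of `src` into `dest` and always
   terminates the result, truncating words that are too long. */
void copy_word(char dest[LEN], const char *src)
{
    assert(dest != NULL && src != NULL);
    strncpy(dest, src, LEN - 1);
    dest[LEN - 1] = '\0';
}
```

Reading straight into dictionary[i] with fgets avoids this issue, because fgets always terminates what it stores.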

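Unfortunately, fgets keeps the newline at the end, so it stores "word\n" instead of "word". You must remove the newline, or the strings won't match the input, which comes from the command line via argv and has no trailing newline.

There are several ways to get rid of the unwanted newline. An easy one is to tokenise the string with the newline as the delimiter: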

strtok(dictionary[i], "\n");
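The strtok call works for ordinary words, but it leaves a line that consists only of "\n" unchanged, because strtok finds no token in it. A commonly used alternative, shown here as a sketch with a hypothetical helper name strip_newline, cuts the string at the first newline with strcspn:

```c
#include <assert.h>
#include <string.h>

/* Cuts `line` at its first '\n', in place. If there is no newline,
   strcspn returns the string length and the existing '\0' is rewritten. */
void strip_newline(char *line)
{
    assert(line != NULL);
    line[strcspn(line, "\n")] = '\0';
}
```

After the call, "word\n" becomes "word", an empty line becomes an empty string, and a string without a newline stays as it is.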

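Another problem is that with the new definition of dictionary, the signature of your binary_search is wrong. You no longer have an array of pointers to char; you have an array of arrays of 24 (or so, but a fixed number of) chars. Change it to: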

int binary_search(char dictionary[][LEN], const char *key)

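In C, if you have arrays of arrays (or even arrays of arrays of arrays), all dimensions except the topmost one must be known, so that the compiler can lay out the memory.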
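To make the layout argument concrete: the compiler needs LEN to know how many bytes separate one row from the next. A tiny sketch, with a hypothetical function row_stride and LEN assumed to be 24:

```c
#include <assert.h>
#include <stddef.h>

#define LEN 24  /* assumed maximum word length, including the '\0' */

/* Returns the distance in bytes between two consecutive rows. The
   parameter is really a pointer to char[LEN], so `dictionary + 1`
   advances by exactly LEN bytes. */
ptrdiff_t row_stride(char dictionary[][LEN])
{
    assert(dictionary != NULL);
    return (char *)(dictionary + 1) - (char *)dictionary;
}
```

The first dimension is not needed for this calculation, which is why it may be left empty in the parameter list.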

There are other (rather minor) problems:

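If the file can't be opened, you try to fclose it. When the file is NULL, you don't have an open file to close; just exit.
You should enforce that there is at least one argument, or you might carry on with a null key, which leads to undefined behaviour (i.e. very likely a crash) when you try to compare it.
When you read the words, don't rely on a hard-coded word count. You don't know how many words are in the file. Check the return value of fgets; it returns NULL when the file has run out. MAX is a good way to provide an estimate of the number of words, but you should keep the actual number of words read in a variable. Make sure that you don't access more words than you have read, and make sure that you don't write beyond the memory you have allocated, i.e. don't read more than MAX words.
If you don't have a hard-coded word count, you should make that count a parameter of your binary_search function.
In the "not found" branch, your test is else if(binary_search == 0). First, the else already means that the binary search didn't return 1 (that is the condition the else refers to), and the binary search can return only 0 and 1, so there's no need for another condition. Second, binary_search is just the address of the function, not a result; a function's address is never null, so the condition as written is always false and the "not found" message is never printed.
The same goes for the strcmp calls in the binary search function: you make three comparisons. The outcomes you check are mutually exclusive, so the last condition can be just else. (Because strcmp does a character-by-character comparison every time, it pays to call strcmp only once per word and store the result.)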
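Several of the points above can be combined into one sketch: the reader counts the words it actually stores, and the search takes that count and calls strcmp once per step. The function name load_words, the count parameter, and LEN of 24 are assumptions for illustration. The search also uses a half-open range, because the question's loop condition end > start stops before the last remaining candidate is examined when start equals end. The sketch assumes that the file is sorted in strcmp order and that no line is longer than LEN - 2 characters; a longer line would be split across entries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LEN 24  /* assumed maximum word length, including the '\0' */

/* Reads up to `capacity` lines from `file` into `dictionary`, strips the
   trailing newline, and returns the number of words actually read. */
size_t load_words(FILE *file, char dictionary[][LEN], size_t capacity)
{
    size_t count = 0;

    assert(file != NULL);
    while (count < capacity && fgets(dictionary[count], LEN, file) != NULL)
    {
        dictionary[count][strcspn(dictionary[count], "\n")] = '\0';
        count++;
    }
    return count;
}

/* Returns 1 if `key` is among the first `count` sorted words, else 0. */
int binary_search(char dictionary[][LEN], size_t count, const char *key)
{
    size_t lo = 0;
    size_t hi = count;  /* search the half-open range [lo, hi) */

    assert(key != NULL);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, dictionary[mid]);  /* one comparison per step */

        if (cmp == 0)
        {
            return 1;
        }
        else if (cmp < 0)
        {
            hi = mid;       /* continue in the left half */
        }
        else
        {
            lo = mid + 1;   /* continue in the right half */
        }
    }
    return 0;
}
```

At the call site, the result can then be tested with a plain if (binary_search(dictionary, count, key)) followed by an unconditional else.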

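The string data type from the cs50 header is meant to provide an easy way to read strings without having to care about memory. Once you start to create more complex (that is, real-life) data structures, it is better to use char arrays and pointers. There is no way around that anyway, and you can see what each piece of data really is.

Sorry that my answer looks like a laundry list of errors. C's string handling isn't easy for beginners, especially if you already have experience with higher-level languages. The good thing is that once you have understood C strings, you already know a lot about how things are done in C in general.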
