Question

I'm currently trying to implement the Horspool string-matching algorithm in C. It works on small data sets, but for some reason it fails on large ones.

Here is my tableCreate function:

void tableCreate(char* string, int table[]) {
    int i = 0;
    int length = strlen(string);

    for (i = 0; i < 500; ++i) {
        table[i] = length;
    }

    for (i = 0; i < length - 1; ++i) {
        table[string[i]] = length - i - 1;
    }
}

And here is the actual implementation:

//table is a global variable
int table[500];

pattern = calloc(256, sizeof(char));
pattern = fgets(pattern, 255, stdin);
pattern[strcspn(pattern, "\n")] = 0;
length = strlen(pattern);

tableCreate(pattern, table);

char c;
int count = 0;
char buffer[255];

while (fgets(buffer, 255, file) != NULL) {
    int stringLength = strlen(buffer);
    int j = 0;

    while (j <= stringLength - 1) {
        c = buffer[j + length - 1];

        if (pattern[length - 1] == c && memcmp(pattern, buffer + j, length - 1) == 0) {
            count++;
        }

        j += table[c];
    }
}

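Unfortunately, I can't provide the large data set, since it is over 40,000 lines long. The small data set I tested with is only two sentences.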

gdb output from the infinite loop/segfault:

j: 48
stringLength: 78
length: 5
Shifting
j: 48
table[c]: 0
j: 48
stringLength: 78
length: 5
Shifting
j: 48
table[c]: 0
j: 48
stringLength: 78
length: 5
Shifting
j: 48

The output above comes from print statements run under gdb.

Answer 1

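This reads past the end of the line as j approaches the end of the buffer, because j + length - 1 can exceed the last valid index: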

while (j <= stringLength - 1) {
    c = buffer[j + length - 1];

It should be:

while (j <= stringLength - length) {
    c = buffer[j + length - 1];
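
For reference, here is a minimal sketch of the whole search with the corrected loop bound. The names build_shift_table, horspool_count and TABLE_SIZE are made up for this example and are not from the question. It also makes one extra change that the fix above does not cover: it indexes the table through unsigned char. If char is signed on your platform, any byte above 127 in the input becomes a negative index, which is undefined behavior. The table is only ever filled with values of at least 1, so that may be where the table[c]: 0 in your gdb output comes from, but I can't confirm that without the data.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE (UCHAR_MAX + 1)   /* one slot per possible byte value */

/* Fill the bad-character shift table for a pattern of length m. */
static void build_shift_table(const char *pattern, size_t m,
                              size_t table[TABLE_SIZE]) {
    /* Default: a byte that is not in the pattern shifts by the full length. */
    for (size_t i = 0; i < TABLE_SIZE; ++i)
        table[i] = m;

    /* Every byte except the last one gets its distance to the pattern end. */
    for (size_t i = 0; i + 1 < m; ++i)
        table[(unsigned char)pattern[i]] = m - 1 - i;
}

/* Count occurrences of pattern in text (overlapping matches included). */
static size_t horspool_count(const char *text, const char *pattern) {
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    size_t table[TABLE_SIZE];
    size_t count = 0;

    /* An empty pattern or one longer than the text cannot match here;
       this check also keeps n - m from wrapping around below. */
    if (m == 0 || m > n)
        return 0;

    build_shift_table(pattern, m, table);

    /* Stop once fewer than m bytes remain, so text[j + m - 1] stays in range. */
    for (size_t j = 0; j <= n - m; ) {
        unsigned char c = (unsigned char)text[j + m - 1];

        if ((unsigned char)pattern[m - 1] == c &&
            memcmp(pattern, text + j, m - 1) == 0)
            ++count;

        j += table[c];   /* always at least 1, so the loop makes progress */
    }
    return count;
}
```

In your program you would call something like count += horspool_count(buffer, pattern) for each line returned by fgets. Rebuilding the table on every call is wasteful for a 40,000-line file, so in real code you would probably build it once and pass it in; it is rebuilt here only to keep the sketch short.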
