I am trying to implement the Horspool string matching algorithm in C. It works on small data sets, but for some reason it fails on large ones.
Here is my tableCreate function:
    void tableCreate(char* string, int table[]) {
        int i = 0;
        int length = strlen(string);
        for (i = 0; i < 500; ++i) {
            table[i] = length;
        }
        for (i = 0; i < length - 1; ++i) {
            table[string[i]] = length - i - 1;
        }
    }
And here is the actual implementation:
    //table is a global variable
    int table[500];
    pattern = calloc(256, sizeof(char));
    pattern = fgets(pattern, 255, stdin);
    pattern[strcspn(pattern, "\n")] = 0;
    length = strlen(pattern);
    tableCreate(pattern, table);
    char c;
    int count = 0;
    char buffer[255];
    while (fgets(buffer, 255, file) != NULL) {
        int stringLength = strlen(buffer);
        int j = 0;
        while (j <= stringLength - 1) {
            c = buffer[j + length - 1];
            if (pattern[length - 1] == c && memcmp(pattern, buffer + j, length - 1) == 0) {
                count++;
            }
            j += table[c];
        }
    }
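Unfortunately I can't provide the large data set, since it is over 40,000 lines. The small data set I tested with is only two sentences.
gdb output for the infinite loop / segfault: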
j: 48
stringLength: 78
length: 5
Shifting
j: 48
table[c]: 0
j: 48
stringLength: 78
length: 5
Shifting
j: 48
table[c]: 0
j: 48
stringLength: 78
length: 5
Shifting
j: 48
The output above comes from print statements in the code, run under gdb.
Answer 0 (score: 1)
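As j gets close to the end of the line, j + length - 1 goes past the last character, so this reads beyond the end of the string in the buffer:
    while (j <= stringLength - 1) {
        c = buffer[j + length - 1];
It should be:
    while (j <= stringLength - length) {
        c = buffer[j + length - 1];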
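For reference, here is a minimal sketch of the search with that loop bound applied. The names build_shift_table, count_matches, and TABLE_SIZE are made up for this example and are not from the question. It also makes one change beyond the bound, which is my own assumption about what might be going on with the large file: it indexes the table through unsigned char. If plain char is signed on your platform, any byte above 127 turns into a negative index into table, which is undefined behavior and could be one way to end up reading a shift of 0.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* One shift entry per possible byte value (hypothetical name). */
#define TABLE_SIZE (UCHAR_MAX + 1)

/* Build the Horspool bad-character shift table for pattern. */
static void build_shift_table(const char *pattern, size_t table[TABLE_SIZE]) {
    size_t length = strlen(pattern);
    /* Bytes that do not occur in the pattern shift by the full length. */
    for (size_t i = 0; i < TABLE_SIZE; ++i) {
        table[i] = length;
    }
    /* The last pattern character is skipped, so every shift is at least 1. */
    for (size_t i = 0; i + 1 < length; ++i) {
        table[(unsigned char)pattern[i]] = length - i - 1;
    }
}

/* Count occurrences of pattern in text, overlapping ones included. */
static size_t count_matches(const char *text, const char *pattern) {
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    size_t table[TABLE_SIZE];
    size_t count = 0;

    /* Guard first, so that n - m below cannot wrap around. */
    if (m == 0 || m > n) {
        return 0;
    }
    build_shift_table(pattern, table);

    size_t j = 0;
    while (j <= n - m) {  /* same bound as stringLength - length */
        /* Cast to unsigned char so the table index is never negative. */
        unsigned char c = (unsigned char)text[j + m - 1];
        if ((unsigned char)pattern[m - 1] == c &&
            memcmp(pattern, text + j, m - 1) == 0) {
            ++count;
        }
        j += table[c];
    }
    return count;
}
```

In the question's loop you would call something like count_matches(buffer, pattern) once per line read by fgets and add the result to count. In real code you would build the table once outside the loop rather than on every call; it is rebuilt here only to keep the sketch short.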