Question

The following code reads a text file one character at a time and prints it to stdout:

#include <stdio.h>

int main()
{
    char file_to_open[] = "text_file.txt", ch;
    FILE *file_ptr;

    if((file_ptr = fopen(file_to_open, "r")) != NULL)
    {
        while((ch = fgetc(file_ptr)) != EOF)
        {
            putchar(ch);
        }
    }
    else
    {
        printf("Could not open %s\n", file_to_open);
        return 1;
    }
    return(0);
}

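However, instead of printing to stdout [putchar(ch)], I want to search the file for specific strings provided in another text file, i.e. strings.txt, and write the matching lines to out.txt.

text_file.txt: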

1993 - 1999 Pentium
1997 - 1999 Pentium II
1999 - 2003 Pentium III
1998 - 2009 Xeon
2006 - 2009 Intel Core 2

strings.txt:

Nehalem
AMD Athlon
Pentium

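In this case, the first three lines of text_file.txt would match. I have done some research on file operations in C, and it seems that I can read one character at a time with fgetc [as I do in my code], one line at a time with fgets, and one block at a time with fread, but there is nothing that reads one word at a time, which I guess would be perfect in my situation?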

Answer 1

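I am assuming this is a learning exercise and you are just looking for a place to start. Otherwise, you should not reinvent the wheel.

The code below should give you an idea of what is involved. It is a program that lets you specify the name of the file to search and a single argument to search for in that file. You should be able to modify it to put the phrases to search for in an array of strings, and to check whether any of the words in that array appear in any of the lines read.

The key function you are looking for is strstr.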

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#ifdef DEBUG
#define INITIAL_ALLOC 2
#else
#define INITIAL_ALLOC 512
#endif

char *
read_line(FILE *fin) {
    char *buffer;
    char *tmp;
    int read_chars = 0;
    int bufsize = INITIAL_ALLOC;
    char *line = malloc(bufsize);

    if ( !line ) {
        return NULL;
    }

    buffer = line;

    while ( fgets(buffer, bufsize - read_chars, fin) ) {
        read_chars = strlen(line);

        if ( line[read_chars - 1] == '\n' ) {
            line[read_chars - 1] = '\0';
            return line;
        }

        else {
            bufsize = 2 * bufsize;
            tmp = realloc(line, bufsize);
            if ( tmp ) {
                line = tmp;
                buffer = line + read_chars;
            }
            else {
                free(line);
                return NULL;
            }
        }
    }
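    /* EOF or read error: keep a final line that has no trailing newline. */
    if ( buffer != line ) {
        return line;
    }
    free(line);
    return NULL;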
}

int
main(int argc, char *argv[]) {
    FILE *fin;
    char *line;

    if ( argc != 3 ) {
        fputs("Usage: searcher <file> <string>\n", stderr);
        return EXIT_FAILURE;
    }

    fin = fopen(argv[1], "r");

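    if ( !fin ) {
        /* Nothing to search, and fclose(NULL) would be undefined. */
        perror(argv[1]);
        return EXIT_FAILURE;
    }

    while ( (line = read_line(fin)) != NULL ) {
        if ( strstr(line, argv[2]) ) {
            fprintf(stdout, "%s\n", line);
        }
        free(line);
    }

    fclose(fin);
    return 0;
}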

Sample output:

E:\Temp> searcher.exe searcher.c char
char *
    char *buffer;
    char *tmp;
    int read_chars = 0;
    char *line = malloc(bufsize);
    while ( fgets(buffer, bufsize - read_chars, fin) ) {
        read_chars = strlen(line);
        if ( line[read_chars - 1] == '\n' ) {
            line[read_chars - 1] = '\0';
                buffer = line + read_chars;
main(int argc, char *argv[]) {
    char *line;

Answer 2

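Remember: fgetc(), getc(), and getchar() all return an integer, not a char. The integer might be EOF or a valid character, so the function returns one more value than the range the char type can hold.

You are writing a surrogate for the 'fgrep' command: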
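To make the point about the return type concrete, here is a minimal sketch of a read loop that stores the result of fgetc() in an int. The helper name copy_stream is made up for this illustration and is not part of the question's code.

```c
#include <stdio.h>

/* Copies every byte of `in` to `out` and returns the number of bytes copied.
 * `ch` is an int so that EOF (a negative value) stays distinct from every
 * valid byte, including 0xFF. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;
    long count = 0;

    while ((ch = fgetc(in)) != EOF) {
        fputc(ch, out);
        count++;
    }
    return count;
}
```

With `char ch`, a 0xFF byte in the input can compare equal to EOF where char is signed, ending the loop early; where char is unsigned, the comparison with EOF never succeeds and the loop does not terminate.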

fgrep -f strings.txt text_file.txt > out.txt

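Rather than reading characters, you will need to read lines with fgets(). (Forget that the gets() function exists!)

I indented your code and inserted a 'return 0;' at the end for you (though C99 implies 'return 0;' if you fall off the end of main()). However, C99 also demands an explicit return type for every function, so I added the 'int' to 'int main()' for you (but you can't use the C99-compliant excuse for not returning 0 at the end). Error messages should be written to standard error rather than standard output.

You will probably need to use dynamic allocation for the list of strings. A simple-minded search would just apply 'strstr()' to look for each of the required strings in each line of input (making sure to break out of the loop once you have found a match, so that a line is not printed repeatedly if it contains several matches).

A more sophisticated search would precompute which characters can be ignored, so that you can search for all the strings in parallel and skip through the text faster than the loop-in-a-loop. This might be a modification of a search algorithm such as Boyer-Moore or Knuth-Morris-Pratt (added: or Rabin-Karp, which is designed for searching for multiple strings in parallel).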
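The simple approach described above could look roughly like the following sketch. The names load_patterns, filter_lines, free_patterns and MAX_LINE are invented for this example, and it assumes that no line in either file is longer than MAX_LINE - 1 characters (longer lines would be processed in pieces).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024  /* assumed upper bound on line length */

/* Reads one search string per line from `fp` into a growing array.
 * Stores the number of strings in *count. On allocation failure it
 * stops early and returns what it has collected so far. */
char **load_patterns(FILE *fp, size_t *count)
{
    char buf[MAX_LINE];
    char **list = NULL;
    size_t used = 0, cap = 0;

    while (fgets(buf, sizeof buf, fp)) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* drop the line ending */
        if (buf[0] == '\0')
            continue;                       /* an empty string would match every line */
        if (used == cap) {
            size_t new_cap = cap ? 2 * cap : 8;
            char **tmp = realloc(list, new_cap * sizeof *tmp);
            if (!tmp)
                break;
            list = tmp;
            cap = new_cap;
        }
        list[used] = malloc(strlen(buf) + 1);
        if (!list[used])
            break;
        strcpy(list[used++], buf);
    }
    *count = used;
    return list;
}

/* Writes every line of `in` that contains at least one pattern to `out`.
 * Returns the number of lines written. */
size_t filter_lines(FILE *in, FILE *out, char **patterns, size_t n)
{
    char line[MAX_LINE];
    size_t matched = 0;

    while (fgets(line, sizeof line, in)) {
        for (size_t i = 0; i < n; i++) {
            if (strstr(line, patterns[i])) {
                fputs(line, out);   /* the line still carries its '\n' */
                matched++;
                break;              /* one hit is enough: do not print the line twice */
            }
        }
    }
    return matched;
}

/* Releases the array returned by load_patterns. */
void free_patterns(char **patterns, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(patterns[i]);
    free(patterns);
}
```

A main() would open strings.txt and text_file.txt for reading and out.txt for writing, call load_patterns on the first and filter_lines on the other two, then release everything with free_patterns and fclose.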
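As a rough illustration of the multi-string idea, here is a Rabin-Karp-style sketch. It is not a tuned implementation: the names rk_hash, rk_find_any and RK_BASE are invented here, the hash is a plain polynomial hash reduced modulo 2^32 by unsigned wraparound, and, as an adaptation for patterns of different lengths, only the first m bytes of each pattern are hashed, where m is the length of the shortest pattern. Every hash hit is confirmed with memcmp, so collisions cannot produce false matches. Empty patterns are not supported.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RK_BASE 257u

/* Polynomial hash of the first m bytes of s (modulo 2^32). */
static uint32_t rk_hash(const char *s, size_t m)
{
    uint32_t h = 0;
    for (size_t i = 0; i < m; i++)
        h = h * RK_BASE + (unsigned char)s[i];
    return h;
}

/* Returns the index of the pattern that matches at the earliest position
 * in `text`, or -1 if none matches (also -1 if allocation fails or a
 * pattern is empty). */
int rk_find_any(const char *text, const char *const *pats, size_t npats)
{
    size_t n = strlen(text), m = (size_t)-1;
    uint32_t *hp, h, power = 1;

    for (size_t i = 0; i < npats; i++) {
        size_t len = strlen(pats[i]);
        if (len < m)
            m = len;                        /* m = shortest pattern length */
    }
    if (npats == 0 || m == 0 || n < m)
        return -1;

    hp = malloc(npats * sizeof *hp);
    if (!hp)
        return -1;
    for (size_t i = 0; i < npats; i++)
        hp[i] = rk_hash(pats[i], m);        /* hash of each pattern's m-byte prefix */
    for (size_t i = 1; i < m; i++)
        power *= RK_BASE;                   /* RK_BASE^(m-1), used to drop the oldest byte */

    h = rk_hash(text, m);
    for (size_t pos = 0; ; pos++) {
        for (size_t i = 0; i < npats; i++) {
            size_t len;
            if (hp[i] != h)
                continue;                   /* cheap rejection for almost all positions */
            len = strlen(pats[i]);
            if (len <= n - pos && memcmp(text + pos, pats[i], len) == 0) {
                free(hp);
                return (int)i;
            }
        }
        if (pos + m >= n)
            break;                          /* no further window fits */
        /* Roll the window: remove text[pos], append text[pos + m]. */
        h = (h - (uint32_t)(unsigned char)text[pos] * power) * RK_BASE
            + (unsigned char)text[pos + m];
    }
    free(hp);
    return -1;
}
```

Comparing the window hash against every pattern hash is still a loop over the patterns; a real implementation would typically keep the pattern hashes in a hash table or sorted array so that each window costs a single lookup.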

Answer 3

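Reading by blocks is always better, since that is how the underlying file system works.

So just read by blocks, check whether any of your words appears in the buffer, and then read another buffer. You just have to be careful to copy the last few characters of the previous buffer to the start of the new one, so that you do not miss a match when a search word sits on a buffer boundary.

If this simple algorithm is not sufficient (in your case it probably is), there are much more sophisticated algorithms for searching for several substrings simultaneously in one buffer, cf. Rabin-Karp.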
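The boundary handling described above can be sketched as follows for a single search string. The names stream_contains and BLOCK_SIZE are invented for this example, and the block size is arbitrary. The sketch only reports whether the string occurs at all; printing the whole matching line, as the question asks, would need extra bookkeeping of line starts.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* arbitrary block size chosen for this sketch */

/* Returns 1 if `needle` occurs anywhere in the stream, 0 otherwise.
 * Assumes 0 < strlen(needle) <= BLOCK_SIZE. */
int stream_contains(FILE *fp, const char *needle)
{
    char buf[2 * BLOCK_SIZE];
    size_t nlen = strlen(needle);
    size_t keep = 0;      /* bytes carried over from the previous block */
    size_t got;

    if (nlen == 0 || nlen > BLOCK_SIZE)
        return 0;

    while ((got = fread(buf + keep, 1, BLOCK_SIZE, fp)) > 0) {
        size_t total = keep + got;

        /* Search the carried-over tail together with the new block. */
        for (size_t i = 0; i + nlen <= total; i++) {
            if (memcmp(buf + i, needle, nlen) == 0)
                return 1;
        }

        /* Keep the last nlen-1 bytes so that a match straddling the
         * block boundary is still seen on the next pass. */
        keep = total < nlen - 1 ? total : nlen - 1;
        memmove(buf, buf + total - keep, keep);
    }
    return 0;
}
```

For several search strings, the same loop would carry over one byte less than the length of the longest string and test each string against the buffer.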

Answer 4

cat strings.txt |while read x; do grep "$x" text_file.txt; done

Searching for strings in a text file in C
