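Searching for a word in a string in C

Time: 2016-01-20 18:15:00

Tags: c string file search

I hope someone can help me. I think this is a simple problem: I want to write a program that searches for a word in a file.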

char *such = "Ingo";
char *fund;
FILE *datei;
char text[100];

datei = fopen("names.txt", "r");

if (datei == NULL) {
    printf("Fehler\n");
}
else 
{
    fscanf(datei, "%100c", text);
    text[100] = '\0';
    //i think this dont work
    if (fgets(text, 100, datei) != NULL)
    {
        printf("%s \n", text);
    }   
}

return 0;

The file contains:

Ingo Test Test 123 Test Ingo Ingo

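Now I want to search for the name "Ingo" in the file.

Is it also possible to search for more words, perhaps "ingo" and "Test", and count them?

3 answers:

Answer 0 (score: 1)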

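There are two very simple ways to do this:

  1. In a loop, use fscanf to read words from the file until you reach EOF, and compare each one with the string you are looking for using strcmp (string compare) from string.h.

  2. Use two loops: in the outer loop, read characters with fgetc until you hit a separator (such as a space, \n or \t), and in the inner loop check whether the word you just scanned is the one you are looking for. You will need a temporary char array.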

Answer 1 (score: 1)

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
    char *such = "Ingo";
    FILE *datei;
    char word[100];
    int counter = 0;

    datei = fopen("names.txt", "r");

    if (datei == NULL) {
        printf("Fehler\n");
    }
    else 
    {
        while(1==fscanf(datei, "%99s", word)){//read word by word
            word[0] = (char)toupper((unsigned char)word[0]); //ingo --> Ingo
            if (strcmp(word, such) == 0){
                ++counter;
            }
        }
        fclose(datei);
        if (counter != 0){
            printf("number of '%s' is %d\n", such, counter);
        }   

    }

    return 0;
}

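Answer 2 (score: 1)

You should test a number of conditions to make sure you match only whole words. The example below searches for jury and matches only jury and jury's, but not injury. You should also consider whether you want to match plural forms of words (e.g. review and reviews). The single set of delimiters (delim) below ensures that you match whole words. If you want to match plurals or various other suffixes, you can easily split it into two sets, one for the start of a word and one for the end.

The code expects the name of the file to search as the first argument and the search term (sterm) as the second. (If no arguments are given, it searches for 'the' in text read from stdin.) The code reads each line of the file into a buffer called line, then scans each character in line looking for the first character of sterm. If it finds one, it checks that the preceding character is a delimiter and that the character following the word (at an offset of the length of sterm) is also a delimiter. If the candidate begins with the same character as sterm and is delimited before and after, its contents are compared against sterm with strncmp.

If all conditions are met, count is incremented and the match is printed along with its line number and its zero-based position in line. This is just a basic, unoptimized whole-word search, but it should give you a starting point for distinguishing whole words from substrings contained in longer words (i.e. a search for 'the' will not also match 'them', 'then', 'they', etc.). You could also turn this code into a function that saves the line number and position of each match in an array of structs and returns a pointer to it. That way you could parse your text and get back a pointer to an array holding the line and position of every match. (That is for another day.)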

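Look over the code and let me know if you have questions. If you do not care about matching only whole words, you can simply call strstr repeatedly on each line, advancing the pointer after each hit, to count the occurrences of the search term. Use whichever best fits your needs.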


#include <stdio.h>
#include <string.h>

#define MAXS 256

int main (int argc, char **argv)
{
    char line[MAXS] = {0};  /* line buffer for fgets */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    char *sterm = argc > 2 ? argv[2] : "the";
    char *delim = " \t\n\'\".";
    size_t count = 0, idx = 0, slen = strlen (sterm);

    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'\n", argv[1]);
        return 1;
    }

    while (fgets (line, MAXS, fp))
    {
        size_t i, llen = strlen (line);
        idx++;

        if (llen < slen + 1)
            continue;       /* line not longer than search term + \n */

        for (i = 0; i < llen - slen + 1; i++) {

            if (line[i] != *sterm)
                continue;   /* char != first char in sterm  */
            if (i && !strchr (delim, line[i-1]))
                continue;   /* prior char is not a delim    */
            if (!strchr (delim, line[i+slen]))
                continue;   /* next char is not a delim     */
            if (strncmp (&line[i], sterm, slen))
                continue;   /* chars don't match sterm      */

            /* i is the zero-based offset of the match within line */
            printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                    idx, ++count, sterm, i);
        }
    }
    if (fp != stdin) fclose (fp);

    printf ("\n total occurrences of '%s' in '%s' : %zu\n\n",
            sterm, argc > 1 ? argv[1] : "stdin", count);

    return 0;
}

Example file

$ cat dat/damages.txt
Personal injury damage awards are unliquidated
and are not capable of certain measurement; thus, the
jury has broad discretion in assessing the amount of
damages in a personal injury case. Yet, at the same
time, a factual sufficiency review insures that the
evidence supports the jury's award; and, although
difficult, the law requires appellate courts to conduct
factual sufficiency reviews on damage awards in
personal injury cases. Thus, while a jury has latitude in
assessing intangible damages in personal injury cases,
a jury's damage award does not escape the scrutiny of
appellate review.

Because Texas law applies no physical manifestation
rule to restrict wrongful death recoveries, a
trial court in a death case is prudent when it chooses
to submit the issues of mental anguish and loss of
society and companionship. While there is a
presumption of mental anguish for the wrongful death
beneficiary, the Texas Supreme Court has not indicated
that reviewing courts should presume that the mental
anguish is sufficient to support a large award. Testimony
that proves the beneficiary suffered severe mental
anguish or severe grief should be a significant and
sometimes determining factor in a factual sufficiency
analysis of large non-pecuniary damage awards.

Output

$ ./bin/searchterm dat/damages.txt jury
 line[ 3] match  1. 'jury' at location 0
 line[ 6] match  2. 'jury' at location 22
 line[ 9] match  3. 'jury' at location 37
 line[11] match  4. 'jury' at location 2

 total occurrences of 'jury' in 'dat/damages.txt' : 4

$ ./bin/searchterm <dat/damages.txt
 line[ 2] match  1. 'the' at location 50
 line[ 3] match  2. 'the' at location 39
 line[ 4] match  3. 'the' at location 43
 line[ 5] match  4. 'the' at location 48
 line[ 6] match  5. 'the' at location 18
 line[ 7] match  6. 'the' at location 11
 line[11] match  7. 'the' at location 38
 line[17] match  8. 'the' at location 10
 line[19] match  9. 'the' at location 34
 line[20] match 10. 'the' at location 13
 line[21] match 11. 'the' at location 42
 line[23] match 12. 'the' at location 12

 total occurrences of 'the' in 'stdin' : 12

Using pointers instead of array-index notation

You may find using a pointer more natural than array-index notation (e.g. using char *p = line; and advancing p, rather than using line[i] notation). If so, the read loop can be rewritten to advance p through each line.

Pointer notation can feel more natural in C. Let me know if you have any questions.