Question

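Suppose I have a file filled with random characters, with spaces scattered among them.

I want to look for groups of characters such as UU, II, NJ, KU. So the goal is to read the file, look for these groups, and report how many of them the file contains.

My problem is whitespace and \n: whenever I hit one of them I should skip it and keep searching for the groups. I found something that might help, the function strtok_r.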

http://www.codecogs.com/reference/computing/c/string.h/strtok.php?alias=strtok_r

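I think this would isolate complete strings, so that I can read them one at a time.

Is this a good solution, or should I take a different approach?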
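To make the idea concrete, here is a minimal sketch of what the strtok_r approach might look like. The helper name `count_exact_tokens` and the delimiter set are assumptions for illustration only. It splits a buffer on whitespace and counts the tokens that are exactly equal to one group:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r under strict -std= modes */
#endif
#include <string.h>

/* Hypothetical helper: counts whitespace-separated tokens in `buf` that are
   exactly equal to `group`. strtok_r overwrites delimiters with '\0', so
   `buf` must be a modifiable copy of the data. */
int count_exact_tokens(char *buf, const char *group)
{
    int count = 0;
    char *save = NULL;  /* strtok_r keeps its position here */

    for (char *tok = strtok_r(buf, " \t\r\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\r\n", &save))
    {
        if (strcmp(tok, group) == 0)
            count++;
    }
    return count;
}
```

Note that this only counts groups that stand alone between whitespace. A group embedded in a longer run of characters, such as the UU in aUUb, is not counted, so each token would still need to be scanned if the groups can appear inside longer strings.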

Answer 1

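A naive solution would read one character at a time, and when it is 'U', 'I', 'N' or 'K', read another character to see whether it is the next character of the group. If it is, increment the counter for that group. All other characters are simply discarded.

Edit: example function: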

#include <stdio.h>

int count_uu = 0;
int count_ii = 0;
int count_nj = 0;
int count_ku = 0;

void check_next_char(int expected, FILE *input, int *counter);

void count(FILE *input)
{
    int ch;  /* Character we read into */

    while ((ch = fgetc(input)) != EOF)
    {
        switch (ch)
        {
        case 'U':
            check_next_char('U', input, &count_uu);
            break;
        case 'I':
            check_next_char('I', input, &count_ii);
            break;
        case 'N':
            check_next_char('J', input, &count_nj);
            break;
        case 'K':
            check_next_char('U', input, &count_ku);
            break;

        default:
            /* Not a character we're interested in */
            break;
        }
    }
}

/* This function gets the next character from a file and checks it against
   an `expected` character. If it is the same as the expected character,
   increase a counter; otherwise put the character back into the stream
   buffer so the caller reads it again */
void check_next_char(int expected, FILE *input, int *counter)
{
    int ch = fgetc(input);
    if (ch == expected)
        (*counter)++;
    else
        ungetc(ch, input);
}
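The same single-pass idea can also be tried on a string already held in memory, which is easier to experiment with than a FILE stream. The following is only a sketch: the struct `group_count` and the function `count_groups` are hypothetical names, and it assumes every group is exactly two characters long.

```c
#include <stddef.h>

/* Hypothetical table entry: a two-character group and its counter. */
struct group_count {
    const char *group;  /* e.g. "UU" */
    int count;
};

/* Scans `text` once. When the current and next characters match a group,
   that group's counter is increased and both characters are consumed.
   Everything else, including spaces and '\n', is skipped. */
void count_groups(const char *text, struct group_count *groups, size_t n)
{
    for (size_t i = 0; text[i] != '\0'; i++) {
        for (size_t g = 0; g < n; g++) {
            if (text[i] == groups[g].group[0] &&
                text[i + 1] == groups[g].group[1]) {
                groups[g].count++;
                i++;    /* consume the second character of the pair */
                break;
            }
        }
    }
}
```

A caller would fill an array with the four groups and zeroed counters, call `count_groups` on the text, and then read the `count` fields. Adding a new group then means adding a table entry rather than a new `case`.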

Answer 2

You could also use

https://github.com/leblancmeneses/NPEG/tree/master/Languages/npeg_c

if your search patterns become more complicated.

Here is a visual tool that can export a C version: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-language-workbench

Documentation for the rule syntax: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-dsl-documentation

Rules

(?<UU>): 'UU'\i; 
(?<II>): 'II'\i; 
(?<NJ>): 'NJ'\i; 
(?<KU>): 'KU'; // does not use \i so is case sensitive 

Find: UU / II / NJ / KU;
(?<RootExpression>): (Find / .)+;

Input 1:

 UU, II, NJ, KU  uu, ii, nJ, kU

Input 2:

jsdlfj023#uu, ii, nJ, kU $^%900oi)()*()  UU, II, NJ, KU
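For readers who want to see what the case-sensitivity distinction in these rules means in practice, here is a hand-written C sketch. It does not use NPEG or its generated code; `count_pair` and `same_char` are hypothetical helpers that only imitate the matching described in the rule comments (`\i` meaning case-insensitive), for two-character groups.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of NPEG: compares two characters,
   optionally ignoring case. */
static int same_char(char a, char b, int ignore_case)
{
    if (ignore_case)
        return tolower((unsigned char)a) == tolower((unsigned char)b);
    return a == b;
}

/* Hypothetical helper: counts non-overlapping occurrences of the
   two-character `group` in `text`. A non-zero `ignore_case` imitates
   a rule written with \i. */
int count_pair(const char *text, const char *group, int ignore_case)
{
    int count = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (same_char(text[i], group[0], ignore_case) &&
            text[i + 1] != '\0' &&
            same_char(text[i + 1], group[1], ignore_case)) {
            count++;
            i++;    /* skip the second character of the match */
        }
    }
    return count;
}
```

On Input 1, calling `count_pair` with "UU" and case-insensitive matching finds both UU and uu, while "KU" with case-sensitive matching finds only the upper-case KU and skips kU.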

Parsing a file in C to read chars