Question

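I'm trying to parse a file to strip out "unnecessary" information (whitespace, comments (# marks a comment), etc.). I know I need to use strtok and fgets, but I'm not entirely sure how to go about it when I only need bits and pieces of each line.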

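Example:

Say there's a line in the text file I need to parse, and it is -

    (\t) foo  54  232  574   #random comment

I want it to be structured as -

    foo 54 232 574

How would I structure my strtok and fgets calls to parse lines like this correctly?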

Answer 1

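This looks easy to do with a regular expression. So even if you can't use Perl or something similar, you could try a regular expression library for C.

Basically you would use something like: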

s/\s\+\(.*\)#.*/\1/

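(This is the sed equivalent of what you want; I'll update later today with actual C code.)

(I'm assuming here that you want to strip leading whitespace and trailing comments.)

The equivalent in PCRE: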

\s+(.*)#.*
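
As a rough illustration of the regex approach in C, here is a minimal sketch using the POSIX <regex.h> API (regcomp, regexec, regfree). It is not the answerer's code: the helper name strip_line is hypothetical, and the pattern is a variant of the one above that also accepts lines with no leading whitespace or no '#', and drops the whitespace just before the comment. Runs of whitespace between words are left as they are.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `line` into `out` without leading whitespace,
 * the trailing '#' comment, or the whitespace just before that comment.
 * Returns 1 if something was left, 0 if the line was blank or comment-only,
 * and -1 on a regex error or if `out` is too small. */
int strip_line(const char *line, char *out, size_t out_size) {
    /* Group 1 starts and ends on a character that is neither '#' nor
     * whitespace, so leading and trailing whitespace fall outside it. */
    static const char *pattern =
        "^[[:space:]]*([^#[:space:]]([^#]*[^#[:space:]])?)?";
    regex_t re;
    regmatch_t m[2];
    int result = -1;

    if (out_size == 0) return -1;
    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

    if (regexec(&re, line, 2, m, 0) == 0) {
        if (m[1].rm_so == -1) {
            result = 0; /* nothing but whitespace and/or a comment */
        } else {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len < out_size) {
                memcpy(out, line + m[1].rm_so, len);
                out[len] = '\0';
                result = 1;
            }
        }
    }
    regfree(&re);
    return result;
}
```

Calling strip_line on each buffer returned by fgets and printing only when it returns 1 would give the filtered file. Compiling the pattern on every call keeps the sketch short; a real program would probably compile it once and reuse it.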


Answer 2

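This should work. It reads from stdin and writes to stdout. I noticed that you assumed no line is longer than 256 characters, and I make the same assumption here.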

#include <stdio.h>
#include <string.h>
int main(void) {
        char buf[256];
        while(fgets(buf, sizeof(buf), stdin)) {
                char *hash = strchr(buf, '#');
                if(hash) *hash = 0; // terminate at the '#'

                char *word = strtok(buf, " \t\n");
                int count = 0;
                while(word) {
                        printf("%s%s", count++ ? " " : "", word);
                        word = strtok(NULL, " \t\n");
                }
                if(count) {
                        printf("\n");
                }
        }
        return 0;
}

Update: here is what this code does with your input:

[Charlies-MacBook-Pro:~/junk] crb% a.out < i > o
[Charlies-MacBook-Pro:~/junk] crb% cat o
//This is a sample file I just made to use
.text
main:
la $s0, Var1
lw $s0, 0($s0)
exit:
li $v0, 10
syscall
.data
Var1: .word 32
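
Since the question is about copying the filtered text to another file, the same fgets/strtok loop can be pointed at named files instead of stdin and stdout. The following is only a sketch under that assumption: the helper name filter_file and its return convention are hypothetical, and it keeps the 256-byte line buffer from the code above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy in_path to out_path, dropping '#' comments,
 * blank lines, and extra whitespace. Returns 0 on success, -1 on failure. */
int filter_file(const char *in_path, const char *out_path) {
    FILE *in = fopen(in_path, "r");
    if (!in) return -1;
    FILE *out = fopen(out_path, "w");
    if (!out) {
        fclose(in);
        return -1;
    }

    char buf[256]; /* same line-length assumption as the code above */
    while (fgets(buf, sizeof buf, in)) {
        char *hash = strchr(buf, '#');
        if (hash) *hash = '\0'; /* cut the line at the comment */

        int count = 0;
        for (char *word = strtok(buf, " \t\n"); word;
             word = strtok(NULL, " \t\n")) {
            /* single space between words, none before the first */
            fprintf(out, "%s%s", count++ ? " " : "", word);
        }
        if (count) fputc('\n', out); /* skip lines that ended up empty */
    }

    fclose(in);
    return fclose(out) == 0 ? 0 : -1;
}
```

A main that calls filter_file(argv[1], argv[2]) after checking argc would turn this into a small command-line filter.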

Parsing a file in C and copying only specific information to another file
