Question

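My file is 500 MB and contains some non-ASCII characters. I just want to find those characters with a Unix command, ideally getting the line number and the position within each line.

Thanks :)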

Answer 1

Use the command given in the other solution, but add -n to grep so that it prints the line number of each match.

Answer 2

You know, it's odd. Sometimes I find it quicker to write some quick-and-dirty C than to navigate the wilds of UNIX utility command-line options :-)

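#include <stdio.h>

int main (void) {
    size_t ln = 1;      /* current line number, 1-based */
    size_t chpos = 0;   /* byte offset within the current line, 1-based */
    int chr;
    while ((chr = fgetc (stdin)) != EOF) {
        if (chr == '\n') {
            /* New line: bump the line counter and reset the offset. */
            ln++;
            chpos = 0;
            continue;
        }
        chpos++;
        /* fgetc returns the byte as an unsigned char value, so anything
           above 127 is outside the ASCII range. */
        if (chr > 127) {
            /* size_t must be printed with %zu, not %d. */
            printf ("Non-ASCII %02x found at line %zu, offset %zu\n",
                (unsigned) chr, ln, chpos);
        }
    }
    return 0;
}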

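This gives you the line number and the position within that line of every character outside the ASCII range. It works on bytes, so the offset is a 1-based byte offset, and each byte of a multi-byte character (in UTF-8, for example) is reported separately.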
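
If the file is known to be UTF-8 (an assumption; the question does not say which encoding is used), a variant could count characters instead of bytes by skipping continuation bytes. The following is only a rough sketch along those lines; report_non_ascii_utf8 is a hypothetical helper name, not something from the answer above, and it does not validate the input.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the original answer. Assumes the input
   is UTF-8. Reads from `in`, writes one line per non-ASCII character to
   `out`, and returns how many such characters were seen. */
size_t report_non_ascii_utf8 (FILE *in, FILE *out) {
    size_t ln = 1;      /* current line number, 1-based */
    size_t chpos = 0;   /* character position within the line, 1-based */
    size_t found = 0;   /* number of non-ASCII characters reported */
    int chr;
    while ((chr = fgetc (in)) != EOF) {
        if (chr == '\n') {
            ln++;
            chpos = 0;
            continue;
        }
        /* UTF-8 continuation bytes have the bit pattern 10xxxxxx. They
           belong to a character whose lead byte was already counted. */
        if ((chr & 0xC0) == 0x80) {
            continue;
        }
        chpos++;
        if (chr > 127) {
            found++;
            fprintf (out,
                "Non-ASCII character (lead byte %02x) at line %zu, position %zu\n",
                (unsigned) chr, ln, chpos);
        }
    }
    return found;
}
```

Calling report_non_ascii_utf8 (stdin, stdout) from main would give the same kind of filter as the program above, with one report per character rather than one per byte. On input that is not valid UTF-8 the positions will be off, so the byte-based version is the safer default.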

Unix command to find non-ASCII characters
