Question

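I have a CSV file. The file is anomalous in that it contains some unknown characters.

The characters show up on line 1535 in popular editors (images attached below). The sed command in the terminal shows nothing unusual for this line.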

$ sed '1535!d' sample.csv
"sample_id","sample_column_text_1","sample_"sample_id","sample_column_text_1","sample_column_text_2","sample_column_text_3"

But here are snapshots of the file in various editors.

Sublime Text

Nano

Vi

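The directory contains various CSV files that contain this character/string.

I need to write a bash script to identify the files that have such characters. How can I do this?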

Answer 1

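You can try grep and tr.

A plain grep '\000' generally does not match NUL bytes, because grep does not treat \000 in a pattern as an octal escape. With GNU grep built with PCRE support, the following prints the lines that contain a NUL byte (add -l to print only the file name):

grep -Pa '\x00' filename

You can use tr to delete the NULs and write a NUL-free copy of the file:

tr -d '\000' < file-with-nulls > file-without-nulls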
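For a directory of CSV files, the same tr idea can be wrapped in a small script. This is only a sketch, assuming a POSIX shell with tr and wc available; the function names count_nul, report_nul_csvs and strip_nul are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# count_nul FILE: print the number of NUL bytes in FILE.
count_nul() {
    # tr -dc '\000' deletes every byte except NUL; wc -c counts what is left.
    # The final tr strips the padding that some wc implementations add.
    tr -dc '\000' < "$1" | wc -c | tr -d '[:space:]'
}

# report_nul_csvs DIR: print each *.csv directly inside DIR that contains a NUL byte.
# The body runs in a subshell so its variables do not leak into the caller.
report_nul_csvs() (
    dir=${1:-.}
    for f in "$dir"/*.csv; do
        [ -f "$f" ] || continue    # skips the literal pattern when nothing matches
        if [ "$(count_nul "$f")" -gt 0 ]; then
            printf '%s\n' "$f"
        fi
    done
)

# strip_nul SRC DST: write a copy of SRC to DST with all NUL bytes removed.
strip_nul() {
    tr -d '\000' < "$1" > "$2"
}
```

Calling report_nul_csvs . lists the affected files in the current directory, and strip_nul sample.csv sample.clean.csv writes a cleaned copy while leaving the original untouched.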

Answer 2

The following is from:

http://www.linuxquestions.org/questions/programming-9/how-to-check-for-null-characters-in-file-509377/

#!/usr/bin/perl -w

use strict;

my $null_found = 0;

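foreach my $file (@ARGV) {
    # Three-argument open with a lexical handle copes with unusual file names.
    my $fh;
    if ( ! open($fh, '<', $file) ) {
        warn "couldn't open $file for reading: $!\n";
        next;
    }

    while (<$fh>) {
        if ( /\000/ ) {
            print "detected NULL at line $. in file $file\n";
            $null_found = 1;
            last;    # stop at the first NUL in this file
        }
    }
    close($fh);
}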

exit $null_found;

If it works as expected, you can save it to a file named nullcheck.pl and make it executable:

chmod +x nullcheck.pl

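It seems to accept a list of file names as input, but it exits with a failure status if a NUL is found in any of them, so I pass in only one file at a time. The following command runs the script.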

find . -type f -exec grep -Iq . {} \; -print | while IFS= read -r f; do perl ./nullcheck.pl "$f" || echo "$f has nulls"; done

The find command above is taken from "Linux command: How to 'find' only text files?"
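A shell-only variant of the same scan might look like the sketch below. It assumes only find, tr and wc, restricts the search to *.csv files, and passes file names through find -exec so that names containing spaces survive intact. It also leaves out the grep -I pre-filter, since GNU grep may classify a file that contains NUL bytes as binary and exclude it before the check ever runs. The function name find_nul_csvs is hypothetical.

```shell
#!/bin/sh
# find_nul_csvs DIR: recursively print every *.csv under DIR that contains a NUL byte.
find_nul_csvs() {
    # find hands batches of file names to an inner sh, which loops over them.
    find "${1:-.}" -type f -name '*.csv' -exec sh -c '
        for f do
            # Only NUL bytes survive tr -dc, so a non-zero count means the file has NULs.
            if [ "$(tr -dc "\000" < "$f" | wc -c)" -gt 0 ]; then
                printf "%s\n" "$f"
            fi
        done' sh {} +
}
```

Running find_nul_csvs . from the top of the data directory prints one affected path per line.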
