Question

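Basically, I have about 1,500 files, and the last character of each one must not be whitespace of any kind.

How can I check a bunch of files to make sure they don't end with some form of whitespace (newline, space, carriage return, tab, etc.)?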

Answer 1

awk '{if (flag) print line; line = $0; flag = 1} END {gsub("[[:space:]]+$","",line); printf "%s", line}'

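Edit:

New version:

The sed command deletes all trailing lines that consist only of whitespace, then the awk command removes the final newline.

sed '/^[[:space:]]*$/{:a;$d;N;/\n[[:space:]]*$/ba}' inputfile | awk '{if (flag) print line; line = $0; flag = 1} END {printf "%s", line}'

The downside is that the data is read twice.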
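To see what the extra sed stage changes, the two versions can be compared on a tiny input. This is only a sketch: the function names strip_v1 and strip_v2 are made up for the illustration, the final print is written as printf "%s", line so that a % in the data is not treated as a format, and the one-line sed script (with ; after the label) assumes GNU sed.

```shell
#!/bin/sh
# strip_v1 (hypothetical name): the first awk command, reading stdin.
strip_v1() {
    awk '{if (flag) print line; line = $0; flag = 1}
         END {gsub("[[:space:]]+$","",line); printf "%s", line}'
}

# strip_v2 (hypothetical name): the sed | awk pipeline, reading stdin.
# Assumes GNU sed for the one-line script syntax.
strip_v2() {
    sed '/^[[:space:]]*$/{:a;$d;N;/\n[[:space:]]*$/ba}' |
    awk '{if (flag) print line; line = $0; flag = 1} END {printf "%s", line}'
}

# Input: one line of text followed by an empty line.
printf 'a\n\n' | strip_v1 | wc -c   # 2 bytes: "a" plus a surviving newline
printf 'a\n\n' | strip_v2 | wc -c   # 1 byte: just "a"
```

With the first version, only the last line is trimmed, so a trailing empty line still leaves a newline at the end of the output. The sed stage removes such whitespace-only lines before awk drops the final newline.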

Edit 2:

Here is an all-awk solution that reads the file only once. It accumulates whitespace-only lines in a way similar to the sed command above.

#!/usr/bin/awk -f

# accumulate a run of white-space-only lines so they can be printed or discarded
/^[[:space:]]*$/ {
    accumlines = accumlines nl $0
    nl = "\n"
    accum = 1
    next
}

# print the previous line and any accumulated lines, store the current line for the next pass
{
    if (flag) print line
    if (accum) {
        print accumlines
        accum = 0
    }
    accumlines = nl = ""
    line = $0
    flag = 1
}

# print the last line without a trailing newline after removing all trailing whitespace
# the resulting output could be null (nothing rather than 0x00)
# note that we're not printing the accumulated lines since they're part of the
# trailing white-space we're trying to get rid of
END {
    gsub("[[:space:]]+$","",line)
    printf "%s", line
}

Edit 3:

Removed the unnecessary BEGIN clause

Renamed lines to accumlines so it is easier to tell apart from line (singular)

Added comments

Answer 2

This removes all trailing whitespace:

perl -e '$s = ""; while (defined($_ = getc)) { if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; } }' < infile > outfile

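There is probably an equivalent in sed, but I'm more familiar with Perl; I hope it works for you. The basic idea: if the next character is whitespace, save it; otherwise, print any saved characters followed by the character just read. If we hit EOF after reading one or more whitespace characters, they are never printed.

This only detects trailing whitespace, and exits with code 1 if there is any:

perl -e 'while (defined($_ = getc)) { $last = $_; } exit($last =~ /\s/);' < infile

[Edit] The above describes how to detect or change a single file. If you have a large directory tree containing the files you want to change, you can put the command in a separate script: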

fix.pl

#!/usr/bin/perl
$s = "";
while (defined($_ = getc)) {
    if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; }
}

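and use it with the find command:

find /top/dir -type f -exec sh -c 'mv "$1" "$1.bak" && fix.pl < "$1.bak" > "$1"' sh {} ';'

This moves each original file to a backup file ending in ".bak". (It is a good idea to try this on a small set of test files first.)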
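If Perl is not available, or if files that are already clean should be left untouched, the same backup-then-rewrite idea can be sketched in shell and awk. Everything here is an assumption for illustration: strip_eof_ws is a hypothetical stand-in for fix.pl that works line by line, fix_tree is a hypothetical wrapper, the files are assumed to be text, and the read loop assumes no file name contains a newline.

```shell
#!/bin/sh
# strip_eof_ws (hypothetical): copy stdin to stdout, minus any whitespace
# at the very end of the input.
strip_eof_ws() {
    awk '
        /[^[:space:]]/ {
            # A line with content: everything held back so far was not trailing.
            printf "%s%s", last, held
            last = $0
            sub(/[[:space:]]+$/, "", last)            # content part of this line
            held = substr($0, length(last) + 1) "\n"  # its trailing blanks + newline
            next
        }
        { held = held $0 "\n" }    # whitespace-only line: hold it back
        END { printf "%s", last }  # drop whatever is still held
    '
}

# fix_tree DIR (hypothetical): back up and rewrite every regular file under
# DIR whose content would change; existing *.bak files are skipped.
fix_tree() {
    find "$1" -type f ! -name '*.bak' | while IFS= read -r f; do
        # Skip files that stripping would leave unchanged.
        if strip_eof_ws < "$f" | cmp -s - "$f"; then
            continue
        fi
        cp -p "$f" "$f.bak" && strip_eof_ws < "$f.bak" > "$f"
    done
}

# Example call (path taken from the command above):
# fix_tree /top/dir
```

Copying to the backup and then writing into the original keeps the file's permissions, and skipping *.bak prevents find from picking up backups created during the same run.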

Answer 3

It may be easier to read the file from the bottom up:

tac filename | 
awk '
    /^[[:space:]]*$/ && !seen {next} 
    /[^[:space:]]/   && !seen {gsub(/[[:space:]]+$/,""); seen=1}
    seen
' | 
tac

Answer 4

Perl solution:

# command-line arguments are the names of the files to check.
# output is names of files that end with trailing whitespace
for (@ARGV) {
  open F, '<', $_;
  seek F, -1, 2;                # seek to before last char in file
  print "$_\n" if <F> =~ /\s/
}
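For comparison, the same last-byte check can be sketched in plain shell, using tail -c 1 in place of seek. This is only an illustration and not part of the answer: ends_with_space is a hypothetical helper name, and the sample files are created in a temporary directory just for the demo.

```shell
#!/bin/sh
# ends_with_space FILE (hypothetical): succeed if the last byte of FILE is whitespace.
ends_with_space() {
    # The trailing "x" keeps command substitution from stripping a final newline.
    last=$(tail -c 1 -- "$1"; printf x)
    last=${last%x}
    case $last in
        [[:space:]]) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo: print the names of files that end with whitespace.
tmp=$(mktemp -d)
printf 'clean' > "$tmp/good.txt"
printf 'dirty\n' > "$tmp/bad.txt"
for f in "$tmp"/*.txt; do
    if ends_with_space "$f"; then
        printf '%s\n' "$f"
    fi
done
rm -r "$tmp"
```

Only one byte per file is read, so the cost does not grow with file size. An empty file is reported as clean because it has no last byte.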

Answer 5

ruby -e 's=ARGF.read;s.rstrip!;print s' file

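Basically, this reads the whole file, strips the trailing whitespace (if any), and prints the contents. Because the entire file is held in memory, this solution is not suitable for very large files.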

Answer 6

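You can also use ed (man ed) to strip trailing whitespace at the end of a file and dd (man dd) to remove the final newline (but keep in mind that ed reads the whole file into memory and edits it in place, with no backup of the previous contents):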

# tested on Mac OS X using Bash
while IFS= read -r -d $'\0' file; do
   # remove white space at end of (non-empty) file
   # note: ed will append final newline if missing
   printf '%s\n' H '$g/[[:space:]]\{1,\}$/s///g' wq | ed -s "${file}"
   printf "" | dd  of="${file}" seek=$(($(stat -f "%z" "${file}") - 1)) bs=1 count=1
   #printf "" | dd  of="${file}" seek=$(($(wc -c < "${file}") - 1)) bs=1 count=1
done < <(find -x "/path/to/dir" -type f -not -empty -print0)

Answer 7

Version 2. Linux syntax. Corrected command.

find /directory/you/want -type f | \
xargs --verbose -L 1 sed -n --in-place -r \
':loop;/[^[:space:]\t]/ {p;b;}; N;b loop;'

Version 1. Removes whitespace at the end of every line. FreeBSD syntax.

find /directory/that/holds/your/files -type f | xargs -L 1  sed  -i '' -E 's/[:         :]+$//'

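The whitespace inside [: :] actually consists of one space and one tab character. The space is easy: just press the space bar. To insert the tab, press Ctrl-V and then the Tab key in the shell.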
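If typing a literal tab is awkward, the tab can be generated with printf and spliced into the sed expression. The following is a small sketch, not the answer's own command: strip_line_ends is a hypothetical name, and it filters stdin rather than editing in place, which sidesteps the differing -i syntax of GNU and BSD sed.

```shell
#!/bin/sh
# Build a tab character without having to type one.
tab=$(printf '\t')

# strip_line_ends (hypothetical): remove trailing spaces and tabs from every line of stdin.
strip_line_ends() {
    sed "s/[ $tab]*\$//"
}

# Demo: the first line ends in a space and a tab, the second in two spaces.
printf 'one \t\ntwo  \n' | strip_line_ends
```

Like Version 1, this trims the end of every line, not only the end of the file.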

Answer 8

Just for fun, here is a plain C answer:

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int c, bufsize = 100, ns = 0;
    char *buf = malloc(bufsize);

    while ((c = getchar()) != EOF) {
        if (isspace(c)) {
            if (ns == bufsize) buf = realloc(buf, bufsize *= 2);
            buf[ns++] = c;
        } else {
            fwrite(buf, 1, ns, stdout);
            ns = 0;
            putchar(c);
        }
    }

    free(buf);
    return 0;
}

No longer than Dennis's awk solution and, dare I say, easier to understand! :-P

Answer 9

Using dd (man dd) without ed (man ed):

Using grep/sed to recursively remove trailing whitespace only at the end of files?
