Question

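I'd like to remove all blank lines from a file, but only when they are at the start or end of the file (that is, at the start if there are no non-blank lines before them, and at the end if there are no non-blank lines after them).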

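Is this possible without a full-featured scripting language like Perl or Ruby? I'd prefer to do it with sed or awk if possible. Basically, any lightweight, widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, therefore, is ruled out).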

Answer 1

From Useful one-line scripts for sed:

# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file

# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file

So, to delete both leading and trailing blank lines from a file, you can combine the above commands into:

sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
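To sanity-check the combined one-liner, here's a minimal sketch that wraps it in a shell function (the name trim_blank_lines is hypothetical) and feeds it sample input built with printf: two leading and two trailing empty lines around text that contains one interior empty line, which should survive.

```shell
# Hypothetical helper: the combined leading/trailing sed one-liner from above.
trim_blank_lines() {
    sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' "$@"
}

# Two leading and two trailing empty lines; the interior one is kept.
printf '\n\nfirst\n\nsecond\n\n\n' | trim_blank_lines
```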

Answer 2

So I'm going to borrow part of @dogbane's answer, since that sed line for deleting the leading blank lines is so short...

tac is part of coreutils and reverses a file. So do it twice:

tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'

It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
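tac comes from GNU coreutils; on BSD and macOS, tail -r reverses lines instead. A minimal sketch of the same reverse-strip-reverse-strip pipeline that uses whichever is available (both function names are hypothetical, and the pipeline reads stdin rather than a named file):

```shell
# Hypothetical helper: reverse the lines of stdin with tac (GNU coreutils),
# falling back to tail -r (BSD/macOS) when tac is not installed.
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac
    else
        tail -r
    fi
}

# Reverse, drop leading empty lines (the original trailing ones),
# reverse back, then drop the real leading empty lines.
trim_blank_lines() {
    reverse_lines | sed -e '/./,$!d' | reverse_lines | sed -e '/./,$!d'
}
```

Usage: trim_blank_lines < file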

Answer 3

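Here's a one-pass solution in awk: it doesn't start printing until it sees a non-empty line, and when it sees an empty line, it remembers it until the next non-empty line: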

awk '
    /[[:graph:]]/ {
        # a non-empty line
        # set the flag to begin printing lines
        p=1      
        # print the accumulated "interior" empty lines 
        for (i=1; i<=n; i++) print ""
        n=0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename

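Note that because of the mechanism I use to decide whether a line is empty or non-empty ([[:graph:]] and /^[[:space:]]*$/), interior lines consisting only of whitespace are truncated to truly empty lines.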
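To see that note in action, here's a sketch that wraps the same awk program in a shell function (the name trim_blank_lines is hypothetical) and runs it on input whose interior blank line holds a space and a tab:

```shell
# Hypothetical wrapper around the one-pass awk program from this answer.
trim_blank_lines() {
    awk '
        /[[:graph:]]/ {
            p = 1                                  # start printing from here on
            for (i = 1; i <= n; i++) print ""      # flush remembered interior blanks
            n = 0
            print
        }
        p && /^[[:space:]]*$/ { n++ }              # remember a possibly interior blank
    ' "$@"
}

# The interior line holds a space and a tab; it comes out truly empty.
printf '\n  \nfoo\n \t \nbar\n\n' | trim_blank_lines
```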

Answer 4

Using awk:

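awk '{ a[NR] = $0; if ($0 != "" && !s) s = NR }    # remember every line and the first non-empty one
    END { if (!s) exit                              # no non-empty lines at all: print nothing
        e = NR
        for (i = NR; i >= 1; i--)                   # find the last non-empty line
            if (a[i] != "") { e = i; break }
        for (i = s; i <= e; i++)
            print a[i] }' yourFile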

Answer 5

As mentioned in another answer, tac is part of coreutils and reverses a file. Combining that with the fact that command substitution strips trailing newlines, and applying the idea twice, we get

echo "$(echo "$(tac "$filename")" | tac)"

This doesn't rely on sed. You can use echo -n to remove the remaining trailing newline.
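The same double-tac trick as a minimal sketch of a function (the name trim_blank_lines is hypothetical; it takes the file name as $1). It uses printf '%s\n' in place of echo, since some shells' echo (dash's, for example) interprets backslash sequences in the data:

```shell
# Hypothetical helper: strip leading and trailing empty lines from file $1.
# The inner $(...) strips the trailing newlines of the reversed file (the
# original leading blank lines); the outer $(...) strips the real trailing ones.
trim_blank_lines() {
    printf '%s\n' "$(printf '%s\n' "$(tac "$1")" | tac)"
}
```

Replacing the outer printf '%s\n' with printf '%s' drops the final newline, like echo -n.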

Answer 6

Here's an adapted sed version that also treats lines consisting only of spaces and tabs as "empty".

sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'

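It's basically the accepted answer's version (taking BryanH's comment into account), but the dot . in the first command has been changed to [^[:blank:]] (anything that isn't blank), and the \n inside the second command's address has been changed to [[:space:]] to allow newlines, spaces and tabs.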

An alternative version that doesn't use POSIX classes, but your sed must support \t and \n inside bracket expressions. GNU sed does; BSD sed doesn't.

sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'

Tests:

[…]
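A quick check of the [:blank:]/[:space:] version from this answer, wrapped in a hypothetical trim_blank_lines function (GNU sed assumed). Leading and trailing lines made of spaces and tabs are dropped, while an interior whitespace-only line is passed through unchanged:

```shell
# Hypothetical helper: the POSIX-class variant from this answer.
trim_blank_lines() {
    sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}' "$@"
}

# Leading " " and tab lines and trailing " " and tab lines go away;
# the interior "  " line is kept as-is.
printf ' \n\t\nfoo\n  \nbar\n \n\t\n' | trim_blank_lines
```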

Answer 7

Using bash:

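$ filecontent=$(<file)                             # $(...) strips trailing newlines
$ nl=$'\n'
$ echo "${filecontent#"${filecontent%%[!$nl]*}"}"  # strip the run of leading newlines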

Answer 8

In bash, using wc, grep, sed, tail and head:

file=your_file    # the file to process
# number of the first line that contains a non-empty character
i=$(grep -n . "$file" | sed -e 's/:.*//' | head -1)
# number of the last one
j=$(grep -n . "$file" | sed -e 's/:.*//' | tail -1)
# overall number of lines:
k=$(wc -l < "$file")
# how many empty lines do we have at the end of the file?
m=$((k - j))
# strip the last m lines (negative counts need GNU head),
# then start printing at line i, which strips the first i-1 lines. Done 8-)
head -n "-$m" "$file" | tail -n "+$i"

Man, it's definitely worth learning a "real" programming language to avoid this ugliness!

Answer 9

For an efficient, non-recursive version that strips trailing newlines (including "white" characters), I've developed this sed script.

sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'

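It uses the hold space to store all the blank lines and prints them only once it finds a non-blank line. If you only want to strip truly empty lines, it's enough to drop the two [[:space:]]* parts: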

sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'

I did a simple performance comparison against the well-known recursive script

sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'

on a 3 MB file with 1 MB of random whitespace on each side of 1 MB of random base64 text:

shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile

The streaming script took roughly 0.5 seconds to finish; the recursive one still hadn't finished after 15 minutes. Win :)

For completeness, the usual sed scripts for stripping leading lines already stream. Use whichever suits you best.

sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
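Putting the two halves together, here's a minimal sketch that chains the leading-line script with the streaming trailing-line script, so neither pass holds more than the current run of blank lines (function name hypothetical; GNU sed assumed):

```shell
# Hypothetical helper: stream-strip leading and trailing blank lines.
trim_blank_lines() {
    # 1) delete everything before the first line with a non-blank character
    # 2) buffer blank lines in the hold space; flush them only when a
    #    non-blank line follows, so trailing ones are never printed
    sed '/[^[:blank:]]/,$!d' "$@" |
        sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
}
```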

Answer 10

Since I was writing a bash script that contains some functions, I found it convenient to write these:

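function strip_leading_empty_lines()
{
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || test also handles a final line without a newline
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
            break
        fi
    done
    cat    # pass the rest of the input through unchanged
}

function strip_trailing_empty_lines()
{
    local line acc=""
    while IFS= read -r line || [ -n "$line" ]; do
        acc+="$line"$'\n'
        # flush the buffered empty lines only when a non-empty line follows
        if [ -n "$line" ]; then
            printf '%s' "$acc"
            acc=""
        fi
    done
}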

Answer 11

This can easily be solved with sed's -z option.
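A minimal sketch of what that could look like, assuming GNU sed (both -z and \n inside a regex are GNU extensions). With -z, sed splits input on NUL bytes instead of newlines, so an ordinary text file arrives as a single pattern space and ^ and $ anchor to the start and end of the whole file. The function name is hypothetical:

```shell
# Hypothetical helper; requires GNU sed.
# Remove all leading newlines, then collapse trailing newlines into one.
trim_blank_lines() {
    sed -z 's/^\n*//; s/\n*$/\n/' "$@"
}
```

Since the whole file becomes one pattern space, this reads the entire file into memory.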

Answer 12

Here's an awk version that removes trailing blank lines (both empty lines and lines consisting only of whitespace).

It's memory efficient; it doesn't read the whole file into memory.


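awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'

The b variable buffers the blank lines; they are printed when a non-blank line is encountered. When EOF is reached, they are never printed. That's all there is to it.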

If you use GNU awk, [[:space:]] can be replaced with \s. (See gawk-specific Regexp Operators for the full list.)

If you only want to delete trailing lines that are truly empty, see @AndyMortimer's answer.
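The same buffering idea can also drop leading blank lines in a single pass if a flag records whether any text has been seen yet. A minimal sketch (the function name and the seen flag are hypothetical additions, not part of the answer above); interior whitespace-only lines are printed exactly as they were:

```shell
# Hypothetical helper: one awk pass that drops leading and trailing blank
# (empty or whitespace-only) lines while keeping interior ones unchanged.
trim_blank_lines() {
    awk '
        /^[[:space:]]*$/ {
            if (seen) b = b $0 "\n"   # buffer blank lines once text has started
            next                      # blank lines before any text are dropped
        }
        {
            printf "%s", b            # flush the buffered interior blank lines
            b = ""
            seen = 1
            print
        }
    ' "$@"
}
```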

Answer 13

perl -0pe 's/^\n+|\n+(\n)$/$1/gs'

Answer 14

This AWK script does the trick:

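BEGIN {
    ne = 0      # number of pending empty lines
    seen = 0    # set once the first non-empty line has been printed
}

/^[[:space:]]*$/ {
    if (seen)   # empty lines before the first non-empty line are dropped
        ne++
}

/[^[:space:]]/ {
    for (i = 0; i < ne; i++)
        print ""
    ne = 0
    seen = 1
    print
}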

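The idea is simple: empty lines are not echoed immediately. Instead, we wait until we get a non-empty line, and only then echo as many empty lines as we have seen before it, followed by the new non-empty line.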

Answer 15

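@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command that removes only the trailing lines. Use it together with @dogbane's sed command to remove both leading and trailing blank lines.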

awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'

It's quite simple in operation:

Add every line to a buffer as we read it.
For every line that contains a character, print the contents of the buffer and then clear it.

So the only things that get buffered and never displayed are trailing blank lines.

I use printf instead of print to avoid automatically adding a newline, since I'm already using newlines to separate the lines in the buffer.
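Following the suggestion to pair this with @dogbane's sed command, here's a minimal sketch of the combined pipeline (the function name is hypothetical): sed drops everything before the first non-empty line, and the awk buffer swallows the trailing ones.

```shell
# Hypothetical helper: sed removes leading empty lines, awk removes trailing ones.
trim_blank_lines() {
    sed '/./,$!d' "$@" |
        awk '{ LINES = LINES $0 "\n" } /./ { printf "%s", LINES; LINES = "" }'
}
```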

Answer 16

I'd like to present another variant, for gawk v4.1+:

result=($(gawk '
    BEGIN {
        lines_count         = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail  = 0;
    }
    /^[[:space:]]*$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail ++;
        } else {
            empty_lines_in_head ++;
        }
    }
    {
        lines_count ++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))

empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}

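if [ "$empty_lines_in_head" -gt 0 ] || [ "$empty_lines_in_tail" -gt 0 ]; then
    echo "Removing whitespace from \"$file\""
    # Pass the numbers in with -v instead of building the program with eval,
    # so the file name is never re-interpreted by the shell.
    gawk -i inplace \
        -v head="$empty_lines_in_head" \
        -v last="$((lines_count - empty_lines_in_tail))" '
        NR > head && NR <= last
    ' "$file"
fi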
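The same two-pass idea also works without gawk's inplace extension: any POSIX awk can read a file twice if you name it twice on the command line, using NR == FNR to tell the first pass from the second. A minimal sketch (hypothetical function name; it writes to stdout instead of editing the file in place and, like the script above, treats whitespace-only lines as blank):

```shell
# Hypothetical helper: print file $1 without leading/trailing blank lines.
trim_blank_lines() {
    awk '
        NR == FNR {                   # first pass: locate the first and last text lines
            if (/[^[:space:]]/) { if (!first) first = FNR; last = FNR }
            next
        }
        FNR >= first && FNR <= last   # second pass: print only that range
    ' "$1" "$1"
}
```

For a file with no non-blank lines, first and last stay 0, so nothing is printed.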

Answer 17

A bash solution.

Note: only useful if the file is small enough to be read into memory all at once.

[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"

$(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.

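=~ is bash's regex-matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily) and captures whatever comes after them. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because the escape sequence \n is not supported.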

Note that this particular regex always matches, so the command after && is always executed.

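The special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file's content with both leading and trailing newlines stripped.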

Note that printing with echo adds a trailing newline. If you want to avoid that, use echo -n instead (or the more portable printf '%s').

Remove trailing/leading newlines with sed, awk, tr, and friends
