Question

I have a file 'tbook1' containing a lot of values (2M+). I have to do the following in bash (Solaris / RHEL):

Remove the first line and the last 2 lines
Remove (,") and (")
Substitute (, ) with (,)

I can do it using two methods:

Method 1:
sed -e 1d -e 's/,"//g' -e 's/, /,/g' -e 's/"//g' -e 'N;$!P;$!D;$d' tbook1 > tbook1.3

Method 2:
tail -n +2 tbook1 | head -n -2 > tbook1.1
sed -e 's/,"//' -e 's/, //' tbook1.1 > tbook1.2

I would like to know which one is better, i.e. faster and more efficient in terms of resource usage.

Answer 1

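Method 1 will generally be more efficient, mainly because Method 2 adds an extra pipe and an intermediate file that has to be written and then read back.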

Answer 2

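Method 1 scans the file only once and writes one result (but do store the result in a file with a different name). Method 2 makes two scans (the original file and the intermediate result) and writes both an intermediate and a final result. It is bound to be twice as slow.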
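The advice about a different output name matters because the shell truncates a redirect target before sed gets to read it. Below is a minimal sketch of the single-pass approach that writes to a temporary file and renames it only if sed succeeds. The sample data and the temporary directory are made up for illustration; the sed expressions are the ones from the question.

```shell
#!/bin/sh
# Hypothetical sample data: 1 header line, 3 data lines, 2 trailer lines.
workdir=$(mktemp -d)
printf '%s\n' 'HEADER' '"1, 2, 3,"' '"4, 5, 6,"' '"7, 8, 9,"' \
    'TRAILER 1' 'TRAILER 2' > "$workdir/tbook1"

# One pass over the input; the result goes to a *different* file.
# Redirecting to the input file itself would truncate it before sed reads it.
# The rename happens only if sed exits successfully.
sed -e 1d -e 's/,"//g' -e 's/, /,/g' -e 's/"//g' -e 'N;$!P;$!D;$d' \
    "$workdir/tbook1" > "$workdir/tbook1.tmp" &&
    mv "$workdir/tbook1.tmp" "$workdir/tbook1.3"

cat "$workdir/tbook1.3"
```

On this sample the header and both trailer lines are dropped, and each data line such as "1, 2, 3," comes out as 1,2,3.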

Answer 3

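I think head and tail are more efficient than pure sed for this line-removal task. But the other two answers are also right: you should avoid making multiple passes.

You can improve the second method by chaining its steps together: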

tail -n +2 book.txt | head -n -2 | sed -e 's/,"//' -e 's/, //'
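One caveat for the Solaris side of the question: as far as I know, a negative line count for head -n is a GNU extension, so head -n -2 may not work with the stock Solaris head. A possible workaround, sketched here using only basic awk features, keeps a two-line buffer so that the first line and the last two lines are dropped in a single pass. The function name strip_ends and the sample input are made up for illustration, and the sed expressions use the g flag as in Method 1 of the question.

```shell
# Drop the first line and the last two lines without relying on GNU head.
# A line is printed only after two later lines have been read, so the
# final two lines are never emitted; line 1 is never stored at all.
strip_ends() {
    awk 'NR > 3 { print prev2 }
         NR > 1 { prev2 = prev1; prev1 = $0 }' "$@"
}

# Hypothetical sample: header, three data lines, two trailer lines.
printf '%s\n' 'HEADER' '"1, 2, 3,"' '"4, 5, 6,"' '"7, 8, 9,"' \
    'TRAILER 1' 'TRAILER 2' |
    strip_ends | sed -e 's/,"//g' -e 's/, /,/g' -e 's/"//g'
```

The buffer costs a little awk work per line, so it is worth timing against tail/head on a system where both are available before preferring it.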

With that change, head and tail are faster. Try it yourself (on a reasonably sized file):

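#!/usr/bin/env bash

target=/dev/null

# Named run_test so it does not shadow the shell's built-in test command.
run_test(){
        mode=$1
        start=$(date +%s)
        if   [ "$mode" = 1 ]; then
                # sed only
                sed -e 1d -e 's/,"//g' -e 's/, /,/g' -e 's/"//g' -e 'N;$!P;$!D;$d' book.txt > "$target"
        elif [ "$mode" = 2 ]; then
                # tail/head trim the lines, sed substitutes (head -n -2 relies on GNU head)
                tail -n +2 book.txt | head -n -2 | sed -e 's/,"//' -e 's/, //' > "$target"
        else
                # baseline: just read the file
                cat book.txt > /dev/null
        fi

        # date +%s only has one-second resolution
        elapsed=$(( $(date +%s) - start ))
        echo "$elapsed seconds"
}

echo "cat > /dev/null"
run_test 0

echo "sed > $target"
run_test 1

echo "tail/head > $target"
run_test 2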

My results:

cat > /dev/null
0 seconds

sed > /dev/null
5 seconds

tail/head > /dev/null
3 seconds
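One thing to keep in mind when reading these numbers: the two timed commands do not do the same work. The sed-only variant applies three global substitutions, while the pipeline variant applies two substitutions without the g flag and replaces ", " with nothing rather than with ",". A fairer comparison, sketched below with GNU head assumed and a made-up generated input file, gives both variants the same expressions and checks that their outputs match before any timing is done.

```shell
#!/bin/sh
workdir=$(mktemp -d)
input="$workdir/book.txt"

# Generate a hypothetical test file: header, 1000 data lines, 2 trailer lines.
{
    echo 'HEADER'
    awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "\"%d, %d, %d,\"\n", i, i + 1, i + 2 }'
    echo 'TRAILER 1'
    echo 'TRAILER 2'
} > "$input"

# Variant 1: sed does everything.
sed -e 1d -e 's/,"//g' -e 's/, /,/g' -e 's/"//g' -e 'N;$!P;$!D;$d' \
    "$input" > "$workdir/out.sed"

# Variant 2: tail/head trim the lines, sed applies the same substitutions.
tail -n +2 "$input" | head -n -2 |
    sed -e 's/,"//g' -e 's/, /,/g' -e 's/"//g' > "$workdir/out.pipe"

# Only compare timings once both variants produce identical output.
if cmp -s "$workdir/out.sed" "$workdir/out.pipe"; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

Once the outputs match, each variant can be prefixed with time to get sub-second timings, which date +%s cannot resolve.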

Which of the two is better in terms of file manipulation?
