Question

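A friend of mine recently asked how to compare two folders in linux and then run meld against any text files that differ. I'm slowly coming around to the linux philosophy of piping many granular utilities together, and I put together the following solution. My question is: how could I improve this script? There seems to be quite a bit of redundancy, and I'd appreciate learning better ways to script unix.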

#!/bin/bash

dir1=$1
dir2=$2

# show files that are different only
cmd="diff -rq $dir1 $dir2"
eval $cmd # print this out to the user too
filenames_str=`$cmd`

# remove lines that represent only one file, keep lines that have
# files in both dirs, but are just different
tmp1=`echo "$filenames_str" | sed -n '/ differ$/p'` 

# grab just the first filename for the lines of output
tmp2=`echo "$tmp1" | awk '{ print $2 }'`

# convert newlines sep to space
fs=$(echo "$tmp2") 

# convert string to array
fa=($fs) 

for file in "${fa[@]}"
do
    # drop first directory in path to get relative filename
    rel=`echo $file | sed "s#${dir1}/##"`

    # determine the type of file
    file_type=`file -i $file | awk '{print $2}' | awk -F"/" '{print $1}'`

    # if it's a text file send it to meld
    if [ $file_type == "text" ]
    then
        # throw out error messages with &> /dev/null
        meld $dir1/$rel $dir2/$rel &> /dev/null
    fi 
done

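Please preserve or improve readability in your answers too. Answers that are shorter but harder to understand don't qualify as a good answer.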

Answer 1

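This is an old question, but let's have some fun with it, without thinking about the final goal (maybe SCM) or about the tools that already do this in a better way. Let's just focus on the script itself.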

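The OP's script does a lot of string processing inside bash, using tools like sed and awk, sometimes more than once on the same command line or inside a loop that runs n times (once per file).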

That's fine, but it's worth remembering that:

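Each time the script calls one of those programs, it creates a new process in the operating system, which is expensive in time and resources. So the fewer programs are called, the better the script performs:
- diff 2 times (1 just to print to the user)
- sed 1 time to process the diff result, plus 1 time for each file
- awk 1 time to process the sed result, plus 2 times for each file (to process the file result)
- file 1 time for each file
That doesn't apply to echo, read, test and other bash builtin commands, since no external program is executed for them.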
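meld is the final command that shows the files to the user, so it doesn't count.
Even with builtin commands, pipelines (|) have a cost too, because the shell has to create pipes, duplicate handles, and maybe even fork copies of itself (each of which is a process). So again: less is better.
The messages of the diff command are locale dependent, so if the system is not in English, the whole script won't work.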
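
To make the first and last points a bit more concrete, here is a small sketch. The directory and file names are made up for illustration, and forcing LC_ALL=C for the diff call is only one way to get English messages (LC_ALL takes precedence over LANG); it is not part of the scripts in this thread.

```shell
#!/bin/bash
# Sample values, hypothetical and only for illustration.
dir1="left"
file="left/src/main.c"

# External tools: one pipe and one sed process for every file.
rel_sed=$(echo "$file" | sed "s#^${dir1}/##")

# Builtin parameter expansion: same result, no extra process.
rel_builtin=${file#"$dir1"/}

echo "sed:     $rel_sed"
echo "builtin: $rel_builtin"

# Locale: force the C locale for this one diff call so that its
# messages stay in English whatever the system language is.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"
echo "one" > "$tmp/a/note.txt"
echo "two" > "$tmp/b/note.txt"
# diff exits with status 1 when files differ; that is not an error here.
diff_out=$(LC_ALL=C diff -rq "$tmp/a" "$tmp/b" || true)
echo "$diff_out"
rm -rf "$tmp"
```

Both ways of computing the relative name give the same string; the difference is only in how many processes the shell has to start to get it.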

With that in mind, let's clean up the original script, keeping the OP's logic:

#!/bin/bash

dir1=$1
dir2=$2

# Set english as current language
LANG=en_US.UTF-8
# (1) show files that are different only
diff -rq $dir1 $dir2 | 
    # (2) remove lines that represent only one file, keep lines that have
    # files in both dirs, but are just different, delete all but left filename
    sed '/ differ$/!d; s/^Files //; s/ and .*//' |
    # (3) determine the type of file
    file -i -f - | 
    # (4) for each file
    while IFS=":" read file file_type
    do
        # (5) drop first directory in path to get relative filename
        rel=${file#$dir1}
        # (6) if it's a text file send it to meld
        if [[ "$file_type" =~ "text/" ]]
        then
            # throw out error messages with &> /dev/null
            meld ${dir1}${rel} ${dir2}${rel} &> /dev/null
        fi 
    done

A little explanation:

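A single chain of commands cmd1 | cmd2 | ..., where the output (stdout) of each command is the input (stdin) of the next one.
Execute sed just once, performing 3 operations (separated by ;) on the diff output:
- Delete the lines that do not end with " differ"
- Delete "Files " at the beginning of the remaining lines
- Delete from " and " to the end of the remaining lines
Execute the file command once to process the list of files read from stdin (option -f -).
Use the bash while statement to read two values separated by : from each line of stdin.
Use bash variable substitution to extract the relative filename from a variable.
Use a bash test to compare the file type against a regular expression.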
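
If you want to see those steps in isolation, the sketch below runs them on hard-coded sample text instead of real diff and file output. The sample lines and the text_files array are my own illustration, not something the script above uses, and the real output on your system may differ in detail.

```shell
#!/bin/bash
# Hypothetical "diff -rq" output, hard-coded so the sketch runs anywhere.
sample_diff='Files left/a.txt and right/a.txt differ
Only in left: extra.txt
Files left/img.png and right/img.png differ'

# One sed call, three operations separated by ";".
left_files=$(printf '%s\n' "$sample_diff" |
    sed '/ differ$/!d; s/^Files //; s/ and .*//')
printf '%s\n' "$left_files"

# Hypothetical "file -i" style output for those two files.
sample_types='left/a.txt: text/plain; charset=us-ascii
left/img.png: image/png; charset=binary'

dir1="left"
text_files=()
while IFS=":" read -r file file_type
do
    # Variable substitution: strip the leading directory name.
    rel=${file#"$dir1"}
    # Bash regex test on the MIME type reported for the file.
    if [[ "$file_type" =~ text/ ]]
    then
        text_files+=("$rel")
    fi
done <<< "$sample_types"

printf 'text file: %s\n' "${text_files[@]}"
```

Only the left-hand name of each differing pair survives the sed step, and only the entry whose type contains text/ makes it through the loop.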

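For clarity, I didn't take into account that file and directory names may contain spaces. In that case, both scripts will fail. To avoid that, every reference to a file or directory name variable must be enclosed in double quotes.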
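
A quick way to see why the quotes matter is to count how many arguments a command would receive. In this sketch count_args is a hypothetical helper and the names are invented; it only illustrates word splitting and is not a full space-safe rewrite of either script.

```shell
#!/bin/bash
# Hypothetical names containing spaces, only for illustration.
dir1="old version"
rel="/my notes.txt"

# Helper (not part of the original scripts): print the number of
# arguments it was called with.
count_args() { echo $#; }

# Unquoted: the shell splits the path at every space.
unquoted=$(count_args ${dir1}${rel})

# Quoted: the whole path is passed as a single argument.
quoted=$(count_args "${dir1}${rel}")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted argument"
```

Unquoted, the command sees three separate words instead of one path, which is what would happen to meld in the scripts above.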

I didn't use awk, because it is powerful enough to replace almost the whole script ;-)
