Question

I have two settings files:

file1.txt            file2.txt

A=1                  A=2
B=3                  B=3
C=5                  C=4
D=6                  .
.                    E=7

I'm looking for the best way to replace the values in file1.txt with the differing values from file2.txt, so that file1.txt ends up looking like this:

file1.txt:

A=2       
B=3       
C=4       
D=6       
E=7

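I haven't written any code yet, but the only approach I can think of is a bash script that diffs the two files (passed as positional arguments) and uses sed to replace the lines that don't match. Something like: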

./diffreplace.bash file1.txt file2.txt > NEWfile1.txt

Is there a more elegant idiom for this?
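For comparison, here is a minimal sketch of what the diffreplace.bash idea could look like if the diff/sed step is replaced by a single awk pass. It is an assumption-laden illustration, not an answer from the thread: it expects one KEY=VALUE pair per line, copies lines without "=" (such as the "." placeholders) unchanged, keeps file1's line order, and appends keys that exist only in file2. The function name diffreplace is hypothetical, borrowed from the question's script name.

```shell
#!/usr/bin/env bash
# diffreplace.bash (hypothetical sketch): print file1 with each KEY=VALUE line
# replaced by file2's line for the same KEY, keep file1's line order, and
# append keys that only appear in file2. Lines without "=" are copied as-is.
diffreplace() {
    # file2 is passed to awk first so its values are loaded before file1 is printed
    awk -F= '
    FILENAME == ARGV[1] {               # pass 1: read file2 (the new values)
        if (index($0, "=")) {
            if (!($1 in new)) order[++n] = $1   # remember first-seen key order
            new[$1] = $0
        }
        next
    }
    {                                   # pass 2: walk file1 in its own order
        if (index($0, "=") && ($1 in new)) {
            print new[$1]               # replace with the value from file2
            done[$1] = 1
        } else {
            print                       # key not in file2 (or no "="): keep line
        }
    }
    END {                               # keys that exist only in file2
        for (i = 1; i <= n; i++)
            if (!(order[i] in done)) print new[order[i]]
    }' "$2" "$1"
}

# Run as a script only when given exactly two file arguments
if [ "$#" -eq 2 ]; then
    diffreplace "$1" "$2"
fi
```

Saved as diffreplace.bash, it can be called exactly as in the question: ./diffreplace.bash file1.txt file2.txt > NEWfile1.txt. On the sample data without the "." lines this yields A=2, B=3, C=4, D=6, E=7 in that order.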

Answer 1

All of the solutions below may change the order of the assignments. I assumed that was acceptable.

Lazy solution

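If you use these assignments in a way that lets later ones override earlier ones, you can simply append file2 to the end of file1. When result is executed, every old value is overwritten by the new one. Here old stands for file1 and new for file2: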

cat old new > result
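A quick way to watch the lazy approach work, assuming the files contain shell-style NAME=value assignments and that result is consumed by sourcing it into a shell. The sample data drops the "." placeholder lines from the question, since a bare "." is not a valid assignment; the temporary file names are illustrative only.

```shell
#!/bin/sh
# Demo of the "lazy" approach: concatenate old and new, then source the result.
# Assumption: both files hold shell-compatible NAME=value lines.
tmpdir=$(mktemp -d)
printf 'A=1\nB=3\nC=5\nD=6\n' > "$tmpdir/old"   # stands in for file1.txt
printf 'A=2\nB=3\nC=4\nE=7\n' > "$tmpdir/new"   # stands in for file2.txt

cat "$tmpdir/old" "$tmpdir/new" > "$tmpdir/result"

# Sourcing runs the assignments top to bottom, so for A and C the
# later (new) assignment overwrites the earlier (old) one.
. "$tmpdir/result"
echo "A=$A B=$B C=$C D=$D E=$E"    # prints: A=2 B=3 C=4 D=6 E=7
rm -r "$tmpdir"
```

Note that result itself still contains both assignments for A and C; only the final shell state is deduplicated.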

Slightly better solution

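Building on the previous approach, you can walk through the lines of result and keep only the last assignment for each variable. Concatenating new before old turns "last" into "first", so awk only has to print the first line it sees for each key: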

# new comes first, so the first line printed for each key is its new value
cat new old |
awk -F= '{if (a[$1]!="x") {print $0; a[$1]="x"}}'
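The same keep-the-first-line-per-key filter is commonly written with the awk idiom !seen[$1]++. A runnable sketch on the question's sample data (without the "." placeholder lines), using temporary files in place of old and new:

```shell
#!/bin/sh
# Keep only the first line seen for each key (the text before "=").
# Because new is read before old, that first line carries the new value.
tmpdir=$(mktemp -d)
printf 'A=1\nB=3\nC=5\nD=6\n' > "$tmpdir/old"   # stands in for file1.txt
printf 'A=2\nB=3\nC=4\nE=7\n' > "$tmpdir/new"   # stands in for file2.txt

# seen[$1]++ evaluates to 0 (false) the first time a key appears, so
# !seen[$1]++ is true exactly once per key and awk prints that line.
merged=$(cat "$tmpdir/new" "$tmpdir/old" | awk -F= '!seen[$1]++')
printf '%s\n' "$merged"
rm -r "$tmpdir"
```

This prints A=2, B=3, C=4, E=7, D=6: D now comes after E, which is the order change the answer warns about.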

Alternative solution

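Use join to merge the two files, then use cut to drop the old values (for a key present in both files, join prints KEY=new=old). If the files are already sorted by key, use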

join -t= -a1 -a2 new old | cut -d= -f1,2

If they aren't, sort them on the key first (the <(...) process substitution needs bash):

join -t= -a1 -a2 <(sort -t= -k1,1 new) <(sort -t= -k1,1 old) |
cut -d= -f1,2
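A self-contained run of the join pipeline on the question's sample data. It sorts into temporary files instead of using <(...) so it also runs under a plain POSIX sh, and it sets LC_ALL=C (an assumption added here, not part of the answer) so that sort and join agree on the collation order.

```shell
#!/bin/sh
# Assumption: LC_ALL=C so sort and join compare keys in the same byte order.
export LC_ALL=C
tmpdir=$(mktemp -d)
printf 'A=1\nB=3\nC=5\nD=6\n' > "$tmpdir/old"   # stands in for file1.txt
printf 'A=2\nB=3\nC=4\nE=7\n' > "$tmpdir/new"   # stands in for file2.txt

# join needs both inputs sorted on the join field (the key before "=")
sort -t= -k1,1 "$tmpdir/new" > "$tmpdir/new.sorted"
sort -t= -k1,1 "$tmpdir/old" > "$tmpdir/old.sorted"

# For a key in both files join prints KEY=newvalue=oldvalue; -a1/-a2 also keep
# keys found in only one file. cut keeps the first two fields: KEY=newvalue.
merged=$(join -t= -a1 -a2 "$tmpdir/new.sorted" "$tmpdir/old.sorted" | cut -d= -f1,2)
printf '%s\n' "$merged"
rm -r "$tmpdir"
```

The output is sorted by key (A=2, B=3, C=4, D=6, E=7). One caveat of the cut step: a value that itself contains "=" would be truncated at its first "=".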

Answer 2

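I'm a bit confused by your comment that the structure of the files must stay the same. Sorting mixes up the order, so I'm assuming that A is always on line 1 (or line 1 is a "."), and so on: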

$ awk '
BEGIN { RS="\r?\n" }     # in case of Windows line endings (regex RS is not POSIX; GNU awk supports it)
FNR>n { n=FNR }          # track the longest file so trailing lines are not lost
$0!="." {                # we don't store . (change it to null if you need to)
    a[FNR]=$0            # hash using the line number as the key
}
END {                    # after all that hashing
    for(i=1;i<=n;i++)    # iterate in line number order
        print a[i]       # output the last version seen
}' file1 file2           # mind the file order: later files win

Output:

A=2
B=3
C=4
D=6
E=7

Edit: a version with a whitelist:

$ cat whitelist
A
B
E

Script:

$ awk -F= '
NR==FNR {                # process the whitelist
    a[FNR]=$1            # a: key is the line number, value is the record
    b[$1]=FNR            # b: key is the record, value is the line number
    n=FNR                # remember the count for END
    next
}                        # process file1 and file2 ... filen
($1 in b) {              # if the key is whitelisted
    a[b[$1]]=$0          # store the whole line at that key position: a[linenumber]=record
}
END {
    for(i=1;i<=n;i++)    # loop over line numbers 1 to n
        print a[i]
}' whitelist file1 file2

Output:

A=2
B=3
E=7

Diff two settings files and replace the differences
