Question

I'm writing a bash script that, among other things, compares two pipe-delimited files, $OLDFILE and $NEWFILE.

I have successfully appended any records in $NEWFILE that are not already in $OLDFILE to $OLDFILE with the following awk statement:

awk -F "|" 'NR==FNR{a[$4]++}!a[$4]' $OLDFILE $NEWFILE >> $OLDFILE

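However, after first running the command above, I also want to delete any records in $OLDFILE that are not in $NEWFILE. I was hoping I could achieve this with the following: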

awk -F "|" 'NR==FNR{a[$4]++}a[$4]' $OLDFILE $NEWFILE > $OLDFILE

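I thought this would compare $OLDFILE against $NEWFILE and overwrite $OLDFILE with only the matching lines, but awk appends the output to $OLDFILE instead of overwriting it.

What am I missing?

If anyone can suggest a better way to do this, I'm open to it.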

Answer 1

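If the field order is known to be the same in both files, and both files are known to be sorted the same way, use comm. (If the files are not known to be sorted, some preprocessing with sort will take care of that.)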

comm -1 -3 oldfile newfile

This lists the lines that appear only in newfile.

comm -1 -2 oldfile newfile

This lists the lines that appear in both files.
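To see the two invocations on concrete data, here is a minimal sketch. The sample records and the .sorted file names are made up for illustration, and LC_ALL=C is an assumption added so that sort and comm agree on ordering regardless of locale.

```shell
# Hypothetical sample data; oldfile and newfile mirror the names used above.
workdir=$(mktemp -d)
printf '%s\n' 'b|2' 'a|1' 'c|3' > "$workdir/oldfile"
printf '%s\n' 'd|4' 'c|3' 'a|1' > "$workdir/newfile"

# comm expects both inputs sorted the same way, so sort them first.
LC_ALL=C sort "$workdir/oldfile" > "$workdir/oldfile.sorted"
LC_ALL=C sort "$workdir/newfile" > "$workdir/newfile.sorted"

# -1 -3 hides "only in oldfile" and "in both": what is left is only in newfile.
only_new=$(LC_ALL=C comm -1 -3 "$workdir/oldfile.sorted" "$workdir/newfile.sorted")
# -1 -2 hides both "only in ..." columns: what is left is in both files.
in_both=$(LC_ALL=C comm -1 -2 "$workdir/oldfile.sorted" "$workdir/newfile.sorted")

printf 'only in newfile:\n%s\n' "$only_new"
printf 'in both files:\n%s\n' "$in_both"
```

With this sample data the first command should report d|4, and the second should report a|1 and c|3.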

Now put the two together:

cat <(comm -1 -2 oldfile newfile) <(comm -1 -3 oldfile newfile) > combined

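combined now contains the lines that appear only in newfile, plus the lines of oldfile that also appear in newfile.

Note: this is roughly the same as just running comm -1 oldfile newfile, but without the odd indentation.

Unfortunately, you can't write directly back to oldfile, because it may be truncated before it has been read. Just run mv -f combined oldfile once you're done.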
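The truncation happens because the shell opens and empties the target of > before the command starts reading its input. Below is a minimal sketch of the write-then-rename pattern, applied to an awk filter in the style of the question rather than to comm. It assumes the goal is to keep only the $OLDFILE records whose fourth field also appears in $NEWFILE; the sample records and the tmp variable are hypothetical.

```shell
# Hypothetical sample data with the key in the 4th pipe-delimited field.
workdir=$(mktemp -d)
OLDFILE="$workdir/old.psv"
NEWFILE="$workdir/new.psv"
printf '%s\n' 'x|x|x|1' 'x|x|x|2' > "$OLDFILE"
printf '%s\n' 'y|y|y|2' 'y|y|y|3' > "$NEWFILE"

# Create the temp file next to OLDFILE so the final mv stays on one filesystem.
tmp=$(mktemp "$OLDFILE.XXXXXX")

# Read the keys of NEWFILE first, then print only the OLDFILE records whose
# key was seen. OLDFILE is left untouched while awk is reading it.
awk -F '|' 'NR==FNR { seen[$4]++; next } seen[$4]' "$NEWFILE" "$OLDFILE" > "$tmp"

# Replace OLDFILE only after the filter has finished.
mv -f "$tmp" "$OLDFILE"
```

After this runs, $OLDFILE should hold only the record with key 2, since key 1 does not occur in $NEWFILE.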

Answer 2

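Thanks, everyone, for your input. As @Sorpigal suggested, I was eventually able to accomplish this with a hybrid of my initial approach and comm. Here is my solution, for posterity.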

# This appends new entries from $NEWFILE to the end of $OLDFILE
awk -F "|" 'NR==FNR{a[$4]++}!a[$4]' $OLDFILE $NEWFILE >> $OLDFILE

# This pulls out entries that are NOT in $NEWFILE but are in 
# $OLDFILE and should be deleted. It then outputs the entries to be 
# deleted to the $OUTFILE.
awk -F "|" 'NR==FNR{a[$4]++}!a[$4]' $NEWFILE $OLDFILE > $OUTFILE

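# This line will effectively delete any lines that are in both
# $OUTFILE and $OLDFILE, thus finally deleting any records not in
# $NEWFILE. -1 and -3 together leave only the lines unique to
# $OLDFILE; with -3 alone those lines would carry a leading tab.
comm -13 <(sort $OUTFILE) <(sort $OLDFILE) > combined.csv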

Thanks again to everyone who looked at this, especially @Sorpigal!!
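For anyone who wants to try the three steps on throwaway data, here is a minimal sketch. The sample records and paths are made up. It sorts into temporary files instead of using <(...) so that it also runs under a plain POSIX sh, and the comm -13 flags are an assumption about the intent: they print only the lines unique to the second input, with no leading tab.

```shell
# Hypothetical sample data with the key in the 4th pipe-delimited field.
workdir=$(mktemp -d)
OLDFILE="$workdir/old.psv"
NEWFILE="$workdir/new.psv"
OUTFILE="$workdir/out.psv"
printf '%s\n' 'a|a|a|1' 'b|b|b|2' > "$OLDFILE"
printf '%s\n' 'b|b|b|2' 'c|c|c|3' > "$NEWFILE"

# Step 1: append records whose key is not yet in OLDFILE (here: key 3).
awk -F '|' 'NR==FNR{a[$4]++}!a[$4]' "$OLDFILE" "$NEWFILE" >> "$OLDFILE"

# Step 2: collect OLDFILE records whose key is missing from NEWFILE (here: key 1).
awk -F '|' 'NR==FNR{a[$4]++}!a[$4]' "$NEWFILE" "$OLDFILE" > "$OUTFILE"

# Step 3: keep the OLDFILE lines that are not listed in OUTFILE.
LC_ALL=C sort "$OUTFILE" > "$workdir/out.sorted"
LC_ALL=C sort "$OLDFILE" > "$workdir/old.sorted"
LC_ALL=C comm -13 "$workdir/out.sorted" "$workdir/old.sorted" > "$workdir/combined.csv"

cat "$workdir/combined.csv"
```

With this data, combined.csv should end up with the records for keys 2 and 3, and the record for key 1 is dropped.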

Comparing DSV files with awk and removing differences by overwriting the input file
