Compare two files and add the missing values to the file

Time: 2015-09-24 10:17:11

Tags: linux bash awk sed scripting

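I have a large file (file_new.txt) in which a set of attributes and their values is repeated a number of times. In some of these sets, certain attributes and their values are missing compared with the attributes in a sample file (sample.txt).

Sample.txt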

apple = 0
black = 0
cat = 0
dog = 0
elephant = 0

file_next.txt

apple = 6
black = 7
elephant = 8
==============
apple=9
cat = 10
elephant =11

I am looking for output like the following (attributes from sample.txt that are missing in file_new.txt should be added with a value of zero):

file_output.txt

apple = 6
black = 7
cat = 0
dog = 0
elephant = 8
=============
apple = 9
black = 0
cat = 10
dog = 0
elephant = 11

Note: the first and last attributes are always present (here, apple and elephant).

Thanks

2 answers:

Answer 0 (score: 1)

$ cat tst.awk
BEGIN   { FS="[[:space:]]*=[[:space:]]*"; OFS=" = " }
NR==FNR { names[++numNames] = $1; dflt[$1] = $2; next }
/^=+$/  { prtRec(); print }
{ curr[$1] = $2 }
END { prtRec() }

function prtRec() {
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name = names[nameNr]
        print name, (name in curr ? curr[name] : dflt[name])
    }
    delete curr
}

$ awk -f tst.awk sample.txt file_next.txt
apple = 6
black = 7
cat = 0
dog = 0
elephant = 8
==============
apple = 9
black = 0
cat = 10
dog = 0
elephant = 11

Or, if you don't care about the order of the lines within each output record, it's even simpler:

$ cat tst2.awk
BEGIN   { FS="[[:space:]]*=[[:space:]]*"; OFS=" = " }
NR==FNR { dflt[$1] = $2; next }
/^=+$/  { prtRec(); print }
{ curr[$1] = $2 }
END { prtRec() }

function prtRec() {
    for (name in dflt) {
        print name, (name in curr ? curr[name] : dflt[name])
    }
    delete curr
}

$ awk -f tst2.awk sample.txt file_next.txt
apple = 6
elephant = 8
cat = 0
black = 7
dog = 0
==============
apple = 9
elephant = 11
cat = 10
black = 0
dog = 0
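
The field separator `[[:space:]]*=[[:space:]]*` is what lets the script cope with the inconsistent spacing in the input (`apple=9`, `cat = 10`, `elephant =11`), while `OFS` normalizes it on output. Here is a minimal sketch that isolates just that step. The function name `normalize` is made up for illustration and is not part of the answer:

```shell
# Split each line on "=" with optional surrounding whitespace,
# then re-join name and value with a uniform " = ".
normalize() {
  awk 'BEGIN { FS = "[[:space:]]*=[[:space:]]*"; OFS = " = " }
       { print $1, $2 }'
}

printf '%s\n' 'apple=9' 'cat = 10' 'elephant =11' | normalize
```

Because `print $1, $2` joins the fields with `OFS`, all three lines should come out in the `name = value` form.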

Answer 1 (score: 0)

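awk -F '[[:blank:]]*=[[:blank:]]*' '
   # Print the sample line of every attribute not seen in the current set,
   # then re-arm all flags for the next set.
   function Feed(   Key) {
      for (Key in ToAdd) {
         if (ToAdd[Key] == 1) print Sample[Key]
         else ToAdd[Key] = 1
      }
   }
   # First file: remember each sample line and flag its attribute as missing.
   FNR == NR { Sample[$1] = $0; ToAdd[$1] = 1 }
   # Data line of the second file: mark the attribute as present, pass it through.
   FNR != NR && $0 !~ /^=====/ { ToAdd[$1] = 0; print }
   # Separator: emit whatever is still missing, then the separator itself.
   $0 ~ /^=====/ { Feed(); print }
   # End of input: complete the last set.
   END { Feed() }
   ' Sample.txt file_new.txt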

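This uses:

  • associative arrays for the data, plus a flag per attribute marking whether it still has to be printed
  • a function, to avoid writing the same code twice (at each ===== separator and at the end of the input)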

The order of the file arguments is mandatory: the sample file must come first.
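
The order matters because `FNR == NR` is true only while awk reads its first file argument: `NR` counts lines across all inputs, whereas `FNR` restarts at 1 for each file. Here is a small sketch that makes this visible. The helper name `which_file` and the two temporary files are assumptions for illustration only:

```shell
# Label each input line by whether awk considers it part of the first file.
# NR == FNR holds only while the first file argument is being read.
which_file() {
  awk 'NR == FNR { print "sample: " $0; next }
       { print "data:   " $0 }' "$@"
}

tmp=$(mktemp -d)
printf 'apple = 0\ncat = 0\n' > "$tmp/sample.txt"
printf 'apple = 6\n'          > "$tmp/data.txt"

which_file "$tmp/sample.txt" "$tmp/data.txt"   # intended order
which_file "$tmp/data.txt" "$tmp/sample.txt"   # swapped: data is taken for the sample
rm -rf "$tmp"
```

In the second call the data file is treated as the sample, which is why the sample file has to be listed first.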