Comparing file columns in Unix?

Time: 2016-11-24 17:36:00

Tags: python shell unix awk sed

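I want to compare the filenames in Today.txt against Main.txt. If a filename matches, print all 6 columns of the matching Main.txt line to a new file, matched.txt.

For filenames that do not match anything in Main.txt, list the filename and time from Today.txt in another new file, e.g. unmatched.txt.

Note: the plus sign (+) indicates that the file comes from the inprogress directory; sometimes a "+" is attached to the filename.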

Main.txt

 date      filename          timestamp space  count   status
Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

Today.txt

 filename       time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt   09:36
AR02_20161104.txt   09:36
ifs01_20161104.txt  21:16
TRIPS11_20161104.txt 09:16

Desired output: matched.txt

Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

unmatched.txt

CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt  21:16

The command below gives me the correct output, except when a filename has a plus sign (+) attached.

    awk 'FNR==1{next}
         NR==FNR{a[$1]=$2; next}
         $3 in a{print; delete a[$3]}
         END{for(k in a) print k,a[k] > "unmatched"}' today main > matched

Many thanks in advance!

1 answer:

Answer 0 (score: 2)

The problem is the $3 in a test when awk is processing the main file: there the third field can be +CHCK01_20161104.txt, which is not a key of the array built from today. For such values to match, strip the + from $3 before the lookup, for example with gensub in GNU awk. The advantage of gensub over gsub is that it returns the substituted value rather than modifying the field in place. Using it for your case:

    $ awk 'FNR==1{next} NR==FNR{a[$1]=$2; next} gensub(/+/,"",1,$3) in a{print; delete a[gensub(/+/,"",1,$3)]} END{for(k in a) print k,a[k] > "unmatched"}' today main
    Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
    Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
    Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
    Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time

This produces the 4 lines of output as required.

From the gawk man page:

    gensub(regexp, replacement, how [, target])
        gensub is a general substitution function. Like sub and gsub, it
        searches the target string target for matches of the regular
        expression regexp. Unlike sub and gsub, the modified string is
        returned as the result of the function, and the original target
        string is not changed. If how is a string beginning with `g' or
        `G', then it replaces all matches of regexp with replacement.

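So in our case,

    gensub(/+/,"",1,$3)

replaces only the first occurrence of + with an empty string (because the how argument is set to 1); in this data that is the + at the start of the field. This avoids replacements anywhere else in the field.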
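Why a non-mutating substitution matters can be sketched with plain POSIX sub(), so that no gawk is required. This is only an illustration under stated assumptions: the sample line is the first data row of Main.txt, and the variable names line, in_place and on_copy are made up for the example. Substituting directly on $3 makes awk rebuild the record with single spaces, whereas substituting on a copy leaves the line that gets printed untouched.

```shell
# One data row from Main.txt (illustrative input for this sketch).
line='Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time'

# sub() on $3 itself changes the field, so awk rebuilds $0 using single spaces
# and the "+" is lost from the printed line.
in_place=$(printf '%s\n' "$line" | awk '{ sub(/^\+/, "", $3); print }')

# sub() on a copy: only k loses the leading "+"; $3 and $0 stay as they were.
on_copy=$(printf '%s\n' "$line" | awk '{ k = $3; sub(/^\+/, "", k); print k "|" $0 }')

printf '%s\n' "$in_place"
printf '%s\n' "$on_copy"
```

The first printed line has lost both the + and the original column spacing; the second shows the cleaned key followed by the original, unmodified record.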

(Or) a neater approach, thanks to Ed Morton's suggestion: apply sub to a copy of $3 stored in a variable:

    $ awk 'FNR==1{next} NR==FNR{a[$1]=$2; next} {k=$3; sub(/^\+/,"",k)} k in a{print; delete a[k]} END{for(k in a) print k,a[k] > "unmatched"}' today main
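For anyone who wants to try this end to end, here is a minimal sketch that recreates the sample data from the question in a temporary directory and runs the sub-on-a-copy variant. The file names today, main, matched and unmatched follow the commands above; the workdir, matched_count and unmatched_sorted variables are assumptions added for the example. Only POSIX awk features are used, so it should not depend on gawk.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > main <<'EOF'
 date      filename          timestamp space  count   status
Nov 4    +CHCK01_20161104.txt  06:39   2.15M  17153    on_time
Nov 4    TRIPS11_20161104.txt 09:03   0.00M  24       On_Time
Nov 4    AR02_20161104.txt    09:31   0.00M  7        On_Time
Nov 4    AR01_20161104.txt    09:31   0.04M  433      On_Time
EOF

cat > today <<'EOF'
 filename       time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt   09:36
AR02_20161104.txt   09:36
ifs01_20161104.txt  21:16
TRIPS11_20161104.txt 09:16
EOF

awk '
    FNR == 1  { next }                      # skip the header line of each file
    NR == FNR { a[$1] = $2; next }          # first file (today): filename -> time
    { k = $3; sub(/^\+/, "", k) }           # second file (main): key without a leading "+"
    k in a    { print; delete a[k] }        # matched: print the untouched main line
    END { for (k in a) print k, a[k] > "unmatched" }   # whatever is left came only from today
' today main > matched

# The order of "for (k in a)" is unspecified, so sort before inspecting.
matched_count=$(wc -l < matched | tr -d ' ')
unmatched_sorted=$(LC_ALL=C sort unmatched)

echo "matched lines: $matched_count"
printf '%s\n' "$unmatched_sorted"
```

Because the iteration order of for (k in a) is not defined, the lines in unmatched may appear in a different order from Today.txt; sort the file if the order matters.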