I am looking for a way to compare two files at the key level (id) and show the changes at the column level.
file_1.txt
id|description|name|date
1|Row 1|a|2019-06-15 00:20:15:00
2|Row 2|b|2019-06-16 15:18:10:00
3|Row 3|c|2019-06-17 07:02:17:00
4|Row 4|d|2019-06-25 09:00:01:00
5|Row 5|e|2019-06-25 22:00:00:00
file_2.txt
id|description|name|date
1|Row 1|a|2019-06-15 00:20:15:00
2|Row 2|c|2019-06-16 15:18:10:00
4|Row 4|d|2019-06-25 09:00:01:00
5|ROW 5|b|2019-06-25 22:00:00:00
7|Row 7|f|2019-06-17 07:02:17:00
The output should look like this:
1|Row 1|a|2019-06-15 00:20:15:001|Row 1|a|2019-06-15 00:20:15:00,Match
2|Row 2|c|2019-06-16 15:18:10:00|Row 2|b|2019-06-16 15:18:10:00,No Match
3|Row 3|c|2019-06-17 07:02:17:00,No Match
4|Row 4|d|2019-06-25 09:00:01:004|Row 4|d|2019-06-25 09:00:01:00,Match
5|ROW 5|b|2019-06-25 22:00:00:00|Row 5|e|2019-06-25 22:00:00:00,No Match
7|Row 7|f|2019-06-17 07:02:17:00,No Match
I tried the command below, where file2 is the driver file for the output, so the row with id 3 is ignored and not printed because it is not in file2.txt:
awk -F, 'NR==FNR{ arr[$1]=$0; next } { print $0 (arr[$1]==$0?arr[$1]",Match":arr[$1]",No Match") }' OFS=, file1.txt file2.txt
id|description|name|date,Match
1|Row 1|a|2019-06-15 00:20:15:001|Row 1|a|2019-06-15 00:20:15:00,Match
2|Row 2|c|2019-06-16 15:18:10:00,No Match
4|Row 4|d|2019-06-25 09:00:01:004|Row 4|d|2019-06-25 09:00:01:00,Match
5|ROW 5|b|2019-06-25 22:00:00:00,No Match
7|Row 7|f|2019-06-17 07:02:17:00,No Match
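I am not sure why the records from both file1 and file2 are printed only when they match.
To give some more background: I am trying to use this awk command to find the differences between the files and then create a report that shows which columns have different values. Ideally, the final output would look like this: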
id|Change| Columns
1|No Change|NA
2|Change|name
3|Exists only in file 1|NA
4|No Change|NA
5|Change|description,name
7|Exists only in file 2|NA
Any guidance from the experts on how to achieve this would be much appreciated.
Answer 0 (score: 0)
With GNU awk for arrays of arrays, gensub(), sorted_in, and ARGIND:
$ cat tst.awk
BEGIN { FS=OFS="|" }
FNR==1 { next }
{ vals[$1][ARGIND] = gensub("^[^"FS"]+["FS"]","",1) }
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (id in vals) {
        print id, \
            (1 in vals[id] ? vals[id][1] : "N/A"),
            (2 in vals[id] ? vals[id][2] : "N/A"),
            (vals[id][1] == vals[id][2] ? "" : "No ") "Match"
    }
}
$ awk -f tst.awk file1 file2
1|Row 1|a|2019-06-15 00:20:15:00|Row 1|a|2019-06-15 00:20:15:00|Match
2|Row 2|b|2019-06-16 15:18:10:00|Row 2|c|2019-06-16 15:18:10:00|No Match
3|Row 3|c|2019-06-17 07:02:17:00|N/A|No Match
4|Row 4|d|2019-06-25 09:00:01:00|Row 4|d|2019-06-25 09:00:01:00|Match
5|Row 5|e|2019-06-25 22:00:00:00|ROW 5|b|2019-06-25 22:00:00:00|No Match
7|N/A|Row 7|f|2019-06-17 07:02:17:00|No Match
or, if you prefer:
$ awk -f tst.awk file2 file1
1|Row 1|a|2019-06-15 00:20:15:00|Row 1|a|2019-06-15 00:20:15:00|Match
2|Row 2|c|2019-06-16 15:18:10:00|Row 2|b|2019-06-16 15:18:10:00|No Match
3|N/A|Row 3|c|2019-06-17 07:02:17:00|No Match
4|Row 4|d|2019-06-25 09:00:01:00|Row 4|d|2019-06-25 09:00:01:00|Match
5|ROW 5|b|2019-06-25 22:00:00:00|Row 5|e|2019-06-25 22:00:00:00|No Match
7|Row 7|f|2019-06-17 07:02:17:00|N/A|No Match
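The "N/A" lets you tell which of the 2 files has no row for a given id. If you don't like that, massage the script to suit.
Update: here is how to do it with any awk plus sort: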
$ cat tst.awk
BEGIN { FS=OFS="|" }
FNR==1 { argind++; next }
{
    id = $1
    ids[id]
    sub("^[^"FS"]+["FS"]","")
    vals[id,argind] = $0
}
END {
    for (id in ids) {
        print id, \
            ((id,1) in vals ? vals[id,1] : "N/A"),
            ((id,2) in vals ? vals[id,2] : "N/A"),
            (vals[id,1] == vals[id,2] ? "" : "No ") "Match"
    }
}
$ awk -f tst.awk file1 file2 | sort -t'|' -k1,1n
1|Row 1|a|2019-06-15 00:20:15:00|Row 1|a|2019-06-15 00:20:15:00|Match
2|Row 2|b|2019-06-16 15:18:10:00|Row 2|c|2019-06-16 15:18:10:00|No Match
3|Row 3|c|2019-06-17 07:02:17:00|N/A|No Match
4|Row 4|d|2019-06-25 09:00:01:00|Row 4|d|2019-06-25 09:00:01:00|Match
5|Row 5|e|2019-06-25 22:00:00:00|ROW 5|b|2019-06-25 22:00:00:00|No Match
7|N/A|Row 7|f|2019-06-17 07:02:17:00|No Match
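As for why the question's one-liner printed both records only on a match: a likely explanation is the `-F,` option. The data is pipe-delimited and contains no commas, so `$1` is the whole record rather than the id, and `arr[$1]` is non-empty only when an identical line exists in the first file. The sketch below illustrates this on two-row copies of the sample data. The temporary file names, and the extra `|` printed between the two records, are illustrative choices and not part of the original command.

```shell
# Two-row excerpts of the question's sample files, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1|Row 1|a|2019-06-15 00:20:15:00' '2|Row 2|b|2019-06-16 15:18:10:00' > "$tmp/file1.txt"
printf '%s\n' '1|Row 1|a|2019-06-15 00:20:15:00' '2|Row 2|c|2019-06-16 15:18:10:00' > "$tmp/file2.txt"

# With -F, the line contains no comma, so $1 is the entire record.
key_comma=$(awk -F, 'FNR == 2 { print $1 }' "$tmp/file2.txt")

# With -F'|' the key is just the id.
key_pipe=$(awk -F'|' 'FNR == 2 { print $1 }' "$tmp/file2.txt")

# Same logic as the one-liner, but keyed on the id. The "|" between the two
# records is an editorial addition for readability.
fixed=$(awk -F'|' '
    NR == FNR { arr[$1] = $0; next }      # first file: remember each row by id
    { print $0 "|" arr[$1] "," ($0 == arr[$1] ? "Match" : "No Match") }
' "$tmp/file1.txt" "$tmp/file2.txt")

printf '%s\n' "$key_comma" "$key_pipe" "$fixed"
rm -rf "$tmp"
```

Even with the separator corrected, the second file still drives the output, so an id that exists only in the first file is never printed. That is why the scripts above collect ids from both files before printing.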
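Answer 1 (score: 0)
Reading your request, it is easy to do the whole task in one go rather than just the intermediate step.
Here is an awk script that performs the final task.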
script.awk
BEGIN {FS = OFS = "|"; f[2] = "descr"; f[3] = "name"; f[4] = "date"}
FNR == NR { # read first input file
    lines[$1] = $0;
    next;
}
{ # read second input file
    if ($1 in lines) { # index exists in file 1
        if ($0 == lines[$1]) { # compare indexed lines
            print $1, "Same", "NA";
        } else { # indexed lines differ
            split(lines[$1], file1Fields); # read all fields from file 1 line
            unmatchedFields = "";
            for (m = 2; m <= 4; m++) {
                if (file1Fields[m] != $m) { # compare each field
                    fieldsSeparator = length(unmatchedFields) ? "," : "";
                    unmatchedFields = unmatchedFields fieldsSeparator f[m];
                }
            }
            print $1, "change", unmatchedFields;
        }
        delete lines[$1]; # clean handled lines from file1
    } else { # index not seen in file 1, it is only in file 2
        print $1, "only in file 2", "NA";
    }
}
END {
    for (j in lines) { # index only in file 1
        print j, "only in file 1", "NA";
    }
}
input.1.txt
id|description|name|date
1|Row 1|a|2019-06-15 00:20:15:00
2|Row 2|b|2019-06-16 15:18:10:00
3|Row 3|c|2019-06-17 07:02:17:00
4|Row 4|d|2019-06-25 09:00:01:00
5|Row 5|e|2019-06-25 22:00:00:00
input.2.txt
id|description|name|date
1|Row 1|a|2019-06-15 00:20:15:00
2|Row 2|c|2019-06-16 15:18:10:00
4|Row 4|d|2019-06-25 09:00:01:00
5|ROW 5|b|2019-06-25 22:00:00:00
7|Row 7|f|2019-06-17 07:02:17:00
Running:
awk -f script.awk input.1.txt input.2.txt |sort
Output:
1|Same|NA
2|change|name
3|only in file 1|NA
4|Same|NA
5|change|descr,name
7|only in file 2|NA
id|Same|NA
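script.awk hard-codes the column names in f[] and lets the header row through, which is where the trailing id|Same|NA line comes from. A possible variant, sketched below, takes the column names from the header row instead and skips the header when comparing. It assumes both files start with the same header line. The scratch directory, the `report` variable, and the forced string comparison (appending "") are choices made for this sketch, not part of the original script.

```shell
# Recreate the two sample inputs in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/input.1.txt" <<'EOF'
id|description|name|date
1|Row 1|a|2019-06-15 00:20:15:00
2|Row 2|b|2019-06-16 15:18:10:00
3|Row 3|c|2019-06-17 07:02:17:00
4|Row 4|d|2019-06-25 09:00:01:00
5|Row 5|e|2019-06-25 22:00:00:00
EOF
cat > "$tmp/input.2.txt" <<'EOF'
id|description|name|date
1|Row 1|a|2019-06-15 00:20:15:00
2|Row 2|c|2019-06-16 15:18:10:00
4|Row 4|d|2019-06-25 09:00:01:00
5|ROW 5|b|2019-06-25 22:00:00:00
7|Row 7|f|2019-06-17 07:02:17:00
EOF

report=$(awk '
BEGIN { FS = OFS = "|" }
FNR == 1 {                                   # header row of either file
    for (m = 2; m <= NF; m++) f[m] = $m      # column names come from the header
    nf = NF
    next                                     # the header itself is not compared
}
FNR == NR { lines[$1] = $0; next }           # first file: remember each row by id
{                                            # second file
    if (!($1 in lines)) { print $1, "only in file 2", "NA"; next }
    split(lines[$1], old)                    # fields of the matching file 1 row
    diff = ""
    for (m = 2; m <= nf; m++)
        if ((old[m] "") != ($m ""))          # appending "" forces a string comparison
            diff = diff (diff == "" ? "" : ",") f[m]
    print $1, (diff == "" ? "Same" : "change"), (diff == "" ? "NA" : diff)
    delete lines[$1]                         # whatever is left exists only in file 1
}
END { for (j in lines) print j, "only in file 1", "NA" }
' "$tmp/input.1.txt" "$tmp/input.2.txt" | sort -t'|' -k1,1n)

printf '%s\n' "$report"
rm -rf "$tmp"
```

Sorting numerically on the first field (`sort -t'|' -k1,1n`) keeps ids in numeric order once they reach two digits, which a plain `sort` would not.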