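I want to compare two files based on the first field ($1) of each file, and then populate the output as follows:
Matched rows from both files (present in both Aug.csv and Sep.csv), with the last field, Remarks, printed as "Matched".
Non-matched rows from Aug.csv (present in Aug.csv but not in Sep.csv): print "NOT" once per field of Sep.csv (its NF, 5 here), i.e. "NOT,NOT,NOT,NOT,NOT", in place of the Sep.csv fields, and print the last field, Remarks, as "Not in Sep.csv" (or the FILENAME).
Non-matched rows from Sep.csv (present in Sep.csv but not in Aug.csv): print "NOT" once per field of Aug.csv (its NF, 4 here), i.e. "NOT,NOT,NOT,NOT", in place of the Aug.csv fields, and print the last field, Remarks, as "Not in Aug.csv" (or the FILENAME).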
Aug.csv
Name,Age,Place,Des
aaa,40,xxx,Aug
aaa,20,yyy,Aug
ccc,35,xxx,Aug
Sep.csv
Name,Age,Place,Edu,Des
aaa,50,zzz,eee,Sep
bbb,30,xxx,yyy,Sep
aaa,60,yyy,fff,Sep
bbb,50,yyy,fff,Sep
Expected Output.csv
Name,Age,Place,Des,Name,Age,Place,Edu,Des,Remarks
aaa,40,xxx,Aug,aaa,50,zzz,eee,Sep,Matched
aaa,40,xxx,Aug,aaa,60,yyy,fff,Sep,Matched
aaa,20,yyy,Aug,aaa,50,zzz,eee,Sep,Matched
aaa,20,yyy,Aug,aaa,60,yyy,fff,Sep,Matched
NOT,NOT,NOT,NOT,bbb,30,xxx,yyy,Sep,Not in Aug.csv
NOT,NOT,NOT,NOT,bbb,50,yyy,fff,Sep,Not in Aug.csv
ccc,35,xxx,Aug,NOT,NOT,NOT,NOT,NOT,Not in Sep.csv
I tried the two commands below to get the desired output, without success.
First command:
awk -v first="NOT,NOT,NOT,NOT" -v second="NOT,NOT,NOT,NOT,NOT" -F"," 'NR==FNR{a[$1]=$0;next}{if (a[$1])print a[$1],$0,"Matched";else print first, $0,"Not in Aug.csv";}' OFS="," Aug.csv Sep.csv >Output.csv
Second command:
awk -v first="NOT,NOT,NOT,NOT" -v second="NOT,NOT,NOT,NOT,NOT" -F"," 'NR==FNR{a[$1]=$0;next} !($1 in a) {print $0,second,"Not in Sep.csv";}' OFS="," Sep.csv Aug.csv >>Output.csv
With the commands above I got the following Output.csv:
Name,Age,Place,Des,Name,Age,Place,Edu,Des,Matched
aaa,20,yyy,Aug,aaa,50,zzz,eee,Sep,Matched
aaa,20,yyy,Aug,aaa,60,yyy,fff,Sep,Matched
NOT,NOT,NOT,NOT,bbb,30,xxx,yyy,Sep,Not in Aug.csv
NOT,NOT,NOT,NOT,bbb,50,yyy,fff,Sep,Not in Aug.csv
ccc,35,xxx,Aug,NOT,NOT,NOT,NOT,NOT,Not in Sep.csv
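Here I am missing the following two matched rows (for Aug.csv) from the expected output. Please advise how to handle this; it seems to ignore the duplicate entries.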
aaa,40,xxx,Aug,aaa,50,zzz,eee,Sep,Matched
aaa,40,xxx,Aug,aaa,60,yyy,fff,Sep,Matched
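I would also like to know whether the variables "first" and "second" (i.e. awk -v first="NOT,NOT,NOT,NOT" -v second="NOT,NOT,NOT,NOT,NOT") can be made dynamic, based on the number of fields/headers in Aug.csv and Sep.csv. The original files contain more fields, and the count varies each time (10 fields, 15 fields, etc.), so I do not want to type "NOT" 10 times by hand.
Alternatively, is there any way to repeat FS when printing, based on the number of fields in the original files, so that my output would look like the one below?
Expected Output.csv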
Name,Age,Place,Des,Name,Age,Place,Edu,Des,Remarks
aaa,40,xxx,Aug,aaa,50,zzz,eee,Sep,Matched
aaa,40,xxx,Aug,aaa,60,yyy,fff,Sep,Matched
aaa,20,yyy,Aug,aaa,50,zzz,eee,Sep,Matched
aaa,20,yyy,Aug,aaa,60,yyy,fff,Sep,Matched
,,,,bbb,30,xxx,yyy,Sep,Not in Aug.csv
,,,,bbb,50,yyy,fff,Sep,Not in Aug.csv
ccc,35,xxx,Aug,,,,,,Not in Sep.csv
Please advise; I am looking for your suggestions.
Answer 0 (score: 2)
A more involved GNU awk solution:
compare.awk script:
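# Build a string of n "NOT" placeholders joined by FS (n must be >= 1)
function prNot(n) {
    r=s="NOT"; while(--n) r=r FS s;
    return r
}
BEGIN{ FS=OFS="," }
# First file on the command line (Sep.csv)
NR==FNR{
    if (NR==1) {
        # header: remember the field count, the file name and the header line
        sep_nf=NF; sep_fn=FILENAME; h=$0
    } else {
        # store fields 2..NF of each row under its key; c is a running row counter
        sep[$1][++c]=$2;
        for(i=3;i<=NF;i++){ sep[$1][c]=sep[$1][c] FS $i }
    }
    next
}
# Second file (Aug.csv): header line
FNR==1{
    aug_nf=NF; aug_fn=FILENAME; print $0,h,"Remarks"; next
}
# key present in both files: pair this row with every stored Sep.csv row
$1 in sep{ matched[$1]; for(i in sep[$1]) print $0,$1,sep[$1][i],"Matched" }
# key present only in Aug.csv
!($1 in sep){ print $0,prNot(sep_nf),"Not in "sep_fn }
END{
    # keys present only in Sep.csv
    for(i in sep)
        if (!(i in matched)) {
            for(j in sep[i]) print prNot(aug_nf),i,sep[i][j],"Not in "aug_fn
        }
}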
Usage:
awk -f compare.awk Sep.csv Aug.csv
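The prNot helper in the script above always repeats the literal string NOT. As a rough sketch of how the same idea could also cover the empty-placeholder variant asked about in the question (the ",,,," rows), the filler can be passed as a parameter. The name rep and the file fillers.txt are made up for this illustration and are not part of compare.awk; the code uses only portable awk features.

```shell
# rep(n, s): n copies of s joined by FS (hypothetical helper, not part of compare.awk)
awk -F, '
function rep(n, s,    r, i) {          # r and i are local variables
    r = s
    for (i = 2; i <= n; i++) r = r FS s
    return r
}
# For each header line, print a NOT filler and an empty filler of the same width
{ print rep(NF, "NOT") " | " rep(NF, "") }
' > fillers.txt <<'EOF'
Name,Age,Place,Des
Name,Age,Place,Edu,Des
EOF
cat fillers.txt
```

For the 4-field header this prints NOT,NOT,NOT,NOT followed by three bare commas (four empty fields), and for the 5-field header the same with one more of each. Replacing prNot(sep_nf) with rep(sep_nf, "") in a script like the one above should therefore give the empty-field layout.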
Answer 1 (score: 2)
Using GNU awk for true multi-dimensional arrays:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 {
    for (i=1; i<=NF; i++) {
        nots[ARGIND] = (i>1 ? nots[ARGIND] OFS : "") "NOT"
    }
}
NR==FNR {
    file1[$1][++cnt[$1]] = $0
    next
}
{
    file2[$1]
    if ($1 in file1) {
        for (num in file1[$1]) {
            print file1[$1][num], $0, (FNR>1 ? "Matched" : "Remarks")
        }
    }
    else {
        print nots[1], $0, "Not in " ARGV[1]
    }
}
END {
    for (name in file1) {
        if ( !(name in file2) ) {
            for (num in file1[name]) {
                print file1[name][num], nots[2], "Not in " ARGV[2]
            }
        }
    }
}
$ awk -f tst.awk Aug.csv Sep.csv
Name,Age,Place,Des,Name,Age,Place,Edu,Des,Remarks
aaa,40,xxx,Aug,aaa,50,zzz,eee,Sep,Matched
aaa,20,yyy,Aug,aaa,50,zzz,eee,Sep,Matched
NOT,NOT,NOT,NOT,bbb,30,xxx,yyy,Sep,Not in Aug.csv
aaa,40,xxx,Aug,aaa,60,yyy,fff,Sep,Matched
aaa,20,yyy,Aug,aaa,60,yyy,fff,Sep,Matched
NOT,NOT,NOT,NOT,bbb,50,yyy,fff,Sep,Not in Aug.csv
ccc,35,xxx,Aug,NOT,NOT,NOT,NOT,NOT,Not in Sep.csv
If the output order matters, there are various ways to handle that...
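One possible way, as a minimal sketch rather than the answerer's own method: read Sep.csv first and keep its rows under numeric indexes, print matches while streaming Aug.csv, and defer both kinds of unmatched rows to END so they come out in input order. The variable names (row, cnt, sep_row, aug_only) and the helper rep are invented for this illustration. It avoids arrays of arrays, so it should not need GNU awk.

```shell
# Sample data from the question
cat > Aug.csv <<'EOF'
Name,Age,Place,Des
aaa,40,xxx,Aug
aaa,20,yyy,Aug
ccc,35,xxx,Aug
EOF
cat > Sep.csv <<'EOF'
Name,Age,Place,Edu,Des
aaa,50,zzz,eee,Sep
bbb,30,xxx,yyy,Sep
aaa,60,yyy,fff,Sep
bbb,50,yyy,fff,Sep
EOF

awk '
# n copies of s joined by FS (hypothetical helper)
function rep(n, s,    r, i) { r = s; for (i = 2; i <= n; i++) r = r FS s; return r }
BEGIN { FS = OFS = "," }
FNR == 1 {                               # header line of either file
    if (NR == 1) { sep_hdr = $0; sep_nf = NF; sep_fn = FILENAME }
    else         { aug_nf = NF; aug_fn = FILENAME; print $0, sep_hdr, "Remarks" }
    next
}
NR == FNR {                              # first file (Sep.csv)
    row[$1, ++cnt[$1]] = $0              # rows per key, numbered in input order
    sep_key[++sep_n] = $1                # overall input order, used in END
    sep_row[sep_n] = $0
    next
}
{                                        # second file (Aug.csv)
    if ($1 in cnt) {
        seen[$1]                         # mark the key as matched
        for (i = 1; i <= cnt[$1]; i++) print $0, row[$1, i], "Matched"
    } else {
        aug_only[++aug_n] = $0           # defer until END
    }
}
END {
    for (i = 1; i <= sep_n; i++)         # Sep.csv rows whose key never matched
        if (!(sep_key[i] in seen)) print rep(aug_nf, "NOT"), sep_row[i], "Not in " aug_fn
    for (i = 1; i <= aug_n; i++)         # Aug.csv rows whose key never matched
        print aug_only[i], rep(sep_nf, "NOT"), "Not in " sep_fn
}' Sep.csv Aug.csv > Output.csv
cat Output.csv
```

With the sample files this reproduces the row order of the expected Output.csv in the question: matched rows in Aug.csv order, then the Sep.csv-only rows, then the Aug.csv-only rows. The trade-off is that all of Sep.csv and the unmatched Aug.csv rows are held in memory until END.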