I am trying to merge data from two text files based on some conditions.
I have two files: 1.txt
gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||0f8caa3ced5dc172901a427410d20540
okan1993||||killa-o@hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19@amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
glen-666||o||glen-666@hotmail.com||||84.196.42.167||o||f139d8b49085d012af9048bb1cba3534
Page 1
Sheyes1 ||||summer_faerie_dustyrose@yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix@aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
menopause |||totoche@wanadoo.fr||o||83.193.209.52||o||d7ca4d78fc79a795695ae1c161ce82ea
jonof.|o||joflem@medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a
2.txt
f139d8b49085d012af9048bb1cba3534: 12883 @: "#
d7ca4d78fc79a795695ae1c161ce82ea: 123422
0f8caa3ced5dc172901a427410d20540 :: demo
Result.txt, containing the matching lines from 1.txt with the hash replaced by the corresponding value from 2.txt:
gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||: demo
glen-666||o||glen-666@hotmail.com||||84.196.42.167||o||12883 @: "#
menopause |||totoche@wanadoo.fr||o||83.193.209.52||o||123422
left.txt, containing the lines from 1.txt that do not match:
okan1993||||killa-o@hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19@amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
Page 1
Sheyes1 ||||summer_faerie_dustyrose@yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix@aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
jonof.|o||joflem@medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a
The script I am trying is:
awk -v s1="||o||" '
FNR==NR{
a[$9]=$1 s1 $5;
b[$9]=$13 s1 $17 s1 $21;
c[$9]=$0;
next
}
($1 in a){
val=$1;
$1="";
sub(/:/,"");
print a[val] s1 $0 s1 b[val];
d[val]=$0;
next
}
END{
for(i in d){
delete c[i]
};
for(j in c){
print c[j] > "left.txt"
}}
' FS="|" 1.txt FS=":" OFS=":" 2.txt > result.txt
But it gives me an empty result.txt.
I am having a hard time debugging the problem. Any help would be highly appreciated.
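For debugging, a minimal sketch like the following shows how awk splits one line of 1.txt when FS is "|". The variable names line and fields are only for this illustration; the sample line is copied from 1.txt above.

```shell
# Sample line copied from 1.txt above.
line='gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||0f8caa3ced5dc172901a427410d20540'

# Print the field count, field 9, and the last field when splitting on "|".
# Every single "|" is a separator, so "||" produces an empty field in between.
fields=$(printf '%s\n' "$line" | awk -F'|' '{ print NF; print $9; print $NF }')
printf '%s\n' "$fields"
```

On this line it reports 13 fields, with the IP address in $9 and the hash in the last field, so an array indexed by $9 is never keyed on a hash from 2.txt.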
Answer 0 (score: 2)
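Please try the following awk (based entirely on the input files you have shown, and assuming there are no duplicates in your 2.txt), and let me know if it helps.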
awk 'FNR==NR{a[$NF]=$0;next} $1~/:/{sub(/:/,"",$1);flag=1} ($1 in a){val=$1;if($0 ~ /:/ && !flag){sub(/[^:]*/,"");sub(/:/,"")};print a[val] OFS $0 > "result.txt";flag="";delete a[val]} END{for(i in a){print a[i]>"left.txt"}}' FS="|" 1.txt FS=" " OFS="||o||" 2.txt
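The output will be two files, result.txt and left.txt. I will add a multi-line form and an explanation of the above code shortly.
Adding the multi-line form of the solution now as well.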
awk '
FNR==NR{ ##FNR and NR are built-in awk variables holding line numbers: FNR resets to 1 at the start of each input file, while NR keeps increasing across all input files, so this condition is true only while reading the first file (1.txt).
a[$NF]=$0; ##Store the current line in array a, indexed by $NF (the last field of the current line).
next ##next is a built-in awk statement that skips all remaining rules for this line.
}
$1~/:/{ ##If the 1st field of the current line contains a colon, do the following:
sub(/:/,"",$1); ##Use sub to remove the first colon from the 1st field.
flag=1 ##Set flag to record that the first colon has already been removed, so no further colon removal is needed.
}
($1 in a){ ##If $1 of the current line is an index of array a, do the following:
val=$1; ##Save $1 in the variable val.
if($0 ~ /:/ && !flag){ ##If the current line contains a colon and flag is NOT set, do the following:
sub(/[^:]*/,""); ##Remove everything from the start of the line up to (but not including) the first colon.
sub(/:/,"")}; ##Then remove that one colon.
print a[val] OFS $0 > "result.txt"; ##Print the stored line a[val], then OFS (the output field separator), then the current line, to the output file result.txt.
flag=""; ##Unset flag.
delete a[val] ##Delete the element of array a indexed by val.
}
END{ ##The END block runs once all input files have been read.
for(i in a){ ##Loop over the elements remaining in array a.
print a[i]>"left.txt"} ##Print each remaining line (those with no match across both files) to left.txt.
}
' FS="|" 1.txt FS=" " OFS="||o||" 2.txt ##Set FS="|" for 1.txt, then FS=" " and OFS="||o||" for 2.txt; 1.txt and 2.txt are the input files for this program.
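To see the FNR==NR idiom from the first rule in isolation, here is a minimal sketch. The file names first.txt and second.txt and their contents are hypothetical, made up only for this illustration.

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Two tiny hypothetical input files (not the 1.txt/2.txt from the question).
printf 'a\nb\n' > first.txt
printf 'c\nd\n' > second.txt

# FNR restarts at 1 for each file; NR keeps counting across files.
out=$(awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first file" : "later file") }' first.txt second.txt)
printf '%s\n' "$out"
```

FNR and NR agree only on the lines of first.txt, which is why a rule guarded by FNR==NR and ending in next can load the first file into an array before the second file is processed.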
Answer 1 (score: 0)
This awk script may also help.
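$ awk 'BEGIN{FS="|";OFS="|"}   # a single-character FS such as "|" is taken literally
NR==FNR{data[$1]=$2;}          # first input (preprocessed 2.txt): map hash to its value
NR!=FNR{if($NF in data){       # second input (1.txt): is the last field a known hash?
$NF=data[$NF];print >"result.txt"
}else{
print >"left.txt"}
}' <( sed 's/\s*:\s*/|/' 2.txt) 1.txt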
Output
$ cat result.txt
gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||: demo
glen-666||o||glen-666@hotmail.com||||84.196.42.167||o||12883 @: "#
menopause |||totoche@wanadoo.fr||o||83.193.209.52||o||123422
$ cat left.txt
okan1993||||killa-o@hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19@amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
Page 1
Sheyes1 ||||summer_faerie_dustyrose@yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix@aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
jonof.|o||joflem@medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a
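We preprocess the first input file (2.txt) with sed, turning its first colon and any surrounding whitespace into the field separator |, and pass the result to awk using process substitution.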
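As a rough illustration of that preprocessing step, the sketch below runs the same substitution on the three lines of 2.txt. It uses [[:space:]] instead of \s on the assumption that a portable form is wanted, since \s is a GNU extension; the variable names sample and converted are only for this example.

```shell
# Sample lines copied from 2.txt above.
sample='f139d8b49085d012af9048bb1cba3534: 12883 @: "#
d7ca4d78fc79a795695ae1c161ce82ea: 123422
0f8caa3ced5dc172901a427410d20540 :: demo'

# Replace only the first colon on each line (plus surrounding whitespace) with "|".
# Later colons are left alone because the s command has no g flag.
converted=$(printf '%s\n' "$sample" | sed 's/[[:space:]]*:[[:space:]]*/|/')
printf '%s\n' "$converted"
```

After this step each line has the hash in field 1 and the replacement value in field 2 when split on |. That is why the third value keeps its leading colon (": demo").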