Extracting data with sed or awk in Linux

Date: 2018-01-28 09:47:14

Tags: linux bash awk sed

I am trying to merge data from two text files based on certain conditions.

I have two files:

1.txt

gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||0f8caa3ced5dc172901a427410d20540
okan1993||||killa-o@hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19@amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
glen-666||o||glen-666@hotmail.com||||84.196.42.167||o||f139d8b49085d012af9048bb1cba3534
Page 1
Sheyes1 ||||summer_faerie_dustyrose@yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix@aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
menopause |||totoche@wanadoo.fr||o||83.193.209.52||o||d7ca4d78fc79a795695ae1c161ce82ea
jonof.|o||joflem@medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a

2.txt

f139d8b49085d012af9048bb1cba3534: 12883 @: "#
d7ca4d78fc79a795695ae1c161ce82ea: 123422
0f8caa3ced5dc172901a427410d20540 :: demo

The matching lines from 1.txt, with the hash replaced by the corresponding value from 2.txt:

result.txt

gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||: demo
glen-666||o||glen-666@hotmail.com||||84.196.42.167||o||12883 @: "#
menopause |||totoche@wanadoo.fr||o||83.193.209.52||o||123422

The lines from 1.txt that have no match:

left.txt

okan1993||||killa-o@hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19@amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
Page 1
Sheyes1 ||||summer_faerie_dustyrose@yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix@aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
jonof.|o||joflem@medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a

The script I am trying is:

 awk -v s1="||o||" '
FNR==NR{
  a[$9]=$1 s1 $5;
  b[$9]=$13 s1 $17 s1 $21;
  c[$9]=$0;
  next
}
($1 in a){
  val=$1;
  $1="";
  sub(/:/,"");
  print a[val] s1 $0 s1 b[val];
  d[val]=$0;
  next
}
END{
for(i in d){
  delete c[i]
};
for(j in c){
  print c[j] > "left.txt"
}}
' FS="|" 1.txt FS=":" OFS=":" 2.txt > result.txt

But it gives me an empty result.txt.

I'm having trouble debugging this. Any help would be greatly appreciated.
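One way to see why result.txt stays empty is to print the fields awk actually sees with FS="|". The sketch below uses only the first sample line of 1.txt. Every single | is a separator, so each || produces an empty field: the IP address lands in $9 and the hash in $13. The script therefore appears to key a[] on the IP rather than the hash, and since $1 of 2.txt is a hash, ($1 in a) is never true. The hash position also varies between lines (the Tosiunia line has five |s before the hash, putting it in $14), so $NF is the more robust way to reach it.

```shell
# The first line of 1.txt from the question.
line='gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||0f8caa3ced5dc172901a427410d20540'

# With a single-character FS of "|", every "|" splits, so "||" yields an empty field.
# Print each field with its number so the empty ones are visible.
fields=$(printf '%s\n' "$line" | awk -F'|' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
```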

2 answers:

Answer 0 (score: 2)

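Please try the following awk (based entirely on the input files you've shown, and assuming there are no duplicates in your 2.txt) and let me know if it helps.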

 awk 'FNR==NR{a[$NF]=$0;next} $1~/:/{sub(/:/,"",$1);flag=1} ($1 in a){val=$1;if($0 ~ /:/ && !flag){sub(/[^:]*/,"");sub(/:/,"")};print a[val] OFS $0 > "result.txt";flag="";delete a[val]} END{for(i in a){print a[i]>"left.txt"}}' FS="|" 1.txt FS=" "  OFS="||o||" 2.txt

The output will be 2 files named result.txt and left.txt. A non-one-liner form of the code above, with an explanation, will be added shortly.

Now adding the non-one-liner form of the solution as well.

awk '
FNR==NR{                                ##FNR and NR are built-in awk variables holding line numbers of the Input_file(s). FNR is RESET to 1 when awk starts reading the next Input_file, while NR keeps increasing until all Input_file(s) have been read, so FNR==NR is true only while reading the 1st Input_file.
  a[$NF]=$0;                            ##Creating an array named a whose index is $NF (the last field of the current line) and whose value is the current line.
  next                                  ##next is a built-in awk keyword which skips all further statements for the current line.
}
$1~/:/{                                 ##Checking condition here if current lines 1st field has a colon in it then do following:
  sub(/:/,"",$1);                       ##Using awk's sub function to remove the colon from the 1st field of the current line.
  flag=1                                ##Setting a variable named flag here (to remember that the 1st colon has already been removed, so no further colon removal is needed).
}
($1 in a){                              ##Checking a condition here if current line $1 is present in array a then do following:
  val=$1;                               ##Setting variable named val value to $1 here.
  if($0 ~ /:/ && !flag){                ##Checking condition here if current line has a colon and variable flag is NOT set then do following:
     sub(/[^:]*/,"");                   ##Removing everything from the start of the line up to the 1st colon.
     sub(/:/,"")};                      ##Then removing only that 1 colon.
  print a[val] OFS $0 > "result.txt";   ##Printing the value of array a at index val, then OFS (output field separator), then the current line, to the output file result.txt.
  flag="";                              ##Unsetting the value of variable flag here.
  delete a[val]                         ##Deleting the value of array a whose index is variable val here.
}
END{                                    ##Starting the END section of this awk program, which is executed once all Input_file(s) have been read.
  for(i in a){                          ##Traversing through the array a now.
     print a[i]>"left.txt"}             ##Printing the remaining values of array a (i.e. the lines that did NOT match between the two files) to left.txt.
}
' FS="|" 1.txt FS=" " OFS="||o||" 2.txt ##Setting FS="|" for 1.txt, then FS=" " and OFS="||o||" for 2.txt; 1.txt and 2.txt are the Input_files for this program.
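The FNR==NR test described in the comments above is the usual awk idiom for treating the first input file differently. A minimal demonstration, using two hypothetical throwaway files, prints NR and FNR side by side: NR keeps counting across files, while FNR restarts at 1 for each new file.

```shell
# Two throwaway input files in a temporary directory (names are arbitrary).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\n' > first.txt
printf 'c\n' > second.txt

# NR counts lines across all files; FNR restarts at 1 for each new file,
# so NR == FNR only holds while the first file is being read.
out=$(awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first" : "later") }' first.txt second.txt)
printf '%s\n' "$out"
```

One caveat of this idiom: if the first file is empty, NR == FNR is also true for every line of the second file.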

Answer 1 (score: 0)

This awk script may also help.

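$ awk 'BEGIN{FS="|"; OFS="|"}                                   # a single "|" is already a literal separator
     NR==FNR{data[$1]=$2; next}                               # 1st input (preprocessed 2.txt): hash -> value
     ($NF in data){$NF=data[$NF]; print > "result.txt"; next} # 1.txt: swap the hash for its value
     {print > "left.txt"}                                     # 1.txt lines with no match
     ' <(sed 's/\s*:\s*/|/' 2.txt) 1.txt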

Output

$ cat result.txt 
gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||: demo
glen-666||o||glen-666@hotmail.com||||84.196.42.167||o||12883 @: "#
menopause |||totoche@wanadoo.fr||o||83.193.209.52||o||123422

$ cat left.txt 
okan1993||||killa-o@hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
Tosiunia||||tosia_19@amorki.pl||o||83.22.193.86|||||ddcbba2076646980391cb4971b8030
DREP
Page 1
Sheyes1 ||||summer_faerie_dustyrose@yahoo.com|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
.
BenPhynix||||BenPhynix@aol.de||||| 62.226.181.57||||11dea24f1caebb012e11285579050f38
jonof.|o||joflem@medi3.no||o||213.161.242.106||o||239f33743e4a070b728d4dcbd1091f1a

We preprocessed the first input, 2.txt, with sed to turn its separator into |, and passed the result to awk using process substitution.
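To reproduce this answer end to end, here is a sketch on a trimmed copy of the question's sample data. It makes two assumptions that differ from the answer: it spells GNU sed's \s as the POSIX class [[:space:]], and it pipes the sed output into awk (read as the file "-") instead of using bash's process substitution, so it should also run under a plain POSIX sh. Only the first "<spaces>:<spaces>" on each 2.txt line is replaced, which is why "0f8caa3ced5dc172901a427410d20540 :: demo" keeps ": demo" as its value.

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# A trimmed copy of the sample data from the question.
cat > 1.txt <<'EOF'
gera077||o||emi_riv_90@hotmail.com||||200.45.113.254||o||0f8caa3ced5dc172901a427410d20540
okan1993||||killa-o@hotmail.de||||84.141.125.140||o||69c1cb5ddbc66cceebe0dddba3eddf68
DREP
glen-666||o||glen-666@hotmail.com||||84.196.42.167||o||f139d8b49085d012af9048bb1cba3534
menopause |||totoche@wanadoo.fr||o||83.193.209.52||o||d7ca4d78fc79a795695ae1c161ce82ea
EOF

cat > 2.txt <<'EOF'
f139d8b49085d012af9048bb1cba3534: 12883 @: "#
d7ca4d78fc79a795695ae1c161ce82ea: 123422
0f8caa3ced5dc172901a427410d20540 :: demo
EOF

# Replace only the first "<spaces>:<spaces>" on each 2.txt line with "|",
# then feed that to awk on stdin ("-") ahead of 1.txt.
sed 's/[[:space:]]*:[[:space:]]*/|/' 2.txt |
awk 'BEGIN { FS = "|"; OFS = "|" }
     NR == FNR { data[$1] = $2; next }                            # stdin: hash -> value
     ($NF in data) { $NF = data[$NF]; print > "result.txt"; next } # 1.txt: swap hash for value
     { print > "left.txt" }                                       # 1.txt: no matching hash
' - 1.txt

cat result.txt
cat left.txt
```

Reassigning $NF makes awk rebuild the line with OFS, and because both FS and OFS are a single "|", every empty field is written back exactly as it was.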