Match values in two files and replace values in selected columns

Time: 2019-02-20 12:03:47

Tags: awk

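The goal is to check whether columns 3 and 4 of file1, joined together, match the value in column 1 of file2. If they match, the values in columns 2 and 3 of file2 should be replaced with columns 5 and 6 of file1.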

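In addition, for matching rows, columns 7 and 8 of file1 should become columns 1 and 2 of the output. Every row should end with a flag: R for rows that were replaced and O for rows that were left unchanged.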

file1

2,100,31431,37131,999991.70,0000000.30,11111,22222,3
3,100,31431,37471,111113.20,1111111.30,22222,33333,4

file2

3143137113 318512.50 2334387.50 100
3143137131 318737.50 2334387.50 100
3143137201 319612.50 2334387.50 100
3143137471 322987.50 2334387.50 100
3143137491 323237.50 2334387.50 100

Desired output:

31431,37113,318512.50,2334387.50,100,O
11111,22222,999991.70,0000000.30,100,R
31431,37201,319612.50,2334387.50,100,O
22222,33333,111113.20,1111111.30,100,R
31431,37491,323237.50,2334387.50,100,O

I tried these two:

1)

awk '
BEGIN{
  OFS=","
}
FNR==NR{
  a[$3 $4]=$3 OFS $4
  b[$3 $4]=$5
  c[$3 $4]=$6
  d[$3 $4]=$7 OFS $8
  next
}
($1 in a){
  $4=d[$1]
  $3=c[$1]
  $2=b[$1]
  $1=a[$1]
  print
  next
}
{
  $1=$1
  sub(/^...../,"&,",$1)
  print
}
' FS=","  file1 FS=" "  file2

Output:

31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,0000000.30,11111,22222
31431,37201,319612.50,2334387.50,100
31431,37471,111113.20,1111111.30,22222,33333
31431,37491,323237.50,2334387.50,100

2)

awk -F, 'NR==FNR{a[$3 $4]=substr($0,length($3 FS)+1);next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2

Output:

31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,0000000.30,11111,22222,3,100
31431,37201,319612.50,2334387.50,100
31431,37471,111113.20,1111111.30,22222,33333,4,100
31431,37491,323237.50,2334387.50,100

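Both come close, but neither produces the desired output: matching rows keep the original first two columns, the file1 values end up at the end of the line, and the R/O flag is missing.

Thanks in advance.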

2 answers:

Answer 0 (score: 2)

Could you please try the following.

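awk '
# First file (file1, comma-separated): index by columns 3 and 4 joined together.
FNR==NR{
  a[$3 $4]=$7 $8   # new 10-character key, split into two columns later
  b[$3 $4]=$5
  c[$3 $4]=$6
  next
}
# Second file (file2): on a key match, replace the fields and remember the match.
($1 in a){
  $2=b[$1]
  $3=c[$1]
  $1=a[$1]
  found=1
}
{
  $0=found==1?$0",R":$0",O"   # append the R or O flag
  sub(/^...../,"&,")          # insert a comma after the first 5 characters
  $1=$1                       # rebuild the record with OFS
  found=""
}
1
' FS="," file1 FS=" " OFS="," file2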

The output is as follows.

31431,37113,318512.50,2334387.50,100,O
11111,22222,999991.70,0000000.30,100,R
31431,37201,319612.50,2334387.50,100,O
22222,33333,111113.20,1111111.30,100,R
31431,37491,323237.50,2334387.50,100,O
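
The same lookup can also be written with a single array that stores the replacement columns already in output order. The following is only a sketch under the assumptions of the sample data (a key of exactly 10 characters split 5 + 5, and keys without leading zeros); the array name `repl` and the output file name `merged.csv` are illustrative and not part of the original answer.

```shell
# Recreate the sample inputs from the question.
cat > file1 <<'EOF'
2,100,31431,37131,999991.70,0000000.30,11111,22222,3
3,100,31431,37471,111113.20,1111111.30,22222,33333,4
EOF
cat > file2 <<'EOF'
3143137113 318512.50 2334387.50 100
3143137131 318737.50 2334387.50 100
3143137201 319612.50 2334387.50 100
3143137471 322987.50 2334387.50 100
3143137491 323237.50 2334387.50 100
EOF

awk '
BEGIN { OFS = "," }
# file1: key on columns 3 and 4 joined; store columns 7, 8, 5, 6 in output order.
FNR == NR {
  repl[$3 $4] = $7 OFS $8 OFS $5 OFS $6
  next
}
# file2, matched row: print the stored columns, the last file2 column, and R.
$1 in repl { print repl[$1], $4, "R"; next }
# file2, unmatched row: split the key into two 5-character columns and flag with O.
{ print substr($1, 1, 5), substr($1, 6), $2, $3, $4, "O" }
' FS="," file1 FS=" " file2 > merged.csv

cat merged.csv
```

Printing the fields explicitly avoids rebuilding `$0`, so the comma separator comes from `OFS` alone and the R/O flag is simply the last argument to `print`.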

Answer 1 (score: 2)

Perl version:

#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use feature qw/say/;

my ($file1, $file2) = @ARGV;
my %rows;

open my $f1, '<', $file1;
while (<$f1>) {
  chomp;
  my @F = split /,/;
  $rows{"$F[2]$F[3]"} = \@F;
}

open my $f2, '<', $file2;
$, = ','; # Like awk OFS
while (<$f2>) {
  chomp;
  my @F = split;
  if (exists $rows{$F[0]}) {
    my $left = $rows{$F[0]};
    # file1 columns 7, 8, 5, 6, then file2 column 4 and the R flag
    say @{$left}[6,7,4,5], $F[3], 'R';
  } else {
    my ($col1, $col2) = $F[0] =~ m/^(.{5})(.{5})$/;
    say $col1, $col2, @F[1..3], 'O';
  }
}

Example:

$ ./example.pl file1.csv file2.txt
31431,37113,318512.50,2334387.50,100,O
11111,22222,999991.70,0000000.30,100,R
31431,37201,319612.50,2334387.50,100,O
22222,33333,111113.20,1111111.30,100,R
31431,37491,323237.50,2334387.50,100,O