awk: update a file based on a partial match of a split field

Time: 2018-02-15 13:13:40

Tags: awk

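In the awk below, I am trying to match $2 in file1, up to the ., against $4 in file2, up to the second _. If a match is found, that portion of $4 in file2 should be replaced with the $1 value from the matching line of file1. I think it is close, but I am not sure how to account for the . in file1. My real data has thousands of lines, all in the format below, and a match may not always be found. The awk does run, but file2 is not updated; I think it is because the space-delimited values are not matching. Thank you :).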

file1
TGFBR1 NM_004612.3
TGFBR2 NM_003242.5
TGFBR3 NM_003243.4

tab-delimited

file2
chr1 92149295 92149414 NM_003243_cds_0_0_chr1_92149296_r
chr1 92161228 92161336 NM_003243_cds_1_0_chr1_92161229_r
chr1 92163645 92163687 NM_003243_cds_2_0_chr1_92163646_r
chr3 30648375 30648469 NM_003242_cds_0_0_chr3_30648376_f
chr3 30686238 30686407 NM_003242_cds_1_0_chr3_30686239_f
chr9 101867487 101867584 NM_004612_cds_0_0_chr9_101867488_f
chr9 101904817 101904985 NM_001130916_cds_3_0_chr9_101904818_f

tab-delimited

Desired output
chr1 92149295 92149414 TGFBR3_cds_0_0_chr1_92149296_r
chr1 92161228 92161336 TGFBR3_cds_1_0_chr1_92161229_r
chr1 92163645 92163687 TGFBR3_cds_2_0_chr1_92163646_r
chr3 30648375 30648469 TGFBR2_cds_0_0_chr3_30648376_f
chr3 30686238 30686407 TGFBR2_cds_1_0_chr3_30686239_f
chr9 101867487 101867584 TGFBR1_cds_0_0_chr9_101867488_f

awk 'FNR==NR {A[$1]=$1; next}  $4 in A {sub ($4, $4 "_" A[$4]) }1' OFS='\t' file1 FS='\t' file2
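For reference, the attempt above leaves file2 unchanged because `A[$1]=$1` keys the array on the gene names (TGFBR1, TGFBR2, ...), while `$4 in A` looks up the whole fourth field of file2 (e.g. NM_003243_cds_0_0_chr1_92149296_r), which is never a key; the trailing `1` then prints every line as-is. The minimal sketch below, using one sample row from each file, shows the two keys that would have to agree: file1's $2 with the version suffix stripped, and file2's $4 up to the second underscore. The variable names `k1` and `k2` are illustrative only.

```shell
#!/bin/sh
# Sketch: derive the lookup key from each file (sample rows copied from the question).

# file1: $2 is a versioned accession; strip the ".N" suffix -> NM_004612
k1=$(printf 'TGFBR1 NM_004612.3\n' |
  awk '{ key = $2; sub(/\.[0-9]+$/, "", key); print key }')

# file2 (tab-delimited): the accession is $4 up to the second "_" -> NM_004612
k2=$(printf 'chr9\t101867487\t101867584\tNM_004612_cds_0_0_chr9_101867488_f\n' |
  awk -F'\t' '{ split($4, p, "_"); print p[1] "_" p[2] }')

echo "file1 key: $k1"
echo "file2 key: $k2"
```

Both lines print NM_004612, so an array built from the stripped $2 can be tested against the split prefix of $4.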



1 answer:

Answer (score: 1)

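The following awk may help you. You can also set the field separator FS to suit each Input_file: for example, if Input_file1 is space-delimited, put FS=" " before its name, and if Input_file2 is TAB-delimited, put FS="\t" before its name.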
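Command-line assignments such as FS=... placed between the file operands take effect only for the files listed after them. A small sketch, assuming a throwaway space-delimited Input_file1 built from one row of the sample data, reads the same file twice: first with FS set to a tab, then with FS=" ".

```shell
#!/bin/sh
# Sketch: an FS assignment between file operands applies only to the files after it.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical space-delimited Input_file1 (one row from the question)
printf 'TGFBR1 NM_004612.3\n' > "$tmp/Input_file1"

# First pass: FS is a tab, so the whole line is one field (NF=1, $2 empty).
# Second pass: FS=" " splits on blanks, so NF=2 and $2 is the accession.
out=$(awk '{ print NF }' FS='\t' "$tmp/Input_file1" FS=' ' "$tmp/Input_file1")
printf '%s\n' "$out"
```

This prints 1 and then 2, which is why a mismatched FS leaves $2 empty and no key ever matches.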

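awk '
FNR==NR{                    # first file (Input_file1): build the lookup table
  val=$2;
  sub(/\..*/,"",val);       # drop the version suffix: NM_004612.3 -> NM_004612
  a[val]=$1;                # a["NM_004612"] = "TGFBR1"
  next
}
{
  split($4,array,"_")       # NM_003243_cds_... -> array[1]="NM", array[2]="003243"
}
((array[1]"_"array[2]) in a){                        # accession present in Input_file1?
  sub(/.*_cds/,a[array[1]"_"array[2]]"_cds",$4);     # replace the NM_ prefix with the gene name
  print                                              # only matching lines are printed
}
'   Input_file1   Input_file2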

Output:

chr1 92149295 92149414 TGFBR3_cds_0_0_chr1_92149296_r
chr1 92161228 92161336 TGFBR3_cds_1_0_chr1_92161229_r
chr1 92163645 92163687 TGFBR3_cds_2_0_chr1_92163646_r
chr3 30648375 30648469 TGFBR2_cds_0_0_chr3_30648376_f
chr3 30686238 30686407 TGFBR2_cds_1_0_chr3_30686239_f
chr9 101867487 101867584 TGFBR1_cds_0_0_chr9_101867488_f
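To check the answer end to end, the sketch below recreates the sample data in a temporary directory (Input_file1 space-delimited, Input_file2 tab-delimited; the temp paths are assumptions for the demo) and runs the same awk program on it.

```shell
#!/bin/sh
# Reproduce the answer on the sample data from the question.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'TGFBR1 NM_004612.3' 'TGFBR2 NM_003242.5' 'TGFBR3 NM_003243.4' > "$tmp/Input_file1"

# Tab-delimited Input_file2: printf reuses the 4-field format for every group of 4 arguments
printf '%s\t%s\t%s\t%s\n' \
  chr1 92149295 92149414 NM_003243_cds_0_0_chr1_92149296_r \
  chr1 92161228 92161336 NM_003243_cds_1_0_chr1_92161229_r \
  chr1 92163645 92163687 NM_003243_cds_2_0_chr1_92163646_r \
  chr3 30648375 30648469 NM_003242_cds_0_0_chr3_30648376_f \
  chr3 30686238 30686407 NM_003242_cds_1_0_chr3_30686239_f \
  chr9 101867487 101867584 NM_004612_cds_0_0_chr9_101867488_f \
  chr9 101904817 101904985 NM_001130916_cds_3_0_chr9_101904818_f > "$tmp/Input_file2"

# Same program as the answer; the NM_001130916 row has no entry in Input_file1 and is dropped
out=$(awk '
FNR==NR { val=$2; sub(/\..*/,"",val); a[val]=$1; next }
{ split($4,array,"_") }
((array[1]"_"array[2]) in a) { sub(/.*_cds/, a[array[1]"_"array[2]]"_cds", $4); print }
' "$tmp/Input_file1" "$tmp/Input_file2")

printf '%s\n' "$out"
```

Because assigning to $4 rebuilds the line with OFS (a single space by default), the output is space-separated as shown above; adding -v OFS='\t' to the awk command would keep it tab-delimited like Input_file2.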