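In the awk below I am trying to match $2 in file1, up to the ., with $4 in file2, up to the _ that precedes cds. If a match is found, that part of $4 in file2 is replaced with the matching $1 value from file1. I think it is close, but I am not sure how to account for the . in file1. My real data has thousands of lines, all in the format below, and a match may not always be found. The awk does execute, but file2 is not updated, I think because the . is not matched. Thank you :).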
file1 (space delimited)
TGFBR1 NM_004612.3
TGFBR2 NM_003242.5
TGFBR3 NM_003243.4
file2 (tab-delimited)
chr1 92149295 92149414 NM_003243_cds_0_0_chr1_92149296_r
chr1 92161228 92161336 NM_003243_cds_1_0_chr1_92161229_r
chr1 92163645 92163687 NM_003243_cds_2_0_chr1_92163646_r
chr3 30648375 30648469 NM_003242_cds_0_0_chr3_30648376_f
chr3 30686238 30686407 NM_003242_cds_1_0_chr3_30686239_f
chr9 101867487 101867584 NM_004612_cds_0_0_chr9_101867488_f
chr9 101904817 101904985 NM_001130916_cds_3_0_chr9_101904818_f
desired output (tab-delimited)
chr1 92149295 92149414 TGFBR3_cds_0_0_chr1_92149296_r
chr1 92161228 92161336 TGFBR3_cds_1_0_chr1_92161229_r
chr1 92163645 92163687 TGFBR3_cds_2_0_chr1_92163646_r
chr3 30648375 30648469 TGFBR2_cds_0_0_chr3_30648376_f
chr3 30686238 30686407 TGFBR2_cds_1_0_chr3_30686239_f
chr9 101867487 101867584 TGFBR1_cds_0_0_chr9_101867488_f
awk
awk 'FNR==NR {A[$1]=$1; next} $4 in A {sub ($4, $4 "_" A[$4]) }1' OFS='\t' file1 FS='\t' file2
Answer 0 (score: 1)
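The following awk may help you. You can also set the FS field separator to suit each Input_file: for example, if Input_file1 is space delimited, put FS=" " before its name, and if Input_file2 is TAB delimited, put FS="\t" before its name.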
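awk '
# First file: key is $2 with the version suffix (".3", ".5", ...) removed, value is the gene name in $1
FNR==NR{
  val=$2;
  sub(/\..*/,"",val);
  a[val]=$1;
  next
}
# Second file: split $4 on "_" so that array[1]"_"array[2] is the accession, e.g. NM_003243
{
  split($4,array,"_")
}
# Print only lines whose accession was seen in the first file, replacing everything up to _cds with the gene name
((array[1]"_"array[2]) in a){
  sub(/.*_cds/,a[array[1]"_"array[2]]"_cds",$4);
  print
}
' Input_file1 Input_file2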
The output will be as follows:
chr1 92149295 92149414 TGFBR3_cds_0_0_chr1_92149296_r
chr1 92161228 92161336 TGFBR3_cds_1_0_chr1_92161229_r
chr1 92163645 92163687 TGFBR3_cds_2_0_chr1_92163646_r
chr3 30648375 30648469 TGFBR2_cds_0_0_chr3_30648376_f
chr3 30686238 30686407 TGFBR2_cds_1_0_chr3_30686239_f
chr9 101867487 101867584 TGFBR1_cds_0_0_chr9_101867488_f
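Since the desired output is tab-delimited, here is a minimal sketch of the per-file FS idea described above. Assigning to $4 makes awk rebuild the record with OFS, which defaults to a single space, so this variant also sets OFS to a tab. It recreates a few of the sample lines from the question. The output name updated.tsv and the variable names acc, gene, part and key are only illustrative assumptions, not part of the original command.

```shell
# Recreate part of the sample input from the question: file1 space delimited, file2 tab-delimited.
printf '%s\n' 'TGFBR1 NM_004612.3' 'TGFBR2 NM_003242.5' 'TGFBR3 NM_003243.4' > file1
printf '%s\t%s\t%s\t%s\n' \
  chr1 92149295 92149414 NM_003243_cds_0_0_chr1_92149296_r \
  chr9 101867487 101867584 NM_004612_cds_0_0_chr9_101867488_f \
  chr9 101904817 101904985 NM_001130916_cds_3_0_chr9_101904818_f > file2

# updated.tsv is a hypothetical output name chosen for this sketch.
awk '
FNR==NR {                       # first file: accession without version -> gene name
    acc = $2
    sub(/\..*/, "", acc)        # NM_003243.4 becomes NM_003243
    gene[acc] = $1
    next
}
{
    split($4, part, "_")        # second file: rebuild the accession from the first two pieces
    key = part[1] "_" part[2]
}
key in gene {                   # unmatched lines are skipped, as in the desired output
    $4 = gene[key] substr($4, length(key) + 1)
    print                       # the record is rebuilt with OFS, here a tab
}
' FS=' ' file1 FS='\t' OFS='\t' file2 > updated.tsv

cat updated.tsv
```

Here the accession prefix is replaced with substr() instead of sub(), so a gene name containing & or a backslash would not be treated specially in the replacement. The FS and OFS assignments on the command line take effect just before the file that follows them is read.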