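I am trying to use awk to match the contents of each line in file against $2 in list. Both files are tab-delimited, and the name to be matched in list may contain a space or a special character. For example, the name in file is BRCA1 but the name in list is BRCA 1, or the name in file is BCR but the name in list is BCR/ABL.
If there is a match and $4 of list contains full gene sequence, then $2 and $1 are printed, separated by a tab. If no match is found, the unmatched name and 14 are printed, separated by a tab. The awk below executes but produces no output. Thank you :).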
file
BRCA1
BCR
SCN1A
fbn1
list
List code gene gene name methodology
81 DMD dystrophin deletion analysis and duplication analysis
811 BRCA 1 BRCA2 full gene sequence and full deletion/duplication analysis
70 ABL1 ABL1 gene analysis variants in the kinse domane
71 BCR/ABL t(9;22) full gene sequence
awk
awk -F'\t' -v OFS="\t" 'FNR==NR{A[$1]=$0;next} ($2 in A){if($4=="full gene sequence"){print A[$2],$1}} ELSE {print A[$2],"14"}' file list
Desired output
BRCA1 811
BCR 71
SCN1A 14
fbn1 85
Edit
List code gene gene name methodology
85 fbn1 Fibrillin full gene sequencing
95 FBN1 fibrillin del/dup
Result
85 fbn1 Fibrillin full gene sequencing
Only this line is printed, because it is the only one that contains full gene sequencing.
Answer 0 (score: 1)
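awk 'FNR==NR{
    a[$2]=$1        # first file (list): map the name in $2 to its code in $1
    next
}
{
    # second file (file): regex-match the name against every stored key, both ways
    for(i in a){
        if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
    }
    print $1,14     # no key matched: print the default code 14
}' list file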
Input
$ cat list
List code gene gene name methodology
81 DMD dystrophin deletion analysis and duplication analysis
811 BRCA 1 BRCA2 full gene sequence and full deletion/duplication analysis
70 ABL1 ABL1 gene analysis variants in the kinse domane
71 BCR/ABL t(9;22) full gene sequence
$ cat file
BRCA1
BCR
SCN1A
Output
$ awk 'FNR==NR{
a[$2]=$1;
next
}
{
for(i in a){
if($1 ~ i || i ~ $1){ print $1, a[i] ; next }
}
print $1,14
}' list file
BRCA1 811
BCR 71
SCN1A 14
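Note that the loop above never looks at the methodology, so a row is reported whether or not it mentions full gene sequence. Below is a minimal sketch of one way to add that condition. It assumes it is acceptable to test the whole line for the text full gene sequenc (which covers both sequence and sequencing), because whitespace splitting does not keep the methodology in a single field. The scratch directory, the result variable, and the combined sample data (the original list plus the two rows from the edit) are only for illustration.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=$(mktemp -d)
printf '%s\n' \
  'List code gene gene name methodology' \
  '81 DMD dystrophin deletion analysis and duplication analysis' \
  '811 BRCA 1 BRCA2 full gene sequence and full deletion/duplication analysis' \
  '70 ABL1 ABL1 gene analysis variants in the kinse domane' \
  '71 BCR/ABL t(9;22) full gene sequence' \
  '85 fbn1 Fibrillin full gene sequencing' \
  '95 FBN1 fibrillin del/dup' > "$workdir/list"
printf '%s\n' BRCA1 BCR SCN1A fbn1 > "$workdir/file"

result=$(awk -v OFS='\t' '
FNR == NR {
    # Assumption: keep a row only if the line mentions full gene sequence/sequencing.
    if ($0 ~ /full gene sequenc/) a[$2] = $1
    next
}
{
    # Same two-way regex comparison as in the answer above.
    for (i in a) {
        if ($1 ~ i || i ~ $1) { print $1, a[i]; next }
    }
    print $1, 14    # nothing matched: default code 14
}' "$workdir/list" "$workdir/file")

printf '%s\n' "$result"
rm -rf "$workdir"
```

With this sample data the command should print BRCA1 811, BCR 71, SCN1A 14 and fbn1 85, tab-separated. Because the names are compared as regular expressions in both directions, short names or names containing regex metacharacters could match more rows than intended.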
Answer 1 (score: 1)
You can try:
awk 'BEGIN{FS=OFS="\t"}
FNR==NR{
    if(NR>1){
        gsub(" ","",$2)      #removing white space
        n=split($2,v,"/")
        d[v[1]] = $1         #from split, first element as key
    }
    next
}
{print $1, ($1 in d?d[$1]:14)}' list file
You get:
BRCA1 811
BCR 71
SCN1A 14
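The script above does not check the methodology column either. Below is a hedged sketch of the same normalise-then-look-up idea with the $4 condition from the question added. It assumes the tab-separated columns are code, name, gene name and methodology (my reading of the flattened sample, which may not match the real files), and that matching full gene sequenc is an acceptable way to cover both sequence and sequencing. The scratch directory and the result variable are illustrative names.

```shell
# Hypothetical scratch directory; the column layout below is an assumption.
workdir=$(mktemp -d)
# printf reuses the format for every group of four arguments (one row each).
printf '%s\t%s\t%s\t%s\n' \
  'List code' 'gene' 'gene name' 'methodology' \
  '81' 'DMD' 'dystrophin' 'deletion analysis and duplication analysis' \
  '811' 'BRCA 1' 'BRCA2' 'full gene sequence and full deletion/duplication analysis' \
  '70' 'ABL1' 'ABL1' 'gene analysis variants in the kinse domane' \
  '71' 'BCR/ABL' 't(9;22)' 'full gene sequence' \
  '85' 'fbn1' 'Fibrillin' 'full gene sequencing' \
  '95' 'FBN1' 'fibrillin' 'del/dup' > "$workdir/list"
printf '%s\n' BRCA1 BCR SCN1A fbn1 > "$workdir/file"

result=$(awk 'BEGIN { FS = OFS = "\t" }
FNR == NR {
    # Skip the header, and keep only rows whose methodology mentions
    # full gene sequence/sequencing.
    if (FNR > 1 && $4 ~ /full gene sequenc/) {
        gsub(" ", "", $2)       # "BRCA 1" becomes "BRCA1"
        split($2, v, "/")       # "BCR/ABL" gives v[1] = "BCR"
        d[v[1]] = $1            # normalised name maps to its code
    }
    next
}
{ print $1, ($1 in d ? d[$1] : 14) }' "$workdir/list" "$workdir/file")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Unlike the regex loop in the other answer, this is an exact, case-sensitive lookup, so the FBN1 row would not be matched by fbn1 even if it passed the methodology check. With this sample data the output should be BRCA1 811, BCR 71, SCN1A 14 and fbn1 85, tab-separated.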