Question

I have some genomics data in the format chromosome"\t"position"\t"feature. Each feature is a member of a class, defined in a reference file. I want to output a file in the format class"\t"chromosome"\t"position"\t"feature.

Genomics file:

$ head *Y.tsv
chrY    8143806 HAL1B
chrY    15923083        LTR25-int

Reference file:

$ head /home/software/RepBase20.05.fasta/humrep_names.ref
HERVH   ERV1    Eutheria
X21_LINE        CR1     Mammalia

Code:

awk '
{FS=OFS="\t"}
NR==FNR{a[$1]="";a[$1,1]=$1;a[$1,2]=$2;a[$1,3]=$3; next}
$3 in a{print a[$1,2],$1,$2,$3}
' /home/software/RepBase20.05.fasta/humrep_names.ref *Y.tsv

Printing the output shows that the array has been read in correctly and that matches are found, but a[$1,2] is empty. Output:

chrY    21596689        L1M2A_5
chrY    16760406        HERV-K14CI
chrY    18692648        MER101_I

Why does the 'in' test find a match, but print shows no value? How do I print the class ($3) for each feature (a[$1,2] and a[$1,1])?

Thanks!

Answer 1

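Definitely look at the book Ed Morton recommended, but I think you are more or less right, except that in your print statement you need $3 where you have $1. In the second file $1 is the chromosome (chrY), so a[$1,2] looks up a key that was never set; the feature name you stored as the key is in $3.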
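To see why the value comes back empty, here is a minimal sketch that prints both lookups for a single data line. The file names ref.tsv and data.tsv are hypothetical, and the sketch assumes tab-separated input. In awk, referencing an array element that was never assigned yields the empty string rather than an error, which is why print still emits the OFS separators around nothing.

```shell
#!/bin/sh
# Minimal demo of the lookup in the question; ref.tsv and data.tsv are
# hypothetical file names, and printf makes the tabs explicit.
printf 'HERVH\tERV1\tEutheria\n' > ref.tsv
printf 'chrY\t15923083\tHERVH\n' > data.tsv

out=$(awk '
BEGIN { FS = OFS = "\t" }
NR == FNR { a[$1, 2] = $2; next }    # reference file: key is (name, 2)
{
    # In the data file $1 is the chromosome, so this reads key ("chrY", 2),
    # which was never assigned and therefore yields the empty string.
    printf "a[$1,2] = [%s]\n", a[$1, 2]
    # The feature name is in $3, so this reads key ("HERVH", 2).
    printf "a[$3,2] = [%s]\n", a[$3, 2]
}' ref.tsv data.tsv)

printf '%s\n' "$out"
```

The first line comes out as `a[$1,2] = []` and the second as `a[$3,2] = [ERV1]`.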

$ cat a.awk
# As mentioned in the comments, set FS/OFS in BEGIN so it runs once, before the first record is read
BEGIN { FS=OFS="\t" }

# no change from yours
NR==FNR{ a[$1]=""; a[$1,1]=$1; a[$1,2]=$2; a[$1,3]=$3; next}

# a[$3,2] instead of a[$1,2]
$3 in a {print a[$3,2],$1,$2,$3}

$ cat file1.txt
HERVH   ERV1    Eutheria
X21_LINE        CR1     Mammalia

$ cat file2.txt
chrY    8143806 HAL1B
chrY    15923083        HERVH

$ awk -f a.awk file1.txt file2.txt
ERV1    chrY    15923083        HERVH
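A possible simplification, sketched under the assumption that only the class (column 2 of the reference file) is needed: store just that value per feature name and print it in front of the unchanged data record. The helper name add_class and the file names ref.tsv and chrY.tsv are hypothetical.

```shell
#!/bin/sh
# Sketch: prepend the class (column 2 of the reference file) to every
# data record whose feature (column 3) appears in the reference file.
# add_class is a hypothetical helper name; usage: add_class REF DATA...
add_class() {
    awk '
    BEGIN { FS = OFS = "\t" }           # applies from the very first record
    NR == FNR { cls[$1] = $2; next }    # reference file: feature name -> class
    $3 in cls { print cls[$3], $0 }     # data files: look up by feature ($3)
    ' "$@"
}

# Example input; ref.tsv and chrY.tsv are hypothetical file names.
printf 'HERVH\tERV1\tEutheria\nX21_LINE\tCR1\tMammalia\n' > ref.tsv
printf 'chrY\t8143806\tHAL1B\nchrY\t15923083\tHERVH\n' > chrY.tsv
add_class ref.tsv chrY.tsv
```

Here cls[$1]=$2 already creates the key that `$3 in cls` tests, so the separate a[$1]="" and a[$1,1]/a[$1,3] entries are not needed, and printing $0 reuses the original tab-separated record. One caveat of the NR==FNR idiom: if the reference file is empty, NR==FNR is also true for the data file, so its lines would be treated as reference data.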

Awk 'in array' returns a match, but 'print array' outputs OFS-separated blanks. How do I print the array elements?
