I have an array, call it ensembldb, that contains the following lines:
rs2799070 ENST00000379389 ENSG00000187608 ISG15 inframe_insertion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NM_005101.3 NP_005092
rs2799070 ENST00000458555 ENSG00000224969 AL645608.2 missense_variant NA NA antisense NA NULL NULL
rs2799070 ENST00000624652 ENSG00000187608 ISG15 inframe_deletion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
rs2799070 ENST00000624697 ENSG00000187608 ISG15 frameshift_variant NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
and another array that defines the desired order, call it ordered_array:
frameshift_variant
missense_variant
inframe_insertion
inframe_deletion
I want to sort the array ensembldb so that its rows follow the order given in ordered_array. The expected output is:
rs2799070 ENST00000624697 ENSG00000187608 ISG15 frameshift_variant NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
rs2799070 ENST00000458555 ENSG00000224969 AL645608.2 missense_variant NA NA antisense NA NULL NULL
rs2799070 ENST00000379389 ENSG00000187608 ISG15 inframe_insertion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NM_005101.3 NP_005092
rs2799070 ENST00000624652 ENSG00000187608 ISG15 inframe_deletion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
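I checked this question, but it deals with multidimensional arrays, so it does not answer mine. How can I sort my array ensembldb according to the ordered array ordered_array?
Thanks.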
Edit 1: adding code as requested by @anubhava
# Indexed array (-a): an associative array (-A) does not guarantee iteration order
declare -a ordered_array
ordered_array[0]="frameshift_variant"
ordered_array[1]="missense_variant"
ordered_array[2]="inframe_insertion"
ordered_array[3]="inframe_deletion"
while read -r LINE; do
    chrom=$(echo -e "$LINE" | cut -f1 -d$'\t' | sed 's/^chr//g')
    pos=$(echo -e "$LINE" | cut -f2 -d$'\t')
    ref=$(echo -e "$LINE" | cut -f3 -d$'\t')
    alt=$(echo -e "$LINE" | cut -f4 -d$'\t')
    LINE=$(echo -e "$LINE" | sed 's/^chr//g')
    ensembldb=$(echo "PREPARE stmt1 FROM 'SELECT Annotated_ID, Transcript, Gene_ID, Gene_name, Consequence, Swissprot_ID, AA_change, Biotype, Gene_description, RefSeq_mRNA, RefSeq_peptide FROM SNP_annot.37_annot_ensembl_89_full_descr where chrom = \"$chrom\" and Start = \"$pos\" and Local_alleles = \"$ref/$alt\"'; execute stmt1;" | mariadb -A -N)
    readarray -t array <<< "$ensembldb"
    pos19=$(echo "PREPARE stmt2 FROM 'select hg19_pos from SNP_annot.mut_convert_pos where chrom = \"$chrom\" and hg38_pos = \"$pos\"'; execute stmt2;" | mariadb -A -N)
    hits=$(echo -e "$ensembldb" | wc -l)
    [ ! -z "$pos19" ] && awk -v line="$LINE" -v pos="$pos19" -v ensembl="$ensembldb" -v hit="$hits" 'BEGIN {print line"\t"ensembl"\t"hit"\t"pos}'
done
1. The variable LINE holds lines such as:
CHROM POS REF ALT QUAL DP Genotype
chr1 16495 G C 1722.77 252 G/C
chr1 16719 T A 145.77 189 T/A
chr1 16841 G T 701.77 521 G/T
chr1 17626 G A 154.77 124 G/A
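2. The variable ensembldb holds the result of a MySQL query that returns multiple rows, which is then converted into an array. It contains the rows I want to sort according to ordered_array, after which I want to select the first row that matches ordered_array.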
Answer 0 (score: 2)
This awk command may work for you (file_a holds the rows, file_b the desired order):
awk 'FNR==NR{a[$5]=$0;next}{print a[$1]}' file_a file_b
If a (the rows) and b (the desired order) really are Bash arrays:
readarray -t a < <(awk 'FNR==NR{a[$5]=$0;next}{print a[$1]}' <(printf '%s\n' "${a[@]}") <(printf '%s\n' "${b[@]}"))