I have a tab-delimited file (call it file1) that looks like this:
NC_027300.1 Gnomon exon 5501 5691 . - . gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon exon 16966 17019 . - . gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon exon 23978 24241 . - . gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon exon 43486 43714 . - . gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon exon 61647 62139 . - . gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon CDS 5501 5691 . - 2 gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon CDS 16966 17019 . - 2 gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon CDS 23978 24241 . - 2 gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon CDS 43486 43633 . - 0 gene_id "1"; transcript_id "1.1";
NC_027300.1 Gnomon exon 160437 160638 . - . gene_id "2"; transcript_id "2.1";
NC_027300.1 Gnomon exon 160913 161019 . - . gene_id "2"; transcript_id "2.1";
And a larger tab-delimited file (file2) that looks like this:
NC_027300.1 Gnomon gene 5501 62139 . - . ID=gene0;Dbxref=GeneID:106560212;Name=LOC106560212;gbkey=Gene;gene=LOC106560212;gene_biotype=protein_coding
NC_027300.1 Gnomon mRNA 5501 62139 . - . ID=rna0;Parent=gene0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;Name=XM_014160784.1;gbkey=mRNA;gene=LOC106560212;model_evidence=Supporting evidence includes similarity to: 99%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 8 samples with support for all annotated introns;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
NC_027300.1 Gnomon exon 61647 62139 . - . ID=id1;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
NC_027300.1 Gnomon exon 43486 43714 . - . ID=id2;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
NC_027300.1 Gnomon exon 23978 24241 . - . ID=id3;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
NC_027300.1 Gnomon exon 16966 17019 . - . ID=id4;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
NC_027300.1 Gnomon exon 5501 5691 . - . ID=id5;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
NC_027300.1 Gnomon CDS 43486 43633 . - 0 ID=cds0;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XP_014016259.1;Name=XP_014016259.1;gbkey=CDS;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;protein_id=XP_014016259.1
NC_027300.1 Gnomon CDS 23978 24241 . - 2 ID=cds0;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XP_014016259.1;Name=XP_014016259.1;gbkey=CDS;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;protein_id=XP_014016259.1
NC_027300.1 Gnomon CDS 16966 17019 . - 2 ID=cds0;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XP_014016259.1;Name=XP_014016259.1;gbkey=CDS;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;protein_id=XP_014016259.1
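I want to create a new file containing only the lines of file1 that also appear in file2, matched on the first 8 columns. Each output line should hold all 9 columns of file1, followed by column 9 of file2 as a 10th column. Like this: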
NC_027300.1 Gnomon exon 5501 5691 . - . gene_id "1"; transcript_id "1.1"; ID=id5;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
I have been trying to follow this example, and this is what I came up with (with my very limited knowledge):
awk 'NR==FNR{a[$1,$2,$3,$4,$5,$6,$7,$8]=$10;next} ($1,$2,$3,$4,$5,$6,$7,$8) in a{print $0, a[$$1,$2,$3,$4,$5,$6,$7,$8]}' file1 file2 > newfile
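Can anyone tell me whether I'm close, and help me fix it if it's wrong? My files are 1M+ lines each and the command is running now, but I'm worried it may take a while before I can see whether it works! Thanks in advance.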
Answer 0 (score: 1)
You're on the right track; it looks like you just need a small correction.
Change
a[$$1,$2,$3,$4,$5,$6,$7,$8]
  ^
  here (the extra $)
to
a[$1,$2,$3,$4,$5,$6,$7,$8]
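So if the index key built from the first 8 fields of the current file2 line exists in array a (which was built from the first 8 fields of file1), it prints the file2 line followed by the value stored in a for that key, i.e. field 10 of the matching file1 line.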
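To see why the extra $ matters: in awk, $$1 means $($1), the field whose number is the value of $1. Here $1 is a sequence name with numeric value 0, so $$1 silently becomes $0 (the whole line) and the lookup key never matches. A minimal sketch using a made-up one-line input (not the real files):

```shell
# Hypothetical one-line, tab-separated input standing in for a row of file1/file2.
line=$(printf 'NC_027300.1\tGnomon\texon\t5501\n')

# $$1 is $($1). "NC_027300.1" converts to the number 0, so this prints $0,
# i.e. the entire line rather than the first field.
wrong=$(printf '%s\n' "$line" | awk -F'\t' '{print $$1}')

# $1 is just the first field.
right=$(printf '%s\n' "$line" | awk -F'\t' '{print $1}')

printf 'with $$1: %s\nwith $1:  %s\n' "$wrong" "$right"
```

Because the wrong key still evaluates without error, awk gives no warning; the array lookup simply returns an empty string.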
Answer 1 (score: 1)
Switch the order of the input files and tidy up:
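One way this could look, as a sketch rather than the exact command from this answer: read file2 first and store its column 9 under a key made of columns 1 to 8, then read file1 and print each matching line with the stored value appended. It assumes both files really are tab-separated, so FS and OFS are set to a tab; with awk's default whitespace splitting, column 9 of file1 (which contains spaces) would be split into several fields. The names demo_file1 and demo_file2 and their contents are made up for illustration.

```shell
# Tiny stand-ins for the real inputs (hypothetical names and data).
tmp=$(mktemp -d)
printf 'chr1\tGnomon\texon\t10\t20\t.\t-\t.\tgene_id "1";\n'  > "$tmp/demo_file1"
printf 'chr1\tGnomon\texon\t30\t40\t.\t-\t.\tgene_id "1";\n' >> "$tmp/demo_file1"
printf 'chr1\tGnomon\texon\t10\t20\t.\t-\t.\tID=id5;Parent=rna0\n' > "$tmp/demo_file2"

result=$(awk 'BEGIN { FS = OFS = "\t" }              # split and join on tabs only
# Build the lookup key from the first 8 columns of every line.
{ key = $1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7 FS $8 }
# First file on the command line (file2): remember its column 9.
NR == FNR { a[key] = $9; next }
# Second file (file1): print matching lines plus the saved column.
key in a { print $0, a[key] }' "$tmp/demo_file2" "$tmp/demo_file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

On the real data the call would end in `file2 file1 > newfile`. Only the first 9 columns of file2 are held in memory as keys and values, and if file2 contains several lines with identical first 8 columns, the last one wins.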