AWK - Extracting information using another file - syntax error

Time: 2016-06-30 19:06:28

Tags: awk

I am trying to use awk to extract information from a file.

informationfile.txt looks like this:

>ENST00000342992.10 cdna:known chromosome:GRCh38:2:178525989:178807421:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000460472.6 cdna:known chromosome:GRCh38:2:178525989:178807423:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000589042.5 cdna:known chromosome:GRCh38:2:178525989:178807423:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000591111.5 cdna:known chromosome:GRCh38:2:178525989:178807423:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000425332.2 cdna:known chromosome:GRCh38:2:178663627:178667307:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000448510.2 cdna:known chromosome:GRCh38:2:178669625:178672418:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000360870.9 cdna:known chromosome:GRCh38:2:178744405:178807421:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000634225.1 cdna:known chromosome:GRCh38:2:178753361:178767825:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000436599.1 cdna:known chromosome:GRCh38:2:178786089:178794954:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
>ENST00000470257.1 cdna:known chromosome:GRCh38:2:178798495:178807408:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:retained_intron gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
>ENST00000412264.1 cdna:known chromosome:GRCh38:2:178802287:178830802:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT
>ENST00000359218.9 cdna:known chromosome:GRCh38:2:178525989:178807423:-1 gene:ENSG00000155657.24 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:TTN description:titin [Source:HGNC Symbol;Acc:HGNC:12403]
GCAGTCGTGCATTCCCAGCCTCGCCTCGGGTGTAGGGATTGCATAGAAAAGCAAAACTAC
ACAGTCTTGACTGTGTAGTTTTGTTTTTAGGATTAGAGGCTCACCGATTCATGTCGGAGA
TGGTCAGAAAAACCAACTCTCCATAGGACGTCGTTTCAGAAGCAACCTTGGGCTTAGTCC
CACCCTTTTTAGGCACTCTTGAGAAATCAGAGTGCCTAGAAAGATGACAACTCAAGCACC
GACGTTTACGCAGCCGTTACAAAGCGTTGTGGTACTGGAGGGTAGTACCGCAACCTTTGA
GGCTCACATTAGTGGTTTTCCAGTTCCTGAGGTGAGCTGGTTTAGGGATGGCCAGGTGAT
TTCCACTTCCACTCTGCCCGGCGTGCAGATCTCCTTTAGCGATGGCCGCGCTAAACTGAC
GATCCCCGCCGTGACTAAAGCCAACAGTGGACGATATTCCCTGAAAGCCACCAATGGATC
TGGACAAGCGACTAGTACTGCTGAGCTTCTCGTGAAAGCTGAGACAGCACCACCCAACTT
CGTTCAACGACTGCAGAGCATGACCGTGAGACAAGGAAGCCAAGTGAGACTCCAAGTGAG
AGTGACTGGAATCCCTACACCTGTGGTGAAGTTCTACCGGGATGGAGCCGAAATCCAGAG
CTCCCTTGATTTCCAAATTTCACAAGAAGGCGACCTCTACAGCTTACTGATTGCAGAAGC
ATACCCTGAGGACTCAGGGACCTATTCAGTAAATGCCACCAATAGCGTTGGAAGAGCTAC
TTCGACTGCTGAATTACTGGTTCAAGGTGAAGAAGAAGTACCTGCTAAAAAGACAAAGAC
AATTGTTTCGACTGCTCAGATCTCAGAATCAAGACAAACCCGAATTGAAAAGAAGATTGA
AGCCCACTTTGATGCCAGATCAATTGCAACAGTTGAGATGGTCATAGATGGTGCCGCTGG
GCAACAGCTGCCACATAAAACACCTCCCAGGATTCCTCCGAAGCCAAAGTCAAGATCCCC
AACACCACCGTCTATTGCTGCCAAAGCACAGCTGGCTCGGCAGCAGTCCCCATCGCCCAT
AAGACACTCCCCTTCCCCGGTCAGACACGTGCGGGCACCGACCCCATCTCCGGTCAGGTC
CGTGTCTCCAGCAGCAAGAATCTCCACATCCCCCATCAGGTCTGTTAGGTCTCCATTGCT
CATGCGTAAGACTCAGGCATCCACCGTGGCCACAGGTCCTGAAGTGCCTCCCCCTTGGAA
GCAAGAGGGCTACGTGGCCTCCTCATCTGAGGCTGAGATGAGAGAGACAACGCTGACAAC
CTCTACTCAGATCAGGACAGAAGAGAGATGGGAAGGGAGATACGGTGTCCAGGAGCAAGT
GACCATCAGTGGTGCTGCGGGTGCTGCCGCCAGTGTGTCGGCCAGTGCTAGCTACGCAGC
AGAGGCTGTTGCCACTGGTGCTAAAGAGGTGAAACAAGATGCTGACAAAAGTGCAGCTGT
TGCGACTGTTGTTGCTGCCGTTGATATGGCCAGAGTGAGAGAACCAGTGATCAGCGCTGT
AGAGCAGACTGCTCAGAGGACAACCACGACTGCTGTGCACATCCAACCTGCTCAAGAACA
GGTAAGAAAGGAAGCGGAGAAGACTGCTGTAACTAAGGTAGTAGTGGCCGCCGATAAAGC
CAAGGAACAAGAATTAAAATCAAGAACCAAAGAAGTAATTACCACAAAGCAAGAGCAGAT
GCACGTAACTCATGAGCAGATAAGAAAAGAAACTGAAAAAACATTTGTACCAAAGGTAGT
AATTTCCGCAGCTAAAGCCAAAGAACAAGAAACTAGAATTTCTGAAGAAATTACTAAGAA
ACAGAAACAAGTAACTCAAGAAGCAATAAGACAGGAAACTGAGATAACTGCTGCATCCAT
GGTGGTAGTTGCCACTGCAAAGTCCACAAAACTAGAAACAGTCCCGGGAGCTCAAGAAGA
AACTACCACACAACAAGATCAAATGCACCTAAGTTATGAAAAGATAATGAAGGAAACTAG
GAAAACAGTTGTACCTAAAGTCATAGTTGCCACACCCAAAGTCAAAGAACAAGATTTAGT

The headerlist.txt file looks exactly like this:

ENST00000342992.10
ENST00000460472.6
ENST00000589042.5
ENST00000591111.5
ENST00000359218.9
ENST00000615779.4
ENST00000342175.10

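I wrote awk code that collects the headers I want to target, then prints each matching header together with the lines below it, up to the next header.

I call it like this: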

awk -f myScript.txt <headerlist.txt> <informationfile.txt>

Here is the code:

#!/bin/awk                       
NR == FNR {tags[$1]; next;}
for (i in tags) { if (i ~ $0) {a=1; print; next;}}
/>/ {a=0}
a

It should produce:

>Target Header
Information attached to header
.
.
.

However, I get a syntax error with no further information. The arrow does not point at any character, only at whitespace.

^ Syntax Error

How can I fix this?

2 Answers:

Answer 0 (score: 1)

Input

$ cat HeaderList
Target Header
SomeOther Header

$ cat InfoFile
>Generic Header
Information attached to header
.
.
.
>Target Header
Information attached to header
.
.
.
>SomeOther Header
Information attached to header
.
.
.

Script

  while read line
  do 
  awk 'BEGIN{RS="\n>"}/'"$line"'/{printf ">%s\n",$0}' InfoFile
  done <HeaderList 

Output

>Target Header
Information attached to header
.
.
.
>SomeOther Header
Information attached to header
.
.
.
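
The loop above splices each header into an awk regex, so a header containing regex metacharacters (such as the dots in version numbers) could match more than intended. As a hedged variation, not the answerer's own code, the header can be passed with -v and compared literally with index(). This sketch assumes each line of HeaderList is the complete header text without the leading ">", and uses small made-up sample files in a temporary directory.

```shell
tmp=$(mktemp -d)

# Hypothetical sample data modelled on the HeaderList / InfoFile shown above.
printf '%s\n' 'Target Header' 'SomeOther Header' > "$tmp/HeaderList"
printf '%s\n' \
  '>Generic Header' 'generic info' \
  '>Target Header' 'target info' '.' \
  '>SomeOther Header' 'other info' > "$tmp/InfoFile"

# RS=">" yields one record per header block; the first record is empty
# because the file starts with ">".
# index(...) == 1 holds only when the record begins with the header text
# followed by a newline, so the header is compared as a plain string.
blocks=$(while IFS= read -r line; do
  awk -v h="$line" 'BEGIN { RS = ">" }
       index($0, h "\n") == 1 { printf ">%s", $0 }' "$tmp/InfoFile"
done < "$tmp/HeaderList")

printf '%s\n' "$blocks"
rm -rf "$tmp"
```

With this sample data the Target and SomeOther blocks are printed and the Generic block is skipped. Note that awk still interprets backslash escapes in -v values, and the file is read once per header, so this stays a small-input approach.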

Answer 1 (score: 1)

I think this would be a better solution

$ awk 'NR==FNR{h[$0]; next} 
       $0 in h{c=2} 
        c&&c--' headers file

>Target Header
Information attached to header

If your headers match exactly, you can use an exact lookup ($0 in h) and print two lines: the header and the line that follows it.
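
The `c&&c--` pattern can look cryptic, so here is a small runnable sketch of it. The file names and sample lines are made up for illustration, and the headers file is assumed to contain the header lines exactly as they appear in the data file, including the leading ">".

```shell
tmp=$(mktemp -d)

# Hypothetical sample files; the headers file holds full header lines.
printf '%s\n' '>Target Header' > "$tmp/headers"
printf '%s\n' \
  '>Generic Header' 'generic info' \
  '>Target Header' 'target info' 'more target info' > "$tmp/file"

# A matching header sets the counter c to 2.
# The pattern `c && c--` is true while c is non-zero and decrements c each
# time it is evaluated, so exactly two lines are printed per match.
counted=$(awk 'NR == FNR { h[$0]; next }
     $0 in h   { c = 2 }
     c && c--' "$tmp/headers" "$tmp/file")

printf '%s\n' "$counted"
rm -rf "$tmp"
```

Here "more target info" is not printed, because the counter has already dropped to zero by the time that line is read.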

If you want to print everything up to the next header

$ awk 'NR==FNR{h[$0]; next} 
          /^>/{p=0} 
       $0 in h{p=1} 
              p' headers file

>Target Header
Information attached to header
.
.
.
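
As a rough illustration of how the flag p behaves, the sketch below runs the same script on made-up sample files (the names and contents are assumptions, and the headers file again contains the leading ">"). The order of the rules matters: every header line first clears the flag, and only a wanted header sets it again.

```shell
tmp=$(mktemp -d)

# Hypothetical sample files for illustration only.
printf '%s\n' '>Target Header' > "$tmp/headers"
printf '%s\n' \
  '>Generic Header' 'generic info' \
  '>Target Header' 'target info' 'more target info' \
  '>Other Header' 'other info' > "$tmp/file"

flagged=$(awk 'NR == FNR { h[$0]; next }  # first file: remember wanted headers
     /^>/      { p = 0 }                  # any header line stops printing
     $0 in h   { p = 1 }                  # a wanted header starts it again
     p' "$tmp/headers" "$tmp/file")

printf '%s\n' "$flagged"
rm -rf "$tmp"
```

Unlike the counter version, this prints every line of the Target block, however long it is, and stops at ">Other Header".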

With the new file layout, the script needs a small modification

$ awk 'NR==FNR{h[">"$0]; next} 
          /^>/{p=0} 
       $1 in h{p=1} 
              p' headers file

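This should work as long as there is whitespace between the key (as used in the headers file) and the rest of the record. Note that the headers file no longer has the ">" prefix, which is why the script prepends ">" when building the array.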
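
To tie this back to the `awk -f` invocation in the question, the same logic can be kept in a script file. The original syntax error most likely comes from the `for` statement being written at the top level: an awk program is a list of `pattern { action }` rules, and statements such as `for` are only valid inside the braces. The sketch below uses shortened, made-up sequence lines and a hypothetical script name (extract.awk). The file names are passed as plain arguments, without angle brackets.

```shell
tmp=$(mktemp -d)

# Hypothetical, shortened stand-ins for headerlist.txt and informationfile.txt.
printf '%s\n' 'ENST00000342992.10' 'ENST00000615779.4' > "$tmp/headerlist.txt"
printf '%s\n' \
  '>ENST00000342992.10 cdna:known gene_symbol:TTN' 'GCAGTCGTGC' 'ACAGTCTTGA' \
  '>ENST00000425332.2 cdna:known gene_symbol:TTN' 'TGGTCAGAAA' \
  '>ENST00000460472.6 cdna:known gene_symbol:TTN' 'CACCCTTTTT' > "$tmp/informationfile.txt"

# Script file for `awk -f`: every statement sits inside a { } action,
# and the bare pattern p prints the current line while it is true.
cat > "$tmp/extract.awk" <<'EOF'
NR == FNR { h[">" $0]; next }  # first file: store each ID with ">" prepended
/^>/      { p = 0 }            # any header line switches printing off
$1 in h   { p = 1 }            # a wanted ID in field 1 switches it back on
p                              # print while the flag is set
EOF

extracted=$(awk -f "$tmp/extract.awk" "$tmp/headerlist.txt" "$tmp/informationfile.txt")
printf '%s\n' "$extracted"
rm -rf "$tmp"
```

IDs that appear in the header list but not in the data file (ENST00000615779.4 in this sample) are simply ignored, and only the block for ENST00000342992.10 is printed.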