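The file has millions of lines, arranged in 2-line blocks: the first line of each block is a header starting with >, followed by one line of letter characters.
In Linux and/or bash, how can I split the file into smaller files while preserving the 2-line block structure? Ideally the split would be flexible, controlled either by the number of output files or by the number of blocks per smaller file.
A short example: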
>k99_12
CCTTCTTCAATGCCAATACCCTCGAAGAATTGCACCGCCTCGAACAAACCACATGACACACCCACCGTCACTGGCTGACATTGCCGCACAACTTGAAGCCTATGACCCGCAGGCACTACCTGCCAATGAGGTCTTGAATTTTCTGGATCACTTGGTGACGCCAGTGCACGACTCAGAATCAGTCGACATCTTTGCAGCATTGGGCAGAGTCACTGCGCAA
>k99_27
ATCCAAGCCAGAGAATATGCCTACCCGCATCGCACCGCGATCTCGCAAATGTCGTGTAATCGCGCGGGTATCAACACCCTGAATGCCAATGATTCCTTGCTCAATCAATTCCGCCTCAAGATCATTTTGTGCGCGCCAATTTGATGTCAAAGAGGATGGGTTTCTAACAACAAACCCTGCCACCCAAATCTTTGATGACTCATTATCTAA
>k99_31
CCATTGCGCAAACGGACTGCCGGACACCAAGTGCACCTCGTGCGACAGACCATACTGGTCGTCATAAGAGAGTTCAGGGTTTTCGCGGTGGTCGGCCATGCCGTCCACATTGTGCACCTGCTGGTGCAGCGAACCGCCCATGGCCACATTGATTTCCTGGAAGCCACGGCAAATCCCCAGCAGGGGCACACCCTGCGCGACGCAGGCGCGTACCAAGGGCAAGGTCAGGCTGTCGCGGTGCGGATCCAGCGGCAGACGCGGA
>k99_35
AAAATTGAGTTTGAAGGAATTTCGCATTTCATCAAAAATCAACACGACGAGAGTGGTTCAACAACTATAAAACGTTGGGCAAAGGAATTTATGGACGAAATAAATTGTCCTGTTTGCGAAGGTTCACGATTAAAAAAAGAAGCTTTATTTTTCAAAATTAATGGAAAAAACATCACTGAATTATGCAATATGGATATTTCGGATGTCACGGCTTGGTTTTTGGAATTGAACACCCATTTATCAGATAAACAAAAGACTATAGCGACGGAGGTTATCAAGGAAATAAAAGATCGATTGGCCTTTTTAATGAATGTAGGTTTGGATTATTT
>k99_40
GAGGCCGGCGAAGGCGCGGTGATCGACGAGGAGGACGACGACGCTGGCGCGGGCGAGCGCGTCGGCGAGGGAAACGAGTTCCGCCCTGCCCTGCAGCGCTTTCGGCAAGGCGGTCACGTGCGGCTCGACGGCCAGGACCTTCAGGCCGGCGTCGGCGAGGTGCGCGGCGATCTCGACGGCGGGAGATTCACGGAGGTCGTCGACGTTCGCCTTGAAGGCGAGGCCGAGGCAGGCGACGGCGGCGCCAGAA
>k99_42
AGACCAAATCGCACGGCTAGCAGGATCAAAACGCAAGATGCGCGGGTCTCTTACTTCATCGCGCAGAGTAGGGCGCATCAGCGCGACTTTTTCGCGCACGTCATCGGCGCCTTTGCGGCCGTCTATGTTGAGGTCAAACTCAACCACCACCACCGACACGCCTTCATAACTGCGAGATGTGAGGGCATTGATACCGGCAATGGAATTGACTGCTTCTTCCACTTTTTTAGTCACCTCGCTCTCGACAATTTCAGGAGAGGCGCCTGGATATTCGGTGCTCACGACAACGACGGGCAAATCAATATTAGGAAACTGGTCGATCTTGAGGCGCTGATAAGAGAACAAGCCCAGCACCACAAAGGCAAGCATCACCATCGTTGCGAAGACGGGGTTTTTGAGGCTGACTTTGGTGA
>k99_75
AAAGGTAGCATTGAAGATTATACGCAGTTGTTTCAGGCAGCAGCACAAATTGCGAATGAATCGGCACATATGCAACTCGATATAGATGTCGAGGGATTCAACGAATTTGCTACGGCGGCGGACGACCTCAGTAAGTTATTCACTGGTTTCATTTTGAAGTTGGAGAATGTGAGTATCATCGACGATACTGTATTTTTGACTGCGGTGGCAAATGCTCTCTCGAAGATAAGCAATTTGTCGAAAGTGTTTGGTAAGTTCAAAGAAACTATATTGGGCACTTCGACAATTCGTTTGCCCAAATCCGCACATGATGCATCGGTTATACTGAAAGATGTGGTTGGGCAAATCAATTGTGCAATGACGTATATAAACCATTTTGTCGATTCGAGTGTTCCCGCACCAAGTGTTGCGGAATTATCGAAAGAAGAGAAGAATATAATCGACGCTGCGGTGACAACCATTCACAATTGGAATACATTGTGTGACCAAGGAGTTAGTATTGCCATGTCAAGCGACCCAGATATTCAATTTGTTAGTAATGCGAATCAATCGCT
>k99_76
TCGTAAGCTAACTAAATCAACTGAACAATCTATCACCAATAGTATGTAATCAGAAATCAACTTAAATCTCATATATTAATGAAAGTTTTATCAATTGTTGGAACAAGGCCGGAAATAATTAAGTTATCAAGAGTGTTTCATGAACTTGAAAAATATACTGAACACATTTTAGTACATACAGGTCAAAACTTTGATTATGAACTAAATGAAATATTTTTCAATGATCTTAAAATTAAGAAACCTGATTTTTTTTTAAATGTTGTTGGCGAATCTTTAGCTGATACTATTGCAAACATAATTTCCAAATCCGATAAAGTTCTAGAAAAAATAAAACCA
>k99_79
GATGTACTGGTACTCGTTGTAGGTCGTCGTCTTGCTACCTCTGCTGCTGTCGTTCGTGGCCTCGTTGCGGTGGTCGTAGTTGTTGTGGTCGCTCTCGCAGGCCCGCCGCTCAGAGCTTGGAACGAGTTCTTGGAGACGAAGTCTCCCAGCGTTGCGCCGCGAGGCGTCGGGCGAGGTCGAGCTGCGACCTTCGCCTGGACAAAGCCGTCCTGGACCAGAGAGATGTCCATCCGCTGCGGCGGCCCCTCTTCGACGCTCCTGACGGCGCCTGTCGTTGGCCTCTGCGGGCAGGCTCGGGAGGAGTGACCGGTCTTTTTGCAGATCCAGCACTTCCGGAGCTCCCGTGCGACCTCAGGCAGGGGGCACTTGATGGCGGCGTGCGACTCGCC
>k99_83
CCCGAACACAATCGCTTTAGTCGAGCGGGAAACGCGGTGGGATTATGCGGACCCAGCCTTTACGAACGGGATCGCGGAAGACTTCTCCATCGACCAGTCTACTCACTCGCTCTTCGGCGCCTCGAAGGTTGCCGCCGACGTTTTGGTGCAGGAATACGGCCGCTATTTTGGAATGCCTACTTGCGTGCTGCGCGGCGGCTGCCTCACCGGCCCGAATCACAGCGGCGTCCAG
>k99_90
GGCTGACGTACAAGATGCGCCGTCCGTGGTCACGCGGCACGCTGGGCGTGGTGTTCAACGCGTTGTATGCCGTCATGTTCCTGTTCACGATCACGGTGATCGCGTCGATTCTCCACTCGTTCGAGTTCAACGGGCTATCCATCTTCTTCTTCCTGTTCTTCCTGTCGCTCGTGACCTTCTTCGGCCTGAAGATTCGCAATACGCGCCGCGAGCTGATGGTGGTAGAGGCGCGCGTCGGCATCGTCGGCACGATCGCGGACATCCTGTTTCTCCCCATGATACGCGCCGGCCGCTGGGTCGCGCTCCGGGCGCCGCGGGCCATCGCCACGCGGCCGGTCCGGACCATTTCCATGATCCCGTACGGCCGAAGCACCTCGAGCAGACCGTCAATCTTGTCTTCCGTACCGGTGATCTCGATGATCAGCGAATCCACCGCCACGTCGATCACCCGCGCGCGGAACACCTCGGCGAGCTGCATGACGTGCGGCCTGGATTCCGCCGACGCGGCAAC
>k99_100
AAAATACAGGTCTTTCAATGATGAAAGAAATGGATGATGCAAAAAATCTCGTTGGAATTGATTATACGAAGCATTTTGCTGATTTGGTAGAGAAAGCAGATCCTTTTGGTTCTAAAGCAGCGTTTATGCCAATGAAAGTAATTACTGCTTTGGCTTTGTTTGGTGAAAACGGCTCAACGAAAGCATTGGAAAGCTCATTAAAAAGAGGTGGAAGTGAAGAAAATTTAAACGATCTTTATTTAAACAGAGTAGGTGAGTACAAATGGAATGGTAAAACCTGGATTAAAAATAAAGAAGTTAAAGATAAAATTATTTTACGCTTTCCATCTTCTAATGCTAAAACTGTAAATAACGCTTCTTATGAAATTTCATTTGTGAACTATGCTGGAGCAGGTTTGCCTGATGA
>k99_104
GGTTCCATACATGTAACGCCAGGAATAGTGGACAACATTTGGTGCATCAGTGCGCCACGACGAGCAAATGCCTCACGCATCATGTGCACCGCTGATAAATCACCACTGACTGCAGCAAGTGCTGCAACCTGTGACACGTTGGCAACGTTTGACGTGGCATGCGATTGGAAGTTTGTTGAAGCCTTCATGATGTCTTTCGGTCCA
>k99_108
CCGCAGCATCTGACCGAGATCGAAGGGGCGGCCGTAGGGGCGCCGGCTGCTGTGCTGGCGCGCTGGACGGCGGCGGGCATGGCGCCGGCAGTCGTCATCGGCGACGGCGCGTTGGCCTTCGAGTCCCTCCTCGCCGGAGAGGCCCGCGTGTGTGGCGCGCAGCCGCTCGCCGGGACAATCGGACGAATCGCGGCGATCCGCGCGGATCGGGGAGAAGCGGTGGAGCCACACGCCGTGCGCGCGCTGTACGTCCGGCGTTCTGACGCGGAGGTCGAGAGGGACCGTGCCCGCTGATTCGAATGGTGCGGCGCCCCTCGCGCTGACGGTCGATCTCTTGTCGTCACTCGACGAACTGGACGAGGTGATGGCGGTCGA
>k99_112
ATGTCGAGCGCCAGCATTAGCGGGCGGGCGGACAAGGATGTTGATGCGCGCTCAATCGCTTTGGTGAAAACCGGTGACGAAGAAGCCAGCGCTGGCCGCGTCGATAGCGCAATTGGCTGGTATGAAACTGCGCTCGCGGTCGACCCGCGCAACCGTGCCGCTTATGTCGCCATGGCGCGCGCCGTAAAATCCCAGGGGTTG
>k99_115
GCAGTGGATGCCATACCAGAAAAAGTCGGGATGGTGCGGCTCGAATTCGGCGGGCCCGATGGAGTAGACGGAGGTGATGTCACCGATCTTGGCGAACTTTGAGGCGAAGGTTTCAGGTGGATACCTAAGGGACGAAGAACTGAAGAGTGGGATTTTGTGTTCGCGGGCGAGGCGGACGATTTCGACGGCGTCTGTAAGAGAGCCGGCGAGGGGCTTGTCGATGAAGACGGGTTTTTTCGCGGCGAGGGTCTGGCGGAATTGTTCGAGGTGGGGGCGGCCGTCGACGCTTTCGATCAGA
>k99_117
CGTCTCTGAGCTTTTCAGCTTCCATCAACTTGGCTTTTCCTATGGCCGCACTGTCGGAAACAATCGCAATCGGCCCTAACCTGGCTCCCAAGCAAGCCATGTGCATCGCCACGTGCTCAGCGGCCCCGGACGAGATTCGATATCCCACCGGAATGGCGACCGCCATGGATCTGACACAAACCTTCCCGTCCTTTCGCCATACCGTCCGTCGCTTCGTACGTCGTCTGGTGTGGCGTCGCTGTCCTCCGTGGTGCTGTAGACGTTCTCGATGGGGTCGTCGCCCTGGAAGTACTGGAGGTGGCTGTAGTCGCGCGGGAC
>k99_121
ATGAGTACAACAGTCAGTCATAACTGCGTAAGGGGCACCTGTAAATCTAGCCAATGCATGTTCAAATTCTAGTATTTTCTCAAACATTTTCGCTCAAGTGATCTTGTTTAATTTCTCGCACTGGGCAATTTAGTAATTCTGCTATAGTATTTTTAACTGCTATTCTTTTATTATTCCAATTTCTTATTAGTATAGCACGTCGTCCAATTTCTTCCAAACTTAATTCTTCTTCGCAACCACTTTTTAATTCAGCTTCTAATGTCCAGATAGTATCATGTATTCTTTTGAGCTCATCAAAACACGTTTTAACAAGAGATAAATCGAATTGTGAAGTTTGATCTTGATACCAATTAAGTTCCTCTTGATTGCTGTGTGTCCGATCCCACTTAACTTCGGCTATGGCTAATCTATCAAATAGTTCAATTACTGGAAAGTGGTAACTCATAGATATAGTCCTTCAATTTTTTCTGGA
>k99_135
AAAAGACTGTTGGCTTCTCCCAAAAAATTTACTTAAAAAATAATATTCAGACAACAATTCTTGAAAGTGCTATGCTTTGAAAGTTGTGTTTTTTTTAATTATGGCCAAAGAAAAAACAATACACACAAAAAAAGTTTGAAACATGGCCGATTTTCGTTTTAACGTGAAAGCTGATACCACAGATTAGATATAGAATAGATAGAGGCTTCCTAAATATCAGTAGTTCCCGGTCAAAGGGGCAGGATCAAGAGGGTTGCGGGGTTTCCTCTCTTCACATTGTACATTGTACACCTTGGTTGTAATAATAGAATATGTAACACCTTGT
Answer 0 (score: 3)
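To split the file into 2-line blocks, the tool to use is split:
split -l 2 -d file file_
This writes every 2 lines of the original file to a separate part named file_xx (file_00, file_01, ...), all in one pass. You can change 2 to any other number of lines; keep it even so that no header is separated from its sequence line.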
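To cover the flexibility asked for in the question, here is a minimal sketch that wraps split so the size can be given either as blocks per file or as a target number of files. The function names split_by_blocks and split_into_files are hypothetical, and the sketch assumes every block is exactly 2 lines and that your split supports -d (as GNU split does). The -a 4 suffix length is an arbitrary choice that leaves room for up to 10000 parts.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for illustration.
# Assumes every record is exactly 2 lines (header + sequence).

# Usage: split_by_blocks INPUT BLOCKS_PER_FILE PREFIX
split_by_blocks() {
    local input=$1 blocks=$2 prefix=$3
    # 2 lines per block, so the line count passed to split is always even.
    # -d: numeric suffixes; -a 4: four-digit suffixes (PREFIX0000, PREFIX0001, ...).
    split -l "$((blocks * 2))" -d -a 4 "$input" "$prefix"
}

# Usage: split_into_files INPUT NUM_FILES PREFIX
split_into_files() {
    local input=$1 nfiles=$2 prefix=$3
    local lines blocks per_file
    lines=$(wc -l < "$input")
    blocks=$((lines / 2))
    # Ceiling division, so at most NUM_FILES output files are produced.
    per_file=$(( (blocks + nfiles - 1) / nfiles ))
    split_by_blocks "$input" "$per_file" "$prefix"
}
```

For example, split_by_blocks file 1000 file_ would put 1000 blocks (2000 lines) in each part, while split_into_files file 8 file_ would aim for 8 parts of roughly equal size. Because the block count is rounded up per file, the second form can produce fewer files than requested when the blocks do not divide evenly.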