Bash: split a large file of 2-line blocks into smaller files

Date: 2017-03-31 13:33:00

Tags: linux bash

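The file has millions of lines arranged in 2-line blocks: the first line of each block is a header marked with >, followed by one line of alphabetic characters.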
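Because any line-count-based split only preserves blocks if the layout is exactly as described, it can help to verify the file first. A minimal sketch, assuming every odd line is a header starting with > and every even line is a single sequence line; `check_blocks` is a hypothetical helper name, not an existing tool:

```shell
#!/usr/bin/env bash
# check_blocks FILE (hypothetical helper): succeeds only if FILE consists of
# 2-line blocks where odd lines start with '>' and even lines do not.
check_blocks() {
  awk '
    NR % 2 == 1 && !/^>/ { printf "line %d: expected a header starting with >\n", NR; bad = 1 }
    NR % 2 == 0 &&  /^>/ { printf "line %d: expected a sequence line, got a header\n", NR; bad = 1 }
    END {
      # An odd total means the last block has no sequence line.
      if (NR % 2 != 0) { print "odd number of lines: last block is incomplete"; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

Running `check_blocks file` prints the offending line numbers and returns a non-zero status if the structure is broken anywhere.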

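In Linux and/or bash, how can I split the file into smaller files while keeping the 2-line block structure intact? Ideally there would be some flexibility in the output, set either by the number of output files or by the number of blocks per smaller file.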

Short example:

>k99_12
CCTTCTTCAATGCCAATACCCTCGAAGAATTGCACCGCCTCGAACAAACCACATGACACACCCACCGTCACTGGCTGACATTGCCGCACAACTTGAAGCCTATGACCCGCAGGCACTACCTGCCAATGAGGTCTTGAATTTTCTGGATCACTTGGTGACGCCAGTGCACGACTCAGAATCAGTCGACATCTTTGCAGCATTGGGCAGAGTCACTGCGCAA
>k99_27
ATCCAAGCCAGAGAATATGCCTACCCGCATCGCACCGCGATCTCGCAAATGTCGTGTAATCGCGCGGGTATCAACACCCTGAATGCCAATGATTCCTTGCTCAATCAATTCCGCCTCAAGATCATTTTGTGCGCGCCAATTTGATGTCAAAGAGGATGGGTTTCTAACAACAAACCCTGCCACCCAAATCTTTGATGACTCATTATCTAA
>k99_31
CCATTGCGCAAACGGACTGCCGGACACCAAGTGCACCTCGTGCGACAGACCATACTGGTCGTCATAAGAGAGTTCAGGGTTTTCGCGGTGGTCGGCCATGCCGTCCACATTGTGCACCTGCTGGTGCAGCGAACCGCCCATGGCCACATTGATTTCCTGGAAGCCACGGCAAATCCCCAGCAGGGGCACACCCTGCGCGACGCAGGCGCGTACCAAGGGCAAGGTCAGGCTGTCGCGGTGCGGATCCAGCGGCAGACGCGGA
>k99_35
AAAATTGAGTTTGAAGGAATTTCGCATTTCATCAAAAATCAACACGACGAGAGTGGTTCAACAACTATAAAACGTTGGGCAAAGGAATTTATGGACGAAATAAATTGTCCTGTTTGCGAAGGTTCACGATTAAAAAAAGAAGCTTTATTTTTCAAAATTAATGGAAAAAACATCACTGAATTATGCAATATGGATATTTCGGATGTCACGGCTTGGTTTTTGGAATTGAACACCCATTTATCAGATAAACAAAAGACTATAGCGACGGAGGTTATCAAGGAAATAAAAGATCGATTGGCCTTTTTAATGAATGTAGGTTTGGATTATTT
>k99_40
GAGGCCGGCGAAGGCGCGGTGATCGACGAGGAGGACGACGACGCTGGCGCGGGCGAGCGCGTCGGCGAGGGAAACGAGTTCCGCCCTGCCCTGCAGCGCTTTCGGCAAGGCGGTCACGTGCGGCTCGACGGCCAGGACCTTCAGGCCGGCGTCGGCGAGGTGCGCGGCGATCTCGACGGCGGGAGATTCACGGAGGTCGTCGACGTTCGCCTTGAAGGCGAGGCCGAGGCAGGCGACGGCGGCGCCAGAA
>k99_42
AGACCAAATCGCACGGCTAGCAGGATCAAAACGCAAGATGCGCGGGTCTCTTACTTCATCGCGCAGAGTAGGGCGCATCAGCGCGACTTTTTCGCGCACGTCATCGGCGCCTTTGCGGCCGTCTATGTTGAGGTCAAACTCAACCACCACCACCGACACGCCTTCATAACTGCGAGATGTGAGGGCATTGATACCGGCAATGGAATTGACTGCTTCTTCCACTTTTTTAGTCACCTCGCTCTCGACAATTTCAGGAGAGGCGCCTGGATATTCGGTGCTCACGACAACGACGGGCAAATCAATATTAGGAAACTGGTCGATCTTGAGGCGCTGATAAGAGAACAAGCCCAGCACCACAAAGGCAAGCATCACCATCGTTGCGAAGACGGGGTTTTTGAGGCTGACTTTGGTGA
>k99_75
AAAGGTAGCATTGAAGATTATACGCAGTTGTTTCAGGCAGCAGCACAAATTGCGAATGAATCGGCACATATGCAACTCGATATAGATGTCGAGGGATTCAACGAATTTGCTACGGCGGCGGACGACCTCAGTAAGTTATTCACTGGTTTCATTTTGAAGTTGGAGAATGTGAGTATCATCGACGATACTGTATTTTTGACTGCGGTGGCAAATGCTCTCTCGAAGATAAGCAATTTGTCGAAAGTGTTTGGTAAGTTCAAAGAAACTATATTGGGCACTTCGACAATTCGTTTGCCCAAATCCGCACATGATGCATCGGTTATACTGAAAGATGTGGTTGGGCAAATCAATTGTGCAATGACGTATATAAACCATTTTGTCGATTCGAGTGTTCCCGCACCAAGTGTTGCGGAATTATCGAAAGAAGAGAAGAATATAATCGACGCTGCGGTGACAACCATTCACAATTGGAATACATTGTGTGACCAAGGAGTTAGTATTGCCATGTCAAGCGACCCAGATATTCAATTTGTTAGTAATGCGAATCAATCGCT
>k99_76
TCGTAAGCTAACTAAATCAACTGAACAATCTATCACCAATAGTATGTAATCAGAAATCAACTTAAATCTCATATATTAATGAAAGTTTTATCAATTGTTGGAACAAGGCCGGAAATAATTAAGTTATCAAGAGTGTTTCATGAACTTGAAAAATATACTGAACACATTTTAGTACATACAGGTCAAAACTTTGATTATGAACTAAATGAAATATTTTTCAATGATCTTAAAATTAAGAAACCTGATTTTTTTTTAAATGTTGTTGGCGAATCTTTAGCTGATACTATTGCAAACATAATTTCCAAATCCGATAAAGTTCTAGAAAAAATAAAACCA
>k99_79
GATGTACTGGTACTCGTTGTAGGTCGTCGTCTTGCTACCTCTGCTGCTGTCGTTCGTGGCCTCGTTGCGGTGGTCGTAGTTGTTGTGGTCGCTCTCGCAGGCCCGCCGCTCAGAGCTTGGAACGAGTTCTTGGAGACGAAGTCTCCCAGCGTTGCGCCGCGAGGCGTCGGGCGAGGTCGAGCTGCGACCTTCGCCTGGACAAAGCCGTCCTGGACCAGAGAGATGTCCATCCGCTGCGGCGGCCCCTCTTCGACGCTCCTGACGGCGCCTGTCGTTGGCCTCTGCGGGCAGGCTCGGGAGGAGTGACCGGTCTTTTTGCAGATCCAGCACTTCCGGAGCTCCCGTGCGACCTCAGGCAGGGGGCACTTGATGGCGGCGTGCGACTCGCC
>k99_83
CCCGAACACAATCGCTTTAGTCGAGCGGGAAACGCGGTGGGATTATGCGGACCCAGCCTTTACGAACGGGATCGCGGAAGACTTCTCCATCGACCAGTCTACTCACTCGCTCTTCGGCGCCTCGAAGGTTGCCGCCGACGTTTTGGTGCAGGAATACGGCCGCTATTTTGGAATGCCTACTTGCGTGCTGCGCGGCGGCTGCCTCACCGGCCCGAATCACAGCGGCGTCCAG
>k99_90
GGCTGACGTACAAGATGCGCCGTCCGTGGTCACGCGGCACGCTGGGCGTGGTGTTCAACGCGTTGTATGCCGTCATGTTCCTGTTCACGATCACGGTGATCGCGTCGATTCTCCACTCGTTCGAGTTCAACGGGCTATCCATCTTCTTCTTCCTGTTCTTCCTGTCGCTCGTGACCTTCTTCGGCCTGAAGATTCGCAATACGCGCCGCGAGCTGATGGTGGTAGAGGCGCGCGTCGGCATCGTCGGCACGATCGCGGACATCCTGTTTCTCCCCATGATACGCGCCGGCCGCTGGGTCGCGCTCCGGGCGCCGCGGGCCATCGCCACGCGGCCGGTCCGGACCATTTCCATGATCCCGTACGGCCGAAGCACCTCGAGCAGACCGTCAATCTTGTCTTCCGTACCGGTGATCTCGATGATCAGCGAATCCACCGCCACGTCGATCACCCGCGCGCGGAACACCTCGGCGAGCTGCATGACGTGCGGCCTGGATTCCGCCGACGCGGCAAC
>k99_100
AAAATACAGGTCTTTCAATGATGAAAGAAATGGATGATGCAAAAAATCTCGTTGGAATTGATTATACGAAGCATTTTGCTGATTTGGTAGAGAAAGCAGATCCTTTTGGTTCTAAAGCAGCGTTTATGCCAATGAAAGTAATTACTGCTTTGGCTTTGTTTGGTGAAAACGGCTCAACGAAAGCATTGGAAAGCTCATTAAAAAGAGGTGGAAGTGAAGAAAATTTAAACGATCTTTATTTAAACAGAGTAGGTGAGTACAAATGGAATGGTAAAACCTGGATTAAAAATAAAGAAGTTAAAGATAAAATTATTTTACGCTTTCCATCTTCTAATGCTAAAACTGTAAATAACGCTTCTTATGAAATTTCATTTGTGAACTATGCTGGAGCAGGTTTGCCTGATGA
>k99_104
GGTTCCATACATGTAACGCCAGGAATAGTGGACAACATTTGGTGCATCAGTGCGCCACGACGAGCAAATGCCTCACGCATCATGTGCACCGCTGATAAATCACCACTGACTGCAGCAAGTGCTGCAACCTGTGACACGTTGGCAACGTTTGACGTGGCATGCGATTGGAAGTTTGTTGAAGCCTTCATGATGTCTTTCGGTCCA
>k99_108
CCGCAGCATCTGACCGAGATCGAAGGGGCGGCCGTAGGGGCGCCGGCTGCTGTGCTGGCGCGCTGGACGGCGGCGGGCATGGCGCCGGCAGTCGTCATCGGCGACGGCGCGTTGGCCTTCGAGTCCCTCCTCGCCGGAGAGGCCCGCGTGTGTGGCGCGCAGCCGCTCGCCGGGACAATCGGACGAATCGCGGCGATCCGCGCGGATCGGGGAGAAGCGGTGGAGCCACACGCCGTGCGCGCGCTGTACGTCCGGCGTTCTGACGCGGAGGTCGAGAGGGACCGTGCCCGCTGATTCGAATGGTGCGGCGCCCCTCGCGCTGACGGTCGATCTCTTGTCGTCACTCGACGAACTGGACGAGGTGATGGCGGTCGA
>k99_112
ATGTCGAGCGCCAGCATTAGCGGGCGGGCGGACAAGGATGTTGATGCGCGCTCAATCGCTTTGGTGAAAACCGGTGACGAAGAAGCCAGCGCTGGCCGCGTCGATAGCGCAATTGGCTGGTATGAAACTGCGCTCGCGGTCGACCCGCGCAACCGTGCCGCTTATGTCGCCATGGCGCGCGCCGTAAAATCCCAGGGGTTG
>k99_115
GCAGTGGATGCCATACCAGAAAAAGTCGGGATGGTGCGGCTCGAATTCGGCGGGCCCGATGGAGTAGACGGAGGTGATGTCACCGATCTTGGCGAACTTTGAGGCGAAGGTTTCAGGTGGATACCTAAGGGACGAAGAACTGAAGAGTGGGATTTTGTGTTCGCGGGCGAGGCGGACGATTTCGACGGCGTCTGTAAGAGAGCCGGCGAGGGGCTTGTCGATGAAGACGGGTTTTTTCGCGGCGAGGGTCTGGCGGAATTGTTCGAGGTGGGGGCGGCCGTCGACGCTTTCGATCAGA
>k99_117
CGTCTCTGAGCTTTTCAGCTTCCATCAACTTGGCTTTTCCTATGGCCGCACTGTCGGAAACAATCGCAATCGGCCCTAACCTGGCTCCCAAGCAAGCCATGTGCATCGCCACGTGCTCAGCGGCCCCGGACGAGATTCGATATCCCACCGGAATGGCGACCGCCATGGATCTGACACAAACCTTCCCGTCCTTTCGCCATACCGTCCGTCGCTTCGTACGTCGTCTGGTGTGGCGTCGCTGTCCTCCGTGGTGCTGTAGACGTTCTCGATGGGGTCGTCGCCCTGGAAGTACTGGAGGTGGCTGTAGTCGCGCGGGAC
>k99_121
ATGAGTACAACAGTCAGTCATAACTGCGTAAGGGGCACCTGTAAATCTAGCCAATGCATGTTCAAATTCTAGTATTTTCTCAAACATTTTCGCTCAAGTGATCTTGTTTAATTTCTCGCACTGGGCAATTTAGTAATTCTGCTATAGTATTTTTAACTGCTATTCTTTTATTATTCCAATTTCTTATTAGTATAGCACGTCGTCCAATTTCTTCCAAACTTAATTCTTCTTCGCAACCACTTTTTAATTCAGCTTCTAATGTCCAGATAGTATCATGTATTCTTTTGAGCTCATCAAAACACGTTTTAACAAGAGATAAATCGAATTGTGAAGTTTGATCTTGATACCAATTAAGTTCCTCTTGATTGCTGTGTGTCCGATCCCACTTAACTTCGGCTATGGCTAATCTATCAAATAGTTCAATTACTGGAAAGTGGTAACTCATAGATATAGTCCTTCAATTTTTTCTGGA
>k99_135
AAAAGACTGTTGGCTTCTCCCAAAAAATTTACTTAAAAAATAATATTCAGACAACAATTCTTGAAAGTGCTATGCTTTGAAAGTTGTGTTTTTTTTAATTATGGCCAAAGAAAAAACAATACACACAAAAAAAGTTTGAAACATGGCCGATTTTCGTTTTAACGTGAAAGCTGATACCACAGATTAGATATAGAATAGATAGAGGCTTCCTAAATATCAGTAGTTCCCGGTCAAAGGGGCAGGATCAAGAGGGTTGCGGGGTTTCCTCTCTTCACATTGTACATTGTACACCTTGGTTGTAATAATAGAATATGTAACACCTTGT

1 answer:

Answer (score: 3)

To split the file into 2-line blocks, the tool to use is split:

split -l 2 -d file file_

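This splits the original file into pieces named file_00, file_01, and so on, each containing 2 lines, in a single pass. You can change 2 to another number of lines; keep it even (2 × the number of blocks wanted per file) so that no header is separated from its sequence line.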
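The question also asks for control over either the number of blocks per file or the number of output files. A minimal sketch built on the same split command: `split_blocks` is a hypothetical helper, it assumes the input contains only complete 2-line blocks, and, like the command above, it relies on the `-d` (numeric suffix) option of GNU split.

```shell
#!/usr/bin/env bash
# split_blocks FILE PREFIX blocks N : N blocks (2*N lines) per output file
# split_blocks FILE PREFIX files  K : at most K output files, no block split
# Hypothetical helper; assumes FILE holds only complete 2-line blocks.
split_blocks() {
  local file=$1 prefix=$2 mode=$3 n=$4 blocks per
  local usage="usage: split_blocks FILE PREFIX blocks|files N (N >= 1)"
  if [[ ! $n =~ ^[1-9][0-9]*$ ]]; then
    echo "$usage" >&2
    return 2
  fi
  case $mode in
    blocks)
      per=$n
      ;;
    files)
      # Count the 2-line blocks, then round up so no more than K files result.
      blocks=$(( $(wc -l < "$file") / 2 ))
      per=$(( (blocks + n - 1) / n ))
      (( per > 0 )) || per=1
      ;;
    *)
      echo "$usage" >&2
      return 2
      ;;
  esac
  # An even line count per piece keeps every header next to its sequence line.
  split -l $(( 2 * per )) -d "$file" "$prefix"
}
```

For example, `split_blocks file file_ blocks 1000` writes 1000 blocks (2000 lines) per piece, and `split_blocks file file_ files 10` writes at most 10 pieces. Because the block count per file is rounded up, the last piece may be shorter, and fewer than K files can result when the blocks do not divide evenly.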