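Split a file into multiple files

Time: 2013-08-21 23:34:26

Tags: macos shell unix awk split

I want to split a text file into multiple text files based on the lines that start with a number (1.*). For example, I want to split this text file into 2 files: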

 1. J Med Chem. 2013 May 23;56(10):4028-43. doi: 10.1021/jm400241j. Epub 2013 May 13.

Optimization of benzoxazole-based inhibitors of Cryptosporidium parvum inosine
5'-monophosphate dehydrogenase.

Gorla SK, Kavitha M, Zhang M, Chin JE, Liu X, Striepen B, Makowska-Grzyska M, Kim
Y, Joachimiak A, Hedstrom L, Cuny GD.

Department of Biology, Brandeis University , 415 South Street, Waltham,
Massachusetts 02454, USA.

Cryptosporidium parvum is an enteric protozoan parasite that has emerged as a
major cause of diarrhea, malnutrition, and gastroenteritis and poses a potential 
bioterrorism threat.

PMID: 23668331  [PubMed - indexed for MEDLINE]


 2.Biochem Pharmacol. 2013 May 1;85(9):1370-8. doi: 10.1016/j.bcp.2013.02.014. Epub 
2013 Feb 16.

Carbonyl reduction of triadimefon by human and rodent 11β-hydroxysteroid
dehydrogenase 1.

Meyer A, Vuorinen A, Zielinska AE, Da Cunha T, Strajhar P, Lavery GG, Schuster D,
Odermatt A.

Swiss Center for Applied Human Toxicology and Division of Molecular and Systems
Toxicology, Department of Pharmaceutical Sciences, University of Basel,
Klingelbergstrasse 50, 4056 Basel, Switzerland.

11β-Hydroxysteroid dehydrogenase 1 (11β-HSD1) catalyzes the conversion of
inactive 11-oxo glucocorticoids (endogenous cortisone, 11-dehydrocorticosterone
and synthetic prednisone) to their potent 11β-hydroxyl forms (cortisol,
corticosterone and prednisolone).

Copyright © 2013 Elsevier Inc. All rights reserved.

PMID: 23419873  [PubMed - indexed for MEDLINE]

I tried this:

awk 'NF{print > $2;close($2);}' file

and this:

split -l 2

but I'm confused about how to deal with the blank lines. (I'm new to awk.)

2 Answers:

Answer 0: (score: 3)

I think what you're looking for is:

awk '/^[[:space:]]+[[:digit:]]+\./{ if (fname) close(fname); fname="out_"$1; sub(/\..*/,"",fname) } {print > fname}' file

Commented version, as requested by @zjhui:

awk '
/^[[:space:]]+[[:digit:]]+\./ {     # IF the line starts with spaces, then digits then a period THEN
    if (fname)                      #     IF the output file name variable is populated THEN
        close(fname)                #         close the file youve been writing to until now
                                    #     ENDIF
    fname="out_"$1                  #     set the output file name to the word "out_" followed by the first field of this line, e.g. "out_2.Biochem"
    sub(/\..*/,"",fname)            #     strip everything from the period on from the file name so it becomes e.g. "out_2"
}                                   # ENDIF
{                                   # IF true THEN
    print > fname                   #     print the current record to the filename stored in the variable fname, e.g. "out_2".
}                                   # ENDIF
' file
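A minimal sketch for trying this approach on a small sample, assuming a POSIX shell with awk and mktemp available. The scratch directory, the file name sample.txt and the shortened records are made up for illustration. The only change from the script above is the fname != "" guard, an addition that skips any text appearing before the first numbered line instead of failing on an empty file name.

```shell
# Scratch directory and sample file; both names are made up for this sketch.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  'preamble line, not part of any record' \
  ' 1. J Med Chem. 2013 May 23' \
  '' \
  'PMID: 23668331' \
  '' \
  ' 2.Biochem Pharmacol. 2013 May 1' \
  '' \
  'PMID: 23419873' > sample.txt

awk '
/^[[:space:]]+[[:digit:]]+\./ {   # header line: start a new output file
    if (fname) close(fname)       # close the previous file, if any
    fname = "out_" $1             # e.g. "out_2.Biochem"
    sub(/\..*/, "", fname)        # strip from the first period on: "out_2"
}
fname != "" { print > fname }     # assumption: drop text before the first header
' sample.txt

ls out_*
```

On this sample the run should leave out_1 and out_2 in the scratch directory, with the blank lines inside each record kept as they were.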

Answer 1: (score: 0)

This should work.

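awk -F'[.]' '/^ +[0-9]+\./ {        # header line: spaces, digits, then a period
            if (file) close(file)  # close the previous output file
            file = "file_" $1      # $1 is the text before the first period, e.g. " 2"
            gsub(/ /, "", file)    # strip spaces from the name, not from $1, so $0 stays intact
           }
           {
            print > file
           }' Your_file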
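Note that in awk the opening brace of an action has to sit on the same line as its pattern. A pattern alone on a line gets the default action (print), and a block that starts on the next line is a separate rule that runs for every record. A small sketch of the difference, assuming a POSIX shell and awk (the variable names and the two-line input are made up for illustration):

```shell
# Pattern and brace on separate lines: the pattern gets the default action
# (print), and the block runs for every record.
split_lines=$(printf 'a\n 1. b\n' | awk '/^ +[0-9]+\./
{ n++ }
END { print "block ran " n " times" }')

# Pattern and brace on the same line: the block runs only for matching records.
same_line=$(printf 'a\n 1. b\n' | awk '/^ +[0-9]+\./ { n++ }
END { print "block ran " n " times" }')

printf '%s\n' "$split_lines" "$same_line"
```

In the first form the matching line is also echoed to standard output, which is why a split script written that way would both print the header lines and recompute the output file name on every line.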