How do I write a CSV from text?

Time: 2015-12-24 00:23:55

Tags: python regex text concatenation

I am trying to extract some information from a file using Python. Part of the file looks like this:

1. [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array
(Submitter supplied) Affymetrix submissions are replicated on the GeneChip Human Genome U133 Plus 2.0 Array. more...
Organism:   Homo sapiens
527 DataSets 4123 Series 54 Related Platforms 115874 Samples
FTP download: GEO ftp://ftp.ncbi.nlm.nih.gov/geo/platforms/GPLnnn/GPL570/
Platform    Accession: GPL570   ID: 100000570

2. [Mouse430_2] Affymetrix Mouse Genome 430 2.0 Array
(Submitter supplied) Affymetrix submissions are typically Array. more...
Organism:   Mus musculus
517 DataSets 3529 Series 36 Related Platforms 46528 Samples
FTP download: GEO ftp://ftp.ncbi.nlm.nih.gov/geo/platforms/GPL1nnn/GPL1261/
Platform    Accession: GPL1261  ID: 100001261


import re
import sys
import itertools

stdout = open("results.txt", "w")
pattern = re.compile(r'^\d+[.]\s')
pattern2 = re.compile(r'Organism:')
pattern3 = re.compile(r'FTP download:')
pattern4 = re.compile(r'ID: ')

listOrg = []

def group_separator(line):
    return line=='ID: '

with open('Microarray/PlatformsMicroarray.txt') as f:
    for key,group in itertools.groupby(f,group_separator):
        # print(key,list(group))  # uncomment to see what itertools.groupby does.
        if not key:
            data={}
            for item in group:
                for line in f:
                   if pattern.search(line):
                        listOrg.append(line)
                   if pattern2.search(line):
                        #field,value=line.split(':')
                        listOrg.append(line)
                   if pattern3.search(line):
                        listOrg.append(line)
                   if pattern4.search(line):
                        listOrg.append(line)

for item in listOrg:
  stdout.write("%s" % item)

stdout.close()

How can I join this information so that I can write it out as a .csv file?

1 Answer:

Answer 0: (score: 2)

The csv module is your weapon of choice.

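import csv

# Python 3: open in text mode with newline="" (on Python 2, open with "wb" instead)
with open("path/to/out.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for line in whatever_your_input_is:  # placeholder for your parsed rows
        writer.writerow(line)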

In this case it looks like listOrg is your parsed input, so you would do:

...
    for line in listOrg:
        # writerow expects a sequence of fields; a bare string would be split into characters
        writer.writerow([line.strip()])
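
Since the question asks how to join the matched lines, here is a minimal sketch of one way to collect each numbered record into a single CSV row. It assumes Python 3 and that every record starts with a line like "1. ..." and ends with the line containing "ID:", as in the sample shown in the question. The names parse_platforms, write_csv and FIELDS, the column names, and the output file name are hypothetical and chosen only for illustration.

```python
import csv
import re

# Hypothetical column names for the output file.
FIELDS = ["title", "organism", "ftp", "accession", "id"]

# A record starts with a line such as "1. [HG-U133_Plus_2] Affymetrix ...".
RECORD_START = re.compile(r"^\d+\.\s+(.*)")
# The last line of a record looks like "Platform    Accession: GPL570   ID: 100000570".
ACCESSION = re.compile(r"Accession:\s*(\S+)\s+ID:\s*(\S+)")


def parse_platforms(lines):
    """Yield one dict per record, assuming the layout shown in the question."""
    record = {}
    for raw in lines:
        line = raw.strip()
        start = RECORD_START.match(line)
        if start:
            record = {"title": start.group(1)}
        elif line.startswith("Organism:"):
            record["organism"] = line.split(":", 1)[1].strip()
        elif line.startswith("FTP download:"):
            # The URL is the last whitespace-separated token on the line.
            record["ftp"] = line.split()[-1]
        else:
            found = ACCESSION.search(line)
            if found:
                record["accession"], record["id"] = found.groups()
                yield record  # the "ID:" line closes the record
                record = {}


def write_csv(in_path, out_path):
    """Read the text listing and write one CSV row per record."""
    with open(in_path) as src, open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(parse_platforms(src))
```

It could then be called as write_csv('Microarray/PlatformsMicroarray.txt', 'results.csv'). Using csv.DictWriter keeps the columns in a fixed order and fills in an empty string for any field that a record happens to lack.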