Parsing a file and writing it out as CSV in Python

Time: 2014-10-19 00:19:51

Tags: python parsing csv text

I am trying to parse a text file that has the following format:

+++++
line1
line2
<<<<<
+++++
rline1
rline2
<<<<<

Here, +++++ marks the start of a record and <<<<< marks the end of a record.

Now I want to write the whole text out as CSV in the following format:

line1, line2
rline1, rline2

This is what I have tried so far:

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
output_lines =[]

for line in lines:
    if (line == "+++++") or not(line == "<<<<<") :
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")

print (output_lines)

I'm not sure how to proceed from here.

2 answers:

Answer 0 (score: 1)

Maybe something like this?

from itertools import groupby
import csv

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']

# remove the +++++s, so that only the <<<<<s indicate line breaks
cleaned_list = [x for x in lines if x != "+++++"]  # compare by value; "is not" tests identity

# separate at <<<<<s
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]

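# newline='' lets the csv module control the line endings (Python 3)
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# read the file back to check the result
with open('result.csv', newline='') as f:
    print(f.read())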
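
As a variation on the groupby idea above, the filtering step can be folded into the grouping key by treating both markers as separators. This is only a sketch, not the answerer's code: `is_marker` is a hypothetical helper introduced for illustration, and it assumes every record has at least one line (an empty record would simply produce no row).

```python
from itertools import groupby

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']

def is_marker(line):
    # hypothetical helper: True for either record delimiter
    return line in ('+++++', '<<<<<')

# groupby yields alternating runs of marker and non-marker lines;
# keeping only the non-marker runs gives one list per record
rows = [list(group) for marker, group in groupby(lines, key=is_marker) if not marker]
print(rows)
```

Adjacent markers such as '<<<<<' followed by '+++++' fall into the same run, so they are discarded together.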

Answer 1 (score: 0)

Collect lines in a nested loop until the end-of-record marker, then write the resulting list to the CSV file:

import csv

with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue

        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))

        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)

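The outer loop takes lines from the input file but skips anything that doesn't look like the start of a record. Once a start is found, the inner loop draws more lines from the same file object and, as long as they do not look like the end of a record, appends them to a list (without the line separator).

When the end of a record is found, the inner loop ends, and if any lines were collected in the row list, they are written to the CSV file as a single row.

Demo: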
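
The nested loops only work because both of them consume the same iterator, so the inner loop picks up where the outer one stopped. A minimal sketch of that idea as a reusable generator follows; the name `iter_records` and its `start`/`end` parameters are hypothetical, chosen here for illustration. The explicit `iter()` call matters when the input is a list rather than a file object, since a plain list would restart from the beginning in the inner loop.

```python
def iter_records(lines, start='+++++', end='<<<<<'):
    """Yield one list of lines per record found between start and end markers."""
    lines = iter(lines)  # one shared iterator for both loops
    for line in lines:
        if not line.startswith(start):
            continue  # skip anything outside a record
        row = []
        for line in lines:  # continues from where the outer loop stopped
            if line.startswith(end):
                break
            row.append(line.rstrip('\n'))
        if row:
            yield row

sample = ['junk\n', '+++++\n', 'line1\n', 'line2\n', '<<<<<\n',
          '+++++\n', 'rline1\n', 'rline2\n', '<<<<<\n']
records = list(iter_records(sample))
print(records)
```

Each yielded list could then be passed straight to `writer.writerow()`, or the whole generator to `writer.writerows()`.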

>>> import csv
>>> from io import StringIO
>>> import sys
>>> demo = StringIO('''\
... +++++
... line1
... line2
... <<<<<
... +++++
... rline1
... rline2
... <<<<<
... ''')
>>> writer = csv.writer(sys.stdout)
>>> for line in demo:
...     if not line.startswith('+++++'):
...         continue
...     row = []
...     for line in demo:
...         if line.startswith('<<<<<'):
...             break
...         row.append(line.rstrip('\n'))
...     if row:
...         writer.writerow(row)
... 
line1,line2
13
rline1,rline2
15

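The numbers after each written row are the return value of writer.writerow(), echoed by the interactive interpreter: the number of characters written, including the \r\n line terminator.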
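
To see where the 13 comes from, the row can be written to an in-memory buffer instead of sys.stdout. This is a small sketch, assuming Python 3 and the default csv dialect; `buffer` and `written` are names made up for the example.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

# writerow() passes back whatever the underlying write() call returned
written = writer.writerow(['line1', 'line2'])

print(written)                  # 13
print(repr(buffer.getvalue()))  # 'line1,line2\r\n'
```

The 11 visible characters plus the two-character \r\n terminator account for the total. In a script, as opposed to the interactive prompt, the return value is not echoed.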