I am trying to parse a text file with the following format:
+++++
line1
line2
<<<<<
+++++
rline1
rline2
<<<<<
where +++++ marks the start of a record and <<<<< marks the end of a record.
Now I want to write the whole text to a CSV file in the following format:
line1, line2
rline1, rline2
This is what I have tried so far:
lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
output_lines =[]
for line in lines:
    if (line == "+++++") or not(line == "<<<<<") :
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")
print (output_lines)
I am not sure how to proceed from here.
Answer 0 (score: 1)
Maybe something like this?
from itertools import groupby
import csv
lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
# remove the +++++s, so that only the <<<<<s indicate line breaks
# (compare with !=; "is not" tests identity, not equality)
cleaned_list = [x for x in lines if x != "+++++"]
# separate at <<<<<s
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]
# newline='' lets the csv module control the line endings (Python 3)
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
# read the file back to check the result
with open('result.csv', newline='') as f:
    print(f.read())
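To see what the groupby step produces without touching the disk, here is a minimal sketch that writes to an in-memory buffer instead of result.csv. The helper name split_records and its start/end parameters are hypothetical, introduced only for illustration. Note that with this approach a record with no lines between its markers simply disappears, because adjacent <<<<< markers fall into the same group.

```python
from itertools import groupby
import csv
import io

def split_records(lines, start="+++++", end="<<<<<"):
    """Hypothetical helper: return one list of lines per record."""
    # Drop the start markers so that only the end markers separate records.
    cleaned = [x for x in lines if x != start]
    # groupby yields alternating runs keyed by "is this an end marker?";
    # keep only the runs that are not end markers.
    return [list(group)
            for is_end, group in groupby(cleaned, lambda x: x == end)
            if not is_end]

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
rows = split_records(lines)

# Write to an in-memory buffer rather than a real file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(rows)
print(buffer.getvalue())
```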
Answer 1 (score: 0)
Collect lines in a nested loop until the end-of-record marker, then write the resulting list to the CSV file:
import csv
with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue
        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))
        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)
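The outer loop takes lines from the input file but skips anything that does not look like the start of a record. Once a start is found, a second, inner loop draws more lines from the same file object and, as long as they do not look like the end of a record, adds them to a list object (without the line separator).
When the end of a record is found, the inner loop ends, and if any lines were collected in the row list, they are written to the CSV file as one row.
Demo: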
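The nested loops work because both of them advance the same iterator, so the outer loop resumes where the inner loop stopped. A minimal sketch of that behaviour, using an io.StringIO object as a stand-in for an open file (the names seen_outer and seen_inner are made up for this illustration):

```python
import io

# Stand-in for an open text file; iterating over it yields one line at a time.
infh = io.StringIO("a\nb\nc\nd\n")

seen_outer = []
seen_inner = []
for line in infh:
    seen_outer.append(line.rstrip('\n'))
    # The inner loop continues from the current position of the same iterator.
    for line in infh:
        seen_inner.append(line.rstrip('\n'))
        if line.startswith('c'):
            break  # hand control back to the outer loop

print(seen_outer)
print(seen_inner)
```

The outer loop only ever sees the lines that the inner loop did not consume. This relies on the object being its own iterator, as file objects are; looping over a list this way would restart from the beginning in the inner loop.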
>>> import csv
>>> from io import StringIO
>>> import sys
>>> demo = StringIO('''\
... +++++
... line1
... line2
... <<<<<
... +++++
... rline1
... rline2
... <<<<<
... ''')
>>> writer = csv.writer(sys.stdout)
>>> for line in demo:
...     if not line.startswith('+++++'):
...         continue
...     row = []
...     for line in demo:
...         if line.startswith('<<<<<'):
...             break
...         row.append(line.rstrip('\n'))
...     if row:
...         writer.writerow(row)
...
line1,line2
13
rline1,rline2
15
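The number after each written row is the number of characters written, including the \r\n line terminator. It is the return value of writer.writerow(), which passes along what the underlying write() call returned, and the interactive interpreter echoes it.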
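As a small check of where those numbers come from, the sketch below captures the return value of writer.writerow() instead of letting the interpreter echo it. It writes to an io.StringIO buffer, which is an assumption made here so that the result can be inspected; the demo above writes to sys.stdout.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

# writerow() returns whatever the underlying write() call returned;
# for a text stream that is the number of characters written.
count = writer.writerow(['line1', 'line2'])

print(count)
print(repr(buffer.getvalue()))
```

Assigning the result to a variable, as done here with count, also keeps the number out of an interactive session's output.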