I'm currently using the following code to parse sections of text out of some .txt files:
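import os
import sys

def ParseFile(path, filename):
    # Read the whole file; the code is the file name without its extension.
    with open(os.path.join(path, filename)) as f:
        content = f.read()
    code = filename.split('.')[0]
    pattenstart = ''  # start-of-section marker
    pattenend = ''    # end-of-section marker
    for catlog in CATLOG:
        i = content.index(pattenstart)
        j = content.index(pattenend)
        info = content[i:j]
        yield (catlog, code, info)
        sys.stdout.write('.')  # progress indicator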
where info is a multi-line block of text.
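The content.index(...) slicing in ParseFile can be illustrated with a minimal Python 3 sketch. The markers BEGIN and END and the helper name extract_between are hypothetical stand-ins, since the real start/end patterns aren't shown. Note that str.index raises ValueError when a marker is missing, that content[i:j] keeps the start marker but excludes the end marker, and that this variant (unlike ParseFile) only looks for the end marker after the start marker.

```python
def extract_between(content, start_marker, end_marker):
    """Return the text from start_marker up to (not including) end_marker.

    Hypothetical helper mirroring content[i:j] in ParseFile; raises
    ValueError if either marker is missing, just like str.index.
    """
    i = content.index(start_marker)
    # Search for the end marker only after the start marker.
    j = content.index(end_marker, i + len(start_marker))
    return content[i:j]


sample = "header\nBEGIN\nline one\nline two\nEND\nfooter\n"
info = extract_between(sample, "BEGIN", "END")
print(repr(info))  # 'BEGIN\nline one\nline two\n'
```

The resulting info spans several lines, which is exactly what makes the CSV output below tricky.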
Now I want to write out a CSV file like this:
code    info
***     ****
        ****
        ****
***     ****
        ****
        ****
I tested this with a script, but it only produces a file like this:
code    info
***     ****
***********
**********
Here is my test script:
import os
import time
from collections import defaultdict

time1 = time.time()
subfix = '_ALL.csv'
d = defaultdict(list)  # category -> list of (code, info) pairs
for path in [PATH1, PATH2]:
    print 'Parsing', path
    filenames = os.listdir(path)
    for filename in filenames:
        print 'Parsing', filename
        for item in ParseFile(path, filename):
            d[item[0]].append((item[1], item[2]))
        print  # end the line of progress dots
for k in d.keys():
    out_file = open(DESTFILEPATH + k + subfix, 'w')
    for code, info in sorted(set(d[k])):
        out_file.write(code + '\t' + info + '\n')
    out_file.close()
print 'Done in %0.1f seconds' % (time.time() - time1)
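The broken output is what the write call above produces when info spans several lines: code + '\t' + info puts a tab in front of the first line only, and every newline embedded in info starts a new line at column one, outside the info column. A minimal Python 3 reproduction with placeholder values:

```python
code = "A1"
info = "first\nsecond\nthird"  # placeholder multi-line info

# Same concatenation as the test script: only the first line gets the tab.
written = code + "\t" + info + "\n"
lines = written.splitlines()
print(lines)  # ['A1\tfirst', 'second', 'third']
```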
How can I fix this?
Answer (score: 3)
Python has the csv module, which makes what you want to do much easier. I suggest taking a look at it.
For example:
import csv

with open('somefile.csv', 'w') as f:
    output = csv.writer(f, delimiter='\t')
    output.writerows([
        ['code', 'info'],
        ['****', '****'],
        [None, '****'],
        [None, '****'],
        [None, '****'],
        ['****', '****'],
        [None, '****']
    ])
This produces:
code    info
****    ****
        ****
        ****
        ****
****    ****
        ****
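The blank first column comes from how csv.writer handles None: it writes None as an empty string. A quick Python 3 check, writing to an in-memory io.StringIO buffer instead of a file (the row values are placeholders):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
# None in the first column becomes an empty field.
writer.writerows([["code", "info"], ["A1", "first"], [None, "second"]])
print(repr(buf.getvalue()))  # 'code\tinfo\nA1\tfirst\n\tsecond\n'
```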
Edit:
If your data isn't already in the right shape, you just need to transform it so that it is:
import csv
from itertools import izip_longest
from itertools import chain

data = [("key", ["value", "value"]), ("key", ["value", "value"])]
with open('somefile.csv', 'w') as f:
    output = csv.writer(f, dialect='excel-tab')
    output.writerows(
        chain.from_iterable(
            izip_longest([key], values) for key, values in data
        )
    )
This produces:
key     value
        value
key     value
        value
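izip_longest exists only in Python 2; in Python 3 the same function is itertools.zip_longest. Below is a Python 3 sketch that applies the same idea directly to the question's (code, info) pairs, using info.splitlines() as the values list. The pairs are placeholders, and the in-memory buffer stands in for a real file (for which the csv docs recommend open(path, 'w', newline='')).

```python
import csv
import io
from itertools import chain, zip_longest

# Placeholder (code, info) pairs, shaped like one of the question's d[k] lists.
pairs = [("A1", "first\nsecond"), ("B2", "x\ny\nz")]

buf = io.StringIO()
output = csv.writer(buf, dialect="excel-tab")
output.writerow(["code", "info"])
# splitlines() turns each info string into the values that zip_longest
# pairs with a one-element [code] list; missing codes are padded with None.
output.writerows(
    chain.from_iterable(
        zip_longest([code], info.splitlines()) for code, info in sorted(set(pairs))
    )
)
print(repr(buf.getvalue()))
# 'code\tinfo\r\nA1\tfirst\r\n\tsecond\r\nB2\tx\r\n\ty\r\n\tz\r\n'
```

The excel-tab dialect uses a tab delimiter and \r\n line endings, which is why the rows end in \r\n.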