I have a text file from which I extracted the region between two strings. The extracted region looks like this:
title "A" "B" "C" "D" "E" "F"
number "G1" "G2" "G3" "G4" "G5" "G6"
data "aaa,bbb" "sss,ddd" "fff,ggg" "rrr,eee" "aaa,ooo" "ggg,aaa"
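I want to write this to a CSV file. However, even when I specify "\t" as the delimiter, the commas are split into separate cells within a row, and the tabs push the data onto new rows, like this: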
title
"A"
"B"
"C"
"D"
"E"
"F"
number
"G1"
"G2"
"G3"
"G4"
"G5"
"G6"
data
"aaa bbb"
"sss ddd"
"fff ggg"
"rrr eee"
"aaa ooo"
"ggg aaa"
What I need is this:
title A B C D E F
number G1 G2 G3 G4 G5 G6
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa
with each value in its own cell on a single row, separated by tabs. I'd appreciate any help.
Answer 0 (score: 0)
infile.csv:
title "A" "B" "C" "D" "E" "F"
number "G1" "G2" "G3" "G4" "G5" "G6"
data "aaa,bbb" "sss,ddd" "fff,ggg" "rrr,eee" "aaa,ooo" "ggg,aaa"
outfile.csv:
title A B C D E F
number G1 G2 G3 G4 G5 G6
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa
Code:
In [40]: import csv

In [41]: with open('infile.csv') as infile, open('outfile.csv', 'w') as outfile:
   ....:     writer = csv.writer(outfile, delimiter='\t')
   ....:     for row in csv.reader(infile, delimiter='\t', quotechar='"'):
   ....:         writer.writerow(row)
   ....:
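As a self-contained check of the same idea, here is a minimal sketch that runs the reader/writer pair over in-memory strings instead of files. The helper name `requote_tsv` and the use of `io.StringIO` are my own choices for illustration, and it assumes the input fields really are tab-separated and double-quoted.

```python
import csv
import io


def requote_tsv(text):
    """Parse tab-separated, double-quoted text and re-emit it tab-separated.

    Hypothetical helper for illustration; not part of the csv module.
    """
    out = io.StringIO()
    # lineterminator='\n' replaces the writer's default '\r\n' line ending
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in csv.reader(io.StringIO(text), delimiter='\t', quotechar='"'):
        writer.writerow(row)
    return out.getvalue()


sample = 'title\t"A"\t"B"\ndata\t"aaa,bbb"\t"sss,ddd"\n'
result = requote_tsv(sample)
print(result)
```

The commas should survive unquoted because, with the default minimal quoting, the writer only quotes a field that contains the delimiter (here a tab), the quote character, or a line break.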
Answer 1 (score: 0)
Use a regular expression:
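import re

with open('your_file.txt', 'r') as f:
    lines = f.readlines()  # the whole file as a list of lines
for line in lines:
    # each match is a word, optionally followed by a comma and another word
    print(" ".join(re.findall(r'\w+,?\w*', line)))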
Output:
title A B C D E F
number G1 G2 G3 G4 G5 G6
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa
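readlines() reads your file into a list of lines, and I then loop over that list looking for the pattern. Once you have the matches, you can format them however you like.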
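Since the question asks for tab-separated output, one way to format the matches is to join them with a tab instead of a space. This is a minimal sketch, assuming the same pattern as above and that every field consists only of word characters with at most one comma; `to_tab_line` is a hypothetical helper name.

```python
import re

# Same pattern as above: a word, optionally followed by a comma and more word characters
FIELD = re.compile(r'\w+,?\w*')


def to_tab_line(line):
    """Join the fields found in one input line with tab characters."""
    return "\t".join(FIELD.findall(line))


line = 'data "aaa,bbb" "sss,ddd" "fff,ggg"'
print(to_tab_line(line))
```

Note that this pattern would split a field containing spaces, hyphens, or more than one comma, so the csv-based approach is likely the safer choice for less regular data.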