Extracting data and transposing it in Python

Time: 2014-11-03 01:55:57

Tags: python csv extract delimiter tab-delimited

I have a text file from which I extracted the region between two strings. The extracted region looks like this:

title   "A" "B" "C" "D" "E" "F" 
number  "G1"    "G2"    "G3"    "G4"    "G5"    "G6"
data "aaa,bbb"  "sss,ddd"   "fff,ggg"   "rrr,eee"   "aaa,ooo"   "ggg,aaa"

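I want to write a CSV file. However, even when I specify "\t" as the delimiter, the comma-separated values are split into separate cells within a row, and the tabs break the data onto new lines, like this: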

title   
"A" 
"B" 
"C" 
"D" 
"E" 
"F" 
number  
"G1"    
"G2"    
"G3"    
"G4"    
"G5"    
"G6"
data 
"aaa    bbb"    
"sss    ddd"    
"fff    ggg"    
"rrr    eee"    
"aaa    ooo"    
"ggg    aaa"

I need it to look like this:

title   A   B   C   D   E   F   
number  G1  G2  G3  G4  G5  G6
data    aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa

That is, each value in its own cell within a single row, separated by tabs. I would appreciate any help.

2 answers:

Answer 0: (score: 0)

infile.csv:

title   "A" "B" "C" "D" "E" "F" 
number  "G1"    "G2"    "G3"    "G4"    "G5"    "G6"
data    "aaa,bbb"   "sss,ddd"   "fff,ggg"   "rrr,eee"   "aaa,ooo"   "ggg,aaa"

outfile.csv:

title   A   B   C   D   E   F   
number  G1  G2  G3  G4  G5  G6
data    aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa

Code:

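import csv

# On Python 3, newline='' lets the csv module handle line endings itself.
with open('infile.csv', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    # The reader splits on tabs only and strips the surrounding double quotes,
    # so a value such as "aaa,bbb" stays in a single field.
    for row in csv.reader(infile, delimiter='\t', quotechar='"'):
        writer.writerow(row)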
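
To see what the reader and writer do to the quoted fields without touching any files, here is a minimal sketch that runs the same settings over an in-memory string. The `raw` sample is a shortened, hypothetical version of the input above, and `lineterminator='\n'` is chosen here only to keep the result easy to compare; it assumes Python 3.

```python
import csv
import io

# Hypothetical, shortened sample: tab-separated fields wrapped in double quotes.
raw = 'title\t"A"\t"B"\ndata\t"aaa,bbb"\t"sss,ddd"\n'

# Split on tabs only; the quotes are removed and the commas are left alone.
rows = list(csv.reader(io.StringIO(raw), delimiter='\t', quotechar='"'))

# Write the rows back out, tab-separated, into an in-memory buffer.
out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
writer.writerows(rows)
result = out.getvalue()

print(rows)
print(result)
```

Because the comma is not the delimiter here, the writer has no reason to re-quote values such as `aaa,bbb`, so they come out as plain text in one cell.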

Answer 1: (score: 0)

Use a regular expression:

import re

with open('your_file.txt') as f:
    lines = f.readlines()
for line in lines:
    # Match a word, optionally followed by a comma and another word.
    print(" ".join(re.findall(r'\w+,?\w*', line)))

Output:

title A B C D E F
number G1 G2 G3 G4 G5 G6
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa

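readlines() reads your file as a list of lines, and then I loop over them looking for the pattern. Once you have the matches, you can format them however you like.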
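
As one possible way to format the matches, the sketch below joins them with tabs instead of spaces, which is the layout the question asks for. The helper name `to_tab_separated` and the sample line are made up for illustration. Note that this pattern only keeps one comma per field together: a value such as `a,b,c` would come back as `a,b` and `c`.

```python
import re

def to_tab_separated(line):
    """Extract the fields from one line and join them with tabs (hypothetical helper)."""
    # Same pattern as above: a word, optionally followed by a comma and another word.
    fields = re.findall(r'\w+,?\w*', line)
    return "\t".join(fields)

# Hypothetical sample line in the same shape as the question's data.
sample = 'data\t"aaa,bbb"\t"sss,ddd"'
print(to_tab_separated(sample))
```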