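Converting TSV to CSV in Python

Time: 2017-05-29 12:31:35

Tags: python csv type-conversion

I want to convert this file.tsv to CSV. The conversion runs without errors, but the fields are not separated correctly. This is file.tsv: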

protein1 protein2 neighborhood neighborhood_transferred fusion cooccurence homology coexpression coexpression_transferred experiments experiments_transferred database database_transferred textmining textmining_transferred combined_score
9606.ENSP00000003084 9606.ENSP00000301645 0 0 0 0 0 0 0 0 0 0 0 163 129 239

These are the first lines of the resulting file.csv:

"protein1 protein2 neighborhood neighborhood_transferred fusion cooccurence homology coexpression coexpression_transferred experiments experiments_transferred database database_transferred textmining textmining_transferred combined_score"
"9606.ENSP00000003084 9606.ENSP00000301645 0 0 0 0 0 0 0 0 0 0 0 163 129 239"

This is the code:

import csv


print(csv.list_dialects())


with open('File.tsv', 'r', encoding='utf-8', newline='') as fin, \
     open('file2.csv', 'w', encoding='utf-8', newline='') as fout: 

     reader = csv.reader(fin, dialect='excel-tab')
     writer = csv.writer(fout, delimiter=' ')    

     for row in reader:
         writer.writerow(row)

The problem is that the code does not split the fields on spaces; it treats the entire header line as a single field.
The expected result is for the fields to be separated where I have put commas:

protein1,protein2,neighborhood,neighborhood_transferred,fusion,cooccurence,homology,coexpression,coexpression_transferred,experiments,experiments_transferred,database,database_transferred,textmining,textmining_transferred,combined_score
9606.ENSP00000003084,9606.ENSP00000301645,0,0,0,0,0,0,0,0,0,0,0,163,129,239

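1 answer:

Answer 0 (score: 0)

Edit: rewritten after exchanging comments with the OP.

You have told the reader to expect tabs as the delimiter in the input: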

reader = csv.reader(fin, dialect='excel-tab')

But the input contains no tabs, only spaces, so:

reader = csv.reader(fin, delimiter=' ')
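
For completeness, here is a minimal sketch of the whole conversion with that change applied. It also drops `delimiter=' '` from the writer: the writer in the question uses a space as the output delimiter, so a line that was read as one field containing spaces gets quoted as a whole, which matches the quoted lines shown in file.csv. The function name `space_delimited_to_csv` and the shortened sample data are illustrative assumptions, not part of the original code, and `io.StringIO` stands in for the real files.

```python
import csv
import io


def space_delimited_to_csv(fin, fout):
    """Copy single-space-delimited rows from fin to fout as comma-separated rows.

    Hypothetical helper for illustration; fin and fout are open text files.
    """
    reader = csv.reader(fin, delimiter=' ')  # the input uses spaces, not tabs
    writer = csv.writer(fout)                # default 'excel' dialect writes commas
    for row in reader:
        writer.writerow(row)


# Shortened sample in the same shape as file.tsv (assumed data).
sample = (
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000003084 9606.ENSP00000301645 239\n"
)
out = io.StringIO()
space_delimited_to_csv(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the same function can be called inside the `with open(...)` block from the question, keeping `newline=''` on both files.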

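Note that this treats two consecutive spaces as two delimiters with an empty field between them. You cannot tell the reader to ignore repeated delimiters the way Excel does.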
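
If the input may contain runs of spaces, one possible workaround is to skip `csv.reader` for the input and split each line with `str.split()`, which treats any run of whitespace (spaces or tabs) as a single separator. This is only a sketch under the assumption that no field contains embedded whitespace or quoting; the function name `whitespace_to_csv` and the sample data are hypothetical.

```python
import csv
import io


def whitespace_to_csv(fin, fout):
    """Write each whitespace-separated line of fin to fout as a CSV row.

    Hypothetical helper; assumes fields never contain whitespace or quotes.
    """
    writer = csv.writer(fout)  # default dialect: comma-separated output
    for line in fin:
        fields = line.split()  # collapses runs of spaces and tabs
        if fields:             # skip blank lines
            writer.writerow(fields)


# Assumed sample with repeated spaces, a blank line, and a tab.
sample = (
    "protein1  protein2   combined_score\n"
    "\n"
    "9606.ENSP00000003084 9606.ENSP00000301645\t239\n"
)
out = io.StringIO()
whitespace_to_csv(io.StringIO(sample), out)
print(out.getvalue())
```

The trade-off is that this bypasses the csv module's quoting rules on input, so it fits plain numeric tables like this one rather than files with quoted text fields.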