The code below successfully converts a .tsv file to a .csv file. However, when the file is large (over 1 GB), the read function raises a MemoryError.
import re

# tsv.read() loads the entire file into memory, which is what fails on large inputs
with open('tsv.tsv', 'r') as tsv:
    fileContent = tsv.read()
fileContent = re.sub("\t", ",", fileContent)  # convert from tab to comma
with open("csv.csv", "w") as csv_file:
    csv_file.write(fileContent)
I know that to read a large file I can use the following code:
with open("data.txt") as myfile:
    for line in myfile:
        pass  # handle one line at a time
But I don't know how to combine these two pieces of code into one that works for converting a large .tsv file to a .csv file.
Answer 0 (score: 2)
Just glue the two snippets together:
import re

with open("data.txt", 'r') as myfile:
    with open("csv.csv", 'w') as csv_file:
        for line in myfile:
            # Replace tabs one line at a time so memory use stays small
            fileContent = re.sub("\t", ",", line)
            csv_file.write(fileContent)
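One caveat with replacing tabs directly: if a field already contains a comma, the resulting CSV will have an extra column on that row. A minimal sketch of the same line-by-line idea using the standard csv module, which quotes such fields, is shown below. The function name tsv_to_csv and its parameters are hypothetical, and the sketch assumes the TSV follows the csv module's default quoting rules.

```python
import csv


def tsv_to_csv(src_path, dst_path):
    """Stream a tab-separated file into a comma-separated file row by row."""
    # newline="" lets the csv module manage line endings itself.
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # quotes any field that contains a comma
        for row in reader:
            # Only one row is held in memory at a time.
            writer.writerow(row)
```

Calling tsv_to_csv('tsv.tsv', 'csv.csv') would then do the conversion without loading the whole file into memory.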
Answer 1 (score: 0)
For large files, use pandas rather than pure Python:
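import pandas as pd

# Read the TSV in chunks so the whole file is never held in memory
dfs = pd.read_csv('file.tsv', sep='\t', chunksize=50)
for i, df in enumerate(dfs):
    # Overwrite on the first chunk, append afterwards; write the header only once
    df.to_csv('file.csv', sep=',', mode='w' if i == 0 else 'a', header=(i == 0), index=False)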