MemoryError when writing a huge CSV file

Asked: 2018-05-29 22:00:15

Tags: python python-3.x pandas csv exception

I get a memory error every time I try to write the CSV. The first 5 GB of data work fine, but after that I get a MemoryError.

I don't know why this happens, because I clear each element from memory every time, so it shouldn't occur.
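(Side note for readers who hit the same wall: with ElementTree's iterparse, elem.clear() alone is often not enough. Every parsed child stays referenced by the root element, so "cleared" rows can still accumulate across a multi-gigabyte file. A minimal sketch of the usual workaround, with illustrative names and assuming a flat dump of <row> elements:)

from xml.etree.ElementTree import iterparse

def iter_rows(xml_file):
    # Take the root from the first event so it can be emptied as we go;
    # otherwise every parsed <row> stays attached to it.
    context = iterparse(xml_file, events=('start', 'end'))
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag == 'row':
            yield dict(elem.attrib)
            root.clear()  # also drops the just-finished element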

import csv
from xml.etree.ElementTree import iterparse

import pandas as pd

def writeDataCSV(file):
    try:
        with open('Data/csv/'+file+'.csv', 'w') as fp:
            for evt, elem in iterparse('dumpData/'+file+'.xml', events=('end',)):
                if elem.tag == 'row':
                    element_fields = elem.attrib
                    data = []

                    if file == "Comments":
                        data = commentsXML(element_fields)
                        wr = csv.writer(fp, dialect='excel')
                        wr.writerow(data)
                        elem.clear()
    except UnicodeEncodeError as uniError:
        print(uniError)

    try:
        if file == "Comments":
            df = pd.read_csv('Data/csv/Comments.csv',
                             names=["Id", "PostId", "Score", "Text", "Date", "Time", "UserID"])
            df.to_csv("Data/csv/Comments.csv")
    except UnicodeDecodeError as uniDeError:
        print(uniDeError)

MemoryError
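(Also worth checking: the second pass re-reads the entire CSV into one DataFrame, which can raise MemoryError by itself on a 5 GB file. pandas can stream it instead via read_csv's chunksize parameter. A minimal sketch, writing to an assumed separate output file so the source is not overwritten while it is being read:)

import pandas as pd

cols = ["Id", "PostId", "Score", "Text", "Date", "Time", "UserID"]
with open('Data/csv/Comments_indexed.csv', 'w') as out:  # hypothetical target path
    reader = pd.read_csv('Data/csv/Comments.csv', names=cols, chunksize=100000)
    for i, chunk in enumerate(reader):
        # Emit the header only for the first chunk, then keep appending.
        chunk.to_csv(out, header=(i == 0), index=False)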

1 Answer:

Answer 0: (score: 0)

Your function has too many responsibilities packed inside; it is hard to read, hard to debug, and in general not an example to follow.

My best guess for avoiding the memory error is to split the reading and writing parts of the code into functions of their own, along these lines:

import csv
from xml.etree.ElementTree import iterparse

# FIXME: commentsXML is some global function

def get_data(filename):
    for evt, elem in iterparse('dumpData/'+filename+'.xml', events=('end',)):
        if elem.tag == 'row':
            yield commentsXML(elem.attrib)
            elem.clear()  # free each parsed element, as in the question

def save_stream_to_csv_file(gen, target_csv_filename):
    with open('Data/csv/'+target_csv_filename+'.csv', 'w') as fp:
        wr = csv.writer(fp, dialect='excel')
        for data in gen:
            wr.writerow(data)

gen = get_data('your_source_filename')
save_stream_to_csv_file(gen, 'your_target_filename')

# WONTFIX: 'dumpData/'+filename+'.xml' and
#          'Data/csv/'+target_csv_filename+'.csv' are a bit ugly;
#          os.path.join() and .format() highly welcome (see the sketch below)
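Picking up that WONTFIX note, a quick sketch of the path cleanup it asks for (helper names invented for illustration):

import os

def xml_path(filename):
    # 'dumpData/<filename>.xml' without manual string concatenation
    return os.path.join('dumpData', '{}.xml'.format(filename))

def csv_path(filename):
    # 'Data/csv/<filename>.csv'
    return os.path.join('Data', 'csv', '{}.csv'.format(filename))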