I get a memory error every time I try to write to the CSV. The first ~5 GB of data works fine, but after that I get a MemoryError.
I don't understand why, because I clear each element from memory as I go, so it shouldn't happen.
import csv
from xml.etree.ElementTree import iterparse

import pandas as pd

def writeDataCSV(file):
    try:
        with open('Data/csv/' + file + '.csv', 'w') as fp:
            # filename and commentsXML are defined elsewhere in the script
            for evt, elem in iterparse('dumpData/' + str(filename) + '.xml', events=('end',)):
                if elem.tag == 'row':
                    element_fields = elem.attrib
                    data = []
                    if file == "Comments":
                        data = commentsXML(element_fields)
                    wr = csv.writer(fp, dialect='excel')
                    wr.writerow(data)
                    elem.clear()  # free the parsed element
    except UnicodeEncodeError as uniError:
        print(uniError)

    try:
        if file == "Comments":
            df = pd.read_csv('Data/csv/Comments.csv',
                             names=["Id", "PostId", "Score", "Text", "Date", "Time", "UserID"])
            df.to_csv("Data/csv/Comments.csv")
    except UnicodeDecodeError as uniDeError:
        print(uniDeError)
MemoryError
Answer 0 (score: 0)
Your function has too many responsibilities inside it; it is hard to read, hard to debug, and in general not an example to follow.
My best guess for avoiding the memory error is to separate the reading and writing parts of the code into their own functions, along these lines:
import csv

# FIXME: iterparse, commentsXML are some global functions

def get_data(filename):
    for evt, elem in iterparse('dumpData/' + str(filename) + '.xml', events=('end',)):
        if elem.tag == 'row':
            yield commentsXML(elem.attrib)
            elem.clear()  # keep clearing each element, as in the question, so the parsed tree does not grow

def save_stream_to_csv_file(gen, target_csv_filename):
    with open('Data/csv/' + target_csv_filename + '.csv', 'w') as fp:
        wr = csv.writer(fp, dialect='excel')
        for data in gen:
            wr.writerow(data)

gen = get_data('your_source_filename')
save_stream_to_csv_file(gen, 'your_target_filename')

# WONTFIX: 'dumpData/'+str(filename)+'.xml' and
# 'Data/csv/'+target_csv_filename+'.csv' are a bit ugly
# os.path.join() and .format() highly welcome
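As the WONTFIX note above suggests, the path building can be tidied up with os.path.join() and str.format(). A minimal sketch, assuming the directory names from the code above; the helper names are illustrative, not part of the original answer:

import os

def xml_source_path(filename):
    # hypothetical helper: builds 'dumpData/<filename>.xml'
    return os.path.join('dumpData', '{}.xml'.format(filename))

def csv_target_path(target_csv_filename):
    # hypothetical helper: builds 'Data/csv/<name>.csv'
    return os.path.join('Data', 'csv', '{}.csv'.format(target_csv_filename))

get_data and save_stream_to_csv_file could then call these helpers instead of concatenating the strings inline.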