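I have some code that I'm trying to optimize for efficiency. Part of it processes my file and writes each line to a csv immediately after it has been processed. This is ideal because I'm not wasting memory by processing the data and then loading it all into a list just to write out the whole list. If I append all of the processed data to a list, I can write it to csv without any trouble, as shown below under # write folded_data to csv:
Note: the code under # data processing is solid; I need help writing out each row as it is processed.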
import csv

# B (the graph) and name_nodes are defined elsewhere (not shown)

# data processing
seen = set()
folded_data = []
for u in name_nodes:
    # seen = set([u])  # print both u-v and v-u
    seen.add(u)  # don't print v-u
    unbrs = set(B[u])
    nbrs2 = set(n for nbr in unbrs for n in B[nbr]) - seen
    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        folded_data.append(row)

# write folded_data to csv
# (Python 3: text mode with newline=''; Python 2 used 'wb')
with open('out_file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(folded_data)
However, when I try to write out each row as it is processed, I only get the first row in 'out_file.csv'.
import csv

# data processing
seen = set()
for u in name_nodes:
    # seen = set([u])  # print both u-v and v-u
    seen.add(u)  # don't print v-u
    unbrs = set(B[u])
    nbrs2 = set(n for nbr in unbrs for n in B[nbr]) - seen
    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        # write row for each line to csv
        # (Python 3: text mode with newline=''; Python 2 used 'wb')
        with open('out_file.csv', 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(row)
I've tried moving my writing code around to get this working the way I want, but I can't figure it out.
Answer 0 (score: 1)
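I doubt you're getting the first row; you're getting the last row. For every row you write out, you reopen the file, which erases its previous contents. Move the file opening and the csv writer creation outside the loop.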
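A minimal sketch that reproduces this, assuming Python 3 and a throwaway file in a temporary directory (the file name demo.csv, the sample rows, and the helper read_rows are hypothetical): opening with mode 'w' inside the loop truncates the file on every iteration, so only the last row survives, while opening once before the loop keeps every row.

```python
import csv
import os
import tempfile

rows = [("a", "b", 1), ("a", "c", 2), ("b", "c", 3)]  # hypothetical sample rows
path = os.path.join(tempfile.mkdtemp(), "demo.csv")    # hypothetical scratch file


def read_rows(p):
    """Return the CSV file's rows as lists of strings."""
    with open(p, newline="") as f:
        return list(csv.reader(f))


# Wrong: 'w' mode truncates the file every time it is opened,
# so each iteration erases the rows written before it.
for row in rows:
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(row)
after_reopening = read_rows(path)
print(after_reopening)  # [['b', 'c', '3']]  (only the LAST row)

# Right: open the file and create the writer once, outside the loop.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
after_single_open = read_rows(path)
print(after_single_open)
```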
Answer 1 (score: 1)
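Unless your program needs (say) more than half of your system's memory, I wouldn't worry about "wasting" memory. If your CSV is in the multi-gigabyte range (or larger), then it becomes a concern.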
If your csv isn't that large, the file will end up in the OS file cache in memory anyway, unless you have some non-standard kernel settings.
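If you want to check whether holding the rows in a list actually matters for your data, a rough sketch like the one below (Python 3 standard library only; the row generator make_rows, the discarding NullSink, and the row count N are hypothetical stand-ins for your real data) uses tracemalloc to compare the peak Python memory of building the whole list first against streaming rows straight into writerows.

```python
import csv
import tracemalloc

N = 200_000  # hypothetical row count; scale it toward your real data size


def make_rows(n):
    """Yield made-up (u, v, weight) rows."""
    for i in range(n):
        yield i, i + 1, i % 7


class NullSink:
    """File-like object that discards output, so only the rows' memory is measured."""

    def write(self, s):
        return len(s)


def peak_bytes(fn):
    """Run fn() and return the peak memory (bytes) tracemalloc saw while it ran."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()  # also clears the traces, so the next call starts fresh
    return peak


def write_from_list():
    rows = list(make_rows(N))  # every row held in memory first
    csv.writer(NullSink()).writerows(rows)


def write_streaming():
    csv.writer(NullSink()).writerows(make_rows(N))  # one row at a time


list_peak = peak_bytes(write_from_list)
stream_peak = peak_bytes(write_streaming)
print(f"list: {list_peak / 1e6:.1f} MB, streaming: {stream_peak / 1e6:.3f} MB")
```

tracemalloc only counts memory allocated through Python, so treat the numbers as a rough guide: the list figure grows roughly in proportion to the number of rows, while the streaming figure stays small.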
To do it the "efficient" way (i.e., without explicitly storing the data in memory), you need to open the file before the for loop.
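One way to go a step further, sketched here under assumptions rather than taken from the thread, is to move the processing into a generator (the name folded_rows is hypothetical) and hand it to csv.writer's writerows, which accepts any iterable of rows and consumes it one row at a time, so no list is ever built. The sketch assumes B maps each node to an iterable of its neighbours, as the question's code implies; the tiny B and name_nodes below are made-up toy data, and the output goes to an in-memory io.StringIO buffer only so it is easy to inspect. A file opened with open('out_file.csv', 'w', newline='') works the same way.

```python
import csv
import io


def folded_rows(B, name_nodes):
    """Yield (u, v, weight) rows one at a time instead of collecting them in a list.

    Same logic as the question's '# data processing' block; weight is the
    number of neighbours u and v have in common.
    """
    seen = set()
    for u in name_nodes:
        seen.add(u)  # don't emit v-u after u-v
        unbrs = set(B[u])
        nbrs2 = set(n for nbr in unbrs for n in B[nbr]) - seen
        for v in nbrs2:
            common = unbrs & set(B[v])
            yield u, v, len(common)


# Toy bipartite graph (hypothetical data): people -> groups and groups -> people.
B = {
    "alice": ["g1", "g2"],
    "bob": ["g1"],
    "carol": ["g2"],
    "g1": ["alice", "bob"],
    "g2": ["alice", "carol"],
}
name_nodes = ["alice", "bob", "carol"]

buf = io.StringIO()  # stand-in for open('out_file.csv', 'w', newline='')
csv.writer(buf).writerows(folded_rows(B, name_nodes))
print(buf.getvalue())
```

Because nbrs2 is a set, the order of rows within each u is not guaranteed; sort afterwards if a stable order matters.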
Answer 2 (score: 0)
Figured it out with help from @etep and @MarkRansom! I had to open the file and define the writer before the entire for loop.
import csv

# open file and define writer once, before the loop
# (Python 3: text mode with newline=''; Python 2 used 'wb')
with open('out_file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # data processing
    seen = set()
    for u in name_nodes:
        # seen = set([u])  # print both u-v and v-u
        seen.add(u)  # don't print v-u
        unbrs = set(B[u])
        nbrs2 = set(n for nbr in unbrs for n in B[nbr]) - seen
        for v in nbrs2:
            vnbrs = set(B[v])
            common = unbrs & vnbrs
            weight = len(common)
            row = u, v, weight
            # write row for each record
            writer.writerow(row)