Processing each row with a for loop in Python, but only the first row gets written

Asked: 2014-08-27 02:56:12

Tags: python csv read-write

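I have some code that I'm trying to optimize for efficiency. Part of it processes my file and, after processing each row, immediately writes that row to a csv. This is ideal because I don't waste memory by processing the data and then loading it into a list just to write out the whole list. If I append all of the processed data to a list, I can write it to a csv without any trouble, as shown under "# write folded_data to csv" below:

Note: the code under "# data processing" is solid; I need help writing out each row as it is processed.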

# data processing
seen = set()
folded_data = []
for u in name_nodes:
    # seen=set([u]) # print both u-v, and v-u
    seen.add(u) # don't print v-u
    unbrs = set(B[u])
    nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        folded_data.append(row)

# write folded_data to csv (requires "import csv" at the top of the script)
with open('out_file.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(folded_data)

However, when I try to write out each row as it is processed, I only get the first row in 'out_file.csv'.

# data processing
seen = set()
for u in name_nodes:
    # seen=set([u]) # print both u-v, and v-u
    seen.add(u) # don't print v-u
    unbrs = set(B[u])   
    nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        # write row for each line to csv
        with open('out_file.csv', 'wb') as f:
            writer = csv.writer(f)
            writer.writerow(row)

I've tried moving my writing code around to get this to work the way I want, but I can't figure it out.

3 answers:

Answer 0 (score: 1)

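I doubt you're getting the first row; you're more likely getting the last row. For every row you write, you reopen the file, which erases its previous contents. Move the file open and the csv writer creation outside the loop.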
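To see the truncation described above in isolation, here is a minimal sketch. It assumes Python 3 (hence text mode with newline='' rather than the question's 'wb'), and the sample rows and the helper names write_reopening_each_time and read_rows are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the (u, v, weight) tuples.
rows = [("a", "b", 1), ("a", "c", 2), ("b", "c", 3)]

def write_reopening_each_time(path, rows):
    """Reopen the file in write mode for every row, as in the question."""
    for row in rows:
        # Mode "w" truncates the file on every open, discarding earlier rows.
        with open(path, "w", newline="") as f:
            csv.writer(f).writerow(row)

def read_rows(path):
    """Read the CSV back as a list of lists of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f))

path = os.path.join(tempfile.mkdtemp(), "out_file.csv")
write_reopening_each_time(path, rows)
result = read_rows(path)
print(result)  # only the last row written survives
```

Opening with mode "a" (append) inside the loop would keep every row, but it still reopens the file once per row, so opening once outside the loop is the cheaper fix.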

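Answer 1 (score: 1)

Unless your program requires, say, more than half of system memory, I wouldn't worry about "wasting" memory. If your CSV is in the multi-gigabyte range (or larger), then it is a concern.

If your csv isn't that large, the file will end up in the OS file cache in memory anyway, unless you have some non-standard kernel settings.

To do it the "efficient" way (i.e., without explicitly storing the data in memory), you need to open the file before the for loop.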
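One possible way to avoid holding the rows in a list is to wrap the processing in a generator and hand it to writer.writerows, which consumes it row by row. This is a sketch under assumptions, not the approach from the thread: it uses Python 3, the function name fold_rows is made up, and B is replaced by a small hypothetical dict mapping each node to its neighbours.

```python
import csv
import os
import tempfile

def fold_rows(B, name_nodes):
    """Yield (u, v, weight) rows one at a time instead of building a list."""
    seen = set()
    for u in name_nodes:
        seen.add(u)  # skip v-u once u-v has been emitted
        unbrs = set(B[u])
        nbrs2 = {n for nbr in unbrs for n in B[nbr]} - seen
        for v in sorted(nbrs2):  # sorted only to make the row order deterministic
            yield u, v, len(unbrs & set(B[v]))

# Hypothetical stand-in for the graph: node -> list of neighbours.
B = {
    "a": ["x", "y"],
    "b": ["x"],
    "c": ["y"],
    "x": ["a", "b"],
    "y": ["a", "c"],
}
name_nodes = ["a", "b", "c"]

out_path = os.path.join(tempfile.mkdtemp(), "out_file.csv")
with open(out_path, "w", newline="") as f:  # opened once, before any row is produced
    csv.writer(f).writerows(fold_rows(B, name_nodes))

# Read the file back to check what was written.
with open(out_path, newline="") as f:
    written = list(csv.reader(f))
```

The file is still opened exactly once; the generator only changes where the loop lives, which keeps the processing logic separate from the output code.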

Answer 2 (score: 0)

Figured it out with help from @etep and @MarkRansom! I had to open the file and define the writer before the whole for-loop:

import csv

# open the file once and create the writer before the loop
with open('out_file.csv', 'wb') as f:
    writer = csv.writer(f)

    # data processing
    seen = set()
    for u in name_nodes:
        # seen=set([u]) # print both u-v, and v-u
        seen.add(u) # don't print v-u
        unbrs = set(B[u])
        nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
        for v in nbrs2:
            vnbrs = set(B[v])
            common = unbrs & vnbrs
            weight = len(common)
            row = u, v, weight
            # write row for each record
            writer.writerow(row)
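The 'wb' mode above suggests Python 2. On Python 3, csv.writer expects a text-mode file, and the csv module documentation recommends passing newline='' to open. A minimal sketch of the same open-once pattern on Python 3, using made-up sample rows in place of the processing loop:

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the (u, v, weight) tuples.
sample_rows = [("a", "b", 1), ("a", "c", 2), ("b", "c", 3)]

csv_path = os.path.join(tempfile.mkdtemp(), "out_file.csv")

# Open the file once in text mode; newline="" lets csv control line endings.
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    for row in sample_rows:
        writer.writerow(row)  # each row is written as soon as it is produced

# Read the file back to confirm that every row was kept.
with open(csv_path, newline="") as f:
    all_rows = list(csv.reader(f))
```

Values come back as strings when read with csv.reader, so the weights would need converting with int() if they are used numerically afterwards.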