Filtering a list before writing to CSV in Python

Asked: 2014-08-21 01:29:32

Tags: python list csv networkx bipartite

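I wrote a function that projects a bipartite edge list into a one-mode edge list, and it all works. However, my current plan is to add all of these edges to a list, load that list into a pandas DataFrame, filter it by edge weight to create new DataFrames, and then write those DataFrames to CSV.

This worked well until my data became too large to hold in RAM.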

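I think that instead of appending the one-mode edges to a list, I should write the contents of folded straight to CSV, and perhaps skip adding that data to a list altogether. I would also like to filter what I write to CSV, so that only rows with a weight greater than or equal to 3 are written.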

Data:

E1,Brenda Rogers
E1,Evelyn Jefferson
E1,Laura Mandeville
E10,Nora Fayette
E10,Helen Lloyd
E10,Katherina Rogers
E10,Myra Liddel
E10,Sylvia Avondale
E11,Flora Price
E11,Nora Fayette
E11,Helen Lloyd
E11,Olivia Carleton
E12,Nora Fayette
E12,Verne Sanderson
E12,Helen Lloyd
E12,Katherina Rogers
E12,Myra Liddel
E12,Sylvia Avondale
E13,Nora Fayette
E13,Katherina Rogers
E13,Sylvia Avondale
E14,Nora Fayette
E14,Katherina Rogers
E14,Sylvia Avondale
E2,Evelyn Jefferson
E2,Laura Mandeville
E2,Theresa Anderson
E3,Brenda Rogers
E3,Charlotte McDowd
E3,Frances Anderson
E3,Evelyn Jefferson
E3,Laura Mandeville
E3,Theresa Anderson
E4,Brenda Rogers
E4,Charlotte McDowd
E4,Evelyn Jefferson
E4,Theresa Anderson
E5,Brenda Rogers
E5,Charlotte McDowd
E5,Frances Anderson
E5,Evelyn Jefferson
E5,Ruth DeSand
E5,Eleanor Nye
E5,Laura Mandeville
E5,Theresa Anderson
E6,Brenda Rogers
E6,Nora Fayette
E6,Frances Anderson
E6,Evelyn Jefferson
E6,Eleanor Nye
E6,Laura Mandeville
E6,Pearl Oglethorpe
E6,Theresa Anderson
E7,Brenda Rogers
E7,Charlotte McDowd
E7,Nora Fayette
E7,Verne Sanderson
E7,Ruth DeSand
E7,Helen Lloyd
E7,Eleanor Nye
E7,Laura Mandeville
E7,Sylvia Avondale
E7,Theresa Anderson
E8,Brenda Rogers
E8,Verne Sanderson
E8,Frances Anderson
E8,Dorothy Murchison
E8,Evelyn Jefferson
E8,Ruth DeSand
E8,Helen Lloyd
E8,Eleanor Nye
E8,Katherina Rogers
E8,Laura Mandeville
E8,Myra Liddel
E8,Pearl Oglethorpe
E8,Sylvia Avondale
E8,Theresa Anderson
E9,Flora Price
E9,Nora Fayette
E9,Verne Sanderson
E9,Dorothy Murchison
E9,Evelyn Jefferson
E9,Ruth DeSand
E9,Olivia Carleton
E9,Katherina Rogers
E9,Myra Liddel
E9,Pearl Oglethorpe
E9,Sylvia Avondale
E9,Theresa Anderson

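How can I change my code to write directly to CSV and skip adding the edges to the folded list, while writing only the edges with a weight greater than or equal to 3?

Below is the code, which adds all the edges to a list and then writes the list to CSV: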

import csv
import networkx as nx
from networkx.algorithms import bipartite

def fold_network(input_file):

    # load text file into a dict with head as keys
    header = ['Event','Name']        
    rawData = [{key: value for (key, value) in zip(header, line.strip().split(','))} for line in open(input_file)]

    # create edgelist for Name -x- Event relationships
    edgelist = []
    for i in rawData:
        edgelist.append(
        (i['Event'],
        i['Name'])    
        )

    # create a unique list of Name and Event for nodes
    Event = sorted(set([i['Event'] for i in rawData]))
    Name = sorted(set([i['Name'] for i in rawData]))

    # add nodes and edges to a graph
    B = nx.Graph()
    B.add_nodes_from(Event, bipartite=0)
    B.add_nodes_from(Name, bipartite=1)
    B.add_edges_from(edgelist)

    # create bipartite projection graph
    name_nodes, event_nodes = bipartite.sets(B)
    event_nodes = set(n for n,d in B.nodes(data=True) if d['bipartite']==0)
    name_nodes = set(B) - event_nodes

    # project graph and write projected graph's edgelist to a list
    seen = set()
    folded = []
    for u in name_nodes:
    #    seen=set([u]) # print both u-v, and v-u
        seen.add(u) # don't print v-u
        unbrs = set(B[u])
        nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
        for v in nbrs2:
            vnbrs = set(B[v])
            common = unbrs & vnbrs
            weight = len(common)
            row = u, v, weight
            folded.append(row)

    # write only the folded edges with weight greater than or equal to 3 to CSV
    # (Python 3: text mode with newline=''; on Python 2 use mode 'wb' instead)
    with open('outfile.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for i in folded:
            if i[2] >= 3:
                writer.writerow(i)

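1 Answer:

Answer 0 (score: 1)

Well, the answer to the main question (and there is a good reason why you should limit a post to a single question) is simple: you just need to rewrite this small piece of code: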

    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        folded.append(row)

to something like:

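    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        if weight >= 3:  # keep only edges with weight >= 3
            row = u, v, weight
            # append the formatted row to the CSV instead of storing it in a list
            f = open('outfile.csv', 'a')
            f.write('%s,%s,%d\n' % row)
            f.close()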

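Of course, the row has to be formatted properly before it is written, and you probably don't need to open and close the file handle for every row, but with this approach you don't have to build up in memory a large amount of data that you don't need.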
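
To illustrate the point about not reopening the file for every row, here is a minimal sketch that opens the output once and streams each qualifying edge through csv.writer. It is only one possible arrangement, not the original function: the names iter_folded_edges and write_folded_edges are made up for this example, and plain dictionaries of sets stand in for the networkx graph B so that the snippet runs on its own. It assumes Python 3 and that the input is an iterable of (event, name) pairs.

```python
import csv
from collections import defaultdict


def iter_folded_edges(pairs, min_weight=3):
    """Yield (u, v, weight) for pairs of names sharing at least min_weight events.

    Hypothetical helper: dicts of sets replace the networkx graph B here.
    """
    events_of = defaultdict(set)  # name  -> events that name attended
    names_at = defaultdict(set)   # event -> names present at that event
    for event, name in pairs:
        events_of[name].add(event)
        names_at[event].add(name)

    seen = set()
    for u in sorted(events_of):
        seen.add(u)  # emit each undirected edge once (u-v but not v-u)
        # names reachable through a shared event, minus those already handled
        nbrs2 = {n for e in events_of[u] for n in names_at[e]} - seen
        for v in sorted(nbrs2):
            weight = len(events_of[u] & events_of[v])
            if weight >= min_weight:
                yield u, v, weight


def write_folded_edges(pairs, out_path, min_weight=3):
    """Stream the filtered edges to out_path and return the number of rows written."""
    count = 0
    # The file is opened once; newline='' is what the csv module expects in Python 3.
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        for row in iter_folded_edges(pairs, min_weight):
            writer.writerow(row)  # csv.writer handles quoting and formatting
            count += 1
    return count
```

Because iter_folded_edges is a generator, the folded edge list never exists in memory as a whole; only the bipartite structure itself is held, as in the original function. Assuming the event/name pairs sit in a two-column file (say input.csv, a hypothetical name), the pairs could be supplied by passing a csv.reader over that open file as the first argument.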