Node count mismatch in graph

Time: 2012-04-23 09:34:32

Tags: python mongodb social-networking data-visualization networkx

I have a MongoDB database that stores the following attributes for forum posts:

thread
author (posted in the thread)
children (a list of authors who replied to the post)
child_count (number of children in the list)

I am trying to build a graph with the following kinds of nodes:

thread
author
child authors

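The number of distinct authors in my database is over 30,000, but the graph ends up with an author count of only about 3,000. Put differently, out of roughly 33,000 expected nodes in total, the following code generates only about 5,000. What is going on here?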

for doc in coll.find():

    thread = doc['thread'].encode('utf-8')
    author_parent = doc['author'].encode('utf-8')
    children = doc['children']
    children_count = len(children)
    #print G.nodes()

    #print post_parent, author, doc['thread']
    try:
        if thread in G:
            continue
        else:
            G.add_node(thread, color='red')
            thread_count+=1


        if author_parent in G:
            G.add_edge(author_parent, thread)
        else:
            G.add_node(author_parent, color='green')
            G.add_edge(author_parent, thread, weight=0)
            author_count+=1


        if doc['child_count']!=0:          
            for doc in children:
                if doc['author'].encode("utf-8") in G:
                    print doc['author'].encode("utf-8"), 'in G'
                    G.add_edge(doc['author'].encode("utf-8"), author_parent)
                else:
                    G.add_node(doc['author'].encode("utf-8"),color='green')
                    G.add_edge(doc['author'].encode("utf-8"), author_parent, weight=0)
                    author_count+=1     

    except:
        print "failed"
        nx.write_dot(G,PATH)

    print thread_count, author_count, children_count

1 answer:

Answer 0 (score: 1)

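I found the answer. When the thread is already in G, the continue statement jumps straight to the next iteration, so the author and child authors of every later post in that thread are never added. That is how I was losing so many nodes.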
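
For anyone hitting the same problem, here is a minimal sketch of the loop without the continue. It is written for Python 3 and a current networkx, not the Python 2 code from the question. The build_graph function and the sample_docs list are made up for illustration and stand in for iterating over coll.find(). I also set weight=0 on every edge for simplicity, which differs slightly from the original code, where only edges from newly added authors got a weight.

```python
import networkx as nx


def build_graph(docs):
    """Build a thread/author graph from an iterable of post documents.

    Hypothetical helper: each document is assumed to have the keys
    'thread', 'author' and 'children' (a list of dicts with 'author').
    """
    G = nx.Graph()
    for doc in docs:
        thread = doc['thread']
        author_parent = doc['author']

        # Add the thread node only once, but do NOT skip the rest of the
        # loop body: later posts in the same thread still bring new authors.
        if thread not in G:
            G.add_node(thread, color='red')

        if author_parent not in G:
            G.add_node(author_parent, color='green')
        G.add_edge(author_parent, thread, weight=0)

        # Separate loop variable, so the outer `doc` is not overwritten.
        for child in doc['children']:
            child_author = child['author']
            if child_author not in G:
                G.add_node(child_author, color='green')
            G.add_edge(child_author, author_parent, weight=0)
    return G


# Made-up sample data standing in for coll.find()
sample_docs = [
    {'thread': 't1', 'author': 'alice', 'children': [{'author': 'bob'}]},
    {'thread': 't1', 'author': 'carol', 'children': [{'author': 'dave'}]},
    {'thread': 't2', 'author': 'alice', 'children': []},
]

G = build_graph(sample_docs)
print(G.number_of_nodes())
```

With the original continue in place, the second document would be skipped entirely because 't1' is already in the graph, so 'carol' and 'dave' would never become nodes. The inner loop also uses its own variable name here, since reusing doc for the children makes the code harder to follow.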