Question

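I am trying to find communities among the handles followed by a given set of users. I have collected every handle followed by that set of users and pruned the list down to the most relevant ones (that is, I dropped the handles with only a few followers, which are essentially noise). In this case there are 69 given users and 435 handles to cluster.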

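I am using NetworkX to build a graph of this network. Each followed handle is a node, and every pairwise combination of the handles followed by a given user is an undirected edge. For example, if user1 follows handle1, handle2, and handle3, then handle1, handle2, and handle3 are nodes, and handle1-handle2, handle1-handle3, and handle2-handle3 are edges. I end up with 435 nodes and 81,182 edges.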
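To make the construction concrete, here is a minimal sketch of the user1 example above. The `follows` dictionary and the `toy_graph` name are hypothetical stand-ins for illustration, not the real data.

```python
from itertools import combinations

import networkx as nx

# Hypothetical toy data: one user following three handles.
follows = {"user1": ["handle1", "handle2", "handle3"]}

toy_graph = nx.Graph()
for user, handles in follows.items():
    # every followed handle becomes a node
    toy_graph.add_nodes_from(handles)
    # every pair of handles followed by the same user becomes an undirected edge
    toy_graph.add_edges_from(combinations(sorted(handles), 2))

# normalise edge direction so the listing is stable
toy_edges = sorted(tuple(sorted(edge)) for edge in toy_graph.edges())
print(toy_edges)
```

A user who follows n handles therefore adds n(n-1)/2 edges, which is why the graph fills up so quickly.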

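I then export this graph to Gephi for analysis, but it appears to be too interconnected to extract anything interesting. Running modularity on the graph produces only two huge, useless communities. I have tried various ways of weighting the edges and nodes, but I can't get anything meaningful out of it. Perhaps I need to decide that some of the edges are irrelevant, but I'm not sure how. When I look at each node's edges, the ones with the highest weights really do connect the most closely related handles, but that does not show up in the modularity analysis.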
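For context, 81,182 edges out of the 94,395 possible among 435 nodes is a density of about 86%. The "drop the irrelevant edges" idea could be sketched roughly as below, on a hypothetical toy graph rather than the real data. The `MIN_WEIGHT` cutoff is an assumed value chosen only for illustration, and `greedy_modularity_communities` is NetworkX's Clauset-Newman-Moore routine, which is not the same algorithm as Gephi's modularity tool.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical toy graph: two tight groups joined by many weak edges.
group_a = ["a1", "a2", "a3"]
group_b = ["b1", "b2", "b3"]
toy = nx.Graph()
for group in (group_a, group_b):
    for i, u in enumerate(group):
        for v in group[i + 1:]:
            toy.add_edge(u, v, weight=1.0)   # strong edge inside a group
for u in group_a:
    for v in group_b:
        toy.add_edge(u, v, weight=0.05)      # weak edge across groups

MIN_WEIGHT = 0.1  # assumed cutoff, would need tuning on real data

# copy the graph and remove every edge lighter than the cutoff
pruned = toy.copy()
weak_edges = [(u, v) for u, v, w in toy.edges(data="weight") if w < MIN_WEIGHT]
pruned.remove_edges_from(weak_edges)

# weighted modularity communities on the pruned graph
communities = [set(c) for c in greedy_modularity_communities(pruned, weight="weight")]
print(communities)
```

Whether a single global cutoff is appropriate for the real graph is exactly the part I am unsure about.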

My code is below. Can anyone offer guidance on how I might find communities here?

import networkx as nx
from itertools import combinations

# build a network of followedUsers
followedUsersGraph = nx.Graph()

# followedUsersSorted is a Pandas series of handles followed with userid
# as the id and the number of users from the set following as the value
# (Series.iteritems() was removed in pandas 2.0; items() is the replacement)
for i, (handle, count) in enumerate(followedUsersSorted.items()):
    followedUsersGraph.add_node(i, user=str(handle), weight=int(count))

# followedUsersMatrix is a Pandas DataFrame acting as a binary matrix
# with rows of given users and columns of followed handles
# convert the column labels to node ids
followedUsersMatrix.columns = range(len(followedUsersMatrix.columns))

# convert the matrix into a list of (edge, weight) tuples
edgeTuples = []
for _, vector in followedUsersMatrix.iterrows():
    # each user from the set can only provide a total weight of 1
    # the more handles they follow the less weight they contribute to the edge
    edges = list(combinations(vector[vector != 0].index, 2))
    if not edges:
        # a user following fewer than two handles contributes no edges
        # (and would otherwise cause a division by zero)
        continue
    weight = 1.0 / len(edges)
    edgeTuples.extend((edge, weight) for edge in edges)

# add the edges to the graph incrementing the weight for repeated edges
for (u, v), weight in edgeTuples:
    if followedUsersGraph.has_edge(u, v):
        followedUsersGraph[u][v]['weight'] += weight
    else:
        followedUsersGraph.add_edge(u, v, weight=weight)

nx.write_gexf(followedUsersGraph, 'fug.gexf')

Finding communities in a highly connected network using NetworkX and Gephi

0 answers: