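I'm trying to find communities among the handles followed by a given set of users. I've collected every handle followed by the users in that set and trimmed the list down to the most relevant ones (that is, I removed handles followed by only a few of the users, which are basically noise). In this case there are 69 given users, and I'm trying to cluster 435 handles.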
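I'm using NetworkX to build a graph of this network. Each followed handle is a node, and every pairwise combination of the handles followed by a given user is an undirected edge. For example, if user1 follows handle1, handle2 and handle3, then handle1, handle2 and handle3 are nodes, and handle1-handle2, handle1-handle3 and handle2-handle3 are edges. I end up with 435 nodes and 81,182 edges.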
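To make the construction concrete, here is a minimal sketch on toy data. The `follows` dictionary is hypothetical and only mirrors the user1 example above; the density figure is computed from the node and edge counts quoted above and nothing else.

```python
from itertools import combinations
from math import comb

import networkx as nx

# Hypothetical toy data: one given user following three handles.
follows = {"user1": ["handle1", "handle2", "handle3"]}

toy_graph = nx.Graph()
for handles in follows.values():
    # Every pair of handles followed by the same user becomes an undirected edge.
    toy_graph.add_edges_from(combinations(handles, 2))

toy_edges = sorted(tuple(sorted(edge)) for edge in toy_graph.edges())

# Density of the real graph, using the counts quoted in the text.
possible_edges = comb(435, 2)       # number of distinct pairs among 435 nodes
density = 81182 / possible_edges    # fraction of possible pairs that are connected
```

With 435 nodes there are 94,395 possible pairs, so 81,182 edges means roughly 86% of all possible edges are present. That is consistent with the graph looking too interconnected.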
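I then export this graph to Gephi for analysis, but it seems too interconnected to extract anything interesting. Running modularity on the graph yields only two huge, useless communities. I've tried various ways of weighting the edges and nodes, but I can't seem to get anything meaningful. Maybe I need to decide that some of the edges are irrelevant, but I'm not sure how. When I look at the edges of each node, the ones with the highest weights really do connect the most closely related handles, but this doesn't come through in the modularity analysis.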
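One possible way to treat low-weight edges as irrelevant is to drop everything below a cutoff and run a weighted modularity method on what remains. The sketch below is only an illustration under assumptions: `prune_weak_edges`, the toy graph and the `0.5` cutoff are all hypothetical, and `greedy_modularity_communities` from NetworkX stands in for Gephi's modularity tool, which uses a different algorithm.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def prune_weak_edges(graph, min_weight):
    """Return a new graph keeping only edges whose weight is >= min_weight."""
    pruned = nx.Graph()
    pruned.add_nodes_from(graph.nodes(data=True))
    pruned.add_edges_from(
        (u, v, data)
        for u, v, data in graph.edges(data=True)
        if data.get("weight", 0.0) >= min_weight
    )
    return pruned


# Hypothetical toy graph: two tight groups joined by many weak edges.
toy = nx.Graph()
group_a, group_b = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
for group in (group_a, group_b):
    for i, u in enumerate(group):
        for v in group[i + 1:]:
            toy.add_edge(u, v, weight=1.0)   # strong within-group edges
for u in group_a:
    for v in group_b:
        toy.add_edge(u, v, weight=0.05)      # weak cross-group edges

# The cutoff is an arbitrary assumption; on real data it would need tuning.
pruned = prune_weak_edges(toy, min_weight=0.5)
communities = [
    set(c) for c in greedy_modularity_communities(pruned, weight="weight")
]
```

On real data the cutoff could instead be chosen from the edge-weight distribution (for example, keeping only the top few percent of edges), and the result would need to be checked against handles that are known to be related.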
My code is below. Can anyone offer guidance on how I can find communities here?
from itertools import combinations

import networkx as nx

# build a network of followedUsers
followedUsersGraph = nx.Graph()

# followedUsersSorted is a Pandas series of handles followed with userid
# as the id and the number of users from the set following as the value
# (.items() replaces the removed Series.iteritems())
for i, user in enumerate(followedUsersSorted.items()):
    followedUsersGraph.add_node(i)
    # .nodes replaces the removed Graph.node attribute
    followedUsersGraph.nodes[i]['user'] = str(user[0])
    followedUsersGraph.nodes[i]['weight'] = int(user[1])

# followedUsersMatrix is a Pandas DataFrame acting as a binary matrix
# with rows of given users and columns of followed handles
# convert the column labels to node ids
followedUsersMatrix.columns = range(len(followedUsersMatrix.columns))

# convert the matrix into a list of (edge, weight) tuples
edgeTuples = []
for _, vector in followedUsersMatrix.iterrows():
    # each user from the set can only provide a total weight of 1
    # the more handles they follow the less weight they contribute to the edge
    edges = list(combinations(vector[vector != 0].index, 2))
    if not edges:
        # a user following fewer than two handles contributes no edges
        continue
    weight = 1.0 / len(edges)
    edgeTuples.extend((edge, weight) for edge in edges)

# add the edges to the graph incrementing the weight for repeated edges
for edge, weight in edgeTuples:
    if followedUsersGraph.has_edge(*edge):
        followedUsersGraph[edge[0]][edge[1]]['weight'] += weight
    else:
        followedUsersGraph.add_edge(*edge, weight=weight)

nx.write_gexf(followedUsersGraph, 'fug.gexf')