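I am trying to download Twitter account and follower information, and to visualize the data by building a relationship graph with the NetworkX Python package and Gephi.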
import networkx as nx
import MySQLdb

conn = MySQLdb.connect(host="localhost",  # your host, usually localhost
                       user="root",       # your username
                       passwd="123456",   # your password
                       db="twitterbank")  # name of the database
cur = conn.cursor()

def get_user_info(m):
    cur.execute("SELECT tweeter_name FROM tweets_fetch where tweeter_id=%s" % m)

g = nx.Graph()

def add_node_tw(n, weight=None, time=None, location=None):
    if not g.has_node(n):
        screen_name = get_user_info(n)
        g.add_node(n)
        g.node[n]['weight'] = 1
        g.node[n]["screen_name"] = screen_name
    else:
        g.node[n]['weight'] += 1

def add_edge_tw(n1, n2, weight=None):
    if not g.has_edge(n1, n2):
        g.add_edge(n1, n2)
        g[n1][n2]['weight'] = 1
    else:
        g[n1][n2]['weight'] += 1

# generate set of users
users = set()
cur.execute("SELECT distinct tweeter_id FROM tweets_fetch")
cur.fetchall()
for row in cur:
    users.add(row[0])

g = nx.DiGraph()
for u_id in users:
    add_node_tw(u_id)
    cur.execute("select * from tweeter_followers where tweeter_id=%s" % u_id)
    cur.fetchall()
    for row1 in cur:
        if row1[0] in users:
            add_node_tw(row1[0])
            add_edge_tw(row1[0], row1[1])

nx.write_graphml(g, 'relationship_graphml')
The two tables I created from the downloaded data are:
tweets_fetch: with columns (tweeter_id, tweeter_name, tweet_content, datetime...)
tweeter_followers: with columns (tweeter_id, follower_id)
When I run the code above, the following error appears:
Traceback (most recent call last):
  File "D:\Sepups\eclipse-SDK-3.7.1-win32-x86_64\eclipse\plugins\org.python.pydev_2.7.3.2013031601\pysrc\pydevd.py", line 1397, in <module>
    debugger.run(setup['file'], None, None)
  File "D:\Sepups\eclipse-SDK-3.7.1-win32-x86_64\eclipse\plugins\org.python.pydev_2.7.3.2013031601\pysrc\pydevd.py", line 1090, in run
    pydev_imports.execfile(file, globals, locals) #execute the script
  File "D:\java\python\workspace\tweetsHarvest\src\tweet_graph.py", line 47, in <module>
    add_node_tw(u_id)
  File "D:\java\python\workspace\tweetsHarvest\src\tweet_graph.py", line 24, in add_node_tw
    g.node[n]['weight']+=1
KeyError: 'weight'
Does anyone know how to fix this? I am really new to Python and Gephi. The blog post I followed when writing the code is http://giladlotan.com/blog/mapping-twitters-python-data-science-communities/
Answer 0 (score: 0)
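I wrote a script based on the same code and got the same error, but only with one particular dataset. If your problem is the same as mine, a few rows in your data are causing it. In my case it was only a handful of edges out of several thousand. To find out where it goes wrong, print each row before the add_edge_tw call and wrap the add_edge_tw call in a try/except clause.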
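The print-and-try/except approach could look roughly like the sketch below. The helper name add_edges_logged and the stand-in demo_add_edge are made up for illustration; in the real script you would pass add_edge_tw and the rows fetched from tweeter_followers instead.

```python
def add_edges_logged(rows, add_edge):
    """Call add_edge(row[0], row[1]) for every row and return the rows that raised KeyError."""
    bad_rows = []
    for row in rows:
        # Print the row first so the last line printed before a failure identifies it.
        print("adding edge for row:", row)
        try:
            add_edge(row[0], row[1])
        except KeyError as exc:
            print("  failed with KeyError:", exc)
            bad_rows.append(row)
    return bad_rows


# Hypothetical stand-in for add_edge_tw, used only so this sketch runs on its own.
edge_weights = {}

def demo_add_edge(n1, n2):
    if n2 is None:  # pretend that a row with a missing follower id is the bad data
        raise KeyError("weight")
    edge_weights[(n1, n2)] = edge_weights.get((n1, n2), 0) + 1


rows = [(1, 2), (1, None), (3, 1)]
bad = add_edges_logged(rows, demo_add_edge)
print("problem rows:", bad)
```

Collecting the failing rows instead of stopping at the first one lets the rest of the graph build, so you can inspect the bad rows afterwards.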
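I am sure others who are more experienced with Python and NetworkX can give a better answer, but I hope this helps you get unblocked quickly while you diagnose the problem.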
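One likely cause, judging only from the traceback and not confirmed against the asker's data: g.add_edge(n1, n2) silently creates any endpoint that is not yet in the graph, and it creates it with no attributes. If a follower is first seen as an edge endpoint and later passed to add_node_tw, has_node returns True, the else branch runs, and there is no 'weight' key to increment. A minimal sketch of a more tolerant version follows. It assumes a newer NetworkX release where node attributes are reached through g.nodes (the question uses the older g.node), and it takes screen_name as a parameter instead of querying the database so that it runs on its own.

```python
import networkx as nx

g = nx.DiGraph()

def add_node_tw(n, screen_name=None):
    """Add node n if needed, then count one more sighting of it."""
    if not g.has_node(n):
        g.add_node(n)
    attrs = g.nodes[n]
    # .get() covers nodes that add_edge created implicitly without a weight.
    attrs['weight'] = attrs.get('weight', 0) + 1
    if screen_name is not None:
        attrs['screen_name'] = screen_name

def add_edge_tw(n1, n2):
    """Add the edge n1 -> n2, or bump its weight if it already exists."""
    if g.has_edge(n1, n2):
        g[n1][n2]['weight'] += 1
    else:
        g.add_edge(n1, n2, weight=1)

add_node_tw(1)
add_edge_tw(1, 2)  # node 2 is created implicitly, with no attributes
add_node_tw(2)     # the else branch in the question's code would fail here
```

Another option under the same assumption is to call add_node_tw on both endpoints before add_edge_tw, so that no node is ever created implicitly.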