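I wrote a small tool that collects data from Facebook through its API. The tool uses the multiprocessing, Queue, and httplib modules. Here is part of the code:
Main process: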
def extract_and_save(args):
    # Workers read friend IDs from put_queue and write results to get_queue.
    put_queue = JoinableQueue()
    get_queue = Queue()
    for index in range(args.number_of_processes):
        process_name = u"facebook_worker-%s" % index
        grabber = FacebookGrabber(get_queue=put_queue, put_queue=get_queue, name=process_name)
        grabber.start()
    friend_list = get_user_friends(args.default_user_id, ["id"])
    for index, friend_id in enumerate(friend_list):
        put_queue.put(friend_id)
    # Block until every queued ID has been marked with task_done().
    put_queue.join()
    if not get_queue.empty():
        ... save to database ...
    else:
        logger.info(u"There is no data to save")
Worker process:
class FacebookGrabber(Process):
    def __init__(self, *args, **kwargs):
        # __init__ runs in the parent process, before start() is called.
        self.connection = httplib.HTTPSConnection("graph.facebook.com", timeout=2)
        self.get_queue = kwargs.pop("get_queue")
        self.put_queue = kwargs.pop("put_queue")
        super(FacebookGrabber, self).__init__(*args, **kwargs)
        self.daemon = True

    def run(self):
        while True:
            friend_id = self.get_queue.get(block=True)
            try:
                friend_obj = self.get_friend_obj(friend_id)
            except Exception, e:
                logger.info(u"Friend id %s: facebook responded with an error (%s)", friend_id, e)
            else:
                if friend_obj:
                    self.put_queue.put(friend_obj)
            # Mark the item done in every case so join() in the parent can return.
            self.get_queue.task_done()
Shared code:
def get_json_from_facebook(connection, url, kwargs=None):
    # Merge any extra parameters into the URL's existing query string.
    url_parts = list(urlparse.urlparse(url))
    query = dict(urlparse.parse_qsl(url_parts[4]))
    if kwargs:
        query.update(kwargs)
    url_parts[4] = urllib.urlencode(query)
    url = urlparse.urlunparse(url_parts)
    try:
        connection.request("GET", url)
    except Exception, e:
        # The error is only printed; execution continues to getresponse().
        print "<<<", e
    response = connection.getresponse()
    data = json.load(response)
    return data
This code works fine on Ubuntu. But when I run it on Windows 7, I get the message "There is no data to save". The problem is here:
try:
    connection.request("GET", url)
except Exception, e:
    print "<<<", e
It prints the following error: <<< a float is required
Does anyone know how to fix this?
Python version: 2.7.5
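Answer 0 (score: 2)
One of the "gotchas" that occasionally comes up with socket timeout values is that most operating systems expect them to be floats. I believe later versions of the Linux kernel account for this.
Try changing: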
self.connection = httplib.HTTPSConnection("graph.facebook.com", timeout=2)
to:
self.connection = httplib.HTTPSConnection("graph.facebook.com", timeout=2.0)
By the way, that is 2 seconds. The default is usually 5 seconds, so 2 might be a little low.
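If the timeout comes from a config file or a command-line argument, it may help to coerce it to a float in one place instead of relying on every caller to write 2.0. The following is only a minimal sketch of that idea: make_connection is a hypothetical helper, not part of the question's code, and the import fallback is there only so the snippet also runs on Python 3, where httplib is named http.client.

```python
try:
    import httplib  # Python 2 module name, as used in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def make_connection(host, timeout_seconds=2):
    """Hypothetical helper: build an HTTPS connection with a float timeout."""
    # float() accepts ints and numeric strings, so 2, 2.0 and "2" all work.
    timeout = float(timeout_seconds)
    # Constructing the object does not open a socket; that happens on request().
    return httplib.HTTPSConnection(host, timeout=timeout)


conn = make_connection("graph.facebook.com", 2)
print(type(conn.timeout).__name__)
```

In the worker class, the line in __init__ would then read self.connection = make_connection("graph.facebook.com", 2), assuming the helper is importable from the worker module.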