I have a file containing a million URLs, for example:
http://wonderland.cjfallon.ie/
http://www.youtube.com/
http://www.starfall.com/
http://education.scholastic.co.uk/
http://www.scoilnet.ie/
http://www.nessy.com/
http://www.senteacher.org/
http://scoop.it/
http://www.moviemaker.com/
http://learni.st/
http://www.twitter.com/
http://www.facebook.com/
http://www.gutenberg.org/
http://www.gutenberg.org/cache/epub/42361/pg42361.txt
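With a million URLs, it is better to stream the file than to load it all into memory. Below is a minimal sketch. It assumes one URL per line, as in the sample above, and the function name iter_urls is hypothetical:

```python
from typing import Iterable, Iterator


def iter_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield one URL per non-blank line, lazily (hypothetical helper).

    Accepts an open file object, so the whole file is never held
    in memory at once.
    """
    for line in lines:
        url = line.strip()  # drop the trailing newline and stray spaces
        if url:             # skip blank lines
            yield url
```

Typical use is `with open("urls.txt") as f:` followed by `for url in iter_urls(f): ...`. Here "urls.txt" is only a placeholder filename.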
I want to crawl them. The job is bound by network I/O, so I want to use either multiple threads or gevent.
My multithreaded version works: https://gist.github.com/young001/5449751
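Because the work is network-bound, a bounded thread pool from the standard library is one way to get the concurrency. This sketch is not the code from the gist. It assumes Python 3 with urllib. The names fetch_status and crawl, the 10-second timeout, and the pool size of 50 are illustrative choices. The pool keeps only a limited window of requests in flight, so the million URLs are not all submitted at once. The per-request timeout stops a hung connection from blocking a worker indefinitely.

```python
import urllib.request
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Result = Tuple[str, Optional[int], Optional[BaseException]]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of url (hypothetical helper).

    urllib follows redirects and raises HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def crawl(urls: Iterable[str],
          fetch: Callable[[str], int] = fetch_status,
          workers: int = 50) -> Iterator[Result]:
    """Fetch urls concurrently, yielding (url, status, error) as each finishes.

    At most 2 * workers requests are submitted but not yet reported at any
    time, so a huge input iterable is consumed lazily.
    """
    source = iter(urls)
    pending: Dict[Future, str] = {}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        def submit_next() -> None:
            url = next(source, None)
            if url is not None:
                pending[pool.submit(fetch, url)] = url

        for _ in range(2 * workers):  # fill the initial window
            submit_next()

        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                url = pending.pop(fut)
                err = fut.exception()  # future is done, so this does not block
                if err is None:
                    yield url, fut.result(), None
                else:
                    yield url, None, err
                submit_next()  # keep the window full
```

Each result is a (url, status, error) tuple. A single failing URL can therefore be logged or retried without stopping the whole run.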
But the gevent version (https://gist.github.com/young001/baa3eebbf7342c5ac077) always fails with:
status is 200
status is 200
Internal error in evhttp
the url is down http://web2.socialcomputingmagazine.com/the_social_graph_issues_and_strategies_in_2008.htm
the reason
status is 200
status is 200
status is 200
status is 200
status is 200
status is 200
status is 301
status is 200
status is 301
status is 200
status is 200
Internal error in evhttp
Then it stalls. Why does this happen?
Any help?
Everything seems like it should work smoothly, but it doesn't, and it's driving me crazy.

Answer:
I was able to reproduce it here after fixing your sample.
Basically, this seems to be a gevent bug: it sometimes prints "Internal error in evhttp". This is the code in gevent that prints it:
# sometimes this happens, don't know why
sys.stderr.write("Internal error in evhttp\n")
You will have to debug it, switch to something else, or simply retry when it fails.
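For the "retry when it fails" option, one approach is a small generic wrapper with exponential backoff. This is a sketch. with_retries, its defaults, and the injectable sleep parameter are illustrative and are not part of gevent. By default it retries on OSError, which covers urllib's URLError (including HTTPError) and socket timeouts. It only helps when a failure surfaces as an exception. A request that never completes, like the stall in the question, also needs a timeout on the request itself.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once all attempts are used up. Exceptions
    not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

To use it, wrap one fetch in a lambda, for example `with_retries(lambda: fetch_one(url))`. Here fetch_one is a hypothetical name for whatever function performs a single request.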