It would be great if someone could help me multithread this script and write the output to a text file. I'm new to coding, so please help me out.
#!/usr/bin/python
from tornado import ioloop, httpclient
from BeautifulSoup import BeautifulSoup
from mechanize import Browser
import requests
import urllib2
import socket
import sys
def handle_request(response):
    print response.code
    global i
    i -= 1
    if i == 0:
        ioloop.IOLoop.instance().stop()

i = 0
http_client = httpclient.AsyncHTTPClient()
for url in open('urls.txt'):
    try:
        br = Browser()
        br.set_handle_robots(False)
        res = br.open(url, None, 2.5)
        data = res.get_data()
        soup = BeautifulSoup(data)
        title = soup.find('title')
        if soup.title is not None:
            print url, title.renderContents(), '\n'
        i += 1
    except urllib2.URLError, e:
        print "Oops, timed out?", '\n'
    except socket.error, e:
        print "Oops, timed out?", '\n'
    except socket.timeout:
        print "Oops, timed out?", '\n'

print 'Processing of list completed, Cheers!!'

try:
    ioloop.IOLoop.instance().start()
except KeyboardInterrupt:
    ioloop.IOLoop.instance().stop()
I'm trying to grep the HTTP titles of a list of hosts.
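Neither suggestion below covers the second half of the question, writing the results to a text file. Here is a minimal sketch of that part, assuming Python 3's standard-library concurrent.futures for the threading; the file names urls.txt and titles.txt, the 10-worker pool size, and the helper names are my assumptions, not anything from the thread:

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def extract_title(html):
    """Return the contents of the first <title> tag, or None if absent."""
    m = re.search(r'<title[^>]*>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else None

def fetch_title(url):
    """Fetch one URL; return (url, title) or (url, error message)."""
    try:
        html = urlopen(url, timeout=2.5).read().decode('utf-8', 'replace')
        return url, extract_title(html)
    except Exception as exc:
        return url, 'Oops, timed out? (%s)' % exc

def main():
    # Read the URL list, fetch titles on 10 threads, write results to titles.txt.
    with open('urls.txt') as f:
        urls = [line.strip() for line in f if line.strip()]
    with ThreadPoolExecutor(max_workers=10) as pool, open('titles.txt', 'w') as out:
        for url, title in pool.map(fetch_title, urls):
            out.write('%s: %s\n' % (url, title))

# main()  # uncomment to run against a real urls.txt
```

pool.map returns results in input order, so the output file lines correspond line-for-line to urls.txt even though the fetches run concurrently.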
Answer (score: 2)
The basic idea you have already implemented is a non-blocking HTTP client.
from tornado import ioloop, httpclient

def handle_request(response):
    if response.error:
        print "Error:", response.error
    else:
        print response.body

http_client = httpclient.AsyncHTTPClient()
for url in ["http://google.com", "http://twitter.com"]:
    http_client.fetch(url, handle_request)
ioloop.IOLoop.instance().start()  # start the event loop so the callbacks run
You can iterate over your URLs, and the callback will be invoked as soon as the response for a particular URL becomes available.
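The `i` counter in the question's code is reaching for a standard companion pattern here: count outstanding requests and stop the IOLoop once the last callback has fired. A sketch of that pattern, where `Countdown` is a hypothetical helper name of my own, not a tornado API:

```python
class Countdown(object):
    """Call on_done() once hit() has been called `count` times."""

    def __init__(self, count, on_done):
        self.remaining = count
        self.on_done = on_done

    def hit(self):
        # Each finished request calls hit(); the last one triggers on_done.
        self.remaining -= 1
        if self.remaining == 0:
            self.on_done()

# Hypothetical wiring with tornado (not executed here):
#   done = Countdown(len(urls), lambda: ioloop.IOLoop.instance().stop())
#   def handle_request(response):
#       print response.code
#       done.hit()
#   for url in urls:
#       http_client.fetch(url, handle_request)
#   ioloop.IOLoop.instance().start()
```

This keeps the counter initialization (`len(urls)`) separate from the callback, which is the bug in the question's version, where `i` is reset inside the handler.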
I would not mix mechanize, ioloop, ... if it is not necessary.
Apart from that, I would recommend grequests. It is a lightweight tool that satisfies your requirements.
import grequests
from bs4 import BeautifulSoup

urls = ['http://google.com', 'http://www.python.org/']
rs = (grequests.get(u) for u in urls)
res = grequests.map(rs)

for r in res:
    soup = BeautifulSoup(r.text)
    print "%s: %s" % (r.url, soup.title.text)