Question

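I am writing a Python script that opens URLs, reads the HTML content of each page, and searches it for some specific strings. It works fine for a handful of URLs, but when I try it with 10 URLs I get this error: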

urllib2.URLError:

I am writing multiple URLs to a file and then opening them in the browser.

I am using Python 2.7 on a Linux machine.

I can open the sites on my Windows machine, although they take some time to load.

Here is part of my code:

import re
import sys
import os
import webbrowser
import urllib
import requests
import subprocess
from bs4 import BeautifulSoup
import urllib2

for url in lines:
    req = urllib2.Request(url, headers={
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Connection': 'keep-alive'})
    try:
        response = urllib2.urlopen(req).read()
        soup = BeautifulSoup(response, 'html.parser')

        # Remove script and style elements before extracting text
        for script in soup(["script", "style"]):
            script.extract()
        text = soup.get_text()
        # Strip each line, split on double spaces, and drop empty chunks
        text_lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in text_lines for phrase in line.split("  "))
        text = '\n'.join(chunk for chunk in chunks if chunk)

        # After this, I extract strings from `text` using regexes
    except urllib2.HTTPError as e:
        print e.fp.read()

Error: urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
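
For reference, the code above only catches urllib2.HTTPError, so a connection timeout (a plain URLError) propagates and stops the loop. Below is a minimal sketch of one way to pass an explicit timeout to urlopen and catch URLError, so that a single slow host does not abort the whole run. The function name fetch_html, its retries, delay and opener parameters, and the shortened User-Agent are hypothetical choices made for illustration; the import fallback is only there so the sketch also runs on Python 3. Whether this resolves the timeout depends on why the hosts are unreachable from the Linux machine.

```python
import socket
import time

try:
    from urllib2 import Request, urlopen, URLError  # Python 2.7, as in the question
except ImportError:
    from urllib.request import Request, urlopen      # Python 3 fallback
    from urllib.error import URLError


def fetch_html(url, timeout=10, retries=2, delay=1.0, opener=urlopen):
    """Return the page body, or None if every attempt fails.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is the real urlopen.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    for attempt in range(retries + 1):
        try:
            # `timeout` is in seconds and bounds the blocking socket operations.
            return opener(request, timeout=timeout).read()
        except (URLError, socket.timeout) as exc:
            # HTTPError subclasses URLError, so HTTP error statuses are
            # retried here as well; this sketch does not distinguish them.
            print("attempt %d for %s failed: %s" % (attempt + 1, url, exc))
            if attempt < retries:
                time.sleep(delay)
    return None
```

In the loop from the question, a call such as fetch_html(url) would stand in for urllib2.urlopen(req).read(); URLs for which it returns None can be logged and skipped before the parsing step.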

0 answers: