Python: HTTPConnectionPool(host='%s', port=80)

Time: 2016-10-15 03:41:40

Tags: python web-scraping python-requests urllib3

import requests
import urllib3
from time import sleep
from sys import argv
script, filename = argv
http = urllib3.PoolManager()

datafile = open('datafile.txt','w')
crawl = ""

with open(filename) as f:
    mylist = f.read().splitlines()

def crawlling(x):
    for i in mylist:
        domain = ("http://" + "%s") % i
        crawl = http.request('GET','%s',preload_content=False) % domain
        for crawl in crawl.stream(32):
            print crawl
            sleep(10)
            crawl.release_conn()
            datafile.write(crawl.status)
            datafile.write('>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n')
            datafile.write(crawl.data)
            datafile.close()
    return x


crawlling(crawl)

_______________________________________________________________________
Extract of domain.txt file:
fjarorojo.info
buscadordeproductos.com

I'm new to Python, so please bear with me: I'm trying to fetch content from a URL, but it throws an error, even though the same URL works fine in a browser. The goal of the script is to read the domains from the domain.txt file, iterate over them, fetch each page's content, and save it to a file.

Getting this error: 
  raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='%s',
port=80):     Max retries exceeded with url: / (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 
0x7ff45e4f9cd0>: Failed to establish a new connection: [Errno -2] Name or 
service not known',))

1 Answer:

Answer 0 (score: 1)

This line is the problem:

crawl = http.request('GET','%s',preload_content=False) % domain

Right now you are sending a request to the literal domain %s, which is not a valid domain, hence the error "Name or service not known". It should be:

crawl = http.request('GET', '%s' % domain, preload_content=False)

or, more simply:

crawl = http.request('GET', domain, preload_content=False)

Also, unrelated to the error you posted, these lines may also cause problems:

    for crawl in crawl.stream(32):
        print crawl
        sleep(10)
        crawl.release_conn() # <--

You release the connection inside the loop, so the loop won't work as expected on the second iteration. Instead, you should release the connection only once you are done with the request. More details here
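Putting the answer's points together, a minimal corrected sketch of the crawl loop could look like the following. This is an assumption-laden rewrite, not the asker's code: it targets Python 3 (the original used Python 2 print), the helper name crawl_domains and the optional http parameter are invented for illustration, and it also converts the integer crawl.status to a string before writing (a separate bug the answer did not mention).

```python
import urllib3


def crawl_domains(domains, outfile, http=None):
    """Fetch every domain in `domains` and append each response's
    status and body to `outfile`. Illustrative helper, not from the
    original post."""
    if http is None:
        http = urllib3.PoolManager()
    with open(outfile, 'w') as datafile:
        for name in domains:
            url = 'http://%s' % name               # format the URL first...
            crawl = http.request('GET', url,       # ...then pass it to request()
                                 preload_content=False)
            datafile.write('%s\n' % crawl.status)  # status is an int; format it
            datafile.write('>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n')
            for chunk in crawl.stream(32):         # separate loop variable, so
                datafile.write(chunk.decode('utf-8', errors='replace'))
            crawl.release_conn()                   # release AFTER streaming
```

Taking the pool manager as an optional parameter keeps the function testable without network access; with the default it behaves like the original script's module-level http = urllib3.PoolManager().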