Python: making requests to multiple URLs from a list

Date: 2019-10-20 13:22:28

Tags: python web-scraping urllib

I am new to Python.

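I have created a list of URLs and I want to make a urllib.request call for every URL in it. The list currently holds 5 URLs, but I can only request one index at a time with urllib.Request(List[0]). If I call urllib.Request(List[0:4]) instead, I get this error: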

Traceback (most recent call last):
  File "c:/Users/Farzad/Desktop/Python/Webscraping/Responseheaderinfo.py", line 22, in <module>
    response = urllib.urlopen(request)
  File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 548, in _open
    'unknown_open', req)
  File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\Farzad\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 1387, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: ['http>

import urllib.request as urllib
import socket
import pyodbc
from datetime import datetime
import ssl
import OpenSSL


List = open("C:\\Users\\Farzad\\Desktop\\hosts.txt").read().splitlines()

length = len(List)
for i in range(length): 
    print(List) 

request = urllib.Request(List[0])
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36')
response = urllib.urlopen(request)
rdata = response.info()
ipaddr = socket.gethostbyname(request.origin_req_host)

1 answer:

Answer 0 (score: 0)
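
Before the fix, a small sketch of why the slice fails. This is my reading of the traceback, not something the question states: List[0] is a single string, while List[0:4] is a list, and urllib appears to work on the text form of whatever it is given. The URLs below are hypothetical stand-ins for the lines in hosts.txt.

```python
# Hypothetical stand-in for the lines read from hosts.txt.
urls = [
    "http://example.com",
    "http://example.org",
    "http://example.net",
]

single = urls[0]     # indexing yields one string
several = urls[0:2]  # slicing yields a new list

print(type(single).__name__)   # str
print(type(several).__name__)  # list

# The text form of a list starts with "['http", which matches the
# "unknown url type" reported in the traceback.
print(str(several)[:6])
```

So a request has to be built from one string at a time, which is what a loop over the list does.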

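Request takes a single URL string, so loop over the list and build one request per URL. The code could look like this: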

import urllib.request as urllib
import socket
import pyodbc
from datetime import datetime
import ssl
import OpenSSL
import traceback

# Read one URL per line from the hosts file.
with open("C:\\Users\\Farzad\\Desktop\\hosts.txt") as f:
    List = f.read().splitlines()

# Build and send one request per list entry.
for url in List:
    print(url)

    try:
        request = urllib.Request(url)
        request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36')
        response = urllib.urlopen(request)
        rdata = response.info()
        ipaddr = socket.gethostbyname(request.origin_req_host)
    except Exception:
        # Print the traceback and carry on with the next URL.
        print(traceback.format_exc())
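
If the response headers are needed after the loop, one option is to collect them per URL instead of overwriting rdata on each pass. The following is a minimal sketch under that assumption. The names fetch_all, open_url and USER_AGENT are hypothetical and not part of the question, and the short User-Agent value is a placeholder for the full browser string used above.

```python
import urllib.error
import urllib.request

USER_AGENT = "Mozilla/5.0"  # placeholder, not the full string from the question


def fetch_all(urls, open_url=urllib.request.urlopen, timeout=10):
    """Request each URL in turn and return (results, errors), both keyed by URL.

    open_url is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    results, errors = {}, {}
    for url in urls:
        try:
            # Request() itself raises ValueError for text that is not a URL,
            # so it is built inside the try block as well.
            request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
            with open_url(request, timeout=timeout) as response:
                results[url] = dict(response.info().items())
        except (urllib.error.URLError, ValueError, OSError) as exc:
            # Record the failure and continue with the remaining URLs.
            errors[url] = str(exc)
    return results, errors
```

Calling fetch_all(List) with the list read from hosts.txt would return one dictionary of response headers per reachable URL and one error message per URL that failed, so a single bad entry does not stop the run.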