urllib.error.URLError: unknown url type 'https

Date: 2018-04-27 16:51:16

Tags: python https urllib

I am using this script to parse a site and download files, but it keeps returning the same error. I assume urllib.parse.urlencode and urllib.parse.urljoin are involved, but it is unclear how or where I would use them.

I have reinstalled Python 3.4, 3.6, and PyCharm, and I cannot install OpenSSL.
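For reference, urlencode builds a query string from a mapping, while urljoin resolves a relative path against a base URL. A minimal sketch of the two helpers mentioned above (the ticker "HOLD" is just a placeholder for illustration, not taken from the script):

```python
from urllib.parse import urljoin, urlencode

base = "http://www.advisorshares.com/holdings-file/"

# urljoin resolves a relative path against a base URL
assert urljoin(base, "HOLD") == "http://www.advisorshares.com/holdings-file/HOLD"

# urlencode builds a query string from a mapping; it does not accept a bare URL
assert urlencode({"ticker": "HOLD"}) == "ticker=HOLD"
```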

import bs4 as bs
import urllib.request
from urllib.parse import urlparse, urljoin, urlencode
import lxml
import os

class tools():
    def get_page(*args):
        headers = headers = {}
        headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
        req = urllib.request.Request(args, headers=headers)
        resp = urllib.request.urlopen(req)
        respData = resp.read()
        return respData


class Advisorshares():
    def productscreener():
        '''Creates a csv of the list of ets advisor shares holds'''
        url = ('https://www.advisorshares.com/etfs')
        soup = bs.BeautifulSoup(tools.get_page(url), "lxml")
        table = soup.find('table')
        links = []
        tickers = []

        for i in range(0,len(table.find_all('a')),2):
            tag = table.find_all('a')[num]
            links.append(tag.get('href'))
            tickers.append(tag.text)

    def download():
        Advisorshares.productscreener()
        os.cwd('/')
        for i in tickers:
            base = urlencode('http://www.advisorshares.com/holdings-file/')
            urllib.request.urlretrieve(base + i, i + '.csv')


Advisorshares.download()
  
    

Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2018.1.2\helpers\pydev\pydev_run_in_console.py", line 52, in run_file
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2018.1.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)
  File "C:/Users/HP/Desktop/webscrapper/venv/src/webscrapper.py", line 91, in <module>
    Advisorshares.download()
  File "C:/Users/HP/Desktop/webscrapper/venv/src/webscrapper.py", line 84, in download
    Advisorshares.productscreener()
  File "C:/Users/HP/Desktop/webscrapper/venv/src/webscrapper.py", line 73, in productscreener
    soup = bs.BeautifulSoup(tools.get_page(url), "lxml")
  File "C:/Users/HP/Desktop/webscrapper/venv/src/webscrapper.py", line 14, in get_page
    resp = urllib.request.urlopen(req)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 549, in _open
    'unknown_open', req)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python36\Lib\urllib\request.py", line 1388, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: 'https>

  

1 Answer:

Answer 0 (score: 0):

Your problem is that you are feeding get_page a tuple: its *args parameter packs the positional arguments into a tuple, and urllib.request.Request cannot interpret that tuple. All you have to do is change this line:

req = urllib.request.Request(args, headers=headers)

to this line:

req = urllib.request.Request(*args, headers=headers)

so that args is unpacked into the arguments of urllib.request.Request.
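A minimal sketch of get_page with that fix applied (my own reconstruction, reusing the User-Agent string and URL from the question; the comments describe the tuple failure as it behaves on the asker's Python 3.6):

```python
import urllib.request

def get_page(*args):
    # *args packs the caller's positional arguments into a tuple, e.g.
    # get_page("https://example.com") gives args == ("https://example.com",)
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 "
                             "(KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"}
    # Passing `args` itself hands Request a tuple; on Python 3.6 Request
    # stringified it, so the URL scheme parsed as "('https", producing the
    # "unknown url type" error. Unpacking with *args passes the URL string
    # through unchanged.
    req = urllib.request.Request(*args, headers=headers)
    resp = urllib.request.urlopen(req)
    return resp.read()

# html = get_page("https://www.advisorshares.com/etfs")  # needs network access
```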