Question

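After running my script, I noticed that my "parse_doc" function throws an error whenever it receives a URL with no value. It turns out that my "process_doc" function should produce 25 links, but it only produces 19, because a few pages have no link pointing to another page. When my second function receives one of these empty links, it throws a "MissingSchema" error. How can I fix this so that, when it encounters a link with no value, it simply moves on to the next one? Below is part of my script, which should give you an idea of what I mean: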

import re

import requests
from lxml import html

# `base` is defined elsewhere in the full script (see the note below the error).

def process_doc(medium_link):
    page = requests.get(medium_link).text
    tree = html.fromstring(page)
    try:
        name = tree.xpath('//span[@id="titletextonly"]/text()')[0]
    except IndexError:
        name = ""
    try:
        link = base + tree.xpath('//section[@id="postingbody"]//a[@class="showcontact"]/@href')[0]
    except IndexError:
        link = ""  # this page has no contact link, so link is an empty string

    parse_doc(name, link)  # every link reaches this function, but some of them are empty


def parse_doc(title, target_link):
    page = requests.get(target_link).text  # MissingSchema is raised here when target_link is empty

    tel = re.findall(r'\d{10}', page)[0] if re.findall(r'\d{10}', page) else ""
    print(title, tel)

The error I get:

raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?

By the way, my first function uses a variable called "base", which is concatenated with the scraped result to form the complete link.
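A minimal sketch of that concatenation step, assuming `base` is an absolute URL such as `https://example.org/` (a placeholder, not the asker's real value). It swaps the `+` concatenation for the standard library's `urllib.parse.urljoin` and returns `None` when the XPath query found no href, so the caller can tell "no link on this page" apart from a real URL. `build_link` is a hypothetical helper name.

```python
from typing import List, Optional
from urllib.parse import urljoin


def build_link(base: str, hrefs: List[str]) -> Optional[str]:
    """Return an absolute URL built from the first scraped href, or None if there is none.

    `hrefs` stands for what tree.xpath('...//a[@class="showcontact"]/@href')
    returns: a list of strings that may be empty.
    """
    if not hrefs:
        return None  # page had no contact link: signal "skip this one"
    # urljoin handles relative hrefs ("/reply/x") and absolute ones alike
    return urljoin(base, hrefs[0])


# Placeholder base URL, for illustration only
print(build_link("https://example.org/", ["/reply/abc"]))  # https://example.org/reply/abc
print(build_link("https://example.org/", []))              # None
```

Returning `None` rather than `""` is a design choice; either value is falsy, so an `if link:` check in the caller treats both the same way.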

Answer 1

If you want to avoid this when target_link is None (or an empty string), try:

def parse_doc(title, target_link):
    if target_link:  # false for both None and "", so no request is made for empty links
        page = requests.get(target_link).text
        tel = re.findall(r'\d{10}', page)[0] if re.findall(r'\d{10}', page) else ""
        print(tel)
    print(title)

This way only non-empty links are fetched; for an empty link the function does nothing except print the title.
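The same guard can also live in the caller, so that `parse_doc` is never invoked for a missing link. A minimal sketch, assuming `process_doc` is changed to return `(name, link)` pairs instead of calling `parse_doc` itself. The page fetcher is passed in as a parameter (`fetch`, hypothetical) so the logic runs without network access; in the real script it would be `lambda url: requests.get(url).text`. `parse_all` and `extract_tel` are likewise illustrative names.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

PHONE_RE = re.compile(r"\d{10}")


def extract_tel(page: str) -> str:
    """Return the first run of 10 digits in the page, or "" if there is none."""
    match = PHONE_RE.search(page)
    return match.group(0) if match else ""


def parse_all(
    docs: Iterable[Tuple[str, Optional[str]]],
    fetch: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Fetch each link and extract a phone number, skipping empty or None links."""
    results = []
    for title, link in docs:
        if not link:  # covers None and "" (the value that triggered MissingSchema)
            continue  # move on to the next link
        results.append((title, extract_tel(fetch(link))))
    return results


# Usage with a fake fetcher standing in for requests.get(url).text
fake_pages = {"https://example.org/1": "call 5551234567 today"}
docs = [("Ad one", "https://example.org/1"), ("Ad two", ""), ("Ad three", None)]
print(parse_all(docs, fake_pages.get))  # [('Ad one', '5551234567')]
```

Unlike the answer's snippet, this drops titles without a link entirely; add a `print(title)` before the `continue` if you still want them listed. Using `re.search` once also avoids running `re.findall` twice on each page.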

Answer 2

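First, make sure your URL, including its scheme, is correct. Sometimes you are just missing a character in https://, or have one too many. If you need to handle the exception, you can do it like this: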

import requests
from requests.exceptions import MissingSchema

...

try:
    res = requests.get(linkUrl)
    print(res) 
except MissingSchema:
    print('URL is not complete')
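If you would rather check a URL up front than catch the exception, one option using only the standard library is to require an `http`/`https` scheme and a host before calling `requests.get`. This is a minimal sketch that approximates what `MissingSchema` complains about; it is not a reimplementation of requests' own validation. `is_fetchable` is a hypothetical helper name, and the sample URLs are placeholders.

```python
from typing import Optional
from urllib.parse import urlparse


def is_fetchable(url: Optional[str]) -> bool:
    """Rough pre-check: True only for non-empty http(s) URLs that have a host."""
    if not url:  # None or "": the case that raised MissingSchema
        return False
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


links = ["https://example.org/reply/abc", "", None, "example.org/page", "htps://example.org"]
for link in links:
    if not is_fetchable(link):
        print("skipping:", repr(link))
        continue
    print("would fetch:", link)  # in the real script: requests.get(link)
```

requests does its own validation and may still reject some URLs this check accepts, so keeping the `try`/`except` above as a safety net is reasonable.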

How to get around the "MissingSchema" error in Python?
