Question

I've written a script to parse the link attached to the visible text contact or about on each webpage. However, when I run it, my scraper always picks the about link and only falls back to contact when about is unavailable. How can I make the script do the reverse, that is, look for the link connected to contact rather than about, and parse about only if contact is unavailable? I tried the following, but it behaves the way I described.

This is my attempt:

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

links = (
    "http://www.mount-zion.biz/",
    "http://www.latamcham.org/",
    "http://www.innovaprint.com.sg/",
    "http://www.cityscape.com.sg/"
)

def Get_Link(site):
    res = requests.get(site)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("a[href]"):
        if "contact" in item.text.lower():
            abslink = urljoin(site,item['href'])  ##I thought the script prioritizes the first condition but I am wrong
            print(abslink)
            break
        else:
            if "about" in item.text.lower():
                abslink = urljoin(site,item['href'])
                print(abslink)
                break

if __name__ == '__main__':
    for link in links:
        Get_Link(link)

Is there a way to order the conditions by priority, depending on which is available? Above all, I want the link connected to contact. If that is unavailable, the script should look for the link connected to about.

Answer 1

Don't use else; use a couple of separate if statements instead. See also "What's the difference between if, elif and else".

Your function should look like this:

def Get_Link(site):
    res = requests.get(site)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("a[href]"):
        if "contact" in item.text.lower() or "about" in item.text.lower():
            abslink = urljoin(site,item['href']) 
            print(abslink)
            break

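You can't rely on break statements here: break leaves the loop at the first match, so a second if would never fire.

Also note that the Python convention is to name methods and functions in snake_case, such as my_function() or my_method(), and classes in CamelCase, such as MyClass.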

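Edit:

OK, it looks like your code is more complicated, because you are running one loop inside another. So basically you have a few options:

Run the loop with if "contact" first and, if it fails for every link, run it again with "about"
Put a flag in the code to control the if statements
Use functions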
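
The flag option can be sketched as a single pass that returns as soon as a "contact" link appears and only remembers the first "about" link as a fallback. This is a minimal sketch, not the answerer's code: `pick_link` and the `(text, href)` input pairs are hypothetical names introduced here for illustration. With BeautifulSoup, the pairs could presumably be built as `[(a.text, a["href"]) for a in soup.select("a[href]")]`.

```python
from urllib.parse import urljoin


def pick_link(site, anchors):
    """Return the absolute "contact" link if any, else the first "about" link.

    `anchors` is an iterable of (text, href) pairs (a hypothetical input shape).
    """
    fallback = None  # flag: first "about" link seen so far
    for text, href in anchors:
        label = text.lower()
        if "contact" in label:
            # Highest priority: stop as soon as a contact link appears.
            return urljoin(site, href)
        if fallback is None and "about" in label:
            # Remember it, but keep scanning for a contact link.
            fallback = urljoin(site, href)
    return fallback  # None when neither keyword matched


anchors = [("About us", "/about"), ("Contact", "/contact")]
print(pick_link("http://example.com/", anchors))  # http://example.com/contact
```

Because the fallback is only returned after the loop finishes, the order of the links on the page no longer decides which keyword wins.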

Or hack it:

def Get_Link(site):
    res = requests.get(site)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("a[href]"):
        if "contact" in item.text.lower():
            abslink = urljoin(site,item['href'])
            print(abslink)
            return 0 # Exit from function
    for item in soup.select("a[href]"):
        if "about" in item.text.lower():
            abslink = urljoin(site,item['href'])
            print(abslink)
            return 0
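
The two near-identical loops above could also be folded into one loop over keywords listed in priority order. The following is a rough sketch under stated assumptions rather than a definitive version: `first_match`, its `keywords` parameter, and the `(text, href)` pairs are hypothetical, and the function returns the link instead of printing it so the caller decides what to do with it.

```python
from urllib.parse import urljoin


def first_match(site, anchors, keywords=("contact", "about")):
    """Try each keyword in priority order; return the first matching absolute link."""
    anchors = list(anchors)  # allow several passes over the same links
    for keyword in keywords:
        for text, href in anchors:
            if keyword in text.lower():
                return urljoin(site, href)
    return None  # no link text contained any keyword


anchors = [("About", "about.html"), ("Contact Us", "contact.html")]
print(first_match("http://example.com/", anchors))
print(first_match("http://example.com/", anchors, keywords=("about", "contact")))
```

Swapping the order of the tuple is then enough to reverse the priority, and further keywords can be appended without adding another loop.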

How to prioritize one condition over another?
