Unable to parse certain information using conditional statements

Time: 2018-06-25 07:45:55

Tags: python python-3.x selenium selenium-webdriver web-scraping

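I've written a script in Python combined with Selenium to parse the emails of some companies from a webpage. The problem is that each email sits inside either span[data-mail] or span[data-mail-e-contact-mail]. If I try the two conditions separately, I get all the emails. However, when I wrap them in a try:except:else block, they no longer work. Where am I going wrong?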

website link

Here is the script:

from selenium import webdriver
from bs4 import BeautifulSoup

url = "replace with the link above"

driver = webdriver.Chrome()
driver.get(url)
soup = BeautifulSoup(driver.page_source,'html.parser')
for links in soup.select("article.vcard"):
    try: #the following works when tried individually
        email = links.select_one(".hit-footer-wrapper span[data-mail]").get("data-mail")
    except: #the following works as well when tried individually
        email = links.select_one(".hit-footer-wrapper span[data-mail-e-contact-mail]").get("data-mail-e-contact-mail")
    else:
        email = ""
    print(email)
driver.quit()

When I run the script above, it prints nothing. If I try each selector on its own, both of them work.

1 answer:

Answer 0 (score: 2)

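Note that your code doesn't raise an exception, because get("data-mail") and get("data-mail-e-contact-mail") both return a value (empty or not) rather than raising one. Since the try block finishes without an exception, the else clause runs and overwrites email with an empty string, which is why nothing visible is printed.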
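To see why the else clause wipes out the result, here is a minimal pure-Python sketch of the same control flow. The function pick_email is hypothetical, and plain strings stand in for the scraped attribute values; None stands in for select_one() finding no matching span (calling a method on None raises AttributeError, just like None.get(...) would).

```python
def pick_email(first, second):
    """Mimic the question's try/except/else flow (hypothetical helper).

    `first` plays the role of the data-mail value; None means "span not found".
    """
    try:
        # Raises AttributeError when `first` is None, like None.get(...)
        email = first.strip()
    except AttributeError:
        # Only reached when the first lookup failed
        email = second
    else:
        # Runs only when the try block raised nothing, so it overwrites the result
        email = ""
    return email


print(repr(pick_email("info@example.com", "other@example.com")))  # ''
print(repr(pick_email(None, "other@example.com")))  # 'other@example.com'
```

In other words, else is not an "otherwise" branch for the except clause: it is the "no exception happened" branch, which is the opposite of what the question's code intends.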

Try the following code to get the desired output:

for links in soup.select("article.vcard"):
    # `or` falls back to the second attribute when the first one is empty ("")
    email = links.select_one(".hit-footer-wrapper span[data-mail]").get("data-mail") or links.select_one(".hit-footer-wrapper span[data-mail-e-contact-mail]").get("data-mail-e-contact-mail")
    print(email)
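The one-liner above still calls .get() directly on the result of select_one(), which raises AttributeError if a card has no matching span at all. A slightly more defensive sketch tries each (selector, attribute) pair in order and checks for None first. The inline HTML, the CANDIDATES list, and the extract_email helper are hypothetical; the real page's markup is assumed, not verified.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the question
HTML = """
<article class="vcard">
  <div class="hit-footer-wrapper"><span data-mail="a@example.com"></span></div>
</article>
<article class="vcard">
  <div class="hit-footer-wrapper">
    <span data-mail-e-contact-mail="b@example.com"></span>
  </div>
</article>
<article class="vcard">
  <div class="hit-footer-wrapper"><span>no email here</span></div>
</article>
"""

# (CSS selector, attribute name) pairs, tried in order
CANDIDATES = [
    (".hit-footer-wrapper span[data-mail]", "data-mail"),
    (".hit-footer-wrapper span[data-mail-e-contact-mail]", "data-mail-e-contact-mail"),
]


def extract_email(card):
    """Return the first non-empty email attribute found in `card`, else ""."""
    for selector, attr in CANDIDATES:
        tag = card.select_one(selector)
        # select_one() returns None when nothing matches, so check before .get()
        if tag is not None and tag.get(attr):
            return tag.get(attr)
    return ""


soup = BeautifulSoup(HTML, "html.parser")
emails = [extract_email(card) for card in soup.select("article.vcard")]
print(emails)  # ['a@example.com', 'b@example.com', '']
```

With Selenium, the same helper would be applied to BeautifulSoup(driver.page_source, "html.parser") exactly as in the question's script; adding a third attribute variant only means appending another pair to CANDIDATES.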