Question

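I pieced together some code to log in to a website and navigate to the specific pages I want to scrape. That part works fine. However, when I now search for the elements tagged 'tspan', I get this error message: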

AttributeError: 'str' object has no attribute 'descendants'

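If I go to the URL, right-click the element I want to capture, and click "Inspect Element", I see the code behind the page, which looks something like this.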

It looks like the elements could also be queried by 'g id'.

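So I thought I could grab all the 'tspan' items, load them into a list, and write the list to a text file. However, I am not getting any 'tspan' elements at all. If I right-click the page and click "View Page Source", I don't see any 'tspan' elements either. That's strange! The code behind the page is clearly different from what is rendered on the page itself. Here is my code. What am I doing wrong?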

import time
import webbrowser

import requests
from bs4 import BeautifulSoup as bs
from lxml import html
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.accept_untrusted_certs = True

# selenium
wd = webdriver.Firefox(executable_path="C:/Utility/geckodriver.exe", firefox_profile=profile)
url = "https://corp-internal.com/admin/?page=0"
wd.get(url)

# set username
time.sleep(2)
username = wd.find_element_by_id("identifierId")
username.send_keys("my_email@email.com")
wd.find_element_by_id("identifierNext").click()

# set password
time.sleep(2)
password = wd.find_element_by_name("password")
password.send_keys("my_pswd")
wd.find_element_by_id("passwordNext").click()

all_text = []

# list of URLs
url_list = ['https://corp-internal.com/admin/graph?dag_id=emm1_daily_legacy',
'https://corp-internal.com/admin/graph?dag_id=eemm1_daily_legacy_history']
for link in url_list:
    #File = webbrowser.open(link)
    #File = requests.get(link)
    #data = File.text
    for link in bs.findAll('tspan'):
        alldata = all_text.append(link.get('tspan'))

outF = open('C:/Users/ryans/OneDrive/Desktop/test.txt', 'w')
outF.writelines(alldata)
outF.close()
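
A possible fix, sketched under assumptions rather than tested against the real site. The error most likely comes from `bs.findAll('tspan')` being called on the `BeautifulSoup` class itself, so the string `'tspan'` is taken as `self` and no page is ever parsed; the loop also never loads `link` in the browser. Since "View Page Source" shows no `tspan`, the graph is presumably drawn by JavaScript, so the markup has to come from the live, logged-in browser session (`wd.page_source`) and not from a fresh `requests.get`. The sketch uses the standard library's `HTMLParser` in place of BeautifulSoup so that it runs without extra packages. `TspanCollector`, `extract_tspan_text`, `scrape_tspans`, `timeout` and `poll` are hypothetical names, not part of Selenium or bs4.

```python
import time
from html.parser import HTMLParser


class TspanCollector(HTMLParser):
    """Collects the text inside every <tspan> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while the parser is inside a <tspan>
        self.texts = []   # one entry per <tspan> start tag

    def handle_starttag(self, tag, attrs):
        if tag == "tspan":
            self.depth += 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "tspan" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Text is appended to the most recently opened <tspan>.
        if self.depth:
            self.texts[-1] += data


def extract_tspan_text(page_source):
    """Return the non-empty, stripped text of all <tspan> elements."""
    parser = TspanCollector()
    parser.feed(page_source)
    parser.close()
    return [t.strip() for t in parser.texts if t.strip()]


def scrape_tspans(driver, urls, timeout=10.0, poll=0.5):
    """Visit each URL with an already logged-in driver and collect tspan text.

    `driver` only needs .get(url) and .page_source, both of which a
    Selenium WebDriver provides.
    """
    all_text = []
    for url in urls:
        driver.get(url)
        deadline = time.monotonic() + timeout
        found = extract_tspan_text(driver.page_source)
        # Assumption: the graph is drawn by JavaScript after load, so the
        # DOM is re-read until <tspan> elements appear or the timeout passes.
        while not found and time.monotonic() < deadline:
            time.sleep(poll)
            found = extract_tspan_text(driver.page_source)
        all_text.extend(found)
    return all_text


# Usage with the session from the question (not run here):
# all_text = scrape_tspans(wd, url_list)
# with open("test.txt", "w") as out_file:
#     out_file.write("\n".join(all_text))
```

With BeautifulSoup, the parsing step would instead be `bs(wd.page_source, 'html.parser').find_all('tspan')`, calling `get_text()` on each result; `link.get('tspan')` looks up an attribute named `tspan`, not the element's text. The polling loop is a simple stand-in for Selenium's explicit waits. Note also that `list.append` returns `None`, so the list itself (`all_text`) is what should be written to the file.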

Trying to scrape a specific element with Selenium, but getting no results

0 answers: