Error when trying to get the text of a div

Time: 2017-06-03 01:37:42

Tags: python selenium webdriver bs4

I am trying to get the HTML/text inside a div. The div has the class `math`.

Here is the code I am using:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import time
    from bs4 import BeautifulSoup as soup
    from bs4 import SoupStrainer
    import urllib.request
    from selenium.webdriver.common.action_chains import ActionChains
    import getpass

    ui = input('What is your IXL username?\n\n')
    pi = getpass.getpass('\nWhat is your IXL password?\n\n')

    driver = 'C:\\Users\\agzsc\\Desktop\\MicrosoftWebDriver.exe'
    driver = webdriver.Edge(driver)
    driver.get('https://www.ixl.com')
    username = driver.find_element_by_id('qlusername')
    password = driver.find_element_by_id('qlpassword')
    submit = driver.find_element_by_id('qlsubmit')
    username.send_keys(ui)
    password.send_keys(pi)
    ActionChains(driver).move_to_element(submit).click().perform()

    for x in range(1):
        time.sleep(1)
        driver.execute_script('''window.open("https://www.ixl.com/math/grade-3/multiply-by-11","_blank");''')
        driver.switch_to_window(driver.window_handles[1+x])
        math = soup.find_all('div', attrs={"class":"math"})
        print(math)

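As you can see, I am using the Selenium WebDriver for Microsoft Edge. I am also trying to parse the page with bs4 and get only the div with the class `math`. However, I keep getting this error: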

    Traceback (most recent call last):
      File "C:\Users\agzsc\Downloads\powerixl.py", line 41, in <module>
        math = soup.find_all('div', attrs={"class":"math"})
      File "C:\Users\agzsc\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\element.py", line 1310, in find_all
        generator = self.descendants
    AttributeError: 'str' object has no attribute 'descendants'

If anyone could help, I would really appreciate it. Thanks!

1 Answer:

Answer 0: (score: 0)

You can replace

    soup.find_all('div', attrs={"class":"math"})

with

    driver.find_element_by_css_selector('div.math').get_attribute('innerHTML')

if you want the innerHTML of the target div, or with

    driver.find_element_by_css_selector('div.math').text

if you only want the text content of the div.
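
For context, the traceback appears to arise because `soup` in the question is only an alias for the `BeautifulSoup` class, so `soup.find_all('div', ...)` passes the string `'div'` as `self`, and a string has no `descendants`. If you would rather keep bs4, a minimal sketch is to build a `BeautifulSoup` instance from the page HTML first and call `find_all` on that instance. The helper name `math_div_texts` and the `sample_html` string are hypothetical, used only for illustration; with Selenium you would presumably pass `driver.page_source` instead.

```python
from bs4 import BeautifulSoup


def math_div_texts(html):
    """Return the text of every <div class="math"> found in an HTML string."""
    # Create a BeautifulSoup *instance*; find_all is an instance method,
    # so calling it on the class itself leads to the AttributeError above.
    page = BeautifulSoup(html, 'html.parser')
    return [div.get_text(strip=True) for div in page.find_all('div', class_='math')]


# Hypothetical markup standing in for driver.page_source.
sample_html = '<div class="math">11 x 3</div><div class="other">skip</div>'
print(math_div_texts(sample_html))
```

Note that this parses a snapshot of the HTML, so if the page fills in the div with JavaScript after loading, the snapshot needs to be taken once that content is present.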