Get the text of all links

Time: 2018-06-27 02:05:32

Tags: python python-3.x beautifulsoup

I am trying to get the text of some `a` elements that live in an `li` inside a `div`, something like `div -> ul -> li -> a`, but I have not managed to do it. I can print the first item, but when I change `.find` to `findAll` in my code so far, the console returns the error:

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

1 answer:

Answer 0 (score: 0)

If you are not tied to BeautifulSoup, you can do the same thing with Selenium WebDriver.

I tried this code and it gets what you want:

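from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Path to the chromedriver executable (placeholder, adjust for your machine)
chrome_path = 'path_to_chromedriver_exe'

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))
driver.maximize_window()

driver.get('http://amoraosromances.blogspot.com/')
time.sleep(5)  # crude fixed wait for the page to finish loading

# Selenium 4 locator API; the older find_element_by_* helpers were removed
parent_element = driver.find_element(By.CSS_SELECTOR, 'div#Label2.widget.Label > div > ul')
child_elements = parent_element.find_elements(By.TAG_NAME, 'li')

# Print the visible text of every list item
for i in child_elements:
    print(i.text)

driver.quit()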

The output looks like:

ABANDONADO NO ALTAR (8)
ACIDENTE (97)
ADOLESCENTE (23)
ADORÁVEL PRISIONEIRA FANFIC (1)
ADULTÉRIO (26)
AEROMOÇA (3)
AGENCIA DE CASAMENTO (3)
AMANTE (74)
....
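
Each printed line is the text of the whole `li`, so it carries the post count in parentheses after the label. If only the label is wanted, the count can be split off afterwards. This is a minimal sketch using sample lines copied from the output above; `split_label` and `pattern` are hypothetical names introduced for illustration.

```python
import re

# Sample lines copied from the output above.
lines = [
    "ABANDONADO NO ALTAR (8)",
    "ACIDENTE (97)",
    "ADORÁVEL PRISIONEIRA FANFIC (1)",
]

# Matches "LABEL (N)": a label followed by a count in parentheses.
pattern = re.compile(r"^(?P<label>.+?)\s*\((?P<count>\d+)\)$")


def split_label(line):
    """Hypothetical helper: return (label, count), or (line, None) if no count is found."""
    match = pattern.match(line.strip())
    if match is None:
        return line.strip(), None
    return match.group("label"), int(match.group("count"))


parsed = [split_label(line) for line in lines]

for label, count in parsed:
    print(label, count)
```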

If you want to set up Selenium for Chrome, you can use this link to get started.
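
If you would rather stay with BeautifulSoup, the error in the question most likely comes from reading `.text` on the `ResultSet` that `find_all()` returns; reading it from each element in a loop avoids that. Here is a minimal sketch, assuming the markup follows the `div -> ul -> li -> a` shape from the question. The HTML snippet is a made-up stand-in for the real page, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's label widget (div -> ul -> li -> a).
html = """
<div id="Label2" class="widget Label">
  <div>
    <ul>
      <li><a href="/search/label/AMANTE">AMANTE</a> <span>(74)</span></li>
      <li><a href="/search/label/ACIDENTE">ACIDENTE</a> <span>(97)</span></li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None); find_all() returns a ResultSet of Tags.
widget = soup.find("div", id="Label2")
links = widget.find_all("a")

# links.text would raise the AttributeError from the question,
# so read the text from each Tag individually.
link_texts = [a.get_text(strip=True) for a in links]

for text in link_texts:
    print(text)
```

Because only the `a` elements are read here, the counts held in the sibling `span` elements of this sample are left out.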