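I have source code similar to the following. I am trying to scrape the string "11 tigers". I am new to XPath; can someone suggest how to do this with Selenium or Beautiful Soup? I was considering driver.find_element_by_xpath or soup.find_all.
Source: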
<div class="count-box fixed_when_handheld s-vgLeft0_5 s-vgPullBottom1 s-vgRight0_5 u-colorGray6 u-fontSize18 u-fontWeight200" style="display: block;">
<div class="label-container u-floatLeft">11 tigers</div>
<div class="u-floatRight">
<div class="hide_when_tablet hide_when_desktop s-vgLeft0_5 s-vgRight0_5 u-textAlignCenter">
<div class="js-show-handheld-filters c-button c-button--md c-button--blue s-vgRight1">
Filter
</div>
<div class="js-save-handheld-filters c-button c-button--md c-button--transparent">
Save
</div>
</div>
</div>
<div class="cb"></div>
</div>
Answer 0 (score: 3)
You can use the same CSS selector, .count-box .label-container, with both Beautiful Soup and Selenium.
Beautiful Soup:
from bs4 import BeautifulSoup

# yourhtml holds the page source as a string
page = BeautifulSoup(yourhtml, "html.parser")
# if you need only the first match
label = page.select_one(".count-box .label-container").text
# if you need all matches
labels = page.select(".count-box .label-container")
for label in labels:
    print(label.text)
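Selenium:
# Selenium 4 deprecated and later removed the find_elements_by_* helpers; there, use
# driver.find_elements(By.CSS_SELECTOR, ...) with By imported from selenium.webdriver.common.by
labels = driver.find_elements_by_css_selector(".count-box .label-container")
for label in labels:
    print(label.text)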
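As a rough, self-contained sketch of the Beautiful Soup variant, the snippet below runs the same selector against a trimmed copy of the markup from the question. The names SAMPLE_HTML and extract_labels are made up for illustration, and the class list is shortened, so treat it as a demonstration under those assumptions rather than a drop-in solution.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (class list shortened for readability).
SAMPLE_HTML = """
<div class="count-box fixed_when_handheld u-fontSize18" style="display: block;">
  <div class="label-container u-floatLeft">11 tigers</div>
  <div class="u-floatRight">
    <div class="js-show-handheld-filters c-button">Filter</div>
  </div>
  <div class="cb"></div>
</div>
"""


def extract_labels(html):
    """Return the stripped text of every .label-container nested inside a .count-box."""
    soup = BeautifulSoup(html, "html.parser")
    nodes = soup.select(".count-box .label-container")
    return [node.get_text(strip=True) for node in nodes]


labels = extract_labels(SAMPLE_HTML)
print(labels)  # ['11 tigers']
```

The descendant selector (a space between the two classes) only requires that .label-container sits somewhere inside .count-box, so it keeps working if extra wrapper elements are added later.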
Answer 1 (score: 0)
A variation of the answer given by Sers.
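from bs4 import BeautifulSoup

page = BeautifulSoup(html_text, "lxml")
# A class filter with a single name matches any element carrying that class,
# so narrow down in two steps instead of passing both class names at once.
box = page.find('div', {'class': 'count-box'})
# first one
label = box.find('div', {'class': 'label-container'}).text
# for all
labels = box.find_all('div', {'class': 'label-container'})
for label in labels:
    print(label.text)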
Use the lxml parser because it is faster. You need to install it with pip install lxml.
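One detail worth checking when filtering by class with find or find_all: a single class name matches any element that carries that class, while a string containing several class names is compared against the whole attribute value. The sketch below illustrates this on a shortened version of the question's markup; it uses the built-in html.parser only to avoid an extra dependency, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="count-box u-fontSize18">'
    '<div class="label-container u-floatLeft">11 tigers</div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any element that has that class among its classes.
by_single_class = soup.find("div", class_="label-container")

# A multi-class string must equal the full class attribute, so nothing matches here.
by_joined_classes = soup.find("div", class_="count-box label-container")

# To express "label-container inside count-box", search in two steps.
box = soup.find("div", class_="count-box")
scoped = [d.get_text(strip=True) for d in box.find_all("div", class_="label-container")]

print(by_single_class.get_text())  # 11 tigers
print(by_joined_classes)           # None
print(scoped)                      # ['11 tigers']
```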
Answer 2 (score: -1)
To extract the text 11 tigers, you can use either of the following solutions:
Using css_selector:
my_text = driver.find_element_by_css_selector("div.count-box>div.label-container.u-floatLeft").get_attribute("innerHTML")
Using xpath:
my_text = driver.find_element_by_xpath("//div[contains(@class, 'count-box')]/div[@class='label-container u-floatLeft']").get_attribute("innerHTML")
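The XPath expression can be tried without a browser. The sketch below is an illustration that assumes lxml is installed, not part of the original answer: it evaluates the same expression against a trimmed copy of the question's markup, next to a more tolerant variant that matches on individual class tokens. SAMPLE_HTML and the two helper functions are hypothetical names.

```python
from lxml import html as lxml_html

# Trimmed copy of the markup from the question (class list shortened for readability).
SAMPLE_HTML = """
<div class="count-box fixed_when_handheld u-fontSize18" style="display: block;">
  <div class="label-container u-floatLeft">11 tigers</div>
  <div class="u-floatRight"><div class="c-button">Filter</div></div>
  <div class="cb"></div>
</div>
"""

# Same expression as in the answer: the child's class attribute must match exactly.
EXACT_XPATH = (
    "//div[contains(@class, 'count-box')]"
    "/div[@class='label-container u-floatLeft']/text()"
)

# Token-based variant: keeps matching if extra classes are added or reordered.
TOKEN_XPATH = (
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' count-box ')]"
    "/div[contains(concat(' ', normalize-space(@class), ' '), ' label-container ')]/text()"
)


def labels_exact(markup):
    """Evaluate the exact-class XPath and return the matching text nodes."""
    return lxml_html.document_fromstring(markup).xpath(EXACT_XPATH)


def labels_by_token(markup):
    """Evaluate the class-token XPath and return the matching text nodes."""
    return lxml_html.document_fromstring(markup).xpath(TOKEN_XPATH)


print(labels_exact(SAMPLE_HTML))     # ['11 tigers']
print(labels_by_token(SAMPLE_HTML))  # ['11 tigers']
```

The exact comparison @class='label-container u-floatLeft' stops matching as soon as the page adds or reorders a class on that element, which is the main reason to prefer the token-based form or a CSS selector.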