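I have been trying to print the text found in a <span> tag on a website.
Everything I have tried that does not raise an error returns blank: nothing is printed at all.
Here is my code: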
import time
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import ssl
from twilio.rest import Client
from twilio.rest import TwilioRestClient
browser = webdriver.Chrome()
browser.get('https://www.hubzu.com/property/9007091467618-3632-Stokes-Drive-Sarasota-FL-34232')
# Selenium 4 form; the find_element_by_* helpers were removed in later 4.x releases.
propertyname = browser.find_element(By.CSS_SELECTOR, 'span.h1')
propertyName1 = propertyname.text
print(propertyName1)
Here is the span I am trying to pull the text from:
<span class="h1">
<span id="streetName" class="header_bold propStreetAddress">
3632
Stokes Drive</span><span>, Sarasota, FL 34232</span>
</span>
Answer 0 (score: 2)
Scraping more complex HTML snippets is much simpler with BeautifulSoup:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
d = webdriver.Chrome()
d.get('https://www.hubzu.com/property/9007091467618-3632-Stokes-Drive-Sarasota-FL-34232')
print(soup(d.page_source, 'html.parser').find('span', {'class':'h1'}).text)
Output:
'\n\n3632\nStokes Drive, Sarasota, FL 34232\n'
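The text returned above still carries the newlines from the page markup. A minimal sketch for collapsing them, assuming plain whitespace normalization is all that is needed (the helper name `normalize_whitespace` is hypothetical and not part of BeautifulSoup):

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " yields a single line.
    return " ".join(text.split())


# The raw string printed by the BeautifulSoup call above.
raw = '\n\n3632\nStokes Drive, Sarasota, FL 34232\n'
print(normalize_whitespace(raw))
```

The same helper can be applied to the `.text` of a Selenium element, since it only operates on the string.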
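Answer 1 (score: 0)
The page contains two span tags with the class h1, and the first one is hidden.
That is why you get an empty result: find_element returns the first element the locator matches, and Selenium's .text is empty for an element that is not displayed.
Try the following: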
# Uses the Selenium 4 locator API; By is already imported in the question's code.
browser.get('https://www.hubzu.com/property/9007091467618-3632-Stokes-Drive-Sarasota-FL-34232')
propertyname = browser.find_element(By.CSS_SELECTOR, 'div.row.header-top-navigation span.h1')
print(propertyname.text)
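If scoping the selector to a parent is not an option, another way to skip a hidden match is to collect every match and keep the first one that is displayed. The following is only a sketch: the helper name `first_visible_text` is hypothetical, and it assumes Selenium 4, where `find_elements` takes a locator strategy and `By.CSS_SELECTOR` is the string `"css selector"`.

```python
from typing import Optional

# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR; spelled out here
# so the helper itself does not need to import Selenium.
CSS_SELECTOR = "css selector"


def first_visible_text(driver, selector: str) -> Optional[str]:
    """Return the text of the first displayed element matching selector.

    `driver` is assumed to be a Selenium 4 WebDriver (this helper is
    hypothetical, not part of Selenium). Returns None when no matching
    element is displayed.
    """
    for element in driver.find_elements(CSS_SELECTOR, selector):
        # Hidden elements report an empty .text, so skip them.
        if element.is_displayed():
            return element.text
    return None
```

With the `browser` object from the question, `print(first_visible_text(browser, 'span.h1'))` should then print the text of the visible span rather than an empty string, provided the visible span matches the same selector.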
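Answer 2 (score: 0)
It may not work in every case, but here you can get by with just requests and bs4; the code below reads the address from the alt attribute of the .img-responsive image: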
from bs4 import BeautifulSoup as bs
import requests
r = requests.get('https://www.hubzu.com/property/9007091467618-3632-Stokes-Drive-Sarasota-FL-34232', headers = {'User-Agent' : 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
print(soup.select_one('.img-responsive')['alt'])
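Whether the requests-only route works depends on the value being present in the HTML the server returns, before any JavaScript runs. A rough way to check that without extra dependencies is sketched below, using the standard library's `html.parser` in place of bs4. The names `ImgAltFinder` and `find_img_alt` and the sample markup are hypothetical, and the real page's markup may differ.

```python
from html.parser import HTMLParser
from typing import Optional


class ImgAltFinder(HTMLParser):
    """Record the alt text of the first <img> with a given class and an alt value."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.alt: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Stop looking once an alt value has been recorded.
        if tag != "img" or self.alt is not None:
            return
        attributes = dict(attrs)
        # The class attribute is a space-separated list of class names.
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes:
            self.alt = attributes.get("alt")


def find_img_alt(html: str, css_class: str) -> Optional[str]:
    """Return the recorded alt text, or None if no matching <img> was found."""
    finder = ImgAltFinder(css_class)
    finder.feed(html)
    return finder.alt


# Hypothetical static markup standing in for r.text from the answer above.
sample = '<div><img class="img-responsive hero" alt="3632 Stokes Drive"></div>'
print(find_img_alt(sample, "img-responsive"))
```

A `None` result on the real response would suggest the value is filled in client-side, in which case a browser-driven approach such as Selenium is the likely fallback.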