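I am trying to scrape the Trulia Estimate for a given address. Some addresses do not have a Trulia Estimate, so I want to look for the text "Trulia Estimate" first and, only if it is found, look up the value. At the moment I can't work out how to locate the Trulia Estimate text on the property page.
Here is my code so far: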
from selenium import webdriver
from selenium.webdriver.remote import webelement
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time
from bs4 import BeautifulSoup
import os
from datetime import datetime
from selenium.webdriver import ActionChains

driver = webdriver.Firefox(executable_path = 'C:\\Users\\Downloads\\geckodriver-v0.24.0-win64\\geckodriver.exe')

def get_trulia_estimate(address):
    driver.get('https://www.trulia.com/')
    print(address)
    element = (By.ID, 'homepageSearchBoxTextInput')
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).click()
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).send_keys(address)
    search_button = (By.CSS_SELECTOR, "button[data-auto-test-id='searchButton']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(search_button)).click()
    time.sleep(3)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    results = soup.find('div', {'class': 'Text__TextBase-sc-1cait9d-0 OmRik'})
    print(results)

get_trulia_estimate('693 Bluebird Canyon Drive, Laguna Beach, CA 92651')
Any suggestions would be greatly appreciated.
Answer 0 (score: 3)
A version using BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = 'https://www.trulia.com/json/search/location/?query={}&searchType=for_sale'
search_string = '693 Bluebird Canyon Drive, Laguna Beach, CA 92651'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'}
d = requests.get(url.format(search_string), headers=headers).json()
property_url = 'https://www.trulia.com' + d['url']
soup = BeautifulSoup(requests.get(property_url, headers=headers).text, 'lxml')
print(soup.select_one('h3:has(+div span:contains("Trulia Estimate"))').text)
Prints:
$1,735,031
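The CSS selector h3:has(+div span:contains("Trulia Estimate")) finds an <h3> tag whose immediately following sibling is a <div> that contains a <span> with the text "Trulia Estimate". Newer Soup Sieve releases spell the non-standard :contains pseudo-class as :-soup-contains.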
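One caveat for addresses without an estimate: select_one returns None when nothing matches, so calling .text on the result raises AttributeError. A minimal guard might look like the sketch below. Note that text_or_none is a hypothetical helper name introduced here for illustration, not part of BeautifulSoup.

```python
def text_or_none(node):
    """Return the stripped text of a BeautifulSoup node, or None when the node is missing.

    text_or_none is a hypothetical helper for illustration, not a BeautifulSoup API.
    """
    if node is None:
        # select_one() found no match, e.g. the address has no Trulia Estimate
        return None
    return node.get_text(strip=True)


# Usage with the soup object from the answer above (left commented out
# because it needs a live page):
# estimate = text_or_none(soup.select_one('h3:has(+div span:contains("Trulia Estimate"))'))
# print(estimate if estimate is not None else 'No Trulia Estimate for this address')
```

This keeps the "is there an estimate at all?" check and the value lookup in one place, which matches what the question asks for.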
Answer 1 (score: 0)
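It looks like the CSS class names are generated each time, so they cannot be relied on.
I suggest using XPath for this instead, and .text to get the text.
You may want to target the element that holds the price rather than the label. In that case, use (//div[@aria-label="Price trends are based on the Trulia Estimate"])[1]//../h3/div as the XPath.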
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time
from bs4 import BeautifulSoup
import os
from datetime import datetime
from selenium.webdriver import ActionChains

driver = webdriver.Firefox(executable_path = 'geckodriver.exe')

def get_trulia_estimate(address):
    driver.get('https://www.trulia.com/')
    print(address)
    element = (By.ID, 'homepageSearchBoxTextInput')
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).click()
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).send_keys(address)
    search_button = (By.CSS_SELECTOR, "button[data-auto-test-id='searchButton']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(search_button)).click()
    time.sleep(3)
    # Locate the label by its aria-label; raises NoSuchElementException if it is absent
    find_trulia_estimate_text = driver.find_element(By.XPATH, '(//div[@aria-label="Price trends are based on the Trulia Estimate"])[1]').text
    print(find_trulia_estimate_text)

get_trulia_estimate('693 Bluebird Canyon Drive, Laguna Beach, CA 92651')
Output:
693 Bluebird Canyon Drive, Laguna Beach, CA 92651
Trulia Estimate
If you use the XPath for the price, the output is:
693 Bluebird Canyon Drive, Laguna Beach, CA 92651
$1,735,031
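The step from the label to the price can be tried offline. The sketch below uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, instead of Selenium, and runs on a hand-written snippet. The markup is an assumption made for illustration and is not Trulia's actual page structure. The path starts at the div carrying the aria-label, steps up to its parent with .., then descends to h3/div.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the real Trulia page is more deeply nested.
SNIPPET = """
<section>
  <div data-testid="home-details-summary">
    <h3><div>$1,735,031</div></h3>
    <div aria-label="Price trends are based on the Trulia Estimate">
      <span>Trulia Estimate</span>
    </div>
  </div>
</section>
"""

LABEL = "Price trends are based on the Trulia Estimate"


def find_price(root):
    """Return the price text that sits beside the estimate label, or None if the label is absent."""
    # label div -> parent (..) -> h3 -> div
    node = root.find(f".//div[@aria-label='{LABEL}']/../h3/div")
    return node.text if node is not None else None


root = ET.fromstring(SNIPPET)
print(find_price(root))

# A listing without the estimate label: the lookup yields None instead of raising.
no_estimate = ET.fromstring("<section><div><h3><div>$500,000</div></h3></div></section>")
print(find_price(no_estimate))
```

Because the path is anchored on the label, a page that shows a price but no Trulia Estimate label gives None rather than an unrelated number.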
Answer 2 (score: 0)
If you want to try it without BeautifulSoup:
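from selenium.webdriver.common.by import By

# find_elements returns an empty list instead of raising when nothing matches
labels = driver.find_elements(By.XPATH, "//span[contains(text(),'Trulia Estimate')]")
if labels and labels[0].is_displayed():
    estimate = driver.find_element(By.XPATH, "//div[@data-testid='home-details-summary']//h3/div").text
    print(estimate)
else:
    print("Estimate is not found")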
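The estimate comes back as display text such as $1,735,031. If it is to be stored or compared as a number, it can be converted with a small helper. The sketch below is one possible approach, assuming whole-dollar amounts in that format; parse_price is a hypothetical name introduced here.

```python
import re

# Matches a dollar sign, optional whitespace, then digits with optional commas.
_PRICE = re.compile(r"\$\s*(\d[\d,]*)")


def parse_price(text):
    """Convert display text such as '$1,735,031' to an int.

    Returns None for missing input or for text that is not a plain dollar amount.
    parse_price is a hypothetical helper, not part of Selenium or BeautifulSoup.
    """
    if text is None:
        return None
    match = _PRICE.fullmatch(text.strip())
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


print(parse_price("$1,735,031"))
print(parse_price("Trulia Estimate"))
```

Returning None for unrecognized text keeps addresses without an estimate distinguishable from a price of zero.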