Redfin scraper for getting the Redfin Estimate

Posted: 2019-06-19 17:04:26

Tags: python selenium beautifulsoup

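I have a few posts related to this, but I've run into a new problem. As link1 and link2 show, the page displays the Redfin Estimate differently depending on whether the house is on the market. I have a way to get the Redfin Estimate from link1, but not from link2.

Here is the HTML from link2 that holds the Redfin Estimate I'm trying to get: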

[Screenshot of the link2 HTML; image not available]

I tried a similar approach to get the data from link1, but my code returns an empty list.

Here is my code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Selenium 4 removed find_element_by_* and the positional driver path;
# pass the chromedriver path through a Service object instead.
driver = webdriver.Chrome(service=Service('chromedriver.exe'))
driver.get('https://www.redfin.com/')


def get_redfin_estimate(address):
    # Search for the address from the Redfin home page
    search_box = driver.find_element(By.NAME, 'searchInputBox')
    search_box.send_keys(address)
    search_box.submit()
    time.sleep(3)  # crude wait for the results page to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Works for one layout: the estimate sits in <div class="statsValue">
    data = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['statsValue'])
    for element in data:
        if "$" in element.text:
            return element.text
    # Give up only after checking every candidate, not just the first one
    return "N/A"


# print(get_redfin_estimate('687 Catalina Laguna Beach, CA 92651'))

search_box = driver.find_element(By.NAME, 'searchInputBox')
search_box.send_keys('687 Catalina Laguna Beach, CA 92651')
search_box.submit()
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Attempt for the other layout, <span class="value">: this prints an empty list
data = soup.find_all(lambda tag: tag.name == 'span' and tag.get('class') == ['value'])
print(data)


driver.quit()
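One possible explanation for the empty list, assuming the real span carries more than one class (the HTML screenshot isn't available to confirm this): BeautifulSoup stores class as a list of every class on the tag, so comparing tag.get('class') == ['value'] only matches spans whose sole class is value. The sketch below uses made-up markup to show the difference; class_='value' instead matches any span whose class list contains value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real Redfin HTML may differ.
html = '<div><span class="value font-size-large">$1,234,567</span></div>'
soup = BeautifulSoup(html, "html.parser")

# "class" is parsed as a list (['value', 'font-size-large']),
# so an exact comparison with ['value'] fails when other classes are present.
exact = soup.find_all(lambda tag: tag.name == "span" and tag.get("class") == ["value"])

# class_="value" matches any span whose class list contains "value".
contains = soup.find_all("span", class_="value")

print(exact)                           # []
print([tag.text for tag in contains])  # ['$1,234,567']
```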

If anyone has suggestions on how to get the Redfin Estimate for link2, or on how I could get the Redfin Estimate for link1, please let me know.

2 answers:

Answer 0 (score: 1)

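The selector for link1 is .avm .statsValue and the one for link2 is [data-rf-test-id="avmLdpPrice"] .value. Join them with a comma (,) so that a single selector matches whichever one exists on the page: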

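import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html.parser')
# The comma makes a selector group: whichever layout is on the page matches
estimate = soup.select_one('.avm .statsValue, [data-rf-test-id="avmLdpPrice"] .value')
if estimate is not None:
    price = estimate.text
    price_numeric = re.sub("[^0-9]", "", price)  # keep digits only
    print(price)
    print(price_numeric)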
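A self-contained sketch of the same idea on two made-up HTML snippets, one per layout. Only the two selectors come from the answer; the markup, the prices and the helper name extract_estimate are hypothetical. select_one returns the first element matching any selector in the group, or None when nothing matches, so checking for None avoids an AttributeError on a page with neither layout.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for the two layouts; only the selectors
# are taken from the answer, everything else is illustrative.
on_market = '<div class="avm"><div class="statsValue">$1,250,000</div></div>'
off_market = ('<div data-rf-test-id="avmLdpPrice">'
              '<span class="value">$1,310,500</span></div>')

# A comma-separated selector group matches an element satisfying either part.
SELECTOR = '.avm .statsValue, [data-rf-test-id="avmLdpPrice"] .value'


def extract_estimate(html):
    """Hypothetical helper: return the estimate text, or None if absent."""
    tag = BeautifulSoup(html, "html.parser").select_one(SELECTOR)
    return tag.text.strip() if tag is not None else None


for html in (on_market, off_market, "<div>no estimate</div>"):
    price = extract_estimate(html)
    # Keep only digits, as in the answer; this also drops any decimal point.
    numeric = int(re.sub(r"[^0-9]", "", price)) if price else None
    print(price, numeric)
```

Because the digit-only conversion discards a decimal point, the numeric step assumes whole-dollar estimates.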

Answer 1 (score: 1)

To get the Redfin Estimate from link2, try the following code.

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get("https://www.redfin.com/CA/Laguna-Beach/687-Catalina-St-92651/home/4889627")
time.sleep(3)  # give the page time to render
data = driver.page_source
driver.quit()
soup = BeautifulSoup(data, 'html.parser')
# The value span follows the "Redfin Estimate" label span
redfinestimate = soup.find('span', class_='avmLabel').find_next('span', class_='value').text
print(redfinestimate)
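find_next searches forward through everything after the label in document order, not only its siblings, so it can reach the value span even when other tags sit in between. A minimal sketch on hypothetical markup (the helper name estimate_after_label is made up), with a None check because the chained .find(...).find_next(...) call raises AttributeError when the label is missing:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a label span, an unrelated span, then the value span.
html = """
<div>
  <span class="avmLabel">Redfin Estimate</span>
  <span class="info-icon"></span>
  <span class="value">$1,310,500</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def estimate_after_label(soup):
    """Hypothetical helper: first span.value after span.avmLabel, or None."""
    label = soup.find("span", class_="avmLabel")
    if label is None:  # label missing: different layout or page not loaded yet
        return None
    value = label.find_next("span", class_="value")
    return value.text.strip() if value is not None else None


print(estimate_after_label(soup))  # $1,310,500
```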

To get the data from link1, use the following code.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get("https://www.redfin.com/")
# find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...)
element = driver.find_element(By.ID, 'search-box-input')
element.send_keys('687 Catalina Laguna Beach, CA 92651')
# Wait up to 20 s for the search button to become clickable, then click it
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='inline-block SearchButton clickable float-right']"))).click()
time.sleep(3)  # let the property page load
data = driver.page_source
driver.quit()
soup = BeautifulSoup(data, 'html.parser')
redfinestimate = soup.find('span', class_='avmLabel').find_next('span', class_='value').text
print(redfinestimate)
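The WebDriverWait(driver, 20).until(...) call polls a condition until it returns something truthy or 20 seconds pass. The stdlib-only sketch below imitates that polling loop so it can run without a browser. wait_until and the fake condition are hypothetical; Selenium's real implementation also swallows NoSuchElementException between polls and raises TimeoutException rather than TimeoutError. Waiting on the estimate element itself (for example with EC.presence_of_element_located) is one possible replacement for the fixed time.sleep(3).

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Hypothetical helper: call condition() until it returns a truthy value.

    A simplified illustration of the polling idea behind
    WebDriverWait(...).until(...); not Selenium's actual code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Fake condition standing in for "estimate element is present":
# it becomes truthy on the third poll.
calls = {"n": 0}


def estimate_ready():
    calls["n"] += 1
    return "$1,310,500" if calls["n"] >= 3 else None


print(wait_until(estimate_ready, timeout=5, poll_frequency=0.01))  # $1,310,500
```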