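I am trying to get data from the web page http://www.har.com/4311-Childress-St/sale_40763013. It has the house's address, price, and other information. I tried to get all of the data but only managed to retrieve the address, city, and zip code. My code is below. How can I get the other information, such as county, stories, etc.?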
def getHarData(driver):
    driver.get("http://www.har.com/4311-Childress-St/sale_40763013")
    try:
        address = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "heading_22")))
        cityzip = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "sub_heading")))
        # The price lookup is disabled, so price is not printed below.
        #price = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "heading_22 pb15")))
        print(address.text + ", " + cityzip.text)
    except TimeoutException:
        print("data not found")
Answer 0 (score: 1)
If you only need certain specific fields, I would create a nice reusable function that gets a field's value by its name/label:
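from selenium.webdriver.common.by import By

def get_field_value(driver, field):
    # Labels on the page look like "County:"; find_element(By.XPATH, ...) replaces the older find_element_by_xpath helper.
    field = field.capitalize() + ":"
    return driver.find_element(By.XPATH, "//div[@class = 'dc_label' and . = '%s']/following-sibling::div[@class = 'dc_value']" % field).text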
Usage:
county = get_field_value(driver, "county")
print(county) # prints "Harris County"
Complete working sample:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def get_field_value(driver, field):
    # Labels on the page look like "County:", so build that form from the field name.
    field = field.capitalize() + ":"
    return driver.find_element(By.XPATH, "//div[@class = 'dc_label' and . = '%s']/following-sibling::div[@class = 'dc_value']" % field).text

driver = webdriver.Firefox()
driver.get("http://www.har.com/4311-Childress-St/sale_40763013")
# wait for the page to load
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "dc_title")))
county = get_field_value(driver, "county")
print(county)
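
Since the question asks for several fields (county, stories, and so on), the same lookup could be wrapped to fetch a list of labels in one call. The following is only a sketch, not part of the original answer: the helper name get_field_values and the constant FIELD_XPATH are hypothetical, and it assumes every field on the page uses the same dc_label / dc_value markup as "County:". It uses find_elements, which returns an empty list when nothing matches, so a missing label maps to None instead of raising.

```python
# Hypothetical helper (not from the original answer): collect several
# labelled fields at once. "xpath" is the string value of selenium's By.XPATH.
FIELD_XPATH = (
    "//div[@class = 'dc_label' and . = '%s']"
    "/following-sibling::div[@class = 'dc_value']"
)


def get_field_values(driver, fields):
    """Return {field: text or None} for each requested field name."""
    values = {}
    for field in fields:
        # Assumption: labels are rendered like "County:" on the page.
        label = field.capitalize() + ":"
        # find_elements returns [] rather than raising when nothing matches.
        matches = driver.find_elements("xpath", FIELD_XPATH % label)
        values[field] = matches[0].text if matches else None
    return values
```

With the driver from the sample above, this could be called as get_field_values(driver, ["county", "stories"]). Note that str.capitalize() lowercases everything after the first character, so a multi-word label with inner capitals would need to be passed through unchanged instead.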