BeautifulSoup can't find a table that has both a class and an id

Time: 2017-09-26 19:17:03

Tags: selenium html-table beautifulsoup

from bs4 import BeautifulSoup
from selenium import webdriver
import urllib2
import time

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.zillow.com/homes/recently_sold/Culver-City-CA/house,condo,apartment_duplex,townhouse_type/20432063_zpid/51617_rid/12m_days/globalrelevanceex_sort/34.048605,-118.340178,33.963223,-118.47785_rect/12_zm/")
time.sleep(3)
driver.find_element_by_class_name("collapsible-header").click()
soup = BeautifulSoup(driver.page_source)
region = soup.find("div",{"id":"hdp-price-history"})
table = region.find('table',{'class':'zsg-content-component'})
print table  

I need to scrape the price history table, but the result is always None.

1 answer:

Answer 0 (score: 0)

Here is a script that should give you the result you are after.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get("https://www.zillow.com/homes/recently_sold/Culver-City-CA/house,condo,apartment_duplex,townhouse_type/20432063_zpid/51617_rid/12m_days/globalrelevanceex_sort/34.048605,-118.340178,33.963223,-118.47785_rect/12_zm/")
# Click the collapsible header, then pause before reading the page source.
# find_element(By.CLASS_NAME, ...) replaces find_element_by_class_name,
# which newer Selenium releases no longer provide.
driver.find_element(By.CLASS_NAME, "collapsible-header").click()
time.sleep(5)
tree = BeautifulSoup(driver.page_source, "lxml")
driver.quit()
# Take the first table with class "zsg-table" and collect the cell text row by row.
table_tag = tree.select("table.zsg-table")[0]
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in table_tag.select("tr")]

for data in tab_data:
    print(' '.join(data))

Partial output:

Date Event Price Agents 
06/16/17 Sold $940,000-0.9% K. Miller, A. Masket 
06/14/17 Price change $949,000-1.0%  
05/08/17 Pending sale $959,000  
04/17/17 Price change $959,000+1.1%  
02/27/17 Pending sale $949,000 
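
The row-extraction step can also be tried without a browser. The following is only a minimal sketch: SAMPLE_HTML is hand-written, hypothetical markup standing in for driver.page_source (the real Zillow page is different), extract_rows is an illustrative helper name, and the built-in "html.parser" is used in place of "lxml" so that no extra parser is needed. The last lines show that find() returns None when the requested class is not on the table, which is one possible reason for the None in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source; the real page differs.
SAMPLE_HTML = """
<div id="hdp-price-history">
  <table class="zsg-table">
    <tr><th>Date</th><th>Event</th><th>Price</th></tr>
    <tr><td>06/16/17</td><td>Sold</td><td>$940,000</td></tr>
    <tr><td>05/08/17</td><td>Pending sale</td><td>$959,000</td></tr>
  </table>
</div>
"""


def extract_rows(html):
    """Return the first table.zsg-table as a list of rows of cell text."""
    tree = BeautifulSoup(html, "html.parser")
    tables = tree.select("table.zsg-table")
    if not tables:
        # No matching table: return an empty list instead of raising IndexError.
        return []
    return [[cell.get_text(strip=True) for cell in row.select("th,td")]
            for row in tables[0].select("tr")]


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(" ".join(row))

# Looking up a class the table does not carry makes find() return None.
region = BeautifulSoup(SAMPLE_HTML, "html.parser").find(
    "div", {"id": "hdp-price-history"})
missing = region.find("table", {"class": "zsg-content-component"})
print(missing)
```

Checking for an empty result before indexing with [0] keeps the script from failing with an IndexError when the table has not been rendered yet.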

If it serves your purpose, don't forget to mark it as the accepted answer.