Can't find a class in the HTML

Date: 2018-07-17 11:05:56

Tags: python html web web-scraping web-crawler

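I am learning web scraping and ran into a problem that is (I think) HTML-related.

I found this freelancer project to use as study material: I have to extract the "Store Name", "Address", etc. for each of the 24 pawn shops listed.

My problem is that I can't find the shops in the HTML. This call does not return them: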

content = page_soup.findAll("div", {"class":"list list-unstyled"})

.......

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

my_url = 'https://www.thenpa.com/Find-A-Pawnbroker.aspx'
#opening up the connection, grabbing the page
uClient = urlopen(my_url)
#offloads the content
page_html = uClient.read()

uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#grabs each pawnbroker
content = page_soup.findAll("div", {"class":"list list-unstyled"})

The element should look like this:

<div class="fab-loc-list"> == $0

Screenshot of HTML element inspection

But it is not in the HTML I get back.
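One way to narrow this down is to check whether the class seen in the browser inspector exists at all in the HTML the server returns. If it does not, the list is probably added after page load (an assumption here, not something confirmed for this site). Below is a minimal sketch using only the standard library's `html.parser`. The two HTML strings are made-up stand-ins for the raw and the rendered page, and `ClassCounter` and `count_class` are hypothetical helper names.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    counter = ClassCounter(class_name)
    counter.feed(html)
    return counter.count


# Hypothetical snippets standing in for the two versions of the page.
raw_html = '<div id="results"></div><script src="app.js"></script>'
rendered_html = (
    '<div class="fab-loc-list">'
    '<ul class="list list-unstyled"><li>Shop</li></ul>'
    '</div>'
)

print(count_class(raw_html, "fab-loc-list"))       # 0
print(count_class(rendered_html, "fab-loc-list"))  # 1
```

Passing the decoded `page_html` from the code above to `count_class` would show whether `urlopen` ever receives the element. A count of 0 would suggest that changing the selector will not help and that the page has to be rendered first.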

1 Answer:

Answer 0 (score: -1)

Try this:

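from selenium import webdriver
from bs4 import BeautifulSoup as soup
from pyvirtualdisplay import Display
import time

# path to the local chromedriver binary
chrome_path = "/home/intellus/python_code/chromedriver"
# start a display for the browser (pyvirtualdisplay)
display = Display(visible=1, size=(2600,720))
display.start()
driver = webdriver.Chrome(chrome_path)
my_url = 'https://www.thenpa.com/Find-A-Pawnbroker.aspx'

# load the page in a real browser and wait for its content to appear
driver.get(my_url)
time.sleep(10)

# parse the HTML as the browser sees it
page_html = driver.page_source
page_soup = soup(page_html, "html.parser")
# the shops are the <li> items inside <ul class="list list-unstyled">
content = page_soup.find("ul", {"class":"list list-unstyled"}).findAll("li")

print(len(content))

# clean up the browser and the display
driver.quit()
display.stop()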
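The fixed `time.sleep(10)` above can be replaced with an explicit wait that returns as soon as the list items exist. This is a rough sketch, not a tested scraper for this site. It assumes Selenium 4 with a Chrome driver that Selenium can locate on its own, and it assumes the shops live in the same `ul` with classes `list list-unstyled` that the code above targets. The helper names `count_shops` and `fetch_rendered_html` are made up for illustration.

```python
from bs4 import BeautifulSoup


def count_shops(page_html):
    """Count <li> entries inside the first <ul class="list list-unstyled">."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    # CSS selector: a <ul> carrying both classes, in any order
    shop_list = page_soup.select_one("ul.list.list-unstyled")
    if shop_list is None:
        return 0
    return len(shop_list.find_all("li"))


def fetch_rendered_html(url, timeout=30):
    """Load `url` in Chrome and return the HTML once list items are present."""
    # Imported here so count_shops() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: a usable chromedriver is available
    try:
        driver.get(url)
        # Block until at least one list item exists, up to `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "ul.list.list-unstyled li")
            )
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Usage (needs Chrome and network access):
# html = fetch_rendered_html("https://www.thenpa.com/Find-A-Pawnbroker.aspx")
# print(count_shops(html))
```

Keeping the parsing in its own function means it can be checked against a saved copy of the page without launching a browser each time.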