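I'm learning web scraping and ran into a problem that I think is related to HTML.
I found this freelance project to use as practice material: I have to extract the "shop name", "address", and so on for each of these 24 pawnshops.
My problem is that I can't find all of the shops in the HTML: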
content = page_soup.findAll("div", {"class":"list list-unstyled"})
.......
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
my_url = 'https://www.thenpa.com/Find-A-Pawnbroker.aspx'
#opening up the connection, grabbing the page
uClient = urlopen(my_url)
#offloads the content
page_html = uClient.read()
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#grabs each pawnbroker
content = page_soup.findAll("div", {"class":"list list-unstyled"})
It should look like this:
<div class="fab-loc-list"> == $0
Screenshot of HTML element inspection
But it doesn't.
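One way to narrow this down is to check which tags in the downloaded HTML actually carry a given class, since the element shown in the browser inspector may differ from what `urlopen` receives. The following is a minimal sketch using only the standard library; the helper name `tags_with_class` and the `sample` markup are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains every wanted class name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted.split())
        self.counts = {}  # tag name -> number of matching elements

    def handle_starttag(self, tag, attrs):
        # A class attribute may be missing or empty; treat both as "no classes".
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted and self.wanted <= classes:
            self.counts[tag] = self.counts.get(tag, 0) + 1


def tags_with_class(html, wanted):
    """Return {tag: count} for elements carrying all class names in `wanted`."""
    parser = ClassCounter(wanted)
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical markup, only to show the output shape.
sample = '<div class="fab-loc-list"><ul class="list list-unstyled"><li>A</li></ul></div>'
print(tags_with_class(sample, "list list-unstyled"))
print(tags_with_class(sample, "fab-loc-list"))
```

Passing `page_html.decode("utf-8", errors="replace")` from the code above instead of `sample` would show whether the class is present at all in the raw response and, if so, on which tag. An empty result would suggest the list is added after the initial page load.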
Answer 0 (score: -1)
Try this:
from selenium import webdriver
from bs4 import BeautifulSoup as soup
from pyvirtualdisplay import Display
import time
chrome_path = "/home/intellus/python_code/chromedriver"
display = Display(visible=1, size=(2600,720))
display.start()
driver = webdriver.Chrome(chrome_path)
my_url = 'https://www.thenpa.com/Find-A-Pawnbroker.aspx'
driver.get(my_url)
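# wait for the page to finish loading before reading its source
time.sleep(10)
page_html = driver.page_source
# release the browser and the virtual display
driver.quit()
display.stop()
page_soup = soup(page_html, "html.parser")
content = page_soup.find("ul", {"class":"list list-unstyled"}).findAll("li")
print(len(content))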
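The fixed `time.sleep(10)` either waits too long or not long enough, depending on how fast the page loads. A possible alternative is to poll until the list shows up. The sketch below is a generic helper, not part of Selenium (which ships its own `WebDriverWait` for this purpose); `wait_until` and `list_is_present` are hypothetical names, and the page snapshots are stand-ins so the example runs without a browser.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page whose list only appears on the third check.
_snapshots = iter([
    "<html></html>",
    "<html></html>",
    '<ul class="list list-unstyled"><li>x</li></ul>',
])


def list_is_present():
    """Return the page source once it contains the list, otherwise None."""
    html = next(_snapshots)
    return html if 'class="list list-unstyled"' in html else None


ready_html = wait_until(list_is_present, timeout=5.0, interval=0.01)
print(len(ready_html) > 0)
```

With a real driver, the predicate could instead read `driver.page_source` and return it once the class string appears, after which the result is passed to BeautifulSoup as before.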