Question

I am trying to use find_all() on the HTML of the following page:

http://www.simon.com/mall

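Following advice from other threads, I ran the link through the validator below and it reported errors, but I'm not sure how those errors affect what I'm trying to do in Beautiful Soup.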

https://validator.w3.org/

Here is my code:

from requests import get

url = 'http://www.simon.com/mall'
response = get(url)

from bs4 import BeautifulSoup

html = BeautifulSoup(response.text, 'html5lib')
mall_list = html.find_all('div', class_ = 'col-xl-4 col-md-6 ')

print(type(mall_list))
print(len(mall_list))

The result is:

"C:\Program Files\Anaconda3\python.exe" C:/Users/Chris/PycharmProjects/IT485/src/GetMalls.py
<class 'bs4.element.ResultSet'>
0

Process finished with exit code 0

I know there are hundreds of these divs in the HTML. Why am I not getting any matches?

Answer 1

I sometimes use BeautifulSoup as well. For example:

html = BeautifulSoup.BeautifulSoup(response.text)
mall_list = html.html.body.findAll('div', attrs={"class": "col-xl-4 col-md-6 "})

Give it a try. Good luck!

Answer 2

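Your code looks fine. However, when I visit the simon.com/mall link and inspect it with Chrome DevTools, there don't appear to be any instances of the class 'col-xl-4 col-md-6'.

Try testing your code with 'col-xl-2' instead; you should see some results.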
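To illustrate how the class_ argument behaves when it is given a multi-class string, here is a minimal sketch. The HTML snippet is made up for the example and is not the real simon.com markup. As far as I can tell, Beautiful Soup splits the class attribute on whitespace, so a search string with a trailing space never equals the stored value, and an exact multi-class string only matches when the classes appear in the same order.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; not the real simon.com page.
SAMPLE = """
<div class="col-xl-4 col-md-6 ">A</div>
<div class="col-md-6 col-xl-4">B</div>
<div class="col-xl-2">C</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Search string with a trailing space, as in the question.
with_space = soup.find_all("div", class_="col-xl-4 col-md-6 ")

# Exact multi-class string: order-sensitive match on the whole value.
exact = soup.find_all("div", class_="col-xl-4 col-md-6")

# CSS selector: any div carrying both classes, in any order.
both = soup.select("div.col-xl-4.col-md-6")

print(len(with_space), len(exact), len(both))
```

If the goal is "every div that has both classes", the CSS selector form is usually the safer choice, since it does not depend on class order or stray whitespace.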

Answer 3

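I assume you are trying to parse the titles and locations of the different items on that page (as mentioned in the script). The problem is that the page's content is generated dynamically, so you can't capture it with requests. Instead, you need a browser simulator such as Selenium, which is what I used in the code below. Give it a try: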

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get('http://www.simon.com/mall')
time.sleep(3)

soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

for item in soup.find_all(class_="mall-list-item-text"):
    name = item.find_all(class_='mall-list-item-name')[0].text
    location = item.find_all(class_='mall-list-item-location')[0].text
    print(name,location)

Result:

ABQ Uptown Albuquerque, NM
Albertville Premium Outlets® Albertville, MN
Allen Premium Outlets® Allen, TX
Anchorage 5th Avenue Mall Anchorage, AK
Apple Blossom Mall Winchester, VA
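Before reaching for a browser, one way to check whether content is generated dynamically is to count the target elements in the raw HTML that requests returns. The sketch below uses a hypothetical helper, count_in_static_html, and two made-up snippets rather than the real page; only the class name is taken from the code above.

```python
from bs4 import BeautifulSoup


def count_in_static_html(html_text, css_class):
    """Count elements carrying css_class in already-downloaded HTML."""
    soup = BeautifulSoup(html_text, "html.parser")
    return len(soup.find_all(class_=css_class))


# Made-up examples: one page with the data in the markup,
# one whose content would only be built by JavaScript in the browser.
static_page = (
    '<div class="mall-list-item-text">'
    '<span class="mall-list-item-name">ABQ Uptown</span></div>'
)
js_page = '<div id="app"></div><script>/* rendered client-side */</script>'

static_count = count_in_static_html(static_page, "mall-list-item-name")
js_count = count_in_static_html(js_page, "mall-list-item-name")
print(static_count, js_count)
```

With a real page you would pass response.text from requests. A count of zero for elements that are visible in the browser suggests the content is rendered client-side and that something like Selenium is needed.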

Python Beautiful Soup find_all()
