Question

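Using either of the code sections below, not all of the divs are extracted from the webpage. Both return None (NoneType) for content. This happens for {"id": "image_wrap"} and for other ids too, which suggests that the full HTML is not being extracted. (I have written the data to a .html document to check.)

However, other pages such as http://www.chictopia.com/photo/show/390693 download the full HTML successfully. Only some URLs do this; others work perfectly fine. It does not appear to be a JavaScript issue, because the same thing happens when using selenium.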

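import bs4
import urllib3

url = "http://www.chictopia.com/photo/show/390695"
http = urllib3.PoolManager()
# Fetch the page; response.data holds the raw response body as bytes
response = http.request('GET', url)
html = response.data
# Parse with the lxml parser and look up the div by its id
soup = bs4.BeautifulSoup(html, 'lxml')
content = soup.find("div", {"id": "image_wrap"})
print(content)  # prints None for this URL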
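
The check mentioned above (writing the data to a .html file and looking at it) can also be done in code. The following is only a rough sketch: `inspect_raw_html` is a hypothetical helper, not part of urllib3 or bs4, and the `sample` bytes stand in for `response.data`. It looks at the raw payload before any parsing, so it may help tell apart "the markup never arrived" from "the markup arrived but the parser dropped it".

```python
def inspect_raw_html(raw, element_id):
    """Report basic facts about a raw HTML payload before any parsing.

    `raw` may be bytes (e.g. response.data) or str (e.g. browser.page_source).
    """
    if isinstance(raw, bytes):
        # Assumption: the page is UTF-8; undecodable bytes are replaced, not fatal
        text = raw.decode("utf-8", errors="replace")
    else:
        text = raw
    return {
        "length": len(text),
        # Plain substring check for the id attribute, in either quote style
        "id_in_raw": f'id="{element_id}"' in text or f"id='{element_id}'" in text,
        # A missing closing tag hints at a truncated download
        "has_closing_html": "</html>" in text.lower(),
    }


# Stand-in for response.data from the snippet above
sample = b'<html><body><div id="image_wrap">photo</div></body></html>'
report = inspect_raw_html(sample, "image_wrap")
print(report)
```

If `id_in_raw` is True for the failing URL while `soup.find` still returns None, the problem is more likely in the parsing step than in the download.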

Below is the version that uses selenium to load the page dynamically.

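from selenium import webdriver
import bs4

url = "http://www.chictopia.com/photo/show/390695"
browser = webdriver.Chrome()
browser.get(url)
# page_source is the HTML of the page as the browser currently sees it
html = browser.page_source
browser.quit()  # close the browser once the source has been captured
soup = bs4.BeautifulSoup(html, 'lxml')
content = soup.find("div", {"id": "image_wrap"})
print(content)  # also prints None for this URL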
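
Both snippets pass the markup to the same 'lxml' parser, so the parser is one thing they have in common. Beautiful Soup's parsers can build different trees from invalid markup, so a possible way to rule this in or out is to parse the same HTML with several parsers and compare. This is a sketch of a hypothesis to test, not a confirmed cause; `find_id_per_parser` is a hypothetical helper, `sample` stands in for the real page source, and 'html5lib' is only used if it happens to be installed.

```python
import bs4


def find_id_per_parser(html, element_id, parsers=("lxml", "html.parser", "html5lib")):
    """Parse the same markup with several parsers and record whether each finds the id."""
    results = {}
    for name in parsers:
        try:
            soup = bs4.BeautifulSoup(html, name)
        except bs4.FeatureNotFound:
            # The requested parser library is not installed
            results[name] = None
            continue
        results[name] = soup.find("div", {"id": element_id}) is not None
    return results


# Stand-in for the html variable from either snippet above
sample = '<html><body><div id="image_wrap">photo</div></body></html>'
results = find_id_per_parser(sample, "image_wrap")
print(results)
```

If the results differ between parsers for the failing URL, the page's markup is probably malformed in a way that one parser handles differently from the others.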

Why does this only happen for certain URLs?

BeautifulSoup & Selenium - not extracting the full HTML from a webpage

0 Answers: