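When I inspect the page in the browser, I can clearly see the full web content. But when I run the script below, some of the page's details are missing. In the inspector I see "#document" elements, and these are absent from the script's output. How can I view the contents of the #document elements, or extract them with a script?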
from bs4 import BeautifulSoup
import requests
response = requests.get('http://123.123.123.123/')
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())
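Answer 0 (score: 2)
You need additional requests to fetch the content of each frame page. The "#document" nodes shown in the browser's inspector are the separate documents loaded inside frame elements, and a single requests.get call only returns the HTML of the top-level page: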
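from urllib.parse import urljoin  # Python 2: from urlparse import urljoin
from bs4 import BeautifulSoup
import requests

BASE_URL = 'http://123.123.123.123/'

# Reuse one session so cookies and connections carry over to the frame requests.
with requests.Session() as session:
    response = session.get(BASE_URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each <frame> points to a separate document; fetch and parse it on its own.
    for frame in soup.select("frameset frame"):
        # The src attribute may be relative, so resolve it against the base URL.
        frame_url = urljoin(BASE_URL, frame["src"])
        response = session.get(frame_url)
        frame_soup = BeautifulSoup(response.content, 'html.parser')
        print(frame_soup.prettify())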
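The selector above only matches frame tags inside a frameset. A "#document" node also appears under iframe tags, so if the page uses those, a slightly more general helper may be useful. The following is only a sketch: the function name frame_urls and the example.com URLs are made up for illustration, and it assumes the frames are present in the static HTML rather than injected by JavaScript (in that case requests will not see them and a browser automation tool would be needed).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def frame_urls(html, base_url):
    """Return absolute URLs for every <frame> and <iframe> that has a src.

    Hypothetical helper, not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    # src=True skips tags that have no src attribute at all.
    tags = soup.find_all(["frame", "iframe"], src=True)
    # Resolve relative src values against the page's own URL.
    return [urljoin(base_url, tag["src"]) for tag in tags]
```

Each URL it returns can then be passed to session.get and parsed with BeautifulSoup exactly as in the loop above. Frames can be nested, so the same helper may need to be applied again to each fetched frame document.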