Unable to extract #document from an HTML file with Python web scraping

Date: 2017-03-22 13:02:01

Tags: python html web-scraping beautifulsoup

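When I inspect the page in the browser, I can clearly see the exact web content. But when I run the script below, some of the page's details are missing. In the browser I can see "#document" elements, and these are absent from the script's output. How can I view the contents of the #document elements, or extract them with a script?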

from bs4 import BeautifulSoup
import requests

# Fetch the top-level page only; frame contents are not part of this response.
response = requests.get('http://123.123.123.123/')
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())


1 answer:

Answer 0 (score: 2)

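The "#document" node that the browser inspector shows under a frame is a separate document that the browser loads from the frame's src URL, so it is not part of the top-level HTML response. You need additional requests to fetch the content of each frame page: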

# Python 3 location; on Python 2 use: from urlparse import urljoin
from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

BASE_URL = 'http://123.123.123.123/'

with requests.Session() as session:
    response = session.get(BASE_URL)
    soup = BeautifulSoup(response.content, 'html.parser')

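    # Each <frame> points to a separate document that must be fetched on its own.
    for frame in soup.select("frameset frame"):
        # Resolve a possibly relative src against the base URL.
        frame_url = urljoin(BASE_URL, frame["src"])

        response = session.get(frame_url)
        frame_soup = BeautifulSoup(response.content, 'html.parser')
        print(frame_soup.prettify())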
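
If the page embeds content with iframe tags rather than a frameset, the same idea should apply: collect each src and request it separately. The following is a minimal offline sketch of just the URL-collection step, written for Python 3. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs. The names FrameSrcParser, frame_urls and SAMPLE_HTML are hypothetical, and the sample markup is made up for illustration; it is not the real page from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(html, base_url):
    """Return absolute URLs for all frame/iframe sources found in html."""
    parser = FrameSrcParser()
    parser.feed(html)
    parser.close()
    # Relative src values are resolved against the page's base URL.
    return [urljoin(base_url, src) for src in parser.sources]


# Made-up markup for illustration only (an assumption, not the real page).
SAMPLE_HTML = """
<html>
  <frameset cols="20%,80%">
    <frame src="menu.html">
    <frame src="/status/main.html">
  </frameset>
</html>
"""

if __name__ == "__main__":
    for url in frame_urls(SAMPLE_HTML, "http://123.123.123.123/"):
        print(url)
```

Each URL returned by frame_urls can then be passed to session.get() as in the answer above. Content that a frame builds with JavaScript after loading would still not appear in the response body.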