Python Selenium: getting the page source of an XHTML page

Date: 2017-04-06 12:57:14

Tags: python python-3.x jsp selenium pdf

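I would like to know whether there is a way to print the entire HTML source of a page. I am trying to verify some text in a popup window that displays a PDF/XHTML file, but I cannot get at it. My plan is to fetch the whole page source and check that the text is there. However, .page_source seems to give me only the URL and a description, and I want every line of the markup.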

1 answer:

Answer 0: (score: 0)

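One approach that works is to have Selenium locate the root element of the page (the html tag) and read its outerHTML attribute, which returns the markup of that element and everything inside it.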

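from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://stackoverflow.com/")
html = driver.find_element(By.TAG_NAME, "html").get_attribute("outerHTML")  # markup of the whole document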

Documentation

Example output:

<html webdriver="true"><head>

<title>Stack Overflow</title>
    <link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d">
    <link rel="apple-touch-icon image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">
    <link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">
    <meta name="twitter:card" content="summary">
    <meta name="twitter:domain" content="stackoverflow.com">
    <meta property="og:type" content="website">
    <meta name="description" content="Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers">

    <meta property="og:image" itemprop="image primaryImageOfPage" content="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded">
    <meta name="twitter:title" property="og:title" itemprop="title name" content="Stack Overflow">
    <meta name="twitter:description" property="og:description" itemprop="description" content="Q&amp;A for professional and enthusiast programmers">
    <meta property="og:url" content="http://stackoverflow.com/">

......
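
Since the goal in the question is to verify that some text is present, the captured markup can then be searched. The following is only a minimal sketch under stated assumptions: the helper names text_content and markup_contains are hypothetical (not part of Selenium), the sample string stands in for the outerHTML value a real browser session would return, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects every text node of a document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_content(markup):
    """Return the text nodes of `markup`, joined with collapsed whitespace.

    Hypothetical helper: text inside script/style tags is included too.
    """
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def markup_contains(markup, expected):
    """True if `expected` occurs in the text content of `markup`."""
    return " ".join(expected.split()) in text_content(markup)


# Stand-in for the outerHTML string returned by Selenium (assumed sample).
sample = (
    "<html><head><title>Stack Overflow</title></head>"
    "<body><p>Q&amp;A for  programmers</p></body></html>"
)
print(markup_contains(sample, "Stack Overflow"))
print(markup_contains(sample, "Q&A for programmers"))
```

In a real session, the string read from the html element's outerHTML attribute would be passed in place of sample. Comparing against the extracted text rather than the raw markup means entities such as &amp; and runs of whitespace do not cause false mismatches, but the check can only see text that is actually present in the DOM.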