Why is a0: always prepended to every tag in the Selenium page source?

Asked: 2018-03-03 05:43:51

Tags: java selenium selenium-webdriver

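I am using Selenium WebDriver in Java to get the page source of the URL https://www.kapanlagi.com/ so that I can automate some actions on the page. Unfortunately, when I call driver.getPageSource(); I do get the source, but every tag has a0: prepended to it. A sample of the returned source: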

<a0:meta charset="utf-8" />
<a0:meta content="no-cache" http-equiv="Cache-Control" />
<a0:meta content="no-cache" http-equiv="Pragma" />
<a0:meta content="Tue, 22 Jan 2013 02:30:01 GMT" http-equiv="Expires" />
<a0:meta content="900" http-equiv="Refresh" />
<a0:meta content="KapanLagi.com, situs entertainment terbesar di Indonesia. Berita, gosip, resensi film &amp; musik, foto, game, kartu ucapan, dan banyak lagi. Kalau bukan sekarang, Kapan Lagi?" name="description" />
<a0:meta content="berita, infotainment, gossip, gosip, artis, artis indonesia, indonesia, game, entertainment, film, bioskop, resensi, musik, zodiac, kartu ucapan, kartu, kartu lebaran" name="keywords" />
<a0:meta content="1048538409" property="fb:admins" />
<a0:meta content="166048096750307" property="fb:app_id" />

<a0:link href="/manifest.json" rel="manifest" />
<a0:link rel="shortcut icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/favicon.ico" />
<a0:link href="https://cdns.klimg.com/" rel="dns-prefetch" />
<a0:link href="/feed/entertainment.xml" title="KapanLagi.com Atom Feed" type="application/atom+xml" rel="alternate" />
<a0:link href="https://m.kapanlagi.com/" media="only screen and (max-width: 640px)" rel="alternate" />  
<a0:link href="https://www.kapanlagi.com/" rel="canonical" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon.png" rel="apple-touch-icon" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-precomposed.png" rel="apple-touch-icon" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-114x114-precomposed.png" rel="apple-touch-icon" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-120x120-precomposed.png" rel="apple-touch-icon" />    
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-152x152-precomposed.png" rel="apple-touch-icon" />
<a0:title>Kalau Bukan Sekarang, Kapan Lagi? - KapanLagi.com</a0:title>

1 answer:

Answer 0 (score: 0)

You haven't mentioned the versions of the binaries you are using. With Selenium Java client v3.9.1, GeckoDriver v0.19.1, and Firefox Quantum v58.0.2 (64-bit), I get a correct page source without any a0: prefix, as shown below:

  • Code block:

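    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    // Point Selenium at the local geckodriver binary, then load the page and print its source.
    System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
    WebDriver driver = new FirefoxDriver();
    driver.get("https://www.kapanlagi.com/");
    System.out.println(driver.getPageSource());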
    
  • Console output:

     1520062739574  geckodriver INFO    geckodriver 0.19.1
     1520062739607  geckodriver INFO    Listening on 127.0.0.1:12306
     1520062740588  mozrunner::runner   INFO    Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "-marionette" "-profile" "C:\\Users\\ATECHM~1\\AppData\\Local\\Temp\\rust_mozprofile.R5Wv9lx9f5K5"
     1520062744680  Marionette  INFO    Enabled via --marionette
     1520062762429  Marionette  INFO    Listening on port 2481
     1520062763089  Marionette  WARN    TLS certificate errors will be ignored for this session
     Mar 03, 2018 1:09:23 PM org.openqa.selenium.remote.ProtocolHandshake createSession
     INFO: Detected dialect: W3C
     <html xmlns="https://www.w3.org/1999/xhtml" xml:lang="en" class="firefox" lang="en"><head>
        <meta charset="utf-8">
        <meta http-equiv="Cache-Control" content="no-cache">
        <meta http-equiv="Pragma" content="no-cache">
        <meta http-equiv="Expires" content="Tue, 22 Jan 2013 02:30:01 GMT">
        <meta http-equiv="Refresh" content="900">
        <meta name="description" content="KapanLagi.com, situs entertainment terbesar di Indonesia. Berita, gosip, resensi film &amp; musik, foto, game, kartu ucapan, dan banyak lagi. Kalau bukan sekarang, Kapan Lagi?">
        <meta name="keywords" content="berita, infotainment, gossip, gosip, artis, artis indonesia, indonesia, game, entertainment, film, bioskop, resensi, musik, zodiac, kartu ucapan, kartu, kartu lebaran">
        <meta property="fb:admins" content="1048538409">
        <meta property="fb:app_id" content="166048096750307">
    
         <link rel="manifest" href="/manifest.json">
        <link href="https://cdns.klimg.com/kapanlagi.com/v5/i/favicon.ico" rel="shortcut icon">
        <link rel="dns-prefetch" href="https://cdns.klimg.com/">
        <link rel="alternate" type="application/atom+xml" title="KapanLagi.com Atom Feed" href="/feed/entertainment.xml">
        <link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.kapanlagi.com/">   
        <link rel="canonical" href="https://www.kapanlagi.com/">
        <link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon.png">
        <link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-precomposed.png">
        <link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-114x114-precomposed.png">
        <link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-120x120-precomposed.png"> 
        <link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-152x152-precomposed.png">
        <title>Kalau Bukan Sekarang, Kapan Lagi? - KapanLagi.com</title>
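
If matching these versions does not help, one possible stopgap is to strip the prefix from the string returned by getPageSource() before processing it. The following is only a sketch under assumptions: the class name PrefixStripper and the method stripPrefix are hypothetical, the prefix is assumed to be exactly a0 as in the question's output, and this treats the symptom rather than whatever causes the prefix.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Selenium: removes the "a0:" prefix
// from opening and closing tags in a page-source string.
class PrefixStripper {

    // Matches "<a0:" and "</a0:"; group 1 captures the optional slash.
    // The prefix name "a0" is an assumption taken from the question's output.
    private static final Pattern PREFIXED_TAG = Pattern.compile("<(/?)a0:");

    static String stripPrefix(String pageSource) {
        // Put back "<" plus the captured slash (if any), dropping "a0:".
        return PREFIXED_TAG.matcher(pageSource).replaceAll("<$1");
    }

    public static void main(String[] args) {
        // Sample line copied from the question's page source.
        String sample = "<a0:title>Kalau Bukan Sekarang, Kapan Lagi? - KapanLagi.com</a0:title>";
        System.out.println(stripPrefix(sample));
    }
}
```

With a live session this would be called as PrefixStripper.stripPrefix(driver.getPageSource()). Note that the pattern is purely textual, so it would also rewrite any literal "<a0:" that happens to appear inside script text or attribute values.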