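I am using Selenium WebDriver in Java to fetch the page source of https://www.kapanlagi.com/ so that I can automate some actions on the page. Unfortunately, when I call driver.getPageSource(), I do get the source, but every tag has an a0: prefix attached, as shown below. A sample of the returned source: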
<a0:meta charset="utf-8" />
<a0:meta content="no-cache" http-equiv="Cache-Control" />
<a0:meta content="no-cache" http-equiv="Pragma" />
<a0:meta content="Tue, 22 Jan 2013 02:30:01 GMT" http-equiv="Expires" />
<a0:meta content="900" http-equiv="Refresh" />
<a0:meta content="KapanLagi.com, situs entertainment terbesar di Indonesia. Berita, gosip, resensi film & musik, foto, game, kartu ucapan, dan banyak lagi. Kalau bukan sekarang, Kapan Lagi?" name="description" />
<a0:meta content="berita, infotainment, gossip, gosip, artis, artis indonesia, indonesia, game, entertainment, film, bioskop, resensi, musik, zodiac, kartu ucapan, kartu, kartu lebaran" name="keywords" />
<a0:meta content="1048538409" property="fb:admins" />
<a0:meta content="166048096750307" property="fb:app_id" />
<a0:link href="/manifest.json" rel="manifest" />
<a0:link rel="shortcut icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/favicon.ico" />
<a0:link href="https://cdns.klimg.com/" rel="dns-prefetch" />
<a0:link href="/feed/entertainment.xml" title="KapanLagi.com Atom Feed" type="application/atom+xml" rel="alternate" />
<a0:link href="https://m.kapanlagi.com/" media="only screen and (max-width: 640px)" rel="alternate" />
<a0:link href="https://www.kapanlagi.com/" rel="canonical" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon.png" rel="apple-touch-icon" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-precomposed.png" rel="apple-touch-icon" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-114x114-precomposed.png" rel="apple-touch-icon" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-120x120-precomposed.png" rel="apple-touch-icon" />
<a0:link href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-152x152-precomposed.png" rel="apple-touch-icon" />
<a0:title>Kalau Bukan Sekarang, Kapan Lagi? - KapanLagi.com</a0:title>
Answer 0 (score: 0)
You did not mention the versions of the binaries you are using. With Selenium Java client v3.9.1, GeckoDriver v0.19.1 and Firefox Quantum v58.0.2 (64-bit), I get a correct page source without any a0: prefix, as shown below:
Code block:
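import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

// Point Selenium at the local GeckoDriver binary, then load the page and print its source.
System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
WebDriver driver = new FirefoxDriver();
driver.get("https://www.kapanlagi.com/");
System.out.println(driver.getPageSource());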
Console output:
1520062739574 geckodriver INFO geckodriver 0.19.1
1520062739607 geckodriver INFO Listening on 127.0.0.1:12306
1520062740588 mozrunner::runner INFO Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "-marionette" "-profile" "C:\\Users\\ATECHM~1\\AppData\\Local\\Temp\\rust_mozprofile.R5Wv9lx9f5K5"
1520062744680 Marionette INFO Enabled via --marionette
1520062762429 Marionette INFO Listening on port 2481
1520062763089 Marionette WARN TLS certificate errors will be ignored for this session
Mar 03, 2018 1:09:23 PM org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
<html xmlns="https://www.w3.org/1999/xhtml" xml:lang="en" class="firefox" lang="en"><head>
<meta charset="utf-8">
<meta http-equiv="Cache-Control" content="no-cache">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="Tue, 22 Jan 2013 02:30:01 GMT">
<meta http-equiv="Refresh" content="900">
<meta name="description" content="KapanLagi.com, situs entertainment terbesar di Indonesia. Berita, gosip, resensi film & musik, foto, game, kartu ucapan, dan banyak lagi. Kalau bukan sekarang, Kapan Lagi?">
<meta name="keywords" content="berita, infotainment, gossip, gosip, artis, artis indonesia, indonesia, game, entertainment, film, bioskop, resensi, musik, zodiac, kartu ucapan, kartu, kartu lebaran">
<meta property="fb:admins" content="1048538409">
<meta property="fb:app_id" content="166048096750307">
<link rel="manifest" href="/manifest.json">
<link href="https://cdns.klimg.com/kapanlagi.com/v5/i/favicon.ico" rel="shortcut icon">
<link rel="dns-prefetch" href="https://cdns.klimg.com/">
<link rel="alternate" type="application/atom+xml" title="KapanLagi.com Atom Feed" href="/feed/entertainment.xml">
<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.kapanlagi.com/">
<link rel="canonical" href="https://www.kapanlagi.com/">
<link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon.png">
<link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-precomposed.png">
<link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-114x114-precomposed.png">
<link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-120x120-precomposed.png">
<link rel="apple-touch-icon" href="https://cdns.klimg.com/kapanlagi.com/v5/i/channel/apple-touch-icon-152x152-precomposed.png">
<title>Kalau Bukan Sekarang, Kapan Lagi? - KapanLagi.com</title>
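If you cannot move to the versions above, one possible stopgap is to strip the prefix from the returned string yourself. The following is only a minimal sketch and not part of the Selenium API: the class name PrefixStripper and the method stripPrefix are hypothetical, and the prefix is assumed to always be exactly a0: as in the question's output. It works at the text level, so it would also rewrite any literal "<a0:" that happens to appear inside script or attribute content.

```java
import java.util.regex.Pattern;

public class PrefixStripper {
    // Matches "<a0:" or "</a0:" at the start of a tag. The prefix name "a0"
    // is an assumption taken from the output shown in the question.
    private static final Pattern PREFIXED_TAG = Pattern.compile("<(/?)a0:");

    /** Returns the markup with the "a0:" prefix removed from opening and closing tags. */
    public static String stripPrefix(String pageSource) {
        // "$1" re-inserts the optional "/" so closing tags stay closing tags.
        return PREFIXED_TAG.matcher(pageSource).replaceAll("<$1");
    }

    public static void main(String[] args) {
        String sample = "<a0:title>Kalau Bukan Sekarang, Kapan Lagi? - KapanLagi.com</a0:title>";
        System.out.println(stripPrefix(sample));
    }
}
```

In the original program this would be called as PrefixStripper.stripPrefix(driver.getPageSource()). Upgrading the driver and browser remains the cleaner fix, since this only patches the text after the fact.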