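Morning, people smarter than me. I'm having an odd issue web scraping Mashable.com, and I was hoping someone could explain it.
Mashable's search pages are populated with results from a block resembling...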
<script>
window.__bootstrap = {"posts":[{"_id":"54b687d512d2cd49040027dd","id":"2015/01/14/bitcoin-price-200","title":"Bitcoin prices collapse below $200 for first time since 2013","title_tag":null,"author":"Seth Fiegerman","post_date":"2015-01-14T15:14:19+00:00","post_date_rfc":"Wed, 14 Jan 2015 15:14:19 +0000","sort_key":"1ybqcU","link":"http://mashable.com/2015/01/14/bitcoin-price-200/","content":{"plain":"Bitcoin prices are collapsing almost as quickly as they originally skyrocketed.
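My usual trick for getting around this sort of post-rendering issue is to scrape the page with Selenium, but today things are not going to plan.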
Upon loading the URL via Selenium
uri <- "http://mashable.com/search/?t=stories&q=bitcoin&page=2"
remoteSelenium$navigate(uri) # send selenium to page
html <- unlist(remoteSelenium$getPageSource()) # read in page contents
I get...
> html
applicationCacheEnabled rotatable handlesAlerts databaseEnabled version
"TRUE" "FALSE" "TRUE" "TRUE" "34.0.5"
platform nativeEvents acceptSslCerts webdriver.remote.sessionid webStorageEnabled
"MAC" "FALSE" "TRUE" "ed06539a-59dc-41a5-ba4e-07b2ed9a9490" "TRUE"
locationContextEnabled browserName takesScreenshot javascriptEnabled cssSelectorsEnabled
"TRUE" "firefox" "TRUE" "TRUE" "TRUE"
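...rather than the page source itself. I can't figure out why this happens or how to fix it, since the same approach works fine everywhere else I've tried. Any ideas, or pointers to other questions/answers?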
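
For reference, once the page source does come back as a string, this is roughly how I would pull the `window.__bootstrap` object out of it. It is only a sketch under assumptions: the `html` value below is a made-up, shortened stand-in for the real page, `extract_bootstrap` is a hypothetical helper name, and I am assuming the assignment sits in its own script block and is followed directly by the closing `</script>` tag.

```r
# Hypothetical stand-in for the page source; the real page embeds a much larger object.
html <- paste0(
  "<html><body><script>\n",
  "window.__bootstrap = {\"posts\":[{\"id\":\"2015/01/14/bitcoin-price-200\",",
  "\"author\":\"Seth Fiegerman\"}]};\n",
  "</script></body></html>"
)

# Return the JSON text assigned to window.__bootstrap, or NA if it is not found.
# Assumes the object is the last statement before the closing </script> tag.
extract_bootstrap <- function(html) {
  # (?s) lets "." match newlines in case the object spans several lines.
  pattern <- "(?s)window\\.__bootstrap\\s*=\\s*(\\{.*?\\})\\s*;?\\s*</script>"
  m <- regmatches(html, regexec(pattern, html, perl = TRUE))[[1]]
  if (length(m) < 2) return(NA_character_)
  m[2]  # first capture group: the JSON text itself
}

json_text <- extract_bootstrap(html)
print(json_text)

# If the jsonlite package is installed, the text could then be parsed with
# jsonlite::fromJSON(json_text) to get at the posts.
```

This uses only base R for the extraction, so it can be checked without a running Selenium server. It does not explain why `getPageSource()` is returning the session capabilities instead of the HTML.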