Trying to scrape a div class

Time: 2018-02-25 14:35:11

Tags: r

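I am trying to scrape a div from a website. I believe it is selected by its class, or perhaps by its id. Either way, I am trying to get the data for each dentist listed at the link below.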

https://www.healthgrades.com/usearch?what=Dentistry&where=Canal%20Street%2C%20NY%2010013&pt=40.720901%2C%20-74.008904&pageNum=2&neCorner=40.73066740706064%2C-73.99139615546807&swCorner=40.711097020397304%2C-74.02644835278574&mapCenter=40.720901%2C-74.008904&zoomLevel=14.6&mapChanged=false&city=Canal%20Street&state=NY&zip=10013

Here is a sample of my code.

library(rvest)

URL <- "https://www.healthgrades.com/usearch?what=Dentistry&where=Canal%20Street%2C%20NY%2010013&pt=40.720901%2C%20-74.008904&pageNum=2&neCorner=40.73066740706064%2C-73.99139615546807&swCorner=40.711097020397304%2C-74.02644835278574&mapCenter=40.720901%2C-74.008904&zoomLevel=14.6&mapChanged=false&city=Canal%20Street&state=NY&zip=10013"

scistarter_html <- read_html(URL)
scistarter_html

scistarter_html %>%
  html_nodes("a") %>%
  head()

scistarter_html %>%
  html_nodes("div") %>%
  head()

This gives me the following output, which looks fine.

{xml_nodeset (6)}
[1] <div class="outofpage"><div id="div-gpt-ad-outofpage-oop"></div></div>
[2] <div id="div-gpt-ad-outofpage-oop"></div>
[3] <div class="hgGlobalHeader__Search" data-reactid="14">\n<label for="hgGlobalHea ...
[4] <div class="hgGlobalHeader__SearchForm" data-reactid="18"><div class="autosugge ...
[5] <div class="autosuggest-rex autosuggest-rex--header  autosuggest-rex--loading " ...
[6] <div class="autosuggest " data-reactid="28">\n<div class="autosuggest__header-c ...
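Whether the target is a class or an id can be checked by listing both attributes for every div. The following is a minimal sketch, assuming a small inline HTML string as a stand-in for the real page (the markup is copied from the snippets in this question, but the overall structure is hypothetical). In CSS selector syntax, `#name` matches an id and `.name` matches a class.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page, built from snippets in the question.
page <- read_html('
  <div class="outofpage"><div id="div-gpt-ad-outofpage-oop"></div></div>
  <div class="card-carousel  " id="card-carousel-search"></div>
')

divs <- html_nodes(page, "div")

# One row per div; html_attr() returns NA when the attribute is absent.
div_info <- data.frame(
  id    = html_attr(divs, "id"),
  class = html_attr(divs, "class"),
  stringsAsFactors = FALSE
)
print(div_info)

# Select the same element by id (#) and by class (.).
by_id    <- html_nodes(page, "#card-carousel-search")
by_class <- html_nodes(page, ".card-carousel")
```

Here the element carries both a class (`card-carousel`) and an id (`card-carousel-search`), so either selector would find it in this stand-in.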

The next piece of code does not seem to work at all.

scistarter_html %>%
  html_nodes("div#card-carousel-search") %>%
  html_nodes("table") %>%
  html_table() %>%
  "["(1) %>% str()


List of 1
 $ : NULL

Again, I am trying to scrape the data from this id.

<div class="card-carousel  " id="card-carousel-search">

At least, that is what I think the target should be. I started from this tutorial.

https://rpubs.com/ryanthomas/webscraping-with-rvest
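One way to narrow the problem down is to count how many nodes each selector in the pipeline matches. The sketch below assumes a hypothetical inline page in which the target div exists but contains no `<table>` element; the card markup and the dentist names are invented for illustration. Whether the real page's cards are present in the downloaded HTML, or are filled in later by JavaScript, is not established here.

```r
library(rvest)

# Hypothetical stand-in: the target div exists but holds cards, not a table.
page <- read_html('
  <div class="card-carousel  " id="card-carousel-search">
    <div class="card"><h3>Dr. Example One</h3></div>
    <div class="card"><h3>Dr. Example Two</h3></div>
  </div>
')

# Step 1: does the id selector match anything?
carousel <- html_nodes(page, "div#card-carousel-search")
length(carousel)

# Step 2: are there any tables inside it?
tables <- html_nodes(carousel, "table")
length(tables)

# With no tables, html_table() returns an empty list, and indexing an
# empty list with [1] yields a one-element list holding NULL.
result <- html_table(tables)[1]
str(result)

# If the data lives in other elements, select those instead of "table".
dentist_names <- carousel %>%
  html_nodes("div.card h3") %>%
  html_text()
```

The `List of 1` / `$ : NULL` output shown in the question has the same shape as `result` here, which is consistent with the `table` selector matching nothing. If step 1 returns zero on the real page, the div itself is missing from the HTML that `read_html()` received.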

0 answers:

No answers yet