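First of all, I am a beginner at web scraping.
I am working on this website and trying to get the links to the discussion pages for an episode. Using SelectorGadget, I managed to isolate only the HTML part of the box that contains the topics:
library(rvest)
html.s1e01 <- html("http://asoiaf.westeros.org/index.php/forum/41-e01-winter-is-coming/")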
html.s1e01.page <- html_nodes(html.s1e01, ".ipsBox")
Now I want to get all the links to the topics, so I tried:
html_attr(html.s1e01.page, "href")
but I get NA. I have seen similar examples online where this should work. Any suggestions?
Answer 0 (score: 1)
html.s1e01.page <- html_nodes(html.s1e01, ".ipsBox .topic_title")
html.s1e01.topics <- html.s1e01.page %>% html_attr("href")
html.s1e01.topics
## [1] "http://asoiaf.westeros.org/index.php/topic/49408-poll-how-would-you-rate-episode-101/"
## [2] "http://asoiaf.westeros.org/index.php/topic/109202-death-of-john-aryn-season-4-episode-5-spoilers/"
## [3] "http://asoiaf.westeros.org/index.php/topic/49310-book-spoilers-episode-101-take-3/"
## [4] "http://asoiaf.westeros.org/index.php/topic/90902-sir-john-standingjonarryn/"
## [5] "http://asoiaf.westeros.org/index.php/topic/106105-did-anyone-notice-the-color-of-the-feather-in-lyannas-tomb/"
## [6] "http://asoiaf.westeros.org/index.php/topic/49116-book-tv-spoilers-what-was-left-out-and-what-was-left-in/"
## [7] "http://asoiaf.westeros.org/index.php/topic/49070-no-spoilers-ep101-discussion/"
## [8] "http://asoiaf.westeros.org/index.php/topic/49159-book-spoilers-the-book-was-better/"
## [9] "http://asoiaf.westeros.org/index.php/topic/57614-runes-in-agot-spoilers-i-suppose/"
## [10] "http://asoiaf.westeros.org/index.php/topic/49151-book-spoilers-ep101-discussion-mark-ii/"
## [11] "http://asoiaf.westeros.org/index.php/topic/49161-booktv-spoilers-dany-drogo/"
## [12] "http://asoiaf.westeros.org/index.php/topic/49071-book-spoilers-ep101-discussion/"
## [13] "http://asoiaf.westeros.org/index.php/topic/49100-no-spoilers-pre-airing-discussion/"
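The difference between the two selectors can be reproduced offline. The following is a minimal sketch, assuming a simplified, hypothetical piece of markup in place of the real forum page (the `example.com` URLs and the exact tag layout are made up for illustration). It suggests why the first attempt returns NA: `.ipsBox` matches the container element, which has no `href` attribute of its own, whereas `.ipsBox .topic_title` matches the links inside it.

```r
library(rvest)

# Hypothetical, simplified markup standing in for the forum page
page <- read_html('
<div class="ipsBox">
  <a class="topic_title" href="http://example.com/topic/1-first/">First</a>
  <a class="topic_title" href="http://example.com/topic/2-second/">Second</a>
</div>')

# Selecting the container: the <div> itself carries no href attribute,
# so html_attr() has nothing to return for it
box <- html_nodes(page, ".ipsBox")
box_href <- html_attr(box, "href")

# Selecting the links inside the container: each <a> has an href
links <- html_nodes(page, ".ipsBox .topic_title")
link_hrefs <- html_attr(links, "href")

print(box_href)
print(link_hrefs)
```

In recent rvest versions `html()` has been replaced by `read_html()`, which is why the sketch uses the latter; the selector logic is the same either way.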