I have the following link, and I want to scrape the news headlines from it:
https://timesofindia.indiatimes.com/2017/11/1/archivelist/year-2017,month-11,starttime-43040.cms
I tried the following in R:
library(rvest)
url = "https://timesofindia.indiatimes.com/2017/11/1/archivelist/year-2017,month-11,starttime-43040.cms"
results <- url %>%
read_html() %>%
html_nodes(xpath='/html/body/div[1]/table[1]')
But results contains no data. I want to put these news headlines into an R data frame.
How can I do this in R?
Answer 0: (score: 1)
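You can use a CSS selector for the a elements inside span (that is, 'span a') to get these headlines, if you prefer simpler code, and then manipulate the result as needed.
Code: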
library(rvest)
url = "https://timesofindia.indiatimes.com/2017/11/1/archivelist/year-2017,month-11,starttime-43040.cms"
results <- url %>%
read_html() %>%
html_nodes('span a') %>% html_text()
results
Output:
> results
[1] "Not the same old Kochi anymore"
[2] "Ramu Chellappa’s next to be based in Coimbatore"
[3] "Old is gold, cream n’ gold"
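Since the question asks for a data frame, here is a minimal sketch of that last step. It assumes results is the character vector returned by html_text() above; to keep the example runnable offline, the vector is replaced by the first three headlines from the output. The names news and title are only illustrative.

```r
# Stand-in for the scraped character vector (first three headlines from the output above)
results <- c(
  "Not the same old Kochi anymore",
  "Ramu Chellappa’s next to be based in Coimbatore",
  "Old is gold, cream n’ gold"
)

# Wrap the vector in a one-column data frame; keep the headlines as character, not factor
news <- data.frame(title = results, stringsAsFactors = FALSE)

str(news)
```

With the real scrape, the same data.frame() call should work unchanged on the full results vector.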
Answer 1: (score: 1)
You can do it like this:
library(rvest)
url = "https://timesofindia.indiatimes.com/2017/11/1/archivelist/year-2017,month-11,starttime-43040.cms"
page <- read_html(url)
titles <- html_text(html_nodes(page,'.cnt div td:nth-child(1) span a'))
titles[1:5]
> titles[1:5]
[1] "Not the same old Kochi anymore" "Ramu Chellappa’s next to be based in Coimbatore"
[3] "Old is gold, cream n’ gold" "Meme and troll pages play catalysts in promoting Kannada pride"
[5] "Thallu, Kidu, Oola... Creativity had no bounds in Slangyalam"
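To illustrate how this narrower selector differs from a plain 'span a', here is a small offline sketch. The HTML below is a made-up fragment, not the actual markup of the Times of India page, so treat the structure (a .cnt container, a two-column table, a stray link outside it) purely as an assumption.

```r
library(rvest)

# Hypothetical markup: two links in a table row inside .cnt, plus one link outside it
html <- '
<div class="cnt">
  <div>
    <table>
      <tr>
        <td><span><a href="#">Headline A</a></span></td>
        <td><span><a href="#">Headline B</a></span></td>
      </tr>
    </table>
  </div>
</div>
<span><a href="#">Footer link</a></span>'

page <- read_html(html)

# Broad selector: every <a> nested anywhere inside a <span>
all_links <- html_text(html_nodes(page, "span a"))

# Narrow selector: only links in the first cell of a row, within a div under .cnt
first_col <- html_text(html_nodes(page, ".cnt div td:nth-child(1) span a"))

print(all_links)
print(first_col)
```

On a real page, the broad selector may also pick up navigation or footer links, so it is worth checking the length and the first few entries of the result before building a data frame from it.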