Problem scraping data with rvest

Time: 2020-06-16 08:29:16

Tags: r dplyr rvest

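I am trying to scrape data from the Google News website. I want to use the rvest and dplyr packages, together with the SelectorGadget tool in Google Chrome, to extract the keywords of the trending topics on the site. This is my code: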

library(rvest)
library(dplyr)
google.news<-read_html("https://news.google.com/topstories?hl=en-NG&gl=NG&ceid=NG:en")
google.news %>%
+html_nodes(".boy4he") %>%
+html_text()

But after running the code, I get an error message.

What could be the problem? Thanks for any comments or suggestions.

1 Answer:

Answer 0: (score: 1)

This works:

library(rvest)
library(dplyr)
google.news<-read_html("https://news.google.com/topstories?hl=en-NG&gl=NG&ceid=NG:en")

google.news %>%
  html_nodes(css = ".boy4he") %>%
  html_attr("aria-label")

[1] "Godwin Obaseki"            "Abdullahi Umar Ganduje"    "Sanusi Lamido Sanusi"      "Zamfara"                  
 [5] "All Progressives Congress" "Dangote Group"             "Kano"                      "Senate of Nigeria"        
 [9] "Aliko Dangote"             "Muhammadu Buhari"  

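The matched <a> elements contain no text between their tags, so html_text() has nothing to return. The values are "hidden" in the HTML attribute "aria-label":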

<a class="boy4he" href="./topics/CAAqJQgKIh9DQkFTRVFvTEwyMHZNREV5YlRKa2RHd1NBbVZ1S0FBUAE?hl=en-NG&amp;gl=NG&amp;ceid=NG%3Aen" aria-label="Abdullahi Umar Ganduje"></a>
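To see the difference without depending on the live page (whose class names may change over time), here is a minimal sketch that parses the element quoted above from a string and compares the two extractors. The variable names snippet, links, link_text and link_label are only illustrative and are not part of the original answer.

```r
library(rvest)

# The anchor element quoted above, parsed from a string rather than the live page.
snippet <- read_html('<a class="boy4he" href="./topics/CAAqJQgKIh9DQkFTRVFvTEwyMHZNREV5YlRKa2RHd1NBbVZ1S0FBUAE?hl=en-NG&amp;gl=NG&amp;ceid=NG%3Aen" aria-label="Abdullahi Umar Ganduje"></a>')

# Select the element by its CSS class, as in the answer.
links <- html_nodes(snippet, css = ".boy4he")

# html_text() returns the text between the opening and closing tags.
link_text <- html_text(links)

# html_attr() reads the value of a named attribute on the element.
link_label <- html_attr(links, "aria-label")

print(link_text)
print(link_label)
```

Here link_text should come back as an empty string, because the tag has no text content, while link_label holds the keyword stored in the attribute.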