How can I scrape the text of drop-down menus with rvest?

Posted: 2021-02-26 09:53:03

Tags: r rvest

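I want to scrape all the text in several drop-down menus on this web page. The structure looks simple, and I have gone through the answers to earlier questions, but I get no results. My code is: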

library(rvest);library(tidyverse)

pg <- read_html("https://www.siviltoplum.gov.tr/illere-ve-faaliyet-alanlarina-gore-dernekler")

pg %>% 
  html_nodes("option") %>% 
  html_text()

Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 2)

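It looks like you are scraping the wrong page: the page source shows that the drop-downs live in an iframe whose content is loaded from a different URL. Scrape that URL directly instead.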

<iframe allowtransparency="true" scrolling="no" src="https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler" style="width: 940px; height: 916px; color: rgb(255, 255, 255); margin-top: 0px; margin-left: 5px; float: left; background-color: transparent; fontColor: #ffffff; fontSize: 32px; border-size: 0px;" frameborder="0"></iframe>
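Rather than copying the iframe URL by hand, it can be pulled out of the outer page. This is a minimal sketch, assuming the iframe is present in the static HTML (as the page source suggests) and is not injected later by JavaScript. `iframe_srcs()` is a hypothetical helper, not an rvest function; `xml2::url_absolute()` turns a relative `src` into a full URL and leaves an absolute one unchanged.

```r
library(rvest)
library(xml2)

# Hypothetical helper (not part of rvest): collect the src of every <iframe>
# in an already-parsed page and resolve relative URLs against base_url.
iframe_srcs <- function(doc, base_url) {
  srcs <- doc %>%
    html_nodes("iframe") %>%
    html_attr("src")
  srcs <- srcs[!is.na(srcs)]   # skip iframes that have no src attribute
  if (length(srcs) == 0) return(character(0))
  url_absolute(srcs, base_url)
}

# Usage (needs network access):
# outer <- read_html("https://www.siviltoplum.gov.tr/illere-ve-faaliyet-alanlarina-gore-dernekler")
# inner <- read_html(iframe_srcs(outer, "https://www.siviltoplum.gov.tr/")[1])
# inner %>% html_nodes("option") %>% html_text()
```

If the iframe turns out to be added by JavaScript, `read_html()` will not see it and this helper returns an empty vector; in that case copying the URL from the browser's developer tools is the fallback.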

read.this <- "https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler"

library( rvest )
library( tidyverse )

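# Read as UTF-8: the Turkish letters (İ, Ğ, Ö, ...) are UTF-8 encoded, and
# decoding them as latin1 produces mojibake such as "Ä°" instead of "İ".
pg <- read_html(read.this, encoding = "UTF-8")

# Every <option> of every drop-down on the iframe page, as plain text
pg %>% 
  html_nodes("option") %>% 
  html_text()


# [1] "ADANA"          "ADIYAMAN"       "AFYONKARAHİSAR" "AĞRI"
# [5] "AKSARAY"        "AMASYA"         "ANKARA"         "ANTALYA"
# [9] "ARDAHAN"        "ARTVİN"         "AYDIN"          "BALIKESİR"
# [13] "BARTIN"        "BATMAN"         "BAYBURT"        "BİLECİK"
# [17] "BİNGÖL"        "BİTLİS"         "BOLU"           "BURDUR"
# ...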
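`html_nodes("option")` flattens all the drop-downs on the page into one vector. A sketch that keeps track of which `<select>` each option came from, along with its `value` attribute (often what a form submission needs), might look like the following. `dropdown_options()` and its column names are hypothetical; the actual `id`/`name` attributes on the iframe page were not checked here, so the helper falls back from `id` to `name` and leaves `NA` if neither exists.

```r
library(rvest)

# Hypothetical helper: one row per <option>, labelled with the id (or, failing
# that, the name) of the <select> that contains it.
dropdown_options <- function(doc) {
  rows <- lapply(html_nodes(doc, "select"), function(sel) {
    key <- html_attr(sel, "id")
    if (is.na(key)) key <- html_attr(sel, "name")
    opts <- html_nodes(sel, "option")   # relative search inside this <select>
    data.frame(
      select = rep(key, length(opts)),
      value  = html_attr(opts, "value"),   # NA when an option has no value
      text   = html_text(opts, trim = TRUE),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  if (is.null(out)) {                     # page has no <select> at all
    out <- data.frame(select = character(), value = character(),
                      text = character(), stringsAsFactors = FALSE)
  }
  out
}

# Usage (needs network access):
# dropdown_options(read_html(read.this, encoding = "UTF-8"))
```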