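I want to scrape all the existing text from some of the drop-down menus on this web page. The structure is simple, and I have gone through the answers given previously, but my result comes back empty. The code is: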
library(rvest);library(tidyverse)
pg <- read_html("https://www.siviltoplum.gov.tr/illere-ve-faaliyet-alanlarina-gore-dernekler")
pg %>%
html_nodes("option") %>%
html_text()
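Any help would be greatly appreciated.
Answer 0 (score: 2)
It looks like you are trying to scrape the wrong page. Viewing the page source shows that the drop-down menus live in an iframe loaded from a different location, so the `option` elements are not part of the document you downloaded.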
<iframe allowtransparency="true" scrolling="no" src="https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler" style="width: 940px; height: 916px; color: rgb(255, 255, 255); margin-top: 0px; margin-left: 5px; float: left; background-color: transparent; fontColor: #ffffff; fontSize: 32px; border-size: 0px;" frameborder="0"></iframe>
# The iframe's src is the page that actually contains the <option> elements
read.this <- "https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler"
library(rvest)
library(tidyverse)
# The page appears to be UTF-8; decoding it as latin1 garbles the Turkish letters
pg <- read_html(read.this, encoding = "UTF-8")
pg %>%
  html_nodes("option") %>%
  html_text()
#  [1] "ADANA"          "ADIYAMAN"       "AFYONKARAHİSAR" "AĞRI"
#  [5] "AKSARAY"        "AMASYA"         "ANKARA"         "ANTALYA"
#  [9] "ARDAHAN"        "ARTVİN"         "AYDIN"          "BALIKESİR"
# [13] "BARTIN"         "BATMAN"         "BAYBURT"        "BİLECİK"
# [17] "BİNGÖL"         "BİTLİS"         "BOLU"           "BURDUR"
....
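Instead of reading the page source by hand, the iframe's address can also be pulled out with rvest itself. The sketch below is only an illustration: `outer_html` is a hypothetical, trimmed-down stand-in for the outer page (the real page has far more markup), and `outer_pg`, `n_options`, and `iframe_src` are names made up for this example. It shows that the outer document contains no `option` nodes, which would explain the empty result, and that the `src` attribute of the iframe gives the URL to scrape instead.

```r
library(rvest)

# Hypothetical, trimmed-down copy of the outer page: it holds only an <iframe>,
# so the <option> elements are not part of this document.
outer_html <- '<html><body>
  <iframe src="https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler"></iframe>
</body></html>'

# read_html() also accepts a literal HTML string, so no network access is needed here
outer_pg <- read_html(outer_html)

# Count the <option> nodes in the outer document (none are present in this stand-in)
n_options <- length(html_nodes(outer_pg, "option"))

# Collect the src attribute of every iframe; these are the URLs to scrape instead
iframe_src <- outer_pg %>%
  html_nodes("iframe") %>%
  html_attr("src")

print(n_options)
print(iframe_src)
```

On the real site, `outer_pg` would come from `read_html()` on the original URL, and each value in `iframe_src` could then be passed to `read_html()` in turn. If a page uses a relative `src`, it would first need to be resolved against the outer page's URL.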