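I am trying to scrape the table of HIV/AIDS-related NGOs for every country from the following page: https://www.unodc.org/ngo/showExtendedSearch.do
I can navigate to the URL and select the 'HIV/AIDS' filter. Now I also need to extract all the values of the 'region' and 'country' dropdowns so that I can loop over them and scrape the table for each country in turn. How can I collect the values of the two dropdowns? My code so far is below: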
#load library
library(RSelenium)
#Specify remote driver
remDr <- remoteDriver(browserName='firefox')
#Initialise session
remDr$open()
#navigate to advanced search page
url <- "https://www.unodc.org/ngo/showExtendedSearch.do"
remDr$navigate(url)
#Click 'HIV/AIDS' filter
webElem <- remDr$findElement(using = 'css',
value = '#applicationArea > form > table > tbody > tr > td > table:nth-child(7) > tbody > tr:nth-child(2) > td > table > tbody > tr > td:nth-child(2) > table > tbody > tr:nth-child(3) > td:nth-child(4) > input[type="checkbox"]')
webElem$clickElement()
Answer 0 (score: 0)
Use Firebug or your browser's Developer Tools to determine the XPath of each dropdown element, then use getElementText to retrieve the values:
region_element <- remDr$findElement('xpath', '//*[@id="applicationArea"]/form/table/tbody/tr/td/table[2]/tbody/tr[2]/td/table/tbody/tr[1]/td[2]/select')
regions <- strsplit(region_element$getElementText()[[1]], "\n")
country_element <- remDr$findElement('xpath', '//*[@id="applicationArea"]/form/table/tbody/tr/td/table[2]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]/select')
countries <- strsplit(country_element$getElementText()[[1]], "\n")
R> print(regions[[1]])
[1] "Middle East and Northern Africa" "Eastern Africa"
[3] "Western Africa" "Central and Southern Africa"
[5] "Northern America" "Central America and the Caribbean"
[7] "Latin America" "Central and Western Asia"
[9] "Southern and Eastern Asia" "Europe"
[11] "Oceania"
R> print(head(countries[[1]]))
[1] "Afghanistan" "Albania" "Algeria" "American Samoa" "Andorra"
[6] "Angola"
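Since the goal is to loop over the countries, it may help to read each dropdown option individually rather than splitting the text of the whole select element. The following is a minimal sketch, not a tested solution for this site: `get_dropdown_options` and `select_xpath` are hypothetical names introduced here, and it assumes each entry is an `<option>` element directly under the `<select>` and carries a `value` attribute.

```r
# Sketch: collect every <option> of a dropdown as a (value, label) pair.
# `get_dropdown_options` and `select_xpath` are illustrative names,
# not part of RSelenium.
get_dropdown_options <- function(remDr, select_xpath) {
  # One webElement per <option> directly under the <select>
  opts <- remDr$findElements(using = "xpath",
                             value = paste0(select_xpath, "/option"))
  # The value attribute is what the form submits
  values <- vapply(opts,
                   function(o) o$getElementAttribute("value")[[1]],
                   character(1))
  # The visible text is what the user sees in the list
  labels <- vapply(opts,
                   function(o) trimws(o$getElementText()[[1]]),
                   character(1))
  out <- data.frame(value = values, label = labels,
                    stringsAsFactors = FALSE)
  # Drop blank placeholder entries, if the dropdown has any
  out[nzchar(out$label), , drop = FALSE]
}
```

With the session from the question open, this could be called as `countries <- get_dropdown_options(remDr, country_xpath)`, where `country_xpath` holds the country XPath shown above. Inside a loop, one option can then be selected with `remDr$findElement("xpath", paste0(country_xpath, "/option[@value='", v, "']"))$clickElement()`. Submitting the search and reading the results table depends on the page markup and is left out here.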