I am trying to scrape the search results from the site listed below. Here is my initial code:
library(rvest)
session = html_session("https://www.umass.edu/peoplefinder/")
session %>%
html_form %>%
.[[3]] %>%
set_values(search_text = "John") %>%
submit_form(session, .) %>%
html_node("table")
It does not seem to work at all. Does anyone have any suggestions?
Answer 0 (score: 1)
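library(rvest)
library(jsonlite)
# Open a session on the search page, then POST the query to the search endpoint
page <- html_session("https://www.umass.edu/peoplefinder")
details <- rvest:::request_POST(page, url = "https://www.umass.edu/peoplefinder/engine/", body = list("q" = "John"))
# Parse the JSON text returned in the response body
s <- jsonlite::fromJSON(httr::content(details$response, "text"))
df <- as.data.frame(s)
This gives you a usable data frame, df, for further processing.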
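To see what the parsing step does without hitting the site, here is a minimal sketch that parses a JSON string shaped the way the endpoint is assumed to respond. The field names (Name, Email) and the values are invented for illustration; only the top-level Results key comes from this thread.

```r
library(jsonlite)

# Hypothetical payload: field names and values are made up for illustration.
sample_json <- '{"Results": [
  {"Name": "John Doe", "Email": "jdoe@example.edu"},
  {"Name": "John Roe", "Email": "jroe@example.edu"}
]}'

# fromJSON simplifies an array of objects into a data frame by default
parsed <- fromJSON(sample_json)
df <- as.data.frame(parsed$Results)

nrow(df)    # number of matches in the sample payload
names(df)   # column names taken from the JSON keys
```

If the real response nests its records differently, the extraction line would need to change accordingly.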
Answer 1 (score: 0)
There are no table nodes in the target page. You can confirm this by looking for some other element in the page, for example:
> session %>% html_form %>% .[[3]] %>% set_values(search_text = "John") %>% submit_form(session, .) %>% html_node("ul")
Submitting with 'pf_search'
{xml_node}
<ul class="menu">
[1] <li class="first leaf go-umass"><a title="" href="https://go.umass.edu/">Go.UMass</a></li>
[2] <li class="leaf email"><a title="" href="//www.oit.umass.edu/email">Email</a></li>
[3] <li class="leaf spire"><a title="" href="https://www.spire.umass.edu/">SPIRE</a></li>
[4] <li class="leaf moodle"><a title="" href="https://moodle.umass.edu/">Moodle</a></li>
[5] <li class="leaf umassonline"><a title="" href="https://uma.umassonline.net/">Blackboard Learn</a ...
[6] <li class="last leaf udrive"><a title="" href="https://udrive.oit.umass.edu/">UDrive</a></li>
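A quicker way to check whether a selector matches anything is to count the nodes. This is only a sketch against a small inline page (a hypothetical stand-in for what the form submission returns), not the live site:

```r
library(rvest)

# Hypothetical stand-in for the returned page: a menu list and no table.
page <- read_html('<html><body><ul class="menu"><li>Email</li></ul></body></html>')

n_tables <- length(html_nodes(page, "table"))  # 0 when no <table> exists
n_lists  <- length(html_nodes(page, "ul"))     # 1 for the single menu list

n_tables
n_lists
```

A count of zero for "table" means the results are not in the returned HTML at all, which suggests they are loaded separately.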
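Answer 2 (score: 0)
This returns the results:
library(rvest)  # provides %>% and html_session
# Open a session, POST the query q to the search endpoint,
# then parse the JSON response and return its Results element
umass_people_find = function(q)
  "https://www.umass.edu/peoplefinder" %>%
  html_session %>%
  rvest:::request_POST(url = "https://www.umass.edu/peoplefinder/engine/",
                       body = list("q" = q)) %>%
  .$response %>%
  httr::content("text") %>%
  jsonlite::fromJSON() %>%
  .$Results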
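Since rvest:::request_POST is an unexported internal function, it may be safer to call httr directly. The following is a sketch only: umass_people_find_httr is a hypothetical name, and I have not verified whether the endpoint needs cookies from a prior visit to the search page or still responds in this shape.

```r
library(httr)
library(jsonlite)

# Sketch: POST the query straight to the endpoint used in this thread.
# The function name is hypothetical; the body uses httr's default encoding.
umass_people_find_httr <- function(q) {
  resp <- POST("https://www.umass.edu/peoplefinder/engine/",
               body = list(q = q))
  stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx
  txt <- content(resp, as = "text", encoding = "UTF-8")
  fromJSON(txt)$Results  # assumes a top-level Results key, as above
}
```

Usage would then be umass_people_find_httr("John").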