Extracting links from a div container

Time: 2016-09-06 21:29:12

Tags: html r xpath rvest

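As a small side project, I am trying to use R to extract my training data from Polar's website. Right now I am trying to find the links on the calendar overview page that lead to the individual training sessions.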

library(rvest)

url       <- "https://flow.polar.com/login/"
pgsession <- html_session(url)
pgform    <- html_form(pgsession)[[1]]  # pull form from session

# log in
filled_form <- set_values(pgform,
                          email    = "my username",
                          password = "my password")

session <- submit_form(pgsession, filled_form)

# XPath found using Chrome
nodes <- html_nodes(session, xpath = '//*[@id="calendarTab"]/table/tbody/tr[1]/td[2]/div/div/div[1]')
html_structure(nodes)  # we're now at the day level, where there should be links
[[1]]
<div.training.clean>

This is where I would expect to find the links...

According to Chrome, the direct XPath to the link should be:

html_nodes(session,xpath='//*[@id="calendarTab"]/table/tbody/tr[1]/td[2]/div/div/div[1]/div[1]/a')

The result is:

{xml_nodeset (0)}

Edit: In case it helps, this is what happens when I try to extract the node using the selector(?) path instead:

html_node(session, xpath='#calendarTab > table > tbody > tr:nth-child(1) > td:nth-child(2) > div > div > div.training.clean > div:nth-child(2) > a')
{xml_missing}
<NA>
Warning message:
In xpath_search(x$node, x$doc, xpath = xpath, nsMap = ns, num_results = 1) :
  Invalid expression [1207]
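
The `Invalid expression` warning above is what one would expect when a CSS selector is passed through the `xpath` argument; rvest takes CSS selectors through `css` instead. Below is a minimal sketch of the two selector styles, run against a made-up HTML snippet rather than the real calendar page. The markup, the shortened selectors, and the `/training/analysis/1` href are all assumptions for illustration, so this only shows the calling convention, not a confirmed fix.

```r
library(rvest)

# Hypothetical stand-in for the calendar markup; the real Polar page may differ.
page <- read_html('
<div id="calendarTab">
  <table><tbody><tr><td></td><td>
    <div><div>
      <div class="training clean">
        <div></div>
        <div><a href="/training/analysis/1">session</a></div>
      </div>
    </div></div>
  </td></tr></tbody></table>
</div>')

# A CSS selector goes in the `css` argument, not `xpath`.
links_css <- html_nodes(page, css = "#calendarTab div.training.clean a")

# A roughly equivalent XPath expression goes in the `xpath` argument.
links_xpath <- html_nodes(
  page,
  xpath = '//*[@id="calendarTab"]//div[contains(@class, "training")]//a'
)

# Pull the link targets out of the matched <a> nodes.
html_attr(links_css, "href")
html_attr(links_xpath, "href")
```

If both calls still return an empty node set on the real session, it may be that the links are not in the HTML the server returns (for example, if the page fills the calendar in with JavaScript after loading), in which case no selector would find them there.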

Any ideas on how to actually find the links would be much appreciated.

0 answers:

No answers.