Question

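So here's my situation: I've had a lot of success looping through lists of URLs, which I usually build by scraping hrefs and appending them to the domain. This is the strategy I'm using here: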


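library(rvest)

# Preallocate one slot per URL in classes
data <- vector("list", length(classes))
for (i in seq_along(classes)) {
  # Download and parse the course page
  course <- read_html(classes[i])

  # Course title from the first <h1>
  title <- course %>%
    html_node('h1') %>%
    html_text()

  # Course description from the first .block_content element
  description <- course %>%
    html_node('.block_content') %>%
    html_text()

  # Store the result for this URL
  data[[i]] <- list(Title = title, Description = description)
}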


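classes is a bunch of strings that look like this (the beginning and end are cut off because they're links and I don't have the rep to post them):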

   [1] "ttp://catalog.pomona.edu/preview_course_nopop.php?catoid=" 
   [2] "ttp://catalog.pomona.edu/preview_course_nopop.php?catoid=" 
   [3] "ttp://catalog.pomona.edu/preview_course_nopop.php?catoid="
   [4] "ttp://catalog.pomona.edu/preview_course_nopop.php?catoid=" 
   [5] "ttp://catalog.pomona.edu/preview_course_nopop.php?catoid="
   ...
   [2340] "ttp://catalog.pomona.edu/preview_course_nopop.php?catoid"
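
For readers unfamiliar with the href approach mentioned above, here is a minimal sketch of how such a vector of URLs might be built. The index page markup, the `example.edu` domain, and the name `classes_demo` are all hypothetical; the real catalog page is not shown in this question.

```r
library(rvest)

# Hypothetical index page passed as literal HTML; the real catalog markup is not shown here.
index_page <- read_html('<html><body>
  <a href="course.php?id=1">Course A</a>
  <a href="course.php?id=2">Course B</a>
</body></html>')

# Hypothetical domain, used only for illustration.
domain <- "http://example.edu/"

# Collect the href attribute of every link on the page.
hrefs <- index_page %>%
  html_nodes("a") %>%
  html_attr("href")

# Append each relative href to the domain to get a full URL.
classes_demo <- paste0(domain, hrefs)
print(classes_demo)
```

Plain `paste0` only works when every href is relative to the same base, which is an assumption of this sketch.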

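Testing the links individually works fine, and the loop also runs fine if I ask for specific URLs rather than the whole index. However, if I run it over the full length of classes, it runs for a long time and returns only one result: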

> description
[1] "\n                  \n                      \t\t\t\t\t\tHELP\n\t\t\t\t\t\t2017-2018 Pomona College Catalog Print-Friendly Page [Add to Portfolio]                      \n                    THEA199IRPO - Theatre: Independent ResearchWhen Offered: Each semester.Instructor(s): StaffCredit: 0.5-1A substantial and significant piece of original research or creative product produced. Prerequisite course work required. Available for full or half-course credit.  Back to Top | Print-Friendly Page [Add to Portfolio]                  "
> title
[1] "THEA199IRPO - Theatre: Independent Research"

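I'm honestly stumped, considering that a) I've had success with this before and b) the links aren't broken. I'm also not getting any error messages. Any help is very welcome!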
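
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: inside the loop, `title` and `description` are overwritten on every pass, so printing them afterwards would show only the last page, while the accumulated list holds every result. The sketch below reproduces that with two hypothetical pages passed as literal HTML, so it needs no network access; `pages` and `courses` are names made up for the illustration.

```r
library(rvest)

# Hypothetical stand-ins for two course pages; not taken from the real catalog.
pages <- c(
  '<html><body><h1>Course A</h1><div class="block_content">First description</div></body></html>',
  '<html><body><h1>Course B</h1><div class="block_content">Second description</div></body></html>'
)

# Same loop shape as in the question, run over the literal pages.
data <- vector("list", length(pages))
for (i in seq_along(pages)) {
  course <- read_html(pages[i])
  title <- course %>% html_node("h1") %>% html_text()
  description <- course %>% html_node(".block_content") %>% html_text()
  data[[i]] <- list(Title = title, Description = description)
}

# title was reassigned on each pass, so it holds only the last page's value.
print(title)

# The accumulated list has one entry per page.
print(length(data))

# Flatten the list into a data frame with one row per course.
courses <- do.call(rbind, lapply(data, as.data.frame, stringsAsFactors = FALSE))
print(courses$Title)
```

If `length(data)` on the real run equals `length(classes)`, the loop did collect everything and only the printed variables are misleading.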

Rvest: looping through URLs only returns one element

0 answers: