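I imported a CSV file that I want to use in R. Here, I am trying to access one column of that file. The column, titled "URLs", contains a list of web addresses. I then want code that scrapes the data from each of those URLs. In short, I want something more efficient than listing every URL inside a c() call, because I have about 200 links.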
https://www.nytimes.com/2018/04/07/health/health-care-mergers-doctors.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/11/well/move/why-exercise-alone-may-not-be-the-key-to-weight-loss.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/07/health/antidepressants-withdrawal-prozac-cymbalta.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/well/why-you-should-get-the-new-shingles-vaccine.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/health/fda-essure-bayer-contraceptive-implant.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/health/hot-pepper-thunderclap-headaches.html?rref=collection%2Fsectioncollection%2Fhealth
I get an error when I run this line: article <- links %>% map(read_html)
It gives me this message:
Error in UseMethod("read_xml") :
no applicable method for 'read_xml' applied to an object of class "factor"
Here is the code:
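library(rvest)   # read_html(), html_node(), html_nodes(), html_text(); also re-exports %>%
library(purrr)   # map(), map_chr()

setwd("C:/Users/Majed/Desktop")
d <- read.csv("NYT.csv")
d
links <- d$URLs
# Download and parse every page in the column
article <- links %>% map(read_html)
# Extract the <title> text of each page
title <-
  article %>% map_chr(. %>% html_node("title") %>% html_text())
# Join the text of all story paragraphs on each page into one string
content <-
  article %>% map_chr(. %>% html_nodes(".story-body-text") %>% html_text() %>% paste(., collapse = ""))
article_table <- data.frame("Title" = title, "Content" = content)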
Answer (score: 1)
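Note what the error message is telling you: read_html expects a string, but you are giving it a factor. Unless you pass the argument stringsAsFactors = FALSE, read.csv converts strings to factors. read_csv from readr is a good alternative if, like me, you keep forgetting that you don't want strings automatically turned into factors.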
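A minimal base-R sketch of the cause and the read.csv fix described above. It assumes a tiny inline CSV (csv_text, hypothetical sample data built from two of the question's URLs) in place of NYT.csv, so it runs without the file. It passes stringsAsFactors = TRUE explicitly to reproduce what the question ran into; that was read.csv's default before R 4.0.0, which changed the default to FALSE.

```r
# Hypothetical stand-in for NYT.csv: a header row plus two URLs
csv_text <- "URLs
https://www.nytimes.com/2018/04/07/health/health-care-mergers-doctors.html
https://www.nytimes.com/2018/04/09/health/hot-pepper-thunderclap-headaches.html"

# Reproduce the pre-R-4.0.0 default, which turned text columns into factors
d_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(d_factor$URLs)   # "factor": read_html() has no method for this class

# Ask read.csv for plain character vectors instead
d <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(d$URLs)          # "character": safe to pass to map(read_html)
```

With the second form, links <- d$URLs is already a character vector and no later conversion is needed.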
I can't reproduce the problem without your data, but try converting the URLs to strings:
links <- as.character(d$URLs)
article <- links %>% map(read_html)
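To see why as.character() is the right conversion, here is a sketch on a hypothetical two-element factor standing in for d$URLs. A factor stores integer level codes plus a table of labels; as.integer() exposes the codes (which follow the alphabetical order of the levels, not the row order), while as.character() returns the labels, which are the URL strings read_html() needs.

```r
# Hypothetical stand-in for d$URLs as read.csv() returns it with stringsAsFactors = TRUE
urls <- factor(c(
  "https://www.nytimes.com/2018/04/09/well/why-you-should-get-the-new-shingles-vaccine.html",
  "https://www.nytimes.com/2018/04/07/health/antidepressants-withdrawal-prozac-cymbalta.html"
))

# Level codes, not URLs: levels are sorted, so this input gives 2 1
codes <- as.integer(urls)

# The label of each element, i.e. the original URL text, in row order
links <- as.character(urls)

# Cheap guard before handing the vector to map(read_html)
stopifnot(is.character(links))
```

Converting with as.integer() or as.numeric() would therefore hand read_html() numbers rather than addresses; as.character() is the conversion that preserves the URLs.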