Question

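I imported a CSV file that I want to use in R. I am trying to reference one column of that file, titled "URL", which contains a list of web addresses. I then want the code to scrape data from each of those URLs. In short, I am looking for something more efficient than listing every URL inside c(), because I have about 200 links.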

https://www.nytimes.com/2018/04/07/health/health-care-mergers-doctors.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/11/well/move/why-exercise-alone-may-not-be-the-key-to-weight-loss.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/07/health/antidepressants-withdrawal-prozac-cymbalta.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/well/why-you-should-get-the-new-shingles-vaccine.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/health/fda-essure-bayer-contraceptive-implant.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/health/hot-pepper-thunderclap-headaches.html?rref=collection%2Fsectioncollection%2Fhealth

I get an error when I run this line: article <- links %>% map(read_html).

It gives me this message:

(Error in UseMethod("read_xml") : 
no applicable method for 'read_xml' applied to an object of class "factor")

Here is the code:

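library(rvest)  # read_html(), html_node(), html_nodes(), html_text(), %>%
library(purrr)  # map(), map_chr()

setwd("C:/Users/Majed/Desktop")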

d <- read.csv("NYT.csv")

d

links <- d$URLs

article <- links %>% map(read_html)

title <-
  article %>% map_chr(. %>% html_node("title") %>% html_text())

content <-
  article %>% map_chr(. %>% html_nodes(".story-body-text") %>% html_text() %>% paste(., collapse = ""))

article_table <- data.frame("Title" = title, "Content" = content)

Answer 1

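Note what the error message is telling you: read_html expects a character string, but you are giving it a factor. read.csv converts strings to factors unless you include the argument stringsAsFactors = F. read_csv from readr is a good alternative if, like me, you tend to forget that you do not want strings turned into factors automatically.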

I can't reproduce the problem without your data, but try converting the URLs to character strings:

links <- as.character(d$URLs)

article <- links %>% map(read_html)
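
The factor-versus-string point can be checked without touching the network. The following is only a minimal sketch: the temporary file stands in for NYT.csv, the names csv_path, as_factor and as_text are hypothetical, and the column is assumed to be called URLs as in the question's code. In R 4.0.0 and later stringsAsFactors defaults to FALSE, so the argument is set explicitly here to show both cases.

```r
# Hypothetical stand-in for NYT.csv: a temporary file with a single URLs column.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("URLs",
    "https://www.nytimes.com/2018/04/09/health/fda-essure-bayer-contraceptive-implant.html?rref=collection%2Fsectioncollection%2Fhealth",
    "https://www.nytimes.com/2018/04/09/health/hot-pepper-thunderclap-headaches.html?rref=collection%2Fsectioncollection%2Fhealth"),
  csv_path
)

# Forcing factor conversion reproduces the situation described in the question.
as_factor <- read.csv(csv_path, stringsAsFactors = TRUE)
class(as_factor$URLs)   # "factor"

# Asking for plain strings at import time makes the later conversion unnecessary.
as_text <- read.csv(csv_path, stringsAsFactors = FALSE)
class(as_text$URLs)     # "character"

# as.character() recovers the same strings from the factor column.
identical(as.character(as_factor$URLs), as_text$URLs)   # TRUE
```

Either route should leave links as a character vector, which is what read_html needs when it is mapped over the column.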

Calling a column from a CSV file to extract its data
