Question

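I want to get all the "sample" IDs from this link, meaning "GSM545657", "GSM545658", and so on. I'd like to use the new package rvest to solve this, but I'm not familiar with CSS or XPath. I used SelectorGadget to get a CSS selector. I clicked the first ID, "GSM545657", which turned green, and then clicked the items I didn't want (they turned red). Now all the sample IDs are green or red. The CSS selector it shows is "tr:nth-child(23) .eye-protector-processed a". My code looks like this: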

library(rvest)
myhtml <- html("http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE21610")
myhtml %>% html_nodes("tr:nth-child(23) .eye-protector-processed a") %>% html_text()

I get the error: Error in class(out) <- "XMLNodeSet" : attempt to set an attribute on NULL. If I select only two IDs, such as "GSM545665" and "GSM545666", I can use

myhtml %>% html_nodes("tr:nth-child(23) .eye-protector-processed a") %>% html_text()

to get the result. Could you tell me how to solve this problem? Any help would be appreciated. Thank you very much!

Answer 1

I think you used the wrong selector. This works for me:

myhtml <- html("http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE21610")
myhtml %>% 
  html_nodes("tr:nth-child(23) tr a") %>%
  html_text()

(But rvest should give a better error. I'll file a bug.)
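If positional selectors like tr:nth-child(23) turn out to be fragile, another option is to collect every link on the page and keep only the texts that look like sample accessions. The sketch below is just an illustration under a few assumptions: that each sample ID appears as the text of an <a> element, that IDs have the form "GSM" followed by digits, and that your rvest version provides read_html() (newer releases use it in place of html()). The helper name extract_gsm_ids and the inline demo page are made up for the example and are not part of rvest or GEO.

```r
library(rvest)

# Hypothetical helper (not part of rvest): return the link texts that
# look like GEO sample accessions, i.e. "GSM" followed by digits.
extract_gsm_ids <- function(page) {
  link_text <- page %>% html_nodes("a") %>% html_text()
  link_text <- trimws(link_text)
  unique(link_text[grepl("^GSM[0-9]+$", link_text)])
}

# Small inline document standing in for the GEO page (invented for illustration).
demo_page <- read_html('<html><body><table>
  <tr><td><a href="#">GSE21610</a></td></tr>
  <tr><td><a href="#">GSM545657</a></td></tr>
  <tr><td><a href="#">GSM545658</a></td></tr>
</table></body></html>')

ids <- extract_gsm_ids(demo_page)
print(ids)

# On the real page (needs network access):
# ids <- extract_gsm_ids(
#   read_html("http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE21610")
# )
```

Filtering on the text rather than on the table position means the code does not depend on which row the sample list happens to sit in, at the cost of picking up any other link whose text matches the same pattern.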

Using rvest and selectorgadget to extract information from GEO, got error: "Error in class(out) <- "XMLNodeSet" : attempt to set an attribute on NULL"
