Question

I need to scrape threads and replies from the following website:

https://forums.vwvortex.com/forumdisplay.php?5449-Atlas-SUV/page2&pp=200&sort=lastpost&order=desc&daysprune=-1

I tried the following code:

library(rvest)

url <- "https://forums.vwvortex.com/forumdisplay.php?5449-Atlas-SUV/page1&pp=200&sort=lastpost&order=desc&daysprune=-1"

N_pages <- 5
A <- NULL
D <- NULL

for (j in 1:N_pages) {
  review  <- read_html(paste0(url, j))
  threads <- cbind(review %>% html_nodes(".threadtitle") %>% html_text())
  author  <- cbind(review %>% html_nodes(".label") %>% html_text())
  X <- rbind(A, threads, author)

  x <- as.data.frame(X)
}
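
One thing that may be contributing: `paste0(url, j)` appends the page number to the very end of the string (after `daysprune=-1`), so the `page1` segment in the URL never changes. A minimal sketch of building one URL per page instead, assuming the page number lives in that `pageN` segment as the two URLs in this question suggest; `page_url` is a hypothetical helper name, not part of rvest:

```r
# Hypothetical helper: puts the page number inside the "pageN" segment
# instead of appending it to the end of the query string.
page_url <- function(j) {
  sprintf(
    "https://forums.vwvortex.com/forumdisplay.php?5449-Atlas-SUV/page%d&pp=200&sort=lastpost&order=desc&daysprune=-1",
    j
  )
}

# One URL per page; each could then be passed to read_html().
urls <- vapply(1:5, page_url, character(1))
print(urls[2])
```

Inside the loop, `read_html(page_url(j))` would then request a different page on each iteration.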

Problem: I used SelectorGadget to find the correct HTML nodes. However, when I run the code, I do not get the desired result.

The output I get:

 V1

1 Title/thread Starter

2 Sticky: ****Please use the search****

3 Sticky: **** The Official Atlas SUV DIY/FAQ thread****

Required output:

Threads   Author    Replies

Text    name, date  text

Text    name, date  text

Text    name, date  text

How can I get these threads? Should I use rvest, or go through an API / JSON? Any idea how to do this?
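
As a rough sketch of how the three-column layout might be produced with rvest: calling `rbind()` on single-column matrices stacks titles and authors into one column (`V1`), whereas selecting one container node per thread and then calling `html_node()` (singular) inside it keeps each row's fields aligned and yields `NA` where a field is missing. The HTML and the class names below (`threadbit`, `title`, `author`, `replies`) are made up for illustration and are not taken from the actual forum page; `parse_page` is a hypothetical helper, and the real selectors would need to be checked with SelectorGadget.

```r
library(rvest)

# Made-up markup standing in for one forum page (NOT the real site's HTML).
html <- '
<ol>
  <li class="threadbit">
    <a class="title">First thread</a>
    <span class="author">alice, 01-02-2018</span>
    <span class="replies">12</span>
  </li>
  <li class="threadbit">
    <a class="title">Second thread</a>
    <span class="author">bob, 03-04-2018</span>
  </li>
</ol>'

# Hypothetical helper: returns one data frame row per thread container.
parse_page <- function(page) {
  rows <- html_nodes(page, "li.threadbit")
  data.frame(
    Threads = rows %>% html_node(".title")   %>% html_text(trim = TRUE),
    Author  = rows %>% html_node(".author")  %>% html_text(trim = TRUE),
    Replies = rows %>% html_node(".replies") %>% html_text(trim = TRUE),
    stringsAsFactors = FALSE
  )
}

# Parse each page into its own data frame, then bind them once at the end
# so earlier pages are not overwritten by later ones.
pages  <- list(read_html(html), read_html(html))
result <- do.call(rbind, lapply(pages, parse_page))
print(result)
```

If the reply text lives on each thread's own page rather than on the listing page, a further `read_html()` call per thread link would presumably be needed; that part is not shown here.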

Unable to scrape threads using rvest in R

0 answers: