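I'm a bit stuck here. I want to scrape data from a website and extract the user ratings, comments, and so on. I'm trying to add the data to a data frame.
Below is my code so far: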
# Read html and select the URLs for each game review.
library(rvest)
library(dplyr)
library(plyr)
# Read the webpage and the number of ratings.
getGame <- function(metacritic_game) {
  total_ratings <- metacritic_game %>%
    html_nodes("strong") %>%
    html_text()
  total_ratings <- ifelse(length(total_ratings) == 0, NA,
                          as.numeric(strsplit(total_ratings, " ")[[1]][1]))
  # Get the game title and the platform.
  game_title <- metacritic_game %>%
    html_nodes("h1") %>%
    html_text()
  game_platform <- metacritic_game %>%
    html_nodes(".platform a") %>%
    html_text()
  game_platform <- strsplit(game_platform, " ")[[1]][57:58]
  game_platform <- gsub("\n", "", game_platform)
  game_platform <- paste(game_platform[1], game_platform[2], sep = " ")
  game_publisher <- metacritic_game %>%
    html_nodes(".publisher a:nth-child(1)") %>%
    html_attr("href") %>%
    strsplit("/company/") %>%
    unlist()
  game_publisher <- gsub("\\W", " ", game_publisher)
  game_publisher <- strsplit(game_publisher, "\\t")[[2]][1]
  release_date <- metacritic_game %>%
    html_nodes(".release_data .data") %>%
    html_text()
  user_ratings <- metacritic_game %>%
    html_nodes("#main .indiv") %>%
    html_text() %>%
    as.numeric()
  user_name <- metacritic_game %>%
    html_nodes(".name a") %>%
    html_text()
  review_date <- metacritic_game %>%
    html_nodes("#main .date") %>%
    html_text()
  user_comment <- metacritic_game %>%
    html_nodes("#main .review_section .review_body") %>%
    html_text()
  record_game <- data.frame(game_title = game_title,
                            game_platform = game_platform,
                            game_publisher = game_publisher,
                            username = user_name,
                            ratings = user_ratings,
                            date = review_date,
                            comments = user_comment)
}
metacritic_home <- read_html("https://www.metacritic.com/browse/games/score/metascore/90day/all/filtered")
game_urls <- metacritic_home %>%
  html_nodes("#main .product_title a") %>%
  html_attr("href")
get100games <- function(game_urls) {
  data <- data.frame()
  i = 1
  for (i in 1:length(game_urls)) {
    metacritic_game <- read_html(paste0("https://www.metacritic.com",
                                        game_urls[i], "/user-reviews"))
    record_game <- getGame(metacritic_game)
    data <- rbind.fill(data, record_game)
    print(i)
  }
  data
}
df100games <- get100games(game_urls)
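However, some of the links have no user reviews, so rvest cannot find the nodes, and I get the following error: Error in data.frame(game_title = game_title, game_platform = game_platform, : arguments imply differing number of rows: 1, 0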
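The error can be reproduced without any scraping. When a selector matches nothing, html_text() returns a zero-length character vector, and data.frame() cannot combine a length-1 column with a length-0 column. A minimal base-R sketch; the values below are made up and only stand in for the scraped vectors:

```r
# Made-up stand-ins for what getGame() sees on a page with no user reviews:
# the title selector matched one node, the review selector matched none.
game_title   <- "Some Game"    # hypothetical value
user_comment <- character(0)   # html_text() on an empty node set gives this

# Capture the error message instead of stopping, to show what data.frame() says.
result <- tryCatch(
  data.frame(game_title = game_title, comments = user_comment),
  error = function(e) conditionMessage(e)
)
print(result)
```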
I tried adding ifelse() statements such as:
username = ifelse(length(user_name) != 0, user_name, NA),
ratings = ifelse(length(user_ratings) != 0,
                 user_ratings, NA),
date = ifelse(length(review_date) != 0,
              review_date, NA),
comments = ifelse(length(user_comment) != 0,
                  user_comment, NA))
However, the data frame then contains only one review per game instead of all of them. Any ideas?
Thanks.
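The truncation comes from ifelse() itself: ifelse(test, yes, no) returns a result with the same length as test, and length(user_name) != 0 is a single TRUE or FALSE, so only the first element of user_name survives. A plain if/else returns the whole vector instead. A minimal sketch using a hypothetical helper named or_na() (not part of rvest or the original code) and made-up values:

```r
# Hypothetical helper: return the scraped vector unchanged, or a single NA
# when the selector matched nothing, so data.frame() always gets >= 1 row.
or_na <- function(x) if (length(x) == 0) NA else x

user_name    <- c("alice", "bob", "carol")  # made-up scraped values
user_ratings <- numeric(0)                   # nothing matched on this page

# ifelse() returns a result as long as its length-1 test: only "alice" is kept.
truncated <- ifelse(length(user_name) != 0, user_name, NA)

# if/else returns the whole vector.
kept <- or_na(user_name)

# A page with reviews gives one row per review (the title is recycled)...
with_reviews <- data.frame(game_title = "Some Game", username = or_na(user_name))
# ...and a page without reviews gives a single row of NA instead of an error.
no_reviews <- data.frame(game_title = "Some Game", ratings = or_na(user_ratings))
```

Inside getGame() this would mean writing, for example, username = or_na(user_name). It assumes that, on pages that do have reviews, the selectors return vectors of equal length; otherwise data.frame() will still complain about differing row counts.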
Answer (score: 4)
You can use the function operator possibly() from the purrr package:
df100games <- purrr::map(game_urls, purrr::possibly(get100games, NULL)) %>%
purrr::compact() %>%
dplyr::bind_rows()
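I believe this returns the output you want: possibly() turns any error raised while scraping a game (such as the data.frame() error on pages with no user reviews) into NULL, compact() removes those NULL entries, and bind_rows() stacks the remaining data frames into one.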
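For readers without purrr, the same skip-on-error pattern can be sketched in base R with tryCatch(). This is a swapped-in base-R technique, not the answer's code, and scrape_one() and possibly_null() are hypothetical stand-ins: scrape_one() plays the role of scraping one game page and fails the way data.frame() did in the question.

```r
# Hypothetical stand-in for scraping one game page: it fails on pages
# without reviews, mimicking the data.frame() error from the question.
scrape_one <- function(url) {
  if (grepl("no-reviews", url)) {
    stop("arguments imply differing number of rows: 1, 0")
  }
  data.frame(url = url, ratings = c(8, 9))
}

# Base-R analogue of purrr::possibly(f, NULL): errors become NULL.
possibly_null <- function(f) {
  function(...) tryCatch(f(...), error = function(e) NULL)
}

urls <- c("/game/a", "/game/no-reviews", "/game/b")  # made-up paths

results  <- lapply(urls, possibly_null(scrape_one))
results  <- Filter(Negate(is.null), results)  # roughly purrr::compact()
combined <- do.call(rbind, results)           # like bind_rows(), but rbind()
                                              # needs identical columns
```

With this pattern (and with the purrr version), a game whose page errors out is dropped entirely rather than appearing as a row of NA; if those games should still be listed, the or_na()-style fix inside getGame() is the better fit.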