Rvest: converting the final string into multiple rows

Time: 2016-03-31 11:18:54

Tags: html r string gsub rvest

I have used rvest to scrape the commentary for a football match from ESPNFC.co.uk, but I am struggling to get the final output I need.

library("rvest")
library("xlsx")

# read_html() replaces the deprecated html()
espnfc <- read_html("http://www.espnfc.co.uk/commentary/422421/commentary.html")
commentary <- espnfc %>%
  html_nodes("#convo-window") %>%
  html_text()

# Strip newlines, carriage returns, and tabs
commentary <- gsub("\n", "", commentary)
commentary <- gsub("\r", "", commentary)
commentary <- gsub("\t", "", commentary)

The final output is one huge string, but I want each minute's action to be a row in a data frame, for example:

"90'Second Half ends, Liverpool 2, Sunderland 2."
"90'Attempt blocked. Adam Johnson (Sunderland) right footed shot from outside the box is blocked. Assisted by Patrick van Aanholt."
"90'Attempt missed. Jordon Ibe (Liverpool) right footed shot from outside the box is close, but misses to the left. Assisted by Mamadou Sakho."
"90'Lucas Leiva (Liverpool) wins a free kick in the attacking half."

How can I solve this?

1 Answer:

Answer 0 (score: 4)

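Using CSS selectors will make your life easier: select the commentary text and the timestamps separately, then combine them into a data frame.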

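library("rvest")

espnfc <- read_html("http://www.espnfc.co.uk/commentary/422421/commentary.html")

# Text of each commentary entry
commentary <- espnfc %>%
  html_nodes(".comment p") %>%
  html_text()

# Matching timestamp for each entry
minute <- espnfc %>%
  html_nodes(".timestamp p") %>%
  html_text()

# One row per event; assumes both selectors return the same number of nodes
df <- data.frame(minute = minute, commentary = commentary)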
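
If only the single concatenated string is available (as in the question), a fallback is to split it on the minute markers with a regular expression. This is a minimal base R sketch, not part of the original answer. The `raw` string is a hypothetical sample built from the question's example lines, and the pattern assumes every event starts with digits followed by an apostrophe (for example `90'`). It would mis-split if a digit sat directly before a marker or if stoppage time used a different format, so the selector-based approach is the more robust choice when the page structure allows it.

```r
# Hypothetical input: two events run together, as in the question's one-string output.
raw <- paste0(
  "90'Second Half ends, Liverpool 2, Sunderland 2.",
  "90'Lucas Leiva (Liverpool) wins a free kick in the attacking half."
)

# Match a minute marker (digits + apostrophe) and everything up to the
# next marker or the end of the string. The lookahead needs perl = TRUE.
pattern <- "\\d+'.*?(?=\\d+'|$)"
events <- regmatches(raw, gregexpr(pattern, raw, perl = TRUE))[[1]]

# Separate the minute marker from the commentary text, one row per event.
df <- data.frame(
  minute = sub("^(\\d+').*$", "\\1", events),
  commentary = sub("^\\d+'", "", events),
  stringsAsFactors = FALSE
)
print(df)
```

The lazy `.*?` combined with the lookahead stops each match just before the next marker, so the marker itself starts the following event instead of being consumed.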