Structuring character data into a data frame

Time: 2018-11-23 16:03:27

Tags: r rstudio

I used the rvest package in R to scrape some web data, but I am having a lot of trouble getting it into a usable format.

My data currently looks like this:

     test
     [1] "v.  Philadelphia"                                           
     [2] "TD GardenRegular Season"                                    
     [3] "PTS: 23. Jayson TatumREB: 10. M. MorrisAST: 7. Kyrie Irving"
     [4] "PTS: 23. Joel EmbiidREB: 15. Ben SimmonsAST: 8. Ben Simmons"
     [5] "100.7 - 83.4" 
     [6] "@  Toronto"                                                         
     [7] "Air Canada Centre Regular Season"                              
     [8] "PTS: 21. Kyrie IrvingREB: 10. Al HorfordAST: 9. Al Horford" 
     [9] "PTS: 31. K. LeonardREB: 10. K. LeonardAST: 7. F. VanVleet"  
     [10] "115.6 - 103.3"        

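Could someone help me with the right operations to make it look like this (as a data frame)? I would really appreciate it if you could provide the code: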

     Opponent       Venue   
     Philadelphia   TD Garden
     Toronto        Air Canada Centre

I don't need any of the other information.

1 answer:

Answer 0 (score: 0)

Let me know if you run into any problems :)

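     # put your data in here (only positions 1 and 2 of each group of 5 are used)
     input <- c("v. Philadelphia", "TD GardenRegular Season",
                "", "", "",
                "@ Toronto", "Air Canada Centre Regular Season",
                "", "", "")
     index <- seq_along(input)

     # raw table format: each game occupies 5 consecutive entries
     out_raw <- data.frame(Opponent = input[index %% 5 == 1],
                           Venue = input[index %% 5 == 2],
                           stringsAsFactors = FALSE)

     # using the stringi package (install once if needed)
     # install.packages("stringi")
     library(stringi)

     # copy and clean up
     out_clean <- out_raw
     # keep everything after the first whitespace; trimws() also handles "v.  Philadelphia" (two spaces)
     out_clean$Opponent <- trimws(stri_extract_last_regex(out_raw$Opponent, "(?<=\\s).*$"))
     # drop the "Regular Season" label and any surrounding whitespace
     out_clean$Venue <- trimws(gsub("Regular Season", "", out_raw$Venue, fixed = TRUE))
     out_clean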
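
The snippet above uses a hand-trimmed `input`. As a rough alternative sketch, the same result can probably be reached from the full scraped vector in base R alone by reshaping it into a matrix with one row per game. This assumes every game contributes exactly five consecutive entries, as in the sample shown in the question; the names `games` and `result` are only illustrative.

```r
# Scraped vector as shown in the question (five entries per game).
test <- c(
  "v.  Philadelphia",
  "TD GardenRegular Season",
  "PTS: 23. Jayson TatumREB: 10. M. MorrisAST: 7. Kyrie Irving",
  "PTS: 23. Joel EmbiidREB: 15. Ben SimmonsAST: 8. Ben Simmons",
  "100.7 - 83.4",
  "@  Toronto",
  "Air Canada Centre Regular Season",
  "PTS: 21. Kyrie IrvingREB: 10. Al HorfordAST: 9. Al Horford",
  "PTS: 31. K. LeonardREB: 10. K. LeonardAST: 7. F. VanVleet",
  "115.6 - 103.3"
)

# Assumption: every game contributes exactly 5 consecutive entries.
stopifnot(length(test) %% 5 == 0)

# One row per game, one column per scraped field.
games <- matrix(test, ncol = 5, byrow = TRUE)

result <- data.frame(
  # Drop the leading "v." or "@" marker and the whitespace after it.
  Opponent = sub("^(v\\.|@)\\s*", "", games[, 1]),
  # Drop the trailing "Regular Season" label, whether fused or space-separated.
  Venue = sub("\\s*Regular Season$", "", games[, 2]),
  stringsAsFactors = FALSE
)
result
```

If the real page mixes in other labels (for example playoff games), the `"Regular Season"` pattern would need to be widened, and the length check will stop early if a game has a different number of entries.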