I used the rvest package in R to scrape some web data, but I am having a lot of trouble converting it into a usable format.
My data currently looks like this:
test
[1] "v. Philadelphia"
[2] "TD GardenRegular Season"
[3] "PTS: 23. Jayson TatumREB: 10. M. MorrisAST: 7. Kyrie Irving"
[4] "PTS: 23. Joel EmbiidREB: 15. Ben SimmonsAST: 8. Ben Simmons"
[5] "100.7 - 83.4"
[6] "@ Toronto"
[7] "Air Canada Centre Regular Season"
[8] "PTS: 21. Kyrie IrvingREB: 10. Al HorfordAST: 9. Al Horford"
[9] "PTS: 31. K. LeonardREB: 10. K. LeonardAST: 7. F. VanVleet"
[10] "115.6 - 103.3"
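Could someone help me with the right operations to turn it into a data frame like the one below, and provide the code? I would really appreciate it: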
Opponent      Venue
Philadelphia  TD Garden
Toronto       Air Canada Centre
I don't need any of the other information.
Answer 0 (score: 0)
Let me know if you have any questions :)
# put your data in here
input <- c("v. Philadelphia", "TD GardenRegular Season",
"", "", "",
"@ Toronto", "Air Canada Centre Regular Season",
"", "", "")
index <- seq_along(input)  # safer than 1:length(input) if input is empty
# raw table format
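# each game spans 5 entries: position 1 is the opponent, position 2 the venue
out_raw <- data.frame(Opponent = input[index %% 5 == 1],
                      Venue = input[index %% 5 == 2],
                      stringsAsFactors = FALSE)  # keep character columns in R < 4.0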
# using the stringi package (install once if needed: install.packages("stringi"))
library(stringi)
# copy and clean up
out_clean <- out_raw
out_clean$Opponent <- stri_extract_last_regex(out_raw$Opponent, "(?<=\\s).*$")
out_clean$Venue <- trimws(gsub("Regular Season", "", out_raw$Venue))
out_clean
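
The same result can probably be reached in base R alone, working directly on the scraped vector from the question. The following is only a sketch, assuming every game contributes exactly five consecutive entries (as in the sample) and that the opponent always starts with "v." or "@". The names `games` and `out` are made up for this example.

```r
# Scraped vector exactly as shown in the question (five entries per game)
test <- c(
  "v. Philadelphia",
  "TD GardenRegular Season",
  "PTS: 23. Jayson TatumREB: 10. M. MorrisAST: 7. Kyrie Irving",
  "PTS: 23. Joel EmbiidREB: 15. Ben SimmonsAST: 8. Ben Simmons",
  "100.7 - 83.4",
  "@ Toronto",
  "Air Canada Centre Regular Season",
  "PTS: 21. Kyrie IrvingREB: 10. Al HorfordAST: 9. Al Horford",
  "PTS: 31. K. LeonardREB: 10. K. LeonardAST: 7. F. VanVleet",
  "115.6 - 103.3"
)

# Assumption: 5 entries per game, so each matrix row is one game
games <- matrix(test, ncol = 5, byrow = TRUE)

out <- data.frame(
  # drop the leading "v. " or "@ " marker from column 1
  Opponent = sub("^(v\\.|@)\\s*", "", games[, 1]),
  # drop the trailing "Regular Season" label from column 2, then trim spaces
  Venue = trimws(sub("Regular Season$", "", games[, 2])),
  stringsAsFactors = FALSE
)
out
```

If the scraped vector's length is not a multiple of five, `matrix()` will warn and the rows will no longer line up with games, so it is worth checking `length(test) %% 5 == 0` first. Games with a label other than "Regular Season" would also need their own pattern.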