I'm trying to web-scrape some data. This is what I have so far:
library(XML)
library(dplyr)
theurl <- "http://www.iie.org/Research-and-Publications/Open-Doors/Data/International-Students/Enrollment-Trends/1948-2012"
tables <- readHTMLTable(theurl)
trends <- tables[[1]][3:67, ] %>%
  rename("International Students" = V2, "Annual % Change" = V3,
         "Total Enrollment" = V4, "% Int'l" = V5) %>%
  mutate(Year = strsplit(x = as.character(V1), "/"))
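The problem is the Year variable. It should be 1948:2012. I could simply do trends$Year = 1948:2012, but I'd like to learn how to do this with strsplit or something similar.
Thanks!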
Answer 0 (score: 1)
I'm not sure whether you want to work from column V1 or Year, but here are two approaches that can be applied to either column:
# Using a regular expression: find the first run of four digits, keep it,
# and discard everything before and after it.
trends$Year = gsub("^[^0-9]*([0-9]{4}).*$", "\\1", as.character(trends$Year))
# Using the substr function: take the first four characters of the string.
# This only works when the value starts with the year, so it is applied to V1
# here (assumed to start with the four-digit year).
trends$Year = substr(as.character(trends$V1), 1, 4)
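Since the question asks specifically about strsplit, here is a minimal sketch of that route. The sample values in v1 are hypothetical and only assume that V1 holds strings of the form "1948/49"; the real column may differ. The point is that strsplit returns a list, so the first piece of each element still has to be pulled out before it can serve as an ordinary column.

```r
# Hypothetical sample values; the real V1 column is assumed to look like this.
v1 <- factor(c("1948/49", "1949/50", "2011/12"))

# strsplit() returns a list with one character vector per input element.
parts <- strsplit(as.character(v1), "/", fixed = TRUE)

# Take the first piece of each element and convert it to an integer.
year <- as.integer(vapply(parts, `[`, character(1), 1))

print(year)
```

Inside the original pipeline the same idea would probably look like mutate(Year = as.integer(sapply(strsplit(as.character(V1), "/"), `[`, 1))), which yields a plain integer column rather than a list column.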