Create a variable from the first number before the "/"

Time: 2014-10-13 19:20:52

Tags: r

I am trying to web-scrape some data. This is what I have so far:

library(XML)
library(dplyr)
theurl <- "http://www.iie.org/Research-and-Publications/Open-Doors/Data/International-Students/Enrollment-Trends/1948-2012"
tables <- readHTMLTable(theurl)
trends <- tables[[1]][3:67,] %>% rename("International Students"=V2, "Annual % Change"=V3, "Total Enrollment"=V4, "% Int'l"=V5) %>% 
  mutate(Year = strsplit(x = as.character(V1), "/"))

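The problem is the variable Year. It should be 1948:2012. I could just do trends$Year = 1948:2012, but I would like to learn how to do this with strsplit or something similar.

Thanks!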

1 Answer:

Answer 0: (score: 1)

I'm not sure whether you want to use column V1 or Year, but here are two approaches that can be applied to either column:

# Using a Regular Expression: Search for the first instance of four numeric characters 
# in a row. Keep them and throw away everything else.
trends$Year = gsub("([0-9]{4}).*", "\\1", trends$Year)

# Using the substr function: Subset the first four characters in the string.
trends$Year = substr(trends$Year, 1, 4)
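
Since the question asks specifically about strsplit, here is a minimal sketch of that route. Note that strsplit returns a list, so the Year column created by the mutate call in the question is a list column; the sketch therefore starts from the original V1 values. The sample vector v1 is hypothetical and only assumes that the first column holds strings such as "1948/49".

```r
# Hypothetical sample values standing in for the scraped column V1
# (assumed to look like "1948/49"; readHTMLTable may return a factor).
v1 <- factor(c("1948/49", "1949/50", "2011/12"))

# strsplit() returns a list with one character vector per input string.
parts <- strsplit(as.character(v1), "/", fixed = TRUE)

# Keep only the first piece of each split and convert it to an integer.
year <- as.integer(vapply(parts, `[`, character(1), 1))

# The same result without strsplit: drop the "/" and everything after it.
year_sub <- as.integer(sub("/.*", "", as.character(v1)))
```

Inside the dplyr pipeline from the question, the same idea would presumably read mutate(Year = as.integer(sapply(strsplit(as.character(V1), "/"), `[`, 1))), assuming V1 is still present at that point.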