I have a data frame in R with a column that looks like this:
Venue
AAA 2001
BBB 2016
CCC 1996
... ....
ZZZ 2007
To make the data frame easier to work with, I want to split the Venue column into two columns, Location and Year, like this:
Location Year
AAA 2001
BBB 2016
CCC 1996
... ....
ZZZ 2007
I have tried several variations of the cSplit() function to do this:
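library(splitstackshape)  # cSplit() comes from the splitstackshape package
df = cSplit(df, "Venue", " ")  # partly works, but breaks on multi-word places (e.g. Los Angeles, Rio de Janeiro)
# By default cSplit() matches sep literally (fixed = TRUE), so these regex patterns never match
df = cSplit(df, "Venue", "[:digit:]")
df = cSplit(df, "Venue", "[0-9]+")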
None of these have worked for me so far. I'd appreciate it if someone could point me in the right direction.
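A small reproducible sketch of why splitting on a single space breaks down. The data frame is hypothetical: it uses the place names mentioned above with made-up years.

```r
# Hypothetical sample data; the years are made up for illustration
df <- data.frame(
  Venue = c("AAA 2001", "Los Angeles 2001", "Rio de Janeiro 2007"),
  stringsAsFactors = FALSE
)

# Splitting on every single space gives a different number of pieces per row,
# so multi-word places get spread across several columns
pieces <- strsplit(df$Venue, " ", fixed = TRUE)
lengths(pieces)  # 2 3 4
```

The split point that actually matters is the last whitespace before the year, not every space.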
Answer 0 (score: 0)
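The simplest approach is stringr, whose functions are vectorized:
library(stringr)
# Split on whitespace that is immediately followed by a digit (a lookahead),
# so spaces inside multi-word places such as "Rio de Janeiro" are left alone
parts <- str_split(df$Venue, pattern = "\\s+(?=\\d)", simplify = TRUE)
df$Location <- parts[, 1]
df$Year <- as.integer(parts[, 2])
df$Venue <- NULL  # drop the original column
Alternatively, str_split_fixed() (also from stringr) always returns a character matrix with exactly n columns, here 2:
str_split_fixed(df$Venue, pattern = "\\s+(?=\\d)", 2)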
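A base R sketch, assuming the year is always the last whitespace-separated token of Venue, uses sub() with patterns anchored at the end of the string. The sample data frame is hypothetical.

```r
# Hypothetical sample data, including a multi-word place name
df <- data.frame(
  Venue = c("AAA 2001", "Rio de Janeiro 2007"),
  stringsAsFactors = FALSE
)

# Remove the trailing whitespace + digits to get the location
df$Location <- sub("[[:space:]]+[0-9]+$", "", df$Venue)
# Keep only the trailing digits to get the year
df$Year <- as.integer(sub("^.*[[:space:]]([0-9]+)$", "\\1", df$Venue))
df
```

Because both patterns are anchored with `$`, spaces and even digits earlier in the location name are left untouched.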
Answer 1 (score: 0)
How about this?
d <- data.frame(Venue = c("AAA 2001", "BBB 2016", "CCC 1996", "cc d 2001"),
stringsAsFactors = FALSE)
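# Location: delete all digits, then trim the leftover separating space
d$Location <- trimws(gsub("[[:digit:]]", "", d$Venue))
# Year: delete everything that is not a digit, then convert to integer
d$Year <- as.integer(gsub("[^[:digit:]]", "", d$Venue))
d
#       Venue Location Year
# 1  AAA 2001      AAA 2001
# 2  BBB 2016      BBB 2016
# 3  CCC 1996      CCC 1996
# 4 cc d 2001     cc d 2001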
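A quick check of one detail of the gsub() approach, on a single hypothetical value: deleting the digits leaves the separating space at the end of the location, so the result should be passed through trimws().

```r
x <- "AAA 2001"

# Deleting the digits keeps the space that separated location and year
loc <- gsub("[[:digit:]]", "", x)
nchar(loc)          # 4, i.e. "AAA " with a trailing space

# trimws() strips leading and trailing whitespace
nchar(trimws(loc))  # 3, i.e. "AAA"
```

This approach also assumes location names contain no digits, since every digit in Venue is removed from Location and copied into Year.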