Splitting numbers from strings in a data frame

Time: 2016-11-10 18:11:00

Tags: r splitstackshape

I have a data frame in R with a column that looks like this:

Venue
AAA 2001
BBB 2016
CCC 1996
... ....
ZZZ 2007

To make the data frame easier to work with, I would like to split the Venue column into two columns, Location and Year, like this:

Location Year
AAA      2001
BBB      2016
CCC      1996
...      ....
ZZZ      2007

I have tried several variations of the cSplit() function to achieve this:

df = cSplit(df, "Venue", " ") #worked somewhat, however issues with places with multiple words (e.g. Los Angeles, Rio de Janeiro)
df = cSplit(df, "Venue", "[:digit:]")
df = cSplit(df, "Venue,", "[0-9]+")

So far none of these have worked for me. If anyone could point me in the right direction, I would appreciate it.

2 answers:

Answer 0: (score: 0)

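The simplest way is to use stringr, whose functions are vectorized:

library(stringr)
df[,1:2] <- str_split(df$Venue, pattern = "\\s+(?=\\d)", simplify = TRUE)
colnames(df) <- c('Location', 'Year')

The pattern \\s+(?=\\d) splits on whitespace that is immediately followed by a digit, so multi-word locations stay in one piece.

Or with str_split_fixed:

str_split_fixed(df$Venue, pattern = "\\s+(?=\\d)", 2)

You can also do it in base R.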
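
One way the base R route could look, as a sketch rather than the answerer's exact code: strsplit() with perl = TRUE accepts the same lookahead pattern, and do.call(rbind, ...) stacks the pieces into a two-column character matrix. The sample data frame, including the multi-word venues, is an assumption added for illustration.

```r
# Hypothetical sample data; the multi-word venues mirror the cases
# mentioned in the question (Los Angeles, Rio de Janeiro).
df <- data.frame(
  Venue = c("AAA 2001", "Los Angeles 2016", "Rio de Janeiro 1996"),
  stringsAsFactors = FALSE
)

# Split on whitespace that is followed by a digit; perl = TRUE is
# needed for the lookahead (?=\\d).
parts <- do.call(rbind, strsplit(df$Venue, "\\s+(?=\\d)", perl = TRUE))

df$Location <- parts[, 1]
df$Year     <- as.integer(parts[, 2])  # store the year as a number
```

This assumes every row ends in exactly one whitespace-separated number; rows with a different shape would produce pieces of unequal length and should be checked before binding.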

Answer 1: (score: 0)

How about this?

d <- data.frame(Venue = c("AAA 2001", "BBB 2016", "CCC 1996", "cc d 2001"),
         stringsAsFactors = FALSE)

d$Location <- gsub("[[:digit:]]", "", d$Venue)
d$Year <- gsub("[^[:digit:]]", "", d$Venue)
d
#       Venue Location Year
# 1  AAA 2001     AAA  2001
# 2  BBB 2016     BBB  2016
# 3  CCC 1996     CCC  1996
# 4 cc d 2001    cc d  2001
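
Note that removing only the digits leaves the separating space at the end of Location, and Year stays a character column. A possible variant, offered as a sketch under the same sample data, trims the whitespace with trimws() and converts the year with as.integer():

```r
d <- data.frame(Venue = c("AAA 2001", "BBB 2016", "CCC 1996", "cc d 2001"),
                stringsAsFactors = FALSE)

# Drop the digits, then trim the space left behind at the end.
d$Location <- trimws(gsub("[[:digit:]]", "", d$Venue))

# Keep only the digits and store them as an integer.
d$Year <- as.integer(gsub("[^[:digit:]]", "", d$Venue))
```

This still assumes the only digits in each value belong to the year; a location name that itself contains digits would have them removed from Location and merged into Year.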