Combine the first two letters of each word in a string with a numeric variable

Date: 2016-10-05 13:19:53

Tags: r string

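I have a data frame with 8 variables, and I need to create a new column that combines two of the columns so it can serve as an ID for each observation. The two columns I need to combine look like this: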

Aut <- c("Robert Lucas", "Finn Kydland & Edward Prescott", "Alan Blinder & Ben Bernanke", 
       "Lars Svensson & Lawrence Christiano & Robert Lucas", "Ben Bernanke")
Year <- c(1976, 1989, 1983, 1985, 1983)
df <- data.frame(Aut, Year)

The ID variable I expect to get is:

Aut                                                 Year  ID
Robert Lucas                                        1976  RoLu1976
Finn Kydland & Edward Prescott                      1989  FiKyEdPr1989
Alan Blinder & Ben Bernanke                         1983  AlBlBeBe1983
Lars Svensson & Lawrence Christiano & Robert Lucas  1985  LaSvLaChRoLu1985
Ben Bernanke                                        1983  BeBe1983

3 answers:

Answer 0 (score: 3)

Another idea:

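library(stringi)
# Extract the words of each author string ("&" is not a word, so it is dropped)
# and keep the first two letters of each word; lapply always returns a list
lst <- lapply(stri_extract_all_words(df$Aut), function(x) substr(x, 1, 2))
# Collapse the letters of each row into one string and append the year
df$ID <- paste0(vapply(lst, paste, character(1), collapse = ""), df$Year)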

which gives:

#                                                 Aut Year               ID
#1                                       Robert Lucas 1976         RoLu1976
#2                     Finn Kydland & Edward Prescott 1989     FiKyEdPr1989
#3                        Alan Blinder & Ben Bernanke 1983     AlBlBeBe1983
#4 Lars Svensson & Lawrence Christiano & Robert Lucas 1985 LaSvLaChRoLu1985
#5                                       Ben Bernanke 1983         BeBe1983
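Why `lapply` is the safer choice for the first step: `sapply` simplifies its result whenever every element has the same length, so if every author string happened to contain the same number of words, the letters would come back as a character matrix, and a second `sapply` over that matrix would visit single cells instead of rows. A minimal sketch with two made-up, already split two-word authors (the `words` list is hypothetical data, not from the question):

```r
# Two hypothetical author strings, each already split into two words
words <- list(c("Robert", "Lucas"), c("Ben", "Bernanke"))

# sapply simplifies the equal-length results into a 2 x 2 character matrix
s <- sapply(words, function(x) substr(x, 1, 2))
print(is.matrix(s))

# A second sapply over that matrix visits the four cells one by one,
# so the per-author grouping is lost
wrong <- unname(sapply(s, paste, collapse = ""))
print(wrong)

# lapply always returns one list element per author, so collapsing works
l <- lapply(words, function(x) substr(x, 1, 2))
right <- vapply(l, paste, character(1), collapse = "")
print(right)
```

With the question's data the word counts differ (2, 4, 4, 6, 2), so `sapply` happens to return a list there, but `lapply` plus `vapply` does not depend on that.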

Answer 1 (score: 2)

You could try:

library(stringr)
# First split each author string into individual names, using "&" as the pattern.
a <- str_split(df$Aut, "&")
# Then, for each row, trim every name with str_trim, split it into first and last
# name with str_split, and paste the first two letters (str_sub) of each part together.
a1 <- lapply(a, function(x){
  x1 <- str_split(str_trim(x), " ")
  paste0(unlist(lapply(x1, str_sub, 1, 2)), collapse = "")
})
# Finally append the years. The resulting vector can be stored in df.
df$ID <- paste0(unlist(a1), df$Year)

Everything wrapped up in a single function:

foo <- function(a, b){
  a <- str_split(a, "&")
  a1 <- lapply(a, function(x){
    x1 <- str_split(str_trim(x), " ")
    paste0(unlist(lapply(x1, str_sub, 1, 2)), collapse = "")
  })
  paste0(unlist(a1), b)
}

foo(df$Aut, df$Year)
[1] "RoLu1976"         "FiKyEdPr1989"     "AlBlBeBe1983"     "LaSvLaChRoLu1985" "BeBe1983"   
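The same split-then-truncate idea also works without stringr. Below is a minimal base R sketch, assuming only `strsplit`, `trimws` and `substr`; the helper name `make_id` is hypothetical and not part of the answer:

```r
# Hypothetical helper: base R version of the split-and-truncate approach
make_id <- function(authors, year) {
  # Split each author string on "&" into individual names
  people <- strsplit(as.character(authors), "&", fixed = TRUE)
  initials <- vapply(people, function(p) {
    # Trim each name and split it on whitespace into its words
    words <- unlist(strsplit(trimws(p), "\\s+"))
    # Keep the first two letters of every word and glue them together
    paste(substr(words, 1, 2), collapse = "")
  }, character(1))
  # Append the year to form the ID
  paste0(initials, year)
}

Aut <- c("Robert Lucas", "Finn Kydland & Edward Prescott", "Alan Blinder & Ben Bernanke",
         "Lars Svensson & Lawrence Christiano & Robert Lucas", "Ben Bernanke")
Year <- c(1976, 1989, 1983, 1985, 1983)
ids <- make_id(Aut, Year)
print(ids)
```

Splitting on `"\\s+"` rather than a single space means names separated by several spaces still produce clean words.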

Answer 2 (score: 0)

Here is a base R option using gsub and paste:
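# Each match is two letters (group 2) plus the rest of the word, any "&" and spaces;
# replacing the whole match with group 2 keeps only the two letters
df$ID <- with(df, paste0(gsub("(([A-Za-z]{2})[a-z& ]*)", "\\2", Aut), Year))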
df$ID
#[1] "RoLu1976"         "FiKyEdPr1989"     "AlBlBeBe1983"   
#[4] "LaSvLaChRoLu1985" "BeBe1983"
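To check what the pattern consumes before the replacement, the matched chunks can be listed with `regmatches()` and `gregexpr()`. A small sketch on a single author string from the question (the variable names are illustrative only):

```r
# The pattern from the answer above
pattern <- "(([A-Za-z]{2})[a-z& ]*)"
x <- "Finn Kydland & Edward Prescott"

# Every chunk the pattern matches, in order
chunks <- regmatches(x, gregexpr(pattern, x))[[1]]
print(chunks)

# gsub replaces each chunk with capture group 2 (its first two letters)
short <- gsub(pattern, "\\2", x)
print(short)
```

Because `[a-z& ]*` stops at the next capital letter, each chunk runs from the start of one name part to just before the next, which is why the approach relies on names being capitalised.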