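I have a data frame with 8 variables, and I need to create a new column that combines two of the existing columns to serve as an ID for each observation. The two columns I need to combine look like this: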
Aut <- c("Robert Lucas", "Finn Kydland & Edward Prescott", "Alan Blinder & Ben Bernanke",
"Lars Svensson & Lawrence Christiano & Robert Lucas", "Ben Bernanke")
Year <- c(1976, 1989, 1983, 1985, 1983)
df <- data.frame(Aut, Year)
The ID variable I expect to get is:
Aut                                                  Year  ID
Robert Lucas                                         1976  RoLu1976
Finn Kydland & Edward Prescott                       1989  FiKyEdPr1989
Alan Blinder & Ben Bernanke                          1983  AlBlBeBe1983
Lars Svensson & Lawrence Christiano & Robert Lucas   1985  LaSvLaChRoLu1985
Ben Bernanke                                         1983  BeBe1983
Answer 0 (score: 3)
Another idea:
library(stringi)
# lapply always returns a list, even when every row has the same number of words
lst <- lapply(stri_extract_all_words(df$Aut), function(x) substr(x, 1, 2))
df$ID <- paste0(sapply(lst, function(x) paste(x, collapse = '')), df$Year)
which gives:
# Aut Year ID
#1 Robert Lucas 1976 RoLu1976
#2 Finn Kydland & Edward Prescott 1989 FiKyEdPr1989
#3 Alan Blinder & Ben Bernanke 1983 AlBlBeBe1983
#4 Lars Svensson & Lawrence Christiano & Robert Lucas 1985 LaSvLaChRoLu1985
#5 Ben Bernanke 1983 BeBe1983
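To see what happens at each step for a single entry, here is a minimal base R sketch. Note the assumption: `strsplit` on runs of non-letters is only a stand-in for `stri_extract_all_words` and may not behave identically on every input (digits, punctuation, non-ASCII names). The variable names are made up for illustration.

```r
# Illustrative single entry taken from the question's data
aut  <- "Finn Kydland & Edward Prescott"
year <- 1989

# Step 1: split into words (assumed stand-in for stri_extract_all_words)
words <- strsplit(aut, "[^[:alpha:]]+")[[1]]

# Step 2: keep the first two letters of every word
prefixes <- substr(words, 1, 2)

# Step 3: glue the prefixes together and append the year
id <- paste0(paste(prefixes, collapse = ""), year)
print(id)
```

The "&" never makes it into `words` because it is treated as a separator, so no extra cleanup is needed before taking the prefixes.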
Answer 1 (score: 2)
You could try:
library(stringr)
# first split the individual names using "&" as pattern.
a <- str_split(df$Aut, "&")
# Then use lapply, str_split and str_sub to separate first and last names,
# and paste the first two letters of each name together.
a1 <- lapply(a, function(x){
  x1 <- str_split(str_trim(x), " ")
  paste0(unlist(lapply(x1, str_sub, 1, 2)), collapse="")
})
# Finally add the years. Resulting vector can be saved in df.
df$ID <- paste0(unlist(a1), df$Year)
Everything wrapped into a single function:
foo <- function(a, b){
  a <- str_split(a, "&")
  a1 <- lapply(a, function(x){
    x1 <- str_split(str_trim(x), " ")
    paste0(unlist(lapply(x1, str_sub, 1, 2)), collapse="")
  })
  paste0(unlist(a1), b)
}
foo(df$Aut, df$Year)
[1] "RoLu1976" "FiKyEdPr1989" "AlBlBeBe1983" "LaSvLaChRoLu1985" "BeBe1983"
Answer 2 (score: 0)
Here is a base R option using gsub and paste0:
df$ID <- with(df, paste0(gsub("(([A-Za-z]{2})[a-z& ]*)", "\\2", Aut), Year))
df$ID
#[1] "RoLu1976" "FiKyEdPr1989" "AlBlBeBe1983"
#[4] "LaSvLaChRoLu1985" "BeBe1983"
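For anyone puzzled by the pattern, here is a small sketch of how it can be read. The outer group in the original is not needed for the replacement, so this version uses a single capture group; `pattern` and `aut` are names chosen here for illustration. It assumes names made of plain ASCII letters.

```r
aut <- c("Robert Lucas", "Finn Kydland & Edward Prescott")

# ([A-Za-z]{2}) captures the first two letters of a word;
# [a-z& ]*      then swallows the rest of the word plus any spaces and "&"
#               up to the next capital letter.
pattern <- "([A-Za-z]{2})[a-z& ]*"

# Each match is replaced by its captured two letters
initials <- gsub(pattern, "\\1", aut)
print(initials)

# Caveat: a capital inside a word starts a new match
mixed <- gsub(pattern, "\\1", "Ronald McDonald")
print(mixed)
```

The last call returns "RoMcDo" rather than "RoMc", so names with internal capitals may need one of the word-splitting approaches instead.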