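R: Add a column to a data frame based on partially matching strings

Date: 2019-12-02 12:53:51

Tags: r

I have a sample data frame with an ID column and a value column: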

ID_short    Value
Boar            4
Pig             5
Duck            6
Dog             7
Cat             8
Horse           9

I have another data frame with a column that holds the same IDs, but extended with extra characters:

ID_Extended
Duck_p15
Dog32
PigGG
Horse_p12
Cat_Ok
Boar_Ko_1999_test

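I want to add this ID_Extended column to the first data frame so that each extended ID ends up in the row of its matching short ID. The IDs are of class character.

Example of the desired output: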

ID  Value   ID_Extended
Boar    4   Boar_Ko_1999_test
Pig     5   PigGG
Duck    6   Duck_p15
Dog     7   Dog32
Cat     8   Cat_Ok
Horse   9   Horse_p12

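2 answers:

Answer 0 (score: 2)

Here is one way: for each short ID, take the prefix of the same length from every extended ID and use match to find the row where they are equal.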

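# For each short ID, compare it with the same-length prefix of every
# extended ID and take the position of the first match.
df1$ID_Extended <-
  df2$ID_Extended[sapply(df1$ID_short,
                         function(x) match(x, substr(df2$ID_Extended, 1, nchar(x))))]


df1
  ID_short Value       ID_Extended
1     Boar     4 Boar_Ko_1999_test
2      Pig     5             PigGG
3     Duck     6          Duck_p15
4      Dog     7             Dog32
5      Cat     8            Cat_Ok
6    Horse     9         Horse_p12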

Data:

df1 <- data.frame(
  ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", "Horse"), 
  Value = 4:9,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID_Extended = c("Duck_p15", "Dog32", "PigGG","Horse_p12", "Cat_Ok", "Boar_Ko_1999_test"),
  stringsAsFactors = FALSE
)
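
The prefix comparison described in this answer can also be sketched with base R's startsWith (available since R 3.3.0). This is only an illustration under stated assumptions: match_prefix is a hypothetical helper name, not an existing function, and the data is copied from the question. It returns NA for a short ID that no extended ID starts with.

```r
# Illustrative data copied from the question.
df1 <- data.frame(
  ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", "Horse"),
  Value = 4:9,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID_Extended = c("Duck_p15", "Dog32", "PigGG", "Horse_p12", "Cat_Ok", "Boar_Ko_1999_test"),
  stringsAsFactors = FALSE
)

# match_prefix() is a hypothetical helper, not part of base R.
# For each short ID it returns the index of the first extended ID that
# starts with it, or NA when no extended ID has that prefix.
match_prefix <- function(short, extended) {
  vapply(
    short,
    function(x) which(startsWith(extended, x))[1],
    integer(1),
    USE.NAMES = FALSE
  )
}

idx <- match_prefix(df1$ID_short, df2$ID_Extended)
df1$ID_Extended <- df2$ID_Extended[idx]
df1
```

Only the first match is used, so if one short ID is itself a prefix of another ID (say "Dog" and a hypothetical "Dogfish_1"), the result depends on the row order of df2 and should be checked by hand.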

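Answer 1 (score: 2)

After extracting the leading substring of 'ID_Extended' from 'df2', we can use match. The pattern assumes that every ID starts with one uppercase letter followed by lowercase letters: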

df1$ID_Extended <- df2$ID_Extended[match(df1$ID_short, 
            sub("^([A-Z][a-z]+).*", "\\1", df2$ID_Extended))]

Data

df1 <- structure(list(ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", 
"Horse"), Value = 4:9), class = "data.frame", row.names = c(NA, 
-6L))

df2 <- structure(list(ID_Extended = c("Duck_p15", "Dog32", "PigGG", 
"Horse_p12", "Cat_Ok", "Boar_Ko_1999_test")), class = "data.frame",
row.names = c(NA, 
-6L))
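
Once the short key has been extracted with sub, the same result can also be reached with a join instead of match. The following is a minimal sketch, assuming the IDs follow the "one uppercase letter plus lowercase letters" pattern and that the short IDs are unique. The variable name merged is illustrative, and the data is copied from the question.

```r
# Illustrative data copied from the question.
df1 <- data.frame(
  ID_short = c("Boar", "Pig", "Duck", "Dog", "Cat", "Horse"),
  Value = 4:9,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID_Extended = c("Duck_p15", "Dog32", "PigGG", "Horse_p12", "Cat_Ok", "Boar_Ko_1999_test"),
  stringsAsFactors = FALSE
)

# Derive the join key: one uppercase letter followed by lowercase letters.
# Strings that do not fit the pattern are returned unchanged by sub().
df2$ID_short <- sub("^([A-Z][a-z]+).*", "\\1", df2$ID_Extended)

# Left join: all.x = TRUE keeps every row of df1, with NA where no key matches.
merged <- merge(df1, df2, by = "ID_short", all.x = TRUE)

# merge() may reorder rows, so restore the original order of df1.
merged <- merged[match(df1$ID_short, merged$ID_short), ]
rownames(merged) <- NULL
merged
```

A join scales more naturally when df2 carries further columns to bring along, while the match version in the answer is shorter when only one column is needed.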