Add several variables to a data frame based on rules from another data frame in R

Time: 2020-01-22 13:05:51

Tags: r dataframe

Here is an example of my data (both are data.frames):

Table 1:

0_x 1_x 2_x .... 20_x
cat cat red......green
dog red green     cat
bee blue bee....  dog
........

and Table 2:

  x    name       code
cat    animals     1
dog    animals     1
bee    animals.    1
green  colours     2
red    colours.    2
...

I would like to obtain the following result:

0_y 1_y 2_y .... 20_y  0_x 1_x 2_x .... 20_x
1    1    2.....  2    cat cat red......green
1    2    2       1    dog red green     cat
1    2    1....   1    ....
........

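Basically, Table 2 contains the rules I want to use to create the variables that get added to Table 1. If 0_x is cat, I want 0_y to equal 1 (because cat = 1 in Table 2).

How can I get this result in an elegant way? (If I only had one variable, 0_x, I would simply merge, but here there are several.)

2 Answers:

Answer 0: (score: 1)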

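You can use match() to look up the values of each column you are reading in the x column of Table 2. Here is an example that does this in a for loop.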

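Note: I think the column names of df1 have to start with a letter, not a digit (with its default check.names = TRUE, data.frame() would rewrite a name such as 0_x), so I use x_0, x_1, ... instead. I also use strings.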

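# Column names start with a letter so that data.frame() keeps them unchanged
df1 <- data.frame(x_0 = c('cat','dog','bee'),
                  x_1 = c('cat','red','blue'),
                  x_2 = c('red','green','bee'))

# Lookup table: every value of x has a code
df2 <- data.frame(x = c('cat','dog','bee','green','red','blue'),
                  name = c('animals','animals','animals','colours','colours','colours'),
                  code = c(1,1,1,2,2,2))

# Copy df1, rename x_* to y_*, and bind the copy in front of the original
df1b <- df1
colnames(df1b) <- sub("x", "y", colnames(df1b))
df3 <- cbind(df1b, df1)

# Overwrite each y_* column with the codes of the corresponding x_* column
for (i in seq_along(df1)) {
  df3[, i] <- df2$code[match(df1[, i], df2$x)]
}
df3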
#   y_0 y_1 y_2 x_0  x_1   x_2
# 1   1   1   2 cat  cat   red
# 2   1   2   2 dog  red green
# 3   1   2   1 bee blue   bee
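
To see what the indexing expression inside the loop does, here is a minimal sketch on a single column. The vectors `values`, `keys` and `codes` are made-up names for illustration only; they stand in for one column of df1 and for the x and code columns of df2.

```r
# Hypothetical stand-ins for df1[, i], df2$x and df2$code
values <- c("cat", "red", "blue")
keys   <- c("cat", "dog", "bee", "green", "red")
codes  <- c(1, 1, 1, 2, 2)

# match() gives, for each element of `values`, its position in `keys`
# (NA when the value does not occur in `keys`)
pos <- match(values, keys)
pos
# [1]  1  5 NA

# Indexing `codes` by those positions translates each value into its code
result <- codes[pos]
result
# [1]  1  2 NA
```

A value that is missing from the lookup table comes back as NA instead of raising an error, so it may be worth checking the result for NA values.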

Answer 1: (score: 1)

Perhaps you can use the following base R solution, which relies on match() + unlist():

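df1post <- df1  # keep the original values
# Replace every cell of df1 with its code; df1[] preserves the data frame shape
df1[] <- df2$code[match(unlist(df1), df2$x)]
# Rename *_x to *_y on the coded copy and bind it in front of the originals
dfout <- cbind(`names<-`(df1, gsub("_x", "_y", names(df1))), df1post)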

which gives

> dfout
  0_y 1_y 2_y 20_y 0_x  1_x   2_x  20_x
1   1   1   2    2 cat  cat   red green
2   1   2   2    1 dog  red green   cat
3   1  NA   1    1 bee blue   bee   dog
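
The NA in the 1_y column appears because "blue" has no row in the df2 used by this answer. As a rough sketch, such values could be listed before recoding; the name `unmatched` is only illustrative, and the data frames below are a cut-down copy of the data in this answer.

```r
# Cut-down copy of the example data; check.names = FALSE keeps names like 0_x
df1 <- data.frame(`0_x` = c("cat", "dog", "bee"),
                  `1_x` = c("cat", "red", "blue"),
                  check.names = FALSE, stringsAsFactors = FALSE)
df2 <- data.frame(x = c("cat", "dog", "bee", "green", "red"),
                  code = c(1L, 1L, 1L, 2L, 2L),
                  stringsAsFactors = FALSE)

# Values that occur somewhere in df1 but have no entry in df2$x
unmatched <- setdiff(unlist(df1, use.names = FALSE), df2$x)
unmatched
# [1] "blue"
```

An empty result would mean that every cell can be translated to a code.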

Data

df1 <- structure(list(`0_x` = c("cat", "dog", "bee"), `1_x` = c("cat", 
"red", "blue"), `2_x` = c("red", "green", "bee"), `20_x` = c("green", 
"cat", "dog")), class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(x = c("cat", "dog", "bee", "green", "red"), name = c("animals", 
"animals", "animals.", "colours", "colours."), code = c(1L, 1L, 
1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -5L))
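
If the same recoding is needed in more than one place, the match() lookup could be wrapped in a small function. This is only a sketch: `add_code_columns` and its arguments are hypothetical names, and the `_x` / `_y` suffix convention is taken from the question.

```r
# Hypothetical helper: returns the coded columns followed by the originals
add_code_columns <- function(data, rules, key = "x", value = "code",
                             from = "_x$", to = "_y") {
  coded <- data
  # Look up every column of `data` in the rules table via match()
  coded[] <- lapply(data, function(col) {
    rules[[value]][match(as.character(col), rules[[key]])]
  })
  # Rename the coded copy, e.g. 0_x becomes 0_y
  names(coded) <- sub(from, to, names(data))
  # cbind() on data frames leaves the column names as they are
  cbind(coded, data)
}

# Cut-down copy of the example data
df1 <- data.frame(`0_x` = c("cat", "dog", "bee"),
                  `1_x` = c("cat", "red", "blue"),
                  check.names = FALSE, stringsAsFactors = FALSE)
df2 <- data.frame(x = c("cat", "dog", "bee", "green", "red"),
                  code = c(1L, 1L, 1L, 2L, 2L),
                  stringsAsFactors = FALSE)

out <- add_code_columns(df1, df2)
names(out)
# [1] "0_y" "1_y" "0_x" "1_x"
out[["1_y"]]
# [1]  1  2 NA
```

Using lapply() keeps the lookup per column, so the function does not modify its input and works for any number of columns.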