How to subtract values between two data frames of different lengths using a matching variable

Posted: 2019-09-20 16:21:16

Tags: r dataframe matching

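I have two datasets. I want to match them on variable A and then subtract the value from dataset 1 (df1$B) from the values in dataset 2 (df2$C).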

df1 <- data.frame(A = c("1", "2", "3"),
                  B = c(10, 20, 30))
df2 <- data.frame(A = c("1", "1", "1", "2", "2", "2", "3", "3", "3"),
                  B = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  C = c(100, 125, 150, 100, 150, 200, 100, 200, 300))

I want df2 to have an extra column "D" equal to df2$C - df1$B, with rows matched on column A. For example: 100-10, 125-10, 150-10, 100-20, 150-20, ... The desired result is:

df2 <- data.frame(A = c("1", "1", "1", "2", "2", "2", "3", "3", "3"),
                  B = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  C = c(100, 125, 150, 100, 150, 200, 100, 200, 300),
                  D = c(90, 115, 140, 80, 130, 180, 70, 170, 270))

How should I create df2$D?

2 answers:

Answer 0 (score: 1)

Base R

df1 <- data.frame(A = c("1", "2","3"),
                  B = c(10, 20, 30))
df2 <- data.frame(A = c("1", "1","1","2","2","2","3","3","3"),
                  B = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  C = c(100, 125, 150, 100, 150, 200, 100, 200, 300))


df <- merge(df1,df2,by = "A")
df$D <- df$C-df$B.x
df$B <- df$B.y
df[,c("B.x","B.y")] <- NULL

> df
  A   C   D B
1 1 100  90 1
2 1 125 115 2
3 1 150 140 3
4 2 100  80 1
5 2 150 130 2
6 2 200 180 3
7 3 100  70 1
8 3 200 170 2
9 3 300 270 3
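A possible variant of the merge above, sketched here as an illustration rather than as part of the original answer: merge()'s `suffixes` argument can name the clashing columns up front, which avoids the B.x/B.y renaming steps. The suffix ".df1" is an arbitrary, hypothetical choice. Because merge() sorts by the key, the row order within each A group is not something to rely on.

```r
# Sample data from the question
df1 <- data.frame(A = c("1", "2", "3"),
                  B = c(10, 20, 30))
df2 <- data.frame(A = c("1", "1", "1", "2", "2", "2", "3", "3", "3"),
                  B = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  C = c(100, 125, 150, 100, 150, 200, 100, 200, 300))

# Put df2 first and give only df1's clashing column a suffix,
# so df2's B keeps its name and df1's B becomes B.df1 (illustrative name)
df <- merge(df2, df1, by = "A", suffixes = c("", ".df1"))
df$D <- df$C - df$B.df1   # subtract the matched df1 value
df$B.df1 <- NULL          # drop the helper column
df
```

The resulting columns are A, B, C, D, the same layout as the desired df2.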

With data.table, you can update df2 directly in the join:

library(data.table)
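# setDT() converts to data.table by reference, so reassignment is not needed
setDT(df1)
setDT(df2)
# Update join: for each row of df2 matched on A, i.B is the B column of df1
df2[df1, D := C - i.B, on = "A"]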

> df2
   A B   C   D
1: 1 1 100  90
2: 1 2 125 115
3: 1 3 150 140
4: 2 1 100  80
5: 2 2 150 130
6: 2 3 200 180
7: 3 1 100  70
8: 3 2 200 170
9: 3 3 300 270
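One behaviour of the update join worth checking, shown here as a sketch with hypothetical data (dt1, dt2, and the unmatched key "4" are made up for illustration): rows of the target table whose key has no match in the lookup table are left as NA in the new column, and the target keeps its original row order.

```r
library(data.table)

# Hypothetical tables: dt2 contains a key ("4") that does not exist in dt1
dt1 <- data.table(A = c("1", "2", "3"), B = c(10, 20, 30))
dt2 <- data.table(A = c("1", "2", "4"), C = c(100, 150, 300))

# Update join: D is filled only for rows of dt2 whose A is found in dt1;
# i.B refers to the B column of the joined table (dt1)
dt2[dt1, D := C - i.B, on = "A"]
dt2
```

Here D should come out as 90, 130 and NA, so a missing key shows up as NA rather than as an error.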

dplyr

library(dplyr)

df2 %>%
  merge(df1,by = "A") %>%
  mutate(D = C - B.y,
         B = B.x,
         B.x = NULL,
         B.y = NULL) 
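The pipeline above calls base merge() inside a dplyr chain. A more dplyr-style alternative, offered as a sketch rather than as the answerer's code, is left_join() with its `suffix` argument. This keeps every row of df2 in its original order. The ".df1" suffix and the `result` name are illustrative choices.

```r
library(dplyr)

df1 <- data.frame(A = c("1", "2", "3"),
                  B = c(10, 20, 30))
df2 <- data.frame(A = c("1", "1", "1", "2", "2", "2", "3", "3", "3"),
                  B = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  C = c(100, 125, 150, 100, 150, 200, 100, 200, 300))

# left_join keeps all rows of df2 in order; the empty suffix keeps df2's B
# as "B" and renames df1's B to "B.df1" (illustrative name)
result <- df2 %>%
  left_join(df1, by = "A", suffix = c("", ".df1")) %>%
  mutate(D = C - B.df1) %>%
  select(-B.df1)
result
```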

Answer 1 (score: 1)

In base R, with match:

df2$D <- with(df2, C - df1$B[match(A, df1$A)])
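To see why the one-liner works, here is the same computation split into steps (the intermediate names `idx` and `matched_B` are only for illustration). match() returns, for each A in df2, the position of that value in df1$A, and indexing df1$B with those positions expands it to one value per row of df2. If a key of df2 were missing from df1, match() would return NA for it and D would be NA. If df1$A contained duplicates, only the first match would be used.

```r
df1 <- data.frame(A = c("1", "2", "3"),
                  B = c(10, 20, 30))
df2 <- data.frame(A = c("1", "1", "1", "2", "2", "2", "3", "3", "3"),
                  B = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  C = c(100, 125, 150, 100, 150, 200, 100, 200, 300))

# Step 1: for each row of df2, the position of its A value in df1$A
idx <- match(df2$A, df1$A)
idx            # 1 1 1 2 2 2 3 3 3

# Step 2: expand df1$B to one value per row of df2
matched_B <- df1$B[idx]
matched_B      # 10 10 10 20 20 20 30 30 30

# Step 3: element-wise subtraction
df2$D <- df2$C - matched_B
df2
```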