Subtract the rows of one data frame from another based on common values in the id columns

Time: 2020-09-21 19:01:39

Tags: r dataframe

I have two data frames:

> head(df1)
    UG  S N_l N_b Girder1 Girder2 Girder3 Girder4 Girder5 Girder6 Girder7 Source
1   84 12   6   7       6       6       6       6       6       6       6   Code
2  124  9   4   7       4       4       4       4       3       3       3   Code
9   84  9   4   7       4       4       4       4       3       3       3   Code
24 124 12   6   7       6       6       6       6       6       6       6   Code
45 124 15   8   7       8       8       8       8       8       7       3   Code
49  84 15   8   7       8       8       8       8       8       7       3   Code

> head(df2)
  UG  S N_b N_l Girder1 Girder2 Girder3 Girder4 Girder5 Girder6 Girder7 Source
1 84  9   5   3      NA       2       3       3       3       2      NA    CSi
2 84 12   5   4      NA       2       3       4       3       3      NA    CSi
3 84 15   5   5      NA       3       3       5       3       3      NA    CSi
4 92  9   5   3      NA       2       3       3       3       2      NA    CSi
5 92 12   5   4      NA       2       3       4       3       3      NA    CSi
6 92 15   5   5      NA       3       3       5       3       3      NA    CSi

I want to see the differences between the two data frames in the columns that start with Girder, so I tried:

lanes.difference <- df2[5:11]-df1[5:11]

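This only subtracts row by row in order of row position, which is not what I am trying to do. I want to subtract the rows of df1 from df2 only where the columns UG, S, N_l and N_b are the same.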
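A minimal base R sketch of the matching described above: merge() pairs rows on the four key columns by name (so the different column order in df2 does not matter) before the Girder columns are subtracted. It uses a small hypothetical subset of the posted data (four key combinations, only Girder2 and Girder3) so it runs on its own; with the full df1 and df2 the same code should apply, and a Girder value that is NA in either frame gives an NA difference.

```r
# Hypothetical toy subset of the question's data: four matching key
# combinations, only Girder2 and Girder3 kept; df2 stores N_b before N_l.
df1 <- data.frame(UG = c(84, 84, 84, 84), S = c(12, 9, 12, 9),
                  N_l = c(6, 4, 4, 3), N_b = c(7, 7, 5, 5),
                  Girder2 = c(6, 4, 4, 3), Girder3 = c(6, 4, 4, 3))
df2 <- data.frame(UG = c(84, 84, 84, 84), S = c(9, 12, 9, 12),
                  N_b = c(5, 5, 7, 7), N_l = c(3, 4, 4, 6),
                  Girder2 = c(2, 2, 3, 3), Girder3 = c(3, 3, 3, 6))

keys <- c("UG", "S", "N_l", "N_b")
girders <- grep("^Girder", names(df1), value = TRUE)

# Inner join on the key columns; clashing non-key columns get suffixes.
merged <- merge(df1, df2, by = keys, suffixes = c(".df1", ".df2"))

# df2 minus df1 for every Girder column of each matched row.
for (g in girders) {
  merged[[paste0(g, "Diff")]] <- merged[[paste0(g, ".df2")]] - merged[[paste0(g, ".df1")]]
}

# Sort explicitly by the keys, then keep only the keys and the differences.
merged <- merged[order(merged$UG, merged$S, merged$N_l, merged$N_b), ]
lanes.difference <- merged[c(keys, paste0(girders, "Diff"))]
print(lanes.difference)
```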

Data:

> dput(df1)
structure(list(UG = c(84, 124, 84, 124, 124, 84, 84, 124, 116, 
100, 108, 92, 84, 124, 116, 108, 100, 92, 124, 116, 108, 100, 
92, 84), S = c(12, 9, 9, 12, 15, 15, 12, 9, 9, 9, 9, 9, 9, 12, 
12, 12, 12, 12, 15, 15, 15, 15, 15, 15), N_l = c(6, 4, 4, 6, 
8, 8, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5), 
N_b = c(7, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5), Girder1 = c(6, 4, 4, 6, 8, 8, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), Girder2 = c(6, 4, 4, 6, 8, 8, 4, 3, 3, 3, 3, 3, 3, 4, 
4, 4, 4, 4, 5, 5, 5, 5, 5, 5), Girder3 = c(6, 4, 4, 6, 8, 
8, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5), 
Girder4 = c(6, 4, 4, 6, 8, 8, 4, 3, 3, 3, 3, 3, 3, 4, 4, 
4, 4, 4, 5, 5, 5, 5, 5, 5), Girder5 = c(6, 3, 3, 6, 8, 8, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), Girder6 = c(6, 
3, 3, 6, 7, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3), Girder7 = c(6, 3, 3, 6, 3, 3, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
Source = c("Code", "Code", "Code", "Code", "Code", "Code", 
"Code", "Code", "Code", "Code", "Code", "Code", "Code", "Code", 
"Code", "Code", "Code", "Code", "Code", "Code", "Code", "Code", 
"Code", "Code")), row.names = c(1L, 2L, 9L, 24L, 45L, 49L, 
67L, 68L, 69L, 73L, 74L, 75L, 83L, 134L, 135L, 137L, 138L, 139L, 
199L, 200L, 204L, 205L, 206L, 211L), class = "data.frame")

> dput(df2)
structure(list(UG = c(84L, 84L, 84L, 92L, 92L, 92L, 100L, 100L, 
100L, 108L, 108L, 108L, 116L, 116L, 116L, 124L, 124L, 124L, 84L, 
84L, 84L, 124L, 124L, 124L), S = c(9L, 12L, 15L, 9L, 12L, 15L, 
9L, 12L, 15L, 9L, 12L, 15L, 9L, 12L, 15L, 9L, 12L, 15L, 9L, 12L, 
15L, 9L, 12L, 15L), N_b = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 7L, 7L, 7L, 7L, 7L, 7L), 
N_l = c(3L, 4L, 5L, 3L, 4L, 5L, 3L, 4L, 5L, 3L, 4L, 5L, 3L, 
4L, 5L, 3L, 4L, 5L, 4L, 6L, 8L, 4L, 6L, 8L), Girder1 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 2L, 3L, 3L, 2L, 3L, 3L), Girder2 = c(2L, 2L, 3L, 
2L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 
3L, 3L, 8L, 3L, 3L, 8L), Girder3 = c(3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 
8L, 3L, 6L, 8L), Girder4 = c(3L, 4L, 5L, 3L, 4L, 5L, 3L, 
4L, 5L, 3L, 4L, 5L, 3L, 4L, 5L, 3L, 4L, 5L, 4L, 6L, 8L, 3L, 
6L, 8L), Girder5 = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 8L, 3L, 6L, 8L
), Girder6 = c(2L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 
3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 7L, 3L, 3L, 8L), Girder7 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 3L, 3L, 3L, 3L, 3L, 3L), Source = c("CSi", "CSi", 
"CSi", "CSi", "CSi", "CSi", "CSi", "CSi", "CSi", "CSi", "CSi", 
"CSi", "CSi", "CSi", "CSi", "CSi", "CSi", "CSi", "CSi", "CSi", 
"CSi", "CSi", "CSi", "CSi")), row.names = c(NA, -24L), class = "data.frame")

2 answers:

Answer 0 (score: 1)

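First, I encourage you to look into the tidyverse packages. Whether you use base R or a package, selecting data frame columns by their index number (as in df2[5:11]) is not good practice.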

Beyond that, there are many other packages that can greatly help with data wrangling; search around for the ones you like.

library(tidyverse)

First, I prefix each column name with the name of its data frame:

colnames(df1) <- paste("df1", colnames(df1), sep = "_")
colnames(df2) <- paste("df2", colnames(df2), sep = "_")

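Then I bind them into one df with cbind(), which places the rows side by side by position. I can't use a left join because you don't have a common ID column.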

df12 <- cbind(df1, df2)

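This is the tidyverse part. First, I filter for the rows where the key columns of df1 and df2 are equal. Then I create (mutate) a new column holding the difference of the two Girder columns. You can add as many variables to mutate as you like, the same way I did for Girder2.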

# keep row pairs where all four key columns agree, then take df2 - df1
df12 %>%
  filter(df1_UG == df2_UG & df1_S == df2_S & df1_N_l == df2_N_l & df1_N_b == df2_N_b) %>%
  mutate(Girder2Diff = df2_Girder2 - df1_Girder2)

You can look up mutate and filter to learn how they work.
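Because cbind() pairs rows by position, a filter on the combined df12 only keeps row pairs that happen to line up. A sketch of the same tidyverse idea that instead joins on the key columns, assuming dplyr is installed; the toy data is a hypothetical subset of the question's data, and further Girder columns would follow the same mutate() pattern.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy subset of the question's data; df2 stores N_b before N_l.
df1 <- data.frame(UG = c(84, 84, 84, 84), S = c(12, 9, 12, 9),
                  N_l = c(6, 4, 4, 3), N_b = c(7, 7, 5, 5),
                  Girder2 = c(6, 4, 4, 3), Girder3 = c(6, 4, 4, 3))
df2 <- data.frame(UG = c(84, 84, 84, 84), S = c(9, 12, 9, 12),
                  N_b = c(5, 5, 7, 7), N_l = c(3, 4, 4, 6),
                  Girder2 = c(2, 2, 3, 3), Girder3 = c(3, 3, 3, 6))

keys <- c("UG", "S", "N_l", "N_b")

# inner_join() pairs rows whose four key values are all equal, so no
# column-name prefixing or cbind() is needed.
lanes.difference <- df1 %>%
  inner_join(df2, by = keys, suffix = c(".df1", ".df2")) %>%
  mutate(Girder2Diff = Girder2.df2 - Girder2.df1,   # df2 minus df1
         Girder3Diff = Girder3.df2 - Girder3.df1) %>%
  arrange(UG, S, N_l, N_b) %>%
  select(all_of(keys), ends_with("Diff"))
print(lanes.difference)
```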

This took a while, but I hope it helps.

Answer 1 (score: 0)

Following @Kay's suggestion, I sorted both data frames the same way. I'm not sure whether there is a shorter way.
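A minimal base R sketch of sorting both data frames the same way before subtracting; it illustrates the idea and is not the answerer's own code. It assumes both frames contain exactly the same combinations of UG, S, N_l and N_b (true for the 24 rows posted in the question), uses a hypothetical toy subset of that data, and `sort_by_keys` is a hypothetical helper.

```r
# Hypothetical toy subset of the question's data; df2 stores N_b before N_l.
df1 <- data.frame(UG = c(84, 84, 84, 84), S = c(12, 9, 12, 9),
                  N_l = c(6, 4, 4, 3), N_b = c(7, 7, 5, 5),
                  Girder2 = c(6, 4, 4, 3), Girder3 = c(6, 4, 4, 3))
df2 <- data.frame(UG = c(84, 84, 84, 84), S = c(9, 12, 9, 12),
                  N_b = c(5, 5, 7, 7), N_l = c(3, 4, 4, 6),
                  Girder2 = c(2, 2, 3, 3), Girder3 = c(3, 3, 3, 6))

keys <- c("UG", "S", "N_l", "N_b")
girders <- grep("^Girder", names(df1), value = TRUE)

# Hypothetical helper: order a data frame by the four key columns.
sort_by_keys <- function(df) df[order(df$UG, df$S, df$N_l, df$N_b), ]
df1_sorted <- sort_by_keys(df1)
df2_sorted <- sort_by_keys(df2)

# Positional subtraction is only meaningful if the sorted keys line up
# row for row; stop otherwise.
stopifnot(nrow(df1_sorted) == nrow(df2_sorted),
          all(df1_sorted[keys] == df2_sorted[keys]))

# Columns are picked by name, so df2's different column order is harmless.
lanes.difference <- cbind(df2_sorted[keys], df2_sorted[girders] - df1_sorted[girders])
print(lanes.difference)
```

Compared with a join, this relies on the check above: if one frame had a key combination the other lacks, the rows would shift and the subtraction would pair the wrong rows.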
