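Match rows and columns, then subtract a value

Date: 2019-10-08 14:59:44

Tags: r

I am reconciling two datasets. A contains a list of transactions and their values. B contains multiple values produced by a process. I want to subtract each value in A from the matching field in B: the row is identified by idB and the column by A's column field.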

library(tidyverse)
A<-tribble(
  ~idA, ~group, ~column, ~value, ~idB,
  1, "x", "t1", 11, 1,
  2, "x", "t1",  22, 3,
  3, "x", "t3",  33, 4,
  4, "x", "t1",  25, 5)

B<-tribble(
  ~idB, ~group, ~t1, ~t2, ~t3,
  1, "x", 11, 0, 0,
  2, "x", 0, 11, 0,
  3, "x", 22, 0, 0 ,
  4, "x", 0, 0, 33,
  5, "x", 50, 50, 50)

Desired output:

Boutput<-tribble(
  ~idB, ~group, ~t1, ~t2, ~t3,
  1, "x", 0, 0, 0, 
  2, "x", 0, 11, 0, 
  3, "x", 0, 0, 0,  
  4, "x", 0, 0, 0,  
  5, "x", 25, 50, 50)

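I tried using inner_join and then mutating according to rules.

How can I subtract the matching entries?

2 answers:

Answer 0 (score: 2)

I was hesitant to post this, but thought it might help while you look for other solutions.

I would probably start by converting A from long to wide: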

Awide <- A %>%
  pivot_wider(names_from = column)

R> Awide
# A tibble: 4 x 5
    idA group   idB    t1    t3
  <dbl> <chr> <dbl> <dbl> <dbl>
1     1 x         1    11    NA
2     2 x         3    22    NA
3     3 x         4    NA    33
4     4 x         5    25    NA

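In this case A has no values for t2, so pivoting does not create that column. Before joining A and B, make sure all three columns (t1, t2, t3) exist: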

cols <- c("idA", "group", "idB", "t1", "t2", "t3")
missing <- setdiff(cols, names(Awide))
Awide[missing] <- NA
Awide <- Awide[cols]

R> Awide
# A tibble: 4 x 6
    idA group   idB    t1 t2       t3
  <dbl> <chr> <dbl> <dbl> <lgl> <dbl>
1     1 x         1    11 NA       NA
2     2 x         3    22 NA       NA
3     3 x         4    NA NA       33
4     4 x         5    25 NA       NA

You can then do a left_join and set all the NAs to zero so that they can be subtracted later.

AB <- left_join(B, Awide, by=c("idB", "group")) %>%
  mutate_at(c("t1.y", "t2.y", "t3.y"), ~replace(., is.na(.), 0))

R> AB
# A tibble: 5 x 9
    idB group  t1.x  t2.x  t3.x   idA  t1.y  t2.y  t3.y
  <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 x        11     0     0     1    11     0     0
2     2 x         0    11     0    NA     0     0     0
3     3 x        22     0     0     2    22     0     0
4     4 x         0     0    33     3     0     0    33
5     5 x        50    50    50     4    25     0     0
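As a side note, `mutate_at()` is superseded in dplyr 1.0.0 and later in favour of `across()`. The following is a minimal sketch of the same join step in that style, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; adding `t2` by hand with `NA_real_` is a shortcut for this example data, not part of the original answer.

```r
library(tibble)
library(dplyr)
library(tidyr)

A <- tribble(
  ~idA, ~group, ~column, ~value, ~idB,
  1, "x", "t1", 11, 1,
  2, "x", "t1", 22, 3,
  3, "x", "t3", 33, 4,
  4, "x", "t1", 25, 5)

B <- tribble(
  ~idB, ~group, ~t1, ~t2, ~t3,
  1, "x", 11, 0, 0,
  2, "x", 0, 11, 0,
  3, "x", 22, 0, 0,
  4, "x", 0, 0, 33,
  5, "x", 50, 50, 50)

Awide <- A %>%
  pivot_wider(names_from = column, values_from = value) %>%
  mutate(t2 = NA_real_) %>%                 # assumption: t2 never occurs in A
  select(idA, group, idB, t1, t2, t3)

# Select the A-side columns by their ".y" suffix instead of naming each one
AB <- left_join(B, Awide, by = c("idB", "group")) %>%
  mutate(across(ends_with(".y"), ~ replace(.x, is.na(.x), 0)))
```

Selecting by suffix means the call does not need to change if more `t` columns are added later.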

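Then subtract the columns matching the pattern t*.y from the columns matching t*.x (you can use an alternative approach here if you prefer):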

tdiff <- AB[,grepl("^t.*\\.x$", names(AB))] - AB[,grepl("^t.*\\.y$", names(AB))]

R> tdiff
  t1.x t2.x t3.x
1    0    0    0
2    0   11    0
3    0    0    0
4    0    0    0
5   25   50   50

Then bind these differences to the first two columns of AB to get the final result:

cbind(AB[,1:2,drop=FALSE], tdiff)

  idB group t1.x t2.x t3.x
1   1     x    0    0    0
2   2     x    0   11    0
3   3     x    0    0    0
4   4     x    0    0    0
5   5     x   25   50   50
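An alternative that avoids the `.x`/`.y` suffixes is to reshape B to long form instead of reshaping A to wide form. This is only a sketch, not the answer's method, and it assumes tidyr >= 1.0.0 and dplyr >= 1.0.0. The names `b_value` and `Along` are made up for this example. Summing A per (idB, group, column) first is an added assumption about how repeated transactions should be handled.

```r
library(tibble)
library(dplyr)
library(tidyr)

A <- tribble(
  ~idA, ~group, ~column, ~value, ~idB,
  1, "x", "t1", 11, 1,
  2, "x", "t1", 22, 3,
  3, "x", "t3", 33, 4,
  4, "x", "t1", 25, 5)

B <- tribble(
  ~idB, ~group, ~t1, ~t2, ~t3,
  1, "x", 11, 0, 0,
  2, "x", 0, 11, 0,
  3, "x", 22, 0, 0,
  4, "x", 0, 0, 33,
  5, "x", 50, 50, 50)

# One row per (idB, group, column), so the join cannot duplicate rows of B
Along <- A %>%
  group_by(idB, group, column) %>%
  summarise(value = sum(value), .groups = "drop")

Bout <- B %>%
  pivot_longer(c(t1, t2, t3), names_to = "column", values_to = "b_value") %>%
  left_join(Along, by = c("idB", "group", "column")) %>%
  mutate(b_value = b_value - coalesce(value, 0)) %>%  # no match: subtract 0
  select(-value) %>%
  pivot_wider(names_from = column, values_from = b_value)

Bout
```

Because the column name becomes an ordinary join key, there is no need to add the missing `t2` column to A beforehand.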

Answer 1 (score: 0)

Here is the loop I came up with:

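Bout <- B
for (i in seq_len(nrow(A))) {
  # find the row of B by idB rather than assuming idB equals the row number
  r <- match(A$idB[i], Bout$idB)
  col <- A$column[i]
  Bout[[col]][r] <- Bout[[col]][r] - A$value[i]
}
Bout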
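The same update can also be written without a loop by using base R matrix indexing: a two-column matrix of (row, column) positions picks out exactly the cells to change. This is a sketch under the assumption that each (idB, column) pair appears at most once in A; if a pair is repeated, only one of the subtractions is applied, unlike the loop. The names `tcols`, `M`, and `idx` are made up for this example.

```r
library(tibble)

A <- tribble(
  ~idA, ~group, ~column, ~value, ~idB,
  1, "x", "t1", 11, 1,
  2, "x", "t1", 22, 3,
  3, "x", "t3", 33, 4,
  4, "x", "t1", 25, 5)

B <- tribble(
  ~idB, ~group, ~t1, ~t2, ~t3,
  1, "x", 11, 0, 0,
  2, "x", 0, 11, 0,
  3, "x", 22, 0, 0,
  4, "x", 0, 0, 33,
  5, "x", 50, 50, 50)

tcols <- c("t1", "t2", "t3")
M <- as.matrix(B[tcols])                  # numeric matrix of the value columns

# Row position from idB, column position from the column name
idx <- cbind(match(A$idB, B$idB), match(A$column, tcols))
M[idx] <- M[idx] - A$value

Bout <- B
Bout[tcols] <- as.data.frame(M)           # write the updated values back
Bout
```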