Add values from one data.frame to another based on matching conditions, while preserving row order

Time: 2018-08-14 23:59:45

Tags: r tidyverse

tstep <- rep(c("a", "b", "c", "d", "e"), 5)
Variable <- c(rep(c("v"), 5), rep(c("w"), 5), rep(c("x"), 5), rep(c("y"), 5), rep(c("x"), 5))  
Value <- c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7)
Scenario <- c(rep(c("i"), 20), rep(c("j"), 5) ) 
df1 <- data.frame(tstep, Variable, Value, Scenario)

tstep <- c("a", "b", "c", "d", "e")
Variable <- rep(c("x"), 5) 
Value <- c(100, 34, 100,22, 100)
Scenario <- c(rep(c("i"), 5))
df2<- data.frame(tstep, Variable, Value, Scenario)

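I found similar posts, but there seem to be many possible approaches. I am looking for a fast one, because these are samples of .csv files that are ~0.5 GB each and contain many variables, and I may need to include more columns. I would prefer not to have to cut df1 apart and put it back together.

How would you add the $Value from df2 to the $Value in df1 for rows that match on the tstep, Variable and Scenario columns, while keeping the original row order of df1?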

 #df2 from above, that I want to add to df1 from above, for matching rows
 tstep Variable Value Scenario
  a        x     100        i
  b        x     34         i
  c        x     100        i
  d        x     22         i
  e        x     100        i

  #df1 from above               #desired df1:
  tstep Variable Value Scenario tstep Variable Value Scenario
  a        v     1         i    a        v     1         i
  b        v     2         i    b        v     2         i
  c        v     3         i    c        v     3         i
  d        v     4         i    d        v     4         i
  e        v     5         i    e        v     5         i
  a        w    10         i    a        w    10         i
  b        w    11         i    b        w    11         i
  c        w    12         i    c        w    12         i
  d        w    13         i    d        w    13         i
  e        w    14         i    e        w    14         i
  a        x    33         i    a        x   133         i
  b        x    22         i    b        x    56         i
  c        x    44         i    c        x   144         i
  d        x    57         i    d        x    79         i
  e        x     5         i    e        x   105         i
  a        y     3         i    a        y     3         i
  b        y     2         i    b        y     2         i
  c        y     1         i    c        y     1         i
  d        y     2         i    d        y     2         i
  e        y     3         i    e        y     3         i
  a        x    34         j    a        x    34         j
  b        x    24         j    b        x    24         j
  c        x    11         j    c        x    11         j
  d        x    11         j    d        x    11         j
  e        x     7         j    e        x     7         j

2 answers:

Answer 0 (score: 3)

Here is a short solution using the data.table package and an update join:

library(data.table)
#convert df1 and df2 into data.table
setDT(df1)
setDT(df2)

#this is an update join:
#'join' df1 with df2 on tstep, Variable and Scenario,
#then 'update' (`:=`) Value in df1 to its own Value + df2's Value (i.Value) for the rows that match
df1[df2, Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
df1

Output:

    tstep Variable Value Scenario
 1:     a        v     1        i
 2:     b        v     2        i
 3:     c        v     3        i
 4:     d        v     4        i
 5:     e        v     5        i
 6:     a        w    10        i
 7:     b        w    11        i
 8:     c        w    12        i
 9:     d        w    13        i
10:     e        w    14        i
11:     a        x   133        i
12:     b        x    56        i
13:     c        x   144        i
14:     d        x    79        i
15:     e        x   105        i
16:     a        y     3        i
17:     b        y     2        i
18:     c        y     1        i
19:     d        y     2        i
20:     e        y     3        i
21:     a        x    34        j
22:     b        x    24        j
23:     c        x    11        j
24:     d        x    11        j
25:     e        x     7        j
    tstep Variable Value Scenario
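
The claim that the update join leaves the row order of df1 untouched can be checked directly. The following is only a minimal sketch: it rebuilds the question's sample data as data.tables and compares df1 before and after the join. The names `before` and `changed` are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

# Same sample data as in the question, built directly as data.tables
df1 <- data.table(
  tstep    = rep(c("a", "b", "c", "d", "e"), 5),
  Variable = rep(c("v", "w", "x", "y", "x"), each = 5),
  Value    = c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7),
  Scenario = rep(c("i", "j"), times = c(20, 5))
)
df2 <- data.table(
  tstep    = c("a", "b", "c", "d", "e"),
  Variable = "x",
  Value    = c(100, 34, 100, 22, 100),
  Scenario = "i"
)

# `:=` modifies df1 in place, so keep a copy for comparison (illustrative name)
before <- copy(df1)

# The update join from the answer
df1[df2, Value := Value + i.Value, on = .(tstep, Variable, Scenario)]

# The key columns are in exactly the same order as before
stopifnot(
  identical(df1$tstep, before$tstep),
  identical(df1$Variable, before$Variable),
  identical(df1$Scenario, before$Scenario)
)

# Only the rows with Variable "x" and Scenario "i" were changed
changed <- which(df1$Value != before$Value)
print(changed)  # rows 11 to 15
```

Rows of df1 that have no match in df2 are simply left alone, so no NA handling is needed with this approach.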

Some introductory material for data.table: https://github.com/Rdatatable/data.table/wiki/Getting-started

To address the OP's comment about applying this to multiple csv files:

library(data.table)
rbindlist(
    lapply(c("csv1.csv", "csv14.csv"), function(nm) {
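        # read one csv into a data.table
        x <- fread(nm)
        # update join of x against its own Variable == "y" rows, on the three key columns
        x[x[Variable=="y"], Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
        # return the updated table so rbindlist() can stack the results
        x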
    }),
    use.names=TRUE)
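
For reference, the same per-file pattern can be tried end to end on throwaway data. This is a sketch under stated assumptions, not the OP's actual workflow: the file names `run1.csv` and `run2.csv`, the tiny tables and the table `add` are all made up for illustration, and here a separate table is added to every file rather than a subset of the file itself.

```r
library(data.table)

# Two small temporary files standing in for the real csvs (hypothetical names)
files <- file.path(tempdir(), c("run1.csv", "run2.csv"))
base_dt <- data.table(
  tstep    = c("a", "b", "a", "b"),
  Variable = c("x", "x", "y", "y"),
  Value    = c(1, 2, 3, 4),
  Scenario = "i"
)
fwrite(base_dt, files[1])
fwrite(base_dt[, .(tstep, Variable, Value = Value * 10, Scenario)], files[2])

# Values to add to the matching rows of every file (illustrative)
add <- data.table(
  tstep    = c("a", "b"),
  Variable = "x",
  Value    = c(100, 34),
  Scenario = "i"
)

res <- rbindlist(
  lapply(files, function(nm) {
    x <- fread(nm)
    # fread() reads whole numbers as integer; convert so the sum is stored as double
    x[, Value := as.numeric(Value)]
    # update join: only rows matching `add` on the three key columns are changed
    x[add, Value := Value + i.Value, on = .(tstep, Variable, Scenario)]
    x
  }),
  use.names = TRUE
)
print(res)
```

Because each file is updated in place before being stacked, the rows of every file keep their original order in the combined result.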

Answer 1 (score: 0)

Not the most efficient solution, but a possible alternative:

library(dplyr)    

df1 %>% 
  left_join(df2, by = c("tstep", "Variable", "Scenario")) %>%
  mutate(Value.x = if_else(is.na(Value.y), Value.x, Value.x + Value.y)) %>%
  select(1, 2, Value = 3, 4)
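
A variant of the same idea that refers to columns by name instead of by position might look like the sketch below. It is not the answerer's code: the helper column `add` and the object `result` are names introduced here for illustration. `left_join()` keeps the rows of its first argument in their original order, which is what the question asks for, and `coalesce()` turns the NA of unmatched rows into 0.

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE keeps the key columns as character
df1 <- data.frame(
  tstep    = rep(c("a", "b", "c", "d", "e"), 5),
  Variable = rep(c("v", "w", "x", "y", "x"), each = 5),
  Value    = c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7),
  Scenario = rep(c("i", "j"), times = c(20, 5)),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  tstep    = c("a", "b", "c", "d", "e"),
  Variable = "x",
  Value    = c(100, 34, 100, 22, 100),
  Scenario = "i",
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # rename df2's Value first so the join does not create Value.x / Value.y
  left_join(rename(df2, add = Value), by = c("tstep", "Variable", "Scenario")) %>%
  # rows without a match get add = NA; treat that as adding 0
  mutate(Value = Value + coalesce(add, 0)) %>%
  select(-add)

result[11:15, ]
```

Note that this assumes each combination of tstep, Variable and Scenario appears at most once in df2. With duplicate keys in df2, the join would duplicate the corresponding rows of df1.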