tstep <- rep(c("a", "b", "c", "d", "e"), 5)
Variable <- c(rep(c("v"), 5), rep(c("w"), 5), rep(c("x"), 5), rep(c("y"), 5), rep(c("x"), 5))
Value <- c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7)
Scenario <- c(rep(c("i"), 20), rep(c("j"), 5) )
df1 <- data.frame(tstep, Variable, Value, Scenario)
tstep <- c("a", "b", "c", "d", "e")
Variable <- rep(c("x"), 5)
Value <- c(100, 34, 100,22, 100)
Scenario <- c(rep(c("i"), 5))
df2<- data.frame(tstep, Variable, Value, Scenario)
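I have found similar posts, but there seem to be many possible approaches. I am looking for a fast one, because these are samples from .csv files of roughly 0.5 GB each, containing many variables, and I may need to include more columns. I would prefer not to have to split df1 apart and put it back together.
How would you add Value from df2 to Value in df1 for the rows where the tstep, Variable and Scenario columns match, while keeping the original row order of df1?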
#df2 from above, that I want to add to df1 from above, for matching rows
tstep Variable Value Scenario
a     x        100   i
b     x        34    i
c     x        100   i
d     x        22    i
e     x        100   i
#df1 from above                  #desired df1:
tstep Variable Value Scenario    tstep Variable Value Scenario
a     v        1     i           a     v        1     i
b     v        2     i           b     v        2     i
c     v        3     i           c     v        3     i
d     v        4     i           d     v        4     i
e     v        5     i           e     v        5     i
a     w        10    i           a     w        10    i
b     w        11    i           b     w        11    i
c     w        12    i           c     w        12    i
d     w        13    i           d     w        13    i
e     w        14    i           e     w        14    i
a     x        33    i           a     x        133   i
b     x        22    i           b     x        56    i
c     x        44    i           c     x        144   i
d     x        57    i           d     x        79    i
e     x        5     i           e     x        105   i
a     y        3     i           a     y        3     i
b     y        2     i           b     y        2     i
c     y        1     i           c     y        1     i
d     y        2     i           d     y        2     i
e     y        3     i           e     y        3     i
a     x        34    j           a     x        34    j
b     x        24    j           b     x        24    j
c     x        11    j           c     x        11    j
d     x        11    j           d     x        11    j
e     x        7     j           e     x        7     j
Answer 0: (score: 3)
Here is a short solution using the data.table package and an update join:
library(data.table)
#convert df1 and df2 into data.table
setDT(df1)
setDT(df2)
#this is an update join.
#'join' df1 with df2 using tstep, Variable, Scenario.
#'update' (`:=`) Value in df1 to its own Value plus df2's Value (i.Value) wherever the join finds a match
df1[df2, Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
df1
Output:
tstep Variable Value Scenario
1: a v 1 i
2: b v 2 i
3: c v 3 i
4: d v 4 i
5: e v 5 i
6: a w 10 i
7: b w 11 i
8: c w 12 i
9: d w 13 i
10: e w 14 i
11: a x 133 i
12: b x 56 i
13: c x 144 i
14: d x 79 i
15: e x 105 i
16: a y 3 i
17: b y 2 i
18: c y 1 i
19: d y 2 i
20: e y 3 i
21: a x 34 j
22: b x 24 j
23: c x 11 j
24: d x 11 j
25: e x 7 j
tstep Variable Value Scenario
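To make the roles of `Value` and `i.Value` in the update join easier to see, here is a minimal sketch on toy data. The tables `dt1` and `dt2` and the column `id` are made up for illustration and are not part of the question. Inside `dt1[dt2, ...]`, plain `Value` refers to the column in `dt1`, while the `i.` prefix refers to the column of the same name in `dt2`.

```r
library(data.table)

# Hypothetical toy tables: dt1 is the table to update, dt2 holds the increments.
dt1 <- data.table(id = c("a", "b", "c"), Value = c(1, 2, 3))
dt2 <- data.table(id = c("c", "a"), Value = c(100, 10))

# Update join: for every row of dt2 that finds a match in dt1 on `id`,
# overwrite dt1's Value with dt1's Value plus dt2's Value (i.Value).
dt1[dt2, Value := Value + i.Value, on = .(id)]

# dt1 is modified by reference. Its rows stay in the order a, b, c:
# a becomes 11, b has no match and stays 2, c becomes 103.
dt1
```

Because `:=` modifies `dt1` in place, no copy of the large table is made and rows without a match are simply left alone, which is what keeps the original row order intact.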
Some introductory material on data.table:
https://github.com/Rdatatable/data.table/wiki/Getting-started
To address the OP's comment about applying this to multiple csv files:
library(data.table)
rbindlist(
lapply(c("csv1.csv", "csv14.csv"), function(nm) {
x <- fread(nm)
x[x[Variable=="y"], Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
x
}),
use.names=TRUE)
Answer 1: (score: 0)
Not the most efficient solution, but a possible alternative:
library(dplyr)
df1 %>%
left_join(df2, by = c("tstep", "Variable", "Scenario")) %>%
mutate(Value.x = if_else(is.na(Value.y), Value.x, Value.x + Value.y)) %>%
select(1, 2, Value = 3, 4)
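For comparison, the same matching idea can be sketched in base R without any packages, using `match()` on a composite key. This is only an illustration and not a method proposed in the answers above. It assumes that each combination of tstep, Variable and Scenario appears at most once in df2, and that the separator `"|"` never occurs inside those columns.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  tstep    = rep(c("a", "b", "c", "d", "e"), 5),
  Variable = rep(c("v", "w", "x", "y", "x"), each = 5),
  Value    = c(1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 33, 22, 44, 57, 5,
               3, 2, 1, 2, 3, 34, 24, 11, 11, 7),
  Scenario = rep(c("i", "j"), times = c(20, 5))
)
df2 <- data.frame(
  tstep    = c("a", "b", "c", "d", "e"),
  Variable = "x",
  Value    = c(100, 34, 100, 22, 100),
  Scenario = "i"
)

# One composite key per row, built from the three matching columns.
# Assumption: "|" does not appear in any of the key columns.
key1 <- paste(df1$tstep, df1$Variable, df1$Scenario, sep = "|")
key2 <- paste(df2$tstep, df2$Variable, df2$Scenario, sep = "|")

# For each row of df1, the position of its match in df2 (NA if none).
# Assumption: keys are unique in df2, since match() returns only the first hit.
idx <- match(key1, key2)
hit <- !is.na(idx)

# Add df2's Value to the matching rows only; all other rows and the
# row order of df1 are left untouched.
df1$Value[hit] <- df1$Value[hit] + df2$Value[idx[hit]]
```

After this runs, rows 11 to 15 of df1 (Variable x, Scenario i) hold 133, 56, 144, 79 and 105, matching the desired df1 in the question. Pasting keys together costs extra memory on very large tables, so the data.table update join is likely the better fit for the 0.5 GB files mentioned by the OP.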