我对R很陌生,有一个问题,您可能会觉得很简单。我有两个具有相同确切列名的数据框。一个数据框大约有58k行(每行是商品编号,每列是一个月-值是数量)。第二个数据帧是第一个数据帧的较小子集(大约有1000行)。第二个数据帧中的行将始终在第一个中具有值。我需要做的是从第一个较大的数据帧中减去每个月/文章的第二个数据帧数量。几乎就像是对两个值的vlookup。有任何想法吗?
更新:这是我认为在SQL中的外观:
SELECT I.Division,
ILS.Brand,
ILS.Cust #,
ILS.Article,
ILS.201811change - SLT.201811change AS '201811change',
ILS.201812change - SLT.201812change AS '201812change',
ILS.201901change - SLT.201901change AS '201901change',
ILS.201903change,
ILS.201904change,
ILS.201905change,
ILS.201906change,
ILS.201907change,
ILS.201808change,
ILS.201809change
FROM ILS LEFT OUTER JOIN SLT ON ILS.Article = SLT.Article
答案 0 :(得分:0)
您可以在left_join
上使用dplyr
函数,类似于SQL的LEFT JOIN
。在您的情况下,简化形式为ISL %>% left_join(SLS, by = "Article")
。请查看下面的完整代码:
# data.frame simulation
strs3 <- c("Brand", "Cust", "Article", "201808change", "201809change", "201903change", "201904change", "201905change",
"201906change", "201907change", "201811change", "201812change", "201901change")
n <- 1000
total <- cbind(
as.data.frame(matrix(sample(LETTERS, 3 * n, replace = TRUE), ncol = 3)),
matrix(rnorm(n * 10), ncol = 10)
)
names(total) <- c("Brand", "Cust", "Article", "201808change", "201809change", "201903change", "201904change", "201905change",
"201906change", "201907change", "201811change", "201812change", "201901change")
spl <- ceiling(n * 57 / 58)
ils <- total[1:spl, ]
u <- unique(ils$Article)
ul <- length(u)
slt <- total[(spl + 1): (spl + ul), ]
slt$Article <- u
# left join
z <- ils %>% left_join(slt, by = "Article") %>%
mutate(`201811change` = `201811change.x` - `201811change.y`) %>%
mutate(`201812change` = `201812change.x` - `201812change.y`) %>%
mutate(`201901change` = `201901change.x` - `201901change.y`) %>%
select(-ends_with("y")) %>% select(-one_of("201811change.x", "201812change.x", "201901change.x"))
str(z)
输出(结果数据帧的结构):
'data.frame': 983 obs. of 13 variables:
$ Brand.x : Factor w/ 26 levels "A","B","C","D",..: 16 23 19 20 19 26 7 21 22 9 ...
$ Cust.x : Factor w/ 26 levels "A","B","C","D",..: 21 15 25 3 24 2 1 26 3 23 ...
$ Article : Factor w/ 26 levels "A","B","C","D",..: 13 14 2 17 23 13 4 1 17 15 ...
$ 201808change.x: num -1.398 -0.357 -1.042 -0.653 -1.037 ...
$ 201809change.x: num 1.483 0.604 0.276 0.846 -1.245 ...
$ 201903change.x: num -0.733 -0.413 0.61 -1.037 1.048 ...
$ 201904change.x: num -0.794 -1.0688 0.577 0.3368 0.0472 ...
$ 201905change.x: num -0.427 -0.898 1.124 -0.435 -0.304 ...
$ 201906change.x: num 2.094 0.177 -0.892 -1.655 -1.091 ...
$ 201907change.x: num 0.228 0.546 0.141 -1.166 -0.687 ...
$ 201811change : num 1.5082 0.0148 -0.5335 -0.763 -1.7196 ...
$ 201812change : num 1.415 -2.128 -0.576 1.205 -0.631 ...
$ 201901change : num -0.883 -0.892 -2.032 -2.172 0.483 ...