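I want to merge 15 columns that are made up of the same 3 columns repeated 5 times (so there are 5 copies of each). My data looks like this (for simplicity, this example has only 3 copies):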
date        sku1  prod1  tot1  sku2  prod2  tot2  sku3  prod3  tot3
01/02/2019  100   a      100
01/02/2019  100   a      200   101   b      50
02/02/2019  101   b      100
02/02/2019  101   b      50    102   c      100   100   a      50
02/02/2019  102   c      50
Does anyone know how to do this? Many thanks.

Answer 0: (score: 0)
One option is melt from data.table, which can take multiple measure patterns:
library(data.table)

# Melt the prod* and tot* columns in parallel, then sum total per date and product.
# sku1 is not melted, so sku here is the first sku1 value in each group.
melt(setDT(df1), measure = patterns("^prod", "^tot"), na.rm = TRUE,
     value.name = c("all_prod", "total"))[
  , c(list(sku = first(sku1)), lapply(.SD, sum, na.rm = TRUE)),
  by = .(date, all_prod), .SDcols = "total"][order(date)]
#          date all_prod sku total
#1: 01/02/2019        a 100   300
#2: 01/02/2019        b 100    50
#3: 02/02/2019        b 101   150
#4: 02/02/2019        c 102   150
#5: 02/02/2019        a 101    50
Data

df1 <- structure(list(
  date = structure(c(1L, 1L, 2L, 2L, 2L),
                   .Label = c("01/02/2019", "02/02/2019"), class = "factor"),
  sku1 = c(100, 100, 101, 101, 102),
  prod1 = structure(c(1L, 1L, 2L, 2L, 3L),
                    .Label = c("a", "b", "c"), class = "factor"),
  tot1 = c(100, 200, 100, 50, 50),
  sku2 = c(NA, 101, NA, 102, NA),
  prod2 = structure(c(NA, 1L, NA, 2L, NA),
                    .Label = c("b", "c"), class = "factor"),
  tot2 = c(NA, 50, NA, 100, NA),
  sku3 = c(NA, NA, NA, 100, NA),
  prod3 = structure(c(NA, NA, NA, 1L, NA), .Label = "a", class = "factor"),
  tot3 = c(NA, NA, NA, 50, NA)),
  row.names = c(NA, -5L), class = "data.frame")
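The call above reads sku from sku1 only, so a product that appears in prod2 or prod3 is reported with whatever sku1 holds on that row (b on 01/02/2019 shows 100). If the sku is meant to follow its product, one possible variation is to melt the sku columns as well. This is only a sketch under assumptions: it rebuilds the sample data with plain character columns instead of factors, and the names long and res are made up for illustration.

```r
library(data.table)

# Sample data rebuilt with character columns (an assumption; the original uses factors)
df1 <- data.table(
  date  = c("01/02/2019", "01/02/2019", "02/02/2019", "02/02/2019", "02/02/2019"),
  sku1  = c(100, 100, 101, 101, 102),
  prod1 = c("a", "a", "b", "b", "c"),
  tot1  = c(100, 200, 100, 50, 50),
  sku2  = c(NA, 101, NA, 102, NA),
  prod2 = c(NA, "b", NA, "c", NA),
  tot2  = c(NA, 50, NA, 100, NA),
  sku3  = c(NA, NA, NA, 100, NA),
  prod3 = c(NA, NA, NA, "a", NA),
  tot3  = c(NA, NA, NA, 50, NA)
)

# Melt all three column families in parallel so each sku stays with its product
long <- melt(df1,
             measure = patterns("^sku", "^prod", "^tot"),
             value.name = c("sku", "prod", "tot"),
             na.rm = TRUE)

# One row per date and product: keep the first sku, sum the totals
res <- long[, .(sku = sku[1L], tot = sum(tot)), by = .(date, prod)][order(date, prod)]
print(res)
```

With this variation, b on 01/02/2019 is paired with sku 101 instead of 100, while the totals are unchanged.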
Answer 1: (score: 0)
Using dplyr and tidyr, we can gather the data to long format, remove the digits from the column names, spread it back to wide format, group_by date and prod, and take the sum of the tot values in each group.
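library(dplyr)
library(tidyr)

# df is the data frame from the question
df %>%
  # long format: one row per non-missing cell
  gather(key, value, -date, na.rm = TRUE) %>%
  # strip the trailing copy number: sku1 -> sku, prod2 -> prod, ...
  mutate(key = sub("\\d+$", "", key)) %>%
  # number the entries within each key so spread() has a unique row id
  group_by(key) %>%
  mutate(row = row_number()) %>%
  spread(key, value) %>%
  # gather() turned everything into character; restore the numeric columns
  mutate_at(vars(sku, tot), as.numeric) %>%
  group_by(date, prod) %>%
  summarise(sku = sku[1L],
            tot = sum(tot))
#  date       prod    sku   tot
#  <fct>      <chr> <dbl> <dbl>
#1 01/02/2019 a       100   300
#2 01/02/2019 b       101    50
#3 02/02/2019 a       100    50
#4 02/02/2019 b       101   150
#5 02/02/2019 c       102   150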
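In newer versions of tidyr, gather and spread are superseded by pivot_longer and pivot_wider, so the same reshaping can probably be done in one step. The following is a sketch rather than a drop-in replacement: it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, rebuilds the sample data with character columns instead of factors, and uses copy and res as illustrative names only.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt with character columns (an assumption; the original uses factors)
df <- data.frame(
  date  = c("01/02/2019", "01/02/2019", "02/02/2019", "02/02/2019", "02/02/2019"),
  sku1  = c(100, 100, 101, 101, 102),
  prod1 = c("a", "a", "b", "b", "c"),
  tot1  = c(100, 200, 100, 50, 50),
  sku2  = c(NA, 101, NA, 102, NA),
  prod2 = c(NA, "b", NA, "c", NA),
  tot2  = c(NA, 50, NA, 100, NA),
  sku3  = c(NA, NA, NA, 100, NA),
  prod3 = c(NA, NA, NA, "a", NA),
  tot3  = c(NA, NA, NA, 50, NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  # ".value" keeps sku/prod/tot as separate columns; the digit goes to "copy"
  pivot_longer(-date,
               names_to = c(".value", "copy"),
               names_pattern = "^(sku|prod|tot)(\\d+)$",
               values_drop_na = TRUE) %>%
  group_by(date, prod) %>%
  summarise(sku = first(sku), tot = sum(tot), .groups = "drop")

print(res)
```

Because each column family keeps its own type, no as.numeric conversion is needed afterwards.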