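I was handed an epic Excel example. The wide table gives the capacity for each pair of product (in rows) and machine (in columns). The table looks similar to the one in the reproducible example below (note the use of data.table; data.frame / tidyverse solutions are welcome, but a data.table solution is preferred):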
a <- data.table(names = c("product 1", "product 2"), "9-10" = c(1, 5), "21-23" = c(3, 2))
> a
       names 9-10 21-23
1: product 1    1     3
2: product 2    5     2
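The problem is that "9-10" means machines 9 and 10 have the same capacity (1 and 5 for products 1 and 2, respectively). I am looking for a way to end up with a table that looks like b: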
> b
       names 9 10 21 23
1: product 1 1  1  3  3
2: product 2 5  5  2  2
I achieved it with the following code:
for (i in unlist(strsplit(names(a)[2:3], split = "-", fixed = TRUE))){
a[, print(i) := .SD, .SDcols = grep(paste0(i, "\\b"), names(a)[2:3], value = TRUE)]
}
a[, names(a)[2:3] := NULL]
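I would like to know what a cleaner approach would be.
Answer 0 (score: 4)
Using data.table, we can build a column index, subset with it, and then adjust the names.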
library(data.table)

# data
a <- data.table(names = c("product 1", "product 2"),
                "9-10" = c(1, 5),
                "21-23" = c(3, 2))
# names split
name_pos <- strsplit(names(a), split = "-")
# create index for subsetting based on name_pos
index <- rep(seq_along(name_pos), times = lengths(name_pos))
# index and adjust names
a_final <- a[, ..index]
# thanks to Frank for suggestion
setnames(a_final, unlist(name_pos))
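The index-and-rename steps above can be wrapped in a small function so they apply to any number of range columns. This is only a sketch: the name expand_range_cols and its sep argument are hypothetical, and it assumes that "-" appears only in column names that should be split.

```r
library(data.table)

# Hypothetical helper: repeat each column once per piece of its name
# (split on `sep`) and rename the copies to those pieces.
expand_range_cols <- function(dt, sep = "-") {
  pieces <- strsplit(names(dt), split = sep, fixed = TRUE)
  # column i is repeated as many times as its name has pieces
  idx <- rep(seq_along(pieces), times = lengths(pieces))
  out <- dt[, idx, with = FALSE]
  setnames(out, unlist(pieces))
  out[]
}

a <- data.table(names = c("product 1", "product 2"),
                "9-10" = c(1, 5),
                "21-23" = c(3, 2))
b <- expand_range_cols(a)
```

Because the subset creates a new table before renaming, a keeps its original column names.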
Answer 1 (score: 4)
Another possibility with data.table:
melt(a, id = 1)[, unlist(tstrsplit(variable,'-')), by = .(names, value)
][, dcast(.SD, names ~ V1)]
which gives:
       names 10 21 23 9
1: product 1  1  3  3 1
2: product 2  5  2  2 5
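Note that the machine columns come out in character sort order (10, 21, 23, 9) rather than numeric order. If numeric order matters, one option is to reorder afterwards with setcolorder. The sketch below splits the chain into two steps for readability; the column name machine and the objects long and wide are illustrative names, not part of the answer above.

```r
library(data.table)

a <- data.table(names = c("product 1", "product 2"),
                "9-10" = c(1, 5),
                "21-23" = c(3, 2))

# long format: one row per (product, machine)
long <- melt(a, id.vars = "names")[
  , .(machine = unlist(tstrsplit(variable, "-", fixed = TRUE))),
  by = .(names, value)]

# back to wide format; machine columns are sorted as character here
wide <- dcast(long, names ~ machine, value.var = "value")

# reorder the machine columns numerically (modifies wide by reference)
machine_cols <- setdiff(names(wide), "names")
setcolorder(wide, c("names", machine_cols[order(as.integer(machine_cols))]))
```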
Answer 2 (score: 1)
A solution using tidyr:
library(tidyr)
library(dplyr)
a %>% gather(variable, value, -names) %>%
separate(variable, c("col1","col2")) %>% mutate(value2 = value) %>%
spread(col1, value) %>% spread(col2, value2) %>%
group_by(names) %>%
summarise_all(sum,na.rm = TRUE) %>%
as.data.frame()
# names 21 9 10 23
# 1 product 1 3 1 1 3
# 2 product 2 2 5 5 2
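With a newer tidyr (assuming version 1.0.0 or later, where pivot_longer and pivot_wider are available), the same idea can probably be written more directly by splitting the column names into rows before widening again. The column names machine and capacity are illustrative only.

```r
library(tidyr)
library(dplyr)

a <- data.frame(names = c("product 1", "product 2"),
                "9-10" = c(1, 5),
                "21-23" = c(3, 2),
                check.names = FALSE)

b <- a %>%
  # one row per (product, range column)
  pivot_longer(-names, names_to = "machine", values_to = "capacity") %>%
  # "9-10" becomes two rows, "9" and "10", each with the same capacity
  separate_rows(machine, sep = "-") %>%
  # one column per machine again
  pivot_wider(names_from = machine, values_from = capacity)
```

This avoids the summarise step, since no NA-padded intermediate columns are created.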