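Edit: apologies for the more-than-minimal example. I'll rework the question with a more minimal one, though it looks like aosmith's answer has already solved it!
This is the next step after this question, in the same process. It has really gotten out of hand.
I have a dataset with a series of variables, each of which has low, mid, and high values. There are also several identifying variables, which I'm calling "scenario" and "month" just for this example. I'm doing a calculation that involves 3 different values, some of which have low, mid, or high values that differ for each scenario and each month.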
# generating a practice dataset
library(dplyr)
library(tidyr)
set.seed(123)
pracdf <- bind_cols(expand.grid(ID = letters[1:2],
                                month = 1:2,
                                scenario = c("a", "b")),
                    tibble(p.mid = runif(8, 100, 1000),
                           a = rep(runif(2), 4),
                           b = rep(runif(2), 4),
                           c = rep(runif(2), 4)))
pracdf <- pracdf %>% mutate(p.low = p.mid * 0.75,
                            p.high = p.mid * 1.25) %>%
  gather(p.low, p.mid, p.high, key = "ptype", value = "p")
# all of that is just to generate the practice dataset.
# 2 IDs * 2 months * 2 scenarios * 3 different values of p = 24 total rows in this dataset
# Do the calculation
pracdf2 <- pracdf %>%
  mutate(result = p * a * b * c)
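This fully "gathered" dataset has the results I want. Now let's do a spread-type operation to present it in a more readable way, where each combination of month, scenario, and p type gets its own column. An example column name would be "month1_scenario.a_p.low". For this dataset that comes to 2 months * 3 p types * 2 scenarios = 12 columns.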
# this fully "gathered" dataset is exactly what I want.
# Let's put it in a format that the supervisor for this project will be happy with
# ID, month, scenario, and p.type are all "key" variables
# spread() only allows one key variable at a time, so...
pracdf2.spread1 <- pracdf2 %>% spread(ptype, result, sep = ".")
# Produces NA's. Looks like it's messing up with the different values of p
pracdf2.spread2 <- pracdf2 %>% select(-p) %>% spread(ptype, result, sep = ".")
# that's better, now let's spread across scenarios
pracdf2.spread2.spread2low <- pracdf2.spread2 %>% select(-ptype.p.high, -ptype.p.mid) %>% spread(scenario, ptype.p.low, sep = ".")
pracdf2.spread2.spread2mid <- pracdf2.spread2 %>% select(-ptype.p.low, -ptype.p.high) %>% spread(scenario, ptype.p.mid, sep = ".")
pracdf2.spread2.spread2high <- pracdf2.spread2 %>% select(-ptype.p.mid, -ptype.p.low) %>% spread(scenario, ptype.p.high, sep = ".")
pracdf2.spread2.spread2 <- pracdf2.spread2.spread2low %>% left_join(pracdf2.spread2.spread2mid)
# Ok, that was rough and will clearly spiral out of control quickly
# what am I still doing with my life?
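I could use spread() on each key column in turn and then redo the spread for every subsequent value column, but that would take a long time and would probably be error-prone.
Is there a cleaner, tidier, more stylish way to do this?
Thanks!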
Answer 0 (score: 3)
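You can use unite from tidyr to combine the three columns into one before spreading. Then you can spread using the new column as the key and "result" as the value.
Before spreading I also removed columns "a" through "p", since they don't seem to be needed in the desired result.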
pracdf2 %>%
  unite("allgroups", month, scenario, ptype) %>%
  select(-(a:p)) %>%
  spread(allgroups, result)
# A tibble: 2 x 13
ID `1_a_p.high` `1_a_p.low` `1_a_p.mid` `1_b_p.high` `1_b_p.low` `1_b_p.mid` `2_a_p.high` `2_a_p.low`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 160 96.2 128 423 254 338 209 126
2 b 120 72.0 96.0 20.9 12.5 16.7 133 79.5
# ... with 4 more variables: `2_a_p.mid` <dbl>, `2_b_p.high` <dbl>, `2_b_p.low` <dbl>, `2_b_p.mid` <dbl>
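If a newer tidyr is available, the same reshaping can probably be done in one step with pivot_wider(), which accepts several key columns at once. The sketch below is not part of the original answer: it rebuilds the practice data from the question, and it assumes a tidyr version that supports the names_glue argument (1.1.0 or later, as far as I know) to produce names in the "month1_scenario.a_p.low" style the question asks for. The object name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the practice dataset from the question (same seed, same steps),
# using pivot_longer() in place of gather().
set.seed(123)
pracdf2 <- bind_cols(
  expand.grid(ID = letters[1:2], month = 1:2, scenario = c("a", "b")),
  tibble(p.mid = runif(8, 100, 1000),
         a = rep(runif(2), 4),
         b = rep(runif(2), 4),
         c = rep(runif(2), 4))
) %>%
  mutate(p.low = p.mid * 0.75, p.high = p.mid * 1.25) %>%
  pivot_longer(c(p.low, p.mid, p.high), names_to = "ptype", values_to = "p") %>%
  mutate(result = p * a * b * c)

# Spread over three key columns at once.
# id_cols = ID keeps only ID as the row identifier, so a, b, c and p are dropped.
# names_glue (assumed available) builds names like "month1_scenario.a_p.low".
wide <- pracdf2 %>%
  pivot_wider(
    id_cols = ID,
    names_from = c(month, scenario, ptype),
    values_from = result,
    names_glue = "month{month}_scenario.{scenario}_{ptype}"
  )

names(wide)
```

This should give one row per ID and one column per month, scenario, and p type combination, like the unite() and spread() version, although the column order may differ.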