Question

I am building on my earlier SO question, "Pipe output of one data.frame to another using dplyr".

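I want to create six correlation matrices so that I can analyze how the correlation of spend and quantity sold has evolved over the last three years. Essentially, I am looking for a 2 X [3X3] type of list. So far, I can create a 3X3 list using purrr::map(), calling it separately for Product_Type and for Quantity, but I have not managed to do it in one vectorized call. As you will see below, my code contains a lot of redundancy.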

Here is my data:

dput(DFile_Gather)
structure(list(Order.ID = c(456, 567, 345, 567, 2345, 8910, 8910, 
789, 678, 456, 345, 8910, 234, 1234, 456), Calendar.Year = c(2015, 
2015, 2016, 2015, 2017, 2015, 2015, 2016, 2015, 2015, 2016, 2015, 
2016, 2016, 2015), Product_Type = c("Insurance", "Insurance", 
"Tire", "Tire", "Rental", "Insurance", "Servicing", "Truck", 
"Tire", "Servicing", "Truck", "Rental", "Car", "Servicing", "Tire"
), Mexican_Pesos = c(35797.32, 1916.25, 19898.62, 0, 22548.314011, 
686.88, 0, 0, 0, 0, 0, 0, 0, 0, 203276.65683), Quantity = c(0.845580721440663, 
0.246177053792905, 2.10266268677851, 1.89588258358317, 0.00223077008050406, 
0.454640961140588, 1.92032156606277, 0.475872861771994, 0.587966920885798, 
0.721024745664671, 0.696609684682582, 0.0441522564791413, 0.872232778060772, 
0.343347997825813, 0.716224049425646)), .Names = c("Order.ID", 
"Calendar.Year", "Product_Type", "Mexican_Pesos", "Quantity"), row.names = c(54L, 
55L, 13L, 15L, 50L, 58L, 28L, 37L, 16L, 24L, 33L, 48L, 2L, 29L, 
14L), class = "data.frame")

Here is my code for the first iteration, i.e. calculating the correlation matrices for Product_Type:

DFile_Spread_PType <- spread(DFile_Gather[-length(DFile_Gather)],key = Product_Type, value = Mexican_Pesos)

DFile<-DFile_Spread_PType
CYear <- unique(DFile$Calendar.Year)
DFile_Corr_PType <- purrr::map(CYear, ~ dplyr::filter(DFile, Calendar.Year == .)) %>% 
  purrr::map(~ cor(.[,colnames(DFile)[3:length(colnames(DFile))]]) ) %>%
  structure(., names = CYear)

Finally, here is my code for the second iteration, the correlation matrices by Quantity:

DFile_Spread_Qty <- spread(subset( DFile_Gather, select = -Mexican_Pesos),key = Product_Type, value = Quantity)
DFile<-DFile_Spread_Qty
DFile_Corr_Qty <- purrr::map(CYear, ~ dplyr::filter(DFile, Calendar.Year == .)) %>% 
  purrr::map(~ cor(.[,colnames(DFile)[3:length(colnames(DFile))]]) ) %>%
  structure(., names = CYear)

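As you can see above, there is too much redundancy and the code looks clunky. I would sincerely appreciate it if someone could help me. I am specifically looking for two things:

1) Doing what I did above without any redundancy.

2) If possible, getting a 2X3X3 list, i.e. Quantity and Product_Type at the top level, each then referring to its own 3x3 correlation matrices.

I searched SO for similar topics, but I don't think there is any thread on a similar topic.

Thanks in advance.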

Answer 1

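The following has no redundancy and uses no packages. Convert Product_Type to a factor, then split by year, giving s, a list with one data frame per year. Now use a double Map, iterating over Values and, within each, over s, running tapply to convert to wide form and then cor.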

DG <- transform(DFile_Gather, Product_Type = factor(Product_Type))
s <- split(DG, DG$Calendar.Year)
Values <- c("Mexican_Pesos", "Quantity")
By <- c("Order.ID", "Product_Type")
res <- Map(function(v) Map(function(s) cor(tapply(s[, v], s[By], c)), s), Values)
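To see what the tapply step is doing, here is a minimal sketch on a made-up data frame (toy and its values are hypothetical, not the asker's data). With two grouping columns, tapply returns an Order.ID by Product_Type matrix, and cor then works on its columns:

```r
# Hypothetical toy data, standing in for one year's slice of DFile_Gather
toy <- data.frame(
  Order.ID     = c(1, 1, 2, 2, 3, 3),
  Product_Type = factor(c("Car", "Tire", "Car", "Tire", "Car", "Tire")),
  Quantity     = c(1, 2, 2, 4, 3, 7)
)

# Two grouping columns give a two-dimensional result:
# rows are Order.ID values, columns are Product_Type levels.
# Each cell holds a single value, so c() simply returns it.
wide <- tapply(toy[, "Quantity"], toy[c("Order.ID", "Product_Type")], c)
print(wide)

# Correlation between the Product_Type columns
print(cor(wide))
```

Because Product_Type is a factor, tapply keeps a column for every level, even levels that do not occur in a given year, so each year's matrix has the same columns. In the answer's result, the outer names come from Values and the inner names from the years, so a single matrix should be reachable as, for example, res$Quantity[["2015"]].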

Answer 2

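To get the correlations between the Product_Type columns for every combination of response variable and year, you can reshape the dataset into a convenient format, split it into a list by the combination of the two factors, and compute the correlations via map, selecting the columns with dplyr::select. This does not return a list of lists, though.

library(purrr)
library(tidyr)

DFile_Gather %>%
    gather(type, value, Mexican_Pesos:Quantity) %>%
    spread(Product_Type, value) %>%
    split(list(.$Calendar.Year, .$type)) %>%
    map(~cor(dplyr::select(.x, Car:Truck)))

The list of lists took an extra step, as I had to first split by the response variable and then split by Calendar.Year within each element of that list. I then used at_depth instead of map to calculate the correlations between the Product_Type columns for each list within the list. Working at the lowest level is indicated by the 2 in at_depth.

DFile_Gather %>%
    gather(type, value, Mexican_Pesos:Quantity) %>%
    spread(Product_Type, value) %>%
    split(.$type) %>%
    map(~split(.x, .x$Calendar.Year)) %>%
    at_depth(2, ~cor(dplyr::select(.x, Car:Truck)))

After gathering and spreading, the interim dataset has one row per combination of Order.ID, Calendar.Year, and type, with one column for each Product_Type (Car through Truck).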
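For readers on current package versions: gather and spread have been superseded in tidyr by pivot_longer and pivot_wider, and at_depth is, to my knowledge, no longer available in recent purrr releases, where map_depth fills the same role. The following is a rough sketch of the same nested approach under those assumptions. The toy data frame and its random values are hypothetical, used only so the example runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data with the same column layout as DFile_Gather
set.seed(1)
toy <- expand.grid(
  Order.ID      = 1:4,
  Calendar.Year = c(2015, 2016),
  Product_Type  = c("Car", "Tire"),
  stringsAsFactors = FALSE
)
toy$Mexican_Pesos <- runif(nrow(toy), 0, 1000)
toy$Quantity      <- runif(nrow(toy), 0, 2)

res <- toy %>%
  # Stack the two response variables into type/value pairs
  pivot_longer(c(Mexican_Pesos, Quantity),
               names_to = "type", values_to = "value") %>%
  # One column per Product_Type
  pivot_wider(names_from = Product_Type, values_from = value) %>%
  # Outer list: one element per response variable
  split(.$type) %>%
  # Inner list: one data frame per year
  map(~ split(.x, .x$Calendar.Year)) %>%
  # Apply cor to every data frame at depth 2
  map_depth(2, ~ cor(select(.x, Car:Tire)))

str(res, max.level = 2)
```

With the asker's data, the column range would be Car:Truck rather than Car:Tire, and a single matrix would be retrieved with something like res$Quantity[["2015"]].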

Output multiple variables using tidyr::map or dplyr piping
