R - How to group by specific column values and dynamically create a new column of value lists

Time: 2017-10-05 02:34:27

Tags: r

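I have a data frame like the one below. I need to group by specific columns and, for each value column, build a new list of its distinct values and their counts.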

My data frame:

  Domain      Process      Name            value1          value2  

        ML          First       Peter             T1               45
        ML          First       Peter             FT               34
        ML          First       Peter             T1               34
        ML          First       Jhon              LL               11
        ML          First       Jhon              LL               11
        ML          Second      Peter             IO               22
        ML          Second      Peter             IO               33
        ML          Second      Peter             IO               33
        ML          four        Peter             IO               33 

My expected data frame:

Domain    Process      Name        column                listofvalues             

ML         First      Peter          value1               list(info1 = "T1", "Count"="2",list(info2 = "FT", "Count"="1"))
ML         First      Peter          value2               list(info1 = "45", "Count"="1",list(info2 ="34", "Count"="2"))
ML         First      Jhon           value1               list(info1 = "LL", "Count"="2") 
ML         First      Jhon           value2               list(info1 = "11", "Count"="2")            
ML         Second     Peter          value1               list(info1 = "IO", "Count"="3")
ML         Second     Peter          value2               list(info1 = "22", "Count"="1",list(info2 ="33", "Count"="2"))
ML         four       Peter          value1               list(info1 = "IO", "Count"="1")
ML  

Input data:

structure(list(Domain = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), .Label = "ML", class = "factor"), Process = structure(c(1L, 
1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L), .Label = c("First", "four", 
"Second"), class = "factor"), Name = structure(c(2L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Jhon", "Peter"), class = "factor"), 
    value1 = structure(c(4L, 1L, 4L, 3L, 3L, 2L, 2L, 2L, 2L), .Label = c("FT", 
    "IO", "LL", "T1"), class = "factor"), value2 = structure(c(5L, 
    4L, 4L, 1L, 1L, 2L, 3L, 3L, 3L), .Label = c("11", "22", "33", 
    "34", "45"), class = "factor")), .Names = c("Domain", "Process", 
"Name", "value1", "value2"), row.names = c(NA, -9L), class = "data.frame")

1 answer:

Answer 0: (score: 1)

You can use gather and nest from tidyr to accomplish your goal:

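library(tidyr)
library(dplyr)

# df is the data frame from the question (the dput() output above).
# Note: this uses the tidyr API current in 2017; in tidyr >= 1.0,
# gather() is superseded and the .key argument of nest() is deprecated.
df <- df %>%
  # Reshape to long form: one row per (Domain, Process, Name, key, value)
  gather(key, value, -c(Domain, Process, Name)) %>%
  group_by(Domain, Process, Name, key, value) %>%
  # Count how often each value occurs within its group
  summarise(count = n()) %>%
  # Pack the non-grouping columns (value, count) into a list-column
  nest(key, value, count, .key = "listofvalues")

df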

# # A tibble: 8 x 5
#     Domain Process  Name    key     listofvalues
#      <chr>   <chr> <chr>  <chr>           <list>
#   1     ML   First  Jhon value1 <tibble [1 x 2]>
#   2     ML   First  Jhon value2 <tibble [1 x 2]>
#   3     ML   First Peter value1 <tibble [2 x 2]>
#   4     ML   First Peter value2 <tibble [2 x 2]>
#   5     ML    four Peter value1 <tibble [1 x 2]>
#   6     ML    four Peter value2 <tibble [1 x 2]>
#   7     ML  Second Peter value1 <tibble [1 x 2]>
#   8     ML  Second Peter value2 <tibble [2 x 2]>

df$listofvalues[[3]]

# # A tibble: 2 x 2
#   value count
#   <chr> <int>
# 1    FT     1
# 2    T1     2
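
In tidyr 1.0 and later, gather() is superseded by pivot_longer() and nest() expects a named column specification. The following is a minimal sketch of the same pipeline under those newer APIs, not the original answer's code. It assumes dplyr >= 0.8 and tidyr >= 1.0, rebuilds the question's data with plain character columns instead of factors, and stores the result in a new name, `res`, chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# The question's data, written out with character columns (an assumption;
# the dput() output in the question uses factors).
df <- data.frame(
  Domain  = "ML",
  Process = c("First", "First", "First", "First", "First",
              "Second", "Second", "Second", "four"),
  Name    = c("Peter", "Peter", "Peter", "Jhon", "Jhon",
              "Peter", "Peter", "Peter", "Peter"),
  value1  = c("T1", "FT", "T1", "LL", "LL", "IO", "IO", "IO", "IO"),
  value2  = c("45", "34", "34", "11", "11", "22", "33", "33", "33"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Long form: one row per (Domain, Process, Name, key, value).
  pivot_longer(c(value1, value2), names_to = "key", values_to = "value") %>%
  # count() groups, tallies, and returns an ungrouped result.
  count(Domain, Process, Name, key, value, name = "count") %>%
  # Pack value and count into one list-column per remaining key combination.
  nest(listofvalues = c(value, count))

res
```

Because count() returns an ungrouped tibble, nest() here decides what to keep outside the list-column purely from the columns named in `listofvalues`, so the result does not depend on leftover grouping.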

If you are set on spreading the nested column, you can add

mutate(listofvalues = purrr::map(listofvalues, spread, value, count))
However, I would not recommend it unless it is really necessary, partly because your numeric values become column names.

df$listofvalues[[4]]

# # A tibble: 1 x 2
#    `34`  `45`
# * <int> <int>
# 1     2     1
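
If a plain R list per group is wanted, closer to the `list(info1 = ..., "Count" = ...)` shape sketched in the question, each nested value/count table can be converted row by row. This is only a sketch in base R: `as_info_list` is a hypothetical helper name, and the flat "one list entry per distinct value" layout is an assumption, since the question's expected nesting is ambiguous.

```r
# Hypothetical helper (not from the original answer): turn one nested
# value/count table into a list with one entry per distinct value.
as_info_list <- function(tbl) {
  unname(Map(
    function(v, n) list(info = v, Count = n),
    as.character(tbl$value),
    tbl$count
  ))
}

# Stand-in for df$listofvalues[[3]] as printed in the answer.
one_group <- data.frame(
  value = c("FT", "T1"),
  count = c(1L, 2L),
  stringsAsFactors = FALSE
)

info <- as_info_list(one_group)
str(info)
```

Applied to the whole result, the helper could be mapped over the list-column, for example with `mutate(listofvalues = purrr::map(listofvalues, as_info_list))`.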