I have the data frame below. I need to group by specific columns and build a new list from the column values.
My data frame:
Domain Process Name value1 value2
ML First Peter T1 45
ML First Peter FT 34
ML First Peter T1 34
ML First Jhon LL 11
ML First Jhon LL 11
ML Second Peter IO 22
ML Second Peter IO 33
ML Second Peter IO 33
ML four Peter IO 33
My expected data frame:
Domain Process Name column listofvalues
ML First Peter value1 list(info1 = "T1", "Count"="2",list(info2 = "FT", "Count"="1"))
ML First Peter value2 list(info1 = "45", "Count"="1",list(info2 ="34", "Count"="2"))
ML First Jhon value1 list(info1 = "LL", "Count"="2")
ML First Jhon value2 list(info1 = "11", "Count"="2")
ML Second Peter value1 list(info1 = "IO", "Count"="3")
ML Second Peter value2 list(info1 = "22", "Count"="1",list(info2 ="33", "Count"="2"))
ML four Peter value1 list(info1 = "IO", "Count"="1")
ML
Input data:
structure(list(Domain = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "ML", class = "factor"), Process = structure(c(1L,
1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L), .Label = c("First", "four",
"Second"), class = "factor"), Name = structure(c(2L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Jhon", "Peter"), class = "factor"),
value1 = structure(c(4L, 1L, 4L, 3L, 3L, 2L, 2L, 2L, 2L), .Label = c("FT",
"IO", "LL", "T1"), class = "factor"), value2 = structure(c(5L,
4L, 4L, 1L, 1L, 2L, 3L, 3L, 3L), .Label = c("11", "22", "33",
"34", "45"), class = "factor")), .Names = c("Domain", "Process",
"Name", "value1", "value2"), row.names = c(NA, -9L), class = "data.frame")
Answer 0 (score: 1)
You can use gather and nest from tidyr to accomplish your goal:
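library(tidyr)
library(dplyr)
# df is the data frame from the question (the structure(...) output above)
df <- df %>%
  # reshape value1/value2 into long form: one row per (key, value) pair
  gather(key, value, -c(Domain, Process, Name)) %>%
  group_by(Domain, Process, Name, key, value) %>%
  # count how often each value occurs within its group
  summarise(count = n()) %>%
  # collect the value/count rows of each group into a list column
  nest(key, value, count, .key = "listofvalues")
df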
# # A tibble: 8 x 5
# Domain Process Name key listofvalues
# <chr> <chr> <chr> <chr> <list>
# 1 ML First Jhon value1 <tibble [1 x 2]>
# 2 ML First Jhon value2 <tibble [1 x 2]>
# 3 ML First Peter value1 <tibble [2 x 2]>
# 4 ML First Peter value2 <tibble [2 x 2]>
# 5 ML four Peter value1 <tibble [1 x 2]>
# 6 ML four Peter value2 <tibble [1 x 2]>
# 7 ML Second Peter value1 <tibble [1 x 2]>
# 8 ML Second Peter value2 <tibble [2 x 2]>
df$listofvalues[[3]]
# # A tibble: 2 x 2
# value count
# <chr> <int>
# 1 FT 1
# 2 T1 2
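The gather call and the .key argument of nest belong to the older tidyr interface. On tidyr 1.0.0 or later, a roughly equivalent pipeline can be sketched with pivot_longer and the newer nest syntax. This is a sketch under assumptions, not the answer's original code: it assumes tidyr 1.0.0 or later and dplyr 0.8.0 or later (for the name argument of count), and it rebuilds the question's data with plain character columns so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Question data, rebuilt with character columns (assumption: factors are not needed)
df <- data.frame(
  Domain  = "ML",
  Process = c("First", "First", "First", "First", "First",
              "Second", "Second", "Second", "four"),
  Name    = c("Peter", "Peter", "Peter", "Jhon", "Jhon",
              "Peter", "Peter", "Peter", "Peter"),
  value1  = c("T1", "FT", "T1", "LL", "LL", "IO", "IO", "IO", "IO"),
  value2  = c("45", "34", "34", "11", "11", "22", "33", "33", "33"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # long form: one row per (key, value) pair
  pivot_longer(c(value1, value2), names_to = "key", values_to = "value") %>%
  # one row per distinct value within each Domain/Process/Name/key group
  count(Domain, Process, Name, key, value, name = "count") %>%
  # pack value and count into a list column of small tibbles
  nest(listofvalues = c(value, count))

res
```

Each element of res$listofvalues is then a small tibble with a value and a count column, as in the output shown above.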
If you are determined to spread the nested column, you could add
mutate(listofvalues = purrr::map(listofvalues, spread, value, count))
However, I wouldn't recommend it unless it is truly necessary, partly because your numeric values become column names.
df$listofvalues[[4]]
# # A tibble: 1 x 2
# `34` `45`
# * <int> <int>
# 1 2 1
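If the goal is a plain list closer to the shape in the question, one option that keeps the values as data rather than as names is to convert each nested value/count table into a list of small lists. The sketch below uses base R only. The helper name to_info_list and the element names info and Count are hypothetical, chosen to resemble the question's expected output, and the flat structure (one sub-list per value) is an assumption rather than an exact reproduction of the nesting shown there.

```r
# Hypothetical helper: turn one value/count table into a list with
# one element per distinct value, each holding the value and its count
to_info_list <- function(tbl) {
  lapply(seq_len(nrow(tbl)), function(i) {
    list(
      info  = as.character(tbl$value[i]),
      Count = as.character(tbl$count[i])
    )
  })
}

# Stand-in for one nested element, e.g. df$listofvalues[[3]]
example <- data.frame(
  value = c("FT", "T1"),
  count = c(1L, 2L),
  stringsAsFactors = FALSE
)

out <- to_info_list(example)
str(out)
```

Applied to the whole column, this could be written as lapply(df$listofvalues, to_info_list), assuming each element has value and count columns.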