How can I group column values in an R data frame by creating a list of the column values?
My data frame:
CustNumber Queue CustID ProNo#
1 Start 1 ESC
2 Start 1 Check
1 Start 1,1,1 hjju623,hjju623
1 Start 1,2,1,1 First44,ESC
2 Start 1,etc,ex rere43
3 Start 1, 5597595494 151ss5151, 4949we49
I am using the code below to create a list of column values, grouping by CustNumber and Queue.
val<- df %>%
gather(key,Value, -c(Queue,CustNumber)) %>%
group_by(Queue,CustNumber, key,Value) %>%
summarise(Count = n())%>%
nest(key,Value,Count,.key = "listofvalues")
It gives:
Queue CustNumber Key listofvalues
Start 1 CustID list(Value = c("1", "1,1,1", "1,2,1,1"), Count = c(1, 1, 1))
Start 1 ProNo# list(Value = c("ESC", "First44,ESC", "hjju623,hjju623"), Count = c(1, 1, 1))
Start 2 CustID list(Value = c("1", "1,etc,ex"), Count = c(1, 1))
Start 2 ProNo# list(Value = c("Check", "rere43"), Count = c(1, 1))
Start 3 CustID list(Value = "1, 5597595494", Count = 1)
Start 3 ProNo# list(Value = "151ss5151, 4949we49", Count = 1)
But my expected data frame is:
Queue CustNumber Key listofvalues
Start 1 CustID list(Value = c("1", "2"), Count = c(7,1))
Start 1 ProNo# list(Value = c("ESC", "First44", "hjju623"), Count = c(2, 1, 2))
Start 2 CustID list(Value = c("1", "etc","ex"), Count = c(2, 1,1))
Start 2 ProNo# list(Value = c("Check", "rere43"), Count = c(1, 1))
Start 3 CustID list(Value = c("1", "5597595494"), Count = c(1, 1))
Start 3 ProNo# list(Value = c("151ss5151", "4949we49"), Count = c(1, 1))
Please help me get this done.
Input for the data frame:
df<-structure(list(CustNumber = c("1", "2", "1",
"1", "2", "3"), Queue = c("Start", "Start",
"Start", "Start", "Start", "Start"), CustID = c("1", "1", "1,1,1",
"1,2,1,1", "1,etc,ex", "1, 5597595494"), `ProNo#` = c("ESC", "Check", "hjju623,hjju623",
"First44,ESC", "rere43", "151ss5151, 4949we49")), .Names = c("CustNumber",
"Queue", "CustID", "ProNo#"), row.names = c(NA, 6L), class = "data.frame")
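Answer 0 (score: 0)
We need to split the string values. Using separate_rows, we can convert the data to 'long' format with one value per row, and then, within summarise, get the unique 'Value' entries along with their counts from table.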
library(dplyr)
library(tidyr)
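res <- df %>%
  # reshape CustID and ProNo# into key/Value pairs
  gather(key, Value, -c(Queue, CustNumber)) %>%
  # one row per comma-separated piece; ",\\s*" also drops the space after a comma
  separate_rows(Value, sep = ",\\s*") %>%
  group_by(CustNumber, Queue, key) %>%
  # per group: the distinct values and how often each one occurs
  summarise(Count = list(list(Value = unique(Value),
                              Count = table(factor(Value, levels = unique(Value))))))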
res$Count[[1]]
#$Value
#[1] "1" "2"
#$Count
#1 2
#7 1
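To see what the split-and-tabulate step does for a single group, here is a minimal base R sketch using only the CustID values of CustNumber 1 from the input data. The names x, pieces and counts are illustrative and not part of the answer above.

```r
# CustID values for CustNumber 1, copied from the input data frame
x <- c("1", "1,1,1", "1,2,1,1")

# split every string on commas and flatten into one character vector;
# trimws() removes any space left after a comma
pieces <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))

# count each distinct value, keeping first-appearance order
counts <- table(factor(pieces, levels = unique(pieces)))
counts
# "1" occurs 7 times and "2" once, matching the expected output
```

The factor() call with levels = unique(pieces) is what keeps the counts in the same order as the unique values; a plain table(pieces) would sort them alphabetically instead.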
Answer 1 (score: 0)
This should give the desired output:
library(tidyr)
library(dplyr)
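df %>%
  gather(key, Value, -c(Queue, CustNumber)) %>%
  # strsplit() is vectorised, so rowwise() is not needed; overwrite Value
  # itself so that the grouping below uses the split pieces
  mutate(Value = strsplit(Value, split = ",\\s*")) %>%
  unnest(Value) %>%
  group_by(Queue, CustNumber, key, Value) %>%
  summarise(Count = n()) %>%
  nest(key, Value, Count, .key = "listofvalues")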
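In tidyr 1.0 and later, gather() is superseded by pivot_longer() and nest() takes the form nest(new_col = c(...)). The following is a sketch of the same idea in that style, assuming tidyr >= 1.0 and dplyr >= 0.8 are installed. The data frame is rebuilt from the question's input, and the result name res is only illustrative.

```r
library(dplyr)
library(tidyr)

# input data from the question
df <- data.frame(
  CustNumber = c("1", "2", "1", "1", "2", "3"),
  Queue = "Start",
  CustID = c("1", "1", "1,1,1", "1,2,1,1", "1,etc,ex", "1, 5597595494"),
  `ProNo#` = c("ESC", "Check", "hjju623,hjju623",
               "First44,ESC", "rere43", "151ss5151, 4949we49"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

res <- df %>%
  # long format: one row per (CustNumber, Queue, key, Value)
  pivot_longer(-c(Queue, CustNumber), names_to = "key", values_to = "Value") %>%
  # one row per comma-separated piece, dropping the space after a comma
  separate_rows(Value, sep = ",\\s*") %>%
  # number of occurrences of each piece within Queue/CustNumber/key
  count(Queue, CustNumber, key, Value, name = "Count") %>%
  # pack Value and Count into one list-column per group
  nest(listofvalues = c(Value, Count))

res$listofvalues[[1]]
```

Each element of listofvalues is a small tibble with a Value and a Count column, so res has one row per Queue, CustNumber and key combination.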