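How to group column values by creating a list of column values in an R data frame

Time: 2018-01-04 06:41:24

Tags: r

How can I group column values by creating a list of column values in an R data frame?

My data frame: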

CustNumber    Queue    CustID           ProNo#
1             Start    1                ESC
2             Start    1                Check
1             Start    1,1,1            hjju623,hjju623
1             Start    1,2,1,1          First44,ESC
2             Start    1,etc,ex         rere43
3             Start    1, 5597595494    151ss5151, 4949we49

I am using the code below to create the list of column values, grouping by CustNumber and Queue.

val<- df %>%
  gather(key,Value, -c(Queue,CustNumber)) %>%
  group_by(Queue,CustNumber, key,Value) %>%
  summarise(Count = n())%>%
  nest(key,Value,Count,.key = "listofvalues")

It gives:

Queue    CustNumber    Key       listofvalues
Start    1             CustID    list(Value = c("1", "1,1,1", "1,2,1,1"), Count = c(1, 1, 1))
Start    1             ProNo#    list(Value = c("ESC", "First44,ESC", "hjju623,hjju623"), Count = c(1, 1, 1))
Start    2             CustID    list(Value = c("1", "1,etc,ex"), Count = c(1, 1))
Start    2             ProNo#    list(Value = c("Check", "rere43"), Count = c(1, 1))
Start    3             CustID    list(Value = "1, 5597595494", Count = 1)
Start    3             ProNo#    list(Value = "151ss5151, 4949we49", Count = 1)

But my expected data frame is:

Queue    CustNumber    Key       listofvalues
Start    1             CustID    list(Value = c("1", "2"), Count = c(7, 1))
Start    1             ProNo#    list(Value = c("ESC", "First44", "hjju623"), Count = c(2, 1, 2))
Start    2             CustID    list(Value = c("1", "etc", "ex"), Count = c(2, 1, 1))
Start    2             ProNo#    list(Value = c("Check", "rere43"), Count = c(1, 1))
Start    3             CustID    list(Value = c("1", "5597595494"), Count = c(1, 1))
Start    3             ProNo#    list(Value = c("151ss5151", "4949we49"), Count = c(1, 1))

Please help me get this done.

Input for the data frame:

df<-structure(list(CustNumber = c("1", "2", "1", 
"1", "2", "3"), Queue = c("Start", "Start", 
"Start", "Start", "Start", "Start"), CustID = c("1", "1", "1,1,1", 
"1,2,1,1", "1,etc,ex", "1, 5597595494"), `ProNo#` = c("ESC", "Check", "hjju623,hjju623", 
"First44,ESC", "rere43", "151ss5151, 4949we49")), .Names = c("CustNumber", 
"Queue", "CustID", "ProNo#"), row.names = c(NA, 6L), class = "data.frame")

2 Answers:

Answer 0 (score: 0)

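We need to split the string values. With separate_rows we can convert the data to 'long' format, and then, inside summarise, collect the unique values of 'Value' together with their frequencies from table.
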
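library(dplyr)
library(tidyr)
res <- df %>% 
         # reshape CustID and ProNo# into key/Value pairs
         gather(key, Value, -c(Queue, CustNumber)) %>% 
         # one row per comma-separated item; "\\s*" also drops the space after a comma
         separate_rows(Value, sep = ",\\s*") %>% 
         group_by(CustNumber, Queue, key) %>% 
         # per group: the distinct values and their counts, in order of first appearance
         summarise(Count = list(list(Value = unique(Value),
                            Count = table(factor(Value, levels = unique(Value))))))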

res$Count[[1]]
#$Value
#[1] "1" "2"

#$Count

#1 2 
#7 1 
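
The role of factor(Value, levels = unique(Value)) inside table may be easier to see on a single group in base R. The following is only a minimal sketch: the vector x holds the ProNo# strings of CustNumber 1 copied from the question, and the names x, parts and counts are chosen here for illustration.

```r
# ProNo# strings for CustNumber 1, copied from the question's data
x <- c("ESC", "hjju623,hjju623", "First44,ESC")

# Split every string on commas (plus optional whitespace) and flatten to one vector
parts <- unlist(strsplit(x, split = ",\\s*"))

# Fixing the factor levels to unique(parts) keeps the counts in order of
# first appearance, so they line up with unique(parts) element by element
counts <- table(factor(parts, levels = unique(parts)))

print(unique(parts))
print(counts)
```

Without the factor step, table would order its result by sorted value rather than by first appearance, so the two list elements would no longer be aligned.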

Answer 1 (score: 0)

This should give the desired output:

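library(tidyr)
library(dplyr)
df %>%
  gather(key, Value, -c(Queue, CustNumber)) %>% 
  # strsplit() is vectorised, so rowwise() is not needed; overwrite Value
  # (not a new lowercase `value` column) so the grouping below sees the split items
  mutate(Value = strsplit(Value, split = ",\\s*")) %>% 
  unnest(Value) %>% 
  group_by(Queue, CustNumber, key, Value) %>%
  summarise(Count = n()) %>% 
  # nest() call kept as in the question (older tidyr syntax)
  nest(key, Value, Count, .key = "listofvalues")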
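
For readers on tidyr 1.0 or later, the same idea can be sketched with pivot_longer, count and the named form of nest. This is not from either answer; it is a hedged variant that assumes tidyr >= 1.0 and dplyr >= 0.8, and it rebuilds the question's df with data.frame so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# The question's data, rebuilt here so the sketch is self-contained
df <- data.frame(
  CustNumber = c("1", "2", "1", "1", "2", "3"),
  Queue      = "Start",
  CustID     = c("1", "1", "1,1,1", "1,2,1,1", "1,etc,ex", "1, 5597595494"),
  `ProNo#`   = c("ESC", "Check", "hjju623,hjju623", "First44,ESC",
                 "rere43", "151ss5151, 4949we49"),
  check.names = FALSE, stringsAsFactors = FALSE
)

res <- df %>%
  # long format: one row per (Queue, CustNumber, key, Value)
  pivot_longer(-c(Queue, CustNumber), names_to = "key", values_to = "Value") %>%
  # one row per comma-separated item, dropping whitespace after the comma
  separate_rows(Value, sep = ",\\s*") %>%
  # count each distinct item within its group
  count(Queue, CustNumber, key, Value, name = "Count") %>%
  # pack Value and Count into one list-column per group
  nest(listofvalues = c(Value, Count))

print(res$listofvalues[[1]])
```

Here each element of listofvalues is a small tibble with the columns Value and Count, rather than the plain list shown in the expected output.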