Grouping data by summing some columns while keeping the remaining columns

Time: 2018-07-18 10:05:13

Tags: r dataframe grouping

I have a data frame like this:

exdataframe <- data.frame(c(rep("ma1",4),rep("ma2",3),rep("ma3",2),rep("ma4",1)),
                          c(rep("1",4),rep("2",3),rep("3",2),rep("1",1)),
                          c(rep("xxx",4),rep("yyyy",3),rep("zz",2),rep("xxx",1)),
                          c("2018-05-27","2018-06-24", "2018-07-01" ,"2018-07-08","2018-06-24", "2018-07-01" ,"2018-07-08","2018-05-27","2018-06-24", "2018-07-01"),
                          c(112,1,3,0,0,0,3,19,45,9),
                          c(1000,0,0,0,200,300,8,90.9,0,1))
colnames(exdataframe) <- c("ID","classid","classname","date","x","y")

I want to group this data frame by the column "ID", summing the x and y columns while keeping all the other columns. When I do this:

exdataframe_gr <- exdataframe %>% group_by(ID) %>% filter(x == sum(x),y == sum(y))

I get a data frame with only one row: the row for the single entry of ID ma4 in the original data frame. The output I want is:

ID   classid  classname  date                 x    y
ma1  1        xxx        "could be anything"  116  1000
ma2  2        yyyy       "could be anything"  3    508
ma3  3        zz         "could be anything"  64   90.9
ma4  1        xxx        "could be anything"  9    1

The date column can be dropped; I don't care about its value. My real data is much larger than this: 2000 rows and 45 columns.

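I searched here and elsewhere online but couldn't find a similar example. Any help is appreciated, since I haven't been able to find a solution.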
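Why the filter() attempt returns a single row: inside a grouped filter(), sum(x) evaluates to the group total, so filter(x == sum(x), y == sum(y)) keeps only rows whose own x and y already equal their group's totals. The base-R sketch below is an illustration added here, not part of the question; it reproduces that test with ave(), using the question's ID, x and y values as plain vectors. Only ma4, which has a single row, passes; the third ma2 row matches on x (3) but not on y (8 vs. 508).

```r
# The question's ID, x and y values, re-created as plain vectors
ID <- rep(c("ma1", "ma2", "ma3", "ma4"), times = c(4, 3, 2, 1))
x  <- c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9)
y  <- c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1)

# Per-row group totals: what sum(x) and sum(y) evaluate to inside a grouped filter()
x_total <- ave(x, ID, FUN = sum)
y_total <- ave(y, ID, FUN = sum)

# filter(x == sum(x), y == sum(y)) keeps a row only if both comparisons are TRUE
keep <- x == x_total & y == y_total
ID[keep]
```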

2 answers:

Answer 0 (score: 0)

Let me know if this works for you. Unfortunately it drops the date column, but since you said it "could be anything", I assume you don't need it.

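library(dplyr)

exdataframe %>% 
  group_by(ID, classid, classname) %>%   # classid and classname are constant within each ID, so grouping on them keeps them
  summarise(x = sum(x), y = sum(y))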

# A tibble: 4 x 5
# Groups:   ID, classid [?]
  ID    classid classname     x      y
  <fct> <fct>   <fct>     <dbl>  <dbl>
1 ma1   1       xxx         116 1000  
2 ma2   2       yyyy          3  508  
3 ma3   3       zz           64   90.9
4 ma4   1       xxx           9    1 
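Without dplyr, the same grouped sum can be sketched with base R's aggregate(). This is an alternative technique, not the answerer's code; the data frame is the question's, re-created without the date column and with character rather than factor columns. As in the answer, listing classid and classname in the grouping only carries them into the result because they are constant within each ID.

```r
# The question's data (date column omitted), strings kept as character
exdataframe <- data.frame(
  ID        = rep(c("ma1", "ma2", "ma3", "ma4"), times = c(4, 3, 2, 1)),
  classid   = rep(c("1", "2", "3", "1"), times = c(4, 3, 2, 1)),
  classname = rep(c("xxx", "yyyy", "zz", "xxx"), times = c(4, 3, 2, 1)),
  x         = c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  y         = c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1),
  stringsAsFactors = FALSE
)

# Sum x and y within each (ID, classid, classname) combination that occurs in the data
sums <- aggregate(cbind(x, y) ~ ID + classid + classname,
                  data = exdataframe, FUN = sum)

# aggregate() orders rows by the grouping columns; sort by ID for readability
sums <- sums[order(sums$ID), ]
sums
```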

A solution that keeps all columns (the date shown is the first one for each ID):

exdataframe_gr <- exdataframe %>% 
  group_by(ID) %>% 
  mutate(x = sum(x), y = sum(y)) %>%   # every row now holds its group's totals
  ungroup() %>%
  distinct(ID, .keep_all = TRUE)       # keep the first row of each ID, with all columns

# A tibble: 4 x 6
  ID    classid classname date           x      y
  <fct> <fct>   <fct>     <fct>      <dbl>  <dbl>
1 ma1   1       xxx       2018-05-27   116 1000  
2 ma2   2       yyyy      2018-06-24     3  508  
3 ma3   3       zz        2018-05-27    64   90.9
4 ma4   1       xxx       2018-07-01     9    1
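The question mentions 45 columns, where writing one sum() per column gets tedious. The base-R sketch below produces the same "first row per ID, numeric columns summed" result for any number of columns. It assumes every numeric column should be summed and every other column may keep its first value per ID; that is an assumption, not something the question states, and the code is an alternative to the answer rather than the answerer's method.

```r
# The question's data, strings kept as character
exdataframe <- data.frame(
  ID        = rep(c("ma1", "ma2", "ma3", "ma4"), times = c(4, 3, 2, 1)),
  classid   = rep(c("1", "2", "3", "1"), times = c(4, 3, 2, 1)),
  classname = rep(c("xxx", "yyyy", "zz", "xxx"), times = c(4, 3, 2, 1)),
  date      = c("2018-05-27", "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-05-27", "2018-06-24", "2018-07-01"),
  x         = c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  y         = c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1),
  stringsAsFactors = FALSE
)

# Keep the first row of every ID (like distinct(ID, .keep_all = TRUE))
result <- exdataframe[!duplicated(exdataframe$ID), ]

# Replace every numeric column with its per-ID total
num_cols <- names(exdataframe)[vapply(exdataframe, is.numeric, logical(1))]
for (col in num_cols) {
  totals <- tapply(exdataframe[[col]], exdataframe$ID, sum)   # named by ID
  result[[col]] <- as.numeric(totals[as.character(result$ID)])
}
rownames(result) <- NULL
result
```

In dplyr 1.0 or later, the mutate() step in the answer can likewise cover every numeric column with mutate(across(where(is.numeric), sum)).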

Answer 1 (score: 0)

library(tidyverse)

exdataframe %>%
  group_by(ID) %>%
  mutate_if(is.factor, as.character) %>%
  nest() %>%   # one row per ID; the remaining columns move into the list-column `data`
  mutate(classid   = map_chr(data, function(x) as.character(x[, 'classid'][1, ])),
         classname = map_chr(data, function(x) as.character(x[, 'classname'][1, ])),
         # x[, 'date'][1] is a one-column tibble, so paste() deparses it (hence the c(...) strings below)
         date      = map_chr(data, function(x) paste(x[, 'date'][1], collapse = " | ")),
         x         = map_dbl(data, function(x) sum(x[, 'x'])),
         y         = map_dbl(data, function(x) sum(x[, 'y']))) %>%
  select(-data)

# A tibble: 4 x 6
  ID    classid classname date                                            x      y
  <fct> <chr>   <chr>     <chr>                                       <dbl>  <dbl>
1 ma1   1       xxx       "c(\"2018-05-27\", \"2018-06-24\", \"2018-~ 116    1.00e3
2 ma2   2       yyyy      "c(\"2018-06-24\", \"2018-07-01\", \"2018-~   3.00 5.08e2
3 ma3   3       zz        "c(\"2018-05-27\", \"2018-06-24\")"          64.0  9.09e1
4 ma4   1       xxx       2018-07-01                                    9.00 1.00e0
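Answer 1 appears to intend joining each ID's dates into one " | "-separated string. The output above instead shows deparsed c(...) strings, because x[,'date'][1] is a one-column tibble rather than a character vector. A minimal base-R sketch of the intended collapse, using tapply() (an alternative technique, not the answerer's code), with the question's ID and date values as plain vectors:

```r
# The question's ID and date values
ID   <- rep(c("ma1", "ma2", "ma3", "ma4"), times = c(4, 3, 2, 1))
date <- c("2018-05-27", "2018-06-24", "2018-07-01", "2018-07-08",
          "2018-06-24", "2018-07-01", "2018-07-08",
          "2018-05-27", "2018-06-24", "2018-07-01")

# Join all dates of each ID into a single string; the result is named by ID
dates_by_id <- tapply(date, ID, paste, collapse = " | ")

out <- data.frame(ID = names(dates_by_id),
                  date = as.vector(dates_by_id),
                  stringsAsFactors = FALSE)
out
```

Inside the answer's map_chr() call, paste(x$date, collapse = " | ") should give the same strings, since x$date is the plain character vector of dates for that ID.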