I have a data frame like this:
exdataframe <- data.frame(c(rep("ma1",4),rep("ma2",3),rep("ma3",2),rep("ma4",1)),
c(rep("1",4),rep("2",3),rep("3",2),rep("1",1)),
c(rep("xxx",4),rep("yyyy",3),rep("zz",2),rep("xxx",1)),
c("2018-05-27","2018-06-24", "2018-07-01" ,"2018-07-08","2018-06-24", "2018-07-01" ,"2018-07-08","2018-05-27","2018-06-24", "2018-07-01"),
c(112,1,3,0,0,0,3,19,45,9),
c(1000,0,0,0,200,300,8,90.9,0,1))
colnames(exdataframe) <- c("ID","classid","classname","date","x","y")
I want to group this data frame by the "ID" column, summing the x and y columns while keeping all the other columns. When I do this:
exdataframe_gr <- exdataframe %>% group_by(ID) %>% filter(x == sum(x),y == sum(y))
the resulting data frame has only one row, which corresponds to a single entry of the original data frame. The output I want is:
ID ClassID Classname Date X Y
ma1 1 xxx "could be anything" 116 1000
ma2 2 yyyy "could be anything" 3 508
ma3 3 zz "could be anything" 64 90.9
ma4 1 xxx "could be anything" 9 1
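The date column may be redundant; I don't care about its value. My real data is much larger than this: 2000 rows and 45 columns.
I have searched here and elsewhere on the internet but could not find a similar example. Any help is appreciated, as I have not been able to find a solution.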
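Answer 0: (score: 0)
Let me know if this works for you. Unfortunately it does not include the Date column, but since you marked it as "could be anything", I assume you don't need it.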
exdataframe %>%
group_by(ID, classid, classname) %>%
summarise(x = sum(x),y=sum(y))
# A tibble: 4 x 5
# Groups: ID, classid [?]
ID classid classname x y
<fct> <fct> <fct> <dbl> <dbl>
1 ma1 1 xxx 116 1000
2 ma2 2 yyyy 3 508
3 ma3 3 zz 64 90.9
4 ma4 1 xxx 9 1
A solution that keeps all columns:
exdataframe_gr <- exdataframe %>%
group_by(ID) %>%
mutate(x = sum(x),y=sum(y)) %>%
ungroup() %>%
distinct(ID, .keep_all = TRUE)
# A tibble: 4 x 6
ID classid classname date x y
<fct> <fct> <fct> <fct> <dbl> <dbl>
1 ma1 1 xxx 2018-05-27 116 1000
2 ma2 2 yyyy 2018-06-24 3 508
3 ma3 3 zz 2018-05-27 64 90.9
4 ma4 1 xxx 2018-07-01 9 1
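Since the question mentions 45 columns, listing each one by hand may get tedious. Here is a minimal sketch of one way to generalize this, assuming dplyr 1.0 or later (for across()) and assuming that every numeric column should be summed while every other column simply keeps its first value per group. The name wide_gr is only illustrative, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# Same example data as in the question, rebuilt so the sketch is self-contained.
exdataframe <- data.frame(
  ID        = c(rep("ma1", 4), rep("ma2", 3), rep("ma3", 2), "ma4"),
  classid   = c(rep("1", 4), rep("2", 3), rep("3", 2), "1"),
  classname = c(rep("xxx", 4), rep("yyyy", 3), rep("zz", 2), "xxx"),
  date      = c("2018-05-27", "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-05-27", "2018-06-24", "2018-07-01"),
  x         = c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  y         = c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1)
)

# Assumption: numeric columns are summed, all other columns keep the first
# value seen in each group. across(everything()) skips the grouping column.
wide_gr <- exdataframe %>%
  group_by(ID) %>%
  summarise(
    across(everything(), ~ if (is.numeric(.x)) sum(.x) else first(.x)),
    .groups = "drop"
  )

print(wide_gr)
```

This should give one row per ID with the original column order preserved. If some numeric columns are identifiers rather than quantities, they would need to be excluded from the summing rule.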
Answer 1: (score: 0)
library(tidyverse)
exdataframe %>% group_by(ID)%>% mutate_if(is.factor,as.character) %>% nest() %>%
mutate(classid = map_chr(data,function(x) as.character(x[,'classid'][1,])),
classname = map_chr(data,function(x) as.character(x[,'classname'][1,])),
date = map_chr(data, function(x) paste(x[,'date'][1], collapse = " | ")),
x = map_dbl(data,function(x)sum(x[,'x'])),
y = map_dbl(data,function(x)sum(x[,'y']))) %>%
select(-data)
# A tibble: 4 x 6
ID classid classname date x y
<fct> <chr> <chr> <chr> <dbl> <dbl>
1 ma1 1 xxx "c(\"2018-05-27\", \"2018-06-24\", \"2018-~ 116 1.00e3
2 ma2 2 yyyy "c(\"2018-06-24\", \"2018-07-01\", \"2018-~ 3.00 5.08e2
3 ma3 3 zz "c(\"2018-05-27\", \"2018-06-24\")" 64.0 9.09e1
4 ma4 1 xxx 2018-07-01 9.00 1.00e0
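In the output above, the date cells show a deparsed vector such as c("2018-05-27", ...) rather than dates joined by " | ". If the intent was to join all dates of a group into one string, the following is a hedged sketch of an alternative that avoids nest() and works on the column directly inside summarise(). The name dates_gr is only illustrative, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# Same example data as in the question, rebuilt so the sketch is self-contained.
exdataframe <- data.frame(
  ID        = c(rep("ma1", 4), rep("ma2", 3), rep("ma3", 2), "ma4"),
  classid   = c(rep("1", 4), rep("2", 3), rep("3", 2), "1"),
  classname = c(rep("xxx", 4), rep("yyyy", 3), rep("zz", 2), "xxx"),
  date      = c("2018-05-27", "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-06-24", "2018-07-01", "2018-07-08",
                "2018-05-27", "2018-06-24", "2018-07-01"),
  x         = c(112, 1, 3, 0, 0, 0, 3, 19, 45, 9),
  y         = c(1000, 0, 0, 0, 200, 300, 8, 90.9, 0, 1)
)

dates_gr <- exdataframe %>%
  group_by(ID) %>%
  summarise(
    classid   = first(classid),                 # keep the first value per group
    classname = first(classname),
    date      = paste(date, collapse = " | "),  # join every date of the group
    x         = sum(x),
    y         = sum(y)
  )

print(dates_gr)
```

Here paste() receives the date column as a plain vector, so collapse joins its elements instead of deparsing a one-column table.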