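I'm trying to compare two data frames (df1, df2) that have identical structure (dimensions, column names, row names, etc.) and keep the maximum value between the two. I actually have hundreds of columns and rows, but here is some fake data: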
df1:
Date Fruit Num Color
2013-11-24 Banana 2 Yellow
2013-11-24 Orange 8 Orange
2013-11-24 Apple 7 Green
2013-11-24 Celery 10 Green
df2:
Date Fruit Num Color
2013-11-24 Banana 22 Yellow
2013-11-24 Orange 8 Orange
2013-11-24 Apple 7 Green
2013-11-24 Celery 1 Green
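There are many examples on SO of doing something similar, but in Python rather than R: Comparing two dataframes and getting the differences, Compare two dataframes to get comparison value in another dataframe, etc.
I tried a dplyr approach, but I can't figure out how to apply it correctly to all (hundreds of) columns.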
library(dplyr)
test <- rbind(df1, df2)
test2 <- test %>%
group_by(Date) %>%
summarise(max = max(.))
Given my fake data above, the desired output would be:
new.df:
Date Fruit Num Color
2013-11-24 Banana 22 Yellow
2013-11-24 Orange 8 Orange
2013-11-24 Apple 7 Green
2013-11-24 Celery 10 Green
Thanks for your help.
Answer 0 (score: 1)
Try this:
test %>%
group_by_if(.,is.factor) %>%
summarise_if(is.numeric, max)
# A tibble: 4 x 4
# Groups: Date, Fruit [?]
Date Fruit Color Num
<fct> <fct> <fct> <dbl>
1 2013-11-24 Apple Green 7
2 2013-11-24 Banana Yellow 22
3 2013-11-24 Celery Green 10
4 2013-11-24 Orange Orange 8
Answer 1 (score: 1)
One possibility is to group by all non-numeric columns and then take the maximum of each numeric column:
library(tidyverse)
rbind(df1, df2) %>%
group_by_at(vars(one_of(names(select_if(df2,negate(is.numeric)))))) %>%
summarise_if(is.numeric, max)
#> # A tibble: 4 x 4
#> # Groups: Date, Fruit [4]
#> Date Fruit Color Num
#> <fct> <fct> <fct> <dbl>
#> 1 2013-11-24 Apple Green 7
#> 2 2013-11-24 Banana Yellow 22
#> 3 2013-11-24 Celery Green 10
#> 4 2013-11-24 Orange Orange 8
Created on 2019-05-20 by the reprex package (v0.2.1)
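With dplyr 1.0.0 or later, the same idea can be written with across() and where(), since the scoped verbs (group_by_at, summarise_if) are superseded there. This is only a minimal sketch: it assumes the non-numeric columns together identify each row, and the sample data is retyped from the question with strings stored as character rather than factor.

```r
library(dplyr)

# Sample data retyped from the question (assumption: strings as character).
df1 <- data.frame(
  Date  = rep("2013-11-24", 4),
  Fruit = c("Banana", "Orange", "Apple", "Celery"),
  Num   = c(2, 8, 7, 10),
  Color = c("Yellow", "Orange", "Green", "Green"),
  stringsAsFactors = FALSE
)
df2 <- df1
df2$Num <- c(22, 8, 7, 1)

result <- bind_rows(df1, df2) %>%
  # Group by every column that is not numeric (Date, Fruit, Color here).
  group_by(across(where(~ !is.numeric(.x)))) %>%
  # Take the maximum of every remaining numeric column within each group.
  summarise(across(where(is.numeric), max), .groups = "drop")

print(result)
```

Note that the rows come back sorted by the grouping columns, not in the original order of df1.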
You could also try joining the two data frames and then keeping the maximum:
df1 %>% right_join(df2, by=c("Date","Fruit","Color")) %>%
mutate(Num = pmax(Num.x, Num.y)) %>% select(-Num.x, -Num.y)
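Writing out Num.x and Num.y by hand does not scale to hundreds of columns. Since the question says both data frames have identical structure, another option is to apply pmax() pairwise to every numeric column in base R. This is a sketch, not a tested solution for the real data: it assumes the rows of df1 and df2 are in the same order, and num_cols and new_df are names chosen here for illustration.

```r
# Sample data retyped from the question (assumption: strings as character).
df1 <- data.frame(
  Date  = rep("2013-11-24", 4),
  Fruit = c("Banana", "Orange", "Apple", "Celery"),
  Num   = c(2, 8, 7, 10),
  Color = c("Yellow", "Orange", "Green", "Green"),
  stringsAsFactors = FALSE
)
df2 <- df1
df2$Num <- c(22, 8, 7, 1)

# Logical vector flagging which columns are numeric.
num_cols <- vapply(df1, is.numeric, logical(1))

# Start from df1 and overwrite each numeric column with the
# element-wise maximum of the matching columns in df1 and df2.
new_df <- df1
new_df[num_cols] <- Map(pmax, df1[num_cols], df2[num_cols])

print(new_df)
```

Unlike the grouped versions, this keeps the original row order of df1, which matches the layout of the desired output in the question.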
Answer 2 (score: -1)
Or try