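Compare two data frames and keep the maximum values

Date: 2019-05-20 21:37:10

Tags: r dataframe dplyr

I am trying to compare two data frames (df1, df2) that have an identical structure (same dimensions, column names, row names, etc.) and keep the larger of each pair of values. I actually have hundreds of columns and rows, but here is some made-up data: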

df1:
Date       Fruit  Num  Color 
2013-11-24 Banana 2 Yellow
2013-11-24 Orange  8 Orange
2013-11-24 Apple   7 Green
2013-11-24 Celery 10 Green

df2:
Date       Fruit  Num  Color 
2013-11-24 Banana 22 Yellow
2013-11-24 Orange  8 Orange
2013-11-24 Apple   7 Green
2013-11-24 Celery 1 Green
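
To run the examples in this thread, the two tables above could be rebuilt roughly as follows. This is only a sketch: the column types are an assumption (strings stored as factors, which is consistent with the <fct> columns printed in the answers, and Num stored as numeric).

```r
# Rebuild the sample data from the question.
# Assumption: character columns are factors and Num is numeric.
df1 <- data.frame(
  Date  = "2013-11-24",
  Fruit = c("Banana", "Orange", "Apple", "Celery"),
  Num   = c(2, 8, 7, 10),
  Color = c("Yellow", "Orange", "Green", "Green"),
  stringsAsFactors = TRUE
)

# df2 has the same structure; only the Num column differs.
df2 <- df1
df2$Num <- c(22, 8, 7, 1)
```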

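There are plenty of examples on SO of doing something similar, but in Python rather than R: "Comparing two dataframes and getting the differences" and "Compare two dataframes to get comparison value in another dataframe".

I tried a dplyr approach, but I can't work out how to do this correctly across all (hundreds of) columns.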

library(dplyr)
test <- rbind(df1, df2)
test2 <- test %>%
  group_by(Date) %>%
  summarise(max = max(.))

Given the made-up data above, the desired output would be:

new.df:
Date       Fruit  Num  Color 
2013-11-24 Banana 22 Yellow
2013-11-24 Orange  8 Orange
2013-11-24 Apple   7 Green
2013-11-24 Celery 10 Green

Thanks for your help.

3 answers:

Answer 0 (score: 1)

Try this:

# `test` is rbind(df1, df2) from the question
test %>%
  group_by_if(is.factor) %>%
  summarise_if(is.numeric, max)

# A tibble: 4 x 4
# Groups:   Date, Fruit [?]
  Date       Fruit  Color    Num
  <fct>      <fct>  <fct>  <dbl>
1 2013-11-24 Apple  Green      7
2 2013-11-24 Banana Yellow    22
3 2013-11-24 Celery Green     10
4 2013-11-24 Orange Orange     8
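
The scoped verbs group_by_if() and summarise_if() have been superseded by across() since dplyr 1.0.0. A rough equivalent under that newer interface might look like the sketch below. Grouping on every non-numeric column instead of on factors is a substitution made here, so that the code also works when strings are stored as character vectors. The sample data is rebuilt from the question with assumed column types, and `result` is just an illustrative name.

```r
library(dplyr)

# Sample data from the question (column types are an assumption).
df1 <- data.frame(
  Date  = "2013-11-24",
  Fruit = c("Banana", "Orange", "Apple", "Celery"),
  Num   = c(2, 8, 7, 10),
  Color = c("Yellow", "Orange", "Green", "Green"),
  stringsAsFactors = TRUE
)
df2 <- transform(df1, Num = c(22, 8, 7, 1))

# Stack both frames, group by every non-numeric column,
# then take the maximum of every numeric column within each group.
result <- bind_rows(df1, df2) %>%
  group_by(across(!where(is.numeric))) %>%
  summarise(across(where(is.numeric), max), .groups = "drop")
```

As with the original answer, this relies on the non-numeric columns together identifying each row; rows that share all of those values are collapsed into one.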

Answer 1 (score: 1)

One possibility is to group by all the non-numeric columns and then take the maximum of the numeric columns:

library(tidyverse)

rbind(df1, df2) %>%
    group_by_at(vars(one_of(names(select_if(df2,negate(is.numeric)))))) %>%
    summarise_if(is.numeric, max)

#> # A tibble: 4 x 4
#> # Groups:   Date, Fruit [4]
#>   Date       Fruit  Color    Num
#>   <fct>      <fct>  <fct>  <dbl>
#> 1 2013-11-24 Apple  Green      7
#> 2 2013-11-24 Banana Yellow    22
#> 3 2013-11-24 Celery Green     10
#> 4 2013-11-24 Orange Orange     8

Created on 2019-05-20 by the reprex package (v0.2.1)

You can also try joining the two data frames and then keeping the maximum:

df1 %>%
  right_join(df2, by = c("Date", "Fruit", "Color")) %>%
  mutate(Num = pmax(Num.x, Num.y)) %>%
  select(-Num.x, -Num.y)
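
The join above names the Num column explicitly, which gets unwieldy with hundreds of numeric columns. Since the question states that both data frames have the same dimensions, column names and row order, one option might be to apply pmax() column by column in base R. The sketch below assumes that row alignment holds; it is not safe if the rows are ordered differently. The sample data is rebuilt from the question with assumed column types, and `num_cols` is an illustrative name.

```r
# Sample data from the question (column types are an assumption).
df1 <- data.frame(
  Date  = "2013-11-24",
  Fruit = c("Banana", "Orange", "Apple", "Celery"),
  Num   = c(2, 8, 7, 10),
  Color = c("Yellow", "Orange", "Green", "Green"),
  stringsAsFactors = TRUE
)
df2 <- transform(df1, Num = c(22, 8, 7, 1))

# Guard the assumption that both frames share the same shape and names.
stopifnot(identical(dim(df1), dim(df2)),
          identical(names(df1), names(df2)))

# Find the numeric columns, then take the row-wise maximum of each pair.
num_cols <- vapply(df1, is.numeric, logical(1))
new.df <- df1
new.df[num_cols] <- Map(pmax, df1[num_cols], df2[num_cols])
```

Unlike the grouping approaches, this keeps the original row order of df1 and leaves the non-numeric columns untouched.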

Answer 2 (score: -1)

Or try

2 21.05.2019 10:10:00
4 21.05.2019 10:30:00
6 21.05.2019 10:50:00
1 21.05.2019 10:00:00
7 21.05.2019 11:00:00