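How do I average rows based on duplicated rows?

Time: 2019-05-18 06:59:40

Tags: r duplicates average

I have 2000 rows, some of which are duplicates, and I want to average the rows that are duplicated.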

Site  Location Line    Band1
Cal   BC04     BC04A   130
Cal   BC04     BC04B   131
Cal   BC04     BC04C   129

I tried:

bind_cols(
  FC %>% distinct(site) %>% .[,-Band1],  # pull out columns we aren't aggregating
  FC[,c(1, Band1)] %>% group_by(Band1) %>%
    summarise_each(funs(mean)) %>% .[,-1]  # aggregate other columns
)

So ideally, I would like to get the following result:

Site  Location Line    Band1
Cal   BC04     BC04A   130

2 Answers:

Answer 0 (score: 2)

With dplyr, you can do the following:

df %>%
 group_by(Site) %>%
 filter(n() > 1) %>%
 mutate(Band1 = mean(Band1)) %>%
 slice(1) %>%
 ungroup()

  Site  Location Line  Band1
  <chr> <chr>    <chr> <dbl>
1 Cal   BC04     BC04A   130

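This keeps only the "Site" values that occur more than once, computes the mean of "Band1" within each "Site", and selects the first row for each "Site".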
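As a variation on the same idea, the mutate-then-slice step could be written with summarise instead. This is only a sketch: the data frame is rebuilt from the sample in the question, and grouping by both Site and Location is an assumption about what counts as a duplicate.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  Site     = c("Cal", "Cal", "Cal"),
  Location = c("BC04", "BC04", "BC04"),
  Line     = c("BC04A", "BC04B", "BC04C"),
  Band1    = c(130, 131, 129),
  stringsAsFactors = FALSE
)

# Collapse each Site/Location group to a single row:
# keep the first Line label and average Band1
res <- df %>%
  group_by(Site, Location) %>%
  summarise(Line = first(Line), Band1 = mean(Band1)) %>%
  ungroup()

print(res)
```

Unlike the filter(n() > 1) version, this keeps groups that contain only one row.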

Maybe you also want to bind the duplicated and non-duplicated rows together:

df %>%
 group_by(Site) %>%
 filter(n() > 1) %>%
 mutate(Band1 = mean(Band1)) %>%
 slice(1) %>%
 ungroup() %>%
 bind_rows(df %>%
            group_by(Site) %>%
            filter(n() == 1) %>%  # sites that occur exactly once
            ungroup())
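If sites that occur only once should simply pass through unchanged, the filtering and binding may not be needed at all, because the mean of a single value is that value. A minimal sketch, assuming a hypothetical second site "Nev" with one row (it is not in the original data and is added only for illustration):

```r
library(dplyr)

# Sample data from the question plus one hypothetical non-duplicated site
df <- data.frame(
  Site     = c("Cal", "Cal", "Cal", "Nev"),
  Location = c("BC04", "BC04", "BC04", "NV01"),
  Line     = c("BC04A", "BC04B", "BC04C", "NV01A"),
  Band1    = c(130, 131, 129, 140),
  stringsAsFactors = FALSE
)

# Average Band1 within every Site, then keep the first row of each group.
# Single-row groups keep their own value.
res <- df %>%
  group_by(Site) %>%
  mutate(Band1 = mean(Band1)) %>%
  slice(1) %>%
  ungroup()

print(res)
```

Here "Cal" collapses to one row with the group mean, and the hypothetical "Nev" row is returned as it was.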

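Or, if you only want to compute the mean from the duplicated values of each "Site" (that is, excluding the first occurrence):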

df %>%
 group_by(Site, dup = duplicated(Site)) %>%
 filter(dup) %>%
 mutate(Band1 = mean(Band1)) %>%
 slice(1) %>%
 ungroup() %>%
 select(-dup)

  Site  Location Line  Band1
  <chr> <chr>    <chr> <dbl>
1 Cal   BC04     BC04B   130
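To see why the result starts at BC04B, it may help to look at what duplicated() returns on its own. The sketch below uses plain vectors taken from the sample data; the variable names are only illustrative.

```r
site  <- c("Cal", "Cal", "Cal")
band1 <- c(130, 131, 129)

# duplicated() marks every occurrence after the first one
dup <- duplicated(site)
print(dup)

# So only the 2nd and 3rd Band1 values enter this mean
dup_mean <- mean(band1[dup])
print(dup_mean)

# To flag every member of a duplicated set, including the first,
# combine a forward and a backward pass
dup_all <- duplicated(site) | duplicated(site, fromLast = TRUE)
print(dup_all)
```

With this sample the mean of the last two values happens to equal the mean of all three, so the two approaches only differ visibly in the Line column.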

Answer 1 (score: 1)

I like data.table for this:

x <- data.frame(
  Site     = c("Cal", "Cal", "Cal"),
  Location = c("BC04", "BC04", "BC04"),
  Line     = c("BC04A", "BC04B", "BC04C"),
  Band1    = c(130, 131, 129))

library(data.table)
x <- data.table(x)

x[, .(Band1 = mean(Band1)), by = c("Site", "Location")]
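The call above returns Site, Location and Band1 but drops the Line column. If the first Line of each group should be kept, as in the desired output of the question, one possible sketch is the following (keeping the first Line is an assumption about what the asker wants):

```r
library(data.table)

# Sample data copied from the question
x <- data.table(
  Site     = c("Cal", "Cal", "Cal"),
  Location = c("BC04", "BC04", "BC04"),
  Line     = c("BC04A", "BC04B", "BC04C"),
  Band1    = c(130, 131, 129)
)

# One row per Site/Location: first Line label plus the mean of Band1
res <- x[, .(Line = Line[1], Band1 = mean(Band1)), by = .(Site, Location)]

print(res)
```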