I have 2,000 rows, some of which are duplicates, and I want to average the rows that are duplicated.
Site Location Line Band1
Cal BC04 BC04A 130
Cal BC04 BC04B 131
Cal BC04 BC04C 129
I tried:
bind_cols(
FC %>% distinct(site) %>% .[,-Band1], # pull out columns we aren't aggregating
FC[,c(1, Band1)] %>% group_by(Band1) %>%
summarise_each(funs(mean)) %>% .[,-1] # aggregate other columns
)
So ideally, I'd like to end up with the following result:
Site Location Line Band1
Cal BC04 BC04A 130
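To make the question reproducible, here is a minimal sketch that rebuilds the three sample rows as a data frame. The name df is an assumption chosen to match the code in the answers; the real data has about 2,000 rows.

```r
# Sample rows from the question, rebuilt as a data frame.
# The name `df` is assumed so that it matches the answers' code.
df <- data.frame(
  Site     = c("Cal", "Cal", "Cal"),
  Location = c("BC04", "BC04", "BC04"),
  Line     = c("BC04A", "BC04B", "BC04C"),
  Band1    = c(130, 131, 129),
  stringsAsFactors = FALSE
)
str(df)
```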
Answer 0 (score: 2)
Using dplyr, you could do the following:
library(dplyr)

df %>%
  group_by(Site) %>%
  filter(n() > 1) %>%           # keep only Sites that occur more than once
  mutate(Band1 = mean(Band1)) %>%
  slice(1) %>%                  # first row of each Site
  ungroup()
Site Location Line Band1
<chr> <chr> <chr> <dbl>
1 Cal BC04 BC04A 130
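This keeps only the "Site" values that occur more than once, replaces "Band1" with its mean within each "Site", and keeps the first row of each "Site".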
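For readers without dplyr, here is a rough base R sketch of the same three steps (keep Sites with more than one row, replace Band1 with the group mean, keep the first row). It uses table(), lapply() and rbind() in place of the dplyr verbs, so treat it as an illustration of the logic rather than an exact equivalent of the pipeline.

```r
df <- data.frame(
  Site     = c("Cal", "Cal", "Cal"),
  Location = c("BC04", "BC04", "BC04"),
  Line     = c("BC04A", "BC04B", "BC04C"),
  Band1    = c(130, 131, 129),
  stringsAsFactors = FALSE
)

# Number of rows per Site; keep the Sites that occur more than once
counts    <- table(df$Site)
dup_sites <- names(counts)[counts > 1]

# For each duplicated Site: take its first row and overwrite Band1 with the group mean
result <- do.call(rbind, lapply(dup_sites, function(s) {
  rows  <- df[df$Site == s, ]
  first <- rows[1, ]
  first$Band1 <- mean(rows$Band1)
  first
}))
rownames(result) <- NULL
print(result)
```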
Perhaps you also want to bind the averaged duplicated rows together with the non-duplicated rows:
df %>%
  group_by(Site) %>%
  filter(n() > 1) %>%
  mutate(Band1 = mean(Band1)) %>%
  slice(1) %>%
  ungroup() %>%
  bind_rows(df %>%
              group_by(Site) %>%
              filter(n() == 1) %>%  # Sites that occur exactly once
              ungroup())
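The sample data has no Site that occurs only once, so this base R sketch adds a hypothetical fourth row (Site "Foo", made up purely for illustration) to show that non-duplicated rows pass through unchanged. It relies on ave(), which computes a per-group value and broadcasts it back to every row of that group.

```r
# The "Foo" row is hypothetical, added only to have a non-duplicated Site
df <- data.frame(
  Site     = c("Cal", "Cal", "Cal", "Foo"),
  Location = c("BC04", "BC04", "BC04", "XY01"),
  Line     = c("BC04A", "BC04B", "BC04C", "XY01A"),
  Band1    = c(130, 131, 129, 200),
  stringsAsFactors = FALSE
)

# Group size of each row's Site, repeated on every row
n_per_row <- ave(seq_along(df$Site), df$Site, FUN = length)

dups    <- df[n_per_row > 1, ]
singles <- df[n_per_row == 1, ]

# Replace Band1 with the per-Site mean, then keep the first row of each Site
dups$Band1 <- ave(dups$Band1, dups$Site, FUN = mean)
dups <- dups[!duplicated(dups$Site), ]

result <- rbind(dups, singles)
rownames(result) <- NULL
print(result)
```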
Or, if you only want to compute the mean over the duplicated rows of each "Site" (every occurrence after the first):
df %>%
group_by(Site, dup = duplicated(Site)) %>%
filter(dup) %>%
mutate(Band1 = mean(Band1)) %>%
slice(1) %>%
ungroup() %>%
select(-dup)
Site Location Line Band1
<chr> <chr> <chr> <dbl>
1 Cal BC04 BC04B 130
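duplicated() flags every occurrence after the first one, which is why the first row (BC04A) drops out here and the result shows BC04B. A quick check on the sample values:

```r
site  <- c("Cal", "Cal", "Cal")
band1 <- c(130, 131, 129)

flags <- duplicated(site)    # FALSE TRUE TRUE: the first "Cal" is not flagged
avg   <- mean(band1[flags])  # mean of 131 and 129 only
print(flags)
print(avg)
```

Here the average still comes out as 130, but only because (131 + 129) / 2 happens to equal the mean of all three rows; with other data the two approaches can give different values.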
Answer 1 (score: 1)
I like data.table for this:
library(data.table)

x <- data.frame(
  Site     = c("Cal", "Cal", "Cal"),
  Location = c("BC04", "BC04", "BC04"),
  Line     = c("BC04A", "BC04B", "BC04C"),
  Band1    = c(130, 131, 129))

x <- data.table(x)
x[, .(Band1 = mean(Band1)), by = c("Site", "Location")]
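Note that this data.table call only returns the by columns plus Band1, so Line is dropped. A base R sketch of the same aggregation, using aggregate() with a formula and assuming the same x as above, behaves the same way:

```r
x <- data.frame(
  Site     = c("Cal", "Cal", "Cal"),
  Location = c("BC04", "BC04", "BC04"),
  Line     = c("BC04A", "BC04B", "BC04C"),
  Band1    = c(130, 131, 129),
  stringsAsFactors = FALSE
)

# Mean of Band1 for every Site/Location combination; Line is not kept
res <- aggregate(Band1 ~ Site + Location, data = x, FUN = mean)
print(res)
```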