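I have a dataset containing district names and the latitude and longitude of each household. The dataset has 2000 household locations. I want to compute the mean latitude and longitude per district, and then add two new columns (Lat_mean and Long_mean) that store the district mean Lat and Long for each household.
I have only managed to summarise the mean latitude and longitude per district. I don't know how to attach the summarised values as new columns for each ID (see the code below).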
id <- c(1,2,3,4,5,6)
district <- c("A", "B", "C", "A", "A", "B")
lat <- c(28.6, 30.2, 35.9, 27.5, 27.9, 31.5)
long <- c(77.5, 85.2, 66.5, 75.0, 79.2, 88.8)
df <- data.frame(id, district, lat, long)
library(dplyr)
# Collapses df to one row per district, so the per-household rows are lost
df_group <- df %>% group_by(district) %>% summarise_at(vars(lat:long), mean)
This is what I expect: the columns Lat_mean and Long_mean are added to "df", and each ID gets the mean values for its district. See the figure below.
Answer 0 (score: 3)
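We can use mutate_at instead of summarise_at. mutate_at keeps every row and adds the group mean to each one, whereas summarise_at collapses each group to a single row. In the list, give the function a name (here mean = mean); that name is used as a suffix, so new columns lat_mean and long_mean are created instead of overwriting lat and long.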
library(dplyr)
df %>%
group_by(district) %>%
mutate_at(vars(lat, long), list(mean = mean))
# A tibble: 6 x 6
# Groups:   district [3]
#      id district   lat  long lat_mean long_mean
#   <dbl> <fct>    <dbl> <dbl>    <dbl>     <dbl>
# 1     1 A         28.6  77.5     28        77.2
# 2     2 B         30.2  85.2     30.8      87
# 3     3 C         35.9  66.5     35.9      66.5
# 4     4 A         27.5  75       28        77.2
# 5     5 A         27.9  79.2     28        77.2
# 6     6 B         31.5  88.8     30.8      87
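In dplyr 1.0.0 and later, mutate_at is superseded by across(). A minimal sketch of the same idea, assuming a recent dplyr is installed; the object name df_means is only illustrative, and the .names pattern is chosen here to reproduce the lat_mean / long_mean column names:

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  district = c("A", "B", "C", "A", "A", "B"),
  lat = c(28.6, 30.2, 35.9, 27.5, 27.9, 31.5),
  long = c(77.5, 85.2, 66.5, 75.0, 79.2, 88.8)
)

# df_means is a hypothetical name used for this sketch
df_means <- df %>%
  group_by(district) %>%
  # apply mean() to lat and long within each district;
  # "{.col}_mean" appends the suffix instead of overwriting the columns
  mutate(across(c(lat, long), mean, .names = "{.col}_mean")) %>%
  # drop the grouping so later operations act on the whole table
  ungroup()

print(df_means)
```

The ungroup() at the end is optional, but it avoids surprises if more mutate or summarise steps follow.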
Answer 1 (score: 1)
# ave() from base R returns a vector as long as its input,
# holding the group-level result for each row
df %>%
  mutate(lat_mean = ave(lat, district, FUN = mean),
         lon_mean = ave(long, district, FUN = mean))
#   id district  lat long lat_mean lon_mean
# 1  1        A 28.6 77.5    28.00 77.23333
# 2  2        B 30.2 85.2    30.85 87.00000
# 3  3        C 35.9 66.5    35.90 66.50000
# 4  4        A 27.5 75.0    28.00 77.23333
# 5  5        A 27.9 79.2    28.00 77.23333
# 6  6        B 31.5 88.8    30.85 87.00000
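If the per-district summary table from the question is wanted anyway, it can also be joined back onto the original rows. This is only a sketch of that alternative, not something proposed in the answers here; the name df_joined is illustrative:

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  district = c("A", "B", "C", "A", "A", "B"),
  lat = c(28.6, 30.2, 35.9, 27.5, 27.9, 31.5),
  long = c(77.5, 85.2, 66.5, 75.0, 79.2, 88.8)
)

# One row per district with the two means
df_group <- df %>%
  group_by(district) %>%
  summarise(lat_mean = mean(lat), long_mean = mean(long))

# Attach the district means to every household row;
# left_join keeps all rows of df in their original order
df_joined <- left_join(df, df_group, by = "district")

print(df_joined)
```

This takes two steps instead of one, but it leaves df_group available as a separate lookup table.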