I have a dataset that looks like this:
Locations Lat Long
1 El Ay 36.086 4.777
2 Burbank, California 34.181 -118.309
3 Nashville, TN 36.163 -86.782
4 On the lam 42.920 -80.285
5 San Dog, CA 32.734 -117.193
6 New York City 40.713 -74.006
7 Dreamland 33.642 -97.315
8 LA 34.052 -118.244
9 Los Angeles 34.052 -118.244
10 United States 37.090 -95.713
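Basically, the first column holds the location names entered by users, and the second and third columns hold the latitude and longitude of those cities.
I want to use ddply() to summarize this dataset by Lat and Long, listing how often each city occurs. I tried ddply(data, .(Lat, Long), summarize, count = length(Lat))
which gave me the table below (without the city names):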
Lat Long count
1 32.734 -117.193 1
2 33.642 -97.315 1
3 34.052 -118.244 2
4 34.181 -118.309 1
5 36.086 4.777 1
6 36.163 -86.782 1
7 37.090 -95.713 1
8 40.713 -74.006 1
9 42.920 -80.285 1
I also tried ddply(data, .(Locations, Lat, Long), summarize, count = length(Lat))
and got:
Locations Lat Long count
1 Burbank, California 34.181 -118.309 1
2 Dreamland 33.642 -97.315 1
3 El Ay 36.086 4.777 1
4 LA 34.052 -118.244 1
5 Los Angeles 34.052 -118.244 1
6 Nashville, TN 36.163 -86.782 1
7 New York City 40.713 -74.006 1
8 On the lam 42.920 -80.285 1
9 San Dog, CA 32.734 -117.193 1
10 United States 37.090 -95.713 1
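I want to keep the location names, but I also want LA and Los Angeles grouped into one row (the name can be either LA or Los Angeles). How can I do this?
Thanks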
Answer 0 (score: 3)
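Using dplyr, group the locations by their shared latitude and longitude and compute a count for each group. If more than one name has the same latitude/longitude, only the first name is kept.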
library(dplyr)

# Group rows that share coordinates, keep the first name seen, and count the rows.
data2 <- data %>%
  group_by(Lat, Long) %>%
  summarize(
    Locations = first(Locations),
    Count = n())
Result:
> data2
Source: local data frame [9 x 4]
Groups: Lat [?]
Lat Long Locations Count
(dbl) (dbl) (fctr) (int)
1 32.734 -117.193 SanDog,CA 1
2 33.642 -97.315 Dreamland 1
3 34.052 -118.244 LA 2
4 34.181 -118.309 Burbank,California 1
5 36.086 4.777 ElAy 1
6 36.163 -86.782 Nashville,TN 1
7 37.090 -95.713 UnitedStates 1
8 40.713 -74.006 NewYorkCity 1
9 42.920 -80.285 Onthelam 1
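
Since the question asked about ddply(), the same idea can probably be written with plyr as well. The following is only a sketch and not part of the answer above: the names by_coord and all_names, the " / " separator, and the use of stringsAsFactors = FALSE when rebuilding the sample data are all assumptions made for illustration. The second call shows one possible way to keep every name that shares a coordinate pair instead of only the first.

```r
library(plyr)

# Sample data rebuilt from the question (stringsAsFactors = FALSE is an assumption).
data <- data.frame(
  Locations = c("El Ay", "Burbank, California", "Nashville, TN", "On the lam",
                "San Dog, CA", "New York City", "Dreamland", "LA",
                "Los Angeles", "United States"),
  Lat  = c(36.086, 34.181, 36.163, 42.920, 32.734,
           40.713, 33.642, 34.052, 34.052, 37.090),
  Long = c(4.777, -118.309, -86.782, -80.285, -117.193,
           -74.006, -97.315, -118.244, -118.244, -95.713),
  stringsAsFactors = FALSE
)

# Group by coordinates and keep the first name in each group.
# count uses length(Lat) because Lat is not overwritten inside summarize,
# so it still has one element per row of the group.
by_coord <- ddply(data, .(Lat, Long), summarize,
                  Locations = Locations[1],
                  count = length(Lat))

# Hypothetical variant: keep all distinct names for a coordinate pair,
# joined with " / " (the separator is an arbitrary choice).
all_names <- ddply(data, .(Lat, Long), summarize,
                   Locations = paste(unique(Locations), collapse = " / "),
                   count = length(Lat))

print(by_coord)
print(all_names)
```

With the sample data, by_coord should have one row per coordinate pair, with a count of 2 for 34.052 / -118.244, and all_names should list both LA and Los Angeles on that row.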