R: Using ddply and grouping city names with different spellings together

Time: 2016-01-22 18:29:06

Tags: r plyr

I have a dataset that looks like this:

             Locations    Lat     Long
1                El Ay 36.086    4.777
2  Burbank, California 34.181 -118.309
3        Nashville, TN 36.163  -86.782
4           On the lam 42.920  -80.285
5          San Dog, CA 32.734 -117.193
6        New York City 40.713  -74.006
7            Dreamland 33.642  -97.315
8                   LA 34.052 -118.244
9          Los Angeles 34.052 -118.244
10       United States 37.090  -95.713

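Basically, the first column is the location name as typed in by users, and columns 2 and 3 are the latitude and longitude of those cities.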

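I want to use ddply() to summarize this dataset by counting how often each city occurs, grouped by Lat and Long. I tried ddply(data, .(Lat, Long), summarize, count = length(Lat)), which gave me the table below (without the city names):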

     Lat     Long count
1 32.734 -117.193     1
2 33.642  -97.315     1
3 34.052 -118.244     2
4 34.181 -118.309     1
5 36.086    4.777     1
6 36.163  -86.782     1
7 37.090  -95.713     1
8 40.713  -74.006     1
9 42.920  -80.285     1

I also tried ddply(data, .(Locations, Lat, Long), summarize, count = length(Lat)) and got:

             Locations    Lat     Long count
1  Burbank, California 34.181 -118.309     1
2            Dreamland 33.642  -97.315     1
3                El Ay 36.086    4.777     1
4                   LA 34.052 -118.244     1
5          Los Angeles 34.052 -118.244     1
6        Nashville, TN 36.163  -86.782     1
7        New York City 40.713  -74.006     1
8           On the lam 42.920  -80.285     1
9          San Dog, CA 32.734 -117.193     1
10       United States 37.090  -95.713     1

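I want to keep the Locations column, but I also want LA and Los Angeles to be counted together (the name shown can be either LA or Los Angeles). How can I do this?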

Thanks

1 answer:

Answer 0: (score: 3)

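Using dplyr, group the locations by their shared latitude and longitude and compute a count for each group. If several names share the same latitude/longitude, only the first name is kept.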

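library(dplyr)

# Rows with identical coordinates fall into the same group
data2 <- data %>%
  group_by(Lat, Long) %>%
  summarize(
    Locations = first(Locations),  # keep the first name seen in each group
    Count = n())                   # number of rows in each group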

Result:

> data2
Source: local data frame [9 x 4]
Groups: Lat [?]

     Lat     Long          Locations Count
   (dbl)    (dbl)             (fctr) (int)
1 32.734 -117.193          SanDog,CA     1
2 33.642  -97.315          Dreamland     1
3 34.052 -118.244                 LA     2
4 34.181 -118.309 Burbank,California     1
5 36.086    4.777               ElAy     1
6 36.163  -86.782       Nashville,TN     1
7 37.090  -95.713       UnitedStates     1
8 40.713  -74.006        NewYorkCity     1
9 42.920  -80.285           Onthelam     1
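
Since the question asked about ddply() specifically, the same idea can be sketched with plyr alone. This is only a minimal sketch: it rebuilds the sample data from the question's table, and the names `counts` and `AllNames` are made up for illustration (they do not appear in the question or the answer).

```r
library(plyr)

# Sample data copied from the table in the question
data <- data.frame(
  Locations = c("El Ay", "Burbank, California", "Nashville, TN", "On the lam",
                "San Dog, CA", "New York City", "Dreamland", "LA",
                "Los Angeles", "United States"),
  Lat  = c(36.086, 34.181, 36.163, 42.920, 32.734, 40.713, 33.642, 34.052,
           34.052, 37.090),
  Long = c(4.777, -118.309, -86.782, -80.285, -117.193, -74.006, -97.315,
           -118.244, -118.244, -95.713),
  stringsAsFactors = FALSE
)

# Group by coordinates only, then carry a name along inside summarize
counts <- ddply(data, .(Lat, Long), summarize,
                AllNames  = paste(Locations, collapse = " / "),  # every spelling in the group (hypothetical column)
                Locations = Locations[1],                        # first name seen in the group
                count     = length(Lat))                         # rows per group

print(counts)
```

Note that this only merges rows whose coordinates are exactly equal. In the sample data, LA and Los Angeles share 34.052 / -118.244 and are combined, while El Ay has different coordinates and stays a separate row.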