R - Match street names to postcodes from a different df and apply the result to a new column

Date: 2018-04-15 11:24:48

Tags: r

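I am trying to use the street names in df1 (AllNICrimeData below) to look up postcodes in df2 (CleanNIPostcodeData below). df2 contains the same street names as df1, but a street can be linked to multiple postcodes (the same street name can be found in several cities). When a lookup returns multiple postcodes, I need to take the most common one. Once the postcode is found, I need to add it to a new column in df1, on the same row as the street name. df1 contains 500,000 rows and df2 contains more than 900,000 rows.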

head(AllNICrimeData, 10)
 Month Longitude Latitude             Location            Crime.type
1  2015-01 -6.003289 54.55165      SALISBURY PLACE Anti-social behaviour
2  2015-01 -5.707979 54.59231                      Anti-social behaviour
3  2015-01 -5.815976 54.73161        MILEBUSH PARK Anti-social behaviour
4  2015-01 -6.393411 54.19788 COLLEGE SQUARE NORTH Anti-social behaviour
5  2015-01 -6.251798 54.85970         STAFFA DRIVE Anti-social behaviour
6  2015-01 -7.206893 54.62265    KILLYCLOGHER ROAD Anti-social behaviour
7  2015-01 -5.915793 54.59242      RAVENHILL REACH Anti-social behaviour
8  2015-01 -5.535389 54.48792                      Anti-social behaviour
9  2015-01 -7.322812 54.99940   GREAT JAMES STREET Anti-social behaviour
10 2015-01 -5.954670 54.61568         JAMAICA ROAD Anti-social behaviour

head(CleanNIPostcodeData[, 6:14],)
  Number Primary_Thorfare Alt_Thorfare Secondary_Thorfare  Locality          Townland        Town County Postcode
1    134   WHITEPARK ROAD         <NA>               <NA> BALLINTOY BALLINTOY DEMESNE BALLYCASTLE ANTRIM  BT546ND
2     27  PRINCESS STREET         <NA>               <NA>      <NA>         PORT RUSH    PORTRUSH ANTRIM  BT568AX
3   <NA>   COVEHILL COURT         <NA>               <NA>      <NA>        GLENAMANUS    PORTRUSH ANTRIM  BT568GL
4    271     OLDPARK ROAD         <NA>               <NA>      <NA>        TOWN PARKS     BELFAST ANTRIM  BT146QR
5     2A    RAMORE STREET         <NA>               <NA>      <NA>         PORT RUSH    PORTRUSH ANTRIM  BT568BD
6     52  EGLINTON STREET         <NA>               <NA>      <NA>         PORT RUSH    PORTRUSH ANTRIM  BT568DY

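What I need to achieve is to find, for each street in df1, the most frequent postcode associated with it in df2, and add that postcode to a new column on the same row of df1. The example below shows a location that is associated with multiple postcodes: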

table(CleanNIPostcodeData$Postcode[AllNICrimeData$Location[3] == CleanNIPostcodeData$Primary_Thorfare])
BT387PU BT387QR 
 22      64 

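I have worked out how to get the most frequent postcode when several postcodes are associated with a location, but I have not been able to create a new column holding the postcodes for all streets.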

names(which.max(table(CleanNIPostcodeData$Postcode[AllNICrimeData$Location[3] == CleanNIPostcodeData$Primary_Thorfare])))

In the example above, I find the most common postcode for the 3rd street name in df1. The output is the postcode "BT387QR".

How can I apply the code above to the whole column, and create and populate a new postcode column in df1?

The expected output is a new column in df1 containing the matching postcode for each street name.

1 answer:

Answer 0: (score: 1)

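All you need is to join the two data.frames with dplyr::left_join and take Postcode. Note that if a street matches several rows in the postcode data, this join returns one row per match.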

The result below uses modified data to demonstrate the logic.

library(dplyr)
AllNICrimeData %>% left_join(select(CleanNIPostcodeData, Primary_Thorfare,Postcode) , 
by=c("Location" = "Primary_Thorfare"))


#        Month Longitude Latitude             Location            Crime.type Postcode
#   1  2015-01 -6.003289 54.55165      SALISBURY PLACE Anti-social behaviour     <NA>
#   2  2015-01 -5.707979 54.59231                 <NA> Anti-social behaviour     <NA>
#   3  2015-01 -5.815976 54.73161        MILEBUSH PARK Anti-social behaviour     <NA>
#   4  2015-01 -6.393411 54.19788 COLLEGE SQUARE NORTH Anti-social behaviour     <NA>
#   5  2015-01 -6.251798 54.85970         STAFFA DRIVE Anti-social behaviour     <NA>
#   6  2015-01 -7.206893 54.62265    KILLYCLOGHER ROAD Anti-social behaviour     <NA>
#   7  2015-01 -5.915793 54.59242      RAVENHILL REACH Anti-social behaviour     <NA>
#   8  2015-01 -5.535389 54.48792                 <NA> Anti-social behaviour     <NA>
#   9  2015-01 -7.322812 54.99940   GREAT JAMES STREET Anti-social behaviour     <NA>
#   10 2015-01 -5.954670 54.61568         JAMAICA ROAD Anti-social behaviour  BT568DY
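Since the question asks for the most frequent postcode per street, one possible refinement is to reduce the postcode data to a single row per street before joining. The following is only a sketch under assumptions: the data frames `crimes` and `postcodes` and their values are toy stand-ins made up for illustration, and ties are broken by whichever postcode `which.max` sees first.

```r
library(dplyr)

# Toy stand-ins for the two data frames (illustrative values only)
crimes <- data.frame(
  Location = c("MILEBUSH PARK", "JAMAICA ROAD", NA, "STAFFA DRIVE"),
  stringsAsFactors = FALSE
)
postcodes <- data.frame(
  Primary_Thorfare = c(rep("MILEBUSH PARK", 5), "JAMAICA ROAD"),
  Postcode = c("BT387PU", "BT387PU", "BT387QR", "BT387QR", "BT387QR", "BT568DY"),
  stringsAsFactors = FALSE
)

# One row per street: the postcode that occurs most often for it.
# Rows with a missing street or postcode are dropped first, so that
# an NA Location cannot match an NA key during the join.
lookup <- postcodes %>%
  filter(!is.na(Primary_Thorfare), !is.na(Postcode)) %>%
  count(Primary_Thorfare, Postcode) %>%   # adds a count column named n
  group_by(Primary_Thorfare) %>%
  slice(which.max(n)) %>%                 # keep the highest count per street
  ungroup() %>%
  select(Primary_Thorfare, Postcode)

# The lookup has unique keys, so the join keeps one row per crime record
result <- crimes %>%
  left_join(lookup, by = c("Location" = "Primary_Thorfare"))

print(result)
```

Because the counting happens once over the postcode data rather than once per crime record, this shape should scale better to the 500,000 and 900,000 row tables mentioned in the question.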

If I have to keep the Postcode lookup logic mentioned by the OP, the solution can be written as:

AllNICrimeData$newcol <- mapply(function(x)names(which.max(table(CleanNIPostcodeData$Postcode[x == CleanNIPostcodeData$Primary_Thorfare]))),
AllNICrimeData$Location)
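One caveat with the call above: for a street that has no match (or a missing Location), `which.max(table(...))` yields a zero-length result, so `mapply` cannot simplify to a character vector and the new column ends up as a list. A minimal NA-safe sketch of the same lookup logic follows; the helper name `most_common_postcode` and the toy data are hypothetical, introduced only for illustration.

```r
# Toy stand-ins for the two data frames (illustrative values only)
crimes <- data.frame(
  Location = c("MILEBUSH PARK", NA, "STAFFA DRIVE"),
  stringsAsFactors = FALSE
)
postcodes <- data.frame(
  Primary_Thorfare = c("MILEBUSH PARK", "MILEBUSH PARK", "MILEBUSH PARK"),
  Postcode = c("BT387PU", "BT387QR", "BT387QR"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent postcode for one street, or NA
most_common_postcode <- function(street) {
  # which() drops the NA comparisons produced by a missing street name
  codes <- postcodes$Postcode[which(postcodes$Primary_Thorfare == street)]
  codes <- codes[!is.na(codes)]
  # No usable postcode for this street: return a typed NA
  if (length(codes) == 0) return(NA_character_)
  names(which.max(table(codes)))
}

# vapply guarantees a plain character vector, one value per row
crimes$Postcode <- vapply(crimes$Location, most_common_postcode,
                          character(1), USE.NAMES = FALSE)

print(crimes)
```

This still scans the postcode data once per row, so on tables of the size described in the question it is likely to be slow; it is meant to show the NA handling rather than a fast solution.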

Data:

AllNICrimeData <- read.table(text = 
"Month Longitude Latitude             Location            Crime.type
1  2015-01 -6.003289 54.55165      'SALISBURY PLACE' 'Anti-social behaviour'
2  2015-01 -5.707979 54.59231                   NA   'Anti-social behaviour'
3  2015-01 -5.815976 54.73161        'MILEBUSH PARK' 'Anti-social behaviour'
4  2015-01 -6.393411 54.19788 'COLLEGE SQUARE NORTH' 'Anti-social behaviour'
5  2015-01 -6.251798 54.85970         'STAFFA DRIVE' 'Anti-social behaviour'
6  2015-01 -7.206893 54.62265    'KILLYCLOGHER ROAD' 'Anti-social behaviour'
7  2015-01 -5.915793 54.59242      'RAVENHILL REACH' 'Anti-social behaviour'
8  2015-01 -5.535389 54.48792                   NA   'Anti-social behaviour'
9  2015-01 -7.322812 54.99940   'GREAT JAMES STREET' 'Anti-social behaviour'
10 2015-01 -5.954670 54.61568         'JAMAICA ROAD' 'Anti-social behaviour'",
header = TRUE, stringsAsFactors = FALSE)



CleanNIPostcodeData <- read.table(text = 
"Number Primary_Thorfare    Alt_Thorfare Secondary_Thorfare  Locality  Townland               Town     County  Postcode
1    134   'WHITEPARK ROAD'         <NA>               <NA> BALLINTOY 'BALLINTOY DEMESNE' BALLYCASTLE    ANTRIM  BT546ND
2     27  'PRINCESS STREET'         <NA>               <NA>      <NA> 'PORT RUSH'            PORTRUSH    ANTRIM  BT568AX
3   <NA>   'COVEHILL COURT'         <NA>               <NA>      <NA> GLENAMANUS           PORTRUSH    ANTRIM  BT568GL
4    271     'OLDPARK ROAD'         <NA>               <NA>      <NA> 'TOWN PARKS'            BELFAST    ANTRIM  BT146QR
5     2A    'RAMORE STREET'         <NA>               <NA>      <NA> 'PORT RUSH'            PORTRUSH    ANTRIM  BT568BD
6     52  'JAMAICA ROAD'         <NA>               <NA>      <NA> 'PORT RUSH'            PORTRUSH    ANTRIM  BT568DY",
header = TRUE, stringsAsFactors = FALSE, na.strings = c("NA", "<NA>"))  # read the printed <NA> markers as missing values