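I am trying to use the street names in df1 (AllNICrimeData) to look up postcodes in df2 (CleanNIPostcodeData). df2 contains the same street names as df1, but a street can be linked to several postcodes (the same street name occurs in several towns). When a lookup returns more than one postcode, I need to take the most frequent one. Once the postcode is found, I need to add it to a new column in df1, in the same row as the street name. df1 has 500,000 rows and df2 has more than 900,000 rows.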
head(AllNICrimeData, 10)
Month Longitude Latitude Location Crime.type
1 2015-01 -6.003289 54.55165 SALISBURY PLACE Anti-social behaviour
2 2015-01 -5.707979 54.59231 Anti-social behaviour
3 2015-01 -5.815976 54.73161 MILEBUSH PARK Anti-social behaviour
4 2015-01 -6.393411 54.19788 COLLEGE SQUARE NORTH Anti-social behaviour
5 2015-01 -6.251798 54.85970 STAFFA DRIVE Anti-social behaviour
6 2015-01 -7.206893 54.62265 KILLYCLOGHER ROAD Anti-social behaviour
7 2015-01 -5.915793 54.59242 RAVENHILL REACH Anti-social behaviour
8 2015-01 -5.535389 54.48792 Anti-social behaviour
9 2015-01 -7.322812 54.99940 GREAT JAMES STREET Anti-social behaviour
10 2015-01 -5.954670 54.61568 JAMAICA ROAD Anti-social behaviour
head(CleanNIPostcodeData[, 6:14])
Number Primary_Thorfare Alt_Thorfare Secondary_Thorfare Locality Townland Town County Postcode
1 134 WHITEPARK ROAD <NA> <NA> BALLINTOY BALLINTOY DEMESNE BALLYCASTLE ANTRIM BT546ND
2 27 PRINCESS STREET <NA> <NA> <NA> PORT RUSH PORTRUSH ANTRIM BT568AX
3 <NA> COVEHILL COURT <NA> <NA> <NA> GLENAMANUS PORTRUSH ANTRIM BT568GL
4 271 OLDPARK ROAD <NA> <NA> <NA> TOWN PARKS BELFAST ANTRIM BT146QR
5 2A RAMORE STREET <NA> <NA> <NA> PORT RUSH PORTRUSH ANTRIM BT568BD
6 52 EGLINTON STREET <NA> <NA> <NA> PORT RUSH PORTRUSH ANTRIM BT568DY
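What I need is to find, for each street in df1, the most frequent postcode associated with that street in df2, and add it to a new column in the same row of df1. The example below shows a location that is associated with more than one postcode: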
table(CleanNIPostcodeData$Postcode[AllNICrimeData$Location[3] == CleanNIPostcodeData$Primary_Thorfare])
BT387PU BT387QR
22 64
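I have worked out how to get the most frequent postcode when several postcodes are associated with a location, but I cannot create a new column holding the postcode for every street.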
names(which.max(table(CleanNIPostcodeData$Postcode[AllNICrimeData$Location[3] == CleanNIPostcodeData$Primary_Thorfare])))
In the example above I find the most common postcode for the third street name in df1. The output is the postcode "BT387QR".
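To make the table() / which.max() / names() chain easier to follow, here is a minimal sketch on a toy vector. The vector postcodes_for_street is a hypothetical stand-in for the subset of CleanNIPostcodeData$Postcode selected above; its counts are copied from the table() output shown earlier.

```r
# Toy stand-in for the postcodes matched to one street
# (22 x BT387PU and 64 x BT387QR, as in the table() output above).
postcodes_for_street <- c(rep("BT387PU", 22), rep("BT387QR", 64))

counts <- table(postcodes_for_street)  # frequency of each distinct postcode
top <- which.max(counts)               # position of the largest count (the first one on ties)
most_common <- names(top)              # postcode label at that position
print(most_common)
```

Note that which.max() silently picks the first of several tied postcodes, which may or may not be what you want for streets split evenly between two postcodes.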
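How can I apply the code above to the whole column, and create and populate a new postcode column in df1?
The expected output is a new column in df1 containing the matching postcode for each street name.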
Answer 0 (score: 1)
All you need is dplyr::left_join to join the two data.frames and pick up Postcode.
The result below uses modified data to demonstrate the logic.
library(dplyr)
AllNICrimeData %>%
  left_join(select(CleanNIPostcodeData, Primary_Thorfare, Postcode),
            by = c("Location" = "Primary_Thorfare"))
# Month Longitude Latitude Location Crime.type Postcode
# 1 2015-01 -6.003289 54.55165 SALISBURY PLACE Anti-social behaviour <NA>
# 2 2015-01 -5.707979 54.59231 <NA> Anti-social behaviour <NA>
# 3 2015-01 -5.815976 54.73161 MILEBUSH PARK Anti-social behaviour <NA>
# 4 2015-01 -6.393411 54.19788 COLLEGE SQUARE NORTH Anti-social behaviour <NA>
# 5 2015-01 -6.251798 54.85970 STAFFA DRIVE Anti-social behaviour <NA>
# 6 2015-01 -7.206893 54.62265 KILLYCLOGHER ROAD Anti-social behaviour <NA>
# 7 2015-01 -5.915793 54.59242 RAVENHILL REACH Anti-social behaviour <NA>
# 8 2015-01 -5.535389 54.48792 <NA> Anti-social behaviour <NA>
# 9 2015-01 -7.322812 54.99940 GREAT JAMES STREET Anti-social behaviour <NA>
# 10 2015-01 -5.954670 54.61568 JAMAICA ROAD Anti-social behaviour BT568DY
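One caveat: a plain left_join returns one output row for every matching row in the postcode table, so a street that appears many times there multiplies the crime rows rather than picking the most frequent postcode. A possible way around that, sketched below, is to reduce the postcode table to one row per street first and join afterwards. The data frames crimes and postcodes and the column freq are hypothetical names for illustration; the toy counts mirror the MILEBUSH PARK example in the question, and slice_max() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy stand-ins for CleanNIPostcodeData and AllNICrimeData (assumed shapes).
postcodes <- data.frame(
  Primary_Thorfare = c(rep("MILEBUSH PARK", 86), "JAMAICA ROAD"),
  Postcode = c(rep("BT387PU", 22), rep("BT387QR", 64), "BT568DY"),
  stringsAsFactors = FALSE
)
crimes <- data.frame(
  Location = c("MILEBUSH PARK", NA, "JAMAICA ROAD", "STAFFA DRIVE"),
  stringsAsFactors = FALSE
)

# One row per street: the postcode with the highest count.
best_postcode <- postcodes %>%
  filter(!is.na(Primary_Thorfare), !is.na(Postcode)) %>%
  count(Primary_Thorfare, Postcode, name = "freq") %>%
  group_by(Primary_Thorfare) %>%
  slice_max(freq, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Primary_Thorfare, Postcode)

# The lookup now has unique keys, so the join keeps one row per crime record.
result <- left_join(crimes, best_postcode,
                    by = c("Location" = "Primary_Thorfare"))
print(result)
```

Streets with no match, and rows whose Location is NA, end up with NA in Postcode. with_ties = FALSE keeps a single postcode when two are equally frequent.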
If I have to keep the Postcode lookup logic the OP described, the solution can be written as:
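AllNICrimeData$newcol <- vapply(AllNICrimeData$Location, function(x) {
  best <- names(which.max(table(CleanNIPostcodeData$Postcode[which(x == CleanNIPostcodeData$Primary_Thorfare)])))
  if (length(best) == 1) best else NA_character_  # NA when the street is missing or has no match
}, character(1), USE.NAMES = FALSE)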
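The per-row lookup above rescans the whole postcode column for every crime record, which is likely to be slow with 500,000 and 900,000 rows. A base R alternative, sketched here under the same assumptions, computes the most frequent postcode once per distinct street and then uses match() to fill the new column. The names crimes, postcodes, by_street and lookup are hypothetical; the toy counts again come from the question.

```r
# Toy stand-ins for CleanNIPostcodeData and AllNICrimeData (assumed shapes).
postcodes <- data.frame(
  Primary_Thorfare = c(rep("MILEBUSH PARK", 86), "JAMAICA ROAD"),
  Postcode = c(rep("BT387PU", 22), rep("BT387QR", 64), "BT568DY"),
  stringsAsFactors = FALSE
)
crimes <- data.frame(
  Location = c("MILEBUSH PARK", NA, "JAMAICA ROAD", "STAFFA DRIVE"),
  stringsAsFactors = FALSE
)

# Group the postcodes by street (split() drops rows whose street is NA).
by_street <- split(postcodes$Postcode, postcodes$Primary_Thorfare)

# Most frequent postcode per street, computed once per distinct street.
lookup <- vapply(by_street, function(p) {
  best <- names(which.max(table(p)))
  if (length(best) == 1) best else NA_character_  # street with only NA postcodes
}, character(1))

# match() yields NA for unknown streets and NA locations, so those rows get NA.
crimes$Postcode <- unname(lookup[match(crimes$Location, names(lookup))])
print(crimes)
```

The design choice is simply to move the expensive table() call out of the per-row loop: the lookup has one entry per street, and the final step is a single vectorised index.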
Data:
AllNICrimeData <- read.table(text =
"Month Longitude Latitude Location Crime.type
1 2015-01 -6.003289 54.55165 ' SALISBURY PLACE' 'Anti-social behaviour'
2 2015-01 -5.707979 54.59231 NA 'Anti-social behaviour'
3 2015-01 -5.815976 54.73161 'MILEBUSH PARK' 'Anti-social behaviour'
4 2015-01 -6.393411 54.19788 'COLLEGE SQUARE NORTH' 'Anti-social behaviour'
5 2015-01 -6.251798 54.85970 'STAFFA DRIVE' 'Anti-social behaviour'
6 2015-01 -7.206893 54.62265 'KILLYCLOGHER ROAD' 'Anti-social behaviour'
7 2015-01 -5.915793 54.59242 'RAVENHILL REACH' 'Anti-social behaviour'
8 2015-01 -5.535389 54.48792 NA 'Anti-social behaviour'
9 2015-01 -7.322812 54.99940 'GREAT JAMES STREET' 'Anti-social behaviour'
10 2015-01 -5.954670 54.61568 'JAMAICA ROAD' 'Anti-social behaviour'",
header = TRUE, stringsAsFactors = FALSE)
CleanNIPostcodeData <- read.table(text =
"Number Primary_Thorfare Alt_Thorfare Secondary_Thorfare Locality Townland Town County Postcode
1 134 'WHITEPARK ROAD' <NA> <NA> BALLINTOY 'BALLINTOY DEMESNE' BALLYCASTLE ANTRIM BT546ND
2 27 'PRINCESS STREET' <NA> <NA> <NA> 'PORT RUSH' PORTRUSH ANTRIM BT568AX
3 <NA> 'COVEHILL COURT' <NA> <NA> <NA> GLENAMANUS PORTRUSH ANTRIM BT568GL
4 271 'OLDPARK ROAD' <NA> <NA> <NA> 'TOWN PARKS' BELFAST ANTRIM BT146QR
5 2A 'RAMORE STREET' <NA> <NA> <NA> 'PORT RUSH' PORTRUSH ANTRIM BT568BD
6 52 'JAMAICA ROAD' <NA> <NA> <NA> 'PORT RUSH' PORTRUSH ANTRIM BT568DY",
header = TRUE, stringsAsFactors = FALSE, na.strings = c("NA", "<NA>"))