I have two datasets: one has an address column, and the other contains locality names with their corresponding latitude and longitude.
Store dataset:
+--------------------+-----------+--------------------------------------------------+
| Store name | Postcodes | Address |
+--------------------+-----------+--------------------------------------------------+
| Floral showers | 2000 | Street 45, Level 9, Sydney, New South Wales 2000 |
| Cookie box | 4300 | Shop 3, Queensland 4300 |
| Mango troopers | 2010 | Aberdeen, Bankstown, NSW |
| Building AE44 | 4300 | 778/9 Goulburn Street, QLD |
| Floral showers Co. | 2230 | Steert 47 Cronulla, New South Wales 2230 |
| Vinci supplies | 2560 | West AIRDS, Mayfaille NSW |
+--------------------+-----------+--------------------------------------------------+
Latitude/longitude dataset:
+-------------------+-------+-------------+--------------+
| Locality | State | Latitude | Longitude |
+-------------------+-------+-------------+--------------+
| ABERDARE | NSW | 151.317476 | -32.977861 |
| ABERDEEN | NSW | 151.102917 | -32.14622 |
| ACACIA PLATEAU | NSW | 152.49765 | -28.36456 |
| AIRDS | NSW | 150.768408 | -34.194216 |
| ADAMINABY | NSW | 148.769744 | -35.997349 |
| ABERCROMBIE RIVER | NSW | 149.3476918 | -33.91030648 |
| CRONULLA | NSW | 151.136596 | -34.093213 |
| SYDNEY | NSW | 151.268071 | -33.794883 |
+-------------------+-------+-------------+--------------+
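I want to create a new column that extracts each store's locality from the address column, and then fill in the latitude and longitude from the other dataset. Since the addresses are not in a fixed format, I know I have to do a string search. However, I'm not sure how to compare the two columns.
Here is the dput output for the two samples: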
structure(list(Stores_names = c("Floral showers", "Cookie box",
"Mango troopers", "Building AE44", "Floral showers Co.", "Vinci supplies"
), Postcodes = c("2000", "4300", "2010", "4300", "2230", "2560"
), Address = c("Street 45, Level 9, Sydney, New South Wales 2000",
"Shop 3, Queensland 4300", "Aberdeen, Bankstown, NSW", "778/9 Goulburn Street, QLD",
"Steert 47 Cronulla, New South Wales 2230", "West AIRDS, Mayfaille NSW"
)), class = "data.frame", row.names = c(NA, -6L))
structure(list(Localities = c("ABERDARE", "ABERDEEN", "ACACIA PLATEAU",
"AIRDS", "ADAMINABY", "ABERCROMBIE RIVER", "CRONULLA", "SYDNEY"
), State = c("NSW", "NSW", "NSW", "NSW", "NSW", "NSW", "NSW",
"NSW"), lat = c("151.317476", "151.102917", "152.49765", "150.768408",
"148.769744", "149.3476918", "151.136596", "151.268071"), long = c("-32.977861",
"-32.14622", "-28.36456", "-34.194216", "-35.997349", "-33.91030648",
"-34.093213", "-33.794883")), class = "data.frame", row.names = c(NA,
-8L))
My final dataset should contain three new columns: Locality, lat, and long.
+--------------------+-----------+--------------------------------------------------+----------+------------+------------+
| Store name | Postcodes | Address | Locality | lat | long |
+--------------------+-----------+--------------------------------------------------+----------+------------+------------+
| Floral showers | 2000 | Street 45, Level 9, Sydney, New South Wales 2000 | Sydney | 151.268071 | -33.794883 |
| Cookie box | 4300 | Shop 3, Queensland 4300 | | | |
| Mango troopers | 2010 | Aberdeen, Bankstown, NSW | Aberdeen | 151.102917 | -32.14622 |
| Building AE44 | 4300 | 778/9 Goulburn Street, QLD | | | |
| Floral showers Co. | 2230 | Steert 47 Cronulla, New South Wales 2230 | Cronulla | 151.136596 | -34.093213 |
| Vinci supplies | 2560 | West AIRDS, Mayfaille NSW | AIRDS | 150.768408 | -34.194216 |
+--------------------+-----------+--------------------------------------------------+----------+------------+------------+
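Stores whose locality is not found in the lat/long dataset can be left blank, but I need all rows from the store dataset.
Thanks for your help!
Answer 0 (score: 0)
This works: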
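library(stringr)
library(dplyr)
# df is the store dataset, df1 the lat/long dataset (the two dput structures above).
# Extract the first locality found in the upper-cased address, then left-join to keep every store.
df %>%
  mutate(city = str_extract(toupper(Address), paste0(df1$Localities, collapse = "|"))) %>%
  left_join(df1, by = c("city" = "Localities"), keep = TRUE) %>%
  select(-c(city, State))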
Stores_names Postcodes Address Localities lat long
1 Floral showers 2000 Street 45, Level 9, Sydney, New South Wales 2000 SYDNEY 151.268071 -33.794883
2 Cookie box 4300 Shop 3, Queensland 4300 <NA> <NA> <NA>
3 Mango troopers 2010 Aberdeen, Bankstown, NSW ABERDEEN 151.102917 -32.14622
4 Building AE44 4300 778/9 Goulburn Street, QLD <NA> <NA> <NA>
5 Floral showers Co. 2230 Steert 47 Cronulla, New South Wales 2230 CRONULLA 151.136596 -34.093213
6 Vinci supplies 2560 West AIRDS, Mayfaille NSW AIRDS 150.768408 -34.194216
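One caveat with a plain alternation pattern is that it also matches inside longer words, and when two localities start at the same position the one listed first wins. Below is a minimal sketch of a stricter variant, not part of the original answer: it wraps the alternation in word boundaries and tries longer names first. It assumes the locality names contain no regex metacharacters (true for the sample data), and the names `locs`, `pattern`, and `result` are only illustrative.

```r
library(stringr)
library(dplyr)

# Sample data rebuilt from the question's dput output.
df <- data.frame(
  Stores_names = c("Floral showers", "Cookie box", "Mango troopers",
                   "Building AE44", "Floral showers Co.", "Vinci supplies"),
  Postcodes = c("2000", "4300", "2010", "4300", "2230", "2560"),
  Address = c("Street 45, Level 9, Sydney, New South Wales 2000",
              "Shop 3, Queensland 4300", "Aberdeen, Bankstown, NSW",
              "778/9 Goulburn Street, QLD",
              "Steert 47 Cronulla, New South Wales 2230",
              "West AIRDS, Mayfaille NSW"),
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  Localities = c("ABERDARE", "ABERDEEN", "ACACIA PLATEAU", "AIRDS",
                 "ADAMINABY", "ABERCROMBIE RIVER", "CRONULLA", "SYDNEY"),
  State = "NSW",
  lat = c("151.317476", "151.102917", "152.49765", "150.768408",
          "148.769744", "149.3476918", "151.136596", "151.268071"),
  long = c("-32.977861", "-32.14622", "-28.36456", "-34.194216",
           "-35.997349", "-33.91030648", "-34.093213", "-33.794883"),
  stringsAsFactors = FALSE
)

# Try longer names first so a longer locality beats a shorter one
# that starts at the same position in the address.
locs <- df1$Localities[order(-nchar(df1$Localities))]

# \\b keeps a locality from matching inside a longer word.
# Assumption: names contain no regex metacharacters, so no escaping is done.
pattern <- paste0("\\b(", paste(locs, collapse = "|"), ")\\b")

result <- df %>%
  mutate(Locality = str_extract(toupper(Address), pattern)) %>%
  left_join(df1, by = c("Locality" = "Localities")) %>%  # unmatched stores keep NA
  select(-State)

print(result)
```

Like the answer above, `str_extract` returns only the first locality found in each address, so an address that mentions two known localities still yields a single row.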