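I know this is a regex question and has probably been answered before, but I can't work out the answer for this specific case. I have a dataset of 5000 addresses, some of which look like this: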
199 REEDSDALE ROAD MILTON, MA (42.252352, -71.075213)
2014 WASHINGTON STREET NEWTON, MA (42.332339, -71.246592)
75 FRANCIS STREET BOSTON, MA (42.335954, -71.107661)
235 NORTH PEARL STREET BROCKTON, MA (42.09707, -71.065645)
41 HIGHLAND AVENUE WINCHESTER, MA (42.465496, -71.121408)
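The first comma separates the city from the state, but each line also contains the latitude and longitude coordinates in parentheses. I would like to split the coordinates into two columns, lat and lon, like this: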
lat lon
42.252352 -71.075213
42.332339 -71.246592
42.335954 -71.107661
42.09707 -71.065645
42.465496 -71.121408
Any help would be greatly appreciated!
Answer 0 (score: 3)
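One option is to use regex lookarounds to extract the numeric parts: a lookbehind for the opening parenthesis picks up the latitude, and a lookahead for the closing parenthesis picks up the longitude.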
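library(tidyverse)
# `lines` is assumed to be a character vector holding the address strings
# tibble() replaces the deprecated data_frame()
tibble(lat = str_extract(lines, "(?<=\\()-?[0-9.]+"),  # number right after "("
       lon = str_extract(lines, "-?[0-9.]+(?=\\))"))   # number right before ")"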
# A tibble: 5 x 2
# lat lon
# <chr> <chr>
#1 42.252352 -71.075213
#2 42.332339 -71.246592
#3 42.335954 -71.107661
#4 42.09707 -71.065645
#5 42.465496 -71.121408
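Note that the columns in the output above are character (<chr>), not numeric. As a rough sketch, the same lookaround idea can be written in base R with a conversion to numeric added. The variable name `lines` and the helper `extract_first` are assumptions made for illustration, and the sample data is copied from the question.

```r
# Sample input copied from the question; `lines` is an assumed variable name.
lines <- c(
  "199 REEDSDALE ROAD MILTON, MA (42.252352, -71.075213)",
  "2014 WASHINGTON STREET NEWTON, MA (42.332339, -71.246592)",
  "75 FRANCIS STREET BOSTON, MA (42.335954, -71.107661)",
  "235 NORTH PEARL STREET BROCKTON, MA (42.09707, -71.065645)",
  "41 HIGHLAND AVENUE WINCHESTER, MA (42.465496, -71.121408)"
)

# Hypothetical helper: first match of `pattern` in each element, NA if none.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # perl = TRUE enables lookarounds
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)            # regmatches() returns matches only
  out
}

coords <- data.frame(
  lat = as.numeric(extract_first(lines, "(?<=\\()-?[0-9.]+")),  # after "("
  lon = as.numeric(extract_first(lines, "-?[0-9.]+(?=\\))"))    # before ")"
)
str(coords)
```

Rows without a coordinate pair come back as NA instead of being dropped, which keeps the result aligned with the original 5000 rows.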
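Alternatively, use gsub to remove everything up to and including the "(" as well as the ")" at the end, then read the result with read.csv, using "," as the delimiter to split it into two columns.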
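The code for this second option did not survive in the text, so what follows is only a sketch of how the gsub plus read.csv route might look, not necessarily the answerer's exact call. It assumes `lines` is the character vector of addresses and that each line contains exactly one pair of parentheses, at the end.

```r
# Sample input copied from the question; `lines` is an assumed variable name.
lines <- c(
  "199 REEDSDALE ROAD MILTON, MA (42.252352, -71.075213)",
  "2014 WASHINGTON STREET NEWTON, MA (42.332339, -71.246592)",
  "75 FRANCIS STREET BOSTON, MA (42.335954, -71.107661)",
  "235 NORTH PEARL STREET BROCKTON, MA (42.09707, -71.065645)",
  "41 HIGHLAND AVENUE WINCHESTER, MA (42.465496, -71.121408)"
)

# Drop everything up to and including "(" and the trailing ")",
# leaving strings such as "42.252352, -71.075213".
stripped <- gsub("^[^(]*\\(|\\)$", "", lines)

# Parse the remaining text as two comma-separated numeric columns.
coords <- read.csv(text = stripped, header = FALSE,
                   col.names = c("lat", "lon"), strip.white = TRUE)
str(coords)
```

Because read.csv converts column types itself, lat and lon come out numeric without a separate as.numeric step.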