https://data.sfgov.org/Transportation/Bike-Share-Stations/gtyg-jpkj
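I'm working with this dataset and would like to know whether the geometry (the Geom column in the table) can be converted into two columns, longitude and latitude, in R.
Thanks!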
Answer 0 (score: 1)
With RSocrata::read.socrata and tidyr::extract, this is short and simple:
library(tidyverse)
df <- RSocrata::read.socrata('https://data.sfgov.org/Transportation/Bike-Share-Stations/gtyg-jpkj')
df <- df %>% extract(Geom, c('lat', 'lon'), '\\((.*), (.*)\\)', convert = TRUE)
# print nicely
df %>% select(UID, Site.ID, lat, lon) %>% as_tibble()
#> # A tibble: 107 x 4
#> UID Site.ID lat lon
#> * <int> <chr> <dbl> <dbl>
#> 1 1 SF-T24 S1 37.75182 -122.4266
#> 2 2 SF-G33 S1 37.79350 -122.3928
#> 3 3 SOMA-06A 37.78974 -122.3947
#> 4 4 SF-T22 S5 37.75128 -122.4318
#> 5 5 SF-R25 S4 37.75671 -122.4210
#> 6 6 NOMA-2E 37.79861 -122.4008
#> 7 7 SF-L33 S4 37.77590 -122.3932
#> 8 8 SF-O24 S4 37.76623 -122.4269
#> 9 9 Market-03B 37.78099 -122.4117
#> 10 10 SF-O28 S2 37.76723 -122.4108
#> # ... with 97 more rows
Answer 1 (score: 0)
Yes. The easiest way is probably to use the tidyr package. Here is a one-liner:
library(data.table) # provides fread()
library(tidyr)
df <- fread("~/Downloads/Bike_Share_Stations.csv") # Read data
extract(df, Geom, into = c('Lat', 'Lon'), '\\((.*), *(.*)\\)', convert = TRUE)
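The pattern string passed to extract is a regular expression with capture groups. It is a simple pattern: it begins with a literal (. The two inner parenthesized groups, (.*), are the two comma-separated coordinates, and only these are extracted. The pattern ends with the matching literal ).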
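To see what the capture groups pick up, the pattern can be tried on a single string in base R. This is only a minimal sketch: the sample value is hypothetical, and its "(lat, lon)" layout is assumed from the patterns used in the answers rather than copied from the dataset.

```r
# Hypothetical sample in the assumed "(lat, lon)" layout of the Geom column
geom <- "(37.7518243814, -122.426627114)"

# Same idea as the pattern above; " *" allows optional spaces after the comma
pattern <- "\\((.*), *(.*)\\)"

# regexec() reports the full match followed by each capture group
m <- regmatches(geom, regexec(pattern, geom))[[1]]

lat <- as.numeric(m[2])  # first capture group
lon <- as.numeric(m[3])  # second capture group
print(c(lat = lat, lon = lon))
```

tidyr::extract applies the same kind of pattern to every row of the column, and convert = TRUE takes the place of the explicit as.numeric calls.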
Here is a subset of the resulting data:
UID Site ID Last Edited Date Lat Lon
1: 1 SF-T24 S1 05/23/2016 12:00:00 AM +0000 37.7518243814 -122.426627114
2: 2 SF-G33 S1 05/23/2016 12:00:00 AM +0000 37.7935049482 -122.392846514
3: 3 SOMA-06A 05/23/2016 12:00:00 AM +0000 37.7897420277 -122.394678441
4: 4 SF-T22 S5 05/23/2016 12:00:00 AM +0000 37.7512809413 -122.431836215
5: 5 SF-R25 S4 05/23/2016 12:00:00 AM +0000 37.7567132725 -122.421038213
---
103: 103 Embr-E 05/23/2016 12:00:00 AM +0000 37.8047749378 -122.403247294
104: 104 SF-N26 S1 05/23/2016 12:00:00 AM +0000 37.7682271629 -122.420291015
105: 105 Market-11B 05/23/2016 12:00:00 AM +0000 37.7922638478 -122.397066071
106: 106 SF-O27 S2 05/23/2016 12:00:00 AM +0000 37.7671609432 -122.415485214
107: 107 SF-T23 S5 05/23/2016 12:00:00 AM +0000 37.7514609421 -122.429135213
Answer 2 (score: 0)
I think the Geom column already contains the latitude/longitude.
library(tidyverse)
df <- df %>%
  mutate(Geom = gsub('[()°]', '', Geom)) %>%
  separate(col = Geom, into = c('Latitude', 'Longitude'), sep = '\\,')
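First, we use gsub('[()°]', '', Geom) to strip the parentheses and any degree symbols, and overwrite the Geom column with the result. Then separate splits the Geom column into the new Latitude and Longitude columns, using the comma as the delimiter (sep = '\\,').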
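Note that separate leaves the new columns as character strings unless convert = TRUE is passed. Below is a base R sketch of the same strip-then-split idea that also converts the pieces to numbers. The two-row data frame is a hypothetical stand-in for the dataset, and the "(lat, lon)" layout of Geom is an assumption.

```r
# Hypothetical two-row stand-in for the dataset; the Geom layout is assumed
df <- data.frame(
  UID  = 1:2,
  Geom = c("(37.75182, -122.4266)", "(37.79350, -122.3928)"),
  stringsAsFactors = FALSE
)

# Strip parentheses and any degree symbols (\u00b0 is the degree sign)
cleaned <- gsub("[()\u00b0]", "", df$Geom)

# Split on the comma plus any whitespace that follows it
parts <- strsplit(cleaned, ",\\s*")

# Convert each piece to numeric and attach the results as new columns
df$Latitude  <- as.numeric(vapply(parts, `[`, character(1), 1))
df$Longitude <- as.numeric(vapply(parts, `[`, character(1), 2))
df$Geom <- NULL
print(df)
```

Splitting on the comma together with trailing whitespace keeps a leading space out of the longitude string before conversion.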