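I'm sure this has a very straightforward answer. I'm new to R and still finding my way around its data types. I'm currently importing data from MySQL, but I can't figure out how to split a column of WKT point values into separate coordinate columns.
I'm running the following statement, which queries a shapefile stored in the database.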
library(RMySQL)  # provides MySQL(), dbConnect() and dbGetQuery()
mydb = dbConnect(MySQL(), user='root', password='mrwolf',dbname='jtw_schema', host='localhost')
strSQL = "select sa2_main11, astext(shape) as geom from centroids
where (gcc_name11 = 'Greater Sydney')
and (sa4_name11 != 'Central Coast')
and (sa4_name11 not like '%Outer West%' )
and (sa4_name11 not like '%Baulkham Hills%')
and (sa4_name11 not like '%Outer South West%')"
dfCord = dbGetQuery(mydb, strSQL)
The result is:
sa2_main11 geom
1 116011303 POINT(150.911550090995 -33.7568493603359)
2 116011304 POINT(150.889312296536 -33.7485997378428)
3 116011305 POINT(150.898781823296 -33.7817496751367)
4 116011306 POINT(150.872046414103 -33.7649465663774)
....
What I want to achieve is:
sa2_main11 Long Lat
1 116011303 150.911550090995 -33.7568493603359
2 116011304 150.889312296536 -33.7485997378428
3 116011305 150.898781823296 -33.7817496751367
4 116011306 150.872046414103 -33.7649465663774
....
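Apologies if this is a very simple question, but I have searched for ways to split WKT data and couldn't find any examples. I could try string searching or something similar, but I suspect there is a more "R-ish" way to do it.
Answer 0 (score: 2)
Not a direct answer, more of a workaround. (This assumes the geom column is a character vector; I'm not sure whether that is what you are looking for.)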
df <- data.frame(sa2_main11 = c("a","b","c", "d"),
geom = c("POINT(150.911550090995 -33.7568493603359)",
"POINT(150.889312296536 -33.7485997378428)",
"POINT(150.898781823296 -33.7817496751367)",
"POINT(150.872046414103 -33.7649465663774)"), stringsAsFactors = FALSE)
# Longitude: the first signed decimal number in the WKT string
df$longitude <- as.numeric(gsub(".*?([-]*[0-9]+[.][0-9]+).*", "\\1", df$geom))
# Latitude: the signed decimal number that follows the space
df$latitude <- as.numeric(gsub(".* ([-]*[0-9]+[.][0-9]+).*", "\\1", df$geom))
# Drop the original WKT column
df$geom <- NULL
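A base-R alternative to the two gsub calls, as a minimal sketch: utils::strcapture() (available since R 3.4.0) applies one regular expression with two capture groups and returns a data frame whose column types come from the proto argument. The pattern and the column names lon and lat are my own choices for illustration, and the sketch assumes every value is a plain two-dimensional POINT.

```r
# Sample data copied from the question; in practice this would be dfCord
df <- data.frame(
  sa2_main11 = c(116011303, 116011304),
  geom = c("POINT(150.911550090995 -33.7568493603359)",
           "POINT(150.889312296536 -33.7485997378428)"),
  stringsAsFactors = FALSE
)

# Assumed pattern: "POINT(", a signed decimal, spaces, a signed decimal, ")"
wkt_point <- "^POINT[(](-?[0-9.]+)[ ]+(-?[0-9.]+)[)]$"

# Each capture group becomes one numeric column of the result
coords <- utils::strcapture(
  wkt_point, df$geom,
  proto = data.frame(lon = numeric(), lat = numeric())
)

# Keep the identifier column and attach the parsed coordinates
result <- cbind(df["sa2_main11"], coords)
```

Values that do not match the pattern come back as NA, which makes unexpected geometry types easy to spot.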
Answer 1 (score: 1)
If you get df from the database as a data.frame, this works for your dataset.
df <- data.frame(sa2_main11 = c(116011303, 116011304, 116011305, 116011306),
geom = c("POINT(150.911550090995 -33.7568493603359)",
"POINT(150.889312296536 -33.7485997378428)",
"POINT(150.898781823296 -33.7817496751367)",
"POINT(150.872046414103 -33.7649465663774)"))
# Strip the "POINT" prefix and the parentheses, leaving "x y"
geom <- sub(df$geom, pattern = "POINT", replacement = "")
geom <- sub(geom, pattern = "[(]", replacement = "")
geom <- sub(geom, pattern = "[)]", replacement = "")
lonlat <- unlist(strsplit(geom, split = " "))
# WKT stores x (longitude) first and y (latitude) second
df$long <- lonlat[seq(1, length(lonlat), 2)]
df$lat <- lonlat[seq(2, length(lonlat), 2)]
df
# sa2_main11 geom long lat
# 1 116011303 POINT(150.911550090995 -33.7568493603359) 150.911550090995 -33.7568493603359
# 2 116011304 POINT(150.889312296536 -33.7485997378428) 150.889312296536 -33.7485997378428
# 3 116011305 POINT(150.898781823296 -33.7817496751367) 150.898781823296 -33.7817496751367
# 4 116011306 POINT(150.872046414103 -33.7649465663774) 150.872046414103 -33.7649465663774
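One thing to keep in mind with this approach: strsplit returns character strings, so the new columns are text until they are converted. A small sketch of the same split that ends with numeric vectors, assuming each value is a POINT with exactly two space-separated coordinates; the names parts, m, lon and lat are illustrative only.

```r
geom <- c("POINT(150.911550090995 -33.7568493603359)",
          "POINT(150.889312296536 -33.7485997378428)")

# Remove "POINT" and both parentheses, then split each "x y" pair on the space
parts <- strsplit(gsub("POINT|[()]", "", geom), " ", fixed = TRUE)

# One row per point, two columns (x, y), converted from character to numeric
m <- matrix(as.numeric(unlist(parts)), ncol = 2, byrow = TRUE)
lon <- m[, 1]
lat <- m[, 2]
```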
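Answer 2 (score: 0)
In the end I managed to separate lat and long by changing the SQL query as shown below, in particular with the SUBSTR command. That seemed to make more sense than cleaning it up in R.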
select sa2_main11, substr(ASTEXT(shape), 7, 12) as lon,
case
when ltrim(substr(ASTEXT(shape), 23, 12)) > 0
then ltrim(substr(ASTEXT(shape), 23, 10)) * -1
else ltrim(substr(ASTEXT(shape), 23, 12))
end
as lat from centroids
This produced the following output:
sa2_main11, lon, lat
'116011303', '150.91155009', '-33.7568493'
'116011304', '150.88931229', '-33.7485997'
'116011305', '150.89878182', '-33.7817496'
'116011306', '150.87204641', '-33.7649465'
'116011307', '150.93909408', '-33.7617792'
Many thanks for the suggestions; they all helped me understand R.
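As a possible alternative to fixed character offsets, MySQL's spatial functions can return the coordinates as numbers directly, which avoids truncating digits. This is only a sketch: it assumes shape is a POINT geometry column and a MySQL version that provides ST_X() and ST_Y() (older releases name them X() and Y()). The table and column names come from the question, and the where clause is shortened for illustration.

```r
# Build the query text; ST_X returns the x (longitude) value of a point
# and ST_Y returns the y (latitude) value
strSQL <- paste(
  "select sa2_main11, ST_X(shape) as lon, ST_Y(shape) as lat",
  "from centroids",
  "where gcc_name11 = 'Greater Sydney'"
)

# With the connection from the question this would be run as:
# dfCord <- dbGetQuery(mydb, strSQL)
```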