R - Extract coordinates from a string in a data frame

Time: 2018-04-26 07:53:30

Tags: r dataframe extract

I have data like this in an R data frame. All of it sits in a single column named SHAPE (the following is only an excerpt):

  • " POINT(16.361866982751053 48.177421074512125)"
  • " POINT(16.30410258091979 48.16069903617549)"
  • " POINT(16.226971074542572 48.20539106235006)"
  • " POINT(16.36781410799229 48.25479849185693)"

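I would like to extract the coordinates so that they end up in one column "X" and one column "Y" of my data frame, in numeric format. The challenge is that the numbers do not always have the same length.

The result should look like this:

Column X: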

  • 16.361866982751053
  • 16.30410258091979
  • 16.226971074542572
  • 16.36781410799229

Column Y:

  • 48.177421074512125
  • 48.16069903617549
  • 48.20539106235006
  • 48.25479849185693

2 answers:

Answer 0: (score: 2)

Just to offer another solution, this time using strsplit() and lapply():

df <- data.frame(SHAPE = c("POINT (16.361866982751053 48.177421074512125)",
                           "POINT (16.30410258091979 48.16069903617549)",
                           "POINT (16.226971074542572 48.20539106235006)",
                           "POINT (16.36781410799229 48.25479849185693)"),
                 stringsAsFactors = FALSE)

# Split each string at "(" and ")"; the second piece holds "x y".
# Splitting that piece on the space gives the two coordinates (still as character).
df[c("x", "y")] <- do.call(rbind, lapply(strsplit(df$SHAPE, "[()]"), function(col) {
  unlist(strsplit(col[2], " "))
}))
df

This yields

                                          SHAPE                  x                  y
1 POINT (16.361866982751053 48.177421074512125) 16.361866982751053 48.177421074512125
2   POINT (16.30410258091979 48.16069903617549)  16.30410258091979  48.16069903617549
3  POINT (16.226971074542572 48.20539106235006) 16.226971074542572  48.20539106235006
4   POINT (16.36781410799229 48.25479849185693)  16.36781410799229  48.25479849185693
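
The question asks for numeric columns, while the split pieces above are character strings. A minimal sketch of the same strsplit() idea with an as.numeric() conversion added might look like this (the helper names `inner` and `parts` are introduced here for illustration only):

```r
# Two rows of the sample data from the answer above
df <- data.frame(SHAPE = c("POINT (16.361866982751053 48.177421074512125)",
                           "POINT (16.30410258091979 48.16069903617549)"),
                 stringsAsFactors = FALSE)

# Take the text between the parentheses, then split it on the space
inner <- sapply(strsplit(df$SHAPE, "[()]"), `[`, 2)
parts <- do.call(rbind, strsplit(inner, " "))

# as.numeric() turns the character pieces into doubles
df$x <- as.numeric(parts[, 1])
df$y <- as.numeric(parts[, 2])

str(df)
# Printing rounds to 7 significant digits by default; ask for more to inspect the values
print(df$x, digits = 17)
```

The stored values are doubles, so only the printed representation is shortened by default, not the data itself.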

Answer 1: (score: 1)

Using sub:

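point <- "POINT (16.361866982751053 48.177421074512125)"
# Capture the first number (x) and replace the whole string with it
x <- sub("POINT \\((\\d+\\.\\d+) \\d+\\.\\d+\\)", "\\1", point, perl=TRUE)
# Capture the second number (y) the same way
y <- sub("POINT \\(\\d+\\.\\d+ (\\d+\\.\\d+)\\)", "\\1", point, perl=TRUE)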
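
The pattern above expects exactly "POINT (" with one space, whereas the strings in the question start with a space and have no space before the parenthesis. As a hedged sketch, a more tolerant pattern applied to a whole vector could look like the following; the `shape` vector and the `pattern` variable are illustrative names, and allowing a leading minus sign is an assumption about data that may contain negative coordinates:

```r
# One string in the question's format, one in the answers' format
shape <- c(" POINT(16.361866982751053 48.177421074512125)",
           "POINT (16.30410258091979 48.16069903617549)")

# \\s* tolerates leading whitespace and an optional space before "("
pattern <- "^\\s*POINT\\s*\\((-?[0-9.]+)\\s+(-?[0-9.]+)\\)\\s*$"

# sub() is vectorised, so it handles the whole column at once
x <- as.numeric(sub(pattern, "\\1", shape, perl = TRUE))
y <- as.numeric(sub(pattern, "\\2", shape, perl = TRUE))
```

If a string does not match the pattern, sub() returns it unchanged and as.numeric() then gives NA with a warning, which makes malformed rows easy to spot.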
