Choosing the right regular expression for coordinates

Date: 2020-10-06 15:08:36

Tags: r regex decimal coordinates dms

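I have coordinates in various formats and am trying to build a more or less generic conversion routine.

To do this, I use regular expressions to parse the individual elements of each string, and then try to pick out the degrees, minutes and seconds by their index of occurrence in the string.

It works for some strings, but not for all. I am fairly sure that my problem is closely related to my limited knowledge of regular expressions.

The question is: does anyone have a better understanding of regex patterns and could help?

I put together a small piece of code to demonstrate the problem. Running the example below shows that I get three components for the first four coordinates and for the last three. The rest (in between) yield only 2 components.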

coords = c("-53°30''30.54'",
       "s55°30' 30.54",
       "55°30'30.54n",
       "0°1 0.5S",
       "-0°30'30''s",
       "S55 30 30",
       "-55°30'30''",
       "-55° 30' 30''",
       "-55°   30'   30",
       "-55 sometimes with text rests 30 30''",
       "55°30'30,54S",
       "S55° 30' 30,54",
       "-55° 30' 30.54''"
       )

for (i in 1:length (coords)) {
    pattern   <- gregexpr ("[0-9.]+", coords [i])
    print (as.character (unique (unlist (regmatches (coords [i], pattern)))))
}


<Output>
[1] "53"    "30"    "30.54"
[1] "55"    "30"    "30.54"
[1] "55"    "30"    "30.54"
[1] "0"   "1"   "0.5"
[1] "0"  "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30"
[1] "55" "30" "54"
[1] "55" "30" "54"
[1] "55"    "30"    "30.54"
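
An editorial note on the output above, offered as a likely explanation rather than as part of the original question: unique() removes repeated values, so whenever minutes and seconds are both "30" one of them is dropped. In addition, "," is not in the character class, so "30,54" is split into "30" and "54", and the duplicate "30" then disappears as well. The sketch below reruns two of the affected strings with the comma added to the class and without unique(); the variable name parts is only illustrative.

```r
# Two of the strings that lost a component in the run above
coords <- c("-0°30'30''s", "55°30'30,54S")

# Same approach as before, but "," is part of the character class and
# unique() is gone, so a repeated "30" is kept
parts <- regmatches(coords, gregexpr("[0-9.,]+", coords))
print(parts)
```

This keeps three components for both strings, but it still says nothing about the sign or the hemisphere letter, which have to be read from the string separately.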

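The regular expression below is a very impressive monster ;-) However, it still has some problems when the coordinates come in a slightly different format (e.g. dec_deg). In that case the first or second number in the string is not recognized correctly. I have just compiled a list with such coordinates: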

coords = c("-53°30''30.54'",
       "s55°30' 30.54",
       "55°30'30.54n",
       "0°1 0.5S",
       "-0°30'30''s",
       "S55 30 30",
       "-55°30'30''",
       "-55° 30' 30''",
       "-55°   30'   30",
       "-55 sometimes with text rests 30 30''",
       "55°30'30,54S",
       "S55° 30' 30,54",
       "-55° 30' 30.54''",
       "-55.5432 30 30.54",
       "-55.30.30",
       "55.555",
       "55,555S",
       "S55,555",
       "S55.555",
       "55,555°S",
       "55.555°",
       "-55,555",
       "-55.555"
       )

2 answers:

Answer 0: (score: 1)

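We can try using regexec together with regmatches to match exactly three numbers in each line. A "number" is defined here as an integer, or an integer followed by a fractional part (with either a dot or a comma as the decimal separator).

We can then use do.call with rbind to turn the resulting list of vectors into a matrix.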

regex <- "^.*?(-?\\d+(?:[,.]\\d+)?).*?(-?\\d+(?:[,.]\\d+)?).*?(-?\\d+(?:[,.]\\d+)?).*$"
do.call(rbind, lapply(regmatches(coords, regexec(regex, coords)), function(x) x[2:4]))

      [,1]  [,2] [,3]   
 [1,] "-53" "30" "30.54"
 [2,] "55"  "30" "30.54"
 [3,] "55"  "30" "30.54"
 [4,] "0"   "1"  "0.5"  
 [5,] "-0"  "30" "30"   
 [6,] "55"  "30" "30"   
 [7,] "-55" "30" "30"   
 [8,] "-55" "30" "30"   
 [9,] "-55" "30" "30"   
[10,] "-55" "30" "30"   
[11,] "55"  "30" "30,54"
[12,] "55"  "30" "30,54"
[13,] "-55" "30" "30.54"
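
The matrix above still holds strings, the hemisphere letters are gone, and a numeric "-0" would lose its sign. Because the pattern insists on three numbers, a decimal-degree string with a single number, such as "55.555", can only match if that number is split apart. The following is a minimal sketch of one way around this, not the answerer's method: it extracts however many numbers are present (up to three), treats missing minutes and seconds as 0, and reads the sign from the string itself. The helper name dms_to_decimal and the rule "a leading minus or an S/W letter at either end means negative" are assumptions made for illustration.

```r
# Hypothetical helper (not from the answers): parse one coordinate string
# into signed decimal degrees. Missing minutes/seconds are assumed to be 0.
dms_to_decimal <- function(x) {
  # Unsigned number: digits with an optional fraction after "." or ","
  num <- "[0-9]+(?:[.,][0-9]+)?"
  tokens <- regmatches(x, gregexpr(num, x, perl = TRUE))[[1]]

  # Normalise decimal commas and convert to numeric
  vals <- as.numeric(sub(",", ".", tokens, fixed = TRUE))
  if (length(vals) == 0 || length(vals) > 3) return(NA_real_)
  vals <- c(vals, 0, 0)[1:3]  # pad to degrees, minutes, seconds

  # Assumption: a leading "-" or an S/W letter at the start or end of the
  # string marks a negative coordinate. Free text that happens to end in
  # "s" or "w" would be misread by this simple rule.
  negative <- grepl("^[[:space:]]*(-|[sSwW])", x) ||
    grepl("[sSwW][[:space:]]*$", x)

  magnitude <- vals[1] + vals[2] / 60 + vals[3] / 3600
  if (negative) -magnitude else magnitude
}

coords <- c("-55°30'30''", "S55,555", "55.555°", "0°1 0.5S")
print(vapply(coords, dms_to_decimal, numeric(1)))
```

Taking the sign from the string rather than from the first number is what keeps "-0°30'30''s" negative. Strings such as "-55.30.30" remain ambiguous under this sketch and would need a rule of their own.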

Answer 1: (score: 1)

stringr seems to work fine here...

library(stringr)
str_extract_all(str_replace_all(coords, ",", "."), "[0-9.\\-]+")

[[1]]
[1] "-53"   "30"    "30.54"

[[2]]
[1] "55"    "30"    "30.54"

[[3]]
[1] "55"    "30"    "30.54"

[[4]]
[1] "0"   "1"   "0.5"

[[5]]
[1] "-0" "30" "30"

[[6]]
[1] "55" "30" "30"

[[7]]
[1] "-55" "30"  "30" 

[[8]]
[1] "-55" "30"  "30" 

[[9]]
[1] "-55" "30"  "30" 

[[10]]
[1] "-55" "30"  "30" 

[[11]]
[1] "55"    "30"    "30.54"

[[12]]
[1] "55"    "30"    "30.54"

[[13]]
[1] "-55"   "30"    "30.54"
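
One caveat about the character class used above: "[0-9.\\-]+" matches any run of digits, dots and hyphens, so a string like "-55.30.30" comes back as a single token, and a stray "." or "-" in free text would be returned as a token of its own. A possible tightening, sketched here as a suggestion rather than as part of the answer, is to describe a whole number explicitly: an optional minus, digits, and at most one fractional part. The variable name tokens is only illustrative.

```r
library(stringr)

coords <- c("-55.30.30", "55,555S", "-55° 30' 30.54''")

# Normalise decimal commas, then extract whole numbers only:
# optional minus, digits, and at most one fractional part
tokens <- str_extract_all(str_replace_all(coords, ",", "."),
                          "-?\\d+(?:\\.\\d+)?")
print(tokens)
```

With this pattern "-55.30.30" is split into "-55.30" and "30". Whether that is the intended reading of such a string is a separate question that the pattern alone cannot settle.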