Regex: extracting multiple substrings, disregarding variant matches

Time: 2016-05-10 11:16:24

Tags: regex

","(.*?) (Railway Station)

I need to extract the station name and the latitude/longitude from a list of raw data in the following format:

"22238","Flinders Street Railway Station (Melbourne City)","-37.8183051340585","144.966964346166"
"22239","North Melbourne Railway Station (West Melbourne)","-37.8063098353473","144.94151017321"
"22240","Footscray Railway Station (Footscray)","-37.8014134330439","144.902020057667"
"22241","Sunshine Railway Station (Sunshine)","-37.7885363319246","144.832878204953"

The desired output is:

Flinders Street -37.8183051340585,144.966964346166
North Melbourne -37.8063098353473,144.94151017321
Footscray -37.8014134330439,144.902020057667
Sunshine -37.7885363319246,144.832878204953

I'd appreciate some suggestions on how to approach this problem.

Using ","(.*?) (Railway Station) extracts the station name, but the result also contains two other unwanted matches:

","Flinders Street Railway Station
Flinders Street
Railway Station

In the above, how can I match Flinders Street alone?

Secondly, to retrieve the latitude and longitude, should I run a separate regex call, or can this be done with a single search pattern?

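Finally, should I strip the quotes from the latitude/longitude with the regex itself, or afterwards in code, splitting the problem into two steps?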

For example, from: "-37.8183051340585","144.966964346166"
to:
  -37.8183051340585,144.966964346166

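Or am I looking at this from the wrong angle? Would it be simpler to split the text using , as the delimiter and then focus on the smaller substrings with specific patterns? What are your thoughts?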
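For reference, the comma-splitting idea could be sketched as follows. The question does not name a language, so Python and its standard csv module are assumptions here, and the function name stations is hypothetical. The csv reader strips the surrounding quotes, so no separate quote-removal step is needed.

```python
import csv
import io

RAW = '''"22238","Flinders Street Railway Station (Melbourne City)","-37.8183051340585","144.966964346166"
"22239","North Melbourne Railway Station (West Melbourne)","-37.8063098353473","144.94151017321"
"22240","Footscray Railway Station (Footscray)","-37.8014134330439","144.902020057667"
"22241","Sunshine Railway Station (Sunshine)","-37.7885363319246","144.832878204953"
'''

def stations(text):
    """Split each record on commas (quote-aware) and trim the label."""
    results = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        _station_id, label, lat, lon = row
        # Keep only the text before " Railway Station".
        name = label.split(" Railway Station")[0]
        results.append(f"{name} {lat},{lon}")
    return results

for entry in stations(RAW):
    print(entry)
```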

2 Answers:

Answer 0: (score: 2)

You can do it like this:

"([^"]+)\s+Railway\sStation[^,]+,"([^"]+)","([^"]+)"$

\1: Flinders Street
\2: -37.8183051340585
\3: 144.966964346166
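As a rough illustration of how this pattern could be applied, here is a minimal sketch. Python is an assumption (the thread does not specify a language) and the helper name extract is hypothetical. Each line is searched separately so that $ anchors at the end of that record.

```python
import re

# Pattern from this answer: group 1 = station name, groups 2 and 3 = coordinates.
PATTERN = re.compile(r'"([^"]+)\s+Railway\sStation[^,]+,"([^"]+)","([^"]+)"$')

RAW = '''"22238","Flinders Street Railway Station (Melbourne City)","-37.8183051340585","144.966964346166"
"22239","North Melbourne Railway Station (West Melbourne)","-37.8063098353473","144.94151017321"
"22240","Footscray Railway Station (Footscray)","-37.8014134330439","144.902020057667"
"22241","Sunshine Railway Station (Sunshine)","-37.7885363319246","144.832878204953"'''

def extract(text):
    """Return 'name lat,lon' for every line the pattern matches."""
    results = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            results.append(f"{m.group(1)} {m.group(2)},{m.group(3)}")
    return results

for entry in extract(RAW):
    print(entry)
```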

Demo

Answer 1: (score: 1)

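Your regex ","(.*?) (Railway Station) matches, as you said in your example, ","Flinders Street Railway Station. It captures Flinders Street and Railway Station. Note the difference between matching and capturing. The match is everything the regex matched (of course). A capture is a part of the match that is enclosed in (). In your pattern, (.*?) and (Railway Station) give you two capture groups.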
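The distinction can be seen directly in code. This is a small sketch assuming Python's re module (the question does not name a language): group(0) is the whole match, while groups() returns only the parenthesised captures.

```python
import re

line = '"22238","Flinders Street Railway Station (Melbourne City)","-37.8183051340585","144.966964346166"'

# The pattern from the question, with two capture groups.
m = re.search(r'","(.*?) (Railway Station)', line)

whole_match = m.group(0)  # everything the regex matched
captures = m.groups()     # only the parts inside ( )

print(whole_match)
print(captures)
```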

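To match Railway Station without capturing it, remove the parentheses, e.g. ","(.*?) Railway Station. This matches the same text as your pattern but captures only the station name. Then, to match the city without capturing it, add \([^)]*\), which matches the literal parentheses and everything between them.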

Finally, adding ","([^"]*)","([^"]*)" captures the coordinates in two more capture groups, giving the final

","(.*?) Railway Station \([^)]*\)","([^"]*)","([^"]*)"

which does the job.
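A minimal sketch of using the final pattern to produce the requested output, assuming Python (not stated in the question); the helper name format_station is hypothetical. The three captures come back without quotes, so joining them needs no extra cleanup step.

```python
import re

# Final pattern from this answer: captures name, latitude, longitude.
FINAL = re.compile(r'","(.*?) Railway Station \([^)]*\)","([^"]*)","([^"]*)"')

LINES = [
    '"22238","Flinders Street Railway Station (Melbourne City)","-37.8183051340585","144.966964346166"',
    '"22239","North Melbourne Railway Station (West Melbourne)","-37.8063098353473","144.94151017321"',
    '"22240","Footscray Railway Station (Footscray)","-37.8014134330439","144.902020057667"',
    '"22241","Sunshine Railway Station (Sunshine)","-37.7885363319246","144.832878204953"',
]

def format_station(line):
    """Return 'name lat,lon', or None if the line does not match."""
    m = FINAL.search(line)
    if m is None:
        return None
    name, lat, lon = m.groups()
    return f"{name} {lat},{lon}"

for line in LINES:
    print(format_station(line))
```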

Check it out here at regex101

Regards