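In the sample snippet below, I have some JSON that needs editing (over 1400 entries). I need to do two things:
In this example line: "phone": "+44 2079693900", I need to remove the space between +44 and 2079693900, and do so for every record. Result: "phone": "+442079693900"
For latitude and longitude, I need to get rid of the double quotes around the numbers, because the API I am using only accepts these values as floats. Example: "latitude": "51.51736" needs to become: "latitude": 51.51736
I am most familiar with Ruby and have done some JSON parsing with it in the past, but I figured regex would be the best tool for this kind of basic data-cleaning task. I have been through regex101.com and regular-expressions.info, but at this point I am quite stuck. Thanks in advance!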
[
{
"id": "101756",
"name": "1 Lombard Street",
"email": "reception@1lombardstreet.com",
"website": "http://www.1lombardstreet.com",
"location": {
"latitude": "51.5129",
"longitude": "-0.089",
"address": {
"line1": "1 Lombard Street",
"line2": "",
"line3": "",
"postcode": "EC3V 9AA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "105371",
"name": "108 Brasserie",
"phone": "+44 2079693900",
"email": "enquiries@108marylebonelane.com",
"website": "http://www.108brasserie.com",
"location": {
"latitude": "51.51795",
"longitude": "-0.15079",
"address": {
"line1": "108 Marylebone Lane",
"line2": "",
"line3": "",
"postcode": "W1U 2QE",
"city": "London",
"country": "UK"
}
}
},
{
"id": "108701",
"name": "1901 Restaurant",
"phone": "+44 2076187000",
"email": "london.restres@andaz.com",
"website": "http://www.andazdining.com",
"location": {
"latitude": "51.51736",
"longitude": "-0.08123",
"address": {
"line1": "Andaz Hotel",
"line2": "40 Liverpool Street",
"line3": "",
"postcode": "EC2M 7QN",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102190",
"name": "2 Bridge Place",
"phone": "+44 2078028555",
"email": "fb@dtlondonvictoria.com",
"website": "http://crimsonhotels.comdoubletreelondonvictoriadiningpre-theatre-dining",
"location": {
"latitude": "51.49396",
"longitude": "-0.14343",
"address": {
"line1": "2 Bridge Place",
"line2": "Victoria",
"line3": "",
"postcode": "SW1V 1QA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102063",
"name": "2 Veneti",
"phone": "+44 2076370789",
"email": "2veneti@btconnect.com",
"website": "http://www.2veneti.com",
"location": {
"latitude": "51.5168",
"longitude": "-0.14673",
"address": {
"line1": "10 Wigmore Street",
"line2": "",
"line3": "",
"postcode": "W1U 2RD",
"city": "London",
"country": "UK"
}
}
},
Answer 0 (score: 1)
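You can use the following regex:
("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"
with the following replacement:
$1$2$3
The idea is to capture what we want to keep and leave what we want to drop outside the capturing groups, then use backreferences in the replacement to restore the kept substrings. Only one alternative takes part in any given match, and in most regex flavors a backreference to a group that did not participate expands to an empty string, so $1$2$3 is safe for both cases.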
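Since the question mentions Ruby, here is a minimal sketch of how the replacement could be applied there. It is an illustration under a few assumptions rather than a definitive implementation: the constant `CLEANUP_PATTERN` and the method `clean_json_text` are hypothetical names, and Ruby writes the backreferences as `\1\2\3` instead of `$1$2$3`.

```ruby
# The pattern from the answer, unchanged.
CLEANUP_PATTERN = /("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"/

# Returns a cleaned copy of the JSON text (hypothetical helper name).
# In a single-quoted Ruby string, \1 \2 \3 are passed to gsub as backreferences;
# groups that did not take part in a match are substituted as empty strings.
def clean_json_text(text)
  text.gsub(CLEANUP_PATTERN, '\1\2\3')
end

sample = <<~TEXT
  "phone": "+44 2079693900",
  "latitude": "51.51795",
  "longitude": "-0.15079",
TEXT

puts clean_json_text(sample)
```

For the full file, the same method could be applied to the string returned by `File.read` and the result written back with `File.write`. Because the regex works on raw text rather than on the parsed structure, it is worth running the result through a JSON parser afterwards to confirm it is still valid.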
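Regex explanation:
The pattern contains 2 alternatives joined with the | alternation operator:
("phone":\s*"\+44)\s+ :
  ("phone":\s*"\+44) - the first capturing group, matching the literal "phone":, optional whitespace, then the literal "+44 (opening quote included)
  \s+ - one or more whitespace characters, which we remove
("(?:latitude|longitude)":\s*)"([^"]+)" :
  ("(?:latitude|longitude)":\s*) - the second capturing group, matching "latitude": or "longitude": followed by 0 or more whitespace characters
  " - a literal ", which we drop
  ([^"]+) - the third capturing group, matching one or more characters other than " (this is the value we keep)
  " - again, a literal " that we drop
See the demo.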
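To make the breakdown above concrete, the following sketch prints the capture groups for a few lines taken from the sample data. The names `EXPLAINED_PATTERN` and `capture_groups` are hypothetical, chosen only for this illustration.

```ruby
# Same pattern as in the answer, under a separate (hypothetical) constant name.
EXPLAINED_PATTERN = /("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"/

# Returns the three capture groups of the first match in the line,
# or nil when the pattern does not match at all.
def capture_groups(line)
  match = EXPLAINED_PATTERN.match(line)
  match && match.captures
end

# First alternative: only group 1 is set, groups 2 and 3 are nil.
p capture_groups('"phone": "+44 2076187000",')

# Second alternative: group 1 is nil, groups 2 and 3 are set.
p capture_groups('"latitude": "51.51736",')

# Any other field is not matched, so it is left untouched by the replacement.
p capture_groups('"city": "London",')
```

The nil groups are the ones that expand to nothing in the replacement, which is why the three backreferences can simply be concatenated.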