Remove the space after +44 and the quotes around lat/long in a JSON string with a regular expression

Time: 2015-07-07 10:16:30

Tags: regex json

In the sample snippet below, I have some JSON (more than 1,400 entries) that I need to edit. I need to do two things:

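  1. In this example line: "phone": "+44 2079693900", I need to remove the space between +44 and 2079693900, and do so for every record. The result should be: "+442079693900"

  2. For latitude and longitude, I need to get rid of the double quotes around the numbers, because the API I'm using only accepts these values as floats. Example: "latitude": "51.51736" needs to become: "latitude": 51.51736

I'm most familiar with Ruby and have done some JSON parsing with it in the past, but I think regex would be the best tool for a basic data-cleanup task like this. I've consulted regex101.com and regular-expressions.info, but I'm still stuck. Thanks in advance!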

    [
      {
        "id": "101756",
        "name": "1 Lombard Street",
        "email": "reception@1lombardstreet.com",
        "website": "http://www.1lombardstreet.com",
        "location": {
          "latitude": "51.5129",
          "longitude": "-0.089",
          "address": {
            "line1": "1 Lombard Street",
            "line2": "",
            "line3": "",
            "postcode": "EC3V 9AA",
            "city": "London",
            "country": "UK"
          }
        }
      },
      {
        "id": "105371",
        "name": "108 Brasserie",
        "phone": "+44 2079693900",
        "email": "enquiries@108marylebonelane.com",
        "website": "http://www.108brasserie.com",
        "location": {
          "latitude": "51.51795",
          "longitude": "-0.15079",
          "address": {
            "line1": "108 Marylebone Lane",
            "line2": "",
            "line3": "",
            "postcode": "W1U 2QE",
            "city": "London",
            "country": "UK"
          }
        }
      },
      {
        "id": "108701",
        "name": "1901 Restaurant",
        "phone": "+44 2076187000",
        "email": "london.restres@andaz.com",
        "website": "http://www.andazdining.com",
        "location": {
          "latitude": "51.51736",
          "longitude": "-0.08123",
          "address": {
            "line1": "Andaz Hotel",
            "line2": "40 Liverpool Street",
            "line3": "",
            "postcode": "EC2M 7QN",
            "city": "London",
            "country": "UK"
          }
        }
      },
      {
        "id": "102190",
        "name": "2 Bridge Place",
        "phone": "+44 2078028555",
        "email": "fb@dtlondonvictoria.com",
        "website": "http://crimsonhotels.comdoubletreelondonvictoriadiningpre-theatre-dining",
        "location": {
          "latitude": "51.49396",
          "longitude": "-0.14343",
          "address": {
            "line1": "2 Bridge Place",
            "line2": "Victoria",
            "line3": "",
            "postcode": "SW1V 1QA",
            "city": "London",
            "country": "UK"
          }
        }
      },
      {
        "id": "102063",
        "name": "2 Veneti",
        "phone": "+44 2076370789",
        "email": "2veneti@btconnect.com",
        "website": "http://www.2veneti.com",
        "location": {
          "latitude": "51.5168",
          "longitude": "-0.14673",
          "address": {
            "line1": "10 Wigmore Street",
            "line2": "",
            "line3": "",
            "postcode": "W1U 2RD",
            "city": "London",
            "country": "UK"
          }
        }
      },
    

1 Answer:

Answer 0 (score: 1)

You can use the following regular expression:

("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"

with the following replacement:

$1$2$3

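The idea is to capture what we want to keep and leave what we don't want uncaptured, then use backreferences to restore the substrings we want to keep.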
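Since the question mentions Ruby, here is a minimal sketch of applying the pattern and replacement above with `String#gsub`. The sample text and variable names are made up for illustration; Ruby writes the backreferences as `\1\2\3` rather than `$1$2$3`.

```ruby
require "json"

# Pattern taken from the answer above; the sample text is invented.
pattern = /("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"/

sample = <<~JSON
  {
    "phone": "+44 2079693900",
    "location": {
      "latitude": "51.51795",
      "longitude": "-0.15079"
    }
  }
JSON

# Single quotes keep \1, \2 and \3 intact so gsub treats them as backreferences.
# A group that did not take part in a match contributes an empty string.
cleaned = sample.gsub(pattern, '\1\2\3')
puts cleaned

# Parsing the result confirms that the coordinates are now numbers.
data = JSON.parse(cleaned)
p data["location"]["latitude"].class
```

For the real file, the same `gsub` call could be run on the string returned by `File.read`, assuming the whole file fits comfortably in memory.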

Regex explanation

The pattern consists of 2 alternatives joined by the | alternation operator:

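  1. ("phone":\s*"\+44)\s+
    • ("phone":\s*"\+44) - the first capturing group, matching the literal "phone": followed by optional whitespace, then "+44 literally (including the opening quote)
    • \s+ - one or more whitespace characters, which we drop
  2. ("(?:latitude|longitude)":\s*)"([^"]+)"
    • ("(?:latitude|longitude)":\s*) - the second capturing group, matching either "latitude": or "longitude": followed by zero or more whitespace characters
    • " - a literal ", which we drop
    • ([^"]+) - the third capturing group, matching one or more characters other than " (this is the value we keep)
    • " - again, a literal " that we drop

See the demo.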
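To see which capturing groups take part in each alternative, the pattern can be matched against one line at a time. This is only an illustrative sketch in Ruby; the variable names are hypothetical and the two input strings are modelled on the question's data.

```ruby
pattern = /("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"/

phone = pattern.match('"phone": "+44 2076187000"')
coord = pattern.match('"latitude": "51.51736"')

# First alternative: only group 1 is set, groups 2 and 3 are nil.
p phone.captures

# Second alternative: group 1 is nil, groups 2 and 3 hold the key and the bare number.
p coord.captures
```

Note that `([^"]+)` accepts any characters other than a double quote, so the replacement assumes every latitude and longitude value really is numeric; a non-numeric value would be unquoted too and leave the JSON invalid.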