Special expressions like \w inside bracket expressions []

Date: 2019-03-13 22:52:33

Tags: regex bash grep

I'm trying to extract data from JSON using extended grep. The regex I'm using works on my regexr instance, but for some reason it doesn't work in bash.

I've tried a lot of things, in particular a bare double dash (--) and various small modifications to the regex for escaping.

#!/bin/bash
networks='{ "networks": [ { "admin_state_up": true, "availability_zone_hints": [], "availability_zones": [], "created_at": "2019-03-12T23:45:13Z", "description": "", "id": "7188504a-72cb-4590-a9b0-414732017837", "ipv4_address_scope": null, "ipv6_address_scope": null, "is_default": false, "mtu": 1450, "name": "BLUE", "port_security_enabled": true, "project_id": "187d635aec4c43fe8e8918afb3a5c82e", "provider:network_type": "vxlan", "provider:physical_network": null, "provider:segmentation_id": 86, "revision_number": 2, "router:external": false, "shared": false, "status": "ACTIVE", "subnets": [], "tags": [], "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e", "updated_at": "2019-03-12T23:45:13Z" }, { "admin_state_up": true, "availability_zone_hints": [], "availability_zones": [], "created_at": "2019-03-12T23:45:13Z", "description": "", "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae", "ipv4_address_scope": null, "ipv6_address_scope": null, "is_default": false, "mtu": 1450, "name": "RED", "port_security_enabled": true, "project_id": "187d635aec4c43fe8e8918afb3a5c82e", "provider:network_type": "vxlan", "provider:physical_network": null, "provider:segmentation_id": 108, "revision_number": 2, "router:external": false, "shared": false, "status": "ACTIVE", "subnets": [], "tags": [], "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e", "updated_at": "2019-03-12T23:45:13Z" }, { "admin_state_up": true, "availability_zone_hints": [], "availability_zones": [], "created_at": "2019-03-12T23:45:13Z", "description": "", "id": "1eb6647e-869e-4e83-9468-43e2c320bccc", "ipv4_address_scope": null, "ipv6_address_scope": null, "is_default": false, "mtu": 1450, "name": "public", "port_security_enabled": true, "project_id": "187d635aec4c43fe8e8918afb3a5c82e", "provider:network_type": "vxlan", "provider:physical_network": null, "provider:segmentation_id": 32, "revision_number": 2, "router:external": false, "shared": false, "status": "ACTIVE", "subnets": [], "tags": [], "tenant_id": 
"187d635aec4c43fe8e8918afb3a5c82e", "updated_at": "2019-03-12T23:45:13Z" } ] }'
result=`echo $networks | grep -oE '"(id|name)": "([\w+-]+)"'`
echo $result

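The code above doesn't work, but if I switch to the following regex, it does (although only the name fields match). I also need it to extract the id field, so that I can use the \2 backreference (group 2) to pull out both the ID and the name.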

grep -oE '"(id|name)": "(\w+)"'
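To see why the (\w+) version only appears to work, here is a minimal sketch on a trimmed, hypothetical one-object sample, assuming GNU grep: the name value matches, but the hyphenated id value does not.

```shell
#!/bin/bash
# Trimmed, hypothetical sample: one network object with only the id and name keys.
sample='{ "id": "7188504a-72cb-4590-a9b0-414732017837", "name": "BLUE" }'

# \w+ covers letters, digits and underscore but not "-", so after "7188504a"
# the regex needs a closing quote, finds "-" instead, and the id never matches.
matches=$(printf '%s\n' "$sample" | grep -oE '"(id|name)": "(\w+)"')
printf '%s\n' "$matches"
```

Only "name": "BLUE" is printed, which is why the id field has to be handled separately.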

Can you help me understand why the script doesn't work?

The full JSON, formatted:

{
  "networks": [{
    "admin_state_up": true,
    "availability_zone_hints": [],
    "availability_zones": [],
    "created_at": "2019-03-12T23:45:13Z",
    "description": "",
    "id": "7188504a-72cb-4590-a9b0-414732017837",
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "is_default": false,
    "mtu": 1450,
    "name": "BLUE",
    "port_security_enabled": true,
    "project_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "provider:network_type": "vxlan",
    "provider:physical_network": null,
    "provider:segmentation_id": 86,
    "revision_number": 2,
    "router:external": false,
    "shared": false,
    "status": "ACTIVE",
    "subnets": [],
    "tags": [],
    "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "updated_at": "2019-03-12T23:45:13Z"
  }, {
    "admin_state_up": true,
    "availability_zone_hints": [],
    "availability_zones": [],
    "created_at": "2019-03-12T23:45:13Z",
    "description": "",
    "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae",
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "is_default": false,
    "mtu": 1450,
    "name": "RED",
    "port_security_enabled": true,
    "project_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "provider:network_type": "vxlan",
    "provider:physical_network": null,
    "provider:segmentation_id": 108,
    "revision_number": 2,
    "router:external": false,
    "shared": false,
    "status": "ACTIVE",
    "subnets": [],
    "tags": [],
    "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "updated_at": "2019-03-12T23:45:13Z"
  }, {
    "admin_state_up": true,
    "availability_zone_hints": [],
    "availability_zones": [],
    "created_at": "2019-03-12T23:45:13Z",
    "description": "",
    "id": "1eb6647e-869e-4e83-9468-43e2c320bccc",
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "is_default": false,
    "mtu": 1450,
    "name": "public",
    "port_security_enabled": true,
    "project_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "provider:network_type": "vxlan",
    "provider:physical_network": null,
    "provider:segmentation_id": 32,
    "revision_number": 2,
    "router:external": false,
    "shared": false,
    "status": "ACTIVE",
    "subnets": [],
    "tags": [],
    "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "updated_at": "2019-03-12T23:45:13Z"
  }]
}

3 Answers:

Answer 0 (score: 2)

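According to man grep:

The Backslash Character and Special Expressions

The symbol \w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]]. ... A bracket expression is a list of characters enclosed by [ and ]. ... To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.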
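The placement rules quoted from the man page can be checked directly. This is a small sketch with a made-up input string, assuming GNU grep: with ] first and - last, both are taken literally.

```shell
#!/bin/bash
# Hypothetical input containing a literal "]" and a literal "-".
input='ab-]cd'

# "]" placed first and "-" placed last in the bracket expression are both
# literal, so the class is the set { ], a, b, - }.
rule_demo=$(printf '%s\n' "$input" | grep -oE '[]ab-]+')
printf '%s\n' "$rule_demo"
```

The longest run drawn from that set is "ab-]"; "c" and "d" are outside the class, so the match stops there.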

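Basically, \w is replaced literally by those characters when the expression is evaluated, which gives you "([[[:alnum:]]+-]+)"; in a standard US locale, that amounts to "([[a-zA-Z0-9]+-]+)".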

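Because a bracket expression is cut off at the first ] it sees (unless that ] is the first element of the bracket expression), the group is only [[[:alnum:]]+, i.e. "one or more digits, letters, and [". That is followed by -]+, meaning "exactly one hyphen, then one or more ]". That is clearly not what you want.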
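The parse described above can be checked by handing the written-out pattern to grep -E. A minimal sketch with hypothetical test strings, assuming GNU grep:

```shell
#!/bin/bash
# The written-out pattern from the explanation above, used as an ERE.
pattern='[[[:alnum:]]+-]+'

# Parsed as: one or more of "[" or alnum, then one "-", then one or more "]".
ok=$(printf '%s\n' 'a-]]' | grep -oE "$pattern")
printf 'matched: %s\n' "$ok"

# An id-like value has a digit, not "]", after its first "-", so nothing matches.
if printf '%s\n' '7188504a-72cb' | grep -qE "$pattern"; then
  echo "unexpected match"
else
  echo "no match"
fi
```

A string that ends in "-]]" matches, while an ordinary hyphenated id does not, consistent with the reading above.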

If you try:

echo $networks | grep -oE '"(id|name)": "([[:alnum:]+-]+)"'

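(that is, [:alnum:] instead of \w, with no nested brackets), the relevant part means: a group, enclosed in double quotes, of one or more digits, letters, hyphens, or plus signs. It outputs: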

"id": "7188504a-72cb-4590-a9b0-414732017837"
"name": "BLUE"
"id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae"
"name": "RED"
"id": "1eb6647e-869e-4e83-9468-43e2c320bccc"
"name": "public"
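The question also asks for group 2 via \2. grep -o prints the whole match rather than a single group, so one option is to pipe its output into sed -E, whose replacement text does support \2. This is a sketch on a trimmed, hypothetical two-object sample, assuming GNU grep and GNU sed:

```shell
#!/bin/bash
# Trimmed, hypothetical sample with the same shape as the JSON in the question.
networks='{ "networks": [ { "id": "7188504a-72cb-4590-a9b0-414732017837", "name": "BLUE" }, { "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae", "name": "RED" } ] }'

# Step 1: grep -o prints each whole match on its own line.
# Step 2: sed -E replaces each line with its second group (\2), i.e. the value.
values=$(printf '%s\n' "$networks" \
  | grep -oE '"(id|name)": "([[:alnum:]+-]+)"' \
  | sed -E 's/^"(id|name)": "([[:alnum:]+-]+)"$/\2/')
printf '%s\n' "$values"
```

This prints the two ids and the two names, one per line, in document order.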

Answer 1 (score: 1)

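Use Perl-compatible (-P) instead of extended (-E) regular expressions; \w then appears to be interpreted as expected, with no escaping problems. Note the -oP: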

result=$( echo $networks | grep -oP '"(id|name)": "([\w+-]+)"' ) ; 
echo $result
"id": "7188504a-72cb-4590-a9b0-414732017837" "name": "BLUE" "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae" "name": "RED" "id": "1eb6647e-869e-4e83-9468-43e2c320bccc" "name": "public"
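With -P, PCRE features are also available. As a variant that goes beyond the answer's command, \K and a lookahead can make grep print only the value (group 2's text) instead of the whole match. This is a sketch on a trimmed, hypothetical sample and assumes GNU grep built with PCRE support:

```shell
#!/bin/bash
# Trimmed, hypothetical sample; requires a grep that supports -P.
networks='{ "id": "7188504a-72cb-4590-a9b0-414732017837", "name": "BLUE" }'

# In a PCRE character class, \w keeps its meaning (letters, digits, "_").
# \K drops everything matched so far from the reported match, and the
# lookahead (?=") requires a closing quote without including it.
values=$(printf '%s\n' "$networks" | grep -oP '"(id|name)": "\K[\w-]+(?=")')
printf '%s\n' "$values"
```

It prints the id and then BLUE, each on its own line.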

Answer 2 (score: 0)

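As a workaround (it does not solve the problem of escaping \w itself), spell out the characters explicitly:

result=$( echo $networks | grep -oE '"(id|name)": "([a-zA-Z0-9_+-]+)"' ) ; 
echo $result

This prints:

"id": "7188504a-72cb-4590-a9b0-414732017837" "name": "BLUE" "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae" "name": "RED" "id": "1eb6647e-869e-4e83-9468-43e2c320bccc" "name": "public"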

Note: prefer the $( ) syntax over backticks for command substitution.
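One concrete reason for that advice is nesting. The sketch below uses made-up strings to compare the two forms: $( ) nests without extra escaping, while inner backticks must be escaped with backslashes.

```shell
#!/bin/bash
# With $( ), the inner substitution is written as-is.
nested=$(echo "outer $(echo inner)")

# With backticks, the inner pair has to be escaped as \` ... \`.
nested_bt=`echo "outer \`echo inner\`"`

printf '%s\n' "$nested" "$nested_bt"
```

Both variables end up as "outer inner"; the $( ) form is simply easier to read and to nest further.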