postgres regexp_matches does not return all matches

Date: 2019-08-09 01:18:42

Tags: regex postgresql

  <itemBody>
    <gapMatchInteraction responseIdentifier="RESPONSE2" shuffle="false" hideprompt="true" emptyquestion="false" maxChoices="1" orientation="horizontal" gapmatchpersistentheader="inherit" required="true" mappedValue="">
      <prompt>
        <p>Add question here.</p>
      </prompt>
      <gapText identifier="ChoiceA" width="" matchMin="0" matchMax="0">
        <p>Text A</p>
      </gapText>
      <gapText identifier="ChoiceB" width="" matchMin="0" matchMax="0">
        <p>Text B</p>
      </gapText>
      <gapText identifier="ChoiceC" width="" matchMin="0" matchMax="0">
        <p>Text C</p>
      </gapText>
      <gapText identifier="ChoiceD" width="" matchMin="0" matchMax="0">
        <p>Text D</p>
      </gapText>
      <gapText identifier="ChoiceE" width="" matchMin="0" matchMax="0">
        <p>Text E</p>
      </gapText>
      <gapText identifier="ChoiceF" width="" matchMin="0" matchMax="0">
        <p>Text E</p>
      </gapText>
      <blockquote>
        <p>Some text <gap identifier="G1" width="" modified="true" label="GAP 1"><p>GAP 1</p></gap>
<gap identifier="G2" width="" modified="true" label="GAP 2"><p>GAP 2</p></gap><gap identifier="G3" width="" modified="true" label="Gap 3"><p>Gap 3</p></gap><gap identifier="G4" width="" modified="true" label="Gap 4"><p>Gap 4</p></gap></p>
      </blockquote>
    </gapMatchInteraction>
    <gapMatchInteraction responseIdentifier="RESPONSE3" shuffle="false" hideprompt="true" emptyquestion="false" maxChoices="1" orientation="horizontal" gapmatchpersistentheader="inherit" required="true" mappedValue="">
      <prompt>
        <p>Add question here.</p>
      </prompt>
      <gapText identifier="ChoiceA" width="" matchMin="0" matchMax="0">
        <p>Text A</p>
      </gapText>
      <gapText identifier="ChoiceB" width="" matchMin="0" matchMax="0">
        <p>Text B</p>
      </gapText>
      <gapText identifier="ChoiceC" width="" matchMin="0" matchMax="0">
        <p>Text C</p>
      </gapText>
      <gapText identifier="ChoiceD" width="" matchMin="0" matchMax="0">
        <p>Text D</p>
      </gapText>
      <gapText identifier="ChoiceE" width="" matchMin="0" matchMax="0">
        <p>Text E</p>
      </gapText>
      <gapText identifier="ChoiceF" width="" matchMin="0" matchMax="0">
        <p>Text E</p>
      </gapText>
      <blockquote>
        <p>Some text <gap identifier="G1" width="" modified="true" label="GAP 1"><p>GAP 1</p></gap><gap identifier="G2" width="" modified="true" label="GAP 2"><p>GAP 2</p></gap><gap identifier="G3" width="" modified="true" label="Gap 3"><p>Gap 3</p></gap></p>
      </blockquote>
    </gapMatchInteraction>
  </itemBody>

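I want to find every occurrence of the attribute values in

gapMatchInteraction responseIdentifier="RESPONSE2"
gap identifier="G1"

(that is, only the values between the quotes, which were shown in bold). In order, these are RESPONSE2, G1, G2, G3, G4, RESPONSE3, G1, G2, G3.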

Here is my regular expression:

(?:<gapMatchInteraction responseIdentifier="(RESPONSE\w)".+?)?<gap identifier="(\w+?)"

I tested it online at https://regex101.com/, where it matches every occurrence in order.

The problem is that when I use it in the PostgreSQL regexp_matches function, I only get [RESPONSE2,G3]. Here is my query:

select regexp_matches(column_name, '(?:<gapMatchInteraction responseIdentifier="(RESPONSE\w)".+?)?<gap identifier="(\w+?)"','gis') 
from my_table

I'm not sure what the problem is. Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 0)

I would use xmltable() instead of a regular expression:

select x.*
from the_table,
    xmltable ('//gapMatchInteraction[@responseIdentifier="RESPONSE2"]//gap[@identifier="G2"]'
      passing cast(column_name as xml)
      COLUMNS 
        id text path '@identifier',
        width text path '@width',
        modified boolean path '@modified',
        label text path '@label',
        content xml path './*'
  ) as x
;  

This returns:

id | width | modified | label | content     
---+-------+----------+-------+-------------
G2 |       | true     | GAP 2 | <p>GAP 2</p>
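
The XPath above selects a single gap. To list every gap together with the responseIdentifier of its enclosing gapMatchInteraction, which is what the question asks for, the row expression could be widened. The following is only a sketch, assuming the same the_table and column_name as above; the column names response_id and gap_id are chosen here for illustration, and the query has not been run against the fiddle:

```sql
-- One row per <gap>, paired with the responseIdentifier of the
-- <gapMatchInteraction> element that contains it.
select x.response_id, x.gap_id
from the_table,
     xmltable('//gapMatchInteraction//gap'
       passing cast(column_name as xml)
       columns
         -- walk up from the current <gap> to its enclosing interaction
         response_id text path 'ancestor::gapMatchInteraction/@responseIdentifier',
         gap_id      text path '@identifier'
     ) as x;
```

For the sample document this is intended to give the RESPONSE2 gaps G1 to G4 followed by the RESPONSE3 gaps G1 to G3, with the response identifier repeated on each row rather than appearing once per group.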

Online example: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=b45869d8b6bc0292f36e7035194de490
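
As for why regexp_matches disagrees with regex101: PostgreSQL's regular expression engine does not decide greediness per quantifier the way Perl-style engines do. According to the matching rules in the PostgreSQL documentation, a branch takes the greediness of the first quantified atom in it, and in the pattern from the question that atom is the optional group (?:...)?, which is greedy. My reading, which is an assumption and not something verified against this data, is that the whole pattern therefore prefers the longest match, so the inner .+? runs on to the last gap and a single row [RESPONSE2,G3] comes back. The two statements below are the documentation's own illustration of the rule:

```sql
-- The first quantifier (Y*) is greedy, so the whole RE is greedy
-- and the digit group takes as many digits as it is allowed.
SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');   -- Result: 123

-- The first quantifier (Y*?) is non-greedy, so the whole RE is non-greedy
-- and the digit group takes as few digits as possible.
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');  -- Result: 1
```

This is one more reason to parse the XML with xmltable() rather than rely on quantifier behaviour that differs between engines.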