QRegExp to extract strings between tags in HTML

Asked: 2016-08-25 18:52:39

Tags: regex qt parsing qregexp

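The situation is tricky: since I have no access to WebKit in my Qt modules, I am forced to parse an HTML file with QRegExp.

The file contains strings that I need to extract; they are located between li tags.

If I write a QRegExp such as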

QRegExp ("[^</li>]([a-zA-Z0-9_./]+)");

I can extract all the strings between the li tags. But all I need is:

Pg_1_qds_Bin_Indicator_2

Pg_1_qds_Bin_Indicator_3

Pg_1_qds_Ana_Indicator_1

and all similar names between li tags.

Some other names do not appear in the excerpt below but do appear in the complete file: TEMPLATE_LOGO

Pg_1_Command_By_Text

All names start with Pg_ except TEMPLATE_LOGO_.

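I suspect the other lines contain characters such as [ or other markers that identify the string on that line as not needed.

The file is below. So, TL;DR: I need a QRegExp that extracts the names listed above from between the li tags.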

<ul>
  <li><a href="#symbols">Symbol report</a></li>
<ul>
  <li><a href="#symbolsConsistency">Consistency</a></li>
  <li><a href="#symbolCharacteristics">Symbol characteristics</a></li>
  <li><a href="#basicSymbols">Display of basic symbols</a></li>


    <ul>
      <li>Pg_1_qds_Bin_Indicator_2</li>
    <ul>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionalignment] = (Right)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptiontextcolor] = (Color {0, 0, 0, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.isdescriptiondisplayed] = (true)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontstyle] = (Normal)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontposition] = (LEFT)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontext] = (v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.backgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.digitnumber] = (8)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetfont] = (FONT1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.shortname] = (Pg_1_qds_Bin_Indicator_2)</li>
      <li>[QDSConsistency.report.field.logicIndicator.precision] = (2)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontstyle] = (NORMAL)</li>
      <li>[QDSConsistency.report.field.analogIndicator.dynamicbackgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.longname] = (Pg_1_qds_Bin_Indicator_2_v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.heith] = (32)</li>
      <li>[QDSConsistency.report.field.logicIndicator.weigth] = (50)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxX] = (352)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxY] = (116)</li>
      <li>[QDSConsistency.report.field.logicIndicator.valuealignment] = (Left)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_0] = (Off)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_1] = (On)</li>
    </ul>
      <li>Pg_1_qds_Bin_Indicator_3</li>
    <ul>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionalignment] = (Right)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptiontextcolor] = (Color {0, 0, 0, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.isdescriptiondisplayed] = (true)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontstyle] = (Normal)</li>
      <li>[QDSConsistency.report.field.logicIndicator.descriptionfontposition] = (LEFT)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontext] = (v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.backgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.logicIndicator.digitnumber] = (8)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetfont] = (FONT1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.shortname] = (Pg_1_qds_Bin_Indicator_3)</li>
      <li>[QDSConsistency.report.field.logicIndicator.precision] = (2)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.logicIndicator.widgetuserfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontstyle] = (NORMAL)</li>
      <li>[QDSConsistency.report.field.analogIndicator.dynamicbackgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.longname] = (Pg_1_qds_Bin_Indicator_3_v1)</li>
      <li>[QDSConsistency.report.field.logicIndicator.heith] = (32)</li>
      <li>[QDSConsistency.report.field.logicIndicator.weigth] = (50)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxX] = (446)</li>
      <li>[QDSConsistency.report.field.logicIndicator.poxY] = (187)</li>
      <li>[QDSConsistency.report.field.logicIndicator.valuealignment] = (Left)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_0] = (Off)</li>
      <li>[QDSConsistency.report.field.logicIndicator.value_1] = (On)</li>
    </ul>
    </ul>
    <p><em>Analog indicator :</em></p>
    <ul>
      <li>Pg_1_qds_Ana_Indicator_1</li>
    <ul>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionalignment] = (Right)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontextcolor] = (Color {0, 0, 0, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.isdescriptiondisplayed] = (true)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontstyle] = (Normal)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptionfontposition] = (LEFT)</li>
      <li>[QDSConsistency.report.field.analogIndicator.descriptiontext] = (v0)</li>
      <li>[QDSConsistency.report.field.analogIndicator.backgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.digitnumber] = (8)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetfont] = (FONT1)</li>
      <li>[QDSConsistency.report.field.analogIndicator.shortname] = (Pg_1_qds_Ana_Indicator_1)</li>
      <li>[QDSConsistency.report.field.analogIndicator.precision] = (2)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontfamily] = (Arial)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontsize] = (11 pt)</li>
      <li>[QDSConsistency.report.field.analogIndicator.widgetuserfontstyle] = (NORMAL)</li>
      <li>[QDSConsistency.report.field.analogIndicator.dynamicbackgroundcolor] = (Color {238, 238, 238, 255})</li>
      <li>[QDSConsistency.report.field.analogIndicator.longname] = (Pg_1_qds_Ana_Indicator_1_v0)</li>
      <li>[QDSConsistency.report.field.analogIndicator.heith] = (32)</li>

4 answers:

Answer 0 (score: 1)

Credit goes to Mr. Trey and Mr. Wiktor Stribiżew, whose answers led to the required solution.

QRegExp exp1("<li>(Pg_.*|TEMPLATE_LOGO_.*)<\\/li>");
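As a minimal sketch of how a pattern like this can be applied (this is not code from the answer): the example below uses std::regex as a stand-in for QRegExp so that it runs without Qt, and it swaps `.*` for `[^<]*`. The swap is an assumption on my part: as far as I know, `.` in QRegExp also matches newlines and quantifiers are greedy unless setMinimal(true) is called, so `.*` could run past the first closing tag when the whole file is searched in one go. The function name extractNames is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text of every <li> item whose content
// starts with "Pg_" or "TEMPLATE_LOGO_" and contains no nested tag.
std::vector<std::string> extractNames(const std::string& html) {
    // [^<]* stops at the next '<', so a match can never span two items.
    static const std::regex itemPattern(
        "<li>((?:Pg_|TEMPLATE_LOGO_)[^<]*)</li>");

    std::vector<std::string> names;
    for (std::sregex_iterator it(html.begin(), html.end(), itemPattern), end;
         it != end; ++it) {
        names.push_back((*it)[1].str());  // capture group 1 is the bare name
    }
    return names;
}
```

Called on the sample file from the question, this should return the bare names such as Pg_1_qds_Bin_Indicator_2 and skip the bracketed property lines, because those have [ directly after the opening li tag.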

Answer 1 (score: 0)

Based on your comment, see if this works for you: <li>.*\((Pg_1[^)]*|TEMPLATE_LOGO).*?<\/li>

It should match any string that starts with "Pg_1", or specifically "TEMPLATE_LOGO", and that sits between li tags.
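To see what this pattern actually captures, here is a small sketch that applies it line by line with std::regex (a stand-in for QRegExp, so the example runs without Qt). Two caveats, both hedged: the pattern requires an opening parenthesis before the name, so it picks up the values in the property lines rather than the bare li items; and as far as I know QRegExp has no per-quantifier lazy syntax like `.*?` and relies on setMinimal(true) instead, so the pattern may need adapting there. The function name captureParenthesized is hypothetical.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: runs the answer's pattern against each line of the
// input and collects capture group 1 for every line that matches.
std::vector<std::string> captureParenthesized(const std::string& html) {
    static const std::regex pattern(
        R"re(<li>.*\((Pg_1[^)]*|TEMPLATE_LOGO).*?</li>)re");

    std::vector<std::string> values;
    std::istringstream lines(html);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch match;
        if (std::regex_search(line, match, pattern)) {
            values.push_back(match[1].str());  // text right after the '('
        }
    }
    return values;
}
```

On the sample file this would collect values from lines such as the shortname and longname entries, for example Pg_1_qds_Bin_Indicator_2 and Pg_1_qds_Bin_Indicator_2_v1, while a bare item like the first li with Pg_1_qds_Bin_Indicator_2 does not match because it has no parenthesis.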

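Answer 2 (score: 0)

No, you are not forced to parse HTML with QRegExp. A regular expression matcher can only match regular languages, and HTML is not a regular language. So it will not ever reliably work. Use an HTML parser! I suggest Gumbo. It is a standalone C-based parser with an easy-to-use API.

Answer 3 (score: 0)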

  

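The regex101.com/r/yG9aZ8/2 solution led to the final solution. If you can post this I will close this thread, but I need to improve it so that it works for TEMPLATE_LOGO.

Then just add a second alternative: TEMPLATE_LOGO_.*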

QRegExp exp1("<li>(Pg_.*|TEMPLATE_LOGO_.*)<\\/li>");

Credit goes to Mr. Trey and Mr. Wiktor Stribiżew, whose answers led to the required solution.