From HTML

Time: 2016-01-19 02:47:47

Tags: regex notepad++

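I have a saved HTML page that I opened in Notepad++. I want to extract all the attach points (the values of the data-dojo-attach-point attributes) from the HTML file. Here is an example of the HTML: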

<div class="contentBar">
    <div class="banner" style="">
        <span class="bannerRepeat"></span>
        <span class="bannerDecal"></span>
    </div>
    <div>
        <div class="logo" data-dojo-attach-point="pageLogoPt">
            ABC
        </div>
        <div class="title" data-dojo-attach-point="pageTitlePt">
            ABC
        </div>
        <div class="userPane">
            <div>
                <span class="LoginCell LoginText"><span data-dojo-attach-point="welcomeBlockPt">Welcome</span>, <b data-dojo-attach-point="usernameBlockPt">User Name</b></span>
                <span widgetid="acme_Button_0" id="acme_Button_0" class="LoginCell Button" data-dojo-type="acme.Button" data-dojo-props="size: 'small'" data-dojo-attach-point="logOutButtonPt"><span widgetid="dijit_form_Button_0" class="dijit dijitReset dijitInline dijitButton ButtonSmall" role="presentation"><span class="dijitReset dijitInline dijitButtonNode" data-dojo-attach-event="ondijitclick:__onClick" role="presentation"><span style="-moz-user-select: none;" aria-disabled="false" id="dijit_form_Button_0" tabindex="0" class="dijitReset dijitStretch dijitButtonContents" data-dojo-attach-point="titleNode,focusNode" role="button" aria-labelledby="dijit_form_Button_0_label"><span class="dijitReset dijitInline dijitIcon dijitNoIcon" data-dojo-attach-point="iconNode"></span><span class="dijitReset dijitToggleButtonIconChar">&#9679;</span><span class="dijitReset dijitInline dijitButtonText" id="dijit_form_Button_0_label" data-dojo-attach-point="containerNode">Logout</span></span></span><input value="" class="dijitOffScreen" data-dojo-attach-event="onclick:_onClick" tabindex="-1" role="presentation" aria-hidden="true" data-dojo-attach-point="valueNode" type="button"></span></span>
            </div>
            <div>
                <span id="printLink" style="display:none;">Print</span>
                <span id="zoomPercentageDisplay"><span data-dojo-attach-point="zoomBlockPt">Zoom</span>: 100%</span>
                <span id="smallFontSizeLink" style="font-size: .8em;">A</span>
                <span id="defaultFontSizeLink" style="font-size: 1em;">AA</span>
                <span id="largeFontSizeLink" style="font-size: 1.2em;">AAA</span> 
            </div>
        </div>
    </div>
</div>

I want to get:

pageLogoPt
pageTitlePt
welcomeBlockPt
usernameBlockPt
etc ...

Is this possible? Thanks
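If running a small script is an option, one way to pull the values out without a regular expression is Python's standard-library html.parser module, which hands each start tag's attributes to a callback. This is a sketch, not a drop-in tool: the names AttachPointCollector and extract_attach_points are made up for illustration, and splitting on commas assumes that a value such as titleNode,focusNode in the sample above should count as two separate attach points.

```python
from html.parser import HTMLParser


class AttachPointCollector(HTMLParser):
    """Collects data-dojo-attach-point values in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.points = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased, value may be None
        for name, value in attrs:
            if name == "data-dojo-attach-point" and value:
                # Assumption: one attribute may list several points, e.g. "titleNode,focusNode"
                self.points.extend(p.strip() for p in value.split(",") if p.strip())
    # Self-closing tags (<br/>) go through handle_startendtag, whose default
    # implementation calls handle_starttag, so no extra override is needed.


def extract_attach_points(html_text):
    collector = AttachPointCollector()
    collector.feed(html_text)
    collector.close()
    return collector.points


sample = """
<div class="logo" data-dojo-attach-point="pageLogoPt">ABC</div>
<div class="title" data-dojo-attach-point="pageTitlePt">ABC</div>
<span class="LoginCell LoginText"><span data-dojo-attach-point="welcomeBlockPt">Welcome</span>,
<b data-dojo-attach-point="usernameBlockPt">User Name</b></span>
<span data-dojo-attach-point="titleNode,focusNode"></span>
"""

print(extract_attach_points(sample))
```

Because the parser reads attributes rather than raw text, it does not care about attribute order, single versus double quotes, or whitespace around the equals sign.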

1 Answer:

Answer 0 (score: 2)

You can do the following:

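  1. With Search Mode set to "Regular expression", replace (data-dojo-attach-point="[^"]+)(?=") with \n\1\n. This puts each data-dojo-attach-point="... occurrence on a line of its own.
  2. Use Mark All with the regular expression data-dojo-attach-point="[^"]+ and the "Bookmark line" checkbox ticked.
  3. Search -> Bookmark -> Remove Unmarked Lines.
  4. Replace data-dojo-attach-point=" (including the opening quote) with nothing.
  5. You are left with a list of every item, one per line.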

    Tested on Notepad++ 6.8.8.

    Inspired by https://superuser.com/questions/477628/export-all-regular-expression-matches-in-textpad-or-notepad-as-a-list.
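Outside Notepad++, the same idea fits in a single pass: put the value in a capturing group and let the regex engine collect the matches, so there is no need to split lines, bookmark them, and strip the prefix afterwards. A minimal Python sketch, assuming the attribute is always written as data-dojo-attach-point="..." with double quotes (the same assumption the Notepad++ pattern makes); list_attach_points is a hypothetical helper name.

```python
import re

# Same pattern as steps 1 and 2, but the value is captured in a group,
# which makes step 4 (removing the data-dojo-attach-point=" prefix) unnecessary.
ATTACH_POINT = re.compile(r'data-dojo-attach-point="([^"]+)"')


def list_attach_points(html_text):
    # With exactly one group in the pattern, findall returns the captured
    # values only, one entry per attribute, in document order.
    return ATTACH_POINT.findall(html_text)


sample = (
    '<div class="logo" data-dojo-attach-point="pageLogoPt">ABC</div>'
    '<b data-dojo-attach-point="usernameBlockPt">User Name</b>'
    '<span data-dojo-attach-point="titleNode,focusNode"></span>'
)

print("\n".join(list_attach_points(sample)))
```

As with the Notepad++ steps, a combined value such as titleNode,focusNode comes back as a single entry; split it on commas if each name should be listed separately.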