Removing JavaScript with sed

Time: 2015-02-03 19:14:01

Tags: javascript html sed

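I'm struggling because this code doesn't work. I'm trying to strip all HTML tags, the script tags, and the JavaScript inside them from an HTML file, so that only the plain text content is left.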

sed  -e 's/<[^>]\+>/ /g' -e '/<script/,/<\/script>/d'

This command removes the HTML tags and the script tags, but not the script content.
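One likely cause, offered as a guess rather than a confirmed diagnosis: sed applies the expressions in order to each line, so by the time the range address `/<script/,/<\/script>/` is tested, the first expression has already replaced `<script>` with a space and the range never starts. A minimal sketch with the two expressions swapped, assuming GNU sed (for `\+`) and that the opening and closing script tags sit on separate lines as in the sample data; `strip_html` is a hypothetical helper name:

```shell
# Hypothetical helper: delete <script>...</script> blocks first,
# then replace every remaining tag with a space.
strip_html() {
    sed -e '/<script/,/<\/script>/d' -e 's/<[^>]\+>/ /g'
}

# Small demo on a few lines shaped like the sample data.
printf '%s\n' '<title>BTKRSH //</title>' '<script>' '    //test test' '</script>' '<p>BACKGROUND ARTIST</p>' | strip_html
```

If a script block opens and closes on the same line, the range would stay open until the next line containing `</script>`, so this sketch only fits input laid out like the sample.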

sed  -e 's/<[^>]\+>/ /g' -e 's/<script>try.*<\/script>//'

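This should work for more script tags, but it still doesn't remove the content. The following code, however, does remove the script tags and their content, but I can't get it to work together with the HTML tag removal.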

awk '/<script>/{p=1} /<\/script>/{p=0;next}!p'

So when I combine them into something like the following, it removes the script tags and their content, but the HTML tags are still there:

sed 's/<[^>]\+>/ /g' | awk '/<script>/{p=1} /<\/script>/{p=0;next}!p'
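One way to do both jobs in a single pass, sketched under assumptions rather than confirmed against the real page: let awk track whether it is inside a script block, and strip tags only from the lines it prints. `strip_page` and the `skip` flag are hypothetical names. The pattern `/<script/` (without the closing `>`) is chosen so that opening tags with attributes also match.

```shell
# Hypothetical helper: drop script blocks and strip tags in one awk pass.
strip_page() {
    awk '
        /<script/    { skip = 1 }                      # entering a script block
        !skip        { gsub(/<[^>]+>/, " "); print }   # outside scripts: strip tags, print
        /<\/script>/ { skip = 0 }                      # leaving the script block
    '
}

# Small demo: the script block disappears and the remaining tags become spaces.
printf '%s\n' '<p>BACKGROUND ARTIST</p>' '<script>' '    //test test' '</script>' '<a>LINKS</a>' | strip_page
```

Usage would be something like `strip_page < index.html`, where `index.html` is a placeholder file name. Any text that shares a line with a script tag is dropped along with that line.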

Sample data:

<html>
<head>
    <title>BTKRSH //</title>
    <link rel="stylesheet" type="text/css" href="style.css"> 
</head>
<script>
    //test test 
</script>
<body>
    <div class="left">
        <table style="width: 100%; height: 100%;">
            <div id="closebtn">
                <a class="hidden-x">    <img src="x-gray.png"></img> </a>
            </div>
            <tr><td style="vertical-align: middle; text-align: center;">
                <div class="menu">
                    <a>PODCASTS</a>
                    <div class="hidden-menu podcasts">
                        <iframe width="400" height="400" src="https://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2FBTKRSH%2F&amp;embed_uuid=f78341ae-da15-480f-9604-d6812bb9a83d&amp;replace=0&amp;stylecolor=190303&amp;embed_type=widget_standard" frameborder="0"></iframe><div style="clear: both; height: 3px; width: 392px;"></div><p style="display: block; font-size: 11px; font-family: 'Open Sans', Helvetica, Arial, sans-serif; margin: 0px; padding: 3px 4px; color: rgb(153, 153, 153); width: 392px;"><a href="http://www.mixcloud.com/BTKRSH/?utm_source=widget&amp;amp;utm_medium=web&amp;amp;utm_campaign=base_links&amp;amp;utm_term=resource_link" target="_blank" style="color: rgb(25, 3, 3); font-weight: bold;">
                    </div>
                    <a>RELEASES</a>
                    <div class="hidden-menu releases">

                    </div>
                    <a>ARTISTS</a>
                    <div class="hidden-menu artists">

                    </div>
                    <a>LINKS</a>
                    <div class="hidden-menu links">

                    </div>
                    <a>ABOUT</a>
                    <div class="hidden-menu about">

                    </div>
                    <a>CONTACT</a>
                    <div class="hidden-menu contact">

                    </div>
                </div>
            </td></tr>

        </table>

    </div>

    <div class="right">
        <table style="width:100%; height: 100%;">
            <tr><td style="vertucal-align: middle; text-align: center">
                <img src="2001_7.jpg" class="btkrsh-mask" width="600" height="500" ></img>
                    <div id="graph-art">
                    <p>BACKGROUND ARTIST</p>
                    <a href="http://www.facebook.com/btkrsh">SIMON C PAIGE<a>
                </div>
            <td></tr>
        </table>


    </div>

</body>

Result:

BTKRSH //

            //test test









                                             PODCASTS



                                             RELEASES



                                             ARTISTS



                                             LINKS



                                             ABOUT



                                             CONTACT















                                             BACKGROUND ARTIST
                                             SIMON C PAIGE

Or, when I use the code that removes the script tags and their content, I get this:

<html>
    <head>
            <title>BTKRSH //</title>
            <link rel="stylesheet" type="text/css" href="style.css">
    </head>
    <body>
            <div class="left">
                    <table style="width: 100%; height: 100%;">
                            <div id="closebtn">
                                    <a class="hidden-x">    <img src="x-gray.png"></img> </a>
                            </div>
                            <tr><td style="vertical-align: middle; text-align: center;">
                                    <div class="menu">
                                            <a>PODCASTS</a>
                                            <div class="hidden-menu podcasts">
                                                    <iframe width="400" height="400" src="https://www.mixcloud.com/widget/iframe/?feed=http%3A%2F%2Fwww.mixcloud.com%2FBTKRSH%2F&amp;embed_uuid=f78341ae-da15-480f-9604-d6812bb9a83d&amp;replace=0&amp;stylecolor=190303&amp;embed_type=widget_standard" frameborder="0"></iframe><div style="clear: both; height: 3px; width: 392px;"></div><p style="display: block; font-size: 11px; font-family: 'Open Sans', Helvetica, Arial, sans-serif; margin: 0px; padding: 3px 4px; color: rgb(153, 153, 153); width: 392px;"><a href="http://www.mixcloud.com/BTKRSH/?utm_source=widget&amp;amp;utm_medium=web&amp;amp;utm_campaign=base_links&amp;amp;utm_term=resource_link" target="_blank" style="color: rgb(25, 3, 3); font-weight: bold;">
                                            </div>
                                            <a>RELEASES</a>
                                            <div class="hidden-menu releases">

                                            </div>
                                            <a>ARTISTS</a>
                                            <div class="hidden-menu artists">

                                            </div>
                                            <a>LINKS</a>
                                            <div class="hidden-menu links">

                                            </div>
                                            <a>ABOUT</a>
                                            <div class="hidden-menu about">

                                            </div>
                                            <a>CONTACT</a>
                                            <div class="hidden-menu contact">

                                            </div>
                                    </div>
                            </td></tr>

                    </table>

            </div>

            <div class="right">
                    <table style="width:100%; height: 100%;">
                            <tr><td style="vertucal-align: middle; text-align: center">
                                    <img src="2001_7.jpg" class="btkrsh-mask" width="600" height="500" ></img>
                                            <div id="graph-art">
                                            <p>BACKGROUND ARTIST</p>
                                            <a href="http://www.facebook.com/btkrsh">SIMON C PAIGE<a>
                                    </div>
                            <td></tr>
                    </table>


            </div>

    </body>

As you can see, the script tags and their content are gone.

Any help is welcome, thanks!

0 answers:

No answers yet.