Parsing the contents of a script tag with Node.js and cheerio

Time: 2016-08-10 09:57:42

Tags: node.js web-scraping cheerio

I want to use cheerio or some other module to get the sources array of the config object that is passed to jwplayer("vplayer").setup in the page below:

<HTML>
<HEAD>
    <link rel="stylesheet" type="text/css" href="http://thevideos.tv/css/main.css">
    <script language="JavaScript" type="text/javascript" CHARSET="UTF-8"
            src="http://thevideos.tv/js/jquery.min.js"></script>
</HEAD>
<BODY topmargin=0 leftmargin=0 style="background:transparent;">

<table cellpadding=0 cellspacing=0>
    <tr>
        <td valign=top>
            <div style="position:relative;width:728px;height:410px;">
                <div id="play_limit_box">
                    <a href="http://thevideos.tv/premium.html" target="_blank">Upgrade you account</a> to watch videos
                    with no limits!
                </div>

                <span id='vplayer'><img src="http://192.99.62.187/i/01/00077/u0mqgq67qz76.jpg"
                                        style="width:728px;height:410px;"></span>    
            </div>
        </td>
    </tr>
</table>


<script type='text/javascript'>    jwplayer("vplayer").setup({
    sources: [{
        file: "http://192.99.62.187/kj2vyrxjey6vtaw52apz4kuggj6xfcc27pjizr5rhnrcgv73id7wwhzxlqda/v.mp4",
        label: "240p"
    }, {
        file: "http://192.99.62.187/kj2vyrxjey6vtaw52apz4kuggj6xfcc27pjizr5rhfbsgv73id76twjcd2ha/v.mp4",
        label: "360p"
    }]
});
</script>

<script>
    var sid = 90446;
    var wid = 115535;
</script>

</BODY>
</HTML>

Can this be done with cheerio? If not, what should I use, and how?

Thanks in advance :)

1 answer:

Answer 0: (score: 7)

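You can use cheerio to retrieve the contents of the script tag, but you have to parse those contents yourself. This should work for you, assuming the relevant script tag is always served in the form you described: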

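var cheerio = require('cheerio');
var $ = cheerio.load(html); // html holds the page source as a string

// Take the first child (the text node) of each script tag directly under <body>
// and keep the one whose text mentions jwplayer.
var textNode = $('body > script').map((i, x) => x.children[0])
                                 .filter((i, x) => x && x.data.match(/jwplayer/)).get(0);

if (textNode) {
    // Collapse the script onto one line, then quote the bare keys so the array is valid JSON.
    var scriptText = textNode.data.replace(/\r?\n|\r/g, "")
                                  .replace(/file:/g, '"file":')
                                  .replace(/label:/g, '"label":');
    // Capture everything between "sources:" and the closing "});" of the setup call.
    var match   = /sources:(.*)}\);/.exec(scriptText);
    var sources = match ? JSON.parse(match[1]) : null;
}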
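
The regex above depends on the keys being exactly file and label and on "});" appearing only once after the array. If the script varies a little, a bracket-matching variant of the same idea may hold up better. The sketch below is only an illustration: extractSources is a hypothetical helper (not part of cheerio or jwplayer), and it assumes the array uses double-quoted strings and plain identifier keys. It takes the script text you already pulled out with cheerio (textNode.data above).

```javascript
// Hypothetical helper: pull the `sources` array out of the inline script text.
// Assumes double-quoted strings and bare identifier keys, as in the sample page.
function extractSources(scriptText) {
  const keyIndex = scriptText.indexOf('sources:');
  if (keyIndex === -1) return null;
  const start = scriptText.indexOf('[', keyIndex);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  for (let i = start; i < scriptText.length; i++) {
    const ch = scriptText[i];
    if (inString) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '[') {
      depth++;
    } else if (ch === ']' && --depth === 0) {
      // Found the bracket that closes the array: slice it out and
      // quote bare keys that follow "{" or "," so JSON.parse accepts it.
      const literal = scriptText.slice(start, i + 1);
      const json = literal.replace(/([{,]\s*)([A-Za-z_$][\w$]*)\s*:/g, '$1"$2":');
      return JSON.parse(json);
    }
  }
  return null; // brackets never balanced
}

// Usage with a shortened stand-in for the script text (example.com URLs are placeholders).
const sample = 'jwplayer("vplayer").setup({\n  sources: [{\n    file: "http://example.com/a.mp4",\n    label: "240p"\n  }, {\n    file: "http://example.com/b.mp4",\n    label: "360p"\n  }]\n});';
const parsed = extractSources(sample);
console.log(parsed.map((s) => s.label + ' ' + s.file).join('\n'));
```

JSON.parse will still throw if the array contains something that is not JSON after the keys are quoted (single-quoted strings, trailing commas, function calls), so wrap the call in try/catch if the page is not under your control.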