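Modifying an HTML file in NodeJS

Time: 2012-10-25 13:19:58

Tags: node.js

Let me preface this with two things. I am currently using grunt for these tasks, and I know Yeoman has what I am asking for. I really like Yeoman, but it is a bit too opinionated for the particular project I am working on.

So I have the following HTML file: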

<html>
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width">

        <!-- START-CSS-MIN:css/build/min.css -->
        <link rel="stylesheet" href="css/bootstrap/bootstrap-2.1.1.css">
        <link rel="stylesheet" href="css/normalize.css">
        <link rel="stylesheet" href="css/boilerplate.css">
        <!-- END-CSS-MIN -->

        <!-- START-JS-MIN:js/build/modernizr.js -->
        <script src="js/libraries/modernizr.js"></script>
        <!-- END-JS-MIN -->
    </head>
    <body>
        <!--[if lt IE 7]>
            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
        <![endif]-->

        <p>Hello world! This is a basline HTML5 template (based on HTML5 Boilerplate).</p>

        <!-- START-JS-MIN:js/build/libraries.js -->
        <script src="js/libraries/underscore.js"></script>
        <script src="js/libraries/jquery/jquery.js"></script>
        <!-- END-JS-MIN -->
    </body>
</html>

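As you can see, there are CSS-MIN and JS-MIN comments. I already have a custom grunt build task that correctly collects all the files between those comments (using htmlparser) and then minifies and concatenates them into the file named in each comment. The last step of the build process is to create a new version of that HTML file (for production use) in which each comment block is replaced by a reference to the new file. For example, the code above would become: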

<html>
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width">

        <link rel="stylesheet" href="css/build/min.css">

        <script src="js/build/modernizr.js"></script>
    </head>
    <body>
        <!--[if lt IE 7]>
            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
        <![endif]-->

        <p>Hello world! This is a basline HTML5 template (based on HTML5 Boilerplate).</p>

        <script src="js/build/libraries.js"></script>
    </body>
</html>

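My question is: how do I do this in NodeJS? The htmlparser NPM module is great for parsing HTML, but now I need to modify the HTML (removing and adding certain elements at specific locations). Are there any good packages or tutorials on how to do this in NodeJS code?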

2 Answers:

Answer 0: (score: 2)

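I am not quite sure whether this helps with the comment lines, but once you can work with DOM references that should not be a hard problem to solve.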

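Consider using: https://github.com/tmpvar/jsdom

There are other options as well. (https://github.com/joyent/node/wiki/modules)

Answer 1: (score: 0)

You can use cheerio

The following code produces exactly the output you provided (apart from some minor whitespace differences)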

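const fs = require('fs');
const cheerio = require('cheerio');

// Assumption: the source page is saved as index.html; adjust the path for your project
const inputHtml = fs.readFileSync('index.html', 'utf8');
const $ = cheerio.load(inputHtml);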

// Returns a filter function that selects the comments with the provided indexes
const commentRemovalFilter = (commentIndexes)=>{
    let commentIndex=-1;
    return (index, node)=>{
        const isComment = node.type === 'comment';
        if(isComment)commentIndex++;
        return isComment && commentIndexes.includes(commentIndex);
    }
}
    

$('head').contents().filter(commentRemovalFilter([0,1,2,3])).remove();
$('head link').remove();
$('head script').remove();

//Cheerio respects whitespace provided here
$('head').append(`
        <link rel="stylesheet" href="css/build/min.css">

        <script src="js/build/modernizr.js"></script>
`)


$('body').contents().filter(commentRemovalFilter([1,2])).remove();
$('body script').remove();
$('body').append(`      <script src="js/build/libraries.js"></script>
`)

console.log($.html())

Output:

<html><head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width">

        
        
        
        
        

        
        
        
    
        <link rel="stylesheet" href="css/build/min.css">

        <script src="js/build/modernizr.js"></script>
</head>
    <body>
        <!--[if lt IE 7]>
            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
        <![endif]-->

        <p>Hello world! This is a basline HTML5 template (based on HTML5 Boilerplate).</p>

        
        
        
        
    
      <script src="js/build/libraries.js"></script>
</body></html>
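
Note that the hard-coded comment indexes in the cheerio code only fit this exact file. As a rough, marker-driven alternative, here is a minimal sketch that uses plain string replacement instead of cheerio. It is not from either answer: `replaceBuildBlocks` and the `sample` string are hypothetical names made up for illustration, and the sketch assumes the START/END markers are well formed and never nested.

```javascript
// Hypothetical helper (not part of cheerio or grunt): replaces every
// START-CSS-MIN / START-JS-MIN block with one tag that points at the
// build file named in the opening comment.
function replaceBuildBlocks(html) {
    // Group 1 is CSS or JS, group 2 is the target path;
    // \1 forces the END marker to be of the same kind as the START marker.
    const blockPattern = /<!--\s*START-(CSS|JS)-MIN:(\S+)\s*-->[\s\S]*?<!--\s*END-\1-MIN\s*-->/g;
    return html.replace(blockPattern, (block, kind, target) =>
        kind === 'CSS'
            ? `<link rel="stylesheet" href="${target}">`
            : `<script src="${target}"></script>`
    );
}

// Shortened sample input, for illustration only
const sample = [
    '<head>',
    '    <!-- START-CSS-MIN:css/build/min.css -->',
    '    <link rel="stylesheet" href="css/normalize.css">',
    '    <link rel="stylesheet" href="css/boilerplate.css">',
    '    <!-- END-CSS-MIN -->',
    '    <!-- START-JS-MIN:js/build/modernizr.js -->',
    '    <script src="js/libraries/modernizr.js"></script>',
    '    <!-- END-JS-MIN -->',
    '</head>'
].join('\n');

console.log(replaceBuildBlocks(sample));
```

Because only the marked blocks are matched, other comments such as the IE conditional comment are left untouched, and no whitespace-only lines are left behind. A regular expression is not an HTML parser, though, so for anything beyond simple markers like these a DOM-based approach is the safer choice.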