Question

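I am writing code to extract all of the plain text content from the markup of an HTML document. I know this can be done with the document element, but I need to use only REGEX. I have written the code below, but it has some errors that I cannot resolve.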

function htmlToText(html) {
      return html.
        replace(/(.|\n)*<body.*>/, ''). //remove up till body
        replace(/<\/body(.|\n)*/, ''). //remove from </body
        replace(/<.+\>/, ''). //remove tags
        replace(/^\s\n*$/gm, '');  //remove empty lines
    }

Here is the solution:

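function htmlToText(html) {
  return html
    .replace(/[\s\S]*<body[^>]*>/, '')  // remove everything up to and including the opening <body> tag
    .replace(/<\/body[\s\S]*/, '')      // remove </body> and everything after it
    .replace(/<[^>]+>/g, '')            // remove tags one at a time; the g flag strips every tag, not just the first
    .replace(/^[ \t]*\r?\n/gm, '');     // remove empty lines
}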
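
One detail of the tag-stripping step is worth illustrating: a greedy pattern such as `<.+>` matches from the first `<` to the last `>` on a line, so any text sitting between two tags on the same line is removed along with them. The sketch below compares it with the negated character class `<[^>]+>`, which stops at the first `>`. The sample string is hypothetical and only there for illustration.

```javascript
// Hypothetical sample input, used only for illustration.
const sample = '<p>Hello <b>world</b></p>';

// Greedy: `.+` runs from the first `<` to the last `>` on the line,
// so the text between the tags is swallowed as well.
const greedy = sample.replace(/<.+>/g, '');

// Negated character class: each match ends at the first `>`,
// so only the tags themselves are removed.
const perTag = sample.replace(/<[^>]+>/g, '');

console.log(JSON.stringify(greedy)); // ""
console.log(JSON.stringify(perTag)); // "Hello world"
```

The same reasoning applies to `<body.*>`: on a line that holds more than the opening body tag, `.*` can consume content that follows it.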

Answer 1

Without overthinking it, you can simply use:

document.body.innerText

JSFiddle example
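
Note that `document` only exists where a DOM is available (a browser, for instance), so a plain Node.js script would still need a string-based approach. The following is a minimal regex-only sketch under stated assumptions: the helper name `stripHtml` and the sample page are hypothetical, the input is assumed to be reasonably well-formed, and character entities such as `&amp;` are left undecoded. Regular expressions cannot handle every HTML edge case, so treat this as an illustration and not a complete solution.

```javascript
// Hypothetical helper, not part of the original answers.
function stripHtml(html) {
  return html
    // Drop <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Drop the remaining tags, one tag per match.
    .replace(/<[^>]+>/g, '')
    // Drop lines that are empty or contain only spaces and tabs.
    .replace(/^[ \t]*\r?\n/gm, '')
    .trim();
}

// Hypothetical sample document.
const page = [
  '<html><head><style>p { color: red; }</style></head>',
  '<body>',
  '<p>Hello <b>world</b></p>',
  '<script>console.log("hi");</script>',
  '</body></html>',
].join('\n');

const text = stripHtml(page);
console.log(text); // Hello world
```

Removing the script and style blocks first matters: otherwise their contents would survive the tag-stripping step and show up in the extracted text.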

Extract the text of an HTML document using REGEX in Node JS
