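I have a simple project and need some help with it. I need to read an HTML file and convert it to JSON. I want to capture the matches as code and as text. How can I do this?
For example, I have these two HTML elements: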
<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />
If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />
For example:</p>
<pre><code class="{python} language-{python}">a_var = 2
def a_func(some_var):
    return 2**3
a_var = a_func(a_var)
print(a_var)
</code></pre>
My code:
const fs = require('fs')
const showdown = require('showdown')
var read = fs.readFileSync('./test.md', 'utf8')
function importer(mdFile) {
  var result = []
  let json = {}
  var converter = new showdown.Converter()
  var text = mdFile
  var html = converter.makeHtml(text);
  for (var i = 0; i < html.length; i++) {
    var htmlRead = html[i]
    if (html == html.match(/<p>(.*?)<\/p>/g))
      json.text = html.match(/<p>(.*?)<\/p>/g)
    if (html == html.match(/<pre>(.*?)<\/pre>/g))
      json.code = html.match(/<pre>(.*?)<\/pre>/g)
  }
  return html
}
console.log(importer(read))
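How do I get these matches in my code?
New code: I am writing all the p tags into the same JSON object. How can I write each p tag into a separate JSON block?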
$('html').each(function(){
  if ($('p').text != undefined) {
    json.code = $('p').text()
    json.language = "Text"
  }
})
Answer 0 (score: 2)
I suggest using Cheerio. It tries to bring jQuery functionality to Node.js.
You should take a look at Cheerio and read its documentation. I find it really neat!
Edit: for the new part of the question
You can iterate over each element and insert it into an array of JSON objects. First, load the HTML:
const cheerio = require('cheerio')
var html = "<p>In practice, it is usually a bad idea to modify global variables inside the function scope since it often is the cause of confusion and weird errors that are hard to debug.<br />If you want to modify a global variable via a function, it is recommended to pass it as an argument and reassign the return-value.<br />For example:</p>"
const $ = cheerio.load(html)
var paragraph = $('p').html(); //Contents of paragraph. You can manipulate this in any other way you like
//...You would do the same for any other element you require
So the array of JSON objects can be built like this:
var jsonObject = []; //An array of JSON objects that will hold everything
$('p').each(function() { //Loop for each paragraph
  //Now let's take the content of the paragraph and put it into a json object
  jsonObject.push({"paragraph":$(this).html()}); //Add data to the main jsonObject
});
I believe you should also read up on JSON and how it works.
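As a quick illustration of that point: the array built by the loop is an ordinary JavaScript array of objects, and it only becomes JSON text once it is serialized. The sketch below uses made-up sample data and a temporary output path, both of which are assumptions for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical result of the paragraph loop: a plain array of objects.
const blocks = [
  { paragraph: 'First paragraph' },
  { paragraph: 'Second paragraph' },
];

// JSON.stringify turns the in-memory array into JSON text (indented by 2 spaces).
const jsonText = JSON.stringify(blocks, null, 2);

// Write the text to disk; the temp-dir location is only an example path.
const outFile = path.join(os.tmpdir(), 'paragraphs.json');
fs.writeFileSync(outFile, jsonText, 'utf8');

// JSON.parse reverses the step and yields an equivalent array of objects.
const roundTrip = JSON.parse(fs.readFileSync(outFile, 'utf8'));
console.log(roundTrip.length, roundTrip[0].paragraph);
```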
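Answer 1 (score: 0)
"hpq" is not one of the most common HTML parsing libraries, but I think it fits your request well, because its one-line description is:
A utility to parse and query HTML into an object shape.
This live browser page illustrates what it can do:
The catch is that it was created for the browser (it takes an HTML string or a DOM element as input), so I am not sure whether it can be used with Node.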