I want to convert PDF file data to JSON using Node.js

Time: 2019-11-01 09:42:06

Tags: node.js file-conversion

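I want to convert the data in a PDF file to JSON. I want the text output as properly structured JSON, but my code only produces plain JSON that wraps the raw text. What should I use for this? The npm package pdf-parse does not give me the structure I need, and neither does pdf2json.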

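const fs = require('fs');
const pdf = require('pdf-parse');

// `upload` is assumed to be an upload middleware (e.g. multer) defined elsewhere.
module.exports.simplePdfUpload = (req, res) => {
    upload(req, res, (err) => {
        if (err) {
            return res.status(500).send({ error: err.message });
        }
        const dataBuffer = fs.readFileSync(req.files[0].path);
        pdf(dataBuffer)
            .then((data) => {
                // data.text holds the extracted plain text
                res.send({ jsondata: data });
            })
            .catch((error) => {
                // Report parse failures instead of silently swallowing them
                res.status(500).send({ error: error.message });
            });
    });
};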

Output:

{
    'waters including interstate wetlands; (3) all other waters such as ' +
    'intrastate lakes, rivers, streams (including intermittent \nstreams),  ' +
    'mudflats,  sandflats,  wetlands,  sloughs,  prairie  potholes,  wet  ' +
    'meadows,  playa  lakes,  or  natural  ponds,  etc.,  which  the  use, \n' +
    'degradation, or destruction could affect interstate/ foreign commerce; (4) ' +
    'all impoundments of waters otherwise defined as waters of the U. S., \n(5) ' +
    'tributaries of waters identified in 1 through 4 above; (6) the territorial ' +
    'seas; and (7) wetlands adjacent to waters identified in 1 through 6 \n' +
    'above. Only the USACE has the authority to make a final wetlands ' +
    'jurisdictional determination. \n ',
    version: '1.10.100'
} 

But I want output of this kind:

{
    "Info":
    {
        "Company": "ABC",
        "Team": "node"
    },
    "Number of members": 4,
    "Time to finish": "1 day"
}

1 answer:

Answer 0: (score: 0)

pdf-parse only gives you the text of the PDF. For structured output you need to use another library.

pdf.js-extract
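Whichever library does the extraction, the raw text still has to be mapped to keys and values by your own code. Below is a minimal sketch of that step, assuming the PDF text consists of simple "Key: Value" lines (which the sample output in the question does not, so real documents would need their own rules). The helper name `textToJson` and the sample string are hypothetical and not part of pdf-parse or any other library; in the question's handler the input would be `data.text`.

```javascript
// Hypothetical helper: turns "Key: Value" lines into a plain object.
// Assumes the extracted text really contains such lines; many PDFs do not.
function textToJson(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    const idx = line.indexOf(':');
    if (idx <= 0) continue; // skip lines without a "key:" prefix
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    if (value === '') continue; // skip keys that have no value on the same line
    // Store purely numeric values as numbers, everything else as strings.
    result[key] = /^-?\d+(\.\d+)?$/.test(value) ? Number(value) : value;
  }
  return result;
}

// Made-up sample text standing in for data.text from pdf-parse.
const sample = [
  'Company: ABC',
  'Team: node',
  'Number of members: 4',
  'Time to finish: 1 day',
].join('\n');

console.log(JSON.stringify(textToJson(sample), null, 2));
```

This only yields a flat object. Grouping keys under a parent such as "Info" would need extra rules, or position data of the kind a layout-aware extractor provides.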