Google Vision API supports PDF and TIFF text detection, but does it also work with PDFs that contain images?

Time: 2018-08-16 11:49:28

Tags: api pdf google-cloud-platform vision

I'm trying to use the Google Vision API with a PDF that contains images, but it throws the following error:

4:35:12.207 PM info dialogflowFirebaseFulfillment Dialogflow Request headers: {"host":"us-central1-detecttext-5a0c3.cloudfunctions.net","user-agent":"Apache-HttpClient/4.5.4 (Java/1.8.0_181)","transfer-encoding":"chunked","accept":"text/plain, */*","accept-charset":"big5,big5-hkscs,cesu-8,euc-jp,euc-kr,gb18030,gb2312,gbk,ibm-thai,ibm00858,ibm01140,ibm01141,ibm01142,ibm01143,ibm01144,ibm01145,ibm01146,ibm01147,ibm01148,ibm01149,ibm037,ibm1026,ibm1047,ibm273,ibm277,ibm278,ibm280,ibm284,ibm285,ibm290,ibm297,ibm420,ibm424,ibm437,ibm500,ibm775,ibm850,ibm852,ibm855,ibm857,ibm860,ibm861,ibm862,ibm863,ibm864,ibm865,ibm866,ibm868,ibm869,ibm870,ibm871,ibm918,iso-2022-cn,iso-2022-jp,iso-2022-jp-2,iso-2022-kr,iso-8859-1,iso-8859-13,iso-8859-15,iso-8859-2,iso-8859-3,iso-8859-4,iso-8859-5,iso-8859-6,iso-8859-7,iso-8859-8,iso-8859-9,jis_x0201,jis_x0212-1990,koi8-r,koi8-u,shift_jis,tis-620,us-ascii,utf-16,utf-16be,utf-16le,utf-32,utf-32be,utf-32le,utf-8,windows-1250,windows-1251,windows-1252,windows-1253,windows-1254,windows-1255,windows-1256,windows-1257,windows-1258,windows-31j,x-big5-hkscs-2001,x-big5-solaris,x-compound_text,x-euc-jp-linux,x-euc-tw,x-eucjp-open,x-ibm1006,x-ibm1025,x-ibm1046,x-ibm1097,x-ibm1098,x-ibm1112,x-ibm1122,x-ibm1123,x-ibm1124,x-ibm1166,x-ibm1364,x-ibm1381,x-ibm1383,x-ibm300,x-ibm33722,x-ibm737,x-ibm833,x-ibm834,x-ibm856,x-ibm874,x-ibm875,x-ibm921,x-ibm922,x-ibm930,x-ibm933,x-ibm935,x-ibm937,x-ibm939,x-ibm942,x-ibm942c,x-ibm943,x-ibm943c,x-ibm948,x-ibm949,x-ibm949c,x-ibm950,x-ibm964,x-ibm970,x-iscii91,x-iso-2022-cn-cns,x-iso-2022-cn-gb,x-iso-8859-11,x-jis0208,x-jisautodetect,x-johab,x-macarabic,x-maccentraleurope,x-maccroatian,x-maccyrillic,x-macdingbat,x-macgreek,x-machebrew,x-maciceland,x-macroman,x-macromania,x-macsymbol,x-macthai,x-macturkish,x-macukraine,x-ms932_0213,x-ms950-hkscs,x-ms950-hkscs-xp,x-mswin-936,x-pck,x-sjis_0213,x-utf-16le-bom,x-utf-32be-bom,x-utf-32le-bom,x-windows-50220,x-windows-50221,x-windows-874,x-windows-949,x-windows-950,x-windows-iso2022jp","content-type":"application/json; charset=UTF-8","function-execution-id":"dvrpphf9f855","x-appengine-api-ticket":"4b7e84f29e9ce22b","x-appengine-city":"?","x-appengine-citylatlong":"0.000000,0.000000","x-appengine-country":"US","x-appengine-https":"on","x-appengine-region":"?","x-appengine-user-ip":"35.193.50.245","x-cloud-trace-context":"9d163f59b7fc5d0049692efae5269b4c/11159965978299906906;o=1","x-forwarded-for":"35.193.50.245, 35.193.50.245","x-forwarded-proto":"https","accept-encoding":"gzip"}
4:35:12.045 PM dialogflowFirebaseFulfillment Function execution started
4:32:49.480 PM warning dialogflowFirebaseFulfillment Error: { Error: Error extracting images from PDF file gs://detecttext-5a0c3.appspot.com/NFM-11099M1.pdf
    at GoogleError.Error (native)
    at new GoogleError (/user_code/node_modules/@google-cloud/vision/node_modules/google-gax/build/src/GoogleError.js:46:42)
    at Operation._unpackResponse (/user_code/node_modules/@google-cloud/vision/node_modules/google-gax/build/src/longrunning.js:228:29)
    at /user_code/node_modules/@google-cloud/vision/node_modules/google-gax/build/src/longrunning.js:214:18
  code: 13 }

2 answers:

Answer 0 (score: 3)

I ran into the same issue. In my case, setting gzip: true on the upload was what caused it:

// this.client is a @google-cloud/storage Storage instance
this.client
    .bucket(bucketName)
    .upload(fullPath, {
        gzip: false,  // setting gzip: true causes the pdf annotator to fail
        metadata: {
            cacheControl: 'no-cache',
        },
    })
    .then(...)
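One plausible explanation, which the answer above does not confirm, is that with gzip: true the Node.js storage client compresses the file and records contentEncoding: 'gzip' on the object, so the Vision service may end up reading gzip bytes instead of a PDF. A minimal sketch for checking an object, assuming you already have its raw stored bytes and its metadata (for example from file.getMetadata()); the helper names sniffStoredFormat and hasGzipContentEncoding are hypothetical:

```javascript
// Hypothetical helper: guess whether raw stored bytes are a PDF or a gzip
// stream by checking the leading "magic" bytes.
// PDF files begin with the ASCII bytes "%PDF-"; gzip streams begin with 0x1f 0x8b.
function sniffStoredFormat(bytes) {
  const buf = Buffer.from(bytes);
  if (buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b) {
    return 'gzip';
  }
  if (buf.length >= 5 && buf.subarray(0, 5).toString('latin1') === '%PDF-') {
    return 'pdf';
  }
  return 'unknown';
}

// Hypothetical helper: check object metadata (e.g. the first element
// resolved by file.getMetadata() in @google-cloud/storage) for a gzip
// Content-Encoding, which an upload with gzip: true records.
function hasGzipContentEncoding(metadata) {
  return typeof metadata?.contentEncoding === 'string' &&
    metadata.contentEncoding.toLowerCase() === 'gzip';
}

// Usage with in-memory data only (no Cloud Storage access needed):
const zlib = require('zlib');
const fakePdf = Buffer.from('%PDF-1.4\n% sample\n', 'latin1');
console.log(sniffStoredFormat(fakePdf));                          // pdf
console.log(sniffStoredFormat(zlib.gzipSync(fakePdf)));           // gzip
console.log(hasGzipContentEncoding({ contentEncoding: 'gzip' })); // true
```

If the object reports a gzip encoding, re-uploading with gzip: false as in the answer is the straightforward fix.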

Answer 1 (score: 0)

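PDF/TIFF Document Text Detection only collects text from the PDF. That said, the error you're seeing seems to be something else: when I made a request with this PDF file, it worked without any problems.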
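For reference, PDF/TIFF text detection goes through the asynchronous file annotation endpoint, with the input file and the output location both in Cloud Storage. A minimal sketch of the request body, assuming the Node.js @google-cloud/vision client and its asyncBatchAnnotateFiles method; the helper buildPdfTextDetectionRequest and the gs://example-bucket paths are hypothetical:

```javascript
// Build a request body for asynchronous PDF/TIFF text detection.
// Field names follow the Vision API's AsyncAnnotateFileRequest
// (inputConfig, features, outputConfig).
function buildPdfTextDetectionRequest(gcsSourceUri, gcsDestinationUri, batchSize = 2) {
  if (!gcsSourceUri.startsWith('gs://') || !gcsDestinationUri.startsWith('gs://')) {
    throw new Error('Both URIs must be Cloud Storage (gs://) URIs');
  }
  return {
    requests: [
      {
        inputConfig: {
          gcsSource: { uri: gcsSourceUri },
          mimeType: 'application/pdf', // use 'image/tiff' for TIFF input
        },
        features: [{ type: 'DOCUMENT_TEXT_DETECTION' }],
        outputConfig: {
          // Results are written as JSON files under this prefix.
          gcsDestination: { uri: gcsDestinationUri },
          batchSize, // max number of page responses per output JSON file
        },
      },
    ],
  };
}

// Sending it (needs @google-cloud/vision and credentials), roughly:
//   const vision = require('@google-cloud/vision');
//   const client = new vision.ImageAnnotatorClient();
//   const [operation] = await client.asyncBatchAnnotateFiles(request);
//   const [result] = await operation.promise();

const request = buildPdfTextDetectionRequest(
  'gs://example-bucket/input.pdf',  // hypothetical input file
  'gs://example-bucket/ocr-output/' // hypothetical output prefix
);
console.log(JSON.stringify(request, null, 2));
```

The operation only succeeds if the service can read the stored object as a PDF, which is why the storage-side encoding discussed in the other answer matters.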