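What's wrong with my PDF writer?

Time: 2018-01-31 06:25:49

Tags: pdf pdf-generation postscript

I'm writing code to produce PDF (from PostScript, of course), and I've done my best to follow the spec. But ImageMagick's identify says there is a problem with my xref table.

Can anyone see where/what my problem is?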

$ echo quit | gsnd -q pw.ps dancingmen.ps | identify -
   **** Warning:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

-=>/tmp/magick-16940kBciKvHuOrD3 PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000

My PDF (made with Ghostscript on Linux, single LF EOLs):

%PDF-1.3

1 0 obj
<< /Type /Catalog 
/Pages 2 0 R 
>> 
endobj

2 0 obj
<< /Kids [ 3 0 R ] 
/Type /Pages 
/Count 1 
>> 
endobj

3 0 obj
<< /Contents [ 4 0 R ] 
/MediaBox [ 0.0 0.0 612.0 792.0 ] 
/Type /Page 
/Parent 2 0 R 
>> 
endobj

4 0 obj
<< /Length 1287 
>> 
stream
2.0 4.0 m 2.0 3.9 l 2.05516 3.9 2.1 3.94484 2.1 4.0 c 2.1 4.05516 2.05516 4.1 2.0 4.1 c 1.94484 4.1 1.9 4.05516 1.9 4.0 c 1.9 3.94484 1.94484 3.9 2.0 3.9 c f 2.0 3.6 m 2.5 3.1 l S -2.0 3.6 m -1.5 3.1 l S 2.0 3.1 m 2.4 2.8 l 2.1 2.4 l 2.2 2.35 l S -2.0 3.1 m -1.7 2.6 l -1.5 2.8 l S 2.0 3.9 m 2.0 3.6 l 2.0 3.1 l S 3.0 4.0 m 3.0 3.9 l 3.05516 3.9 3.1 3.94484 3.1 4.0 c 3.1 4.05516 3.05516 4.1 3.0 4.1 c 2.94484 4.1 2.9 4.05516 2.9 4.0 c 2.9 3.94484 2.94484 3.9 3.0 3.9 c f 3.0 3.6 m 3.5 3.1 l S -3.0 3.6 m -2.5 4.1 l S 3.0 3.1 m 3.0 2.3 l 3.15 2.3 l S -3.0 3.1 m -3.0 2.3 l -2.85 2.3 l S 3.0 3.9 m 3.0 3.6 l 3.0 3.1 l S 4.0 4.0 m 4.0 3.9 l 4.05516 3.9 4.1 3.94484 4.1 4.0 c 4.1 4.05516 4.05516 4.1 4.0 4.1 c 3.94484 4.1 3.9 4.05516 3.9 4.0 c 3.9 3.94484 3.94484 3.9 4.0 3.9 c f 4.0 3.6 m 4.5 4.1 l S -4.0 3.6 m -3.5 4.1 l S 4.0 3.1 m 4.3 2.6 l 4.5 2.8 l S -4.0 3.1 m -3.7 2.6 l -3.5 2.8 l S 4.0 3.9 m 4.0 3.6 l 4.0 3.1 l S 5.0 4.0 m 5.0 3.9 l 5.05516 3.9 5.1 3.94484 5.1 4.0 c 5.1 4.05516 5.05516 4.1 5.0 4.1 c 4.94484 4.1 4.9 4.05516 4.9 4.0 c 4.9 3.94484 4.94484 3.9 5.0 3.9 c f 5.0 3.6 m 5.5 4.1 l 5.5 4.3 l 5.6 4.3 l 5.6 4.2 l 5.5 4.2 l S -5.0 3.6 m -4.5 3.1 l S 5.0 3.1 m 5.4 2.8 l 5.1 2.4 l 5.2 2.35 l S -5.0 3.1 m -4.6 2.8 l -4.9 2.4 l -4.8 2.35 l S 5.0 3.9 m 5.0 3.6 l 5.0 3.1 l S
endstream
endobj

xref
0 4
0000000000 65535 f 
0000000010 00000 n 
0000000063 00000 n 
0000000127 00000 n 
0000000234 00000 n 
trailer
<<
  /Root 1 0 R
  /Size 4
>>
startxref
1581
%%EOF

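For reference, this is the PostScript drawing being converted.

Update: I've fixed a few of the problems mentioned above: the missing xref keyword, and %%EOF instead of $$EOF. The error from identify is the same, but the Chrome browser's viewer actually shows me a picture (very tiny, in the lower-left corner, since I'm not handling the graphics state yet).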

link to file

link to newer file with single content stream

Output from Ghostscript:

$ echo pstack quit | gsnd -q data/pw.ps data/dancingmen.ps | gsnd -sDEVICE=ps2write -dPDFDEBUG -dPDFSTOPONERROR -
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
   **** Warning:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
<<
/Root 1 0 R
/Size 4 >>
%Resolving: [1 0]
<<
/Type /Catalog /Pages 2 0 R
>>
endobj
%Resolving: [2 0]
<<
/Kids [
3 0 R
]
/Type /Pages /Count 1 >>
endobj
%Resolving: [3 0]
<<
/Contents [
4 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Type /Page /Parent 2 0 R
>>
endobj
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [2 0]
Processing pages 1 through 1.
Page 1
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [4 0]
<<
/Length 1288 >>
stream
%FilePosition: 270
endobj
2.0 4.0 m
2.0 3.9 l
2.05516 3.9 2.1 3.94484 2.1 4.0 c
2.1 4.05516 2.05516 4.1 2.0 4.1 c
1.94484 4.1 1.9 4.05516 1.9 4.0 c
1.9 3.94484 1.94484 3.9 2.0 3.9 c
f
2.0 3.6 m
2.5 3.1 l
S
-2.0 3.6 m
-1.5 3.1 l
S
2.0 3.1 m
2.4 2.8 l
2.1 2.4 l
2.2 2.35 l
S
-2.0 3.1 m
-1.7 2.6 l
-1.5 2.8 l
S
2.0 3.9 m
2.0 3.6 l
2.0 3.1 l
S
3.0 4.0 m
3.0 3.9 l
3.05516 3.9 3.1 3.94484 3.1 4.0 c
3.1 4.05516 3.05516 4.1 3.0 4.1 c
2.94484 4.1 2.9 4.05516 2.9 4.0 c
2.9 3.94484 2.94484 3.9 3.0 3.9 c
f
3.0 3.6 m
3.5 3.1 l
S
-3.0 3.6 m
-2.5 4.1 l
S
3.0 3.1 m
3.0 2.3 l
3.15 2.3 l
S
-3.0 3.1 m
-3.0 2.3 l
-2.85 2.3 l
S
3.0 3.9 m
3.0 3.6 l
3.0 3.1 l
S
4.0 4.0 m
4.0 3.9 l
4.05516 3.9 4.1 3.94484 4.1 4.0 c
4.1 4.05516 4.05516 4.1 4.0 4.1 c
3.94484 4.1 3.9 4.05516 3.9 4.0 c
3.9 3.94484 3.94484 3.9 4.0 3.9 c
f
4.0 3.6 m
4.5 4.1 l
S
-4.0 3.6 m
-3.5 4.1 l
S
4.0 3.1 m
4.3 2.6 l
4.5 2.8 l
S
-4.0 3.1 m
-3.7 2.6 l
-3.5 2.8 l
S
4.0 3.9 m
4.0 3.6 l
4.0 3.1 l
S
5.0 4.0 m
5.0 3.9 l
5.05516 3.9 5.1 3.94484 5.1 4.0 c
5.1 4.05516 5.05516 4.1 5.0 4.1 c
4.94484 4.1 4.9 4.05516 4.9 4.0 c
4.9 3.94484 4.94484 3.9 5.0 3.9 c
f
5.0 3.6 m
5.5 4.1 l
5.5 4.3 l
5.6 4.3 l
5.6 4.2 l
5.5 4.2 l
S
-5.0 3.6 m
-4.5 3.1 l
S
5.0 3.1 m
5.4 2.8 l
5.1 2.4 l
5.2 2.35 l
S
-5.0 3.1 m
-4.6 2.8 l
-4.9 2.4 l
-4.8 2.35 l
S
5.0 3.9 m
5.0 3.6 l
5.0 3.1 l
S

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

%Resolving: [2 0]
%Resolving: [1 0]

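Update: Sigh. I suppose it's best if I show the code. The program is intended to hook PostScript, capture certain drawing operators for paths, and produce a PDF file of the content. For the moment I'm ignoring the quality of the output, particularly the transformation matrix.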

/prompt {} def
<<

/.create-pdf-data {  % called at start
    install-operator-overrides
}

/.create-pdf-page {  % called at showpage
    1 /PageNumber +=
    << /Type /Page
       /Parent pdf-object-names /Pages get create-ref
       /MediaBox [gsave newpath clippath pathbbox grestore]
       /Contents []
    >>
    current-page-name dup 3 1 roll create-object
    pdf-object-names exch get create-ref add-to-pages-kids

    [ display-list {
        exch pop
        create-content-stream
    } for-each ]
    { ( ) exch strcat strcat } reduce
    add-content-to-page
}
/current-page-name {
    (Page) PageNumber  as-string strcat
}
/current-page {
    pdf-objects pdf-object-names current-page-name get get
}

/.output-pdf {    % called at quit
    /OutputFileName where { pop OutputFileName }{ (%stdout) } ifelse
    (w) file write-pdf
    pstack
}

/operator-overrides <<
   %/start                   .create-pdf-data
    /stroke    ({ mark-path  /S cvx ] display  //super//call })
    /fill      ({ mark-path  /f cvx ] display  //super//call })
    /showpage  ({            .create-pdf-page  //super//call })
    /quit      ({            .output-pdf       //super//call })
>>

/install-operator-overrides {
    operator-overrides {
        1 index load
        dup /super exch def
            type /arraytype eq { /exec load }{ /dummyproc cvx } ifelse  
            /call exch def
        cvx exec  userdict 3 1 roll put
    } forall
    userdict /dummyproc {} put
}


/PageNumber 0
/+= { dup load 3 2 roll add store }

/write-pdf {
    /f exch def
    (1.3) write-header
    write-body
    write-xref-table
    write-trailer
}

/pdf-output-file-position 0
/write-header {
    /pdf-output-file-position 0 store
    (%PDF-) .w .w \n \n
}

/write-body {
    write-objects-and-save-positions
}

/write-objects-and-save-positions {
    pdf-objects {
        1 index save-position
        write-object
    } for-each
}

/write-xref-table {
    (xref) .w \n
    pdf-output-file-position /xref-position exch def
    (0 ) .w pdf-object-positions length 1 sub .n \n
    0 format-10 .w ( 65535 f ) .w \n
    pdf-object-positions {
        write-xref-table-row
    } for-each
}
/write-xref-table-row {
    exch pop format-10 .w
      ( 00000 n ) .w \n
}
/format-10-string 20 string
/format-10 {
    format-10-string cvs
    (0000000000) 0  10 3 index length sub getinterval
    exch strcat
}

/write-trailer {
    (trailer) .w \n
    (<<) .w \n
    (  /Root 1 0 R) .w \n
    (  /Size ) .w pdf-objects length 1 sub .n \n
    (>>) .w \n
    (startxref) .w \n
    xref-position .n \n
    (%%EOF) .w \n
}


/create-content-stream {
    to-string-with-spaces
    %dup length ==only ( ) print  ==
}

/write-object {
    exch .n ( 0 obj) .w \n
    dup write-dict
    pdf-streams exch 2 copy known { write-stream }{ pop pop } ifelse
    (endobj) .w \n \n
}

/write-stream {
    (stream) .w \n
    get .w \n
    (endstream) .w \n
}

/write-dict {
    (<< ) .w
    { exch write-thing write-thing \n } forall
    (>> ) .w \n
}

/write-thing {
    +is-ref   { write-ref      }{
    +is-name  { write-name     }{
    +is-array { write-array    }{
    +is-null  { pop (null ) .w }{
                .n ( ) .w
    } ifelse } ifelse } ifelse } ifelse
}

/write-ref {
    ref .n ( 0 R ) .w
}

/write-name {
    dup xcheck not { (/) .w } if
    .n ( ) .w
}

/write-array {
    ([ ) .w
    { write-thing } forall
    (] ) .w
}

/+is-ref   { dup is-ref   }
/+is-name  { dup is-name  }
/+is-array { dup is-array }
/+is-null  { dup is-null  }

/is-string { type /stringtype eq }
/is-array { type /arraytype eq }
/is-name  { type  /nametype eq }
/is-null  { type  /nulltype eq }
/is-ref   { +is-name { is-ref-format }{ pop false } ifelse }
/is-ref-format  { ref-check-string cvs 0 1 getinterval (&) eq }
/ref-check-string 20 string

/ref { 10 string cvs rest cvi }
/create-ref { (&) exch 10 string cvs strcat cvn }

/mark-path {    [    { /m } { /l } { /c } { /h } pathforall  }

/display {  add-to-display-list  }
/display-list <<
    0 null
>>
/add-to-display-list {  display-list dup 3 1 roll length exch put  }
/clear-display-list { /display-list << 0 null >> store }

/pdf-objects << % integer keys
    0 null
    1 << /Type /Catalog  /Pages /&2           >>
    2 << /Type /Pages    /Kids  []   /Count 0 >>
>>
/pdf-object-names << % integer values
    /Catalog 1
    /Pages   2
>>
/pdf-object-positions << % integer keys
    0 null
>>
/pdf-streams <<
>>

/create-object { % dict name
    exch pdf-objects dup length 3 2 roll put
    pdf-object-names exch pdf-objects length 1 sub put
}
/object { % name -> dict
    pdf-object-names exch get  pdf-objects exch get
}
/save-position {
    pdf-object-positions exch pdf-output-file-position put
}
/Pages {
    pdf-objects pdf-object-names /Pages get get
}


/add-content-to-page {
    << 
        /Length 2 index length 1 add
    >> dup 3 2 roll pdf-streams 3 1 roll put
    /current-content create-object
    pdf-object-names /current-content get create-ref
    current-page /Contents 2 copy get [ exch {}forall counttomark 4 add -1 roll ] put
}

/add-to-pages-kids { % ref
    Pages /Kids 2 copy get [ exch {}forall counttomark 4 add -1 roll ] put
    Pages /Count 2 copy get 1 add put
}



/.w { f exch  dup length /pdf-output-file-position +=  writestring }
/.n { dup is-string not { .n-string cvs } if  .w }
/.n-string 100 string
/\n { (\n) .w }
/to-string-with-spaces {  {as-string} map {( ) exch strcat strcat} reduce  }
/map { 1 index xcheck 3 1 roll [ 3 1 roll forall ] exch { cvx } if }
/reduce { exch dup first exch rest 3 -1 roll forall }
/first { 0 get }
/rest { 1 1 index length 1 sub getinterval }
/as-string { 20 string cvs dup length 13 gt { 0 7 getinterval } if }
/strcat { 2 copy length exch length add string dup 4 2 roll
  3 copy pop 0 exch  putinterval  exch length exch putinterval }
/for-each { % dict proc     key(int) value  *proc*
    1 1 3 index length 1 sub   % d p 1 1 lim
    [ 6 5 roll                   % p 1 1 lim [ d
    1 /index cvx /get cvx        % p 1 1 lim [ d 1 index get
    9 8 roll /exec cvx ] cvx       % 1 1 lim { d 1 index get p exec }
    for
}

>>
{ dup {
    dup type /arraytype ne {
        def
    }{  % Dict name proc
        [ 3 index /begin cvx
          3 -1 roll {} forall
          /end cvx
        ] cvx
        def
    } ifelse
} forall pop
} pop
begin



.create-pdf-data

1 Answer:

Answer 0 (score: 2)

Sigh, ran out of room in a comment again...

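It would help to put the file somewhere rather than pasting it. PDF files are binary, and length calculations depend on whether line ends are CR/LF pairs, which means every /Length may be incorrect, and there is no way to tell by looking at a pasted file.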
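To illustrate why the pasted text can't settle the /Length question, here is a minimal Python sketch that compares each declared /Length with the bytes actually found between `stream` and `endstream`. The helper name `check_stream_lengths` is hypothetical, and the sketch assumes a direct integer /Length that immediately precedes `>> stream`; it is a rough checker, not a PDF parser.

```python
import re

# Assumption: /Length is a direct integer and is the last key before ">>".
# Indirect lengths and other dictionary layouts are simply not matched.
STREAM_START = re.compile(rb"/Length\s+(\d+)\s*>>\s*stream\r?\n")

def check_stream_lengths(pdf: bytes):
    """Return a (declared, actual) byte-count pair for each stream found."""
    results = []
    for m in STREAM_START.finditer(pdf):
        declared = int(m.group(1))
        start = m.end()
        end = pdf.find(b"endstream", start)
        if end < 0:
            break
        data = pdf[start:end]
        # The EOL just before "endstream" is not counted in /Length.
        if data.endswith(b"\r\n"):
            data = data[:-2]
        elif data.endswith((b"\n", b"\r")):
            data = data[:-1]
        results.append((declared, len(data)))
    return results

# The same object with LF line ends, then after a naive LF -> CRLF conversion.
content = b"0 0 m\n1 1 l\nS"
lf_pdf = (b"4 0 obj\n<< /Length %d >>\nstream\n" % len(content)
          + content + b"\nendstream\nendobj\n")
crlf_pdf = lf_pdf.replace(b"\n", b"\r\n")

print(check_stream_lengths(lf_pdf))
print(check_stream_lengths(crlf_pdf))
```

The second call reports a mismatch because each converted line end inside the stream adds one byte that the declared /Length never counted.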

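Similarly, the xref table offsets may be incorrect. In fact the offset for entry 1 looks wrong even assuming LF EOLs, but it's impossible to be certain from the pasted file.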
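Given the real bytes of the file, the offsets can be checked mechanically. The following Python sketch (the name `check_xref` and its return shape are my own, not part of any tool) reads the first xref subsection and checks that every in-use entry points at the matching `N G obj` header. It assumes a classic, single-subsection xref table.

```python
import re

ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")

def check_xref(pdf: bytes):
    """Return (declared_count, entries_found, object_numbers_with_bad_offsets)."""
    m = re.search(rb"^xref\r?$", pdf, re.M)   # does not match "startxref"
    if m is None:
        raise ValueError("no xref keyword found")
    lines = pdf[m.end():].splitlines()        # lines[0] is the rest of the xref line
    first, declared = (int(x) for x in lines[1].split())
    found, bad = 0, []
    for line in lines[2:]:
        e = ENTRY.match(line)
        if e is None:                         # reached "trailer" or garbage
            break
        number = first + found
        offset, gen, kind = int(e.group(1)), int(e.group(2)), e.group(3)
        # An in-use entry must point at the first byte of "N G obj".
        if kind == b"n" and not pdf.startswith(b"%d %d obj" % (number, gen), offset):
            bad.append(number)
        found += 1
    return declared, found, bad

# Build a tiny file whose offsets are computed from the real byte positions.
objects = [b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
           b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"]
body = b"%PDF-1.3\n"
offsets = []
for obj in objects:
    offsets.append(len(body))
    body += obj
table = b"xref\n0 3\n0000000000 65535 f \n"
table += b"".join(b"%010d 00000 n \n" % off for off in offsets)
good = body + table + b"trailer\n<< /Root 1 0 R /Size 3 >>\n"

print(check_xref(good))
```

A declared count that differs from the number of entries found, or a non-empty list of bad object numbers, points at the kind of damage Ghostscript is warning about.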

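Note that the error message comes from Ghostscript (which IM uses to process PDF files). If you just fed the PDF file to Ghostscript directly you might get more information. You could also try setting -dPDFDEBUG and -dPDFSTOPONERROR; the combination prints out the objects GS is processing and what it thinks the problem is (if there is a PostScript error). Other PDF problems usually produce some kind of back-channel output.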

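Note that the Ghostscript message names the xref table as the problem:

   **** Warning:  An error occurred while reading an XREF table.

So I suspect your xref table is incorrect (see also object 0 below).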

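Not broken, but not best practice:

xref entry 0, the head of the linked list of free objects, has an offset of 0000000028; it should be 0.

Your file appears to end with $$EOF instead of %%EOF.

It's common practice to put some binary bytes in a comment on line 2, so that applications are forced to treat the file as binary when transferring it.

It's better to omit the Resources dictionary than to use a null object; it's smaller.

Likewise, it's better to use a single content stream (although recent Adobe engines do produce multiple streams), again because it's smaller.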
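As a small illustration of the binary comment on line 2, here is a Python sketch of a header writer. The function name `pdf_header` is hypothetical, and the four bytes used are just a commonly seen choice; any bytes with codes of 128 or above serve the purpose.

```python
def pdf_header(version: str = "1.3") -> bytes:
    """Return the %PDF line followed by a comment holding four high-bit bytes."""
    # A comment whose bytes are all >= 128 hints to transfer tools that
    # the file is binary, so they should not rewrite line ends.
    marker = b"%" + bytes([0xE2, 0xE3, 0xCF, 0xD3])
    return b"%PDF-" + version.encode("ascii") + b"\n" + marker + b"\n"

header = pdf_header()
print(header)
print(len(header))   # the first object starts at this byte offset
```

Whatever the header contains, its bytes must be counted when computing the xref offsets, since the first object starts right after it.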

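Obviously this is an early work in progress, and I'm sure you'll deal with these in time.

If you post the actual PDF file somewhere, I can take a look.

[EDIT]

So the first problem is that the xref table subsection header is incorrect. The subsection begins with 2 numbers: the initial index and the number of entries in the table. The xref table has 5 entries, starting at index 0 and running to index 4. The subsection says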

  

0 4

Correcting that to 0 5 leads us to the next problem: the /Size entry in the trailer dictionary is 4, and it should be 5.
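Both numbers follow the same rule: they count the free entry 0 as well, so they are one more than the highest object number. A minimal Python sketch of a table writer that derives both from a single value (the function name `xref_and_trailer` is hypothetical; it assumes objects are numbered 1..N with generation 0):

```python
def xref_and_trailer(offsets, root=1):
    """Build the xref table and trailer for objects 1..N at the given byte offsets."""
    size = len(offsets) + 1                  # +1 for the free entry 0
    lines = [b"xref",
             b"0 %d" % size,                 # first index, number of entries
             b"0000000000 65535 f "]         # head of the free list, offset 0
    # Each entry is 19 bytes plus a one-byte EOL, 20 bytes in total.
    lines += [b"%010d 00000 n " % off for off in offsets]
    lines += [b"trailer",
              b"<< /Root %d 0 R /Size %d >>" % (root, size)]
    return b"\n".join(lines) + b"\n"

# The four in-use offsets from the xref table in the question.
table = xref_and_trailer([10, 63, 127, 234])
print(table.decode("ascii"))
```

Deriving the subsection count and /Size from the same variable keeps the two from drifting apart.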

But Ghostscript still complains.

The last problem is that the startxref offset is incorrect. Currently it is:

  

startxref       1581

But the actual byte offset of the 'xref' keyword is 1576.
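The difference, 1581 - 1576 = 5, is exactly the length of "xref" plus one LF, which is consistent with the position being recorded after the keyword was written rather than before (write-xref-table in the question does appear to emit the keyword first). A Python sketch for comparing the declared and actual offsets, with the hypothetical name `startxref_report`:

```python
import re

def startxref_report(pdf: bytes):
    """Return (declared startxref value, actual byte offset of the xref keyword)."""
    keyword = re.search(rb"^xref\r?$", pdf, re.M)      # skips the "startxref" line
    declared = re.search(rb"startxref\s+(\d+)", pdf)
    if keyword is None or declared is None:
        raise ValueError("xref or startxref not found")
    return int(declared.group(1)), keyword.start()

# A tiny file whose writer recorded the position AFTER emitting "xref\n".
body = b"%PDF-1.3\n1 0 obj\n<< >>\nendobj\n"
wrong = len(body) + len(b"xref\n")
pdf = (body
       + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
       + b"trailer\n<< /Root 1 0 R /Size 2 >>\n"
       + b"startxref\n" + str(wrong).encode("ascii") + b"\n%%EOF\n")

declared, actual = startxref_report(pdf)
print(declared, actual, declared - actual)
```

In a writer, the fix is to save the output position before writing the keyword.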

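If I correct all three of these problems, Ghostscript opens the file without complaint. It was already rendering the content (very tiny, because there is no CTM manipulation), but now it no longer needs to repair the file.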