Editing hyperlinks and anchors in a PDF using iTextSharp

Time: 2011-07-05 05:38:29

Tags: hyperlink itextsharp editing

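I am using the iTextSharp library with C#.NET to split my PDF file.

Consider a PDF file named sample.pdf that contains 72 pages. Some pages of sample.pdf contain hyperlinks that navigate to other pages. For example, page 4 has three hyperlinks which, when clicked, navigate to pages 24, 27, and 28 respectively. Like page 4, nearly 12 pages contain hyperlinks of this kind.

Using iTextSharp I have split this PDF into 72 separate files, saved as 1.pdf, 2.pdf, ..., 72.pdf. So when a hyperlink in 4.pdf is clicked, I need the PDF to navigate to 24.pdf, 27.pdf, or 28.pdf.

Please help me. How can I edit and set the hyperlinks in 4.pdf so that they navigate to the corresponding PDF files?

Thank you, Ashok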

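3 Answers:

Answer 0: (score: 6)

What you want is entirely possible, but it requires working with low-level PDF objects (PdfDictionary, PdfArray, and so on).

Whenever someone needs to work with these objects, I always refer them to the PDF Reference. In your case, you will want to look at Chapter 7 (particularly Section 3) and Chapter 12, Sections 3 (document-level navigation) and 5 (annotations).

Assuming you have read those, here is what you need to do: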

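  1. Step through each page's annotation array (in the original document, before breaking it up).
    1. Find all the link annotations and their destinations.
    2. Build a new destination for each link that corresponds to the new file.
    3. Write that new destination into the link annotation.
  2. Write the page to a new PDF using PdfCopy (it will copy the annotations as well as the page contents).

Step 1.1 is not simple. There are several different "local goto" annotation formats, and you need to determine which page a given link points to. Some links may say the PDF equivalent of "next page" or "previous page", while others contain a reference to a specific page. That reference will be an "indirect object reference", not a page number.

To determine the page number from a page reference, you need to... ouch. Okay. The most efficient way is to call PdfReader.GetPageRef(int pageNum) for every page in the original document and cache the results in a map (reference -> pageNum).

You can then build the "remote goto" link by creating a remote goto PdfAction and writing it into the link annotation's "A" (action) entry, removing whatever was there before (probably a "Dest").

I don't speak C# very well, so I'll leave the actual implementation to you.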
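The steps above could be sketched in C# roughly as follows. This is an untested sketch, not a definitive implementation. It assumes iTextSharp 5.x and obtains each page's indirect reference with `PdfReader.GetPageOrigRef`. It handles only explicit destinations (a `/Dest` array, or a `/GoTo` action whose `/D` entry is an array); named destinations and named actions such as "next page" are left untouched. The names `SplitAndRelink`, `Run`, and `Relink`, and the `N.pdf` file naming, are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class SplitAndRelink
{
    // Splits sourcePath into outputDir/1.pdf, 2.pdf, ... after rewriting
    // page links so they open the matching single-page file.
    public static void Run(string sourcePath, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        var reader = new PdfReader(sourcePath);
        try
        {
            int pageCount = reader.NumberOfPages;

            // Cache: object number of each page reference -> 1-based page number.
            // (Generation numbers are ignored in this sketch.)
            var pageByObjectNumber = new Dictionary<int, int>();
            for (int p = 1; p <= pageCount; p++)
                pageByObjectNumber[reader.GetPageOrigRef(p).Number] = p;

            // Pass 1: rewrite link annotations in the reader's in-memory objects.
            for (int p = 1; p <= pageCount; p++)
            {
                PdfArray annots = reader.GetPageN(p).GetAsArray(PdfName.ANNOTS);
                if (annots == null) continue;
                for (int i = 0; i < annots.Size; i++)
                    Relink(annots.GetAsDict(i), pageByObjectNumber);
            }

            // Pass 2: copy each page, with its modified annotations, to its own file.
            for (int p = 1; p <= pageCount; p++)
            {
                using (var fs = new FileStream(Path.Combine(outputDir, p + ".pdf"), FileMode.Create))
                {
                    var doc = new Document();
                    var copy = new PdfCopy(doc, fs);
                    doc.Open();
                    copy.AddPage(copy.GetImportedPage(reader, p));
                    doc.Close();
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }

    private static void Relink(PdfDictionary annot, Dictionary<int, int> pageByObjectNumber)
    {
        if (annot == null || !PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE))) return;

        // The destination sits either directly in /Dest or inside a /GoTo action.
        PdfArray dest = annot.GetAsArray(PdfName.DEST);
        PdfDictionary action = annot.GetAsDict(PdfName.A);
        if (dest == null && action != null && PdfName.GOTO.Equals(action.GetAsName(PdfName.S)))
            dest = action.GetAsArray(PdfName.D);
        if (dest == null || dest.Size == 0) return;

        // The first element of an explicit destination is the page's indirect reference.
        PdfIndirectReference target = dest.GetAsIndirectObject(0);
        int targetPage;
        if (target == null || !pageByObjectNumber.TryGetValue(target.Number, out targetPage)) return;

        // Remote destinations use a zero-based page index; each split file has one page.
        var remoteDest = new PdfArray();
        remoteDest.Add(new PdfNumber(0));
        remoteDest.Add(PdfName.FIT);

        var remote = new PdfDictionary(PdfName.ACTION);
        remote.Put(PdfName.S, PdfName.GOTOR);
        remote.Put(PdfName.F, new PdfString(targetPage + ".pdf"));
        remote.Put(PdfName.D, remoteDest);

        annot.Remove(PdfName.DEST);
        annot.Put(PdfName.A, remote);
    }
}
```

A call such as `SplitAndRelink.Run("sample.pdf", "out")` would then be expected to produce `out/1.pdf` through `out/72.pdf` for the 72-page file in the question. The relative file name in `/F` only resolves if the split files stay in the same folder.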

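Answer 1: (score: 3)

OK, building on what @Mark Storer said, here is some starter code. The first method creates a sample PDF with 10 pages and a few links on the first page that jump to different parts of the PDF, so that we have something to work with. The second method opens the PDF created by the first method and walks through every annotation, trying to work out which page each annotation links to, and writes the result to the TRACE window. The code is in VB but should be easy to convert to C# if needed. It targets iTextSharp 5.1.1.0.

If I get the chance I might take this further and actually split and re-link the pages, but I don't have time right now.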

Option Explicit On
Option Strict On

Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports System.IO

Public Class Form1
    ''//Folder that we are working in
    Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs")
    ''//Sample PDF
    Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Sample.pdf")

    Private Shared Sub CreateSamplePdf()
        ''//Create our output directory if it does not exist
        Directory.CreateDirectory(WorkingFolder)

        ''//Create our sample PDF
        Using Doc As New iTextSharp.text.Document(PageSize.LETTER)
            Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read)
                Using writer = PdfWriter.GetInstance(Doc, FS)
                    Doc.Open()

                    ''//Turn our hyperlinks blue
                    Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE)

                    ''//Create 10 pages with simple labels on them
                    For I = 1 To 10
                        Doc.NewPage()
                        Doc.Add(New Paragraph(String.Format("Page {0}", I)))
                        ''//On the first page add some links
                        If I = 1 Then

                            ''//Go to pages relative to this page
                            Doc.Add(New Paragraph(New Chunk("First Page", BlueFont).SetAction(New PdfAction(PdfAction.FIRSTPAGE))))

                            Doc.Add(New Paragraph(New Chunk("Next Page", BlueFont).SetAction(New PdfAction(PdfAction.NEXTPAGE))))

                            Doc.Add(New Paragraph(New Chunk("Prev Page", BlueFont).SetAction(New PdfAction(PdfAction.PREVPAGE)))) ''//This one does not make sense but is here for completeness

                            Doc.Add(New Paragraph(New Chunk("Last Page", BlueFont).SetAction(New PdfAction(PdfAction.LASTPAGE))))

                            ''//Go to a specific hard-coded page number
                            Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(0), writer))))
                        End If
                    Next
                    Doc.Close()
                End Using
            End Using
        End Using
    End Sub
    Private Shared Sub ListPdfLinks()

        ''//Setup some variables to be used later
        Dim R As PdfReader
        Dim PageCount As Integer
        Dim PageDictionary As PdfDictionary
        Dim Annots As PdfArray

        ''//Open our reader
        R = New PdfReader(BaseFile)
        ''//Get the page count
        PageCount = R.NumberOfPages

        ''//Loop through each page
        For I = 1 To PageCount
            ''//Get the current page
            PageDictionary = R.GetPageN(I)

            ''//Get all of the annotations for the current page
            Annots = PageDictionary.GetAsArray(PdfName.ANNOTS)

            ''//Make sure we have something
            If (Annots Is Nothing) OrElse (Annots.Size = 0) Then Continue For

            ''//Loop through each annotation
            For Each A In Annots.ArrayList

                ''//I do not completely understand this but I think this turns an Indirect Reference into an actual object, but I could be wrong
                ''//Anyway, convert the itext-specific object as a generic PDF object
                Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary)

                ''//Make sure this annotation has a link
                If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For

                ''//Make sure this annotation has an ACTION
                If AnnotationDictionary.Get(PdfName.A) Is Nothing Then Continue For

                ''//Get the ACTION for the current annotation
                Dim AnnotationAction = DirectCast(AnnotationDictionary.Get(PdfName.A), PdfDictionary)

                ''//Test if it is a named action such as /FIRST, /LAST, etc
                If AnnotationAction.Get(PdfName.S).Equals(PdfName.NAMED) Then
                    Trace.Write("GOTO:")
                    If AnnotationAction.Get(PdfName.N).Equals(PdfName.FIRSTPAGE) Then
                        Trace.WriteLine(1)
                    ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.NEXTPAGE) Then
                        Trace.WriteLine(Math.Min(I + 1, PageCount)) ''//Any links that go past the end of the document should just go to the last page
                    ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.LASTPAGE) Then
                        Trace.WriteLine(PageCount)
                    ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.PREVPAGE) Then
                        Trace.WriteLine(Math.Max(I - 1, 1)) ''//Any links that go before the first page should just go to the first page
                    End If


                    ''//Otherwise see if it's a GOTO page action
                ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.GOTO) Then

                    ''//Make sure that it has a destination
                    If AnnotationAction.GetAsArray(PdfName.D) Is Nothing Then Continue For

                    ''//Once again, not completely sure if this is the best route but the ACTION has a sub DESTINATION object that is an Indirect Reference.
                    ''//The code below gets that IR, asks the PdfReader to convert it to an actual page and then loop through all of the pages
                    ''//to see which page the IR points to. Very inefficient but I could not find a way to get the page number based on the IR.

                    ''//AnnotationAction.GetAsArray(PdfName.D) gets the destination
                    ''//AnnotationAction.GetAsArray(PdfName.D).ArrayList(0) get the indirect reference part of the destination (.ArrayList(1) has fitting options)
                    ''//DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference) turns it into a PRIndirectReference
                    ''//The full line gets us an actual page object (actually I think it could be any type of pdf object but I have not tested that).
                    ''//BIG NOTE: This line really should have a bunch more sanity checks in place
                    Dim AnnotationReferencedPage = PdfReader.GetPdfObject(DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference))
                    Trace.Write("GOTO:")
                    ''//Re-loop through all of the pages in the main document comparing them to this page
                    For J = 1 To PageCount
                        If AnnotationReferencedPage.Equals(R.GetPageN(J)) Then
                            Trace.WriteLine(J)
                            Exit For
                        End If
                    Next
                End If
            Next
        Next
    End Sub

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        CreateSamplePdf()
        ListPdfLinks()
        Me.Close()
    End Sub
End Class
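
Since the question asks for C#, the named-action branch of the listing above could be carried over roughly like this. It is a minimal sketch with no iTextSharp dependency: `NamedActionResolver`, `Resolve`, and `TargetFile` are hypothetical helper names, the action name is assumed to be passed without its leading slash, and the `N.pdf` naming follows the question.

```csharp
using System;

public static class NamedActionResolver
{
    // Returns the 1-based page that a named action leads to,
    // or null for names this sketch does not handle.
    public static int? Resolve(string actionName, int currentPage, int pageCount)
    {
        switch (actionName)
        {
            case "FirstPage": return 1;
            case "LastPage": return pageCount;
            // Links past either end of the document are clamped, as in the VB listing.
            case "NextPage": return Math.Min(currentPage + 1, pageCount);
            case "PrevPage": return Math.Max(currentPage - 1, 1);
            default: return null;
        }
    }

    // Maps a page number to the split file name used in the question.
    public static string TargetFile(int page)
    {
        return page + ".pdf";
    }
}
```

For a "NextPage" link on page 4 of a 72-page document, `Resolve` yields 5, so the rewritten link would point to `TargetFile(5)`, that is, `5.pdf`.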

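Answer 2: (score: 0)

The function below uses iTextSharp to:

  1. Open the PDF
  2. Page through the PDF
  3. Check the annotations on each page for ANCHORS
  4. Insert whatever logic you want at this point: update the links, log them, and so on.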

        /// <summary>Inspects PDF files for internal links.
        /// </summary>
        public static void FindPdfDocsWithInternalLinks()
        {
            foreach (var fi in PdfFiles) {
                try {
                    var reader = new PdfReader(fi.FullName);
                    // Pagination
                    for(var i = 1; i <= reader.NumberOfPages; i++) {
                        var pageDict = reader.GetPageN(i);
                        var annotArray = (PdfArray)PdfReader.GetPdfObject(pageDict.Get(PdfName.ANNOTS));
                        if (annotArray == null) continue;
                        if (annotArray.Size <= 0) continue;
                        // check every annotation on the page
                        foreach (var annot in annotArray.ArrayList) {
                            var annotDict = (PdfDictionary)PdfReader.GetPdfObject(annot);
                            if (annotDict == null) continue;
                            var subtype = annotDict.Get(PdfName.SUBTYPE).ToString();
                            if (subtype != "/Link") continue;
                            var linkDict = (PdfDictionary)annotDict.GetDirectObject(PdfName.A);
                            if (linkDict == null) continue;
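                            // if it makes it this far, it's a link annotation with an action,
                            // so we can try to grab its URI (absent for internal GoTo links)
                            var uriObj = linkDict.Get(PdfName.URI);
                            if (uriObj == null) continue;
                            var sUri = uriObj.ToString();
                            if (String.IsNullOrEmpty(sUri)) continue;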
                        }
                    }
                    reader.Close();
                }
                catch (InvalidPdfException e)
                {
                    if (!fi.FullName.Contains("_vti_cnf"))
                        Console.WriteLine("\r\nInvalid PDF Exception\r\nFilename: " + fi.FullName + "\r\nException:\r\n" + e);
                    continue;
                }
                catch (NullReferenceException e) 
                {
                    if (!fi.FullName.Contains("_vti_cnf"))
                        Console.WriteLine("\r\nNull Reference Exception\r\nFilename: " + fi.Name + "\r\nException:\r\n" + e);
                    continue;
                }
            }
    
            // DO WHATEVER YOU WANT HERE
        }
    
Good luck.