Question

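I am trying to transfer data out of a tree node (at least I think that is what it is) that contains far more data than I need. I find the data in the tree node hard to manipulate. I would rather have an array that holds only the data I need to work with.

I would like the array to have the following variables:
1. BookmarkNumber (integer)
2. Date (string)
3. DocumentType (string)
4. BookmarkPageNumberString (string)
5. BookmarkPageNumberInteger (integer)

I would like to build the array defined above from the data in the variable book_mark (which you can see in my code).

I have been wrestling with this for two days. Any help would be greatly appreciated. I am fairly sure the question is not worded well, so please ask questions so that I can explain further where necessary.

Many thanks

What I am trying to do is create a Windows Forms program that splits a PDF file with multiple bookmarks into a discrete PDF file for each bookmark/chapter, saving each file in the correct folder under the correct naming convention. The folder and the naming convention depend on the name of the PDF and on the title of the bookmark/chapter being parsed.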

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using itextsharp.pdfa;
using iTextSharp.awt;
using iTextSharp.testutils;
using iTextSharp.text;
using iTextSharp.xmp;
using iTextSharp.xtra;

namespace WindowsFormsApplication1
{


    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }


        private void ChooseImageFileWrapper_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog1 = new OpenFileDialog();
            openFileDialog1.InitialDirectory = GlobalVariables.InitialDirectory;
            openFileDialog1.Filter = "Pdf Files|*.pdf";
            openFileDialog1.RestoreDirectory = true;
            openFileDialog1.Title = "Image File Wrapper Chooser";

            if (openFileDialog1.ShowDialog() == DialogResult.OK)
            {
                try
                {
                    GlobalVariables.ImageFileWrapperPath = openFileDialog1.FileName;

                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
                }
            }
            ImageFileWrapperPath.Text = GlobalVariables.ImageFileWrapperPath;
        }

        private void ImageFileWrapperPath_TextChanged(object sender, EventArgs e)
        {

        }


        private void button2_Click(object sender, EventArgs e)
        {
            iTextSharp.text.pdf.PdfReader pdfReader = new iTextSharp.text.pdf.PdfReader(GlobalVariables.ImageFileWrapperPath);
            IList<Dictionary<string, object>> book_mark = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(pdfReader);

            List<ImageFileWrapperBookmarks> IFWBookmarks = new List<ImageFileWrapperBookmarks>();
            foreach (Dictionary<string, object> bk in book_mark) // bk is a single instance of book_mark
            {
                ImageFileWrapperBookmarks.BookmarkNumber = ImageFileWrapperBookmarks.BookmarkNumber + 1;
                foreach (KeyValuePair<string, object> kvr in bk) // kvr is the key/value in bk
                {
                    if (kvr.Key == "Kids" || kvr.Key == "kids")
                    {
                        //create recursive program for children
                    }
                    else if (kvr.Key == "Title" || kvr.Key == "title")
                    {

                    }
                    else if (kvr.Key == "Page" || kvr.Key == "page")
                    {

                    }

                }
            }

            MessageBox.Show(GlobalVariables.ImageFileWrapperPath);
        }
    }
}

Answer 1

Here is one way to parse the PDF and create a data structure similar to the one you describe. First, the data structure:

using System.Text.RegularExpressions; // needed for Regex in GetFileName()

public class BookMark
{
    static int _number;
    public BookMark() { Number = ++_number; }
    public int Number { get; private set; }
    public string Title { get; set; }
    public string PageNumberString { get; set; }
    public int PageNumberInteger { get; set; }
    public static void ResetNumber() { _number = 0; }

    // bookmarks title may have illegal filename character(s)
    public string GetFileName()
    {
        var fileTitle = Regex.Replace(
            Regex.Replace(Title, @"\s+", "-"), 
            @"[^-\w]", ""
        );
        return string.Format("{0:D4}-{1}.pdf", Number, fileTitle);
    }
}
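
To make the effect of GetFileName() concrete, here is a minimal standalone sketch of the same two-step cleanup. The class name FileNameDemo, the method name ToFileName, and the sample title are hypothetical and used only for illustration; the two regular expressions are the ones from the class above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // Same cleanup as BookMark.GetFileName(): runs of whitespace become a
    // single hyphen, then every character that is neither a hyphen nor a
    // word character (letter, digit, underscore) is dropped.
    public static string ToFileName(int number, string title)
    {
        var fileTitle = Regex.Replace(
            Regex.Replace(title, @"\s+", "-"),
            @"[^-\w]", ""
        );
        // D4 pads the bookmark number with leading zeros to four digits.
        return string.Format("{0:D4}-{1}.pdf", number, fileTitle);
    }
}

// Example (hypothetical title):
//   FileNameDemo.ToFileName(7, "Chapter 1: Intro/Overview")
//   returns "0007-Chapter-1-IntroOverview.pdf"
```

Zero-padding the number keeps the output files sorted in bookmark order when a folder is listed alphabetically.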

A method that creates a list of BookMark objects (the class above):

List<BookMark> ParseBookMarks(IList<Dictionary<string, object>> bookmarks)
{
    int page;
    var result = new List<BookMark>();
    foreach (var bookmark in bookmarks)
    {
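        // add top-level bookmarks; entries without a "Page" key
        // (for example, bookmarks that only open a URI) are skipped
        object rawPage;
        var stringPage = bookmark.TryGetValue("Page", out rawPage)
            ? rawPage.ToString() : "";
        if (Int32.TryParse(stringPage.Split()[0], out page))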
        {
            result.Add(new BookMark() {
                Title = bookmark["Title"].ToString(),
                PageNumberString = stringPage,
                PageNumberInteger = page
            });
        }

        // recurse
        if (bookmark.ContainsKey("Kids"))
        {
            var kids = bookmark["Kids"] as IList<Dictionary<string, object>>;
            if (kids != null && kids.Count > 0)
            {
                result.AddRange(ParseBookMarks(kids));
            }
        }
    }
    return result;
}
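
The recursion can be tried without a PDF at hand. The sketch below flattens a hand-built tree depth-first, parent before children, in the same way as the method above. The sample tree, the class name BookmarkTreeDemo, and the shape of the "Page" strings (a page number followed by a destination type) are assumptions made for illustration; real output from SimpleBookmark.GetBookmark() may carry additional keys.

```csharp
using System;
using System.Collections.Generic;

public static class BookmarkTreeDemo
{
    // Hypothetical stand-in for the list SimpleBookmark.GetBookmark() returns.
    public static IList<Dictionary<string, object>> SampleTree()
    {
        return new List<Dictionary<string, object>>
        {
            new Dictionary<string, object>
            {
                { "Title", "Chapter 1" },
                { "Page", "1 Fit" },
                { "Kids", new List<Dictionary<string, object>>
                    {
                        new Dictionary<string, object>
                        {
                            { "Title", "Section 1.1" },
                            { "Page", "2 XYZ 0 792 0" }
                        }
                    }
                }
            },
            new Dictionary<string, object>
            {
                { "Title", "Chapter 2" },
                { "Page", "5 FitH 700" }
            }
        };
    }

    // Depth-first flattening: each node is emitted first, then its children.
    public static List<KeyValuePair<string, int>> Flatten(
        IList<Dictionary<string, object>> nodes)
    {
        var result = new List<KeyValuePair<string, int>>();
        foreach (var node in nodes)
        {
            object rawPage;
            int page;
            // The page number is the first whitespace-separated token.
            if (node.TryGetValue("Page", out rawPage)
                && int.TryParse(rawPage.ToString().Split()[0], out page))
            {
                result.Add(new KeyValuePair<string, int>(
                    node["Title"].ToString(), page));
            }

            object rawKids;
            if (node.TryGetValue("Kids", out rawKids))
            {
                var kids = rawKids as IList<Dictionary<string, object>>;
                if (kids != null) result.AddRange(Flatten(kids));
            }
        }
        return result;
    }
}
```

Calling BookmarkTreeDemo.Flatten(BookmarkTreeDemo.SampleTree()) yields three entries in document order: Chapter 1 (page 1), Section 1.1 (page 2), Chapter 2 (page 5).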

Call the method above like this to dump the results to a text file:

void DumpResults(string path)
{
    using (var reader = new PdfReader(path))
    {
        // need this call to parse page numbers
        reader.ConsolidateNamedDestinations();

        var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
        var sb = new StringBuilder();
        foreach (var bookmark in bookmarks)
        {
            sb.AppendLine(string.Format(
                "{0, -4}{1, -100}{2, -25}{3}",
                bookmark.Number, bookmark.Title,
                bookmark.PageNumberString, bookmark.PageNumberInteger
            ));
        }
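        // outputTextFile is assumed to be a field or constant, defined
        // elsewhere, that holds the path of the text file to write
        File.WriteAllText(outputTextFile, sb.ToString());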
    }
}

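The bigger problem is how to extract each bookmark into a separate file. If every bookmark starts a new page, it is easy:

1. Iterate over the list returned by ParseBookMarks().
2. Select the page range that starts at the current bookmark's PageNumberInteger and ends at the next bookmark's PageNumberInteger - 1.
3. Create a separate file from that page range.

Something like this: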

void ProcessPdf(string path)
{
    using (var reader = new PdfReader(path))
    {
        // need this call to parse page numbers
        reader.ConsolidateNamedDestinations();

        var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
        for (int i = 0; i < bookmarks.Count; ++i)
        {
            int page = bookmarks[i].PageNumberInteger;
            int nextPage = i + 1 < bookmarks.Count
                // if not top of page will be missing content
                ? bookmarks[i + 1].PageNumberInteger - 1 

                /* alternative is to potentially add redundant content:
                ? bookmarks[i + 1].PageNumberInteger
                */

                : reader.NumberOfPages;
            string range = string.Format("{0}-{1}", page, nextPage);

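            // DEMO! Only the first ten bookmarks are written out.
            if (i < 10)
            {
                // OUTPUT_DIR is assumed to be a constant, defined elsewhere,
                // that names an existing output folder
                var outputPath = Path.Combine(OUTPUT_DIR, bookmarks[i].GetFileName());
                using (var readerCopy = new PdfReader(reader))
                {
                    // restrict the copy of the reader to this bookmark's pages
                    readerCopy.SelectPages(range);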
                    using (FileStream stream = new FileStream(outputPath, FileMode.Create))
                    {
                        using (var document = new Document())
                        {
                            using (var copy = new PdfCopy(document, stream))
                            {
                                document.Open();
                                int n = readerCopy.NumberOfPages;
                                // imported page numbers are 1-based
                                for (int j = 1; j <= n; ++j)
                                {
                                    copy.AddPage(copy.GetImportedPage(readerCopy, j));
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

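The problem is that it is unlikely that every bookmark sits at the top of a page in the PDF. To see what I mean, experiment with commenting/uncommenting the bookmarks[i + 1].PageNumberInteger lines.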
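
The trade-off between the two end-page choices can be seen without opening a PDF. The sketch below isolates the range arithmetic from ProcessPdf() into a pure function. The names PageRangeDemo, BuildRanges, and includeNextStartPage are hypothetical, and the Math.Max guard is an addition that is not in the code above: it keeps a range from running backwards when two bookmarks start on the same page.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangeDemo
{
    // startPages: first page of each bookmark, in document order.
    // includeNextStartPage = false: stop one page before the next bookmark
    //   (content is lost if the next bookmark starts mid-page).
    // includeNextStartPage = true: also copy the next bookmark's first page
    //   (content may be duplicated across files).
    public static List<string> BuildRanges(
        IList<int> startPages, int totalPages, bool includeNextStartPage)
    {
        var ranges = new List<string>();
        for (int i = 0; i < startPages.Count; ++i)
        {
            int first = startPages[i];
            int last;
            if (i + 1 < startPages.Count)
            {
                last = includeNextStartPage
                    ? startPages[i + 1]
                    : startPages[i + 1] - 1;
            }
            else
            {
                // the last bookmark runs to the end of the document
                last = totalPages;
            }
            // Added guard (an assumption, not in the original code).
            last = Math.Max(first, last);
            ranges.Add(string.Format("{0}-{1}", first, last));
        }
        return ranges;
    }
}
```

For bookmarks starting on pages 1, 4, and 9 of a 12-page document, the first variant gives 1-3, 4-8, 9-12 and the second gives 1-4, 4-9, 9-12. The strings are in the same "first-last" form that ProcessPdf() passes to SelectPages().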

How to create an array and populate it from a tree node variable
