How to create an array and populate it from a tree node variable

Time: 2016-03-19 13:45:34

Tags: c# arrays pdf tree itextsharp

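I am trying to transfer data out of a treenode (at least I think that is what it is), which contains far more data than I need. I am having a hard time manipulating the data inside the treenode. I would rather have an array that holds only the data I need for further manipulation.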

I would like the array to hold the following variables:

  1. BookmarkNumber (integer)
  2. Date (string)
  3. DocumentType (string)
  4. BookmarkPageNumberString (string)
  5. BookmarkPageNumberInteger (integer)

I would like to populate the array defined above from the data in the variable book_mark (which can be seen in my code).

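I have been wrestling with this for two days. Any help would be greatly appreciated. I am fairly sure the question is not worded well, so please ask questions so that I can explain further where necessary.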

Thanks very much

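What I am trying to do is create a Windows Forms program that splits a PDF file with multiple bookmarks into a discrete PDF file for each bookmark/chapter, while saving each one in the correct folder under the correct naming convention. The folder and the naming convention depend on the name of the PDF being parsed and on the title of the bookmark/chapter.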

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using itextsharp.pdfa;
using iTextSharp.awt;
using iTextSharp.testutils;
using iTextSharp.text;
using iTextSharp.xmp;
using iTextSharp.xtra;

namespace WindowsFormsApplication1
{


    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }


        private void ChooseImageFileWrapper_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog1 = new OpenFileDialog();
            openFileDialog1.InitialDirectory = GlobalVariables.InitialDirectory;
            openFileDialog1.Filter = "Pdf Files|*.pdf";
            openFileDialog1.RestoreDirectory = true;
            openFileDialog1.Title = "Image File Wrapper Chooser";

            if (openFileDialog1.ShowDialog() == DialogResult.OK)
            {
                try
                {
                    GlobalVariables.ImageFileWrapperPath = openFileDialog1.FileName;

                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
                }
            }
            ImageFileWrapperPath.Text = GlobalVariables.ImageFileWrapperPath;
        }

        private void ImageFileWrapperPath_TextChanged(object sender, EventArgs e)
        {

        }


        private void button2_Click(object sender, EventArgs e)
        {
            iTextSharp.text.pdf.PdfReader pdfReader = new iTextSharp.text.pdf.PdfReader(GlobalVariables.ImageFileWrapperPath);
            IList<Dictionary<string, object>> book_mark = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(pdfReader);

            List<ImageFileWrapperBookmarks> IFWBookmarks = new List<ImageFileWrapperBookmarks>();
            foreach (Dictionary<string, object> bk in book_mark) // bk is a single instance of book_mark
            {
                ImageFileWrapperBookmarks.BookmarkNumber = ImageFileWrapperBookmarks.BookmarkNumber + 1;
                foreach (KeyValuePair<string, object> kvr in bk) // kvr is the key/value in bk
                {
                    if (kvr.Key == "Kids" || kvr.Key == "kids")
                    {
                        //create recursive program for children
                    }
                    else if (kvr.Key == "Title" || kvr.Key == "title")
                    {

                    }
                    else if (kvr.Key == "Page" || kvr.Key == "page")
                    {

                    }

                }
            }

            MessageBox.Show(GlobalVariables.ImageFileWrapperPath);
        }
    }
}

1 Answer:

Answer 0 (score: 0)

Here is one way to parse the PDF and create a data structure similar to the one you describe. First, the data structure:

using System.Text.RegularExpressions; // needed for Regex in GetFileName()

public class BookMark
{
    static int _number;
    public BookMark() { Number = ++_number; }
    public int Number { get; private set; }
    public string Title { get; set; }
    public string PageNumberString { get; set; }
    public int PageNumberInteger { get; set; }
    public static void ResetNumber() { _number = 0; }

    // a bookmark's title may contain characters that are illegal in file names
    public string GetFileName()
    {
        var fileTitle = Regex.Replace(
            Regex.Replace(Title, @"\s+", "-"), 
            @"[^-\w]", ""
        );
        return string.Format("{0:D4}-{1}.pdf", Number, fileTitle);
    }
}
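
To make the two-step cleanup in GetFileName() easier to follow, here is a minimal standalone sketch of the same idea. The class name FileNameDemo, the helper ToFileName, and the sample title are hypothetical and exist only for this illustration; they are not part of the answer's code.

```csharp
using System;
using System.Text.RegularExpressions;

static class FileNameDemo
{
    // Same two-step cleanup as BookMark.GetFileName(), as a standalone helper.
    // ToFileName is a hypothetical name used only for this illustration.
    public static string ToFileName(int number, string title)
    {
        // step 1: collapse each run of whitespace into a single hyphen
        var hyphenated = Regex.Replace(title, @"\s+", "-");
        // step 2: drop everything that is not a hyphen or a word character
        var fileTitle = Regex.Replace(hyphenated, @"[^-\w]", "");
        // zero-pad the bookmark number to four digits so files sort in order
        return string.Format("{0:D4}-{1}.pdf", number, fileTitle);
    }

    static void Main()
    {
        // prints 0001-Chapter-1-IntroOverview.pdf
        Console.WriteLine(ToFileName(1, "Chapter 1: Intro/Overview"));
    }
}
```

Note that two different titles can collapse to the same cleaned text; the leading bookmark number is what keeps the resulting file names distinct.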

The method that creates a list of BookMark objects (the class above):

List<BookMark> ParseBookMarks(IList<Dictionary<string, object>> bookmarks)
{
    int page;
    var result = new List<BookMark>();
    foreach (var bookmark in bookmarks)
    {
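        // add top-level bookmarks; entries without a "Page" key
        // (for example bookmarks that do not point at a page) are skipped
        object pageValue;
        if (bookmark.TryGetValue("Page", out pageValue))
        {
            var stringPage = pageValue.ToString();
            // the page number is the first whitespace-separated token
            if (Int32.TryParse(stringPage.Split()[0], out page))
            {
                result.Add(new BookMark() {
                    Title = bookmark["Title"].ToString(),
                    PageNumberString = stringPage,
                    PageNumberInteger = page
                });
            }
        }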

        // recurse
        if (bookmark.ContainsKey("Kids"))
        {
            var kids = bookmark["Kids"] as IList<Dictionary<string, object>>;
            if (kids != null && kids.Count > 0)
            {
                result.AddRange(ParseBookMarks(kids));
            }
        }
    }
    return result;
}
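
For readers who want to see the shape of the data being walked without opening a PDF, the following sketch runs the same recursion over hand-built dictionaries. It assumes that a "Page" value looks like "1 XYZ 0 792 0" or "3 Fit" (page number first, then the destination type). The sample titles, the class name BookmarkShapeDemo, and the helpers Flatten and Sample are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class BookmarkShapeDemo
{
    // Flattens nested bookmark dictionaries into (title, page) pairs,
    // parents first, then their children.
    public static List<KeyValuePair<string, int>> Flatten(
        IList<Dictionary<string, object>> bookmarks)
    {
        var result = new List<KeyValuePair<string, int>>();
        foreach (var bookmark in bookmarks)
        {
            object pageValue;
            int page;
            // assumption: "Page" looks like "3 XYZ 0 792 0", page number first
            if (bookmark.TryGetValue("Page", out pageValue)
                && int.TryParse(pageValue.ToString().Split()[0], out page))
            {
                result.Add(new KeyValuePair<string, int>(
                    bookmark["Title"].ToString(), page));
            }

            // recurse into child bookmarks, if there are any
            object kidsValue;
            if (bookmark.TryGetValue("Kids", out kidsValue))
            {
                var kids = kidsValue as IList<Dictionary<string, object>>;
                if (kids != null) result.AddRange(Flatten(kids));
            }
        }
        return result;
    }

    // Hand-built sample data (hypothetical titles and destinations).
    public static IList<Dictionary<string, object>> Sample()
    {
        var child = new Dictionary<string, object> {
            { "Title", "1.1 Scope" }, { "Page", "3 Fit" }
        };
        var parent = new Dictionary<string, object> {
            { "Title", "1 Introduction" },
            { "Page", "1 XYZ 0 792 0" },
            { "Kids", new List<Dictionary<string, object>> { child } }
        };
        return new List<Dictionary<string, object>> { parent };
    }

    static void Main()
    {
        foreach (var entry in Flatten(Sample()))
            Console.WriteLine("{0} -> page {1}", entry.Key, entry.Value);
    }
}
```

With the sample data, the parent entry is listed before its child, which matches the order in which ParseBookMarks() adds items to its result.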

Call the method above like this to dump the results to a text file:

void DumpResults(string path)
{
    using (var reader = new PdfReader(path))
    {
        // need this call to parse page numbers
        reader.ConsolidateNamedDestinations();

        var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
        var sb = new StringBuilder();
        foreach (var bookmark in bookmarks)
        {
            sb.AppendLine(string.Format(
                "{0, -4}{1, -100}{2, -25}{3}",
                bookmark.Number, bookmark.Title,
                bookmark.PageNumberString, bookmark.PageNumberInteger
            ));
        }
        // outputTextFile is assumed to be a field holding the path of the text file to write
        File.WriteAllText(outputTextFile, sb.ToString());
    }
}

The bigger problem is how to extract each Bookmark into a separate file. If every Bookmark starts a new page, it is easy:

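  1. Iterate over the return value of ParseBookMarks().
  2. Select the page range that starts at the current BookMark.PageNumberInteger and ends at the next BookMark.PageNumberInteger - 1.
  3. Use that page range to create a separate file.

Something like this: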

    void ProcessPdf(string path)
    {
        using (var reader = new PdfReader(path))
        {
            // need this call to parse page numbers
            reader.ConsolidateNamedDestinations();
    
            var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
            for (int i = 0; i < bookmarks.Count; ++i)
            {
                int page = bookmarks[i].PageNumberInteger;
                int nextPage = i + 1 < bookmarks.Count
                    // if the next bookmark is not at the top of its page, content will be missing
                    ? bookmarks[i + 1].PageNumberInteger - 1 
    
                    /* alternative is to potentially add redundant content:
                    ? bookmarks[i + 1].PageNumberInteger
                    */
    
                    : reader.NumberOfPages;
                string range = string.Format("{0}-{1}", page, nextPage);
    
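                // DEMO! only the first 10 bookmarks are written out
                if (i < 10)
                {
                    // OUTPUT_DIR is assumed to be a constant naming an existing output folder
                    var outputPath = Path.Combine(OUTPUT_DIR, bookmarks[i].GetFileName());
                    using (var readerCopy = new PdfReader(reader))
                    {
                        // keep only the pages that belong to this bookmark
                        readerCopy.SelectPages(range);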
                        using (FileStream stream = new FileStream(outputPath, FileMode.Create))
                        {
                            using (var document = new Document())
                            {
                                using (var copy = new PdfCopy(document, stream))
                                {
                                    document.Open();
                                    int n = readerCopy.NumberOfPages;
                                    for (int j = 0; j < n; )
                                    {
                                        copy.AddPage(copy.GetImportedPage(readerCopy, ++j));
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    

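The problem is that it is very unlikely that every bookmark sits at the top of a page in the PDF. To see what I mean, try commenting/uncommenting the bookmarks[i + 1].PageNumberInteger line.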
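
The trade-off between the two choices can be seen without any PDF at all. The sketch below only computes the page ranges, assuming the start pages are already sorted in ascending order. The class name PageRangeDemo, the helper BuildRanges, its shareBoundaryPage flag, and the clamp for bookmarks that share a page are hypothetical additions, not part of the code above.

```csharp
using System;
using System.Collections.Generic;

static class PageRangeDemo
{
    // Builds one "first-last" range per bookmark start page.
    // shareBoundaryPage = false: stop one page before the next bookmark
    //   (content is lost if that bookmark starts mid-page).
    // shareBoundaryPage = true: also include the page on which the next
    //   bookmark starts (that page then appears in two output files).
    public static List<string> BuildRanges(
        IList<int> startPages, int numberOfPages, bool shareBoundaryPage)
    {
        var ranges = new List<string>();
        for (int i = 0; i < startPages.Count; ++i)
        {
            int first = startPages[i];
            int last;
            if (i + 1 < startPages.Count)
            {
                last = shareBoundaryPage
                    ? startPages[i + 1]
                    : startPages[i + 1] - 1;
            }
            else
            {
                // the last bookmark runs to the end of the document
                last = numberOfPages;
            }

            // two bookmarks on the same page would otherwise give last < first
            if (last < first) last = first;

            ranges.Add(string.Format("{0}-{1}", first, last));
        }
        return ranges;
    }

    static void Main()
    {
        var starts = new List<int> { 1, 4, 4, 9 };
        // prints 1-3, 4-4, 4-8, 9-12
        Console.WriteLine(string.Join(", ", BuildRanges(starts, 12, false)));
        // prints 1-4, 4-4, 4-9, 9-12
        Console.WriteLine(string.Join(", ", BuildRanges(starts, 12, true)));
    }
}
```

Each resulting string has the "first-last" form that the code above passes to SelectPages().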