我试图从treenode传输数据(至少我认为它是什么),其中包含的数据远远超过我的需要。我很难操纵treenode中的数据。我宁愿有一个数组,只为我提供数据操作所需的数据。
我希望更高的费率有以下变量: 1. BookmarkNumber(整数) 2.日期(字符串) 3. DocumentType(字符串) 4. BookmarkPageNumberString(string) 5. BookmarkPageNumberInteger(整数)
我想从变量book_mark的数据中得到上面定义的比率(在我的代码中可以看到)。
我已经和他摔跤了两天。任何帮助将非常感激。我可能确定这个问题没有正确表达,所以请提出问题,以便我可以在必要时进一步解释。
非常感谢
我想尝试做的是创建一个Windows窗体程序,该程序将具有多个书签的PDF文件解析为每个书签/章节的离散PDF文件,同时使用正确的命名约定将书签保存在正确的文件夹中,文件夹和命名约定取决于要解析的书签/章节的PDF名称和标题名称。using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using itextsharp.pdfa;
using iTextSharp.awt;
using iTextSharp.testutils;
using iTextSharp.text;
using iTextSharp.xmp;
using iTextSharp.xtra;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void ChooseImageFileWrapper_Click(object sender, EventArgs e)
{
OpenFileDialog openFileDialog1 = new OpenFileDialog();
openFileDialog1.InitialDirectory = GlobalVariables.InitialDirectory;
openFileDialog1.Filter = "Pdf Files|*.pdf";
openFileDialog1.RestoreDirectory = true;
openFileDialog1.Title = "Image File Wrapper Chooser";
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
try
{
GlobalVariables.ImageFileWrapperPath = openFileDialog1.FileName;
}
catch (Exception ex)
{
MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
}
}
ImageFileWrapperPath.Text = GlobalVariables.ImageFileWrapperPath;
}
private void ImageFileWrapperPath_TextChanged(object sender, EventArgs e)
{
}
private void button2_Click(object sender, EventArgs e)
{
iTextSharp.text.pdf.PdfReader pdfReader = new iTextSharp.text.pdf.PdfReader(GlobalVariables.ImageFileWrapperPath);
IList<Dictionary<string, object>> book_mark = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(pdfReader);
List<ImageFileWrapperBookmarks> IFWBookmarks = new List<ImageFileWrapperBookmarks>();
foreach (Dictionary<string, object> bk in book_mark) // bk is a single instance of book_mark
{
ImageFileWrapperBookmarks.BookmarkNumber = ImageFileWrapperBookmarks.BookmarkNumber + 1;
foreach (KeyValuePair<string, object> kvr in bk) // kvr is the key/value in bk
{
if (kvr.Key == "Kids" || kvr.Key == "kids")
{
//create recursive program for children
}
else if (kvr.Key == "Title" || kvr.Key == "title")
{
}
else if (kvr.Key == "Page" || kvr.Key == "page")
{
}
}
}
MessageBox.Show(GlobalVariables.ImageFileWrapperPath);
}
}
}
答案 0 :(得分:0)
这是解析PDF并创建类似于您描述的数据结构的一种方法。首先是数据结构:
public class BookMark
{
static int _number;
public BookMark() { Number = ++_number; }
public int Number { get; private set; }
public string Title { get; set; }
public string PageNumberString { get; set; }
public int PageNumberInteger { get; set; }
public static void ResetNumber() { _number = 0; }
// bookmarks title may have illegal filename character(s)
public string GetFileName()
{
var fileTitle = Regex.Replace(
Regex.Replace(Title, @"\s+", "-"),
@"[^-\w]", ""
);
return string.Format("{0:D4}-{1}.pdf", Number, fileTitle);
}
}
创建Bookmark
(上图)列表的方法:
List<BookMark> ParseBookMarks(IList<Dictionary<string, object>> bookmarks)
{
int page;
var result = new List<BookMark>();
foreach (var bookmark in bookmarks)
{
// add top-level bookmarks
var stringPage = bookmark["Page"].ToString();
if (Int32.TryParse(stringPage.Split()[0], out page))
{
result.Add(new BookMark() {
Title = bookmark["Title"].ToString(),
PageNumberString = stringPage,
PageNumberInteger = page
});
}
// recurse
if (bookmark.ContainsKey("Kids"))
{
var kids = bookmark["Kids"] as IList<Dictionary<string, object>>;
if (kids != null && kids.Count > 0)
{
result.AddRange(ParseBookMarks(kids));
}
}
}
return result;
}
像这样调用上面的方法将结果转储到文本文件中:
void DumpResults(string path)
{
using (var reader = new PdfReader(path))
{
// need this call to parse page numbers
reader.ConsolidateNamedDestinations();
var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
var sb = new StringBuilder();
foreach (var bookmark in bookmarks)
{
sb.AppendLine(string.Format(
"{0, -4}{1, -100}{2, -25}{3}",
bookmark.Number, bookmark.Title,
bookmark.PageNumberString, bookmark.PageNumberInteger
));
}
File.WriteAllText(outputTextFile, sb.ToString());
}
}
更大的问题是如何将每个Bookmark
提取到一个单独的文件中。如果每个 Bookmark
开始新页面,那么很容易:
ParseBookMarks()
BookMark.Number
开头的页面范围,并以 next BookMark.Number - 1
这样的事情:
void ProcessPdf(string path)
{
using (var reader = new PdfReader(path))
{
// need this call to parse page numbers
reader.ConsolidateNamedDestinations();
var bookmarks = ParseBookMarks(SimpleBookmark.GetBookmark(reader));
for (int i = 0; i < bookmarks.Count; ++i)
{
int page = bookmarks[i].PageNumberInteger;
int nextPage = i + 1 < bookmarks.Count
// if not top of page will be missing content
? bookmarks[i + 1].PageNumberInteger - 1
/* alternative is to potentially add redundant content:
? bookmarks[i + 1].PageNumberInteger
*/
: reader.NumberOfPages;
string range = string.Format("{0}-{1}", page, nextPage);
// DEMO!
if (i < 10)
{
var outputPath = Path.Combine(OUTPUT_DIR, bookmarks[i].GetFileName());
using (var readerCopy = new PdfReader(reader))
{
var number = bookmarks[i].Number;
readerCopy.SelectPages(range);
using (FileStream stream = new FileStream(outputPath, FileMode.Create))
{
using (var document = new Document())
{
using (var copy = new PdfCopy(document, stream))
{
document.Open();
int n = readerCopy.NumberOfPages;
for (int j = 0; j < n; )
{
copy.AddPage(copy.GetImportedPage(readerCopy, ++j));
}
}
}
}
}
}
}
}
}
问题在于,所有书签都不太可能出现在PDF的每个页面的顶部。要了解我的意思,请尝试评论/取消注释bookmarks[i + 1].PageNumberInteger
行。