我想使用iTextSharp将以下HTML转换为PDF,但不知道从哪里开始:
<style>
.headline{font-size:200%}
</style>
<p>
This <em>is </em>
<span class="headline" style="text-decoration: underline;">some</span>
<strong>sample<em> text</em></strong>
<span style="color: red;">!!!</span>
</p>
答案 0 :(得分:137)
首先,尽管HTML和PDF是在同一时间创建的,但它们并不相关。 HTML旨在传达更高级别的信息,例如段落和表格。虽然有控制它的方法,但最终由浏览器来绘制这些更高级别的概念。 PDF旨在传达文档,文档必须“看起来”无论在何处呈现都是相同的。
在HTML文档中,您可能有一个100%宽的段落,根据显示器的宽度,它可能需要2行或10行,当您打印时,它可能是7行,当您在手机可能需要20行。但是,PDF文件必须独立于渲染设备,因此无论您的屏幕尺寸如何,必须始终呈现完全相同的颜色。
由于上面的必须,PDF不支持“表格”或“段落”等抽象内容。 PDF支持三种基本内容:文本,线条/形状和图像。 (还有其他一些东西,比如注释和电影,但我试图在这里保持简单。)在PDF中你没有说“这是一个段落,浏览器做你的事!”。相反,你说,“使用这个确切的字体在这个确切的X,Y位置绘制这个文本,不要担心,我之前已经计算过文本的宽度,所以我知道它将全部适合这一行”。你也不要说“这是一张桌子”,而是你说“在这个确切的位置画这个文字,然后在我之前计算过的另一个确切位置画一个矩形,所以我知道它会出现在文本周围”
其次,iText和iTextSharp解析HTML和CSS。而已。 ASP.Net,MVC,Razor,Struts,Spring等都是HTML框架,但iText / iTextSharp 100%不知道它们。与DataGridViews,Repeater,Templates,Views等相同,它们都是特定于框架的抽象。从你选择的框架中获取HTML是你的责任,iText对你没有帮助。如果您收到The document has no pages
的异常,或者您认为“iText未解析我的HTML”,那么您几乎可以确定don't actually have HTML,您只会认为这样做。
第三,多年来一直存在的内置课程是HTMLWorker
但是已经被XMLWorker
(Java / .Net取代了。在HTMLWorker
上正在进行零工作,该工作不支持CSS文件,并且对最基本的CSS属性的支持有限,实际上breaks on certain tags。如果您没有看到HTML attribute or CSS property and value in this file,那么HTMLWorker
可能不支持XMLWorker
。 HTMLWorker
有时可能会更复杂,但这些并发症也会make it more extensible。
下面是C#代码,它显示了如何将HTML标记解析为iText抽象,这些抽象会自动添加到您正在处理的文档中。 C#和Java非常相似,因此转换它应该相对容易。 Example#1使用内置的class="headline"
来解析HTML字符串。由于只支持内联样式,XMLWorker
会被忽略,但其他所有内容都应该有效。示例#2与第一个示例相同,只是它使用//Create a byte array that will eventually hold our final PDF
Byte[] bytes;
//Boilerplate iTextSharp setup here
//Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream()) {
//Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
using (var doc = new Document()) {
//Create a writer that's bound to our PDF abstraction and our stream
using (var writer = PdfWriter.GetInstance(doc, ms)) {
//Open the document for writing
doc.Open();
//Our sample HTML and CSS
var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>";
var example_css = @".headline{font-size:200%}";
/**************************************************
* Example #1 *
* *
* Use the built-in HTMLWorker to parse the HTML. *
* Only inline CSS is supported. *
* ************************************************/
//Create a new HTMLWorker bound to our document
using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc)) {
//HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
using (var sr = new StringReader(example_html)) {
//Parse the HTML
htmlWorker.Parse(sr);
}
}
/**************************************************
* Example #2 *
* *
* Use the XMLWorker to parse the HTML. *
* Only inline CSS and absolutely linked *
* CSS is supported *
* ************************************************/
//XMLWorker also reads from a TextReader and not directly from a string
using (var srHtml = new StringReader(example_html)) {
//Parse the HTML
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
}
/**************************************************
* Example #3 *
* *
* Use the XMLWorker to parse HTML and CSS *
* ************************************************/
//In order to read CSS as a string we need to switch to a different constructor
//that takes Streams instead of TextReaders.
//Below we convert the strings into UTF8 byte array and wrap those in MemoryStreams
using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css))) {
using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html))) {
//Parse the HTML
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
}
}
doc.Close();
}
}
//After all of the PDF "stuff" above is done and closed but **before** we
//close the MemoryStream, grab all of the active bytes from the stream
bytes = ms.ToArray();
}
//Now we just need to do something with those bytes.
//Here I'm writing them to disk but if you were in ASP.Net you might Response.BinaryWrite() them.
//You could also write the bytes to a database in a varbinary() column (but please don't) or you
//could pass them to another function for further PDF processing.
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);
代替。 Example#3也解析了简单的CSS示例。
{{1}}
HTML-to-PDF要求有好消息。作为this answer showed, W3C标准css-break-3将解决问题 ......这是一份候选建议书,计划在测试后成为今年的最终建议书。
由于不是那么标准,有一些解决方案,带有C#插件,如print-css.rocks所示。
答案 1 :(得分:9)
@Chris Haas非常清楚地解释了如何使用itextSharp
将HTML
转换为PDF
,非常有帮助
我的补充是:
通过使用HtmlTextWriter
我将html标签放在HTML
表+内联CSS中,我得到了我想要的PDF而不使用XMLWorker
。
修改:添加示例代码:
ASPX页面:
<asp:Panel runat="server" ID="PendingOrdersPanel">
<!-- to be shown on PDF-->
<table style="border-spacing: 0;border-collapse: collapse;width:100%;display:none;" >
<tr><td><img src="abc.com/webimages/logo1.png" style="display: none;" width="230" /></td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:11px;color:#10466E;padding:0px;text-align:center;"><i>blablabla</i> Pending orders report<br /></td></tr>
</table>
<asp:GridView runat="server" ID="PendingOrdersGV" RowStyle-Wrap="false" AllowPaging="true" PageSize="10" Width="100%" CssClass="Grid" AlternatingRowStyle-CssClass="alt" AutoGenerateColumns="false"
PagerStyle-CssClass="pgr" HeaderStyle-ForeColor="White" PagerStyle-HorizontalAlign="Center" HeaderStyle-HorizontalAlign="Center" RowStyle-HorizontalAlign="Center" DataKeyNames="Document#"
OnPageIndexChanging="PendingOrdersGV_PageIndexChanging" OnRowDataBound="PendingOrdersGV_RowDataBound" OnRowCommand="PendingOrdersGV_RowCommand">
<EmptyDataTemplate><div style="text-align:center;">no records found</div></EmptyDataTemplate>
<Columns>
<asp:ButtonField CommandName="PendingOrders_Details" DataTextField="Document#" HeaderText="Document #" SortExpression="Document#" ItemStyle-ForeColor="Black" ItemStyle-Font-Underline="true"/>
<asp:BoundField DataField="Order#" HeaderText="order #" SortExpression="Order#"/>
<asp:BoundField DataField="Order Date" HeaderText="Order Date" SortExpression="Order Date" DataFormatString="{0:d}"></asp:BoundField>
<asp:BoundField DataField="Status" HeaderText="Status" SortExpression="Status"></asp:BoundField>
<asp:BoundField DataField="Amount" HeaderText="Amount" SortExpression="Amount" DataFormatString="{0:C2}"></asp:BoundField>
</Columns>
</asp:GridView>
</asp:Panel>
C#代码:
protected void PendingOrdersPDF_Click(object sender, EventArgs e)
{
if (PendingOrdersGV.Rows.Count > 0)
{
//to allow paging=false & change style.
PendingOrdersGV.HeaderStyle.ForeColor = System.Drawing.Color.Black;
PendingOrdersGV.BorderColor = Color.Gray;
PendingOrdersGV.Font.Name = "Tahoma";
PendingOrdersGV.DataSource = clsBP.get_PendingOrders(lbl_BP_Id.Text);
PendingOrdersGV.AllowPaging = false;
PendingOrdersGV.Columns[0].Visible = false; //export won't work if there's a link in the gridview
PendingOrdersGV.DataBind();
//to PDF code --Sam
string attachment = "attachment; filename=report.pdf";
Response.ClearContent();
Response.AddHeader("content-disposition", attachment);
Response.ContentType = "application/pdf";
StringWriter stw = new StringWriter();
HtmlTextWriter htextw = new HtmlTextWriter(stw);
htextw.AddStyleAttribute("font-size", "8pt");
htextw.AddStyleAttribute("color", "Grey");
PendingOrdersPanel.RenderControl(htextw); //Name of the Panel
Document document = new Document();
document = new Document(PageSize.A4, 5, 5, 15, 5);
FontFactory.GetFont("Tahoma", 50, iTextSharp.text.BaseColor.BLUE);
PdfWriter.GetInstance(document, Response.OutputStream);
document.Open();
StringReader str = new StringReader(stw.ToString());
HTMLWorker htmlworker = new HTMLWorker(document);
htmlworker.Parse(str);
document.Close();
Response.Write(document);
}
}
当然包括iTextSharp参考文件到cs文件
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;
using iTextSharp.tool.xml;
希望这有帮助!
谢谢
答案 2 :(得分:6)
从2018年开始,还有 iText7 (旧的iTextSharp库的下一个迭代版本)及其HTML到PDF包: itext7.pdfhtml
用法很简单:
HtmlConverter.ConvertToPdf(
new FileInfo(@"Path\to\Html\File.html"),
new FileInfo(@"Path\to\Pdf\File.pdf")
);
方法有很多重载。
更新:iText *产品系列具有dual licensing model:免费开放源代码,用于商业用途。
答案 3 :(得分:0)
我使用以下代码创建PDF
protected void CreatePDF(Stream stream)
{
using (var document = new Document(PageSize.A4, 40, 40, 40, 30))
{
var writer = PdfWriter.GetInstance(document, stream);
writer.PageEvent = new ITextEvents();
document.Open();
// instantiate custom tag processor and add to `HtmlPipelineContext`.
var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
tagProcessorFactory.AddProcessor(
new TableProcessor(),
new string[] { HTML.Tag.TABLE }
);
//Register Fonts.
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.Register(HttpContext.Current.Server.MapPath("~/Content/Fonts/GothamRounded-Medium.ttf"), "Gotham Rounded Medium");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
var htmlPipelineContext = new HtmlPipelineContext(cssAppliers);
htmlPipelineContext.SetTagFactory(tagProcessorFactory);
var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);
// get an ICssResolver and add the custom CSS
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
cssResolver.AddCss(CSSSource, "utf-8", true);
var cssResolverPipeline = new CssResolverPipeline(
cssResolver, htmlPipeline
);
var worker = new XMLWorker(cssResolverPipeline, true);
var parser = new XMLParser(worker);
using (var stringReader = new StringReader(HTMLSource))
{
parser.Parse(stringReader);
document.Close();
HttpContext.Current.Response.ContentType = "application /pdf";
if (base.View)
HttpContext.Current.Response.AddHeader("content-disposition", "inline;filename=\"" + OutputFileName + ".pdf\"");
else
HttpContext.Current.Response.AddHeader("content-disposition", "attachment;filename=\"" + OutputFileName + ".pdf\"");
HttpContext.Current.Response.Cache.SetCacheability(HttpCacheability.NoCache);
HttpContext.Current.Response.WriteFile(OutputPath);
HttpContext.Current.Response.End();
}
}
}
答案 4 :(得分:-5)
这是我用作指南的链接。希望这有帮助!
Converting HTML to PDF using ITextSharp
protected void Page_Load(object sender, EventArgs e)
{
try
{
string strHtml = string.Empty;
//HTML File path -http://aspnettutorialonline.blogspot.com/
string htmlFileName = Server.MapPath("~") + "\\files\\" + "ConvertHTMLToPDF.htm";
//pdf file path. -http://aspnettutorialonline.blogspot.com/
string pdfFileName = Request.PhysicalApplicationPath + "\\files\\" + "ConvertHTMLToPDF.pdf";
//reading html code from html file
FileStream fsHTMLDocument = new FileStream(htmlFileName, FileMode.Open, FileAccess.Read);
StreamReader srHTMLDocument = new StreamReader(fsHTMLDocument);
strHtml = srHTMLDocument.ReadToEnd();
srHTMLDocument.Close();
strHtml = strHtml.Replace("\r\n", "");
strHtml = strHtml.Replace("\0", "");
CreatePDFFromHTMLFile(strHtml, pdfFileName);
Response.Write("pdf creation successfully with password -http://aspnettutorialonline.blogspot.com/");
}
catch (Exception ex)
{
Response.Write(ex.Message);
}
}
public void CreatePDFFromHTMLFile(string HtmlStream, string FileName)
{
try
{
object TargetFile = FileName;
string ModifiedFileName = string.Empty;
string FinalFileName = string.Empty;
/* To add a Password to PDF -http://aspnettutorialonline.blogspot.com/ */
TestPDF.HtmlToPdfBuilder builder = new TestPDF.HtmlToPdfBuilder(iTextSharp.text.PageSize.A4);
TestPDF.HtmlPdfPage first = builder.AddPage();
first.AppendHtml(HtmlStream);
byte[] file = builder.RenderPdf();
File.WriteAllBytes(TargetFile.ToString(), file);
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(TargetFile.ToString());
ModifiedFileName = TargetFile.ToString();
ModifiedFileName = ModifiedFileName.Insert(ModifiedFileName.Length - 4, "1");
string password = "password";
iTextSharp.text.pdf.PdfEncryptor.Encrypt(reader, new FileStream(ModifiedFileName, FileMode.Append), iTextSharp.text.pdf.PdfWriter.STRENGTH128BITS, password, "", iTextSharp.text.pdf.PdfWriter.AllowPrinting);
//http://aspnettutorialonline.blogspot.com/
reader.Close();
if (File.Exists(TargetFile.ToString()))
File.Delete(TargetFile.ToString());
FinalFileName = ModifiedFileName.Remove(ModifiedFileName.Length - 5, 1);
File.Copy(ModifiedFileName, FinalFileName);
if (File.Exists(ModifiedFileName))
File.Delete(ModifiedFileName);
}
catch (Exception ex)
{
throw ex;
}
}
您可以下载示例文件。只需将要转换的html
放在files
文件夹中即可运行。它将自动生成pdf文件并将其放在同一文件夹中。但在您的情况下,您可以在htmlFileName
变量中指定html路径。