HTML to PDF conversion: Unicode characters render as blank

Asked: 2014-06-25 10:58:45

Tags: c# asp.net itextsharp

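I am using iTextSharp to convert some HTML to PDF. First I write the HTML string into a StringWriter, then I use the code below to produce the PDF as a byte array.

The problem is that Unicode characters (Arabic in particular) render as blank.

My code is: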

    StringWriter sw = GetHtmlContent(); // here I fetch the HTML
    byte[] data;
    using (var sr = new StringReader(sw.ToString()))
    {
        using (var ms = new MemoryStream())
        {
            using (var pdfDoc = new Document())
            {
                // Bind a parser to our PDF document
                using (var htmlparser = new HTMLWorker(pdfDoc))
                {
                    // Bind the writer to our document and our final stream
                    using (var w = PdfWriter.GetInstance(pdfDoc, ms))
                    {
                        pdfDoc.Open();
                        // Parse the HTML directly into the document
                        htmlparser.Parse(sr);
                        pdfDoc.Close();
                        // Grab the bytes from the stream before closing it
                        data = ms.ToArray();
                    }
                }
            }
        }
    }
    Response.Buffer = false;
    Response.Clear();
    Response.ClearContent();
    Response.ClearHeaders();
    Response.ContentType = "application/pdf";
    Response.AddHeader("Content-Disposition", "attachment; filename=Test.pdf");
    Response.BinaryWrite(data);
    Response.End();

Please help me fix this problem.

1 answer:

Answer 0 (score: -1)

Follow the steps below to display Unicode characters when converting HTML to PDF:

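  1. Create an HTMLWorker
  2. Register a Unicode font and assign it
  3. Create a style sheet and set the encoding to Identity-H
  4. Assign the style sheet to the HTML parser
  5. See the code below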

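        // Wrap the HTML string in a reader for the parser
        TextReader reader = new StringReader(html);
        Document document = new Document(PageSize.A4, 30, 30, 30, 30);
        PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(FileName, FileMode.Create));
        HTMLWorker worker = new HTMLWorker(document);
        document.Open();
        // Register a font that contains the required glyphs under an alias
        FontFactory.Register("C:\\Windows\\Fonts\\ARIALUNI.TTF", "arial unicode ms");
        iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
        // Use the registered font for body text (step 2) and encode it as Identity-H (step 3)
        ST.LoadTagStyle("body", "face", "arial unicode ms");
        ST.LoadTagStyle("body", "encoding", "Identity-H");
        worker.Style = ST; // some iTextSharp versions expose worker.SetStyleSheet(ST) instead
        worker.StartDocument();
        // Parse the HTML into the document, then finish and close it
        worker.Parse(reader);
        worker.EndDocument();
        document.Close();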
    
  6. Check the following link for more details....

    With this approach, Hindi, Turkish, and other special characters are also displayed when converting from HTML to PDF. See the demo image below.

    [image: demo output]
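
The steps in the answer write to a file, while the question builds the PDF in memory and streams it to the response. As a minimal sketch of how the two might be combined, the helper below registers a font, builds the style sheet, and returns the PDF bytes. The class name `UnicodePdfSketch`, the method `HtmlToPdfBytes`, and the `fontPath` parameter are hypothetical and not part of either post. It assumes an iTextSharp 5.x release where `Document` is disposable (as the question's code implies) and uses the static `HTMLWorker.ParseToList` overload that takes a style sheet.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

// Hypothetical helper for illustration; names are not from the original posts.
public static class UnicodePdfSketch
{
    public static byte[] HtmlToPdfBytes(string html, string fontPath)
    {
        // Register a font that actually contains the needed glyphs
        // (the answer uses C:\Windows\Fonts\ARIALUNI.TTF) under an alias.
        FontFactory.Register(fontPath, "arial unicode ms");

        // Tell the parser to use that font for body text, with Identity-H
        // encoding so characters outside the Latin code pages are kept.
        var styles = new StyleSheet();
        styles.LoadTagStyle("body", "face", "arial unicode ms");
        styles.LoadTagStyle("body", "encoding", BaseFont.IDENTITY_H);

        using (var ms = new MemoryStream())
        {
            using (var pdfDoc = new Document(PageSize.A4, 30, 30, 30, 30))
            {
                PdfWriter.GetInstance(pdfDoc, ms);
                pdfDoc.Open();

                using (var sr = new StringReader(html))
                {
                    // Parse the HTML into elements and add them one by one.
                    foreach (IElement element in HTMLWorker.ParseToList(sr, styles))
                    {
                        pdfDoc.Add(element);
                    }
                }
            } // disposing the document closes it and finishes the PDF

            // ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }
}
```

In the question's page code, the result could then be passed to `Response.BinaryWrite(...)` in place of `data`. Note that the font file must exist on the server, and that Arabic is written right to left: in iTextSharp this usually also requires placing the content in a container such as a `PdfPTable` cell with `RunDirection = PdfWriter.RUN_DIRECTION_RTL`, which this sketch does not attempt.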