iTextSharp HTML to PDF with ParseXHtml: Unicode characters are not parsed

Posted: 2017-06-20 11:16:08

Tags: html asp.net pdf unicode itext

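I have been dealing with this problem for the past two days and have finally decided to ask, because I couldn't find any working solution. Let me first describe the problem, and then explain what I have tried.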

As the title says, I'm trying to convert HTML to PDF with iTextSharp; the HTML also includes a GridView.

    using (StringWriter sw = new StringWriter())
    {
        using (HtmlTextWriter hw = new HtmlTextWriter(sw))
        {
            System.Text.Encoding Enc = System.Text.Encoding.GetEncoding("UTF-8");
            iTextSharp.text.pdf.BaseFont STF_Helvetica_Turkish = iTextSharp.text.pdf.BaseFont.CreateFont("Helvetica", "CP1254", iTextSharp.text.pdf.BaseFont.NOT_EMBEDDED);

            iTextSharp.text.Font fontNormal = new iTextSharp.text.Font(STF_Helvetica_Turkish, 12, iTextSharp.text.Font.NORMAL);

            StringReader sr = new StringReader(sw.ToString());
            string contentHtml = PrintElem();
            contentHtml = contentHtml.Replace("Ş", "S");
            contentHtml = contentHtml.Replace("İ", "I");
            contentHtml = contentHtml.Replace("ı", "i");
            contentHtml = contentHtml.Replace("Ğ", "G");
            contentHtml = contentHtml.Replace("Ü", "U");
            contentHtml = contentHtml.Replace("ğ", "g");
            contentHtml = contentHtml.Replace("ş", "s");
            StringReader srHtml = new StringReader(contentHtml);
            Stream denemeStream = GenerateStreamFromString(srHtml.ToString());

            iTextSharp.text.Document pdfDoc = new iTextSharp.text.Document(iTextSharp.text.PageSize.A4, 10f, 10f, 10f, 0f);
            MemoryStream ms = new MemoryStream();
            iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(pdfDoc, ms);

            pdfDoc.Open();

            using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(PrintElem().ToString())))
            {
                using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(PrintElem().ToString())))
                {
                    //iTextSharp.tool.xml.XMLWorkerFontProvider fontProvider = new iTextSharp.tool.xml.XMLWorkerFontProvider();

                    //Parse the HTML
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdfDoc, msHtml, (Stream)null);
                }
            }

            //iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdfDoc, denemeStream,null,Encoding.UTF8);
            pdfDoc.Close();

            MemoryStream ret = new MemoryStream(ms.ToArray());
            return ret;
        }
    }
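Two details in the code above are worth ruling out before looking at fonts (observations on the posted code, not a confirmed fix). `srHtml.ToString()` returns the reader's type name rather than the HTML, because `StringReader` does not override `ToString()`. Also, the `Replace` calls only modify `contentHtml`, while `msHtml` is built from a fresh `PrintElem()` call, so the replacements never reach the parser (and `msCss` receives the HTML, not CSS, and is not passed to `ParseXHtml`). A minimal sketch of turning the HTML string itself into a UTF-8 stream; `HtmlStreams`, `ToUtf8Stream` and `ReadUtf8` are hypothetical names, not part of iTextSharp and not the body of the question's `GenerateStreamFromString`, which is not shown:

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlStreams
{
    // Hypothetical helper: encodes the HTML text itself (not a reader object)
    // as UTF-8 bytes and wraps them in a stream positioned at the start.
    public static MemoryStream ToUtf8Stream(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return new MemoryStream(Encoding.UTF8.GetBytes(html));
    }

    // Reads a stream back as UTF-8 text, e.g. to inspect what a parser would
    // receive. Disposing the StreamReader also disposes the stream.
    public static string ReadUtf8(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's terms, a stream such as `HtmlStreams.ToUtf8Stream(contentHtml)` is what would have to be handed to `ParseXHtml` for the `Replace` calls to have any effect; by contrast, `new StringReader(contentHtml).ToString()` evaluates to the string `"System.IO.StringReader"`.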

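As you can see, the last thing I tried was replacing all the Turkish characters with their English counterparts, but the PDF output still doesn't show them. So far I have tried every encoding suggested on the internet. I also tried to add a font but didn't manage to, because, as you can see, I'm not using an overload of ParseXHtml that accepts one (if you know how to pass this fontNormal to the parser, I'd be glad to try that too).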
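For passing a font, iTextSharp's XMLWorker (5.x) has a `ParseXHtml` overload that additionally takes an input `Encoding` and an `IFontProvider`. A minimal sketch, assuming a provider that ignores the CSS `font-family` and always returns one `BaseFont`; `SingleFontProvider` and `HtmlToPdf.Render` are hypothetical names, and the page size and margins are copied from the question:

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical provider: ignores the font-family XMLWorker asks for and always
// hands back the single BaseFont it was constructed with.
public sealed class SingleFontProvider : IFontProvider
{
    private readonly BaseFont _baseFont;

    public SingleFontProvider(BaseFont baseFont)
    {
        _baseFont = baseFont;
    }

    // Report every family as available so no request falls through.
    public bool IsRegistered(string fontname)
    {
        return true;
    }

    public Font GetFont(string fontname, string encoding, bool embedded,
                        float size, int style, BaseColor color)
    {
        // Keep the size, style and color XMLWorker computed; swap only the face.
        return new Font(_baseFont, size, style, color);
    }
}

public static class HtmlToPdf
{
    // Renders XHTML (plus optional CSS text) to PDF bytes using the given provider.
    public static byte[] Render(string html, string css, IFontProvider fontProvider)
    {
        using (var ms = new MemoryStream())
        {
            var pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
            PdfWriter writer = PdfWriter.GetInstance(pdfDoc, ms);
            pdfDoc.Open();

            using (var msHtml = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            using (Stream msCss = css == null ? null : new MemoryStream(Encoding.UTF8.GetBytes(css)))
            {
                // Overload taking the input charset and a font provider.
                XMLWorkerHelper.GetInstance().ParseXHtml(
                    writer, pdfDoc, msHtml, msCss, Encoding.UTF8, fontProvider);
            }

            pdfDoc.Close();      // also closes the writer, which closes ms
            return ms.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }
}
```

For the Turkish text, the `BaseFont` would presumably need to be a TrueType font that contains ş, ğ, ı and İ, embedded with Identity-H, for example `BaseFont.CreateFont(@"C:\Windows\Fonts\arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED)`; that font path is an assumption about the server. The question's `STF_Helvetica_Turkish` could be passed the same way to try it. Returning one face for every request keeps the provider small, at the cost of `<th>` and `<h3>` text using that same face with bold approximated by iText rather than taken from a bold font file.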

The PrintElem function returns the following HTML (or similar content containing Turkish characters):

<html>
   <head>
      <h3 align='center'>ABG SİGORTA ALACAK/VERECEK MUTABAKAT EKSTRESİ</h3>
      <style>#ContentPlaceHolder1_vaultsListGridview{width: 100%;} td:nth-child(3) {text-align: right;}td:nth-child(4) {text-align: right;}.netWorthClass{text-align: right;}</style>
   </head>
   <body >
      <br/>
      <div>Kasa Adı = 1</div>
      <div>Devir = 56 TL</div>
      <br/>
      <div>
         <table class="table table-hover table-striped" cellspacing="0" rules="all" border="1" id="ContentPlaceHolder1_vaultsListGridview" style="border-collapse:collapse;">
            <thead>
               <tr>
                  <th scope="col">Tarih</th>
                  <th scope="col">Açıklama</th>
                  <th scope="col">Giren</th>
                  <th scope="col">Çıkan</th>
               </tr>
            </thead>
            <tbody>
               <tr>
                  <td><span id="ContentPlaceHolder1_vaultsListGridview_fullDateLabel_0">26/05/2017</span></td>
                  <td><span id="ContentPlaceHolder1_vaultsListGridview_detailLabel_0">MART AYI KALAN VE NİSAYIN AYI SSK ÖDEMESİ</span></td>
                  <td>0</td>
                  <td>1,295.00</td>
               </tr>
               <tr>
                  <td><span id="ContentPlaceHolder1_vaultsListGridview_fullDateLabel_1">31/05/2017</span></td>
                  <td><span id="ContentPlaceHolder1_vaultsListGridview_detailLabel_1">NİSAN AYI KOMİSYON</span></td>
                  <td>1,351.00</td>
                  <td>0</td>
               </tr>
            </tbody>
         </table>
      </div>
      <br/>
      <div class='netWorthClass'>Giren Miktar = 1351 TL</div>
      <div class='netWorthClass'>Çıkan Miktar = 1295 TL</div>
      <br/>
   </body>
</html>

I copied the HTML above from the debugger, sorry, but you can see the problematic characters in it.
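A small check of which characters in such text cannot be represented in Windows-1252 (WinAnsi), the single-byte encoding iText's standard fonts such as Helvetica commonly use. It assumes, without confirming it here, that the text ends up in a CP1252-encoded font when no font provider is given; `Cp1252Check` is a hypothetical helper:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Cp1252Check
{
    // The 27 non-Latin-1 characters that Windows-1252 maps into bytes 0x80-0x9F.
    private static readonly HashSet<char> Extra = new HashSet<char>
    {
        '\u20AC', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\u017D', '\u2018',
        '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014', '\u02DC',
        '\u2122', '\u0161', '\u203A', '\u0153', '\u017E', '\u0178'
    };

    // ASCII, the Latin-1 range 0xA0-0xFF, or one of the extra characters above.
    public static bool IsInCp1252(char c)
    {
        return c < 0x80 || (c >= 0xA0 && c <= 0xFF) || Extra.Contains(c);
    }

    // Distinct characters of the text that a CP1252-encoded font cannot show.
    public static string Missing(string text)
    {
        return new string(text.Where(c => !IsInCp1252(c)).Distinct().ToArray());
    }
}
```

Applied to the sample HTML, the only characters it reports are `İ` and `ı`; `Ç`, `ç` and `Ö` are part of Windows-1252, and so are `Š` and `š`, but not the Turkish `Ş`, `ş`, `Ğ` and `ğ`.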

I'm happy to try anything you suggest. Thanks in advance.

0 answers

No answers yet.