Converting PDF to text - error: Font 'HYSMyeongJoStd-Medium' with 'UniKS-UCS2-H' is not recognized

Time: 2015-11-02 07:05:23

Tags: asp.net vb.net itextsharp vb.net-2010

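I am facing the following error:

Font 'HYSMyeongJoStd-Medium' with 'UniKS-UCS2-H' is not recognized.

It is thrown by PdfTextExtractor.GetTextFromPage(pdReader, intPIndex) while the extracted text is being appended to a StringBuilder.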

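' These Imports belong at the top of the file.
Imports System.Text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Public Function ConvertPDFToText(bytes As Byte()) As String
    Dim spPDFText As New StringBuilder()
    Dim pdReader As PdfReader = Nothing
    Try
        pdReader = New PdfReader(bytes)
        ' iTextSharp page numbers are 1-based.
        For intPIndex As Integer = 1 To pdReader.NumberOfPages
            spPDFText.Append(PdfTextExtractor.GetTextFromPage(pdReader, intPIndex))
        Next
    Catch ex As Exception
        ' Log through the project's own ExceptionLog helper, then rethrow with the original stack trace.
        ExceptionLog.ErrorHandling(ex)
        Throw
    Finally
        ' Release the reader even when extraction fails.
        If pdReader IsNot Nothing Then pdReader.Close()
    End Try
    ' Replace Unicode replacement characters (undecodable glyphs) with spaces.
    Return spPDFText.ToString().Replace("�", " ")
End Function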

0 answers:

No answers