iText 7.1.15: 'SymbolEncoding' is not a supported encoding name

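After migrating from iTextSharp 5.5.13.2 to iText 7.1.15, I am testing my application and hitting an exception when extracting text from PDF documents issued by one particular institution. These documents contain [...]; however, iTextSharp extracts all of the text from these same PDF documents without error. I have also reproduced the exception with iText 7.1.14.

Comparing the PdfEncodings class in iText 7 with the one in iTextSharp, the Symbol encoding appears to be present in both.

Because the exception occurs only with iText 7, and not with iTextSharp, when extracting text from the same PDF, I believe this is a bug.

Any ideas?

This is the exception: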

'SymbolEncoding' is not a supported encoding name.
For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
Parameter name: name ArgumentException
at System.Globalization.EncodingTable.internalGetCodePageFromName(String name)
at System.Globalization.EncodingTable.GetCodePageFromName(String name)
at iText.IO.Util.IanaEncodings.GetEncodingEncoding(String name)
at iText.IO.Util.EncodingUtil.ConvertToBytes(Char[] chars, String encoding)
at iText.IO.Font.PdfEncodings.ConvertToBytes(String text, String encoding)
at iText.IO.Font.FontEncoding.FillNamedEncoding()
at iText.IO.Font.FontEncoding.CreateFontEncoding(String baseEncoding)
at iText.Kernel.Font.PdfType1Font..ctor(PdfDictionary fontDictionary)
at iText.Kernel.Font.PdfFontFactory.CreateFont(PdfDictionary fontDictionary)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.GetFont(PdfDictionary fontDict)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.SetTextFontOperator.Invoke(PdfCanvasProcessor processor, PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.InvokeOperator(PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessContent(Byte[] contentBytes, PdfResources resources)
at PDFKeeper.WindowsApplication.PdfFileInfo.GetText()
at PDFKeeper.WindowsApplication.UploadService.UploadStagedPdfsAndSupplementalData()
at PDFKeeper.WindowsApplication.UploadService.ExecuteUploadCycle()
at System.Threading.Tasks.Task.Execute()
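
The top frames of the trace suggest the failure comes from the .NET framework's lookup of an encoding by name, not from the PDF parsing itself. As a minimal sketch, assuming iText ends up handing the literal name "SymbolEncoding" to the framework (the Encoding.GetEncoding call below is my stand-in for illustration, not iText's actual code, and EncodingLookupDemo is a made-up module name), the same kind of ArgumentException can be provoked without iText at all:

```vbnet
Imports System
Imports System.Text

Module EncodingLookupDemo
    Sub Main()
        Try
            ' "SymbolEncoding" is not a name the framework knows,
            ' so the lookup is expected to throw instead of returning an Encoding.
            Dim enc As Encoding = Encoding.GetEncoding("SymbolEncoding")
            Console.WriteLine("Resolved: " & enc.WebName)
        Catch ex As ArgumentException
            ' Same exception type as at the top of the stack trace.
            Console.WriteLine(ex.GetType().Name & ": " & ex.Message)
        End Try
    End Sub
End Module
```

If that assumption holds, the name is being treated as a system encoding when it probably should be handled inside the library, which would be consistent with this being a bug.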

This is the function from my application that uses iText 7:

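Imports System.Text
Imports iText.Kernel.Pdf
Imports iText.Kernel.Pdf.Canvas.Parser
Imports iText.Kernel.Pdf.Canvas.Parser.Listener

' fileInfo is a field of the containing class (not shown).
Public Function GetText() As String
    Using reader = New PdfReader(fileInfo.FullName)
        Dim textString As New StringBuilder
        Using pdfDoc As New PdfDocument(reader)
            ' Page numbers are 1-based.
            For page As Integer = 1 To pdfDoc.GetNumberOfPages()
                ' Use a fresh strategy for every page.
                Dim strategy As ITextExtractionStrategy = New LocationTextExtractionStrategy()
                Dim pageText As String = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy)
                ' Re-emit each extracted line with the platform's line ending.
                Dim lines As String() = pageText.Split(ControlChars.Lf)
                For Each line In lines
                    textString.AppendLine(line)
                Next
            Next
        End Using
        Return textString.ToString()
    End Using
End Function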

This is the same function as it was when using iTextSharp:

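Imports System.Text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

' fileInfo is a field of the containing class (not shown).
Public Function GetText() As String
    Using reader = New PdfReader(fileInfo.FullName)
        Dim textString As New StringBuilder
        ' Page numbers are 1-based.
        For page As Integer = 1 To reader.NumberOfPages
            Try
                ' Use a fresh strategy for every page.
                Dim strategy As ITextExtractionStrategy = New LocationTextExtractionStrategy()
                Dim pageText As String = PdfTextExtractor.GetTextFromPage(reader, page, strategy)
                ' Re-emit each extracted line with the platform's line ending.
                Dim lines As String() = pageText.Split(ControlChars.Lf)
                For Each line In lines
                    textString.AppendLine(line)
                Next
            Catch ex As InlineImageUtils.InlineImageParseException
                ' Deliberately ignored: skip pages whose inline images cannot be parsed.
            End Try
        Next
        Return textString.ToString()
    End Using
End Function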