
vb.NET: using a regular expression to find words in a PDF


I am trying to find certain words in a PDF file using Aspose.pdf and a regular expression. The code runs without errors, but it never returns True.

Public Shared Function FindInPDF(sourcePdf As String, searchPhrase As String) As Boolean

        Try
            ' Open document
            Dim pdfDocument = New Document(sourcePdf)

            '   "D[a-z]{7}"
            ' Create TextAbsorber object to find all the phrases matching the regular expression
            Dim absorber As Aspose.Pdf.Text.TextFragmentAbsorber = New Aspose.Pdf.Text.TextFragmentAbsorber(searchPhrase) With {
                .TextSearchOptions = New TextSearchOptions(True)
            }

            ' Accept the absorber for all the pages
            pdfDocument.Pages.Accept(absorber)

            ' Loop through the fragments
            For Each textFragment As Aspose.Pdf.Text.TextFragment In absorber.TextFragments
                Console.WriteLine("Text : {0} ", textFragment.Text)
                FindInPDF = True
            Next

        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
        Return FindInPDF
    End Function

Is my code wrong?

The regular expression string is passed into the function through the searchPhrase parameter.

Solution

Instead of the paid library Aspose.pdf, I switched to iTextSharp. It offers the same functionality.

' Requires: Imports System.Text.RegularExpressions
Public Shared Function GetTextFromPDF2(ByVal PdfFileName As String, searchPhrase As String) As Boolean

        Dim found As Boolean = False
        Try
            Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
            Try
                ' Collect the text of every page into one string
                Dim sOut = ""
                For i = 1 To oReader.NumberOfPages
                    ' Use a fresh extraction strategy for each page
                    Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
                    sOut &= iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its)
                Next

                ' IgnoreCase replaces the earlier ToLower call: lowercasing the text
                ' meant a pattern with uppercase letters (e.g. "D[a-z]{7}") could never match
                Dim adrRx As New Regex(searchPhrase, RegexOptions.IgnoreCase)
                found = adrRx.IsMatch(sOut)
            Finally
                ' Release the file handle even if extraction fails
                oReader.Close()
            End Try
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
        Return found
    End Function
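
One thing worth checking when matching extracted text is case handling. A pattern such as the "D[a-z]{7}" mentioned in the question requires an uppercase letter, so it cannot match text that has been lowercased before the search. The following is only a minimal sketch of that effect using plain .NET regular expressions; the sample string and the module name are made up for illustration and stand in for text extracted from a PDF page.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexCaseDemo
    Sub Main()
        ' Hypothetical sample text standing in for text extracted from a PDF page
        Dim pageText As String = "See the Document section."
        Dim pattern As String = "D[a-z]{7}"

        ' Lowercasing the input removes the uppercase "D" the pattern requires
        Console.WriteLine(Regex.IsMatch(pageText.ToLower(), pattern))   ' False

        ' Matching against the unmodified text finds "Document"
        Console.WriteLine(Regex.IsMatch(pageText, pattern))             ' True

        ' RegexOptions.IgnoreCase matches regardless of the case of the input
        Console.WriteLine(Regex.IsMatch(pageText.ToLower(), pattern, RegexOptions.IgnoreCase)) ' True
    End Sub
End Module
```

If case-insensitive matching is what you want, passing RegexOptions.IgnoreCase is usually safer than lowercasing the input, because it works for patterns written in either case.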
