How do I use iText 7 to extract text from a PDF stored in Azure Blob Storage?

I am using iText 7 to extract text from PDFs. This is my code for extracting the text of a local PDF file:

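    using System.Text;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    var pageText = new StringBuilder();
    using (PdfDocument pdfDocument = new PdfDocument(new PdfReader("E:\\es.pdf"))) {
        var pageNumbers = pdfDocument.GetNumberOfPages();
        // iText page numbers are 1-based
        for (int i = 1; i <= pageNumbers; i++) {
            // Use a fresh strategy for each page
            LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy);
            // Process page i rather than the first page on every iteration
            parser.ProcessPageContent(pdfDocument.GetPage(i));
            pageText.Append(strategy.GetResultantText());
        }
    }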

However, I do not know how to parse a PDF that is stored in Azure Blob Storage.

Solution

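If you want to read a PDF from an Azure blob, open the blob as a stream and pass that stream to PdfReader. Please refer to the following code: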

    using System;
    using System.Text;
    using Azure.Storage;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    string storageAccountName = "andyprivate";
    string accountKey = "";  // storage account key
    var blobServiceClient = new BlobServiceClient(
        new Uri($"https://{storageAccountName}.blob.core.windows.net"),
        new StorageSharedKeyCredential(storageAccountName, accountKey),
        new BlobClientOptions());

    var containerClient = blobServiceClient.GetBlobContainerClient("test");
    var blob = containerClient.GetBlobClient("sample.pdf");
    // Fetch the blob size so the read buffer can hold the whole file
    BlobProperties properties = await blob.GetPropertiesAsync();
    var pageText = new StringBuilder();
    using (var stream = await blob.OpenReadAsync(position: 0, bufferSize: (int)properties.ContentLength))
    using (PdfDocument pdfDocument = new PdfDocument(new PdfReader(stream)))
    {
        var pageNumbers = pdfDocument.GetNumberOfPages();
        for (int i = 1; i <= pageNumbers; i++)
        {
            // Use a fresh strategy for each page, then process page i
            LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy);
            parser.ProcessPageContent(pdfDocument.GetPage(i));
            pageText.Append(strategy.GetResultantText());
            pageText.Append(Environment.NewLine);
        }

        Console.WriteLine(pageText);
    }
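
Since the local-file version and the blob version differ only in where the bytes come from, the extraction loop can be factored into a helper that accepts any readable Stream. The following is a minimal sketch, not part of the original answer: the class name PdfTextHelper and the method name ExtractText are hypothetical, and it uses iText 7's PdfTextExtractor.GetTextFromPage in place of driving PdfCanvasProcessor by hand.

```csharp
using System.IO;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical helper (name chosen for illustration only).
public static class PdfTextHelper
{
    // Extracts the text of every page from a PDF supplied as a stream.
    // The stream can be a FileStream, a MemoryStream, or a blob read stream.
    public static string ExtractText(Stream pdfStream)
    {
        var text = new StringBuilder();
        using (var pdfDocument = new PdfDocument(new PdfReader(pdfStream)))
        {
            int pageCount = pdfDocument.GetNumberOfPages();
            for (int i = 1; i <= pageCount; i++)  // pages are 1-based
            {
                // A fresh strategy per page keeps pages from bleeding into each other
                var strategy = new LocationTextExtractionStrategy();
                text.AppendLine(
                    PdfTextExtractor.GetTextFromPage(pdfDocument.GetPage(i), strategy));
            }
        }
        return text.ToString();
    }
}
```

With this in place, the stream returned by blob.OpenReadAsync can be passed straight to PdfTextHelper.ExtractText, and the same helper works for a File.OpenRead stream when testing locally.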
