PDAnnotationTextMarkup highlights missing when rendering a PDF page to an image with PDFBox (Java)

I wrote code with the PDFBox API that highlights words in a PDF, but when I convert the highlighted PDF page to an image, everything I highlighted disappears from the image.

The screenshot below shows the highlighted text; I used PDFBox's PDAnnotationTextMarkup class for the highlighting:

Highlighted PDF Page

And this is the image produced after converting that PDF page:

Highlighted PDF Page Image after converting

This is the code I use to convert the PDF to images:

PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
int pageCounter = 0;
for (PDPage page : document.getPages())
{
   BufferedImage bim = pdfRenderer.renderImageWithDPI(pageCounter,300,ImageType.RGB);

   ImageIOUtil.writeImage(bim,pdfFilename + "-" + (pageCounter++) + ".png",300);
}
document.close();

Please point out what is wrong here and why PDFRenderer does not produce a page image that includes the red highlight boxes.

This is the code I use to highlight the PDF text:

private void highlightText(String pdfFilePath,String highlightedPdfFilePath) {
    try {
        // Loading an existing document
        
        File file = new File(highlightedPdfFilePath);
        if (!file.exists()) {
            file = new File(pdfFilePath);
        }
        
        PDDocument document = PDDocument.load(file);
        
        // extended PDFTextStripper class
        PDFTextStripper stripper = new PDFTextHighlighter();
        
        // Get number of pages
        int number_of_pages = document.getDocumentCatalog().getPages().getCount();

        // The method writeText will invoke an override version of
        // writeString
        Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
        stripper.writeText(document,dummy);
        
        // Print collected information
        System.out.println("tokenStream:::"+tokenStream);
        System.out.println("tokenStream size::"+tokenStream.size());
        System.out.println("coordinates size::"+coordinates.size());

        double page_height;
        double page_width;
        double width,height,minx,maxx,miny,maxy;
        int rotation;
        
        // scan each page and highlight all the words inside them
        for (int page_index = 0; page_index < number_of_pages; page_index++) {
            // get current page
            PDPage page = document.getPage(page_index);
            
            // Get annotations for the selected page
            List<PDAnnotation> annotations = page.getAnnotations();

            // Define a color to use for highlighting text
            // DeviceRGB needs three components: red = (1, 0, 0)
            PDColor red = new PDColor(new float[] { 1, 0, 0 }, PDDeviceRGB.INSTANCE);

            // Page height and width
            page_height = page.getMediaBox().getHeight();
            page_width = page.getMediaBox().getWidth();
            // Scan collected coordinates
            for (int i = 0; i < coordinates.size(); i++) {
                if (!differencePgaeNumber.contains(page_index)) {
                    differencePgaeNumber.add(page_index);
                }
                // if the current coordinates are not related to the current
                // page,ignore them
                if ((int) coordinates.get(i)[4] != (page_index + 1))
                    continue;
                else {
                    // get rotation of the page...portrait..landscape..
                    rotation = (int) coordinates.get(i)[7];

                    // page rotated of 90degrees
                    if (rotation == 90) {
                        height = coordinates.get(i)[5];
                        width = coordinates.get(i)[6];
                        width = (page_height * width) / page_width;

                        // define coordinates of a rectangle
                        maxx = coordinates.get(i)[1];
                        minx = coordinates.get(i)[1] - height;
                        miny = coordinates.get(i)[0];
                        maxy = coordinates.get(i)[0] + width;
                    } else // i should add here the cases -90/-180 degrees
                    {
                        height = coordinates.get(i)[5];
                        minx = coordinates.get(i)[0];
                        maxx = coordinates.get(i)[2];
                        miny = page_height - coordinates.get(i)[1];
                        maxy = page_height - coordinates.get(i)[3] + height;
                    }
                    
                    // Add an annotation for each scanned word
                    PDAnnotationTextMarkup txtMark = new PDAnnotationTextMarkup(
                            PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
                    txtMark.setColor(red);
                    txtMark.setConstantOpacity(0.3f); // 30% opaque
                    PDRectangle position = new PDRectangle();
                    position.setLowerLeftX((float) minx);
                    position.setLowerLeftY((float) miny);
                    position.setUpperRightX((float) maxx);
                    position.setUpperRightY((float) ((float) maxy + height));
                    txtMark.setRectangle(position);

                    float[] quads = new float[8];
                    quads[0] = position.getLowerLeftX(); // x1
                    quads[1] = position.getUpperRightY() - 2; // y1
                    quads[2] = position.getUpperRightX(); // x2
                    quads[3] = quads[1]; // y2
                    quads[4] = quads[0]; // x3
                    quads[5] = position.getLowerLeftY() - 2; // y3
                    quads[6] = quads[2]; // x4
                    quads[7] = quads[5]; // y4
                    txtMark.setQuadPoints(quads);
                    txtMark.setContents(tokenStream.get(i).toString());
                    
                    annotations.add(txtMark);
                }
            }
        }

        // Saving the document in a new file
        File highlighted_doc = new File(highlightedPdfFilePath);
        
        document.save(highlighted_doc);
        document.close();
    } catch (IOException e) {
        System.out.println(e);
    }
}
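
For reference, the quad-point layout used in the code above can be looked at in isolation. The following is only a minimal sketch with no PDFBox dependency: the class name QuadPointsSketch and the helper quadPoints are hypothetical names introduced for illustration, and the sketch leaves out the 2-point vertical offset that the original code applies. It assumes the same ordering as the question's code: upper-left, upper-right, lower-left, lower-right.

```java
import java.util.Arrays;

public class QuadPointsSketch {

    // Hypothetical helper (not part of PDFBox): builds the 8-value quad array
    // for one rectangle, in the order used by the question's code.
    public static float[] quadPoints(float llx, float lly, float urx, float ury) {
        return new float[] {
            llx, ury,   // x1, y1: upper-left
            urx, ury,   // x2, y2: upper-right
            llx, lly,   // x3, y3: lower-left
            urx, lly    // x4, y4: lower-right
        };
    }

    public static void main(String[] args) {
        // Example rectangle in PDF user space (origin at the bottom-left corner).
        float[] quads = quadPoints(100f, 700f, 180f, 712f);
        System.out.println(Arrays.toString(quads));
    }
}
```

An array built this way could be passed to txtMark.setQuadPoints(...) in place of the element-by-element assignments, which keeps the corner order in one place.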

Solution

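An annotation created in code has no appearance stream, and PDFRenderer relies on that stream to draw it. You need to construct the annotation's visual appearance with this call before saving the document: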

txtMark.constructAppearances(document);
