Jsoup - An unusual and difficult HTTP response code 403

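I wrote a program that uses Jsoup to download many files (images) from the Internet. When I got the error java.io.IOException: Server returned HTTP response code: 403 for URL, I searched on Google and found that the problem lies in authentication.

The most complete and thorough answer I found is here. I followed the steps it suggests. When I asked my browser to load one of the images I want, the Live HTTP Headers extension gave me the following output: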

[url of my image]
Host: [host name] 
User-Agent: [a long string detailing my browser,OS and more]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate,br
DNT: 1
Connection: keep-alive
Cookie: __cfduid=d7ae93ab45711bffa2c46bc1ef8e7b3411612879897
Upgrade-Insecure-Requests: 1

NS_ERROR_NET_ON_RESOLVING

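I then modified my program's downloadFile() method accordingly, adding one line of code that tells the host the request comes from (my) browser: urlConnection.addRequestProperty("User-Agent","[a long string detailing my browser,OS and more]" );

It didn't work.

I decided to be more precise and add every piece of information from the GET request my browser sends, like this: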

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.channels.Channels;
    import java.nio.channels.FileChannel;
    import java.nio.channels.NonReadableChannelException;
    import java.nio.channels.NonWritableChannelException;
    import java.nio.channels.ReadableByteChannel;
    import java.nio.file.Files;

    public static boolean downloadFile(String URL,File directory,String nameFile) {
        File file = directory.toPath().resolve(nameFile).toFile();

        // Create the target file; a failure here (e.g. the file already exists) is only logged.
        try {
            Files.createFile(file.toPath());
        } catch (IOException | UnsupportedOperationException | SecurityException e) {
            e.printStackTrace();
        }

        ReadableByteChannel readChannel = null;
        FileOutputStream fileOS = null;

        // Open the connection and attach the request headers copied from the browser.
        URLConnection urlConnection = null;
        try {
            urlConnection = new URL(URL).openConnection();
        } catch (IOException e1) {      // also covers MalformedURLException
            e1.printStackTrace();
            return false;
        }
        urlConnection.addRequestProperty("User-Agent","[a long string detailing my browser,OS and more]");
        urlConnection.addRequestProperty("Accept","text/html,*/*;q=0.8");
        urlConnection.addRequestProperty("Accept-Language","it-IT,en;q=0.3");
        urlConnection.addRequestProperty("Accept-Encoding","gzip,br");
        urlConnection.addRequestProperty("DNT","1");
        urlConnection.addRequestProperty("Connection","keep-alive");
        urlConnection.addRequestProperty("Cookie","__cfduid=d7ae93ab45711bffa2c46bc1ef8e7b3411612879897");
        urlConnection.addRequestProperty("Upgrade-Insecure-Requests","1");

        urlConnection.setReadTimeout(5000);
        urlConnection.setConnectTimeout(5000);

        // The read channel is obtained from new URL(URL).openStream().
        try {
            readChannel = Channels.newChannel(new URL(URL).openStream());
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
        try {
            fileOS = new FileOutputStream(file);
        } catch (FileNotFoundException | SecurityException e) {
            e.printStackTrace();
            return false;
        }

        FileChannel writeChannel = fileOS.getChannel();
        try {
            // Copy everything from the remote channel into the file, starting at position 0.
            writeChannel.transferFrom(readChannel,0,Long.MAX_VALUE);
        } catch (IOException | IllegalArgumentException
                | NonReadableChannelException | NonWritableChannelException e) {
            e.printStackTrace();
            return false;
        } finally {
            try {
                fileOS.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return true;
    }

I still get Error 403.

I then re-checked the GET exchange detailed by Live HTTP Headers and found that there are actually three:

[url of my image]
Host: [host name]
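User-Agent: [a long string detailing my browser,OS and more]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate,br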
DNT: 1
Connection: keep-alive
Cookie: __cfduid=d7ae93ab45711bffa2c46bc1ef8e7b3411612879897
Upgrade-Insecure-Requests: 1

GET: HTTP/2.0 304 Not Modified
date: Wed,17 Feb 2021 11:51:08 GMT
last-modified: Sat,06 Feb 2021 10:08:41 GMT
etag: "54a0-5baa81f065742"
cache-control: max-age=14400
cf-cache-status: HIT
age: 5184
cf-request-id: 08516d7cde0000331f7608e000000001
expect-ct: max-age=604800,report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=tucVFsSXV%2BrfYOpIrnWuaGdpMEeZdQ3UO%2FtVBtIxXEp7975j0kyPQUd1zj6FGW85aam8vpJWNabh%2BHw0WAKSxaZuJJjPl9op4SQfmn1cuw%3D%3D"}],"max_age":604800,"group":"cf-nel"}
nel: {"report_to":"cf-nel","max_age":604800}
vary: Accept-Encoding
server: cloudflare
cf-ray: 622f4b749b5c331f-CDG
X-Firefox-Spdy: h2
---------------------


GET: HTTP/2.0 200 OK
content-type: image/jpeg
content-length: 21664
accept-ranges: bytes
date: Wed,17 Feb 2021 11:51:08 GMT
cf-cache-status: HIT
cf-request-id: 08516d7cde0000331f7608e000000001
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=tucVFsSXV%2BrfYOpIrnWuaGdpMEeZdQ3UO%2FtVBtIxXEp7975j0kyPQUd1zj6FGW85aam8vpJWNabh%2BHw0WAKSxaZuJJjPl9op4SQfmn1cuw%3D%3D"}],"max_age":604800}
cf-ray: 622f4b749b5c331f-CDG
last-modified: Sat,06 Feb 2021 10:08:41 GMT
cache-control: max-age=14400
age: 5184
expect-ct: max-age=604800,report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
vary: Accept-Encoding
server: cloudflare
X-Firefox-Spdy: h2

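I don't understand any of this. What is going on between my browser and the host it contacts? How can I modify my program to download these images without running into Error 403 again?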
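
One thing that may be worth checking, offered as a sketch rather than a confirmed fix: in downloadFile() the headers are added to urlConnection, but the bytes are read from new URL(URL).openStream(), which opens a second connection that carries none of those headers. The example below reads from the same connection that was configured. The class name HeaderAwareDownloader, the helper browserHeaders() and the header values are hypothetical placeholders, not part of the original program, and a Cloudflare-fronted host may still refuse the request for other reasons.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderAwareDownloader {

    /**
     * Builds a small set of browser-like request headers (placeholder values).
     * Accept-Encoding is left out on purpose: HttpURLConnection does not
     * decompress gzip bodies by itself, so asking for gzip would require
     * decoding the stream manually before saving it.
     */
    public static Map<String, String> browserHeaders(String userAgent) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", userAgent);
        headers.put("Accept", "image/webp,*/*;q=0.8");
        headers.put("Accept-Language", "it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3");
        return headers;
    }

    /**
     * Downloads url into target, sending the given headers.
     * Returns false on any failure or non-200 status instead of throwing.
     */
    public static boolean downloadFile(String url, Path target, Map<String, String> headers) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();

            // Headers must be set before the connection is actually used.
            headers.forEach(connection::setRequestProperty);
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);

            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                System.err.println("HTTP status " + status + " for " + url);
                return false;
            }

            // Read from the SAME connection that carries the headers.
            try (InputStream in = connection.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Covers malformed URLs, non-HTTP URLs and I/O failures.
            e.printStackTrace();
            return false;
        }
    }
}
```

A call would look like HeaderAwareDownloader.downloadFile(imageUrl, directory.toPath().resolve(nameFile), HeaderAwareDownloader.browserHeaders("[a long string detailing my browser,OS and more]")). Checking getResponseCode() before reading also makes it possible to log the exact status instead of only seeing the IOException.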
