Jsoup: an unusual and stubborn HTTP response code 403
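I wrote a program that uses Jsoup to download many files (images) from the Internet. When I got the error java.io.IOException: Server returned HTTP response code: 403 for URL, I searched on Google and found that the problem lies in authentication.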
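The most complete and thorough answer I found is here. I followed the steps it suggests. When I asked my browser to load one of the images I want, the Live HTTP Headers extension gave me the following output: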
[url of my image]
Host: [host name]
User-Agent: [a long string detailing my browser,OS and more]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate,br
DNT: 1
Connection: keep-alive
Cookie: __cfduid=d7ae93ab45711bffa2c46bc1ef8e7b3411612879897
Upgrade-Insecure-Requests: 1
NS_ERROR_NET_ON_RESOLVING
I then modified my program's downloadFile() method accordingly by adding one line of code that tells the host the request comes from (my) browser: urlConnection.addRequestProperty("User-Agent", "[a long string detailing my browser,OS and more]");
It did not help.
I decided to be more precise and add every piece of information about the GET request my browser sends, as follows:
public static boolean downloadFile(String URL, File directory, String nameFile) {
    File file = directory.toPath().resolve(nameFile).toFile();
    // Create the target file; failures here are logged but not treated as fatal.
    try {
        Files.createFile(file.toPath());
    } catch (UnsupportedOperationException e) { e.printStackTrace(); }
    catch (FileAlreadyExistsException e) { e.printStackTrace(); }
    catch (SecurityException e) { e.printStackTrace(); }
    catch (IOException e) { e.printStackTrace(); }
    ReadableByteChannel readChannel = null;
    FileOutputStream fileOS = null;
    URLConnection urlConnection = null;
    // Without a connection the header calls below would throw a NullPointerException.
    try {
        urlConnection = new URL(URL).openConnection();
    } catch (MalformedURLException e1) { e1.printStackTrace(); return false; }
    catch (IOException e1) { e1.printStackTrace(); return false; }
    // Copy the request headers reported by Live HTTP Headers.
    urlConnection.addRequestProperty("User-Agent", "[a long string detailing my browser,OS and more]");
    urlConnection.addRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
    urlConnection.addRequestProperty("Accept-Language", "it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3");
    urlConnection.addRequestProperty("Accept-Encoding", "gzip,deflate,br");
    urlConnection.addRequestProperty("DNT", "1");
    urlConnection.addRequestProperty("Connection", "keep-alive");
    urlConnection.addRequestProperty("Cookie", "__cfduid=d7ae93ab45711bffa2c46bc1ef8e7b3411612879897");
    urlConnection.addRequestProperty("Upgrade-Insecure-Requests", "1");
    urlConnection.setReadTimeout(5000);
    urlConnection.setConnectTimeout(5000);
    // Open a channel on the remote file.
    try {
        readChannel = Channels.newChannel(new URL(URL).openStream());
    } catch (MalformedURLException e) { e.printStackTrace(); return false; }
    catch (IOException e) { e.printStackTrace(); return false; }
    try {
        fileOS = new FileOutputStream(file);
    } catch (FileNotFoundException e) { e.printStackTrace(); return false; }
    catch (SecurityException e) { e.printStackTrace(); return false; }
    FileChannel writeChannel = fileOS.getChannel();
    // Copy everything from the remote channel into the file, starting at position 0.
    try {
        writeChannel.transferFrom(readChannel, 0, Long.MAX_VALUE);
    } catch (IllegalArgumentException e) { e.printStackTrace(); return false; }
    catch (NonReadableChannelException e) { e.printStackTrace(); return false; }
    catch (NonWritableChannelException e) { e.printStackTrace(); return false; }
    catch (ClosedByInterruptException e) { e.printStackTrace(); return false; }
    catch (AsynchronousCloseException e) { e.printStackTrace(); return false; }
    catch (ClosedChannelException e) { e.printStackTrace(); return false; }
    catch (IOException e) { e.printStackTrace(); return false; }
    finally {
        try { fileOS.close(); } catch (IOException e) { e.printStackTrace(); }
    }
    return true;
}
I still get Error 403.
I then looked again at the GET traffic recorded by Live HTTP Headers and found that it actually consists of three parts:
[url of my image]
Host: [host name]
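User-Agent: [a long string detailing my browser,OS and more]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate,br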
DNT: 1
Connection: keep-alive
Cookie: __cfduid=d7ae93ab45711bffa2c46bc1ef8e7b3411612879897
Upgrade-Insecure-Requests: 1
GET: HTTP/2.0 304 Not Modified
date: Wed,17 Feb 2021 11:51:08 GMT
last-modified: Sat,06 Feb 2021 10:08:41 GMT
etag: "54a0-5baa81f065742"
cache-control: max-age=14400
cf-cache-status: HIT
age: 5184
cf-request-id: 08516d7cde0000331f7608e000000001
expect-ct: max-age=604800,report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=tucVFsSXV%2BrfYOpIrnWuaGdpMEeZdQ3UO%2FtVBtIxXEp7975j0kyPQUd1zj6FGW85aam8vpJWNabh%2BHw0WAKSxaZuJJjPl9op4SQfmn1cuw%3D%3D"}],"max_age":604800,"group":"cf-nel"}
nel: {"report_to":"cf-nel","max_age":604800}
vary: Accept-Encoding
server: cloudflare
cf-ray: 622f4b749b5c331f-CDG
X-Firefox-Spdy: h2
---------------------
GET: HTTP/2.0 200 OK
content-type: image/jpeg
content-length: 21664
accept-ranges: bytes
date: Wed,17 Feb 2021 11:51:08 GMT
cf-cache-status: HIT
cf-request-id: 08516d7cde0000331f7608e000000001
report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=tucVFsSXV%2BrfYOpIrnWuaGdpMEeZdQ3UO%2FtVBtIxXEp7975j0kyPQUd1zj6FGW85aam8vpJWNabh%2BHw0WAKSxaZuJJjPl9op4SQfmn1cuw%3D%3D"}],"max_age":604800}
cf-ray: 622f4b749b5c331f-CDG
last-modified: Sat,06 Feb 2021 10:08:41 GMT
cache-control: max-age=14400
age: 5184
expect-ct: max-age=604800,report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
vary: Accept-Encoding
server: cloudflare
X-Firefox-Spdy: h2
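I don't understand any of this. What is going on between my browser and the host it contacts? How can I modify my program to download these images without running into Error 403 again?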
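One thing worth checking in the code above: the headers are added to urlConnection, but the data is read through new URL(URL).openStream(), which opens a second, separate connection that carries none of those headers. The following is a minimal sketch of reading from the configured connection instead, not a guaranteed fix. The class and method names (ImageDownloader, prepare, download) are hypothetical, the header values are placeholders, and it assumes the server only rejects requests that lack browser-like headers. It deliberately leaves out Accept-Encoding, because URLConnection does not decompress gzip or br responses on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class ImageDownloader {

    /** Builds a request with browser-like headers; nothing is sent yet. */
    public static HttpURLConnection prepare(URL url, String userAgent) throws IOException {
        // Assumes an http/https URL, for which openConnection() returns an HttpURLConnection.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setRequestProperty("Accept", "image/webp,*/*;q=0.8");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        return conn;
    }

    /** Downloads url to target; returns false on any non-200 answer or I/O error. */
    public static boolean download(URL url, Path target, String userAgent) {
        try {
            HttpURLConnection conn = prepare(url, userAgent);
            // getResponseCode() sends the request, including the headers set above.
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                System.err.println("Server answered " + status + " for " + url);
                return false;
            }
            // Read from the SAME connection that carries the headers.
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: ImageDownloader <url> <target file> <user agent>");
            return;
        }
        System.out.println(download(new URL(args[0]), Paths.get(args[1]), args[2]));
    }
}
```

If the server still answers 403 with this, the block is probably based on something other than the request headers (for example a check performed by the Cloudflare layer visible in the response headers), and copying more headers is unlikely to help.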