How to fix the net::ERR_FAILED error caused by a Puppeteer request interceptor on HTTP 30x redirects
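I am trying to use Puppeteer to fetch the HTML from a URL without following redirects and without triggering the related HTTP requests (CSS, images, etc.).
According to the Puppeteer documentation, we can use page.setRequestInterception() to ignore some requests.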
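As a rough illustration of what "ignoring some requests" can look like, the sketch below decides from a request's resourceType() whether to drop it. The helper name shouldBlock and the set of blocked types are assumptions made for this example, not something taken from the documentation, and the fake request objects only stand in for what Puppeteer passes to the 'request' handler.

```javascript
// Hypothetical set of resource types to drop; adjust as needed.
const BLOCKED_TYPES = new Set(['stylesheet', 'image', 'media', 'font']);

// Returns true when the request is for a sub-resource we do not want.
// `request` only needs a resourceType() method, which Puppeteer's
// intercepted request objects provide.
function shouldBlock(request) {
  return BLOCKED_TYPES.has(request.resourceType());
}

// Stand-in objects for a quick check without launching a browser.
const fakeImage = { resourceType: () => 'image' };
const fakeDocument = { resourceType: () => 'document' };
console.log(shouldBlock(fakeImage), shouldBlock(fakeDocument));
```

Inside a real 'request' handler, such a predicate would presumably choose between interceptedRequest.abort() and interceptedRequest.continue().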
I also found several SO questions, such as this one, that suggest using request.isNavigationRequest() and request.redirectChain() to determine whether a request is the "main" one or a redirect.
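The check described in those answers can be written as a small classifier. This is only a sketch: classifyRequest, its return labels, and the fake request objects are invented for the example; only isNavigationRequest() and redirectChain() come from Puppeteer.

```javascript
// Hypothetical helper: label a request as 'main', 'redirect' or 'other'.
// A navigation request with an empty redirect chain is the first hop;
// a non-empty chain means it was reached through at least one redirect.
function classifyRequest(request) {
  if (!request.isNavigationRequest()) {
    return 'other';
  }
  return request.redirectChain().length === 0 ? 'main' : 'redirect';
}

// Minimal stand-in for a Puppeteer request, for illustration only.
function fakeRequest(isNavigation, chain) {
  return {
    isNavigationRequest: () => isNavigation,
    redirectChain: () => chain,
  };
}

const first = fakeRequest(true, []);
const followUp = fakeRequest(true, [first]);
console.log(classifyRequest(first), classifyRequest(followUp));
```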
So I tried that, but I get an Error: net::ERR_FAILED error.
Example with http://google.com (when requested, it answers with a 301 and a Location: http://www.google.com/ header):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', (interceptedRequest) => {
    if (
      interceptedRequest.isNavigationRequest()
      && interceptedRequest.redirectChain().length !== 0
    ) {
      interceptedRequest.abort();
    } else {
      interceptedRequest.continue();
    }
  });
  await page.goto('http://google.com');
  const html = await page.content();
  console.log(html);
  await browser.close();
})();
Running it with node --trace-warnings file.js, I get:
(node:14252) UnhandledPromiseRejectionWarning: Error: net::ERR_FAILED at http://google.com
    at navigate (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async FrameManager.navigateFrame (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
    at async Page.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:789:16)
    at async /path_to_working_dir/file.js:17:3
    at emitUnhandledRejectionWarning (internal/process/promises.js:168:15)
    at processPromiseRejections (internal/process/promises.js:247:11)
    at processTicksAndRejections (internal/process/task_queues.js:94:32)
(node:14252) Error: net::ERR_FAILED at http://google.com
    at navigate (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async FrameManager.navigateFrame (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
    at async Page.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:789:16)
    at async /path_to_working_dir/file.js:17:3
(node:14252) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
    at emitDeprecationWarning (internal/process/promises.js:180:11)
    at processPromiseRejections (internal/process/promises.js:249:13)
    at processTicksAndRejections (internal/process/task_queues.js:94:32)
With the URL http://www.example.com (instead of http://google.com) it works fine: there is no error, and I get the equivalent of curl http://www.example.com.
How can I discard the unwanted redirect requests and still be able to operate on the "main" page (page.screenshot(), page.content(), ...)?
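One thing the trace does show is that the promise returned by page.goto() rejects and nothing catches it. Below is a minimal sketch of catching that rejection, assuming the error message contains net::ERR_FAILED as in the output above. The name gotoTolerantOfAbort and the fake page object are hypothetical. This only avoids the unhandled rejection; whether the page then holds usable content is not something the sketch establishes.

```javascript
// Hypothetical wrapper: navigate, but treat the net::ERR_FAILED rejection
// (seen when the navigation request is aborted) as "no response"
// instead of letting it crash the script. Other errors are rethrown.
async function gotoTolerantOfAbort(page, url) {
  try {
    return await page.goto(url);
  } catch (error) {
    if (String(error.message).includes('net::ERR_FAILED')) {
      return null;
    }
    throw error;
  }
}

// Stand-in for a Puppeteer page whose navigation is always aborted.
const fakePage = {
  goto: async (url) => {
    throw new Error(`net::ERR_FAILED at ${url}`);
  },
};

gotoTolerantOfAbort(fakePage, 'http://google.com').then((response) => {
  console.log(response);
});
```

With a real page, the same wrapper would be called in place of await page.goto('http://google.com') inside the async function.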