
Puppeteer request interceptor causes net::ERR_FAILED on HTTP 30x redirects


I am trying to use Puppeteer to fetch the HTML of a URL without following redirects and without triggering the related HTTP requests (CSS, images, and so on).

According to the Puppeteer documentation, we can use page.setRequestInterception() to ignore some requests.

I also found several SO questions, such as this one, that suggest using request.isNavigationRequest() and request.redirectChain() to determine whether a request is the "main" request or a redirect.
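The classification those questions describe can be sketched as a small pure function. This is only an illustration: `decideAction` and its return values are hypothetical names, and the only Puppeteer methods it relies on are the two mentioned above. Unlike the snippet in this question, it also aborts sub-resource requests, since the stated goal is to avoid triggering them.

```javascript
// Hypothetical helper (not part of Puppeteer): decide what to do with an
// intercepted request. `request` only needs to expose isNavigationRequest()
// and redirectChain(), as Puppeteer's request objects do.
function decideAction(request) {
  // Sub-resources (CSS, images, scripts, ...) are not navigation requests.
  if (!request.isNavigationRequest()) {
    return 'abort';
  }
  // A navigation request with a non-empty redirect chain is the follow-up
  // to a 30x response.
  if (request.redirectChain().length !== 0) {
    return 'abort';
  }
  // The initial "main" navigation request.
  return 'continue';
}

// Assumed wiring inside a 'request' handler:
//   page.on('request', (req) => {
//     if (decideAction(req) === 'abort') req.abort(); else req.continue();
//   });
```

Classifying requests this way does not by itself avoid the error: as the rest of this question shows, aborting the redirected navigation request makes page.goto() reject.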

So I tried that, but I get an Error: net::ERR_FAILED error.

Here is an example with http://google.com (when requested, it answers with a 301 and a Location: http://www.google.com/ header):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', (interceptedRequest) => {
    if (
        interceptedRequest.isNavigationRequest()
        && interceptedRequest.redirectChain().length !== 0
    ) {
        interceptedRequest.abort();
    } else {
        interceptedRequest.continue();
    }
  });
  await page.goto('http://google.com');
  const html = await page.content();
  console.log(html);
  await browser.close();
})();

Running it with node --trace-warnings file.js, I get:

(node:14252) UnhandledPromiseRejectionWarning: Error: net::ERR_FAILED at http://google.com
    at navigate (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async FrameManager.navigateFrame (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
    at async Page.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:789:16)
    at async /path_to_working_dir/file.js:17:3
    at emitUnhandledRejectionWarning (internal/process/promises.js:168:15)
    at processPromiseRejections (internal/process/promises.js:247:11)
    at processTicksAndRejections (internal/process/task_queues.js:94:32)
(node:14252) Error: net::ERR_FAILED at http://google.com
    at navigate (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async FrameManager.navigateFrame (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:416:16)
    at async Page.goto (/path_to_working_dir/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:789:16)
    at async /path_to_working_dir/file.js:17:3
(node:14252) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
    at emitDeprecationWarning (internal/process/promises.js:180:11)
    at processPromiseRejections (internal/process/promises.js:249:13)
    at processTicksAndRejections (internal/process/task_queues.js:94:32)
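The warnings show that the rejection from page.goto() is never handled. As a minimal sketch, a wrapper can catch it and report the failure. `tryGoto` and the shape of its result are hypothetical, not a Puppeteer API. This only makes the failure explicit; it does not establish whether page.content() returns anything useful after the aborted navigation.

```javascript
// Hypothetical helper (not part of Puppeteer): navigate and report a
// failure as a value instead of an unhandled rejection.
// `page` only needs a goto(url) method that returns a promise.
async function tryGoto(page, url) {
  try {
    const response = await page.goto(url);
    return { ok: true, response, error: null };
  } catch (error) {
    // For example: Error: net::ERR_FAILED at http://google.com
    return { ok: false, response: null, error };
  }
}

// Assumed usage inside the async function from the question:
//   const result = await tryGoto(page, 'http://google.com');
//   if (!result.ok) console.error(result.error.message);
```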

Using the http://www.example.com URL (instead of http://google.com) works fine: there is no error, and I get the equivalent of curl http://www.example.com.

How can I discard the unwanted redirect requests while still being able to operate on the "main" page (page.screenshot(), page.content(), ...)?
