PHP-spider

Program name: PHP-spider

License: GPL

Operating system: Cross-platform

Language: PHP

About PHP-spider

An extensible web spider written in PHP. Example code:

use VDB\Spider\Spider;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;

// Create a spider with the seed URI where the crawl starts
$spider = new Spider('http://www.oschina.net');

Features:

  • supports two traversal algorithms: breadth-first and depth-first

  • supports depth limiting and queue size limiting

  • supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP

  • comes with a useful set of URI filters, such as domain limiting

  • supports custom URI filters, both prefetch (URI) and postfetch (Resource content)

  • supports custom request handling logic

  • comes with a useful set of persistence handlers (memory, file; Redis soon to follow)

  • supports custom persistence handlers

  • collects statistics about the crawl for reporting

  • dispatches useful events, allowing developers to add even more custom behavior

  • supports a politeness policy

  • will soon come with many default discoverers: RSS, Atom, RDF, etc.

  • will soon support multiple queueing mechanisms (file, memcache, redis)

  • will eventually support distributed spidering with a central queue
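
The first few features (traversal order, depth and queue limits, prefetch domain filtering) can be illustrated with a minimal sketch in plain PHP. This is not PHP-spider's own API: the crawl() function, its parameters, and the in-memory $links map are hypothetical names made up for this illustration, and no HTTP requests are sent. The queue limit is assumed here to cap the total number of URIs ever enqueued.

```php
<?php
// Hypothetical in-memory "web": each URI maps to the URIs its page links to.
$links = [
    'http://example.com/'   => ['http://example.com/a', 'http://example.com/b', 'http://other.test/x'],
    'http://example.com/a'  => ['http://example.com/a1'],
    'http://example.com/b'  => ['http://example.com/b1'],
    'http://example.com/a1' => [],
    'http://example.com/b1' => [],
];

// Illustrative traversal, not the library's implementation.
function crawl(array $links, string $seed, string $algorithm = 'bfs', int $maxDepth = 2, int $maxQueueSize = 100): array
{
    $host     = parse_url($seed, PHP_URL_HOST);
    $queue    = [[$seed, 0]];   // entries are [uri, depth]
    $seen     = [$seed => true];
    $visited  = [];
    $enqueued = 1;              // assumption: limit counts every URI ever enqueued

    while ($queue) {
        // Breadth-first takes the oldest entry (FIFO); depth-first takes the newest (LIFO).
        [$uri, $depth] = $algorithm === 'bfs' ? array_shift($queue) : array_pop($queue);
        $visited[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit: do not follow links from this page
        }
        foreach ($links[$uri] ?? [] as $next) {
            if (isset($seen[$next])) {
                continue; // already queued or visited
            }
            if (parse_url($next, PHP_URL_HOST) !== $host) {
                continue; // prefetch filter: stay on the seed's domain
            }
            if ($enqueued >= $maxQueueSize) {
                break;    // queue size limit reached
            }
            $seen[$next] = true;
            $queue[]     = [$next, $depth + 1];
            $enqueued++;
        }
    }
    return $visited;
}

$bfs = crawl($links, 'http://example.com/', 'bfs');
$dfs = crawl($links, 'http://example.com/', 'dfs');
echo implode(' ', $bfs), PHP_EOL; // order: /, /a, /b, /a1, /b1
echo implode(' ', $dfs), PHP_EOL; // order: /, /b, /b1, /a, /a1
```

Because the depth-first variant uses the queue as a stack, it follows the most recently discovered link first, so /b is explored before /a. In both runs the link to other.test is dropped by the domain filter before it is ever queued.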

PHP-spider official website

http://php-spider.org/
