Introduction to PHP-Spider
use VDB\Spider\Spider;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;

$spider = new Spider('http://www.oschina.net');
Features:
- supports two traversal algorithms: breadth-first and depth-first
- supports depth limiting and queue size limiting
- supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
- comes with a useful set of URI filters, such as domain limiting
- supports custom URI filters, both prefetch (URI) and postfetch (resource content)
- supports custom request handling logic
- comes with a useful set of persistence handlers (memory and file; Redis soon to follow)
- supports custom persistence handlers
- collects statistics about the crawl for reporting
- dispatches useful events, allowing developers to add even more custom behavior
- supports a politeness policy
- will soon come with many default discoverers: RSS, Atom, RDF, etc.
- will soon support multiple queueing mechanisms (file, memcache, redis)
- will eventually support distributed spidering with a central queue
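
To make the traversal-related features concrete, here is a minimal sketch in plain PHP of breadth-first versus depth-first crawling with a depth limit, a queue cap, and a prefetch domain filter. It does not use PHP-Spider's API. The function `crawl`, the in-memory `$links` map (which stands in for fetching pages over HTTP), the `$sameHost` filter, and the `example.com` URLs are all hypothetical, and treating the queue limit as a cap on the total number of URIs ever enqueued is an assumption rather than the library's documented behavior.

```php
<?php
declare(strict_types=1);

/**
 * Traverse a link graph breadth-first ('bfs') or depth-first ('dfs').
 *
 * @param array<string, string[]> $links     hypothetical map of page => outgoing links
 * @param callable(string): bool  $allow     prefetch filter; false means "skip this URI"
 * @param int                     $maxDepth  links are not followed beyond this depth
 * @param int                     $maxQueued assumed cap on URIs ever enqueued (seed included)
 * @return string[] URIs in the order they were visited
 */
function crawl(array $links, string $seed, string $mode, int $maxDepth, int $maxQueued, callable $allow): array
{
    $pending = [[$seed, 0]];   // entries are [uri, depth]
    $seen    = [$seed => true];
    $queued  = 1;
    $visited = [];

    while ($pending !== []) {
        // Taking from the front (FIFO) gives breadth-first order,
        // taking from the back (LIFO) gives depth-first order.
        [$uri, $depth] = $mode === 'bfs' ? array_shift($pending) : array_pop($pending);
        $visited[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit reached: do not discover links on this page
        }
        foreach ($links[$uri] ?? [] as $next) {
            if ($queued >= $maxQueued) {
                break; // queue cap reached
            }
            if (isset($seen[$next]) || !$allow($next)) {
                continue; // already queued, or rejected by the prefetch filter
            }
            $seen[$next] = true;
            $pending[]   = [$next, $depth + 1];
            $queued++;
        }
    }

    return $visited;
}

// Hypothetical site structure used instead of real HTTP requests.
$links = [
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b', 'http://other.test/x'],
    'http://example.com/a' => ['http://example.com/c'],
    'http://example.com/b' => ['http://example.com/d'],
];

// Domain limiting expressed as a prefetch filter.
$sameHost = fn(string $uri): bool => parse_url($uri, PHP_URL_HOST) === 'example.com';

$bfs = crawl($links, 'http://example.com/', 'bfs', 2, 10, $sameHost);
$dfs = crawl($links, 'http://example.com/', 'dfs', 2, 10, $sameHost);

echo implode(' ', $bfs), PHP_EOL; // order: /, /a, /b, /c, /d
echo implode(' ', $dfs), PHP_EOL; // order: /, /b, /d, /a, /c
```

The only difference between the two modes is which end of the pending list the next URI is taken from. In both runs the link to `other.test` is rejected before it is ever queued, which is what distinguishes a prefetch filter from a postfetch one.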
PHP-Spider official website