Introduction to htmlcxx
htmlcxx is an HTML parser and CSS1 parser for C++. Its parsing policy attempts to mimic
the behavior of Mozilla Firefox, so you should expect parse trees similar to
those Firefox creates. However, it does not insert elements that are not present in
your HTML. As a result, serializing the DOM tree gives exactly the same output
as the original HTML document. Another key feature is an STL-like tree
navigation API provided by the tree.hh template library.
Sample code:
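#include <htmlcxx/html/ParserDom.h>
#include <strings.h>  // strcasecmp (POSIX)
#include <iostream>
#include <string>

using namespace std;
using namespace htmlcxx;

int main()
{
    // Parse some HTML code
    string html = "<html><body>hey</body></html>";
    HTML::ParserDom parser;
    tree<HTML::Node> dom = parser.parseTree(html);

    // Print the whole DOM tree
    cout << dom << endl;

    // Dump all links in the tree; tag names keep their source case,
    // so compare them case-insensitively
    tree<HTML::Node>::iterator it = dom.begin();
    tree<HTML::Node>::iterator end = dom.end();
    for (; it != end; ++it)
    {
        if (strcasecmp(it->tagName().c_str(), "A") == 0)
        {
            it->parseAttributes();
            // attribute() returns a pair<bool, string>: (found, value)
            cout << it->attribute("href").second << endl;
        }
    }

    // Dump all text of the document (nodes that are neither tags nor comments)
    it = dom.begin();
    end = dom.end();
    for (; it != end; ++it)
    {
        if ((!it->isTag()) && (!it->isComment()))
        {
            cout << it->text();
        }
    }
    cout << endl;
    return 0;
}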
htmlcxx official website
http://htmlcxx.sourceforge.net/