How to split a block element with Jsoup while preserving the order of text and child elements
Suppose I have an HTML fragment like this:
<p>Generally speaking,in the U.S.,if you want to <a href="\"https://web.archive.org/web/20210408195204/https://www.wsj.com/articles/investors-big-and-small-are-driving-stock-gains-with-borrowed-money-11617799940\"">
borrow money from your broker to buy stocks</a>,you are capped at 2-to-1 leverage. If you have $100,you can buy $200 worth of stock. Back in the olden days,you could have bought $300 or $500 or $1,000 of stock with your $100,borrowing the rest from your broker,but then a Great Depression happened and regulators clamped down on margin lending. </p>
I parsed it with jsoup, and it is represented as an Element.
I would like to split the element into the following parts:
- A piece of text:
"Generally speaking,if you want to "
- An element:
<a href="\"https://web.archive.org/web/20210408195204/https://www.wsj.com/articles/investors-big-and-small-are-driving-stock-gains-with-borrowed-money-11617799940\"">
- Another piece of text:
"borrow money from your broker to buy stocks</a>,but then a Great Depression happened and regulators clamped down on margin lending"
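I want to do this while preserving the order of these parts.
What I have looked at so far:
- getAllElements() returns the p tag itself at index 0, followed by the element for the a tag.
- children() returns only 1 element, for the a tag.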
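Solution
What you want is the childNodes() method. It returns a List<Node> containing the element's direct children in document order: text runs appear as TextNode instances and tags as Element instances. You can work with these nodes and access their attributes and parent nodes, but you cannot call select() on a Node to reach deeper tags, because only the Element class has that method.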
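The approach above could be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the class name ChildNodesDemo, the helper describeChildren, and the shortened example.com URL are all assumptions made for the example. It walks childNodes() in order and checks each node's type, casting to Element when a tag is found so that Element-only methods such as select() become available again.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class ChildNodesDemo {

    // Hypothetical helper: describes each direct child of the given element,
    // keeping the order in which the children appear in the document.
    static List<String> describeChildren(Element parent) {
        List<String> parts = new ArrayList<>();
        for (Node node : parent.childNodes()) {
            if (node instanceof TextNode) {
                // A run of text that sits directly inside the parent
                parts.add("TEXT: " + ((TextNode) node).text());
            } else if (node instanceof Element) {
                // A nested tag; after the cast, select() and friends are usable
                Element child = (Element) node;
                parts.add("ELEMENT <" + child.tagName() + "> href=" + child.attr("href"));
            }
            // Other node types (comments, etc.) are skipped in this sketch
        }
        return parts;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the fragment in the question (URL is a placeholder)
        String html = "<p>Generally speaking, if you want to "
                + "<a href=\"https://example.com/margin\">borrow money from your broker</a>"
                + ", you are capped at 2-to-1 leverage.</p>";

        Element p = Jsoup.parseBodyFragment(html).body().selectFirst("p");
        for (String part : describeChildren(p)) {
            System.out.println(part);
        }
    }
}
```

Note that the link text is not a direct child of the p element; it is a TextNode child of the a element, so it can be read with child.text() or by calling childNodes() on the a element in turn.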