How to wrap a new tag around multiple tags with BeautifulSoup?
Try:
from bs4 import BeautifulSoup
html_doc = """\
<item>
<title>Heading for Sec 1</title>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</item>
<item>
<title>Heading for Sec 2</title>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</item>
<item>
<title>Heading for Sec 3</title>
<p>some text sec 3</p>
<p>some text sec 3</p>
</item>"""
soup = BeautifulSoup(html_doc, "html.parser")
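for item in soup.select("item"):
    # Create the wrapper and place it right after <title>
    t = soup.new_tag("content")
    t.append("\n")
    item.title.insert_after(t)
    item.title.insert_after("\n")
    # Move every <p> of this item into the wrapper
    for p in item.select("p"):
        t.append(p)
        t.append("\n")
    # Merge adjacent strings, then collapse leftover whitespace to one newline
    item.smooth()
    for s in item.find_all(string=True, recursive=False):
        s.replace_with("\n")
print(soup)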
Prints:
<item>
<title>Heading for Sec 1</title>
<content>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</content>
</item>
<item>
<title>Heading for Sec 2</title>
<content>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</content>
</item>
<item>
<title>Heading for Sec 3</title>
<content>
<p>some text sec 3</p>
<p>some text sec 3</p>
</content>
</item>
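As a variation on the approach above, here is a minimal sketch that uses `wrap()` on the first `<p>` of each `<item>` and then moves the remaining `<p>` tags into the same wrapper. The helper name `wrap_paragraphs` and the compact `sample` markup are made up for illustration and are not part of the original answer. Unlike the code above, this sketch does not tidy up whitespace.

```python
from bs4 import BeautifulSoup


def wrap_paragraphs(markup: str) -> BeautifulSoup:
    """Hypothetical helper: put all direct <p> children of each <item> into one <content>."""
    soup = BeautifulSoup(markup, "html.parser")
    for item in soup.find_all("item"):
        paragraphs = item.find_all("p", recursive=False)
        if not paragraphs:
            continue
        # wrap() puts <content> where the first <p> was and returns the wrapper
        content = paragraphs[0].wrap(soup.new_tag("content"))
        # append() moves each remaining <p> out of <item> and into <content>
        for p in paragraphs[1:]:
            content.append(p)
    return soup


# Illustrative input only (assumed, not the original test document)
sample = (
    "<item><title>T</title><p>a</p><p>b</p></item>"
    "<item><title>U</title><p>c</p></item>"
)
result = wrap_paragraphs(sample)
print(result)
```

Because the first `<p>` is wrapped in place, `<content>` ends up where the paragraphs started, so `<title>` stays outside it.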
Question
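In the example below, I am trying to wrap a <content> tag around all the <p> tags in a section. Each section is inside an <item>, but the <title> needs to stay outside the <content>. How can I do this?
Source file: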
<item>
<title>Heading for Sec 1</title>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</item>
<item>
<title>Heading for Sec 2</title>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</item>
<item>
<title>Heading for Sec 3</title>
<p>some text sec 3</p>
<p>some text sec 3</p>
</item>
I want this output:
<item>
<title>Heading for Sec 1</title>
<content>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</content>
</item>
<item>
<title>Heading for Sec 2</title>
<content>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</content>
</item>
<item>
<title>Heading for Sec 3</title>
<content>
<p>some text sec 3</p>
<p>some text sec 3</p>
</content>
</item>
The code below is what I am trying. However, it wraps a <content> tag around each <p> tag instead of around all the <p> tags in a section. How can I fix this?
from bs4 import BeautifulSoup
with open('testdoc.txt', 'r') as f:
    soup = BeautifulSoup(f, "html.parser")
content = None
for tag in soup.select("p"):
    if tag.name == "p":
        content = tag.wrap(soup.new_tag("content"))
        content.append(tag)
        continue
print(soup)
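The attempt above calls `soup.new_tag("content")` inside the loop, so every `<p>` gets its own wrapper. The sketch below reproduces that behaviour on a small, made-up `markup` string, then shows one possible variation that reuses a `<content>` already present in the same parent. The variable names are illustrative assumptions, not part of the original post.

```python
from bs4 import BeautifulSoup

# Illustrative input only (assumed, not the original testdoc.txt)
markup = "<item><title>T</title><p>a</p><p>b</p></item>"

# Behaviour of the attempt: a new <content> is created for every <p>
broken = BeautifulSoup(markup, "html.parser")
for tag in broken.select("p"):
    tag.wrap(broken.new_tag("content"))
broken_count = len(broken.find_all("content"))

# Variation: create <content> once per parent, then move later <p> tags into it
fixed = BeautifulSoup(markup, "html.parser")
for tag in fixed.select("p"):
    existing = tag.parent.find("content", recursive=False)
    if existing is None:
        tag.wrap(fixed.new_tag("content"))
    else:
        existing.append(tag)  # append() moves the tag out of its old position
fixed_count = len(fixed.find_all("content"))

print(broken_count, fixed_count)  # 2 1
```

Note that `content.append(tag)` in the attempt is redundant: `wrap()` has already placed the tag inside the new wrapper.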