
php – Regex / DOMDocument – match and replace text not in a link

I need to find and replace all matches of a piece of text, case-insensitively, unless the text is inside an anchor tag. For example:
<p>Match this text and replace it</p>
<p>Don't <a href="/">match this text</a></p>
<p>We still need to match this text and replace it</p>

Searching for "match this text" should replace only the first and the last instance.

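Following Gordon's comment, DOMDocument may be the right tool in this case. I am completely unfamiliar with the DOMDocument extension and would greatly appreciate some basic examples of this functionality.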

Here is a UTF-8-safe solution that works not only with well-formed documents but also with document fragments.

mb_convert_encoding is needed because loadHtml() appears to have a bug with UTF-8 encoded input (see the references below).
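Note that passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated as of PHP 8.2. A commonly used alternative, shown here only as a sketch and not taken from the original answer, is to prepend an XML encoding declaration so that the parser reads the markup as UTF-8. The sample string and variable names are illustrative.

```php
<?php
// Illustrative input containing non-ASCII characters.
$html = '<p>replace itŐŰ</p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Assumption: the prepended declaration makes libxml treat the input as UTF-8,
// which replaces the mb_convert_encoding() step.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);

// textContent is returned as UTF-8 by the DOM extension.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

If the characters come back garbled without the prefix but intact with it, the encoding detection is the cause.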

mb_substr trims the body tag from the output, so you get back the original content without any extra markup.
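The fixed offsets passed to mb_substr assume the body tag carries no attributes. As a hedged alternative that is not part of the original answer, the children of body can be serialized one by one. This sketch uses saveHTML() with a node argument in place of saveXML(), and ASCII-only sample markup so that encoding does not affect the result.

```php
<?php
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// A fragment: libxml wraps it in <html><body>...</body></html> while parsing.
$dom->loadHTML('<p>First</p><p>Second <a href="/">link</a></p>');

$body  = $dom->getElementsByTagName('body')->item(0);
$inner = '';
// Serialize each child of <body> separately, so the wrapper tags never
// appear in the output and no substring arithmetic is needed.
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}
echo $inner, PHP_EOL;
```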

<?php
$html = '<p>Match this text and replace it</p>
<p>Don\'t <a href="/">match this text</a></p>
<p>We still need to match this text and replace itŐŰ</p>
<p>This is <a href="#">a link <span>with <strong>don\'t match this text</strong> content</span></a></p>';

$dom = new DOMDocument();
// loadXml needs properly formatted documents, so it's better to use loadHtml,
// but it needs a hack to properly handle UTF-8 encoding.
// Note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2.
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));

$xpath = new DOMXPath($dom);

// Select every text node that has no <a> element among its ancestors.
foreach ($xpath->query('//text()[not(ancestor::a)]') as $node)
{
    // Escape &, < and > first: the node text is already decoded, and appendXML() expects valid XML.
    $escaped  = htmlspecialchars($node->wholeText, ENT_NOQUOTES, 'UTF-8');
    $replaced = str_ireplace('match this text', 'MATCH', $escaped);
    // A fragment is used so that the replacement may itself contain markup.
    $newNode  = $dom->createDocumentFragment();
    $newNode->appendXML($replaced);
    $node->parentNode->replaceChild($newNode, $node);
}

// Get only the body tag with its contents, then trim the body tag itself to get only the original content.
echo mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, 'UTF-8');
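When the replacement is plain text, as 'MATCH' is here, a simpler variant is possible. The sketch below is an assumption-laden alternative to the code above, not the original author's method: it assigns to the text node's data property in place of building a fragment with appendXML(), so characters such as & need no escaping. The function name replaceOutsideLinks is hypothetical, and the sample input is ASCII-only, so the UTF-8 workaround is left out.

```php
<?php
// Hypothetical helper: replaces $search with plain text $replace in every
// text node that is not inside an <a> element.
function replaceOutsideLinks(string $html, string $search, string $replace): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        // Assigning to ->data stores the string literally; nothing is parsed as markup.
        $node->data = str_ireplace($search, $replace, $node->data);
    }

    // Same trimming as above: drop "<body>" (6 chars) and "</body>" (7 chars).
    $body = $xpath->query('//body')->item(0);
    return mb_substr($dom->saveXML($body), 6, -7, 'UTF-8');
}

$result = replaceOutsideLinks(
    '<p>Fish &amp; chips: match this text</p><p>Keep <a href="/">match this text</a></p>',
    'match this text',
    'MATCH'
);
echo $result, PHP_EOL;
```

The trade-off is that the replacement can no longer introduce markup, for example wrapping the match in a link. For that case the fragment approach is still needed.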

References:
1. find and replace keywords by hyperlinks in an html fragment,via php dom
2. Regex / DOMDocument – match and replace text not in a link
3. php problem with russian language
4. Why Does DOM Change Encoding?

I read dozens of answers, so I apologize if I have forgotten someone (please comment and I will add you in that case).

Thanks to Gordon, and for the comments on my other answer.

Original article: https://www.jb51.cc/php/138203.html
