PHP: Find repeated words in a text, both single words and multi-word phrases separated by spaces

I can use this function to find the repeated words in a text:

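$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';

function repeated($str)
{
    // Collapse runs of whitespace; ereg_replace() was removed in PHP 7, so use preg_replace().
    $str = preg_replace('/\s+/', ' ', trim($str));
    $words = explode(' ', $str);
    $wordstats = array();
    foreach ($words as $w) {
        // Start the counter at 1 on first sight to avoid an undefined-index notice.
        $wordstats[$w] = isset($wordstats[$w]) ? $wordstats[$w] + 1 : 1;
    }
    foreach ($wordstats as $k => $v) {
        // Print every token that occurs at least twice.
        if ($v >= 2) {
            print $k . " , ";
        }
    }
}

repeated($str);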

This is my result:

bob , good , person , is , a , the , best?

Question: how can I also get the repeated multi-word phrases (words separated by spaces), so that the result looks like this:

bob , good , person , is , a , the , best? , good person , is a , a good , is the , bob is

Solution:

<?php
$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';

// All words, lowercased (str_word_count() drops punctuation such as "." and "?"):
$found = str_word_count(strtolower($str), 1);
// Get all words that occur more than once.
$counts = array_count_values($found);
$repeated = array_keys(array_filter($counts, function ($a) { return $a > 1; }));
// Begin the results with the single-word groups.
$results = $repeated;
while (($word = array_shift($found)) !== null) {
    if (!in_array($word, $repeated)) continue;
    $additions = array();
    // Extend the phrase one word at a time while it still repeats in the text.
    while (($add = array_shift($found)) !== null) {
        if (!in_array($add, $repeated)) break;
        $additions[] = $add;
        // Quote each word for the regex; \b keeps matches on whole words.
        $parts = array_map(function ($w) { return preg_quote($w, '/'); }, array_merge(array($word), $additions));
        $count = preg_match_all('/\b' . implode('\W+', $parts) . '\b/i', $str, $matches);
        if ($count > 1) {
            $newmatch = $word . ' ' . implode(' ', $additions);
            if (!in_array($newmatch, $results)) $results[] = $newmatch;
        } else {
            break;
        }
    }
    // Put the consumed words back so each one can start its own phrase.
    if (!empty($additions)) array_splice($found, 0, 0, $additions);
}
var_dump($results);

Output:

array(17) {
  [0]=>
  string(3) "bob"
  [1]=>
  string(2) "is"
  [2]=>
  string(1) "a"
  [3]=>
  string(4) "good"
  [4]=>
  string(6) "person"
  [5]=>
  string(3) "the"
  [6]=>
  string(4) "best"
  [7]=>
  string(6) "bob is"
  [8]=>
  string(4) "is a"
  [9]=>
  string(9) "is a good"
  [10]=>
  string(16) "is a good person"
  [11]=>
  string(6) "a good"
  [12]=>
  string(13) "a good person"
  [13]=>
  string(11) "good person"
  [14]=>
  string(6) "is the"
  [15]=>
  string(11) "is the best"
  [16]=>
  string(8) "the best"
}
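
As an alternative, the same kind of list can probably be built without running a regex per candidate phrase, by counting every run of consecutive words (n-grams) directly. The following is only a sketch and not part of the original answer: the function name `repeatedPhrases` and the `$maxWords` limit are assumptions made for illustration. Like the solution above, it relies on `str_word_count()`, so punctuation is dropped and phrases may span sentence boundaries.

```php
<?php
// Hypothetical helper (not from the original answer): counts every run of
// 1..$maxWords consecutive words and returns the runs seen at least twice.
function repeatedPhrases(string $text, int $maxWords = 4): array
{
    // Lowercased words only; str_word_count() ignores "." and "?".
    $words = str_word_count(strtolower($text), 1);
    $total = count($words);
    $counts = [];
    for ($len = 1; $len <= $maxWords; $len++) {
        for ($i = 0; $i + $len <= $total; $i++) {
            // Join $len consecutive words into one phrase and count it.
            $phrase = implode(' ', array_slice($words, $i, $len));
            $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
        }
    }
    // Keep phrases that occur more than once; keys stay in first-seen order,
    // grouped by phrase length (single words first, then pairs, and so on).
    return array_keys(array_filter($counts, function ($n) { return $n > 1; }));
}

$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';
print implode(' , ', repeatedPhrases($str));
```

For the sample text this returns the same 17 phrases as the output above, ordered by length rather than by position. Setting `$maxWords` to 2 limits the result to single words and two-word phrases, which is closer to the list requested in the question.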
