php – How do I get all possible variations of a word with interchangeable letters?

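In Arabic, a letter like "ا" (Alef) has several forms/variants:

(ا,أ,إ,آ)

The same is true of the letter ي, which can also be written ى.

What I want to do is get all possible variations of a word that contains several أ and ي letters.

For example, the word "أين" should yield all of these (mostly incorrect) variants: أين, إين, اين, آين, أىن, اىن, آىن, and so on.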
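
As a rough sense of scale, the number of variants of a word is the product of the group sizes at each letter position. The following is a minimal JavaScript sketch, assuming the four Alef forms and two Ya forms listed above are the only interchangeable groups; `groups` and `countVariants` are hypothetical names used for illustration only.

```javascript
// Assumed groups of interchangeable letters (taken from the question text).
const groups = [
    ['ا', 'أ', 'إ', 'آ'], // Alef forms
    ['ي', 'ى'],           // Ya forms
];

// Hypothetical helper: multiplies the number of choices at each position.
function countVariants(word, groups) {
    // Array.from splits the string into letters by code point.
    return Array.from(word).reduce((total, ch) => {
        const group = groups.find(g => g.includes(ch));
        return total * (group ? group.length : 1);
    }, 1);
}

console.log(countVariants('أين', groups));    // 4 * 2 * 1 = 8
console.log(countVariants('أيامنا', groups)); // 4 * 2 * 4 * 1 * 1 * 4 = 128
```

The count grows multiplicatively with every interchangeable letter, which is why hand-written nested loops become unmanageable quickly.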

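Why? I am building a small text-correction system that can handle grammatical errors and replace wrong words with the correct ones.

I have been trying to do this in the cleanest way possible, but I ended up with 8 for/foreach loops just to handle the "أ" variants.

There must be a better, cleaner way to do this! Any ideas?

Here is my code so far: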

$alefVariations = ['ا','إ','أ','آ'];
$word = 'أيامنا';

// Break into letters (limit -1 means "no limit"; passing null is deprecated since PHP 8.1)
$wordLetters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
$wordAlefLettersIndexes = [];

// Get the positions of the أ letters
for ($letterIndex = 0; $letterIndex < count($wordLetters); $letterIndex++) {
    if (in_array($wordLetters[$letterIndex], $alefVariations)) {
        $wordAlefLettersIndexes[] = $letterIndex;
    }
}

// For each أ position, build a copy of the word with every Alef form at that position
$eachLetterVariations = [];
foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
    foreach ($alefVariations as $alefVariation) {
        $wordCopy = $wordLetters;
        $wordCopy[$alefLettersIndex] = $alefVariation;

        $eachLetterVariations[$alefLettersIndex][] = $wordCopy;
    }
}

// Combine each position's variations with every other أ position
$variations = [];
foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
    $alefWordVariations = $eachLetterVariations[$alefLettersIndex];

    foreach ($wordAlefLettersIndexes as $alefLettersIndex_inner) {
        if ($alefLettersIndex == $alefLettersIndex_inner) continue;

        foreach ($alefWordVariations as $alefWordVariation) {
            foreach ($alefVariations as $alefVariation) {
                $alefWordVariationCopy = $alefWordVariation;
                $alefWordVariationCopy[$alefLettersIndex_inner] = $alefVariation;

                $variations[] = $alefWordVariationCopy;
            }
        }
    }
}

// Join the letters back into words
$finalList = [];
foreach ($variations as $variation) {
    $finalList[] = implode('', $variation);
}

return array_unique($finalList);
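
I don't think this is the right way to do autocorrect, but here is a general solution to the question you asked. It uses recursion, and it is written in JavaScript (I don't know PHP).
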
function solve(word, sameLetters, customIndices = []) {
    var splitLetters = word.split('')
        .map((char, index) => { // check if the current letter is within any variation
            if (customIndices.length == 0 || customIndices.includes(index)) {
                var variations = sameLetters.find(arr => arr.includes(char));
                if (variations != undefined) return variations;
            }
            return [char];
        });

    // up to this point splitLetters will be like this (for customIndices = [0,1])
    //  [["ا","إ","أ","آ"],["ي","ى"],["ا"],["م"],["ن"],["ا"]]
    var res = [];
    recurse(splitLetters, 0, '', res); // this function will generate all the combinations
    return res;
}

// Builds each variant one letter at a time, starting at position `index`
function recurse(letters, index, cur, res) {
    if (index == letters.length) {
        res.push(cur);
    } else {
        for (var letter of letters[index]) {
            recurse(letters, index + 1, cur + letter, res);
        }
    }
}

var sameLetters = [     // represents the variations that you want to enumerate
    ['ا','إ','أ','آ'], ['ي','ى']
];

var word = 'أيامنا';
var customIndices = [0,1]; // will make variations to the letters in these indices only. leave it empty for all indices

var ans = solve(word, sameLetters, customIndices);
console.log(ans);
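
The same enumeration can also be written without recursion, by extending a list of prefixes one letter at a time. This is only a sketch of that alternative, not part of the original answer; `allVariants` is a hypothetical name, and the letter groups are assumed to be the Alef and Ya forms from the question.

```javascript
// Hypothetical helper: builds every variant iteratively (Cartesian product).
function allVariants(word, groups) {
    let results = ['']; // start with a single empty prefix
    for (const ch of word) { // for...of iterates by code point
        // Letters outside every group only "vary" to themselves.
        const group = groups.find(g => g.includes(ch)) || [ch];
        // A Set drops accidental duplicates inside a group, so no
        // array_unique-style pass is needed at the end.
        const options = [...new Set(group)];
        const next = [];
        for (const prefix of results) {
            for (const option of options) {
                next.push(prefix + option);
            }
        }
        results = next;
    }
    return results;
}

const variants = allVariants('أين', [['ا', 'أ', 'إ', 'آ'], ['ي', 'ى']]);
console.log(variants.length); // 4 * 2 * 1 = 8
console.log(variants);
```

Each pass multiplies the number of prefixes by the size of the current letter's group, so the result list has exactly one entry per combination.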

Original source: https://www.jb51.cc/php/137377.html
