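In Arabic, a letter such as "ا" (Alef) has several forms/variants:
(ا, أ, إ, آ)
The same applies to the letter ي, which may also be written as ى.
For example, the word "أين" should cover all of these variants (most of which are incorrect spellings): أين, إين, اين, آين, أىن, اىن, آىن, and so on.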
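Why? I am building a small text correction system that can handle grammatical errors and replace wrong words with the correct ones.
I have been trying to do this in the cleanest way possible, but I ended up with 8 for/foreach loops just to handle the "أ" variants.
There must be a better, cleaner way to do this. Any ideas?
Here is my code so far: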
    $alefVariations = ['ا','إ','أ','آ'];
    $word = 'أيامنا';

    // Break into letters
    $wordLetters = preg_split('//u', $word, null, PREG_SPLIT_NO_EMPTY);
    $worDalefLettersIndexes = [];

    // Get the أ letters
    for ($letterIndex = 0; $letterIndex < count($wordLetters); $letterIndex++) {
        if (in_array($wordLetters[$letterIndex], $alefVariations)) {
            $worDalefLettersIndexes[] = $letterIndex;
        }
    }

    $eachLetterVariations = [];
    foreach ($worDalefLettersIndexes as $alefLettersIndex) {
        foreach ($alefVariations as $alefVariation) {
            $wordcopy = $wordLetters;
            $wordcopy[$alefLettersIndex] = $alefVariation;
            $eachLetterVariations[$alefLettersIndex][] = $wordcopy;
        }
    }

    $variations = [];
    foreach ($worDalefLettersIndexes as $alefLettersIndex) {
        $alefWordVariations = $eachLetterVariations[$alefLettersIndex];
        foreach ($worDalefLettersIndexes as $alefLettersIndex_inner) {
            if ($alefLettersIndex == $alefLettersIndex_inner) continue;
            foreach ($alefWordVariations as $alefWordVariation) {
                foreach ($alefVariations as $alefVariation) {
                    $alefWordVariationcopy = $alefWordVariation;
                    $alefWordVariationcopy[$alefLettersIndex_inner] = $alefVariation;
                    $variations[] = $alefWordVariationcopy;
                }
            }
        }
    }

    $finalList = [];
    foreach ($variations as $variation) {
        $finalList[] = implode('', $variation);
    }

    return array_unique($finalList);
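I don't think this is the right way to do autocorrection, but here is a generic solution to the problem you asked about. It uses recursion, and it is written in JavaScript (I don't know PHP).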
    function solve(word, sameLetters, customIndices = []) {
        var splitLetters = word.split('').map((char, index) => {
            // check if the current letter is within any variation group
            if (customIndices.length == 0 || customIndices.includes(index)) {
                var variations = sameLetters.find(arr => arr.includes(char));
                if (variations != undefined) return variations;
            }
            return [char];
        });
        // up to this point splitLetters will be like this
        // [["ا","إ","أ","آ"],["ي","ى"],["ا"],["م"],["ن"],["ا"]]
        var res = [];
        recurse(splitLetters, 0, '', res); // this function will generate all the permutations
        return res;
    }

    function recurse(letters, index, cur, res) {
        if (index == letters.length) {
            // every position has been filled: cur is one complete variant
            res.push(cur);
        } else {
            for (var letter of letters[index]) {
                recurse(letters, index + 1, cur + letter, res);
            }
        }
    }

    var sameLetters = [ // represents the variations that you want to enumerate
        ['ا','إ','أ','آ'],
        ['ي','ى']
    ];
    var word = 'أيامنا';
    var customIndices = [0, 1]; // will make variations to the letters in these indices only. leave it empty for all indices
    var ans = solve(word, sameLetters, customIndices);
    console.log(ans);
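
For comparison, the same enumeration can be written without explicit recursion, as a Cartesian product built up one letter at a time. This is only a minimal sketch, not part of the original answer: the names `letterGroups`, `letterVariants` and `wordVariants` are made up for illustration, and the groups simply mirror the Alef and Yeh variants listed in the question.

```javascript
// Hypothetical variant groups, mirroring the ones listed in the question.
const letterGroups = [
  ['ا', 'إ', 'أ', 'آ'], // Alef forms
  ['ي', 'ى'],           // Yeh and Alef Maksura
];

// Return the whole group a letter belongs to, or just the letter itself.
function letterVariants(char, groups) {
  const group = groups.find(g => g.includes(char));
  return group !== undefined ? group : [char];
}

// Extend every prefix built so far with every option for the next letter.
function wordVariants(word, groups) {
  return Array.from(word).reduce(
    (prefixes, char) => {
      const options = letterVariants(char, groups);
      return prefixes.flatMap(prefix => options.map(option => prefix + option));
    },
    [''] // start from a single empty prefix
  );
}

const variants = wordVariants('أين', letterGroups);
console.log(variants.length); // 4 Alef forms x 2 Yeh forms x 1 = 8
console.log(variants);
```

The number of results is the product of the group sizes at each position, so it grows quickly for long words with many replaceable letters. Restricting the positions that are expanded, as `customIndices` does in the answer, keeps that under control.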
Original article: https://www.jb51.cc/php/137377.html