How do I split a multibyte string into words in PHP?
This is what I have so far, but I would like to improve the code:
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
// The hyphen goes last in the class; written as ":-_" it forms a range that also matches A-Z.
$arr = mb_split('[\s\[\]().,;:_-]', $str);
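Is there a way to say that a word is a sequence of "alpha" characters (without spelling out a-z, because I want to include non-Latin characters)?
Solution:
Give this a try: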
preg_match_all('/[\p{L}\p{M}]+/u', $subject, $result, PREG_PATTERN_ORDER);
foreach ($result[0] as $word) {
    // $word holds one matched word
}
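As a self-contained illustration of the snippet above, here is a minimal sketch that wraps the call in a helper. The function name `split_words` and the sample sentence are assumptions made for the example and are not part of the original answer; the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): returns the words found in $text.
function split_words(string $text): array
{
    // \p{L} = any letter, \p{M} = any combining mark; the /u flag treats the input as UTF-8.
    if (preg_match_all('/[\p{L}\p{M}]+/u', $text, $matches) === false) {
        return []; // matching failed, for example because the input was not valid UTF-8
    }
    return $matches[0];
}

// Mixed Cyrillic, Latin and CJK text with assorted punctuation.
$words = split_words('Привет, мир! Grüße [aus] Köln; 日本語.');
print_r($words);
```

Because the pattern describes what a word is rather than what a separator is, punctuation that was never listed (such as `!` here) is skipped without any extra work.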
This matches every possible letter, together with its accents, as a word:
"
[\p{L}\p{M}]   # Match a single character present in the list below
               #   A character with the Unicode property “letter” (any kind of letter from any language)
               #   A character with the Unicode property “mark” (a character intended to be combined with another character, e.g. accents, umlauts, enclosing boxes)
+              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
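To see why the class includes `\p{M}` and not just `\p{L}`, the following sketch compares the two patterns on a word whose accents are stored as separate combining characters. The sample string and the variable names are assumptions chosen for the illustration; the `\u{...}` escape requires PHP 7 or later.

```php
<?php
// "résumé" in decomposed form: each "e" is followed by U+0301 (combining acute accent).
$text = "re\u{0301}sume\u{0301} draft";

// With \p{M}: the combining accents stay attached, so the word is matched whole.
preg_match_all('/[\p{L}\p{M}]+/u', $text, $withMarks);

// Without \p{M}: U+0301 is not a letter, so it ends the match and the word breaks apart.
preg_match_all('/\p{L}+/u', $text, $lettersOnly);

echo count($withMarks[0]), "\n";   // 2 matches: the accented word and "draft"
echo count($lettersOnly[0]), "\n"; // 3 matches: "re", "sume", "draft"
```

Text that uses precomposed characters (a single code point for "é") would match the same way under both patterns, so the difference only shows up with decomposed input.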