php – Difference between the utf8 collations and utf8_danish_ci

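Hello there.
I am changing my database's charset from latin1 (collation latin1_swedish_ci) to utf8. I have always used utf8_danish_ci because, I think, it is the closest to Norwegian conventions.
But what about utf8_general_ci and utf8_unicode_ci?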

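A while ago, the advice was to use _general_ci for better/faster performance and _unicode_ci for better accuracy, because the sorting algorithm is more complex in the latter. But since speed is no longer an issue, or at least not in most cases, can _unicode_ci be used in most cases?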

But how does _unicode_ci differ from _danish_ci?
Does it take into account the last three letters of the Nordic alphabet: æ, ø, å?

Most comparisons I can find (one versus the other) only cover _general_ci and _unicode_ci.

Any examples of when to use _unicode_ci and when to use _danish_ci would be highly appreciated...

Answer:

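In short, if your application is multilingual and stores several languages in the same table, you are mostly screwed and should worry about sorting/collating outside of the database. In that case utf8_general_ci is as good as any other.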
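To illustrate what "sorting outside of the database" could look like, here is a minimal Python sketch. It assumes the Danish/Norwegian alphabet is a-z followed by æ, ø, å and ignores every finer collation rule; the name `nordic_sort_key` and the sample names are made up for the example. A real application would more likely lean on a collation library such as ICU.

```python
import unicodedata

# Assumption: Danish/Norwegian order is a-z followed by æ, ø, å.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def nordic_sort_key(text):
    """Case-insensitive sort key that places æ, ø, å after z (simplified)."""
    # Normalize to NFC so that e.g. "å" is a single code point, then lowercase.
    normalized = unicodedata.normalize("NFC", text).lower()
    # Characters outside the alphabet sort after all letters, by code point.
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in normalized]


# Hypothetical sample data.
names = ["Åse", "Zara", "Ørjan", "Anna", "Ærlig", "Ola"]

by_code_point = sorted(names)                    # plain code-point order
by_alphabet = sorted(names, key=nordic_sort_key)  # alphabet-aware order

print(by_code_point)
print(by_alphabet)
```

The plain sort puts "Åse" before "Ærlig" and "Ørjan" because it compares code points, while the key function orders the three letters as æ, ø, å.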

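If it only supports a single language, you can do just fine by setting the correct collation at the database level. In your case that is indeed utf8_danish_ci, because if Wikipedia is anything to go by, Danish sorts the same way as Norwegian.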
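As a rough sketch of what setting the collation at the database level might involve, the Python helper below only builds the MySQL statements as strings; it does not connect to anything. The helper name `collation_statements` and the database and table names are hypothetical. `ALTER TABLE ... CONVERT TO CHARACTER SET` rewrites the existing column data, so it is probably worth trying on a copy first.

```python
def collation_statements(database, tables, charset="utf8", collation="utf8_danish_ci"):
    """Build MySQL statements that set the default collation and convert tables.

    The identifiers are inserted as-is, so this is only meant for trusted,
    hard-coded names, never for user input.
    """
    statements = [
        # Changes the default used for tables created from now on.
        f"ALTER DATABASE `{database}` CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        # Converts the existing character columns of one table.
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# Hypothetical database and table names.
statements = collation_statements("shop", ["customers", "orders"])
for statement in statements:
    print(statement)
```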

If you would like to read more about collation, the ICU docs are rich in examples of how tricky this stuff can be. Quoting extensively:

http://userguide.icu-project.org/collation

[H]ere are some of the ways languages vary in ordering strings:

The letters A-Z can be sorted in a different order than in English.
For example, in Lithuanian, “y” is sorted between “i” and “k”.

Combinations of letters can be treated as if they were one letter. For
example, in Traditional Spanish “ch” is treated as a single letter,
and sorted between “c” and “d”.

Accented letters can be treated as minor variants of the unaccented
letter. For example, “é” can be treated equivalent to “e”.

Accented letters can be treated as distinct letters. For example, “Å”
in Danish is treated as a separate letter that sorts just after “Z”.

Unaccented letters that are considered distinct in one language can be
indistinct in another. For example, the letters “v” and “w” are two
different letters according to English. However, “v” and “w” are
considered variant forms of the same letter in Swedish.

A letter can be treated as if it were two letters. For example, in
Traditional German “ä” is compared as if it were “ae”.

Thai requires that the order of certain letters be reversed.

French requires that letters sorted with accents at the end of the
string be sorted ahead of accents in the beginning of the string. For
example, the word “côte” sorts before “coté” because the acute accent
on the final “e” is more significant than the circumflex on the “o”.

Sometimes lowercase letters sort before uppercase letters. The reverse
is required in other situations. For example, lowercase letters are
usually sorted before uppercase letters in English. Latvian letters
are the exact opposite.

Even in the same language, different applications might require
different sorting orders. For example, in German dictionaries, “öf”
would come before “of”. In phone books the situation is the exact
opposite.

Sorting orders can change over time due to government regulations or
new characters/scripts in Unicode.
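Two of the rules quoted above can be sketched as plain Python sort keys: Traditional Spanish "ch" treated as one letter between "c" and "d", and Traditional German umlauts compared as if they were two letters. This is a toy illustration under those assumptions only, not how ICU or MySQL implement collation; the function names and word lists are invented for the example.

```python
def spanish_traditional_key(word):
    """Sort key that treats "ch" as one letter between "c" and "d"."""
    key = []
    lowered = word.lower()
    i = 0
    while i < len(lowered):
        if lowered.startswith("ch", i):
            # Same primary weight as "c", but ranked after every plain "c".
            key.append((ord("c"), 1))
            i += 2
        else:
            key.append((ord(lowered[i]), 0))
            i += 1
    return key


# Assumption: precomposed (NFC) input; each umlaut expands to two letters.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def german_expanding_key(word):
    """Sort key that compares "ä" as if it were "ae", and so on."""
    return "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in word.lower())


spanish_words = ["dama", "chico", "cuna", "casa"]
german_words = ["Bauer", "Bär", "Bad"]

spanish_sorted = sorted(spanish_words, key=spanish_traditional_key)
german_sorted = sorted(german_words, key=german_expanding_key)

print(sorted(spanish_words), spanish_sorted)
print(sorted(german_words), german_sorted)
```

With the plain code-point sort, "chico" lands between "casa" and "cuna" and "Bär" lands after "Bauer"; the two keys move "chico" after "cuna" and "Bär" before "Bauer".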
