
NSStringEncoding

Reposted from may2150209
Last edited by zhufeng7777777

Reading files in any encoding.

900 views, Cocoa,by Allen Dang.

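While trying to scrape the Qidian Chinese (起点中文) home page today I ran into a problem: if you do not pick the right encoding, you cannot read anything at all.
I suppose this is a bad habit from using C# too much; I had never really thought about encodings before. In C#, even with the wrong encoding you still get something back, just a screen full of garbage. Cocoa is stricter: the read fails outright and you get an error instead of a string.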
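To see why a strict decoder gives up instead of returning garbage, here is a minimal sketch in plain C (which also compiles as Objective-C). It assumes the page bytes are in a legacy Chinese encoding such as GBK, where the character 中 is stored as the two bytes 0xD6 0xD0. The helper name `looks_like_utf8` is made up for this illustration, and it only checks the lead/continuation byte structure, not everything a full UTF-8 validator would.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: structural UTF-8 check only (lead bytes followed by
   the right number of 10xxxxxx continuation bytes). It does not reject
   overlong forms or surrogates. */
static bool looks_like_utf8(const unsigned char *bytes, size_t length) {
    size_t i = 0;
    while (i < length) {
        unsigned char lead = bytes[i];
        size_t extra;
        if (lead < 0x80)                extra = 0;  /* ASCII */
        else if ((lead & 0xE0) == 0xC0) extra = 1;  /* 110xxxxx */
        else if ((lead & 0xF0) == 0xE0) extra = 2;  /* 1110xxxx */
        else if ((lead & 0xF8) == 0xF0) extra = 3;  /* 11110xxx */
        else return false;                          /* stray continuation byte */
        if (extra > length - i - 1) return false;   /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((bytes[i + k] & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

/* The same character, 中 (U+4E2D), in two encodings. */
static const unsigned char zhong_gbk[]  = { 0xD6, 0xD0 };
static const unsigned char zhong_utf8[] = { 0xE4, 0xB8, 0xAD };
```

With these definitions, `looks_like_utf8(zhong_gbk, 2)` is false, because 0xD6 announces a two-byte sequence but 0xD0 is not a continuation byte, while `looks_like_utf8(zhong_utf8, 3)` is true. A decoder that insists on well-formed input has nothing sensible to return for the GBK bytes, so it reports failure.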
Here is the code I wrote first. It downloads the Qidian home page into a string.

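NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
NSError *error = nil;
// Try to decode the downloaded bytes as UTF-8.
NSString *xml = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:&error];

if (xml == nil)
{
    // The download or the decoding failed; log the reason.
    NSLog(@"Error reading url at %@", [error localizedFailureReason]);
}
else
{
    // result is presumably an NSMutableString declared elsewhere.
    [result setString:xml];
}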

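The download failed no matter what I did, and the error message said the encoding was wrong. Fine, so I opened the documentation and looked up all the encodings: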

enum {
   NSASCIIStringEncoding = 1,
   NSNEXTSTEPStringEncoding = 2,
   NSJapaneseEUCStringEncoding = 3,
   NSUTF8StringEncoding = 4,
   NSISOLatin1StringEncoding = 5,
   NSSymbolStringEncoding = 6,
   NSNonLossyASCIIStringEncoding = 7,
   NSShiftJISStringEncoding = 8,
   NSISOLatin2StringEncoding = 9,
   NSUnicodeStringEncoding = 10,
   NSWindowsCP1251StringEncoding = 11,
   NSWindowsCP1252StringEncoding = 12,
   NSWindowsCP1253StringEncoding = 13,
   NSWindowsCP1254StringEncoding = 14,
   NSWindowsCP1250StringEncoding = 15,
   NSISO2022JPStringEncoding = 21,
   NSMacOSRomanStringEncoding = 30,
   NSProprietaryStringEncoding = 65536
};

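I tried them one by one and none of them worked! I was about to lose it. In this day and age, does Cocoa really not support Chinese? That cannot be right. My guess was that the documentation only lists the most commonly used encodings (the ones Apple considers most common, which apparently means China is more or less ignored; shame on them!), so I wrote the following code to print every supported encoding: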

// availableStringEncodings returns a zero-terminated C array.
const NSStringEncoding *encodings = [NSString availableStringEncodings];
NSMutableString *str = [[NSMutableString alloc] init];
NSStringEncoding encoding;
while ((encoding = *encodings++) != 0)
{
    // Print each value as a signed 32-bit int, which is how the list below shows it.
    [str appendFormat:@"%@ === %i\n", [NSString localizedNameOfStringEncoding:encoding], (int)encoding];
}

[result setString:str];

Sure enough, I had guessed right. Here is the full list of supported encodings:

 

Western (Mac OS Roman) === 30
Japanese (Mac OS) === -2147483647
Traditional Chinese (Mac OS) === -2147483646
Korean (Mac OS) === -2147483645
Arabic (Mac OS) === -2147483644
Hebrew (Mac OS) === -2147483643
Greek (Mac OS) === -2147483642
Cyrillic (Mac OS) === -2147483641
Devanagari (Mac OS) === -2147483639
Gurmukhi (Mac OS) === -2147483638
Gujarati (Mac OS) === -2147483637
Thai (Mac OS) === -2147483627
Simplified Chinese (Mac OS) === -2147483623
Tibetan (Mac OS) === -2147483622
Central European (Mac OS) === -2147483619
Symbol (Mac OS) === 6
Dingbats (Mac OS) === -2147483614
Turkish (Mac OS) === -2147483613
Croatian (Mac OS) === -2147483612
Icelandic (Mac OS) === -2147483611
Romanian (Mac OS) === -2147483610
Celtic (Mac OS) === -2147483609
Gaelic (Mac OS) === -2147483608
Keyboard Symbols (Mac OS) === -2147483607
Farsi (Mac OS) === -2147483508
Cyrillic (Mac OS Ukrainian) === -2147483496
Inuit (Mac OS) === -2147483412
Unicode (UTF-32LE) === -1677721344
Unicode (UTF-8) === 4
Unicode (UTF-16) === 10
Unicode (UTF-16BE) === -1879047936
Unicode (UTF-16LE) === -1811939072
Unicode (UTF-32) === -1946156800
Unicode (UTF-32BE) === -1744830208
Western (ISO Latin 1) === 5
Central European (ISO Latin 2) === 9
Western (ISO Latin 3) === -2147483133
Central European (ISO Latin 4) === -2147483132
Cyrillic (ISO 8859-5) === -2147483131
Arabic (ISO 8859-6) === -2147483130
Greek (ISO 8859-7) === -2147483129
Hebrew (ISO 8859-8) === -2147483128
Turkish (ISO Latin 5) === -2147483127
Nordic (ISO Latin 6) === -2147483126
Thai (ISO 8859-11) === -2147483125
Baltic Rim (ISO Latin 7) === -2147483123
Celtic (ISO Latin 8) === -2147483122
Western (ISO Latin 9) === -2147483121
Romanian (ISO Latin 10) === -2147483120
Latin-US (DOS) === -2147482624
Greek (DOS) === -2147482619
Baltic Rim (DOS) === -2147482618
Western (DOS Latin 1) === -2147482608
Greek (DOS Greek 1) === -2147482607
Central European (DOS Latin 2) === -2147482606
Cyrillic (DOS) === -2147482605
Turkish (DOS) === -2147482604
Portuguese (DOS) === -2147482603
Icelandic (DOS) === -2147482602
Hebrew (DOS) === -2147482601
Canadian French (DOS) === -2147482600
Arabic (DOS) === -2147482599
Nordic (DOS) === -2147482598
Cyrillic (DOS) === -2147482597
Greek (DOS Greek 2) === -2147482596
Thai (Windows,DOS) === -2147482595
Japanese (Windows,DOS) === 8
Simplified Chinese (Windows,DOS) === -2147482591
Korean (Windows,DOS) === -2147482590
Traditional Chinese (Windows,DOS) === -2147482589
Western (Windows Latin 1) === 12
Central European (Windows Latin 2) === 15
Cyrillic (Windows) === 11
Greek (Windows) === 13
Turkish (Windows Latin 5) === 14
Hebrew (Windows) === -2147482363
Arabic (Windows) === -2147482362
Baltic Rim (Windows) === -2147482361
Vietnamese (Windows) === -2147482360
Western (ASCII) === 1
Japanese (Shift JIS X0213) === -2147482072
Chinese (GBK) === -2147482063
Chinese (GB 18030) === -2147482062
Japanese (ISO 2022-JP) === 21
Korean (ISO 2022-KR) === -2147481536
Japanese (EUC) === 3
Simplified Chinese (EUC) === -2147481296
Traditional Chinese (EUC) === -2147481295
Korean (EUC) === -2147481280
Japanese (Shift JIS) === -2147481087
Cyrillic (KOI8-R) === -2147481086
Traditional Chinese (Big 5) === -2147481085
Western (Mac Mail) === -2147481084
Simplified Chinese (HZ GB 2312) === -2147481083
Traditional Chinese (Big 5 HKSCS) === -2147481082
Ukrainian (KOI8-U) === -2147481080
Traditional Chinese (Big 5-E) === -2147481079
Western (NextStep) === 2
Non-lossy ASCII === 7
Western (EBCDIC Latin 1) === -2147480574

At last, the familiar GBK encoding; its value is -2147482063. OK, time to change the original code:

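NSURL *url = [[NSURL alloc] initWithString:@"http://www.cmfu.com"];
NSError *error = nil;
// GBK. 0x80000631 is the value the list prints as -2147482063 (signed 32-bit view).
// Written in hex so it is not sign-extended where NSStringEncoding is 64 bits wide.
NSStringEncoding encoder = 0x80000631;
NSString *xml = [NSString stringWithContentsOfURL:url encoding:encoder error:&error];

if (xml == nil)
{
    NSLog(@"Error reading url at %@", [error localizedFailureReason]);
}
else
{
    [result setString:xml];
}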

Finally got it working! Seeing familiar Chinese text was a real thrill.
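A side note on where a number like -2147482063 comes from. The values in the list are consistent with a simple rule: an encoding without its own NSStringEncoding constant appears as the CoreFoundation encoding number with the top bit (0x80000000) set, and printing that as a signed 32-bit integer makes it negative. The sketch below reproduces the arithmetic in plain C (which also compiles as Objective-C). The macro and function names are made up for illustration; the two numeric values are assumed to match `kCFStringEncodingGBK_95` and `kCFStringEncodingGB_18030_2000` from CoreFoundation's headers.

```c
#include <stdint.h>

/* Assumed CoreFoundation encoding numbers (CFStringEncodingExt.h). */
#define CF_ENCODING_GBK_95        0x0631u  /* kCFStringEncodingGBK_95 */
#define CF_ENCODING_GB_18030_2000 0x0632u  /* kCFStringEncodingGB_18030_2000 */

/* Hypothetical helper: set the top bit, which is the pattern the values
   in the list follow for encodings that have no NS... constant. */
static uint32_t ns_encoding_from_cf(uint32_t cf_encoding) {
    return 0x80000000u | cf_encoding;
}

/* Hypothetical helper: the same 32 bits read as a signed integer,
   i.e. what a %i format shows for the value. */
static int32_t as_signed32(uint32_t value) {
    if (value & 0x80000000u) {
        return (int32_t)((int64_t)value - 4294967296LL);
    }
    return (int32_t)value;
}
```

Here `as_signed32(ns_encoding_from_cf(CF_ENCODING_GBK_95))` gives -2147482063 and the GB 18030 value gives -2147482062, matching the two "Chinese" entries in the list. In real Cocoa code it is probably cleaner to avoid the magic number and call `CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95)` (or the GB 18030 constant), which performs this conversion for you.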

Note: this is a repost.
