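What is the most elegant way to manipulate arbitrary Unicode characters in a Perl regex?
Consider a Unicode character such as the zero width space, which is not on any conventional keyboard and does not belong to any human writing system. Suppose one wants to remove this character from a string using Perl, or to print it in bash on Unix.
This post reviews how to do these things using hex codes, then asks: is there a more direct (or elegant) way to do them, perhaps using the character's decimal representation?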
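The "zero width space" (http://www.unicode-symbol.com/u/200B.html) occasionally shows up in text files.
For example, on a MacBook Pro, I saved a text-message conversation from Messages.app as a PDF. I then opened the PDF in Preview, copied everything, and pasted the clipboard into a file z. Then less z showed many instances of <U+200B>, and when I opened the file in vim they showed up as <200b>.
Similarly, when I copy and paste a phone number from the phone field in Contacts.app, a "pop directional formatting" character (http://www.unicode-symbol.com/u/202C.html) shows up.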
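Usually I want to get plain text out of a string: whatever a human would actually want to read, including letters in any language (French é, Greek β, Arabic, Chinese) and of course tabs, spaces, and newlines, but no other characters.
That is because the other characters cause problems. Not only are they distracting in less and vim, they also seem to make LaTeX (pdflatex) throw errors.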
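One can delete the zero width space as follows:
- Go to the character's URL, given above
- Scroll down to the table titled "Encodings (Unicode character converter)"
- On the UTF-8 row, find the text "E2 80 8B"
- Manually convert this to \xe2\x80\x8b and use it in a substitution: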
perl -p -e 's/\xe2\x80\x8b//g;' myfile
Using the same approach, one can print the character:
printf '\xe2\x80\x8b'
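But on the same row of http://www.unicode-symbol.com/u/200B.html where one gets the triple of hex numbers, one also finds that the decimal representation is 14844043. Is there a way to use this decimal representation, or some other approach that is more direct than pasting three hex codes together?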
Solution
Elegance is in the eye of the beholder.
However, the -C switch enables Perl's Unicode handling, so you can take advantage of that.
perl -CD -wpe 's/\x{200B}//g' file
In addition, you can use \N to specify a character by its full name:
perl -CD -wpe 's/\N{ZERO WIDTH SPACE}//g' file
See perlrun for a detailed description of -C.
In particular, -CD is equivalent to -Cio, which means "make UTF-8 the default PerlIO layer for both input and output streams".
To remove U+200B ZERO WIDTH SPACE specifically:
perl -CSD -pe's/\x{200B}//g'
perl -CSD -pe's/\N{U+200B}//g'
perl -CSD -pe's/\N{ZERO WIDTH SPACE}//g'
-CSD handles the encoding/decoding of STDIN/STDOUT/STDERR/ARGV. (UTF-8, specifically.)
See also: Specifying file to process to Perl one-liner.
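That said, it sounds like you want a more general way to match "characters like the zero width space", not just the zero width space itself. But it is not clear what that means. Here are the properties of ZERO WIDTH SPACE: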
$ uniprops -a1 200B
U+200B ‹U+200B› \N{ZERO WIDTH SPACE}
\pC
\p{Cf}
All
Any
Assigned
C
Other
Case_Ignorable
CI
Cf
Format
Changes_When_NFKC_Casefolded
CWKCF
Common
Zyyy
Default_Ignorable_Code_Point
DI
General_Punctuation
InPunctuation
Graph
X_POSIX_Graph
Print
X_POSIX_Print
Unicode
Age=1.1
Age=V1_1
Bidi_Class=BN
Bidi_Class=Boundary_Neutral
BC=BN
Bidi_Paired_Bracket_Type=None
Block=General_Punctuation
BLK=Punctuation
Block=Punctuation
Canonical_Combining_Class=0
Canonical_Combining_Class=Not_Reordered
CCC=NR
Canonical_Combining_Class=NR
Script_Extensions=Common
Decomposition_Type=None
DT=None
East_Asian_Width=Neutral
Grapheme_Cluster_Break=CN
Grapheme_Cluster_Break=Control
GCB=CN
Hangul_Syllable_Type=NA
Hangul_Syllable_Type=Not_Applicable
HST=NA
Identifier_Status=Restricted
Identifier_Type=Default_Ignorable
Indic_Positional_Category=NA
InPC=NA
Indic_Syllabic_Category=Other
InSC=Other
Joining_Group=No_Joining_Group
JG=NoJoiningGroup
Joining_Type=T
Joining_Type=Transparent
JT=T
Line_Break=ZW
Line_Break=ZWSpace
LB=ZW
Numeric_Type=None
NT=None
Numeric_Value=NaN
NV=NaN
Present_In=1.1
IN=1.1
Present_In=2.0
IN=2.0
Present_In=V2_0
Present_In=2.1
IN=2.1
Present_In=V2_1
Present_In=3.0
IN=3.0
Present_In=V3_0
Present_In=3.1
IN=3.1
Present_In=V3_1
Present_In=3.2
IN=3.2
Present_In=V3_2
Present_In=4.0
IN=4.0
Present_In=V4_0
Present_In=4.1
IN=4.1
Present_In=V4_1
Present_In=5.0
IN=5.0
Present_In=V5_0
Present_In=5.1
IN=5.1
Present_In=V5_1
Present_In=5.2
IN=5.2
Present_In=V5_2
Present_In=6.0
IN=6.0
Present_In=V6_0
Present_In=6.1
IN=6.1
Present_In=V6_1
Present_In=6.2
IN=6.2
Present_In=V6_2
Present_In=6.3
IN=6.3
Present_In=V6_3
Present_In=7.0
IN=7.0
Present_In=V7_0
Present_In=8.0
IN=8.0
Present_In=V8_0
Present_In=9.0
IN=9.0
Present_In=V9_0
Present_In=10.0
IN=10.0
Present_In=V10_0
Present_In=11.0
IN=11.0
Present_In=V11_0
Present_In=12.0
IN=12.0
Present_In=V12_0
Present_In=12.1
IN=12.1
Present_In=V12_1
Present_In=13.0
IN=13.0
Present_In=V13_0
Script=Common
SC=Zyyy
Script=Zyyy
Scx=Zyyy
Script_Extensions=Zyyy
Sentence_Break=FO
Sentence_Break=Format
SB=FO
Vertical_Orientation=R
Vertical_Orientation=Rotated
Vo=R
Word_Break=Other
WB=XX
Word_Break=XX
The first two are probably the two of interest.
\p{General_Category=Format}
aka \p{Gc=Cf}
aka \p{Format}
aka \p{Cf}
perl -CSD -pe's/\p{Cf}//g'
This property is shared by the following 161 code points:
$ unichars -a '\p{Cf}' | cat
---- U+000AD SOFT HYPHEN
---- U+00600 ARABIC NUMBER SIGN
---- U+00601 ARABIC SIGN SANAH
---- U+00602 ARABIC FOOTNOTE MARKER
---- U+00603 ARABIC SIGN SAFHA
---- U+00604 ARABIC SIGN SAMVAT
---- U+00605 ARABIC NUMBER MARK ABOVE
---- U+0061C ARABIC LETTER MARK
---- U+006DD ARABIC END OF AYAH
---- U+0070F SYRIAC ABBREVIATION MARK
---- U+008E2 ARABIC DISPUTED END OF AYAH
---- U+0180E MONGOLIAN VOWEL SEPARATOR
---- U+0200B ZERO WIDTH SPACE
---- U+0200C ZERO WIDTH NON-JOINER
---- U+0200D ZERO WIDTH JOINER
---- U+0200E LEFT-TO-RIGHT MARK
---- U+0200F RIGHT-TO-LEFT MARK
---- U+0202A LEFT-TO-RIGHT EMBEDDING
---- U+0202B RIGHT-TO-LEFT EMBEDDING
---- U+0202C POP DIRECTIONAL FORMATTING
---- U+0202D LEFT-TO-RIGHT OVERRIDE
---- U+0202E RIGHT-TO-LEFT OVERRIDE
---- U+02060 WORD JOINER
---- U+02061 FUNCTION APPLICATION
---- U+02062 INVISIBLE TIMES
---- U+02063 INVISIBLE SEPARATOR
---- U+02064 INVISIBLE PLUS
---- U+02066 LEFT-TO-RIGHT ISOLATE
---- U+02067 RIGHT-TO-LEFT ISOLATE
---- U+02068 FIRST STRONG ISOLATE
---- U+02069 POP DIRECTIONAL ISOLATE
---- U+0206A INHIBIT SYMMETRIC SWAPPING
---- U+0206B ACTIVATE SYMMETRIC SWAPPING
---- U+0206C INHIBIT ARABIC FORM SHAPING
---- U+0206D ACTIVATE ARABIC FORM SHAPING
---- U+0206E NATIONAL DIGIT SHAPES
---- U+0206F NOMINAL DIGIT SHAPES
---- U+0FEFF ZERO WIDTH NO-BREAK SPACE
---- U+0FFF9 INTERLINEAR ANNOTATION ANCHOR
---- U+0FFFA INTERLINEAR ANNOTATION SEPARATOR
---- U+0FFFB INTERLINEAR ANNOTATION TERMINATOR
---- U+110BD KAITHI NUMBER SIGN
---- U+110CD KAITHI NUMBER SIGN ABOVE
---- U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER
---- U+13431 EGYPTIAN HIEROGLYPH HORIZONTAL JOINER
---- U+13432 EGYPTIAN HIEROGLYPH INSERT AT TOP START
---- U+13433 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START
---- U+13434 EGYPTIAN HIEROGLYPH INSERT AT TOP END
---- U+13435 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM END
---- U+13436 EGYPTIAN HIEROGLYPH OVERLAY MIDDLE
---- U+13437 EGYPTIAN HIEROGLYPH BEGIN SEGMENT
---- U+13438 EGYPTIAN HIEROGLYPH END SEGMENT
---- U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
---- U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
---- U+1BCA2 SHORTHAND FORMAT DOWN STEP
---- U+1BCA3 SHORTHAND FORMAT UP STEP
---- U+1D173 MUSICAL SYMBOL BEGIN BEAM
---- U+1D174 MUSICAL SYMBOL END BEAM
---- U+1D175 MUSICAL SYMBOL BEGIN TIE
---- U+1D176 MUSICAL SYMBOL END TIE
---- U+1D177 MUSICAL SYMBOL BEGIN SLUR
---- U+1D178 MUSICAL SYMBOL END SLUR
---- U+1D179 MUSICAL SYMBOL BEGIN PHRASE
---- U+1D17A MUSICAL SYMBOL END PHRASE
---- U+E0001 LANGUAGE TAG
---- U+E0020 TAG SPACE
---- U+E0021 TAG EXCLAMATION MARK
---- U+E0022 TAG QUOTATION MARK
---- U+E0023 TAG NUMBER SIGN
---- U+E0024 TAG DOLLAR SIGN
---- U+E0025 TAG PERCENT SIGN
---- U+E0026 TAG AMPERSAND
---- U+E0027 TAG APOSTROPHE
---- U+E0028 TAG LEFT PARENTHESIS
---- U+E0029 TAG RIGHT PARENTHESIS
---- U+E002A TAG ASTERISK
---- U+E002B TAG PLUS SIGN
---- U+E002C TAG COMMA
---- U+E002D TAG HYPHEN-MINUS
---- U+E002E TAG FULL STOP
---- U+E002F TAG SOLIDUS
---- U+E0030 TAG DIGIT ZERO
---- U+E0031 TAG DIGIT ONE
---- U+E0032 TAG DIGIT TWO
---- U+E0033 TAG DIGIT THREE
---- U+E0034 TAG DIGIT FOUR
---- U+E0035 TAG DIGIT FIVE
---- U+E0036 TAG DIGIT SIX
---- U+E0037 TAG DIGIT SEVEN
---- U+E0038 TAG DIGIT EIGHT
---- U+E0039 TAG DIGIT NINE
---- U+E003A TAG COLON
---- U+E003B TAG SEMICOLON
---- U+E003C TAG LESS-THAN SIGN
---- U+E003D TAG EQUALS SIGN
---- U+E003E TAG GREATER-THAN SIGN
---- U+E003F TAG QUESTION MARK
---- U+E0040 TAG COMMERCIAL AT
---- U+E0041 TAG LATIN CAPITAL LETTER A
---- U+E0042 TAG LATIN CAPITAL LETTER B
---- U+E0043 TAG LATIN CAPITAL LETTER C
---- U+E0044 TAG LATIN CAPITAL LETTER D
---- U+E0045 TAG LATIN CAPITAL LETTER E
---- U+E0046 TAG LATIN CAPITAL LETTER F
---- U+E0047 TAG LATIN CAPITAL LETTER G
---- U+E0048 TAG LATIN CAPITAL LETTER H
---- U+E0049 TAG LATIN CAPITAL LETTER I
---- U+E004A TAG LATIN CAPITAL LETTER J
---- U+E004B TAG LATIN CAPITAL LETTER K
---- U+E004C TAG LATIN CAPITAL LETTER L
---- U+E004D TAG LATIN CAPITAL LETTER M
---- U+E004E TAG LATIN CAPITAL LETTER N
---- U+E004F TAG LATIN CAPITAL LETTER O
---- U+E0050 TAG LATIN CAPITAL LETTER P
---- U+E0051 TAG LATIN CAPITAL LETTER Q
---- U+E0052 TAG LATIN CAPITAL LETTER R
---- U+E0053 TAG LATIN CAPITAL LETTER S
---- U+E0054 TAG LATIN CAPITAL LETTER T
---- U+E0055 TAG LATIN CAPITAL LETTER U
---- U+E0056 TAG LATIN CAPITAL LETTER V
---- U+E0057 TAG LATIN CAPITAL LETTER W
---- U+E0058 TAG LATIN CAPITAL LETTER X
---- U+E0059 TAG LATIN CAPITAL LETTER Y
---- U+E005A TAG LATIN CAPITAL LETTER Z
---- U+E005B TAG LEFT SQUARE BRACKET
---- U+E005C TAG REVERSE SOLIDUS
---- U+E005D TAG RIGHT SQUARE BRACKET
---- U+E005E TAG CIRCUMFLEX ACCENT
---- U+E005F TAG LOW LINE
---- U+E0060 TAG GRAVE ACCENT
---- U+E0061 TAG LATIN SMALL LETTER A
---- U+E0062 TAG LATIN SMALL LETTER B
---- U+E0063 TAG LATIN SMALL LETTER C
---- U+E0064 TAG LATIN SMALL LETTER D
---- U+E0065 TAG LATIN SMALL LETTER E
---- U+E0066 TAG LATIN SMALL LETTER F
---- U+E0067 TAG LATIN SMALL LETTER G
---- U+E0068 TAG LATIN SMALL LETTER H
---- U+E0069 TAG LATIN SMALL LETTER I
---- U+E006A TAG LATIN SMALL LETTER J
---- U+E006B TAG LATIN SMALL LETTER K
---- U+E006C TAG LATIN SMALL LETTER L
---- U+E006D TAG LATIN SMALL LETTER M
---- U+E006E TAG LATIN SMALL LETTER N
---- U+E006F TAG LATIN SMALL LETTER O
---- U+E0070 TAG LATIN SMALL LETTER P
---- U+E0071 TAG LATIN SMALL LETTER Q
---- U+E0072 TAG LATIN SMALL LETTER R
---- U+E0073 TAG LATIN SMALL LETTER S
---- U+E0074 TAG LATIN SMALL LETTER T
---- U+E0075 TAG LATIN SMALL LETTER U
---- U+E0076 TAG LATIN SMALL LETTER V
---- U+E0077 TAG LATIN SMALL LETTER W
---- U+E0078 TAG LATIN SMALL LETTER X
---- U+E0079 TAG LATIN SMALL LETTER Y
---- U+E007A TAG LATIN SMALL LETTER Z
---- U+E007B TAG LEFT CURLY BRACKET
---- U+E007C TAG VERTICAL LINE
---- U+E007D TAG RIGHT CURLY BRACKET
---- U+E007E TAG TILDE
---- U+E007F CANCEL TAG
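As an illustration of \p{Cf} in use, here is a sketch with made-up sample text containing both characters from the question, U+200B ZERO WIDTH SPACE and U+202C POP DIRECTIONAL FORMATTING (both appear in the list above). The letter é is not a format character and should survive.

```shell
# "café", U+200B, a space, "555", U+202C, "-1234", all as raw UTF-8 bytes.
sample=$(perl -e 'print "caf\xc3\xa9\xe2\x80\x8b 555\xe2\x80\xac-1234"')

# Strip every character whose General_Category is Format (Cf).
cleaned=$(printf '%s' "$sample" | perl -CSD -pe 's/\p{Cf}//g')

# What should be left: "café 555-1234".
expected=$(perl -e 'print "caf\xc3\xa9 555-1234"')
[ "$cleaned" = "$expected" ] && echo "format characters removed"
```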
\p{General_Category=Other}
aka \p{Gc=C}
aka \p{Other}
aka \p{C}
aka \pC
perl -CSD -pe's/\pC//g'
\p{General_Category=Other} (\pC) includes:
- \p{General_Category=Control} (\p{Cc}): 65 code points
- \p{General_Category=Format} (\p{Cf}): 161 code points [mentioned above]
- \p{General_Category=Private_Use} (\p{Co}): 137,468 code points
- \p{General_Category=Unassigned} (\p{Cn}): 830,672 code points
- \p{General_Category=Surrogate} (\p{Cs}): 2,048 code points
Of these 970,414 code points, the following 226 are the named ones (equivalent to [\p{Cc}\p{Cf}]):
$ unichars -a '\pC' | cat
---- U+00000 NULL
---- U+00001 START OF HEADING
---- U+00002 START OF TEXT
---- U+00003 END OF TEXT
---- U+00004 END OF TRANSMISSION
---- U+00005 ENQUIRY
---- U+00006 ACKNOWLEDGE
---- U+00007 ALERT
---- U+00008 BACKSPACE
---- U+00009 CHARACTER TABULATION
---- U+0000A LINE FEED
---- U+0000B LINE TABULATION
---- U+0000C FORM FEED
---- U+0000D CARRIAGE RETURN
---- U+0000E SHIFT OUT
---- U+0000F SHIFT IN
---- U+00010 DATA LINK ESCAPE
---- U+00011 DEVICE CONTROL ONE
---- U+00012 DEVICE CONTROL TWO
---- U+00013 DEVICE CONTROL THREE
---- U+00014 DEVICE CONTROL FOUR
---- U+00015 NEGATIVE ACKNOWLEDGE
---- U+00016 SYNCHRONOUS IDLE
---- U+00017 END OF TRANSMISSION BLOCK
---- U+00018 CANCEL
---- U+00019 END OF MEDIUM
---- U+0001A SUBSTITUTE
---- U+0001B ESCAPE
---- U+0001C INFORMATION SEPARATOR FOUR
---- U+0001D INFORMATION SEPARATOR THREE
---- U+0001E INFORMATION SEPARATOR TWO
---- U+0001F INFORMATION SEPARATOR ONE
---- U+0007F DELETE
---- U+00080 PADDING CHARACTER
---- U+00081 HIGH OCTET PRESET
---- U+00082 BREAK PERMITTED HERE
---- U+00083 NO BREAK HERE
---- U+00084 INDEX
---- U+00085 NEXT LINE
---- U+00086 START OF SELECTED AREA
---- U+00087 END OF SELECTED AREA
---- U+00088 CHARACTER TABULATION SET
---- U+00089 CHARACTER TABULATION WITH JUSTIFICATION
---- U+0008A LINE TABULATION SET
---- U+0008B PARTIAL LINE FORWARD
---- U+0008C PARTIAL LINE BACKWARD
---- U+0008D REVERSE LINE FEED
---- U+0008E SINGLE SHIFT TWO
---- U+0008F SINGLE SHIFT THREE
---- U+00090 DEVICE CONTROL STRING
---- U+00091 PRIVATE USE ONE
---- U+00092 PRIVATE USE TWO
---- U+00093 SET TRANSMIT STATE
---- U+00094 CANCEL CHARACTER
---- U+00095 MESSAGE WAITING
---- U+00096 START OF GUARDED AREA
---- U+00097 END OF GUARDED AREA
---- U+00098 START OF STRING
---- U+00099 SINGLE GRAPHIC CHARACTER INTRODUCER
---- U+0009A SINGLE CHARACTER INTRODUCER
---- U+0009B CONTROL SEQUENCE INTRODUCER
---- U+0009C STRING TERMINATOR
---- U+0009D OPERATING SYSTEM COMMAND
---- U+0009E PRIVACY MESSAGE
---- U+0009F APPLICATION PROGRAM COMMAND
---- U+000AD SOFT HYPHEN
---- U+00600 ARABIC NUMBER SIGN
---- U+00601 ARABIC SIGN SANAH
---- U+00602 ARABIC FOOTNOTE MARKER
---- U+00603 ARABIC SIGN SAFHA
---- U+00604 ARABIC SIGN SAMVAT
---- U+00605 ARABIC NUMBER MARK ABOVE
---- U+0061C ARABIC LETTER MARK
---- U+006DD ARABIC END OF AYAH
---- U+0070F SYRIAC ABBREVIATION MARK
---- U+008E2 ARABIC DISPUTED END OF AYAH
---- U+0180E MONGOLIAN VOWEL SEPARATOR
---- U+0200B ZERO WIDTH SPACE
---- U+0200C ZERO WIDTH NON-JOINER
---- U+0200D ZERO WIDTH JOINER
---- U+0200E LEFT-TO-RIGHT MARK
---- U+0200F RIGHT-TO-LEFT MARK
---- U+0202A LEFT-TO-RIGHT EMBEDDING
---- U+0202B RIGHT-TO-LEFT EMBEDDING
---- U+0202C POP DIRECTIONAL FORMATTING
---- U+0202D LEFT-TO-RIGHT OVERRIDE
---- U+0202E RIGHT-TO-LEFT OVERRIDE
---- U+02060 WORD JOINER
---- U+02061 FUNCTION APPLICATION
---- U+02062 INVISIBLE TIMES
---- U+02063 INVISIBLE SEPARATOR
---- U+02064 INVISIBLE PLUS
---- U+02066 LEFT-TO-RIGHT ISOLATE
---- U+02067 RIGHT-TO-LEFT ISOLATE
---- U+02068 FIRST STRONG ISOLATE
---- U+02069 POP DIRECTIONAL ISOLATE
---- U+0206A INHIBIT SYMMETRIC SWAPPING
---- U+0206B ACTIVATE SYMMETRIC SWAPPING
---- U+0206C INHIBIT ARABIC FORM SHAPING
---- U+0206D ACTIVATE ARABIC FORM SHAPING
---- U+0206E NATIONAL DIGIT SHAPES
---- U+0206F NOMINAL DIGIT SHAPES
---- U+0FEFF ZERO WIDTH NO-BREAK SPACE
---- U+0FFF9 INTERLINEAR ANNOTATION ANCHOR
---- U+0FFFA INTERLINEAR ANNOTATION SEPARATOR
---- U+0FFFB INTERLINEAR ANNOTATION TERMINATOR
---- U+110BD KAITHI NUMBER SIGN
---- U+110CD KAITHI NUMBER SIGN ABOVE
---- U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER
---- U+13431 EGYPTIAN HIEROGLYPH HORIZONTAL JOINER
---- U+13432 EGYPTIAN HIEROGLYPH INSERT AT TOP START
---- U+13433 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START
---- U+13434 EGYPTIAN HIEROGLYPH INSERT AT TOP END
---- U+13435 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM END
---- U+13436 EGYPTIAN HIEROGLYPH OVERLAY MIDDLE
---- U+13437 EGYPTIAN HIEROGLYPH BEGIN SEGMENT
---- U+13438 EGYPTIAN HIEROGLYPH END SEGMENT
---- U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
---- U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
---- U+1BCA2 SHORTHAND FORMAT DOWN STEP
---- U+1BCA3 SHORTHAND FORMAT UP STEP
---- U+1D173 MUSICAL SYMBOL BEGIN BEAM
---- U+1D174 MUSICAL SYMBOL END BEAM
---- U+1D175 MUSICAL SYMBOL BEGIN TIE
---- U+1D176 MUSICAL SYMBOL END TIE
---- U+1D177 MUSICAL SYMBOL BEGIN SLUR
---- U+1D178 MUSICAL SYMBOL END SLUR
---- U+1D179 MUSICAL SYMBOL BEGIN PHRASE
---- U+1D17A MUSICAL SYMBOL END PHRASE
---- U+E0001 LANGUAGE TAG
---- U+E0020 TAG SPACE
---- U+E0021 TAG EXCLAMATION MARK
---- U+E0022 TAG QUOTATION MARK
---- U+E0023 TAG NUMBER SIGN
---- U+E0024 TAG DOLLAR SIGN
---- U+E0025 TAG PERCENT SIGN
---- U+E0026 TAG AMPERSAND
---- U+E0027 TAG APOSTROPHE
---- U+E0028 TAG LEFT PARENTHESIS
---- U+E0029 TAG RIGHT PARENTHESIS
---- U+E002A TAG ASTERISK
---- U+E002B TAG PLUS SIGN
---- U+E002C TAG COMMA
---- U+E002D TAG HYPHEN-MINUS
---- U+E002E TAG FULL STOP
---- U+E002F TAG SOLIDUS
---- U+E0030 TAG DIGIT ZERO
---- U+E0031 TAG DIGIT ONE
---- U+E0032 TAG DIGIT TWO
---- U+E0033 TAG DIGIT THREE
---- U+E0034 TAG DIGIT FOUR
---- U+E0035 TAG DIGIT FIVE
---- U+E0036 TAG DIGIT SIX
---- U+E0037 TAG DIGIT SEVEN
---- U+E0038 TAG DIGIT EIGHT
---- U+E0039 TAG DIGIT NINE
---- U+E003A TAG COLON
---- U+E003B TAG SEMICOLON
---- U+E003C TAG LESS-THAN SIGN
---- U+E003D TAG EQUALS SIGN
---- U+E003E TAG GREATER-THAN SIGN
---- U+E003F TAG QUESTION MARK
---- U+E0040 TAG COMMERCIAL AT
---- U+E0041 TAG LATIN CAPITAL LETTER A
---- U+E0042 TAG LATIN CAPITAL LETTER B
---- U+E0043 TAG LATIN CAPITAL LETTER C
---- U+E0044 TAG LATIN CAPITAL LETTER D
---- U+E0045 TAG LATIN CAPITAL LETTER E
---- U+E0046 TAG LATIN CAPITAL LETTER F
---- U+E0047 TAG LATIN CAPITAL LETTER G
---- U+E0048 TAG LATIN CAPITAL LETTER H
---- U+E0049 TAG LATIN CAPITAL LETTER I
---- U+E004A TAG LATIN CAPITAL LETTER J
---- U+E004B TAG LATIN CAPITAL LETTER K
---- U+E004C TAG LATIN CAPITAL LETTER L
---- U+E004D TAG LATIN CAPITAL LETTER M
---- U+E004E TAG LATIN CAPITAL LETTER N
---- U+E004F TAG LATIN CAPITAL LETTER O
---- U+E0050 TAG LATIN CAPITAL LETTER P
---- U+E0051 TAG LATIN CAPITAL LETTER Q
---- U+E0052 TAG LATIN CAPITAL LETTER R
---- U+E0053 TAG LATIN CAPITAL LETTER S
---- U+E0054 TAG LATIN CAPITAL LETTER T
---- U+E0055 TAG LATIN CAPITAL LETTER U
---- U+E0056 TAG LATIN CAPITAL LETTER V
---- U+E0057 TAG LATIN CAPITAL LETTER W
---- U+E0058 TAG LATIN CAPITAL LETTER X
---- U+E0059 TAG LATIN CAPITAL LETTER Y
---- U+E005A TAG LATIN CAPITAL LETTER Z
---- U+E005B TAG LEFT SQUARE BRACKET
---- U+E005C TAG REVERSE SOLIDUS
---- U+E005D TAG RIGHT SQUARE BRACKET
---- U+E005E TAG CIRCUMFLEX ACCENT
---- U+E005F TAG LOW LINE
---- U+E0060 TAG GRAVE ACCENT
---- U+E0061 TAG LATIN SMALL LETTER A
---- U+E0062 TAG LATIN SMALL LETTER B
---- U+E0063 TAG LATIN SMALL LETTER C
---- U+E0064 TAG LATIN SMALL LETTER D
---- U+E0065 TAG LATIN SMALL LETTER E
---- U+E0066 TAG LATIN SMALL LETTER F
---- U+E0067 TAG LATIN SMALL LETTER G
---- U+E0068 TAG LATIN SMALL LETTER H
---- U+E0069 TAG LATIN SMALL LETTER I
---- U+E006A TAG LATIN SMALL LETTER J
---- U+E006B TAG LATIN SMALL LETTER K
---- U+E006C TAG LATIN SMALL LETTER L
---- U+E006D TAG LATIN SMALL LETTER M
---- U+E006E TAG LATIN SMALL LETTER N
---- U+E006F TAG LATIN SMALL LETTER O
---- U+E0070 TAG LATIN SMALL LETTER P
---- U+E0071 TAG LATIN SMALL LETTER Q
---- U+E0072 TAG LATIN SMALL LETTER R
---- U+E0073 TAG LATIN SMALL LETTER S
---- U+E0074 TAG LATIN SMALL LETTER T
---- U+E0075 TAG LATIN SMALL LETTER U
---- U+E0076 TAG LATIN SMALL LETTER V
---- U+E0077 TAG LATIN SMALL LETTER W
---- U+E0078 TAG LATIN SMALL LETTER X
---- U+E0079 TAG LATIN SMALL LETTER Y
---- U+E007A TAG LATIN SMALL LETTER Z
---- U+E007B TAG LEFT CURLY BRACKET
---- U+E007C TAG VERTICAL LINE
---- U+E007D TAG RIGHT CURLY BRACKET
---- U+E007E TAG TILDE
---- U+E007F CANCEL TAG