What is the most elegant way to handle an arbitrary Unicode character in a Perl regex?

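Consider a Unicode character such as ZERO WIDTH SPACE, which is not on any conventional keyboard and does not belong to any human writing system. Suppose someone wants to remove this character from a string using Perl, or to print it in bash on Unix.

This post reviews how to do these things using hex codes, and then asks: is there a more direct (or elegant) way to do them, perhaps using the character's decimal representation?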

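The "ZERO WIDTH SPACE" http://www.unicode-symbol.com/u/200B.html occasionally shows up in text files.

For example, on a MacBook Pro, I saved a text-message conversation from Messages.app as a PDF. I then opened the PDF in Preview, copied everything, and pasted the clipboard into a file z. less z then shows many instances of <U+200B>, and when I open the file in vim it shows <200b>.

Similarly, when I copy and paste a phone number from the phone field in Contacts.app, a "POP DIRECTIONAL FORMATTING" http://www.unicode-symbol.com/u/202C.html character shows up.

In general I want to get plain text out of a string: anything a human actually wants to read, including letters in any language such as French é, Greek β, Arabic, and Chinese, plus of course tabs, spaces, and newlines, and no other characters.

That is because the other characters cause problems. Not only are they distracting in less and vim, they also seem to make LaTeX (pdflatex) throw errors.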

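The zero width space can be removed as follows:

  1. Go to the character's URL, as given above
  2. Scroll down to the table titled "Encodings (Unicode character converter)"
  3. On the UTF-8 row, find the text "E2 80 8B"
  4. Manually convert it to \xe2\x80\x8b
  5. perl -p -e 's/\xe2\x80\x8b//g;' myfile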

Using the same approach, the character can be printed:

printf '\xe2\x80\x8b'

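But on that same row at http://www.unicode-symbol.com/u/200B.html, where one finds the triple of hex numbers, one also finds that the decimal representation is 14844043. Is there a way to use this decimal representation, or some other approach that is more direct than pasting three hex codes together?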

Solution

Elegance is in the eye of the beholder.

However, the -C switch enables Perl's Unicode handling, so you can take advantage of that.

perl -CD -wpe 's/\x{200B}//g' file

In addition, you can use \N to specify the character by its full name:

perl -CD -wpe 's/\N{ZERO WIDTH SPACE}//g' file

See perlrun for a detailed description of -C. In particular, -CD is equivalent to -Cio, which means "make UTF-8 the default PerlIO layer for both input and output streams".

To remove U+200B ZERO WIDTH SPACE specifically:

perl -CSD -pe's/\x{200B}//g'
perl -CSD -pe's/\N{U+200B}//g'
perl -CSD -pe's/\N{ZERO WIDTH SPACE}//g'

-CSD takes care of encoding/decoding STDIN/STDOUT/STDERR/ARGV. (As UTF-8, specifically.)

See also: Specifying file to process to Perl one-liner


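That said, it sounds like you want a more general way to match "characters like the zero width space", not just the zero width space itself. But it is not clear what that means. Here are the properties of ZERO WIDTH SPACE: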

$ uniprops -a1 200B
U+200B ‹U+200B› \N{ZERO WIDTH SPACE}
\pC
\p{Cf}
All
Any
Assigned
C
Other
Case_Ignorable
CI
Cf
Format
Changes_When_NFKC_Casefolded
CWKCF
Common
Zyyy
Default_Ignorable_Code_Point
DI
General_Punctuation
InPunctuation
Graph
X_POSIX_Graph
Print
X_POSIX_Print
Unicode
Age=1.1
Age=V1_1
Bidi_Class=BN
Bidi_Class=Boundary_Neutral
BC=BN
Bidi_Paired_Bracket_Type=None
Block=General_Punctuation
BLK=Punctuation
Block=Punctuation
Canonical_Combining_Class=0
Canonical_Combining_Class=Not_Reordered
CCC=NR
Canonical_Combining_Class=NR
Script_Extensions=Common
Decomposition_Type=None
DT=None
East_Asian_Width=Neutral
Grapheme_Cluster_Break=CN
Grapheme_Cluster_Break=Control
GCB=CN
Hangul_Syllable_Type=NA
Hangul_Syllable_Type=Not_Applicable
HST=NA
Identifier_Status=Restricted
Identifier_Type=Default_Ignorable
Indic_Positional_Category=NA
InPC=NA
Indic_Syllabic_Category=Other
InSC=Other
Joining_Group=No_Joining_Group
JG=NoJoiningGroup
Joining_Type=T
Joining_Type=Transparent
JT=T
Line_Break=ZW
Line_Break=ZWSpace
LB=ZW
Numeric_Type=None
NT=None
Numeric_Value=NaN
NV=NaN
Present_In=1.1
IN=1.1
Present_In=2.0
IN=2.0
Present_In=V2_0
Present_In=2.1
IN=2.1
Present_In=V2_1
Present_In=3.0
IN=3.0
Present_In=V3_0
Present_In=3.1
IN=3.1
Present_In=V3_1
Present_In=3.2
IN=3.2
Present_In=V3_2
Present_In=4.0
IN=4.0
Present_In=V4_0
Present_In=4.1
IN=4.1
Present_In=V4_1
Present_In=5.0
IN=5.0
Present_In=V5_0
Present_In=5.1
IN=5.1
Present_In=V5_1
Present_In=5.2
IN=5.2
Present_In=V5_2
Present_In=6.0
IN=6.0
Present_In=V6_0
Present_In=6.1
IN=6.1
Present_In=V6_1
Present_In=6.2
IN=6.2
Present_In=V6_2
Present_In=6.3
IN=6.3
Present_In=V6_3
Present_In=7.0
IN=7.0
Present_In=V7_0
Present_In=8.0
IN=8.0
Present_In=V8_0
Present_In=9.0
IN=9.0
Present_In=V9_0
Present_In=10.0
IN=10.0
Present_In=V10_0
Present_In=11.0
IN=11.0
Present_In=V11_0
Present_In=12.0
IN=12.0
Present_In=V12_0
Present_In=12.1
IN=12.1
Present_In=V12_1
Present_In=13.0
IN=13.0
Present_In=V13_0
Script=Common
SC=Zyyy
Script=Zyyy
Scx=Zyyy
Script_Extensions=Zyyy
Sentence_Break=FO
Sentence_Break=Format
SB=FO
Vertical_Orientation=R
Vertical_Orientation=Rotated
Vo=R
Word_Break=Other
WB=XX
Word_Break=XX

The first two (\pC and \p{Cf}) are probably the ones of interest.


\p{General_Category=Format} aka \p{Gc=Cf} aka \p{Format} aka \p{Cf}

perl -CSD -pe's/\p{Cf}//g'

This property is shared by the following 161 code points:

$ unichars -a '\p{Cf}' | cat
 ---- U+000AD SOFT HYPHEN
 ---- U+00600 ARABIC NUMBER SIGN
 ---- U+00601 ARABIC SIGN SANAH
 ---- U+00602 ARABIC FOOTNOTE MARKER
 ---- U+00603 ARABIC SIGN SAFHA
 ---- U+00604 ARABIC SIGN SAMVAT
 ---- U+00605 ARABIC NUMBER MARK ABOVE
 ---- U+0061C ARABIC LETTER MARK
 ---- U+006DD ARABIC END OF AYAH
 ---- U+0070F SYRIAC ABBREVIATION MARK
 ---- U+008E2 ARABIC DISPUTED END OF AYAH
 ---- U+0180E MONGOLIAN VOWEL SEPARATOR
 ---- U+0200B ZERO WIDTH SPACE
 ---- U+0200C ZERO WIDTH NON-JOINER
 ---- U+0200D ZERO WIDTH JOINER
 ---- U+0200E LEFT-TO-RIGHT MARK
 ---- U+0200F RIGHT-TO-LEFT MARK
 ---- U+0202A LEFT-TO-RIGHT EMBEDDING
 ---- U+0202B RIGHT-TO-LEFT EMBEDDING
 ---- U+0202C POP DIRECTIONAL FORMATTING
 ---- U+0202D LEFT-TO-RIGHT OVERRIDE
 ---- U+0202E RIGHT-TO-LEFT OVERRIDE
 ---- U+02060 WORD JOINER
 ---- U+02061 FUNCTION APPLICATION
 ---- U+02062 INVISIBLE TIMES
 ---- U+02063 INVISIBLE SEPARATOR
 ---- U+02064 INVISIBLE PLUS
 ---- U+02066 LEFT-TO-RIGHT ISOLATE
 ---- U+02067 RIGHT-TO-LEFT ISOLATE
 ---- U+02068 FIRST STRONG ISOLATE
 ---- U+02069 POP DIRECTIONAL ISOLATE
 ---- U+0206A INHIBIT SYMMETRIC SWAPPING
 ---- U+0206B ACTIVATE SYMMETRIC SWAPPING
 ---- U+0206C INHIBIT ARABIC FORM SHAPING
 ---- U+0206D ACTIVATE ARABIC FORM SHAPING
 ---- U+0206E NATIONAL DIGIT SHAPES
 ---- U+0206F NOMINAL DIGIT SHAPES
 ---- U+0FEFF ZERO WIDTH NO-BREAK SPACE
 ---- U+0FFF9 INTERLINEAR ANNOTATION ANCHOR
 ---- U+0FFFA INTERLINEAR ANNOTATION SEPARATOR
 ---- U+0FFFB INTERLINEAR ANNOTATION TERMINATOR
 ---- U+110BD KAITHI NUMBER SIGN
 ---- U+110CD KAITHI NUMBER SIGN ABOVE
 ---- U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER
 ---- U+13431 EGYPTIAN HIEROGLYPH HORIZONTAL JOINER
 ---- U+13432 EGYPTIAN HIEROGLYPH INSERT AT TOP START
 ---- U+13433 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START
 ---- U+13434 EGYPTIAN HIEROGLYPH INSERT AT TOP END
 ---- U+13435 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM END
 ---- U+13436 EGYPTIAN HIEROGLYPH OVERLAY MIDDLE
 ---- U+13437 EGYPTIAN HIEROGLYPH BEGIN SEGMENT
 ---- U+13438 EGYPTIAN HIEROGLYPH END SEGMENT
 ---- U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
 ---- U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
 ---- U+1BCA2 SHORTHAND FORMAT DOWN STEP
 ---- U+1BCA3 SHORTHAND FORMAT UP STEP
 ---- U+1D173 MUSICAL SYMBOL BEGIN BEAM
 ---- U+1D174 MUSICAL SYMBOL END BEAM
 ---- U+1D175 MUSICAL SYMBOL BEGIN TIE
 ---- U+1D176 MUSICAL SYMBOL END TIE
 ---- U+1D177 MUSICAL SYMBOL BEGIN SLUR
 ---- U+1D178 MUSICAL SYMBOL END SLUR
 ---- U+1D179 MUSICAL SYMBOL BEGIN PHRASE
 ---- U+1D17A MUSICAL SYMBOL END PHRASE
 ---- U+E0001 LANGUAGE TAG
 ---- U+E0020 TAG SPACE
 ---- U+E0021 TAG EXCLAMATION MARK
 ---- U+E0022 TAG QUOTATION MARK
 ---- U+E0023 TAG NUMBER SIGN
 ---- U+E0024 TAG DOLLAR SIGN
 ---- U+E0025 TAG PERCENT SIGN
 ---- U+E0026 TAG AMPERSAND
 ---- U+E0027 TAG APOSTROPHE
 ---- U+E0028 TAG LEFT PARENTHESIS
 ---- U+E0029 TAG RIGHT PARENTHESIS
 ---- U+E002A TAG ASTERISK
 ---- U+E002B TAG PLUS SIGN
 ---- U+E002C TAG COMMA
 ---- U+E002D TAG HYPHEN-MINUS
 ---- U+E002E TAG FULL STOP
 ---- U+E002F TAG SOLIDUS
 ---- U+E0030 TAG DIGIT ZERO
 ---- U+E0031 TAG DIGIT ONE
 ---- U+E0032 TAG DIGIT TWO
 ---- U+E0033 TAG DIGIT THREE
 ---- U+E0034 TAG DIGIT FOUR
 ---- U+E0035 TAG DIGIT FIVE
 ---- U+E0036 TAG DIGIT SIX
 ---- U+E0037 TAG DIGIT SEVEN
 ---- U+E0038 TAG DIGIT EIGHT
 ---- U+E0039 TAG DIGIT NINE
 ---- U+E003A TAG COLON
 ---- U+E003B TAG SEMICOLON
 ---- U+E003C TAG LESS-THAN SIGN
 ---- U+E003D TAG EQUALS SIGN
 ---- U+E003E TAG GREATER-THAN SIGN
 ---- U+E003F TAG QUESTION MARK
 ---- U+E0040 TAG COMMERCIAL AT
 ---- U+E0041 TAG LATIN CAPITAL LETTER A
 ---- U+E0042 TAG LATIN CAPITAL LETTER B
 ---- U+E0043 TAG LATIN CAPITAL LETTER C
 ---- U+E0044 TAG LATIN CAPITAL LETTER D
 ---- U+E0045 TAG LATIN CAPITAL LETTER E
 ---- U+E0046 TAG LATIN CAPITAL LETTER F
 ---- U+E0047 TAG LATIN CAPITAL LETTER G
 ---- U+E0048 TAG LATIN CAPITAL LETTER H
 ---- U+E0049 TAG LATIN CAPITAL LETTER I
 ---- U+E004A TAG LATIN CAPITAL LETTER J
 ---- U+E004B TAG LATIN CAPITAL LETTER K
 ---- U+E004C TAG LATIN CAPITAL LETTER L
 ---- U+E004D TAG LATIN CAPITAL LETTER M
 ---- U+E004E TAG LATIN CAPITAL LETTER N
 ---- U+E004F TAG LATIN CAPITAL LETTER O
 ---- U+E0050 TAG LATIN CAPITAL LETTER P
 ---- U+E0051 TAG LATIN CAPITAL LETTER Q
 ---- U+E0052 TAG LATIN CAPITAL LETTER R
 ---- U+E0053 TAG LATIN CAPITAL LETTER S
 ---- U+E0054 TAG LATIN CAPITAL LETTER T
 ---- U+E0055 TAG LATIN CAPITAL LETTER U
 ---- U+E0056 TAG LATIN CAPITAL LETTER V
 ---- U+E0057 TAG LATIN CAPITAL LETTER W
 ---- U+E0058 TAG LATIN CAPITAL LETTER X
 ---- U+E0059 TAG LATIN CAPITAL LETTER Y
 ---- U+E005A TAG LATIN CAPITAL LETTER Z
 ---- U+E005B TAG LEFT SQUARE BRACKET
 ---- U+E005C TAG REVERSE SOLIDUS
 ---- U+E005D TAG RIGHT SQUARE BRACKET
 ---- U+E005E TAG CIRCUMFLEX ACCENT
 ---- U+E005F TAG LOW LINE
 ---- U+E0060 TAG GRAVE ACCENT
 ---- U+E0061 TAG LATIN SMALL LETTER A
 ---- U+E0062 TAG LATIN SMALL LETTER B
 ---- U+E0063 TAG LATIN SMALL LETTER C
 ---- U+E0064 TAG LATIN SMALL LETTER D
 ---- U+E0065 TAG LATIN SMALL LETTER E
 ---- U+E0066 TAG LATIN SMALL LETTER F
 ---- U+E0067 TAG LATIN SMALL LETTER G
 ---- U+E0068 TAG LATIN SMALL LETTER H
 ---- U+E0069 TAG LATIN SMALL LETTER I
 ---- U+E006A TAG LATIN SMALL LETTER J
 ---- U+E006B TAG LATIN SMALL LETTER K
 ---- U+E006C TAG LATIN SMALL LETTER L
 ---- U+E006D TAG LATIN SMALL LETTER M
 ---- U+E006E TAG LATIN SMALL LETTER N
 ---- U+E006F TAG LATIN SMALL LETTER O
 ---- U+E0070 TAG LATIN SMALL LETTER P
 ---- U+E0071 TAG LATIN SMALL LETTER Q
 ---- U+E0072 TAG LATIN SMALL LETTER R
 ---- U+E0073 TAG LATIN SMALL LETTER S
 ---- U+E0074 TAG LATIN SMALL LETTER T
 ---- U+E0075 TAG LATIN SMALL LETTER U
 ---- U+E0076 TAG LATIN SMALL LETTER V
 ---- U+E0077 TAG LATIN SMALL LETTER W
 ---- U+E0078 TAG LATIN SMALL LETTER X
 ---- U+E0079 TAG LATIN SMALL LETTER Y
 ---- U+E007A TAG LATIN SMALL LETTER Z
 ---- U+E007B TAG LEFT CURLY BRACKET
 ---- U+E007C TAG VERTICAL LINE
 ---- U+E007D TAG RIGHT CURLY BRACKET
 ---- U+E007E TAG TILDE
 ---- U+E007F CANCEL TAG

\p{General_Category=Other} aka \p{Gc=C} aka \p{Other} aka \p{C} aka \pC

perl -CSD -pe's/\pC//g'

\p{General_Category=Other} (\pC) consists of:

  • \p{General_Category=Control} (\p{Cc}): 65 code points
  • \p{General_Category=Format} (\p{Cf}): 161 code points [mentioned above]
  • \p{General_Category=Private_Use} (\p{Co}): 137,468 code points
  • \p{General_Category=Unassigned} (\p{Cn}): 830,672 code points
  • \p{General_Category=Surrogate} (\p{Cs}): 2,048 code points

Of these 970,414 code points, the following 226 are the named ones (equivalent to [\p{Cc}\p{Cf}]):

$ unichars -a '\pC' | cat
 ---- U+00000 NULL
 ---- U+00001 START OF HEADING
 ---- U+00002 START OF TEXT
 ---- U+00003 END OF TEXT
 ---- U+00004 END OF TRANSMISSION
 ---- U+00005 ENQUIRY
 ---- U+00006 ACKNOWLEDGE
 ---- U+00007 ALERT
 ---- U+00008 BACKSPACE
 ---- U+00009 CHARACTER TABULATION
 ---- U+0000A LINE FEED
 ---- U+0000B LINE TABULATION
 ---- U+0000C FORM FEED
 ---- U+0000D CARRIAGE RETURN
 ---- U+0000E SHIFT OUT
 ---- U+0000F SHIFT IN
 ---- U+00010 DATA LINK ESCAPE
 ---- U+00011 DEVICE CONTROL ONE
 ---- U+00012 DEVICE CONTROL TWO
 ---- U+00013 DEVICE CONTROL THREE
 ---- U+00014 DEVICE CONTROL FOUR
 ---- U+00015 NEGATIVE ACKNOWLEDGE
 ---- U+00016 SYNCHRONOUS IDLE
 ---- U+00017 END OF TRANSMISSION BLOCK
 ---- U+00018 CANCEL
 ---- U+00019 END OF MEDIUM
 ---- U+0001A SUBSTITUTE
 ---- U+0001B ESCAPE
 ---- U+0001C INFORMATION SEPARATOR FOUR
 ---- U+0001D INFORMATION SEPARATOR THREE
 ---- U+0001E INFORMATION SEPARATOR TWO
 ---- U+0001F INFORMATION SEPARATOR ONE
 ---- U+0007F DELETE
 ---- U+00080 PADDING CHARACTER
 ---- U+00081 HIGH OCTET PRESET
 ---- U+00082 BREAK PERMITTED HERE
 ---- U+00083 NO BREAK HERE
 ---- U+00084 INDEX
 ---- U+00085 NEXT LINE
 ---- U+00086 START OF SELECTED AREA
 ---- U+00087 END OF SELECTED AREA
 ---- U+00088 CHARACTER TABULATION SET
 ---- U+00089 CHARACTER TABULATION WITH JUSTIFICATION
 ---- U+0008A LINE TABULATION SET
 ---- U+0008B PARTIAL LINE FORWARD
 ---- U+0008C PARTIAL LINE BACKWARD
 ---- U+0008D REVERSE LINE FEED
 ---- U+0008E SINGLE SHIFT TWO
 ---- U+0008F SINGLE SHIFT THREE
 ---- U+00090 DEVICE CONTROL STRING
 ---- U+00091 PRIVATE USE ONE
 ---- U+00092 PRIVATE USE TWO
 ---- U+00093 SET TRANSMIT STATE
 ---- U+00094 CANCEL CHARACTER
 ---- U+00095 MESSAGE WAITING
 ---- U+00096 START OF GUARDED AREA
 ---- U+00097 END OF GUARDED AREA
 ---- U+00098 START OF STRING
 ---- U+00099 SINGLE GRAPHIC CHARACTER INTRODUCER
 ---- U+0009A SINGLE CHARACTER INTRODUCER
 ---- U+0009B CONTROL SEQUENCE INTRODUCER
 ---- U+0009C STRING TERMINATOR
 ---- U+0009D OPERATING SYSTEM COMMAND
 ---- U+0009E PRIVACY MESSAGE
 ---- U+0009F APPLICATION PROGRAM COMMAND
 ---- U+000AD SOFT HYPHEN
 ---- U+00600 ARABIC NUMBER SIGN
 ---- U+00601 ARABIC SIGN SANAH
 ---- U+00602 ARABIC FOOTNOTE MARKER
 ---- U+00603 ARABIC SIGN SAFHA
 ---- U+00604 ARABIC SIGN SAMVAT
 ---- U+00605 ARABIC NUMBER MARK ABOVE
 ---- U+0061C ARABIC LETTER MARK
 ---- U+006DD ARABIC END OF AYAH
 ---- U+0070F SYRIAC ABBREVIATION MARK
 ---- U+008E2 ARABIC DISPUTED END OF AYAH
 ---- U+0180E MONGOLIAN VOWEL SEPARATOR
 ---- U+0200B ZERO WIDTH SPACE
 ---- U+0200C ZERO WIDTH NON-JOINER
 ---- U+0200D ZERO WIDTH JOINER
 ---- U+0200E LEFT-TO-RIGHT MARK
 ---- U+0200F RIGHT-TO-LEFT MARK
 ---- U+0202A LEFT-TO-RIGHT EMBEDDING
 ---- U+0202B RIGHT-TO-LEFT EMBEDDING
 ---- U+0202C POP DIRECTIONAL FORMATTING
 ---- U+0202D LEFT-TO-RIGHT OVERRIDE
 ---- U+0202E RIGHT-TO-LEFT OVERRIDE
 ---- U+02060 WORD JOINER
 ---- U+02061 FUNCTION APPLICATION
 ---- U+02062 INVISIBLE TIMES
 ---- U+02063 INVISIBLE SEPARATOR
 ---- U+02064 INVISIBLE PLUS
 ---- U+02066 LEFT-TO-RIGHT ISOLATE
 ---- U+02067 RIGHT-TO-LEFT ISOLATE
 ---- U+02068 FIRST STRONG ISOLATE
 ---- U+02069 POP DIRECTIONAL ISOLATE
 ---- U+0206A INHIBIT SYMMETRIC SWAPPING
 ---- U+0206B ACTIVATE SYMMETRIC SWAPPING
 ---- U+0206C INHIBIT ARABIC FORM SHAPING
 ---- U+0206D ACTIVATE ARABIC FORM SHAPING
 ---- U+0206E NATIONAL DIGIT SHAPES
 ---- U+0206F NOMINAL DIGIT SHAPES
 ---- U+0FEFF ZERO WIDTH NO-BREAK SPACE
 ---- U+0FFF9 INTERLINEAR ANNOTATION ANCHOR
 ---- U+0FFFA INTERLINEAR ANNOTATION SEPARATOR
 ---- U+0FFFB INTERLINEAR ANNOTATION TERMINATOR
 ---- U+110BD KAITHI NUMBER SIGN
 ---- U+110CD KAITHI NUMBER SIGN ABOVE
 ---- U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER
 ---- U+13431 EGYPTIAN HIEROGLYPH HORIZONTAL JOINER
 ---- U+13432 EGYPTIAN HIEROGLYPH INSERT AT TOP START
 ---- U+13433 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START
 ---- U+13434 EGYPTIAN HIEROGLYPH INSERT AT TOP END
 ---- U+13435 EGYPTIAN HIEROGLYPH INSERT AT BOTTOM END
 ---- U+13436 EGYPTIAN HIEROGLYPH OVERLAY MIDDLE
 ---- U+13437 EGYPTIAN HIEROGLYPH BEGIN SEGMENT
 ---- U+13438 EGYPTIAN HIEROGLYPH END SEGMENT
 ---- U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
 ---- U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
 ---- U+1BCA2 SHORTHAND FORMAT DOWN STEP
 ---- U+1BCA3 SHORTHAND FORMAT UP STEP
 ---- U+1D173 MUSICAL SYMBOL BEGIN BEAM
 ---- U+1D174 MUSICAL SYMBOL END BEAM
 ---- U+1D175 MUSICAL SYMBOL BEGIN TIE
 ---- U+1D176 MUSICAL SYMBOL END TIE
 ---- U+1D177 MUSICAL SYMBOL BEGIN SLUR
 ---- U+1D178 MUSICAL SYMBOL END SLUR
 ---- U+1D179 MUSICAL SYMBOL BEGIN PHRASE
 ---- U+1D17A MUSICAL SYMBOL END PHRASE
 ---- U+E0001 LANGUAGE TAG
 ---- U+E0020 TAG SPACE
 ---- U+E0021 TAG EXCLAMATION MARK
 ---- U+E0022 TAG QUOTATION MARK
 ---- U+E0023 TAG NUMBER SIGN
 ---- U+E0024 TAG DOLLAR SIGN
 ---- U+E0025 TAG PERCENT SIGN
 ---- U+E0026 TAG AMPERSAND
 ---- U+E0027 TAG APOSTROPHE
 ---- U+E0028 TAG LEFT PARENTHESIS
 ---- U+E0029 TAG RIGHT PARENTHESIS
 ---- U+E002A TAG ASTERISK
 ---- U+E002B TAG PLUS SIGN
 ---- U+E002C TAG COMMA
 ---- U+E002D TAG HYPHEN-MINUS
 ---- U+E002E TAG FULL STOP
 ---- U+E002F TAG SOLIDUS
 ---- U+E0030 TAG DIGIT ZERO
 ---- U+E0031 TAG DIGIT ONE
 ---- U+E0032 TAG DIGIT TWO
 ---- U+E0033 TAG DIGIT THREE
 ---- U+E0034 TAG DIGIT FOUR
 ---- U+E0035 TAG DIGIT FIVE
 ---- U+E0036 TAG DIGIT SIX
 ---- U+E0037 TAG DIGIT SEVEN
 ---- U+E0038 TAG DIGIT EIGHT
 ---- U+E0039 TAG DIGIT NINE
 ---- U+E003A TAG COLON
 ---- U+E003B TAG SEMICOLON
 ---- U+E003C TAG LESS-THAN SIGN
 ---- U+E003D TAG EQUALS SIGN
 ---- U+E003E TAG GREATER-THAN SIGN
 ---- U+E003F TAG QUESTION MARK
 ---- U+E0040 TAG COMMERCIAL AT
 ---- U+E0041 TAG LATIN CAPITAL LETTER A
 ---- U+E0042 TAG LATIN CAPITAL LETTER B
 ---- U+E0043 TAG LATIN CAPITAL LETTER C
 ---- U+E0044 TAG LATIN CAPITAL LETTER D
 ---- U+E0045 TAG LATIN CAPITAL LETTER E
 ---- U+E0046 TAG LATIN CAPITAL LETTER F
 ---- U+E0047 TAG LATIN CAPITAL LETTER G
 ---- U+E0048 TAG LATIN CAPITAL LETTER H
 ---- U+E0049 TAG LATIN CAPITAL LETTER I
 ---- U+E004A TAG LATIN CAPITAL LETTER J
 ---- U+E004B TAG LATIN CAPITAL LETTER K
 ---- U+E004C TAG LATIN CAPITAL LETTER L
 ---- U+E004D TAG LATIN CAPITAL LETTER M
 ---- U+E004E TAG LATIN CAPITAL LETTER N
 ---- U+E004F TAG LATIN CAPITAL LETTER O
 ---- U+E0050 TAG LATIN CAPITAL LETTER P
 ---- U+E0051 TAG LATIN CAPITAL LETTER Q
 ---- U+E0052 TAG LATIN CAPITAL LETTER R
 ---- U+E0053 TAG LATIN CAPITAL LETTER S
 ---- U+E0054 TAG LATIN CAPITAL LETTER T
 ---- U+E0055 TAG LATIN CAPITAL LETTER U
 ---- U+E0056 TAG LATIN CAPITAL LETTER V
 ---- U+E0057 TAG LATIN CAPITAL LETTER W
 ---- U+E0058 TAG LATIN CAPITAL LETTER X
 ---- U+E0059 TAG LATIN CAPITAL LETTER Y
 ---- U+E005A TAG LATIN CAPITAL LETTER Z
 ---- U+E005B TAG LEFT SQUARE BRACKET
 ---- U+E005C TAG REVERSE SOLIDUS
 ---- U+E005D TAG RIGHT SQUARE BRACKET
 ---- U+E005E TAG CIRCUMFLEX ACCENT
 ---- U+E005F TAG LOW LINE
 ---- U+E0060 TAG GRAVE ACCENT
 ---- U+E0061 TAG LATIN SMALL LETTER A
 ---- U+E0062 TAG LATIN SMALL LETTER B
 ---- U+E0063 TAG LATIN SMALL LETTER C
 ---- U+E0064 TAG LATIN SMALL LETTER D
 ---- U+E0065 TAG LATIN SMALL LETTER E
 ---- U+E0066 TAG LATIN SMALL LETTER F
 ---- U+E0067 TAG LATIN SMALL LETTER G
 ---- U+E0068 TAG LATIN SMALL LETTER H
 ---- U+E0069 TAG LATIN SMALL LETTER I
 ---- U+E006A TAG LATIN SMALL LETTER J
 ---- U+E006B TAG LATIN SMALL LETTER K
 ---- U+E006C TAG LATIN SMALL LETTER L
 ---- U+E006D TAG LATIN SMALL LETTER M
 ---- U+E006E TAG LATIN SMALL LETTER N
 ---- U+E006F TAG LATIN SMALL LETTER O
 ---- U+E0070 TAG LATIN SMALL LETTER P
 ---- U+E0071 TAG LATIN SMALL LETTER Q
 ---- U+E0072 TAG LATIN SMALL LETTER R
 ---- U+E0073 TAG LATIN SMALL LETTER S
 ---- U+E0074 TAG LATIN SMALL LETTER T
 ---- U+E0075 TAG LATIN SMALL LETTER U
 ---- U+E0076 TAG LATIN SMALL LETTER V
 ---- U+E0077 TAG LATIN SMALL LETTER W
 ---- U+E0078 TAG LATIN SMALL LETTER X
 ---- U+E0079 TAG LATIN SMALL LETTER Y
 ---- U+E007A TAG LATIN SMALL LETTER Z
 ---- U+E007B TAG LEFT CURLY BRACKET
 ---- U+E007C TAG VERTICAL LINE
 ---- U+E007D TAG RIGHT CURLY BRACKET
 ---- U+E007E TAG TILDE
 ---- U+E007F CANCEL TAG
