
Book: Unicode Transformation Formats: Utf-8, Utf-16|ucs-2, Utf-Ebcdic, Comparison of Unicode Encodings, Utf-7, Gb 18030, Utf-1, Utf-9 and Utf-18

Item No. 10214177
Weight: 0.110 kg
Year of publication: 2010
Pages: 68
Binding: Paperback
Out of stock

Purchase includes free access to book updates online and a free trial membership in the publisher's book club, where you can select from more than a million books without charge.

Chapters: Utf-8, Utf-16/ucs-2, Utf-Ebcdic, Comparison of Unicode Encodings, Utf-7, Gb 18030, Utf-1, Utf-9 and Utf-18, Utf-32/ucs-4.

Excerpt: This article compares Unicode encodings. Two situations are considered: 8-bit-clean environments, and environments that forbid the use of byte values with the high bit set. Such prohibitions originally existed to allow for links that carried only seven data bits, but they remain in the standards, so software must generate messages that comply with them. The Standard Compression Scheme for Unicode and Binary Ordered Compression for Unicode are excluded from the comparison tables because their size is difficult to quantify simply.

Compatibility issues: A UTF-8 file that contains only ASCII characters is identical to an ASCII file. Legacy programs can generally handle UTF-8 encoded files even if they contain non-ASCII characters. UTF-16 and UTF-32 are incompatible with ASCII files and thus require Unicode-aware programs to display, print and manipulate them, even if the file is known to contain only characters in the ASCII subset. Because they contain many zero bytes, the strings cannot be manipulated by normal C string handling, even for simple operations such as copying. Therefore even most UTF-16 systems, such as Windows and Java, represent text objects such as program code with 8-bit encodings (ASCII, ISO-8859-1, or UTF-8), not UTF-16. Indeed, it is very rare to find a UTF-16 encoded text file on any system unless it is part of some more complex structure. This introduces a serious complication in programming that is often overlooked by system designers: many 8-bit encodings (in particular UTF-8) can...
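The compatibility points in the excerpt can be checked with a few lines of Python. The following is a minimal sketch, not taken from the book: the sample string and the helper `c_strlen` (a hypothetical stand-in for how C's `strlen` stops at the first zero byte) are assumptions made for illustration.

```python
# Sample ASCII-only text (an assumption chosen for illustration).
ascii_text = "Hello, world"

# An ASCII-only string encodes to the same bytes in ASCII and in UTF-8.
utf8_bytes = ascii_text.encode("utf-8")
ascii_bytes = ascii_text.encode("ascii")
same_as_ascii = utf8_bytes == ascii_bytes

# UTF-16 and UTF-32 pad each ASCII character with zero bytes.
# The "-le" variants are used so that no byte order mark is prepended.
utf16_bytes = ascii_text.encode("utf-16-le")
utf32_bytes = ascii_text.encode("utf-32-le")


def c_strlen(data: bytes) -> int:
    """Hypothetical helper: length up to the first zero byte, like C's strlen."""
    idx = data.find(b"\x00")
    return len(data) if idx == -1 else idx


print(same_as_ascii)          # True
print(len(utf8_bytes))        # 12
print(len(utf16_bytes))       # 24
print(len(utf32_bytes))       # 48
print(c_strlen(utf8_bytes))   # 12: no zero bytes, the whole string is seen
print(c_strlen(utf16_bytes))  # 1: 'H' is 0x48 0x00, so scanning stops early
```

A routine that treats a zero byte as the end of the string sees only the first byte of the UTF-16 data, which is the failure mode the excerpt describes for ordinary C string handling.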
