
Byte Order Mark

From Biocrawler, the free encyclopedia.


A Byte Order Mark (BOM) is the character at code point U+FEFF (ZERO WIDTH NO-BREAK SPACE) when that character is used to denote the endianness of a string of UCS/Unicode characters encoded in UTF-16 or UTF-32.

A BOM can also be used to indicate the encoding of unlabeled text in many Unicode encodings. In most encodings the BOM is a byte sequence that is unlikely to appear in more conventional encodings or in other Unicode encodings (it usually looks like a sequence of obscure control codes). If a BOM is misinterpreted as an actual character within the text, it will generally be invisible, because it is a ZERO WIDTH NO-BREAK SPACE. The "zero width no-break space" function of U+FEFF was deprecated in Unicode 3.2 in favor of U+2060 WORD JOINER, allowing U+FEFF to be used solely with the semantics of a BOM.
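To illustrate the "invisible but still present" behavior, here is a minimal Python sketch. It assumes the text was decoded with Python's plain `utf-8` codec, which keeps a leading BOM as an ordinary U+FEFF character rather than removing it; the byte string is an invented example.

```python
# Bytes of a UTF-8 file that begins with a BOM (EF BB BF) followed by "hello".
raw = b"\xef\xbb\xbfhello"

# The plain "utf-8" codec keeps the BOM as an ordinary U+FEFF character.
text = raw.decode("utf-8")

print(repr(text))       # '\ufeffhello'
print(text == "hello")  # False: the zero-width character is still there
print(len(text))        # 6

# Stripping a single leading U+FEFF recovers the intended text.
if text.startswith("\ufeff"):
    text = text[1:]
print(text == "hello")  # True
```

When printed to a terminal, the unstripped string usually looks identical to `hello`, which is why a stray BOM tends to surface only as a failed comparison or an unexpected length.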

In UTF-16, a BOM is expressed as the two-byte sequence FE FF at the beginning of the encoded string to indicate that the encoded characters that follow it use big-endian byte order, or as the byte sequence FF FE to indicate little-endian order.
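The two UTF-16 byte orders can be seen directly with Python's standard codecs. In this sketch the `utf-16-be` and `utf-16-le` codecs do not add a BOM themselves, so U+FEFF is prepended by hand; the generic `utf-16` codec then reads that BOM to choose the byte order when decoding and drops it from the result. The sample string `"A"` is arbitrary.

```python
import codecs

s = "A"  # U+0041

# Prepend the BOM (U+FEFF) and encode with an explicit byte order.
be = ("\ufeff" + s).encode("utf-16-be")
le = ("\ufeff" + s).encode("utf-16-le")

print(be.hex(" "))  # fe ff 00 41
print(le.hex(" "))  # ff fe 41 00

# The BOM constants in the standard codecs module match these prefixes.
assert be.startswith(codecs.BOM_UTF16_BE)
assert le.startswith(codecs.BOM_UTF16_LE)

# The generic "utf-16" codec uses the BOM to pick the byte order, then drops it.
print(be.decode("utf-16"))  # A
print(le.decode("utf-16"))  # A
```

`bytes.hex` with a separator argument needs Python 3.8 or later.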

Whilst UTF-8 has no byte order issues, a BOM encoded in UTF-8 may be used to mark text as UTF-8. The UTF-8 representation of the BOM is the byte sequence EF BB BF. Much Windows software adds one to UTF-8 files. However, on Unix-like systems (which make heavy use of text files for configuration) this practice is not recommended, because the BOM interferes with correct processing of important codes such as the hash-bang (#!) at the start of a script.
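The hash-bang problem can be sketched in Python, assuming the standard `utf-8-sig` codec, which writes EF BB BF on encoding and removes a leading BOM (if any) on decoding. Unix-like systems recognize a script interpreter only when the file's first two bytes are `#!`, so the check below stands in for that test; the script text is a made-up example.

```python
script = "#!/bin/sh\necho hi\n"

# "utf-8-sig" writes EF BB BF before the content; plain "utf-8" does not.
with_bom = script.encode("utf-8-sig")
without_bom = script.encode("utf-8")
print(with_bom[:5])  # b'\xef\xbb\xbf#!'

# With a BOM in front, the first two bytes are no longer "#!".
print(with_bom.startswith(b"#!"))     # False
print(without_bom.startswith(b"#!"))  # True

# On input, "utf-8-sig" strips a leading BOM if present and is harmless otherwise.
print(with_bom.decode("utf-8-sig") == script)     # True
print(without_bom.decode("utf-8-sig") == script)  # True
```

Reading with `utf-8-sig` is a tolerant default for input that may or may not carry a BOM; writing plain `utf-8` avoids creating the problem in the first place.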

Whilst a BOM could be used with UTF-32, this encoding is almost never used for transmission anyway.

Representations of Byte Order Marks by Encoding

  • UTF-8: EF BB BF
  • UTF-16 Big Endian: FE FF
  • UTF-16 Little Endian: FF FE
  • UTF-32 Big Endian: 00 00 FE FF
  • UTF-32 Little Endian: FF FE 00 00
  • SCSU: 0E FE FF
  • UTF-7: 2B 2F 76 followed by one of 38, 39, 2B, 2F, or 38 2D
  • UTF-EBCDIC: DD 73 66 73
  • BOCU-1: FB EE 28
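The list above can drive a simple BOM sniffer. The following Python sketch is only a heuristic, and the names `SIGNATURES`, `BOMS`, and `sniff_bom` are hypothetical. Longer signatures are tried first so that UTF-32 LE (FF FE 00 00) is not reported as UTF-16 LE (FF FE). The check cannot be certain, though: FF FE 00 00 is also what a UTF-16 LE BOM followed by U+0000 looks like. For UTF-7 the sketch only checks the byte after 2B 2F 76, because the final BOM byte also carries bits of the following character.

```python
from typing import Optional

# Signatures copied from the list above. Longer signatures come first so that
# UTF-32 LE (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
SIGNATURES = [
    ("00 00 FE FF", "UTF-32 BE"),
    ("FF FE 00 00", "UTF-32 LE"),
    ("DD 73 66 73", "UTF-EBCDIC"),
    ("EF BB BF", "UTF-8"),
    ("0E FE FF", "SCSU"),
    ("FB EE 28", "BOCU-1"),
    ("FE FF", "UTF-16 BE"),
    ("FF FE", "UTF-16 LE"),
]
BOMS = [(bytes.fromhex(hex_sig), name) for hex_sig, name in SIGNATURES]

# UTF-7: 2B 2F 76 followed by 38, 39, 2B or 2F (the 38 2D form starts with 38).
UTF7_PREFIX = bytes.fromhex("2B 2F 76")
UTF7_NEXT = {0x38, 0x39, 0x2B, 0x2F}


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding suggested by a leading BOM, or None if none matches."""
    for sig, name in BOMS:
        if data.startswith(sig):
            return name
    if data.startswith(UTF7_PREFIX) and len(data) > 3 and data[3] in UTF7_NEXT:
        return "UTF-7"
    return None


print(sniff_bom("\ufeffhi".encode("utf-32-le")))  # UTF-32 LE
print(sniff_bom("\ufeffhi".encode("utf-16-le")))  # UTF-16 LE
print(sniff_bom("hi".encode("utf-8-sig")))        # UTF-8
print(sniff_bom(b"+/v8-hi"))                      # UTF-7
print(sniff_bom(b"plain ASCII"))                  # None
```

A caller would typically skip the BOM bytes and decode the rest with the reported encoding, falling back to a declared or default encoding when `sniff_bom` returns `None`.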


External links

Wikipedia (http://en.wikipedia.org/wiki/Main_Page) Byte_Order_Mark (http://en.wikipedia.org/wiki/Byte_Order_Mark) version history (http://en.wikipedia.org/w/index.php?title=Byte_Order_Mark&action=history) GNU Free Documentation License (http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License) CC-by-sa (http://creativecommons.org/licenses/by-sa/2.5/)
