Characters and Strings

Characters

Characters are encoded by unsigned integers.

The ASCII scheme encodes the characters of the basic Latin alphabet, along with digits, punctuation marks, and control characters (such as '\n'), as integers between 0 and 127. ASCII is a 7-bit code, so when a character is stored in an 8-bit byte the leading bit is always 0.
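As a quick check of these codes, here is a minimal Java sketch that prints the integer code of a character and its 8-bit pattern. The class and method names (AsciiCodes, codeOf, toBits8) are made up for illustration.

```java
class AsciiCodes {
    // Returns the integer code of a character; for ASCII characters it lies in 0..127.
    static int codeOf(char c) {
        return (int) c;
    }

    // Formats the low 8 bits of a code as a binary string padded with leading zeros.
    static String toBits8(int code) {
        String bits = Integer.toBinaryString(code & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(codeOf('a'));    // 97
        System.out.println(toBits8('a'));   // 01100001
        System.out.println(codeOf('\n'));   // 10 (a control character)
        System.out.println(toBits8('\n'));  // 00001010
    }
}
```

Note that the leading bit of each printed pattern is 0, as expected for codes below 128.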

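The Unicode UTF-16 scheme encodes characters using 16-bit unsigned integers (code units). Most characters in common use, including those of other alphabets such as Hebrew, Cyrillic, and Chinese, fit in a single 16-bit unit. The remaining characters are encoded as a pair of 16-bit units (a surrogate pair).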

The first 128 UTF-16 codes (0 through 127) match the ASCII codes.
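Java's char type is a 16-bit UTF-16 code unit, so the relationship to ASCII can be checked directly. The sketch below is illustrative only: Utf16Units, codeUnit, and unitsNeeded are hypothetical names, while Character.SIZE and Character.charCount are standard library members. It also shows that a character outside the 16-bit range (here code point 0x1F600) takes two code units.

```java
class Utf16Units {
    // A Java char is one 16-bit UTF-16 code unit; widening to int exposes its code.
    static int codeUnit(char c) {
        return c;
    }

    // Number of 16-bit code units UTF-16 needs for a given Unicode code point.
    static int unitsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);                           // 16
        System.out.println(Integer.toHexString(codeUnit('a')));       // 61, same as ASCII
        System.out.println(Integer.toHexString(codeUnit('\u05D0')));  // 5d0, Hebrew alef
        System.out.println(unitsNeeded('a'));                         // 1
        System.out.println(unitsNeeded(0x1F600));                     // 2
    }
}
```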

Here's a list of Unicode characters and their codes:

http://en.wikipedia.org/wiki/List_of_Unicode_characters

Strings

Strings are often represented as null-terminated arrays of characters.

For example, the ASCII string "abc" would correspond to the 4-byte bit pattern 0x61 0x62 0x63 0x00 = 01100001 01100010 01100011 00000000
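The layout of "abc" can be reproduced in Java by encoding the string as ASCII bytes and appending a zero byte by hand. This is a sketch under that assumption, not how Java itself stores strings. The names NullTerminated, toCString, cStringLength, and toHex are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NullTerminated {
    // Encodes s as ASCII and appends a terminating 0 byte.
    static byte[] toCString(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        // copyOf pads the extra slot with 0, which serves as the terminator.
        return Arrays.copyOf(ascii, ascii.length + 1);
    }

    // Counts the bytes before the first 0 byte, the way a null-terminated string is measured.
    static int cStringLength(byte[] bytes) {
        int n = 0;
        while (n < bytes.length && bytes[n] != 0) {
            n++;
        }
        return n;
    }

    // Renders each byte as a two-digit hex value separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] abc = toCString("abc");
        System.out.println(toHex(abc));          // 0x61 0x62 0x63 0x00
        System.out.println(abc.length);          // 4
        System.out.println(cStringLength(abc));  // 3
    }
}
```

The array occupies 4 bytes, but the string length found by scanning for the terminator is 3.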

Java strings are objects: a String stores its own length rather than relying on a terminating null character.
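A short sketch of what being an object means in practice: the string carries its own length and methods, and a null character is just another character inside it. JavaStrings and embeddedNulLength are illustrative names. length, charAt, and toUpperCase are standard String methods.

```java
class JavaStrings {
    // Length of a string that contains a null character in the middle.
    static int embeddedNulLength() {
        return "a\0b".length();
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s.length());           // 3, no terminator is counted or needed
        System.out.println(s.charAt(1));          // b
        System.out.println(s.toUpperCase());      // ABC
        System.out.println(embeddedNulLength());  // 3, the null character does not end the string
    }
}
```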