The smallest piece of information is a bit. A bit is either a 0 or a 1. A bit can represent different types of binary (two-valued) information:
1 = true
0 = false
1 = yes
0 = no
1 = on
0 = off
1 = negative
0 = positive
1 = male
0 = female
1 = by land
0 = by sea
etc.
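If you have a language like Python handy, you can see these interpretations directly. The sketch below is only an illustration (the variable name bit is made up for the example):

bit = 1
print(bool(bit))                  # True   (1 = true, 0 = false)
print("on" if bit else "off")     # on     (1 = on, 0 = off)
print(int(True), int(False))      # 1 0    (Python treats True as 1 and False as 0)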
Electronically, bits are represented by high and low voltage levels:
1 = 3 – 5 volts
0 = 0 – 2 volts
A bit string (also called a word) is a sequence of bits of some set length. Usually the length is a small power of 2: 4, 8, 16, 32, or 64.
A byte is a bit string of length 8. There are 256 different bytes. (Why?) A byte can represent an unsigned integer between 0 and 255, a signed integer between -128 and +127, or a keyboard character such as a letter, digit, or punctuation mark.
Here's how it begins:
00000000 = 0 = +0
00000001 = 1 = +1
00000010 = 2 = +2
00000011 = 3 = +3
00000100 = 4 = +4
etc.
Bytes with leading bit = 1 represent negatives when interpreted as signed integers:
10000000 = 128 = -128
10000001 = 129 = -127
...
11111110 = 254 = -2
11111111 = 255 = -1
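Here is a small Python sketch of the two interpretations. The function names unsigned_value and signed_value are made up for the example; the signed rule (subtract 256 when the leading bit is 1) is the two's-complement convention shown above:

def unsigned_value(byte_string):
    # Interpret an 8-character bit string as an unsigned integer (0..255).
    return int(byte_string, 2)

def signed_value(byte_string):
    # Interpret the same bit string as a signed integer (-128..+127).
    # A leading 1 means the value is negative: subtract 256 from the unsigned value.
    value = int(byte_string, 2)
    return value - 256 if byte_string[0] == "1" else value

for b in ["00000000", "00000011", "10000000", "11111111"]:
    print(b, unsigned_value(b), signed_value(b))
# 00000000 0 0
# 00000011 3 3
# 10000000 128 -128
# 11111111 255 -1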
The ASCII standard is a scheme for coding keyboard characters as bytes. For example, the digits '0' through '9' are represented by the following bytes:
00110000 = 48 = +48 = '0' (i.e., the digit 0)
00110001 = 49 = +49 = '1'
...
00111001 = 57 = +57 = '9'
The uppercase letters are represented by the bytes:
01000001 = 65 = +65 = 'A'
01000010 = 66 = +66 = 'B'
...
01011010 = 90 = +90 = 'Z'
The lowercase letters are represented by the bytes:
01100001 = 97 = +97 = 'a'
01100010 = 98 = +98 = 'b'
...
01111010 = 122 = +122 = 'z'
There are also bytes for representing punctuation marks ('.', '?', '!', ...) and for control characters (i.e., keyboard characters pressed while holding down the control (Ctrl) key).
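Python's built-in ord() function returns these codes, so you can check the lists above yourself:

for ch in ["0", "9", "A", "Z", "a", "z", ".", "?"]:
    code = ord(ch)                              # the ASCII code of the character
    print(format(code, "08b"), "=", code, "=", repr(ch))
# 00110000 = 48 = '0'
# 00111001 = 57 = '9'
# 01000001 = 65 = 'A'
# 01011010 = 90 = 'Z'
# 01100001 = 97 = 'a'
# 01111010 = 122 = 'z'
# 00101110 = 46 = '.'
# 00111111 = 63 = '?'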
Of course the ASCII standard only codes characters in the Latin alphabet. To code characters from all of the world's writing systems we need more than one byte per character. This is what the Unicode standard does.
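As a rough illustration (using UTF-8, one common way of storing Unicode text), characters outside the Latin alphabet really do take more than one byte:

for ch in ["A", "é", "中"]:
    encoded = ch.encode("utf-8")
    print(ch, "->", len(encoded), "byte(s):", encoded.hex())
# A -> 1 byte(s): 41
# é -> 2 byte(s): c3a9
# 中 -> 3 byte(s): e4b8ad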
A document is a sequence of paragraphs. A paragraph is a sequence of sentences. A sentence is a sequence of words, numbers, punctuation marks, and blank spaces. A word is a sequence of letters, and a number is a sequence of digits. Letters, digits, punctuation marks, even blank spaces, are represented by ASCII codes. An ASCII code is a sequence of bits. Thus, inside a computer any document (essay, poem, novel) is merely a very long string of bits.
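Here is a tiny sketch of that idea: a three-character "document" turned into the 24-bit string the computer actually stores (ASCII encoding assumed):

text = "Hi!"
bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))
print(bits)               # 010010000110100100100001
print(len(bits), "bits")  # 24 bits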
We can add, multiply, subtract, and divide bytes using adaptations of the algorithms learned in elementary school.
The trick is to remember that when we add bits everything looks the same as regular arithmetic:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
Except:
1 + 1 = 0
What happened here? Well, 1 + 1 = 2, but 2 isn't a bit. We would need 2 bits to represent 2, namely:
1 + 1 = 2 = 10 (2-bit representation of 2) = 0 carry 1
So what happened to the 1? We will need to carry it to the next column, if there is one.
Notice that this is no different from ordinary (decimal) arithmetic, where:
1 + 9 = 10 = 0 carry 1
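Hardware designers call this column rule a full adder: two bits plus an incoming carry go in, and one sum bit plus one carry bit come out. A minimal Python sketch (the name add_bits is made up for the example):

def add_bits(a, b, carry_in=0):
    # Add two bits plus an incoming carry; return (sum_bit, carry_out).
    total = a + b + carry_in        # 0, 1, 2, or 3
    return total % 2, total // 2

print(add_bits(0, 1))       # (1, 0)   0 + 1 = 1
print(add_bits(1, 1))       # (0, 1)   1 + 1 = 0 carry 1
print(add_bits(1, 1, 1))    # (1, 1)   1 + 1 + 1 = 1 carry 1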
We can extend this rule to adding bytes:
00000110 = 6
+ 00000111 = + 7
00001101 = 13
Notice that in the second column from the right we computed:
1 + 1 = 0 carry 1
That made the third column from the right:
1 + 1 + 1 = 1 + 10 = 11 = 1 carry 1
In the fourth column from the right we had:
1 + 0 + 0 = 1
Verify that 00001101 is the byte representing 13.
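Applying the single-column rule to whole bytes gives something like the following sketch (the name add_bytes is made up for the example); it returns the 8-bit sum together with whatever carries out of the leftmost column:

def add_bytes(a, b):
    # Add two 8-bit strings column by column, right to left.
    result = []
    carry = 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))   # sum bit for this column
        carry = total // 2              # carry into the next column
    return "".join(reversed(result)), carry

print(add_bytes("00000110", "00000111"))   # ('00001101', 0), i.e. 6 + 7 = 13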
What happens when we carry out of the last column? Well, it depends on whether we are interpreting the result as a sum of signed integers or of unsigned integers.
For example:
11111111 = 255 = -1
+ 00000011 = + 3 = + +3
00000010 = 2 = +2
Notice that, interpreted as a sum of unsigned integers, the answer is wrong: 255 + 3 is 258, which does not fit in a byte, not 2. This is called a carry error. However, interpreted as a sum of signed integers (-1 + 3 = 2), the answer is right!
On the other hand, consider this example:
01111111 = 127 = +127
+ 00000010 = + 2 = + +2
10000001 = 129 = -127
In this case the unsigned answer is correct (127 + 2 = 129), but the signed answer is incorrect (+127 + +2 is certainly not -127). This is called an overflow error.
The problem is that when one is limited to such a small range of integers, overflow and carry errors are inevitable. Of course the processor "flags" these errors so that programmers can work around them.
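Roughly speaking, the flags work like this sketch (reusing add_bytes from above; the overflow test, same sign bits going in but a different sign bit coming out, is the standard rule, though real processors compute it in hardware):

def flags_after_add(a, b):
    # Add two bytes and report the carry and overflow flags.
    total, carry_out = add_bytes(a, b)
    # Overflow: both addends have the same sign bit, but the sum does not.
    overflow = a[0] == b[0] and total[0] != a[0]
    return total, {"carry": bool(carry_out), "overflow": overflow}

print(flags_after_add("11111111", "00000011"))
# ('00000010', {'carry': True, 'overflow': False})   carry error for 255 + 3; -1 + 3 = 2 is fine
print(flags_after_add("01111111", "00000010"))
# ('10000001', {'carry': False, 'overflow': True})   127 + 2 = 129 is fine; +127 + +2 is not -127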
We can represent big numbers as sequences of consecutive bytes. For example, we can interpret a sequence of 4 consecutive bytes (called a 32-bit word) as an unsigned integer between 0 and 2^32 - 1 = 4,294,967,295, or as a signed integer between -2^31 = -2,147,483,648 and 2^31 - 1 = 2,147,483,647.
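You can check these limits directly in Python, which handles big integers without any extra work:

print(2**32 - 1)     # 4294967295    largest unsigned 32-bit integer
print(-(2**31))      # -2147483648   smallest signed 32-bit integer
print(2**31 - 1)     # 2147483647    largest signed 32-bit integer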
To see how these limits are computed, note that a sequence of 4 bytes is a sequence of 4 * 8 = 32 bits. Now, how many 3-bit strings are there? Well, there are 2 possibilities for the first bit, and 2 for the second bit, thus 2 * 2 = 4 possibilities for the first two bits:
00
01
10
11
There are 2 possibilities for the third bit, therefore 2 * 2 * 2 = 2^3 = 8 possibilities for all 3 bits:
000
001
010
011
100
101
110
111
Generalizing this reasoning, we conclude that there are 2^32 possibilities for a 32-bit string. This is a very big number, over 4 billion!
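Python's itertools.product can generate these listings, which makes the counting argument concrete:

from itertools import product

three_bit_strings = ["".join(bits) for bits in product("01", repeat=3)]
print(three_bit_strings)        # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(three_bit_strings))   # 8, which is 2**3
print(2**32)                    # 4294967296 possibilities for a 32-bit string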
Of course we can easily extend our arithmetic algorithms to multi-byte integers.
Here's a little calculator that helps you convert between binary numbers (bit strings) and decimal numbers (digit strings):
http://www.mathsisfun.com/binary-decimal-hexadecimal-converter.html
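Python's built-in bin(), int(), and format() do the same conversions:

print(bin(13))            # 0b1101     decimal -> binary
print(int("1101", 2))     # 13         binary  -> decimal
print(format(13, "08b"))  # 00001101   decimal -> an 8-bit byte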
What about decimal fractions like 3.14? Keep in mind that this means:
3 + 1/10 + 4/100
We can interpret a pair of bytes like 00000010.10100000 to mean:
2 + 1/2 + 0/4 + 1/8 = 2.625
Using this idea we can represent many (but obviously not all) fractions. Letting the position of the binary point vary from number to number gives the floating point numbers that computers actually use.
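Here is a small sketch of how such a byte pair can be evaluated (the function name binary_point_value is made up for the example):

def binary_point_value(bits):
    # Evaluate a bit string with a binary point, e.g. '00000010.10100000'.
    # Bits left of the point are worth 1, 2, 4, ...;
    # bits right of the point are worth 1/2, 1/4, 1/8, ...
    whole, frac = bits.split(".")
    value = int(whole, 2)
    for i, bit in enumerate(frac, start=1):
        value += int(bit) / 2**i
    return value

print(binary_point_value("00000010.10100000"))   # 2.625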
So what's left? Photos, videos, music, programs: inside a computer, all of these, too, are represented as long strings of bits.