1   

Expressions and Values

'Alright,' said Deep Thought, 'The Answer to the Great Question... Of Life, the Universe, and Everything... Is... Is... Forty-two.'

The Hitch Hiker's Guide to the Galaxy
Douglas Adams

A Scheme interpreter is a primitive version of Deep Thought, the enigmatic super computer in Douglas Adams' book. We type a question on a keyboard, the Scheme interpreter displays the answer on a screen, then asks for another question.

Unlike Deep Thought, a Scheme interpreter won't answer a vague question. Instead, all Scheme questions have the form: "What answer does the following algorithm produce ... ?" where algorithms must be stated in the precise language of Scheme expressions. We define this language later in the chapter.

Like Deep Thought, the answers a Scheme interpreter provides are from a carefully defined domain of all possible answers. We call Scheme answers "values." Since it's always better to know the answers before the questions, we begin with a description of the domain of Scheme values.

1.1.   Values

Scheme values can be divided into simple and composite values:

VALUE ::= SIMPLE | COMPOSITE

Simple values- numbers, characters, Booleans, etc.- can't be decomposed into component values. A composite value is several values grouped together. Lists, vectors, pairs, and strings are examples of such groupings:

SIMPLE ::= NUMBER | CHAR | BOOLE | SYMBOL | etc.
COMPOSITE ::= STRING | VECTOR | PAIR | LIST

1.1.1.   Numbers

All implementations of Scheme provide binary, octal, decimal, and hexadecimal representations of integers. Here are five ways to represent 42 in Scheme:

42 = #b101010 = #o52 = #d42 = #x2A

Most implementations of Scheme provide truncations using decimal and scientific notation. For example:

.0333 = 3.33e-2

are two Scheme representations of 3.33 x 10-2. Notice that Scheme's version of scientific notation uses e to represent 10. Don't confuse this with the natural exponent e.

Some implementations of Scheme also provide rationals and complex numbers with real and imaginary parts that can be integers, truncations, or rationals. Here are some samples of legal Scheme representations:

2/3   4+6i   3e-2+1.5i   1/2+2/3i   etc.

All implementations of Scheme identify the domain of numbers with the domain of complex numbers:

NUMBER ::= COMPLEX

This makes sense mathematically since all other domains of numbers are subsets of the complex numbers. For example, if we ask the Scheme interpreter if 3.0 is a complex number, it will answer "true."

Since 3.0 is equivalent to 3, the Scheme interpreter will answer "true" if we ask if 3.0 is an integer. For the same reason, if we ask if 3 is real, the Scheme interpreter will again answer "true." This is different from languages like C and Pascal, which determine the type of a number by its representation rather than its interpretation.

A Scheme interpreter will also answer that 3.0 is a rational number and an inexact number. It will answer that 3 is a rational number, but when asked if 3 is inexact, it will answer "false." This happens because 3 only represents itself, while 3.0 is used to approximate any real number x such that, 2.95 ≤ x < 3.05.

1.1.2.   Characters

The CHAR domain consists of all keyboard characters: upper and lower case letters, digits, punctuation marks, symbols, and control characters. Scheme uses the #\ prefix to distinguish characters appearing as themselves and characters appearing in strings, numbers, names, and other contexts. For example, 5 is the number 5, but #\5 is the character 5. Scheme uses special names for non-printing control characters:

#\backspace, #\escape, #\newline, #\page,
#\return, #\rubout, #\space, #\tab

Inside the computer each character is represented by an integer between 0 and 127. This number is called the ASCII code (American Standard Code for Information Interchange) for the character. For example:

ASCII #\A = 65
ASCII #\a = 97
ASCII #\0 = 48
ASCII #\tab = 9

1.1.3.   Booleans

Truth is the kind of error without which a certain species of life could not live.

The Will to Power
Friedrich Nietzsche

In his influential book, Laws of Thought, the British Mathematician George Boole (1815 - 1864) used Algebra to model logical reasoning. He viewed propositions as expressions denoting one of two possible values: true or false. But instead of thinking of true and false in philosophical terms, he conceived of them as arbitrary but distinct members of a truth-value domain, and interpreted the connectives used to combine simple propositions into compound propositions- "and", "or", and "not"- as primitive algebraic operations on this domain.

Today any domain containing distinct members representing true and false, combined with primitive operations corresponding to the connectives, is called a Boolean Algebra. The circuitry used to build digital computers is based on Boolean Algebra. In this context true and false are identified with high and low voltages, and the connectives are implemented as solid-state switches called logic gates. Boolean Algebra is also incorporated into every programming language (high-level and machine languages) as a foundation for algorithmic testing and decision making.

In Scheme the BOOLE domain consists of the two truth values: #t for true and #f for false:

BOOLE ::= #t | #f

Note the difference between the characters #\t and #\f, and the Booles #t and #f. In both cases Scheme uses a special prefix to indicate the domain.

1.1.4.   Symbols

The SYMBOL domain seems out of place. In the next section we will learn that symbols are names used in programs to denote values. For example, pi is a name denoting the number 3.1416 and true is a name denoting the Boole #t. It's clear that symbols are an essential building block of Scheme programs, but why should symbols be included among the value domains?

Recall Newell and Simon's Physical Symbol System Hypothesis:

A physical symbol system- i.e. any device or agent that produces an evolving collection of symbols and expressions- has the necessary and sufficient means for general intelligent action.

If we want to model human problem solving, then it appears our programs will be based on symbol and expression manipulation, and this means symbols have to be treated like data.

More so than other languages, Scheme liberally allows the use of punctuation marks and operator symbols in names:

SYMBOL ::= PECULIAR | NORMAL

Scheme recognizes three "peculiar" symbols:

PECULIAR ::= + | - | ...

The initial character of a normal symbol can be any letter or special initial character:

SPECIAL-INIT ::=
! | $ | % | & | * | / | : | < | = | > | ? | ~ | _ | ^

The subsequent characters of a normal symbol include the initial characters, digits, and special subsequent characters:

SPECIAL-SUBSEQUENT ::= . | + | -

We can (and should) use special symbols to create readable and suggestive names:

int->real, cube-root, close?, halt!, a+bi, a/b, $profit, %loss

Symbols can be any length and are case insensitive. The following symbols are equivalent:

cat CAT cAt CaT

1.1.5.   Procedures

Sometimes an algorithm becomes so practiced we come to view it as a single operation. For example, we seldom think of starting a car as an algorithm:

1. shift to neutral
2. adjust the choke
3. pump the accelerator
4. hold the clutch down
5. turn the ignition key
6. repeat if necessary

Instead it becomes so automatic we think of it as a single operation:

1. start the car

A procedure is an algorithm encapsulated as a single operation, an algorithm-in-a-box. Procedures provided by Scheme are called primitive procedures. It is also possible for programmers to box their own algorithms.

Unfortunately, procedures don't have standard representations. For now, we denote the procedure named proc by [proc]. For example, [sin], [+], [*], [<], [=] denote the sine, addition, multiplication, less-than, and equality procedures respectively.

Like symbols, procedures also seem out of place among the Scheme value domains. It seems clear that procedures, like symbols, are important building blocks of Scheme programs, but why do they need to be treated as data too? But remember, Scheme is a meta programming language. This means we will be interested in writing procedures that manipulate other procedures as ordinary data.

1.1.6.   Strings

A string is any sequence of characters (including blanks) bracketed by double quotes. Here are some examples:

"Hello World"
"A man, a plan, a canal, Panama!"
"42"

If a double quote appears in a string as a literal character rather than signaling the end of the string, it must be preceded by a special escape character. The Scheme escape character is the backslash: \. For example, inside the computer the string:

"The phrase \"meta programming\" has many meanings."

represents the string:

The phrase "meta programming" has many meanings.

If a backslash appears in a string as a literal character rather than an escape character, it too must be preceded by a backslash escape. For example, the DOS path name:

c:\scheme\libs\string.scm

must be written in Scheme programs as the string:

"c:\\scheme\\libs\\string.scm"

The distinction between strings, characters, Booles, and symbols can get confusing. For example, the following four values belong to different domains:

"t" = the string consisting of the single character t
#\t = the character t
t = the symbol t
#t = the Boole true

1.1.7.   Lists

Any sequence of values can be grouped together into a list:

LIST ::= (VALUE ... )

Here are some examples of lists:

(a e i o u)
(#\a #\e #\i #\o #\u)
("a" "e" "i" "o" "u")
("(1 2 3)")
(3 "3" #\3)
()
(1 (1 2) (1 (1 2)))

Notice that a list is bracketed by parenthesis, and blanks, not commas, are used to separate the members of the list. More importantly, the members of a list can belong to different domains. A list can also be empty. Finally, lists can be nested inside lists. The last example above is a list of three members: the number 1, the two member list: (1 2), and the two member list: (1 (1 2)). How many members does the fourth list: ("(1 2 3)") have?

1.1.8.   Vectors

A vector is just like a list except it is prefixed by the # symbol:

VECTOR ::= #(VALUE ... )

For example:

#(a e i o u), #(3 "3" #\3), #(), #(1 #(1 2) (1 (1 2)))

are all vectors. In the last example the vector consists of three elements, the number 1, the vector #(1 2), and the list (1 (1 2)).

Logically, there is no difference between lists and vectors, but the vector #(1 2 3) and the list (1 2 3) may have different representations in the computer's memory. These differences allow certain operations to be performed more efficiently on one but not the other.

1.1.9.   Pairs

Any two Scheme values can be grouped together to form a pair:

PAIR ::= (VALUE . VALUE)

Here are three examples of pairs:

(.1 . .1)
("hello world" . (() . -2-i))
(#\f . #f)

There doesn't seem to be much difference between a pair and a two element list. Notationally, the only difference is the dot used to separate the members of the pair. Thus, (1 .2) is the list containing 1 and .2, while (1 . .2) is the pair containing 1 and .2. As with vectors, the difference is in the internal representation.

Note that the following values belong to different domains:

(1 . .2)   = the pair consisting of 1 and .2
(1 .2)   = the list consisting of 1 and .2
#(1 .2)   = the vector consisting of 1 and .2

Sequences versus Sets

How is the list (1 2 3) different from the set {1 2 3}? The main difference is that the members of a sequence (i.e. list, vector, string, or pair) are ordered by their position in the sequence. Therefore the list (3 2 1) is different from the list (1 2 3) while the set {3 2 1} is the same as the set {1 2 3}. This also implies that sequences can have multiple occurrences of the same item, while sets cannot. Thus, the list (1 2 3 1 2 3) is different from the list (1 2 3), while the set {1 2 3 1 2 3} is the same as the set {1 2 3}.

1.1.10.   Other Value Domains

Although all implementations of Scheme must provide the domains described above, some implementations provide additional value domains. Later we will encounter two variations of procedures: continuations and promises, as well as two variations of lists: ports and streams.

1.2.   Expressions

Algorithms are represented in Scheme by expressions. The algorithm described by a Scheme expression normally produces a value. For example, the arithmetic expression 5 * 8 + 2 represents the algorithm, "add 2 to the result of multiplying 5 and 8." The result of performing these operations is 42. The expression 5 * 8 + 2 produces the value 42.

Like values, Scheme expressions can be divided into subdomains:

EXPRESSION
   ::= LITERAL | SYMBOL | APPLICATION | STRUCTURE

1.2.1.   Literals

Almost any Scheme value can be turned into a Scheme expression by placing a single quote in front of it. Such an expression is called a literal because the value it produces is got by simply removing the quote. For example, '42 is the literal expression that produces the value 42.

LITERAL ::= 'VALUE

In most cases programmers can safely leave off the quote without confusing the interpreter. This is because programmers only produce expressions, while interpreters only produce values. Thus, if 42 is typed at the interpreter's prompt, the interpreter understands that the programmer is really asking for the value of '42.

There are a few types of values that require the single quote to avoid ambiguity. These will be discussed below.

1.2.2.   Symbols and the Global Environment

We have already encountered the SYMBOL domain. These are just the names used to denote procedures, constants, and other values. Scheme provides some pre-defined names, for example, +, *, sin, and < are pre-defined names for [+], [*], [sin], and [<]. Some implementations of Scheme provide nil, pi, true, and false as predefined names for (), 3.1416, #t, and #f respectively.

An association between a name and a value is called a binding. The Scheme interpreter stores predefined bindings in a symbol table called the Global Environment:

Figure 1.1

1.2.3.   Applications

An application or procedure call is simply a list of one or more expressions:

APPLICATION ::= (EXPRESSION EXPRESSION ... )

The first expression in an application, called the operator, always denotes a procedure, while the remaining expressions, called the operands, denote the procedure's inputs.

A procedure computes a mathematical function, which we can visualize as an abstract input-output device:

Figure 1.2

Data enters the "device" through input "wires," and a "circuit" (i.e. the algorithm) inside the device computes an output, which eventually emerges through the output "wire."

For example, the value denoted by the application:

(max (+ 2 3) (abs -4) (remainder 12 5))

is 5, the output produced by the max procedure given inputs 5, 4, and 2:

Figure 1.3

The value denoted by the application:

(<= (- 5 3) (+ 2 (* 3 3)) 14)

is #t, the output produced by the <= procedure given inputs 2, 11, and 14:

Figure 1.4

Notice that Scheme uses prefix notation instead on infix notation. In other words, Scheme will not understand the infix expression 2 + 3. Instead, programmers must write the + symbol in front of the operands: (+ 2 3). Although prefix notation normally doesn't require parenthesis, prefix notation in Scheme does. Parenthesis have special meaning in Scheme, namely, they indicate that a procedure is being called. This means Scheme programmers are not free to use parenthesis to make their programs more readable. For example, the expression (+ 2 (3.1e-5)) will cause an error since Scheme will assume the subexpression (3.1e-5) is an attempt to call the non-procedural value 3.1e-5.

Finally, notice that expressions can be nested. In other words, operands and operators may also be applications. In both sample applications shown above:

(max (+ 2 3) (abs -4) (remainder 12 5))

(<= (- 5 3) (+ 2 (* 3 3)))

the operands (boldface) are themselves applications.

Translating Algebraic Expressions into Scheme

Translating algebraic expressions into equivalent Scheme expressions requires working backwards. For example, in the expression:

division is performed last, so this will be the first operation to appear in the corresponding Scheme expression:

(/ NUMERATOR DENOMINATOR)

The sin in the numerator is performed after 1 is added to x, so sin appears first in NUMERATOR. Remember, the name sin goes inside the parenthesis with its operand:

(/ (sin SUM) DENOMINATOR)

SUM is simple, just remember, + comes first:

(+ x 1)

Following the same procedure for DENOMINATOR and substituting into the original quotient yields:

(/ (sin (+ x 1)) (cos (- x 1)))

Make sure all the parenthesis are balanced.

Let's look at one more example:

Unfortunately, ANSI/IEEE Scheme doesn't provide an inequality operator. We'll have to combine Scheme's not procedure with Scheme's = procedure:

(not (= ROOT y))

We consult the list of primitive number procedures in the Revised4 Report (or Chapter Two) and notice that sqrt is supplied by all implementations of Scheme that supply real numbers. We can replace ROOT with a call to sqrt:

(not (= (sqrt SUM) y))

Avoid inserting unnecessary parenthesis.

SUM is simply:

(+ EXPONENT 1)

Another quick scan of the Revised4 Report reveals Scheme supplies two possible candidates for computing exponentials: exp and expt. Checking the Revised4 Report, we discover that (exp x) computes ex, but (expt x y) computes xy. Thus, we can formalize EXPONENT as:

(expt 3 x)

Putting these pieces back into our original expression gives:

(not (= (sqrt (+ (expt 3 x) 1) y))

Data Flow Structures

Sometimes it is helpful to think of a complicated application as a Scheme representation of an abstract "circuit" called a data flow structure or a data flow diagram. A data flow structure is built by connecting the input and output "wires" of function "devices." For example, the Scheme expression:

(/ (sqrt x) (- (cos x) 1))

represents a data flow structure built from four devices: sqrt, cos, -, and /. The input to both sqrt and cos is x. The output of cos, together with 1, are the inputs to -. The output of sqrt and - are the inputs to /. The final diagram is shown in Figure 1.5.

Figure 1.5

No device in a data flow structure produces an output until all its inputs have arrived. Thus, data flows through a data flow structure from left to right. The subtraction procedure (-) can't produce an output until the cos procedure produces its output. Similarly, the division procedure (/) must wait for the outputs of the - subtraction procedure and the sqrt procedure before producing its output.

Let's consider another example:

(>= (length (cons (car x) (cdr y)) 42)

This expression compares 42 to the length of the list obtained from the expression (cons (car x) (cdr y)) (never mind what this means for now). The corresponding data flow structure is built from five components: >=, length, cons, car, and cdr. The inputs to cons are the outputs of car and cdr. The output of cons is the input to length. The output of length, together with 42, is the input to >=:

Figure 1.6

1.2.4.   Structures

Structures, also called special forms, allow programmers to control the flow of evaluation (control structures), the visibility of data (block structures), and the contents of memory (assignment structures):

STRUCTURE ::=
   CONTROL | BLOCK | QUASIQUOTE | ASSIGNMENT

Structures look like specially formatted applications of the following seventeen special form constructors:

if cond case and or do let let* letrec lambda set! begin begin0 delay quote quasiquote unquote

These will be explained in Chapters Three through Five.

1.2.5.   Literals Revisited

There's a problem with our practice of not quoting literals. Notice the expression pi could be interpreted as a literal denoting itself- the symbol pi- or as a symbol denoting the number 3.1416 Similarly, the expression (+ 2 3) could be interpreted as a literal denoting a list containing a symbol and two numbers, or as an application denoting the number 5. How do we determine the correct interpretation of these expressions?

To resolve this ambiguity Scheme requires programmers to put a single quote in front of symbols, pairs, vectors, and lists when they are intended as literals. Thus, the expression pi is always interpreted as a symbol denoting 3.1416 while the expression 'pi is the literal denoting itself, the symbol pi. Similarly, the expression (+ 2 3) is always interpreted as the application denoting the number 5 while the expression '(+ 2 3) is the literal denoting itself, a list containing a symbol and two numbers.

It is not necessary to put quotes in front of symbols, lists, and pairs if they occur inside a vector, list, or pair. For example:

'#(x (x . x) (x x))
'((a . #\a) (e . #\e) (i . #\i) (o . #\o) (u . #\u))
'((a e i o u) . (#\a #\e #\i #\o #\u))

are all acceptable literals despite the fact that the pairs, symbols, and lists appearing inside are not quoted.

1.3.   The Scheme Interpreter

Defining the EXPRESSION and VALUE domains is only half the job of specifying a programming language; the other half is describing a processor able to evaluate expressions to produce values. A processor can be a physical device such as the CPU of a computer, or it can be a virtual device such as an interpreter or compiler.

The Scheme interpreter consists of three components: the Global Environment, an expression evaluator, and a control loop. The evaluator actually does the work of interpreting expressions. We will describe the operation of these components in detail in subsequent chapters.

1.3.1.   The Expression Evaluator

The evaluator, called eval, is a procedure that accepts an expression and an environment- called the current environment, often this is just the Global Environment- as input, and outputs the value denoted by the expression. The arrow between eval and the current environment is shown going two ways in Figure 1.7 because sometimes evaluating an expression can produce changes in the current environment.

Figure 1.7

If the expression input is a literal, eval returns the expression itself as a value. If the expression input is a symbol, eval searches the current environment input for the corresponding value. If the expression input is a structure, eval invokes a special evaluation algorithm tailored for the particular type of structure. If the expression input is an application, eval employs an algorithm called eager evaluation, which described in Chapter Two.

1.3.2.   The Control Loop

If the evaluator is the engine of the Scheme interpreter, then the control loop is the driver. The control loop is a procedure that perpetually waits for a Scheme expression to be typed on the computer's keyboard. When an expression is typed, the control loop reads it, evaluates it using the eval procedure, displays the result, then waits for the next expression to be typed:

Figure 1.8

1.4.   Definitions

All values computed by the Scheme interpreter are volatile. They disappear from the computer's memory the instant they appear on the screen. To save the value denoted by an expression it must be named using a definition:

DEFINITION ::=
   (define SYMBOL EXPRESSION) | PROCEDURE-BLOCK

(Procedure blocks will be discussed in the next chapter.) A definition declares an association between the name SYMBOL and the value denoted by EXPRESSION. We call associations between names and values bindings:

For example, the definition:

(define num1 (* 7 6))

Declares the binding num1 = 42, between the symbol num1 and 42, the value produced by the expression (* 7 6), and the definition:

(define num2 (* 13 3))

declares a binding num2 = 39, between the symbol num2 and 39, the value produced by the expression (* 13 3).

Bindings declared by definitions are saved in the Global Environment for future reference. After the definitions above the Global Environment shown in Figure 1.1 can be pictured as:

Figure 1.9

The symbols num1 and num2 now denote 42 and 39 respectively. Naturally, we can incorporate num1 and num2 into new Scheme phrases. For example, the expression:

(+ num1 num2)

now produces 81, and the definition:

(define num3 (* num1 num2))

now declares a binding between num3 and 1638.

Num1 and num2 will continue to denote 42 and 39 until the session ends or until they are redefined. For example, the definition:

(define num2 (+ 9 num2))

creates a new binding of num2 to 48, which replaces the old binding in the Global Environment. (But not before 9 is added to 39, the old value of num2, to create the new value.)

Appendices

Appendix 1.1.   Defining Domains

A domain is a set of objects that have similar representations and interpretations. If a is an object and A is a domain, then "a Î A" means "a is a member of A," and "a Ï A" means "a is not a member of A." For example, if EVEN is the domain of all even, non-negative integers- 0, 2, 4, 6, etc.- then 42 Î EVEN, but 43 Ï EVEN.

We can succinctly describe domains using domain equations. A domain equation has the form:

DOMAIN ::= PATTERN | PATTERN | etc.

where DOMAIN is the name of the domain being defined (domain names will always be in upper case), "::=" means "consists of", "|" means "or" or "union", and PATTERN is a string describing the format of some members of DOMAIN. These members are called instances of PATTERN.

There are several types of patterns. A pattern may be an actual member of the domain being defined. The only instance of a pattern like this is itself. For example, the equations:

CREW ::= Picard | Whorf | Spock
STOOGE ::= Larry | Curly | Moe

mean:

The domain CREW consists of Picard or Whorf or Spock.
The domain STOOGE consists of Larry or Curly or Moe.

In other words, the domain CREW contains three members: Picard, Whorf, and Spock; and the domain STOOGE contains three members: Larry, Curly, and Moe. A Mathematician would define these domains using set enumeration notation:

CREW = {Picard, Whorf, Spock}
STOOGE = {Larry, Curly, Moe}

A pattern may also be the name of another domain. In this case the instances are members of the domain. For example, the equation:

HERO ::= CREW | STOOGE

means:

The domain HERO consists of any member of the domain CREW or any member of the domain STOOGE.

A Mathematician would simply express this as a union:

HERO = CREW U STOOGE

Patterns can also be formed by concatenating (i.e. gluing together) patterns. For example, the first pattern on the right side of the equation:

INTRODUCTION ::= CREW meet STOOGE | STOOGE meet CREW

is built by concatenating three patterns: the domain CREW, the word " meet ", and the domain STOOGE. The meaning of the equation is:

The domain INTRODUCTION consists of any member of CREW, followed by the word " meet ", followed by any member of STOOGE; or any member of STOOGE, followed by the word " meet ", followed by any member of CREW.

Here are some sample members of the INTRODUCTION domain:

Moe meet Picard
Whorf meet Moe
Curly meet Picard

However,

Spock meet Whorf

is not a member of the INTRODUCTION domain. (Why?)

A Mathematician would probably define the INTRODUCTION domain using unions and set builder notation:

INTRODUCTION = INTRODUCTION1 U INTRODUCTION2

where

INTRODUCTION1 = {c meet s: c Î CREW & s Î STOOGE}
INTRODUCTION2 = {s meet c: s
Î STOOGE & c Î CREW}

Surrounding a pattern with square brackets indicates instances of the pattern are optional. For example, we can extend our definitions of CREW and STOOGE to allow more formality:

CREW2 ::= [Mr.] CREW
STOOGE2 ::= [Mr.] STOOGE

The members of CREW2 include the members of CREW:

Spock, Picard, Whorf

as well as:

Mr. Spock, Mr. Picard, Mr. Whorf

Alternatively, we could have defined CREW2 as a union of two domains using either a domain equation:

CREW2 ::= CREW | Mr. CREW

Finally, we can place an ellipsis (i.e. "...") behind a pattern. This means the pattern can be repeated zero or more times. The ellipsis is useful for defining domains of arbitrarily long sequences. For example:

HEROS ::= (HERO ... )

means:

The domain HEROS consists of zero members of the domain HERO surrounded by parenthesis, or one member of the domain HERO surrounded by parenthesis, or two members of the domain HERO surrounded by parenthesis, or three members of the domain HERO surrounded by parenthesis, etc.

Without the ellipsis the domain equation for HEROS would be infinitely long:

HEROS ::=
() | (HERO) | (HERO HERO) | (HERO HERO HERO) | etc.

Here are some sample instances of the pattern (HERO ...) (i.e. members of the domain HEROS):

()
(Picard Curly Spock)
(Larry)
(Moe Moe Moe Moe Moe Moe Moe Moe Moe Moe Moe Moe Moe)

The first example shows an instance of zero repetitions of HERO. The last example shows members of the HEROS domain can contain repeated instances of the HERO domain, and therefore the HEROS domain has an infinite number of members.

Warning: domain equations are not part of the Scheme language. Computer Scientists use domain equations to define the program domains and- less frequently- the data domains of all programming languages:

DATA ::= NUMBER | STRING | etc.
PROGRAM ::= INSTRUCTION ...

Computer Scientists call the format of domain equations Extended Backus-Naur Form, or EBNF for short. Domain Theory and Formal Language Theory are two areas of Computer Science that study domains.

Appendix 1.2.   Sessions

A Scheme session begins when the control loop is started from the operating system's prompt. When the Scheme application:

(exit)

is typed, the session ends, and control is returned to the operating system.

Saving Transcripts

The written dialogue between programmer and interpreter produced by a session is called a transcript. Unless the transcript is saved to a file, it scrolls out of view and into oblivion. Scheme provides a procedure for saving transcripts to files. Assume file is the name of a file:

(transcript-on "file")   =
an unspecified value. As a side effect, the standard output port is connected to both the monitor and file.

(transcript-off)   =
an unspecified value. As a side effect the standard output port is disconnected from file.

A file containing a Scheme session is called a transcript file.

An Example

Let's study a fragment of a Scheme session. Expressions entered by the user appear next to the interpreter's prompt: >, followed immediately by their values. All computer generated text is shown in boldface:

> 100
100
>
"Hello world"
"Hello world"
>
pi
3.14159265358979
>
(cos pi)
-1.0
>
(>= 3 5)
#f
>
'(+ 2 3 4)
(+ 2 3 4)
>
(+ 2 3 4)
9
>
+
[+]
>

This session shows literals, symbols, and applications being evaluated. Note that placing a quote in front of the list (+ 2 3 4) turns it into a literal that simply denotes itself, a list consisting of a symbol followed by three numbers. The control loop displays the list without the single quote because values don't need quotes, only literal expressions. But when the user types the list (+ 2 3 4) again without the quote, then eval interprets it as a procedure application, adds the operand values, and prints the resulting value, 9. Finally, when the user types + without the surrounding parenthesis, eval interprets it as a symbol, searches the Global Environment, and displays the corresponding value, the procedure [+]. (Different implementations of Scheme will use different notations for [+].)

Warning: PC-Scheme identifies the Boole #f and the empty list, (). This is inconsistent with ANSI Scheme, which, in most testing contexts, identifies all values with #t except #f.

Appendix 1.3.   Numbers

We will encounter the function-structure duality in many forms throughout this text. In the context of numbers, function refers to the interpretation of a number, while structure refers to the way the number is represented.

Mathematicians interpret real numbers as points on a number line, or more precisely, as distances from points on a number line to a fixed point called the origin. The unit of measurement can be miles, inches, meters, light-years, Angstroms, anything. Positive reals are the points to the right of the origin, negative reals are to the left. While there is only one interpretation of a real number, there are many representations.

Representing Real Numbers

The decimal representation of a positive real number is an infinite sequence of digits. For example:

x = 42.142857142857142857...

represents the infinite sum of distances:

4 * 101 + 2 * 100 + 1 * 10-1 + 4 * 10-2 + 2 * 10-3 + 8 * 10-4 + ...

We call 42 the integer part of x. The infinite sequence of digits to the right of the decimal point constitutes the fraction part of x.

This representation scheme is called decimal because there are ten possible digits that can occur in a sequence, 0 through 9. Why is ten special? Surely this is only an accident of biology or culture. In some places people count between instead of on the tips of their fingers. These people favor octal representations based on eight "digits:" 0 through 7, and interpret them as sums of powers of eight:

x = 52.1111... = 5 * 81 + 2 * 80 + 1 * 8-1 + 1 * 8-2 + 1 * 8-3 + ...

Programmers often prefer hexadecimal representations based on sixteen "digits:" 0 through 9, A (10), B (11), C (12), D (13), E (14), F (15), and interpret them as sums of powers of sixteen:

x = 2A.249249... = 2 * 161 + 10 * 160 + 2 * 16-1 + 4 * 16-2 + ...

Computers store, process, and communicate data as voltage levels. To avoid ambiguity only two levels are distinguished: high and low. For this reason computers favor binary representations using only two "digits," 0 and 1 (called bits), and interpret them as sums of powers of two:

x = 101010.001001001... = 1 * 25 + 0 * 24 + 1 * 23 + 0 * 22 + 1 * 21 + 0 * 20 + 0 * 2-1 + 0 * 2-2 + 1 * 2-3 + ...

The radix of a representation scheme is the number of allowable digits. The hexadecimal radix is sixteen, the octal radix is eight, the decimal radix is ten, and the binary radix is two.

Representing Integers and Rationals

All of the above representations of x are infinitely long. We are forced to use an ellipses (...) to indicate the fraction part continues forever. Computers too have difficulty dealing with infinitely long representations. For this reason three subsets of the reals are of particular interest.

In case all of the digits in the fraction part are zeros, we can safely ignore them and just represent the real by its integer part. Such reals are called integers.

Another special case occurs when the decimal part is periodic. This means the decimal part consists of a finite sequence of non-repeating digits followed by an infinitely repeating pattern of digits. It turns out such reals result from integer divisions, and therefore can be finitely represented as a ratio of two integers. We call these numbers rationals. Since the fraction part of x in the example above is periodic (the repeating pattern appears to be 142857, but who knows what "..." really means), x is rational. It results from dividing 295 by 7, hence can be represented as the ratio:

x = 295/7

Clearly integers are rationals. For example 42 = 42/1.

A truncation of a decimal representation, i.e. the result of chopping off all digits in the fraction part beyond some arbitrary point, is also a type of rational number since it is equivalent to the nontruncated number got by appending an infinite repeating pattern of zeros. For example:

y = 42.142857 = 42.142857000... = 42142857/1000000

Approximating Irrational Numbers

It seems rational to call reals that aren't rational irrational. Pi, the natural exponent e, and square roots of prime numbers are examples of irrationals. Irrational numbers are so ubiquitous that if we removed all of them from the number line, the length of what remained would be zero!

How does a computer represent an irrational number? Sadly, it doesn't. Instead, irrational numbers must be approximated by truncations. For this reason truncations are sometimes called inexact numbers. For example, some computers approximate the irrational number pi by the truncation 3.141592654.

Scientific Notation

An alternative, more compact representation for a truncation is scientific notation. Scientific notation can be used to compress long strings of consecutive zeros into an integer exponent of 10. For example the following decimal representation:

z = 0.0000000000123

can be compressed into the scientific notation:

z = 1.23 * 10-10

Problems

Warning: many versions of Scheme- including the TI and UG versions of PC-Scheme- are not 100% compliant with the IEEE/ANSI specification. While it is acceptable to use these versions for testing and experimenting, your answers must be based on IEEE/ANSI Scheme.

Problem 1.1.

Assume the following definitions have been made:

(define x 10)
(define y 20)
(define z -5)
(define m *)

Compute the values denoted by the following Scheme expressions. Use a Scheme interpreter to check your answers. Some expressions contain errors. Explain the nature of these errors, and if possible, suggest corrections. If you are not sure about a procedure, look its definition up in the Revised4 Report.

(+ x y (m z z))   
'(+ x y (m z z))   
"(+ x y (m z z))"   
#(+ x y (m z z))
(x + y + z * z)
'#(x y z)
(x y z)

Problem 1.2.

The type of a number is the lowest level in the hierarchy of number domains to which it belongs. For example, 42 belongs to the INTEGER, RATIONAL, REAL, and COMPLEX domains, hence its type is INTEGER. Classify the types of the following numbers. Also classify the representations as exact or inexact, and convert them into an equivalent decimal representation:

#x2.8e3   #b111+#o32i    #e#d32.0   #b.001   #x#iFFFF

Problem 1.3.

Assume the following definitions have been made:

(define a 6)
(define b 10)
(define c 30)

Because of the leading quote, the literal expression '(+ a b c) denotes the list (+ a b c) instead of the number 46. The quote tells the evaluator to interpret everything that follows it literally.

Scheme provides a structure called a quasiquote, which is denoted by a back quote: `. Quasiquote is similar to quote. For example, the expression `(+ a b c) also denotes the list (+ a b c). The difference is that the evaluator takes everything following a quasiquote literally unless it is preceded by the unquote operator (which is denoted by a comma). The evaluator evaluates unquoted values appearing inside a quasiquote. Thus, the expression `(+ a ,b c) denotes the list (+ a 10 c).

Compute the values of the following Scheme expressions:

`(+ ,a ,b ,c)
`(,a a ,b b ,c c)
`(,"a" ",a" ,'a ',a)
`((a b c) ,(a b c) (,a ,b ,c))

Problem 1.4.

Indicate what subdomain of VALUE and EXPRESSION each of the elements listed below belongs to. The choices for EXPRESSION subdomains are LITERAL, SYMBOL, APPLICATION, or NONE. The choices for VALUE subdomains are BOOLE, CHAR, SYMBOL, PROCEDURE, LIST, VECTOR, PAIR, STRING, NONE, or in the case of numbers, give the domain lowest in the number hierarchy to which the element belongs.

"f"   (1 .2)
f   (+ 1 .2)
#\f   #(+ 1 .2)
'f   '(+ 1 .2)
#f   +
#()   '+
'()   +3.0
()   '3.0
(1 . .2)   3.111
"(1 . .2)"   .2+0i

Problem 1.5.

Write Scheme expressions equivalent to the following mathematical expressions. You may use any ANSI Scheme procedures or constants. You may also use the "and" and "or" structures. Hint: Use Scheme's define procedure to give arbitrary definitions to x, y, z, a, and b. Once these symbols have values, you should be able to check your Scheme translations of the expressions below using your Scheme interpreter.

(sin(x + y)/cos(x - y))2
ln(xy + 3)
5.3 x 10-119
pi
e    (i.e. the natural log)
tan(x)/(log7(x)/y)
(x ≤ y) or (x2 ≠ 2)
((x/y)/(a/b))/(a/x)
-1 < ex < 1
arcsin(2x)

Problem 1.6.

A natural number is an unsigned integer. Write a domain equation for naturals:

NATURAL ::= ???

Problem 1.7.

Complete the following domain equations for number representation formats. NAT domain is the domain of natural number formats. (A natural number is just an unsigned integer.) UREAL is the domain of unsigned REAL formats. You may introduce supporting domains as you see fit.

FORMAT ::= REAL | COMPLEX
REAL ::= INT | RATIO | DECIMAL | SCIENTIFIC
INT ::= [-]NAT
NAT ::=
DIGIT ::=
RATIO ::=
DECIMAL ::=
SCIENTIFIC ::=
COMPLEX ::= REAL+UREALi | REAL-UREALi | REALi
UREAL ::=

Problem 1.8.

Write the following numbers in as many formats as possible. Classify each number as integer, rational, irrational, real, and/or complex. If a number belongs to several of these domains, list them all. Also, classify each format as exact or inexact:

-2.16, 1/20, 100, 1/7, 3.33, 2.5e-3, p, -2i, -2e10, 1+0i

Problem 1.9.

Compute the values of the following Scheme expressions assuming complex and rational numbers are fully implemented:

(+ 3+2i 4/3 .1)
(* 2/7 -i 3.1e2)
(expt -4 .5)
(exp 100)
(/ 2-i 2+i)
(* 3e42 .2e-16)

Problem 1.10.

Which of the following names are not members of Scheme's SYMBOL domain (as defined above). Explain why:

i +i ::: <.*.> /++} 3+ +3 + ++ C++ A<==>B&C=3 Hi-Ho! -Ho! x+y+z x+(y+z) <NUM>::=<INT>|<REAL> c^2 [x] f' .tax. a/b Scheme C++ Modula2.1 ___ x... ...x ... #x% =? ??? a+bi smith@sjsu.edu smith/project/foo.scm c:smith\project\foo.scm

Problem 1.11.

Why doesn't Scheme allow symbols to begin with ., -, or +?

Problem 1.12.

Investigate what happens when the following strings are typed into your Scheme interpreter:

"\cat"
"\\cat"
"\\\cat"

Problem 1.13.

Draw data flow diagrams for the following Scheme applications:

(+ (cos (* 3 x)) (sqrt (/ 1 x)))
(+ (+ (+ x y) (+ z z)) (+ x z))
(expt (* 3 z) (max x y z))

Problem 1.14.

Assume exp is a Scheme expression. Some versions of Scheme allow programmers to call the eval procedure directly:

(eval exp)   =
the value of exp in the Global Environment.

Assume the following definitions have been made:

(define val1 42)
(define val2 'val1)
(define val3 'val2)
(define val4 '(quotient val1 6))

Compute the values of the following Scheme expressions:

(eval val1)
(eval val2)
(eval val3)
(eval (eval val3))
(eval '(eval val3))
(eval val4)

Problem 1.15.

Find the hexadecimal (i.e. base 16) representations of the following numbers:

124   215   248.625   1/6