Bindings

Background, Motivation

Motivation for review is to explain why lex and grammars can't do it all.

[ draw phases of compilation: FE: lex->parse->semantics BE:various opt, code-gen ] be sure to emphasize parse tree, undecorated. semantic analysis will decorate tree. By the time reach BE, the parse tree is no longer recognizable, probable discarded monte of it.

Usually generate parse tree from parse; don't generate code directly from parse. (although can almost do this for stupid languages, e.g., assemblers) Cannot do certain things with grammars: e.g., DU and UD chaining. Type inference, coercion. Names become important: what does the variable refer to?

Introduces us to next topic...Bindings.

Binding

Consider the following (p. 158)

public class A {		// defines class (and therefore type) named A
  A A(A A) {			// function named A, returns A, formal arg
				// named A of type A
  A:				// label
    for (;;) {
      if (A.A(A) == A) {	// obj named A, invoke method named A with A
				// as actual arg, then test return is
				// equal to obj named A
	break;
      }
    }
    return A;			// returns the formal arg named A
  }
}

This is an example that shows the same name can be bound to many types of entities.

[Caution]Caution

The text material makes refers to many C, C++, and Ada example that may behave differently that Java.

The issue of scoping needs to be explained in more depth than presented in the text.


- definitions of dynamic and static scope on insufficient
- show scope + extent terminology

- motivate with examples, below...

Two example C++ functions that have the same return type (pointer to int),

int* makePtr() {
  int* p = new int;
  (*p) = 5;
  return p;
}

The above demonstrates that C++ heap-allocated (i.e., using new) entities live forever. This is indefinite extent. Notice also that there is no binding for the new int entity. The name p is bound to a different variable that contains a pointer.

int* makeDangler() {
  int i;
  i = 5;
  return &i;
}

The above demonstrates that C++ stack-allocated (i.e., locally declared) entities live from the time they are declared until the function returns. This is dynamic extent. By the way, you may notice that makeDangler() is a pointer nightmare.

Here is a Java example,


public class MyClass {
  private static counter = 100;
  public void bump() { counter ++; }
}

The above demonstrates lexical scope and indefinite extent. The binding of the static variable, counter, begins with class loading and lives as long as the class does. It's scope however is restricted to the couple of lines shown above, namely the lexical region defined by the class.

Binding, Scope, Extent

[Caution]Caution

The textbook chapter on semantics presents many similar sounding terms,

  • dynamic allocation and static allocation
  • dynamic binding time and static binding time
  • dynamic scope and static scope
  • dynamic constants and static constants - !?
  • dynamic environment and static environment

Unfortunately the definitions are imprecise with scant material comparing and contrasting these terms. This is especially problematic because the terms have been defined to mean different things in different settings as defined by various researchers.

For these reasons, the following notes provide more depth and cover material and ideas not presented in the text.

Scope, Bindings

The term “dynamic scope” is a misnomer. Nevertheless it is both traditional and useful.

--Common Lisp the Language, 2nd Edition

Scoping and binding semantics: a difficult topic. Trying to make sense of scoping semantics is hard because established authorities mixup terminology and use imprecise defnitions. Even seasoned programmers get frustrated and confused by binding semantics. Scoping (and binding) semantics deal with subtle interactions of a name's defininitons and later use. The following questions highlight some of these subtleties.

  • What if a subclass uses the same name to define something different than its superclass?
  • What if the same parameter gets bound to many values when a function is called repeatedly?
  • Which entity should be used, and when is a binding is no longer valid?

Scoping semantics answer these questions.

If you look at a program, you can see the structure of an if statement and the calculation performed by an expression, but you cannot always clearly see the binding employed by scoping semantics. Yet the scoping semantics dictate a program's behavior. For these reasons, Programmers struggle with scoping. So do the language designers. Language designers do not rework the syntax of a conditional, but many re-architect or redefine their programming language's scoping rules after FCS (first customer ship). And when language designers change scoping semantics, all those programs' meaning change, they run differently or not at all.

Lisp switched to lexical scope (now found in Common Lisp, Scheme). The Perl designers missed this lesson and with Perl5, they too switched to lexical scope.

C++ was initially designed without support for namespaces. Although not a programming language, XML missed the C++ lesson and kludged a workaround to add namespace support.

When the Perl designers switched to lexical scoping rules with Perl5, they broke Perl4 legacy source code. The legacy code passed Perl5 lexical and syntax checks, since the language syntax didn't change. The scoping rules of Perl5 couldn't be seen in developers' source code, but it was felt through semantic and run-time errors.

In spite of tremendous forethought in language design, even the Java designers returned to modify their scope rules. Where we once had Java's hidden references we now we have hidden, shadowed, and obsured references. It seems that not just programmers but language designers, too, have difficulties with scoping. Oh well.

Scope is not only hard to see, it is hard to describe, so to best understand language scoping rules, we look their implementations. Understanding a few of these implementations, should provide a mental-model of the way scoping works. The goal of reviewing implementations is to clear the confusion around overloaded terminology and imprecise definitions found in the litterature. Scoping semantics: hard to see, hard to describe, but you can get a feel if you learn how it is built.

Definitions

[Tip]Definition: Binding

A binding associates a name to an entity. Most scope and extent examples presenteted concern variable bindings. In general, a name can be bound to an entity such as a value or to a location containing a value. These concepts work for other bindings, too, such as, constants, symbols, cons cells, catchers, goto target labels, types, classes, interfaces and namespace entities.

A name can be bound, unbound and rebound to different entities. For example, an instance variable (non-static data member) in a Java class is has a name. That name is bound to a different entity each time a new object of that class is cerated. Many bindings for the same name can exist at the same time.

Many names can exist for the same entity at the same time (sometimes called aliasing). For example, this can happend when passing parameters: there's only one entity, but the caller and callee have different names for the same entity.

Examples:

class C {
  int foo;                              // a definition (name foo)
  ...
  int getValue() {
    return foo+1;                       // a use (name foo)
  }
}

c1 = new C();                           // binds foo to an int entity 
c2 = new C();                           // binds foo to another int entity 
[Tip]Definition: Extent

The time period of binding. Lifetime. The time between when a binding is established and when it is unestablished. There are two basic types: dynamic extent (stack-like behavior), and indefininte extent.

[Tip]Definition: Scope

Not to be confused with the informal terms "dynamic scope" or "static scope". Scope is the textual region where a name may be referenced. There are two basic types: lexical scope (confined to the textual region of code of a body), indefinite scope (references may occur anywhere).

[Caution]Caution

The informal terms can get confusing. Dynamic extent is what is used by languages such as C, which are informally termed static scoping languages.

How Bindings Work...

A few case studies of programming languages are considered. Each considers or hints at how the language implements its binding mechanism(s). Examples show some effects of each implementation. The scope and extent categorizations are highlighted for each.

BASH, the Unix Shell Language

SCOPE: INDEFININTE

EXTENT: INDEFININTE

IMPLEMENTATION: your basic hashtable

Example 1. Example 1. dynTst.bash

#!/bin/bash

foo () {
  X="foo is "$X
  echo inside foo : X=$X
}

X="a string"
echo dynTst.bash: X=$X
foo
echo dynTst.bash: X=$X
. addToX.bash
echo dynTst.bash: X=$X

When this Bash shell script is run, it calls the local function, foo, and sources another script, addToX.bash. echo statements report X's values at various points of execution.

Example 2. Example 2. addToX.bash

#!/bin/bash

X=$X" that is a little longer."
echo addToX.bash: X=$X

The output below demonstrates the scoping behavior

$ dynTst.bash
dynTst.bash: X=a string
inside foo : X=foo is a string
dynTst.bash: X=foo is a string
addToX.bash: X=foo is a string that is a little longer.
dynTst.bash: X=foo is a string that is a little longer.

Unix shells use a straightforward dictionary lookup mechanism to retrieve references bindings, such as $X in this example. In spite of the striking similarity of Bash script programming to C programming, neither function invocations nor sourcing external files have C's scoping behavior. Instead, Bash variable bindings have indefinite extent. Once created the binding continues on. An entity continues to exist even after the function completes. The last established binding prevails, destroying any previous values for that variable.

The implementation of this mechanism is nothing more than a key-value pair lookup such as a hash-table or red-black tree might provide. When a Bash procedure is called it's bindings take effect permanently.

[Note]Note

Most Unix shells employ a 2-level namespace that is implemented by two environments: a local enviroment implemented by the shell and a global environmont maintained in the Unix process. The shell provides an "extern" variable declaration to support this. Also, Unix has many alternative shells (ASH, CSH, SH, BASH, KSH TSH, SCSH, and others) each providing variations on the scoping semantics of Bash.

Here is a standard idom used in the Unix Shell, bash.

if [ -n "$DNS_IPS" ]; then
    for IP in $DNS_IPS; do
	acceptZoneXferInput $IP -l
    done; unset IP
fi

This idiom differs form block structured languages on two cumbersome points: unset statement removes the loop variable's indefinitie binding from the scoping environment, wropping the loop in an initial "is-empty" test prevents a run-time syntax error.

A-Lists Implement Scoping Mechanisms in Lisp

SCOPE: DYNAMIC, but could be made INDEFININTE, depends on language implmentation

EXTENT: INDEFININTE

IMPLEMENTATION: environment represented by association list (A-list)

Although the Unix shell scoping semantics is OK for simple shell scripts, most languages need support for procedure calls and parameter passing with more intuitive semantics. The Unix shell semantics gets confusing when writing larger programs, since procedures bindings do not get "unset" after invocation. The remaining scoping implementations we study use some strategy to restore the environment that was in effect prior to the procedure call.

Older Lisp systems use a very simple mechanism for bindings: the A-list, discussed here. Newer Lisp systems uses closures, discussed later. Sometimes, the A-list is called the environment. An a-list is a (possibly long) list containing cons cells. These cons cells defines and association: the CAR points to the variable name (symbol) and the CDR points to the entity to which it is bound. Each cons cell is a variable binding.

Scheme A-List Notes

A-list. DEFINIION. assoc/association list. A list of assoc's.

ASSOC. DEFINIION. An association is a cons cell where the CAR is the KEY and the CDR is the VALUE.

 . --- . --- . --- nil
 |     |     |
 .     .     .
 /\    /\    /\    
x 55  a  1  ff (lambda (a b c) (* a (+ b c)))

Note that the function ff, is just data...since the lambda expression is just a list with 3 elements. But then, some scheme interpreters would compile the function body for efficiency.

 . ----- . --------------- . --- ()
 |       |                 |
lambda   . - . - . - ()    . - . - . --- ()
         |   |   |         |   |   |
         a   b   c         *   a   . - . - . --- ()
                                   |   |   |
                                   +   b   c

(define x 55)
(define a 1)
(define (ff a b c)
  (* a (+ b c)) )

Now, redefine x to be 11. Replaces the definition (association) which has the effect of changing the binding of x to 11.

 . --- . --- . --- ()
 |     |     |
 .     .     .
 /\    /\    /\    
x 11  a  1  ff #procedure

What does Scheme do to a function definition when it is entered? What does it mean to compile for efficiency? So scheme does some things to speed things up. For example the reference to * could be replaced by a pointer the actual function, to avoid an a-list lookup. But then, things are no really that simple. Because scheme allows for very late binding of the symbol * to the function name.

(define x 11)
Now x has a new binding, so performing calculation involving x will use this latest value. But what abouth the value (i.e., binding of) ff. The compiled definition of ff needs to be intellegent enough to know to use the most latest binding. Now redefine * to add two numbers.
(define (* x y) (+ x y))
So the program ff changes, too.
(ff x 4 5) ==> (+ 11 4 5)
   ;;   NOT    (* 11 (+ 4 5))
   ;; so it prints 20, not 99.

We'll need to come back to talk about how (specifically) a-lists are used to implement scoping behavior in programming languages.

The technique for implementing scoping using a single A-list is informally called "dynamic scoping." Specifically, dynamic scoping means indefinite scope (references may occur anywhere) and dynamic extent.

When you enter,

> (define x 3)
> (define y (lambda (z) (* z z)))
> (define (y z) (* z z))
;; or the equivalent
> (define y (lambda (y) (* y y))

You have added two more associations to the beginning of the a-list. when you enter,

> y
#<procedure:y>

> x
3
> y
#<procedure:y>
> z
reference to undefined identifier: z

Note, z is not defined. But when you call procedure y with an argument, an new association is added to the front of the a-list for z.

More Examples Demonstrating Dynamic Scope Behavior with A-Lists

(define x 1)
(define (foo x) (bar (+ x 1)))
(define (bar x) (baz (+ x 1)))
(define (baz x) (* x 3))

(foo 2)

 ==> 12

Just before baz returns, the binding (x . 4) is at the head of the a-list,

((x . 2) (baz . #) (bar . #) (foo . #) (x . 1))                 ;; inside foo
((x . 3) (x . 2) (baz . #) (bar . #) (foo . #) (x . 1))         ;; inside bar
((x . 4) (x . 3) (x . 2) (baz . #) (bar . #) (foo . #) (x . 1)) ;; inside baz

Dynamic scoping works as expected (i.e., what you would think) until a procedure references something outside the lexical boundaries of its definition.

(define x 1)
(define (foo x) (bar (+ x 1)))
(define (bar x) (baz (+ x 1)))
(define (baz y) (* y x))         ;; here is a non-local reference to x

(foo 2)

 ==> 12

Just before baz returns, the binding (x . 4) is at the head of the a-list, even though the binding (x . 1) was in effect when baz was defined.

((x . 2) ...)                 ;; inside foo
((x . 3) (x . 2) ...)         ;; inside bar
((y . 4) (x . 3) (x . 2) ...) ;; inside baz

A-list scoping says pick up the latest bound definitions. A benefit is that it does not matter if a procedure references an undefined variable at define-time. A downside is trying to understand how a procedure behaves, and understanding dynamic scoping is especially tough with code modifications, since it could affect everything.

Imagine redefining bar with a differently named argument.

(define (bar arg) (baz (+ arg 1)))

The program's behavior changes when calling (foo 2).

((x . 2) ...)                   ;; inside foo
((arg . 3) (x . 2) ...)         ;; inside bar
((y . 4) (arg . 3) (x . 2) ...)   ;; inside baz

This time (foo 2) returns 8, since baz computes its return value by multiplying the current values for y and x: (* 4 2).

Efficiency Considerations

The A-list is a naive environment symbol lookup mechanism. A-list performance deteriorates as bindings are added and the list grows longer. Worst-case A-list lookup takes O(n) comparisons and since each comparison is a character by character string match, the running time may be worse than O(n).

Other implementations provide the same A-list behavior with better performance. Symbol interning and hash tables are two techniques for improving lookup performance. Symbol interning avoids the expensive string comparison by converting all symbols to a unique integer value. Thereafter symbols are compared by their integer interned values which avoids the expensive string comparison. Modified hash table lookup data structures can mimic A-list behavior while improving upon the O(n) performance problem.

Variations of Scoping Semantics using A-lists

Dynamic extent, if the time intervals of two entities overlap, then one interval will necessarily be nested within the other one. Many Lisp variants (e.g., Emacs, Jade, Xlisp) use indefinite scope and dynamic extent. Variables references may occur anywhere in a program (indefinite scope) and used any time between its binding being created and later unbound (dynamic extent). Consider a procedure with an argument. Each procedure call establishes a new binding. When many invocations are active at once, there are just as many bindings for the same parameter name. When a piece of code references a parameter, which binding to use depends upon the dynamic behavior of the A-list: the most recently added.

The combination of indefinite scope and dynamic extent is often termed "dynamic scope".

Another example of the use of dynamic scope. This one highlights accessing a non local variable from a procedure.

(defun foo (x)
  (let
      ((foo-var (* x 20)))
    (bar x)
    ...

(defun bar (y)
  ;; Since this function is called from
  ;; the function foo it can refer
  ;; to bindings which foo established.
  (setq y (+ y foo-var))

Summarizing

The A-list provides a (dynamic) scoping mechanism that is very easy to implement. The a-list is really how it's implemented in Lisp systems such as emacs-lisp, XLisp, other older lisps. Note Perl (version 4), uses a similar mechanism for bindings, but it's not called an A-list. Perl5 abandoned dynamic scope in favor of lexical scope, described next. Some Unix shells (not BASH) also use a similarly implemented dynamic scoping mechanism. A-lists are easy for the language designer to build, but makes it hard for the programmer to understand his programs.

(*Note might explain more re: passing procedures as arguments) [ Then there is the case when passing procedures as arguments. this leads into closure and stack based implementations ]

Refereneces

Source: Gosling, Steel, et. al. Common Lisp the Language. Jim Gosling ond Guy Steele popularized the ideas of describing binding semantics with the terms, scope and extent. Jim Gosling is also known as the inventor of OAK which became Java, and both Guy Steele and Jim Gosling are on the Java designers team.)