A language processor executes programs written in a specific programming language, L.
There are two types of processors: compilers and interpreters.
An interpreter steps through a program p, executing one instruction at a time. No translation into a lower-level language is needed: the interpreter simply figures out what each instruction wants it to do, and then does it. Lisp and Scala are examples of interpreted languages. Unix shells (Bourne, C, etc.) are examples of interpreters.
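To make the "figure out what each instruction wants, then do it" loop concrete, here is a toy interpreter in Python. The three-instruction language it executes is invented for this sketch; no real machine works exactly this way.

```python
# A toy interpreter: step through program p one instruction at a time,
# figure out what each instruction asks for, then do it.
# The three-instruction language here is invented for illustration.

def interpret(program):
    env = {}                          # variable -> value
    for instr in program:
        op = instr[0]
        if op == "set":               # ("set", x, n)  means  x = n
            _, var, n = instr
            env[var] = n
        elif op == "add":             # ("add", x, y)  means  x = x + y
            _, x, y = instr
            env[x] = env[x] + env[y]
        elif op == "print":           # ("print", x)   means  print x
            print(env[instr[1]])
        else:
            raise ValueError("unknown instruction: " + op)
    return env

p = [("set", "x", 3), ("set", "y", 4), ("add", "x", "y"), ("print", "x")]
interpret(p)                          # prints 7
```

Note that no translation happens: the program is executed directly, one instruction per loop iteration.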
A compiler translates a program p into an equivalent program p' written in the native binary machine language of the host computer. The computer's processor is essentially an interpreter for this language, and so it can directly execute p'. FORTRAN and C are examples of compiled languages.
1. Technically, a compiler will translate a program p into an equivalent program p'' written in the host computer's native assembly language. Assembly languages contain simple instructions for arithmetic, logic, reading and writing memory, and sequence control (goto, call, return, and that's about it!). The instructions are symbolic:
add x, y // x = x + y
There is a one-to-one correspondence between assembly language instructions and the binary machine language of the host computer. A separate translator called an assembler translates p'' into p'.
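The one-to-one correspondence is what makes an assembler conceptually simple: translate each symbolic instruction by table lookup. Here is a sketch in Python; the opcode numbers and register assignments are made up for illustration and don't match any real machine.

```python
# Sketch of an assembler: each symbolic instruction corresponds
# one-to-one to a machine-language encoding. The opcode and
# register tables below are invented for illustration.

OPCODES = {"add": 0x01, "sub": 0x02, "load": 0x03, "store": 0x04}
REGISTERS = {"x": 0, "y": 1, "z": 2}

def assemble(line):
    """Translate one 'op a, b' instruction into numeric machine code."""
    op, operands = line.split(None, 1)
    a, b = (REGISTERS[r.strip()] for r in operands.split(","))
    return [OPCODES[op], a, b]

assemble("add x, y")    # -> [1, 0, 1]
```

Because the mapping is one-to-one, disassembly (machine code back to symbolic form) is just the reverse lookup.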
2. Java is a compiled language. It's a little bit special because Java byte code, the target language of the Java compiler, runs on the Java VM (virtual machine), a processor implemented in software!
3. Earlier I said Scala is an interpreted language. The real story is more complicated. When a Scala expression is entered at the interpreter's prompt, it is actually compiled into Java byte code, executed on the Java VM, and the result is displayed in the interpreter's window.
The language processing pipeline:
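As a sketch of the front half of such a pipeline, here is a tokenizer and parser in Python for fully parenthesized arithmetic expressions. The stage names and the tiny expression language are my own, chosen for illustration; the output is a Lisp-style tree of the kind used later in these notes.

```python
# A sketch of the first stages of a language-processing pipeline:
# characters -> tokens -> syntax tree. Handles only fully
# parenthesized expressions like "(x + (2 * z))"; invented for illustration.

import re

def tokenize(text):
    """Split the input into numbers, identifiers, and punctuation."""
    return re.findall(r"\d+|[a-z]+|[()+*]", text)

def parse(tokens):
    """Build a Lisp-style tree: "(x + (2 * z))" -> ['+', 'x', ['*', '2', 'z']]."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok                    # a variable or a number literal
    left = parse(tokens)
    op = tokens.pop(0)
    right = parse(tokens)
    tokens.pop(0)                     # discard the closing ")"
    return [op, left, right]

parse(tokenize("(x + (2 * z))"))      # -> ['+', 'x', ['*', '2', 'z']]
```

A real pipeline continues from the tree to type checking, optimization, and code generation (for a compiler) or straight to evaluation (for an interpreter).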
In my view there are three categories of programming languages: languages for students, languages for engineers, and languages for mathematicians.
The following languages were designed to teach students how to program:
Pascal (Wirth, 1969)
Basic (Kemeny & Kurtz, 1964)
Logo (Bobrow, Papert, et al., 1967)
Languages for engineers are designed to maximize the programmer's access to the host computer. Engineers go crazy if there's a bit somewhere in the computer that they can't flip.
Of course, different types of computers have different instruction sets, but those instruction sets are all similarly imperative in nature. This tends to impose the imperative paradigm on programs written in engineering languages. These programs tend to reflect the underlying nature of the computer, which potentially makes them long, error-prone, and hard to maintain, but efficient.
Examples of languages for engineers:
FORTRAN (Backus, 1956)
C (Ritchie, 1969)
C++ (Stroustrup, 1979)
Ada (some DoD committee, 1980)
Languages for mathematicians hide the grubbiness of the host computer, presenting instead some fanciful virtual computer filled with wonderful abstractions. Programmers can employ non-imperative programming paradigms. Their programs tend to be short, elegant, abstract, but slow.
Examples of languages for mathematicians:
Lisp (McCarthy, 1958)
ALGOL (some committee of fancy computer scientists, 1960)
Haskell (some committee of irate mathematicians, 1997)
Java (Gosling, 1991)
Scala (Odersky, 2001)
1. One might add another category: languages for non-programmers (scientists, accountants, and other lesser beings). Languages like COBOL and NetLogo belong to this category.
2. There are also what I call kitchen sink languages. These are languages that provide for both engineers and mathematicians. C++ and Scala are in this category.
A phrase is any part of a program -- including the whole program -- that can be executed. For example, x + 2 is a phrase, but x + is not.
Syntax refers to the tree-like structure of a well-formed phrase. For example, the syntax of the expression x + (2 * z) might be represented as the tree (+ x (* 2 z)).
Note: for now I'm identifying parse trees with abstract syntax trees, which are essentially Lisp expressions. On paper you can easily redraw these as trees:
      +
     / \
    x   *
       / \
      2   z
Semantics refers to the behavior of a phrase. What happens when the phrase is executed? Is a value produced? Is a variable updated?
While the syntax of a language is often specified by an EBNF grammar (see below), semantics is often specified by a reference interpreter: an interpreter written in a well-understood meta-language. A reference interpreter's code is not optimized for efficiency; instead it's optimized for understandability, and it spells out the semantics of the object language in terms of the semantics of the meta-language.
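A miniature example of the idea, with Python as the meta-language: a reference interpreter for Lisp-style expression trees like the one drawn above, encoded as nested lists. Each clause spells out the semantics of one kind of phrase. (The encoding is my own; a real reference interpreter would cover a full language.)

```python
# A miniature reference interpreter. The object language is
# Lisp-style expression trees encoded as nested Python lists;
# the meta-language is Python. Clarity over speed.

def evaluate(phrase, env):
    """Return the value of a phrase, given an environment mapping variables to numbers."""
    if isinstance(phrase, list):              # a compound phrase: [op, left, right]
        op, left, right = phrase
        lval = evaluate(left, env)
        rval = evaluate(right, env)           # + and * inherit their meaning
        return lval + rval if op == "+" else lval * rval
    if phrase in env:                         # a variable: look it up
        return env[phrase]
    return int(phrase)                        # a numeral: the number it names

evaluate(["+", "x", ["*", "2", "z"]], {"x": 1, "z": 3})   # -> 7
```

Notice how the semantics of + in the object language is defined in terms of + in the meta-language, which is exactly where the infinite regress below comes from.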
(Of course there is an infinite regress lurking here. How do we specify the semantics of the meta-language? Do we write a reference interpreter in a meta-meta-language?)