Exceptions

Like Java, Python supports exception handling. Its try-except-finally statements have the following syntax:

    try:
        statement_block_0
    except SomeError1 as error_name1:
        statement_block_1 #executed if a SomeError1 occurs
    except SomeError2 as error_name2:
        statement_block_2
    ...
    finally:
        statement_block_n # always gets executed

For example,

    try:
      f = open("file.txt", "r")
    except IOError as e:
      print(e)
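As a rough illustration of how the three kinds of block interact, here is a minimal sketch. The function `read_first_line`, the list `blocks_run` and the file path are hypothetical names chosen for this example; the path is assumed not to exist.

```python
def read_first_line(path):
    """Return (first_line_or_None, blocks_run) for the file at path."""
    blocks_run = []                # records which blocks executed
    f = None
    try:
        f = open(path, "r")
        blocks_run.append("try")
        return f.readline(), blocks_run
    except IOError as e:           # IOError is an alias of OSError in Python 3
        blocks_run.append("except")
        print("could not open file: %s" % e)
        return None, blocks_run
    finally:
        blocks_run.append("finally")   # runs on the success and error paths
        if f is not None:
            f.close()

# Hypothetical path that is assumed not to exist.
line, blocks_run = read_first_line("no_such_dir_for_demo/file.txt")
print(line, blocks_run)
```

Note that the finally block runs even though the except block has already executed a return statement.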

To signal an exception you use the raise keyword:

    raise RuntimeError("Something bad just happened")

You define new exceptions by extending Exception:

    class MyException(Exception): pass

    # now could use as:
    raise MyException("Whoa! A MyException occurred!")

    # more control can be had by overriding __init__
    # here we define an exception taking two arguments
    class MyException2(Exception):
        def __init__(self, errno, msg):
            self.args = (errno, msg)
            self.errno = errno
            self.errmsg = msg

    raise MyException2(403, "Access Forbidden")

An except block for FooError will also catch any subclass of FooError.
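A small sketch of this subclass behavior follows. `FooError`, `BarError` and `classify` are hypothetical names used only for illustration.

```python
class FooError(Exception):
    """Hypothetical base exception for this illustration."""


class BarError(FooError):
    """Hypothetical subclass of FooError."""


def classify(exc):
    """Raise exc and report which except clause handled it."""
    try:
        raise exc
    except FooError as e:
        # Reached for FooError and for any of its subclasses.
        return "handled as FooError: %s" % type(e).__name__
    except Exception as e:
        return "handled as Exception: %s" % type(e).__name__


print(classify(BarError("oops")))
print(classify(ValueError("oops")))
```

Because except clauses are tried in order, a handler for a subclass should be listed before the handler for its base class if it is ever to be reached.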

Modules

Typically, when you code larger projects you split them into several files.
To use code from one file in another file, Python provides the import statement.
For example, in one file div.py you might have a collection of functions about division. This would be your module.
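The notes do not show the contents of div.py, so the following is only a guess at what such a module might look like. It assumes that `divide` returns a (quotient, remainder) pair; `divides` is an extra hypothetical helper.

```python
# div.py (hypothetical contents; the notes do not show this file)

def divide(a, b):
    """Return (quotient, remainder) for the integer division of a by b."""
    if b == 0:
        raise ZeroDivisionError("divide: b must be nonzero")
    return a // b, a % b


def divides(a, b):
    """Return True if a divides b evenly."""
    return a != 0 and b % a == 0
```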

To use this module in another file, you could put:

    import div # notice: div, not div.py
    a, b = div.divide(198, 15) # functions in div.py have to be prefixed with div.

What if we want to use a different prefix than div? We could do:
```
import div as foo #now foo is the prefix
```

What if we don't want to keep writing div.some_function ?

    from div import divide # "from div import *" would import all functions
    print(divide(198, 15))

Modules can be bundled together in so-called packages, but I won't get into that unless we really need it.

Documentation Strings and Help

The first statement of a module, class or function definition can be a string called a documentation string.

For example,

    def fact(n):
        "This function computes a factorial" # can use triple-quoted strings
        if n <= 1:
            return 1
        else:
            return n * fact(n - 1)

The documentation string of such an object is stored in its __doc__ attribute, which can be printed, etc.:
```
print(fact.__doc__)
```
Python has the built-in function help(), which can be run from Python in interactive mode to get information about modules, classes and functions.
Python also comes with a pydoc command that gives the same documentation from the command line.
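As a sketch of the triple-quoted form, the factorial function could be documented as below. The wording of the docstring is made up for this example; `inspect.getdoc` is the standard-library helper that returns the docstring with its indentation cleaned up.

```python
import inspect


def fact(n):
    """Compute n! recursively.

    Assumes n is a non-negative integer.
    """
    if n <= 1:
        return 1
    return n * fact(n - 1)


print(fact.__doc__)          # raw docstring, indentation included
print(inspect.getdoc(fact))  # same text with the indentation cleaned up
```

In interactive mode, help(fact) displays the same text.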

Back to `A^star` Search

Recall in Best First Search we have an evaluation function `f` from nodes to some ordered set such as the reals, and the node we choose to expand next is always the one on the frontier of least `f` value.
For Greedy Best First Search, `f` was chosen to be some heuristic function `h(n)` that estimated the cost to a solution.
For Uniform Cost Search, `f` is chosen to be `g(n) = ` the cost to reach node `n`.
For the `A^star` algorithm, we choose `f(n) = g(n) + h(n)`, i.e., `f(n)` is the estimated total cost of a solution through `n`.
It turns out `A^star` search is complete, i.e., given enough resources it will find a solution if one exists.
It is also in some sense optimal among best first search algorithms provided `h` satisfies some constraints.
Let's look at why the latter fact is true...

Admissible Heuristics; Consistency

Call a heuristic function `h` admissible if it never overestimates the cost to a solution.
For example, if the problem was to get from point A to point B in a city, then the straight-line distance between the two points would be an admissible heuristic.
If `h` is admissible then `f(n)` never overestimates the total cost to a solution.
A solution is optimal if one cannot find a cheaper solution.
If the nodes one could expand form a tree, then `A^star` will return optimal results provided `h` is admissible.
If the nodes one could expand form a directed acyclic graph, then a stronger condition is needed: A heuristic is consistent if for every `n` and every successor `n'` of `n` generated by any action `a`, the estimated cost of reaching the goal from `n` is no greater than the step cost of getting to `n'` plus the estimated cost of reaching the goal from `n'`:
`h(n) leq c(n, a, n') + h(n').`
Consistency implies admissibility.
Straight-line distance will satisfy consistency by the triangle inequality.
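The consistency condition can be checked edge by edge on a small explicit graph. The sketch below assumes a graph stored as a dict of dicts of step costs and a heuristic stored as a dict; the graph and both heuristics are made-up examples.

```python
def is_consistent(graph, h):
    """Check h(n) <= c(n, a, n') + h(n') for every edge of graph.

    graph maps a node to a dict {successor: step_cost}; h maps a node
    to its heuristic value. Both structures are illustrative.
    """
    return all(h[n] <= cost + h[succ]
               for n, edges in graph.items()
               for succ, cost in edges.items())


# Made-up graph; the true costs to G are S: 6, A: 5, B: 3, G: 0.
graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 3}, "G": {}}
h_good = {"S": 5, "A": 4, "B": 3, "G": 0}
h_bad = {"S": 5, "A": 1, "B": 3, "G": 0}   # admissible, but not consistent

print(is_consistent(graph, h_good))
print(is_consistent(graph, h_bad))
```

Here `h_bad` never overestimates the true cost, yet it fails on the edge from S to A because `5 > 1 + 1`, which shows that admissibility does not imply consistency.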

Proof of optimality

We argue the case where the nodes might form a DAG and `h` is consistent; the tree case with an admissible `h` is similar.

Lemma. Suppose `h(n)` is a consistent heuristic, then the values of `f(n)` along any path are nondecreasing.

Proof. Suppose `n'` is a successor of `n`, then `g(n') = g(n) + c(n, a, n')` for some action `a` and we have:
`f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') mbox( (by consistency) ) geq g(n) + h(n) = f(n)`. QED.

Lemma. Whenever `A^star` selects a node `n` for expansion, the optimal path to that node has been found.

Proof. If this were not the case, there would have to be another frontier node `n'` on the optimal path from the start node to `n`. Here `n'` is on the frontier because the frontier nodes of the graph always separate the unexplored region of the graph from the explored region, and if `n'` were in the explored region we would already have gone through `n'` on the way to `n` and obtained the cheaper path. Since `f` is nondecreasing along any path, `n'` would have lower `f`-cost than `n` and would have been selected before `n`. QED.

It follows from these two lemmas that the sequence of nodes expanded by `A^star` is in non-decreasing order of `f(n)`. Hence, the first goal node selected for expansion must be optimal because `f` is the true cost for goal nodes and all later goal nodes will be at least as expensive. (QED optimality proof).
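The claim about expansion order can be observed on a toy example. Below is a minimal sketch of `A^star` graph search using a heap as the frontier; the graph, the heuristic values and all function names are made up for illustration, and the heuristic was chosen to be consistent.

```python
import heapq


def a_star(graph, h, start, goal):
    """A* graph search with f(n) = g(n) + h(n).

    Returns (cost, path, expanded), where expanded lists the (node, f)
    pairs in the order in which nodes were selected for expansion.
    """
    frontier = [(h[start], 0, start, [start])]   # entries: (f, g, node, path)
    explored = set()
    expanded = []
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in explored:
            continue              # stale entry: node was expanded more cheaply
        explored.add(node)
        expanded.append((node, f))
        if node == goal:
            return g, path, expanded
        for succ, cost in graph[node].items():
            if succ not in explored:
                g2 = g + cost
                heapq.heappush(frontier,
                               (g2 + h[succ], g2, succ, path + [succ]))
    return None, None, expanded


# Made-up graph and consistent heuristic.
graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 3}, "G": {}}
h = {"S": 5, "A": 4, "B": 3, "G": 0}

cost, path, expanded = a_star(graph, h, "S", "G")
print(cost, path)
print(expanded)
```

On this example the `f` values of the expanded nodes never decrease, and the goal is reached with cost 6 via S, A, B, G rather than through the direct but more expensive edges.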

Memory bounded heuristic search

The problem with `A^star` as presented is that it needs to keep track of all fringe and closed nodes.
Thus, it tends to run out of space before time.
We now look at a couple of ways to solve this problem...

`IDA^star` (Iterative Deepening `A^star`)

In `IDA^star` we fix a constant `\mu` and we modify the expand-node function so that it only adds nodes to the fringe that are of cost less than the current threshold value.
`IDA^star` is then:
Do `A^star` search for thresholds `< \mu`, `< 2\mu`, `< 3\mu`, ... until you find a solution.
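A rough sketch of this thresholding scheme follows, assuming a graph stored as a dict of dicts and a heuristic stored as a dict. It follows the fixed-step description given here; the textbook version of `IDA^star` instead searches depth-first and raises the threshold to the smallest `f` value that exceeded the previous one. The names `bounded_a_star`, `ida_star` and the `max_rounds` safety limit are assumptions of this example.

```python
import heapq


def bounded_a_star(graph, h, start, goal, threshold):
    """A* that only adds nodes with f < threshold to the fringe."""
    if h[start] >= threshold:
        return None
    fringe = [(h[start], 0, start, [start])]     # entries: (f, g, node, path)
    closed = set()
    while fringe:
        f, g, node, path = heapq.heappop(fringe)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for succ, cost in graph[node].items():
            g2 = g + cost
            if succ not in closed and g2 + h[succ] < threshold:
                heapq.heappush(fringe, (g2 + h[succ], g2, succ, path + [succ]))
    return None


def ida_star(graph, h, start, goal, mu, max_rounds=1000):
    """Try thresholds mu, 2*mu, 3*mu, ... until bounded A* succeeds.

    max_rounds is an added safety limit so unsolvable inputs terminate.
    Returns (cost, path, threshold) or (None, None, None).
    """
    for k in range(1, max_rounds + 1):
        result = bounded_a_star(graph, h, start, goal, k * mu)
        if result is not None:
            return result[0], result[1], k * mu
    return None, None, None


# Made-up graph and heuristic.
graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 3}, "G": {}}
h = {"S": 5, "A": 4, "B": 3, "G": 0}

cost, path, threshold = ida_star(graph, h, "S", "G", mu=2)
print(cost, path, threshold)
```

With `mu = 2` the thresholds 2, 4 and 6 all fail, because the goal has `f = 6` and only nodes with `f` strictly below the threshold are added; the search succeeds at threshold 8.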

Recursive Best-First Search (RBFS)

This algorithm is similar to recursive depth-first-search:

function RBF-SEARCH(problem) returns a solution, or failure
    return RBFS(problem, MAKE-NODE(problem.INITIAL_STATE), infty)


function RBFS(problem, node, f_limit) 
    returns a solution, or failure and a new f-cost limit
    
    if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
    successors := []
    for each action in problem.ACTIONS(node.STATE) do
        add CHILD-NODE(problem, node, action) into successors
    if successors is empty then return failure, infty
    for each s in successors do 
        /* update f with value from previous search, if any */
        s.f = max(s.g + s.h, node.f)
    loop do
        best := the lowest f-value node in successors
        if best.f > f_limit then return failure, best.f
        alternative := the second-lowest f-value among successors
        result, best.f := RBFS(problem, best, min(f_limit, alternative))
        if result != failure then return result
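The pseudocode can be transcribed into Python roughly as follows. This is a sketch under several assumptions: the problem is given as an explicit graph (dict of dicts of step costs) with a dict heuristic, `Node`, `solution`, `rbfs` and `rbf_search` are names chosen here, and the extra test for an infinite `best.f` is an addition so that inputs without a solution terminate.

```python
INF = float("inf")


class Node(object):
    """Search-tree node; f is set by the caller or by rbfs."""

    def __init__(self, state, parent=None, g=0, f=0):
        self.state, self.parent, self.g, self.f = state, parent, g, f


def solution(node):
    """Return the list of states from the root to node."""
    path = []
    while node is not None:
        path.append(node.state)
        node = node.parent
    return path[::-1]


def rbfs(graph, h, goal, node, f_limit):
    """Return (solution, f) on success, or (None, new f-cost limit)."""
    if node.state == goal:
        return solution(node), node.f
    successors = [Node(s, node, node.g + c)
                  for s, c in graph[node.state].items()]
    if not successors:
        return None, INF
    for s in successors:
        # update f with the value from a previous search, if any
        s.f = max(s.g + h[s.state], node.f)
    while True:
        successors.sort(key=lambda s: s.f)
        best = successors[0]
        # The INF test is an addition: it stops the loop at dead ends.
        if best.f > f_limit or best.f == INF:
            return None, best.f
        alternative = successors[1].f if len(successors) > 1 else INF
        result, best.f = rbfs(graph, h, goal, best, min(f_limit, alternative))
        if result is not None:
            return result, best.f


def rbf_search(graph, h, start, goal):
    """Entry point corresponding to RBF-SEARCH in the pseudocode."""
    root = Node(start, f=h[start])
    result, _ = rbfs(graph, h, goal, root, INF)
    return result


# Made-up graph and heuristic.
graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 3}, "G": {}}
h = {"S": 5, "A": 4, "B": 3, "G": 0}
print(rbf_search(graph, h, "S", "G"))
```

Only the current path and the siblings along it are kept in memory, which is where the space saving over plain `A^star` comes from.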

Simplified memory bounded A* (SMA*)

Do `A^star` until we run out of memory.

When we don't have enough memory to add a new node to the fringe, discard the worst-cost node from the closed list or the fringe.

Choosing Heuristics

One way to judge whether a heuristic is good is to look at the effective branching factor, `b^star`, of the search it produces.
Let `N` be the number of nodes generated by `A^star` for a particular problem and let `d` denote the solution depth.
`b^star` is then the solution to the equation:
`N + 1 = 1 + b^star + (b^star)^2 +...+(b^star)^d =\frac((b^star)^(d+1) - 1)(b^star - 1)`
For example, if you find a solution of depth `5` with `52` nodes then after solving for `b^star` in the above equation, `b^star = 1.92`.
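The equation has no closed-form solution for general `d`, but it is easy to solve numerically. The sketch below uses bisection; the function name and the tolerance are choices made for this example, and it assumes `d >= 1` and `N >= d`.

```python
def effective_branching_factor(n_nodes, depth, tol=1e-9):
    """Solve N + 1 = 1 + b + b^2 + ... + b^d for b by bisection.

    Assumes depth >= 1 and n_nodes >= depth, so a root exists in [1, N].
    """
    target = n_nodes + 1

    def total(b):
        return sum(b ** i for i in range(depth + 1))

    lo, hi = 1.0, float(n_nodes)   # total(1) = d + 1 <= N + 1 <= total(N)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(effective_branching_factor(52, 5), 2))   # 1.92
```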
The book considers two heuristics for the 8-puzzle:
- `h_1` := the number of misplaced tiles
- `h_2` := the Manhattan distance
They tested `1200` random configurations of the 8-puzzle.
They got `b^star = 1.5` for `h_1`, and `b^star = 1.3` for `h_2`. Thus, `h_2` is the better heuristic as far as effective branching factor is concerned. As it doesn't cost that much more to compute at each node, the lower branching factor means one tends to save on space and time.
We can show that `h_1 leq h_2` for all possible boards. So, as both are admissible, `h_2` will give better results both theoretically and empirically.
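Both heuristics take only a few lines to compute. The sketch below assumes a board stored as a tuple of nine entries in row-major order with `0` for the blank, and an assumed goal layout `GOAL`; neither representation comes from the book.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal layout; 0 is the blank


def h1(state, goal=GOAL):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def h2(state, goal=GOAL):
    """Sum of the Manhattan distances of the tiles from their goal squares."""
    total = 0
    for pos, tile in enumerate(state):
        if tile != 0:
            goal_pos = goal.index(tile)
            total += abs(pos // 3 - goal_pos // 3) + abs(pos % 3 - goal_pos % 3)
    return total


state = (8, 1, 3, 4, 0, 2, 7, 6, 5)   # made-up example board
print(h1(state), h2(state))
```

Every misplaced tile is at least one square away from its goal square, so it contributes 1 to `h_1` and at least 1 to `h_2`, which is why `h_1 leq h_2` on every board.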

Generating Heuristics and Picking the Best

Notice that one can weaken the rules of the 8-puzzle so that `h_1` (or `h_2`) becomes the exact cost of a solution to the appropriately weakened problem.
Hence, we often say `h_1` and `h_2` are solutions to a relaxed version of the problem.
We observe that exact solutions to a relaxed problem yield admissible heuristics for the original problem.
So we can come up with heuristics by looking at different problem relaxations.
It might be the case that `h_i` sometimes performs better than `h_j` and `h_j` sometimes performs better than `h_i`.
Worse yet, suppose you have a list of admissible heuristics `h_1,..., h_m`. Each performs better than all others in some circumstance.
What heuristic do we choose?
Since each of them is admissible,
`h(n) = max{ h_1(n), ..., h_m(n) }`
is also an admissible heuristic, and at every node it performs at the level of the best of these heuristics.
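Combining heuristics this way is a one-liner. In the sketch below, `max_heuristic` is a name chosen here, and the two component heuristics and the true costs are made-up numbers for a four-node problem.

```python
def max_heuristic(*heuristics):
    """Return h with h(n) = max(h_1(n), ..., h_m(n))."""
    def h(n):
        return max(h_i(n) for h_i in heuristics)
    return h


# Made-up true costs to the goal and two admissible heuristics.
true_cost = {"S": 6, "A": 5, "B": 3, "G": 0}
h_a = {"S": 5, "A": 1, "B": 3, "G": 0}.get
h_b = {"S": 2, "A": 4, "B": 1, "G": 0}.get

h = max_heuristic(h_a, h_b)
print([(n, h(n)) for n in sorted(true_cost)])
```

Neither `h_a` nor `h_b` is better everywhere, but the maximum is at least as large as each of them at every node and still never exceeds the true cost.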
Admissible heuristics can also be generated by solutions to sub-problems.
For example, for the 8-puzzle, one sub-problem is to just get the tiles 1,2,3,4 into the correct position.
We could make a heuristic from this by estimating the cost of the 8-puzzle to be the cost of the optimal solution to the 1,2,3,4 problem.
We can collect such solutions to sub-problems into disjoint pattern databases, and use them to get a heuristic cost of a solution.
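A single pattern database for the 1,2,3,4 sub-problem can be built by breadth-first search backwards from the goal, treating the other tiles as indistinguishable. The sketch below assumes the same tuple representation with `0` for the blank and an assumed goal layout; all names are chosen for this example. It counts every move, so it gives one ordinary pattern database rather than the disjoint kind, which would count only moves of the pattern tiles so that several databases can be added together.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal layout; 0 is the blank
PATTERN = {1, 2, 3, 4}


def abstract(state):
    """Hide every tile outside PATTERN (the blank stays visible)."""
    return tuple(t if t == 0 or t in PATTERN else -1 for t in state)


def neighbors(state):
    """Yield the states reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            other = r * 3 + c
            new = list(state)
            new[blank], new[other] = new[other], new[blank]
            yield tuple(new)


def build_pattern_db():
    """Map each abstract state to its distance from the abstract goal."""
    start = abstract(GOAL)
    db = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in neighbors(state):
            if nxt not in db:
                db[nxt] = db[state] + 1
                queue.append(nxt)
    return db


PDB = build_pattern_db()


def h_pattern(state):
    """Heuristic: optimal cost of the 1,2,3,4 sub-problem for state."""
    return PDB[abstract(state)]


print(h_pattern((1, 2, 3, 4, 5, 6, 7, 0, 8)))
```

Because moves are reversible, distances from the goal equal distances to the goal, so one backward search fills in the whole table and each later lookup is a dictionary access.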

Python Exception, Modules, More `A^star`

Outline