Solutions: Formal languages and theory of computation, part 1, CS 288, May 4, 2005

 

  1. At least 3 different strategies are possible.

    The first is to construct an NFA for the regular expression, perhaps with ε transitions. Of the many possible NFAs, one without ε transitions has the transition function given below. Converting this to a DFA gives a DFA isomorphic to the given one.

        δ   |    q0      q1      q2      q3
        ------------------------------------- 
        0   |    {}     {q0}    {q3}    {}
        1   | {q1, q2}   {}      {}    {q0}
    
    A second strategy is to find a regular expression for the DFA. Again there are several ways to do this, and some algebraic manipulation may be needed afterwards to get the expression into the correct form.

    If we let A, B, C, and D, be the sets of strings that go respectively to q0, q1, q2, and q3, then the DFA gives equations

       A = ε       B = A1 + D1      C = B0 + D0      D = C1
    By eliminating A and B and substituting, we get
       C = 10 + D10 + D0 = 10 + C110 + C10
       D = 101 + D101 + D01
    So C = 10(110 + 10)* = 1(011 + 01)*0 = (101 + 10)*10, and
    D = 101(101 + 01)* = 10(110 + 10)*1 + 1(011 + 01)*01 + (101 + 10)*101.
    Since A = ε, we get that
    L(M) = A + C + D = ε + (10 + 101)*(10 + 101) = ε + (10 + 101)+ = (10 + 101)*.

    A third strategy is to argue directly. If r is the given regular expression, we can show by induction that L(r) is a subset of L(M). Clearly ε is in L(M). And inductively, if x is in L(M), then x must be taken to q0, q2, or q3. But in each case, x10 goes to q2 and x101 goes to q3, so both x10 and x101 are in L(M).

    To show that L(M) is a subset of L(r), we may use the terminology of the previous strategy, and strong induction on the length of x. If x has length 0, then x = ε, so x is in L(r). This is the only case where x is in A.

    So if x in L(M) has nonzero length, x is in C or D. If x is in C, then inspection of M shows that x is of the form y10, where y is in A or C or D. By induction y is in L(r), so x must be as well. A similar argument handles the case where x is in D (since x must be of the form y101 where y is in A or C or D).

     

  2. a) It is undecidable whether a given CFG is ambiguous, or whether a given CFL is inherently ambiguous. Other examples are given in Hopcroft et al, p. 302, 407, and 408.
    b) No example is possible. A problem in NP has a deterministic solution algorithm (for example, it may be reduced to an NP-complete problem like 3SAT, which is decidable.
    c) One example is the language Lwwr of even-length palindromes.
    d) One example is QBF -- the question of whether a given quantified Boolean formula is true.
    e) No example is possible. If L is in P, then there is an algorithm for determining membership in L, and thus an algorithm for determining nonmembership in L.
    f) The universal language Lu and the complement of the diagonalization language Ld are two possible examples. There are others.

     

  3. Consider the following parsing strategy:

    Let w be a string derived from any variable A of the CFG.
    case 1:
      if w has a + that is outside of parentheses, then start with the rule E -> E+T,
      where the + matches the rightmost + that is outside of parentheses
    case 2:
      if w has a * but no + that is outside of parentheses, then start with the rule E->T or T->T*F
      (depending on whether A=E or A=T)
      where the * matches the rightmost * that is outside of parentheses
    case 3:
      if w begins with a left parenthesis and ends with a right one, then start with the rule
      E -> T or T -> F or F -> (E), depending on whether A = E or T or F
    case 4:
      if w=x, then start with the rule
      E -> T or T -> F or F -> x, depending on whether A = E or T or F
    case 5:
      if w=y, then start with the rule
      E -> T or T -> F or F -> y, depending on whether A = E or T or F

    We may show by induction on the length of w that this is the only possible parsing strategy, and thus that the parse tree is completely determined by w. First note that the cases are disjoint -- in particular, in cases 3 and 4 and 5 there is no * or + outside of parentheses.

    We next observe that F cannot derive a string with a + or * that occurs outside of parentheses. An easy induction on the length of w shows that if T derives w, then w can have no + outside of parentheses. Finally, note that no variable can derive a string with a different number of left parentheses or right parentheses.

    So in case 1, w can be derived only from E. The initial use of the rule E -> T would not allow matching of the + that is outside of parentheses, so we must begin with the rule E -> E + T. The + in the rule must in fact match the rightmost + outside of parentheses, since a string containing this + would have to be derivable from T. Since w is derivable, the portion of w to the left of this + has a parse tree rooted at E, and the portion to the right has a parse tree rooted at T. By induction, these parse trees are unique.

    In case 2, w cannot be derived from F, and the rule E -> E+T cannot be used initially (since the + would have to be outside of parentheses). So a derivation from T must begin with T*F. A derivation of w from E must begin, E -> T, and must then continue with T -> T*F. In either case, as in case 1 this * must match the rightmost * outside of parentheses, and the derivations of the substrings to the left and right from T and F are unique by induction.

    In case 3, we cannot use the rules E -> E+T or T -> T*F initially, since these would give a + or * outside of parentheses. Nor can we use the rules F -> x or F -> y. So a derivation from E must start with E => T => F => (E), a derivation from T must start with T => F => (E), and a derivation from F must start with F => (E). In each of these subcases, the derivation from the E inside parentheses is unique by induction. The argument for cases 4 and 5 is similar.

    Note that whenever multiple steps of a derivation are given in the first three paragraph, every step is consistent with the given strategy. Also note that all of these derivations are leftmost derivations, so that they uniquely determine a parse tree. Since we have handled all 5 cases, the proof is complete.