Chris Pollett> Old Classes> CS255
( Print View )

Student Corner:
[Submit Sec1]
[Grades Sec1]

[Lecture Notes]
[Discussion Board]

Course Info:
[Texts & Links]
[Description]
[Course Outcomes]
[Outcomes Matrix]
[Course Schedule]
[Grading]
[Requirements/HW/Quizzes]
[Class Protocols]
[Exam Info]
[Regrades]
[University Policies]
[Announcements]

HW Assignments:
[Hw1] [Hw2] [Hw3]
[Hw4] [Hw5] [Quizzes]

Practice Exams:
[Midterm] [Final]

HW#5 --- last modified May 13 2019 20:34:50.

Solution set.

Due date: May 13

Files to be submitted:
  Hw5.zip

Purpose: To gain experience with NP-completeness proofs and approximation algorithms.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

CLO5 -- Given a problem determine within NP that is promised to be either in P or NP-complete prove which it is

CLO7 -- Analyze or code an approximation algorithm for a optimization problem whose decision problem is NP-complete.

Specification:

This homeworks consists of the following written problems, together with a programming exercise described below. Submit the written problems in Hw5.pdf as part of your Hw5.zip file. Submit your code in this ZIP file as well.

  1. Consider the language `CLIQUE_k = {langle G rangle | G mbox( encodes a graph with a clique of size ) k }`. Prove for any fixed `k > 1` that `CLIQUE_k` is in `P`.
  2. The sign of a literal is positive if it doesn't involve negation and is negative otherwise. Let `L = {phi | phi \in 3-SAT mbox( and each clause has at least one positive and one negative literal ) }`. Prove `L \in NPC`.
  3. Let `C = {S_1, S_2, ...}` be a collection of subsets of `NN`. A set `S'` is a hitting set for `C` if `S'` contains at least one element from each `S_i` in `C`. Let `L = {langle C, k rangle | C mbox( has hitting set of size ) k }`. Prove `L \in NPC`.

For the coding part of the homework, I want you to write program TextSummarizer in either Java or Python. It will be run from the command line with a syntax:

java TextSummarizer some_file_name.txt
or
python TextSummarizer.py some_file_name.txt

Here sum_file_name.txt is the name of a text file containing only ASCII characters. On such a file, it splits the text into sentences, where a sentence is delimited by . or a ? or a ! or by two newlines. Let `F` be the collection of sentences found. Your program then splits each sentence on any whitespace characters into terms. A sentence covers a term if the term appears in the sentence. Let `V` be the collection of all terms found in the document. Given these definitions your program should compute the greedy set cover of `V` using sentences of `F`. It should then output these sentences in the order that they originally appeared in the document. Here are some examples:

Example 1. If some_file_name.txt contained:

Ice cream is great. I ate ice cream. You ate ice cream. You and I ate ice cream.

Your program should output:

Ice cream is great. You and I ate ice cream.

Example 2. If some_file_name.txt contained:

Mary had a little lamb.
Little lamb, little lamb.
Mary had a little lamb.
Its fleece was white as snow.
And everywhere that Mary went.
Mary went, Mary went.
Everywhere that Mary went.
The lamb was sure to go.

Your program should output:

Mary had a little lamb.
Its fleece was white as snow.
And everywhere that Mary went.
The lamb was sure to go.

Example 3. If some_file_name.txt contained (if you view source, you can see the below uses htmlentities so is still a pure ASCII file):

Un éléphant se balançait
Sur une toile, toile, toile, toile d'araignée.
C'était un jeu tellement amusant
Qu'il appela... un deuxième éléphant.

Deux éléphants se balançaient
Sur une toile, toile, toile, toile d'araignée.
C'était un jeu tellement amusant
Qu'ils appelèrent... un troisième éléphant.

Your program should output:

Un éléphant se balançait
Sur une toile, toile, toile, toile d'araignée.
C'était un jeu tellement amusant
Qu'il appela. un deuxième éléphant.

Deux éléphants se balançaient
Sur une toile, toile, toile, toile d'araignée.
C'était un jeu tellement amusant
Qu'ils appelèrent. un troisième éléphant.

Notice in the last example, the first sentence ends after the second line, which is why "Sur une toile, toile, toile, toile d'araignée." repeats twice. Also, the ellipsis "..." Is treated as ending three sentences, two of which are empty sentences and so trivially covered.

Example 4. If some_file_name.txt contained:

Areumdawo sarangseureowo
Geurae neo, hey, geurae baro neo, hey.
Areumdawo sarangseureowo
Geurae neo, hey, geurae baro neo, hey
Jigeumbuteo gal ddekkaji gabolkka

Oppan Gangnam style.
Gangnam style.
Op, op, op, op
Oppan Gangnam style.
Gangnam style.
Op, op, op, op
Oppan Gangnam style.

Your program should output:

Areumdawo sarangseureowo
Geurae neo, hey, geurae baro neo, hey
Jigeumbuteo gal ddekkaji gabolkka

Op, op, op, op
Oppan Gangnam style.

Notice in this example where the sentences are ending according to the rules I've given.

Point Breakdown

Written problems (2pts each)6pts
Program compiles (runs without errors), takes its command line arguments, and reads in file specified1pt
Program splits input as specified1pt
Implementation of greedy set cover within code1pt
Output correct on all test cases1pt
Total10pts