Chris Pollett>
Old Classes>
CS255 |
HW#5 --- last modified May 13 2019 13:34:50.Due date: May 13 Files to be submitted: Purpose: To gain experience with NP-completeness proofs and approximation algorithms. Related Course Outcomes: The main course outcomes covered by this assignment are: CLO5 -- Given a problem determine within NP that is promised to be either in P or NP-complete prove which it is CLO7 -- Analyze or code an approximation algorithm for a optimization problem whose decision problem is NP-complete. Specification: This homeworks consists of the following written problems, together with a programming exercise described below. Submit the written problems in Hw5.pdf as part of your Hw5.zip file. Submit your code in this ZIP file as well.
For the coding part of the homework, I want you to write program TextSummarizer in either Java or Python. It will be run from the command line with a syntax: java TextSummarizer some_file_name.txtor python TextSummarizer.py some_file_name.txt Here sum_file_name.txt is the name of a text file containing only ASCII characters. On such a file, it splits the text into sentences, where a sentence is delimited by . or a ? or a ! or by two newlines. Let `F` be the collection of sentences found. Your program then splits each sentence on any whitespace characters into terms. A sentence covers a term if the term appears in the sentence. Let `V` be the collection of all terms found in the document. Given these definitions your program should compute the greedy set cover of `V` using sentences of `F`. It should then output these sentences in the order that they originally appeared in the document. Here are some examples: Example 1. If some_file_name.txt contained: Ice cream is great. I ate ice cream. You ate ice cream. You and I ate ice cream. Your program should output: Ice cream is great. You and I ate ice cream. Example 2. If some_file_name.txt contained: Mary had a little lamb. Little lamb, little lamb. Mary had a little lamb. Its fleece was white as snow. And everywhere that Mary went. Mary went, Mary went. Everywhere that Mary went. The lamb was sure to go. Your program should output: Mary had a little lamb. Its fleece was white as snow. And everywhere that Mary went. The lamb was sure to go. Example 3. If some_file_name.txt contained (if you view source, you can see the below uses htmlentities so is still a pure ASCII file): Un éléphant se balançait Sur une toile, toile, toile, toile d'araignée. C'était un jeu tellement amusant Qu'il appela... un deuxième éléphant. Deux éléphants se balançaient Sur une toile, toile, toile, toile d'araignée. C'était un jeu tellement amusant Qu'ils appelèrent... un troisième éléphant. Your program should output: Un éléphant se balançait Sur une toile, toile, toile, toile d'araignée. C'était un jeu tellement amusant Qu'il appela. un deuxième éléphant. Deux éléphants se balançaient Sur une toile, toile, toile, toile d'araignée. C'était un jeu tellement amusant Qu'ils appelèrent. un troisième éléphant. Notice in the last example, the first sentence ends after the second line, which is why "Sur une toile, toile, toile, toile d'araignée." repeats twice. Also, the ellipsis "..." Is treated as ending three sentences, two of which are empty sentences and so trivially covered. Example 4. If some_file_name.txt contained: Areumdawo sarangseureowo Geurae neo, hey, geurae baro neo, hey. Areumdawo sarangseureowo Geurae neo, hey, geurae baro neo, hey Jigeumbuteo gal ddekkaji gabolkka Oppan Gangnam style. Gangnam style. Op, op, op, op Oppan Gangnam style. Gangnam style. Op, op, op, op Oppan Gangnam style. Your program should output: Areumdawo sarangseureowo Geurae neo, hey, geurae baro neo, hey Jigeumbuteo gal ddekkaji gabolkka Op, op, op, op Oppan Gangnam style. Notice in this example where the sentences are ending according to the rules I've given. Point Breakdown
|