Chris Pollett > Students > Mangesh

Text Summarizer based on Intersection method

Description: This method works on the principle that if two sentences have a large intersection (many words in common), they probably hold the same information. So if one sentence intersects strongly with many other sentences, it probably holds some information from each of them; in other words, it is probably a key sentence in the text. The summarizer uses an intersection function to measure the overlap between each pair of sentences and builds a key-value dictionary in which each normalized sentence is the key and its total intersection score is the value. The highest-scoring sentences are then kept, up to the requested summary length, and printed in their original order.
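The scoring described above can be written as follows. This is a sketch that treats each sentence $s_i$ as its set of words $W_i$, divides the overlap by the average sentence length as sentence_intersection does in the code below, and sums over all other sentences as the description implies ($n$ is the number of sentences, and $j$ ranges over $1, \dots, n$):

```latex
\operatorname{inter}(s_i, s_j) = \frac{|W_i \cap W_j|}{\tfrac{1}{2}\left(|W_i| + |W_j|\right)},
\qquad
\operatorname{score}(s_i) = \sum_{j \neq i} \operatorname{inter}(s_i, s_j)
```

For example, two sentences with 6 and 10 distinct words that share 4 words have an intersection of 4 / ((6 + 10) / 2) = 4 / 8 = 0.5.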

Example:

We are going to generate a summary for the following sample document:

Cache and Not Carry: Next Mars Rover to Collect Samples for Return to a Earth Someday

Have rover, need payload. That's the state of things for NASA, which is planning to launch its next rover to Mars in 2020. The rover has ambitious goals, including searching for signs of habitability and life on the Red Planet, and collecting rock samples to be stored for future return to Earth. Now, NASA is asking scientists to propose instruments that will help the spacecraft accomplish its mission.

The space agency released an "announcement of opportunity" on September 24 calling for proposals by December 23. Researchers who plan to put an instrument in the hat must file a heads-up about their plans, called a notice of intent, by October 15.

The design of the 2020 rover will hew closely to that of Curiosity, which landed on Mars in August 2012. The new vehicle will have the same basic body, called a chassis, and will use the same "sky crane" landing system to be lowered onto the surface. But the innards of the rover will be all new, featuring a suite of instruments that move beyond what Curiosity can do.

The instruments must accomplish specific goals for the rover set out in a July report by its Science Definition Team, which disbanded after the report was issued. The goals include scouting for habitable locations and looking for possible signs of past life there, such as microbial fossils and concentrations of organic material. The rover will also be tasked with digging up rock core samples and storing them for future retrieval and return to Earth by a future spacecraft, where they can be studied in laboratories with much more sophisticated instruments than anything that can be sent to Mars.

Because sample storage will take up room inside the rover, however, it won't be able to carry instruments for analyzing dug-up samples on Mars as Curiosity does. "Curiosity has flown really high end instruments to do its measurements on the surface of Mars," says Jack Mustard of Brown University, who chaired the Science Definition Team. "What this coming rover will do is arguably a better job of finding materials that are interesting. It's somewhat upgraded in its capabilities to do remote measurements. It doesn't try to do any in situ analysis" like Curiosity's Sample Analysis at Mars (SAM) and Chemistry and Mineralogy (CheMin) instruments do.

But that decision has angered some Mars scientists, who say the rover will have to sacrifice too much of its instrument space for caching samples. "I think if we are going to have a Curiosity duplicate rover in 2020, it should be loaded with instruments to do in situ science," says Robert Zubrin, cofounder and president of the Mars exploration advocacy nonprofit, The Mars Society. "This one says it's going to have 28 kilograms of science instruments. Curiosity has 80 kilograms. They have reduced the science payload by a factor of three in order to have this caching function, which may not have any utility whatsoever." Zubrin says it leaves too much up to chance to have the return of these samples rely on an unspecified mission in the future making a precision rendezvous and landing at the same spot to collect them.

The Science Definition Team members say the 2020 rover will still be able to do significant science, and it's important to initiate Mars sample return now. "This mission I think will be on par, in terms of what we learn, with Curiosity, and hold the future prospect of being able to learn 10 times more by bringing samples back to Earth. None of us are going looking for Klingons, but we would be thrilled if we could help find a sample that contains microbes," says Scott Murchie at Johns Hopkins University Applied Physics Laboratory, who was a member of the team.

Summary generated by the intersection-based summarizer:

Cache and Not Carry: Next Mars Rover to Collect Samples for Return to a Earth Someday.

That's the state of things for NASA, which is planning to launch its next rover to Mars in 2020.

The rover has ambitious goals, including searching for signs of habitability and life on the Red Planet, and collecting rock samples to be stored for future return to Earth.

Now, NASA is asking scientists to propose instruments that will help the spacecraft accomplish its mission.

The design of the 2020 rover will hew closely to that of Curiosity, which landed on Mars in August 2012.

The new vehicle will have the same basic body, called a chassis, and will use the same "sky crane" landing system to be lowered onto the surface.

But the innards of the rover will be all new, featuring a suite of instruments that move beyond what Curiosity can do.
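The ranking behind a summary like the one above can be illustrated on a toy input. The following is a minimal, self-contained sketch of the intersection score on three made-up sentences, not the page's own code. It assumes each sentence is reduced to a set of lowercase words with punctuation removed (a simplification of the format_sent normalization), and the helper names words_of, intersection and rank_sentences are hypothetical.

```php
<?php
// Hypothetical helper: lowercase, strip punctuation, return the set of distinct words
function words_of(string $sentence): array
{
    $clean = preg_replace('/[^a-z0-9\s]+/', '', strtolower($sentence));
    return array_values(array_unique(preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY)));
}

// Hypothetical helper: shared words divided by the average sentence length (in distinct words)
function intersection(string $a, string $b): float
{
    $wa = words_of($a);
    $wb = words_of($b);
    $avg = (count($wa) + count($wb)) / 2;
    return $avg == 0 ? 0.0 : count(array_intersect($wa, $wb)) / $avg;
}

// Hypothetical helper: score each sentence by summing its intersection with every other sentence
function rank_sentences(array $sentences): array
{
    $scores = [];
    foreach ($sentences as $i => $s) {
        $scores[$i] = 0.0;
        foreach ($sentences as $j => $t) {
            if ($i !== $j) {
                $scores[$i] += intersection($s, $t);
            }
        }
    }
    arsort($scores); // highest score first; keys (sentence indices) are preserved
    return $scores;
}

$toy = [
    "The rover will collect rock samples on Mars",
    "NASA wants the rover to store samples for return",
    "Scientists argued about the instrument budget",
];
print_r(rank_sentences($toy));
```

With these inputs the first sentence ranks highest: it shares "the", "rover" and "samples" with the second sentence and "the" with the third. Common words such as "the" also add to the scores; neither the original code nor this sketch removes stop words.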

Code for the slider, which lets the user choose the summary length in characters and posts it to intersection.php:

<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="https://code.jquery.com/ui/1.10.3/themes/smoothness/jquery-ui.css" />
  <!-- Load jQuery once, then jQuery UI, which depends on it -->
  <script src="https://code.jquery.com/jquery-1.9.1.js"></script>
  <script src="https://code.jquery.com/ui/1.10.3/jquery-ui.js"></script>
  <script>
  $(document).ready(function() {
      // Slider for the summary length in characters
      $("#slider").slider({
          min: 100,   // minimum value
          max: 1000,  // maximum value
          value: 200, // default value
          slide: function(event, ui) {
              // Copy the handle position into the form field while dragging
              $("#value2").val(ui.value);
          }
      });
      // Show the default value before the handle is moved
      $("#value2").val($("#slider").slider("value"));
  });
  </script>
</head>
<body>

Drag the handle to select the desired summary length (in characters):

<form action="intersection.php" method="POST">
<input id="value2" type="text" name="slide" />
<div id="slider"></div>
<input type="submit" />
</form>
</body>
</html>
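The slider only constrains the value in the browser; the form submits whatever text is in the "slide" field, and a user can also type into that field directly. A minimal sketch of reading and clamping the posted value on the PHP side, assuming the slider's range of 100 to 1000 and default of 200; read_char_limit is a hypothetical helper, not part of the original code.

```php
<?php
// Hypothetical helper: read the requested summary length from the form data,
// clamp it to the slider's range, and fall back to the slider's default.
function read_char_limit(array $post, int $min = 100, int $max = 1000, int $default = 200): int
{
    if (!isset($post['slide']) || !is_numeric($post['slide'])) {
        return $default;
    }
    return max($min, min($max, (int) $post['slide']));
}

// In the summarizer this would be called as read_char_limit($_POST)
echo read_char_limit(['slide' => '350']), "\n";  // 350
echo read_char_limit(['slide' => '5000']), "\n"; // 1000
echo read_char_limit(['slide' => 'abc']), "\n";  // 200
```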

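Code for the summarizer, which receives the slider value in the "slide" form field and reads the sample document from sjsu.txt: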

<!DOCTYPE html>
<html>
<body>

<?php
// Requested summary length in characters, posted by the slider form (default 200)
$charsummary = isset($_POST["slide"]) ? (int)$_POST["slide"] : 200;
echo "Requested summary length: $charsummary characters<br><br>";

// microtime(true) returns the current time in seconds as a float
$time_start = microtime(true);
$doc = file_get_contents("sjsu.txt");
echo "<b>Original Content:</b><br>" . htmlspecialchars($doc) . "<br><br>";

// Paragraphs are separated by blank lines (\R matches \n, \r\n or \r)
$paragraphs = preg_split('/\R\s*\R/', $doc, -1, PREG_SPLIT_NO_EMPTY);

echo "<b>Paragraphs:</b><br>";
print_r($paragraphs);
$sentences = convtosentences($doc);
echo "<br><br><b>Sentences: </b><br>";
print_r($sentences);

##### Time for splitting into sentences
$time1 = microtime(true);
echo "<br><br>Time for splitting into sentences: ", $time1 - $time_start;

$words = preg_split('/\s+/', $doc, -1, PREG_SPLIT_NO_EMPTY);
$sentence_dict = get_ranks($sentences);

##### Time to create sentence dictionary
$time2 = microtime(true);
echo "<br><br>Time to create sentence dictionary: ", $time2 - $time1;

// Highest scores first; arsort keeps the sentence keys
arsort($sentence_dict);
$summary = get_summary($sentence_dict, $sentences, $charsummary);

##### Selecting sentences for summary
$time3 = microtime(true);
echo "<br><br>Selecting sentences for summary: ", $time3 - $time2;

$summary_text = implode(" ", $summary);
$sum_words = preg_split('/\s+/', $summary_text, -1, PREG_SPLIT_NO_EMPTY);
echo "<br><br><b>Summary :</b><br>" . implode("<br>", array_map("htmlspecialchars", $summary));

echo "<br><br>Original Length (words): " . count($words) . "<br>";
echo "Summary Length (words): " . count($sum_words) . "<br>";
echo "Summary Length (characters): " . strlen($summary_text) . "<br>";
// Percentage of the original words that the summary leaves out
echo "Summary Ratio (% of words removed): " . (100 - (100 * count($sum_words) / max(1, count($words))));

$time_stop = microtime(true);
echo "<br><br>Duration: " . ($time_stop - $time_start);

# Score every sentence by its total intersection with all other sentences
function get_ranks($sentences)
{
	$n = count($sentences);
	// Pairwise intersection matrix (zero on the diagonal)
	$values = array_fill(0, $n, array_fill(0, $n, 0));
	for ($i = 0; $i < $n; $i++)
	{
		for ($j = $i + 1; $j < $n; $j++)
		{
			// The measure is symmetric, so compute each pair once and mirror it
			$values[$i][$j] = sentence_intersection($sentences[$i], $sentences[$j]);
			$values[$j][$i] = $values[$i][$j];
		}
	}
	$sentence_dict = array();
	for ($i = 0; $i < $n; $i++)
	{
		// Sum over all other sentences, not only those that come after sentence $i
		$sentence_dict[format_sent($sentences[$i])] = array_sum($values[$i]);
	}

	echo "<br><br><b>Sentence Dictionary: </b><br>";
	foreach ($sentences as $sent)
	{
		echo htmlspecialchars($sent) . " => " . $sentence_dict[format_sent($sent)] . "<br>";
	}
	return $sentence_dict;
}

# Function to split content into sentences (at ". " and at line breaks)
function convtosentences($content)
{
	return preg_split("/\.\s|[\n\r]+/", $content, -1, PREG_SPLIT_NO_EMPTY);
}

# Function to format sentences, e.g. lowercasing and removing special characters and symbols
function format_sent($sen)
{
	return trim(preg_replace('/[^a-z0-9\n\.\s-]+/', '', strtolower($sen)));
}

# Intersection function: shared words divided by the average sentence length
function sentence_intersection($s1, $s2)
{
	// Compare normalized word sets, so "Mars," and "mars" count as the same word
	$words1 = array_unique(preg_split('/\s+/', format_sent($s1), -1, PREG_SPLIT_NO_EMPTY));
	$words2 = array_unique(preg_split('/\s+/', format_sent($s2), -1, PREG_SPLIT_NO_EMPTY));

	$avg = (count($words1) + count($words2)) / 2;
	if ($avg == 0)
		return 0;
	return count(array_intersect($words1, $words2)) / $avg;
}

# Function to put the sentences with the highest scores into the summary,
# in the order in which they appear in the document
function get_summary($sent_dict, $sentences, $limit)
{
	$summary = array();
	$top = sent_summary($sent_dict, $limit);
	// preserve_keys = true keeps numeric-looking sentence keys intact
	$sliced_dict = array_slice($sent_dict, 0, $top, true);
	foreach ($sentences as $sent)
	{
		$key = format_sent($sent);
		if (isset($sliced_dict[$key]))
		{
			// Restore the period removed when splitting into sentences
			$summary[] = rtrim(trim($sent), ".") . ".";
			unset($sliced_dict[$key]); // do not repeat duplicate sentences
		}
	}
	return $summary;
}

# Function to count how many top-ranked sentences fit within the character limit
# (at least one sentence is always kept)
function sent_summary($sent_dict, $limit)
{
	$top = 0;
	$count = 0;
	foreach ($sent_dict as $key => $value)
	{
		if ($top > 0 && $count + strlen($key) > $limit)
			break;
		$count += strlen($key);
		$top++;
	}
	return $top;
}

?>
</body>
</html>
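The selection step (choosing how many top-ranked sentences to keep and printing them in document order) can also be written as a pure function, which makes it easy to test outside the web page. This is one possible sketch, assuming sentences are taken greedily in score order until the next one would exceed the character budget, that the top-ranked sentence is always kept, and that the result is restored to document order; select_summary is a hypothetical name, not part of the original code.

```php
<?php
// Hypothetical helper: $sentences holds the sentences in document order,
// $scores maps sentence index => score, and $limit is the character budget.
function select_summary(array $sentences, array $scores, int $limit): array
{
    arsort($scores); // best-scoring sentence indices first
    $chosen = [];
    $used = 0;
    foreach ($scores as $i => $score) {
        $len = strlen($sentences[$i]);
        // Stop at the first sentence that would exceed the budget,
        // but always keep the top-ranked sentence
        if ($chosen && $used + $len > $limit) {
            break;
        }
        $chosen[] = $i;
        $used += $len;
    }
    sort($chosen); // restore the original document order
    $out = [];
    foreach ($chosen as $i) {
        $out[] = $sentences[$i];
    }
    return $out;
}

$sentences = ["Alpha beta gamma.", "Delta.", "Epsilon zeta eta theta."];
$scores = [0.9, 0.2, 1.4];
print_r(select_summary($sentences, $scores, 40));
```

With a 40-character budget this keeps the first and third sentences (17 + 23 = 40 characters), in document order. Stopping at the first sentence that does not fit, rather than skipping it and trying shorter ones, keeps the summary limited to the very top of the ranking.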