CS 40 - Lecture 19

Lecture 18 Recap

The Internet

- Network of interconnected computers
- Each computer has an address (e.g. 130.65.86.60)
- Each computer has a host name (e.g. oslo.cs.sjsu.edu)
- Developed by DARPA for sharing computer resources (late 60s and 70s)
- First killer app: email (1980s)
- Second killer app: the web (1990s)
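To make the address/host name pairing concrete, here is a minimal Java sketch. It only pairs the two example values from the slide in one object; it does not contact a name server, and the class and method names (HostAddressDemo, pair) are made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostAddressDemo {
    // Pairs a host name with a four-part numeric address.
    // No network lookup happens here; the values are taken as given.
    static InetAddress pair(String hostName, int a, int b, int c, int d) {
        byte[] raw = { (byte) a, (byte) b, (byte) c, (byte) d };
        try {
            return InetAddress.getByAddress(hostName, raw);
        } catch (UnknownHostException e) {
            // Only thrown for an address of illegal length, which cannot happen with 4 bytes.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Example values from the slide above.
        InetAddress oslo = pair("oslo.cs.sjsu.edu", 130, 65, 86, 60);
        System.out.println(oslo.getHostName() + " has address " + oslo.getHostAddress());
    }
}
```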
The World-Wide Web

- Collection of documents, stored on computers (“web servers”) connected to the internet
- “Hyperlinks” let you jump from one document to another
- Need a “browser” to display documents
- Documents coded in HTML
HTML

- Hypertext markup language
- Plain text, with tags
<h1>Welcome to my page!</h1>
to denote styles, fonts, colors, etc.
- Tags for links
<a href="http://horstmann.com/index.html">Click me!</a>
- ...and images
<img src="http://horstmann.com/cay-rowing.gif">
(The fellow pictured is Cay Horstmann, whose slides I have been using
this semester, and whose cat we put in jail not long ago.)
Lab 1. View Tags
- In your web browser, load the page http://horstmann.com/index.html.
Then select View -> Page Source.
- What examples of a heading and of an image tag do you see?
Lab 2. Read Tags
- Start Greenfoot and open TextWorld
- Right-click on Text class, select new Text(), drag to world
- Right-click on text icon, select void load(String source)
- In dialog, enter "http://horstmann.com/index.html". Be sure to include the ""
- Select getWords()
- Select Inspect in the list of words, and in the pop-up window that this brings up.
- What happens?
Lab 2 Continued
- Now the web page is in a Text object. We can analyze it in the usual way.
- Complete this class WebText1 that collects all hyperlinks.
- Hyperlinks start with href
- Try the program like this:
Bored?
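For reference, here is one possible way to collect hyperlinks in plain Java, outside Greenfoot. This is a sketch, not the WebText1 solution: it does not use the TextWorld Text class, and the names HrefCollector and collectLinks are made up for illustration. It assumes links are written as href="..." with double quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefCollector {
    // Matches href="..." and captures the text between the double quotes.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the target of every href attribute, in order of appearance.
    static List<String> collectLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // The two tags from the HTML slide: one link, one image.
        String html = "<a href=\"http://horstmann.com/index.html\">Click me!</a>"
                + "<img src=\"http://horstmann.com/cay-rowing.gif\">";
        // Only the href is collected; the img src is not a hyperlink.
        System.out.println(collectLinks(html));
    }
}
```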

Google

- Search Engine (duh)
- How does it know all those pages?
- It “crawls”, loading a page and reading its links
- ...and then loading those pages...
- ...and following the links in them.
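The crawl described above can be sketched in a few lines of Java. To keep it self-contained, the sketch assumes the links of each page are already known and stored in a map, which stands in for actually loading pages from the web. The page names and the class name CrawlSketch are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class CrawlSketch {
    // links maps each page to the pages it links to (a stand-in for real page loads).
    // Returns every page reachable from start, in the order it was discovered.
    static List<String> crawl(String start, Map<String, List<String>> links) {
        Set<String> seen = new LinkedHashSet<>();
        Queue<String> toVisit = new ArrayDeque<>();
        seen.add(start);
        toVisit.add(start);
        while (!toVisit.isEmpty()) {
            String page = toVisit.remove();            // "load" the page
            List<String> targets = links.getOrDefault(page, Collections.<String>emptyList());
            for (String target : targets) {            // read its links
                if (seen.add(target)) {                // skip pages we already know
                    toVisit.add(target);               // ...and load those pages later
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a.html", Arrays.asList("b.html", "c.html"));
        links.put("b.html", Arrays.asList("c.html", "d.html"));
        links.put("c.html", Arrays.asList("a.html"));
        System.out.println(crawl("a.html", links));
    }
}
```

The set of seen pages is what keeps the crawl from going in circles when pages link back to each other.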
Google

- How does it know which pages are relevant?
- PageRank algorithm
- A page with many links to it is likely to be authoritative
- Pages that an authoritative page links to are also likely to be authoritative
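The two ideas above can be tried out on a tiny made-up web. The following Java sketch is a textbook-style simplification, not Google's actual implementation: the four pages, the damping value 0.85, and the number of rounds are all assumptions chosen for illustration. It also assumes every page appears as a key in the map.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PageRankSketch {
    // links maps every page to the pages it links to.
    // Each round, a page splits its current score evenly among the pages it links to.
    static Map<String, Double> rank(Map<String, List<String>> links, int rounds, double damping) {
        int n = links.size();
        Map<String, Double> score = new HashMap<>();
        for (String page : links.keySet()) {
            score.put(page, 1.0 / n);                  // everyone starts equal
        }
        for (int r = 0; r < rounds; r++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) {
                next.put(page, (1 - damping) / n);     // small base score for every page
            }
            for (String page : links.keySet()) {
                List<String> out = links.get(page);
                if (out.isEmpty()) {
                    continue;                          // simplification: passes nothing on
                }
                double share = damping * score.get(page) / out.size();
                for (String target : out) {
                    next.merge(target, share, Double::sum);
                }
            }
            score = next;
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("home", Arrays.asList("about", "news"));
        links.put("about", Collections.singletonList("home"));
        links.put("news", Collections.singletonList("home"));
        links.put("blog", Collections.singletonList("home"));
        System.out.println(rank(links, 50, 0.85));
    }
}
```

In this made-up web, home has the most incoming links and ends up with the highest score. The pages about and news each have only one incoming link, but it comes from home, so they still score well above blog, which nobody links to.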
Lab 3
- To determine the "Google number" of a person, launch Google, type in
the name, hit Search, and keep looking at results until you have
information about the given person (and not just some other person with
the same name). The page number is the Google number.
- Type Cay Horstmann into Google. Whoa! Google number 1
- Try David Taylor (That would be me.) What is my Google number?
- Why don't I have Google number 1?
- Try David Scot Taylor. (There is a reason that my middle name has an unusual spelling.)
HTTP

- Browser uses very simple protocol (HTTP = Hypertext transfer protocol)
- Connects to web server (e.g. horstmann.com)
- Sends request: GET /index.html
- Gets HTML page
- Disconnects from web server
- Stateless
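The connect / request / response / disconnect cycle above can be sketched in Java with a plain socket. This is a rough sketch under assumptions: the slide shows only GET /index.html, and the HTTP/1.0 version tag, the Host line, and port 80 are added here because a real request needs them. The class name HttpGetSketch is made up, and a present-day server may answer with a redirect rather than the page itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HttpGetSketch {
    // Builds the text the browser sends. The blank line at the end marks the end of the request.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    // Connects, sends the request, reads the reply until the server hangs up, disconnects.
    static String fetch(String host, String path) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                reply.write(buffer, 0, count);
            }
            return new String(reply.toByteArray(), StandardCharsets.ISO_8859_1);
        } // the socket closes here: the server keeps no memory of this visit
    }

    public static void main(String[] args) {
        // Prints the request text only; call fetch(...) to try it against a real server.
        System.out.print(buildRequest("horstmann.com", "/index.html"));
    }
}
```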
Shopping Carts

- Stateless protocol makes it hard to implement shopping cart
- You tell browser to visit store page
- Browser connects, gets HTML, disconnects, shows page
- You click on "add to cart" link
- Browser connects again. Server says “Hello, stranger!”
- How can server remember you?
Cookies

- Solution 1: Put funny numbers in links
- Solution 2: Use cookies
- You tell browser to visit store page
- Server says “Hello stranger, next time we meet tell me that the secret handshake is w00z1e”
- Browser gets HTML, disconnects, shows page
- You click on "add to cart" link
- Browser connects again and says “Hello server, the secret handshake is w00z1e. Give me this page.”
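The secret-handshake idea can be simulated without any networking. In the Java sketch below, each call to handle() stands for one separate connection, and the cookie is the only thing that ties two calls together. Everything here is made up for illustration (the class names, the "visitor-1" style tokens, the cart items); a real server would hand out tokens that are hard to guess.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CookieSketch {
    // A toy store server. Each call to handle() is one independent connection.
    static class StoreServer {
        private final Map<String, List<String>> carts = new HashMap<>();
        private int nextId = 1;

        // cookie is null when the browser has none (a stranger).
        // itemToAdd is null when the visitor is only looking at a page.
        // Returns the cookie the browser should send back next time.
        String handle(String cookie, String itemToAdd) {
            if (cookie == null || !carts.containsKey(cookie)) {
                cookie = "visitor-" + nextId++;        // "Hello stranger, remember this"
                carts.put(cookie, new ArrayList<>());
            }
            if (itemToAdd != null) {
                carts.get(cookie).add(itemToAdd);
            }
            return cookie;
        }

        List<String> cartFor(String cookie) {
            return carts.getOrDefault(cookie, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        StoreServer server = new StoreServer();
        String cookie = server.handle(null, null);     // first visit, no cookie yet
        server.handle(cookie, "textbook");             // later connections send it back
        server.handle(cookie, "rubber duck");
        System.out.println(cookie + " has " + server.cartFor(cookie));

        String stranger = server.handle(null, "mug");  // without the cookie: a new stranger
        System.out.println(stranger + " has " + server.cartFor(stranger));
    }
}
```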
Lab 4: Cookies

- In Mozilla select Edit -> Preferences -> Privacy -> Show cookies. (Firefox, IE, Safari, all should have a way to do this.)
- Find some ad-related cookie. (If you don't have one, visit a site with lots of ads, then try again.) Expand the tree display. Click on a "Cookie name" and read the "Content" below. What do you get?
- That content is sent back whenever you visit another site with ads from the same server.
Lab 4: Cookies

- Uncheck "Accept Cookies from Sites"
- Visit http://nytimes.com, click on some articles until you get a login
prompt.
- You do have an account with the New York Times, right? (If not, make
one now.)
- Log in.
- What happens? Why?
Reminders

- No class on Wednesday, November 25.
- Next class on Monday, November 30.
- December 2: Project presentations. All must be ready to present that day.
- December 7: Last day of classes, and any projects we don't get to on December 2.
- Homework 5 is graded; I will email you.
- Do you want one more homework assignment? A "make-up" homework?