Chris Pollett > Students > Smith




    [CS297 Proposal]

    [Creating Vectors]

    [Film Resources]

    [CS297 Write-Up - pdf]

    [Deliverable 1: Parser]

    [Deliverable 2: Liner]

    [Deliverable 3: Training Set]

    [Deliverable 4: Lister]

    [CS298 Report - pdf]

    [CS298 Code - zip]

    [CS298 Defense Slides - pptx]

Project Blog:

May 20th, 2016: Held defense from 2pm to 3pm in MacQuarrie Hall Room 225, SJSU

Spring 2016 Semester CS 298

Fall 2015 Semester CS 297

Blog for week of May 10, 2016

Met with Dr. Chris Pollett at 12:30 on Tuesday, May 10, 2016 to discuss thesis project.

Note: this was the final official meeting.

Progress for previous week of May 3 through May 10

  • Submitted form to DeAnna. Scheduled defense for Friday, May 20th at 2pm in MH 225
  • Updated algorithm to calculate each Target Feature separately
  • Added new non-pure features
    • dialogueCountInShot
    • uniqueDialgoueCountInShot
    • actionCountInShot
    • lineCountInShot
  • Completed first pass at slides (currently 40 pages)

Recommended changes to slides

  • "i.e." should be followed by a comma
  • make sure bullet lines consistently either all end with a period or all omit it
  • Conclusion: produce output
  • Change "Thesis Project" to "Writing Project"

German Expressionism: The Cabinet of Dr. Caligari, Prince Achmed

For the webpage

  • Link directly to the CS297 write-up pdf
  • Add write-up for CS298
  • Add zip of all final code

Blog for week of May 3, 2016

Met with Dr. Chris Pollett at 12:30 on Tuesday, May 3, 2016 to discuss thesis project.

Progress for previous week of April 26 through May 3

  • Began scheduling defense
    • Contacted Dr. Moh and Dr. Pearce to get possible defense times.
    • Spoke to Pearce in person during his Tuesday office hours at 11:30am
    • Moh is currently available MWF afternoons
    • Pearce is mostly open during the week but prefers Friday afternoon
  • Wrote HumanJudge program
    • takes different linings of the same script and outputs an easy-to-judge CSV
  • Conducted experiment with HumanJudge output on May 1st.
    • received results from over a dozen people
  • Completed draft of paper
    • submitted to
    • sent the draft to Pearce, Moh and Pollett

This week

  • Make PowerPoint slides (35 minutes/pages)
  • Get form from Deanna to schedule defense

Fun look-up: Pollett's senior year undergrad "Soot and Sediment"

Blog for week of April 26, 2016

Met with Dr. Chris Pollett at 12:30 on Tuesday, April 26, 2016 to discuss thesis project.

Progress for previous week of April 19 through April 26

  • Updated lister tool
    • Can easily select script file
    • Can now select output file
    • Can now select vector file
    • Added option to automatically open the output in the Liner tool
  • Updated Vector populator with option to open Lister tool
  • Updated comparer tool
  • Received completed lined script of Reservoir Dogs as training set
  • Wrote more of the paper, updated sections
    • Added section about what came before. (literature review?)
    • More in depth about the feature settings
    • Added section about experiments
    • Added conclusion

Discussed finishing the paper

Wrap up the paper and get it to the committee by next Tuesday, May 3, 2016. Send paper to

Look at FAQ on Dept. website for instructions to use Turnitin

Use Pollett's email address if it says email graduate coordinator

Reference our textbook for Naive Bayes

Write program to generate experiment samples

  • can load as many files as desired
  • will randomly select the same 100 lines from each and output them in an easily readable format
  • try showing the script text once, with each different result next to it
  • CSV would be good if possible
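A minimal Java sketch of such a sampler, assuming each lining is already available as one label string per script line. The class and method names (`ExperimentSampler`, `toCsv`) are hypothetical, not the actual HumanJudge code. A fixed random seed makes every lining contribute the same sampled line numbers, and each CSV row shows the script text once with every lining's label beside it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical HumanJudge-style sampler: same random lines from every lining, as CSV. */
final class ExperimentSampler {

    /** Quotes one CSV field, doubling embedded quotes (RFC 4180 style). */
    static String csv(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Picks up to sampleSize distinct line indices in [0, lineCount), in script order. */
    static List<Integer> sampleIndices(int lineCount, int sampleSize, long seed) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < lineCount; i++) {
            all.add(i);
        }
        Collections.shuffle(all, new Random(seed));
        List<Integer> picked = new ArrayList<>(all.subList(0, Math.min(sampleSize, lineCount)));
        Collections.sort(picked);
        return picked;
    }

    /**
     * scriptLines: the script text, one entry per line.
     * names/linings: a column name and one label per script line for each lining
     * (every lining is assumed to have the same length as scriptLines).
     */
    static String toCsv(List<String> scriptLines, List<String> names,
                        List<List<String>> linings, int sampleSize, long seed) {
        StringBuilder out = new StringBuilder("line,text");
        for (String name : names) {
            out.append(',').append(csv(name));
        }
        out.append('\n');
        for (int i : sampleIndices(scriptLines.size(), sampleSize, seed)) {
            out.append(i + 1).append(',').append(csv(scriptLines.get(i)));
            for (List<String> lining : linings) {
                out.append(',').append(csv(lining.get(i)));
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Recording the seed alongside the experiment would let the same sample be regenerated later.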

This week:

  • get everyone to agree on a defense time
  • submit to Turnitin
  • give everyone the report
  • file paper at department with the date
  • Earliest date is the 17th (Tuesday) 10am to 4pm
  • Pollett has finals on the 18th and 19th

Defense is an hour (35 minutes presentation, 15 minutes questions, 10 minutes nervous pacing)

May 20th might be the last day? Final exams run through May 25th

Today's TED Talk: Different Kind of Data Visualization

  • Academy Awards from last 75 years

More about finishing up

  • A week after giving out the report, ask for feedback.
  • Create some slides for the report (35 minutes)
  • Committee may ask for revision
  • Report is posted at ScholarWorks
  • Deanna sends culminating report over to graduate admissions
  • Ping if I don't see anything after three or four weeks

Blog for week of April 19, 2016

Met with Dr. Chris Pollett at 12:30 on Tuesday, April 19, 2016 to discuss thesis project.

Progress for previous week of April 12 through April 19

  • Wrote initial rough draft of thesis paper, around 24 pages long

Movie Script Shot Liner, abbreviated MSSL or MoSSLi

More to add to Draft:

  • write final three sections
  • go back over notes and blogs to see if anything was missed or could be embellished to make the paper longer.
  • Add pictures of each tool at some point

Did I adequately describe the problem?

Still need to write a conclusion as well:

Probably want to sum up how cool the project is, but also still argue about whether this... It is a first effort at eliminating humans from movie making.

Pollett did a quick read over the draft.

Pollett's notes

  • Definitions should be bold.
  • Contractions should be spelled out.
  • Add colon for listing two uses of liner tool, and make a list.
  • Check if JSON should be capitalized?
  • Add flow chart for the order of all the pieces, i.e., from one tool to the next.
  • Gleam should be glean.
  • Check the spelling of "dependent".
  • Don't start sentences with "And ..." like on page 20.
  • Take out "a la" and replace with more obvious word choices.
  • Try to make the formulas prettier.

Still need experiment:

  • computer generated vs. human done
  • people who know movies and people who don't know movies?

still want to tune the features and weights

in paper, list out features

How to decide what features should be considered

  • if a feature's values are distributed too similarly across the options (like cut/no cut), disregard that feature
  • only keep features that seem to tell us something for each of the four target features, i.e., features that show some skew rather than, for example, a 50/50 split
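One way to make "too similar" concrete, sketched in Java under the assumption that a feature's counts are tabulated separately for each option of the target feature: compute the total variation distance between the feature's value distributions for every pair of options, and drop the feature when the largest distance falls below a chosen threshold. The metric, the threshold and the `FeatureSkew` name are illustrative assumptions, not the project's method:

```java
/**
 * Hypothetical feature filter. counts[option][value] holds how often the feature
 * had each value when the target feature took each option (e.g. option 0 = no cut,
 * option 1 = cut). All rows are assumed to have the same length.
 */
final class FeatureSkew {

    /** Largest total variation distance between any two options' value distributions. */
    static double maxTotalVariation(int[][] counts) {
        double[][] dist = new double[counts.length][];
        for (int o = 0; o < counts.length; o++) {
            double total = 0;
            for (int c : counts[o]) {
                total += c;
            }
            dist[o] = new double[counts[o].length];
            for (int v = 0; v < counts[o].length; v++) {
                dist[o][v] = total == 0 ? 0 : counts[o][v] / total;
            }
        }
        double max = 0;
        for (int a = 0; a < dist.length; a++) {
            for (int b = a + 1; b < dist.length; b++) {
                double tv = 0;
                for (int v = 0; v < dist[a].length; v++) {
                    tv += Math.abs(dist[a][v] - dist[b][v]);
                }
                max = Math.max(max, tv / 2);
            }
        }
        return max;
    }

    /** Keeps the feature only if some pair of options differs by at least threshold. */
    static boolean isInformative(int[][] counts, double threshold) {
        return maxTotalVariation(counts) >= threshold;
    }
}
```

A 50/50 feature whose values look the same under cut and no cut scores 0 and is dropped; a strongly skewed one scores close to 1.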

add Javadoc to code

write a ten-page paper that can be submitted somewhere (for publication)

add literature review (what people have tried before)

  • find a first paper online and use it to find more
  • CiteSeer, Google Scholar, MS Bing Scholar, archive

Blog for week of April 12, 2016

Met with Dr. Chris Pollett at 12:30 on Tuesday, April 12, 2016 to discuss thesis project.

Progress for previous week of April 5 through April 12

  • Created FeatureSettings
    • Can customize which features are used for which target features
    • Can set the weight for each feature within each target feature
    • Feature settings can be saved to file and loaded when using the Lister
  • Updated the Liner tool
    • Now uses more convenient file system for starting new projects and loading already lined scripts
    • Unknown objects are now ignored in the liner tool on each line
  • Updated comparer tool
    • Created convenient file loader to load the files to compare
    • Updated comparer to check all four of the current Target Features
    • Created a window for displaying the stats in an easier to read format
  • Father lined a copy of Slingblade without looking at the movie to use as a control sample
  • Began to write 50-page thesis paper
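The FeatureSettings idea can be sketched in Java as a map from each target feature to per-feature weights, persisted with `java.util.Properties`. This is a hypothetical layout: the class name, the `target.feature=weight` key format, and treating a weight of 0 as "not used" are all assumptions, not the project's actual file format:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical per-target feature weights; weight 0 means the feature is unused. */
final class FeatureSettingsSketch {
    private final Map<String, Map<String, Double>> weights = new HashMap<>();

    void setWeight(String target, String feature, double weight) {
        weights.computeIfAbsent(target, t -> new HashMap<>()).put(feature, weight);
    }

    /** Features never set for a target default to weight 0 (an assumption). */
    double weight(String target, String feature) {
        return weights.getOrDefault(target, Map.of()).getOrDefault(feature, 0.0);
    }

    boolean isUsed(String target, String feature) {
        return weight(target, feature) != 0.0;
    }

    /** Writes lines like "NewShot.linesSinceCut=1.5" (target names must not contain '.'). */
    void save(Writer out) throws IOException {
        Properties props = new Properties();
        weights.forEach((target, byFeature) ->
                byFeature.forEach((feature, w) ->
                        props.setProperty(target + "." + feature, Double.toString(w))));
        props.store(out, "feature settings");
    }

    static FeatureSettingsSketch load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        FeatureSettingsSketch settings = new FeatureSettingsSketch();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            settings.setWeight(key.substring(0, dot), key.substring(dot + 1),
                    Double.parseDouble(props.getProperty(key)));
        }
        return settings;
    }
}
```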

Discussion of rating the output

  • Consider new metric
  • apply to either human or computer (judging bad shot)
  • determine what is reasonable?

Researched cases of one script being used for more than one movie and found "The Chair" (Starz): one script, two directors

New idea: show two marked-up scripts, about a hundred lines from each, and ask which looks better: one from a human, one from the Lister AI

Figure out why the output uses tilt so much (add a Tilt feature); add a maxLines feature (maximum number of times a feature is seen in a real script)

Pollett likes idea of predicting good trailers from the script. ;)

Thesis Paper

  • Introduction to the problem
  • brief summary of results found
  • preliminaries
    • what the reader needs to understand before understanding what follows
    • give background in movie making
  • conclusion final results about problem
    Blog for week of April 5, 2016

    Met with Dr. Chris Pollett at 12:30 on Tuesday, April 5, 2016 to discuss thesis project.

    Progress for previous week of March 29 through April 5

    • Completed Lining of script for Slingblade
    • Father completed lining of script for The Verdict
    • Updated Vector populator tool to make submission of training sets easier
    • Added new features to Naive Bayes algorithm within lister tool

    Catch-up Discussion for the Week

    Pollett discussed his use of the Liner tool

    • Using MPEG Streamclip for watching movies
    • Mentioned Vertov, a silent film director
    • Requested Liner tool not show Unknown objects

    For final movie, pick a movie with same director as another already lined script for comparison purposes

    Thesis Paper

    Important considerations for Thesis Paper

    • Need to get to committee members by beginning of May
    • Needs to be 50 pages
    • Review other people's papers
    • Will want to add interesting statistics about different features and training sets
    • Write up experiments:
      • measure how important different factors are
      • see which factors are most important
      • experiment with different features for each targetFeature
      • all features but one, and see which works best
      • add different weights for each feature in multiplication
    • Write up liner tool
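    Per-feature weights "in multiplication" can be read as raising each likelihood P(x_i | c) to a weight w_i, which in log space becomes score(c) = log P(c) + sum over i of w_i log P(x_i | c). The Java sketch below assumes the probabilities were already estimated and smoothed so none is zero; the `WeightedNaiveBayes` name is hypothetical. Setting one weight to 0 drops that feature, which is one way to run the "all features but one" experiments:

```java
/**
 * Hypothetical weighted naive Bayes scorer:
 * score(c) = log P(c) + sum over i of weight[i] * log P(x_i | c).
 * Probabilities are assumed to be estimated and smoothed elsewhere (no zeros).
 */
final class WeightedNaiveBayes {

    /**
     * prior[c]         = P(c) for each target value c (e.g. 0 = no cut, 1 = cut)
     * likelihood[i][c] = P(observed value of feature i | c)
     * weight[i]        = multiplier for feature i; 0 skips the feature entirely
     * Returns the index of the highest-scoring target value.
     */
    static int classify(double[] prior, double[][] likelihood, double[] weight) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < prior.length; c++) {
            double score = Math.log(prior[c]);
            for (int i = 0; i < likelihood.length; i++) {
                if (weight[i] == 0.0) {
                    continue; // skipped feature: contributes nothing
                }
                score += weight[i] * Math.log(likelihood[i][c]);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

    Working in log space avoids underflow when many small probabilities are multiplied over a long feature list.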

    For Next Week

    Things to work on for the upcoming week:

    • update liner tool to hide Unknown objects
    • a long scroll at the page bottom should advance the page, and likewise scrolling up
    • line a script without watching the movie (x2)
    • start writing up report
    • create experiment program with different features/weights
    • investigate a different approach besides Naive Bayes
    • look into comparing other target features

    Blog for week of March 29, 2016

    No meeting due to Spring Break

    Progress from March 22 through March 29

    • Completed lining script for Pulp Fiction
    • Father finished lining script for Unbreakable
    • Father began lining script for The Verdict
    • Girlfriend made good progress lining script for Notting Hill
    • Sent Liner tool and script for Moonstruck to Dr. Pollett

    Current Training Sets:

    • Ghostbusters
    • Star Wars
    • The Breakfast Club
    • Badlands
    • Charade
    • Unbreakable
    • Pulp Fiction
    • The Verdict (In progress - father)
    • Notting Hill (In progress - girlfriend)
    • Moonstruck (In progress - Dr. Chris Pollett?)

    Blog for week of March 22, 2016

    Not able to meet with Dr. Chris Pollett at 12:30 on Tuesday, March 22, 2016 due to work obligations.

    Progress from March 15 through March 22:

    • Began lining script for Pulp Fiction
    • Father finished lining script for Charade
    • Father began lining script for Unbreakable
    • Trained brother on Liner tool and sent him program and script selection

    Working on the Lister Tool

    Made new progress on the implementation of Naive Bayes

    • Created idea of pure and non-pure features
      • Non-pure features are dependent on data that has been calculated on earlier lines of the script
        • example: all target features are non-pure features
      • Pure features are independent
    • Added concept of skipFeatures for each target feature
      • skipped features won't be used in the Naive Bayes calculation of a target feature
    • Implemented new features, both pure and non-pure

    The Naive Bayes implementation currently covers the following Target Features

    • NewShot (also known as Cut)
    • ShotType (e.g., Close Up, Medium Shot)
    • CleanType (e.g., Single, Two Shot)
    • Motion (e.g., Static, Pan, Dolly)

    At this time, the following Features have been implemented

    • Non-pure
      • shotType (also a target Feature, ignores shotType, motionType, cleanType)
      • motionType (also a target Feature, ignores motionType, cleanType)
      • cleanType (also a target Feature, ignores cleanType)
      • linesSinceCut (dependent on last calculated cut)
      • linesSinceShotTypeChange (dependent on previous calculations of shot type)
      • lastShotType (dependent on calculated shot type for previous line)
    • Pure
      • sceneObjectType (the type of scene object a line is, e.g., dialogue, action)
      • linesSinceSceneObjectTypeChange (line count since the object type changed)
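    The pure/non-pure split means a script has to be lined one line at a time: a non-pure feature such as linesSinceCut can only be computed once the cut decisions for earlier lines have been made, while a pure feature such as sceneObjectType comes straight from the text. A hypothetical Java sketch with the decision rule passed in (the `SequentialLiner` and `CutRule` names are assumptions; in practice the rule would be the Naive Bayes classifier):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of lining a script in order so non-pure features can be computed. */
final class SequentialLiner {

    /** Decision rule for one line; stands in for the real Naive Bayes classifier. */
    interface CutRule {
        boolean cut(String sceneObjectType, int linesSinceCut);
    }

    /** sceneObjectTypes: the pure feature for each line, e.g. "dialogue" or "action". */
    static List<Boolean> line(List<String> sceneObjectTypes, CutRule rule) {
        List<Boolean> cuts = new ArrayList<>();
        int linesSinceCut = 0; // non-pure: depends on cuts predicted for earlier lines
        for (String type : sceneObjectTypes) {
            boolean cut = rule.cut(type, linesSinceCut);
            cuts.add(cut);
            // Earlier lines since the most recent cut (0 on the line right after a cut).
            linesSinceCut = cut ? 0 : linesSinceCut + 1;
        }
        return cuts;
    }
}
```

    Because linesSinceCut is fed by the algorithm's own earlier output, a wrong cut early on shifts every later value of that feature.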

    Many more features, both pure and non-pure, are in the works

    The biggest jump in correctness of the algorithm came with adding skip features to the target features. Before that, features were feeding back on themselves skewing the data. Data should not be used to determine itself. After that, the output became much more reasonable, even bordering on natural. With the inclusion of more features, the data should become even more realistic as well as interesting.

    Comparer Tool

    Wrote a simple tool which takes two lined scripts as its input. Presumably they are the same script: one is a training set, lined by hand while watching the corresponding movie, and the other is the same script lined by the Naive Bayes algorithm. The comparer measures how similar or different they are. The current implementation only looks at the cuts, i.e., the NewShot data. It uses an edit difference algorithm: a high number means lots of difference and a low number means little. There is no absolute scale; you can only see which comparisons have lower numbers.

    One of the ultimate goals is to have both a human (without watching the movie) and the algorithm line the same script. If the algorithm scores as low as or lower than the human, the algorithm can be considered good.

    Interestingly, even using the training set to create the vector for the same movie that is being lined currently yields a fairly high number. This should drop as more features are implemented.

    Blog for week of March 15, 2016

    Met with Dr. Chris Pollett at 12:30 on Tuesday, March 15, 2016 to discuss thesis project.

    Last week's progress:

    • lined Badlands script
    • added feature requests to the Liner tool, such as line highlighting and faster scroll wheel
    • taught girlfriend to use liner tool and started her on Notting Hill script
    • reworked vectors for Naive Bayes and got implementation working with simple feature set

    Naive Bayes

    Discussed pure features:

    • how many characters in the scene
    • length of scene
    • length of dialogue block
    • length of action block
    • two people talking in a scene alternating back and forth?

    What would be text to indicate a shot type?

    • For example, key words that are highly likely to indicate an establishing shot


    • Find 30 words associated with cuts and use them as features
    • Find their baseline frequency:
      • (frequency of word / length of script) vs. (frequency of word just before or after a cut / length of text just before or after a cut)
      • find words where the gap between these is highest
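    A Java sketch of the word-gap idea, assuming each script line comes with a cut flag and treating the cut line plus the line just before it as "near the cut". That window, the simple letters-only tokenizer and the `CutWords` name are assumptions. Words are ranked by (near-cut count / near-cut word total) minus (overall count / script word total):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: ranks words by how much more often they occur near cuts. */
final class CutWords {

    /** lines: script text per line; cuts: whether each line starts a new shot. */
    static List<String> topWords(List<String> lines, List<Boolean> cuts, int k) {
        Map<String, Integer> all = new HashMap<>();
        Map<String, Integer> near = new HashMap<>();
        int allTotal = 0;
        int nearTotal = 0;
        for (int i = 0; i < lines.size(); i++) {
            // Assumed window: the cut line itself and the line just before it.
            boolean nearCut = cuts.get(i) || (i + 1 < lines.size() && cuts.get(i + 1));
            for (String w : lines.get(i).toLowerCase().split("[^a-z']+")) {
                if (w.isEmpty()) {
                    continue;
                }
                all.merge(w, 1, Integer::sum);
                allTotal++;
                if (nearCut) {
                    near.merge(w, 1, Integer::sum);
                    nearTotal++;
                }
            }
        }
        final double allN = Math.max(allTotal, 1);
        final double nearN = Math.max(nearTotal, 1);
        List<String> words = new ArrayList<>(all.keySet());
        // Largest gap first: near-cut frequency minus baseline frequency.
        words.sort((a, b) -> Double.compare(
                gap(b, near, all, nearN, allN), gap(a, near, all, nearN, allN)));
        return new ArrayList<>(words.subList(0, Math.min(k, words.size())));
    }

    private static double gap(String w, Map<String, Integer> near, Map<String, Integer> all,
                              double nearN, double allN) {
        return near.getOrDefault(w, 0) / nearN - all.get(w) / allN;
    }
}
```

    Very common words such as "the" tend to have small gaps, so in practice stop words would likely fall out of the top 30 on their own.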

    Possible variant on naive bayes for cut:

    • Instead of comparing cut vs no cut, compare cut versus baseline average for cut

    It was reiterated to try the lister tool on The Player and Rope because they have some long takes.

    Calculating how Good/Bad the Lister Tool is doing

    Edit Distance

    • (Sum over computer cuts of the distance to the nearest real cut + sum over real cuts of the distance to the nearest computer cut) / 2
    • How to normalize?
      • For now, at least compare results against each other.
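    The distance above could be computed in Java as follows, assuming cuts are given as script line numbers and both lists are non-empty (the `CutDistance` class is a hypothetical sketch). Sorting each list lets a binary search find the nearest cut:

```java
import java.util.Arrays;

/** Hypothetical sketch of the cut distance between a computer lining and a real one. */
final class CutDistance {

    /** Distance from line to the nearest entry of sortedCuts (assumed non-empty). */
    static int nearest(int[] sortedCuts, int line) {
        int i = Arrays.binarySearch(sortedCuts, line);
        if (i >= 0) {
            return 0; // exact match
        }
        int insert = -i - 1; // index of the first cut after line
        int best = Integer.MAX_VALUE;
        if (insert < sortedCuts.length) {
            best = sortedCuts[insert] - line;
        }
        if (insert > 0) {
            best = Math.min(best, line - sortedCuts[insert - 1]);
        }
        return best;
    }

    /** (sum of nearest-real distances for computer cuts + sum of nearest-computer distances for real cuts) / 2 */
    static double distance(int[] computerCuts, int[] realCuts) {
        int[] comp = computerCuts.clone();
        int[] real = realCuts.clone();
        Arrays.sort(comp);
        Arrays.sort(real);
        long total = 0;
        for (int c : comp) {
            total += nearest(real, c);
        }
        for (int r : real) {
            total += nearest(comp, r);
        }
        return total / 2.0;
    }
}
```

    For example, computer cuts at lines 0, 10 and 20 against real cuts at 0 and 12 contribute 0 + 2 + 8 and 0 + 2, giving (10 + 2) / 2 = 6.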

    Will want to compare a human to the computer and see which is closer to the real lining

    • Human will create a shot list from the script without watching the movie (will ideally have never even seen the movie)
    • Lister tool will create a shot list from an unlined script
    • Compare each against the real lining; if the computer's edit distance is no worse than the human's, then the results are considered pretty good

    For next week...

    continue making sure cut works in the Naive Bayes implementation and then make sure shot type works

    Blog for week of March 8, 2016

    Met with Dr. Chris Pollett at 12:30 on Tuesday, March 8, 2016 to discuss thesis project.

    Last week's progress:

    • finished lining Star Wars
    • added minimal help screen to liner tool
    • trained father to use liner tool (who started lining Charade)
    • made notes of fixes and new features for the liner, some noticed by me and some requested by father
    • fixed object insert into liner, and removed unneeded parsing code
    • lined The Breakfast Club script

    Current training set/lined scripts:

    • Ghostbusters
    • Star Wars
    • The Breakfast Club
    • Charade (in progress)

    Training Algorithms:

    • Naive Bayes
      • Choose variables
      • Calculate the probability of a cut given each variable
    • k-means clustering
      • nearest neighbor
      • long vectors concatenation

    Next Week Goal:

    • Line two more scripts
    • Get Naive Bayes implementation working
    • Find features to compare

    Goal for March:

    • 25 scripts lined
    • get something other than Naive Bayes working? (to compare to Naive Bayes)

    Blog for week of March 1, 2016

    Met with Dr. Chris Pollett at 12:30 on Tuesday, March 1, 2016 to discuss thesis project.

    Script Lining

    Lined the script "Star Wars"

    While lining the script, added new features to speed up the process:

    • Ctrl-click line copies its settings to current line
    • Ctrl-click cut-box or Ctrl-F2, copies ending settings of shot before last
    • Alt-click cut-box or Alt-F2, copies ending settings of two shots back
    • Ctrl-click visible object turns off/on all other objects
    • Alt-click visible object turns off/on all objects on line
    • Added ability to add/remove an object from a specific scene in ObjectForm

    We discussed how slow lining is, and decided 25 lined scripts might be a more realistic number

    More AI implementations with Training Set

    For continued implementation of Naive Bayes, variables to use:

    • length of script
    • percentage through script

    For quick look-up of data:

    • R-Tree
    • BSP-Tree
    • brute-force linear search through the document
    • k-means clustering on the vectors
      • keep the clusters rather than the original data
      • the square root of the data set size is optimal: 100 for one script or 500 for 25 scripts
    • one algorithm makes use of all the data we have
      • vectors of last x lines
      • no averaging, just huge vectors
      • cut based on whether there's a cut of the last linear

    Blog for week of Feb 23, 2016

    Met with Dr. Chris Pollett at 12:30 on Tuesday, February 23, 2016 to discuss thesis project.

    Discussed the lining tool, which now has hot keys:

    • selected line (via arrow keys or mouse click)
    • page navigation via arrow keys, home, end, page up, page down
    • F2 to add/remove cut
    • F3/3 to scroll up and down shot types
    • F4/4 to scroll up and down clean types
    • F5/5 to scroll up and down motion types
    • F6 to F12 toggle visible for scene objects
    • Space Bar pauses/starts auto-advance of selected line

    For next week, use the new features to line at least one script

    Blog for week of Feb 16, 2016

    Met with Dr. Chris Pollett at 12:30 on Tuesday, February 16, 2016 to discuss thesis project.

    Basic minutes of meeting:

    • Presented list of 100 scripts that had been chosen
    • Discussed the creation of the training sets
    • We went over the Liner tool created the previous semester and discussed ways to improve it and make script lining faster

    An average movie is about two hours, so what is the best way to get the lining time down to not much longer than that?

    Ideas for the training sets:

    • Get the time down on lining a script. Simplify process.
    • Do a couple and get ideas about reducing time.
    • Create a ten minute video explaining how to line script
    • Create a webpage to put the scripts on:
      • Scripts can be downloaded
      • Completed scripts will be marked
      • Look up using .htaccess to password-protect a folder on the webpage

    Modifications for the liner tool:

    • Create a selected line feature
    • Add an auto-advance to selected line
    • Add hotkeys. For example, pressing F2 adds a cut on the selected line

    Blog for week of Feb 9, 2016

    Met with Dr. Chris Pollett at 12:00 on Tuesday, February 9, 2016 to discuss thesis project.

    We went over my project:

    • What the project is and scope
    • The items accomplished in the first semester
    • What still needed to be done to finish the project

    At this time, the most important thing to complete is collecting the training sets

    The goal for next week is to collect 100 movie scripts that can be read into the liner tool and lined

    A training set of 100 lined scripts would provide excellent statistics for implementing the shot lister tool

    Agreed to change the meeting time starting next week to 12:30-1:15

    Discussed a few administrative issues, in particular turning in the Candidacy form and the Graduation form

    Blog for week of Feb 2, 2016

    Met with Dr. Chris Pollett at 1:00 on Tuesday, February 2, 2016 to discuss thesis project.

    Mostly an administrative meeting. Met with all of Dr. Pollett's thesis students and decided on meeting times for the rest of the semester.

    My time was picked to be 12:00-1:00 (later 12:30 - 1:15)

    Agreed that next week I would bring a write-up of the project done thus far and what was left.

    End of First Semester CS297

    Blog for week of Dec. 7, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, December 7, 2015 to discuss thesis project.

    Final Semester report and Deliverables Due: Friday, December 11, 2015

    Write-up should be submitted as a .pdf

    Each deliverable should be a zip of the program files with screenshots

    • Put the link to each zip on a webpage with deliverable description

    Make sure website conforms

    • W3C validator
    • WAVE Accessibility Checker

    Further discussed Training Sets for next Semester

    • Creed was mentioned for having a very long shot
    • Consider Amazon Mechanical Turk for getting scripts lined
    • Kickstarter was mentioned

    Improve the Liner Tool to speed up lining

    • Copy lines automatically
    • Add hot keys
    • Consider leveraging closed captioning and displaying the movie in Java

    Went over the draft of the CS297 write-up

    • Bold the definitions (e.g., cut, shot)
    • Change the last paragraph of the Intro to describe deliverables
    • JSON should be upper case
    • Replace personifications of the parser with more technical language
    • Remind reader why we're reading each section at the beginning

    Final Thoughts

    Make a webpage for each deliverable and put zip there

    Look at previous CS298 proposals and templates

    CS298 will have a 40-page paper to close it out

    Blog for week of Nov. 23, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, November 23, 2015 to discuss thesis project.

    Discussed the delivery of the deliverables

    Need a 10-page write-up to describe work on the project this semester

    • 1 page intro
    • 2 pages per deliverable
    • 1 page closing

    For each section/deliverable:

    • state the problem
    • how I coded it
    • what kind of results did I get

    It was recommended I rename the vector class and subclasses to be unique non-Java names

    Blog for week of Nov. 16, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, November 16, 2015 to discuss thesis project.

    Had an in depth discussion of how to store the counts for calculating the probabilities

    To cut or not to cut, the training set initially includes these four features:

    • number of lines since last cut
    • line type (i.e., scene header, dialogue, action, blank)
    • shot type
    • motion

    vector (for cut for example) has an array of features

    each feature has an array of values the feature could have

    each value keeps the count for each value of the original object (so cut or not here)

    • For example, count the number of times we were five lines since a cut when we had a cut, and when we didn't have a cut
    • This example is only 2 (cut or not cut), but it will be more when the original object can have more values

    Now we have something that looks like this: arraylist of features[featureNum][featureValues][featureCounts]

    We also need to keep track of the original object's base counts to get that probability

    Ultimately, we want to be able to get probabilities like Pr(Cut) and Pr(5LinesSinceCut | Cut)
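    The count layout above, sketched in Java with plain arrays in place of the ArrayList (the `CountVector` name and the fixed number of values per feature are assumptions). The probabilities are unsmoothed relative frequencies, so unseen combinations come out as 0:

```java
/**
 * Hypothetical count tables: counts[feature][value][target] is how often a feature
 * had a given value when the original object (e.g. 0 = no cut, 1 = cut) had a given value.
 */
final class CountVector {
    final int[][][] counts;
    final int[] targetCounts; // base counts of the original object
    int total;

    /** valuesPerFeature[f] = number of distinct values feature f can take. */
    CountVector(int[] valuesPerFeature, int targetValues) {
        counts = new int[valuesPerFeature.length][][];
        for (int f = 0; f < valuesPerFeature.length; f++) {
            counts[f] = new int[valuesPerFeature[f]][targetValues];
        }
        targetCounts = new int[targetValues];
    }

    /** Records one training line: the value of every feature plus the target value. */
    void add(int[] featureValues, int target) {
        for (int f = 0; f < featureValues.length; f++) {
            counts[f][featureValues[f]][target]++;
        }
        targetCounts[target]++;
        total++;
    }

    /** Pr(target), e.g. Pr(Cut). */
    double prior(int target) {
        return (double) targetCounts[target] / total;
    }

    /** Pr(feature = value | target), e.g. Pr(5LinesSinceCut | Cut). */
    double likelihood(int feature, int value, int target) {
        return (double) counts[feature][value][target] / targetCounts[target];
    }
}
```

    With linesSinceCut as feature 0 and target value 1 meaning Cut, `prior(1)` is Pr(Cut) and `likelihood(0, 5, 1)` is Pr(5LinesSinceCut | Cut).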

    Blog for week of Nov. 9, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, November 9, 2015 to discuss thesis project.

    Further discussion of the Liner Tool

    • Tab delimited for readability
    • Add a max width for the headers of the object section so they are all the same, e.g., seven characters wide
    • Add compression. Java has built-in zip support (java.util.zip)
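    A hypothetical Java sketch of these three suggestions together: tab-delimited rows, object headers padded or cut to seven characters, and compression with the built-in `java.util.zip` GZIP streams. Choosing GZIP over a ZIP archive, and the `LinedScriptFile` name, are assumptions:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical lined-script file helpers: fixed-width headers, TSV rows, GZIP. */
final class LinedScriptFile {

    /** Pads or truncates an object name to exactly seven characters. */
    static String header(String name) {
        String h = name.length() > 7 ? name.substring(0, 7) : name;
        return String.format("%-7s", h);
    }

    /** One header row of object names, then one tab-delimited row per script line. */
    static String toTsv(List<String> objectNames, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < objectNames.size(); i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(header(objectNames.get(i)));
        }
        sb.append('\n');
        for (List<String> row : rows) {
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }

    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the GZIP trailer before the bytes are read
        return bytes.toByteArray();
    }

    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```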

    Discussed initial implementation of Naive Bayes algorithm

    • Start with basic items and add more to vector later
    • Simple version would have four items
      • Lines since last cut
      • Cut or not
      • Shot type
      • Shot motion

    Further discussed coming up with list of movies

    • IMSDb
    • Ultimately, want 100 movies
    • Initial experiment with 5 movies

    Discussed deliverables. May include:

    • screenshot
    • zip of git repository or all required files
    • short description

    May want to password protect training sets

    Blog for week of Nov. 2, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, November 2, 2015 to discuss thesis project.

    Went over the liner program and discussed improvements.

    Blog for week of Oct. 26, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, October 26, 2015 to discuss thesis project.

    Discussed future goals

    • Finish shot liner program
    • Create webpage for crowdsourcing training set
    • Post on CS Facebook requesting crowdsourcing
    • Post printout on wall at school
    • Send e-mail with Bitbucket account
    • Collect a bunch of script txt
      • Pollett requests "Forbidden Planet" is included
      • Need a mix of genres
    • Look into public domain movies & scripts (for publishing purposes)
    • Bring AI book. Bring marker?
    • Look into Snigdha's work on Naive Bayes Classifier

    Went over Naive Bayes to get a better understanding

    Trigrams (sequences of three words) can be used as features for Naive Bayes

    More on the Liner program

    • Add a tool tip for acronyms
    • Add stop words like "the"
    • Add a tool tip for character/object names

    Blog for week of Oct. 19, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, October 19, 2015 to discuss thesis project.

    Note: Need to make sure HTML is validated

    For next week

    • Make list of shot types
    • Make fixed finite list for camera movements
    • Begin coding training program
      • Ultimately want it to produce the vectors

    Dr. Pollett went over the basics of vectors with me

    • an ordered list where all values are of the same type, e.g., integer, real
    • vector addition is commutative
    • vector addition is associative (as is scalar multiplication)
    • entries can be real numbers, integers, etc.
    • <0,5,-1> + <1,-2,3> = <1,3,2>
    • 2.2 <1,3,2> = <2.2,6.6,4.4>
    • a zero vector <0,0,0>
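    These operations are small enough to sketch directly in Java, using `double[]` as the vector type (the `Vec` class is illustrative only):

```java
/** Illustrative component-wise vector operations on double[]. */
final class Vec {

    /** Component-wise sum; both vectors must have the same length. */
    static double[] add(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double[] sum = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    /** Scalar multiple k * v. */
    static double[] scale(double k, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = k * v[i];
        }
        return out;
    }

    /** The zero vector of length n (Java zero-initializes new arrays). */
    static double[] zero(int n) {
        return new double[n];
    }
}
```

    Floating-point products such as 2.2 × 3 may not come out as exactly 6.6, so comparisons of scaled vectors should use a small tolerance.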

    Dr. Pollett recommended that I brush up on Linear Algebra a bit

    We will want a vector per script line, showing what is true in that line


    How would we train from the training sets?

    A few possibilities:

    • Naive Bayes
    • Neural Net
    • Decision Tree

    The easiest is probably Naive Bayes

    First Step: Make program for creating Training Sets

    Blog for week of Oct. 5, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, October 5, 2015 to discuss thesis project.

    Discussed the problem of when to cut and when not to cut. For each line of the script, this is either true or false.

    • Create a vector of what things are true in this scene
    • use probabilities to decide when to cut

    Need to create list of shot kinds

    • e.g., WS, MS, CU

    Need to create list of reasons for each shot kind

    Useful information to gather:

    • Count total number of scenes
    • Count number of lines per scene
    • gather roughly how many minutes long the movie and each scene are
    • Age of script
      • e.g., Duck Soup or Modern Times may cut way less often because of age
    • Black and white or color
    • Genre
      • music video, action, horror, adventure may cut a lot
        • "Hey Ya! ... shake it like a Polaroid"
      • comedy, drama, period piece may cut very little
    • Get an adjective counter for gushing

    More on special words

    • Check if word is all uppercase
    • Check if a word starts with a capital letter (other than the first word in a sentence)
    • Want to try to identify special word as character, object or sound
    • A special word that is later capitalized may be a character or place
    • Might create special lists for common names, sounds, etc.
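    The first two checks can be sketched in Java as below. Treating a word as ending a sentence when it ends in '.', '!' or '?' is a simplifying assumption (it misreads abbreviations like "Dr."), and `SpecialWords` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for spotting "special" words in script text. */
final class SpecialWords {

    /** True if the word has at least one letter and no lowercase letters, e.g. "BANG!". */
    static boolean isAllUppercase(String word) {
        boolean hasLetter = false;
        for (char c : word.toCharArray()) {
            if (Character.isLowerCase(c)) {
                return false;
            }
            if (Character.isLetter(c)) {
                hasLetter = true;
            }
        }
        return hasLetter;
    }

    /**
     * Indices of words that start with a capital letter but do not begin a sentence.
     * A sentence is assumed to end at a word ending in '.', '!' or '?'.
     */
    static List<Integer> midSentenceCapitals(String text) {
        List<Integer> result = new ArrayList<>();
        String[] words = text.trim().split("\\s+");
        boolean sentenceStart = true;
        for (int i = 0; i < words.length; i++) {
            String w = words[i];
            if (!sentenceStart && !w.isEmpty() && Character.isUpperCase(w.charAt(0))) {
                result.add(i);
            }
            sentenceStart = w.endsWith(".") || w.endsWith("!") || w.endsWith("?");
        }
        return result;
    }
}
```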

    The Vectors

    Will have Vectors with labels
    Need to come up with at least two vectors, one for cut, and one for type.

    • Cut Vector produces a value of either 0 or 1
    • Type Vector produces a value from 0 to 10 or whatever number of shot types
    • Will probably need a Moving Vector too, as a subtype of Shot type?
    Calculate vectors from training sets.
    Each element of the vector is based on the list I've come up with for when to cut, and what type of shot

    Need to create list for determining shot

    Need to figure out how to quantify values of both lists.

    Make numbers out of each item

    More on creating the training sets

    Create a program which allows line by line quickly identifying key features of script:

    • Cut or not cut
    • Type of shot
    • Any movement


    • XML document
    • vectors according to line numbers of XML

    Program will probably be easiest to write in JavaScript?

    Blog for week of Sept 28, 2015

    Met with Dr. Chris Pollett at 12:30 on Monday, September 28, 2015 to discuss thesis project.

    Side note: Pollett recommended I look up Star Wars in ascii for fun

    More on DTD

    • Use Oxygen for DTD validation
    • Discussed Entities as part of DTD
      • Probably not needed for my project.

    Classification from dialogue

    • Discussed how a particular shot could be classified by analyzing the dialogue and action of a shot.
      • Assume markers are already in place denoting a shot's beginning and end.
      • make vector from text
      • feature set of 100 words or more
      • normalized vector out of shot. figure out important words
      • use the vector to label the type of shot
      • maybe use something like naive bayes
      • this could at the very least figure out attributes of shot
    • How to figure out where shots start and stop

    • What features are important for making a shot?
      • Why did they cut?
      • Automate process of detecting those features
      • solve the question: start
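    The dialogue-classification idea above (a normalized word vector per shot) might look like this in Java, assuming a fixed vocabulary of feature words and L2 (unit-length) normalization. Both choices and the `ShotVector` name are assumptions rather than the project's design:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bag-of-words vector for the text of one shot. */
final class ShotVector {

    /** Counts vocabulary words in shotText, then scales the counts to unit (L2) length. */
    static double[] fromText(String shotText, List<String> vocabulary) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < vocabulary.size(); i++) {
            index.put(vocabulary.get(i), i);
        }
        double[] v = new double[vocabulary.size()];
        for (String w : shotText.toLowerCase().split("[^a-z']+")) {
            Integer slot = index.get(w);
            if (slot != null) {
                v[slot]++;
            }
        }
        double norm = 0;
        for (double x : v) {
            norm += x * x;
        }
        norm = Math.sqrt(norm);
        if (norm > 0) {
            for (int i = 0; i < v.length; i++) {
                v[i] /= norm;
            }
        }
        return v;
    }
}
```

    Normalizing removes the effect of shot length, so a long action block and a short one with the same word mix map to the same vector.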