About

What is this?
I have a significant interest in film. As with any art form, film is derivative: directors and screenwriters are undoubtedly inspired by the works that came before them. Cinetrii is my attempt at finding connections between films (such as recurring themes, plots or motifs) algorithmically. The algorithms take written reviews by critics and apply a number of natural language processing techniques to establish and rank the connections. The results are presented as a tree graph. I've been tuning the algorithm in my free time, and it is still far from perfect. If you have any pointers on the challenges of developing Cinetrii (described below), feel free to contact me at everling@kth.se.
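The page doesn't say how the tree itself is assembled from the ranked connections. As an illustration only, here is a minimal Java sketch that expands the queried film breadth-first, keeps each node's top-scoring connections, and places every film in the tree at most once. The `Connection` record, the `ConnectionTree.build` method and its `maxDepth`, `maxChildren` and `minScore` parameters are all hypothetical, not taken from Cinetrii.

```java
import java.util.*;

// Hypothetical: a ranked link from one film to a film that critics compared it to.
record Connection(String target, double score) {}

final class ConnectionTree {
    /**
     * Expands the queried film breadth-first and returns a parent -> children map.
     * Each film appears at most once (at its shallowest depth), and each node keeps
     * at most maxChildren connections whose score is at least minScore.
     */
    static Map<String, List<String>> build(String root,
                                           Map<String, List<Connection>> graph,
                                           int maxDepth, int maxChildren, double minScore) {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>(List.of(root));
        List<String> frontier = List.of(root);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String film : frontier) {
                List<String> children = graph.getOrDefault(film, List.of()).stream()
                        .filter(c -> c.score() >= minScore && !seen.contains(c.target()))
                        .sorted(Comparator.comparingDouble(Connection::score).reversed())
                        .map(Connection::target)
                        .distinct()
                        .limit(maxChildren)
                        .toList();
                seen.addAll(children);   // claim these films so later nodes can't re-add them
                next.addAll(children);
                tree.put(film, children);
            }
            frontier = next;
        }
        return tree;
    }
}
```

Tracking `seen` films is what turns a possibly cyclic web of references into a tree: if two films reference each other, the second reference is simply dropped.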

Who is it for?
Myself and anyone who is interested. One could consider Cinetrii an educational tool for those wishing to discover the progenitors of contemporary films (and, conversely, the descendants of old classics). For the open-minded, it can serve as an alternative way to discover movies based on one's own favorites. Whereas the movie recommendations on other websites are governed by common denominators such as lead actor or director, Cinetrii often shows something else entirely. Take, for example, 2014's Nightcrawler, a character piece about an entrepreneurial and sociopathic video journalist:

Nightcrawler on IMDb

IMDb recommends several movies with Jake Gyllenhaal as the lead, and a few movies from the same year.

Nightcrawler on Cinetrii
Cinetrii, on the other hand, recommends movies with similar subject matter.
Comparisons have been made between Gyllenhaal's character and Travis Bickle (Taxi Driver) as well as Rupert Pupkin (The King of Comedy). Nightcrawler's commentary on modern journalism also echoes earlier efforts, such as Sidney Lumet's prescient Network from 1976. While Zodiac and Prisoners (both IMDb recommendations) are outstanding films, some of Cinetrii's suggestions make more sense as companion pieces.

The Neon Demon on Cinetrii
Variety.com notes the influences on Nicolas Winding Refn and The Neon Demon.

Challenges
Data
Because Cinetrii's NLP-driven approach depends on online reviews, it gives better results for newer movies (older movies sometimes have disappointingly sparse trees). There are more online news outlets now than in the early days of the internet, so newer movies naturally have more reviews available. To complicate matters, news outlets often take older content offline, leaving dead links. The Wayback Machine and its API have been a great help in mitigating this. For pre-internet movies, the available reviews consist mostly of archived print articles and reviews written to accompany home video releases. Further skewing the results towards newer movies, I think critics are more inclined to infer an artistic lineage when reviewing a contemporary work. Retrospective criticism instead tends to measure a movie's cultural impact and to examine how it differed from other movies of its time.
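For the dead-link problem, the Wayback Machine's availability API (`https://archive.org/wayback/available`) takes a `url` and an optional `timestamp` and returns JSON describing the closest archived capture. The sketch below only shows how such a lookup could be made with the JDK's `java.net.http` client; it is not necessarily how Cinetrii does it. The `WaybackLookup` class and its method names are hypothetical, and JSON parsing is left out rather than assume a particular library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class WaybackLookup {
    private static final String AVAILABILITY_API = "https://archive.org/wayback/available";

    /** Builds an availability query; timestamp is 1 to 14 digits of YYYYMMDDhhmmss. */
    static String availabilityQuery(String reviewUrl, String timestamp) {
        return AVAILABILITY_API
                + "?url=" + URLEncoder.encode(reviewUrl, StandardCharsets.UTF_8)
                + "&timestamp=" + timestamp;
    }

    /** Fetches the raw JSON reply; JSON parsing is deliberately left out of this sketch. */
    static String fetchAvailability(String reviewUrl, String timestamp)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(availabilityQuery(reviewUrl, timestamp)))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Passing a timestamp close to a review's publication date asks for the capture nearest to that date; in the reply, `archived_snapshots.closest.url` holds that capture's address when one exists.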

Movie/Review graph

Scoring
The algorithm that assigns a score to individual connections can be thought of as a ranking algorithm for a search engine. Many factors are involved, such as how many critics have made the same reference and whether a member of the referenced movie's production is mentioned. The difficulty lies in giving high scores only to legitimate connections. Inevitably, some completely spurious connections will be established:
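To make the ranking idea concrete, here is a toy scoring function built from the two factors named above: how many distinct critics made the reference, and whether any mention names a member of the referenced movie's production. The `Mention` record, the logarithmic damping and both weights are assumptions for illustration; Cinetrii's actual factors and weights aren't described here.

```java
import java.util.List;

// Hypothetical: one critic's mention of the referenced film in a review.
record Mention(String critic, boolean namesProductionMember) {}

final class ConnectionScorer {
    static final double CRITIC_WEIGHT = 1.0;     // hypothetical weight
    static final double PRODUCTION_BONUS = 0.5;  // hypothetical weight

    /**
     * Toy score: a damped count of distinct critics, plus a bonus if any mention
     * names a member of the referenced film's production.
     */
    static double score(List<Mention> mentions) {
        long critics = mentions.stream().map(Mention::critic).distinct().count();
        boolean production = mentions.stream().anyMatch(Mention::namesProductionMember);
        return CRITIC_WEIGHT * Math.log1p(critics) + (production ? PRODUCTION_BONUS : 0.0);
    }
}
```

`Math.log1p` gives diminishing returns, so the tenth critic adds less than the second, and counting distinct critics keeps a single reviewer who repeats a comparison from inflating the score.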

The Wizard of Oz

Transformers: Age of Extinction has very little to do with The Wizard of Oz.

In the future I will look into a more nuanced semantic analysis of the critics' quotes.

Document classification
Online news outlets have different conventions and formats for reviewing films, and this induces some errors in the current version of Cinetrii. More specifically, some online magazines allot an entire page to a single movie review (good) while others fit several reviews on the same page (not good). I currently pick out the text segments relevant to a given film by checking their cosine similarity to a vector of names from the film's production. This method works reasonably well but lets through a few false positives, and irrelevant text segments in turn cause irrelevant connections.
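The relevance check can be sketched as a bag-of-words cosine similarity between each text segment and the film's production names. This is a minimal version under my own assumptions (lowercased word tokens, raw term counts, a hand-picked threshold), and the `SegmentFilter` class and its method names are hypothetical.

```java
import java.util.*;

final class SegmentFilter {
    /** Lowercased word counts; splitting on non-letters is a simplifying assumption. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity of two sparse count vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue().doubleValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v.doubleValue() * v.doubleValue()).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v.doubleValue() * v.doubleValue()).sum());
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    /** Keeps the segments whose similarity to the production-name vector reaches the threshold. */
    static List<String> relevantSegments(List<String> segments, List<String> productionNames, double threshold) {
        Map<String, Integer> names = termCounts(String.join(" ", productionNames));
        return segments.stream()
                .filter(s -> cosine(termCounts(s), names) >= threshold)
                .toList();
    }
}
```

With raw counts, every unrelated word adds to a segment's norm, so long segments score lower than short ones; down-weighting common words (for example with TF-IDF) is one way to soften that.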

Polysemy
Reboots, remakes and book adaptations create ambiguity when gathering references from reviews. What the critic refers to will usually be obvious to the reader, but without prior knowledge it is difficult for software to tell. I have searched for a database of books to cover cases where a critic refers to a book (rather than, less likely, to its adaptation), but have not yet found one that is sufficient. As for remakes and reboots sharing the same title, when in doubt the algorithm selects the most popular one (based on the number of IMDb votes).
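The popularity fallback for same-titled remakes and reboots amounts to taking a maximum over vote counts among the matching candidates. In this sketch the `Candidate` record and `TitleResolver.resolve` are hypothetical, and the catalogue is assumed to hold an IMDb vote count for each film.

```java
import java.util.*;

// Hypothetical catalogue entry; imdbVotes stands in for the popularity signal.
record Candidate(String title, int year, long imdbVotes) {}

final class TitleResolver {
    /** Among catalogue entries sharing the referenced title, picks the one with the most IMDb votes. */
    static Optional<Candidate> resolve(String referencedTitle, List<Candidate> catalogue) {
        return catalogue.stream()
                .filter(c -> c.title().equalsIgnoreCase(referencedTitle))
                .max(Comparator.comparingLong(Candidate::imdbVotes));
    }
}
```

Because the catalogue contains only films, a reference to the book behind an adaptation can't be represented at all, which is the gap a book database would fill.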

Tech Stack
Cinetrii.com runs on an ordinary LAMP stack. The frontend draws on an HTML5 canvas using Fabric.js. The design is not mobile-friendly, so for the best experience use a desktop browser. The actual Cinetrii software is built in Java and Scala. Films that have been queried previously are stored and can be loaded immediately; new queries, however, take some time to process. Check out the already processed queries.
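Serving previously queried films immediately is essentially memoization keyed on the film. The sketch below uses an in-memory `ConcurrentHashMap` as a stand-in for Cinetrii's actual storage; `QueryCache` and the `pipeline` function are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class QueryCache<R> {
    private final ConcurrentHashMap<String, R> store = new ConcurrentHashMap<>();
    private final Function<String, R> pipeline; // hypothetical: the slow review-processing step

    QueryCache(Function<String, R> pipeline) {
        this.pipeline = pipeline;
    }

    /** Returns the stored result for a film, running the pipeline only on the first query. */
    R lookup(String film) {
        // Normalize so " Nightcrawler" and "nightcrawler" share one entry.
        String key = film.trim().toLowerCase(Locale.ROOT);
        return store.computeIfAbsent(key, k -> pipeline.apply(film));
    }
}
```

`computeIfAbsent` applies the function at most once per key, but the `ConcurrentHashMap` documentation warns that other updates to the map may block while that computation runs, so with genuinely slow queries it would be safer to store a placeholder or future first and fill it in afterwards.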
