Which magazines have the most knowledgeable film critics? Introducing a metric.

By developing and maintaining Cinetrii I’ve been able to amass a rather unique data set: rows upon rows of film connections, quotes and URLs waiting to be mined for insights (granted, said rows are subject to a lot of noise). So this morning I decided to look into the reviewers who diligently draw the parallels and make the inferences that my thematic search engine depends on. Are some more knowledgeable than others? I made a metric based on the inverse popularity of the films mentioned by the critics.

2017 roadmap

Over the last couple of weeks Cinetrii has been getting some attention from various places, including Gizmodo, DONG and a small notice in Huffington Post.

Logistics-wise, things have held up rather well. Once a query has been processed for the first time, the result is stored for subsequent retrieval (though it is eventually reprocessed). The overwhelming majority of the queries of the last few weeks were already stored. The 3000 new queries were handled mostly without congestion by my backend server.
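The store-then-reprocess behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not Cinetrii's actual code (which is written in Java and Scala). The class name `QueryCache`, the `max_age_seconds` limit and the in-memory dictionary are all assumptions made for the example.

```python
import time


class QueryCache:
    """Stores processed query results and reprocesses them once they are stale."""

    def __init__(self, process, max_age_seconds, clock=time.time):
        self._process = process          # expensive function: query -> result
        self._max_age = max_age_seconds  # hypothetical staleness limit
        self._clock = clock              # injectable so the behaviour can be tested
        self._store = {}                 # query -> (result, time it was stored)

    def get(self, query):
        now = self._clock()
        entry = self._store.get(query)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]              # already stored and fresh: no processing
        result = self._process(query)    # first request, or the entry has expired
        self._store[query] = (result, now)
        return result
```

With a setup like this, only first-time and expired queries reach the worker processes, which is consistent with most of the recent traffic being served from storage.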

My backend server. A tad slow, but it gets the job done! The worker processes could run off decently specced cloud hosts, but I’m priced out of those options for now.

While Cinetrii can return really interesting results, it can also be disappointing at times, as I’m sure most users stumbling upon the site have experienced. I’m currently writing my master’s thesis, which takes most of my time, but I’ve outlined some much-needed improvements that I will try to roll out during the year:

  • Ability for users to report erroneous connections.
  • More robust method of discriminating between multiple reviews on the same HTML page – possibly incorporating topic modeling.
  • Developing a new scoring algorithm leaning further towards supervised machine learning.
  • Possibly looking into alternative review sources to extend coverage of non-English-language films.

A network of movie connections mined from 154,000 reviews

    Earlier this month I launched Cinetrii, which connects movies based on references made by critics. The goal was to infer the artistic influences at work for a given movie. Of course, there are multiple biases at play here. I’ve already covered the unequal distribution of available reviews in my original blog post. Furthermore, art is only as interesting as the conversation around it, and not all movies warrant fine-grained reflection. Even for those movies that do, the connections are mostly speculative. Critics are not clairvoyant; they can’t know what directors were inspired by. Cinetrii approximates the perception of movie critics more than anything else. The ranking algorithm of Cinetrii, which I am continually developing, also tries to steer away from connections between films whose cast and crew overlap.

    With this in mind, I’ve gone ahead and batch processed the 200 most popular movies from each of the last six years (2010-2015) into Cinetrii. Those 1200 queries draw from roughly 154,000 collected online movie reviews. From the batch, I’ve sampled the 1000 strongest connections into a node graph.

    About

    What is this?
    I have a significant interest in film. As with any art form, film is derivative. Directors and scriptwriters are undoubtedly inspired by works that came before them. Cinetrii is my attempt at finding connections (such as recurring themes, plots or motifs) between films, algorithmically. The algorithms take written reviews by critics and apply a number of natural language processing techniques to establish and rank the connections. The results are presented in a tree graph. I’ve been tuning the algorithm in my free time, and it is far from perfect. If you have any pointers relating to the challenges in developing Cinetrii (mentioned below), feel free to contact me at everling@kth.se.

    Who is it for?
    Myself and anyone who is interested. One could consider Cinetrii an educational tool for those wishing to discover the progenitors of contemporary films (and conversely, the descendants of old classics). For the open-minded, it can serve as an alternative way to discover movies based on one’s own favorites. Whereas movie-recommending features on other websites are governed by common denominators such as lead actor or director, Cinetrii often shows something else completely. Take for example 2014’s Nightcrawler, a character piece about an entrepreneurial and sociopathic video journalist:

    Nightcrawler on IMDb

    IMDb recommends several movies with Jake Gyllenhaal as the lead, and a few movies from the same year.

    Nightcrawler on Cinetrii
    Cinetrii, by contrast, recommends movies with similar subject matter.
    Comparisons have been made between Gyllenhaal’s character and Travis Bickle (Taxi Driver) as well as Rupert Pupkin (The King of Comedy). Nightcrawler’s commentary on modern journalism also echoes previous efforts, such as Sidney Lumet’s prescient Network from 1976. While Zodiac and Prisoners (IMDb recommendations) are outstanding films, some of Cinetrii’s suggestions make more sense as companion pieces.

    The Neon Demon on Cinetrii
    Variety.com notes influences for Nicolas Winding Refn and The Neon Demon.

    Challenges
    Data
    Because the NLP-driven approach depends on online reviews, Cinetrii gives its best results for newer movies (older movies will sometimes have disappointingly sparse trees). There are more online news outlets now than during the early days of the internet, so naturally newer movies have more reviews available. To make things more complicated, news outlets often take older content offline, resulting in dead links. The Wayback Machine and its API have been of great help in mitigating this. As for pre-internet movies, available reviews consist mostly of archived print articles and reviews written in conjunction with home video releases. Furthering the bias towards newer movies, I think critics are more inclined to infer an artistic lineage when reviewing a contemporary work. Critique in retrospect instead tends to measure the cultural impact of a movie and examine how it differed from others at the time.
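As an illustration of recovering a dead review link, here is a small Python sketch against the Wayback Machine's public availability endpoint. Which Wayback API Cinetrii actually uses is not stated, so the choice of endpoint here is an assumption, as are the function names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (assumed, not confirmed by the post).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Builds the lookup URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Returns the archived URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, timestamp=None):
    """Performs the network request and returns an archived URL, or None."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Passing a timestamp close to the film's release date asks for the snapshot nearest to when the review was originally published.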

    Movie/Review graph

    Scoring
    The algorithm that assigns a score to individual connections can be thought of as a ranking algorithm for a search engine. Many factors are involved, such as how many critics have made the same reference and whether a member of the production of the referenced movie is mentioned. The difficulty lies in giving high scores only to legitimate connections. Inevitably, completely spurious connections will be established:

    The Wizard of Oz

    Transformers: Age of Extinction has very little to do with The Wizard of Oz.

    In the future I will look into making a more nuanced semantic analysis of the quotes from critics.
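To make the idea of a multi-factor score concrete, here is a toy Python sketch. It is not the actual Cinetrii scoring algorithm: the field names, the weights, the logarithmic damping and the direction of each adjustment are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Connection:
    critic_count: int           # critics who made the same reference
    mentions_crew_member: bool  # a review names someone from the referenced film
    shares_cast_or_crew: bool   # the two films have overlapping cast or crew


def score(conn, crew_bonus=1.0, shared_penalty=0.5):
    """Combines the factors into one number; all weights are hypothetical."""
    # Dampen the critic count so ten mentions are not ten times one mention.
    value = math.log1p(conn.critic_count)
    # Assumed: naming a crew member of the referenced film supports the link.
    if conn.mentions_crew_member:
        value += crew_bonus
    # Assumed: overlapping cast or crew makes the link less interesting.
    if conn.shares_cast_or_crew:
        value *= shared_penalty
    return value
```

Sorting connections by such a score in descending order gives the search-engine-style ranking. A spurious link like the one above would have to be caught by additional factors, such as the semantic analysis of the quotes.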

    Document classification
    Online news outlets have different conventions and formats for reviewing films, and this induces some errors in the current version of Cinetrii. More specifically, some online magazines allot an entire page for a single movie review (good) while others fit several reviews on the same page (not good). I currently determine the relevant text segments for a given film based on cosine similarity checks to a vector of names in the film’s production. This method works reasonably well but lets through a few false positives. Irrelevant text segments cause irrelevant connections.
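A cosine similarity check of this kind could look roughly like the Python sketch below. It uses plain word counts as vectors and a made-up threshold of 0.1. The tokenization, the threshold and the function names are assumptions, not Cinetrii's real implementation.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercases the text and splits it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def relevant_segments(segments, production_names, threshold=0.1):
    """Keeps the segments that are similar enough to the production's names."""
    name_vec = Counter(t for name in production_names for t in tokens(name))
    return [s for s in segments
            if cosine(Counter(tokens(s)), name_vec) >= threshold]
```

A segment that names the director or the lead actor scores well above zero, while a paragraph about a different film on the same page scores zero and is dropped. False positives arise when an unrelated segment happens to share names with the production.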

    Polysemy
    Reboots, remakes and book adaptations create ambiguity when gathering references from reviews. What the critic refers to will most likely be obvious to the reader, but software lacking that background knowledge will struggle. I have searched for a database of books for whenever a critic might refer to a book (rather than the less likely reference to the book adaptation) but have not found anything sufficient yet. As for remakes and reboots of the same name, when in doubt the algorithm selects the most popular one (based on number of IMDb votes).
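The fallback rule for same-name films is simple enough to sketch in a few lines of Python. The `Candidate` type and the `resolve_title` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    year: int
    imdb_votes: int


def resolve_title(title, candidates):
    """Picks the same-named film with the most IMDb votes, or None."""
    matches = [c for c in candidates if c.title.lower() == title.lower()]
    return max(matches, key=lambda c: c.imdb_votes, default=None)
```

For example, given two films called "RoboCop" from 1987 and 2014, the function returns whichever has more votes in the supplied data, regardless of which one the critic actually meant.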

    Tech Stack
    Cinetrii.com runs on an ordinary LAMP stack. The frontend HTML5 canvas uses Fabric.js. The design is not mobile-friendly, so for the best experience use a desktop browser. The actual Cinetrii software is built in Java and Scala. Films that have been queried previously are stored and can be loaded immediately. New queries require some processing time, however. Check out the already processed queries.