Methodology

Each film in this series is generated using the same process but uses a different collection of source material as input. No part of the films is edited manually; instead, a very specific workflow of computer scripts and algorithms was designed to retrieve, process, organize, and generate each film. This page outlines that technical and creative process in detail.

You can also view the technical documentation and code repository that were used to generate the films; both are open source.

How is the source material selected?

I decided to focus on large collections of film and video that are in the public domain or released under a Creative Commons license, and that can be viewed and retrieved from the Web without restriction. As an artist, researcher, and citizen, I am interested in understanding the scope and nature of the cultural artifacts we collectively own, and what is readily available for me to use to create new things (mostly) without restriction.

The largest and least legally ambiguous collection of films is the one created by the U.S. Government, which has produced thousands of films, making us (the taxpayers) collectively the largest film financiers in the country. All films created by the U.S. government enter the public domain the moment they are created, and citizens may reuse these materials freely without restriction. Technically, even secret films created by the government are in the public domain, though other laws protect them from use by the public.

How do you access and retrieve the films?

By far the best resource I have found for this is the Internet Archive, a non-profit digital library offering free universal access to books, video, audio, images, software, and archived web pages. One hidden feature is that you can filter by license. For example, this link will show you all movies that are in the public domain or have a Creative Commons Attribution license, which as of this writing yields over 175,000 results. (A Creative Commons Attribution (CC BY 3.0 US) license lets others distribute and transform a work, even commercially, as long as they credit the original creator.) Note that the person who uploads a file to the Internet Archive identifies the appropriate license for use, so you would likely need to do due diligence on anything you use, or only use materials from uploaders you trust.

There are a few curated collections of government film on the Internet Archive such as U.S. Government Films, Fedflix, NASA, and Living New Deal Project. Other notable collections in the public domain include the Prelinger Archives, the National Film Preservation Foundation, and Vintage Cartoons.

I wrote a couple of simple scripts that download the metadata and assets for a particular collection.
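
Here is a minimal sketch of what such a script might look like, using the official internetarchive Python package; the collection identifier and format filter are illustrative placeholders rather than the exact ones I used.

    # A minimal sketch of the retrieval step using the official
    # `internetarchive` package. The collection identifier and the
    # format filter are illustrative placeholders.
    from internetarchive import search_items, download

    COLLECTION = "prelinger"  # e.g. one of the collections listed above

    # Each search result is a dict containing the item's unique identifier.
    for result in search_items(f"collection:{COLLECTION} AND mediatype:movies"):
        identifier = result["identifier"]
        # Download the item's video files in the given derivative format.
        download(identifier, formats=["512Kb MPEG4"], destdir="downloads",
                 verbose=True)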

How are the clips created for each collection of film?

Each Moving Archives film is composed of 16,384 (128 x 128) unique video clips from a particular collection. Each clip is anywhere from about 100 milliseconds to 2 seconds long, and each represents a distinct sonic “pulse,” equivalent to a note in music or a syllable in spoken word.

The first step is to break up an entire collection into a series of audio samples, which I accomplished automatically using onset detection. Depending on the size of the collection, this can easily yield millions of samples. Each clip was then analyzed and assigned relative values for volume, pitch (frequency), and clarity. I defined “clarity” loosely as having clear harmonic bands like those of a musical instrument; white noise would be an example of a sound with low clarity.
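
Here is a rough sketch of this step using librosa; the parameters are illustrative, and “clarity” is approximated here by inverse spectral flatness (white noise is maximally flat), one plausible stand-in for the looser definition above.

    # A rough sketch of onset-based segmentation and per-clip analysis.
    import librosa
    import numpy as np

    y, sr = librosa.load("film_audio.wav", sr=22050, mono=True)

    # Detect onsets, then treat the span between consecutive onsets
    # (capped at 2 seconds, discarded under 100 milliseconds) as a clip.
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    samples = []
    for start, end in zip(onsets[:-1], onsets[1:]):
        dur = min(end - start, 2.0)
        if dur < 0.1:
            continue
        clip = y[int(start * sr):int((start + dur) * sr)]
        volume = float(np.sqrt(np.mean(clip ** 2)))          # RMS loudness
        pitch = float(np.median(librosa.yin(clip, fmin=65, fmax=2093, sr=sr)))
        flatness = float(np.mean(librosa.feature.spectral_flatness(y=clip)))
        samples.append({"start": start, "dur": dur, "volume": volume,
                        "pitch": pitch, "clarity": 1.0 - flatness})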

I used these three audio features (volume, pitch, and clarity) to select a subset of clips; in my case, I preferred clips that were sufficiently clear and loud. I selected 16,384 clips so they could be laid out in a 128 by 128 grid.
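
Continuing the sketch, the selection step might look like the following; the thresholds are illustrative.

    # Keep sufficiently loud and clear clips, then take the top
    # 16,384 (128 * 128) of them.
    MIN_VOLUME, MIN_CLARITY = 0.05, 0.5
    candidates = [s for s in samples
                  if s["volume"] >= MIN_VOLUME and s["clarity"] >= MIN_CLARITY]
    candidates.sort(key=lambda s: s["volume"] * s["clarity"], reverse=True)
    selected = candidates[:128 * 128]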

How are the clips organized?

The clips are organized by spectral similarity, so clips close to each other should sound similar. This was accomplished by extracting a set of audible features from each clip using strategies common in speech recognition software. This set of features was then reduced to just two dimensions using a machine learning algorithm called t-SNE, which allowed me to lay the clips out on a two-dimensional grid.
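
Here is a rough sketch of the reduction, assuming MFCCs as the speech-recognition-style features; snapping the resulting t-SNE scatter onto an exact 128 x 128 grid is a separate assignment problem (tools such as RasterFairy solve it) and is omitted here.

    # Reduce each clip's feature vector to two dimensions with t-SNE.
    import numpy as np
    import librosa
    from sklearn.manifold import TSNE

    def clip_features(clip, sr=22050):
        """Summarize a clip as the mean of its MFCC frames."""
        mfcc = librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=13)
        return mfcc.mean(axis=1)

    # `clips` holds the 16,384 selected audio arrays from the previous step.
    features = np.vstack([clip_features(c) for c in clips])  # (16384, 13)
    xy = TSNE(n_components=2).fit_transform(features)        # (16384, 2)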

Even though the clips are organized by sonic similarity, they often align visually as well, since clips from the same scene may be both visually and sonically similar. Likewise, if a set of films shares a repeated opening title sequence, those clips would likely be grouped together.

What are the different ways you visualized the clips?

Given the sheer volume of material, the goal of the visualizations is not to help the audience understand the contents of a particular archive, but simply to immerse them in a collection’s patterns and textures of sound and image. There are eight discrete visualizations contained in each film:

I. Proliferation

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Arrange clips on a 128 x 128 grid
  2. Play the four clips in the center sequentially over a 4-second interval
  3. Play the outer clips neighboring the clips that were just played, also over a 4-second interval
  4. Play the clips in order of how similar they are to the center 4 clips
  5. Repeat steps 3 and 4 until you reach the outer edge of the grid
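
Here is a rough sketch of the ring timing; the within-ring ordering by similarity to the center clips depends on the audio features and is omitted.

    # Each successive 4-second step plays the next concentric ring of
    # clips outward from the grid center.
    import numpy as np

    N, STEP = 128, 4.0
    rows, cols = np.mgrid[0:N, 0:N]
    # Ring 0 is the center 2x2 block; ring 63 is the grid's outer edge
    # (Chebyshev distance from the center of the grid).
    ring = (np.maximum(np.abs(rows - 63.5), np.abs(cols - 63.5)) - 0.5).astype(int)
    schedule = [(ring[r, c] * STEP, r, c)   # (start time in seconds, row, col)
                for r in range(N) for c in range(N)]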

II. Waves

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Start with clips on a 128 x 128 grid
  2. Play clips as a circular wave emanating from the center over a 12-second interval
  3. Clips in the center play louder
  4. Slowly zoom in
  5. Only play visible clips
  6. Repeat steps 2 to 5 until just the center 32 x 32 clips are visible
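
Here is a rough sketch of the wave timing and loudness; the slow zoom and visibility culling are rendering details left out.

    # A clip's play time within each 12-second wave grows with its
    # distance from the grid center; its volume falls off with distance.
    import numpy as np

    N, WAVE = 128, 12.0
    rows, cols = np.mgrid[0:N, 0:N]
    dist = np.hypot(rows - 63.5, cols - 63.5)   # radial distance from center
    start = dist / dist.max() * WAVE            # seconds into the wave
    volume = 1.0 - dist / dist.max()            # center clips play louder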

III. Falling

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Arrange clips on a 128 x 128 grid
  2. Only a 32 x 32 block of clips is visible at a time
  3. Slowly move the camera down, while oscillating left and right
  4. Play clips as they pass the center view of the camera
  5. Slowly increase speed and oscillation of camera until you pass through the full 128 x 128 grid
  6. Slowly decrease speed and oscillation of camera until you complete another pass through the full 128 x 128 grid
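
Here is a rough sketch of the camera path; the speeds and oscillation width are illustrative.

    # Vertical speed and horizontal sway ramp up over the first pass and
    # back down over the second; a clip plays when the camera's center
    # passes its grid position.
    import numpy as np

    N, FRAMES = 128, 3000
    t = np.linspace(0.0, 1.0, FRAMES)           # both passes, normalized
    ramp = 1.0 - np.abs(2.0 * t - 1.0)          # 0 -> 1 -> 0 across the passes
    cam_y = np.cumsum(0.2 + 0.8 * ramp)         # falling speed rises, then falls
    cam_y = cam_y / cam_y[-1] * (2 * N)         # scale to two full passes
    cam_x = 63.5 + 16.0 * ramp * np.sin(8.0 * np.pi * t)  # left/right sway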

IV. Orbits

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Rotate clips clockwise around center of grid starting with the center 4 clips
  2. Play each clip as it passes the center horizontal line on the right side of the grid
  3. Start rotating clips counter-clockwise around center after 32 steps
  4. End when all clips are in their starting position
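
Here is a rough sketch of the trigger logic; the staggered start from the center 4 clips and the reversal after 32 steps are left out.

    # As the grid rotates, a clip fires when its angle about the center
    # sweeps past 0 radians (the center horizontal line, right side).
    import numpy as np

    N = 128
    rows, cols = np.mgrid[0:N, 0:N]
    angle = np.arctan2(rows - 63.5, cols - 63.5) % (2 * np.pi)

    def fired(theta_prev, theta_now):
        """Mask of clips whose angle wrapped past 0 between two clockwise
        rotation steps (assumes each step is less than a full turn)."""
        a_prev = (angle + theta_prev) % (2 * np.pi)
        a_now = (angle + theta_now) % (2 * np.pi)
        return a_now < a_prev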

V. Shuffle

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Play the center 8 clips sequentially at a 2-second interval
  2. Shuffle the clips after 8 intervals
  3. Shuffle 4 times
  4. Unshuffle the clips
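
Here is a rough sketch of the permutation logic behind the shuffling and unshuffling.

    # Shuffle the grid four times, then restore the original layout by
    # inverting the net permutation.
    import numpy as np

    rng = np.random.default_rng(seed=1)
    order = np.arange(128 * 128)            # flat index of each grid cell
    for _ in range(4):                      # shuffle 4 times
        order = rng.permutation(order)
    unshuffle = np.argsort(order)           # the inverse permutation
    assert (order[unshuffle] == np.arange(128 * 128)).all()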

VI. Stretch

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Stretch the center two clips on the center row
  2. Stretch until the clip’s duration is 8 seconds long, then start to revert
  3. Stretch the next two most central clips on the center row
  4. Repeat steps 2 and 3 until all clips on the center row are stretched and reverted
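
Here is a rough sketch of the stretch schedule; the easing curve is illustrative.

    # Pairs of clips on the center row are processed outward from the
    # middle; within each pair's turn, the clip duration ramps up to
    # 8 seconds and then reverts.
    import numpy as np

    N, ROW, MAX_DUR = 128, 64, 8.0
    # pairs[0] is the center two clips; pairs[-1] the row's outermost two.
    pairs = [((ROW, 63 - i), (ROW, 64 + i)) for i in range(N // 2)]
    t = np.linspace(0.0, 1.0, 100)          # one pair's turn, normalized
    envelope = 1.0 - np.abs(2.0 * t - 1.0)  # ramps 0 -> 1 -> 0
    # A clip's duration eases from its own length up to MAX_DUR and back:
    # duration(t) = base_dur + (MAX_DUR - base_dur) * envelope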

VII. Flow

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Rotate each clip around its own center
  2. Offset those rotations based on a sine wave
  3. Play the clips as they rotate
  4. Clips around the center of the grid play the loudest
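
Here is a rough sketch of the motion and loudness; the spin speed and sine amplitude are illustrative.

    # Every clip rotates about its own center, phase-offset by a sine
    # wave across the grid's columns; loudness falls off with distance
    # from the grid center.
    import numpy as np

    N = 128
    rows, cols = np.mgrid[0:N, 0:N]
    dist = np.hypot(rows - 63.5, cols - 63.5)
    volume = 1.0 - dist / dist.max()        # loudest near the center

    def rotation(t, speed=0.1):
        """Each clip's rotation angle (radians) at time t (seconds)."""
        return 2.0 * np.pi * speed * t + 0.5 * np.sin(2.0 * np.pi * cols / N)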

VIII. Splice

The algorithm for this visualization is roughly as follows (a code sketch follows the list):

  1. Play the clips on the 32nd and 96th columns sequentially and in opposite directions
  2. Tear the clips apart
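
Here is a rough sketch of the play order; the per-clip interval is illustrative, the column indices are zero-based (31 and 95), and the “tear” itself is a rendering effect.

    # One column plays top to bottom while the other plays bottom to top.
    N, STEP = 128, 0.25
    down = [(i * STEP, r, 31) for i, r in enumerate(range(N))]
    up = [(i * STEP, r, 95) for i, r in enumerate(range(N - 1, -1, -1))]
    schedule = sorted(down + up)            # (start time, row, column)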