Image is Everything

On April 12th, 1933, several hundred guests stumbled into Philadelphia’s Academy of Music for what they presumed would be a typical concert. As the lights dimmed, the audience found their way to their seats until the room was plunged into an inky blackness. The Philadelphia Orchestra, as expected, leapt into a rousing Wagner overture. But as the lights slowly crept on, the performance took a plunge into the unexpected. Instead of an orchestra in full regalia, the audience saw nothing. Or rather, two enormous loudspeakers on an otherwise empty stage. The recording engineer Harvey Fletcher, who had been standing in the wings, sauntered into view in front of his befuddled audience. “What you’ve just witnessed is an illusion,” he proclaimed, explaining that the orchestra was actually just a few floors downstairs, hidden inside the hall’s soundproof basement.

Fletcher’s stunt with the Philadelphia Orchestra was the public’s introduction to a brand new recording technology: stereophonic sound. Leopold Stokowski, the ambitious conductor of the Philadelphia Orchestra (who would, a few years later, become internationally famous for starring as the conductor in Walt Disney’s “Fantasia”), had long been frustrated with how orchestras were presented on recordings. A full orchestra, after all, is a spectacle in its sheer size, and compressing that spectacle into a single tinny speaker in one’s living room robs the listener of exactly what makes a trip to the symphony so hair-raising. Stokowski’s main issue was not fidelity, but what we now refer to as “image.” If we listen to an orchestra, we hear the horns a little bit to the left of center, and the basses to the right. A trip to the concert hall gives us a true “surround” image, in which acoustic information literally sits all around us. But in the 1930s, recordings were not optimized for this nuance. Commercial recordings were exclusively monaural: practically always recorded with one microphone, and always played back from a single speaker. Experiments with higher-order imaging (i.e. the use of more than one speaker to represent a sound) were seen as a waste of time and resources that could better be spent refining the acoustic fidelity of the recording process. In short, the problem wasn’t so much that a speaker couldn’t produce the full volume of an orchestra; it was that a speaker took a sound that came from a stadium-sized area and reduced it to a single point of sonic contact. Monaural recording destroyed the space the orchestra sat in, a quality that Stokowski clearly believed was as important as, if not more important than, the content the musicians were playing.

 
Leopold Stokowski, in perhaps a slightly more familiar setting.


 
Bartok’s orchestral image. Kinda.


 

Back in the concert hall, Fletcher continued to amaze and astound his audience, who prior to that evening had never heard a recorded orchestra that sounded like, well, an orchestra. The rest of the evening consisted of magic tricks demonstrating the amazing psychological feats one could do with two speakers instead of one. In that evening’s showstopper, an invisible handyman walked across the stage and audibly hammered planks of wood together. Suddenly, a phone rang on the opposite side of the stage. The handyman, naturally, walked over and said hello just as a person would. According to historian Greg Milner, “the eyes of the audience followed [his] every move.”

Two Bell Labs engineers sit in the Academy’s basement, directly between the transmitting orchestra and the stereo speakers upstairs in the concert hall.


Stokowski’s friendship with Fletcher arose from an unlikely project at Bell Labs. Fletcher, who as a graduate student had worked on the oil-drop experiments that measured the charge of the electron, was working at Bell Labs as a telecommunications engineer. A music buff, Fletcher convinced Stokowski to let him record the Philadelphia Orchestra’s rehearsals in order to test his novel recording inventions. Aware of Stokowski’s frustration with one-speaker playback, Fletcher created a brand new way to cut a record so that multiple channels could be carved into the record’s groove. Developed with the help of Arthur Keller, Fletcher’s new system used two arms to carve into the disc, each perpendicular to the other. The result was a record that carried two different microphone signals, one on each side of the deep groove it carved. The two microphones provided a handy model for a human’s two ears, letting the listener hear the horns slightly to the left and the basses slightly to the right. Borrowed from the Greek word for “solid,” or three-dimensional, the term “stereo” was quickly adopted for this two-channel recording technique.

The unlikely creative friendship of Fletcher and Stokowski stresses a duality that’s long been embedded in the history of communication: technical innovations lead to a wide variety of artistic applications, and vice versa. It represents a unique form of cultural partnership, one in which research driven by creativity meets creativity driven by inquiry. This month, Dogbotic Labs kicks off the first of what will be a continued series of residencies that straddle the artistic and scientific. Ilona Brand, our first resident, is a Brooklyn-based multimedia artist and software developer interested in promoting thoughtful, deliberate discourse through media that typically do not afford it. How might the World Wide Web be better designed to encourage empathy and positive discourse?

The second, and perhaps more relevant, reason I chose to open this series with the stereo anecdote is that it formed the basis for my and Ilona’s discussions about the creation of a better Web. The magic of imaging is that it communicates space in a visceral, almost primal, way. A stereo image gives the listener a discrete vantage point: it tells you exactly where you are relative to the thing you’re hearing. Mono recordings, in contrast, are their own source. Your position relative to the sound in question is simply your position relative to the speaker. The addition of multiple channels of audio to the consumer listening experience helped blur the line between the universe of the listener and the universe of the recording, as it placed the listener in that reality. Imaging technology took recorded sound, in all its seeming objectivity, and created a space in which the audience could be vulnerable. In higher-order images, such as quad, 5.1 surround, and eight-channel, the distinction between the universe of the listener and the universe of the recording becomes less and less relevant. Surround your audience in a field of speakers, and you get access to one of the most amazing facets of human perception: the ability to mentally map out a space using just your two ears.


An album mixed with stereo imaging. Since stereo LPs reached the consumer market in the late 1950s, the two-channel mix has become the standard for both recording and playback.


An album mixed with quad (4-channel) imaging. Quad setups still exist, despite being a failure in the consumer goods market.


An album mixed with 5.1 surround imaging. 5.1 is the most common surround format today: five channels of positioned audio, plus a “.1” low-frequency channel that feeds a separate subwoofer.




That capacity for vulnerability is precisely what attracts us to researching the design and storytelling capabilities of novel audio imaging tech. The worlds of VR and AR, specifically in film and gaming, have given rise to a slew of amazing new technologies in this realm, technologies that just three years ago were prohibitively expensive for anyone but a technical specialist to play with. The ability to produce a realistic audio image has plenty of very practical applications outside of entertainment, of course. Imagine a group telephone call where each person’s voice emanated from a discrete location in a room, where you actually felt the need to turn your head from person to person to maintain “eye contact.” Might this interface promote intimacy in conversations? How about trust? Or what about a web browser that, instead of showing results in a visual list, mimicked voices that read their subject lines as they “sped by you”? How might that change your patience with clickbait? Could that potentially foster critical discourse around the spread of fake news, simply by changing how it is presented?
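For the curious, here is a rough sketch of the simplest version of that group call: each caller gets a seat somewhere between the left and right speakers, and their mono voice is split across the two channels with constant-power panning. This is only an illustration under loose assumptions. The function names (`seat_callers`, `constant_power_gains`, `pan_mono`) and the caller names are made up for this example, and plain left-right panning is far cruder than the head-tracked imaging used in VR.

```python
import math

def constant_power_gains(pan):
    """Return (left, right) gains for pan in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    # Map [-1, 1] onto a quarter circle so left^2 + right^2 stays at 1.
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def seat_callers(names):
    """Spread callers evenly from left to right; returns {name: pan}."""
    if len(names) == 1:
        return {names[0]: 0.0}
    step = 2.0 / (len(names) - 1)
    return {name: -1.0 + i * step for i, name in enumerate(names)}

def pan_mono(samples, pan):
    """Turn a list of mono samples into a list of (left, right) pairs."""
    left, right = constant_power_gains(pan)
    return [(s * left, s * right) for s in samples]

# Hypothetical three-person call: one voice per "seat" in the stereo image.
seats = seat_callers(["Ana", "Ben", "Cy"])
for name, pan in seats.items():
    left, right = constant_power_gains(pan)
    print(f"{name}: pan={pan:+.1f} left={left:.3f} right={right:.3f}")
```

The cosine and sine gains keep the combined power of the two channels constant, so a voice does not seem to get louder or quieter as it moves across the image.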

For me, the joy of thinking about human-centered sound design is that its implications are incredibly practical, yet its methods are undeniably artistic. In the case of imaging, Stokowski and Fletcher were able to bring the nuanced and psychologically abstract quality of physical space into the world of recording. For the next few weeks, we’re looking to bring that same quality of space to a world famously devoid of anything physical: the World Wide Web. It is our belief that the addition of physical space to something as abstract as digital communication might promote our notion of a gentler future, one in which communication is an act of vulnerability, and system design helps highlight the implications of one’s actions. Sound plays into our perception of space so intimately that it opens up a playground in which we can easily explore the profound implications of the physical world we all inhabit. Sound is space, codified.