What It Takes to Caption Music: Thoughts In Wake of Lawsuit Over Lyrics Captioning

To follow up on a previous blog post on this case, courts recently sided with Hollywood studios in a lawsuit with the Alexander Graham Bell Association for the Deaf and Hard of Hearing concerning the lack of song-lyric captions, leaving caption content at the studios’ discretion. “From the description of both parties, it seems clear to the Court that captions, and specifically the decision regarding what content to caption, is a component of the moviemaking process, as the Studios must decide what level of captioning would provide the best experience for consumers using the caption and subtitle features,” writes the judge.

Here at VITAC, we strive to relay as full a viewing experience as possible through the written word, and follow FCC Caption Quality Best Practices to ensure lyrics are always included in captions. Music is an important part of conveying meaning, and our pre-recorded captioning experts consider more than lyrics when creating captions–they also must describe varying types of instrumental music, including the following:

  1. Transition: There is music playing, but all it’s really doing is filling dead air. Perhaps a couple on “House Hunters” is driving to their second location or the title card on “Castle” plays a few punchy notes as the show opens. For this, two music notes are placed in the upper right-hand corner of the screen.
  2. Identification: With this type of music, which is normally instrumental, the hearing viewer would be learning something. If more information about tone or plot is being imparted than can be derived from visuals alone, some sort of signifier is key to that viewing experience. Descriptors are also used to identify specific songs being used as background music. Some examples include:
    • [ Suspenseful music playing ]
    • [ Upbeat jazz playing ]
    • [ Steel drum playing ]
    • [ Mumford & Sons’ “Hopeless Wanderer” playing ]
  3. Lyrics: Primary focus is being placed on the music. Lyrics are obviously captioned for concerts, but think about your hospital melodrama montages, which have given many alternative musicians their break into the mainstream. If the creators of a program are allowing enough room between lines of dialogue for a viewer to hear the lyrics, there should be enough room to caption them, as well.

While the format of lyrics and descriptors remains the same across all VITAC programming, finding the right way to impart the experience of what’s being heard to a viewer is where captioners need to get a little creative. One pre-recorded captioner writes:

“I once worked on a show for Vice that was nothing but a compilation of their unused B roll for transitions and such. It was kind of artsy and was mostly montages set to different music. That job had everything from [ Soft choral music playing ] to [ Speed metal playing ]. Some of the highlights were [ Pungi playing ], [ Tense, ethereal music playing ], [ Slow classical fusion music playing ], and [ Electro-funk playing ].”

There are a couple of puzzles in finding the appropriate words to articulate sound—music and cartoon sound effects being the most notable—but captioning music has plenty of other difficulties, as well. For instance, if you’ve ever tried finding lyrics online, you know that almost every lyrics site is user-generated, which allows for irregularities and inaccuracies. Still, though, they’ll get you in the ballpark.


As for concerts, on the upcoming “Austin City Limits” with Robert Plant & The Sensational Space Shifters or the recent episode with James Taylor, there is a whole lot of vamping and improvisation on the classics they’ve performed dozens, if not hundreds, of times. Combine that with Plant’s unique singing voice and a full band, and deciphering lyrics becomes an almost superhuman feat.


We do receive lyrics for some programming, such as “The Voice,” where covers and new arrangements abound. As you can imagine, when music takes center stage in these shows, crowdsourced lyrics will not suffice. Another hurdle for the captioners of these shows is timing the work to ensure rhythm and accuracy, especially in duets. Because captions require varying amounts of time to load, ensuring that everyone hits their cue takes careful timing.

Even though we include lyrics in all of our captions, we’ve noticed that sometimes, by the time a program gets to air, the lyrics have been deleted. This occurs especially on streaming platforms, and we always try to educate the programmer about the importance of providing a full viewing experience to viewers who rely on captions. While this ruling may allow studios to decline to caption music, VITAC will keep working to bring viewers the most accessible programming possible.

Offline Hurdles: The Power of Briefs

Autocorrecting Your Way to a Smarter Workflow

Yelling frustrations into smartphones has become an almost daily part of life. As we hurriedly zip out texts, autocorrect has a tendency to take the reins and skew meaning in annoying, sometimes hilarious, ways. By taking control of this system, offline captioners greatly increase productivity, convenience, and peace of mind in their contribution to video post-production. After all, with the sheer volume of content being generated today, we need every shortcut out there.

At VITAC, these personalized shortcuts are called briefs; some call them macros, text expanders, or “autocorrect,” and they can either drive you mad or save your sanity. Early on in the career of an offline captioner, it becomes apparent that people today tend to use a very, very common vocabulary. Narrow the field of vision to cooking shows, and the variety of words used gets even slimmer. No matter how fast you think you can type, there comes a moment where you simply cannot type the words “delicious,” “restaurant,” or “sauté” a single time more, so you define a set of shortcuts to expand “rt” to “restaurant,” or “dc” to “delicious,” and that’s how you keep ahead of any especially fast-talking chefs. “Ft” could very well help you never have to type the word “Flavortown” again.
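For readers curious how this kind of expansion works under the hood, here is a minimal sketch in Python. It is not how VNL implements briefs; the `BRIEFS` table and the `expand_briefs` function are hypothetical names made up for illustration, and the only real data in it are the three shortcuts mentioned above.

```python
import re

# Hypothetical brief table, seeded with the shortcuts mentioned above.
BRIEFS = {
    "rt": "restaurant",
    "dc": "delicious",
    "ft": "Flavortown",
}


def expand_briefs(text, briefs=BRIEFS):
    """Replace every whole word that is a known brief with its expansion."""

    def swap(match):
        word = match.group(0)
        # Look the word up case-insensitively; leave unknown words alone.
        return briefs.get(word.lower(), word)

    # \w+ matches whole words only, so "rt" inside "art" is never touched.
    return re.sub(r"\w+", swap, text)


print(expand_briefs("This rt in ft is dc!"))
```

Matching whole words is the important design choice here: a brief that fired in the middle of a longer word would cause more cleanup than it saves.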



Within our proprietary offline captioning software, VNL, is the briefs interface, where captioners define and categorize shortcuts for use when most needed. Categories for our captioning staff include personal briefs, briefs for every show, temporary, series-specific, computer-specific, and “other.” Sometimes little gems, like inside jokes, are left behind by someone who’s worked on the show before. One captioner writes, “I’m not sure what was happening in my life when I made this one.”


After long enough in the captioning industry, captioners have a brief for almost anything you can think of, but here are some general rules of thumb:

  • Adverbs: People actually use a lot of adverbs. Seriously, definitely do that immediately—really.
  • Fillers: Unscripted actors, those on home-renovation, cooking, reality programming, and the like, say “like” and “you know” all the time, along with various other filler words. It won’t take long to notice which filler words people absentmindedly throw into everyday conversation. Yes, it will get annoying.
  • Sound Effects: There are only so many sound effects. Everyone [ Laughs ], [ Cries ], and [ Scoffs ]. There will always be another [ Phone ringing ].
  • Misspellings: Do your fingers insist on typing “jsut” instead of “just”, or “recieve” instead of “receive”? Do you tend to capitalize the first two letters of every sentence or end words with “ign” instead of “ing”? Let the system take care of all the words you refuse to learn and the wrong ways your fingers hit the keyboard at 90 wpm.
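The misspelling rule in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TYPOS` table and `fix_typos` function are hypothetical, and the two typos are the ones quoted in the bullet.

```python
import re

# Hypothetical personal typo list, using the two examples from the list above.
TYPOS = {"jsut": "just", "recieve": "receive"}


def fix_typos(text, typos=TYPOS):
    """Fix known whole-word typos, then repair a doubled initial capital."""

    def swap(match):
        word = match.group(0)
        fixed = typos.get(word.lower())
        if fixed is None:
            return word
        # Keep a leading capital if the typed word had one.
        return fixed.capitalize() if word[0].isupper() else fixed

    text = re.sub(r"[A-Za-z]+", swap, text)

    # "THe" -> "The": two capitals followed only by lowercase letters.
    return re.sub(
        r"\b([A-Z])([A-Z])([a-z]+)\b",
        lambda m: m.group(1) + m.group(2).lower() + m.group(3),
        text,
    )


print(fix_typos("THey jsut want to recieve it."))
```

The “ign” habit is deliberately left out of this sketch. A blanket rule turning every final “ign” into “ing” would also mangle real words such as “design” and “foreign,” so that one is safer handled word by word in the typo table.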


At VITAC, we caption videos on every subject imaginable, and each genre has its own specific parlance. Once you key into it, you can begin to predict what will be said next, and staying one step ahead of your project is the trick to efficient captioning. Then, as with most jobs, captioners may, without noticing, take work home with them. They might use bracketed sound effects in text messages, or begin to notice that they text the same words many times each day. That’s when it’s time to open the autocorrect settings and begin developing a personal dictionary of briefs. There is truly no limit, as evidenced by one enterprising and devoted son who made a brief on his mother’s phone to replace the phrase “dirty laundry” with the text of James Joyce’s Ulysses.

So, which briefs couldn’t you live without? How are you making autocorrect work for you instead of against? Let us know! Be sure to check out the other Offline Hurdles, and look to our offline captioning page to find out more about the work being done at VITAC.


Offline Hurdles: The Captioned Puzzle

Clarity, Accuracy, and Timing in Accessible Programming

There are a few different styles in which television is captioned, each with its own merits and flaws, but the four pillars of closed captioning are as follows:

  • Accuracy: Captioning shall match the spoken words (or song lyrics when provided on the audio track) in their original language (English or Spanish), without paraphrasing, except to resolve any time constraints.
  • Synchronicity: Captioning shall coincide with the corresponding spoken words and sounds to the greatest extent possible.
  • Completeness: Captioning shall run from the beginning to the end of the program, to the fullest extent possible.
  • Placement: Captioning shall not block on-screen graphics.


Combined, these tenets, directed by the FCC, attempt to create a viewing experience like that which the audio track delivers. In the world of pre-recorded captioning, this can become something of a balancing act, one that involves a couple of hand-offs and some personal discretion. Your pre-recorded programming passes through three stages at VITAC: it is transcribed, timed/placed, and reviewed. It’s the job of each captioner to make their successor’s job as easy and streamlined as possible. If all goes according to plan, the timer/placer won’t change much, and the reviewer will watch a clean file and then deliver it.


Transcription and the Questions That Arise

It seems clear enough, right? Just write down what people say. Before beginning anything, though, a treatment is consulted to verify any client-specific requirements. Can dialogue be cleaned up to cut down stuttering or filler words like “um,” “like,” and “you know”? Are you allowed to write “gonna” or should it always be “going to”? If an actor is dropping the ends of their words, do you write “droppin’”? How are accents handled? How is profanity handled? A lot of these questions come up due to the unpredictability of unscripted television, as we can mostly assume scripted stutters and other acting choices are intentional, but every show has its own particular guidelines. Now it’s off to the races of accurate, light-speed typing, ensuring that every name and obscure pop-culture reference is spelled correctly before handing the project to a timer/placer.


Time, Place, and Make Everything Fit

Get all the words on the screen as they’re spoken with enough time to load and make sure everything’s broken up into easily digestible, almost poetic, bites. If you think about the captions that scroll up the screen as a kind of continuous loading bar, know that the ones that pop on need to fully load before springing onto the screen. This load time is affected by pretty much anything you change, with caption length and position being the top two variables.
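To get a feel for why length and position matter, here is a rough back-of-the-envelope sketch in Python. It assumes a CEA-608-style caption channel that carries two bytes per video frame at about 29.97 frames per second, with one byte per ordinary character. The overhead figures for positioning and control codes are illustrative assumptions, not measured values, and the function names are made up for this example.

```python
import math

FRAME_RATE = 30000 / 1001  # NTSC video, roughly 29.97 frames per second
BYTES_PER_FRAME = 2        # assumed caption payload per frame (CEA-608 style)


def load_frames(rows, row_overhead=4, caption_overhead=8):
    """Estimate how many frames a pop-on caption needs before it can appear.

    rows             -- the caption's lines of text, top to bottom
    row_overhead     -- assumed bytes of positioning codes per row
    caption_overhead -- assumed bytes of control codes per caption
    """
    text_bytes = sum(len(row) for row in rows)
    total_bytes = text_bytes + row_overhead * len(rows) + caption_overhead
    return math.ceil(total_bytes / BYTES_PER_FRAME)


def load_seconds(rows, **overheads):
    """Convert the frame estimate into seconds."""
    return load_frames(rows, **overheads) / FRAME_RATE


caption = ["[ Suspenseful music", "playing ]"]
print(load_frames(caption), "frames,", round(load_seconds(caption), 2), "seconds")
```

Even in this simplified model, every extra character and every extra row pushes back the earliest moment a caption can pop on, which is why a timer/placer has to start loading a long caption well before the line is spoken.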

As more people begin talking simultaneously and plot-pertinent sound effects take place, things can get a bit dicey. Do you polish something? Bring the captions in early? Late? How important is that ringing cellphone? It’s all about doing what you can to communicate an auditory experience visually in an efficient manner. In addition to being grammar genies, captioners need to be pretty adept at solving these puzzles.



Accuracy and synchronicity are on opposing sides of a see-saw, vying for control, and your captioner is the mediator, trying to keep things balanced. Hidden within every second of broadcast television are many decisions and perspectives at work. And after any time spent captioning, especially loud reality programming, the viewing experience is forever changed. You’ll always be asking yourself, “How in the world did they handle that?” Look closely and pay attention to the captioning medium. You’ll find there’s a lot behind how this information is translated and delivered.