From Lip to Sync: How YouTube is Captioned

Any loyal YouTuber knows that the site is immensely dynamic, shuffling videos in and out of availability on a constant basis according to owners’ rights and users’ demands. By its own estimate, 100 hours of content are uploaded to the site every minute by users worldwide; likewise, content that is copyrighted or obscene is removed within hours. In such a rapidly changing environment, what does the video-sharing site do to make so much media accessible?

  1. Automatic captions: To ensure that the massive volume of content uploaded every minute has some degree of captioning, Google, YouTube’s owner, has made admirable advancements in its speech-recognition software. The technology is built to recognize the speaker’s language and provide accurate caption text based on what the speaker is saying. In theory, this should allow every video to be captioned without so much as a push of a button from the video owner. However, the technology is a work in progress and suffers from noticeable, regular glitches; the automatic captions are often ridiculed by uninformed internet users for their extreme, naughty, ironic, or just plain funny goofs. Granted, speech recognition has been in development for years, but like Siri of iPhone fame and my dog Marvin, it is designed for loyalty to one voice only. That makes it a tall order to handle the vast multitude of languages, dialects, and accents that exist on this earth, not to mention videos with poor-quality audio!
  2. Auto-syncing: YouTube users who have a basic transcript of their video (meaning just text, without timecodes for when each sentence should appear on-screen) can enter the text in the video manager of their account, and YouTube will automatically match the words to the audio. This places the burden of accurate spelling and correct transcription on the video owner, and the timing on YouTube. Division of labor: I love it! Yet the technology often gets distracted by ambient noise and poor sound quality. I tried this technique with a video of Matisyahu’s “One Day” playing at a mall. After I entered the lyrics in the appropriate field, YouTube eventually gave up on the sync, telling me simply, “Track content is not processed.” Admittedly, the audio was poor. This technique is best suited to a single speaker talking directly into the camera in an indoor environment.
  3. Creation of a timed, accurate caption file: Anyone who wants a professional, high-quality YouTube video would do well to have it captioned by a professional, high-quality service. VITAC has extensive experience creating timed, placed, and accurate caption files for YouTube users intent on making their content shine. Not only do good captions reflect well on the user’s video, they also improve the SEO for the user’s page, making the content more searchable through sites like Google. VITAC can produce caption files for Google in a number of file formats, including SRT, SCC, and CAP. These files can then be easily uploaded to YouTube by the user. The captions can appear in nearly any style you see on TV, including center-placed pop-on (which is movable by clicking on the captions in the video and dragging them), roll-up, and placed pop-on, where the captions appear next to the speaker. The result is accessible media that matches the quality of the content.
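
For readers curious about what separates a basic transcript from a timed caption file, here is a minimal sketch in Python that writes cues in the SRT format mentioned above. Each SRT cue is a sequence number, a line of start and end timecodes, and the caption text, followed by a blank line. The names Cue, srt_timestamp, and to_srt are hypothetical helpers invented for this illustration, and the sample text and timings are made up; this is not how YouTube or VITAC produce their files.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: when it appears, when it disappears, and what it says."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm timecode used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT text: number, timecode line, caption, blank line."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up sample cues, for illustration only.
    sample = [
        Cue(0.0, 2.5, "Welcome to the show."),
        Cue(2.5, 5.0, "Today we talk about captions."),
    ]
    print(to_srt(sample))
```

A plain transcript would contain only the two sentences; the timecodes are the part that auto-syncing or a captioning service has to supply.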

As YouTube continues to improve its captioning software, VITAC is always standing by to accommodate new possibilities and better serve the client. Please visit VITAC’s caption YouTube page for information on how to get your YouTube video captioned.
