FCC Considering Unsupervised ASR for Captioned Telephone Service

by: David Titmus

The Federal Communications Commission is considering applications by three companies seeking to provide Internet Protocol Captioned Telephone Service (IP CTS) solely via automatic speech recognition technologies.

Applications were filed by VTCSecure, LLC; MachineGenius, Inc.; and Clarity Products, LLC to provide IP CTS calls using automatic speech recognition (ASR) without a trained human operator assisting in captioning the call.

In June 2018, the Federal Communications Commission (FCC) took steps to reform IP CTS in an effort to modernize the system and resolve compensation and funding issues.

A popular and critical communications service for deaf or hard-of-hearing individuals who communicate by speaking, IP CTS enables a person with hearing loss to call another person and simultaneously read captions of what the other party is saying on a special display screen on the phone or on another web-enabled device.

Traditionally, a communications assistant (CA), or voice captioner, listens to the call and repeats what the other party says into ASR software, which transcribes those words into text.

Among the items in last summer's report and proposed rulemaking, the FCC determined that improvements in ASR technology have made speech recognition on its own an acceptable alternative to the CA-assisted method described above. The FCC also suggested that ASR could provide faster, more private captions than those created by voice captioners, and at a lower cost.

In fact, cost is prominently mentioned in each of the companies' applications, with one claiming savings of up to 85% compared with the current IP CTS rate.

And while cost is always a concern, equal access for the deaf and hard-of-hearing community should not come down to dollars and cents.

Conversations about ASR captions and FCC compliance generally focus on making the captions "good enough" while driving down costs. "Good enough," however, simply is not good enough. IP CTS calls, whether they are placed to friends, family doctors, or emergency officials, deserve to be relayed in the clearest, most accurate manner available.

One applicant also touts that its ASR engine achieves high accuracy when operating "under ideal conditions" and when the "engine is receiving HD voice," conditions that aren't always present in everyday telephone calls.

As we've found in studying captions in other areas (broadcast, video, etc.), there is a difference between "Supervised ASR," in which human voice writers are trained to work with a trained ASR engine to create captions, and "Unsupervised ASR," in which an ASR engine creates captions without human intervention.

Unsupervised ASR often omits words, selects the wrong words, or introduces punctuation errors, and the words it misses tend to be more important than those missed by human captioners. Unsupervised ASR most frequently leaves out proper nouns, pronouns, nouns, and verbs, whereas human captioners omit proper nouns, nouns, and verbs much less frequently.

The fundamental purpose of captions is to convey meaning, and the words that carry the most meaning (nouns, proper nouns, and verbs) cannot be omitted.

To this end, some are asking the FCC to hold off on granting ASR-only applications until the Commission has adopted a regulatory framework for ASR-only service that ensures equivalence for consumers.