Court Reporters v. Digital Recording and Voice Recognition: A Comprehensive Breakdown
“On a long enough timeline, the survival rate for everyone drops to zero.”
– Chuck Palahniuk
Not a day goes by that I don’t read some sort of labor-force doomsday article with a headline warning of technology, AI, and machines we can’t even begin to understand taking over every profession known to man, from folding laundry to performing complex radiographic studies and brain surgery. However, when you get past the scary headlines and read the content, most of these articles point out that while the technology may be there in theory and under extremely controlled laboratory settings, it is nowhere near the level of sophistication needed to perform these jobs in the real world, and would perhaps just cause an increase (ironically) in the production of incompetent automated consumer-complaint chatbots.
Having said that, and being a late Gen-Xer, I grew up with rapid technology growth and recognize when a newly introduced technology is beneficial for all, and conversely, when a new technology is simply a bunch of bells and whistles that does nothing more than complicate something that already exists. I also understand that when a truly innovative technology emerges, it will create far more jobs than it will eliminate, and sometimes it will create entire industries. For example, the refrigerator sent many ice and milk deliverymen to the unemployment lines, but created an entire frozen food industry and a frozen and refrigerated trucking industry, not to mention the countless jobs in the design and production of millions of refrigerators. Granted, this is a very old example, but this same principle can be applied to just about every technology that advances us; and technologies that once seemed cutting-edge but do not spawn economic, workforce and industry growth will end up in the graveyard beside the tombstones of LaserDisc and MySpace. A balanced human reaction to technology is to adapt to the game-changers, and to marvel at, be amused by, and then learn from the ones that failed more often than they worked.
TECHNOLOGY IN THE LEGAL WORLD
Like all industries, the legal industry is being confronted every day with threats of technologies that will replace human beings, and if you read enough articles on LinkedIn, you may even be led to believe that lawyers will be completely replaced by AI in the next few years. But this is obviously not true.
Last month I attended an all-day seminar put on by the Cleveland Metropolitan Bar Association regarding AI and other technologies that are rapidly seeping into the daily operations of mid-sized to gigantic law firms all around the world. The emergence of very basic and simple AI doing the work of entry-level associates in the categorization and organization of millions of pages of discovery documents was discussed, and a very refreshing, optimistic view of this technology’s place in the practice of law was the resounding takeaway. Using AI technology to relieve humans of the mundane tasks of sorting and organizing, and instead spending those crucial first few years as an associate actually practicing law, will only produce more prepared, experienced and enthusiastic lawyers.
TECHNOLOGY HAS BEEN THREATENING COURT REPORTERS FOR DECADES
The litigation support industry, especially the court reporting industry, has been challenged by emerging technologies since the advent of the tape recorder. Over the last two decades, Courts of Common Pleas across America have experimented with replacing human court reporters with digital recording equipment to the detriment of not only due process and expediency in appeals, but also the record itself. In fact, many of the courts that tried the digital recording route have now brought back human court reporters after quickly realizing that bringing in digital recording equipment as a substitute for a highly skilled court reporter was a giant step backwards in courtroom technology. So why did this happen in the first place? There are two main causes: flawed budgetary studies and misunderstood technology.
Obviously, replacing human court reporters with digital recording equipment would significantly loosen the budgetary constraints placed on countless communities across the country. The problem is, digital recording equipment can’t do what court reporters do. Replacing court reporters with recording equipment is analogous to a community replacing all human police officers with simple cameras on every street corner. Sure, this would improve your budget, but what you’d be left with is a lawless, fearful and anxious community. And this is obviously not an apples-to-apples substitution. No one would ever think or suggest that an army of cameras can do the job of a human police force. Perhaps this replacement seems so ridiculous because the general public understands the importance of human police officers, the complexities of their jobs, and realizes that a camera could in no way replace them in keeping our communities safe and lawful.
However, the same thing can’t be said for court reporters. We are a relatively small society of professionals, and with the exception of some of our friends and family, most of the general public and even our legislators don’t really understand what it is we do or how we do it. In fact, even Hollywood still portrays us in modern movies as a simple person sitting in front of a mechanical machine with an endless roll of paper cascading onto the cinematic courtroom floor. We live in a largely pop culture world, so this is how the general public understands our profession. And if this were actually the case, I would agree: isn’t there a better way to do this?
But in reality, court reporters use extremely sophisticated technology. Even so, not a month goes by when I don’t have a witness in a deposition ask me at a break, “Why don’t you just record this?” My answer is always, “Recordings can’t make a transcript.” And they always then say, “What about voice recognition? Why don’t you use that?” To that I simply say that technology that can even begin to compete with what we do doesn’t exist yet. The fact of the matter is, our profession and the skill and technology behind it are grossly misunderstood.
In this article, I will attempt to explain in great detail the technology and training of a modern-day court reporter. Then I will provide an extremely comprehensive breakdown of how digital recording and voice recognition technologies as they exist today stack up against our current technology (if at all). Finally, I will end with some thoughts on technology in general that will hopefully give a more optimistic prognosis for our future as working humans, and how we should all react to emerging technologies that begin to enter and threaten whatever industry it is in which we make our living.
COURT REPORTING TECHNOLOGY
First, let’s set the record straight. Court reporters do not dictate from paper notes and then use simple word processing to create a transcript. We do not use paper in any way. We do not use purely mechanical devices. Instead, the technology we do use is incredibly sophisticated, creating an immediate readable record at a 99-percent accuracy rate or above, capturing complex testimony at rates of 225 words per minute and higher.
To put that in perspective: An extremely proficient typist on a traditional QWERTY keyboard will max out at about 110 words per minute; the average human speaks at about 180 words per minute; when you add multiple speakers at once, spoken words per minute can exceed 300 in lightning-fast bursts. Because of this, a person trying to capture the spoken word using a traditional QWERTY keyboard will start to fall behind after about the first 10 seconds of a deposition or trial. Conversely, court reporters capture every word as it is being spoken, including punctuation and speaker identification, over sometimes mind-numbing stretches of eight hours or more, never falling behind. So how do we do this?
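A rough sketch of that arithmetic in Python, using only the rates quoted above. The function name is hypothetical, and the sketch assumes both rates stay constant, which real conversation never does:

```python
def words_behind(speech_wpm: float, capture_wpm: float, seconds: float) -> float:
    """Words left uncaptured after `seconds`, assuming both rates are constant."""
    # If the person capturing is at least as fast as the speaker, nothing piles up.
    deficit_per_minute = max(speech_wpm - capture_wpm, 0.0)
    return deficit_per_minute * seconds / 60.0


if __name__ == "__main__":
    # Rates quoted in the text: 110 wpm typist, 180 wpm average speech,
    # 300 wpm bursts with overlapping speakers, 225 wpm court reporter.
    for label, speech_wpm in [("average speech", 180), ("overlapping speakers", 300)]:
        backlog = words_behind(speech_wpm, capture_wpm=110, seconds=60)
        print(f"QWERTY typist vs. {label}: {backlog:.0f} words behind after one minute")
    print("Court reporter vs. average speech:",
          words_behind(180, capture_wpm=225, seconds=60), "words behind")
```

Under those assumptions the typist is already about a dozen words behind after ten seconds of ordinary speech, and the backlog only grows from there.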
The first six months of a court reporter’s training is spent learning a new language. We call this language machine shorthand, or simply “steno.” As stated above, a traditional QWERTY keyboard can’t capture the spoken word without falling behind almost immediately. To handle the speeds of human speech and conversation, the court reporting machine and machine shorthand were born.
Our modern-day machines are extremely complex computers with hypersensitive keyboards consisting of 22 blank keys and a long blank number bar. The spoken English language is then broken down into combinations of sounds and phrases that the court reporter will capture using keystrokes consisting of thousands of combinations of these blank keys.
The six months spent learning this new language consist of learning keystroke combinations that correspond to sounds of spoken English, and memorizing thousands of keystroke combinations that represent frequently used phrasing in the English language, as well as thousands of brief keystrokes for commonly used words and complicated medical and industry terms. Our left hand is responsible for capturing the beginning consonant sounds of a word or syllable. Our right hand is responsible for capturing the ending consonant sounds of a word or syllable. Our thumbs are responsible for the middle vowel sounds of any word or syllable. So unlike the QWERTY keyboard, where one letter is typed at a time to form a word, on our machines we type the whole word or phrase at once in a split second. This is analogous to playing chords rather than single notes on a piano keyboard. In addition, we learn keystroke combinations used for all punctuation and speaker identification.
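To make the chord idea concrete, here is a minimal Python sketch. It assumes the standard American steno key order (beginning consonants STKPWHR, vowels AO and EU on either side of an asterisk key, ending consonants FRPBLGTSDZ), which this article does not spell out. The function name is hypothetical, and the outline KAT for “cat” is only an illustration, since actual outlines depend on the theory a reporter learned:

```python
# Assumed layout: the standard American steno key order, split by which
# part of the hands presses each bank of keys.
LEFT_KEYS = "STKPWHR"      # left-hand fingers: beginning consonant sounds
THUMB_KEYS = "AO*EU"       # thumbs: vowel sounds (plus the asterisk key)
RIGHT_KEYS = "FRPBLGTSDZ"  # right-hand fingers: ending consonant sounds


def split_stroke(stroke: str) -> dict:
    """Split one written stroke such as 'KAT' into left / thumb / right keys.

    Simplification: the stroke must contain at least one vowel or '*',
    because that is what separates the left bank from the right bank here.
    """
    vowel_positions = [i for i, key in enumerate(stroke) if key in THUMB_KEYS]
    if not vowel_positions:
        raise ValueError("this sketch only handles strokes containing a vowel or '*'")
    first, last = vowel_positions[0], vowel_positions[-1]
    left, thumbs, right = stroke[:first], stroke[first:last + 1], stroke[last + 1:]
    valid = (
        all(key in LEFT_KEYS for key in left)
        and all(key in THUMB_KEYS for key in thumbs)
        and all(key in RIGHT_KEYS for key in right)
    )
    if not valid:
        raise ValueError(f"not a valid stroke in this layout: {stroke!r}")
    return {"left": left, "thumbs": thumbs, "right": right}


if __name__ == "__main__":
    # All the keys of a stroke go down together, like a piano chord.
    print(split_stroke("KAT"))
```

On a QWERTY keyboard “cat” takes three separate key presses; in this sketch the same word is a single stroke whose keys are divided among the left hand, the thumbs and the right hand.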
After the new language of machine shorthand is learned by the court reporter in training, the entirety of the next couple of years in school will be spent building speed. True speed-building is achieved when court reporters take their minds out of the process and learn to let their ears communicate directly with their hands. When the conscious mind of the court reporter gets involved and becomes hyper-focused on the words being spoken, he or she will quickly fall behind. Instead, we use the capacity of our conscious minds to survey what is happening in the room, constantly scanning with our eyes to pick up not only who is speaking at any given time, but also any body language around the room that would indicate someone is about to speak over someone else. So we need to be ready to use some of that brain capacity for retention until the simultaneous speaking is finished, and then we can go back to letting all the words go directly from ear to hand.
Now, this doesn’t mean that we are not listening to the words being spoken. In the court reporting profession, there is a big difference between listening and hearing. We are definitely listening and understanding. However, we are not focusing on each individual word. Rather, what we are “hearing” goes directly from ear to hand with no thought involved. It is this crucial balance of listening and direct ear-to-hand hearing that enables the court reporter to remain calm and collected, and to not rattle or fall behind when conversation becomes stacked or highly contentious.
COMPUTER-AIDED TRANSCRIPTION (CAT) SOFTWARE
Learning the language of machine shorthand and mastering that language on our court reporting machines is only half of the technology we use in producing incredibly accurate and immediate transcripts. The second half of court reporting technology is computer-aided transcription (CAT) software. Like our machines themselves, CAT software is extremely sophisticated, very expensive, and requires separate training for a court reporter to truly become comfortable with and proficient in all its functions and capabilities.
CAT software is run on a laptop, and that laptop will be communicating with our court reporting machines either wirelessly or through USB, and will be translating the keystrokes of machine shorthand into written English on the laptop screen in real time. But it’s not just a bunch of words showing up on the screen in little or no format, like a Word document. CAT software translates into an immediate transcript format with specific spacing, line numbers, timestamping, margins, and automatic punctuation at the ends of questions and answers.
What is actually happening here is the court reporter is instantaneously translating spoken English into machine shorthand in the form of quick keystrokes on the court reporting machine, and then the CAT software is translating those complicated keystrokes of machine shorthand into written English on a screen. There is virtually no delay in the time someone speaks and when the written words show up on the screen. These two levels of translation happen that fast and incredibly accurately.
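The second of those two translations can be pictured as a dictionary lookup. The Python sketch below is a toy model under stated assumptions, not a description of how any particular CAT product works: the dictionary entries, the stroke spellings and the function name are all hypothetical, and real software layers formatting, punctuation and far larger dictionaries on top of this idea.

```python
# Hypothetical reporter dictionary: one or more strokes -> English text.
STENO_DICTIONARY = {
    ("KAT",): "cat",
    ("TKOG",): "dog",
    ("KAT", "HROG"): "catalog",  # a two-stroke entry
}


def translate(strokes, dictionary=STENO_DICTIONARY):
    """Translate a list of strokes into English using greedy longest match."""
    longest_entry = max(len(key) for key in dictionary)
    words = []
    i = 0
    while i < len(strokes):
        # Try the longest possible multi-stroke entry first, then shorter ones.
        for size in range(min(longest_entry, len(strokes) - i), 0, -1):
            candidate = tuple(strokes[i:i + size])
            if candidate in dictionary:
                words.append(dictionary[candidate])
                i += size
                break
        else:
            # No entry matched: show the raw stroke so it can be fixed in editing.
            words.append(strokes[i])
            i += 1
    return " ".join(words)


if __name__ == "__main__":
    print(translate(["KAT", "TKOG"]))
    print(translate(["KAT", "HROG"]))
```

Trying the longest entry first is one simple design choice for letting multi-stroke words take priority over the single-stroke words they happen to contain.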
Another capability of CAT software is sending out feeds of the realtime transcript to other devices wirelessly and even remotely. The court reporter will connect his or her laptop to a private, secure wifi router, and any attorney or judge can then receive the realtime feed on their own tablet or laptop using a free downloadable app that is compatible with the court reporter’s CAT software.
This immediate feed of the court reporter’s transcript can also be sent around the world by setting up a remote login for any attorney in a location other than the venue of the proceeding to view the transcript in real time. The court reporter will set this up, and their CAT software will send the feed to the remote hosting cloud server for anyone given login access to follow along with the deposition or trial.
I could go on in further detail of all the functions of CAT software, but we now have a basic understanding of the very sophisticated and complex technologies court reporters use in capturing the spoken word and creating transcripts. More detailed functions will be explained as we delve into the exercise of comparing court reporting technology to both digital recording and voice recognition, the two technologies that have posed the greatest threat to the court reporting profession.
But just how credible are these threats?
COURT REPORTERS V. DIGITAL RECORDING
To begin the analysis of court reporting versus digital recording technologies, it will be helpful to first introduce a table to provide a quick overview of what each is capable of with respect to making a record of a deposition or trial:
Now, the table above is fairly self-explanatory, and it is quite obvious from a cursory glance that court reporting technology is vastly superior to digital recording. But when we delve further into the analysis, the shortcomings of digital recording become so apparent it is a wonder how such a limited technology could ever have been seen as a threat to court reporters in the first place.
A DIGITAL RECORDING HAS ZERO FILTERS
Unless courtrooms and conference rooms where depositions are held suddenly become professional recording studios with separate equalizers for each microphone and a talented technician running the soundboard at all times, any mic in the room will pick up any sound with absolutely no discrimination or filtration, and all this sound becomes part of the official record.
Because of this, in the table above, digital recording receives a “sort of” rating with respect to capturing testimony at 99% and above accuracy. Instead of a clear answer to a very important passage of testimony, what you may end up with on a record made by digital recording is a cough, a rustling of papers, or any number of extraneous sounds that mics will pick up indiscriminately.
Court reporters, on the other hand, hear in three dimensions and have the ability to filter sounds. We deal with all sorts of noises during any proceeding that shouldn’t be part of the record in a very simple way: We don’t even hear them. Experienced court reporters have perfected their ear-hand coordination in such a way that the only things getting through to their hands are the sounds of spoken language. There have been times when I’ve been in a deposition and the questioning attorney has requested to go off the record for a minute because of very loud sirens outside on the street, and until he said anything about it, I hadn’t even noticed them and was having no trouble at all taking down the testimony. As for a cough or a rustling of papers, we don’t even notice those types of noises, and they never interfere with the record.
Courts that have brought in recording equipment to replace human court reporters quickly recognized the problem of lack of filters on recording equipment, and during a high-profile criminal trial or extremely complex and drawn-out civil case will bring in a human reporter to be the official record. There is simply too much at stake in many cases on any given day in courtrooms across the country to risk crucial testimony being lost due to faulty equipment or a garbled or unintelligible recording.
Dealing with multiple speakers talking at once is one of the biggest challenges for court reporters, and it is something we really don’t encounter until our first job after school. All multi-voice Q&A testing in school, although dictated at very high speeds, is one voice at a time. However, having learned ear-hand coordination in school, we deal with multiple speakers at once by having one voice go directly from ear to hand, and we use retention techniques to handle the other voices, separating them out in our heads until all the testimony has been captured. Speaking at the same time is part of human conversation and happens at every deposition and every trial, so being able to handle multiple speakers at once is crucial to making a clear and accurate record.
In the same way that recording devices can’t filter and differentiate between a cough and a human voice, they also can’t separate out multiple voices at once. When more than one person is speaking simultaneously and the proceeding is being digitally recorded, what you get for a record is a cacophonous mess of human voices, completely stacked and unintelligible.
Even when all participants are speaking one at a time, another crucial aspect of a record is the identification of who exactly is speaking. Now, unless each speaker clearly identifies themselves before they start speaking, a digital recording has no way to perform speaker identification.
Court reporters, however, are trained in speaker identification, and have multiple keystrokes that will immediately identify any number of speakers before they speak. It is one of the first things we learn how to do once we learn machine shorthand, and whenever a record is made by a human court reporter, there will be no question as to who is saying what.
CREATING A WRITTEN TRANSCRIPT
Court reporters produce written transcripts. It’s what we do. We produce an immediate realtime transcript as it is happening (think closed-captioning), and we can produce a final, edited and proofread, certified transcript the same day or next day after a proceeding is completed, depending on the length of the proceeding.
A digital recording does not make a transcript. Ever. In fact, when an attorney wants a transcript from a trial that was digitally recorded, they will obtain the digital file from the court and then give it to an independent court reporting firm like us to transcribe it. And because of the usually poor quality of recorded testimony, we charge a premium for this service, and the transcript is often riddled with inaudible and unintelligible passages that would not be there had a live reporter taken down the proceeding in the first place.
I think it must be said here, too, that when I say we produce transcripts, don’t think of a huge stack of paper. Hard copy transcripts are rarely ever ordered in today’s world. What we do produce are click-searchable, indexed, highly functional digital files with hyperlinks to digital exhibits. Again, a digital recording cannot do any of this.
The rest of the above table deals with some duties of court reporters that a digital recorder obviously can’t do, like marking exhibits, immediately reading back, and swearing witnesses. These ancillary job functions are just as important as any other in preserving a complete record of a deposition or trial, and without a human there, they simply can’t be done.
It is now very clear that court reporting technology versus digital recording technology is no contest in the creation, production and preservation of an official record of a deposition or trial.
Next up is voice recognition technology. Is this a credible threat?
COURT REPORTERS V. VOICE RECOGNITION
Again, as you can see from just a cursory review of the above table, voice recognition technology stacks up just as poorly to existing court reporting technology as digital recording does. But before I get into the details of the table, let’s discuss two glaring problems with voice recognition that I believe make it highly unlikely it will have a significant role in creating records of depositions or trials at any time in the near future.
THE INHERENT FLAW
The biggest problem with voice recognition technology is that at its most fundamental level, it relies on digital recording and microphones to make it work. All the same difficulties digital recording ran into due to lack of filters will inevitably show up in the exact same way with voice recognition. Extraneous noises will interfere. Multiple speakers at once will create an unreadable transcript. Speakers will not be identified unless they state who they are before they speak. No matter how sophisticated and accurate the voice recognition technology becomes, this inherent flaw will probably always be there.
VOICE RECOGNITION DEVELOPMENT IS A BILLION DOLLAR INDUSTRY
Of all the arguments against voice recognition in the court reporting world, perhaps one of the most important is the one I’ve never really heard anyone talk about. The developers of voice recognition technology are the tech giants of the world. Microsoft, Google, IBM, Apple and Facebook all have billions invested in its research and development. Therefore, to get any kind of return on this enormous investment, the reach they have in mind is focused on personal, single-user applications that do not come close to meeting the needs of the relatively minuscule court reporting industry.
We have all seen and most likely used what they have come up with so far, and for the applications it is used in presently it is helpful, but far from perfect. Most of us have used speech-to-text to send a text or an email, and sometimes the results are laughable. But in this very informal application, it serves its purpose of hands-free written communication. We have all gotten used to deciphering the mistakes voice recognition makes by using the context of the overall message and even past experiences where we have seen the same errors. But I invite you to try something. Take out your phone, or open Dragon software on your computer if you use it, and run the speech-to-text app you have, but this time have two people talk at the same time and have another person coughing. Check the results of the written record you have. Now extrapolate that over an eight-hour deposition, and you were probably better off recording the deposition with a 40-year-old microphone and a RadioShack tape recorder and giving that tape to a court reporter to transcribe.
Again, the billions of dollars in R&D being spent on voice recognition presently is mostly for single-user, personal applications. To create voice recognition software specifically for the court reporting industry that could handle multiple speakers at once, filter extraneous noises, identify speakers before they speak, punctuate without the speaker speaking his or her punctuation, and produce an immediate transcript in the proper format would require billions more in R&D for an industry that wouldn’t even come close to generating the necessary return on investment. Even if the software license cost $1,000,000 per court reporter or court, it would not be worth it for the tech giants to develop this software. Now, over time, and building upon the research and development of others, the technology will improve. But to say it’s even close to being a competent substitute for court reporting technology as it exists today is simply not true.
CREATING A TRANSCRIPT AND HUMAN FACTORS
Just like digital recording technology, voice recognition technology cannot produce a transcript at the same level of accuracy, expediency and formatting that human court reporters can. In addition, voice recognition cannot on its own swear witnesses, mark exhibits, or stop a proceeding due to a soft-spoken witness. In many of the same ways digital recording does, voice recognition falls incredibly short of matching the expertise and advanced technology currently used by human court reporters.
THE FUTURE OF THE COURT REPORTING PROFESSION
As courts across the country continue to bring back human court reporters in lieu of experimental digital recording equipment, the future of our industry is bright. Not to mention the fact that in the world of civil litigation, deposition and discovery, which is probably the most lucrative of all the fields a court reporter can work in, human court reporters have never been replaced.
However, due to many of the misconceptions about our field mentioned at the beginning of this article, enrollment is down in court reporting programs all over the country, and in many instances schools and programs have been eliminated completely. With an aging population of currently working court reporters, there will be a shortage of court reporters within the next five years in every city in the United States to meet the needs of the steady or increasing industry of civil litigation. Will this create a crisis in due process and cause a monumental roadblock in the already congested civil dockets of our State Courts? That obviously has yet to be seen. But one thing is certain: court reporters perform an absolutely vital role in our justice system, and if our numbers dwindle and are not replaced by new reporters, the justice system we rely upon and recognize now will not exist.
“ON A LONG ENOUGH TIMELINE, THE SURVIVAL RATE FOR EVERYONE DROPS TO ZERO”
I chose this quote to open this article to demonstrate that I am not just writing this with a bias toward the field in which I work. Given enough time and human progress, everything will be different and unrecognizable. In 100, 200 or 300 years, probably none of the jobs that exist today will exist as they do now, including court reporting.
This quote is very nihilistic in its “nothing really matters” sentiment, but a little nihilism in the world can keep us grounded and focused on the now. There is an entire industry of prediction that is more often than not wrong, and is ultimately responsible for doomsday headlines we scroll through every day. Technology and AI are a very hot topic right now. But unlike any other time in history, we are being bombarded with articles written about theoretical technology with headlines that read as if the technology is already there.
If a technology is threatening your industry, do some research. Is this technology real or theoretical? Does the new technology do your job better than you do, or is it taking a step backwards? Does this new technology improve upon what you do and will it actually help you do your job better?
I guess a good rule of thumb is to keep your eye on technologies in your industry. If they don’t improve upon what it is you do now, dismiss them. If they can improve what you do and make your life easier, implement them. If they threaten to take the place of what you do, do something about it. Adapt and keep up.
As far as the court reporting industry is concerned, on this timeline in the year 2018 and into the foreseeable future, we are the undisputed champions of the capture and preservation of the record in the legal world. Sure, we suffer the blows of theoretical punches year after year, but theoretical punches don’t hurt us or knock us down. They only make us adapt and become even more prepared for the real punches that will inevitably come our way.
About the Author:
Todd L. Persson has been serving the Cleveland legal community as a court reporter since 2002 and is a Co-Founder of Cleveland-based litigation support firm Cleveland Reporting Partners, LLC. He has spoken on the future of court reporting and technology on the Stenographers World Radio national podcast, has had blogs featured nationally by the National Court Reporters Association and the American Translators Association, and has contributed articles to the Cleveland Metropolitan Bar Journal. To read Todd’s full bio, visit our Partners page.
CRP Blog Editors in Chief:
Grace Hilpert-Roach has been serving the Cleveland legal community as a court reporter since 1992 and is a Co-Founder of Cleveland Reporting Partners, LLC. To read Grace’s full bio, visit our Partners page.
Christine Zarife Green has been serving the Cleveland legal community as a court reporter since 2008 and is a Co-Founder of Cleveland Reporting Partners, LLC. To read Christine’s full bio, visit our Partners page.