
Using voice-to-text to save time and energy and boost efficiency at work

A slew of transcription apps are helping journos, lawyers, medics and others do away with manually taking down notes by converting and, in some cases, translating speech to text


Namrata Kohli  |  New Delhi


Journalist Amit Tyagi works for a leading digital news platform and has to engage with several people through the day over Zoom calls, telephone discussions and webinars. What makes his life simpler is a transcription service called Speechnotes. Says Tyagi, “I speak, it notes down, as its name suggests. Whatever I speak gets written down automatically. In the process I save a great deal of time.”

Lawyer Lubna Yusuf uses 'Live Transcribe' by Google and the 'Samsung Notes' app, both of which came pre-installed on her Galaxy Note phone. Says Yusuf, “I use transcription services to record certain client meetings, especially the complex ones that I need to refer to again. It helps me track the details later. I also use it like a virtual typist or secretary to take down notes when I'm thinking aloud.”

People are aware of the power of voice and don’t want a single breath to go to waste. According to Bobble AI Data Intelligence, 18 per cent of users use 'speech-to-text' daily. Says Rahul Prasad, Co-founder, Bobble AI, a conversation media platform: “There has been a massive increase in the adoption of transcription in the recent past. Bobble voice API enables developers to convert speech to text by using neural network models across diverse accents and dialects. Our ASR engine works with nine major Indian languages: English, Hindi, Bengali, Punjabi, Marathi, Gujarati, Kannada, Tamil, and Telugu.”

Man versus machine

Transcription services allow you to save time and energy. How much time does it take to manually transcribe an hour of audio? The industry standard is four hours of transcription time for one hour of clear audio, or a 4:1 ratio. That's four minutes of transcription time for every minute of recorded speech. But with voice-to-text technology, things change dramatically. Says Sri Lanka-based Inoka Dias, who runs coaching classes: “I use a transcription service called Temi to record my lectures, which span 45 minutes to an hour, and I get the lectures transcribed in less than an hour. A lot depends on the number of words. In coaching sessions, you do get spells of silence that can run up to a minute or two at a stretch, but you are charged per word. So far I’ve paid $7-9.” Such services are hugely popular with students and teachers globally. Says Dias: “I find them 90 per cent accurate. Only sometimes certain names and a few difficult words aren't transcribed properly, and that too because of the accent. The best part is that Temi eliminates noise and gives a time stamp, so that makes life a lot easier.”

Table 1: Transcription voice-to-text application companies/developers

| No. | Application | Company/developer | Type of subscription |
|---|---|---|---|
| 1 | Cloud Speech-to-Text API | Google | Up to 60 minutes free* |
| 2 | Amazon Transcribe | Amazon | After free trial up to 60 minutes for 12 months, $0.0004 per second |
| 3 | Azure Cognitive Services AI platform 'Transcribe in Word' | Microsoft | 5 audio hours free per month** |
| 4 | Dragon Dictation | Nuance Communications, Inc. | Free |
| 5 | Speech to Text Converter - Voice Typing App | Nazmain Apps | Free |
| 6 | Speechnotes - Speech to Text Notepad | Speechlogger | $7.27 lifetime or 7 days free then $0.93 per month (only for extra features and ad-free experience) |
| 7 | Speech to Text | Xenom Apps | Free |
| 8 | Voice Notebook - Continuous Speech to Text | Simple Seo Solutions | $3.30 for lifetime (for premium features and ad-free experience) |
| 9 | WhatsMic Keyboard: Voice to Text Converter App | APK Kajal | $9.25 for lifetime or $1.85 per month (for premium features and ad-free experience) |
| 10 | Otter Voice Meeting Notes (for English) | Otter.ai | $108.37 annually or $8.59 monthly (for premium) |
| 11 | Translate All - Text, Voice & Camera Translator | Asitis | Free |
| 12 | Voice Notes | Pacific Fisher Group | $2.51 lifetime for ad-free experience |
| 13 | Speech Texter - Speech to Text | Speech Texter | Free |
| 14 | Write SMS by Voice | UX Apps | $2.25 lifetime for ad-free experience |
| 15 | Voice Typing Keyboard - Speech to Text Converter | Appezite Studio | $1.12 lifetime for ad-free experience |
| 16 | Translate All Text Voice Conversation Translator | Infinity Apps Sol | $4.76 for 1 month or $13.08 for 3 months or $25.11 for 6 months or $48.90 for 1 year (for unlimited features and ad-free experience) |
| 17 | Voice Notes - Speech to Text Notes | Innovative World | Free |
| 18 | English Voice Typing: Voice to Text Converter | Solace Apps | Free |
| 19 | Temi | Rev | After first free transcript up to 45 minutes, $0.25 per audio minute |


*See Table 2; **See Table 3; Source: TechSci Research

What's available

Technology majors such as Google, Microsoft and Amazon all have such an offering. Initial trials are mostly free and you are charged per audio file later. There are some that continue to be free, such as Dragon Dictation by Nuance Communications, Nazmain Apps’ Voice Typing app, Xenom Apps' Speech to Text, Asitis’ Translate All, and Speech Texter. Some, like WhatsMic and Speechlogger, charge a one-time fee with unlimited access.

Says Prashanth Rao, Partner, Deloitte India: “Most operating systems within the phone have voice-to-text embedded in the OS. iOS calls this dictation. Apple is further enhancing iOS 14, which has just been released, by bringing translation from one language to another. Google has a cloud-based speech-to-text application programming interface (API) which can be used by any app developer to embed this feature into their apps. Amazon has a similar API-based service on cloud called Amazon Transcribe. You also have a bunch of voice-to-text apps on Play Store and iOS store.” There are innovations expected in this space. Says Rao: “We may soon be seeing collaborative tools such as MS Teams and Zoom coming up with a feature to enable users to capture the conversation as notes. A bunch of note-taking apps are also looking to add this feature.”

Accuracy is paramount

The clearer the audio, the higher the accuracy. Imagine an audio clip with two speakers with distinctly different accents, using a few technical terms, industry jargon and niche brand names. The accuracy of transcription can vary from 90 to 100 per cent, but that range makes all the difference. Generally, a few factors affect accuracy. These include audio clarity, audio recording quality, number of speakers, background noise and regional accents. It also depends on the “coherence” of the speakers: do they talk over each other? Do they speak quickly or slowly? Do they finish a thought before beginning the next sentence? If it’s a specialised field such as medical or legal, a certain amount of research may be required to double-check names, places and specialised terminology. Other challenges may be related to the use of short forms and out-of-dictionary words.
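The article does not say how the accuracy percentages are measured. One common way to put a number on it is word error rate: the count of word substitutions, insertions and deletions needed to turn the machine transcript into a trusted reference, divided by the number of words in the reference. The sketch below assumes that metric; the function names and sample sentences are illustrative only and are not taken from any of the apps mentioned.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 minus the word error rate, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical example: one wrong word in a ten-word sentence.
REFERENCE = "please send the signed contract to the client by friday"
MACHINE = "please send the signed contact to the client by friday"
print(f"accuracy: {word_accuracy(REFERENCE, MACHINE):.0%}")
```

On this measure, a "90 per cent accurate" transcript still has roughly one word in ten to fix, which is why names and jargon need a manual check.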

Speed is another crucial factor. Given enough time, we could all transcribe audio with close to 100 per cent accuracy, but these services are designed to take the manual labour out of transcription. From the moment you hit “upload” to the second the transcription is finished, the timer is running. Transcription apps are rated on how fast they can convert voice to text with maximum accuracy.

Table 2: Cloud speech-to-text API pricing

| Feature | Standard models (all models except enhanced video and phone call): 0-60 minutes | Standard models: over 60 minutes, up to 1 million minutes | Enhanced models (video and phone call): 0-60 minutes | Enhanced models: over 60 minutes, up to 1 million minutes |
|---|---|---|---|---|
| Speech recognition (without data logging, the default) | Free | $0.006/15 seconds | Free | $0.009/15 seconds |
| Speech recognition (with data logging opt-in) | Free | $0.004/15 seconds | Free | $0.006/15 seconds |
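To see what the rates in Table 2 mean for a real workload, here is a rough calculator. It is a sketch under stated assumptions: that the first 60 minutes are free, that everything beyond that is billed in 15-second increments rounded up, and that the table's figures are current. The function and constant names are made up for illustration and are not part of any Google library.

```python
import math

# Rate per 15-second increment beyond the free 60 minutes, copied from Table 2.
# Keys are (model, data_logging_opt_in).
RATES_PER_15_SECONDS = {
    ("standard", False): 0.006,
    ("standard", True): 0.004,
    ("enhanced", False): 0.009,
    ("enhanced", True): 0.006,
}
FREE_SECONDS = 60 * 60                 # first 60 minutes are free
MAX_SECONDS = 1_000_000 * 60           # Table 2 only covers up to 1 million minutes


def cloud_stt_cost(total_seconds: int, model: str = "standard",
                   data_logging: bool = False) -> float:
    """Estimated charge in USD for a given amount of audio."""
    if total_seconds > MAX_SECONDS:
        raise ValueError("Table 2 gives no rate beyond 1 million minutes")
    billable_seconds = max(0, total_seconds - FREE_SECONDS)
    # Assumption: partial increments are rounded up to a full 15 seconds.
    increments = math.ceil(billable_seconds / 15)
    return round(increments * RATES_PER_15_SECONDS[(model, data_logging)], 4)


# Two hours of audio on the standard model without data logging.
print(cloud_stt_cost(2 * 60 * 60))
```

Opting in to data logging lowers the rate in both columns, so the same two hours would cost less with `data_logging=True`.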


Transcription process

The process of transcribing an audio file is simple and logical. Visit the app or website, choose "Select Audio/Video File" to pick a file from your phone or computer, and upload it. Enter your email address. Within a few minutes, you'll receive an email telling you that your transcript is ready. You can then download the transcript in your preferred format, such as Word document, PDF, TXT, SRT or VTT.

Many websites allow you to place your first order for free. For example, with Temi you will be able to place your first free trial for files spanning 45 minutes or less. After your first free transcript, Temi orders will cost $0.25 per audio minute. Says Sebastian Lanser, Temi Support: “Once your first order has been placed, you'll be prompted to set up your account when you open your transcript in the delivery email. Once your account is set up, you'll have full access to the Temi Editor to review and edit your transcript as well as our various file formats for download.”
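The trade-off between typing it out yourself and paying for a machine transcript comes down to simple arithmetic. The sketch below uses only figures quoted in this article: the 4:1 industry ratio for manual transcription and Temi's $0.25 per audio minute after a first free transcript of 45 minutes or less. Two points are assumptions on our part: that partial minutes are billed proportionally, and that a first order longer than 45 minutes is charged in full. The function names are hypothetical.

```python
MANUAL_RATIO = 4              # minutes of typing per minute of clear audio
TEMI_RATE_PER_MINUTE = 0.25   # USD per audio minute after the first free transcript
TEMI_FREE_LIMIT_MINUTES = 45  # first order is free if the file is this long or shorter


def manual_minutes(audio_minutes: float) -> float:
    """Time a person would need to type out the audio at the 4:1 ratio."""
    return audio_minutes * MANUAL_RATIO


def temi_cost(audio_minutes: float, first_order: bool = False) -> float:
    """Estimated Temi charge in USD for one file."""
    if first_order and audio_minutes <= TEMI_FREE_LIMIT_MINUTES:
        return 0.0
    # Assumption: no rounding to whole minutes.
    return round(audio_minutes * TEMI_RATE_PER_MINUTE, 2)


lecture = 60  # a one-hour lecture
print(f"manual: {manual_minutes(lecture) / 60:.0f} hours of typing")
print(f"Temi:   ${temi_cost(lecture):.2f}")
```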

What about video-to-text?

The good thing is that video platforms have started offering this service, at least in their advanced versions.

Take the case of Zoom. The platform offers cloud recording transcripts in its business and enterprise plans. Says Lola Garcia Santos, Account Executive, Zoom Video Communications: “The business license which costs $199.90 per host licence a month or $1,999 per host license a year allows 300 participants per meeting and transcriptions on cloud recordings. Zoom has an ‘audio transcript option’ under cloud recording, to automatically transcribe the audio of a meeting or webinar recorded to the cloud. After this transcript is processed, it appears as a separate vtt text file in the list of recorded meetings. In addition, you have the option to display the transcript text within the video itself, similar to a closed-caption display.”

The transcript is divided into sections, each with a timestamp that shows how far into the recording that portion of the text was recorded. You can edit the text to more accurately capture the words, or to add capitalisation and punctuation, which are not captured by the transcript.
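Because the transcript arrives as a VTT text file, its timestamped sections are easy to read with a few lines of code. The sketch below is a minimal parser for the basic WebVTT layout (a "WEBVTT" header followed by blocks with a timing line and the spoken text). The sample transcript is invented for illustration, and real files from any particular platform may carry extra fields that this does not handle.

```python
import re

# A timing line looks like "00:00:01.000 --> 00:00:04.500"; the hours part is optional.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Blocks are separated by blank lines; the header block has no timing line.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                groups = match.groups()
                start = _to_seconds(*groups[:4])
                end = _to_seconds(*groups[4:])
                spoken = " ".join(part.strip() for part in lines[index + 1:])
                cues.append((start, end, spoken))
                break
    return cues


# Hypothetical transcript, not taken from a real meeting.
SAMPLE_VTT = """WEBVTT

1
00:00:01.000 --> 00:00:04.500
Speaker One: Good morning, everyone.

2
00:00:05.000 --> 00:00:09.250
Speaker Two: Let's review the action items.
"""

for start, end, spoken in parse_vtt(SAMPLE_VTT):
    print(f"[{start:7.2f}s] {spoken}")
```

Once the sections are in this form, it is straightforward to search them for a keyword or jump to the matching point in the recording.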

Table 3: Microsoft Azure Cognitive Services AI platform 'Transcribe in Word' pricing

| Tier | Feature | Pricing |
|---|---|---|
| Free - Web/Container (one concurrent request) | Standard | 5 audio hours free per month |
| | Custom | 5 audio hours free per month; endpoint hosting: 1 model free per month |
| | Conversation Transcription Multichannel Audio | 5 audio hours free per month |
| Standard - Web/Container (20 concurrent requests) | Standard | $1 per audio hour |
| | Custom | $1.40 per audio hour; endpoint hosting: $0.0538 per model per hour |
| | Conversation Transcription Multichannel Audio | $2.10 per audio hour |


For me-time too

These AI-powered transcription tools can be useful in one’s personal life apart from professional work. Take the case of IT professional Hanif Sohrab, who uses voice-to-text apps when on the move, especially while jogging. Says Sohrab: “I use Microsoft OneNote to record my thoughts during my morning walks. I talk to my phone about what all needs to get done, such as urgent emails or WhatsApp replies that require immediate attention, or general thoughts that come to my head while walking, related to an article I may be writing, or some software logic that I was thinking about. I use Microsoft Cortana while at work in front of my personal computer (PC). In my experience, the Microsoft OneNote app is fairly accurate whereas Microsoft Cortana, which is used on my PC, needs editing.”

Some people use transcription services for storytelling and capturing interesting anecdotes that others randomly share. Take the case of techie Sami Iqram who shares in his blog how he was at his friend's place when the friend’s grandmother narrated a story from her childhood. Says Iqram: “I could see that she was excited about sharing it with everyone but there was a problem—she narrated the story in Spanish, a language I don’t understand. I pulled out Google Translate to transcribe the speech as it was happening. As she was telling the story, the English translation appeared on my phone so that I could follow—it fostered a moment of understanding that would otherwise have been lost.”

All it takes is your voice. Transcription services nudge you to harness the power of your voice. Now more than ever, we’re all very busy—juggling family, work, friends, and whatever else life throws our way. These are tools that allow us to immerse and revel in the conversation at hand, the idea, the thought and “live” in the moment. Transcription services allow us to live mindfully.

https://www.business-standard.com/article/pf/using-voice-to-text-to-save-time-and-energy-and-boost-efficiency-at-work-120092500652_1.html




