What is Transcription?


This article explains what transcription is, compares manual and automatic transcription methods, gives tips for getting better accuracy from automatic transcription software, and describes tools that speed up the reviewing and editing process.


  1. What is Transcription?
  2. Manual Transcription
  3. What is Transcription Software?
  4. What is Automatic Transcription?
  5. Benefits of Automatic Transcription Software
  6. Human Versus Automatic Transcription Services: Which One to Use?
  7. How to Get Good Accuracy with Automatic Transcription?
  8. Tools for Speeding Up Reviewing and Editing Process



What is Transcription?


The word transcription is used in various fields, but in linguistics it describes the process of converting (transcribing) speech to text. Transcribing audio produces a transcript, a file containing the text. Transcription is often needed for interviews, seminars, meetings, conferences, and roundtables, as well as for adding subtitles to videos. Converting speech to text is a demanding task that requires speed, knowledge of grammar, patience, and fast typing skills. Transcription is used by researchers, journalists, scholars, and students, and it can be done manually or with automatic transcription software.

Transcription is used by a wide range of industries and experts for professional and academic purposes. Basically, anyone who needs to convert spoken information to written form needs transcription. Some of them are:

- Researchers, students writing a thesis, and marketers need to analyze large numbers of audio or video interviews. Spoken data must be transcribed before it can be analyzed for statistics and insights.
- Journalists probably conduct more interviews than anybody else. They usually publish the interview in text form, so it is essential to transcribe the audio files.
- Healthcare, HR, and sales professionals need to transcribe interviews for reporting.
- Media production companies that create videos, podcasts, movies, and TV shows use transcripts to create subtitles or closed captions.
- Government organizations and companies need to transcribe meetings for archiving and reporting, etc.


Manual Transcription


You can transcribe the file yourself using software and tools that facilitate the process, but it will take a long time: a professional transcriber needs at least 4 hours of work for every hour of recording, and a beginner needs much more. In many cases, this method is simply too slow.
Alternatively, you can hire an experienced transcriber, who will work faster than you, but an hour of recording still requires several hours of work. This option is also expensive, and urgent transcription costs even more.



What is Transcription Software?


Transcription software is a computer program that assists humans in converting speech to text. The user is given tools to play and pause an audio fragment, type the spoken text, and then repeat the process for the whole recording. The main components of transcription software are an audio player (with play, pause, rewind, playback speed control, and other functions) and a text editor. The software can be used with specialized hardware, such as foot pedals for playing and pausing, and headphones. It is used to transcribe all kinds of speech recordings, such as meetings, interviews, and lectures, and to add subtitles to videos. In the latter case, timing information is also added for each piece of text.
While all these tools help the user, they don’t speed up the most time-consuming part of the process: listening to the audio and typing the text manually. A professional transcriber still needs about 4 hours to transcribe 1 hour of audio, while a non-professional might need 6-7 hours. This is where automatic transcription software helps.
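To put these figures into perspective, the short Python sketch below turns them into a rough effort estimate for a batch of recordings. The rates are simply the numbers quoted above (the 6.5 is an assumed midpoint of the 6-7 hour range); real transcription times vary with audio quality, the number of speakers, and the level of detail required.

```python
# Rough manual-transcription effort estimate based on the figures quoted above.
# These rates are approximations, not measured constants.
HOURS_OF_WORK_PER_AUDIO_HOUR = {
    "professional": 4.0,
    "non_professional": 6.5,  # assumed midpoint of the 6-7 hour range
}


def manual_effort_hours(audio_hours: float, skill: str = "professional") -> float:
    """Estimate the hours of listening and typing needed for `audio_hours` of audio."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return audio_hours * HOURS_OF_WORK_PER_AUDIO_HOUR[skill]


if __name__ == "__main__":
    # Ten one-hour interviews:
    print(manual_effort_hours(10))                      # 40.0
    print(manual_effort_hours(10, "non_professional"))  # 65.0
```

In other words, ten one-hour interviews mean roughly a full working week of typing for a professional, and well over that for a non-professional.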



What is Automatic Transcription?


Automatic transcription software includes speech recognition (speech-to-text) technology, which automatically transcribes speech with reasonable accuracy in a short time, and provides a text editor with an integrated audio player for easier reviewing and editing of the transcript. It dramatically reduces the time and effort needed to transcribe speech to text in the following ways:

  • The audio is automatically transcribed with 85-90% accuracy (for good quality audio) in a very short time, leaving reviewing and error correction to the user.
  • A special text editor plays the audio fragment for any selected part of the text, facilitating review and correction.
  • Timestamps are automatically determined for every word, so closed captions can be easily extracted.
  • When the audio plays, every played word is highlighted in the transcript for easier review.
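Because every word carries a timestamp, turning a transcript into closed captions is mostly a matter of grouping words and formatting times. The Python sketch below assumes a hypothetical word list with start and end times in seconds (real speech recognition services each use their own output format) and writes it out as SRT, a common subtitle format whose timestamps look like 00:01:02,500. Grouping by a fixed number of words is a simplification; subtitle tools usually also break captions at pauses and punctuation and limit line length.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_time(chunk[0].start)} --> {srt_time(chunk[-1].end)}\n"
            f"{' '.join(w.text for w in chunk)}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues


if __name__ == "__main__":
    words = [
        Word("Hello", 0.0, 0.4), Word("and", 0.5, 0.6), Word("welcome", 0.6, 1.1),
        Word("to", 1.2, 1.3), Word("the", 1.3, 1.4), Word("show", 1.4, 1.9),
    ]
    print(words_to_srt(words, max_words=3))
```

With three words per cue, this example produces two captions: "Hello and welcome" from 00:00:00,000 to 00:00:01,100, and "to the show" from 00:00:01,200 to 00:00:01,900.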



Benefits of Automatic Transcription Software


Automatic transcription software offers many important features that benefit users.

Captures the Details

You may have a notebook with you, but it is almost impossible to write down all the critical information. Later you may need to recall a book or author the teacher recommended, or a detail you missed in a meeting because you were distracted, only to find that it is not in your notes. There is nothing to worry about. Automatic transcription software converts the audio to text from start to finish, so the transcript includes all the details and you can find any word or topic you want. It may also help people with a short attention span who find it hard to focus on a task for long without getting distracted.

Keyword search

Because you have the full transcript of the audio, you can easily navigate through the text using keyword search (Ctrl+F) and catch up on anything you might have missed during the recording. Additionally, by clicking on any sentence related to the topic you are trying to remember, you can play the part of the audio associated with that sentence.
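Keyword search becomes more useful when each match also points back into the audio. The Python sketch below assumes a hypothetical transcript stored as (word, start time in seconds) pairs and returns the start time of every occurrence of a word or phrase, which an editor could then use to start playback at that point. Matching here is case-insensitive and ignores punctuation; it is an illustration, not how any particular product implements search.

```python
import re

# Hypothetical word-level transcript: (word, start time in seconds) pairs.
Transcript = list[tuple[str, float]]


def normalize(token: str) -> str:
    """Lowercase and drop punctuation so that 'Budget,' matches 'budget'."""
    return re.sub(r"[^\w']", "", token.lower())


def find_keyword(transcript: Transcript, keyword: str) -> list[float]:
    """Return the start time of every occurrence of a word or phrase."""
    query = [normalize(t) for t in keyword.split()]
    if not query:
        return []
    words = [normalize(w) for w, _ in transcript]
    hits = []
    for i in range(len(words) - len(query) + 1):
        if words[i:i + len(query)] == query:
            hits.append(transcript[i][1])  # position where playback could start
    return hits


if __name__ == "__main__":
    transcript = [
        ("The", 0.0), ("budget", 0.3), ("was", 0.8), ("approved.", 1.0),
        ("Next,", 5.2), ("the", 5.6), ("budget", 5.8), ("review.", 6.3),
    ]
    print(find_keyword(transcript, "budget"))         # [0.3, 5.8]
    print(find_keyword(transcript, "Budget review"))  # [5.8]
```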

Saves Your Time and Money

Of course, you can try to transcribe the recording yourself or hire a human transcriber, but that will cost you much more time and money. With automatic transcription software, you can transcribe a one-hour file in less than an hour and for several times less money.

Identifies Different Speakers

Automatic transcription software can use speaker diarisation technology, which detects how many different people are speaking in the recording and which parts of the audio belong to each of them. This makes it much easier to attribute quotes to the right speakers. You only need to make sure that the recording quality is good.
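Diarisation typically labels speakers anonymously (for example, Speaker 1 and Speaker 2): it tells you that the voices differ, not who the people are. The Python sketch below assumes a hypothetical diarised output of (speaker label, text) segments in time order, merges consecutive segments from the same speaker into one turn, and optionally replaces the anonymous labels with names you supply. The SPEAKER_1 label format is an assumption for illustration.

```python
from itertools import groupby
from typing import Optional


def to_turns(segments: list[tuple[str, str]],
             names: Optional[dict[str, str]] = None) -> list[str]:
    """Merge consecutive segments from the same speaker into labelled turns.

    segments: (speaker_label, text) pairs in time order.
    names: optional mapping from anonymous labels to real names.
    """
    names = names or {}
    turns = []
    # groupby only groups *adjacent* items, so each group is one uninterrupted turn.
    for label, group in groupby(segments, key=lambda seg: seg[0]):
        text = " ".join(part for _, part in group)
        turns.append(f"{names.get(label, label)}: {text}")
    return turns


if __name__ == "__main__":
    segments = [
        ("SPEAKER_1", "Thanks for joining us."),
        ("SPEAKER_1", "Let's start with your background."),
        ("SPEAKER_2", "Happy to be here."),
    ]
    for turn in to_turns(segments, {"SPEAKER_1": "Interviewer"}):
        print(turn)
```

Labels without a supplied name (here SPEAKER_2) are kept as-is, so the output stays usable while you identify the remaining speakers.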

Helps with a summary of the event

The transcript is a documented account of the event containing everything that was discussed, so you can get an overview of the event without spending a lot of time listening to the audio.

Helps you publish quickly

Researchers, journalists, and other users of interview transcription services need to publish their original work as soon as possible, before others do. A qualitative study, for example, requires transcribing many hours of audio/video interviews and observations. Transcribing them manually or hiring a professional transcriber might be too slow and expensive, whereas automatic transcription speeds up this process many times over.

Transcribes large archives of audio/video

News agencies and TV and radio broadcasters need keyword search across thousands of hours of audio/video recordings. Call centers need transcripts of large archives of recorded calls for quality evaluation and statistics. Automatic transcription makes such huge archives accessible, even though they would be impossible to transcribe manually.

Makes audio/video accessible to people with disabilities

As speech recognition technology provides timestamps for every word in the automatically generated transcript, video subtitles can be extracted easily. While automatically generated subtitles are not always perfect, they are still better than nothing for people with hearing disabilities. That’s why automatic subtitling is already in use in some video hosting services like YouTube for some languages.

Protects sensitive information

Some company meetings, psychology consultation sessions, or healthcare research interviews may include sensitive information that cannot be shared with human transcribers. Even though human transcribers sign NDAs and other contracts not to disclose personal or confidential information, many patients still feel uncomfortable knowing that a third person will listen to their voice recordings. With automatic transcription software, no one except the user is involved in the transcription process, and the data is password-protected. In addition, GDPR-compliant automatic transcription service providers are required by law to delete all of the user's data from their servers, leaving no logs, once the user has finished transcribing and deletes the data (either from the web interface or by asking the support team to delete it).



Human Versus Automatic Transcription Services: Which One to Use?


Given the huge demand and the large amount of content being created, manual transcription takes too much time and effort. Still, although automatic transcription software is gaining popularity, manual transcription remains heavily used in many areas.
Manual transcription relies on people rather than technology: a transcriber listens to the audio file and types the text, usually with higher accuracy. Automatic transcription software, in contrast, uses automatic speech recognition (ASR) technology to convert the audio to text automatically. Let’s look at several factors and see how the two services compare on each.

Speed and productivity

When it comes to speed, automatic software is the clear winner. Automatic transcription services transcribe an audio file in a matter of minutes, whereas human transcribers may need several days for the same file.


Accuracy

Humans are more accurate at transcribing (though this may change in the future). Even though speech recognition technology is more advanced than ever, ASR still makes mistakes. The main causes are heavy accents or dialects, background noise, and several people speaking at the same time. Another possible drawback of automatic transcription software is that it also transcribes filler words and repeated words that are not important, whereas a human transcriber removes them during transcription.
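A common way to quantify speech recognition accuracy is the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the automatic transcript into a correct reference transcript, divided by the number of words in the reference. Accuracy is then often quoted as 1 - WER, so a WER of 0.10 corresponds to roughly 90% accuracy. The Python sketch below computes WER with a standard word-level edit distance; it lowercases words but does not strip punctuation, so real evaluations usually normalize the text further first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "please send the final report to the whole team today"
    print(word_error_rate(reference, "please send the final report to the hole team today"))     # 0.1
    print(word_error_rate(reference, "um please send the final report to the whole team today"))  # 0.1
```

Note that in the second call the filler word "um" is counted as an insertion error, because the reference transcript omits it.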


Cost

Cost is one of the main reasons people opt for automatic transcription services. With automatic transcription, you pay many times less than for a human service, which adds up to significant savings, especially if you have many audio files to transcribe. Some automatic services also offer a free plan, so you can test the software for free or even use it regularly within certain time limits. Human transcription, on the other hand, is becoming harder to obtain: it is difficult to find people who can transcribe audio professionally, there is a growing shortage of human transcribers on the market, and with fewer of them available, costs can be high.

Information safety

As stated above, if you have sensitive information that you do not want to give to a third party such as a human transcriber, you can use automatic software to create the transcript yourself quickly.

To sum it up, the choice between the two services depends on your priorities. If accuracy is your top priority and you are not much concerned about time and cost, choose a human service. This also applies to low-quality or noisy recordings, where automatic transcription makes so many errors that correcting them would be harder than transcribing manually from scratch. If your audio quality is good enough and you want a much more cost- and time-effective service, automatic transcription is the way to go.



How to Get Good Accuracy with Automatic Transcription?


Although automatic transcription software makes it easier to transcribe audio files, some steps are required to get an accurate transcript. The person who records the interview, focus group discussion, meeting, etc. should ensure the following in the recording process:

  • Make sure the recording device is in good working order and records voices without problems. Microphones should be clean and in good condition. Make a test recording and listen to it to confirm that the quality is high. The device should also have enough battery charge and memory to record the whole interview.
  • Keep the microphone close to the speakers, no more than one or two meters away. If you are a student recording a lecture from the last row, don’t expect high accuracy from automatic transcription software. Hold the recording device close to the speaker or place it on the table in front of them. If there are a few speakers, it’s best to seat them around a table or in a circle so that everyone is close to the microphone. In large meetings or conferences, speakers should come up to the microphone instead of speaking or asking questions from their seats. Ideally, as in conference rooms, every participant has a separate microphone and each voice is recorded on a separate audio channel.
  • Make sure only one person speaks at a time. Automatic transcription accuracy drops dramatically when two or more people speak at once, because overlapping speech is currently one of the hardest research challenges in speech recognition.
  • Background noise and echo strongly affect the quality of the recording and, therefore, of the transcript. Record in a quiet place with as little background noise as possible, and avoid conducting interviews in crowded places, with background music, or in large rooms with echo.

Whether the transcription is done manually or automatically, following these steps gives you a good chance of a high-quality recording, which is the main factor in a high-quality transcript. Even a human transcription service will cost you more if the audio quality is poor.



Tools for Speeding Up Reviewing and Editing Process


The first step in transcribing your file with automatic transcription software is to upload your audio or video file and have it transcribed automatically with speech-to-text technology. This usually takes less time than the duration of the file. The draft transcript then opens in a special text editor with an integrated audio player, which knows the timing of each word in the text. In the Voicedocs Transcription Editor, you can review and edit the transcript in two ways:

  1. You can use the Play all feature of the audio player. While the audio plays, every played word is highlighted in the text, so you can easily spot incorrectly recognized words. Playback pauses when you click on a word to edit it; you can then resume playing from that point to the end.
  2. You can navigate through the text with the arrow keys on the keyboard. The audio of the text fragment around the cursor plays automatically each time you move to a new fragment, so you can check it for errors. Fragment length can be changed from the Toolbar, and pressing Ctrl+Space plays the current fragment again.

In both methods, you can change the audio playback speed from the Toolbar for faster reviewing. A good practice is to play the whole audio again from start to finish as a final review once all edits are done.
After editing and reviewing, you can press the Export button and download the transcript to your device in formats such as DOC, TXT, SRT, or JSON. You can also include timestamps for every sentence in the downloaded transcript.
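Sentence-level timestamps can be derived from word-level ones. The Python sketch below assumes hypothetical (word, start, end) triples, splits them into sentences at words ending in ".", "?" or "!", and serializes the result with the standard json module. The field names start, end, and text are illustrative assumptions, not the schema of the Voicedocs JSON export.

```python
import json


def sentences_with_timestamps(words: list[tuple[str, float, float]]) -> list[dict]:
    """Group (word, start, end) triples into sentences with start/end times in seconds."""
    sentences, current = [], []
    for word in words:
        current.append(word)
        if word[0].endswith((".", "?", "!")):  # sentence-final punctuation
            sentences.append(current)
            current = []
    if current:  # trailing words without final punctuation
        sentences.append(current)
    return [
        {"start": s[0][1], "end": s[-1][2], "text": " ".join(w[0] for w in s)}
        for s in sentences
    ]


if __name__ == "__main__":
    words = [
        ("Welcome", 0.0, 0.5), ("everyone.", 0.5, 1.0),
        ("Shall", 1.6, 1.8), ("we", 1.8, 1.9), ("begin?", 1.9, 2.4),
    ]
    print(json.dumps(sentences_with_timestamps(words), indent=2))
```

Splitting on punctuation alone will also break at abbreviations such as "Dr.", so a production exporter would need a more careful sentence splitter.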
The subtitle editor tool, which offers more features for adding subtitles to videos, will be discussed in a separate article.