Text-to-speech conversion lets a web page read written text aloud. It is convenient for users and improves accessibility, which is why it has become increasingly popular. In this tutorial, we will learn how to create a text-to-speech converter using JavaScript. We will use the speech synthesis part of the Web Speech API, a browser API that is specified separately from HTML and is available in most modern browsers. The same API also defines speech recognition (speech to text), but the code in this tutorial covers only speech synthesis. By the end of this tutorial, you will have a basic understanding of how to use the Web Speech API to convert text to speech in your web applications.
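This text-to-speech converter can read your text aloud in multiple languages and in male as well as female voices, depending on which voices your browser and operating system provide. The API does not label voices by gender, so you choose a voice by its name.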
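To give a rough idea of where those languages and voices come from, here is a minimal sketch that groups a voice list by language tag. It assumes each voice object has the `name` and `lang` properties that the browser's `speechSynthesis.getVoices()` returns; `groupVoicesByLanguage` is a hypothetical helper name used only for illustration.

```javascript
// Sketch: group the voices the browser reports by their language tag.
// groupVoicesByLanguage is a hypothetical helper, not part of the Web Speech API.
function groupVoicesByLanguage(voices) {
  const byLang = new Map();
  for (const voice of voices) {
    // voice.lang is a language tag such as "en-US" or "fr-FR"
    if (!byLang.has(voice.lang)) {
      byLang.set(voice.lang, []);
    }
    byLang.get(voice.lang).push(voice.name);
  }
  return byLang;
}

// In a browser, pass in the real voice list.
// The list may still be empty if the voices have not finished loading.
if (typeof speechSynthesis !== 'undefined') {
  console.log(groupVoicesByLanguage(speechSynthesis.getVoices()));
}
```

The exact voices you see will differ between browsers and operating systems, so treat the output as a snapshot of your own machine.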
Let's get started with the coding. Create an HTML file with the .html extension and paste in the code below.
Note: Don't forget to read the explanation and understand the code; otherwise, copying and pasting the project is a waste of time. A full explanation of each file is given after the code, in the order HTML, CSS, and JS.
HTML File(maze.html):
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Text to Speech Converter</title>
  <link rel="stylesheet" href="m.css">
</head>
<body>
  <div class="container">
    <h1>Text to Speech Converter</h1>
    <textarea id="text" placeholder="Enter text..."></textarea>
    <label for="voice-select">Select a voice:</label>
    <!-- One dropdown only: maz.js fills it with the available voices -->
    <select id="voice-select"></select>
    <button id="speak-btn">Speak</button>
  </div>
  <script src="maz.js"></script>
</body>
</html>
CSS File(m.css):
body {
  font-family: Arial, sans-serif;
  margin: 0;
  padding: 0;
  display: flex;
  justify-content: center;
  align-items: center;
  height: 100vh;
  background-color: #f2f2f2;
}

.container {
  text-align: center;
}

textarea {
  width: 80%;
  height: 200px;
  margin-bottom: 20px;
  padding: 10px;
  border: 1px solid #ccc;
  border-radius: 5px;
  resize: vertical;
}

button {
  padding: 10px 20px;
  font-size: 1rem;
  background-color: #007bff;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
}

button:hover {
  background-color: #0056b3;
}
JavaScript File(maz.js):
const textArea = document.getElementById('text');
const speakButton = document.getElementById('speak-btn');
const voiceSelect = document.getElementById('voice-select');
let voices = [];

// Fetch available voices and populate the voice select dropdown
function populateVoiceList() {
  voices = speechSynthesis.getVoices();
  voiceSelect.innerHTML = '';
  voices.forEach((voice) => {
    const option = document.createElement('option');
    option.textContent = `${voice.name} (${voice.lang})`;
    option.setAttribute('data-lang', voice.lang);
    option.setAttribute('data-name', voice.name);
    voiceSelect.appendChild(option);
  });
}

populateVoiceList();
// Some browsers load voices asynchronously, so repopulate when the list changes
if (speechSynthesis.onvoiceschanged !== undefined) {
  speechSynthesis.onvoiceschanged = populateVoiceList;
}

// Speak text using the selected voice
speakButton.addEventListener('click', () => {
  const text = textArea.value.trim();
  if (text !== '') {
    const selectedVoice = voiceSelect.selectedOptions[0].getAttribute('data-name');
    const utterance = new SpeechSynthesisUtterance(text);
    const voice = voices.find((v) => v.name === selectedVoice);
    if (voice) {
      utterance.voice = voice;
      speechSynthesis.speak(utterance);
    }
  }
});
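The script above calls populateVoiceList() once and again when the voices change, because the voice list may not be ready when the page first loads. As an alternative, the same idea can be sketched as a promise that resolves once voices are available. This is only an illustration under stated assumptions: `loadVoices` and its `timeoutMs` parameter are hypothetical names, and `synth` is expected to behave like the browser's `speechSynthesis` object.

```javascript
// Sketch: resolve with the voice list once the browser has loaded it.
// loadVoices and timeoutMs are hypothetical; synth should behave like window.speechSynthesis.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices); // the list is already available
      return;
    }
    // Safety net: resolve with whatever is available if the event never fires
    const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);
    // Not every browser exposes addEventListener here, so check before using it
    if (typeof synth.addEventListener === 'function') {
      synth.addEventListener('voiceschanged', () => {
        clearTimeout(timer);
        resolve(synth.getVoices());
      }, { once: true });
    }
  });
}

// Browser-only usage, guarded so the helper can also be loaded elsewhere.
if (typeof speechSynthesis !== 'undefined') {
  loadVoices(speechSynthesis).then((voices) => {
    console.log(`Loaded ${voices.length} voices`);
  });
}
```

Passing the synthesizer in as an argument, rather than reading the global directly, keeps the helper easy to try out with a stand-in object.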

Explanation of HTML Code:
<!DOCTYPE html>: Declares the document type, which tells the browser to treat the page as HTML5.
<html lang="en">: Opens the HTML document and specifies the language of the document (English).
<head>: Contains meta-information about the document, such as character encoding and viewport settings.
<meta charset="UTF-8">: Specifies the character encoding for the document as UTF-8, which supports most characters worldwide.
<meta name="viewport" content="width=device-width, initial-scale=1.0">: Sets the viewport to the width of the device and sets the initial zoom level to 1.0, so that the page scales correctly on different devices.
<title>Text to Speech Converter</title>: Sets the title of the document, which is displayed in the browser tab.
<link rel="stylesheet" href="m.css">: Links an external CSS file (m.css) to style the HTML content.
</head>: Closes the head section of the document.
<body>: Opens the body section of the document, which contains the visible content of the page.
<div class="container">: Defines a container div to hold the content of the page.
<h1>Text to Speech Converter</h1>: Displays a heading at the top of the page.
<textarea id="text" placeholder="Enter text..."></textarea>: Creates a textarea element where the user can enter text to be converted to speech. The placeholder attribute provides a hint to the user.
<label for="voice-select">Select a voice:</label>: Displays a label for the voice selection dropdown.
<select id="voice-select">: Creates the dropdown (select) element for choosing a voice. The page needs only one, and it can start empty because maz.js fills it with every available voice, male and female alike. Do not add a second select with the same id for "Male" and "Female": an id must be unique within a page, getElementById returns only the first match, and a select may contain only option and optgroup elements, not bare text.
<button id="speak-btn">Speak</button>: Creates a button element that, when clicked, triggers the text-to-speech conversion.
</div>: Closes the container div.
<script src="maz.js"></script>: Links an external JavaScript file (maz.js) that contains the logic for the text-to-speech conversion.
</body>: Closes the body section of the document.
</html>: Closes the HTML document.
Explanation of CSS Code:
body: Selects the entire body of the HTML document.
font-family: Arial, sans-serif;: Sets the font family to Arial, a common sans-serif font, and falls back to a generic sans-serif font if Arial is not available.
margin: 0;: Removes any default margin around the body.
padding: 0;: Removes any default padding around the body.
display: flex;: Turns the body into a flex container.
justify-content: center;: Centers the flex items along the main axis (horizontally, with the default flex direction).
align-items: center;: Centers the flex items along the cross axis (vertically, with the default flex direction).
height: 100vh;: Sets the height of the body to 100% of the viewport height, which gives the content room to be vertically centered.
background-color: #f2f2f2;: Sets the background color of the body to a light gray.
.container: Selects any element with the class name "container".
text-align: center;: Centers the text inside the container horizontally.
textarea: Selects all textarea elements.
width: 80%;: Sets the width of the textarea to 80% of its containing element.
height: 200px;: Sets the height of the textarea to 200 pixels.
margin-bottom: 20px;: Adds a bottom margin of 20 pixels to the textarea.
padding: 10px;: Adds 10 pixels of padding inside the textarea.
border: 1px solid #ccc;: Adds a 1-pixel solid border around the textarea with a light gray color.
border-radius: 5px;: Rounds the corners of the textarea.
resize: vertical;: Allows the user to resize the textarea vertically only.
button: Selects all button elements.
padding: 10px 20px;: Adds 10 pixels of padding on the top and bottom, and 20 pixels of padding on the left and right, inside the button.
font-size: 1rem;: Sets the font size of the button to 1 rem, which equals the root element's font size (16 pixels by default in most browsers).
background-color: #007bff;: Sets the background color of the button to blue.
color: white;: Sets the text color of the button to white.
border: none;: Removes the border around the button.
border-radius: 5px;: Rounds the corners of the button.
cursor: pointer;: Changes the mouse cursor to a pointer when hovering over the button.
button:hover: Selects the button while the mouse hovers over it.
background-color: #0056b3;: Changes the background color of the button to a darker blue when hovering.
Explanation of JavaScript Code:
const textArea = document.getElementById('text');: Gets the textarea element with the id "text" and stores it in the textArea variable.
const speakButton = document.getElementById('speak-btn');: Gets the button element with the id "speak-btn" and stores it in the speakButton variable.
const voiceSelect = document.getElementById('voice-select');: Gets the select element with the id "voice-select" and stores it in the voiceSelect variable.
let voices = [];: Initializes an empty array to store the available voices.
function populateVoiceList() { ... }: Defines a function called populateVoiceList to fetch the available voices and populate the voice select dropdown with them.
voices = speechSynthesis.getVoices();: Gets the available voices using the speechSynthesis.getVoices() method and stores them in the voices array.
voiceSelect.innerHTML = '';: Clears the existing options in the voice select dropdown.
voices.forEach((voice) => { ... });: Iterates over the voices array and creates an option element for each voice, appending it to the voice select dropdown. Each option shows the voice name and language, and stores both in data-name and data-lang attributes.
populateVoiceList();: Calls the populateVoiceList function to populate the voice select dropdown when the page loads.
if (speechSynthesis.onvoiceschanged !== undefined) { ... }: Checks whether the speechSynthesis object has an onvoiceschanged property. If it does, it sets the onvoiceschanged handler to call populateVoiceList whenever the voice list changes. This matters because some browsers load voices asynchronously, so the first getVoices() call may return an empty list.
speakButton.addEventListener('click', () => { ... });: Adds a click event listener to the speak button. When the button is clicked, it executes the function to speak the text using the selected voice.
const text = textArea.value.trim();: Gets the text from the textarea with leading and trailing whitespace removed. Nothing is spoken if the result is empty.
const selectedVoice = voiceSelect.selectedOptions[0].getAttribute('data-name');: Gets the selected voice's name from the data-name attribute of the selected option in the voice select dropdown.
const utterance = new SpeechSynthesisUtterance(text);: Creates a new SpeechSynthesisUtterance object with the text to be spoken.
const voice = voices.find((v) => v.name === selectedVoice);: Finds the voice object in the voices array that matches the selected voice's name.
if (voice) { ... }: Checks if a matching voice was found. If a voice was found, it sets the utterance's voice property to the selected voice and speaks the utterance using speechSynthesis.speak(utterance);.
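One consequence of the final check is that nothing is spoken when no matching voice is found. A possible variation, sketched below, is to fall back to the browser's default voice and to set the speaking rate and pitch as well. `configureUtterance`, `speakWithFallback`, and the `options` object are hypothetical names introduced for illustration; `voice`, `lang`, `rate`, and `pitch` are standard SpeechSynthesisUtterance properties.

```javascript
// Sketch: configure an utterance, falling back to the browser's default voice.
// configureUtterance and its options object are hypothetical, for illustration only.
function configureUtterance(utterance, voices, voiceName, options = {}) {
  const voice = voices.find((v) => v.name === voiceName);
  if (voice) {
    utterance.voice = voice;
    utterance.lang = voice.lang; // keep the language in step with the chosen voice
  }
  // If no voice matched, utterance.voice is left alone and the browser uses its default.
  utterance.rate = options.rate ?? 1;   // allowed range 0.1 to 10, 1 is normal speed
  utterance.pitch = options.pitch ?? 1; // allowed range 0 to 2, 1 is normal pitch
  return utterance;
}

// Browser-only usage; call this from a click handler such as the Speak button's.
function speakWithFallback(text, voiceName) {
  const utterance = configureUtterance(
    new SpeechSynthesisUtterance(text),
    speechSynthesis.getVoices(),
    voiceName,
    { rate: 0.9 }
  );
  speechSynthesis.cancel(); // stop anything that is still being spoken
  speechSynthesis.speak(utterance);
}
```

Whether a silent failure or a default voice is the better behavior depends on your application, so treat this as one option rather than a required change.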
In conclusion, creating a text-to-speech converter using JavaScript is a valuable skill that can enhance the functionality of your web applications. The Web Speech API provides a simple way to add speech synthesis to your applications, making them more accessible and user-friendly. By following the steps outlined in this tutorial, you can create your own text-to-speech converter and explore the possibilities of integrating speech into your projects.




