Ralf de Rijcke | String to .wav in Node

String to .wav

The title 'String to .wav in Node' is a bit techie. In plain language it means 'synthesize text to speech in Node using Azure'. Hmmm, boring text. Let's get this done.

Prerequisites

Node installed (nodejs.org)
Azure subscription (azure.microsoft.com)

Create a resource

To synthesize text into speech, a Cognitive Services Speech Service resource is required. In the following steps we'll create one. I'll use Microsoft's recommended naming and tagging conventions.

Resource group

The first step is to create a new Azure resource group named rg-ttsnode-demo, located in West Europe.

az group create --name rg-ttsnode-demo --location westeurope

Cognitive Services Speech Services

The second step is to create a new Cognitive Services Speech Service resource named cog-ttsnode-demo. This example uses the Free tier (F0). Take note of the location 'westeurope'; we need it in a later step.

az cognitiveservices account create --name cog-ttsnode-demo --resource-group rg-ttsnode-demo --location westeurope --kind SpeechServices --sku F0

Get the resource key

Use the following command to display the subscription keys of our Cognitive Services Speech Service resource.

az cognitiveservices account keys list --name cog-ttsnode-demo --resource-group rg-ttsnode-demo

This will result in something like the output below. Take note of 'key1'; we need it in a later step.

{
  "key1": "f32f95d207514d22933841ee9670444e",
  "key2": "12065ae5b39a4f16b99b83657dffc60e"
}

The keys shown above cannot be used in your application; that resource has already been deleted. :clown_face:

The Node script

Create a new folder called ttsnode-demo, open a command line, and step into that folder. Run the command below to initialize a new npm package. It creates a package.json file in our ttsnode-demo folder.

npm init

Then install the Speech SDK by calling the install command.

npm install microsoft-cognitiveservices-speech-sdk

Create a file called index.js and open it with your favorite text editor. Add the following line to the top. This makes it possible to access the Speech SDK.

const sdk = require("microsoft-cognitiveservices-speech-sdk");

Next we add two constants, one for our subscription key and one for the region where our resource is located. You probably didn't note both values when I said to, but you can find them in the steps where you created the Cognitive Services Speech Service resource.

const subscriptionKey = "f32f95d207514d22933841ee9670444e";
const serviceRegion = "westeurope";
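
Hardcoding the key is fine for a throwaway demo, but it is easy to commit it by accident. Below is a minimal sketch of an alternative, assuming the key and region are exported as environment variables. The names SPEECH_KEY and SPEECH_REGION and the helper loadSpeechSettings are my own choices for this illustration, not something the SDK requires.

```javascript
// Hypothetical helper: reads the key and region from environment variables.
// SPEECH_KEY and SPEECH_REGION are assumed names, pick whatever suits you.
function loadSpeechSettings(env = process.env) {
  const subscriptionKey = env.SPEECH_KEY;
  // Fall back to the region used in this post when none is set.
  const serviceRegion = env.SPEECH_REGION || "westeurope";

  if (!subscriptionKey) {
    throw new Error("SPEECH_KEY is not set.");
  }
  return { subscriptionKey, serviceRegion };
}
```

The object returned by loadSpeechSettings() could then take the place of the two hardcoded constants above.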

Create another constant that will hold our speech configuration.

const speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);

The real work is done in the following function. It uses the Speech SDK and our constants to synthesize text into speech. Create a function that takes three parameters. The first one (text) holds the text to synthesize. The second one (filename) contains the filename used for writing our .wav output file. The last one (callback) is the callback that is invoked when the synthesis has finished.

function toSpeech(text, filename, callback) {}

In this function we set up the output file and the SpeechSynthesizer by adding these two lines to it.

var audioConfig = sdk.AudioConfig.fromAudioFileOutput(filename);
var synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

Next add the synthesizer.speakTextAsync(...) call below the previous lines. In the next step I'll explain the result => {} and error => {} parameters.

synthesizer.speakTextAsync(text, result => {}, error => {});

The result parameter contains the outcome of the synthesis: the final product or, when there was a problem, some information about that problem. Replace result => {} with the code below. Remember to call synthesizer.close() before using the generated file.

result => {
  if (result.reason !== sdk.ResultReason.SynthesizingAudioCompleted) {
    synthesizer.close();
    console.error(`Failed with ${JSON.stringify(result)}`);
    callback();
    return; // Stop here so the success path below does not run as well.
  }
  synthesizer.close();
  callback(filename);
}

As you can see, when calling the callback(...) method, I only supply the filename when the synthesis succeeds. Otherwise it will be undefined.

The error parameter holds the error information when everything went wrong. Apply the same technique as before: replace error => {} with the code below.

error => {
  synthesizer.close();
  console.error(`Failed with error ${error}`);
  callback();
}

At the end of the file we set up our required variables.

var text = "Hello this is a text.";
var filename = `${__dirname}/output.wav`;

var callback = function (filename) {
  if (filename) {
    console.log(`File ${filename} has been created.`);
  }
  else {
    console.error('There\'s no output.');
  }
};

After setup we simply call our toSpeech(text, filename, callback) method.

toSpeech(text, filename, callback);

Save the file and run it by executing the following command.

node index.js

An output.wav file will be created in your script directory. That's all it takes to convert a string to .wav in Node.
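
If you want a quick sanity check on the result, here is a minimal sketch that inspects the first bytes of the file. It assumes the SDK wrote a standard RIFF/WAVE container (which I believe is its default for file output, but I did not verify that for every version). The helper name looksLikeWav is made up for this example.

```javascript
const fs = require("fs");

// Hypothetical helper: returns true when the file starts with a RIFF/WAVE header.
function looksLikeWav(filePath) {
  const header = Buffer.alloc(12);
  const fd = fs.openSync(filePath, "r");
  try {
    // Read the first 12 bytes; anything shorter cannot be a valid header.
    const bytesRead = fs.readSync(fd, header, 0, 12, 0);
    if (bytesRead < 12) {
      return false;
    }
  } finally {
    fs.closeSync(fd);
  }
  // Bytes 0-3 hold "RIFF", bytes 8-11 hold "WAVE".
  return header.toString("ascii", 0, 4) === "RIFF" &&
         header.toString("ascii", 8, 12) === "WAVE";
}
```

Calling looksLikeWav(filename) inside the callback, after the synthesizer has been closed, tells you whether the file at least looks like audio.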

Completed example

The completed index.js file should look like this.

const sdk = require("microsoft-cognitiveservices-speech-sdk");

const subscriptionKey = "f32f95d207514d22933841ee9670444e";
const serviceRegion = "westeurope";

const speechConfig = sdk.SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);

function toSpeech(text, filename, callback) {
    var audioConfig = sdk.AudioConfig.fromAudioFileOutput(filename);
    var synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

    synthesizer.speakTextAsync(text, result => {
        if (result.reason !== sdk.ResultReason.SynthesizingAudioCompleted) {
            synthesizer.close();
            console.error(`Failed with ${JSON.stringify(result)}`);
            callback();
            return; // Stop here so the success path below does not run as well.
        }
        synthesizer.close();
        callback(filename);
    }, error => {
        synthesizer.close();
        console.error(`Failed with error ${error}`);
        callback();
    });
}

var text = "Hello this is a text.";
var filename = `${__dirname}/output.wav`;

var callback = function (filename) {
  if (filename) {
    console.log(`File ${filename} has been created.`);
  }
  else {
    console.error('There\'s no output.');
  }
};

toSpeech(text, filename, callback);
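
If you prefer promises over callbacks, the callback contract used here (a filename on success, undefined on failure) maps onto a Promise quite naturally. The sketch below is one possible way to do that. The wrapper toSpeechAsync and the stand-in fakeToSpeech are hypothetical names of mine; the stand-in only exists so the snippet runs without an Azure resource.

```javascript
// Hypothetical wrapper: turns a callback-style toSpeech(text, filename, callback)
// into a Promise. The function is passed in, so the sketch runs on its own.
function toSpeechAsync(toSpeechFn, text, filename) {
  return new Promise((resolve, reject) => {
    toSpeechFn(text, filename, result => {
      if (result) {
        resolve(result);
      } else {
        reject(new Error(`No output for ${filename}`));
      }
    });
  });
}

// Stand-in for the real toSpeech, used here only to demonstrate the wrapper.
// It "succeeds" for any non-empty text and "fails" otherwise.
function fakeToSpeech(text, filename, callback) {
  setImmediate(() => callback(text ? filename : undefined));
}

toSpeechAsync(fakeToSpeech, "Hello this is a text.", "output.wav")
  .then(file => console.log(`File ${file} has been created.`))
  .catch(err => console.error(err.message));
```

In index.js you would pass the real toSpeech function instead of fakeToSpeech.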