Ralf de Rijcke | Keyword recognition on steroids

Keyword Recognition

Last updated on 22-12-2020

Keyword recognition in continuous speech is not easy, and there is frequently a terrible delay between saying the keyword and the execution of the related action. With Microsoft Cognitive Services Speech Studio it's getting much easier and faster. By generating custom keywords (pre-trained keyword models), there's no need to create and train a neural network just to recognize a keyword. And best of all, the keyword recognition can be done offline.

In this blog I'll demonstrate how to use the Speech Studio feature of Cognitive Services.

Prerequisites

A Microsoft Azure subscription is required to access the Microsoft Cognitive Services Speech Studio. A basic understanding of the .NET Core CLI and the C# language is also helpful. The Speech SDK is available for other languages as well, such as JavaScript, Python, Java and Go, though feature support (including keyword recognition) can differ per language.

Getting started

The idea is to show you how to create a console app that recognizes a keyword in speech. This keyword recognizer will use a pre-trained model to recognize a keyword.

First we need to create a custom keyword project, which will contain our trained keyword model. This can be done with a few simple steps in Cognitive Services Speech Studio. After that we build an application that uses our model to recognize keywords in speech.

Create a custom keyword

Open Azure Cognitive Services Speech Studio.
Below Voice Assistants select Custom Keyword.
Create a New project with the name KeywordRecognizer and select English (United States) as Language.
When the project is created, select the name of the newly created project from the Custom Keyword project list.
Select Train model, give the model a name, and type the word you would like to use as keyword in the Keyword field.
For this example I'll use 'Computer' both as Name and as Keyword.

Effective keywords

The best results are achieved with keywords of 4 to 7 syllables. Keywords with fewer syllables, like 'hello', cause many more false triggers than, for example, 'jumping donkey'. It's also better not to choose words that are frequently used in ordinary conversation. For more information on creating an effective keyword, see the keyword guidelines in the Speech service documentation.

Select Next followed by Train on the 'Choose pronunciations' screen.
Now wait until the model is trained. The status of your keyword will then say Succeeded.
For common words this will take a minute or two. More complex words can take up to a few hours.

And we're done, let's write some code.

Create a Console Application

The first step is creating a new project called KeywordRecognizer. The project can be created using the .NET Core CLI.

dotnet new console -n KeywordRecognizer

Add the Microsoft.CognitiveServices.Speech package to the project for Cognitive Services support.

dotnet add KeywordRecognizer package Microsoft.CognitiveServices.Speech

Load table

Add pre-trained keyword to your project

From the KeywordRecognizer keyword list select (click) the name of your pre-trained model.
Download the model by selecting Download. This will download a .zip archive containing a .table file.
Extract the downloaded .zip archive to the root of your project directory and rename the extracted .table file to Computer.table.

Set CopyToOutputDirectory for Computer.table to PreserveNewest. Do this by hand in the solution explorer or add the following <ItemGroup> to your project's .csproj file.

<ItemGroup>
  <None Update="Computer.table">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>

Recognize keyword once

In this scenario all functionality runs locally.

Add the following usings to the Program.cs file.

using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

Create a static async Task RecognizeKeywordAsync() method below the Main method. This method will handle all keyword recognition.

static async Task RecognizeKeywordAsync()
{
}

In the RecognizeKeywordAsync method first load the Computer.table keyword recognition model from file.

var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");
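
One thing to watch out for: a relative path like "Computer.table" is resolved against the current working directory, which is not necessarily the output directory the file gets copied to. Below is a minimal sketch, assuming you'd rather look for the model next to the compiled binary. The ModelPath class and its Resolve method are hypothetical helpers for this post, not part of the Speech SDK.

```csharp
using System;
using System.IO;

static class ModelPath
{
    // Hypothetical helper: builds an absolute path to a file that was copied
    // to the output directory (see CopyToOutputDirectory) and fails early
    // with a readable message when the file is missing.
    public static string Resolve(string fileName)
    {
        string path = Path.Combine(AppContext.BaseDirectory, fileName);
        if (!File.Exists(path))
        {
            throw new FileNotFoundException(
                $"Keyword model '{fileName}' was not found in the output directory.", path);
        }
        return path;
    }
}
```

With this helper the model could be loaded with KeywordRecognitionModel.FromFile(ModelPath.Resolve("Computer.table")), regardless of the directory the program is started from.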

Then we configure the microphone input and create the keyword recognizer.

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var keywordRecognizer = new Microsoft.CognitiveServices.Speech.KeywordRecognizer(audioConfig);

Next we await the keyword recognizer until it recognizes the keyword. When that happens, stop the recognition and print the result text.

KeywordRecognitionResult result = await keywordRecognizer.RecognizeOnceAsync(keywordModel);
await keywordRecognizer.StopRecognitionAsync();
Console.WriteLine($"Keyword {result.Text} recognized.");

Now change Main to an async method and let it return a Task. From the Main method call RecognizeKeywordAsync.

static async Task Main(string[] args)
{
    await RecognizeKeywordAsync();
}

Run the program, and say 'Computer' when you think the time is right. The keyword is recognized once, immediately after you finish saying it.

Below is the complete example.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace KeywordRecognizer
{
    class Program
    {
        static async Task Main(string[] args)
        {
            await RecognizeKeywordAsync();
        }

        static async Task RecognizeKeywordAsync()
        {
            var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");

            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
            using var keywordRecognizer = new Microsoft.CognitiveServices.Speech.KeywordRecognizer(audioConfig);

            KeywordRecognitionResult result = await keywordRecognizer.RecognizeOnceAsync(keywordModel);
            await keywordRecognizer.StopRecognitionAsync();
            Console.WriteLine($"Keyword {result.Text} recognized.");
        }
    }
}
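
The example above prints the result text without looking at why the recognizer returned. A possible variant, as a sketch only, checks the Reason of the result first. The KeywordOnce class and the WaitForKeywordAsync name are made up for this illustration; the SDK types and the ResultReason.RecognizedKeyword value are the ones from the Microsoft.CognitiveServices.Speech package used above.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

static class KeywordOnce
{
    // Hypothetical variant of RecognizeKeywordAsync: returns true only when
    // the SDK reports that the keyword itself was recognized.
    public static async Task<bool> WaitForKeywordAsync(string modelFile)
    {
        var keywordModel = KeywordRecognitionModel.FromFile(modelFile);

        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var keywordRecognizer =
            new Microsoft.CognitiveServices.Speech.KeywordRecognizer(audioConfig);

        KeywordRecognitionResult result =
            await keywordRecognizer.RecognizeOnceAsync(keywordModel);

        if (result.Reason == ResultReason.RecognizedKeyword)
        {
            Console.WriteLine($"Keyword {result.Text} recognized.");
            return true;
        }

        // Any other reason (for example ResultReason.Canceled) means
        // the keyword was not recognized.
        Console.WriteLine($"No keyword recognized, reason: {result.Reason}.");
        return false;
    }
}
```

Calling await KeywordOnce.WaitForKeywordAsync("Computer.table") from Main would then tell you whether to continue with the related action.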

Recognize keyword continuously

To recognize continuously, a Cognitive Services Speech Services resource is required.

Azure naming and tagging conventions

In the following example I'll use Microsoft's Recommended naming and tagging conventions.

Resource group

Create a new resource group with name rg-keywordrecognizer-demo.

az group create --name rg-keywordrecognizer-demo --location westeurope

Cognitive Services Speech Services

Create a new Cognitive Services Speech Service with name cog-keywordrecognizer-demo. In the following example the Free tier is used.

az cognitiveservices account create --name cog-keywordrecognizer-demo --resource-group rg-keywordrecognizer-demo --location westeurope --kind SpeechServices --sku F0

List SKUs

Use the following command to list all possible SpeechServices SKUs in your preferred location.

az cognitiveservices account list-skus --kind SpeechServices --location westeurope

Get the subscription key and location

In code we need to know which key to use to access the Speech Service and what its location is. We do this by supplying a subscription key and a location.

Subscription key

Use the following command to get the subscription keys of the Cognitive Services Speech Service we created earlier.

az cognitiveservices account keys list --name cog-keywordrecognizer-demo --resource-group rg-keywordrecognizer-demo

The result is something like

{
  "key1": "f32f95d207514d22933841ee9670444e",
  "key2": "12065ae5b39a4f16b99b83657dffc60e"
}

The value of key1 (f32f95d207514d22933841ee9670444e) is used as subscription key in the following code examples.
The keys shown in the result above cannot be used in your application; that resource has already been deleted. 🤡

Location

The location is where the Cognitive Services Speech Service is located. This value is the same as we supplied when we created our cog-keywordrecognizer-demo Speech Service. In my example I use the location westeurope.

Back to the code

The skeleton of the code is basically the same as in the previous example, as shown below.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace KeywordRecognizer
{
    class Program
    {
        public static async Task Main(string[] args)
        {
            ...
        }
    }
}

Create a static async Task RecognizeKeywordContinuouslyAsync() method below the Main method. This method will handle the continuous keyword recognition.

static async Task RecognizeKeywordContinuouslyAsync()
{
}

In the RecognizeKeywordContinuouslyAsync method first configure the subscription.

var config = SpeechConfig.FromSubscription("{SUBSCRIPTION_KEY}", "{LOCATION}");
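
Pasting the key straight into the source makes it easy to commit it by accident. A minimal sketch of one alternative is to read both values from environment variables. The variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings class are assumptions for this example, not something the SDK requires.

```csharp
using System;

static class SpeechSettings
{
    // Assumed variable names; pick whatever fits your environment.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    // Returns the subscription key and location, or throws when one is missing.
    public static (string Key, string Region) Load()
    {
        return (Require(KeyVariable), Require(RegionVariable));
    }

    static string Require(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable {name} is not set.");
        }
        return value.Trim();
    }
}
```

The configuration line then becomes var (key, region) = SpeechSettings.Load(); followed by SpeechConfig.FromSubscription(key, region).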

Then add a TaskCompletionSource to use for waiting until completion.

var completionSource = new TaskCompletionSource<int>();

Next, set up our SpeechRecognizer. The Recognized handler will be called when the keyword is recognized. Based on the recognized text, the program will continue, or exit when 'Computer exit.' is recognized.

using var recognizer = new SpeechRecognizer(config);
recognizer.Recognized += (s, e) =>
{
    switch (e.Result.Text)
    {
        case "Computer exit.":
            completionSource.TrySetResult(0);
            break;
        default:
            Console.WriteLine($"{e.Result.Reason} {e.Result.Text}");
            break;
    }
};
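
If the subscription key or location is wrong, the Recognized handler never fires and the program just sits there waiting. As a sketch, a handler on the Canceled event of the SpeechRecognizer could print the details and release the waiting task. The RecognizerDiagnostics class and the ExitOnCancel name are hypothetical, and using 1 as the result value is an arbitrary choice for this example.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

static class RecognizerDiagnostics
{
    // Hypothetical helper: logs why recognition was canceled and completes
    // the TaskCompletionSource so the program can shut down.
    public static void ExitOnCancel(
        SpeechRecognizer recognizer,
        TaskCompletionSource<int> completionSource)
    {
        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"Canceled: {e.Reason}");
            if (e.Reason == CancellationReason.Error)
            {
                // ErrorDetails usually hints at an invalid key or location.
                Console.WriteLine($"{e.ErrorCode}: {e.ErrorDetails}");
            }
            completionSource.TrySetResult(1);
        };
    }
}
```

It would be called once, right after creating the recognizer: RecognizerDiagnostics.ExitOnCancel(recognizer, completionSource);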

Then load the Computer.table keyword recognition model from file.

var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");

And finally start recognizing continuously.

await recognizer.StartKeywordRecognitionAsync(keywordModel).ConfigureAwait(false);
// Wait until the Recognized handler signals completion, without blocking a thread.
await completionSource.Task.ConfigureAwait(false);
await recognizer.StopKeywordRecognitionAsync().ConfigureAwait(false);

Run program

Say, for example, 'Computer hello', or just keep talking after saying the keyword 'Computer'. The keyword is recognized immediately and the program keeps listening until you finish the sentence. When 'Computer exit.' is recognized the program will exit.

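A sentence ends with a . (dot)

Take note of the trailing . (dot) after a sentence: the recognized text ends with a dot, which is why the exit command is matched against 'Computer exit.' and not 'Computer exit'.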
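
Comparing against the exact string, including the dot and the casing, is a bit fragile. A small sketch of a more forgiving comparison follows. The CommandText class is a hypothetical helper, and which punctuation the service actually returns is an assumption here; only trailing '.', '!' and '?' are stripped.

```csharp
using System;

static class CommandText
{
    // Removes surrounding whitespace and trailing sentence punctuation.
    public static string Normalize(string recognized)
    {
        if (recognized == null)
        {
            return string.Empty;
        }
        return recognized.Trim().TrimEnd('.', '!', '?').TrimEnd();
    }

    // Compares recognized text with a command, ignoring case and the
    // trailing punctuation handled by Normalize.
    public static bool IsCommand(string recognized, string command)
    {
        return string.Equals(
            Normalize(recognized),
            Normalize(command),
            StringComparison.OrdinalIgnoreCase);
    }
}
```

In the Recognized handler the switch could then be replaced by a check like CommandText.IsCommand(e.Result.Text, "Computer exit").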