Keyword recognition on steroids
Last updated on 22-12-2020
Keyword recognition in continuous speech is not easy and frequently suffers from a terrible delay between saying the keyword and executing the related action. With Microsoft Cognitive Services Speech Studio it's getting much easier and faster. With generated custom keywords (pre-trained word models) there's no need to create and train a neural network just to recognize a keyword. And best of all, the keyword recognition can be done offline.
In this blog I'll demonstrate how to use the Speech Studio feature of Cognitive Services.
Prerequisites
A Microsoft Azure subscription is required to access the Microsoft Cognitive Services Speech Studio. A basic understanding of the .NET Core CLI and the C# language is also helpful. Other languages are supported as well, such as JavaScript, Python, Java, Go and R.
Getting started
The idea is to show you how to create a console app that recognizes a keyword in speech. This keyword recognizer will use a pre-trained model to recognize a keyword.
First we need to create a custom keyword project which will contain our trained keyword model. This can be done with a few simple steps in Cognitive Services Speech Studio. After that we build an application that uses our model to recognize keywords in speech.
Create a custom keyword
- Open Azure Cognitive Services Speech Studio.
- Below Voice Assistants select Custom Keyword.
- Create a New project with the name KeywordRecognizer and select English (United States) as Language.
- When the project is created, select the name of the newly created project from the Custom Keyword project list.
- Select Train model and give the model a name followed by typing the word you would like to have as keyword in the Keyword field.
For this example I'll use 'Computer' both as Name and as Keyword.
Effective keywords
The best results are achieved with keywords of 4 to 7 syllables. Words with fewer syllables, like 'hello', cause many more false triggers than, for example, 'jumping donkey'. It's also better not to choose words that are frequently used in ordinary conversation. For more information on creating an effective keyword, see the given guidelines.
- Select Next followed by Train on the 'Choose pronunciations' screen.
- Now wait until the model is trained. The status of your keyword will say Succeeded.
For common words this will take a minute or two. More complex words can take up to a few hours.
And we're done, let's write some code.
Create a Console Application
The first step is creating a new project called KeywordRecognizer. The project can be created using the .NET Core CLI.
dotnet new console -n KeywordRecognizer
Add the Microsoft.CognitiveServices.Speech package to the project for Cognitive Services support.
dotnet add KeywordRecognizer package Microsoft.CognitiveServices.Speech
Load table
Add pre-trained keyword to your project
From the KeywordRecognizer keyword list, select the name of your pre-trained model.
Download the model by selecting Download. This will download a .zip archive containing a .table file.
Extract the downloaded .zip archive to the root of your project directory and rename the extracted .table file to Computer.table.
Set CopyToOutputDirectory for Computer.table to PreserveNewest. Do this by hand in the solution explorer or add the following <ItemGroup> to your project's .csproj file.
<ItemGroup>
  <None Update="Computer.table">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
Recognize keyword once
In this scenario all functionality runs locally.
Add the following usings to the Program.cs file.
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
Create a static async Task RecognizeKeywordAsync() method below the Main method. This method will handle all keyword recognition.
static async Task RecognizeKeywordAsync()
{
}
In the RecognizeKeywordAsync method, first load the Computer.table keyword recognition model from file.
var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");
Then we configure the microphone input and create the keyword recognizer.
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var keywordRecognizer = new Microsoft.CognitiveServices.Speech.KeywordRecognizer(audioConfig);
Next we await the keyword recognizer until it recognizes the keyword. When that happens, stop the recognition and print the result text.
KeywordRecognitionResult result = await keywordRecognizer.RecognizeOnceAsync(keywordModel);
await keywordRecognizer.StopRecognitionAsync();
Console.WriteLine($"Keyword {result.Text} recognized.");
Now change Main to an async method and let it return a Task. From the Main method call RecognizeKeywordAsync.
static async Task Main(string[] args)
{
    await RecognizeKeywordAsync();
}
Run the program and say 'Computer' when you think the time is right. The keyword is recognized once, immediately after you have finished saying it.
Below is the complete example.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace KeywordRecognizer
{
    class Program
    {
        static async Task Main(string[] args)
        {
            await RecognizeKeywordAsync();
        }

        static async Task RecognizeKeywordAsync()
        {
            var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");
            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
            using var keywordRecognizer = new Microsoft.CognitiveServices.Speech.KeywordRecognizer(audioConfig);
            KeywordRecognitionResult result = await keywordRecognizer.RecognizeOnceAsync(keywordModel);
            await keywordRecognizer.StopRecognitionAsync();
            Console.WriteLine($"Keyword {result.Text} recognized.");
        }
    }
}
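The example above prints the result unconditionally. A slightly more defensive variant could check the result's Reason before reporting a match. The following is a minimal sketch, assuming the same Computer.table model and default microphone as above; the wording of the messages is my own, and I have not verified which other reasons can occur in practice for this offline scenario.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace KeywordRecognizer
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");
            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
            using var keywordRecognizer = new Microsoft.CognitiveServices.Speech.KeywordRecognizer(audioConfig);

            // Wait until the recognizer returns a result.
            KeywordRecognitionResult result = await keywordRecognizer.RecognizeOnceAsync(keywordModel);
            await keywordRecognizer.StopRecognitionAsync();

            // Only report success when the SDK says a keyword was recognized.
            if (result.Reason == ResultReason.RecognizedKeyword)
            {
                Console.WriteLine($"Keyword {result.Text} recognized.");
            }
            else
            {
                // Assumption: any other reason is treated as "no keyword" here.
                Console.WriteLine($"No keyword recognized (reason: {result.Reason}).");
            }
        }
    }
}
```

The behavior is the same as before when the keyword is heard; the extra branch only makes it visible when the recognizer returns for a different reason.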
Recognize keyword continuously
To recognize keywords continuously, a Cognitive Services Speech Services resource is required.
Azure naming and tagging conventions
In the following example I'll use Microsoft's Recommended naming and tagging conventions.
Resource group
Create a new resource group with name rg-keywordrecognizer-demo.
az group create --name rg-keywordrecognizer-demo --location westeurope
Cognitive Services Speech Services
Create a new Cognitive Services Speech Service with name cog-keywordrecognizer-demo. In the following example the Free tier is used.
az cognitiveservices account create --name cog-keywordrecognizer-demo --resource-group rg-keywordrecognizer-demo --location westeurope --kind SpeechServices --sku F0
List SKUs
Use the following command to list all available SpeechServices SKUs in your preferred location.
az cognitiveservices account list-skus --kind SpeechServices --location westeurope
Get the subscription key and location
In code we need to know which key gives access to the Speech Service and where the service is located. This is done by supplying a subscription key and a location.
Subscription key
Use the following command to get the subscription keys of the Cognitive Services Speech Service created earlier.
az cognitiveservices account keys list --name cog-keywordrecognizer-demo --resource-group rg-keywordrecognizer-demo
The result is something like:
{
  "key1": "f32f95d207514d22933841ee9670444e",
  "key2": "12065ae5b39a4f16b99b83657dffc60e"
}
The value of key1 (f32f95d207514d22933841ee9670444e) is used as subscription key in the following code examples.
The keys shown in the result above cannot be used in your application; that resource has already been deleted.
Location
The location is where the Cognitive Services Speech Service is located. This value is the same as the one we supplied when we created our cog-keywordrecognizer-demo Speech Service. In my example I use the location westeurope.
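Since the subscription key is a secret, it may be preferable not to paste it into the source file. One possible approach, sketched below, is to read both values from environment variables. The SpeechSettings class and the variable names SPEECH_KEY and SPEECH_LOCATION are hypothetical names chosen for this illustration; they are not part of the Speech SDK or the Azure CLI.

```csharp
using System;

namespace KeywordRecognizer
{
    // Hypothetical helper for this post, not part of the Speech SDK.
    static class SpeechSettings
    {
        // Assumed variable names; pick whatever fits your environment.
        public const string KeyVariable = "SPEECH_KEY";
        public const string LocationVariable = "SPEECH_LOCATION";

        // Returns the trimmed value of the variable, or throws when it is missing or empty.
        public static string GetRequired(string variableName)
        {
            string value = Environment.GetEnvironmentVariable(variableName);
            if (string.IsNullOrWhiteSpace(value))
            {
                throw new InvalidOperationException(
                    $"Environment variable '{variableName}' is not set.");
            }
            return value.Trim();
        }
    }
}
```

In the code that follows, the {SUBSCRIPTION_KEY} and {LOCATION} placeholders could then be replaced by SpeechSettings.GetRequired(SpeechSettings.KeyVariable) and SpeechSettings.GetRequired(SpeechSettings.LocationVariable).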
Back to the code
The skeleton of the code is basically the same as in the previous example, as shown below.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace KeywordRecognizer
{
    class Program
    {
        public static async Task Main(string[] args)
        {
            ...
        }
    }
}
Create a static async Task RecognizeKeywordContinuouslyAsync() method below the Main method. This method will handle the continuous keyword recognition.
static async Task RecognizeKeywordContinuouslyAsync()
{
}
In the RecognizeKeywordContinuouslyAsync method first configure the subscription.
var config = SpeechConfig.FromSubscription("{SUBSCRIPTION_KEY}", "{LOCATION}");
Then add a TaskCompletionSource to use for waiting until completion.
var completionSource = new TaskCompletionSource<int>();
Next, set up the SpeechRecognizer. The Recognized event handler is called when the keyword is recognized. Based on the recognized text, the program continues, or exits when 'Computer exit.' is recognized.
using var recognizer = new SpeechRecognizer(config);
recognizer.Recognized += (s, e) =>
{
    switch (e.Result.Text)
    {
        case "Computer exit.":
            completionSource.TrySetResult(0);
            break;
        default:
            Console.WriteLine($"{e.Result.Reason} {e.Result.Text}");
            break;
    }
};
Then load the Computer.table keyword recognition model from file.
var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");
And finally start recognizing continuously.
await recognizer.StartKeywordRecognitionAsync(keywordModel).ConfigureAwait(false);
Task.WaitAny(new[] { completionSource.Task });
await recognizer.StopKeywordRecognitionAsync().ConfigureAwait(false);
Run program
Say, for example, 'Computer hello', or just keep talking after saying the keyword 'Computer'. The keyword is recognized immediately and the program keeps listening until you have finished the sentence. When 'Computer exit.' is recognized, the program exits.
A sentence ends with a . (dot)
Take note of the ending . (dot) after a sentence. The handler compares against the full text 'Computer exit.', including that dot.
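Because the handler compares against the exact string 'Computer exit.', a difference in punctuation or casing would prevent a match. A more forgiving comparison could strip punctuation and ignore case first. The sketch below shows one way to do that; CommandMatcher and its methods are hypothetical helpers written for this post, not part of the Speech SDK, and I have not verified exactly which punctuation the service returns.

```csharp
using System;
using System.Linq;

namespace KeywordRecognizer
{
    // Hypothetical helper for this post, not part of the Speech SDK.
    static class CommandMatcher
    {
        // Removes punctuation and collapses repeated spaces: "Computer, exit." becomes "Computer exit".
        public static string Normalize(string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                return string.Empty;
            }

            var withoutPunctuation = new string(text.Where(c => !char.IsPunctuation(c)).ToArray());
            var words = withoutPunctuation.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            return string.Join(" ", words);
        }

        // Compares recognized text to a command, ignoring punctuation and casing.
        public static bool IsCommand(string recognizedText, string command)
        {
            return string.Equals(Normalize(recognizedText), Normalize(command),
                StringComparison.OrdinalIgnoreCase);
        }
    }
}
```

Inside the Recognized handler, the switch on e.Result.Text could then be replaced by a check such as CommandMatcher.IsCommand(e.Result.Text, "computer exit"). The trade-off is that sentences differing only in punctuation are treated as the same command.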
Below is the complete example.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace KeywordRecognizer
{
    class Program
    {
        public static async Task Main(string[] args)
        {
            await RecognizeKeywordContinuouslyAsync();
        }

        static async Task RecognizeKeywordContinuouslyAsync()
        {
            var config = SpeechConfig.FromSubscription("{SUBSCRIPTION_KEY}", "{LOCATION}");
            var completionSource = new TaskCompletionSource<int>();
            using var recognizer = new SpeechRecognizer(config);
            recognizer.Recognized += (s, e) =>
            {
                switch (e.Result.Text)
                {
                    case "Computer exit.":
                        completionSource.TrySetResult(0);
                        break;
                    default:
                        Console.WriteLine($"{e.Result.Reason} {e.Result.Text}");
                        break;
                }
            };
            var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");
            await recognizer.StartKeywordRecognitionAsync(keywordModel).ConfigureAwait(false);
            Task.WaitAny(new[] { completionSource.Task });
            await recognizer.StopKeywordRecognitionAsync().ConfigureAwait(false);
        }
    }
}
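The example above has no feedback for the case where the Speech Service rejects the request, for instance because of a wrong subscription key or location. The SDK's SpeechRecognizer exposes a Canceled event that can be used for this. The following is a minimal sketch under the assumption that the event is raised in this keyword scenario once the service is contacted; the exit code 1 and the message wording are my own choices, and the placeholders still need to be replaced with your own values.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace KeywordRecognizer
{
    class Program
    {
        public static async Task<int> Main(string[] args)
        {
            var config = SpeechConfig.FromSubscription("{SUBSCRIPTION_KEY}", "{LOCATION}");
            var completionSource = new TaskCompletionSource<int>();
            using var recognizer = new SpeechRecognizer(config);

            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Text == "Computer exit.")
                {
                    completionSource.TrySetResult(0);
                }
                else
                {
                    Console.WriteLine($"{e.Result.Reason} {e.Result.Text}");
                }
            };

            // Report errors (for example authentication failures) and stop waiting.
            recognizer.Canceled += (s, e) =>
            {
                Console.WriteLine($"Canceled: {e.Reason}");
                if (e.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"{e.ErrorCode}: {e.ErrorDetails}");
                    completionSource.TrySetResult(1); // assumed exit code for "failed"
                }
            };

            var keywordModel = KeywordRecognitionModel.FromFile("Computer.table");
            await recognizer.StartKeywordRecognitionAsync(keywordModel).ConfigureAwait(false);

            // Await the completion source instead of blocking the thread.
            int exitCode = await completionSource.Task.ConfigureAwait(false);
            await recognizer.StopKeywordRecognitionAsync().ConfigureAwait(false);
            return exitCode;
        }
    }
}
```

Returning the exit code from Main makes it possible for a calling script to tell a normal 'Computer exit.' apart from a failure.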