[german]System.Speech in .NET 3 zur Spracherkennung[/german][english]System.Speech in .NET 3 for Speech Recognition[/english]

[german]Weil ich mich derzeit mit der Spracherkennung von Windows (Vista) beschäftige, habe ich mit C# angefangen. Hier eine kleine Einführung mit Demo und Erklärung.

Unsere Aufgabe

Unser Programm wird ein Minimal-Beispiel für System.Speech-Spracherkennung mit Grammatik. Es besteht aus einer Commandline-Applikation mit einer externen XML-Grammatik. Der Grammatik nach erkennen wir wahlweise eines der Wörter „Design“, „Wirtschaft“ oder „Informatik“ ((warum gerade diese Begriffe?)). Das Programm gibt uns das erkannte Wort aus und zeigt zusätzlich, mit welcher Sicherheit das Wort erkannt wurde. Ein Tastendruck beendet das Programm.

Schritt 1 — Ein Projekt erstellen

Erstellt habe ich das Projekt in Visual C# 2008 Express Edition – die frei heruntergeladen werden kann. Das .NET 3 SDK ist im Visual Studio Download enthalten (zumindest kann ich mich nicht erinnern, es von Hand installiert zu haben). Zunächst erstellte ich ein neues Projekt als Commandline Application.

Damit die Spracherkennung funktionieren kann, muss man die entsprechende Bibliothek noch referenzieren: Über Project → Add Reference → System.Speech (siehe Screenshot)
Speech-Referenz hinzufügen

Schritt 2 — Der Programmcode

using System;
using System.Collections.Generic;
using System.Linq;
using System.Speech;
using System.Speech.Recognition;
using System.Text;

namespace mini_speech_demo
{
    class Program
    {
        // Die main-Funktion wird alle Aufgaben ausführen. In einem größeren Programm würde man das eher auslagern.
        static void Main(string[] args)
        {
            // Hier werden recognition-engine und grammar vorbereitet. SpeechRecognitionEngine bedeutet, dass nur unser Programm auf die Spracherkennung zugreift.
            Console.WriteLine("Firing up speech-demo");
            SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine();
            recognizer.SetInputToDefaultAudioDevice();
            // hier setzen wir einen Event-Handler für das "Recognized"-Event. Es gibt noch eine Menge anderer Events.
            // Die aufgerufene Funktion recognizer_SpeechRecognized wird aufgerufen.
            recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

            // der try-catch block ist eigentlich optional, aber ich hatte am Anfang so viele Exceptions, dass ich ihn drin gelassen habe.
            try
            {
                Grammar grammar = new Grammar("grammar.xml", "thema");
                recognizer.UnloadAllGrammars();
                recognizer.LoadGrammar(grammar);
                // mit folgender Zeile wird die eigentliche Erkennung gestartet.
                recognizer.RecognizeAsync(RecognizeMode.Multiple);
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception aufgetreten: " + e.Message);
                return;
            }
            // wenn wir es bis hier geschafft haben, ist alles ok. Das zeigen wir dem Benutzer an...
            Console.WriteLine("Speech-Engine up and running");
            // ... und gehen in eine Schleife, die auf Tastendruck beendet wird (und mit ihr auch das Programm).
            while (!Console.KeyAvailable)
            {
                System.Threading.Thread.Sleep(100);
            }
            Console.WriteLine("terminating.");
            // Jetzt noch schnell aufräumen.
            recognizer.Dispose();
            return;
        }
        // Diese Funktion wird aufgerufen, sobald etwas erkannt wurde.
        // In e.Result.Text steht der erkannte Text.
        private static void recognizer_SpeechRecognized(object sender, System.Speech.Recognition.SpeechRecognizedEventArgs e)
        {
            Console.WriteLine(e.Result.Text + " (" + e.Result.Confidence.ToString() + ")");
        }
    }
}

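Schritt 3 — Die grammar.xml-Datei

<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="de-DE" root="thema" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="thema" scope="public">
    <one-of>
      <item>Design</item>
      <item>Wirtschaft</item>
      <item>Informatik</item>
    </one-of>
  </rule>
</grammar>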

Schritt 4 — Kompilieren

Naja… kompilieren halt. Es hilft übrigens ungemein, wenn man Visual Studio sagt, es soll die grammar.xml-Datei immer mit in den Zielordner kopieren. (alternativ: von Hand in den Ordner legen oder Adresse absolut angeben).

Abschließend

Ich hoffe, ich habe jemandem den Einstieg in die Spracherkennung erleichtert. Es gibt zu .NET 3 System.Speech bisher keine wirklich guten Ressourcen. Die MSDN-Dokus sind etwas trocken aber das einzige Code-Beispiel ((dort der erste Kommentar)), das ich finden konnte ist mir eine große Hilfe gewesen. Es zeigt sehr schön, welche anderen Event-Handler es gibt.

~~Ich werde bei nächster Gelegenheit den Code und ein Kompilat packen und zum Download anbieten~~.

Ich habe nun zwei passende Dateien ((Achtung, zum Ausführen wird das .NET Framework 3.5 benötigt!)) erstellt:

[/german]
[english]
Since I’m currently working with the speech recognition engine built into Windows Vista, I started learning C#. Here is a short introduction with a demo and an explanation.

Our Task

Our program will be a minimal example of grammar-based speech recognition with System.Speech. It consists of a small command-line application with an external XML grammar. The grammar lets you say one of the words „Design“, „Wirtschaft“ or „Informatik“ (edit grammar.xml to use your own words and/or language). The program prints the word it recognized and how confident it is in that recognition. Pressing a key terminates the program.

Step 1 — Setting up a new project

My demo project was made with Visual C# 2008 Express Edition, which is a free download. The .NET 3 SDK appears to be part of this download (at least I cannot remember installing it by hand). Create a new project and choose Console Application.

To make speech recognition work, you need to reference the necessary library: Project → Add Reference → System.Speech (see the screenshot below)
Adding Speech-reference

Step 2 — The Code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Speech;
using System.Speech.Recognition;
using System.Text;

namespace mini_speech_demo
{
    class Program
    {
        // Our Main function does just about everything. In a real program you would split this into separate functions, but remember: this is a minimal example.
        static void Main(string[] args)
        {
            // This prepares the recognition. SpeechRecognitionEngine means that only our little program has access to this recognizer (as opposed to the shared SpeechRecognizer that Windows offers to all programs).
            Console.WriteLine("Firing up speech-demo");
            SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine();
            recognizer.SetInputToDefaultAudioDevice();
            // Here we set the event handler for the SpeechRecognized event. There are lots of other events.
            // The function recognizer_SpeechRecognized will be called whenever something is recognized.
            recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

            // This try-catch block is optional. I left it in because I had many exceptions at the beginning of my project.
            try
            {
                Grammar grammar = new Grammar("grammar.xml", "thema");
                recognizer.UnloadAllGrammars();
                recognizer.LoadGrammar(grammar);
                // the following line starts the actual recognition.
                recognizer.RecognizeAsync(RecognizeMode.Multiple);
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception aufgetreten: " + e.Message);
                return;
            }
            // If we made it through here, everything should be fine.
            Console.WriteLine("Speech-Engine up and running");
            // We now loop until a key is pressed, which keeps the program (and the recognition) running.
            while (!Console.KeyAvailable)
            {
                System.Threading.Thread.Sleep(100);
            }
            Console.WriteLine("terminating.");
            // Cleaning up
            recognizer.Dispose();
            return;
        }
        // This function is called as soon as something has been recognized.
        // e.Result.Text holds the recognized text, e.Result.Confidence a value between 0 and 1.
        private static void recognizer_SpeechRecognized(object sender, System.Speech.Recognition.SpeechRecognizedEventArgs e)
        {
            Console.WriteLine(e.Result.Text + " (" + e.Result.Confidence.ToString() + ")");
        }
    }
}
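
The comments in the code mention that the engine raises other events besides SpeechRecognized. The following is a minimal sketch, not part of the original demo, of how two of them (SpeechRecognitionRejected and SpeechHypothesized) could be attached, together with a simple confidence check. The class name RecognitionEvents, the helper IsConfident and the threshold of 0.6 are assumptions made for illustration; a sensible threshold depends on your microphone and grammar.

```csharp
using System;
using System.Speech.Recognition;

static class RecognitionEvents
{
    // Hypothetical cut-off (assumption): tune it for your microphone and grammar.
    public const float MinConfidence = 0.6f;

    // Pure helper, so the decision can be checked without a microphone.
    public static bool IsConfident(float confidence)
    {
        return confidence >= MinConfidence;
    }

    // Attaches a few of the other events the engine raises.
    public static void Attach(SpeechRecognitionEngine recognizer)
    {
        // Raised when speech was heard but no loaded grammar matched well enough.
        recognizer.SpeechRecognitionRejected += (sender, e) =>
            Console.WriteLine("rejected");

        // Raised while the engine is still forming a guess.
        recognizer.SpeechHypothesized += (sender, e) =>
            Console.WriteLine("maybe: " + e.Result.Text);

        // Only report results at or above the threshold.
        recognizer.SpeechRecognized += (sender, e) =>
        {
            if (IsConfident(e.Result.Confidence))
                Console.WriteLine("accepted: " + e.Result.Text);
        };
    }
}
```

To try it, call RecognitionEvents.Attach(recognizer); before RecognizeAsync is started.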

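Step 3 — The grammar.xml-file

<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="de-DE" root="thema" xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="thema" scope="public">
    <one-of>
      <item>Design</item>
      <item>Wirtschaft</item>
      <item>Informatik</item>
    </one-of>
  </rule>
</grammar>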
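
The same three words can also be defined directly in code instead of in an external file. This is only a sketch under assumptions: the class name InlineGrammar is made up for illustration, and the culture de-DE assumes that a German recognizer is installed, as it was on my machine.

```csharp
using System.Globalization;
using System.Speech.Recognition;

static class InlineGrammar
{
    // The vocabulary from grammar.xml, kept in one place.
    public static readonly string[] Words = { "Design", "Wirtschaft", "Informatik" };

    public static Grammar Build()
    {
        // Choices accepts exactly one of the given alternatives.
        Choices topics = new Choices(Words);

        GrammarBuilder builder = new GrammarBuilder(topics);
        // Assumption: the installed recognizer speaks German.
        builder.Culture = new CultureInfo("de-DE");

        Grammar grammar = new Grammar(builder);
        grammar.Name = "thema";
        return grammar;
    }
}
```

In Main, the call recognizer.LoadGrammar(InlineGrammar.Build()); would then take the place of loading grammar.xml.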

Step 4 — Compile

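Well, just compile it. There is one minor thing left to do: tell Visual Studio to copy grammar.xml to the output folder (select the file in the Solution Explorer and set „Copy to Output Directory“ to „Copy always“). Alternatively, copy the file there by hand or use an absolute path in your code.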
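
A relative file name such as "grammar.xml" is resolved against the current working directory, which is not always the folder of the executable. A small sketch of how the path could be built relative to the executable instead; GrammarLocation and NextToExecutable are hypothetical names, not part of the demo.

```csharp
using System;
using System.IO;

static class GrammarLocation
{
    // Builds the full path of a file that sits next to the executable.
    public static string NextToExecutable(string fileName)
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);
    }
}
```

The grammar would then be loaded with new Grammar(GrammarLocation.NextToExecutable("grammar.xml"), "thema").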

Summing up

I hope this was a helpful introduction to using System.Speech for the first time. So far there are no really good resources on System.Speech in .NET 3. The MSDN documentation is rather dry and has no hands-on example, but the only code example ((its first comment)) I could find was a great help. It nicely shows which other event handlers exist.

Here are two downloads for you. You will need the .NET Framework 3.5 runtime to run them.

[/english]

25 thoughts on „[german]System.Speech in .NET 3 zur Spracherkennung[/german][english]System.Speech in .NET 3 for Speech Recognition[/english]“

schneckchen says:

18.12.2007 at 00:48

And when I grow up, I will fully understand the source code. 😉
What is the program going to be used for later on, or is it just a bit of playing around to get to know these classes?
Mona says:

24.12.2007 at 10:08

What could be nicer than reading a tutorial on speech recognition at Christmas 😉 No, it really was very interesting, and I think it’s great that you are putting so much into this!!
Have a nice christmas, and if you ever want to read something different, more … er, emotional: http://moenchen.livejournal.com/

See you soon 🙂
Stephen Waller says:

06.02.2008 at 17:24

Thank you, Thank You, THANK YOU! Microsoft has no examples whatsoever on using speech in an application. Your example got me started on the right path. Thanks!
Claudius Coenen says:

06.02.2008 at 17:28

You’re welcome!
Florian says:

27.05.2008 at 20:04

This is very good, because the computer only understands the words you want (the words you’ve written into the grammar.xml file). But I need the source code in Visual Basic and I hope you can help me: I can write the whole project in VB, but there’s a little problem: Line 21
– recognizer.SpeechRecognized += new EventHandler(recognizer_SpeechRecognized); –
can’t be translated into VB, because there is no event called SpeechRecognized.
Who can help me?
Thank you

e-Mail: floriansoftware[at]web.de
Bilal says:

07.06.2008 at 22:39

I have to find out whether a larger audio file contains a specific small audio chunk. What is a good way to do that?
Claudius Coenen says:

08.06.2008 at 00:46

If you are trying to recognize a word, you may try loading the file. The result in the recognizer_SpeechRecognized function gives you information about the exact position. I am not sure whether this is analyzed in real time or if there’s a way to make it go faster.

If you’re trying to match a sound, I don’t think that’s possible with the recognition engine, as it was designed for speech analysis.
wolf says:

15.06.2008 at 16:32

I think the tutorial is really good and tried it out right away. Now to my question: what would I have to change to use it in a Windows application? I have tried, but I just can’t get it to work.

Regards, wolf.
Claudius Coenen says:

16.06.2008 at 11:46

Hardly anything, actually. Even if the existing application is very large, relatively little has to change.

You would split the Main function described here into two parts: the initialization (everything up to the while loop) and the cleanup (everything after the while loop). You would call these functions in the existing program whenever speech recognition should start or stop (so probably at the start and at the end of the application).

Then it also depends on what you want to do with the recognition, of course. Inside recognizer_SpeechRecognized you can, for example, evaluate e.Result.Text and call other functions accordingly.

Depending on the application, you would probably also remove all the Console.WriteLine calls.
wolf says:

17.06.2008 at 11:09

Hi, thanks for the answer. Now there is another problem: it recognizes the three words used in the tutorial, but no others. What am I doing wrong? (Sorry for asking so much, I’m a beginner in C# ^^).

Regards, wolf.
Leo says:

02.07.2008 at 09:09

Hello,

it’s really great to read something about speech recognition in the .NET Framework for once. Many thanks for that.

Unfortunately I get the following error message on this line:
recognizer.SetInputToDefaultAudioDevice();

The error message:
„SAPI und die Spracherkennungsmodule wurden nicht gefunden.“ (SAPI and the speech recognition engines were not found.)

Is an extra library required for this? And if so, which one?

According to MSDN it should also work on Windows XP. I am using Windows XP SP2. .NET Framework 3.0 and .NET Framework 3.5 are both installed. I am using the German Visual Studio 2008 Express.

@wolf
As I understand it, you would only have to add the additional words you want recognized to grammar.xml, analogous to Design, Wirtschaft, Informatik, etc.

Best regards

Leo
Claudius Coenen says:

03.07.2008 at 12:09

I had not tested the application on Windows XP before. However, it does not work under XP for me either (I also get thrown out at SetInputToDefaultAudioDevice()).
guest says:

23.10.2008 at 21:04

@leo:
You have to install the Speech SDK, then it also runs under XP (at least with an English grammar; you have to change lang in the XML from de-DE to en-US and pronounce everything in English).
Kenneth R. Lewis says:

26.11.2008 at 01:42

Hi Claudius,

I am trying to get speech recognition to work on Vista. I have my own grammar, which I load, but it doesn’t work on Vista. I want the speech recognition to respond only to the commands I have loaded and nothing else. It works perfectly on Windows XP but not under Vista. I tried calling UpdateRecognizerSetting("AdaptationOn", 0) but this doesn’t seem to work. I am using C# 3.5.

Thanks,

Kenneth
Kenneth R. Lewis says:

26.11.2008 at 01:43

To add to what I said: I have my own XML file, which I programmatically load into the recognizer.
Claudius Coenen says:

28.11.2008 at 01:32

When you say it doesn’t work, do you mean „it still reacts to everything“ or „my program crashes“?

If it crashes or exits, do you have a specific error-message?
Enes says:

02.12.2008 at 16:29

Hey, this program is really what I need, but I have a problem: it says “language of the grammar doesnt match language of the speech recogniser”. I’ve tried to change languages, but I didn’t really understand what it means.
Claudius Coenen says:

03.12.2008 at 01:48

Maybe it’s referring to the grammar file’s language. There is a lang=“de_DE“ somewhere in the first XML element of the grammar.

Did you try changing this?
Enes says:

04.12.2008 at 11:59

Yeah, I changed it to en_US, but it still doesn’t work.
Enes says:

04.12.2008 at 21:59

Hey Claudius, I need the program to ask me “I heard … (something from the XML file), is that true?” and then it should get a yes or no from me and go on processing. How can I do this?
Claudius Coenen says:

05.12.2008 at 00:20

Speech synthesis is a whole other topic, but fortunately it’s easier than recognition. There are various good tutorials on .NET’s speech synthesis.

Basically you just need an instance of the class SpeechSynthesizer. Then you can call its method Speak("your text here");

Parsing the questions out of some XML file shouldn’t be too hard. If you really just need a list of phrases (like „did you hear the lion“, „did you hear the wind“), I’d probably even go with a plain text file and read it into an array inside the program.

You might want to take a look at this tutorial; they put together a small application combining a C&C grammar, general speech recognition and synthesis.
http://www.codeproject.com/KB/vista/Vista_Speech_Recognition.aspx

On the language issue: your recognition engine is part of the operating system. I can’t look it up at the moment, as I am writing this on an XP machine, but there should be a panel somewhere in Vista’s control panel just for speech recognition. There you should find information on your engine’s language. Depending on what you need, you can also always create a dynamic grammar inside your program. In case you only need yes and no, this is even easier.
treckerfreak says:

22.12.2008 at 10:03

Hello, I find the example very interesting, but unfortunately the application always gives me an exception.

Output:
Firing up speech-demo
Exception aufgetreten: Die Spracherkennung ist auf diesem System nicht verfügbar. SAPI und die Spracherkennungsmodule wurden nicht gefunden.
(Speech recognition is not available on this system. SAPI and the speech recognition engines were not found.)

Could it be that the application only works on Vista?

I am using Windows XP Home SP2

Regards

Markus
treckerfreak says:

22.12.2008 at 16:37

Hi,

I found my mistake. Is there a German version for Win XP as well?
Claudius Coenen says:

22.12.2008 at 18:44

I am not sure; I think that was the deciding reason back then why I started working with the Vista SAPI.
Pingback: Projekt “Schizophrenie” gestartet at amenthes.de

Comments are closed.