Note: this post originally appeared on my personal site, chadwallacehart.com. I wanted to post it here on cogint.ai, but this site wasn't ready for launch. The content is reposted below with some minor updates.
I have been playing around with various Voice Assistants and speech interaction for a little while now. There is huge potential there, with particularly meaningful implications for communications applications since voice is already involved. Alexa is clearly the leader, with Google and Microsoft also doing some interesting stuff. The challenge with all of these solutions is that you are locked into the capabilities and roadmaps of these platforms. They certainly bring the advantages of maturity and a massive audience, but they also limit flexibility.
I came across an open source voice assistant called Mycroft a few months ago. Mycroft started selling its assistant on Kickstarter in 2015. Unlike other platforms, it has taken an open approach and publishes all of its code online under an Apache license. They have done well getting some initial traction, raising $3M in funding including investment from Jaguar Land Rover.
Make your own Voice Assistant
What caught my attention was the ability to run Mycroft on a Raspberry Pi - you don't need to use their Kickstarter/Indiegogo hardware. Developers can even run it on their desktops.
Running it on AIY Voice Kit
Even better, they have a pre-built image that will run on Google's AIY Voice Kit. The AIY Voice Kit is a hardware kit for the Raspberry Pi that lets you run the Google Home assistant and easily use their cloud speech API to experiment with voice interactions. Much like Google's Cardboard VR kit, the voice kit comes with a cardboard box for an inexpensive case. But the kit is much more than its case - it also includes an LED button, a speaker, and a stereo microphone that all plug into a custom controller board - a solid set of hardware needed for any Voice Assistant.
The original Voice Kit listed at $25, but in total it cost more than double that once you added the Raspberry Pi 3 hardware it needed. Google just updated the Voice Kit package to version 2 with new hardware and much better documentation aimed at the STEM community. This new one has all the hardware you need (except a micro-USB power supply) for $50 and they seem to be exclusively selling it at Target. It seems other sellers then had to clear their inventory of the v1 kit at crazy discounts, so I was able to pick up a few for only $3.50 apiece.
After about 30 minutes of building and loading the Picroft image, you get something like this:
The kit runs reasonably well out of the box. It is not close to having the user friendliness of Alexa and has only a small number of skills, but you can see how it could potentially get there.
My first Mycroft Skill - the Tragedy of Darth Plagueis
The next step was to develop a skill for it. I wanted to build something more than a "hello world" app (which you start with anyway) that could test interactive dialog, using context, playing custom media, interfacing with hardware, and custom hotword detection. Star Wars Day was coming up and my friend is obsessed with one scene from Star Wars: Revenge of the Sith known as the Tragedy of Darth Plagueis the Wise, so I decided to recreate this scene as a skill. Basically I require the user to play the part of Anakin and the app plays Palpatine.
You can see my code here: https://github.com/chadwallacehart/mycroft-skill-darth-plagueis
Overall it came out better than I expected:
How does Mycroft compare?
As an open source product, you are free to go and code anything you like. Realistically though, most developers are going to start with the default options Mycroft provides, so let's take a look at how these options compare to commercial Voice Assistants.
Creating a basic skill was actually quite easy at first compared to Google's DialogFlow and Alexa. On Mycroft there is no cloud environment you need to set up and no GUI-based workflow to learn - you just copy the demo example folder and create your skill in Python and some simple text files. However, once I started testing different ways of stating my intents and chaining dialog, I started to yearn for a GUI that lets you test your commands with actual speech in real time.
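Short of a real test GUI, a rough text-only harness can at least catch phrasings that miss an intent. The sketch below is a simplified stand-in for keyword matching, not Mycroft's actual intent engine; the vocabulary entries, the intent name, and the `match_intent` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical vocabulary: each keyword maps to phrases that should trigger it,
# similar in spirit to the simple text files that sit next to a skill's Python code.
VOCAB = {
    "StoryKeyword": ["tragedy of darth plagueis", "darth plagueis the wise"],
    "HeardKeyword": ["have you heard", "did you ever hear"],
}

# Hypothetical intents: every listed keyword must appear in the utterance.
INTENTS = {
    "StartSceneIntent": ["HeardKeyword", "StoryKeyword"],
}


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = [c for c in text.lower() if c.isalnum() or c.isspace()]
    return " ".join("".join(kept).split())


def match_intent(utterance):
    """Return the first intent whose required keywords all match, else None."""
    spoken = normalize(utterance)
    for intent, required in INTENTS.items():
        if all(any(phrase in spoken for phrase in VOCAB[key]) for key in required):
            return intent
    return None


if __name__ == "__main__":
    for line in [
        "Did you ever hear the tragedy of Darth Plagueis the Wise?",
        "Tell me a Star Wars story",
    ]:
        print(repr(line), "->", match_intent(line))
```

Feeding a list of candidate phrasings through something like this is no substitute for testing with real speech, but it makes it quick to see which wordings fall through before trying them out loud.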
Stateful dialogs possible, but not well documented
I also had a harder time than I expected figuring out how to run a dialog and handle context. A lot of this is because Mycroft recently added some new methods to do this and there is not a ton of documentation, which meant a lot of trial-and-error and forum reading on my part to get the flow I wanted.
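The underlying idea of a stateful dialog can be sketched without any framework: remember how far into the scene you are, and only advance when the user says something that fits. This is a plain-Python illustration and not Mycroft's context API; the `SceneDialog` class, the cue words, and the placeholder lines are all hypothetical.

```python
class SceneDialog:
    """Step through a scripted exchange, advancing only on the expected cue."""

    def __init__(self, steps):
        # steps: list of (cue word expected from the user, reply to speak)
        self.steps = steps
        self.position = 0  # the "context": how far into the scene we are

    @property
    def finished(self):
        return self.position >= len(self.steps)

    def respond(self, utterance):
        """Return the next reply, a re-prompt on a wrong line, or None when done."""
        if self.finished:
            return None
        cue, reply = self.steps[self.position]
        # Crude substring check; a real skill would use proper intent matching.
        if cue in utterance.lower():
            self.position += 1
            return reply
        return "That is not your line. Try again."


if __name__ == "__main__":
    scene = SceneDialog([
        ("no", "I thought not. It is not a story the Jedi would tell you."),
        ("save", "He could keep the ones he cared about from dying."),
    ])
    for said in ["No.", "What happened?", "He could save people from death?"]:
        print(said, "->", scene.respond(said))
```

The point of the design is that the same words from the user mean different things depending on `position`, which is exactly the behavior a context mechanism has to provide.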
Choice of speech engines
Mycroft also lets you choose which speech recognition (speech-to-text) engine to use. It now defaults to Mozilla's open source Deep Speech, or you can use Google's Speech API. Deep Speech is not as accurate as commercial solutions, but is improving very rapidly. On the text-to-speech synthesizer side, Mycroft has its own Mimic engine, based on CMU Flite, or again you can use Google's. I started with Mimic, but liked the sound of Google's better. (Note that I have not yet tried the new version of Mimic, which is supposed to sound a lot better.) Be aware that changing the engine changes some of the dialog behavior. For example, a dialog of "." is ignored by Mimic, but Google says "period".
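One way to keep a skill behaving the same across synthesizers is to filter dialog before it reaches the engine. The helper below is a hypothetical sketch and not part of Mycroft; it simply drops strings that contain no letters or digits, so a placeholder like "." stays silent whichever engine is configured.

```python
def speakable(text):
    """True if the text contains at least one letter or digit to pronounce."""
    return any(ch.isalnum() for ch in text)


def filter_dialog(lines):
    """Drop empty or punctuation-only lines and trim the rest."""
    return [line.strip() for line in lines if speakable(line)]


if __name__ == "__main__":
    print(filter_dialog(["I thought not.", ".", "  ", "..."]))
```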
Full access to local resources
I started by using Mycroft's voice synthesizer to speak the dialog, but then decided to actually play audio from the scene. This was very easy to do, particularly since you can easily load all the sound files locally. One challenge I had with using audio was delaying the new prompts, since audio is played asynchronously. I also noticed that the Voice Kit LED does not light up when playing sound files. To fix this I had to manually initiate the LEDs through the Voice Kit - this is kind of hacky and seems to interfere with the normal LED operation, but it is nice to get full control of the hardware.
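One simple way to handle the asynchronous playback problem is to read the clip's length from the file and hold the next prompt for that long. The sketch below uses only Python's standard `wave` module and assumes the clips are WAV files; the helper names, the padding value, and the silent demo clip are hypothetical, and this is not necessarily how my skill does it.

```python
import os
import tempfile
import time
import wave


def wav_duration_seconds(path):
    """Length of a WAV file in seconds: frame count divided by sample rate."""
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / float(clip.getframerate())


def wait_for_clip(path, padding=0.25, sleep=time.sleep):
    """Block for roughly as long as the clip takes to play, plus a small pad."""
    delay = wav_duration_seconds(path) + padding
    sleep(delay)
    return delay


def write_silence(path, seconds, rate=16000):
    """Write a mono 16-bit silent WAV file, used here only as a stand-in clip."""
    with wave.open(path, "wb") as clip:
        clip.setnchannels(1)
        clip.setsampwidth(2)
        clip.setframerate(rate)
        clip.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    demo = os.path.join(tempfile.gettempdir(), "demo_clip.wav")
    write_silence(demo, 0.5)
    # Start the asynchronous playback here, then hold the next prompt:
    print("waited", wait_for_clip(demo), "seconds")
    os.remove(demo)
```

Sleeping for a fixed duration is blunt, since it ignores any startup latency in the player, which is why the sketch adds a small padding value that would need tuning on real hardware.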
Easy ability to set a custom hotword
One area where Mycroft has a clear advantage is its ability to use a custom hotword. You can set this up via the GUI and they have a guide to help you put in a phonetic spelling. It was surprisingly easy and worked for me right away.
As you can hear in the video, I chose "Palpatine". There are some hacks you can do with other voice assistants using 3rd party hotword tools like Snowboy, but fundamentally "Ok, Google" and "Alexa" are required hotwords, meaning your app needs to carry some level of someone else's branding for access. Mycroft does not have that limitation.
It is clearly early days for Mycroft, but I am a big believer in the power and speed of open source, so I see a lot of potential there. Mycroft is the only open source voice assistant project I know of with solid funding and a growing community. If you want to build a voice app for millions of people, then I wouldn't start with Mycroft. But if you want to build on an open platform with solid potential, Mycroft seems to be a decent bet.
Now to finish this Star Wars Day, I am going to try to figure out how to mount my Voice Kit hardware in the Sith Holocron I just 3D printed...