In this third installment of our Building a Voicebot IVR with Dialogflow series, we take a look at Voximplant’s Dialogflow Connector. Voximplant was the first CPaaS vendor to have a direct telephony integration with Dialogflow. Alexey Aylarov gave an introduction to how they built this gateway here soon after it was first released. Emiliano and I ended up using Voximplant’s Dialogflow Connector for our project, so in this post we will go much deeper into how Voximplant matched up to our requirements and provide some code samples.

Voximplant-Dialogflow-Connector-architecture

VoxEngine Approach

Serverless Execution

Unlike the XML-script based approach used by many platforms, Voximplant has a JavaScript-based execution environment it calls VoxEngine. While not terribly complex, it does require some JavaScript skills if you want to do more than copy and paste example code. This all runs serverless, so you don’t have to set up your own development environment or worry about servers. It is quick to get going if you just need to manage a few lines of code. The downside is that you need to get used to their web-based Integrated Development Environment (IDE) and debugging tools, or figure out how to use their APIs to sync your JavaScript files. You will likely start to miss your preferred IDE as your VoxEngine scripts grow in complexity. Also, remember to save often since the web interface won’t do that for you automatically!

Dialogflow API wrappers

Voximplant provides a lot of control over its interaction with Dialogflow. It has effectively wrapped most of the Dialogflow APIs for interacting with an existing agent (but not the ones for programming an agent).

This API includes a method to start a Dialogflow session and a number of events and classes:

Events: DialogflowError, DialogflowPlaybackFinished, DialogflowPlaybackMarkerReached, DialogflowPlaybackStarted, DialogflowResponse, DialogflowStopped
Classes: DialogflowEventInput, DialogflowInstance, DialogflowOutputAudioConfig, DialogflowQueryInput, DialogflowQueryParameters, DialogflowResponse, DialogflowResult, DialogflowSettings, DialogflowStreamingRecognitionResult, DialogflowSynthesizeSpeechConfig, DialogflowTextInput, DialogflowVoiceSelectionParams

I will review some of these events and classes in the Making our voicebot section below.

Pricing

Voximplant charges $0.005 / minute for inbound traffic to VoxEngine and $0.015/minute for PSTN calls out of VoxEngine. This gives their Dialogflow Connector a 32% discount vs. the call forwarding approach on their same platform.

US phone numbers are $1.00/month.
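
To get a feel for what these rates mean for a monthly bill, here is a small back-of-the-envelope calculator. It is only a sketch: the function name and the usage figures are made up for illustration, it uses only the three prices quoted above, and it ignores Dialogflow charges and any other Voximplant fees. It also assumes the minutes you pass in are already split into inbound and outbound PSTN minutes; how a transferred call is split between the two is something to confirm against your own invoice.

```javascript
// Prices quoted in this post, in thousandths of a dollar to avoid
// floating-point rounding surprises.
const INBOUND_PER_MIN = 5      // $0.005 / minute into VoxEngine
const OUTBOUND_PER_MIN = 15    // $0.015 / minute for PSTN calls out
const NUMBER_PER_MONTH = 1000  // $1.00 / month per US phone number

// Hypothetical helper: estimate a monthly bill in dollars.
function estimateMonthlyCost({ inboundMinutes, outboundMinutes, phoneNumbers }) {
  const mills =
    inboundMinutes * INBOUND_PER_MIN +
    outboundMinutes * OUTBOUND_PER_MIN +
    phoneNumbers * NUMBER_PER_MONTH
  return mills / 1000
}

// Example: 1,000 inbound minutes, 100 outbound minutes, one phone number
console.log(estimateMonthlyCost({ inboundMinutes: 1000, outboundMinutes: 100, phoneNumbers: 1 })) // 7.5
```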

As mentioned previously in this series, you can use the Dialogflow Phone Gateway for free, but I assume one would pay Google for one of its Enterprise plans for an unlimited quota with an SLA.

Voximplant-Dialogflow-pricing-comparison-1

Making our voicebot

The first post in this series describes requirements for making a voicebot gateway to replace an IVR. In this section I will give some highlights of Voximplant’s setup and review how to implement some of the requirements, with code samples where appropriate.

You can see my entire VoxEngine JavaScript program in this gist.

Note that the samples below are not intended to be a step-by-step development guide; refer to Voximplant for that.

Setup via Marketplace

In addition to the reference guide here for setting up the Dialogflow Connector, Voximplant also added an integration option inside their Marketplace to make this easy. You can find it in their main side menu.

Voximplant-Dialogflow-Connector-wizard

From there you will need to go into the Google Cloud Console to get your service account key. Follow the steps in the setting up authentication guide from the Dialogflow docs to get that. Once you have the key and upload it, Voximplant will create a new application that includes the Agent connection. You should see the default Dialogflow Connector code show up in your scenarios list:

Voximplant-scenario
You can also access, add, and delete an agent in the Dialogflow Connector menu inside your application. If you add a Dialogflow agent this way it will not automatically create the Scenario code.
Voximplant-Dialogflow-Connector-page

Voximplant’s guide mentions this, but don’t forget to set your Text to Speech configuration in Dialogflow to MP3 or Ogg Opus. I forgot this and was wondering why I was only hearing silence.

The code itself is documented so I will give some highlights below.

Dialogflow Events and Contexts

Sending Events

In addition to handling text and speech-based inputs to trigger an intent, Dialogflow can also handle incoming events. The advantage of events over utterances is that there is no Machine Learning processing of the utterance. Using an event allows you to immediately and definitively invoke an intent.

Voximplant uses this event-based system when it sends an initial query, after connecting:

// Send the WELCOME event so the agent says a welcome message
dialogflow.sendQuery({event : {name: "WELCOME", language_code: "en"}})
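
Since the same event shape comes up again for other events later in this post, a tiny helper can cut down on typos. This is just a sketch of my own, not something from Voximplant's sample: the name eventQuery and its argument order are assumptions, and it only builds the plain object that gets passed to sendQuery.

```javascript
// Hypothetical helper: build the argument for dialogflow.sendQuery()
// for a named event, with optional parameters.
function eventQuery(name, parameters, languageCode = "en") {
  const event = { name, language_code: languageCode }
  if (parameters !== undefined) event.parameters = parameters
  return { event }
}

// Inside VoxEngine this would be used as:
//   dialogflow.sendQuery(eventQuery("WELCOME"))
console.log(JSON.stringify(eventQuery("WELCOME")))
// {"event":{"name":"WELCOME","language_code":"en"}}
```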

Setting Contexts

In addition, Dialogflow also lets you set contexts. Contexts let you store stateful information across intents during a user's bot interaction. Contexts can also be used as filters when you might want to have different responses for the same intent in different scenarios - like responding differently on a voice call vs. a text message. In our voicebot we wanted to have a phone context that kept the user’s caller ID and phone number. VoxEngine has a dialogflow.setQueryParameters method to do this that you should set before sending a query. To do this we just added the following instead of the above code:

// Set a phone context with phone parameters
// Note: Dialogflow seems to convert camelCase to camel_case, so I just used snake_case here
// ToDo: error handling if returned parameters are null?
const phoneContext = {
  name: "phone",
  lifespanCount: 99,
  parameters: {
    caller_id: call.callerid(),
    called_number: call.number()
  }
}

dialogflow.setQueryParameters({contexts: [phoneContext]})

// Send the WELCOME event so the agent says a welcome message
dialogflow.sendQuery({event : {name: "WELCOME", language_code: "en"}})
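
One way to handle the ToDo about null parameters is to build the context with a small function that leaves out anything that is missing. This is a minimal sketch and not what is in my gist: buildPhoneContext is a hypothetical name, and whether you would rather omit a missing value or send a placeholder depends on how your intents use the context.

```javascript
// Hypothetical helper: build the "phone" context, leaving out any
// parameter whose value is missing instead of sending null to Dialogflow.
function buildPhoneContext(callerId, calledNumber, lifespanCount = 99) {
  const parameters = {}
  if (callerId) parameters.caller_id = String(callerId)
  if (calledNumber) parameters.called_number = String(calledNumber)
  return { name: "phone", lifespanCount, parameters }
}

// Inside VoxEngine this would be used as:
//   dialogflow.setQueryParameters({
//     contexts: [buildPhoneContext(call.callerid(), call.number())]
//   })
console.log(buildPhoneContext("15551230000", null))
```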

Call Transfer

The bottom part of Voximplant’s code gives examples for handling telephony responses from Dialogflow including call transfer, speech synthesis, and playback of an audio file - all the options available in Dialogflow’s Telephony menu. Basically all you need to do is set your Telephony options there, just like you would do if you were using Dialogflow’s Phone Gateway.

Dialogflow Telephony Tab

Dialogflow’s built-in Telephony options all work with Voximplant

Just uncomment the transfer code and set the caller ID to one of your own phone numbers:

function processTelephonyMessage(msg) {
  // Transfer call to msg.telephonyTransferCall.phoneNumber
  if (msg.telephonyTransferCall !== undefined) {
    dialogflow.stop()
    let newcall = VoxEngine.callPSTN(msg.telephonyTransferCall.phoneNumber, VOICEBOT_PHONE_NUMBER)
    VoxEngine.easyProcess(call, newcall)
  }
}

Recording

Call recording is trivial to do. Just add call.record({stereo: true}). Stereo is not mandatory, but it makes debugging easier since the caller and Dialogflow agent are on separate audio channels. I placed this in the onCallRecorded function.

Voximplant puts the call recording in the call history tab. That is all I needed since I just wanted the recordings for debugging. In a real application you could share the recording URL and save it somewhere else.
Voximplant-call-history

Example from Voximplant’s call history showing charges and recording playback

Playback Interruption

We want our voicebot to be as realistic as possible. One aspect of human-to-human conversations is that we interrupt each other. Ideally the voicebot could speak and listen at the same time and even interrupt its own responses if needed. The way Dialogflow’s query API works is you send it an utterance and it returns a response. Sending a bunch of queries back to back is fine, but it does not make sense to play back multiple responses over the top of each other. To avoid this, a lot of voicebots take a sequential approach:

Stream audio to Dialogflow, which listens for an utterance
When Dialogflow hears an utterance it sends a response to the gateway
The gateway then stops listening for new audio
The gateway plays back the audio provided by Dialogflow (or synthesizes speech from the returned text)
Then the gateway starts listening again, starting the cycle over
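
The cycle above boils down to a two-state machine. Here is a minimal sketch of it in plain JavaScript. The class, state, and method names are mine and are not part of VoxEngine or Dialogflow; it only models the turn-taking, not the audio.

```javascript
// Hypothetical sketch of the sequential listen/play cycle described above.
class TurnTaker {
  constructor() {
    this.state = "LISTENING"   // caller audio is being streamed to the bot
  }
  // The bot returned a response: stop listening and play it
  onBotResponse() {
    if (this.state !== "LISTENING") return false  // ignore overlapping responses
    this.state = "PLAYING"
    return true
  }
  // Playback finished: start listening again
  onPlaybackFinished() {
    if (this.state !== "PLAYING") return false
    this.state = "LISTENING"
    return true
  }
  get isListening() {
    return this.state === "LISTENING"
  }
}

const turns = new TurnTaker()
turns.onBotResponse()            // response arrives, playback starts
console.log(turns.isListening)   // false
turns.onPlaybackFinished()       // playback done, back to listening
console.log(turns.isListening)   // true
```

The downside is visible in the first method: anything the caller says while the state is PLAYING never reaches the bot, which is exactly the interruption problem.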

Voximplant does not have a perfect solution for playback interruption, but they do have a playback marker concept that lets you tell the gateway when to start listening again after playback. Voximplant actually lets you set this to a negative value, so it starts listening again before the playback of the previous intent response is done. If you get this timer right you can allow some ability to interrupt without worrying about overlapping playback.

Voximplant has a single line of code for this:

   // Playback marker used for better user experience
   dialogflow.addMarker(-300)
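
To make the negative value concrete: my reading of the Voximplant docs is that a positive offset is counted in milliseconds from the start of the audio and a negative one from the end, so -300 means 300 ms before playback finishes. The helper below is a hypothetical illustration of that arithmetic only. It is not a VoxEngine function, and the clamping for very short prompts is my own assumption about sensible behavior, not documented platform behavior.

```javascript
// Hypothetical helper: where in the playback a marker fires, in ms from the start.
// A positive offset counts from the start of the audio, a negative one from the end.
function markerPosition(durationMs, offsetMs) {
  const position = offsetMs >= 0 ? offsetMs : durationMs + offsetMs
  // Assumption: clamp so a large offset on a short prompt stays within the audio
  return Math.min(Math.max(position, 0), durationMs)
}

console.log(markerPosition(2500, -300)) // 2200
console.log(markerPosition(200, -300))  // 0
```

With a 2.5 second response the gateway would start listening again at 2.2 seconds, which gives the caller a 300 ms window to start talking over the tail of the prompt.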

It may be possible to continuously listen for user utterances and use dialogflow.stopMediaTo(call) to stop playback if a new intent comes in while the previous intent response is still playing. The logic to get this right seems complex, and the application itself might want to choose when it is worth interrupting playback. I have not experimented with this yet.

No Activity Detection

If you were in the middle of a call with another person and they suddenly went silent, you would say “are you there?” The voicebot needs a mechanism to do something similar. Google Assistant provides an actions_intent_NO_INPUT event to Dialogflow to handle this. We can create something similar inside VoxEngine.

First we will need to create a timer to keep track of silence time. I created a Timer class for this:

const MAX_NO_INPUT_TIME = 15000
let dialogflow, call, hangup

// Fire the "NO_INPUT" event if there is no speech for MAX_NO_INPUT_TIME
class Timer {
  constructor() {
    this.expired = false
    this.noInputTimer = null
  }
  start() {
    // Clear any timer that is still pending so only one is ever running
    clearTimeout(this.noInputTimer)
    this.noInputTimer = setTimeout(() => {
      this.expired = true
      Logger.write("No_Input timer exceeded")
      dialogflow.sendQuery({event : {name: "NO_INPUT", language_code: "en"}})
    }, MAX_NO_INPUT_TIME || 30 * 1000)
    Logger.write("No_Input timer started")
  }
  stop() {
    this.expired = false
    clearTimeout(this.noInputTimer)
    Logger.write("No_Input cleared")
  }
}

let timer = new Timer()

You will notice my class has a start function that uses a simple setTimeout and will send a NO_INPUT event to the Dialogflow agent if it expires.

Then we just start this timer whenever a Dialogflow audio response finishes playing. If the timer runs out then the NO_INPUT event is sent.

   dialogflow.addEventListener(AI.Events.DialogflowPlaybackFinished, (e) => {
     timer.start()
   })
   dialogflow.addEventListener(AI.Events.DialogflowPlaybackStarted, (e) => {
     // Dialogflow TTS playback started
     timer.stop()
   })

I thought this would be all I needed, but when I first tried it, it didn’t work. It turns out that calling the sendMediaTo function right after sendQuery overrides the stream. To prevent this race condition, I had to add another check to the onDialogflowResponse function so it does not send media to Dialogflow if the timer has expired. We want the NO_INPUT response to come back and play first. It will not work without this.

// Handle Dialogflow responses
function onDialogflowResponse(e) {
  // If DialogflowResponse with queryResult received - the call stops sending media to Dialogflow
  // in case of response with queryResult but without responseId we can continue sending media to dialogflow
  if (e.response.queryResult !== undefined && e.response.responseId === undefined) {
    // Hold back caller audio while the NO_INPUT response is pending
    if (!timer.expired)
      call.sendMediaTo(dialogflow)
  }
  // ... the rest of the handler is not shown here
}

On the Dialogflow side, make sure to make an intent and populate the NO_INPUT event with some default responses.
Dialogflow-No-Input-intent

DTMF detection

DTMF entry is a nice to have feature in some circumstances, even if the overall goal is to eliminate the traditional “select 1 for…” approach. For example, if you asked for a phone number, a user may prefer to touchtone that rather than say it. In other cases one might need to maintain some existing IVR functionality for scripts that auto-navigate through some options.

Voximplant’s call class has the ability to alert on DTMF tones:

let waitForTone = false // global we’ll need later

// ...
// This is part of the VoxEngine.addEventListener block
// ...

 call.handleTones(true)
 call.addEventListener(CallEvents.ToneReceived, onTone)

Our onTone function then just needs to send an event to Dialogflow. We will add a parameter to pass the digit:

function onTone(e) {
  // ToDo: error handling - i.e. check that Dialogflow is connected and running first, validate tone values
  waitForTone = true
  dialogflow.sendQuery({event : {name: "DTMF", language_code: "en", parameters: { dtmf_digits: e.tone} }})
}

In a real app you likely would want to capture multiple digits if they are entered in sequence and send them to Dialogflow as a single event instead of a series of events.
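
A minimal sketch of what that multi-digit capture could look like is below. I have not run this in VoxEngine: the DtmfCollector class, the 2 second inter-digit gap, and the use of "#" as a terminator are all my own assumptions, and it assumes e.tone arrives as a single-character string. The collector itself is plain JavaScript, so the only VoxEngine-specific part is the commented-out wiring.

```javascript
// Hypothetical collector: buffers DTMF digits and hands them over as one
// string when "#" is pressed or when no new digit arrives within the gap.
class DtmfCollector {
  constructor(onComplete, { gapMs = 2000, terminator = "#" } = {}) {
    this.onComplete = onComplete
    this.gapMs = gapMs
    this.terminator = terminator
    this.digits = ""
    this.timer = null
  }
  push(tone) {
    clearTimeout(this.timer)
    if (tone === this.terminator) return this.flush()
    this.digits += tone
    // Restart the inter-digit timer on every new digit
    this.timer = setTimeout(() => this.flush(), this.gapMs)
  }
  flush() {
    clearTimeout(this.timer)
    this.timer = null
    const digits = this.digits
    this.digits = ""
    if (digits.length > 0) this.onComplete(digits)
  }
}

// In VoxEngine, onTone would call collector.push(e.tone) and the callback
// would send a single DTMF event, for example:
//   dialogflow.sendQuery({event: {name: "DTMF", language_code: "en",
//                                 parameters: {dtmf_digits: digits}}})
const collector = new DtmfCollector((digits) => console.log("collected", digits))
"1234#".split("").forEach((tone) => collector.push(tone)) // logs "collected 1234"
```

The Dialogflow side would then need to accept a string of digits in the dtmf_digits parameter rather than a single digit.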

Like with the No Activity Detection above, we need to pause sending media to Dialogflow while we handle our event response:

function onDialogflowResponse(e) {
  // If DialogflowResponse with queryResult received - the call stops sending media to Dialogflow
  // in case of response with queryResult but without responseId we can continue sending media to dialogflow
  if (e.response.queryResult !== undefined && e.response.responseId === undefined) {
    // Hold back caller audio while a NO_INPUT or DTMF response is pending
    if (!timer.expired && !waitForTone)
      call.sendMediaTo(dialogflow)
  }
  // ... the rest of the handler is not shown here
}

Lastly, remember to reset our global waitForTone to resume sending media to Dialogflow:

   dialogflow.addEventListener(AI.Events.DialogflowPlaybackFinished, (e) => {
     timer.start()
     waitForTone = false
   })
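
The gating condition in onDialogflowResponse is easy to get subtly wrong, so it can help to pull it out into a pure function that can be checked outside of VoxEngine. This is a sketch only: shouldResumeMedia is a hypothetical name, and the state object simply carries the timer.expired and waitForTone flags used above.

```javascript
// Hypothetical helper: decide whether caller audio should be streamed
// to Dialogflow again after a DialogflowResponse event.
function shouldResumeMedia(response, { timerExpired, waitForTone }) {
  // Same test as in onDialogflowResponse: queryResult present, responseId absent
  const canContinue = response.queryResult !== undefined && response.responseId === undefined
  // Hold off while a NO_INPUT or DTMF event response is still pending
  return canContinue && !timerExpired && !waitForTone
}

// Inside the handler this would replace the nested checks:
//   if (shouldResumeMedia(e.response, { timerExpired: timer.expired, waitForTone }))
//     call.sendMediaTo(dialogflow)
console.log(shouldResumeMedia({ queryResult: {} }, { timerExpired: false, waitForTone: false })) // true
console.log(shouldResumeMedia({ queryResult: {} }, { timerExpired: false, waitForTone: true }))  // false
```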

Like with the No Activity Detection, we need to set up our event in Dialogflow. I created an Entity to handle the parameters passed by the VoxEngine script. This one also needs some training phrases.
Dialogflow-DTMF-intent

No SMS Integration with Dialogflow

Voximplant does not have any VoxEngine control over SMS. This means VoxEngine has no way to send incoming SMS messages to our bot. Like with other platforms, you could always set up your own server and use webhooks to handle this interaction. We were looking for something simpler.

Scorecard

Voximplant ended up being a good (but not perfect) fit against the criteria we identified in the first post of this series:

Requirement	Voximplant Dialogflow Connector
Call Transfer	Yes
Recording	Yes
Playback Interruption	Some - playback marker gives some of the effect
No activity detection	Yes
DTMF detection	Yes
SMS support	No

What’s Next

Now that we have established our requirements and evaluated a few implementation options, in the last few posts in this series we will start to get into architectural considerations and actual implementation challenges. You can leave comments below and remember to subscribe if you want to see more.

About the Author

Chad Hart is an analyst and consultant with cwh.consulting, a product management, marketing, and strategy advisory helping to advance the communications industry. In addition, recently he co-authored a study on AI in RTC and helped to organize an event / YouTube series covering that topic.

Bookmarks

Voximplant's Dialogflow Connector: 2019 Review

VoxEngine Approach

Serverless Execution

Dialogflow API wrappers

Pricing

Making our voicebot

Setup via Marketplace

Dialogflow Events and Contexts

Sending Events

Setting Contexts

Call Transfer

Dialogflow’s built-in Telephony options all work with Voximplant

Recording

Example from Voximplant’s call history showing charges charges and recording playback

Playback Interruption

No Activity Detection

DTMF detection

No SMS Integration with Dialogflow

Scorecard

What’s Next

About the Author

Remember to subscribe for new post notifications and follow @cogintai.

