  • Voximplant's Dialogflow Connector: 2019 Review

  • In this third installment of our Building a Voicebot IVR with Dialogflow series, we take a look at Voximplant’s Dialogflow Connector. Voximplant was the first CPaaS vendor to have a direct telephony integration with Dialogflow. Alexey Aylarov gave an introduction to how they built this gateway here soon after it was first released. Emiliano and I ended up using Voximplant’s Dialogflow Connector for our project, so we will go much deeper into how Voximplant matched up to our requirements and provide some code samples in this post.

    Voximplant-Dialogflow-Connector-architecture

    VoxEngine Approach

    Serverless Execution

    Unlike the XML-script based approach used by many platforms, Voximplant has a JavaScript-based execution environment it calls VoxEngine. While not terribly complex, it does require some JavaScript skills if you want to do more than copy and paste example code you may find. This all runs serverless so you don’t have to set up your own development environment and worry about servers. It is quick to get going if you just need to manage a few lines of code. The downside is you need to get used to their web-based Integrated Development Environment (IDE) and debugging tools or figure out how to use their APIs to sync your JavaScript files. You will likely start to miss your preferred IDE as your VoxEngine scripts grow in complexity. Also, remember to save often since the web interface won’t do that for you automatically!

    Dialogflow API wrappers

    Voximplant provides a lot of control over its interaction with Dialogflow. It has effectively wrapped most of the Dialogflow APIs for interacting with an existing agent (but not the ones for programming an agent).

    This API includes a method to start a Dialogflow session and a number of events and classes:

    Events
    • DialogflowError
    • DialogflowPlaybackFinished
    • DialogflowPlaybackMarkerReached
    • DialogflowPlaybackStarted
    • DialogflowResponse
    • DialogflowStopped

    Classes
    • DialogflowEventInput
    • DialogflowInstance
    • DialogflowOutputAudioConfig
    • DialogflowQueryInput
    • DialogflowQueryParameters
    • DialogflowResponse
    • DialogflowResult
    • DialogflowSettings
    • DialogflowStreamingRecognitionResult
    • DialogflowSynthesizeSpeechConfig
    • DialogflowTextInput
    • DialogflowVoiceSelectionParams

  • I will review some of these events and classes in the “Making our voicebot” section.

    Pricing

    Voximplant charges $0.005/minute for inbound traffic to VoxEngine and $0.015/minute for PSTN calls out of VoxEngine. This gives their Dialogflow Connector a 32% discount vs. the call forwarding approach on their same platform.

    US phone numbers are $1.00/month.
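
    To make these rates concrete, here is a rough monthly cost sketch. It uses only the per-minute rates and the number rental quoted above. The function name and the traffic figures are made up for illustration, and any Google-side Dialogflow charges are left out.

```javascript
// Rates quoted in this post (USD); check Voximplant's current price list before relying on them
const INBOUND_PER_MINUTE = 0.005   // inbound traffic to VoxEngine
const OUTBOUND_PER_MINUTE = 0.015  // PSTN calls out of VoxEngine
const US_NUMBER_PER_MONTH = 1.0    // one US phone number

// Hypothetical helper: estimate one month's Voximplant bill for a single US number
function estimateMonthlyCost(inboundMinutes, outboundMinutes) {
  const usage =
    inboundMinutes * INBOUND_PER_MINUTE + outboundMinutes * OUTBOUND_PER_MINUTE
  // Round to whole cents to avoid floating point noise
  return Math.round((US_NUMBER_PER_MONTH + usage) * 100) / 100
}

// Made-up traffic: 1000 inbound minutes and 200 minutes of transferred calls
console.log(estimateMonthlyCost(1000, 200))
```

    With those example figures the estimate comes to $9 for the month: $1 for the number, $5 of inbound traffic, and $3 of outbound PSTN minutes.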

    As mentioned previously in this series, you can use the Dialogflow Phone Gateway for free, but I assume one would pay Google for one of its Enterprise plans for an unlimited quota with an SLA.

    Voximplant-Dialogflow-pricing-comparison-1

    Making our voicebot

    The first post in this series describes requirements for making a voicebot gateway to replace an IVR. In this section I will give some highlights of Voximplant’s setup and review how to implement some of the requirements, with code samples where appropriate.

    You can see my entire VoxEngine JavaScript program in this gist.

    Note the samples below are not intended to be a step-by-step development guide - refer to Voximplant for that.

    Setup via Marketplace

    In addition to the reference guide here for setting up the Dialogflow Connector, Voximplant also added an integration option inside their Marketplace to make this easy. You can find this off of their main side menu.

    Voximplant-Dialogflow-Connector-wizard

    From there you will need to go into the Google Cloud Console to get your service account key. Follow the steps in the setting up authentication guide from the Dialogflow docs to get that. Once you get that and upload it, Voximplant will create a new application that includes the Agent connection. You should see the default Dialogflow Connector code show up in your scenarios list:

    Voximplant-scenario
    You can also access, add, and delete an agent in the Dialogflow Connector menu inside your application. If you add a Dialogflow agent this way it will not automatically create the Scenario code.
    Voximplant-Dialogflow-Connector-page

    Voximplant’s guide mentions this, but don’t forget to set your Text to Speech configuration in Dialogflow to MP3 or Ogg Opus. I forgot this and was wondering why I was only hearing silence.

    The code itself is documented so I will give some highlights below.

    Dialogflow Events and Contexts

    Sending Events

    In addition to handling text and speech-based inputs to initiate intents, Dialogflow can also handle incoming events. The advantage of events vs. sending utterances is that there is no Machine Learning processing of the utterance. Using an event allows you to immediately and definitively invoke an intent.

    Voximplant uses this event-based system when it sends an initial query, after connecting:

    // Sending WELCOME event to let the agent say a welcome message
    dialogflow.sendQuery({event : {name: "WELCOME", language_code: "en"}})
    

    Setting Contexts

    In addition, Dialogflow also lets you set contexts. Contexts let you store stateful information across intents during a user's bot interaction. Contexts can also be used as filters when you might want to have different responses for the same intent in different scenarios - like responding differently on a voice call vs. a text message. In our voicebot we wanted to have a phone context that kept the user’s caller ID and phone number. VoxEngine has a dialogflow.setQueryParameters method to do this that you should set before sending a query. To do this we just added the following instead of the above code:

    // Set a phone context with phone parameters
    // Note: Dialogflow seems to convert camelCase to camel_case, so I just changed it here
    // ToDo: error handling if returned parameters are null?
    const phoneContext = {
      name: "phone",
      lifespanCount: 99,
      parameters: {
        caller_id: call.callerid(),
        called_number: call.number()
      }
    }

    // Query parameters must be set before the query is sent
    dialogflow.setQueryParameters({contexts: [phoneContext]})

    // Sending WELCOME event to let the agent say a welcome message
    dialogflow.sendQuery({event : {name: "WELCOME", language_code: "en"}})
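
    One way to address the ToDo about null parameters is to build the context in a small pure function that substitutes a placeholder when a value is missing, for example when the caller ID is withheld. This is only a sketch: buildPhoneContext and the "unknown" fallback are hypothetical names, not part of VoxEngine.

```javascript
// Hypothetical helper: build the "phone" context, falling back to a placeholder
// when the caller ID or called number is missing or empty
function buildPhoneContext(callerId, calledNumber, fallback = "unknown") {
  const clean = (value) =>
    typeof value === "string" && value.trim() !== "" ? value.trim() : fallback
  return {
    name: "phone",
    lifespanCount: 99,
    parameters: {
      caller_id: clean(callerId),
      called_number: clean(calledNumber)
    }
  }
}

// Inside VoxEngine this would be called as buildPhoneContext(call.callerid(), call.number())
console.log(buildPhoneContext("15551230000", null))
```

    The Dialogflow agent can then test for the placeholder value instead of having to cope with a missing parameter.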
    
    

    Call Transfer

    The bottom part of Voximplant’s code gives examples for handling telephony responses from Dialogflow including call transfer, speech synthesis, and playback of an audio file - all the options available in Dialogflow’s Telephony menu. Basically all you need to do is set your Telephony options there, just like you would do if you were using Dialogflow’s Phone Gateway.

    Dialogflow Telephony Tab

    Dialogflow’s built-in Telephony options all work with Voximplant

    Just uncomment the transfer code and enter one of your phone numbers to dial from:

    function processTelephonyMessage(msg) {
      // Transfer call to msg.telephonyTransferCall.phoneNumber
      if (msg.telephonyTransferCall !== undefined) {
        dialogflow.stop()
        let newcall = VoxEngine.callPSTN(msg.telephonyTransferCall.phoneNumber, VOICEBOT_PHONE_NUMBER)
        VoxEngine.easyProcess(call, newcall)
      }
    }
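
    The same function can branch on the other telephony message types. The sketch below only classifies a message so it can run outside VoxEngine. Only telephonyTransferCall.phoneNumber appears in the code above; the telephonySynthesizeSpeech and telephonyPlayAudio field names are assumptions based on Dialogflow's telephony message types and should be checked against the Dialogflow docs. classifyTelephonyMessage is a hypothetical name.

```javascript
// Decide what the gateway should do with one telephony message.
// Field names other than telephonyTransferCall.phoneNumber are assumptions.
function classifyTelephonyMessage(msg) {
  if (msg.telephonyTransferCall !== undefined) {
    return { action: "transfer", target: msg.telephonyTransferCall.phoneNumber }
  }
  if (msg.telephonySynthesizeSpeech !== undefined) {
    const speech = msg.telephonySynthesizeSpeech
    // Prefer plain text and fall back to SSML if that is all we have
    return { action: "speak", target: speech.text || speech.ssml }
  }
  if (msg.telephonyPlayAudio !== undefined) {
    return { action: "play", target: msg.telephonyPlayAudio.audioUri }
  }
  // Not a telephony message we know how to handle
  return { action: "ignore", target: null }
}

console.log(classifyTelephonyMessage({ telephonyTransferCall: { phoneNumber: "+15551230000" } }))
```

    Keeping the decision separate from the VoxEngine calls makes it easy to test the branching logic without placing a call.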
    

    Recording

    Call recording is trivial to do. Just add a call.record({stereo: true}). Stereo is not even mandatory, but it makes debugging easier since the caller and Dialogflow agent are on separate audio channels. I placed this in the onCallRecorded function.

    Voximplant puts the call recording in the call history tab. That is all I needed since I just wanted the recordings for debugging. In a real application you could share the recording URL and save it somewhere else.
    Voximplant-call-history

    Example from Voximplant’s call history showing charges and recording playback

    Playback Interruption

    We want our voicebot to be as realistic as possible. One aspect of human-to-human conversations is that we interrupt each other. Ideally the voicebot could speak and listen at the same time and even interrupt its own responses if needed. The way Dialogflow’s query API works is you send it an utterance and it returns a response. Sending a bunch of queries back to back is fine, but it does not make sense to play back multiple responses over the top of each other. To avoid this, a lot of voicebots take a sequential approach:

    1. Stream audio to Dialogflow, which listens for an utterance
    2. When Dialogflow hears an utterance it sends a response to the gateway
    3. The gateway then stops listening for new audio
    4. The gateway plays back the audio provided by Dialogflow (or synthesizes speech from the returned text)
    5. Then the gateway starts listening again, starting the cycle over
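
    The cycle above can be sketched as a two-state machine. This is an illustration of the sequencing only, not Voximplant's implementation; the class, state, and event names are all made up.

```javascript
// Minimal state machine for the sequential listen/respond cycle.
// State and event names are invented for this sketch.
const TRANSITIONS = {
  LISTENING: { RESPONSE_RECEIVED: "PLAYING" },  // steps 2-3: response arrives, stop listening
  PLAYING: { PLAYBACK_FINISHED: "LISTENING" }   // step 5: playback done, listen again
}

class TurnTaker {
  constructor() {
    this.state = "LISTENING" // step 1: audio is streamed to Dialogflow
  }
  // Only audio that arrives while listening is forwarded to Dialogflow
  shouldForwardAudio() {
    return this.state === "LISTENING"
  }
  // Apply an event; events that make no sense in the current state are ignored
  handle(event) {
    const next = TRANSITIONS[this.state][event]
    if (next !== undefined) this.state = next
    return this.state
  }
}

const turn = new TurnTaker()
turn.handle("RESPONSE_RECEIVED")
console.log(turn.shouldForwardAudio()) // false while the response plays (step 4)
turn.handle("PLAYBACK_FINISHED")
console.log(turn.shouldForwardAudio()) // true once listening resumes
```

    The drawback is visible in the PLAYING state: anything the caller says during playback is dropped, which is why interruption needs extra work.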

    Voximplant does not have a perfect solution for playback interruption, but they do have a playback marker concept that lets you tell the gateway when to start listening again after playback. Voximplant actually lets you set this to a negative value, so it starts listening again before the playback of the previous intent response is done. If you get this timer right you can allow some ability to interrupt without worrying about overlapping playback.

    Voximplant has a single line of code for this:

       // Playback marker used for better user experience
       dialogflow.addMarker(-300)
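
    As a rough illustration of the timing, assume the marker offset is in milliseconds and that a negative value is measured back from the end of playback, which is how the description above reads. The helper below is hypothetical and only does the arithmetic; it does not call VoxEngine.

```javascript
// Hypothetical helper: how long after playback starts the gateway would begin
// listening again, for a marker offset that is zero or negative (milliseconds)
function listenResumeTime(playbackDurationMs, markerOffsetMs) {
  if (markerOffsetMs > 0) {
    throw new RangeError("this sketch only covers zero or negative offsets")
  }
  // A marker longer than the clip itself means listening from the very start
  return Math.max(0, playbackDurationMs + markerOffsetMs)
}

// A 2 second response with the -300 marker: listening resumes 1700 ms in
console.log(listenResumeTime(2000, -300))
```

    Under that assumption, the caller gets a 300 ms window at the end of each response in which they can start speaking and still be heard.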
    

    It may be possible to continuously listen for user utterances and use dialogflow.stopMediaTo(call) to stop playback if a new intent comes in while the previous intent response is still playing. The logic to get this right seems complex and the application itself might want to choose when it is worth interrupting playback. I have not experimented with this yet.

    No Activity Detection

    If you were in the middle of a call with another person and they suddenly went silent, you would say “are you there?” The voicebot needs a mechanism to do something similar. Google Assistant provides an actions_intent_NO_INPUT event to Dialogflow to handle this. We can create something similar inside VoxEngine.

    First we will need to create a timer to keep track of silence time. I created a Timer class for this:

    const MAX_NO_INPUT_TIME = 15000
    let dialogflow, call, hangup

    // Fire the "NO_INPUT" event if there is no speech for MAX_NO_INPUT_TIME
    class Timer {
      constructor() {
        this.expired = false
        this.noInputTimer = null
      }
      start() {
        this.noInputTimer = setTimeout(() => {
          this.expired = true
          Logger.write("No_Input timer exceeded")
          dialogflow.sendQuery({event : {name: "NO_INPUT", language_code: "en"}})
        }, MAX_NO_INPUT_TIME || 30 * 1000)
        Logger.write("No_Input timer started")
      }
      stop() {
        this.expired = false
        clearTimeout(this.noInputTimer)
        Logger.write("No_Input cleared")
      }
    }

    let timer = new Timer()
    

    You will notice my class has a start function that uses a simple setTimeout and will send a NO_INPUT event to the Dialogflow agent if it expires.

    Then we just start this timer whenever a Dialogflow audio response finishes playing, and stop it when a new response starts playing. If the timer runs out then the NO_INPUT event is sent.

       dialogflow.addEventListener(AI.Events.DialogflowPlaybackFinished, (e) => {
         timer.start()
       })
       dialogflow.addEventListener(AI.Events.DialogflowPlaybackStarted, (e) => {
         // Dialogflow TTS playback started
         timer.stop()
       })
    

    I thought this would be all I needed, but when I first tried it, it didn’t work. It turns out that calling the sendMediaTo function right after sendQuery overrides the stream. To prevent this race condition, I had to add another check to the onDialogflowResponse function so that it does not send media to Dialogflow if the timer has expired. We want the NO_INPUT response to come back and play first. It will not work without this.

    // Handle Dialogflow responses
    function onDialogflowResponse(e) {
      // If DialogflowResponse with queryResult received - the call stops sending media to Dialogflow
      // in case of response with queryResult but without responseId we can continue sending media to dialogflow
      if (e.response.queryResult !== undefined && e.response.responseId === undefined) {
        // Do not resume streaming while the NO_INPUT response is still pending
        if (!timer.expired)
          call.sendMediaTo(dialogflow)
      }
      // ... remainder of the handler omitted
    }
    

    On the Dialogflow side, make sure to make an intent and populate the NO_INPUT event with some default responses.
    Dialogflow-No-Input-intent

    DTMF detection

    DTMF entry is a nice-to-have feature in some circumstances, even if the overall goal is to eliminate the traditional “select 1 for…” approach. For example, if you asked for a phone number, a user may prefer to touchtone that rather than say it. In other cases one might need to maintain some existing IVR functionality for scripts that auto-navigate through some options.

    Voximplant’s call class has the ability to alert on DTMF tones:

    let waitForTone = false // global we’ll need later
    
    // ...
    // This is part of the VoxEngine.addEventListener block
    // ...
    
     call.handleTones(true)
     call.addEventListener(CallEvents.ToneReceived, onTone)
    

    Our onTone function then just needs to send an event to Dialogflow. We will add a parameter to pass the digit:

    function onTone(e){
     //ToDo: error handling - i.e. check to make sure Dialogflow is connected and running first, tone values
     waitForTone = true
     dialogflow.sendQuery({event : {name: "DTMF", language_code: "en", parameters: { dtmf_digits: e.tone} }})
    }
    

    In a real app you likely would want to capture multiple digits if they are entered in sequence and send them to Dialogflow as a single event instead of a series of events.
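
    A minimal sketch of that idea: buffer the digits and hand them over as one string when the caller presses a terminator key or pauses between key presses. DtmfCollector, the "#" terminator, and the 2 second pause are all assumptions for illustration; in VoxEngine the callback would be the place to call dialogflow.sendQuery with the DTMF event.

```javascript
// Collect DTMF digits and deliver them as a single string.
// Flushes on the terminator key or after a pause between key presses.
class DtmfCollector {
  constructor(onComplete, { terminator = "#", pauseMs = 2000 } = {}) {
    this.onComplete = onComplete
    this.terminator = terminator
    this.pauseMs = pauseMs
    this.digits = ""
    this.pauseTimer = null
  }
  push(tone) {
    clearTimeout(this.pauseTimer)
    if (String(tone) === this.terminator) {
      this.flush()
      return
    }
    this.digits += String(tone)
    // Restart the inter-digit timer on every key press
    this.pauseTimer = setTimeout(() => this.flush(), this.pauseMs)
  }
  flush() {
    clearTimeout(this.pauseTimer)
    this.pauseTimer = null
    if (this.digits === "") return // nothing collected, nothing to send
    const collected = this.digits
    this.digits = ""
    this.onComplete(collected)
  }
}

// Example: three digits followed by the terminator produce one callback
const collector = new DtmfCollector((digits) => console.log("collected", digits))
for (const tone of ["5", "5", "5", "#"]) collector.push(tone)
```

    The onTone handler would then call collector.push(e.tone) instead of sending a query for every key press.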

    Like with the No Activity Detection above, we need to pause sending media to Dialogflow while we handle our event response:

    function onDialogflowResponse(e) {
      // If DialogflowResponse with queryResult received - the call stops sending media to Dialogflow
      // in case of response with queryResult but without responseId we can continue sending media to dialogflow
      if (e.response.queryResult !== undefined && e.response.responseId === undefined) {
        // Hold back caller audio while a NO_INPUT or DTMF event response is pending
        if (!timer.expired && !waitForTone)
          call.sendMediaTo(dialogflow)
      }
      // ... remainder of the handler omitted
    }
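
    Since the no-input timer and the DTMF flag both gate the same call, the check can be pulled into one small function that is easy to test on its own. This is a sketch; shouldResumeMedia and the shape of its flags argument are hypothetical, and it only mirrors the condition shown above.

```javascript
// Hypothetical helper: decide whether caller audio should be streamed to Dialogflow again.
// Mirrors the condition in onDialogflowResponse without touching VoxEngine objects.
function shouldResumeMedia(response, flags) {
  // A queryResult without a responseId means we can continue sending media
  const canContinue =
    response.queryResult !== undefined && response.responseId === undefined
  // Hold audio back while a NO_INPUT or DTMF event response is still pending
  return canContinue && !flags.timerExpired && !flags.waitForTone
}

console.log(shouldResumeMedia({ queryResult: {} }, { timerExpired: false, waitForTone: true }))
```

    Inside the handler this would be used as: if (shouldResumeMedia(e.response, { timerExpired: timer.expired, waitForTone })) call.sendMediaTo(dialogflow).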
    

    Lastly, remember to reset our global waitForTone to resume sending media to Dialogflow:

       dialogflow.addEventListener(AI.Events.DialogflowPlaybackFinished, (e) => {
         timer.start()
         waitForTone = false
       })
    

    Like with the No Activity Detection, we need to set up our event in Dialogflow. I created an Entity to handle the parameters passed by the VoxEngine script. This one also needs some training phrases.
    Dialogflow-DTMF-intent

    No SMS Integration with Dialogflow

    Voximplant does not have any VoxEngine control over SMS. This means VoxEngine has no way to send incoming SMS messages to our bot. Like with other platforms, you could always set up your own server and use webhooks to handle this interaction. We were looking for something simpler.

    Scorecard

    Voximplant ended up being a good (but not perfect) fit against the criteria we identified in the first post of this series:

    Requirement              Voximplant Dialogflow Connector
    Call Transfer            Yes
    Recording                Yes
    Playback Interruption    Some - playback marker gives some of the effect
    No activity detection    Yes
    DTMF detection           Yes
    SMS support              No

    What’s Next

    Now that we have established our requirements and evaluated a few implementation options, in the last few posts in this series we will start to get into architectural considerations and actual implementation challenges. You can leave comments below and remember to subscribe if you want to see more.


    About the Author

    Chad Hart is an analyst and consultant with cwh.consulting, a product management, marketing, and strategy advisory helping to advance the communications industry. In addition, recently he co-authored a study on AI in RTC and helped to organize an event / YouTube series covering that topic.


    Remember to subscribe for new post notifications and follow @cogintai.
