
3 Methods for Connecting a Phone Call to Dialogflow

    We are starting a new series on using Dialogflow as an Interactive Voice Response (IVR) replacement. We have covered this topic a few times here, including looking at Dialogflow’s own Phone Gateway and the Gateway interface implementations of VoxImplant and SignalWire. Beyond just building simple demo systems, Chad has been exploring better ways of using Dialogflow to implement an IVR replacement for telephony environments.

    Joining me to help with this series is Emiliano Pelliccioni. Emiliano is a developer at webRTC.ventures. Emiliano has worked on similar projects with Chad in the past and has continued researching this area. We decided to get together to share our research and experiments in this domain.

    In this first post, we want to share some of the methods we explored for connecting Dialogflow to a phone call. Let’s first review what’s involved in making this connection and some nice-to-have features before reviewing the methods.

    What is a Dialogflow Telephony Gateway?

    We want to be able to dial a phone number and have Dialogflow handle the interaction as a voicebot IVR. To do this, there needs to be some kind of gateway that handles both signaling and media conversion. On the signaling side, the gateway needs to take the telephony signaling - which is almost always based on SIP - and use that to invoke the proper Dialogflow commands to launch and interact with the bot. This also includes handling hang-ups and the termination of the call.
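    As a rough illustration of the signaling side, the sketch below maps SIP requests onto a bot session lifecycle. It assumes a hypothetical gateway that keys each Dialogflow session on the SIP Call-ID and starts the conversation by sending an event (here WELCOME, the event Dialogflow ES's Default Welcome Intent listens for). The Action type and method names are illustrative only, not part of any library.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Action:
    """A hypothetical instruction produced by the gateway's signaling layer."""
    kind: str                        # "start_session", "end_session", or "ignore"
    session_id: Optional[str] = None
    event: Optional[str] = None      # Dialogflow event to send when starting

@dataclass
class SignalingMapper:
    """Maps SIP call lifecycle requests onto Dialogflow session lifecycle."""
    sessions: Dict[str, str] = field(default_factory=dict)  # Call-ID -> session ID

    def on_request(self, method: str, call_id: str) -> Action:
        method = method.upper()
        if method == "INVITE" and call_id not in self.sessions:
            # New call: allocate a session and kick off the bot with a welcome event.
            session_id = str(uuid.uuid4())
            self.sessions[call_id] = session_id
            return Action("start_session", session_id, event="WELCOME")
        if method in ("BYE", "CANCEL"):
            # Caller hung up (BYE) or gave up before answer (CANCEL): tear down.
            session_id = self.sessions.pop(call_id, None)
            if session_id is not None:
                return Action("end_session", session_id)
        # Re-INVITEs, ACKs, OPTIONS, etc. do not change the bot session here.
        return Action("ignore", self.sessions.get(call_id))
```

    A real gateway would also need to handle the media negotiation carried in those requests; this only shows the session bookkeeping.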

    Slightly more complicated is the media conversion that needs to take place. Dialogflow’s interface for real-time speech input is gRPC. The gateway needs to convert the SRTP or RTP media used by the telephony side into a gRPC audio stream using Dialogflow-friendly codecs. The gateway also needs to play back the response speech generated by Dialogflow (or use its own Text-to-Speech mechanism to vocalize Dialogflow’s response text).
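    To make the first half of that media conversion concrete, here is a minimal sketch that strips the RTP header (RFC 3550) and decodes a G.711 μ-law (PCMU) payload into 16-bit little-endian linear PCM, which matches Dialogflow's LINEAR16 encoding. It assumes plain RTP (SRTP would have to be decrypted first) carrying PCMU. Dialogflow's AudioEncoding enum also lists a μ-law option, so a gateway may be able to skip the decode step and forward PCMU as-is.

```python
import struct
from typing import Tuple

RTP_HEADER_LEN = 12  # fixed RTP header size per RFC 3550

def rtp_payload(packet: bytes) -> Tuple[int, bytes]:
    """Return (payload_type, payload) from an unencrypted RTP packet."""
    if len(packet) < RTP_HEADER_LEN or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = bool(packet[0] & 0x20)
    has_extension = bool(packet[0] & 0x10)
    csrc_count = packet[0] & 0x0F
    payload_type = packet[1] & 0x7F
    offset = RTP_HEADER_LEN + 4 * csrc_count  # skip contributing-source IDs
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # Extension header: 16-bit profile ID, then length in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    # When the padding bit is set, the last byte holds the padding length.
    end = len(packet) - (packet[-1] if has_padding else 0)
    if offset > end:
        raise ValueError("malformed RTP packet")
    return payload_type, packet[offset:end]

def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit sample."""
    code = ~code & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if code & 0x80 else magnitude

def pcmu_to_linear16(payload: bytes) -> bytes:
    """Convert a PCMU payload into little-endian 16-bit PCM (LINEAR16)."""
    samples = [ulaw_to_linear(b) for b in payload]
    return struct.pack("<%dh" % len(samples), *samples)
```

    Payload type 0 is the static RTP assignment for PCMU. A gateway would check it before decoding and leave other codecs to a different path.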


    What else should the Dialogflow Gateway handle?

    Beyond basic connectivity, there are a few other features that will help to improve development and user interaction.

    A short list of the top features we evaluated is:

    • Recording - not a hard requirement, but having a full recording of both parties is invaluable for debugging and improving the system
    • Call transfer - in most cases you need to give the user an option to talk to a human, and often the natural end of a bot conversation is to transfer the call
    • Playback interruption - ideally your voicebot can handle an asynchronous (full-duplex) conversation, so the user can interrupt whatever the bot is saying and have that speech processed
    • No activity detection - if you were in the middle of a conversation and it suddenly went silent, you would ask “Are you there?” The bot needs a mechanism to do something similar. If you were using Dialogflow to make a Google Assistant bot, you would get an actions_intent_NO_INPUT event and mechanisms to set up reprompt intents - the gateway needs to provide something similar
    • DTMF detection - even if the goal is to eliminate DTMF menus, sometimes it is nice to have DTMF as a backup option or alternative input method - especially if you are trying to do something like capture a phone number and Dialogflow cannot understand the caller
    • SMS - nice to have; more on this below
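    No activity detection is the kind of feature a gateway has to build itself. Below is a minimal sketch of a silence watchdog. It assumes the gateway calls on_activity whenever the caller speaks or presses a key (and when the bot finishes a prompt), polls the watchdog periodically, and reacts to "reprompt" by sending Dialogflow a custom event, for example one named NO_INPUT that you would attach to a reprompt intent yourself. The event name, timeout, and reprompt limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NoInputWatchdog:
    """Tracks caller silence and decides when to reprompt or give up.

    Timestamps are passed in explicitly (e.g. from time.monotonic()) so the
    logic stays deterministic and easy to test.
    """
    timeout_s: float = 6.0   # hypothetical: silence allowed before a reprompt
    max_reprompts: int = 2   # hypothetical: reprompts before ending the call
    last_activity: float = 0.0
    reprompts: int = 0

    def on_activity(self, now: float) -> None:
        # Caller spoke or pressed a key: reset the silence clock and count.
        self.last_activity = now
        self.reprompts = 0

    def poll(self, now: float) -> Optional[str]:
        """Return "reprompt", "hangup", or None if nothing should happen yet."""
        if now - self.last_activity < self.timeout_s:
            return None
        # Restart the clock so the next reprompt waits a full timeout.
        self.last_activity = now
        if self.reprompts < self.max_reprompts:
            self.reprompts += 1
            return "reprompt"
        return "hangup"
```

    Resetting the clock when a prompt finishes playing matters: otherwise a long bot response would eat into the caller's silence allowance.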

    The scope of our project was fairly limited, but other options could include customized Speech-to-Text (STT) and Text-to-Speech (TTS) engines instead of using the ones built into Dialogflow. This could allow for better coverage of custom vocabularies or unique voice synthesis.

    SMS support

    There are many cases when it is easier to send the user a link. A restaurant wouldn’t want to read off an entire menu; it is easier to just send a link to it. In practice, most callers use their mobile phones, which means they should be able to receive text messages. The gateway should help determine whether the caller is on an SMS-capable device and send them text messages when that would help the interaction.

    If you are going to send SMS, you should also be prepared to receive SMS and interact via text without requiring voice. In fact, this could allow ongoing dialog after the phone call has finished. To implement this, the gateway needs to keep some state on the user and manage interactions between the voice telephony and SMS environment.
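    One way to keep that state is sketched below: a hypothetical store that hands out the same Dialogflow session for a given phone number, whether the message arrives over voice or SMS, and starts a fresh session after a period of inactivity. The projects/<project>/agent/sessions/<session> path follows Dialogflow ES v2 session naming; keying on the caller's number and the 24-hour expiry are assumptions for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

@dataclass
class ConversationStore:
    """Keeps one Dialogflow session per caller so voice and SMS share context."""
    project_id: str
    ttl_s: float = 24 * 3600  # hypothetical: idle time before a new conversation
    clock: Callable[[], float] = time.monotonic
    _sessions: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def session_for(self, phone_number: str) -> str:
        now = self.clock()
        entry = self._sessions.get(phone_number)
        if entry is None or now - entry[1] > self.ttl_s:
            session_id = uuid.uuid4().hex  # 32 chars, within Dialogflow's limit
        else:
            session_id = entry[0]
        # Touch the entry so an ongoing conversation keeps its session.
        self._sessions[phone_number] = (session_id, now)
        return f"projects/{self.project_id}/agent/sessions/{session_id}"
```

    An in-memory dict only works for a single gateway process; a deployment with several instances would need a shared store.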

    Three Methods for Implementing a Dialogflow Telephony Gateway

    We found 3 major methods for connecting a phone call to Dialogflow:

    1. Use Dialogflow’s built-in Phone Gateway
    2. Forward a call from a telephony system into Dialogflow’s Phone Gateway
    3. Directly connect a telephony system to Dialogflow

    Method 1: Use the Dialogflow Phone Gateway

    Chad previously reviewed this in August 2018. Nothing has changed here in terms of the setup or functionality. The service is still in what Google defines as Beta, but that is usually a pretty high standard. See that previous post for a walkthrough or just go to Dialogflow’s own docs on using the Phone Gateway (which they call “Telephony Gateway” in that link).

    English is still the only supported language with telephony if you want to generate speech output from Dialogflow. Other than that, the main benefits of the Phone Gateway are:

    • It is very easy to set up
    • It is free (subject to quotas) unless you want the Enterprise Edition
    • A TELEPHONY_WELCOME event is included for special handling of phone calls
    • The Dialogflow Intent GUI includes a tab for special handling of telephony responses, including audio playback, special speech synthesis, and call transfer (see below)
    • You can easily transfer calls to a single US number


    Dialogflow Phone Gateway Summary Scorecard

    So, Dialogflow has some convenient features in its Phone Gateway but in the end, it is pretty limited:

    Requirement | Dialogflow Phone Gateway
    Recording | No
    Call Transfer | Yes, but only to US numbers
    Playback Interruption | No
    No activity detection | No
    DTMF detection | No
    SMS support | No

    Method 2: Forward Calls to Dialogflow Phone Gateway

    Dialogflow does not have a whole lot of telephony controls, but it is possible to use another telephony platform to forward calls to the Dialogflow Phone Gateway. This platform could be anything: a commercial PBX or ACD system, an open source telephony platform like Asterisk or Freeswitch, or a Communications Platform as a Service (CPaaS) that provides this functionality as a cloud-based service, such as Twilio, Nexmo, SignalWire, VoxImplant, and others. We largely evaluated CPaaS, so we will only refer to that option from here on out. This approach enables some additional features if you are willing to use Dialogflow’s webhooks to interact with the telephony platform.

    Forwarding Benefits

    Some of the benefits of this approach vs. just using the Dialogflow Phone Gateway are:

    • Support for any phone number you want - the Dialogflow Phone Gateway only supports US numbers now
    • Support for multiple phone numbers - you just get one with the Dialogflow Phone Gateway
    • Easy Recording - this is almost always available as a feature
    • More control over call transfer if your platform can handle incoming webhooks and programmatic control
    • Handle SIP calls if your platform supports it
    • Advanced call flows - such as conferencing in an agent to listen or help with the voicebot interaction

    Features like recording are generally simple to implement. Call transfer control is usually better if you can tell your CPaaS to hang up and send the call somewhere else based on a webhook fulfillment.
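    Here is a minimal sketch of that idea, assuming Dialogflow ES v2 webhook JSON (the session and queryResult.intent.displayName request fields and the fulfillmentText response field). The intent name transfer_to_agent, the call_for_session lookup, and the transfer_call callback that would wrap your CPaaS's transfer API are all hypothetical. In a real deployment, handle would sit behind an HTTPS endpoint registered as the agent's fulfillment URL.

```python
from typing import Any, Callable, Dict

def make_fulfillment_handler(
    call_for_session: Dict[str, str],
    transfer_call: Callable[[str, str], None],
    agent_number: str,
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a webhook handler that asks the CPaaS to transfer the call.

    call_for_session maps a Dialogflow session path to the CPaaS call ID, and
    transfer_call(call_id, destination) wraps whatever transfer API your CPaaS
    exposes; both are hypothetical stand-ins.
    """
    def handle(request: Dict[str, Any]) -> Dict[str, Any]:
        intent = request.get("queryResult", {}).get("intent", {}).get("displayName")
        if intent == "transfer_to_agent":  # hypothetical intent name
            call_id = call_for_session.get(request.get("session", ""))
            if call_id is not None:
                transfer_call(call_id, agent_number)
                return {"fulfillmentText": "Connecting you to an agent now."}
            return {"fulfillmentText": "Sorry, I can't transfer this conversation."}
        # Other intents: return no messages so the intent's own responses are used.
        return {}
    return handle
```

    The key design point is the session-to-call mapping: the webhook only knows the Dialogflow session, so the gateway must record which CPaaS call each session belongs to when the call starts.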

    Nexmo’s Kranky Geek AI in RTC Show Example

    Nexmo provided a good example of this at the last Kranky Geek AI in RTC event:

    You can see that code here:
    https://github.com/alwell-kevin/simple-smart-ivr-framework

    Forwarding Downsides

    The main drawback of this approach is that there is still no direct way for your CPaaS app to communicate with Dialogflow; communication only flows the other way, with Dialogflow signaling your CPaaS app through webhooks. This means your CPaaS app could listen for things like DTMF presses or the user interrupting an audio playback, but it cannot pass those events directly to Dialogflow.

    SMS proxy development may be required

    Dialogflow’s Phone Gateway also doesn’t support SMS, but nearly every CPaaS platform does. Unfortunately, only Twilio has a text messaging option built into Dialogflow’s integrations. While most CPaaS platforms provide easy mechanisms for managing text message interactions programmatically, a developer would need to invoke Dialogflow’s APIs to manage these interactions manually.

    As shown in the figure above, this app would need to receive the SMS text content from the CPaaS and forward it to one of Dialogflow’s APIs. Once the intent response comes back, your app would use the CPaaS API to send the user an SMS containing the answer.
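    The proxy flow above can be sketched as a single handler. To stay independent of any particular SDK, it assumes two hypothetical pieces: an IntentClient wrapper around a Dialogflow text detect-intent call that returns the reply text, and a send_sms(from, to, body) callback wrapping your CPaaS's messaging API. session_for would map a phone number to a Dialogflow session, for example a store keyed on the caller's number so SMS and voice share context.

```python
from typing import Callable, Protocol

class IntentClient(Protocol):
    """Hypothetical wrapper around a Dialogflow text detect-intent call."""
    def detect_text(self, session: str, text: str, language_code: str) -> str: ...

def handle_inbound_sms(
    from_number: str,
    to_number: str,
    body: str,
    session_for: Callable[[str], str],
    dialogflow: IntentClient,
    send_sms: Callable[[str, str, str], None],
    language_code: str = "en-US",
) -> str:
    """Relay one inbound SMS to Dialogflow and text the reply back."""
    session = session_for(from_number)
    reply = dialogflow.detect_text(session, body.strip(), language_code)
    if reply:
        # Reply from the number the user texted so the thread stays together.
        send_sms(to_number, from_number, reply)
    return reply
```

    In practice this function would be called from the CPaaS's inbound-message webhook, with the sender, recipient, and body taken from that request.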

    Flexibility = more development time

    The same may be true for some of the other requirements. If you are willing to manage a lot more code, you could have your app translate between the CPaaS and Dialogflow for some of these features - but that is essentially building a good part of the gateway.

    Cost & quality implications

    Forwarding also costs more, both in telephony charges and in call quality. You pay for the inbound call leg to the CPaaS, the outbound leg from the CPaaS to Dialogflow’s Phone Gateway, and, once you upgrade to Dialogflow’s Enterprise plan (which you will likely do in a production environment), Dialogflow’s own telephony charges as well. Forwarding a call over the PSTN is also less than ideal for voice quality: the extra gateway leg will definitely introduce some latency and may cause other impairments.
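    As a simple way to compare the two paths, here is a per-call model counting only the telephony charges discussed above, for a call lasting T minutes. c_in is the CPaaS inbound rate, c_out the CPaaS outbound rate to the Phone Gateway number, c_DF Dialogflow’s Enterprise telephony charge (modeled here as a per-minute rate), and c_conn any per-minute fee a CPaaS might charge for a direct Dialogflow connector (an assumption; it may be zero). Speech recognition and other Dialogflow API charges are left out, and no actual rates are implied.

```latex
\begin{aligned}
C_{\text{forward}} &= T\,(c_{\text{in}} + c_{\text{out}} + c_{\text{DF}}) \\
C_{\text{direct}}  &= T\,(c_{\text{in}} + c_{\text{conn}}) \\
C_{\text{forward}} - C_{\text{direct}} &= T\,(c_{\text{out}} + c_{\text{DF}} - c_{\text{conn}})
\end{aligned}
```

    Under this model, forwarding only comes out cheaper if the connector fee exceeds the outbound leg plus Dialogflow’s telephony charge.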

    A direct connection from the CPaaS into Dialogflow is preferable, and that is what we will discuss in Method 3.

    Forwarding Summary Scorecard

    Forwarding is better, but mileage will vary depending on the capabilities of your platform.

    Requirement | CPaaS + Dialogflow Phone Gateway
    Call Transfer | Yes - some work to code, but full control over transferring
    Recording | Yes - generally easy with a given CPaaS
    Playback Interruption | Not unless you run voice activity detection and are controlling the TTS output
    No activity detection | No
    DTMF detection | No
    SMS support | Maybe, depending on the CPaaS
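    The voice activity detection half of playback interruption can be approximated very crudely. Below is a minimal sketch of an energy-threshold detector over LINEAR16 frames that reports a barge-in once the caller has been loud for several consecutive frames, at which point the platform would stop the prompt. The threshold and frame count are hypothetical, and a production system would use a real VAD plus echo cancellation so the bot's own prompt does not trigger it.

```python
import math
import struct
from dataclasses import dataclass

def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of a LINEAR16 (16-bit little-endian) PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

@dataclass
class BargeInDetector:
    """Flags caller speech while the bot's prompt is playing (energy-based)."""
    threshold: float = 1000.0  # hypothetical RMS level treated as speech
    min_frames: int = 3        # hypothetical: e.g. 3 x 20 ms = 60 ms of speech
    _run: int = 0

    def feed(self, frame: bytes) -> bool:
        """Return True once when the caller should interrupt playback."""
        if frame_rms(frame) >= self.threshold:
            self._run += 1
        else:
            self._run = 0
        # Fire exactly when the run reaches min_frames; stay quiet afterwards
        # until a quiet frame resets the run.
        return self._run == self.min_frames
```

    Requiring several consecutive loud frames trades a little reaction time for fewer false interruptions from clicks and line noise.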

    Method 3: Direct Connectivity to Dialogflow

    As illustrated in the figure above, if your CPaaS can connect directly to Dialogflow, you save a conversion step. Assuming you want to use Dialogflow’s built-in Speech-to-Text and Text-to-Speech capabilities, your CPaaS needs a gateway that can both signal Dialogflow’s APIs and convert RTP media to Dialogflow-friendly codecs over gRPC and back.
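    For the gRPC side, Dialogflow ES streaming recognition expects a sequence of requests in which the first carries the session and audio configuration and later ones carry only audio. The sketch below builds that sequence as plain dictionaries whose field names mirror the v2 StreamingDetectIntentRequest message (session, query_input.audio_config, input_audio); with the official google-cloud-dialogflow client you would build the corresponding message objects and pass the iterator to the sessions client's streaming_detect_intent method instead (check the client library docs for the exact types). The 8 kHz rate and 20 ms framing are assumptions typical of narrowband telephony.

```python
from typing import Any, Dict, Iterable, Iterator

SAMPLE_RATE_HZ = 8000  # narrowband telephony audio
FRAME_MS = 20          # typical RTP packetization interval
BYTES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000 * 2  # 16-bit samples -> 320

def streaming_requests(
    session: str, audio: Iterable[bytes], language_code: str = "en-US"
) -> Iterator[Dict[str, Any]]:
    """Yield request dicts shaped like Dialogflow ES StreamingDetectIntentRequest."""
    # First message: configuration only, no audio.
    yield {
        "session": session,
        "query_input": {
            "audio_config": {
                "audio_encoding": "AUDIO_ENCODING_LINEAR_16",
                "sample_rate_hertz": SAMPLE_RATE_HZ,
                "language_code": language_code,
            }
        },
    }
    buffer = b""
    for chunk in audio:
        buffer += chunk
        # Re-frame arbitrary chunk sizes into fixed 20 ms pieces.
        while len(buffer) >= BYTES_PER_FRAME:
            yield {"input_audio": buffer[:BYTES_PER_FRAME]}
            buffer = buffer[BYTES_PER_FRAME:]
    if buffer:
        yield {"input_audio": buffer}  # flush the final partial frame
```

    Because the function is a generator, audio can be forwarded as packets arrive rather than after the caller finishes speaking.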

    Advantages

    • All the same benefits as in the forwarding methodology
    • Lower cost - no Dialogflow telephony charges (if paying for Enterprise)
    • Better quality - I did not quantitatively measure this, but it should have lower latency with better voice quality; use of HD audio end-to-end is also possible (though only used if connecting end-to-end with VoIP)

    After that, the advantages really depend on the specific platform.

    Disadvantages

    The main disadvantage here is the limited number of options available. Nearly every telephony platform provides call forwarding but very few offer a direct Dialogflow connector. After that, the ease of implementation will really depend on the platform selected and your development skills.

    Direct Connectivity Solutions

    The two main commercial CPaaS options are VoxImplant and SignalWire.

    On the open source side, there is the Dialogflow Interface to the Drachtio SIP Server, which also requires Freeswitch. UniMRCP has an MRCP interface for Dialogflow.

    If you are looking for a licensed, commercial gateway, Audiocodes has what it calls a Voice.AI Gateway that connects to Dialogflow. USAN also has a Dialogflow gateway product. Some quick searches also show Voxibot and Tenios claim to have some direct Dialogflow Voice gateway capabilities.

    Direct Connectivity Summary Scorecard

    This scorecard will vary considerably by platform, since implementation details differ. The basic conversion of media to the Dialogflow gRPC interface seems to be common to all of them; few have deep capabilities for interacting with Dialogflow.

    Requirement | Direct connection to Dialogflow
    Recording | Yes - generally easy with a given platform
    Call Transfer | Yes - some work to code, but full control over transferring
    Playback Interruption | Depends on the platform
    No activity detection | Depends on the platform
    DTMF detection | Depends on the platform
    SMS support | Maybe, depending on the platform

    What’s Next

    We plan to provide updated evaluations of SignalWire and VoxImplant’s Dialogflow connectors next. After that, we will review the major challenges we encountered in implementing this system and share some implementation examples from our research. Make sure to subscribe so you don't miss anything as we progress through this series.

    You can see the next parts of the series here.


    About the Authors

    Chad Hart is an analyst and consultant with cwh.consulting, a product management, marketing, and strategy advisory helping to advance the communications industry. In addition, he recently co-authored a study on AI in RTC and helped to organize an event / YouTube series on that topic.

    Emiliano Pelliccioni is a computer engineer working at webRTC.ventures and specializes in developing real time communication applications for clients around the globe. One of his projects includes developing bot integrations for a major CPaaS provider.


    Remember to subscribe for new post notifications and follow @cogintai.
