Adding SMS to your Voicebot IVR

Earlier in this series on using Dialogflow as an Interactive Voice Response (IVR) replacement, we established that SMS interaction is a nice-to-have feature. Users are used to texting, and oftentimes it is a better experience to just send the user a link. One might think that handling SMS with Dialogflow is simple, and it is, but many complexities come into play when you try to blend voice and SMS in the same bot. In this post we will share our findings on ways to implement this feature.

We will describe the use case some more, walk through the architecture we landed on for handling SMS and voice from the same phone number, and discuss using Dialogflow’s Contexts as a filter to select channels.

Use case: why SMS and Voice at the same time

We used a restaurant as a reference business for our prototype. Imagine a caller on the move, looking for something to eat. They find the restaurant and want to know what kind of food it serves. Maybe they do a web search and can’t find the menu, or maybe they just call in the first place - either because they have other questions or because they are not in a position to stare at a several-inch piece of glass to look up this info at the moment they are interested.

The first thing they ask is, “What is your menu?” Here the voicebot IVR could read off the menu, but that would take forever if there are more than a handful of menu options. The bot could also structure the menu into categories - say appetizers, salads, main courses, etc. - and let the caller select just the category, but that leads you back into a traditional IVR where you need to deal with navigating hierarchies. In a lot of cases it is just easier to ask the user if they would like a link to a menu they can read themselves at their leisure.

So the IVR could read off the menu URL. However, reading off a URL like https://myrandomlocalrestaurant.com/summermenu on a call is never a great experience. After listening to what is often a long, un-punctuated string, the user needs to either remember the URL or write it down - which they may not be in a good position to do if their phone is held up to their ear. At some point they still need to enter the URL into a browser, hopefully without making a mistake. There are a lot of ways this interaction could go wrong, leading to a lost opportunity for the restaurant.

Wouldn’t it just be easier to send the caller something they can click on in the first place? In most cases the caller is calling from a mobile device that is SMS capable. You should be able to get their phone number from the incoming caller ID. Why not just send them the link via SMS?

Then, once you send them an SMS link, you could continue to engage the customer on that channel. This has the advantage of providing a second, asynchronous channel for on-going communication with the customer.

Now that we have established why a business would want to have a multi-channel bot that can handle both voice and SMS, next let’s look at how to implement that.

Architecture: Telephony + SMS in Dialogflow

What we really want from a Dialogflow developer perspective is something that looks like this:
[Figure: ideal voice/SMS bot gateway]

This is the gateway scenario we were looking for, couldn’t find, and had to create

The gateway would handle the incoming interactions, tell if it was from the phone or SMS, and just send that on to Dialogflow.

If that ideal gateway were available, this blog series probably wouldn’t exist. As we discussed earlier in the series, you can use Twilio or SignalWire to make a good SMS bot, but neither of those platforms fits our gateway requirements the way VoxImplant does. So we could do something more like this, where we effectively have a different gateway for each channel:
[Figure: dual-channel voice/SMS bot gateway]

We split the Gateway into 2 channels - one for SMS and one for Voice

A user could call in to the Telephony-CPaaS and, when needed, your bot controller app could signal the SMS-CPaaS to send the message and continue as an SMS bot from there.

The problem with this approach is that each channel has its own phone number. This might be ok in some scenarios, but in most cases you want to minimize confusion for the customer and stick to a single number. If you want a single number then you need to have some kind of additional proxy - or just forward calls/messages from one CPaaS to the other.
[Figure: single-number voice/SMS voicebot gateway]

To reuse the same phone number, we ended up using SignalWire (CPaaS 1) and forwarding voice calls to VoxImplant (CPaaS 2).

We had some SignalWire logic from another unrelated project for phone number selection that we wanted to reuse so we ended up using SignalWire numbers and forwarding calls to VoxImplant via SIP. This setup involves a few clicks to enable inbound SIP inside VoxImplant and a 6-line script inside SignalWire:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Dial record="record-from-answer-dual">
        <Sip>sip:{{To}}@dialogflow-7sqrp2g.cogintai.voximplant.com</Sip>
    </Dial>
</Response>

SignalWire LaML script used to forward an incoming call to VoxImplant via SIP. This also records the call.

Using Dialogflow contexts to manage responses by channel

The next challenge we faced was distinguishing between incoming telephony and SMS messages at the bot.

Different Intent responses for different channels

If the bot is very simple, you might be able to reuse the exact same intent responses across voice and text channels. We found that for many intents, differentiating the responses by channel gives a better experience. For example, if the user asks:

“How can I find you?”

The voicebot can respond with a verbal description:

We are located at 4 Thompsons Point, building number 106 in Portland, Maine, just off the Fore River Parkway, Exit 5 or 5A on Route 295. Just a five-minute walk from the Portland Transportation Center, serviced by Amtrak and Concord Bus.

While it makes more sense to just send an SMS user a map link they can use to navigate themselves:

We are at 4 Thompsons Point #106, Portland, ME 04102. Here is a link to our address for driving or public transportation: https://goo.gl/maps/2Zc7hc6QdiSfHrjV7
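To make the idea concrete, here is a minimal sketch of per-channel responses kept outside of Dialogflow. The intent name `find-us`, the `RESPONSES` table, and the `responseFor` helper are all hypothetical names made up for illustration; in the actual bot the responses live in Dialogflow intents.

```javascript
// Hypothetical response table keyed by intent name, then by channel.
// The voice text is shortened from the example above.
const RESPONSES = {
  "find-us": {
    voice:
      "We are located at 4 Thompsons Point, building number 106 in Portland, Maine, " +
      "just off the Fore River Parkway.",
    sms:
      "We are at 4 Thompsons Point #106, Portland, ME 04102. " +
      "Here is a link to our address: https://goo.gl/maps/2Zc7hc6QdiSfHrjV7",
  },
};

// Return the channel-specific text. Unknown channels fall back to the
// SMS text, mirroring the idea that the text response is the default.
function responseFor(intentName, channel) {
  const variants = RESPONSES[intentName];
  if (!variants) return null;
  return variants[channel] ?? variants.sms ?? null;
}

console.log(responseFor("find-us", "voice"));
console.log(responseFor("find-us", "sms"));
```

The point of the fallback is that only intents that really need a spoken variant have to carry one.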

Contexts as Intent filters

Contexts let you store stateful information across intents during a user's bot interaction. In Dialogflow, you can set a context at any time. Contexts can also store data, but in our case we mainly just needed the context as a filter.

As discussed in the VoxImplant 2019 review, we chose VoxImplant because they allow you to set contexts as part of their runtime scripting:

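// Context that marks this conversation as a phone call
const phoneContext = {
  name: "phone",
  lifespanCount: 99, // high lifespan so the context stays active across many turns
  parameters: {
    caller_id: call.callerid(), // the caller's number
    called_number: call.number() // the number that was dialed
  }
}

// Pass the context to Dialogflow along with the caller's queries
dialogflow.setQueryParameters({contexts: [phoneContext]})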

We use the code above to tell our Dialogflow bot that the incoming interaction is a phone call. Then we simply duplicated the relevant intents and added a phone context to the duplicates:
[Figure: intent with context]
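The filtering effect can be illustrated with a simplified model. This is not Dialogflow's actual matcher, which also scores training phrases; it only shows the rule that an intent with input contexts is eligible when all of those contexts are active. The intent names and the `eligibleIntents` and `pickIntent` helpers are hypothetical.

```javascript
// Simplified model of context filtering. The original intent has no
// input context; its duplicate requires the "phone" context.
const intents = [
  { name: "find-us", inputContexts: [] },
  { name: "find-us - phone", inputContexts: ["phone"] },
];

// An intent is eligible only when every one of its input contexts is active.
function eligibleIntents(allIntents, activeContexts) {
  const active = new Set(activeContexts.map((c) => c.toLowerCase()));
  return allIntents.filter((intent) =>
    intent.inputContexts.every((c) => active.has(c.toLowerCase()))
  );
}

// Prefer the eligible intent that requires the most contexts, so the
// phone-specific duplicate wins over the context-free original on a call.
function pickIntent(allIntents, activeContexts) {
  const eligible = eligibleIntents(allIntents, activeContexts);
  eligible.sort((a, b) => b.inputContexts.length - a.inputContexts.length);
  return eligible.length > 0 ? eligible[0].name : null;
}

console.log(pickIntent(intents, ["phone"])); // phone call
console.log(pickIntent(intents, []));        // SMS, no context set
```

With the phone context active the duplicate is chosen; with no context only the original intent is eligible, which is how the SMS channel ends up with the text response.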

Permissions

How do you know if the user is on mobile and can receive an SMS? There are a number of phone number lookup services where you can send the user’s caller ID and get a good idea of whether it is a mobile number. However, due to number porting these are not 100% reliable (at least in the US, where any number is technically SMS capable). We also decided it made sense to ask the user for permission while verifying they can receive texts. As a result, we ended up assigning 4 different context levels:

Phone - starting default for incoming telephone calls
SMS_capable - for telephony callers we think are on a mobile phone, but we haven’t confirmed
SMS_authorized - telephony callers that gave explicit confirmation they can receive text messages on their caller ID (or they gave another number)
No context - used for SMS interactions and as default
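As a rough sketch, choosing the starting level for a new conversation could look like the function below. The `lineType` value stands in for whatever a number lookup service returns; its field name and the `"mobile"` value are assumptions, as are the lowercase context names and the `startingLevel` helper itself.

```javascript
// Pick the starting context level for a new conversation.
// channel:  "voice" or "sms"
// lineType: hypothetical result of a number lookup, e.g. "mobile" or "landline"
function startingLevel(channel, lineType) {
  // SMS interactions run with no context at all.
  if (channel === "sms") return null;
  // Callers we believe are on a mobile phone start as sms_capable;
  // everyone else gets the default phone level.
  return lineType === "mobile" ? "sms_capable" : "phone";
}

console.log(startingLevel("voice", "mobile"));
console.log(startingLevel("voice", "landline"));
console.log(startingLevel("sms", "mobile"));
```

The sms_authorized level is deliberately absent here, since it is only reached after the caller confirms during the conversation.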

Follow-up intents can be used to move a user from SMS_capable to SMS_authorized. We used fulfillment to actually send the text message.
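A minimal sketch of what such a fulfillment step might compute is shown below. It assumes the Dialogflow ES webhook format (`session`, `queryResult.outputContexts`, `fulfillmentText`), that `caller_id` was stored in a context as in the phone context snippet earlier, and lowercase context names. The `planSmsConsent` and `findCallerId` helpers, the message wording, and the menu URL (borrowed from the example earlier in this post) are illustrative only. Actually sending the text is left to the caller so the sketch does not depend on any CPaaS SDK.

```javascript
const MENU_URL = "https://myrandomlocalrestaurant.com/summermenu";

// Find the caller ID stored in any active context's parameters.
function findCallerId(contexts) {
  for (const ctx of contexts) {
    if (ctx.parameters && ctx.parameters.caller_id) return ctx.parameters.caller_id;
  }
  return null;
}

// Build the text to send plus the webhook response that promotes the
// caller from sms_capable to sms_authorized.
function planSmsConsent(body) {
  const contexts = (body.queryResult && body.queryResult.outputContexts) || [];
  const callerId = findCallerId(contexts);
  if (!callerId) {
    return {
      sms: null,
      response: { fulfillmentText: "Sorry, I could not find a number to text." },
    };
  }
  return {
    sms: { to: callerId, text: "Here is our menu: " + MENU_URL },
    response: {
      fulfillmentText: "I just sent you a text with the link.",
      outputContexts: [
        // Activate the new level for the rest of the conversation.
        { name: body.session + "/contexts/sms_authorized", lifespanCount: 99 },
        // A lifespan of 0 deactivates the previous level.
        { name: body.session + "/contexts/sms_capable", lifespanCount: 0 },
      ],
    },
  };
}
```

A webhook handler would call `planSmsConsent(req.body)`, send `plan.sms` through its SMS provider if it is not null, and then return `plan.response` as the JSON reply to Dialogflow.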

This can get messy

Duplicating intents with several different contexts like this does clutter the Dialogflow GUI. If we just had two levels perhaps we could have reused the Telephony Tab to manage a different set of responses within a single intent; we had 4 and there is no way to make custom tabs today. Unfortunately there is not really an alternative to this other than not using the GUI and programming everything via API - a topic we will cover later in this series.

What’s Next

We are getting close to the end of our series. Next we will explore some of the difficulties in reusing a Dialogflow bot across multiple unique businesses. Also, Chad will be discussing many of these findings during his Kill Your IVR with a Voicebot talk at ClueCon in Chicago on August 6th. Make sure to subscribe so you don’t miss our posts and leave your comments below.

About the Authors

Chad Hart is an analyst and consultant with cwh.consulting, a product management, marketing, and strategy advisory helping to advance the communications industry. In addition, recently he co-authored a study on AI in RTC and helped to organize an event / YouTube series covering that topic.

Emiliano Pelliccioni is a computer engineer working at webRTC.ventures who specializes in developing real-time communication applications for clients around the globe. His projects include developing bot integrations for a major CPaaS provider.
