
  • Improving Dialogflow Phone Calls by Adding Noise

  • Generally, technology has advanced to remove noise and make audio clearer, with a higher signal-to-noise ratio. While this makes sense in most contexts, various forms of noise have become part of the user experience when making a phone call. Dialogflow is largely made for smartphone and smart speaker environments, so subtleties of the phone medium can be lost. Have you ever noticed how a phone call is never truly silent? There’s a reason for that. Fortunately, it isn’t too difficult to add noise to a Dialogflow voicebot for the phone.

    In this post I will cover how to add two kinds of noise:

    • Filler noise - using noises to mask silent periods, and
    • Ambient noise - background noise to simulate a specific environment

    Read on for details and examples.

    Filler Noise

    Let’s look at filler noise first, which is generally easier.

    Why add filler noise?

    We usually try to avoid noise, so why would we want to add it to our bot?

    Absolute silence is bad

    Users don’t like to hear silence on the phone. One byproduct of the original analog phone system is that users would always hear some electrostatic background noise. This noise eventually became a feature: users learned to interpret it as a sign that they were connected, even if the other party wasn’t talking. When Voice over IP (VoIP) systems came about, comfort noise was engineered into the call. Most VoIP systems inject artificially generated comfort noise into a call that would otherwise be perfectly silent when no one is speaking.

    Noise is a non-verbal form of communication

    Smart speakers generally have LED indicators that tell the user the device heard them and is processing. This visual indicator functions as a subtle feedback mechanism, letting the user know the device is doing something. However, if you hook that same virtual assistant up to a phone call, you lose that visual indicator and are limited to audio signals. Furthermore, phone assistants are trying to mimic human agents. Humans often need time to think or respond to prompts, yet they still acknowledge the caller instantly, even if only with a speech disfluency - i.e. “uh, ok - let me check that”.

    Oftentimes users also hear some background call center noise or the agent typing in between interactions, which also functions as a type of comfort noise to let the user know the line is still connected.

    Using SSML to fake a delay with noise

    Inserting sounds inside of a response is easy using the <audio> element in SSML. Google supports several attributes for controlling when to start and stop the clip. I find the repeatDur attribute the easiest to use since playback won’t exceed the value you enter, and the audio clip will repeat if it happens to be shorter.
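
    To make the repeatDur idea concrete, here is a minimal sketch of a helper that wraps a sound clip in an <audio> element. The function name fillerAudio and its parameters are my own for illustration and are not part of Dialogflow or any library. It assumes the time value should be written as a number immediately followed by its unit (for example "3.0s"), and it does not escape special characters in its inputs.

```javascript
// Hypothetical helper (not a Dialogflow API): wraps a sound clip in an SSML
// <audio> element that plays for `seconds`, looping the clip if it is shorter.
function fillerAudio(src, seconds, description = 'filler noise') {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new RangeError('seconds must be a positive number');
  }
  // Assumption: time values are a number immediately followed by a unit.
  const repeatDur = `${seconds.toFixed(1)}s`;
  // <desc> is fallback text in case the audio cannot be played.
  return `<audio src="${src}" repeatDur="${repeatDur}"><desc>${description}</desc></audio>`;
}

const typing = 'https://actions.google.com/sounds/v1/office/keyboard_typing_fast_close.ogg';
const ssml = `<speak>Let me check that. ${fillerAudio(typing, 3, 'keyboard typing')} Thanks for waiting.</speak>`;
console.log(ssml);
```

    Keeping the element in one small function means the same filler can be dropped into any response without repeating the attribute formatting.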

    Finding filler sounds

    You will also need to find some sound files or record your own. You could record some noises at an agent desk or you could look for some recordings online. Google actually has a filterable library of sounds you can find, listen to, and link to here: https://developers.google.com/assistant/tools/sound-library
    Amazon has a massive sound library for Alexa Skills, but their terms limit use to Alexa apps.
    There is also a free sound library on YouTube. Make sure to read their usage terms.

    Let's start with a typing noise, which isn't uncommon to hear when calling an agent:

    Note it doesn't sound very loud and shouldn't, so you may need to turn your speakers up.

    Example bot

    As a simple example, let’s make a bot that does simple multiplication. I created an intent called math.multiply with some training phrases:
    [Image: training phrases for the math.multiply intent]

    With two parameters, number1 and number2:
    [Image: parameters for the math.multiply intent]

    Fulfillment example

    Now let’s show how we can make our agent add a pause with some background noise. My complete fulfillment code looks like this:

    const functions = require('firebase-functions');
    const {WebhookClient} = require('dialogflow-fulfillment');
     
    process.env.DEBUG = 'dialogflow:debug'; // enables lib debugging statements
     
    exports.dialogflowFirebaseFulfillment = functions.https.onRequest((request, response) => {
      const agent = new WebhookClient({ request, response });
      console.log('Dialogflow Request headers: ' + JSON.stringify(request.headers));
      console.log('Dialogflow Request body: ' + JSON.stringify(request.body));
    
      function multiply(agent) {
        const number1 = agent.parameters.number1;
        const number2 = agent.parameters.number2;
        const answer = number1 * number2;
        
        const max = 5;
        const min = 2;
        // Random pause between min and max seconds, rounded to one decimal place
        const duration = (Math.random() * (max - min) + min).toFixed(1);
        
        agent.add(`
           <speak> ${number1} multiplied by ${number2}. 
             <audio repeatDur="${duration}s" src="https://actions.google.com/sounds/v1/office/keyboard_typing_fast_close.ogg">
               <desc>keyboard typing</desc>
             </audio>
             That comes to ${answer}.
           </speak>`);
      }
      
      // Run the proper function handler based on the matched Dialogflow intent name
      let intentMap = new Map();
      intentMap.set('math.multiply', multiply);
      agent.handleRequest(intentMap);
    });
    

    This is mostly Dialogflow’s standard fulfillment example code except for the multiply function. You can see here I just set a random time between a min and max to pause for 2 to 5 seconds.

    Hear some filler noise

    You can try this below without dialing in.
    Just ask it something like:

    what is 431 times 31234?

    Make sure to hit the audio button to hear the response.

    Ambient background noise

    Even if you aren’t running an ASMR phone service, ambient background noise can give your phonebot a distinctive, life-like experience.

    Why ambient noise?

    If you called a busy coffee shop and a live person picked up, you would hear some background noise (assuming they don’t have a fancy noise-cancelling microphone). So long as the background noise isn’t too loud or distracting, this ambience can make the experience seem more authentic.

    Finding Background Noises

    Google includes a number of these ambiences in the Google Assistant sound library: https://developers.google.com/assistant/tools/sound-library/ambiences

    Of course, you can always record one yourself in a real environment.

    Adding Continuous Background Noise

    As we showed earlier, adding noise within a prompt is simple, but how do you add in ambient background noises?

    Dialogflow does not really give a way to do this out of the box. It plays a prompt and waits for a response. Dialogflow won’t generate noise while it is waiting. The only option is to do this from your telephony platform or RTC-Bot Gateway. Usually this will look something like this:
    [Image: mixing background noise into the call]

    Using conferencing to mix in ambient audio

    You need some kind of mixer that will combine the Dialogflow audio stream with the ambient noise. Often this can be implemented via a conference bridge. Dialogflow’s speech-to-text is pretty good at ignoring background noise, but it is always best to send it a clean signal. If you can, send the media directly from the caller to Dialogflow without the ambient noise mixed in.
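
    To show what the mixer is doing conceptually, here is a minimal sketch that adds a looping ambient track underneath a voice track. It is not how any particular telephony platform implements its conference bridge. The function name mixPcm16 and the ambientGain parameter are hypothetical, and the sketch assumes both inputs are 16-bit signed PCM, mono, at the same sample rate.

```javascript
// Hypothetical mixer (illustration only): layers a looping ambient track under
// a voice track. Assumes 16-bit signed mono PCM at the same sample rate.
function mixPcm16(voice, ambient, ambientGain = 0.25) {
  if (ambient.length === 0) return Int16Array.from(voice);
  const out = new Int16Array(voice.length);
  for (let i = 0; i < voice.length; i++) {
    // Loop the ambient clip when it is shorter than the voice audio.
    const bed = ambient[i % ambient.length] * ambientGain;
    const sum = Math.round(voice[i] + bed);
    // Clamp to the 16-bit range to avoid wrap-around distortion.
    out[i] = Math.max(-32768, Math.min(32767, sum));
  }
  return out;
}

const voiceSamples = Int16Array.from([1000, -2000, 32767, 0]);
const ambientSamples = Int16Array.from([500, -500]);
console.log(Array.from(mixPcm16(voiceSamples, ambientSamples)));
// prints [ 1125, -2125, 32767, -125 ]
```

    The low gain keeps the ambience well below the bot’s voice, and only the caller-facing leg gets the mixed audio, so the stream sent to Dialogflow stays clean.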

    Dialogflow’s Phone Gateway will not do this by itself, but some of the third-party options let you do mixing/conferencing in their platform. If you are using the call forwarding approach, you can always mix the audio in a conference bridge before forwarding the call to the Dialogflow Phone Gateway.

    I added the coffee shop noise to the Math-bot above as an example.
    Here is the background noise:

    And here is a recording of a call where I ask it some questions:

    I happened to use Voximplant to build this example. You can see the code for it here.

    Make Some Noise

    Noise isn’t always bad. Used effectively, it will make your calls seem less like a bot and more like a human on the other end. Used creatively, it can help enhance a business’s brand and make the phone experience unique.

    Let me know how you make use of various noises in your phonebots in the comments below.


    About the Author

    Chad Hart is an analyst and consultant with cwh.consulting, a product management, marketing, and strategy advisory helping to advance the communications industry. In addition, recently he co-authored a study on AI in RTC and organizes events / YouTube series covering that topic.


