Businesses are increasingly implementing AI-based chatbots and conversational user interfaces (CUIs) to manage many aspects of their organizations, often to address customer service needs.
Chatbots in banking, for example, are evolving into "always-available" agents that can take in information, organize and analyze it, and provide recommendations to customers who contact them with questions about their banking.
Ganesh Sankaralingam, analytics leader for the data science, data engineering, and business analytics practice at LatentView Analytics, said that to successfully employ CUIs, companies should ensure they have data collection methods in place to efficiently and effectively gather the data needed to drive the technology.
"They are often rooted in AI, which is only successful based on its data quality," he explained.
From his perspective, the most successful CUIs are those designed to address specific challenges within an organization, as opposed to serving as an all-encompassing virtual agent that spans the entire organization.
"When a chatbot can be created with specific results and tasks in mind, it can ensure the data quality it is analyzing is more targeted and relevant, bringing more relevant results back to users," he said.
Taking a Case-Driven Approach
Frank Schneider, AI Evangelist at Verint, said organizations should begin by defining the desired outcomes of their customer experience (CX) automation efforts, using this to guide a use case-driven approach that specifies when and where to start.
After mapping these use cases, he said it’s crucial to move beyond traditional testing phases and quickly respond to real-time data on customer journey friction, outcome performance, and customer satisfaction.
"GenAI can be particularly useful by pinpointing moments in conversations where customers are unsatisfied and identifying the reasons behind their dissatisfaction," he explained.
Thanks to advancements in AI technology, the CX industry now measures more than just accuracy; it's essential to understand the reasons behind customer dissatisfaction.
"For example, while it might be 100 percent accurate to inform a customer about a late fee, GenAI can provide deeper insights by considering the customer’s history," Schneider explained. "It could reveal the customer has had only one late payment in five years."
An analyst could then generate a report on late payment dissatisfaction, identify this issue, and update the interactive virtual assistant (IVA) logic to allow a courtesy credit for a late fee once per year per customer, enhancing overall customer satisfaction.
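As a rough illustration, the rule described above could be encoded in the IVA's business logic along these lines. This is a hypothetical sketch: `issue_courtesy_credit` and the payment-record fields stand in for whatever the actual billing system exposes.

```python
from datetime import datetime, timedelta

def issue_courtesy_credit(customer_id: str) -> None:
    """Stub for the billing-system call that would apply the credit."""
    print(f"Courtesy credit issued to {customer_id}")

def handle_late_fee(customer_id: str, payment_history: list[dict]) -> str:
    """Waive one late fee per customer per rolling year, as in the example above.

    Each record is assumed to have a datetime 'date', a boolean 'late', and an
    optional 'courtesy_credit' flag."""
    one_year_ago = datetime.now() - timedelta(days=365)
    recent = [p for p in payment_history if p["date"] >= one_year_ago]
    late_payments = [p for p in recent if p.get("late")]
    already_credited = any(p.get("courtesy_credit") for p in recent)
    # Only a customer's first late payment in a rolling year qualifies for a waiver.
    if len(late_payments) <= 1 and not already_credited:
        issue_courtesy_credit(customer_id)
        return "We've waived this late fee as a one-time courtesy."
    return "A late fee applies to this payment."
```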
Reetu Kainulainen, vice president of product management at Zendesk, said to ensure CUIs provide accurate and helpful responses it’s vital that the pipeline include a multitude of techniques to reduce hallucinations and improve the quality of the output.
Common techniques include a moderation layer to fact-check replies, grounding the system in source knowledge such as a help center, designing the pipeline to surface the most relevant information (often using retrieval-augmented generation, or RAG), protecting against prompt injections, and managing the personality and tone of voice of the system.
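A minimal sketch of how those safeguards might fit together, assuming a generic chat-completion API behind the placeholder `call_llm`; the retrieval here is keyword-based for brevity, whereas a production RAG pipeline would typically use embeddings and a vector store.

```python
# Sketch of a grounded CUI pipeline with the safeguards described above:
# retrieval grounding, a moderation/fact-check pass, and a basic
# prompt-injection filter. `call_llm` stands in for the real model API.

INJECTION_MARKERS = ["ignore previous instructions", "disregard the system prompt"]

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. a hosted chat-completion API)."""
    return f"[model output for: {prompt[:60]}...]"

def retrieve(query: str, help_center: dict[str, str], k: int = 2) -> list[str]:
    """Naive keyword retrieval; real RAG pipelines use embeddings instead."""
    scored = sorted(
        help_center.items(),
        key=lambda kv: sum(w in kv[1].lower() for w in query.lower().split()),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def answer(query: str, help_center: dict[str, str]) -> str:
    # Prompt-injection filter: refuse obvious attempts to override instructions.
    if any(marker in query.lower() for marker in INJECTION_MARKERS):
        return "Sorry, I can't help with that request."
    context = "\n".join(retrieve(query, help_center))
    draft = call_llm(
        f"Answer using only this help-center content:\n{context}\n\nQuestion: {query}"
    )
    # Moderation layer: a second pass checks the draft against the source context.
    verdict = call_llm(
        f"Does this answer stay grounded in the context?\nContext:\n{context}\nAnswer:\n{draft}"
    )
    return draft if "yes" in verdict.lower() else "Let me connect you with an agent."
```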
"The biggest challenge for adoption is going from quick demos into world class, production grade deployments," Kainulainen said. "Integration, management, scaling and trust all come to play when you are looking to build out a CUI for your customers."
Ensuring Accurate, In-Depth Responses
Anders Lillevik, CEO and founder of Focal Point, said a common challenge in leveraging CUIs is a lack of data for the chatbot to analyze to provide accurate and in-depth responses.
"We've all encountered a chatbot that can only respond to yes or no questions or shares incorrect information with us," he said. "These issues typically stem from a lack of organized data and information used to generate responses."
He said businesses can overcome these issues by implementing efficient data organization techniques.
By organizing data before implementing CUIs, businesses can ensure that once the chatbot is integrated into the system, it can seamlessly access the relevant data to provide users and customers with the correct responses.
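One illustrative way to do that organization ahead of a CUI rollout is to normalize support content into a consistent schema and index it by topic; the fields below are assumptions rather than a prescribed format.

```python
from dataclasses import dataclass

# Illustrative schema for support content, so the bot can look up answers
# by topic instead of guessing from scattered, unstructured text.
@dataclass
class KnowledgeArticle:
    article_id: str
    topic: str          # e.g. "billing", "account-access"
    question: str
    answer: str
    last_reviewed: str  # ISO date, so stale answers can be flagged

def build_index(articles: list[KnowledgeArticle]) -> dict[str, list[KnowledgeArticle]]:
    """Group articles by topic so the chatbot queries a curated slice of the data."""
    index: dict[str, list[KnowledgeArticle]] = {}
    for article in articles:
        index.setdefault(article.topic, []).append(article)
    return index
```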
When it comes to ensuring this data is protected, companies should align with all stakeholders and data teams on data privacy methods.
"When leveraging external customer data, companies must be extremely cognizant of potential risks associated with chatbots and CUIs as well as any data protection methods necessary," Lillevik added.
Kainulainen recommended that data going to third-party providers like OpenAI always be pseudonymized, and that companies ensure the data is not used to train the underlying model.
"Protecting against prompt injections is also important to prevent users bypassing the instructions given to the LLM," he added. "As models are becoming more available you can also control the location of the servers hosting the LLMs to stay compliant with data protection laws."
Sankaralingam explained that sufficient guardrails must be implemented to ensure model training data, such as text and images, does not contain foul language or disturbing imagery, especially in consumer-facing apps.
He explained that data quality metrics for unstructured data are an emerging field, looking at parameters such as ground truth, faithfulness, and relevance with minimal toxicity, as well as context recall and context precision with minimal verbosity.
"Advanced CUI solutions use an alternate 'golden GenAI model' that continuously checks if the CUI gives standard answers to standard questions and ensures that the CUI model is not corrupted to prevent poor customer experience," he said.
Measuring Efficiency and Success
Schneider said that while CUIs play a key role in the customer experience, evaluating the underlying LLMs is crucial for businesses to ensure they meet customer expectations effectively.
To establish customer expectations for an LLM system, it’s essential to understand what the customers anticipate and define clear goals and objectives based on their needs.
"Consider the specific tasks your LLM performs, such as chatbot interactions, text generation, or question-answering," he said.
Choosing relevant evaluation metrics is crucial; measures such as answer correctness, semantic similarity, and hallucination rate help quantify LLM performance.
Other commonly used metrics include relevance, question-answering accuracy, toxicity, and retrieval-specific metrics, applied through both online and offline evaluation.
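A minimal offline evaluation loop along those lines might look like the following; the token-overlap F1 and groundedness check are crude stand-ins for the answer-correctness, semantic-similarity, and hallucination metrics mentioned above, and the dataset fields are assumptions.

```python
# Offline evaluation over a small labeled dataset. Each item is assumed to be
# {"question", "reference", "prediction", "context"}. Real pipelines often use
# embedding-based semantic similarity or LLM-as-judge scoring instead.
def token_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if not common:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(dataset: list[dict]) -> dict[str, float]:
    scores = {"answer_f1": 0.0, "grounded": 0.0}
    for row in dataset:
        scores["answer_f1"] += token_f1(row["prediction"], row["reference"])
        # Very rough hallucination proxy: is the prediction supported by the context?
        scores["grounded"] += float(
            row["prediction"].lower() in row["context"].lower()
            or token_f1(row["prediction"], row["context"]) > 0.3
        )
    return {name: total / len(dataset) for name, total in scores.items()}
```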
Schneider explained evaluation is an iterative process, which requires continued improvements to the evaluation dataset over time.
He recommended organizations implement a robust evaluation infrastructure to assess LLM performance throughout its lifespan.
"Remember that LLM evaluation is dynamic and essential for refining and optimizing models to meet customer expectations," Schneider said.