In conjunction with our Number of the Month article on consumer willingness to interact with artificial intelligence (AI)-powered solutions, No Jitter spoke with Ellen Loeshelle, Qualtrics' Director of AI Product Development. She shared her insights on a variety of AI topics, which are presented here in a question-and-answer format.
NJ: Can you talk about the impetus to move from the type of AI used in the “conversational” category – natural language processing (NLP) and natural language understanding (NLU) – to generative AI (Gen AI) and the large language models (LLMs) that power it?
Ellen Loeshelle: NLP is really the broad umbrella term for the utility of these algorithms. LLMs are just one set of tools within this toolbox that we would use for natural language processing or natural language understanding. I don't see them as totally dichotomous.
When you're doing NLP, there's a ton of different kinds of tools that you might use to understand, process, interpret, and generate language – some as simple as writing rules or regular expressions or other things like that. There are also more basic machine learning capabilities that have been in the market for a long time. And then you have the deep learning models, the large language models, etc.
What we've found with the large language models is they just have a really quite impressive performance in generating human-like language – it's not perfect. First of all, it's not always accurate, but it looks the most native of anything I've seen.
So LLMs are providing kind of this “leap forward” in accuracy, relatability, etc., where other NLP strategies really haven't had the same impact or at least haven't generated the same kind of hype. It's one of the movements since I've been in the space that feels tectonic.
But we are still in the middle of a hype cycle. I'm always trying to couch my own reactions and everybody else's reactions and say, okay, tech aside, what's the buzz and how do we keep ourselves level-headed with a lot of what's going on?
NJ: Okay, but why are LLMs performing better than the more traditional NLP approaches?
Loeshelle: Many data scientists don't really know why they’re so much better, but the biggest clue we have so far is in the word large. The size of the training sets for LLMs is just so much bigger than anything else that we've used historically. So, LLMs have more practice, more experience with actual language patterns, compared to other approaches that we've used in the past which were based on smaller language sets, or optimized for cost rather than for performance. We're still at that weird space where the LLMs are in many cases cost-prohibitive for enterprises to use, or to build their own. That's going to have to change.
NJ: Are there any learnings from “older types” of AI that can be applied to these newer use cases?
Loeshelle: Totally. I was even reading an interesting article over the weekend about the role that ontologies play – you can think about an ontology as basically like a taxonomy of terms or concepts and, often, [these taxonomies] are hierarchical. Those kinds of things can inform a generative model as an input.
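To make the ontology remark concrete, here is a minimal sketch, not Qualtrics' implementation, of how a small hierarchical taxonomy might be turned into context lines that could be passed to a generative model as input. The ontology contents and the names ONTOLOGY, iter_paths, and build_context are hypothetical, and the matching is deliberately naive substring matching.

```python
# Hypothetical toy ontology: a hierarchical taxonomy of customer-feedback concepts.
ONTOLOGY = {
    "billing": {
        "refund": {},
        "late fee": {},
    },
    "support": {
        "wait time": {},
        "agent attitude": {},
    },
}


def iter_paths(tree, trail=()):
    """Yield the root-to-node path for every concept in the taxonomy."""
    for name, children in tree.items():
        path = trail + (name,)
        yield path
        yield from iter_paths(children, path)


def build_context(text, tree=ONTOLOGY):
    """Return one line per concept mentioned in the text, showing its place in the hierarchy.

    Matching is a naive case-insensitive substring test (an assumption for this sketch).
    """
    lowered = text.lower()
    hits = [path for path in iter_paths(tree) if path[-1] in lowered]
    return "\n".join(" > ".join(path) for path in hits)


if __name__ == "__main__":
    comment = "I asked for a refund, and the wait time was awful."
    # These lines could be prepended to a prompt so the model sees the concept hierarchy.
    print(build_context(comment))
```

The resulting lines place each mentioned concept under its parent category, which is one simple way structured domain knowledge could be supplied alongside free text.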
So, I'm very passionate about the fact that there is not a one-size-fits-all technology. We should pick the type of algorithm based on the problem we're trying to solve and also our business constraints. Cost being obviously a huge one. If I can't afford it, then maybe it's not worth the extra 2% in precision. Maybe I should just use something that has worked historically for me before.
But using them together is generally, in my opinion, going to produce a better outcome at a more cost-effective rate. My philosophy here at Qualtrics is that Gen AI is a tool that we can leverage when it makes sense and gives us unique advantages. But it's not the only tool. It's not something we're doing just to do it.
Our advantage as a company will come if we can leverage our own assets differently than everybody else does. If our competitors are saying, “yep, we're grabbing that same open-source dataset and that same open-source algorithm,” and we're plugging them in in exactly the same way, then we've achieved no spacing from our competition at all.
It's better for us to kind of blend, combine, use some of the older stuff to inform the newer stuff, or be super strategic about when we're using each kind so that we don't blow our budget for very low gain.
NJ: Can you provide some clarity on what you mean by “older stuff”?
Loeshelle: Sure, if you want to go “super old stuff,” then it's rules-based. Within our product, we have capabilities where you can actually define sentiment based on words, parts of speech, linguistic structures, [and other things] that would occur in text. That's on one side of the spectrum. The other side of the spectrum is to say, can we use a large language model to predict what the sentiment would be without actually having coded every single word? But they can work together – they can feed off each other – and can learn from rules that we've created in the past.
And those rules were created by humans with business expertise, or domain expertise. We work with industries like financial services, insurance, health care where they're actually not allowed to use LLMs or generative models based on their own internal risk tolerance. So they have to lean back on some of the more rudimentary techniques to be able to use any kind of product like this. Being able to offer that has been a strategic asset for us.
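The rules-based end of that spectrum, and the idea of rules and models working together, can be sketched roughly as follows. This is an illustrative assumption rather than a description of Qualtrics' product: the word lists, the single negation rule, and the model_fallback callable standing in for a model-based predictor are all hypothetical.

```python
import re

# Hypothetical hand-written lexicon; real rule sets would come from domain experts.
POSITIVE = {"great", "helpful", "fast", "love"}
NEGATIVE = {"slow", "rude", "broken", "hate"}
NEGATORS = {"not", "never", "no"}


def rule_sentiment(text):
    """Score text with word rules; return 'positive', 'negative', or None if the rules are silent or tied."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # simple negation rule: "not helpful" flips polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def classify(text, model_fallback=None):
    """Try the cheap rules first; call the (hypothetical) model only when the rules give no verdict."""
    label = rule_sentiment(text)
    if label is not None:
        return label, "rules"
    if model_fallback is not None:
        return model_fallback(text), "model"
    return "neutral", "rules"


if __name__ == "__main__":
    print(classify("The agent was not helpful"))
    # A stand-in callable plays the role of a model-based predictor here.
    print(classify("It arrived on Tuesday", model_fallback=lambda t: "neutral"))
```

With no fallback supplied, the sketch runs on rules alone, which mirrors the situation of customers who are not permitted to use LLMs or generative models.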
NJ: One of the questions that keeps coming up is the concern over the dataset itself, what it was trained on, and the introduction of toxicity or bias or, you know, hallucinations, etc. How transparent are you about the data used to train the model? You just mentioned an open-source dataset and an open-source algorithm. Can an enterprise IT manager actually get their hands on the dataset?
Loeshelle: The datasets that we use are typically from our own data – it’s part of our ecosystem. We have language in our contracts that dictates how we can or can't use customer data. That corpus is one of our strategic advantages and so no, we're not going to share it. But it is totally within an IT manager’s rights to ask about it, and to hold us accountable – and they do – to how we're using their data or not, how we're reviewing for bias and ethical limitations in our own modeling practices.
NJ: There have been some examples in the past few months of companies that have rolled out a Gen AI-powered bot that interacts with the customer and something goes wrong, and they have to pull it. Then there’s the agent assist model where it's suggesting things to the agent which the agent can use or not use as they see fit. Any thoughts on that?
Loeshelle: We have such low tolerance for imperfection in both human situations, and bot situations, that it's so easy to lose trust in these things.
Think about the early experiences with Google or Amazon’s Alexa, right? You ask these questions, and as soon as you get a crappy answer, you're like, "Forget it, I'm not going to ask that question again. I don't trust you to have that answer." Similarly, when I talk to a real human agent – you can't give me an answer? Escalate to your supervisor. The same behavior happens.
So it automatically constrains the search space that I'm willing to use for that chatbot or that interface of whatever kind, right? And then it's really hard to regain that trust.
I think keeping a human in the loop is hugely advantageous. And the advantages are kind of twofold because there's the technology side of "Can our Gen AI stuff handle it?" and then there's the other side – "Are society and culture ready for it?"
NJ: What are some best practices for AI deployments?
Loeshelle: If we think about Alexa again, one of the things that I really liked about what Amazon does is they send out emails describing to me the new skills that my Alexa has, for example. That actually constrains what I know to ask of her so that she can be successful. Creating a UX or UI paradigm so that people can inherently be successful with your technology is going to protect you and it's going to protect them.