Welcome to No Jitter’s Conversations in Collaboration, in which we speak with executives and thought leaders about the key trends across the full stack of enterprise communications technologies.
In this conversation, we spoke with Raj Gajwani, Managing Partner at AI strategy consultancy Day 0. Day 0 helps public and private enterprises with AI strategy, technology evaluation, prototyping, and capability development. Gajwani previously built multiple businesses and functions at Google, including DoubleClick's channel program. Later, Gajwani founded and ran Google's Orion project, which is now part of Android and is used by multiple cellular carriers to improve cellular coverage.
Gajwani presented at Enterprise Connect AI in September 2024. In this follow-on conversation, Gajwani expands upon some of the points he made during that presentation while also touching on other topics, including agentic AI, the possible rise of small language models and the potential impact of AI systems on customer experience.
No Jitter (NJ): Let’s start with agentic AI – could you define the term since there’s some vagueness around what it means? What do you think of its potential impact on the contact center, CX and maybe enterprises more broadly?
Raj Gajwani (Gajwani): I can give you my opinion about whether [AI] agents will fully replace customer service, augment it or expand it, but in truth, I'm speculating, and everybody is speculating. One of the big problems is that the term ‘AI agents’ is being used in at least two different ways by the same people as they speak about different things in computer science.
When programmers talk about agentic AI, what they mean is software that has the authority to go and do something on its own. Instead of my asking it a question and it responding, I tell it to go do something, and it starts deciding what to do, designing its own plan, and so on. Separately, there are agents as people who speak to the customer. And then third, there's just the fact that ‘agentic’ is a buzzword, and people use it just to mean really good AI. I don't think the computer science version of agentic is the same thing as the customer service version of agentic. They're different. In the near term, this idea of agents that can talk to the customer is an enhancement of self-service.
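To make the programmer's sense of the term concrete, here is a minimal sketch of an agent loop: software that is handed a goal, decides its own next steps and acts on them. The call_llm and run_tool functions are hypothetical placeholders, not any particular vendor's API.

```python
# Minimal sketch of "agentic" in the programmer's sense: given a goal, the
# software plans its own steps and acts without step-by-step instruction.
# call_llm() and run_tool() are hypothetical placeholders, not a real API.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whatever language model you use."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Placeholder for executing an action (API call, lookup, etc.)."""
    raise NotImplementedError

def agent(goal: str, max_steps: int = 5) -> str:
    history = []
    for _ in range(max_steps):
        # The model decides the next action itself -- nobody scripts the steps.
        action = call_llm(f"Goal: {goal}\nSo far: {history}\nNext action, or DONE:")
        if action.strip() == "DONE":
            break
        history.append((action, run_tool(action)))
    return call_llm(f"Goal: {goal}\nObservations: {history}\nFinal answer:")
```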
That said, I think there's a multiyear learning process ahead of almost every company. Gen AI doesn't work like we're used to computers working. It requires new design patterns and new workflows that take advantage of its strengths and mitigate its weaknesses, because there are meaningful weaknesses to these technologies.
One thing we're learning is that you probably will never just give one question to one LLM, take the answer [it generates] and move on. You will probably take the question and either break it up into multiple questions and ask them sequentially, or ask the large language model to do that. That's the chain-of-thought reasoning that OpenAI is doing. And because the models are probabilistic, [we’ll probably have] a quality control LLM that looks at the result that the first LLM gave back. So increasingly, instead of one request to one LLM, we’ll end up with a preprocessed request broken into four chunks, for example, and we either ask the four chunks separately of the same model, or we ask four different little models to come back and agree on an answer and then bring it back.
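As a rough illustration of the pattern Gajwani describes – decompose the question, answer the pieces, then run a quality-control pass over the result – here is a minimal sketch. The call_llm function is a hypothetical placeholder for whatever model API you use.

```python
# Sketch of the decompose-then-verify pattern: split a question into
# sub-questions, answer each, then have a second "quality control" pass review
# the combined result. call_llm() is a placeholder for any model API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # swap in your provider's client here

def answer_with_qc(question: str) -> str:
    # 1. Ask the model to break the question into pieces (planning step).
    plan = call_llm(f"Break this into numbered sub-questions:\n{question}")
    sub_questions = [line for line in plan.splitlines() if line.strip()]

    # 2. Answer each sub-question separately (could be a smaller, cheaper model).
    sub_answers = [call_llm(f"Answer briefly: {q}") for q in sub_questions]

    # 3. Draft a combined answer, then have a reviewer pass check it,
    #    because any single probabilistic output may be wrong.
    draft = call_llm(f"Question: {question}\nNotes: {sub_answers}\nWrite the answer.")
    verdict = call_llm(f"Is this answer correct for: {question}\n{draft}\nReply OK or REVISE.")
    return draft if "OK" in verdict else call_llm(f"Revise this answer:\n{draft}")
```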
[We also] don't know where the technology is going. There are good arguments that we're going to see a plateau, and the technology, as you see it today, is basically what we're going to use for the next half decade, and it'll just get a little bit more reliable and we'll learn how to use it.
There's an alternate, credible argument, which is that the technology we see today is like a door cracked open a little bit, and when the door is fully open, it'll be much different, much better and much more capable than it is today. I can make my guesses as to which it’ll be, but we don't know.
NJ: Could you flesh out those two arguments a little bit, the plateau versus ‘just the beginning’?
Gajwani: The plateau argument is that AI tools are kind of like an instant Wikipedia on demand for any topic you want. I can instantly get thoughtful, right answers presented in any way that I want, but I still need someone driving the Wikipedia [query]. And the tools are a little bit finicky, a little bit unreliable. They require some knowledge and context. Today, I hear a lot of call center and CX people using these tools to enhance their [human] agents – and there's a reasonable scenario that these are tools for human agents – to make them better and faster and more efficient.
The other argument is that this is a stepping stone on the way to automation with ‘AI agents’ being able to fully do everything a CX person does. And you may get to the point that the human who calls in prefers, and gets, a better experience from speaking directly by voice to an AI without a human in the loop. But that's definitely not true today for most purposes.
One way you can think about it is this: the next generation of IVR versus C-3PO. Today, it’s a little bit like the IVR plus Wikipedia on demand for your agents. The simple stuff is handled by the IVR, so it’s a little less annoying [for customers], while your agents get [first] call resolution and shorter handling times and higher satisfaction. The C-3PO argument is that we're headed to a world that's very unfamiliar, where the ‘voice on the phone’ answers all the questions for me without a person there. I don't think we know fully which of those two is the end state.
When I talk to CX people, some of them make an interesting point – that [customers] don’t know what they want. If they had a well formatted, easy question, many customers would have just used the FAQ or gone to self-service. We already have a large range of self-service options available. Improving the ‘UI’ of the self-service so I don't have to read through the full manual or [knowledge base articles] means I get the answer [more] quickly. In a sense, that is taking the job of a human agent. But it's also making self-service better because [those answers] were already available.
A substantial fraction of the things that people call in for are things that they [could not handle] with the information on the website or the self-service options. They're essentially exception handling outside of the standard processes or outside of the standard information that the company offers to the customer. So it makes a ton of sense to me for an AI agent to be a much better version of self-service. That kind of AI agent will be better in the same way that ChatGPT is a better way for me to find out what people on the Internet are saying versus reading 20 pages.
On the other hand, if I ask a question that has some subtlety or I don't know what my question is, and I call in saying I have a problem, the human agent has to decide: Am I asking about a delivery problem or a quality problem, or is it because I forgot to click submit on the order? That's almost like human psychology, and that’s not really a self-service problem. It's more like helping the customer diagnose what their question actually is. That is why I think tier two, tier three, tier four customer service will persist because it's fundamentally not the same thing as self-service.
NJ: In your presentation, you made the point that when automation was introduced in various areas of the economy – food production, ATMs in banking – it was generally predicted that jobs would decrease or be lost, while what actually happened is that employment increased in unexpected ways.
Gajwani: Since that presentation in September [2024], I've gotten a few interesting elaborations of that idea. First of all, there's a long history of labor-saving technologies resulting in more labor being consumed. A personal example of this for me is that my father-in-law is a radiologist. A decade ago, it was clear that a machine could read the scan and identify a problem. We thought he’d be losing his job. Today, there's a shortage of radiologists. So while my father-in-law needs to review fewer of the radiology scans that come in and put his judgment and authority on them, at the same time, doctors are ordering many more scans because it’s a lot cheaper to get scans.
The computer scientist Geoffrey Hinton, who's one of the AI authorities, made a really interesting point: what happens to labor when you introduce these kinds of automation depends on how much demand elasticity there is. If the total amount of work to be done is fixed and the cost doesn't change the amount of work to be done, there's going to be less labor.
If there are 100 lawns in my neighborhood and there are 100 gardeners, but now there are better machines [such that] I only need 10 gardeners, 90 gardeners lose their jobs. On the other hand, in medicine, there's probably a lot of demand. If it was much easier and cheaper to see a doctor, I would see a doctor more often. So the demand elasticity means that when it becomes cheaper and easier to consume that service, people will consume a lot more of it.
In CX, I think the equivalent is when call centers become more efficient – for example, with lower call handling times and better customer experience – and the cost of it is lower. On the one hand, [when this happens], I think people will hate call centers less, so they'll be more likely to call. On the other hand, I think [that because of this efficiency increase] companies will be more likely to offer human customer service in places where they didn't before because it was too expensive.
I think there are many sectors where the amount of human customer service provided will increase because the quality of it will go up and the cost will go down. Whether labor goes up or down really depends on how much utilization increases in response to costs dropping.
The really big picture is that in the history of our economy, there's always more and more work to be done, and almost every sector has found that as costs go down, you can offer more and better services, and demand [for those services] goes up. There are not a lot of places in our economy where demand is actually fixed. There are a lot of places in our economy where, if the thing becomes a lot cheaper and easier, people use more of it in unexpected ways.
NJ: It seems like the use of small language models versus large models is increasing. Any thoughts on that possible trend?
Gajwani: A year or so ago, almost everybody was using OpenAI’s APIs because [the APIs] were easy and [OpenAI was] the out-front leader. Over time, folks have shifted to getting OpenAI’s models through a cloud provider such as Azure, experimenting with other [cloud-based] models or with local models, and beginning to apply cost functions. And they’re asking questions like: does a cheaper, faster, smaller model make sense versus the cost of a larger, more capable but slower model?
Those are questions I haven’t heard consensus around; I still hear different things from different people. But we're moving to a place where there is room for small models to pick up share. As today's technology stands, there's a convergence between the quality of a small model and the quality of a large model.
The benefits of a small model are three-fold, I think. There's cost, there's speed, and there's more control or privacy [with respect to your data]. These three things are beginning to drive people to small models. I think you'll continue to see a lot of that as we move into the implementation phase. It goes from ‘is this even possible’ to ‘how do we do it 20% better than the first draft.’ That's driving more experimentation with other models and smaller models.
One thing that's become clear about the argument that you need to use a cloud-hosted version of a model: right now, it looks like that's not a rule of physics. It's a business decision by those providers. If I want to use OpenAI's latest models, I have to host them with Microsoft or OpenAI.
There are small model vendors who will offer [their models] to you on-prem now. Clouds will let you use a hosted container with a model which I can choose to run on-prem. There are also vendors who are offering essentially the same thing as a large model, but they'll customize it for you. I've had some conversations with two companies that do this – Cohere and Liquid AI. They will basically build a frontier model for you. So now there are options for hosted or private versions of the large models, which can also be called frontier models. Before, there weren't.
The second thing is the small models are increasingly capable of doing what large models do because of new techniques to build a small model [that involve] taking a big model and cutting out the ‘stuff’ you don't need. There are a few different techniques for doing that, but essentially, the small models turn out to be a big model with the extra weight dropped. That's creating a convergence in performance as well.
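For readers who want a concrete picture, here is a toy illustration of one such shrinking technique – magnitude pruning, i.e. dropping the smallest weights from a layer. Real model compression (pruning, distillation, quantization) is far more involved; this is only a sketch of the ‘extra weight dropped’ idea.

```python
import numpy as np

# Toy illustration of magnitude pruning, in the spirit of "a big model with
# the extra weight dropped." Not a production compression pipeline.

def prune_smallest_weights(weights: np.ndarray, keep_fraction: float = 0.3) -> np.ndarray:
    """Zero out all but the largest-magnitude weights in a layer."""
    threshold = np.quantile(np.abs(weights), 1.0 - keep_fraction)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

layer = np.random.randn(4, 4)           # stand-in for one layer of a big model
pruned = prune_smallest_weights(layer)  # ~70% of weights dropped, largest kept
print(f"nonzero weights: {np.count_nonzero(pruned)} of {layer.size}")
```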
NJ: That sounds like it ties into the point you made during your presentation that the LLM platform providers may be a better bet than an enterprise doing it on their own?
Gajwani: There are two things about the platform provider being a better bet. There's a stock market perspective, which is that as these model builders flood the market and commoditize themselves, the people who make the money are [companies like] Nvidia, Google Cloud and Amazon Web Services, because you need to host it somewhere, and you need chips for it. But in the meantime, all the AI model providers are eating each other's lunch in the search for market share.
The second way of looking at it is from the perspective of an enterprise user of these technologies. It makes sense to keep your options open by using a platform where I can shift between models easily – a model garden, as they sometimes call it. So using a platform that offers multiple models, like Azure or AWS or Google Cloud, is better than hosting everything directly with OpenAI, where I can't switch from OpenAI to something else.
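One way to picture that ‘keep your options open’ approach is a thin routing layer that application code calls, with the available models listed in a single table. The model names and client functions below are hypothetical placeholders, not a real platform API.

```python
# Sketch of a "model garden" abstraction: the application depends on complete(),
# and swapping vendors or models means editing one table. The two client
# functions are placeholders for whatever models your platform hosts.

def _small_fast_model(prompt: str) -> str:
    raise NotImplementedError("call your platform's small, cheap model here")

def _frontier_model(prompt: str) -> str:
    raise NotImplementedError("call your platform's large frontier model here")

MODEL_GARDEN = {
    "small-fast": _small_fast_model,  # routine, high-volume queries
    "frontier": _frontier_model,      # harder questions worth the extra cost
}

def complete(prompt: str, model: str = "small-fast") -> str:
    """Application code calls this; switching models is a one-line change."""
    return MODEL_GARDEN[model](prompt)
```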
NJ: So does this approach of using a model garden require a completely different internal approach or architecture that the IT department, or whoever is tasked with it inside the enterprise, has to come to grips with?
Gajwani: It’s different, but not completely different. Our early fears and hopes were around that C-3PO concept, like we invented a brain and everything changed. Now it looks more like we invented new kinds of software and new kinds of processing, and a lot of things change only a little bit.
So what I described earlier, about having multiple different LLMs chained together, is [basically] how a lot of software works, in that software is a collection of many different modules [operating] in the background. When I open Photoshop, there are hundreds of different little pieces, and I'm activating different pieces at different times, and the UI unifies all that. Similarly, LLMs are a new kind of software object, and I'll use many different versions of them.
The fact that it's not a deterministic software object means I need to treat it differently and do different quality control around it. The fact that I have to feed it the right context and the right information is different from connecting an API to a database, but it's not wildly different. It's a new set of skills for programmers and product people to get used to, but they end up building software in ways that are different but recognizable. And those same frameworks that are being built could be applied to an AI agent that's acting on your behalf, depending on how that term is defined, but these are the kinds of guardrails and the quality controls that factor into that discussion.
Agentic AI is essentially software that I trust to do things without explicit instruction from me. So instead of me saying, ‘here's 10 things, follow these instructions to the letter,’ I might say, ‘Go off and do these things and come back and give me the answer,’ and I trust it to figure it out.
Making sure that it interprets my request properly becomes really important. One of the big theoretical problems with agents is that I give it a request that’s really 10 requests, and it goes off and does all 10 things and then comes back to me. The problem is that if the model only has an 80% chance of getting each step right, for example, then the errors compound. On the first request it has an 80% chance, but the chance of getting the first two right is 80% times 80%, and so on until it reaches the last step – by which point it has likely messed up a step and everything's wrong.
But if I can ask each question and then correct the LLM, and then it goes off to the next step, it’s less of a problem than if the LLM did all 10 steps by itself. So quality control becomes a huge issue because of that ‘compounding errors’ issue. Again, this is an example of us learning how to deal with probabilistic software so that to get to something like ‘agentic with less human oversight,’ we must build a lot more software oversight.
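The arithmetic behind that compounding-errors point, using the 80%-per-step figure from the example (the 95% catch rate assumed for the per-step review is purely illustrative):

```python
# Back-of-the-envelope version of the compounding-error argument above.

per_step_success = 0.80
steps = 10

# Fully autonomous: every step must be right for the end result to be right.
unattended = per_step_success ** steps
print(f"All 10 steps right with no checks: {unattended:.1%}")  # ~10.7%

# With a check-and-correct after each step (assume the review catches 95% of
# mistakes -- an illustrative figure), each step's effective reliability rises.
checked_step = per_step_success + (1 - per_step_success) * 0.95
supervised = checked_step ** steps
print(f"All 10 steps right with per-step review: {supervised:.1%}")  # ~90%
```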
Another way to think of it is that these are elbow grease problems and not Nobel Prize problems. There's a way to fix these problems through a lot of work and time, trying a bunch of approaches.