This article provides an overview of OpenAI and its significance in the rapidly changing world of generative AI.
Overview of OpenAI and its significance
OpenAI began as the OpenAI Nonprofit (a 501(c)(3)) in late 2015, with the mission of building safe artificial general intelligence (AGI) for the benefit of humanity. The organization launched with a goal of $1 billion in donation commitments, but in its first four years it collected only 13% of that amount.
In March 2019, OpenAI Nonprofit formed a for-profit subsidiary: OpenAI LP, the legal entity to which general use of the name OpenAI refers. OpenAI LP can issue equity to raise capital and hire talent, but is still under the nonprofit’s control. The for-profit’s equity structure has caps that limit the maximum financial returns to investors (including Microsoft) and employees to incentivize them to research, develop, and “deploy AGI in a way that balances commerciality with safety and sustainability, rather than pure profit-maximization.” For more on OpenAI’s structure, see here.
OpenAI is significant because it released the ChatGPT chatbot in November 2022, a month widely regarded as the inflection point in the world’s understanding of generative AI’s capabilities. Since then, many companies have embarked on generative AI deployments, and many vendors in the enterprise communications, collaboration and contact center space have incorporated large language models (LLMs) and generative AI-powered capabilities into their products.
Microsoft Investment
In January 2023, Microsoft announced the “third phase of its long-term partnership with OpenAI through a multiyear, multibillion dollar investment.” As referenced in that blog post, Microsoft previously invested in OpenAI in 2019 and 2021. The 2023 announcement focused on how Microsoft would increase its investments to build out specialized supercomputing systems for OpenAI and become OpenAI’s exclusive cloud provider via the Azure platform. This TechCrunch article provides a history of Microsoft's investments into OpenAI.
OpenAI’s Products
OpenAI offers several products that are relevant to enterprise communications. We will survey three of them: foundation models, ChatGPT and ChatGPT Enterprise.
Foundation Models
OpenAI’s flagship models include GPT-3, GPT-3.5, GPT-3.5 Turbo, GPT-4 and GPT-4o. These are the LLMs (also called foundation models) on which some of OpenAI’s other products are based.
ChatGPT
OpenAI’s flagship AI chatbot. It generates humanlike text in response to prompts: it can answer questions, perform searches, produce summaries, and create content such as essays, poems and lyrics.
ChatGPT Enterprise
This enterprise version gives organizations direct control over how ChatGPT is used within their businesses, including security, data privacy and administrative controls. OpenAI details use cases for engineering, marketing, sales, finance and accounting teams, among others.
The following graphic provides an overview of OpenAI’s offerings, capabilities and use cases.
Recent Acquisitions
In June 2024, OpenAI acquired Rockset to power its retrieval infrastructure across OpenAI products. OpenAI also acquired Multi, a video collaboration platform for enterprises.
Understanding OpenAI Features
The OpenAI platform provides organizations with application programming interfaces (APIs) they can then use to develop new generative AI-powered solutions that leverage all the advanced AI work and infrastructure OpenAI has already developed. This includes the foundation models referenced above that allow applications to generate text, etc. These capabilities can be built into existing products – which is what many in the enterprise communications and contact center space have done – or new products can be developed.
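As a minimal sketch of what calling one of these APIs involves, the snippet below builds the JSON request body for OpenAI’s Chat Completions endpoint. The model name, prompt and temperature value are illustrative; a real application would POST this payload to the API over HTTPS with an API key.

```python
import json

# Minimal request payload for OpenAI's Chat Completions API
# (POST https://api.openai.com/v1/chat/completions).
# The model name and messages here are illustrative only.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this meeting transcript: ..."},
    ],
    "temperature": 0.7,  # controls randomness of the generated output
}

# The serialized JSON string is what actually gets sent over the wire.
body = json.dumps(payload)
```

The same payload structure is what enterprise communications vendors assemble under the hood when they embed OpenAI’s models into their own products.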
As an example of how an API can be used, see this page on Assistants API (in beta). As OpenAI states, an AI assistant can be built into an application and that assistant can be given instructions and thus leverage models, tools, and files to respond to user queries. The Assistants API currently supports three types of tools:
- Code Interpreter: Allows Assistants to write and run Python code in a sandboxed execution environment.
- File Search: Augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by users.
- Function Calling: Allows you to describe functions to the Assistants API and have it intelligently return the required functions along with their arguments.
Examples of Assistants include a personal finance bot and a math tutor.
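To make the Function Calling tool concrete, here is a sketch of how a function is described to the API as a JSON Schema. The get_order_status function is hypothetical: the model never executes it, but returns its name and arguments so the application can run the real function and feed the result back.

```python
# Hypothetical tool definition for Function Calling. The model returns the
# function name plus arguments; the application executes the actual function.
order_status_tool = {
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical application function
        "description": "Look up the shipping status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The customer's order number.",
                },
            },
            "required": ["order_id"],
        },
    },
}
```

A contact center Assistant equipped with a tool like this could answer “Where is my order?” by asking the application to call the function with the customer’s order number.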
Advancements in OpenAI Technology
In May 2024, OpenAI unveiled its new flagship AI model called GPT-4o (the “o” is for omni). According to OpenAI, GPT-4o accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in an average of 320 milliseconds, which is similar to human response time in a conversation.
Prior to GPT-4o, users could use Voice Mode to talk to ChatGPT, with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). Voice Mode achieves this with a pipeline of three separate models: one model transcribes audio to text, GPT-3.5 or GPT-4 takes in that text and outputs text, and a third, simpler model converts the output text back to audio.
This process means that the main source of intelligence, GPT-4 (the older model), loses a lot of information because it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion. With GPT-4o, OpenAI trained a single new model across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. GPT-4o is OpenAI’s first model that combines all these modalities.
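The pre-GPT-4o pipeline described above can be sketched as three chained stages. This is a conceptual illustration only: the function bodies are hypothetical placeholders standing in for real models, not OpenAI’s implementation.

```python
# Conceptual sketch of the pre-GPT-4o Voice Mode pipeline: three separate
# models chained together. The placeholder bodies stand in for real models.

def transcribe_audio(audio: bytes) -> str:
    """Stage 1: speech-to-text. Tone, emotion and speaker identity are lost here."""
    return "user question as plain text"  # placeholder transcript

def generate_reply(text: str) -> str:
    """Stage 2: the text-only LLM (GPT-3.5 or GPT-4) sees only the transcript."""
    return f"reply to: {text}"  # placeholder reply

def synthesize_speech(text: str) -> bytes:
    """Stage 3: text-to-speech converts the reply back to audio."""
    return text.encode("utf-8")  # placeholder audio bytes

def voice_mode(audio_in: bytes) -> bytes:
    # Because stage 2 only ever sees text, the pipeline cannot react to tone
    # or background sounds. GPT-4o replaces all three stages with a single
    # model that processes audio directly.
    return synthesize_speech(generate_reply(transcribe_audio(audio_in)))
```

The design point is visible in the chain itself: any information the first stage fails to encode as text is invisible to everything downstream.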
More simply, users can speak commands to GPT-4o and receive audio or text output. GPT-4o can also be granted access to a device’s camera (e.g., a smartphone’s) and can thus use what it “sees” as input. The linked OpenAI site provides many examples of how GPT-4o’s features can be used.
OpenAI has said that these new voice and vision capabilities will be made available to all ChatGPT users, but premium subscribers will be given priority access.
How OpenAI Works
OpenAI’s large language models, including the models (GPT-4o, etc.) that power ChatGPT, were developed using three primary sources: (1) information that is publicly available on the internet, (2) information licensed from third parties, and (3) information that users or human trainers provide.
OpenAI has licensed content from multiple third parties, including News Corp, The Associated Press, the Financial Times, People publisher Dotdash Meredith, Politico owner Axel Springer, Vox Media, The Atlantic, Reddit, Le Monde and Prisa Media, to name a few. Other third-party content producers, such as The New York Times, have filed lawsuits against OpenAI claiming copyright infringement.
GPT stands for generative pre-trained transformer, a type of machine learning (ML) model built on artificial neural networks (ANNs). The history of this approach goes back to 2018, when Google researchers introduced a model called Bidirectional Encoder Representations from Transformers (BERT), among the first models that could ‘understand’ words and the context of words. Other examples of transformer-based models include Meta’s Llama 2, Google’s Gemini, Nvidia’s NeMo and others.
How transformer models work is beyond the scope of this article. Consider these resources: Datacamp’s tutorial, Towards Data Science’s explainer and this more visual guide to transformer architecture.
How OpenAI Works from the User’s Perspective
The following provides an overview of how OpenAI’s models work from the user’s perspective.
ChatGPT Browse with Bing
This is a feature (only available to paid users) that allows ChatGPT to search the internet to help answer questions that benefit from recent information. When this feature is used, ChatGPT formulates a keyword search based on the prompt and submits that search to a headless version of the Bing search engine to retrieve relevant results. (The term “headless” refers to an automated way of interacting with Bing search without a graphical user interface (GUI).)
Image processing
When an image is uploaded as part of a prompt, ChatGPT uses the GPT Vision model to interpret the image. This enables ChatGPT to answer questions about the image or use information in the image as context for other prompts. Only the GPT-4 models can accept an image as input, and only users on ChatGPT Plus and ChatGPT Enterprise plans can use this feature.
Image generation
When a user requests an image from ChatGPT, it uses a tool called DALL-E to create it. Users can describe the desired image in simple, everyday language. ChatGPT can also change specific parts of an image generated with DALL-E via the DALL-E editor.
Text documents
When users upload text documents (Word, PowerPoint, PDF, TXT, etc.), ChatGPT uses tools to extract text and find relevant information. This helps ChatGPT understand the contents of the document, and supports tasks like:
- Summarization: Creating a précis of the document(s).
- Synthesis: Combining or analyzing information from files and documents to create something new.
- Transformation: Reshaping information from documents without changing its essence.
- Extraction: Pulling specific information out of a document.
Advanced Data Analysis
Customers on a paid plan can use Advanced Data Analysis (ADA) to interact with data documents (Excel, CSV, JSON). Using ADA, ChatGPT can answer quantitative questions about the data, fix common data errors, and produce data visualizations.
Voice
Users can interact with ChatGPT by speaking using the ChatGPT mobile app. The web version of ChatGPT will also read any of its answers out loud. GPT-4o real-time voice and vision will first roll out to a limited Alpha for ChatGPT Plus users and then become widely available for ChatGPT Plus users later in 2024.
Extended capabilities with GPTs
GPTs are task-specific applications that run on top of ChatGPT. They can be configured to use the tools described above and can be augmented with access to additional data and services to extend their capabilities. Users can create their own GPTs and publish them in the GPT Store.
OpenAI’s Data Usage Policies
OpenAI’s data usage policies can be found here. In short:
- OpenAI may use an individual’s content to train its models when that individual uses a service such as ChatGPT or DALL-E. Individuals can opt out of training through OpenAI’s privacy portal by clicking on “do not train on my content”; to turn off training for ChatGPT conversations, they can follow the instructions in OpenAI’s Data Controls FAQ. Once an individual opts out, new conversations will not be used to train OpenAI’s models. Note that OpenAI describes model training as a way to “improve the model for everyone.”
- OpenAI does not use content from its business offerings such as ChatGPT Team, ChatGPT Enterprise, and its API Platform to train its models. The OpenAI Enterprise Privacy page has information on how OpenAI says it handles business data.
OpenAI Use Cases
Using the OpenAI API, companies across almost every industry sector can use generative AI, and OpenAI in particular, to solve various challenges. This thread in the OpenAI Developer Forum, for example, lists several potential use cases in the industrial, manufacturing, mining, chemical, or similar verticals. Those use cases include spare parts inventory management, understanding and creating part drawings or manuals, image-to-text for maintenance and root cause analysis, machine diagnostics and customer/dealer support.
Obviously, several of those use cases are relevant across industries. For example, every company in every industry has customers to service and support. Generative AI can help automate some of the rote tasks associated with providing service and support.
Most of the contact center as a service (CCaaS) vendors have implemented generative AI-powered capabilities in their platforms to summarize issues for the human agent who takes a call and then summarize the entire interaction after the call. These summaries not only improve the customer experience (CX) during the call; the data about the interaction can also be mined afterward to provide more personalized future experiences for that customer, as well as for customers like them (segments that can be determined by AI tools, like those offered by Twilio, to define, segment and analyze customer behaviors).
Contact center and CX vendors are now introducing generative AI-powered virtual agents that can directly interact with customers and, in many cases, resolve the issue the customer was having. As discussed here, OpenAI itself demoed this capability using GPT-4o.
Generative AI-based capabilities have also been integrated into unified communications as a service (UCaaS) platforms – RingEX, Webex, Zoom, Microsoft Teams, Slack, etc. – to automate some of the rote tasks associated with knowledge work. This includes summarization – of meetings, email, chat threads/channels, etc. – as well as document creation, data analysis, etc.
Other examples of use cases include these, as listed by Microsoft Azure OpenAI Service:
- Content creation and design: From text descriptions or other images, generative AI can generate images, videos, and graphics, which can be used for logos, product visuals and social media content. Microsoft cites the company Typeface, which ingests information about the brand, including style guidelines, images, and product details, which it then uses to generate images and text that a human marketer can choose, customize and use. The idea here is that this AI-enabled process saves time.
- Automation of IT tasks: Again, this use case is about saving time. Automating routine IT tasks not only gets employees back to work quickly but also improves the efficiency and effectiveness of the company’s IT staff. This can improve employee experiences, enhance customer interactions and reduce operational costs. Microsoft cites AT&T using Azure OpenAI Service to enable IT professionals to request resources like additional virtual machines; migrate legacy code into modern code; and empower employees to complete common human resources tasks, such as changing withholdings, adding a dependent to an insurance plan, or requisitioning a computer for a new hire.
- Personalization: Personalization of customer interactions can occur across the entire customer journey, including product research and design, marketing, sales and service/support. Microsoft cites the Take Blip platform and Azure OpenAI Service, which help brands personalize each customer interaction.
- Chatbots and virtual assistants: As discussed above, chatbots and virtual assistants powered by generative AI can provide instant and accurate responses to customer queries.
- Language translation and natural language processing: These two capabilities existed long before generative AI, but Gen AI-based tools have greatly improved their functionality and useability by producing more natural, human-like outputs.
- Fraud detection and cybersecurity: By analyzing patterns and anomalies in large datasets, businesses can use generative models to detect and prevent fraud, safeguard sensitive information, and protect their digital assets.
- Predictive analytics and forecasting: By analyzing historical data and identifying patterns, businesses can use generative algorithms to make accurate predictions and informed decisions, optimizing supply chain management, inventory forecasting, and demand planning.
- Creative writing and content generation: Generative AI-powered tools can automate the content creation process, allowing companies to generate articles, blog posts, and other written materials. Microsoft cites CarMax, which is creating rote content for its website so that its editorial staff can produce longer-form pieces.
- Analyze images and detect patterns: In the healthcare industry, researchers can use generative models to analyze medical images, detect abnormalities, and aid in the development of new treatments. Generative AI algorithms can assist in diagnosing diseases by analyzing patient symptoms and medical records. Microsoft cites the Cambridgeshire and Peterborough NHS Foundation Trust, which states that a single patient’s case notes could have up to 2,000 documents. Gen AI powered tools can keyword search those documents and/or summarize their contents.
Benefits of OpenAI
As these case studies illustrate, there are several benefits associated with using generative AI and OpenAI’s implementations and innovations of that technology, including:
- Enhanced efficiency: OpenAI’s models can automate knowledge-based tasks in ways that were previously impossible. They can also handle complex tasks like data analysis, report generation and content creation. The automation of rote tasks may help free up humans for more creative, strategic and empathetic roles.
- Interactive personalization at scale: All businesses have customers. Gen AI can automate an increasing variety of rote service and support tasks, while its humanlike ability to communicate and interact can help make customer interactions more personal at scale.
- Data-driven decisions: Gen AI can analyze large amounts of data and produce reports and summaries about what it has found, so people can make more informed decisions based on that data.
- Keeping up with the Joneses: Companies do not necessarily need to leap in and implement Gen AI immediately, but they should have individuals who are investigating how Gen AI tools can help them improve their operations.
Cons of Using OpenAI
There are also negatives associated with implementing generative AI technologies, including OpenAI’s. For example:
- Cost: There is cost associated with implementing these models and integrating them with existing business workflows and processes, as well as granting them access to corporate, proprietary data as well as customer data.
- Need for people to remain in the loop: Biases in model training data can lead to biased or inaccurate results from the models. LLMs do not know or understand anything. At root, they are probabilistic systems that are usually, mostly, correct. Except when they are not. Humans must remain “in the loop” for most tasks that LLMs handle.
- Lack of transparency: No one fully understands how these models make decisions. There are real concerns about the lack of insight into the data on which the models were trained, as well as about who is accountable when the AI makes a decision that is acted upon with or without direct human oversight. Organizations implementing Gen AI must assess the associated risks.
- Jobs will be lost to Gen AI: The current focus is on Gen AI tools working alongside humans and augmenting what they do – hence the efficiency angle. But at some point, Gen AI tools will likely become good enough to replace some job functions.
OpenAI Pricing
OpenAI offers several different pricing tiers, with different features gated behind the more expensive plans; check the website for the pricing, as it may change.
In addition to packages, OpenAI also has pricing per model (GPT-4o, GPT-3.5 Turbo, etc.). Prices can be viewed in units of either per 1M or 1K tokens. A token is approximately four characters (or 0.75 words for English text), so 1,000 tokens is about 750 total words. Note that subscribers on a paid ChatGPT plan can choose which model they use.
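The token arithmetic above can be turned into a quick back-of-the-envelope cost estimate. The per-million-token price used below is a made-up placeholder, not a real OpenAI price; check the pricing page for current figures.

```python
# Rough token math: 1 token ~ 4 characters ~ 0.75 English words.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(word_count: int) -> int:
    """Estimate token count from an English word count."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` at a given price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million

tokens = estimate_tokens(750)       # ~750 words is roughly 1,000 tokens
cost = estimate_cost(tokens, 5.00)  # placeholder price: $5.00 per 1M tokens
```

These are only rules of thumb; actual token counts depend on the model’s tokenizer and on the language and formatting of the text.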
Summary
OpenAI has had, and will continue having, a profound impact on the field of generative AI. The company is innovating faster than its competitors (Google, Amazon, etc.) in the Gen AI space and, thanks to its deep partnership with Microsoft, has a great deal of funding, as well as a cloud compute channel through the Azure OpenAI Service.