Generative AI and Company Data

Leveraging a company’s proprietary knowledge is critical to its ability to compete and innovate, especially in today’s volatile environment. 2023 marks the breakout year of generative AI, and many organizations are already adopting this new generation of machine learning models.

Embedding a company’s knowledge into a generative AI model so that it gives more accurate, business-oriented responses may give a competitive edge to those willing to take on the challenge.

Generative AI models: Definition

Generative artificial intelligence (Gen AI) describes algorithms (such as those behind ChatGPT) that can be used to create new content, including audio, code, images, text, simulations, and videos. This form of machine learning lets computers generate many kinds of new content, which can be used to create new product designs and optimize business processes.

Earlier generations of machine learning models typically relied on supervised learning, in which a human “teaches” the model what to do by providing labeled examples. This new generation of models relies instead on what is known as self-supervised learning.

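This type of training involves feeding a model a massive amount of text so that it learns to generate predictions, such as the next word in a sentence. Because the text itself supplies the correct answers, no human labeling is needed.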
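To make “self-supervised” concrete, here is a minimal Python sketch, assuming a simple word-level tokenizer (real models use subword tokens and far longer contexts). It shows how raw text alone yields training examples: each word becomes the target to predict from the words before it. The bigram counter that follows is a deliberately tiny stand-in for a neural network, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    # Simplified word-level tokenizer (assumption): words and punctuation marks
    return re.findall(r"\w+|[^\w\s]", text.lower())


def next_token_pairs(text: str, context_size: int = 3) -> list[tuple[tuple[str, ...], str]]:
    # Self-supervision: the "label" for each position is simply the next token in the text
    tokens = tokenize(text)
    pairs = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs


def train_bigram(pairs: list[tuple[tuple[str, ...], str]]) -> dict[str, Counter]:
    # Toy "model": count which token follows the last token of each context
    counts: dict[str, Counter] = defaultdict(Counter)
    for context, target in pairs:
        counts[context[-1]][target] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    # Return the most frequent follower seen in training, or None if the word is unknown
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None


corpus = "The cat sat. The cat ran. The dog sat."
model = train_bigram(next_token_pairs(corpus))
print(predict_next(model, "the"))  # → cat
```

No one labeled the corpus: the prediction targets were read straight off the text, which is what lets this kind of training scale to massive amounts of data.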

2023: the breakout year of generative AI

The latest McKinsey Global Survey on the current state of AI confirms the explosive growth of generative AI tools.

Gen AI has captured interest across the business population: individuals across regions, industries, and seniority levels are using Gen AI for work and outside of work.

One-third of all respondents say their organizations are already regularly using generative AI in at least one business function.

What’s more, 40 percent of those reporting AI adoption at their organizations say their companies expect to invest more in AI overall because of advances in generative AI, and 28 percent say generative AI use is already on their board’s agenda.

The most commonly reported business functions using these newer tools are marketing and sales, product and service development, and service operations, such as customer care and back-office support.

This suggests that organizations are pursuing these new tools where the most value lies.

How companies can train generative AI on their data

After the initial excitement and some experimentation, most users realize that these systems are trained primarily on internet-based information and cannot answer prompts or questions about proprietary content or knowledge.

There are currently three primary approaches to incorporating proprietary content into a generative model:

Training a Gen AI model from scratch – One approach is to create and train one’s own domain-specific model from scratch.

That’s not a common approach, since training a large language model requires a massive amount of high-quality data, and most companies simply don’t have it. It also requires access to considerable computing power and well-trained data science talent.

Fine-tuning an existing Gen AI model – A second approach is to “fine-tune” an existing Gen AI model, adding specific domain content to a system that is already trained on general knowledge and language-based interaction.

This approach involves adjusting some parameters of a base model and typically requires substantially less data (usually only hundreds or thousands of documents, rather than millions or billions) and less computing time than creating a new model from scratch.
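The idea of “adjusting some parameters of a base model” can be illustrated with a toy example. The sketch below is not a language model: it assumes a hypothetical two-weight linear predictor standing in for a pretrained model, and fine-tunes it with plain gradient descent on a small domain dataset while keeping the pretrained weights frozen and updating only the bias. Names such as `ToyModel` and `trainable` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for a pretrained model: y = w . x + b (not an LLM)."""
    weights: list[float]
    bias: float
    # Only parameters named here are updated during fine-tuning; the rest stay frozen
    trainable: set[str] = field(default_factory=lambda: {"bias"})

    def predict(self, x: list[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


def mse(model: ToyModel, data: list[tuple[list[float], float]]) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(model: ToyModel, data: list[tuple[list[float], float]],
              lr: float = 0.1, epochs: int = 200) -> ToyModel:
    # Updates the model in place and returns it
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to every parameter...
        grad_w = [0.0] * len(model.weights)
        grad_b = 0.0
        for x, y in data:
            err = model.predict(x) - y
            grad_b += 2 * err / n
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
        # ...but only the trainable subset is actually changed
        if "bias" in model.trainable:
            model.bias -= lr * grad_b
        if "weights" in model.trainable:
            model.weights = [w - lr * g for w, g in zip(model.weights, grad_w)]
    return model


# "Pretrained" general model and a small domain dataset whose outputs are shifted by 3
base = ToyModel(weights=[2.0, -1.0], bias=0.0)
points = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (-1.0, 2.0)]
domain_data = [([x1, x2], 2 * x1 - x2 + 3.0) for x1, x2 in points]
print(f"before: {mse(base, domain_data):.3f}")  # → before: 9.000
tuned = fine_tune(base, domain_data)
print(f"after: {mse(tuned, domain_data):.6f}, weights={tuned.weights}, bias={tuned.bias:.3f}")
```

Because only the bias is trainable, the pretrained weights come out unchanged while the error on the domain data drops to near zero. Real fine-tuning applies the same principle with a deep-learning framework and vastly more parameters.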

Prompt-tuning an existing Gen AI model – Perhaps the most common way for companies that are not themselves cloud vendors to customize a Gen AI model’s content is to tune it through prompts.

With this approach, the original model is kept frozen and is guided through prompts that place domain-specific knowledge in its context window. After prompt tuning, the model can answer questions related to that knowledge.

This approach is the most computationally efficient of the three, and it does not require large amounts of data to adapt the model to a new content domain.
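A minimal sketch of the context-window approach, assuming the company’s knowledge is available as short text snippets: the model stays frozen, and the only thing that changes per question is the prompt. The naive word-overlap ranking, the prompt wording, and the function names are all assumptions for illustration. The resulting string would be sent to whichever model API the company uses, which is not shown here.

```python
import re


def words(text: str) -> set[str]:
    # Lower-cased alphanumeric words; punctuation is ignored
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_snippets(question: str, documents: list[str], k: int = 2) -> list[str]:
    # Naive relevance score (assumption): number of words shared with the question
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    # Keep the top k snippets, dropping any that share no words at all
    return [d for d in ranked[:k] if q & words(d)]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    # The model itself is untouched; domain knowledge travels inside the prompt
    context = "\n".join(f"- {s}" for s in select_snippets(question, documents, k))
    return (
        "Answer using only the company information below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Company information:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


knowledge_base = [
    "Our refund window is 30 days from delivery.",
    "The Berlin office opens at 9 am.",
    "Premium support is available to enterprise customers.",
]
prompt = build_prompt("How many days is the refund window?", knowledge_base, k=1)
print(prompt)
```

In practice, the word-overlap step is usually replaced by a proper search index or embedding-based retrieval, and the number of snippets is limited by the size of the model’s context window.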

None of these approaches is free of technical, financial, and time-related challenges.

All of them rely on human curation, backed by a sound governance approach, to ensure that the knowledge content is accurate. They also need an evaluation strategy to verify the quality of facts, because generative AI is widely known to occasionally “hallucinate,” confidently stating facts that are incorrect or non-existent.
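One simple way to start an evaluation strategy is a regression test: humans curate questions together with facts a correct answer must contain, and every model answer is checked against them. The sketch below assumes a hypothetical `ask_model` function wrapping whatever model is deployed; a stub dictionary of canned answers stands in for it here, and the string-containment check is a deliberately crude stand-in for more robust grading.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    required_facts: list[str]  # curated and approved by human reviewers


def evaluate(ask_model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Check each answer for the curated facts; missing facts flag a possible hallucination."""
    failures = []
    for case in cases:
        answer = ask_model(case.question).lower()
        missing = [fact for fact in case.required_facts if fact.lower() not in answer]
        if missing:
            failures.append((case.question, missing))
    accuracy = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    return {"accuracy": accuracy, "failures": failures}


# Stub standing in for a deployed model (hypothetical answers)
canned_answers = {
    "What is the refund window?": "Refunds are accepted within 30 days of delivery.",
    "Which office handles EU support?": "EU support is handled by the Berlin office.",
}
cases = [
    EvalCase("What is the refund window?", ["30 days"]),
    EvalCase("Which office handles EU support?", ["Dublin"]),
]
report = evaluate(lambda q: canned_answers[q], cases)
print(report["accuracy"])  # → 0.5
```

Rerunning the same curated cases after every change to the model, prompts, or knowledge content gives an early signal when answers drift away from the approved facts.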

Finally, the legal and governance issues associated with generative AI deployments are complex and evolving, creating risks involving intellectual property, data privacy and security, bias and ethics, and false or inaccurate output.

The explosive growth of generative AI tools over the last year offers new opportunities for knowledge management that can enhance a company’s performance, learning, and innovation capabilities.

However, it cannot be emphasized enough that this is still a new field.

The landscape of risks and opportunities is likely to keep changing in the coming weeks, months, and years. But one thing is certain: generative AI is here to stay, and companies need to incorporate their data into these models to make the most of it.

The article was written by: Tamara Habensus