Chatbot training used to be relatively straightforward. With generative AI models entering the market, training takes on a different dimension. It is also making companies nervous.
The biggest concern is that companies do not want just anyone to use their sensitive data. This makes it rare for a company to open its internal data to any large language model (LLM).
Next, they want to control responses. Imagine if someone asks a chatbot about the health risks of drinking soda; Coca-Cola would not want to be associated with the answer. Even worse would be a question comparing it with Pepsi and asking how great the latter tastes.
This creates a conundrum for data teams tasked with preparing generative AI models. It’s true that such a model gets better with more data ingestion and better prompt engineering. But what if companies are queasy about sharing their data? And what if the chatbot says the wrong thing or shares competitive information?
Data locations matter
“These two requirements require the preparation of the data to be positioned in two locations,” says Andrew Amann, chief executive officer at NineTwoThree Venture Studio, a two-time Inc. 5000 Fastest Growing Company.
Amann and his team have created over 50 products and 14 startups, and the studio is considered a leading mobile development agency in Boston, U.S. Based on his experience, he feels that, first, you need to allow ingestion of company information to enhance prompting (or prompt engineering) for the LLM.
“Typically referenced as a ‘fine tuned model,’ the data scientists will use internal company data to chunk information that can be readily used by the chatbot if necessary,” says Amann.
The second, which is most important when working with enterprises reluctant to open their data stores, is adding a company filter. All responses should go through this filter regardless of the question's complexity.
“This filter will ensure that the responses match the company values and ethics and do not harm the sale of the products for that company,” says Amann.
“The preparation of these two datasets includes parsing data, training the prompts and supervising the learning of the chatbot used,” he adds.
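To make the first step concrete, here is a minimal sketch of what chunking internal documents for a chatbot might look like in practice. The chunk size, overlap and sample text are illustrative assumptions, not Amann’s actual pipeline.

```python
# Minimal sketch: split an internal document into overlapping chunks the chatbot
# can retrieve later. Chunk size, overlap and the sample text are illustrative.

def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap so context isn't cut mid-thought
    return chunks

# Stand-in for an internal policy document; each chunk would then be embedded
# and stored so the chatbot can pull it in when relevant.
internal_policy = "Employees may only discuss published product information. " * 40
chunks = chunk_document(internal_policy)
print(len(chunks), "chunks prepared for retrieval")
```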
The hidden tripwire
While these two steps sound logical, why are companies still getting blindsided by generative AI?
The problem, as Amann frames it, is thinking that the chatbot will respond directly to the customer. “This is a false pretext and a recipe for disaster. Every good enterprise application should ensure the company's values and messaging stay intact. This requires applying a filter to the output of the LLM and then cleansing it if necessary for clients' consumption.”
Amann feels strongly that LLMs should not respond to customers directly.
“We have all seen the racist responses it can generate — and if a company is building the ChatBot inside their domain, they must claim full responsibility for anything it generates. Therefore, applying a filtering system to the responses is critical to releasing the product,” he argues.
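For illustration, a company filter of the kind Amann describes might start as simply as the sketch below. The banned terms and fallback copy are hypothetical, not a real client’s configuration.

```python
# Minimal sketch of a brand-safety filter applied to every LLM response before
# it reaches a customer. The banned terms and fallback copy are hypothetical.

BANNED_TERMS = {"pepsi", "health risks"}

def passes_company_filter(response: str) -> bool:
    """Return True only if the response avoids off-brand or harmful content."""
    lowered = response.lower()
    return not any(term in lowered for term in BANNED_TERMS)

def cleanse(response: str) -> str:
    """Replace a failing response with safe, on-brand copy."""
    return "I can't help with that one, but I'm happy to talk about our own products."

def deliver(raw_response: str) -> str:
    """Gate every model response through the company filter."""
    return raw_response if passes_company_filter(raw_response) else cleanse(raw_response)

print(deliver("Honestly, Pepsi tastes better."))  # returns the cleansed fallback
```

In production, the check would more likely be another model call or a policy engine rather than a keyword list, but the placement is the point: nothing reaches the customer unfiltered.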
Training vs. supervising: get the distinction right
One issue many companies face is how long it takes to train a model. It’s one reason many companies focus on acquiring or subscribing to pre-trained models.
It is also another reason why ChatGPT entered the world via a startup using academic resources. It is difficult to justify the time spent on training, and it raises other awkward questions that need more time to address (AI ethics, for example).
Just ask Google, which trained LaMDA and Meena for some time but did not release them (until it had no choice), or Microsoft, which leaped over its own development efforts to integrate ChatGPT directly.
But Amann believes that taking time to train is necessary. “Being in a rush to ‘build proper prompts’ is not suggested,” he said. Instead, he advises companies to look at supervisory training.
“Training and supervising are two different things. Being quick to supervise the results of an LLM is fine. The more user feedback you get on the supervised learning aspect, the better the chatbot will perform for the end customers,” he explained.
“Remember that models like ChatGPT are trained without supervision at first. They are designed just to predict the next word. Then, a political layer is applied to the model through supervised training, allowing the bot to respond to politically charged questions properly. Some LLMs do not have this layer, allowing the business to create its own value mapping. Rushing this part of the model is fine to execute on,” Amann further explains.
Dealing with legacy data
Another major challenge for data teams is the use of legacy data.
Yes, such data offers a goldmine of insights and can give industry incumbents an edge over startups that need to buy or partner to get the same depth of data. But (and it is a big but), this only works if the data is ready. And as any data team will attest, clean, AI-ingestible legacy data is hard to come by.
This is where Amann’s team comes in. “Legacy data needs to be labeled properly to be prepared for AI/ML. Our engineers can assist in this cleanup by parsing the data to proper formats — similar to what we have all experienced in Excel when trying to format a date. Once the rows and columns are properly labeled, then ML can be applied to start looking for anomalies in the dataset,” he explains.
More importantly, the machine learning implementation should be able to start self-learning with a bit of supervised training from the internal staff. You can then add the anomalies found in the initial stages into a feedback loop “to determine if the discovery assists in business decisions or is just noise,” Amann suggests.
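As a rough illustration of that cleanup-then-flag flow, the sketch below normalizes a messy date column and queues outliers for human review. The column names, sample rows and threshold are assumptions for the example, not NineTwoThree’s actual process.

```python
import pandas as pd

# Hypothetical legacy export: one unparseable date, one suspiciously large amount.
df = pd.DataFrame({
    "order_date": ["2021-03-01", "2021-03-02", "not a date", "2021-03-04", "2021-03-05"],
    "amount": [120.0, 95.0, 105.0, 9999.0, 130.0],
})

# Normalize the date column (the "Excel date formatting" step); unparseable rows are dropped.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df = df.dropna(subset=["order_date"])

# Crude anomaly flag: amounts far from the median go into a review queue so staff
# can decide whether the discovery is a real business signal or just noise.
median = df["amount"].median()
df["needs_review"] = (df["amount"] - median).abs() > 3 * median
review_queue = df[df["needs_review"]]
print(review_queue)
```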
One good thing about generative AI and LLMs is the reduced role of feature engineering.
“Feature engineering is not needed for generative AI and large language models because these models learn to generate features on their own. However, there are still techniques that can be used to fine-tune the model for specific use cases and to ensure the model fits your company’s specific messaging and branding needs,” says Amann.
Training the model with these constraints in mind is essential to ensure there aren’t any responses that are “wildly inaccurate or potentially damaging to your brand,” he adds.
The rise of AI whispering
For many users, generative AI is like a black box. You enter your prompts, and you get a result. Either you’re delighted or disappointed.
Regulated industries don’t have this luxury. They need to explain the outcomes; if explainability is not forthcoming, they can face fines, lawsuits and damage to their reputations.
This is why Amann believes such companies should invest in prompt engineering. “It is incredibly important to create prompts that predict the outcome of ChatGPT or other LLMs.”
He feels that every prompt should contain three components. “The first should be a persona created by the company to match its values. For example, ‘respond like a pleasant eighth-grade teacher explaining homework’ might be a great persona for Khan Academy,” says Amann.
Next, the question can be asked inside the prompt. And lastly, prompt engineers should use modifiers and clarifiers to ensure the response is predictable.
“A modifier can contain things like ‘Do not use swear words’ or ‘respond in less than 100 words,’ or if you were building a therapy type application, ‘respond with a clarifying question.’ The engineers must wrap the end customer's question into proper prompts so that the company can ensure the communication is not being derailed for any reason,” says Amann.
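Putting those three components together can be as simple as the sketch below; the persona and modifier text are illustrative, echoing Amann’s examples rather than any client’s real configuration.

```python
# Minimal sketch: wrap the customer's question in a persona and modifiers before
# it ever reaches the LLM. Persona and modifier text are illustrative examples.

PERSONA = "Respond like a pleasant eighth-grade teacher explaining homework."
MODIFIERS = [
    "Do not use swear words.",
    "Respond in fewer than 100 words.",
]

def build_prompt(customer_question: str) -> str:
    """Combine persona, the raw question and modifiers into one controlled prompt."""
    return "\n".join([PERSONA, f"Question: {customer_question}", *MODIFIERS])

print(build_prompt("Why did my soda go flat overnight?"))
```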
The approach reinforces Amann’s original point: the response of the LLM should never go directly to the end client; instead, it should pass through a filter.
“If it fails, the system can modify the prompt and ask the question again to try and force a different response from the LLM. This loop can be placed at the end of any LLM for an enterprise client to monitor the responses,” says Amann.
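Stitched together with the earlier sketches, that loop might look something like this. Here `call_llm` stands in for whatever model client a team actually uses, and `build_prompt`, `passes_company_filter` and `cleanse` are the hypothetical helpers sketched above.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for the real model call (hosted API, self-hosted LLM, etc.)."""
    return "Our soda pairs nicely with any meal."  # placeholder response

def respond_with_retries(question: str, max_attempts: int = 3) -> str:
    """Ask the LLM, filter the answer, and retry with a tightened prompt if it fails."""
    prompt = build_prompt(question)
    response = call_llm(prompt)
    for _ in range(max_attempts - 1):
        if passes_company_filter(response):
            return response
        # Tighten the prompt to try to force a different, on-brand response.
        prompt += "\nKeep the answer strictly about our own products and services."
        response = call_llm(prompt)
    return response if passes_company_filter(response) else cleanse(response)
```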
Data teams can’t do it alone
Training generative AI is an evolving field. Amann notes that some clients are also interested in having humans monitor all the conversations happening with the chatbot. “Certain words and phrases will be highlighted to bring attention to the responses being generated, allowing a human to step in at any point.”
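A simple version of that human-in-the-loop flagging could look like the sketch below; the watch-list terms are illustrative assumptions, not a real client’s configuration.

```python
# Minimal sketch: flag watch-list terms in chatbot conversations so a human can
# step in. The terms themselves are illustrative assumptions.

WATCH_LIST = {"refund", "lawsuit", "cancel my account", "side effects"}

def flag_for_human(message: str) -> list[str]:
    """Return any watch-list terms found in a chatbot exchange."""
    lowered = message.lower()
    return [term for term in WATCH_LIST if term in lowered]

hits = flag_for_human("I want to cancel my account and get a refund.")
if hits:
    print("Escalate to a human reviewer; flagged terms:", hits)
```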
Of course, this means companies must blur the lines dividing their technical and nontechnical teams. It also means data teams must work with colleagues who may not be so data savvy but are schooled in social science and business knowledge.
The collaboration between these teams is crucial if companies want to jump on the generative AI bandwagon. Otherwise, you might end up wishing your chatbot never opened its virtual mouth.
Winston Thomas is the editor-in-chief of CDOTrends and DigitalWorkforceTrends. He’s a singularity believer, a blockchain enthusiast, and believes we already live in a metaverse. You can reach him at [email protected].
Image credit: iStockphoto/Inside Creative House