Custom Intelligence: Building AI that matches your business DNA

In 2024, we launched the Custom Model Program within the AWS Generative AI Innovation Center to provide comprehensive support throughout every stage of model customization and optimization. Over the past two years, this program has delivered exceptional results by partnering with global enterprises and startups across diverse industries—including legal, financial services, healthcare and life sciences, software development, telecommunications, and manufacturing. These partnerships have produced tailored AI solutions that capture each organization’s unique data expertise, brand voice, and specialized business requirements. They operate more efficiently than off-the-shelf alternatives, delivering increased alignment and relevance with significant cost savings on inference operations.

As organizations mature past proof-of-concept projects and basic chatbots, we’re seeing increased adoption of advanced personalization and optimization strategies beyond prompt engineering and retrieval augmented generation (RAG). Our approach encompasses creating specialized models for specific tasks and brand alignment, distilling larger models into smaller, faster, more cost-effective versions, implementing deeper adaptations through mid-training modifications, and optimizing hardware and accelerators to increase throughput while reducing costs.

Strategic upfront investment pays dividends throughout a model’s production lifecycle, as demonstrated by Cosine AI’s results. Cosine AI builds an AI developer platform and software engineering agent designed to integrate seamlessly into their users’ workflows. They worked with the Innovation Center to fine-tune Nova Pro, an Amazon Nova foundation model, using Amazon SageMaker AI for their AI engineering assistant, Genie, achieving remarkable results including a 5x increase in A/B testing capability, 10x faster developer iterations, and a 4x overall project speed improvement. The return on investment becomes even more compelling as companies transition toward agentic systems and workflows, where latency, task specificity, performance, and depth are critical and compound across complex processes.

In this post, we’ll share key learnings and actionable strategies for leaders looking to use customization for maximum ROI while avoiding common implementation pitfalls.

Five tips for maximizing value from training and tuning generative AI models

The Innovation Center recommends the following top tips to maximize value from training and tuning AI models:

1. Don’t start from a technical approach; work backwards from business goals

This may seem obvious, but after working with over a thousand customers, we’ve found that working backwards from business goals is a critical factor in why projects supported by the Innovation Center achieve a 65% production success rate, with some launching within 45 days. We apply this same strategy to every customization project by first identifying and prioritizing tangible business outcomes that a technical solution will drive. Success must be measurable and deliver real business value, helping avoid flashy experiments that end up sitting on a shelf instead of producing results. In the Custom Model Program, many customers initially approach us seeking specific technical solutions—such as jumping directly into model pre-training or continued pre-training—without having defined downstream use cases, data strategies, or evaluation plans. By starting with clear business objectives first, we make sure that technical decisions align with strategic goals and create meaningful impact for the organization.

2. Pick the right customization approach

Start with a baseline customization approach and exhaust simpler approaches before diving into deep model customization. The first question we ask customers seeking custom model development is “What have you already tried?” We recommend establishing this baseline with prompt engineering and RAG before exploring more complex techniques. While there’s a spectrum of model optimization approaches that can achieve higher performance, sometimes the simplest solution is the most effective. Once you establish this baseline, identify remaining gaps and opportunities to determine whether advancing to the next level makes strategic sense.

Customization options range from lightweight approaches like supervised fine-tuning to ground-up model development. We typically advise starting with lighter-weight solutions that require smaller amounts of data and compute, then progressing to more complex techniques only when specific use cases or remaining gaps justify the investment:

  • Supervised fine-tuning sharpens the model’s focus for specific use cases, for example delivering consistent customer service responses or adapting to your organization’s preferred phrasing, structure, and reasoning patterns (a minimal fine-tuning sketch follows this list). Volkswagen, one of the world’s largest automobile manufacturers, achieved an “improvement in AI-powered brand consistency checks, increasing accuracy in identifying on-brand images from 55% to 70%,” notes Dr. Philip Trempler, Technical Lead AI & Cloud Engineering at Volkswagen Group Services.
  • Model efficiency and deployment tuning helps organizations like Robin AI, a leader in AI-powered legal contract technology, create tailored models that speed up human verification. Organizations can also use techniques like quantization, pruning, and system optimizations to improve model performance and reduce infrastructure costs.
  • Reinforcement learning uses reward functions or preference data to align models to preferred behavior. This approach is often combined with supervised fine-tuning so organizations like Cosine AI can refine their models’ decision making to match organizational preferences.
  • Continued pre-training allows organizations like Athena RC, a leading research center in Greece, to build Greek-first foundation models that expand language capabilities beyond English. By continually pre-training large language models on extensive Greek data, Athena RC strengthens the models’ core understanding of the Greek language, culture, and usage, not just their domain knowledge. Their Meltemi-7B and Llama-Krikri-8B models demonstrate how continued pre-training and instruction tuning can create open, high-quality Greek models for applications across research, education, industry, and society.
  • Domain-specific foundation model development enables organizations like TGS, a leading energy data, insights, and technology provider, to build custom AI models from scratch, an approach suited to those with highly specialized requirements and substantial volumes of proprietary data. TGS helps energy companies make smarter exploration and development decisions by solving some of the industry’s toughest challenges in understanding what lies beneath the Earth’s surface. TGS has enhanced its Seismic Foundation Models (SFMs) to more reliably detect underground geological structures, such as faults and reservoirs, that indicate potential oil and gas deposits. The benefit is clear: operators can reduce uncertainty, lower exploration costs, and make faster investment decisions.
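
For illustration, here is a minimal parameter-efficient supervised fine-tuning sketch using the open source Hugging Face TRL and PEFT libraries. The model checkpoint, dataset file, and hyperparameters are placeholders rather than any customer’s actual configuration; teams using Amazon SageMaker AI would typically run a script like this inside a SageMaker training job.

```python
# Minimal supervised fine-tuning sketch using LoRA adapters (Hugging Face TRL + PEFT).
# The checkpoint, dataset path, and hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical JSONL file with a "text" column of brand-aligned example completions
dataset = load_dataset("json", data_files="brand_voice_examples.jsonl", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",   # any instruction-tuned causal LM you have access to
    train_dataset=dataset,
    peft_config=peft_config,            # trains small adapter weights instead of the full model
    args=SFTConfig(
        output_dir="brand-voice-adapter",
        num_train_epochs=2,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.save_model("brand-voice-adapter")  # adapter can be merged into or loaded with the base model
```

Because only the adapter weights are trained, runs like this are cheap enough to iterate on quickly, which is one reason lighter-weight tuning is a sensible first step before committing to continued pre-training.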

Data quality and accessibility are major considerations in determining the feasibility of each customization technique. Clean, high-quality data is essential both for model improvement and for measuring progress. While some Innovation Center customers achieve performance gains with relatively small volumes of fine-tuning training pairs on instruction-tuned foundation models, approaches like continued pre-training typically require large volumes of training tokens. This reinforces the importance of starting simple: as you test lighter-weight model tuning, you can collect and process larger data volumes in parallel for future phases.

3. Define measures for what good looks like

Success needs to be measurable, regardless of which technical approach you choose. It’s critical to establish clear methods for measuring both overall business outcomes and the technical solution’s performance. At the model or application level, teams typically optimize across some combination of relevance, latency, and cost. However, the metrics for your production application won’t be general leaderboard metrics—they must be unique to what matters for your business.

Customers developing content generation systems prioritize metrics like relevance, clarity, style, and tone. Consider this example from Volkswagen Group: “We fine-tuned Nova Pro in SageMaker AI using our marketing experts’ knowledge. This improved the model’s ability to identify on-brand images, achieving stronger alignment with Volkswagen’s brand guidelines,” according to Volkswagen’s Dr. Trempler. “We are building on these results to enable Volkswagen Group’s vision to scale high-quality, brand-compliant content creation across our diverse automotive markets worldwide using generative AI.” Developing an automated evaluation process is critical for supporting iterative solution improvements.

For qualitative use cases, it’s essential to align automated evaluations with human experts, particularly in specialized domains. A common solution is to use an LLM as a judge to review another model’s or system’s responses. For instance, when fine-tuning a generation model for a RAG application, you might use an LLM judge to compare the fine-tuned model’s responses against your existing baseline. However, LLM judges come with intrinsic biases and may not align with your internal team’s human preferences or domain expertise. Robin AI partnered with the Innovation Center to develop Legal LLM-as-Judge, an AI model for legal contract review. By emulating expert methodology and creating “a panel of trained judges” using fine-tuning techniques, they obtained smaller and faster models that maintain accuracy while reviewing documents ranging from NDAs to merger agreements. The solution achieved an 80% faster contract review process, enabling lawyers to focus on strategic work while AI handles detailed analysis.
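
As a simple illustration of this pattern (not Robin AI’s Legal LLM-as-Judge itself), the sketch below asks a judge model on Amazon Bedrock to compare a fine-tuned model’s answer against a baseline answer for the same question. The judge model ID, rubric, and JSON scoring schema are assumptions you would replace with your own.

```python
# Illustrative LLM-as-judge comparison using the Amazon Bedrock Converse API.
# The judge model ID, rubric, and scoring schema are placeholder assumptions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

JUDGE_PROMPT = """You are an expert reviewer. Score each answer from 1 to 5 for accuracy and clarity,
then pick a winner. Respond only with JSON:
{{"baseline_score": <int>, "candidate_score": <int>, "winner": "baseline" or "candidate"}}

Question: {question}
Baseline answer: {baseline}
Candidate (fine-tuned) answer: {candidate}"""

def judge(question: str, baseline: str, candidate: str,
          judge_model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> dict:
    """Ask the judge model to compare a baseline response and a fine-tuned response."""
    response = bedrock.converse(
        modelId=judge_model_id,
        messages=[{"role": "user", "content": [{"text": JUDGE_PROMPT.format(
            question=question, baseline=baseline, candidate=candidate)}]}],
        inferenceConfig={"temperature": 0.0, "maxTokens": 256},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])
```

Running a judge like this across a held-out evaluation set, and periodically spot-checking a sample of its verdicts with your own experts, helps keep automated evaluation anchored to human preferences.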

4. Consider hardware-level optimizations for training and inference

If you’re using a managed service like Amazon Bedrock, you can take advantage of built-in optimizations out of the box. However, if you have a more bespoke solution or are operating at a lower level of the technology stack, there are several areas to consider for optimization and efficiency gains. For instance, TGS’s SFMs process massive 3D seismic images (essentially giant CAT scans of the Earth) that can cover tens of thousands of square kilometers. Each dataset is measured in petabytes, far beyond what traditional manual or even semi-automated interpretation methods can handle. By rebuilding their AI models on AWS’s high-performance GPU training infrastructure, TGS achieved near-linear scaling, meaning that adding more computing power results in almost proportional speed increases while maintaining >90% GPU efficiency. As a result, TGS can now deliver actionable subsurface insights, such as identifying drilling targets or de-risking exploration zones, to customers in days instead of weeks.

Over the life of a model, resource requirements are generally driven by inference requests, and any efficiency gains you can achieve will pay dividends during the production phase. One approach to reducing inference demands is model distillation, which shrinks the model itself, but in some cases there are additional gains to be had by digging deeper into the infrastructure. A recent example is Synthesia, the creator of a leading video generation platform where users can create professional videos without the need for mics, cameras, or actors. Synthesia is continually looking for ways to elevate their user experience, including by decreasing generation times for content. They worked with the Innovation Center to optimize the variational autoencoder (VAE) decoder of their already efficient video generation pipeline. Strategic optimization of the model’s causal convolution layers unlocked powerful compiler performance gains, while asynchronous video chunk writing eliminated GPU idle time, together delivering a dramatic reduction in end-to-end latency and a 29% increase in decoding throughput.
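
The asynchronous chunk-writing idea generalizes beyond video: whenever slow I/O follows accelerator work, moving the writes to a background thread keeps the GPU busy. The sketch below shows the general overlap pattern with dummy placeholder decode and serialization steps; it is not Synthesia’s actual pipeline.

```python
# General overlap pattern: a background thread writes finished chunks to disk
# while the main loop keeps the expensive decode step busy. The decoder and
# chunk contents are dummy placeholders.
import queue
import threading
import numpy as np

def writer_loop(chunk_queue: queue.Queue) -> None:
    while True:
        item = chunk_queue.get()
        if item is None:                  # sentinel: decoding is finished
            break
        path, chunk = item
        np.save(path, chunk)              # slow I/O happens off the decode loop
        chunk_queue.task_done()

def decode_video(decoder, latent_chunks, out_prefix="chunk"):
    chunk_queue = queue.Queue(maxsize=4)  # bounded queue caps memory use
    writer = threading.Thread(target=writer_loop, args=(chunk_queue,), daemon=True)
    writer.start()
    for i, latents in enumerate(latent_chunks):
        frames = decoder(latents)         # decode of chunk i overlaps the write of chunk i-1
        chunk_queue.put((f"{out_prefix}_{i:04d}.npy", frames))
    chunk_queue.put(None)                 # signal the writer to finish
    writer.join()

# Usage with a dummy decoder standing in for the real model:
decode_video(decoder=lambda latents: latents * 2.0,
             latent_chunks=[np.random.rand(8, 64, 64).astype(np.float32) for _ in range(10)])
```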

5. One size doesn’t fit all

The one-size-doesn’t-fit-all principle applies to both model size and model family. Some models excel out of the box at specific tasks like code generation, tool usage, document processing, or summarization. With the rapid pace of innovation, the best foundation model for a given use case today likely won’t be the best tomorrow. Model size corresponds to the number of parameters and often determines a model’s ability to handle a broad set of general tasks and capabilities. However, larger models require more compute resources at inference time and can be expensive to run at production scale. Many applications don’t need a model that excels at everything but rather one that performs exceptionally well on a more limited set of tasks or domain-specific capabilities.

Even within a single application, optimization may require using multiple model providers depending on the specific task, complexity level, and latency requirements. In agentic applications, you might use a lightweight model for specialized agent tasks while requiring a more powerful generalist model to orchestrate and supervise those agents. Architecting your solution to be modular and resilient to changing model providers or versions helps you adapt quickly and capitalize on improvements. Services like Amazon Bedrock facilitate this approach by providing a unified API experience across a broad range of model families, including custom versions of many models.
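
As a small illustration of this modularity, the sketch below maps tasks to model IDs behind Amazon Bedrock’s unified Converse API, so changing the model behind a given task becomes a one-line configuration change. The task names and model choices are examples only, not recommendations.

```python
# Illustrative task-to-model routing behind the Amazon Bedrock Converse API.
# Task names and model IDs are examples; custom or newer models can be swapped
# in per task without changing the calling code.
import boto3

bedrock = boto3.client("bedrock-runtime")

MODEL_FOR_TASK = {
    "classify_ticket": "amazon.nova-lite-v1:0",                         # lightweight specialist task
    "summarize_thread": "amazon.nova-pro-v1:0",
    "orchestrate_agents": "anthropic.claude-3-5-sonnet-20240620-v1:0",  # stronger generalist
}

def invoke(task: str, prompt: str) -> str:
    """Send a prompt to whichever model is currently configured for the task."""
    response = bedrock.converse(
        modelId=MODEL_FOR_TASK[task],
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512},
    )
    return response["output"]["message"]["content"][0]["text"]

print(invoke("classify_ticket", "Customer reports login failures after password reset."))
```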

How the Innovation Center can help

The Custom Model Program by the Innovation Center provides end-to-end expert support, from model selection to customization, delivering performance improvements while reducing time to market and accelerating value realization. Our process works backwards from customer business needs, strategy, and goals, and starts with a use case and generative AI capability review by an experienced generative AI strategist. Specialist hands-on-keyboard applied scientists and engineers embed with customer teams to train and tune models and integrate them into applications, without data ever needing to leave customer VPCs. This end-to-end support has helped organizations across industries successfully transform their AI vision into real business outcomes.

Want to learn more? Contact your account manager about the Innovation Center, or come see us at re:Invent at the AWS Village in the Expo.


About the authors

Sri Elaprolu serves as Director of the AWS Generative AI Innovation Center, where he leverages nearly three decades of technology leadership experience to drive artificial intelligence and machine learning innovation. In this role, he leads a global team of machine learning scientists and engineers who develop and deploy advanced generative and agentic AI solutions for enterprise and government organizations facing complex business challenges. Throughout his nearly 13-year tenure at AWS, Sri has held progressively senior positions, including leadership of ML science teams that partnered with high-profile organizations such as the NFL, Cerner, and NASA. These collaborations enabled AWS customers to harness AI and ML technologies for transformative business and operational outcomes. Prior to joining AWS, he spent 14 years at Northrop Grumman, where he successfully managed product development and software engineering teams. Sri holds a Master’s degree in Engineering Science and an MBA with a concentration in general management, providing him with both the technical depth and business acumen essential for his current leadership role.

Hannah Marlowe leads the Model Customization and Optimization program for the AWS Generative AI Innovation Center. Her global team of strategists, specialized scientists, and engineers embeds directly with AWS customers, developing custom model solutions optimized for relevance, latency, and cost to drive business outcomes and capture ROI. Previous roles at Amazon include Senior Practice Manager for Advanced Computing and Principal Lead for Computer Vision and Remote Sensing. Dr. Marlowe completed her PhD in Physics at the University of Iowa in modeling and simulation of astronomical X-ray sources and instrumentation development for satellite-based payloads.

Rohit Thekkanal serves as ML Engineering Manager for Model Customization at the AWS Generative AI Innovation Center, where he leads the development of scalable generative AI applications focused on model optimization. With nearly a decade at Amazon, he has contributed to machine learning initiatives that significantly impact Amazon’s retail catalog. Rohit holds an MBA from The University of Chicago Booth School of Business and a Master’s degree from Carnegie Mellon University.

Alexandra Fedorova leads Growth for the Model Customization and Optimization program for the AWS Generative AI Innovation Center. Previous roles at Amazon include Global GenAI Startups Practice Leader with the AWS Generative AI Innovation Center, and Global Leader, Startups Strategic Initiatives and Growth. Alexandra holds an MBA from Southern Methodist University and a BS in Economics and Petroleum Engineering from Gubkin Russian State University of Oil and Gas.