
Model Training Explained for Tech Leaders

Posted on: December 22, 2025 at 12:30 AM

Training models is one of the most powerful capabilities modern teams can develop.

Not because it is flashy, or because it signals technical sophistication, but because it represents ownership.

  • Ownership of behavior.
  • Ownership of constraints.
  • Ownership of costs.
  • Ownership of long‑term outcomes.

For tech leaders, model training is not a technical milestone. It is a strategic choice. One that only makes sense when it is grounded in clarity, not pressure or hype.

This article is an introduction.

Its purpose is not to explain how to train models, but to help you decide when training is justified, what you need to understand before you commit, and what trade‑offs you are accepting when you do.

A practical decision checklist

Before touching terminology, start here. This checklist exists to surface reality early.

  • What breaks today with prompts or retrieval?
  • What happens when the model is wrong?
  • How expensive is iteration failure?
  • Is this capability core or supportive?
  • Do we have the data advantage to justify training?

If these questions are difficult to answer, that is already an answer.

The terms and concepts to master before you train your own models

Training vs inference

Training is learning; inference is execution. Remember: LLMs are prediction engines, not intelligence.

Training is slow, expensive, and iterative. Inference must be fast, predictable, and boringly precise.

As one industry explanation puts it, “training is the learning phase where the model develops its predictive capabilities, while inference applies the trained model to new data” (Mendix). Confusing the two is one of the fastest ways to design an expensive system that fails in production.
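To see the split concretely, here is a minimal PyTorch sketch; the toy model, optimizer, and loss are illustrative assumptions, not a recipe.

```python
import torch
from torch import nn

# A toy model stands in for any network you might train.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Training: forward pass, loss, backward pass, parameter update.
# Slow, iterative, repeated over many batches and epochs.
def training_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()   # gradients flow; parameters are about to change
    optimizer.step()
    return loss.item()

# Inference: a forward pass and nothing else. No gradients, no updates.
# Fast, predictable, and deliberately boring.
@torch.no_grad()
def predict(inputs: torch.Tensor) -> torch.Tensor:
    model.eval()
    return model(inputs)
```

The asymmetry is the point: everything expensive happens in training_step, and production systems should only ever run predict.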

Training vs fine‑tuning vs reuse

Training from scratch means building model behavior entirely from your own data.

Fine‑tuning adjusts an existing model within limited boundaries.

Reuse accepts a model as‑is and solves problems around it.

Research consistently shows that fine‑tuning pre‑trained models costs a fraction of training from scratch, often reducing compute requirements by more than 90% (MiniML). For most teams, reuse and fine‑tuning unlock value faster and with far less risk.
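For intuition, here is a minimal fine‑tuning sketch using the Hugging Face transformers library; the model name and two‑label task are illustrative assumptions.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Start from pre-trained weights instead of random ones.
model_name = "distilbert-base-uncased"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2
)

# Freeze the pre-trained body so only the small classification head
# trains. Reusing the compute that built the base model, rather than
# repeating it, is where the large cost reduction comes from.
for param in model.distilbert.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable:,} of {total:,} parameters")
```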

Data is the real asset

Models do not encode understanding. They reflect data.

As one definition bluntly states, “model training constitutes the process of feeding curated datasets to machine learning algorithms to optimize their parameters through iterative learning” (Domino). If your data is noisy, biased, or poorly scoped, training will faithfully reproduce those qualities.

Splitting data into training, validation, and test sets is not ceremony. It is how teams prevent self‑deception and overfitting (Iguazio).
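A minimal sketch of that discipline with scikit-learn; the ratios and the stand-in dataset are assumptions.

```python
from sklearn.model_selection import train_test_split

examples = list(range(1_000))  # stand-in for your curated dataset

# Carve out a test set first and never tune against it.
train_val, test = train_test_split(examples, test_size=0.15, random_state=42)

# Split the remainder into training and validation sets.
train, val = train_test_split(train_val, test_size=0.15, random_state=42)

# train: what the model learns from.
# val:   what you tune hyperparameters against.
# test:  what you look at last, to catch self-deception and overfitting.
print(len(train), len(val), len(test))  # 722 128 150
```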

Optimization and loss, at a leadership level

Loss functions are proxies. Metrics are incentives.

Optimizing a number does not guarantee improving a system. Many failures happen because teams chase marginal metric gains while degrading real‑world behavior. Iteration always has a cost, and knowing when to stop is a leadership decision.
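One way to make "when to stop" concrete is to encode it as policy rather than habit. A minimal early-stopping sketch over validation loss; the patience and threshold values are assumptions, not recommendations.

```python
def should_stop(val_losses, patience=3, min_improvement=1e-3):
    """Stop when validation loss has not meaningfully improved
    in the last `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_improvement

# Example: the metric plateaus after epoch 4; further iteration
# burns budget without improving the system.
history = [0.90, 0.55, 0.41, 0.38, 0.380, 0.381, 0.382]
print(should_stop(history))  # True
```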

When training a custom model is actually justified

Two lessons from earlier in this track matter here.

From What are small language models in AI?: smaller, focused models often outperform larger ones when the task is well defined and bounded.

From The cost of running LLMs privately: owning models means owning responsibility, not just infrastructure.

Training is justified when:

  • You control unique, defensible data others cannot access
  • The task is narrow, stable, and repeatable
  • Failure modes are understood and acceptable
  • Latency, privacy, or regulation demand control
  • Long‑term leverage clearly outweighs short‑term cost

Training is rarely about being cutting‑edge. It is about being deliberate.

When training is usually not the right move

Training is often the wrong choice when:

  • The product is still discovering the real problem
  • Goals are framed as “make it smarter”
  • Data is sparse, noisy, or constantly changing
  • The team cannot sustain long‑term model ownership
  • System design or retrieval would solve the issue faster

In these cases, training increases risk without increasing clarity.

Training LLMs vs training SMLs

Training large language models

Training large language models is fundamentally a capital decision.

Public estimates place GPT‑4’s training compute alone at tens of millions of dollars, with total costs exceeding $100M (Forbes). These numbers exclude staffing, data acquisition, and ongoing maintenance.

For most organizations, this level of investment is neither necessary nor defensible.

Training small language models

SMLs change the equation.

They trade generality for focus, allowing faster iteration, clearer failure boundaries, and dramatically lower costs. This article leans slightly toward SMLs as a practical default today—not as a trend claim, but as a reflection of current constraints and agent‑oriented systems.

Financial reality: the constraint that shapes most decisions

Compute is only the visible cost.

Data preparation, labeling, experimentation time, and expertise quietly dominate budgets. As multiple analyses note, training costs have increased more than 4,000% since 2020 (Edge AI Vision).

Financial awareness does not limit ambition. It sharpens it.

Model training in practice

If you are going to embark on this path, it helps to ground the decision in reality early. Below are resources teams commonly use to train or fine-tune models today, followed by ballpark pricing meant to inform judgment, not to optimize spreadsheets.

Managed cloud platforms

Thinking Machines (Tinker).

Tinker is designed to let teams focus on what actually differentiates them—their data and algorithms—while the platform handles the heavy lifting of distributed fine-tuning. As they state directly:

“Tinker lets you focus on what matters in LLM fine-tuning – your data and algorithms – while we handle the heavy lifting of distributed training.”

This positions Tinker not as a generic GPU provider, but as an abstraction layer over training complexity, aimed at teams that already know why they are training and want to reduce operational and cognitive overhead rather than reinvent infrastructure (Tinker docs).

Amazon Nova on AWS.

If you are already operating in AWS, SageMaker provides the end-to-end lifecycle, while the Nova family is positioned primarily around fine-tuning and API-based usage rather than immediate training from scratch (AWS SageMaker pricing).

IBM watsonx.ai / Watson Studio.

For regulated industries, IBM’s platforms emphasize governance, auditability, and enterprise workflows, with compliance and lifecycle controls built in by default (IBM).

Ballpark pricing that informs decisions

Cost only becomes real when you translate per-hour jibber-jabber into “this is what the month looks like.” So let’s do that, with simple, conservative assumptions.

Assume:

  • One engineer running training jobs
  • 8 hours a day
  • 20 working days per month

That’s ~160 GPU hours per month for a focused training effort. Many teams exceed this quickly once experiments restart, fail, and rerun.

Scenario 1: Light experimentation or SML fine-tuning

You are iterating on a small or mid-sized model. Think agent components, classifiers, or narrow domain language models.

  • RunPod (RTX 4090) at ~$0.34/hr
    • 160 hrs × $0.34 ≈ $55/month per GPU

This is why teams start here. It’s cheap enough to learn, fail, and rerun without fear. The trade-off is that you own most of the operational complexity.

Scenario 2: Managed training in a familiar cloud

You want lifecycle tooling, logs, IAM, and fewer surprises.

  • AWS SageMaker (V100) at ~$3.83/hr
    • 160 hrs × $3.83 ≈ $613/month per instance

This is still reasonable for exploratory work, but notice the shift: experiments now have a visible burn rate. Idle time starts to matter.

Scenario 3: Serious training runs

Now, you move into multi-GPU territory, larger models, or longer convergence times.

  • AWS SageMaker (8×A100 node) at ~$37.69/hr
    • 160 hrs × $37.69 ≈ $6,030/month per node

And that is one node. A short multi-node experiment can turn into a five-figure line item surprisingly fast.

Scenario 4: Modern managed alternatives

You want newer hardware without full infrastructure ownership.

  • Google Vertex AI (L4 GPU) at ~$0.64/hr
    • 160 hrs × $0.64 ≈ $102/month per GPU
  • Google Vertex AI (A100 80 GB) at ~$4.52/hr
    • 160 hrs × $4.52 ≈ $723/month per GPU

This is often the middle ground: better hardware, manageable costs, and fewer sharp edges. It is one of the reasons I think Google might win the AI race.
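These monthly figures are just rate × hours, which makes them easy to sanity-check. A small calculator under the same assumptions; the rates are the approximate prices quoted above and will drift.

```python
# One engineer, 8 hours a day, 20 working days a month.
HOURS_PER_MONTH = 8 * 20  # ~160 GPU hours

# Approximate hourly rates quoted above; real prices change often.
rates_per_hour = {
    "RunPod RTX 4090":        0.34,
    "SageMaker V100":         3.83,
    "SageMaker 8xA100 node": 37.69,
    "Vertex AI L4":           0.64,
    "Vertex AI A100 80GB":    4.52,
}

for name, rate in rates_per_hour.items():
    print(f"{name:<24} ~${HOURS_PER_MONTH * rate:,.0f}/month")
```

Multiply by the number of engineers, parallel experiments, and reruns, and the scenarios above stop being theoretical.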

While none of these numbers are catastrophic on their own, what can quickly burn a hole in your wallet is underestimating how:

  • Experiments multiply.
  • Runs restart.
  • “Just one more tweak” becomes a habit.

Treat these figures as the cost of learning, not the cost of winning. They’ll help you decide how much learning you can afford, and how disciplined your iteration needs to be.

Above all, adopt a fail-fast mentality.

If you are serious about learning how to train your dragon model

Training is physical, time-consuming, and expensive, as seen above. But if you need to do it, here are some resources you can use to learn about it.

This is not about becoming an ML engineer overnight. It is about building enough shared vocabulary to lead responsibly.

Learn from Rocky

Most people who think Rocky is a fight movie misunderstand what this masterpiece is about: a man doing the best he can with the tools he’s got.

It is a story about discipline, constraint, and grit.

The moment everyone misses is when Rocky chooses his strategy. He knows he cannot beat Apollo. Winning is not the goal. Going the distance is.

Learn from Rocky. Choose your strategy before you step into the ring.

Catch you all in the next one.
