Becoming an AI Developer Without the Math PhD: A Practical Journey into LLMs, Agents, and Real-World Tools
For the past year, the world has been obsessed with what artificial intelligence can do for us. From ChatGPT writing emails to MidJourney generating fantastical images, the dominant narrative has been "how to use AI." But what if you're not satisfied just prompting models? What if you want to build them, customize them, run them offline, and deploy them securely in the cloud?
This is the journey I'm starting now: learning to build with AI, not just use it. And in this post, I’ll lay out the core principles, motivations, and roadmap that will guide my exploration into becoming an AI developer—with a specific focus on LLMs (Large Language Models), agents, training workflows, and cloud/offline deployment.
Let me be clear: I’m not here to write a research paper, derive equations, or become a machine learning theorist. I don’t need to build a transformer from scratch in NumPy. My goal is pragmatic:
I want to learn how to train, run, integrate, and deploy powerful AI tools in real-world environments.
Let’s break this down into the pillars of this journey.
1. Becoming an AI Developer: The New Craft
Today, being an AI developer means being part software engineer, part ML engineer, part systems architect, and part product thinker. The good news is, you don’t need a PhD. You need curiosity, hands-on time, and a practical mindset.
My focus isn’t on AI theory. It’s on:
- Training or fine-tuning models (LLMs, vision models, etc.)
- Running models locally and in the cloud
- Implementing agents (think: multi-step reasoning systems or API-integrated workflows)
- Integrating LLMs with tools like search engines, file systems, knowledge bases
- Keeping everything secure, fast, and understandable
This is not about using AI to write a blog post. It’s about building the system that understands your files, fetches your emails, or answers your customer questions—and knowing exactly how it works.
2. Training Models: Fine-Tuning and Beyond
One of my early goals is learning how to take an open-source model (like Meta’s LLaMA or Mistral) and customize it. I’m not aiming for full-scale training on terabytes of data—but rather:
- Fine-tuning a model on domain-specific content
- Learning how to do parameter-efficient tuning (like LoRA, QLoRA)
- Using datasets I care about (technical documents, customer support logs, etc.)
I plan to start by running these fine-tuning jobs offline or on AWS EC2 with GPUs, using tools like HuggingFace's `transformers`, `peft`, and `trl`. I'll try small models first (e.g., 7B or 3B parameter models), and work my way up to more complex tuning pipelines.
Why? Because having a model that understands your language, your products, and your workflows is the difference between a toy and a tool.
3. Implementing Agents: Orchestrating Reasoning + Tools
The next area I want to explore is agents. Not just chatbots, but smart, tool-using, context-aware systems.
For example:
- A file assistant that can read and answer questions about local Markdown, PDF, or code files
- A developer agent that can call APIs, search the web, and use Bash to automate tasks
- A customer support AI that integrates with ticket systems and logs
These are powered by LLMs plus memory and tool use, often combined with Retrieval-Augmented Generation (RAG) to ground answers in your own data.
As I explore these, I want to understand what agents can actually do, where the hallucination and reliability issues are, and how to make them robust in production.
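As a starting point, here's a minimal sketch of the retrieval half of a file assistant: embed local Markdown files, pick the chunks closest to a question, and stuff them into a prompt. The embedding model name and the `notes/` folder are assumptions for illustration.

```python
# Minimal sketch of RAG retrieval: embed local docs, rank by similarity, build a prompt.
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Load and embed local markdown files (one chunk per file, for simplicity).
docs = {p.name: p.read_text() for p in Path("notes").glob("*.md")}
doc_vectors = embedder.encode(list(docs.values()), normalize_embeddings=True)

def retrieve(question: str, k: int = 3) -> str:
    """Return the k most relevant chunks, joined as context for the LLM."""
    q_vec = embedder.encode([question], normalize_embeddings=True)
    scores = doc_vectors @ q_vec.T            # cosine similarity on normalized vectors
    top = np.argsort(scores.ravel())[::-1][:k]
    names = list(docs.keys())
    return "\n\n".join(docs[names[i]] for i in top)

context = retrieve("How do I deploy the staging environment?")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```

The generation half just hands `prompt` to whichever model is running locally or in the cloud; the point is that retrieval, not the model, decides what context the answer is grounded in.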
4. Running Models Offline
This one is personal: I want full local control. I don’t want every prompt or file to be uploaded to OpenAI or Anthropic.
So, I plan to run:
- Quantized LLMs using `llama.cpp`, `text-generation-webui`, or `koboldcpp` (a minimal sketch appears below)
- Vision models and Stable Diffusion locally using ComfyUI
- Agents that talk to local tools and use local embeddings
This lets me:
- Experiment without latency or cost limits
- Keep things private and air-gapped
- Build a stack that could run on edge devices or in secure environments
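Here's the quantized-LLM sketch mentioned above, using the `llama-cpp-python` bindings around `llama.cpp`. The GGUF file path is a placeholder for whatever quantized model I end up downloading.

```python
# Minimal sketch: running a quantized GGUF model fully offline via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # assumption: a local GGUF file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to a GPU if available, otherwise run on CPU
)

output = llm(
    "Summarize the trade-offs of running LLMs locally.",
    max_tokens=256,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```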
I already have a compact AMD-based mini PC. But for heavy lifting, I'll rent EC2 GPU instances (e.g., `g5.xlarge`) to train and test models with full CUDA support.
5. Learning to Use the Cloud: AWS Bedrock and Beyond
While I love offline setups, I also want to master cloud-native AI development:
- Using Amazon Bedrock to call hosted models like Claude or Titan (see the sketch right after this list)
- Deploying my own models with SageMaker JumpStart, `ml.m5` instances, or even ECS with NVIDIA GPU support
- Building LLM-powered APIs with Lambda + API Gateway + DynamoDB (a minimal handler sketch closes this section)
- Learning MLOps workflows: model tracking, deployment, versioning
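Here's the Bedrock sketch promised above: a minimal call to a hosted model through `boto3`. The region and model ID are assumptions; you'd use whichever model your account has been granted access to.

```python
# Minimal sketch: calling a hosted model through Amazon Bedrock with boto3.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumption: any enabled model ID works
    messages=[{"role": "user", "content": [{"text": "Explain RAG in two sentences."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```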
The cloud is where scale, availability, and security live. I want to understand:
- Which models to use when
- Cost vs. performance tradeoffs
- How to handle burst loads and real users
This means exploring Bedrock, SageMaker, EC2 Spot training jobs, and maybe even multi-region deployment patterns.
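And to sketch the Lambda + API Gateway idea from the list above: a minimal handler that forwards a question to Bedrock and returns the answer as JSON. The model ID and request fields are illustrative assumptions, not a finished design.

```python
# Minimal sketch of a Lambda handler behind API Gateway that proxies to Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")

    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumption: an enabled model
        messages=[{"role": "user", "content": [{"text": question}]}],
        inferenceConfig={"maxTokens": 256},
    )
    answer = response["output"]["message"]["content"][0]["text"]

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": answer}),
    }
```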
6. Comparing AWS EC2 AI-Capable Instances
Before investing in expensive local GPUs, I explored what AWS EC2 has to offer. Its GPU instances give you access to powerful NVIDIA hardware with a full CUDA stack, letting you experiment with real-world training, inference, and deployment scenarios.
Here's a snapshot comparison of EC2 instances ideal for AI workloads:
| Instance Type | GPU Model | VRAM | Price (On-Demand) | Price (Spot) | Notes |
|---|---|---|---|---|---|
| `g5.xlarge` | NVIDIA A10G | 24 GB | ~$1.00/hr | ~$0.35/hr | Best balance of power, cost, and CUDA support for SDXL and LLMs |
| `g4dn.xlarge` | NVIDIA T4 | 16 GB | ~$0.60/hr | ~$0.20/hr | Lower-end option, fine for SD 1.5 or small LLMs |
| `g6.xlarge` | NVIDIA L4 | 24 GB | ~$1.10/hr | ~$0.40/hr | CUDA 12.2 support, fast and efficient |
| `p3.2xlarge` | NVIDIA V100 | 16 GB | ~$3.00/hr | ~$1.00/hr | Older, more expensive, still fast |
| `p4d.24xlarge` | NVIDIA A100 x8 | 320 GB | ~$32.00/hr | ~$9.00/hr | Extremely powerful, best for multi-GPU training |
All of these instances let you:
- Run full SDXL pipelines, even with ControlNet or LoRA tuning
- Fine-tune LLMs with `transformers`, `peft`, or `trl`
- Benchmark inference latency vs. cost
- Build production-like setups with Docker, FastAPI, or LangChain
Pro Tips:
- Use Spot Instances when possible to save up to 70% (see the launch sketch at the end of this section).
- Store models in S3 and mount them with EFS, or copy them to EBS for reuse.
- Use Deep Learning AMIs or containers preinstalled with PyTorch + CUDA.
- For ephemeral jobs, add auto-termination scripts to avoid cost surprises.
This approach gives you true NVIDIA GPU experience, and it’s perfect for trial runs before committing to building your own $2,000+ local rig.
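Here's the launch sketch referenced in the tips above: requesting a `g5.xlarge` as a Spot instance with `boto3` and tagging it so a cleanup script can find and terminate it later. The AMI ID and key pair name are placeholders.

```python
# Minimal sketch: launching a g5.xlarge Spot instance with boto3 and tagging it for cleanup.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # assumption: a Deep Learning AMI in your region
    InstanceType="g5.xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",              # assumption: an existing EC2 key pair
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "auto-terminate", "Value": "true"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```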
7. Skipping the Math (Mostly)
I’ve said it before: I don’t need to know how to implement backpropagation from scratch. I care about model behavior, training workflows, and integration patterns, not the underlying calculus.
That said, I’ll still learn the basics of:
- Tokenization (and what it means for prompts; see the small sketch below)
- Attention mechanisms (at a conceptual level)
- Embeddings and vector search
- Fine-tuning configs and hyperparameters
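Here's the small tokenization sketch mentioned above, using a HuggingFace tokenizer to show what a prompt turns into before the model ever sees it (the `gpt2` tokenizer is just a convenient example):

```python
# Minimal sketch: what tokenization means for prompts.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Fine-tune a 7B model on custom markdown docs."
tokens = tokenizer.tokenize(prompt)
ids = tokenizer.encode(prompt)

print(tokens)      # the subword pieces the model actually sees
print(len(ids))    # token count, which is what context windows and pricing measure
```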
But if the choice is between reading a 60-page paper on gradient descent vs. getting a working RAG agent running in AWS… I’m picking the agent.
8. What’s Next
In the coming weeks and months, I’ll be sharing hands-on walkthroughs of what I’m building and learning:
- Building your first local RAG agent with `llama.cpp`
- Fine-tuning a 7B model on custom Markdown docs
- Deploying an LLM-powered customer support API in AWS Lambda
- Using Amazon Bedrock with multiple foundation models in one app
I’ll post every success, failure, and bottleneck—because I know I’m not the only software engineer tired of prompt engineering and ready to get into real AI building.
If you're on the same journey—to build with AI, not just use it—follow along. This is going to be a practical, grounded, and developer-first look at how to make LLMs and agents actually work for you.