Becoming an AI Developer Without the Math PhD: A Practical Journey into LLMs, Agents, and Real-World Tools

For the past year, the world has been obsessed with what artificial intelligence can do for us. From ChatGPT writing emails to MidJourney generating fantastical images, the dominant narrative has been "how to use AI." But what if you're not satisfied just prompting models? What if you want to build them, customize them, run them offline, and deploy them securely in the cloud?

This is the journey I'm starting now: learning to build with AI, not just use it. And in this post, I’ll lay out the core principles, motivations, and roadmap that will guide my exploration into becoming an AI developer—with a specific focus on LLMs (Large Language Models), agents, training workflows, and cloud/offline deployment.

Let me be clear: I’m not here to write a research paper, derive equations, or become a machine learning theorist. I don’t need to build a transformer from scratch in NumPy. My goal is pragmatic:

I want to learn how to train, run, integrate, and deploy powerful AI tools in real-world environments.

Let’s break this down into the pillars of this journey.


1. Becoming an AI Developer: The New Craft

Today, being an AI developer means being part software engineer, part ML engineer, part systems architect, and part product thinker. The good news is, you don’t need a PhD. You need curiosity, hands-on time, and a practical mindset.

My focus isn’t on AI theory. It’s on:

  • Training or fine-tuning models (LLMs, vision models, etc.)

  • Running models locally and in the cloud

  • Implementing agents (think: multi-step reasoning systems or API-integrated workflows)

  • Integrating LLMs with tools like search engines, file systems, knowledge bases

  • Keeping everything secure, fast, and understandable

This is not about using AI to write a blog post. It’s about building the system that understands your files, fetches your emails, or answers your customer questions—and knowing exactly how it works.


2. Training Models: Fine-Tuning and Beyond

One of my early goals is learning how to take an open-source model (like Meta’s LLaMA or Mistral) and customize it. I’m not aiming for full-scale training on terabytes of data—but rather:

  • Fine-tuning a model on domain-specific content

  • Learning how to do parameter-efficient tuning (like LoRA, QLoRA)

  • Using datasets I care about (technical documents, customer support logs, etc.)

I plan to start by running these fine-tuning jobs offline or on AWS EC2 GPU instances, using Hugging Face's transformers, peft, and trl libraries. I'll try small models first (3B or 7B parameters) and work my way up to more complex tuning pipelines.
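To make that concrete, here's a minimal sketch of what a LoRA fine-tuning run might look like with transformers, peft, and trl. The model name, dataset path, and hyperparameters are placeholders rather than recommendations, and the trl API shifts between versions, so treat this as a starting point, not a recipe.

```python
# Minimal LoRA fine-tuning sketch (placeholder model, data, and hyperparameters).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# A small instruction dataset in JSONL with a "text" field per example.
dataset = load_dataset("json", data_files="my_docs.jsonl", split="train")

# LoRA: train small adapter matrices instead of the full model weights.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style models
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",    # any small open model that fits your VRAM
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="lora-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
)
trainer.train()
trainer.save_model("lora-out")            # saves only the LoRA adapter weights
```

The appeal of LoRA here is practical: the adapter weights are a few hundred megabytes instead of a full model copy, which is what makes this feasible on a single rented GPU.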

Why? Because having a model that understands your language, your products, and your workflows is the difference between a toy and a tool.


3. Implementing Agents: Orchestrating Reasoning + Tools

The next area I want to explore is agents. Not just chatbots, but smart, tool-using, context-aware systems.

For example:

  • A file assistant that can read and answer questions about local Markdown, PDF, or code files

  • A developer agent that can call APIs, search the web, and use Bash to automate tasks

  • A customer support AI that integrates with ticket systems and logs

These are powered by LLMs combined with memory, tool use, and retrieval over external data (the retrieval piece is what's known as RAG, or Retrieval-Augmented Generation).

As I explore this space, I want to understand what agents can actually do, where the hallucination and reliability issues are, and how to make them robust in production.
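To ground that, here's a library-agnostic sketch of the core agent loop: the model either answers or asks for a tool, we run the tool, and feed the result back. `call_llm` is a hypothetical stand-in for whatever backend I end up using (local llama.cpp, Bedrock, etc.), and the tools are toy examples.

```python
# A bare-bones agent loop sketch. call_llm() is a hypothetical stand-in for
# any chat backend; the tools here are toy examples.
import json
from pathlib import Path

def read_file(path: str) -> str:
    """Toy tool: return the contents of a local text file."""
    return Path(path).read_text()

TOOLS = {"read_file": read_file}

def call_llm(messages: list[dict]) -> str:
    """Placeholder: send messages to an LLM and return its reply.
    The system prompt would ask the model to answer in plain text or request
    a tool as JSON, e.g. {"tool": "read_file", "args": {"path": "notes.md"}}."""
    raise NotImplementedError

def run_agent(user_question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_question}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        try:
            call = json.loads(reply)           # model asked for a tool
        except json.JSONDecodeError:
            return reply                       # model gave a final answer
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Stopped after too many tool calls."
```

Frameworks wrap this loop in varying amounts of machinery, but the failure modes I care about (malformed tool calls, runaway loops, stale context) are all visible at this level.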


4. Running Models Offline

This one is personal: I want full local control. I don’t want every prompt or file to be uploaded to OpenAI or Anthropic.

So, I plan to run:

  • Quantized LLMs using llama.cpp, text-generation-webui, or koboldcpp (sketched at the end of this section)

  • Vision models and Stable Diffusion locally using ComfyUI

  • Agents that talk to local tools and use local embeddings

This lets me:

  • Experiment without latency or cost limits

  • Keep things private and airgapped

  • Build a stack that could run on edge devices or in secure environments

I already have a compact AMD-based mini PC. But for heavy lifting, I’ll rent EC2 GPU instances (e.g., g5.xlarge) to train and test models with full CUDA support.
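Here's roughly what the quantized-LLM option above looks like with the llama-cpp-python bindings (a minimal sketch; the GGUF path is a placeholder for whatever quantized model you've downloaded):

```python
# Minimal local inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF path is a placeholder for a quantized model you've downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_threads=8,       # tune for your CPU
)

out = llm(
    "Q: What does quantization do to a language model?\nA:",
    max_tokens=128,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```

Nothing leaves the machine: the prompt, the model, and the output all stay on local disk and RAM, which is exactly the point.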


5. Learning to Use the Cloud: AWS Bedrock and Beyond

While I love offline setups, I also want to master cloud-native AI development:

  • Using Amazon Bedrock to call hosted models like Claude or Titan (see the sketch after this list)

  • Deploying my own models with SageMaker JumpStart, ml.m5 instances, or even ECS with NVIDIA GPU support

  • Building LLM-powered APIs with Lambda + API Gateway + DynamoDB

  • Learning MLOps workflows: model tracking, deployment, versioning
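The Bedrock piece is the easiest to show up front. Here's a minimal sketch using boto3's bedrock-runtime client and the Converse API; the model ID is just an example and assumes that model is enabled in your account and region.

```python
# Minimal Amazon Bedrock call via boto3's Converse API.
# The model ID is an example; it must be enabled for your account/region.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Summarize what RAG is in two sentences."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```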

The cloud is where scale, availability, and security live. I want to understand:

  • Which models to use when

  • Cost vs performance tradeoffs

  • How to handle burst loads and real users

This means exploring Bedrock, SageMaker, EC2 Spot training jobs, and maybe even multi-region deployment patterns.


6. Comparing AWS EC2 AI-Capable Instances

Before investing in expensive local GPUs, I explored what AWS EC2 has to offer. These instances give you access to powerful NVIDIA GPUs with the full CUDA stack, so you can experiment with real-world training, inference, and deployment scenarios.

Here's a snapshot comparison of EC2 instances ideal for AI workloads:

Instance Type   GPU Model         VRAM     Price (On-Demand)   Price (Spot)   Notes
g5.xlarge       NVIDIA A10G       24 GB    ~$1.00/hr           ~$0.35/hr      Best balance of power, cost, and CUDA support for SDXL and LLMs
g4dn.xlarge     NVIDIA T4         16 GB    ~$0.60/hr           ~$0.20/hr      Lower-end option, fine for SD 1.5 or small LLMs
g6.xlarge       NVIDIA L4         24 GB    ~$1.10/hr           ~$0.40/hr      CUDA 12.2 support, fast and efficient
p3.2xlarge      NVIDIA V100       16 GB    ~$3.00/hr           ~$1.00/hr      Older, more expensive, still fast
p4d.24xlarge    NVIDIA A100 x8    320 GB   ~$32.00/hr          ~$9.00/hr      Extremely powerful, best for multi-GPU training

All of these instances let you:

  • Run full SDXL pipelines, even with ControlNet or LoRA tuning

  • Fine-tune LLMs with transformers, peft, or trl

  • Benchmark inference latency vs cost

  • Build production-like setups with Docker, FastAPI, or LangChain

Pro Tips:

  • Use Spot Instances when possible to save up to 70% (a minimal launch sketch follows these tips).

  • Store models in S3 and mount using EFS or copy to EBS for reuse.

  • Use Deep Learning AMIs or containers preinstalled with PyTorch + CUDA.

  • For ephemeral jobs, consider Auto Termination scripts to avoid cost surprises.
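As an example of the Spot tip, here's a minimal boto3 sketch that requests a g5.xlarge as a one-time Spot instance. The AMI ID and key pair name are placeholders; pick a Deep Learning AMI for your region.

```python
# Request a g5.xlarge as a one-time Spot instance with boto3.
# ImageId and KeyName are placeholders; use a Deep Learning AMI for your region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder Deep Learning AMI ID
    InstanceType="g5.xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",              # placeholder key pair
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
    BlockDeviceMappings=[
        {"DeviceName": "/dev/sda1",    # root device name depends on the AMI
         "Ebs": {"VolumeSize": 200, "VolumeType": "gp3"}},
    ],
)

print(resp["Instances"][0]["InstanceId"])
```

Remember that Spot capacity can be reclaimed with two minutes' notice, so checkpoint training runs to S3 or EBS as you go.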

This approach gives you hands-on experience with real NVIDIA GPUs, and it's perfect for trial runs before committing to a $2,000+ local rig of your own.


7. Skipping the Math (Mostly)

I’ve said it before: I don’t need to know how to implement backpropagation from scratch. I care about model behavior, training workflows, and integration patterns, not the underlying calculus.

That said, I’ll still learn the basics of:

  • Tokenization (and what it means for prompts; see the quick example after this list)

  • Attention mechanisms (at a conceptual level)

  • Embeddings and vector search

  • Fine-tuning configs and hyperparameters
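For tokenization in particular, the quickest way to build intuition is to count the tokens a prompt actually consumes (the model name here is just an example):

```python
# See tokenization in action: how many tokens does a prompt actually use?
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
prompt = "Summarize the attached Markdown file in three bullet points."
ids = tok.encode(prompt)

print(len(ids))                              # token count: what eats your context window
print(tok.convert_ids_to_tokens(ids)[:10])   # the first few pieces the model actually sees
```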

But if the choice is between reading a 60-page paper on gradient descent vs. getting a working RAG agent running in AWS… I’m picking the agent.


8. What’s Next

In the coming weeks and months, I’ll be sharing hands-on walkthroughs of what I’m building and learning:

  • Building your first local RAG agent with llama.cpp

  • Fine-tuning a 7B model on custom markdown docs

  • Deploying an LLM-powered customer support API in AWS Lambda

  • Using Amazon Bedrock with multiple foundation models in one app

I’ll post every success, failure, and bottleneck—because I know I’m not the only software engineer tired of prompt engineering and ready to get into real AI building.

If you're on the same journey—to build with AI, not just use it—follow along. This is going to be a practical, grounded, and developer-first look at how to make LLMs and agents actually work for you.
