Reference

Glossary

The terms every AI tutorial drops without warning. Skim once; come back when a word jumps out.

Models & generation

LLM: Large language model. A neural network trained on a huge corpus of text that predicts the next token given the previous ones. Claude, ChatGPT, Gemini, and Llama are all LLMs.
Token: The unit a model thinks in — roughly a word or part of one. English text is ~0.75 words per token. Models bill per million input + output tokens. read more →
Context window: How much input the model can hold at once, measured in tokens. 2026 frontier models handle 200K–2M tokens; useful for whole codebases or long PDFs.
Frontier model: The most capable model available — usually expensive, usually slow, usually the right tool for hard problems. Opus / GPT-large / Gemini Ultra at any given moment.
Temperature: A dial from 0 to ~2 that controls randomness. 0 = deterministic; higher = more creative / chaotic. 0.7 is a reasonable default for most chat tasks.
Streaming: The model emits tokens one at a time as it generates. That's why chat replies appear word-by-word rather than all at once.

Prompt: Everything you send to the model — your message plus any system prompt, history, attached files, and tool definitions. read more →
System prompt: Instructions that frame the whole conversation. Custom GPTs / Projects / Gems all save a system prompt for reuse. read more →
Few-shot: Including 2–3 worked examples inside the prompt so the model matches your pattern instead of guessing the format.
Chain-of-thought: Asking the model to 'reason out loud' before giving a final answer. Modern models do this by default; useful to make explicit for hard math / code review.
Prompt caching: When a tool sends the same prompt prefix (system + project files) repeatedly, the provider caches it and charges much less on subsequent calls. Saves real money. read more →
Hallucination: When the model confidently produces something that sounds plausible but is false. Wrong dates, invented APIs, made-up citations. Always verify facts. read more →

Agent: A loop where the model decides what to do next, picks a tool, observes the result, and decides again. Cursor / Claude Code / aider are agents over your codebase.
Tool use: When the model calls a function you defined — search the web, run code, query a database. The provider returns 'I want to call X with these args'; your code executes; you feed the result back.
MCP: Model Context Protocol — the open standard for connecting AI agents to external tools and data (filesystem, GitHub, Slack, Postgres, etc.) without a custom plugin per app. read more →
ReAct: Reason + Act loop pattern. The model alternates between thinking out loud and taking an action. Foundation pattern for most agents.
Autonomous agent: An agent that runs without human approval in the loop — submits the PR, posts the message, books the flight. Useful when the task is well-scoped; risky otherwise. read more →
Prompt injection: An attack where untrusted content the model reads (a webpage, a PDF, an email) contains instructions the model follows. The defence: scope tool access, require approval for destructive actions. read more →

Embedding: A high-dimensional vector that represents the meaning of a piece of text. Similar meanings = vectors close together. Used to find relevant snippets to feed back into a prompt.
Vector database: A database optimised for nearest-neighbour search over embeddings. Pinecone, Weaviate, Qdrant, pgvector, LanceDB.
RAG: Retrieval-Augmented Generation. Before answering, fetch relevant documents (often via embeddings) and stuff them into the prompt. The standard way to give a model access to your own data.
Chunking: Splitting documents into small overlapping pieces before embedding. Chunk too small and you lose context; too large and retrieval gets noisy. 500–1500 tokens is typical.
Knowledge file: A document attached to a Project / GPT / Gem that the model always sees in that conversation. The product-level wrapper around basic RAG. read more →

Multimodal: Models that take more than text — images, audio, video, screen captures — as input and/or output. read more →
Diffusion model: The architecture behind most image and video generators. Starts with noise and gradually denoises into a coherent image. Stable Diffusion, Flux, Midjourney, Sora.
Vision-language model: An LLM that also takes images as input. GPT, Claude, Gemini are all VLMs by default in their current versions.
TTS / STT: Text-to-speech / speech-to-text. ElevenLabs and OpenAI TTS are popular TTS; Whisper (also OpenAI) is the open STT baseline.

Pre-training: The expensive first step where the base model is trained on a huge text corpus to learn language. Costs millions; done by labs, not users.
Fine-tuning: Taking a pre-trained model and further training it on your specific examples. Useful for narrow consistent tasks; for most users, a good system prompt is enough.
RLHF: Reinforcement Learning from Human Feedback. The technique that makes raw language models behave as helpful assistants. Why the model says 'I can't help with that' for some requests.
Distillation: Training a smaller, cheaper model to mimic a larger one. How Haiku / GPT-mini / Gemini Flash get good while staying fast.

API key: A secret string (often starting with sk-) that authenticates your requests to a provider's API. Never commit one to git. read more →
Endpoint: The URL you POST to: /v1/messages (Anthropic), /v1/chat/completions (OpenAI), /v1/models/.../generateContent (Google).
Rate limit: The cap on how many requests / tokens you can send per minute. Hit it and the API returns 429. Higher tiers have higher limits.
Spend cap: A monthly budget you set in the provider dashboard. Hit it and further API calls fail until next month. Always set one. read more →