Reference
Glossary
The terms every AI tutorial drops without warning. Skim once; come back when a word jumps out.
Models & generation
- LLM
- Large language model. A neural network trained on a huge corpus of text that predicts the next token given the previous ones. Claude, ChatGPT, Gemini, and Llama are all LLMs.
- Token
- The unit a model thinks in — roughly a word or part of one. English text is ~0.75 words per token. Models bill per million input + output tokens. read more →
- Context window
- How much input the model can hold at once, measured in tokens. 2026 frontier models handle 200K–2M tokens; useful for whole codebases or long PDFs.
- Frontier model
- The most capable model available — usually expensive, usually slow, usually the right tool for hard problems. Opus / GPT-large / Gemini Ultra at any given moment.
- Temperature
- A dial from 0 to ~2 that controls randomness. 0 = deterministic; higher = more creative / chaotic. 0.7 is a reasonable default for most chat tasks.
- Streaming
- The model emits tokens one at a time as it generates. That's why chat replies appear word-by-word rather than all at once.
Prompts & context
- Prompt
- Everything you send to the model — your message plus any system prompt, history, attached files, and tool definitions. read more →
- System prompt
- Instructions that frame the whole conversation. Custom GPTs / Projects / Gems all save a system prompt for reuse. read more →
- Few-shot
- Including 2–3 worked examples inside the prompt so the model matches your pattern instead of guessing the format.
- Chain-of-thought
- Asking the model to 'reason out loud' before giving a final answer. Modern models do this by default; useful to make explicit for hard math / code review.
- Prompt caching
- When a tool sends the same prompt prefix (system + project files) repeatedly, the provider caches it and charges much less on subsequent calls. Saves real money. read more →
- Hallucination
- When the model confidently produces something that sounds plausible but is false. Wrong dates, invented APIs, made-up citations. Always verify facts. read more →
Agents & tools
- Agent
- A loop where the model decides what to do next, picks a tool, observes the result, and decides again. Cursor / Claude Code / aider are agents over your codebase.
- Tool use
- When the model calls a function you defined — search the web, run code, query a database. The provider returns 'I want to call X with these args'; your code executes; you feed the result back.
- MCP
- Model Context Protocol — the open standard for connecting AI agents to external tools and data (filesystem, GitHub, Slack, Postgres, etc.) without a custom plugin per app. read more →
- ReAct
- Reason + Act loop pattern. The model alternates between thinking out loud and taking an action. Foundation pattern for most agents.
- Autonomous agent
- An agent that runs without human approval in the loop — submits the PR, posts the message, books the flight. Useful when the task is well-scoped; risky otherwise. read more →
- Prompt injection
- An attack where untrusted content the model reads (a webpage, a PDF, an email) contains instructions the model follows. The defence: scope tool access, require approval for destructive actions. read more →
Data & retrieval
- Embedding
- A high-dimensional vector that represents the meaning of a piece of text. Similar meanings = vectors close together. Used to find relevant snippets to feed back into a prompt.
- Vector database
- A database optimised for nearest-neighbour search over embeddings. Pinecone, Weaviate, Qdrant, pgvector, LanceDB.
- RAG
- Retrieval-Augmented Generation. Before answering, fetch relevant documents (often via embeddings) and stuff them into the prompt. The standard way to give a model access to your own data.
- Chunking
- Splitting documents into small overlapping pieces before embedding. Chunk too small and you lose context; too large and retrieval gets noisy. 500–1500 tokens is typical.
- Knowledge file
- A document attached to a Project / GPT / Gem that the model always sees in that conversation. The product-level wrapper around basic RAG. read more →
Multimodal
- Multimodal
- Models that take more than text — images, audio, video, screen captures — as input and/or output. read more →
- Diffusion model
- The architecture behind most image and video generators. Starts with noise and gradually denoises into a coherent image. Stable Diffusion, Flux, Midjourney, Sora.
- Vision-language model
- An LLM that also takes images as input. GPT, Claude, Gemini are all VLMs by default in their current versions.
- TTS / STT
- Text-to-speech / speech-to-text. ElevenLabs and OpenAI TTS are popular TTS; Whisper (also OpenAI) is the open STT baseline.
Training & tuning
- Pre-training
- The expensive first step where the base model is trained on a huge text corpus to learn language. Costs millions; done by labs, not users.
- Fine-tuning
- Taking a pre-trained model and further training it on your specific examples. Useful for narrow consistent tasks; for most users, a good system prompt is enough.
- RLHF
- Reinforcement Learning from Human Feedback. The technique that makes raw language models behave as helpful assistants. Why the model says 'I can't help with that' for some requests.
- Distillation
- Training a smaller, cheaper model to mimic a larger one. How Haiku / GPT-mini / Gemini Flash get good while staying fast.
Plumbing
- API key
- A secret string (often starting with sk-) that authenticates your requests to a provider's API. Never commit one to git. read more →
- Endpoint
- The URL you POST to: /v1/messages (Anthropic), /v1/chat/completions (OpenAI), /v1/models/.../generateContent (Google).
- Rate limit
- The cap on how many requests / tokens you can send per minute. Hit it and the API returns 429. Higher tiers have higher limits.
- Spend cap
- A monthly budget you set in the provider dashboard. Hit it and further API calls fail until next month. Always set one. read more →