
You're debugging a model comparison and need to know exactly which OpenAI models exist. Not just the famous ones. All of them. From the research paper that started it all to the reasoning models dropping this week.
Here's the complete OpenAI LLM family lineage, mapped chronologically with release dates and key capabilities.
GPT-1 (June 2018)
117 million parameters. The proof of concept that transformer-based language modeling could work at scale. Trained on BookCorpus dataset. No chat interface, just text completion.
GPT-2 (February 2019)
1.5 billion parameters. OpenAI initially held back the full model, claiming it was too dangerous to release. Four variants: 124M, 355M, 774M, and the full 1.5B version released later in November 2019.
GPT-3 (June 2020)
175 billion parameters. The model that made everyone pay attention. Multiple sizes in the family:
GPT-3 Small (125M parameters)
GPT-3 Medium (350M parameters)
GPT-3 Large (760M parameters)
GPT-3 XL (1.3B parameters)
GPT-3 (6.7B parameters)
GPT-3 (175B parameters)
Initially available only through API. No fine-tuning, no chat optimization. Pure text completion.
InstructGPT (January 2022)
The first major step toward following human instructions. Three model sizes: 1.3B, 6B, and 175B parameters. Trained using reinforcement learning from human feedback (RLHF). This became the foundation for everything that followed.
text-davinci-002 (April 2022)
The first GPT-3.5 series model. Based on InstructGPT but with improved instruction following. Code-capable. This was the model that could actually help you write functions instead of just completing them.
text-davinci-003 (November 2022)
Refined version of text-davinci-002. Better at following complex instructions, improved factual accuracy, less likely to make up information. The last great completion model before chat took over.
gpt-3.5-turbo (March 2023)
The model that changed everything. Optimized for chat, 10x cheaper than text-davinci-003, and fast enough for real-time applications. 4,096 token context window initially.
gpt-3.5-turbo-16k (June 2023)
Extended context version. 16,384 token context window. Same underlying model, four times the memory.
gpt-3.5-turbo-instruct (September 2023)
Hybrid approach. Chat-optimized model that still supported completion-style prompts. Filled the gap for developers who needed both modes.
gpt-3.5-turbo-1106 (November 2023)
Updated training data with knowledge cutoff of April 2023. Improved instruction following and JSON mode support.
gpt-3.5-turbo-0125 (February 2024)
Latest GPT-3.5 turbo variant. Reduced cost, improved performance on specific tasks, better at following system instructions.
gpt-4 (March 2023)
The flagship. Significantly more capable than GPT-3.5, better reasoning, more reliable, safer responses. 8,192 token context initially. Vision capabilities existed but weren't publicly available yet.
gpt-4-32k (March 2023)
Extended context version with 32,768 token context window. Limited access initially, broader availability later.
gpt-4-vision-preview / gpt-4v (September 2023)
First public multimodal model. Could analyze images, read charts, describe photos. The model that made screenshots searchable.
gpt-4-1106-preview (November 2023)
128k context window. JSON mode, function calling improvements, updated knowledge through April 2023. The model developers had been waiting for.
gpt-4-turbo-preview (January 2024)
Cost reduction and performance optimization. Better at complex reasoning tasks, more efficient token usage.
gpt-4-turbo (April 2024)
Production-ready version of gpt-4-turbo-preview. Vision capabilities built in, not a separate model. 128k context standard.
gpt-4o (May 2024)
The omnimodal model. Text, vision, and audio in a single model. Faster than GPT-4 turbo, cheaper, better at non-English languages. The "o" stands for "omni."
gpt-4o-mini (July 2024)
Smaller, faster, cheaper version of GPT-4o. Designed to replace GPT-3.5 turbo as the default for most applications. 128k context, multimodal capabilities.
o1-preview (September 2024)
The first reasoning model. Trained to think through problems step by step. Excels at math, coding, and science. Slower responses but significantly better at complex reasoning tasks.
o1-mini (September 2024)
Faster, cheaper reasoning model. Optimized for coding and math. 80% of o1-preview's performance at 20% of the cost for most coding tasks.
o1 (December 2024)
Production version of o1-preview. Improved reasoning capabilities, faster responses, better at following instructions while maintaining the deep thinking approach.
o1 Pro (December 2024)
Enhanced reasoning model with more compute per query. Better at the hardest problems in math, science, and coding. Limited availability through ChatGPT Pro tier.
o3-mini (December 2024)
Latest reasoning model, successor to o1-mini. Improved efficiency and accuracy for coding and mathematical reasoning tasks.
Several models from the early OpenAI LLM family are no longer available:
text-ada-001, text-babbage-001, text-curie-001 - Early GPT-3 variants with different parameter counts
davinci, curie, babbage, ada - Original GPT-3 base models
code-davinci-002 - Specialized coding model, predecessor to broader code capabilities in later models
text-embedding-ada-002 - While technically not a chat model, part of the broader model family ecosystem
These models were phased out as newer, more capable versions replaced them. Most were deprecated between 2023 and 2024.
The pattern is clear. Each generation brings major capability jumps: better reasoning, longer context, multimodal features, or specialized skills like the o1 series reasoning capabilities.
GPT-5 development is ongoing. Based on the trajectory from GPT-3 to GPT-4, expect significant improvements in reasoning, knowledge integration, and possibly new modalities beyond text, vision, and audio.
The complete lineage of OpenAI LLMs shows rapid iteration and capability expansion. From 117 million parameter text completion in 2018 to multimodal reasoning models in 2024. Worth tracking these releases if you're building anything that depends on staying current with AI capabilities.

AI LLM context windows can hold millions of tokens, yet bigger isn't always better. Examine the trade-offs and surprises here.

Discover the crucial differences between tokens, characters, and words in large language models. Understand how they impact LLM outputs.
AI LLM context windows can hold millions of tokens, yet bigger isn't always better. Examine the trade-offs and surprises here.
Discover the crucial differences between tokens, characters, and words in large language models. Understand how they impact LLM outputs.
Explore the top AI coding assistants like Cursor and GitHub Copilot, designed to transform your coding workflow.

Explore the top AI coding assistants like Cursor and GitHub Copilot, designed to transform your coding workflow.