
You've got an AI agent running. It can chat, summarize, maybe even call a few APIs. But ask it to do something specific to your workflow, like pulling data from your internal inventory system or formatting output for your team's Slack channel, and it just stares back at you with a confident, completely wrong answer.
That's the gap AI agent skills are designed to fill. A skill is a modular, reusable capability you attach to an agent so it can actually do something useful beyond its base training. Think of it like a plugin, but for agents instead of browsers.
This guide covers what skills are, how to build them from scratch, which agents support them, and where to find pre-built ones. No filler. Just the stuff you need to start creating skills today.
In a nutshell, Agent Skills are a lightweight, open format for extending AI agent capabilities with specific domain knowledge and workflows. At its core, a skill is simply a folder containing a SKILL.md file. This file includes some metadata (at minimum, a name and a description) as well as instructions that tell the agent how to perform a specific task. Skills can also include scripts, templates, and reference materials. The following is an example of a skill directory:
my-skill/
├── SKILL.md # Required: instructions and metadata
├── scripts/ # Optional: executable code
├── references/ # Optional: documentation
└── assets/ # Optional: templates, resources
A skill is a self-contained unit of functionality that an AI agent can load and execute. It defines a specific capability, like "search a database," "generate a PDF report," or "send a formatted email." The agent doesn't need to be retrained. It just needs to know the skill exists and how to invoke it.
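To make that layout concrete, here is a minimal Python sketch that scaffolds the directory tree shown above. The function name `scaffold_skill` and the placeholder skeleton text are illustrative assumptions, not part of any official tooling. Only SKILL.md is required, so the three optional folders can be dropped if you don't need them.

```python
import tempfile
from pathlib import Path

# Placeholder SKILL.md: frontmatter with the two required fields, then a Markdown body.
SKELETON = """---
name: {name}
description: {description}
---
# {title}

[Add your instructions here]
"""


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/ with SKILL.md plus the optional scripts/, references/, assets/ folders."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    for sub in ("scripts", "references", "assets"):
        (skill_dir / sub).mkdir(exist_ok=True)
    title = name.replace("-", " ").title()  # "my-skill" -> "My Skill"
    (skill_dir / "SKILL.md").write_text(
        SKELETON.format(name=name, description=description, title=title),
        encoding="utf-8",
    )
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_skill(Path(tmp), "my-skill", "Does X. Use when the user asks for X.")
        print(sorted(p.name for p in created.iterdir()))
```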
Most skill implementations follow a common pattern. There's a manifest file that describes what the skill does, input/output schemas that define the data contract, and the actual execution logic. The agent reads the manifest, understands when to use the skill, and calls it with the right parameters.
If you've ever written a function and wished you could just hand it to an LLM and say "use this when appropriate," that's essentially what a skill enables. Except it's formalized, portable, and designed to work across multiple agent frameworks.
The most important part of that folder is the SKILL.md file, which contains the actual instructions. Below is a bare-minimum skeleton of a skill:
---
name: my-skill-name
description: A clear description of what this skill does and when to use it
---
# My Skill Name
[Add your instructions here that Claude will follow when this skill is active]
## Examples
- Example usage 1
- Example usage 2
## Guidelines
- Guideline 1
- Guideline 2
The terminology gets messy fast. "Tools" in LangChain, "plugins" in ChatGPT, "actions" in GPTs, "skills" in the agentic ecosystem. They all describe roughly the same concept: extending an agent's capabilities beyond what it was trained to do. The key difference with agentic skills is standardization.
A tool in LangChain is tightly coupled to LangChain's framework. A ChatGPT plugin only works inside ChatGPT. Skills, as defined by the emerging standard at agentskills.io, aim to be framework-agnostic. Write once, attach to any compatible agent. That's the pitch, and it's actually working in practice for a growing number of platforms.
It helps to understand what goes into a skill from the start. A SKILL.md file typically looks something like this:
---
name: pdf-processing
description: Extract PDF text, fill forms, merge files. Use when handling PDFs.
---
# PDF Processing
## When to use this skill
Use this skill when the user needs to work with PDF files...
## How to extract text
1. Use pdfplumber for text extraction...
## How to fill forms
...
The frontmatter at the top holds the name and description, both of which are required. The agent reads these first to learn what the skill is about. Later, if it needs to perform a related task and judges that the skill will help, it loads the full file into context for that task. The Markdown body below the frontmatter contains the instructions and has no restrictions on its structure. This format has a few advantages. First, it is self-documenting and easy for you or others to read. Second, it is extensible and can bundle executable code. Third, it is portable and easy to share and move around.
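To see how an agent might separate the metadata from the instructions, here is a minimal Python sketch of a SKILL.md loader. It assumes flat `key: value` frontmatter lines (a real loader would use a YAML parser), and `parse_skill_md` is a hypothetical name rather than any agent's actual API.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    # Find the closing '---' of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is not closed with '---'")
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # assumption: flat "key: value" pairs only, no nested YAML
            meta[key.strip()] = value.strip()
    # name and description are the two required fields.
    missing = [k for k in ("name", "description") if not meta.get(k)]
    if missing:
        raise ValueError("missing required frontmatter field(s): " + ", ".join(missing))
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SAMPLE = """---
name: pdf-processing
description: Extract PDF text, fill forms, merge files. Use when handling PDFs.
---
# PDF Processing
## When to use this skill
Use this skill when the user needs to work with PDF files...
"""

if __name__ == "__main__":
    meta, body = parse_skill_md(SAMPLE)
    print(meta["name"])
    print(body.splitlines()[0])
```

Rejecting a file that lacks a name or description up front mirrors the rule that those two fields are mandatory: without them, the agent has nothing to match a task against.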
Every skill has three core components, and getting them right is the difference between a skill that agents use reliably and one they ignore or misfire on.
The first component, the manifest, is the skill's resume. It tells the agent what the skill does, when to use it, and what kind of inputs it expects. A well-written manifest matters more than you might expect, because agents decide whether to invoke a skill based largely on the natural-language description in this file.
A vague description like "handles data processing" will get your skill ignored. Something like "retrieves the current inventory count for a specific SKU from the warehouse management system" gives the agent enough context to match it to the right user requests. Be obsessively specific here.
The second component, the input and output schemas, defines the data contract: what goes in and what comes out. Most skill frameworks use JSON Schema for this. The input schema tells the agent which parameters it needs to collect before calling the skill. The output schema tells it what to expect back.
Strong typing matters here more than you think. If your input schema accepts a loose "query" string when it really needs a structured object with a date range and a customer ID, the agent will hallucinate the missing fields. Define your schemas tightly and the agent will ask the user for the right information before executing.
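Here is a minimal Python sketch of that idea: a tight input schema for a hypothetical `get-customer-orders` skill, plus a checker that lists what's missing so the agent knows what to ask the user for. The checker is an assumption made for brevity and covers only a small subset of JSON Schema (`required`, `type`, `enum`, `additionalProperties`). In practice you would reach for a complete validator such as the `jsonschema` package.

```python
# Tight input schema (JSON Schema syntax) for a hypothetical "get-customer-orders" skill:
# explicit fields instead of one loose "query" string.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "start_date": {"type": "string", "format": "date"},  # "format" is not checked below
        "end_date": {"type": "string", "format": "date"},
        "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
    },
    "required": ["customer_id", "start_date", "end_date"],
    "additionalProperties": False,
}

# Only the JSON Schema types this sketch needs.
PY_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}


def check_input(args: dict, schema: dict = INPUT_SCHEMA) -> list[str]:
    """Return human-readable problems; an empty list means the skill can run."""
    problems = [f"missing required field: {k}" for k in schema.get("required", []) if k not in args]
    properties = schema.get("properties", {})
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected field: {key}")
            continue
        if not isinstance(value, PY_TYPES[spec["type"]]):
            problems.append(f"{key} must be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems
```

Called with a loose `{"query": "Acme orders last month"}`, the checker reports the three missing fields and the unexpected `query` key. That is exactly the information an agent needs to ask a follow-up question instead of guessing.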
It also helps to add negative examples: explicit instructions on what the skill should NOT do. These act as guardrails, so the agent uses the skill only for the specific tasks you intend, and nothing else.
The third component, the execution logic, is the actual code that runs when the skill is invoked. It can be a simple function, an API call, a database query, or a multi-step workflow. The execution logic receives the validated input, does its work, and returns data matching the output schema.
Keep it stateless when possible. Skills that depend on external state become fragile and hard to debug. If you need state, manage it explicitly through the input/output contract rather than relying on side effects.
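One way to make state explicit is to pass it through the contract. In this hypothetical `list_tickets` sketch, the skill keeps no memory between calls. Pagination progress travels as a `cursor` in the input and a `next_cursor` in the output, so the same input always produces the same result, and the `TICKETS` list stands in for a real ticketing API.

```python
from typing import Any, Optional

# Hypothetical data source; a real skill would query a ticketing API.
TICKETS = [f"TICKET-{n}" for n in range(1, 8)]


def list_tickets(args: dict[str, Any]) -> dict[str, Any]:
    """Return one page of tickets. Pagination state lives in the input/output, not the skill."""
    cursor: int = args.get("cursor", 0)
    page_size: int = args.get("page_size", 3)
    page = TICKETS[cursor:cursor + page_size]
    next_cursor: Optional[int] = cursor + page_size
    if next_cursor >= len(TICKETS):
        next_cursor = None  # no more pages
    return {"tickets": page, "next_cursor": next_cursor}


if __name__ == "__main__":
    args: dict[str, Any] = {}
    while True:
        result = list_tickets(args)
        print(result["tickets"])
        if result["next_cursor"] is None:
            break
        args = {"cursor": result["next_cursor"]}  # the caller carries the state forward
```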
Here is what happens behind the scenes:
Discovery: When the chat session starts, the agent scans its default skill directories and finds your skill. It reads only the name and description, just enough to know when the skill might be relevant for a task.
Activation: When performing a task, the agent checks whether the skill's description is relevant to it and, if so, loads the full SKILL.md body into context.
Execution: The agent follows the instructions in the body, which guide it through the current task exactly as you specified in the skill file.
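The three phases can be sketched in Python, assuming a hypothetical layout of one subfolder per skill, each containing a SKILL.md. Discovery reads only each file's frontmatter, and activation loads the full body on demand. The `select_skill` function is a crude word-overlap stand-in for the relevance judgment that a real agent's model makes; it is not how any particular agent actually chooses skills.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def read_frontmatter(path: Path) -> dict[str, str]:
    """Discovery helper: read only the frontmatter lines, never the (possibly long) body."""
    meta: dict[str, str] = {}
    with path.open(encoding="utf-8") as f:
        if f.readline().strip() != "---":
            return meta
        for line in f:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def discover(skill_root: Path) -> dict[str, dict]:
    """Discovery: index each <skill_root>/<dir>/SKILL.md by name, keeping only metadata."""
    index = {}
    for skill_md in sorted(skill_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if meta.get("name") and meta.get("description"):
            index[meta["name"]] = {"description": meta["description"], "path": skill_md}
    return index


def select_skill(index: dict[str, dict], task: str) -> Optional[str]:
    """Stand-in for the model's relevance check: pick the description sharing the most words."""
    task_words = set(re.findall(r"[a-z]+", task.lower()))
    best, best_score = None, 0
    for name, entry in index.items():
        score = len(task_words & set(re.findall(r"[a-z]+", entry["description"].lower())))
        if score > best_score:
            best, best_score = name, score
    return best


def activate(index: dict[str, dict], name: str) -> str:
    """Activation: load the full SKILL.md and return the body for the agent's context."""
    lines = index[name]["path"].read_text(encoding="utf-8").splitlines()
    end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    return "\n".join(lines[end + 1:]).strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / "pdf-processing"
        skill.mkdir()
        (skill / "SKILL.md").write_text(
            "---\nname: pdf-processing\ndescription: Extract PDF text, fill forms, merge files.\n---\n# PDF Processing\n",
            encoding="utf-8",
        )
        index = discover(Path(tmp))  # Discovery: metadata only
        chosen = select_skill(index, "Extract the text from this PDF")  # Activation decision
        if chosen:
            print(activate(index, chosen))  # Execution: the body goes into the agent's context
```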
This is the part most guides bury under theory. Here's the practical walkthrough for creating skills that actually work. The process applies regardless of which agent framework you're targeting, though the specific file formats may vary slightly.
But be careful: don't load skills indiscriminately. Every skill costs tokens, since its name and description sit in context from the start of the session and an activated skill adds its full body on top. It is more efficient to add only the skills you know the agent needs; the rest are just a waste of tokens.
Start with a single, clear sentence describing what the skill does. Not what it could do or might do. What it does. "This skill fetches the current weather for a given city using the OpenWeatherMap API." If your description needs an "and" in it, you probably need two skills.
Scope creep kills skills the same way it kills features. An agent can chain multiple focused skills together far more reliably than it can navigate one bloated skill with fifteen parameters and conditional logic branches.
Map out every piece of data the skill needs to function. For a weather skill, that might be a city name (required) and a unit preference (optional, defaulting to Celsius). Then define what comes back: temperature, conditions, humidity, wind speed.
Write the JSON Schema before writing any code. This forces you to think through edge cases early. What happens if the city name is misspelled? What if the API returns no data? Define error responses in your output schema so the agent can communicate failures gracefully instead of crashing silently.
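Here's what those schemas might look like for the weather example, written as Python dicts in JSON Schema syntax. The field names, the `celsius`/`fahrenheit` enum, and the error codes are illustrative assumptions. The `oneOf` lets the output be either a weather report or a structured error, so a misspelled city or an empty API response has a defined shape.

```python
import json

# Input: city is required; units is optional and defaults to Celsius.
WEATHER_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "minLength": 1, "description": "City name, e.g. 'Lisbon'"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"},
    },
    "required": ["city"],
    "additionalProperties": False,
}

# Output: either a weather report or a structured error, never a bare stack trace.
WEATHER_OUTPUT_SCHEMA = {
    "oneOf": [
        {
            "type": "object",
            "properties": {
                "temperature": {"type": "number"},
                "conditions": {"type": "string"},
                "humidity": {"type": "number", "description": "percent"},
                "wind_speed": {"type": "number"},
            },
            "required": ["temperature", "conditions", "humidity", "wind_speed"],
        },
        {
            "type": "object",
            "properties": {
                "error": {"type": "string", "enum": ["city_not_found", "upstream_unavailable", "invalid_input"]},
                "message": {"type": "string"},
            },
            "required": ["error", "message"],
        },
    ]
}

if __name__ == "__main__":
    print(json.dumps(WEATHER_INPUT_SCHEMA, indent=2))
```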
The manifest ties everything together. Here's a simplified breakdown of the fields one typically contains:
Name: A unique identifier like get-current-weather
Description: A natural language explanation the agent uses to decide when to invoke the skill. Make this detailed and specific. Include example phrases a user might say that should trigger it.
Input Schema: The JSON Schema for required and optional parameters
Output Schema: The JSON Schema for the response format
Auth: Any API keys or credentials the skill needs, referenced securely
The description field deserves extra attention. Spend more time on it than you think is reasonable. Agents are pattern-matching on this text to decide whether your skill is the right one for a given request. Ambiguity here leads to the skill being invoked at the wrong time or not at all.
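Here is a sketch of such a manifest for the weather skill, written as a Python dict. The field names (`input_schema`, `output_schema`, `auth`, `api_key_env`) and the environment variable name are assumptions, since each framework defines its own manifest format. Note that the description includes trigger phrases and a negative example, and that the auth entry names a credential rather than embedding it.

```python
import json
import os

# Hypothetical manifest layout mirroring the fields listed above.
MANIFEST = {
    "name": "get-current-weather",
    "description": (
        "Fetches the current temperature, conditions, humidity, and wind speed for a "
        "single named city from the OpenWeatherMap API. Use when the user asks about "
        "current weather, e.g. 'What's the weather in Lisbon?' or 'Is it raining in "
        "Osaka right now?'. Do NOT use for forecasts or historical weather."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
    "output_schema": {"type": "object"},  # abbreviated here
    # Reference the credential by name; never put the key itself in the manifest.
    "auth": {"api_key_env": "OPENWEATHERMAP_API_KEY"},
}


def resolve_api_key(manifest: dict) -> str:
    """Look up the secret at runtime from the environment variable the manifest names."""
    var = manifest["auth"]["api_key_env"]
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"set the {var} environment variable")
    return key


if __name__ == "__main__":
    print(json.dumps(MANIFEST, indent=2))
```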
Now write the code. Most skill frameworks support Python or JavaScript, though the specific runtime depends on your target platform. The implementation should be straightforward: receive validated input, call whatever external service or logic is needed, format the response to match the output schema, return it.
Error handling is non-negotiable. Wrap external API calls in try/catch blocks. Return structured error objects, not raw stack traces. The agent needs to understand what went wrong so it can either retry, ask the user for different input, or explain the failure clearly.
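Here's a minimal sketch of the weather skill's execution logic using only Python's standard library. It calls OpenWeatherMap's current-weather endpoint and maps failures to structured error objects instead of letting exceptions escape. The `fetch` parameter, the error codes, and the `OPENWEATHERMAP_API_KEY` environment variable are assumptions, and `fetch` is injectable so the function can be exercised without network access.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable

API_URL = "https://api.openweathermap.org/data/2.5/weather"


def http_get_json(url: str) -> Any:
    """Default fetcher: GET the URL and decode the JSON response."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def error(code: str, message: str) -> dict[str, str]:
    """Structured error the agent can relay, retry on, or use to re-ask the user."""
    return {"error": code, "message": message}


def get_current_weather(args: dict[str, Any],
                        fetch: Callable[[str], Any] = http_get_json) -> dict[str, Any]:
    city = str(args.get("city") or "").strip()
    if not city:
        return error("invalid_input", "A city name is required.")
    units = args.get("units", "celsius")
    if units not in ("celsius", "fahrenheit"):
        return error("invalid_input", "units must be 'celsius' or 'fahrenheit'.")
    query = urllib.parse.urlencode({
        "q": city,
        "units": "metric" if units == "celsius" else "imperial",
        "appid": os.environ.get("OPENWEATHERMAP_API_KEY", ""),
    })
    try:
        data = fetch(f"{API_URL}?{query}")
    except urllib.error.HTTPError as exc:  # must precede OSError: HTTPError is a subclass
        if exc.code == 404:
            return error("city_not_found", f"No weather data found for '{city}'.")
        return error("upstream_unavailable", f"Weather service returned HTTP {exc.code}.")
    except (OSError, ValueError) as exc:  # network failures, timeouts, invalid JSON
        return error("upstream_unavailable", f"Could not reach the weather service: {exc}")
    try:
        return {
            "temperature": data["main"]["temp"],
            "conditions": data["weather"][0]["description"],
            "humidity": data["main"]["humidity"],
            "wind_speed": data["wind"]["speed"],
        }
    except (KeyError, IndexError, TypeError):
        return error("upstream_unavailable", "Unexpected response format from the weather service.")
```

Injecting `fetch` also pays off when testing in isolation: you can pass a fake that returns canned data or raises errors, without touching the network.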
This step gets skipped constantly and it's the source of most debugging headaches. Run your skill as a standalone function before plugging it into any agent framework. Feed it valid inputs, invalid inputs, edge cases, empty strings, missing fields. Make sure the execution logic and schemas are solid before adding the complexity of an agent runtime.
Only after the skill works perfectly in isolation should you connect it to an agent and test the full loop: user request, agent interpretation, skill selection, execution, response formatting.
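As an illustration of testing in isolation, here's a hypothetical inventory skill (echoing the SKU example from earlier) with a standard-library `unittest` suite covering valid input, an unknown SKU, empty strings, whitespace, and a missing field. The in-memory `INVENTORY` dict is an assumption that stands in for a real warehouse management system.

```python
import unittest

# Hypothetical in-memory stand-in for the warehouse management system.
INVENTORY = {"SKU-1001": 42, "SKU-2002": 0}


def get_inventory_count(args: dict) -> dict:
    """Return the stock count for args["sku"], or a structured error."""
    sku = args.get("sku")
    if not isinstance(sku, str) or not sku.strip():
        return {"error": "invalid_input", "message": "A non-empty 'sku' string is required."}
    sku = sku.strip().upper()
    if sku not in INVENTORY:
        return {"error": "not_found", "message": f"Unknown SKU: {sku}"}
    return {"sku": sku, "count": INVENTORY[sku]}


class InventorySkillTest(unittest.TestCase):
    def test_valid_sku(self):
        self.assertEqual(get_inventory_count({"sku": "SKU-1001"}), {"sku": "SKU-1001", "count": 42})

    def test_zero_stock_is_not_an_error(self):
        self.assertEqual(get_inventory_count({"sku": "sku-2002"})["count"], 0)

    def test_unknown_sku(self):
        self.assertEqual(get_inventory_count({"sku": "SKU-9999"})["error"], "not_found")

    def test_empty_and_missing_input(self):
        for args in ({"sku": ""}, {"sku": "   "}, {}, {"sku": None}):
            self.assertEqual(get_inventory_count(args)["error"], "invalid_input")
```

Save it as, say, `test_inventory_skill.py` and run `python -m unittest test_inventory_skill`. Only once these pass is it worth wiring the skill into an agent.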
Deployment varies by platform. Some agent frameworks let you register skills by pointing to a local directory. Others require uploading to a skill registry or hosting the skill as an API endpoint. The agentskills.io ecosystem is pushing toward a standard registry where you can publish and discover skills, similar to npm for Node packages.
Once deployed, monitor how the agent uses your skill in real conversations. You'll almost certainly need to refine the manifest description after watching real usage patterns. The first version of your description is never the best one.
Skills can be trivially simple or deeply complex. Here are real-world examples across the spectrum to give you a sense of what's possible.
A unit converter skill takes a value, source unit, and target unit, and returns the converted value. No external APIs are needed, just math: about 30 lines of code plus the manifest. It sounds boring, but agents without this skill will approximate conversions and get them wrong with alarming confidence.
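A sketch of that unit converter in Python, assuming a small hypothetical table of length and mass units and the argument names `value`, `from_unit`, and `to_unit`. Temperature is deliberately left out because it needs offsets, not just multiplication factors. The factors themselves are the exact international definitions (for example, 1 in = 0.0254 m and 1 lb = 0.45359237 kg).

```python
from typing import Any, Optional

# Exact factors to a base unit per dimension (meters for length, kilograms for mass).
FACTORS = {
    "length": {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0,
               "in": 0.0254, "ft": 0.3048, "mi": 1609.344},
    "mass": {"g": 0.001, "kg": 1.0, "oz": 0.028349523125, "lb": 0.45359237},
}


def dimension_of(unit: Any) -> Optional[str]:
    """Return the dimension a unit belongs to, or None if it is unsupported."""
    if not isinstance(unit, str):
        return None
    for dim, table in FACTORS.items():
        if unit in table:
            return dim
    return None


def convert_units(args: dict) -> dict:
    """Convert args["value"] from args["from_unit"] to args["to_unit"]."""
    value, src, dst = args.get("value"), args.get("from_unit"), args.get("to_unit")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return {"error": "invalid_input", "message": "'value' must be a number."}
    src_dim, dst_dim = dimension_of(src), dimension_of(dst)
    if src_dim is None or dst_dim is None:
        bad = src if src_dim is None else dst
        return {"error": "unknown_unit", "message": f"Unsupported unit: {bad}"}
    if src_dim != dst_dim:
        return {"error": "incompatible_units", "message": f"Cannot convert {src_dim} to {dst_dim}."}
    # Go through the base unit: source -> base -> target.
    return {"value": value * FACTORS[src_dim][src] / FACTORS[dst_dim][dst], "unit": dst}
```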
A CRM lookup skill takes a customer name or ID, queries your CRM's API, and returns account details, recent interactions, and open tickets. This is where agentic skills start showing real business value. An agent with this skill can answer "What's the status of Acme Corp's last support ticket?" without the user leaving the chat interface.
A report-generation skill takes a date range and report type, queries multiple internal databases, aggregates the data, generates formatted charts, and returns a downloadable PDF. This is really multiple skills chained together, which is the right architecture: one skill for data retrieval, one for aggregation, one for chart generation, and one for PDF rendering. The agent orchestrates the chain.
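The chaining idea can be sketched in a few lines of Python. Each function below is a hypothetical, single-purpose skill that takes a dict and returns a dict. The sample data, the text bar chart, and the plain-text report are stand-ins for real database queries, charting, and PDF rendering. `run_chain` uses a fixed order for simplicity, whereas in practice the agent decides which skill to call next.

```python
from collections import defaultdict
from typing import Any, Callable

Skill = Callable[[dict[str, Any]], dict[str, Any]]

# Hypothetical sample rows: (date, region, amount).
SALES = [("2024-05-01", "east", 120), ("2024-05-01", "west", 80), ("2024-05-02", "east", 100)]


def retrieve_sales(args: dict[str, Any]) -> dict[str, Any]:
    """Data retrieval skill (stand-in for querying internal databases)."""
    rows = [r for r in SALES if args["start_date"] <= r[0] <= args["end_date"]]
    if not rows:
        return {"error": "no_data", "message": "No sales in that date range."}
    return {"rows": rows}


def aggregate_by_region(args: dict[str, Any]) -> dict[str, Any]:
    """Aggregation skill: total amount per region, sorted by region name."""
    totals = defaultdict(int)
    for _date, region, amount in args["rows"]:
        totals[region] += amount
    return {"totals": dict(sorted(totals.items()))}


def render_chart(args: dict[str, Any]) -> dict[str, Any]:
    """Chart skill: a text bar chart (one '#' per 20 units) stands in for a charting library."""
    lines = [f"{region:<6}{'#' * (amount // 20)} {amount}" for region, amount in args["totals"].items()]
    return {"chart": "\n".join(lines)}


def render_report(args: dict[str, Any]) -> dict[str, Any]:
    """Rendering skill: plain text stands in for PDF output."""
    return {"report": "Sales report\n\n" + args["chart"]}


def run_chain(steps: list[Skill], args: dict[str, Any]) -> dict[str, Any]:
    """Feed each skill's output into the next, stopping at the first structured error."""
    for step in steps:
        args = step(args)
        if "error" in args:
            return args
    return args
```

Because every step returns either its result or a structured error, the chain can stop cleanly (for example, on an empty date range) and hand the agent a message it can relay to the user.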
A meeting-notes skill takes a meeting transcript (or a link to a recording), extracts key decisions, action items, and owners, then formats everything into your team's standard meeting notes template. The skill handles the extraction logic while the agent handles the conversation around clarifications and follow-ups.
Not every agent framework supports external skills yet, and the ones that do implement them differently. Here's the current state of play as the ecosystem evolves.
Frameworks that have built-in skill or tool support include LangChain, LlamaIndex, AutoGPT, CrewAI, and Microsoft's Semantic Kernel. Each has its own way of defining and registering external capabilities, but the core concept is the same. You define what the skill does, how to call it, and the framework handles the orchestration.
The agentskills.io initiative is working to create a universal standard so that a skill written once can be loaded into any compatible agent. Think of it like how Docker containers run on any container runtime. The standard is still maturing, but early adoption is growing fast, especially among open-source agent projects.
Proprietary platforms like OpenAI's Assistants API and Anthropic's tool use feature have their own skill-like mechanisms. They're not directly compatible with the open standard yet, but the concepts map cleanly. If you write a well-structured skill for one platform, porting it to another is mostly a formatting exercise, not a rewrite.
If building from scratch sounds like overkill for your use case, check agentskills.io first. It documents the skill format used by Codex, Gemini, and other major AI coding tools, and it's becoming the central directory for discovering, sharing, and installing pre-built AI agent skills. Think of it as a package registry specifically for agent capabilities.
The site explains what skills are, hosts a growing catalog of community-contributed skills, and provides documentation for the standard skill format. If you're evaluating whether to build or use an existing skill, this is the first place to look.
For skill authors, publishing to agentskills.io gets your work in front of the entire agent developer community. The submission process is straightforward, and having a centralized registry means less time explaining to users how to install your skill manually.
After watching dozens of developers create their first skill, certain patterns keep showing up. Avoid these mistakes and you'll save hours of debugging.
Vague descriptions were mentioned earlier, but they're worth repeating because they're the number one issue. "Processes data" tells the agent nothing. "Calculates the 30-day rolling average of daily active users from the analytics database and returns a time series with date and value pairs" tells it everything. The more specific your description, the more accurately the agent will invoke your skill.
A skill that does seven things is seven skills crammed into one. Agents work better with focused, single-purpose skills they can compose together. If you find yourself adding conditional branches based on an input "mode" parameter, split it up.
Accepting a generic "data" object as input is asking for trouble. The agent will send whatever it thinks fits, and it will think wrong. Define every field explicitly. Use enums for constrained values. Set required fields. The tighter the schema, the more reliable the execution.
When a skill fails silently or throws an unstructured error, the agent has no idea what happened. It might retry infinitely, hallucinate a response, or just go quiet. Always return structured error responses with clear messages the agent can relay to the user.
The skill ecosystem is still early. Right now it feels a bit like npm in 2012: lots of potential, inconsistent quality, and competing standards that haven't fully shaken out yet. But the direction is clear.
As more agent frameworks converge on a shared skill standard, the network effects will accelerate. A skill written for one agent becomes usable by every agent. Developers stop rebuilding the same integrations for every new framework. Companies can maintain a library of internal skills that survive framework migrations.
The agents that win won't be the ones with the biggest models. They'll be the ones with the richest skill ecosystems. And that ecosystem gets built by developers who start creating skills now, while the standards are still being shaped and the registry is still small enough that good contributions get noticed.
Pick a workflow you're tired of doing manually. Turn it into a skill. Publish it. That's genuinely the best way to understand where this is all going.
