Building a Testing Agent with OpenCode (Opinionated Walkthrough)
Okay so this is basically my messy notes from experimenting with OpenCode for test generation. I'm not saying this is the way to do it, it's just what I tried and what kinda worked for me.
Do I think AI should write all your tests? Nah. Do I trust it blindly? Hell nah. But I was curious: could OpenCode actually generate decent tests if I gave it better context?
The problem I was dealing with
So you're already using OpenCode for development purposes. Features built, bugs gone, refactoring happens. Cool. But the unit tests? Kinda all over the place.
Your codebase has okay coverage, but it's inconsistent af. Some components have solid tests, others have literally nothing. When AI generates tests, it's sometimes genius, sometimes testing stuff that doesn't even exist.
The real issue? Lack of specialized context for testing tasks.
Here's how I structured an OpenCode project specifically for testing workflows using the skills system. This isn't a copy-paste tutorial. It's more like a framework for thinking about how to organize your own setup based on what you or your team actually needs.
All the implementation stuff here is based on the official OpenCode docs. If something looks weird or outdated, check there first.
Prerequisites: You're already using OpenCode and get the basics (primary agents, subagents, @ mentions). If not, start with the documentation; this post is specifically about structuring a testing-focused agent system with reusable skills.
The Obvious
generic agents = generic tests
When I prompted OpenCode's default Build agent to "generate tests for this component," it worked, but the results were inconsistent. Without testing-specific context:
- Sometimes wrong testing framework
- Over-mocking (mocking internal utility functions that didn't need it)
- Under-testing (missing edge cases)
- Inconsistent patterns (different test structures everywhere)
The default Build agent is a generalist. It does everything: features, refactoring, debugging, testing. When every task gets the same context, nothing gets specific context.
What if testing had its own specialized agent system with reusable, on-demand knowledge?
Approach
I decided to try building testing-focused agents with skills-based knowledge sharing:
- Primary testing agent - The Runner
- Specialized subagents - Handle specific tasks
- Reusable skills - On-demand testing knowledge via SKILL.md files
This way, the Build agent stays focused on dev. Testing agents stay focused on testing. Each loads exactly the context it needs, when it needs it.
The examples below are just examples. Your team's testing needs are gonna be different. The framework matters more than the specifics.
Step 1: Understand how skills work
Before I started writing agents, I needed to understand OpenCode's skills system. Turns out it's literally designed for this use case.
Skills are reusable instruction sets that:
- Live in `.opencode/skills/<name>/SKILL.md` (project) or `~/.config/opencode/skills/<name>/SKILL.md` (global)
- Show up in the agent's `<available_skills>` list
- Get loaded on-demand when agents call the `skill` tool
- Have structured metadata (name, description, permissions)
Instead of loading all testing knowledge into every agent all the time, skills let agents discover and load only what they need for their specific task.
This was way better than my initial approach of cramming everything into a single doc.
Step 2: Create testing skills (not just docs)
I created separate skills for different testing concerns. Each skill is a folder with a SKILL.md inside.
Skill 1: Core Vitest patterns
````markdown
---
name: vitest-testing
description: Generate Vitest unit tests with BDD naming, proper mocking, and focused test suites
compatibility: opencode
metadata:
  framework: vitest
  testing-type: unit
  audience: testing-agents
---

## What I do

- Generate unit tests following BDD naming convention (WHEN/SHOULD)
- Mock external dependencies using vi.fn() and vi.mock()
- Create focused test suites (max 10 tests per file)
- Test behavior, not implementation details

## When to use me

Use this skill when generating unit tests for TypeScript/JavaScript files.
Ask clarifying questions if the mocking strategy or test scope is unclear.

## Naming Convention

```typescript
describe('ComponentName', () => {
  describe('WHEN condition or context', () => {
    it('SHOULD expected behavior', () => {
      // test implementation
    });
  });
});
```

## Mocking Strategy

- **DO mock**: External APIs, database calls, third-party libraries
- **DON'T mock**: Internal utility functions, simple helpers, constants

Example:

```typescript
// Mock external dependency
vi.mock('@/api/users', () => ({
  fetchUser: vi.fn()
}));

// Don't mock internal utilities
import { formatDate } from '@/utils/date'; // Use directly
```

## Anti-Patterns

- Don't test getter/setter methods
- Don't use `new Date()` in tests; use `vi.setSystemTime()` instead
- Don't create tests longer than 15 lines (break them into multiple tests)
- Don't mock simple utility functions

## Example Test Structure

```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { UserService } from './user-service';
import { fetchUser } from '@/api/users';

vi.mock('@/api/users');

describe('UserService', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

  describe('WHEN fetching a user by ID', () => {
    it('SHOULD return user data on success', async () => {
      const mockUser = { id: 1, name: 'John' };
      vi.mocked(fetchUser).mockResolvedValue(mockUser);

      const result = await UserService.getUser(1);

      expect(result).toEqual(mockUser);
      expect(fetchUser).toHaveBeenCalledWith(1);
    });

    it('SHOULD throw error when user not found', async () => {
      vi.mocked(fetchUser).mockRejectedValue(new Error('Not found'));

      await expect(UserService.getUser(999)).rejects.toThrow('Not found');
    });
  });
});
```
````
Key points:
- The frontmatter is required (`name` and `description` at minimum)
- The `name` must be lowercase with hyphens only
- The `description` should be specific enough for agents to choose correctly
- Structure follows "What I do" → "When to use me" → actual patterns
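That naming rule is easy to check mechanically. Here's a small sketch using the validation pattern quoted in the troubleshooting section of this post (`^[a-z0-9]+(-[a-z0-9]+)*$`), just as a local sanity check, not anything OpenCode ships:

```typescript
// Skill-name validation: lowercase alphanumeric segments separated by single hyphens.
// The pattern matches the one listed in the troubleshooting section below.
const SKILL_NAME = /^[a-z0-9]+(-[a-z0-9]+)*$/;

const isValidSkillName = (name: string): boolean => SKILL_NAME.test(name);

console.log(isValidSkillName("vitest-testing"));  // true
console.log(isValidSkillName("Vitest_Testing"));  // false (uppercase, underscore)
console.log(isValidSkillName("vitest--testing")); // false (empty segment)
```

Handy to run before wondering why a skill silently fails to show up.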
Skill 2: React component testing
````markdown
---
name: react-testing
description: Test React components using React Testing Library with accessibility-first queries and user interaction patterns
compatibility: opencode
metadata:
  framework: react-testing-library
  testing-type: component
  audience: testing-agents
---

## What I do

- Generate React component tests using React Testing Library
- Query elements by role and accessible names, not classes or IDs
- Test user interactions and visible behavior, not internal state
- Ensure components are accessible

## When to use me

Use this skill when testing React components (.tsx/.jsx files).
Load alongside vitest-testing for complete React testing context.

## Query Priority

1. **ByRole** - Preferred (reflects accessibility tree)
2. **ByLabelText** - For form inputs
3. **ByPlaceholderText** - When label is absent
4. **ByText** - For non-interactive content
5. **ByTestId** - Last resort only

## Core Principles

- Test what users see and do, not implementation
- Avoid querying by className, id, or element type
- Use userEvent for interactions, not fireEvent
- Don't test internal component state

## Example Component Test

```typescript
import { describe, it, expect, vi } from 'vitest';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { LoginForm } from './LoginForm';

describe('LoginForm', () => {
  describe('WHEN rendering the form', () => {
    it('SHOULD display email and password inputs', () => {
      render(<LoginForm />);

      expect(screen.getByRole('textbox', { name: /email/i })).toBeInTheDocument();
      expect(screen.getByLabelText(/password/i)).toBeInTheDocument();
    });
  });

  describe('WHEN submitting valid credentials', () => {
    it('SHOULD call onSubmit with form data', async () => {
      const user = userEvent.setup();
      const handleSubmit = vi.fn();
      render(<LoginForm onSubmit={handleSubmit} />);

      await user.type(screen.getByRole('textbox', { name: /email/i }), '[email protected]');
      await user.type(screen.getByLabelText(/password/i), 'password123');
      await user.click(screen.getByRole('button', { name: /log in/i }));

      expect(handleSubmit).toHaveBeenCalledWith({
        email: '[email protected]',
        password: 'password123'
      });
    });
  });
});
```

## Anti-Patterns

- Don't query by className: ❌ `container.querySelector('.button')`
- Don't access component state: ❌ `wrapper.state()`
- Don't use fireEvent: ❌ `fireEvent.click(button)` → ✅ `await user.click(button)`
- Don't test props directly: Test what the user sees instead
````
Skill 3: Test data fixtures
````markdown
---
name: test-fixtures
description: Create reusable test data factories and fixtures following consistent patterns
compatibility: opencode
metadata:
  testing-type: fixtures
  audience: testing-agents
---

## What I do

- Generate factory functions for common entities
- Create flexible fixtures with override parameters
- Organize test data in `__tests__/fixtures/` directory
- Ensure fixtures are type-safe and reusable

## When to use me

Use this skill when you need to create or organize test data, mock objects, or fixture files.

## Factory Pattern

```typescript
// __tests__/fixtures/user.factory.ts
import { User } from '@/types';

export const createMockUser = (overrides?: Partial<User>): User => ({
  id: 1,
  email: '[email protected]',
  name: 'Test User',
  role: 'user',
  createdAt: new Date('2024-01-01'),
  ...overrides
});

export const createMockAdmin = (overrides?: Partial<User>): User =>
  createMockUser({ role: 'admin', ...overrides });
```

## Usage in Tests

```typescript
import { createMockUser } from '../fixtures/user.factory';

it('SHOULD handle admin users differently', () => {
  const admin = createMockUser({ role: 'admin' });
  const result = processUser(admin);
  expect(result.hasAdminAccess).toBe(true);
});
```

## File Organization

```
__tests__/
├── fixtures/
│   ├── user.factory.ts
│   ├── product.factory.ts
│   └── api-responses.ts
└── integration/
    └── api.test.ts
```
````
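One subtlety in the factory pattern worth calling out: `...overrides` is spread last, so caller-supplied fields always win over the defaults, including the `role: 'admin'` default inside `createMockAdmin`. A self-contained sketch (the `User` type here is simplified for illustration; the real one lives in `@/types`):

```typescript
// Simplified User type for illustration only
type User = { id: number; name: string; role: string };

const createMockUser = (overrides?: Partial<User>): User => ({
  id: 1,
  name: "Test User",
  role: "user",
  ...overrides, // spread last: caller-supplied fields override the defaults
});

const createMockAdmin = (overrides?: Partial<User>): User =>
  createMockUser({ role: "admin", ...overrides });

console.log(createMockAdmin().role); // "admin"
console.log(createMockAdmin({ role: "owner" }).role); // "owner": explicit overrides still win
```

If you ever spread `...overrides` first instead, the defaults silently clobber whatever the test passed in, which is a confusing bug to chase.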
Why separate skills?
- Unit testing needs vitest-testing
- React component testing needs vitest-testing + react-testing
- Agents load only what's relevant to their task
- Easier to maintain and update individual patterns
Step 3: Customize your AGENTS.md
OpenCode uses `AGENTS.md` to understand your project. I ran `/init` to generate one:

```
/init
```

This analyzes the project and creates AGENTS.md with basic structure info. But it's generic; I needed testing-specific context.
What /init gave me (simplified):
```markdown
# My TypeScript Project

## Project Structure

- `src/` - Source code
- `dist/` - Compiled output

## Tech Stack

- TypeScript with strict mode
- Node.js runtime
```
What I added:
```markdown
# My TypeScript Project

[... existing structure ...]

## Testing Standards

**Framework**: Vitest (NOT Jest)
**File Convention**: Co-locate tests as `*.test.ts` or `*.test.tsx`

**Available Skills**:
- `vitest-testing` - Core unit testing patterns
- `react-testing` - React component testing with RTL
- `test-fixtures` - Test data factories and mocks

Testing agents should load the appropriate skills based on the file type:
- `.ts` files → load vitest-testing
- `.tsx` React components → load vitest-testing + react-testing
- Creating fixtures → load test-fixtures

See the skill descriptions in `<available_skills>` for details.
```
This tells agents that skills exist and gives guidance on when to use each one.
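The routing guidance above is really just a lookup table. Here's a hypothetical sketch of the rule the agents are being asked to follow; this is documentation logic made executable, not code that OpenCode itself runs (skill names match the ones defined earlier):

```typescript
// Maps a source file path to the skills a testing agent should load.
const skillsForFile = (path: string): string[] => {
  if (path.endsWith(".tsx") || path.endsWith(".jsx")) {
    return ["vitest-testing", "react-testing"]; // React components need both
  }
  if (path.endsWith(".ts") || path.endsWith(".js")) {
    return ["vitest-testing"];
  }
  return []; // not a file type these skills cover
};

console.log(skillsForFile("src/components/LoginForm.tsx")); // ["vitest-testing", "react-testing"]
console.log(skillsForFile("src/user-service.ts")); // ["vitest-testing"]
```

Writing it out like this is a decent test of whether your AGENTS.md guidance is unambiguous: if you can't express the routing as a function, the agent probably can't follow it reliably either.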
Step 4: Design the agent architecture
I created three agents as the baseline. Individual agents live in `.opencode/agents/` as separate markdown files.
Primary Agent:
````markdown
---
description: Testing coordinator that orchestrates test generation, execution, and fixes using specialized subagents
mode: primary
permission:
  bash:
    "npm test*": "allow"
    "npm run test*": "allow"
    "*": "ask"
  skill:
    "*": "allow"
---

You are the primary testing coordinator. Your workflow:

1. Read AGENTS.md to understand available testing skills
2. Analyze the file to determine which skills are needed
3. Delegate to appropriate subagents:
   - @unit-tester for unit tests
   - @test-data-generator for fixtures
4. Run tests via npm test
5. If tests fail, work with subagents to fix issues

You have bash access for running tests.
You can load any skill to understand testing patterns if needed.
````
Subagent for unit tests:
````markdown
---
description: Generate unit tests by loading appropriate testing skills based on file type
mode: subagent
tools:
  bash: false
permission:
  skill:
    "vitest-testing": "allow"
    "react-testing": "allow"
    "test-fixtures": "deny"
---

You generate unit tests by loading the appropriate skills:

**Decision tree**:
- TypeScript file (.ts) → Load vitest-testing skill
- React component (.tsx/.jsx) → Load vitest-testing + react-testing skills
- Need test data → Delegate to @test-data-generator (you can't load test-fixtures)

**Your process**:
1. Analyze the file type and dependencies
2. Load the appropriate skill(s) using the skill tool
3. Follow the loaded skill's patterns exactly
4. Generate tests with proper naming and structure
5. Return the test file only (you cannot run bash)

**Examples**:
- For `user-service.ts` → skill({ name: "vitest-testing" })
- For `LoginForm.tsx` → skill({ name: "vitest-testing" }) + skill({ name: "react-testing" })

Max 10 tests per file. Focus on behavior, not implementation.
````
Subagent for test data:
````markdown
---
description: Create reusable test fixtures and mock data factories
mode: subagent
tools:
  bash: false
permission:
  skill:
    "test-fixtures": "allow"
    "vitest-testing": "deny"
    "react-testing": "deny"
---

You create test fixtures and mock data.

**Your process**:
1. Load the test-fixtures skill
2. Follow its factory pattern exactly
3. Create files in `__tests__/fixtures/`
4. Ensure type-safe, reusable fixtures

You cannot load testing skills (that's unit-tester's job).
You cannot run bash commands.
````
Key points about permissions:
- `tester` can load any skill (needs full context as coordinator)
- `unit-tester` can load testing skills but NOT fixtures (separation of concerns)
- `test-data-generator` can ONLY load the fixtures skill
- Subagents have `bash: false` for security
- Skills set to `deny` are completely hidden from those agents
Centralizing permissions via opencode.json (optional)
The frontmatter above defines permissions per-agent — each agent file owns its own rules. That works fine, but if you've got a lot of agents or want project-wide defaults without repeating yourself in every file, opencode.json is the better place for it. Think of it as the global layer: it sets the baseline, and individual agent frontmatter overrides it when needed. If both define a rule for the same skill, the agent frontmatter wins.
```json
{
  "permission": {
    "skill": {
      "*": "allow"
    }
  },
  "agent": {
    "tester": {
      "permission": {
        "bash": {
          "npm test*": "allow",
          "npm run test*": "allow",
          "*": "ask"
        },
        "skill": {
          "*": "allow"
        }
      }
    },
    "unit-tester": {
      "permission": {
        "skill": {
          "vitest-testing": "allow",
          "react-testing": "allow",
          "test-fixtures": "deny"
        }
      }
    },
    "test-data-generator": {
      "permission": {
        "skill": {
          "test-fixtures": "allow",
          "vitest-testing": "deny",
          "react-testing": "deny"
        }
      }
    }
  }
}
```
You can also use wildcards for broader rules, which is where opencode.json really shines over per-file frontmatter:
```json
{
  "permission": {
    "skill": {
      "experimental-*": "ask",
      "legacy-*": "deny"
    }
  }
}
```
This would prompt before loading experimental skills and hide legacy ones entirely — without touching a single agent file.
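To be explicit about what I'm assuming here: the docs confirm that wildcards work and that agent frontmatter overrides opencode.json, but I haven't verified how OpenCode breaks ties between overlapping patterns like `legacy-*` and `*`. The toy resolver below illustrates one plausible "most specific pattern wins" reading, purely to make the semantics concrete; OpenCode's actual resolution order may differ:

```typescript
type Decision = "allow" | "ask" | "deny";

// Escape regex metacharacters, then treat "*" as a wildcard
const esc = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const matches = (pattern: string, name: string): boolean =>
  new RegExp("^" + pattern.split("*").map(esc).join(".*") + "$").test(name);

// Assumption: when several patterns match, the one with the longest
// literal (non-wildcard) part wins. OpenCode's real tie-breaking may differ.
const resolve = (rules: Record<string, Decision>, skill: string): Decision => {
  const hit = Object.entries(rules)
    .filter(([pattern]) => matches(pattern, skill))
    .sort((a, b) => b[0].replace(/\*/g, "").length - a[0].replace(/\*/g, "").length)[0];
  return hit ? hit[1] : "ask"; // defaulting to "ask" is also an assumption
};

const rules: Record<string, Decision> = {
  "experimental-*": "ask",
  "legacy-*": "deny",
  "*": "allow",
};

console.log(resolve(rules, "legacy-mocks")); // "deny"
console.log(resolve(rules, "experimental-visual")); // "ask"
console.log(resolve(rules, "vitest-testing")); // "allow"
```

Even as a toy, this is a useful way to reason about whether your rule set does what you think before you hand it to an agent.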
Step 5: Understand skill loading
When an agent runs, here's what happens with skills:
1. Discovery phase:
```xml
<available_skills>
  <skill>
    <name>vitest-testing</name>
    <description>Generate Vitest unit tests with BDD naming, proper mocking, and focused test suites</description>
  </skill>
  <skill>
    <name>react-testing</name>
    <description>Test React components using React Testing Library with accessibility-first queries</description>
  </skill>
  <skill>
    <name>test-fixtures</name>
    <description>Create reusable test data factories and fixtures following consistent patterns</description>
  </skill>
</available_skills>
```
Agents see this list based on their permissions.
2. Loading phase:
```typescript
// Agent decides it needs vitest patterns
skill({ name: "vitest-testing" })

// Agent reads the full SKILL.md content
// Now it has all the naming conventions, examples, anti-patterns
```
3. Application phase: Agent generates tests following the loaded skill's instructions.
This is way more efficient than loading all guidelines into every agent's context all the time.
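To make those three phases concrete, here's a toy in-memory model of the discovery/loading split. This is not OpenCode's implementation (skill bodies really live in `SKILL.md` files on disk); it just shows why the pattern saves context, since only names and descriptions are visible until a body is requested:

```typescript
type SkillFile = { name: string; description: string; body: string };

// Stand-in for .opencode/skills/*/SKILL.md files
const registry: SkillFile[] = [
  {
    name: "vitest-testing",
    description: "Generate Vitest unit tests with BDD naming",
    body: "## What I do\n- Generate unit tests following WHEN/SHOULD naming",
  },
];

// Phase 1, discovery: only name + description are visible up front
const availableSkills = () =>
  registry.map(({ name, description }) => ({ name, description }));

// Phase 2, loading: the full body enters context only on request
const loadSkill = (name: string): string => {
  const skill = registry.find((s) => s.name === name);
  if (!skill) throw new Error(`Unknown skill: ${name}`);
  return skill.body;
};

console.log(availableSkills()); // names and descriptions only
console.log(loadSkill("vitest-testing").startsWith("## What I do")); // true
```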
How to actually use this
Switch to the testing agent with the Tab key, or invoke it directly:

```
@tester Generate tests for @src/user-service.ts
```
What happens behind the scenes:
1. `@tester` reads AGENTS.md and sees available skills
2. Delegates to `@unit-tester` via the `task` tool
3. `@unit-tester` loads the `vitest-testing` skill
4. Generates the test file following skill patterns
5. `@tester` runs `npm test`
6. If there are failures, `@tester` and `@unit-tester` collaborate to fix them
For React components:
```
@tester Generate tests for @src/components/LoginForm.tsx
```
1. `@unit-tester` analyzes the file type (React component)
2. Loads both `vitest-testing` and `react-testing` skills
3. Generates component tests with RTL patterns
Navigate between parent/child sessions with `<Leader>+Right/Left`.
You can also make a custom command:
```markdown
---
description: Generate tests for a file
agent: tester
---

Generate tests for $1 by loading the appropriate testing skills.
```

Usage: `/test src/user-service.ts`
Or invoke subagents directly for fixtures:
```
@test-data-generator Create fixture factory for User model
```
The agent will load the test-fixtures skill and generate the factory.
Refining skills over time
As I used the system, I kept updating the skill files based on what worked and what didn't:
Added to vitest-testing skill:
```markdown
## Anti-Patterns (Updated)

- Don't mock simple utility functions
- Don't test getter/setter methods
- Don't use `new Date()` in tests; use `vi.setSystemTime()` instead
- Don't create assertions for function call order unless it's business critical
- Don't test private methods directly
```
Added to react-testing skill:
````markdown
## Async Testing

Always use `findBy` for async elements:

```typescript
// ❌ Wrong
await waitFor(() => {
  expect(screen.getByText('Loaded')).toBeInTheDocument();
});

// ✅ Right
expect(await screen.findByText('Loaded')).toBeInTheDocument();
```
````
Created new skill for integration tests:
```markdown
---
name: integration-testing
description: Test API endpoints and database interactions with proper setup/teardown
compatibility: opencode
metadata:
  testing-type: integration
---

[Integration-specific patterns...]
```
The more specific the skills, the better the output. This is iterative; your first version won't be perfect, and that's expected.
What I learned (tbh)
- **Skills > manual file reading**: Instead of loading `TESTING_GUIDELINES.md` into every agent's context, skills load on-demand only when needed.
- **Discovery**: Agents can see what skills exist via `<available_skills>` and choose the right one for the task.
- **Permissions**: Denying `test-fixtures` to `unit-tester` enforces separation of concerns; if unit tests need fixtures, it delegates to the right agent.
- **Modular > monolithic**: Three focused skills beat one giant guidelines doc. Easier to update, easier to maintain.
- **Naming conventions**: Skill names must be lowercase with hyphens. `vitest-testing` works, `Vitest_Testing` doesn't.
Troubleshooting
If skills don't show up:
- Check the filename: it must be `SKILL.md` in all caps
- Verify the frontmatter: it needs `name` and `description` at minimum
- Validate the skill name: it must match `^[a-z0-9]+(-[a-z0-9]+)*$` and match the directory name
- Check permissions: skills set to `deny` are hidden from agents
- Ensure uniqueness: skill names must be unique across all locations
Run in OpenCode TUI to debug:
```
@tester What skills can you see?
```

The agent will list its `<available_skills>`.
Extensions I thought about
Coverage analyzer skill:
```markdown
---
name: test-coverage
description: Analyze test coverage reports and identify untested code paths
---

[Coverage analysis patterns...]
```
E2E testing skill:
```markdown
---
name: e2e-testing
description: Generate Playwright end-to-end tests for user workflows
metadata:
  framework: playwright
---

[E2E patterns...]
```
Update AGENTS.md:
```markdown
## Testing Agents

- @tester (primary): Coordinates testing workflows
- @unit-tester (subagent): Generates unit tests
- @integration-tester (subagent): Generates integration tests
- @test-data-generator (subagent): Creates fixtures

## Available Testing Skills

- vitest-testing - Core unit testing
- react-testing - React component testing
- test-fixtures - Test data factories
- integration-testing - API/DB integration tests
- test-coverage - Coverage analysis
```
Your needs will be different. Add skills and agents based on actual needs; this post is merely a reference.
Final thoughts
Obviously, this is experimental and opinionated, and it won't fit your needs exactly as-is. But the core takeaway stands: AI agents need reusable, on-demand knowledge. That structure comes from:
- **Skills system**: Modular, discoverable, permission-controlled knowledge
- **Agent specialization**: Primary orchestrator + focused subagents
- **Context hierarchy**: Project info (`AGENTS.md`) + skills (loaded on-demand) + role definitions (agent files)
- **Permission boundaries**: Who can access which skills and tools
- **Explicit config**: DO NOT rely on agents to "figure it out"
The examples here won't work if you just copy-paste them into your project. You need to create skills for your actual testing framework, write your own patterns, define your naming conventions, and probably a bunch more stuff I haven't thought of.
Think of this as inspiration for organizing OpenCode around testing workflows with skills, not an actual ready-to-use solution. Your setup will look different from mine. That's literally the point. All details based on the OpenCode docs as of when I'm posting this. Check there for updates, especially the Agent Skills documentation.
Anyway, hope this helps someone avoid the trial-and-error I went through. If you've got a better approach, do lmk.