Build reliable browser tests that stay stable as your application evolves. You can generate new checks from user stories, automatically identify why tests fail, and migrate existing suites from other tools with complete coverage analysis. Use this whenever you need to set up a new testing workflow, repair broken checks, or ensure critical user flows are verified before deployment.
---
name: "playwright-pro"
description: "Production-grade Playwright testing toolkit. Use when the user mentions Playwright tests, end-to-end testing, browser automation, fixing flaky tests, test migration, CI/CD testing, or test suites. Generate tests, fix flaky failures, migrate from Cypress/Selenium, sync with TestRail, run on BrowserStack. 55 templates, 3 agents, smart reporting."
---
# Playwright Pro

Production-grade Playwright testing toolkit for AI coding agents.
## Available Commands

When installed as a Claude Code plugin, these are available as `/pw:` commands:

| Command | What it does |
|---|---|
| `/pw:init` | Set up Playwright — detects framework, generates config, CI, first test |
| `/pw:generate <spec>` | Generate tests from user story, URL, or component |
| `/pw:review` | Review tests for anti-patterns and coverage gaps |
| `/pw:fix <test>` | Diagnose and fix failing or flaky tests |
| `/pw:migrate` | Migrate from Cypress or Selenium to Playwright |
| `/pw:coverage` | Analyze what's tested vs. what's missing |
| `/pw:testrail` | Sync with TestRail — read cases, push results |
| `/pw:browserstack` | Run on BrowserStack, pull cross-browser reports |
| `/pw:report` | Generate test report in your preferred format |
## Quick Start Workflow

The recommended sequence for most projects:

1. `/pw:init` → scaffolds config, CI pipeline, and a first smoke test
2. `/pw:generate` → generates tests from your spec or URL
3. `/pw:review` → validates quality and flags anti-patterns ← always run after generate
4. `/pw:fix <test>` → diagnoses and repairs any failing/flaky tests ← run when CI turns red
**Validation checkpoints:**

- After `/pw:generate` — always run `/pw:review` before committing; it catches locator anti-patterns and missing assertions automatically.
- After `/pw:fix` — re-run the full suite locally (`npx playwright test`) to confirm the fix doesn't introduce regressions.
- After `/pw:migrate` — run `/pw:coverage` to confirm parity with the old suite before decommissioning Cypress/Selenium tests.
### Example: Generate → Review → Fix

```bash
# 1. Generate tests from a user story
/pw:generate "As a user I can log in with email and password"
# Generated: tests/auth/login.spec.ts
# → Playwright Pro creates the file using the auth template.

# 2. Review the generated tests
/pw:review tests/auth/login.spec.ts
# → Flags: one test used page.locator('input[type=password]') — suggests getByLabel('Password')
# → Fix applied automatically.

# 3. Run locally to confirm
npx playwright test tests/auth/login.spec.ts --headed

# 4. If a test is flaky in CI, diagnose it
/pw:fix tests/auth/login.spec.ts
# → Identifies missing web-first assertion; replaces waitForTimeout(2000) with expect(locator).toBeVisible()
```
## Golden Rules

- `getByRole()` over CSS/XPath — resilient to markup changes
- Never `page.waitForTimeout()` — use web-first assertions
- `expect(locator)` auto-retries; `expect(await locator.textContent())` does not
- Isolate every test — no shared state between tests
- `baseURL` in config — zero hardcoded URLs
- Retries: 2 in CI, 0 locally
- Traces: `'on-first-retry'` — rich debugging without slowdown
- Fixtures over globals — `test.extend()` for shared state
- One behavior per test — multiple related assertions are fine
- Mock external services only — never mock your own app
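Several of these rules map directly to configuration. A minimal `playwright.config.ts` sketch (the port in `baseURL` is a placeholder, not something the plugin prescribes):

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,    // Retries: 2 in CI, 0 locally
  use: {
    baseURL: 'http://localhost:3000', // zero hardcoded URLs in test files
    trace: 'on-first-retry',          // rich debugging without slowdown
  },
});
```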
## Locator Priority

1. `getByRole()` — buttons, links, headings, form elements
2. `getByLabel()` — form fields with labels
3. `getByText()` — non-interactive text
4. `getByPlaceholder()` — inputs with placeholder
5. `getByTestId()` — when no semantic option exists
6. `page.locator()` — CSS/XPath as last resort
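A note on priority 5: `getByTestId()` matches `data-testid` by default, and Playwright's standard `testIdAttribute` option can point it at a different attribute. A config sketch, assuming (hypothetically) your app uses `data-qa`:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // getByTestId() will now match data-qa instead of data-testid
    testIdAttribute: 'data-qa',
  },
});
```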
## Fixing Flaky Tests

1. Categorize first: timing, isolation, environment, or infrastructure
2. Use `npx playwright test <file> --repeat-each=10` to reproduce
3. Use `--trace=on` for every attempt
4. Apply the targeted fix from `skills/fix/flaky-taxonomy.md`
## Using Built-in Commands

Leverage Claude Code's built-in capabilities:

- Large migrations: Use `/batch` for parallel file-by-file conversion
- Post-generation cleanup: Use `/simplify` after generating a test suite
- Debugging sessions: Use `/debug` alongside `/pw:fix` for trace analysis
- Code review: Use `/review` for general code quality, `/pw:review` for Playwright-specific
## Integrations

- TestRail: Configured via `TESTRAIL_URL`, `TESTRAIL_USER`, `TESTRAIL_API_KEY` env vars
- BrowserStack: Configured via `BROWSERSTACK_USERNAME`, `BROWSERSTACK_ACCESS_KEY` env vars

Both are optional. The plugin works fully without them.
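For example, the variables can be exported in your shell (or set as CI secrets). All values below are placeholders:

```shell
# Enable both integrations for this shell session — values are placeholders,
# use your own credentials.
export TESTRAIL_URL="https://yourteam.testrail.io"
export TESTRAIL_USER="qa@example.com"
export TESTRAIL_API_KEY="replace-me"

export BROWSERSTACK_USERNAME="your-username"
export BROWSERSTACK_ACCESS_KEY="replace-me"
```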
## File Conventions

- Test files: `*.spec.ts` or `*.spec.js`
- Page objects: `*.page.ts` in a `pages/` directory
- Fixtures: `fixtures.ts` or `fixtures/` directory
- Test data: `test-data/` directory with JSON/factory files
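Test-data factories in the `test-data/` directory can be plain functions. An illustrative sketch — the `makeUser` helper and its fields are hypothetical examples, not part of the plugin:

```typescript
// test-data/user-factory.ts — factory with unique identifiers, so tests
// running in parallel never collide on the same email.
let counter = 0;

export function makeUser(overrides: Partial<{ name: string; email: string }> = {}) {
  counter += 1;
  return {
    name: `Test User ${counter}`,
    email: `user-${Date.now()}-${counter}@example.com`,
    ...overrides,
  };
}

const a = makeUser();
const b = makeUser({ name: 'Alice' });
console.log(a.email !== b.email); // true — every call gets a fresh email
```

Calling the factory inside each test (rather than sharing one object) keeps tests isolated, per the golden rules above.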
# Playwright Pro

Production-grade Playwright testing toolkit for AI coding agents.

Generate tests, fix flaky failures, migrate from Cypress/Selenium, sync with TestRail, run on BrowserStack — all from your AI agent.
## Install

```bash
# Claude Code plugin
claude plugin install pw@claude-skills

# Or load directly
claude --plugin-dir ./engineering-team/playwright-pro
```
## Commands

| Command | What it does |
|---|---|
| `/pw:init` | Set up Playwright in your project — detects framework, generates config, CI, first test |
| `/pw:generate <spec>` | Generate tests from a user story, URL, or component name |
| `/pw:review` | Review existing tests for anti-patterns and coverage gaps |
| `/pw:fix <test>` | Diagnose and fix a failing or flaky test |
| `/pw:migrate` | Migrate from Cypress or Selenium to Playwright |
| `/pw:coverage` | Analyze what's tested vs. what's missing |
| `/pw:testrail` | Sync with TestRail — read cases, push results, create runs |
| `/pw:browserstack` | Run tests on BrowserStack, pull cross-browser reports |
| `/pw:report` | Generate a test report in your preferred format |
## Quick Start

```bash
# In Claude Code:
/pw:init                        # Set up Playwright
/pw:generate "user can log in"  # Generate your first test
# Tests are auto-validated by hooks — no extra steps
```
## What's Inside

### 9 Skills

Slash commands that turn natural language into production-ready Playwright tests. Each skill leverages Claude Code's built-in capabilities (`/batch` for parallel work, Explore for codebase analysis, `/debug` for trace inspection).
### 3 Specialized Agents

- **test-architect** — Plans test strategy for complex applications
- **test-debugger** — Diagnoses flaky tests using a systematic taxonomy
- **migration-planner** — Creates file-by-file migration plans from Cypress/Selenium
### 55 Test Templates

Ready-to-use, parametrizable templates covering:

| Category | Count | Examples |
|---|---|---|
| Authentication | 8 | Login, logout, SSO, MFA, password reset, RBAC |
| CRUD | 6 | Create, read, update, delete, bulk ops |
| Checkout | 6 | Cart, payment, coupon, order history |
| Search | 5 | Basic search, filters, sorting, pagination |
| Forms | 6 | Multi-step, validation, file upload |
| Dashboard | 5 | Data loading, charts, export |
| Settings | 4 | Profile, password, notifications |
| Onboarding | 4 | Registration, email verify, welcome tour |
| Notifications | 3 | In-app, toast, notification center |
| API | 5 | REST CRUD, GraphQL, error handling |
| Accessibility | 3 | Keyboard nav, screen reader, contrast |
### 2 MCP Integrations

- **TestRail** — Read test cases, create runs, push pass/fail results
- **BrowserStack** — Trigger cross-browser runs, pull session reports with video/screenshots
### Smart Hooks

Auto-validates test quality when you write `*.spec.ts` files. Then use `/pw:browserstack` to run tests across browsers.
## Works With

| Agent | How |
|---|---|
| Claude Code | Full plugin — slash commands, MCP tools, hooks, agents |
| Codex CLI | Copy `CLAUDE.md` to your project root as `AGENTS.md` |
| OpenClaw | Use as a skill with `SKILL.md` entry point |
## Built-in Command Integration

Playwright Pro doesn't reinvent what your AI agent already does. It orchestrates built-in capabilities:

- `/pw:generate` uses Claude's Explore subagent to understand your codebase before generating tests
- `/pw:migrate` uses `/batch` for parallel file-by-file conversion on large test suites
- `/pw:fix` uses `/debug` for trace analysis alongside Playwright-specific diagnostics
- `/pw:review` extends `/review` with Playwright anti-pattern detection
## Reference

Based on battle-tested patterns from production test suites. Includes curated guidance on:

- Locator strategies and priority hierarchy
- Assertion patterns and auto-retry behavior
- Fixture architecture and composition
- Common pitfalls (top 20, ranked by frequency)
- Flaky test diagnosis taxonomy
## License

MIT
---
name: migration-planner
description: >-
  Analyzes Cypress or Selenium test suites and creates a file-by-file
  migration plan. Invoked by /pw:migrate before conversion starts.
allowed-tools:
  - Read
  - Grep
  - Glob
  - LS
---
# Migration Planner Agent

You are a test migration specialist. Your job is to analyze an existing Cypress or Selenium test suite and create a detailed, ordered migration plan.
## Planning Protocol

### Step 1: Detect Source Framework

Scan the project:

**Cypress indicators:**
- `cypress/` directory
- `cypress.config.ts` or `cypress.config.js`
- `@cypress` packages in `package.json`
- `.cy.ts` or `.cy.js` test files

**Selenium indicators:**
- `selenium-webdriver` in dependencies
- `webdriver` or `wdio` in dependencies
- Test files importing `selenium-webdriver`
- `chromedriver` or `geckodriver` in dependencies
- Python files importing `selenium`
### Step 2: Inventory All Test Files

List every test file with:

- File path
- Number of tests (count `it()`, `test()`, or test methods)
- Tests using Cypress-only features (`cy.origin()`, `cy.session()`)
- Tests with complex `cy.intercept()` patterns
- Tests relying on Cypress retry-ability semantics
- Tests using Cypress plugins with no Playwright equivalent
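The per-file test count can be approximated with a single regex pass over the source. A minimal sketch — `countTests` is an illustrative helper, not part of the agent's toolset, and the regex deliberately over-simplifies (it ignores comments and strings):

```typescript
// Rough count of Cypress/Jest-style tests in a file's source text.
// Matches it(, it.only(, it.skip(, test(, test.only(, test.skip( at word boundaries.
function countTests(source: string): number {
  const matches = source.match(/\b(?:it|test)(?:\.(?:only|skip))?\s*\(/g);
  return matches ? matches.length : 0;
}

const sample = `
describe('login', () => {
  it('accepts valid credentials', () => {});
  it.skip('locks after 5 failures', () => {});
  test('rejects bad password', () => {});
});
`;

console.log(countTests(sample)); // 3
```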
### Step 7: Return Plan

Return the complete migration plan to `/pw:migrate` for execution.
---
name: test-architect
description: >-
  Plans test strategy for complex applications. Invoked by /pw:generate and
  /pw:coverage when the app has multiple routes, complex state, or requires
  a structured test plan before writing tests.
allowed-tools:
  - Read
  - Grep
  - Glob
  - LS
---
# Test Architect Agent

You are a test architecture specialist. Your job is to analyze an application's structure and create a comprehensive test plan before any tests are written.
## Your Responsibilities

- **Map the application surface:** routes, components, API endpoints, user flows
- **Identify critical paths:** the flows that, if broken, cause revenue loss or user churn
- **Design test structure:** folder organization, fixture strategy, data management
- **Prioritize:** which tests deliver the most confidence per effort
- **Select patterns:** which template or approach fits each test scenario
## How You Work

You are a read-only agent. You analyze and plan — you do not write test files.

- Identify state management (Redux, Zustand, Pinia, etc.)
- Check for API layer (REST, GraphQL, tRPC)
### Step 2: Catalog Testable Surfaces

Create a structured inventory:

```markdown
## Application Surface

### Pages (by priority)
1. /login — Auth entry point [CRITICAL]
2. /dashboard — Main user view [CRITICAL]
3. /settings — User preferences [HIGH]
4. /admin — Admin panel [HIGH]
5. /about — Static page [LOW]

### Interactive Components
1. SearchBar — complex state, debounced API calls
2. DataTable — sorting, filtering, pagination
3. FileUploader — drag-drop, progress, error handling

### API Endpoints
1. POST /api/auth/login — authentication
2. GET /api/users — user list with pagination
3. PUT /api/users/:id — user update

### User Flows (multi-page)
1. Registration → Email Verify → Onboarding → Dashboard
2. Search → Filter → Select → Add to Cart → Checkout → Confirm
```
### Step 3: Design Test Plan

```markdown
## Test Plan

### Folder Structure
e2e/
├── auth/          # Authentication tests
├── dashboard/     # Dashboard tests
├── checkout/      # Checkout flow tests
├── fixtures/      # Shared fixtures
├── pages/         # Page object models
└── test-data/     # Test data files

### Fixture Strategy
- Auth fixture: shared `storageState` for logged-in tests
- API fixture: request context for data seeding
- Data fixture: factory functions for test entities

### Test Distribution
| Area | Tests | Template | Effort |
|---|---|---|---|
| Auth | 8 | auth/* | 1h |
| Dashboard | 6 | dashboard/* | 1h |
| Checkout | 10 | checkout/* | 2h |
| Search | 5 | search/* | 45m |
| Settings | 4 | settings/* | 30m |
| API | 5 | api/* | 45m |

### Priority Order
1. Auth (blocks everything else)
2. Core user flow (the main thing users do)
3. Payment/checkout (revenue-critical)
4. Everything else
```
### Step 4: Return Plan

Return the complete plan to the calling skill. Do not write files.
---
name: test-debugger
description: >-
  Diagnoses flaky or failing Playwright tests using a systematic taxonomy.
  Invoked by /pw:fix when a test needs deep analysis: running tests,
  reading traces, and identifying root causes.
allowed-tools:
  - Read
  - Grep
  - Glob
  - LS
  - Bash
---
# Test Debugger Agent

You are a Playwright test debugging specialist. Your job is to systematically diagnose why a test fails or behaves flakily, identify the root cause category, and return a specific fix.
## Debugging Protocol

### Step 1: Read the Test

Read the test file and understand:

- What behavior it's testing
- Which pages/URLs it visits
- Which locators it uses
- Which assertions it makes
- Any setup/teardown (fixtures, `beforeEach`)
### Step 2: Run the Test

Run it multiple ways to classify the failure:

```bash
# Single run — get the error
npx playwright test <file> --grep "<test name>" --reporter=list 2>&1

# Burn-in — expose timing issues
npx playwright test <file> --grep "<test name>" --repeat-each=10 --reporter=list 2>&1

# Isolation check — expose state leaks
npx playwright test <file> --grep "<test name>" --workers=1 --reporter=list 2>&1

# Full suite — expose interaction
npx playwright test --reporter=list 2>&1
```
### Step 3: Capture Trace

```bash
npx playwright test <file> --grep "<test name>" --trace=on --retries=0 2>&1
```

Read the trace output for:

- Network requests that failed or were slow
- Elements that weren't visible when expected
- Navigation timing issues
- Console errors
### Step 4: Classify

| Category | Evidence |
|---|---|
| Timing/Async | Fails on `--repeat-each=10`; error mentions timeout or element not found intermittently |
| Test Isolation | Passes alone (`--workers=1 --grep`), fails in full suite |
| Environment | Passes locally, fails in CI (check viewport, fonts, timezone) |
| Infrastructure | Random crash errors, OOM, browser process killed |
### Step 5: Identify Specific Cause

Common root causes per category:

**Timing:**
- Missing `await` on a Playwright call
- `waitForTimeout()` that's too short
- Clicking before element is actionable
- Asserting before data loads
- Animation interference

**Isolation:**
- Global variable shared between tests
- Database not cleaned between tests
- localStorage/cookies leaking
- Test creates data with non-unique identifier

**Environment:**
- Different viewport size in CI
- Font rendering differences affect screenshots
- Timezone affects date assertions
- Network latency in CI is higher

**Infrastructure:**
- Browser runs out of memory with too many workers
- File system race condition
- DNS resolution failure
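Some of the environment causes above (viewport size, timezone) can be eliminated by pinning them in config so CI matches local runs. A minimal sketch using standard Playwright `use` options:

```typescript
// playwright.config.ts — pin environment-sensitive settings.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 }, // same window size locally and in CI
    timezoneId: 'UTC',                      // stable date assertions
    locale: 'en-US',                        // stable number/date formatting
  },
});
```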
### Step 6: Return Diagnosis

Return to the calling skill:

```markdown
## Diagnosis

**Category:** Timing/Async

**Root Cause:** Missing await on line 23 — `page.goto('/dashboard')` runs without
waiting, so the assertion on line 24 runs before navigation completes.

**Evidence:** Fails 3/10 times on `--repeat-each=10`. Trace shows assertion firing
before navigation response received.

## Fix

Line 23: Add `await` before `page.goto('/dashboard')`

## Verification

After fix: 10/10 passes on `--repeat-each=10`
```
```bash
#!/usr/bin/env bash
# Session start hook: detects if the project uses Playwright.
# Outputs context hint for Claude if playwright.config exists.
set -euo pipefail

# Check for Playwright config in current directory or common locations
PW_CONFIG=""
for config in playwright.config.ts playwright.config.js playwright.config.mjs; do
  if [[ -f "$config" ]]; then
    PW_CONFIG="$config"
    break
  fi
done

if [[ -z "$PW_CONFIG" ]]; then
  exit 0
fi

# Count existing test files
TEST_COUNT=$(find . -name "*.spec.ts" -o -name "*.spec.js" -o -name "*.test.ts" -o -name "*.test.js" 2>/dev/null | grep -v node_modules | wc -l | tr -d ' ')

echo "🎭 Playwright detected ($PW_CONFIG) — $TEST_COUNT test files found. Use /pw: commands for testing workflows."
```
```typescript
// BAD — no auto-retry
const text = await locator.textContent();
expect(text).toBe('Hello');

// BAD — snapshot in time, not reactive
const isVisible = await locator.isVisible();
expect(isVisible).toBe(true);

// BAD — evaluating in page context
const value = await page.evaluate(() => document.querySelector('input')?.value);
expect(value).toBe('test');
```
### Custom Timeout

```typescript
// Override timeout for slow operations
await expect(locator).toBeVisible({ timeout: 30_000 });
```
### Soft Assertions

Continue the test even if an assertion fails (report all failures at the end):

```typescript
await expect.soft(locator).toHaveText('Expected');
await expect.soft(page).toHaveURL('/next');
// Test continues even if the above fail
```
```typescript
// BAD — checks once, no retry
const text = await page.textContent('.msg');
expect(text).toBe('Done');

// GOOD — retries until timeout
await expect(page.getByText('Done')).toBeVisible();
```
### 3. Missing await

Symptom: Random passes/failures, tests seem to skip steps.

```typescript
// BAD — test B depends on test A
let userId: string;
test('create user', async () => { userId = '123'; });
test('edit user', async () => { /* uses userId */ });

// GOOD — each test is independent
test('edit user', async ({ request }) => {
  const res = await request.post('/api/users', { data: { name: 'Test' } });
  const { id } = await res.json();
  // ...
});
```
### 7. Using networkidle

Symptom: Tests hang or timeout unpredictably.

```typescript
// BAD — waits for all network activity to stop
await page.goto('/dashboard', { waitUntil: 'networkidle' });

// GOOD — wait for specific content
await page.goto('/dashboard');
await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
```
### 8. Not Waiting for Navigation

Symptom: Assertions run on the wrong page.

```typescript
// BAD — click navigates but we don't wait
await page.getByRole('link', { name: 'Settings' }).click();
await expect(page.getByRole('heading')).toHaveText('Settings');

// GOOD — wait for URL change
await page.getByRole('link', { name: 'Settings' }).click();
await expect(page).toHaveURL('/settings');
await expect(page.getByRole('heading')).toHaveText('Settings');
```
### 9. Testing Implementation, Not Behavior

Symptom: Tests break on every refactor.

```typescript
// BAD — tests CSS class (implementation detail)
await expect(page.locator('.btn')).toHaveClass('btn-primary active');

// GOOD — tests what the user sees
await expect(page.getByRole('button', { name: 'Save' })).toBeEnabled();
```
### 10. No Error Case Tests

Symptom: App breaks on errors but all tests pass.

```typescript
// Missing: what happens when the API fails?
test('should handle API error', async ({ page }) => {
  await page.route('**/api/data', (route) =>
    route.fulfill({ status: 500 })
  );
  await page.goto('/dashboard');
  await expect(page.getByText(/error|try again/i)).toBeVisible();
});
```
# Fixtures Reference

## What Are Fixtures

Fixtures provide setup/teardown for each test. They replace `beforeEach`/`afterEach` for shared state and are composable, type-safe, and lazy (they only run when used).
## Creating Custom Fixtures

```typescript
// fixtures.ts
import { test as base, expect } from '@playwright/test';
import type { Page, APIRequestContext } from '@playwright/test';

// Define fixture types
type MyFixtures = {
  authenticatedPage: Page;
  testUser: { email: string; password: string };
  apiClient: APIRequestContext;
};

export const test = base.extend<MyFixtures>({
  // Simple value fixture
  testUser: async ({}, use) => {
    await use({
      email: `test-${Date.now()}@example.com`,
      password: 'Test123!',
    });
  },

  // Fixture with setup and teardown
  authenticatedPage: async ({ page, testUser }, use) => {
    // Setup: log in
    await page.goto('/login');
    await page.getByLabel('Email').fill(testUser.email);
    await page.getByLabel('Password').fill(testUser.password);
    await page.getByRole('button', { name: 'Sign in' }).click();
    await expect(page).toHaveURL('/dashboard');

    // Provide the authenticated page to the test
    await use(page);

    // Teardown: clean up (optional)
    await page.goto('/logout');
  },

  // API client fixture
  apiClient: async ({ playwright }, use) => {
    const context = await playwright.request.newContext({
      baseURL: 'http://localhost:3000',
      extraHTTPHeaders: {
        Authorization: `Bearer ${process.env.API_TOKEN}`,
      },
    });
    await use(context);
    await context.dispose();
  },
});

export { expect };
```
## Using Fixtures in Tests

```typescript
import { test, expect } from './fixtures';

test('should show dashboard for logged in user', async ({ authenticatedPage }) => {
  // authenticatedPage is already logged in
  await expect(authenticatedPage.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

test('should create item via API', async ({ apiClient }) => {
  const response = await apiClient.post('/api/items', {
    data: { name: 'Test Item' },
  });
  expect(response.ok()).toBeTruthy();
});
```
55 ready-to-use, parametrizable Playwright test templates. Each includes TypeScript and JavaScript examples with `{{placeholder}}` markers for customization.

## Usage

Templates are loaded by `/pw:generate` when it detects a matching scenario. You can also reference them directly: