---
name: "codebase-onboarding"
description: "Codebase Onboarding"
---

# Codebase Onboarding
**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Documentation / Developer Experience

## Overview
Analyze a codebase and generate onboarding documentation for engineers, tech leads, and contractors. This skill is optimized for fast fact-gathering and repeatable onboarding outputs.
## Core Capabilities
- Architecture and stack discovery from repository signals
- Key file and config inventory for new contributors
- Local setup and common-task guidance generation
- Audience-aware documentation framing
- Debugging and contribution checklist scaffolding
## When to Use
- Onboarding a new team member or contractor
- Rebuilding stale project docs after large refactors
- Preparing internal handoff documentation
- Creating a standardized onboarding packet for services
## Quick Start

```bash
# 1) Gather codebase facts
python3 scripts/codebase_analyzer.py /path/to/repo

# 2) Export machine-readable output
python3 scripts/codebase_analyzer.py /path/to/repo --json

# 3) Use the template to draft onboarding docs
#    See references/onboarding-template.md
```
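The `--json` report is a flat object whose top-level keys come from the analyzer script (`root`, `file_count`, `languages`, `key_config_files`, `top_extensions`, `largest_files`, `directory_structure`). A downstream consumer can read it like this; the sample payload is illustrative, not real analyzer output:

```python
import json

# Illustrative (trimmed) sample of the analyzer's --json output.
sample = """
{
  "root": "/path/to/repo",
  "file_count": 412,
  "languages": {"TypeScript": 210, "Python": 12},
  "key_config_files": ["package.json", "docker-compose.yml"],
  "top_extensions": {".ts": 180, ".tsx": 30},
  "largest_files": [["src/generated/schema.ts", 204800]],
  "directory_structure": ["app/", "src/"]
}
"""

report = json.loads(sample)

# The analyzer sorts languages by file count, so the first key is the
# dominant language in the repository.
primary_language = next(iter(report["languages"]), "Unknown")
print(primary_language)      # TypeScript
print(report["file_count"])  # 412
```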
## Recommended Workflow

- Run `scripts/codebase_analyzer.py` against the target repository.
- Capture key signals: file counts, detected languages, config files, top-level structure.
- Fill the onboarding template in `references/onboarding-template.md`.
- Tailor output depth by audience:
  - Junior: setup + guardrails
  - Senior: architecture + operational concerns
  - Contractor: scoped ownership + integration boundaries
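The audience tiers above can be encoded as a simple section filter when generating docs programmatically. A minimal sketch; the section names are illustrative assumptions, not mandated by the template:

```python
# Illustrative mapping from audience tier to onboarding sections to include.
AUDIENCE_SECTIONS = {
    "junior": ["quick_start", "setup", "guardrails", "debugging"],
    "senior": ["quick_start", "architecture", "operations", "debugging"],
    "contractor": ["quick_start", "scope", "integration_boundaries"],
}

def sections_for(audience: str) -> list:
    """Return the onboarding sections to generate for an audience tier."""
    try:
        return AUDIENCE_SECTIONS[audience]
    except KeyError:
        raise ValueError(f"Unknown audience: {audience!r}") from None

print(sections_for("contractor"))  # ['quick_start', 'scope', 'integration_boundaries']
```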
## Onboarding Document Template

Detailed template and section examples live in:

- `references/onboarding-template.md`
- `references/output-format-templates.md`
## Common Pitfalls
- Writing docs without validating setup commands on a clean environment
- Mixing architecture deep-dives into contractor-oriented docs
- Omitting troubleshooting and verification steps
- Letting onboarding docs drift from current repo state
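The last pitfall (docs drifting from the repo) can be caught mechanically in CI. A minimal sketch, using file modification time as a freshness proxy; the 30-day threshold is an assumption, not project policy:

```python
import os
import tempfile
import time

def docs_are_stale(doc_path: str, max_age_days: int = 30) -> bool:
    """Return True if the doc's mtime is older than max_age_days."""
    age_seconds = time.time() - os.path.getmtime(doc_path)
    return age_seconds > max_age_days * 86400

# Demo: a file created just now is not stale.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    fresh_path = tmp.name
print(docs_are_stale(fresh_path))  # False
os.unlink(fresh_path)
```

A CI job could run this against the onboarding doc and fail the build when it returns `True`, prompting a review rather than a silent drift.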
## Best Practices
- Keep setup instructions executable and time-bounded.
- Document the “why” for key architectural decisions.
- Update docs in the same PR as behavior changes.
- Treat onboarding docs as living operational assets, not one-time deliverables.
# Onboarding Document Template

## README.md - Full Template
# [Project Name]
> One-sentence description of what this does and who uses it.
[![CI](https://github.com/org/repo/actions/workflows/ci.yml/badge.svg)](https://github.com/org/repo/actions/workflows/ci.yml)
[![Coverage](https://codecov.io/gh/org/repo/branch/main/graph/badge.svg)](https://codecov.io/gh/org/repo)
## What is this?
[2-3 sentences: problem it solves, who uses it, current state]
**Live:** https://myapp.com
**Staging:** https://staging.myapp.com
**Docs:** https://docs.myapp.com
---
## Quick Start
### Prerequisites
| Tool | Version | Install |
|------|---------|---------|
| Node.js | 20+ | `nvm install 20` |
| pnpm | 8+ | `npm i -g pnpm` |
| Docker | 24+ | [docker.com](https://docker.com) |
| PostgreSQL | 16+ | via Docker (see below) |
### Setup (5 minutes)

```bash
git clone https://github.com/org/repo
cd repo
pnpm install
docker compose up -d
cp .env.example .env
pnpm db:migrate
pnpm db:seed
pnpm dev
```

### Verify it works

```bash
pnpm test
```
## Architecture

### System Overview

```
Browser / Mobile
      |
      v
[Next.js App] <- [Auth]
      |
      +-> [PostgreSQL]
      +-> [Redis]
      +-> [S3]
```
### Tech Stack

| Layer | Technology | Why |
|-------|------------|-----|
| Frontend | Next.js | SSR + routing |
| Styling | Tailwind + shadcn/ui | Rapid UI |
| API | Route handlers | Co-location |
| Database | PostgreSQL | Relational |
| Queue | BullMQ + Redis | Background jobs |
### Key Files

| Path | Purpose |
|------|---------|
| `app/` | Pages and route handlers |
| `src/db/` | Schema and migrations |
| `src/lib/` | Shared utilities |
| `tests/` | Test suites and helpers |
| `.env.example` | Required variables |
## Common Developer Tasks

### Add a new API endpoint

```bash
touch app/api/my-resource/route.ts
touch tests/api/my-resource.test.ts
```

### Run a database migration

```bash
pnpm db:generate
pnpm db:migrate
```

### Add a background job

```bash
# Create worker module and enqueue path
```
## Debugging Guide

### Common Errors

- Missing environment variable
- Database connectivity failure
- Expired auth token
- Generic 500 in local dev

### Useful SQL Queries

- Slow query checks
- Connection status
- Table bloat checks
### Log Locations

| Environment | Logs |
|-------------|------|
| Local dev | local terminal |
| Production | platform logs |
| Worker | worker process logs |
## Contribution Guidelines

### Branch Strategy

- `main` protected
- feature/fix branches with ticket IDs

### PR Requirements

- CI green
- Tests updated
- "Why" documented
- Self-review completed

### Commit Convention

```
feat(scope): ...
fix(scope): ...
docs: ...
```
## Audience-Specific Notes

### Junior Developers

- Start with core auth/data modules
- Follow tests as executable examples

### Senior Engineers

- Read ADRs and scaling notes first
- Validate performance/security assumptions early

### Contractors

- Stay within scoped feature boundaries
- Use wrappers for external integrations
## Usage Notes
- Keep onboarding setup under 10 minutes where possible.
- Include executable verification checks after each setup phase.
- Prefer links to canonical docs instead of duplicating long content.
- Update this template when stack conventions or tooling change.
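To make the verification-check note concrete, here is a sketch of one such check: it flags variables declared in `.env.example` but unset in the current environment. The helper name and sample file content are illustrative:

```python
import os

def missing_env_keys(env_example_text: str, environ=os.environ) -> list:
    """Return keys declared in .env.example text that are unset in the environment."""
    missing = []
    for line in env_example_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key not in environ:
            missing.append(key)
    return missing

example = "DATABASE_URL=postgres://localhost/app\n# comment\nREDIS_URL=\n"
print(missing_env_keys(example, environ={}))  # ['DATABASE_URL', 'REDIS_URL']
```

Running this after `cp .env.example .env` gives new engineers an immediate, actionable list instead of a cryptic startup failure.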
# codebase-onboarding reference

## Output Formats
### Notion Export

```js
// Use the Notion API to create an onboarding page
const { Client } = require('@notionhq/client')

const notion = new Client({ auth: process.env.NOTION_TOKEN })

async function publishOnboarding(onboardingMarkdown) {
  const blocks = markdownToNotionBlocks(onboardingMarkdown) // use notion-to-md
  await notion.pages.create({
    parent: { page_id: ONBOARDING_PARENT_PAGE_ID },
    properties: { title: { title: [{ text: { content: 'Engineer Onboarding — MyApp' } }] } },
    children: blocks,
  })
}
```
### Confluence Export

```bash
# Using confluence-cli or the REST API
curl -X POST \
  -H "Content-Type: application/json" \
  -u "[email protected]:$CONFLUENCE_TOKEN" \
  "https://yourorg.atlassian.net/wiki/rest/api/content" \
  -d '{
    "type": "page",
    "title": "Codebase Onboarding",
    "space": {"key": "ENG"},
    "body": {
      "storage": {
        "value": "<p>Generated content...</p>",
        "representation": "storage"
      }
    }
  }'
```
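If you prefer building the request programmatically, the same JSON body can be constructed in Python. This sketch only builds the payload; sending it (endpoint, auth, space key) follows the curl example above and depends on your Confluence setup:

```python
import json

def confluence_page_payload(title: str, space_key: str, html_body: str) -> str:
    """Build the JSON body for Confluence's REST content-creation endpoint."""
    return json.dumps({
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {
            # "storage" is Confluence's XHTML-based storage representation.
            "storage": {"value": html_body, "representation": "storage"},
        },
    })

payload = confluence_page_payload("Codebase Onboarding", "ENG", "<p>Generated content...</p>")
print(json.loads(payload)["space"]["key"])  # ENG
```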
## scripts/codebase_analyzer.py

```python
#!/usr/bin/env python3
"""Generate a compact onboarding summary for a codebase (stdlib only)."""
from __future__ import annotations

import argparse
import json
import os
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List

IGNORED_DIRS = {
    ".git",
    "node_modules",
    ".next",
    "dist",
    "build",
    "coverage",
    "venv",
    ".venv",
    "__pycache__",
}

EXT_TO_LANG = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".kt": "Kotlin",
    ".rb": "Ruby",
    ".php": "PHP",
    ".cs": "C#",
    ".c": "C",
    ".cpp": "C++",
    ".h": "C/C++",
    ".swift": "Swift",
    ".sql": "SQL",
    ".sh": "Shell",
}

KEY_CONFIG_FILES = [
    "package.json",
    "pnpm-workspace.yaml",
    "turbo.json",
    "nx.json",
    "lerna.json",
    "tsconfig.json",
    "next.config.js",
    "next.config.mjs",
    "pyproject.toml",
    "requirements.txt",
    "go.mod",
    "Cargo.toml",
    "docker-compose.yml",
    "Dockerfile",
    ".github/workflows",
]


def iter_files(root: Path) -> Iterable[Path]:
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file():
                yield path


def detect_languages(paths: Iterable[Path]) -> Dict[str, int]:
    counts: Counter[str] = Counter()
    for path in paths:
        lang = EXT_TO_LANG.get(path.suffix.lower())
        if lang:
            counts[lang] += 1
    # Sort by descending count, then name, for stable output.
    return dict(sorted(counts.items(), key=lambda item: (-item[1], item[0])))


def find_key_configs(root: Path) -> List[str]:
    found: List[str] = []
    for rel in KEY_CONFIG_FILES:
        if (root / rel).exists():
            found.append(rel)
    return found


def top_level_structure(root: Path, max_depth: int) -> List[str]:
    lines: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        depth = 0 if str(rel) == "." else len(rel.parts)
        if depth > max_depth:
            dirnames[:] = []
            continue
        if any(part in IGNORED_DIRS for part in rel.parts):
            dirnames[:] = []
            continue
        indent = "  " * depth
        if str(rel) != ".":
            lines.append(f"{indent}{rel.name}/")
        visible_files = [f for f in sorted(filenames) if not f.startswith(".")]
        for filename in visible_files[:10]:
            lines.append(f"{indent}  {filename}")
        dirnames[:] = sorted([d for d in dirnames if d not in IGNORED_DIRS])
    return lines


def build_report(root: Path, max_depth: int) -> Dict[str, object]:
    files = list(iter_files(root))
    languages = detect_languages(files)
    total_files = len(files)
    file_count_by_ext: Counter[str] = Counter(p.suffix.lower() or "<no-ext>" for p in files)
    largest = sorted(
        ((str(p.relative_to(root)), p.stat().st_size) for p in files),
        key=lambda item: item[1],
        reverse=True,
    )[:20]
    return {
        "root": str(root),
        "file_count": total_files,
        "languages": languages,
        "key_config_files": find_key_configs(root),
        "top_extensions": dict(file_count_by_ext.most_common(12)),
        "largest_files": largest,
        "directory_structure": top_level_structure(root, max_depth),
    }


def format_size(num_bytes: int) -> str:
    units = ["B", "KB", "MB", "GB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f}{unit}"
        value /= 1024
    return f"{num_bytes}B"


def print_text(report: Dict[str, object]) -> None:
    print("Codebase Onboarding Summary")
    print(f"Root: {report['root']}")
    print(f"Total files: {report['file_count']}")
    print("")
    print("Languages detected")
    if report["languages"]:
        for lang, count in report["languages"].items():
            print(f"- {lang}: {count}")
    else:
        print("- No recognized source file extensions")
    print("")
    print("Key config files")
    configs = report["key_config_files"]
    if configs:
        for cfg in configs:
            print(f"- {cfg}")
    else:
        print("- None found from default checklist")
    print("")
    print("Largest files")
    for rel, size in report["largest_files"][:10]:
        print(f"- {rel}: {format_size(size)}")
    print("")
    print("Directory structure")
    for line in report["directory_structure"][:200]:
        print(line)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Scan a repository and generate onboarding summary facts.")
    parser.add_argument("path", help="Path to project directory")
    parser.add_argument("--max-depth", type=int, default=2, help="Max depth for structure output (default: 2)")
    parser.add_argument("--json", action="store_true", help="Print JSON output")
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    root = Path(args.path).expanduser().resolve()
    if not root.exists() or not root.is_dir():
        raise SystemExit(f"Path is not a directory: {root}")
    report = build_report(root, max_depth=max(1, args.max_depth))
    if args.json:
        print(json.dumps(report, indent=2))
    else:
        print_text(report)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```