---
name: "codebase-onboarding"
description: "Codebase Onboarding"
---

# Codebase Onboarding
**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Documentation / Developer Experience

## Overview
Analyze a codebase and generate onboarding documentation for engineers, tech leads, and contractors. This skill is optimized for fast fact-gathering and repeatable onboarding outputs.
## Core Capabilities
- Architecture and stack discovery from repository signals
- Key file and config inventory for new contributors
- Local setup and common-task guidance generation
- Audience-aware documentation framing
- Debugging and contribution checklist scaffolding
## When to Use
- Onboarding a new team member or contractor
- Rebuilding stale project docs after large refactors
- Preparing internal handoff documentation
- Creating a standardized onboarding packet for services
## Quick Start

```bash
# 1) Gather codebase facts
python3 scripts/codebase_analyzer.py /path/to/repo

# 2) Export machine-readable output
python3 scripts/codebase_analyzer.py /path/to/repo --json

# 3) Use the template to draft onboarding docs
#    See references/onboarding-template.md
```
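The `--json` report is a flat object whose top-level keys come from the analyzer script (`root`, `file_count`, `languages`, `key_config_files`, `top_extensions`, `largest_files`, `directory_structure`). A downstream consumer can read it like this; the sample payload is illustrative, not real analyzer output:

```python
import json

# Illustrative (trimmed) sample of the analyzer's --json output.
sample = """
{
  "root": "/path/to/repo",
  "file_count": 412,
  "languages": {"TypeScript": 210, "Python": 12},
  "key_config_files": ["package.json", "docker-compose.yml"],
  "top_extensions": {".ts": 180, ".tsx": 30},
  "largest_files": [["src/generated/schema.ts", 204800]],
  "directory_structure": ["app/", "src/"]
}
"""

report = json.loads(sample)

# The analyzer sorts languages by file count, so the first key is the
# dominant language in the repository.
primary_language = next(iter(report["languages"]), "Unknown")
print(primary_language)      # TypeScript
print(report["file_count"])  # 412
```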
## Recommended Workflow

- Run `scripts/codebase_analyzer.py` against the target repository.
- Capture key signals: file counts, detected languages, config files, top-level structure.
- Fill the onboarding template in `references/onboarding-template.md`.
- Tailor output depth by audience:
  - Junior: setup + guardrails
  - Senior: architecture + operational concerns
  - Contractor: scoped ownership + integration boundaries
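The audience tiers above can be encoded as a simple section filter when generating docs programmatically. A minimal sketch; the section names are illustrative assumptions, not mandated by the template:

```python
# Illustrative mapping from audience tier to onboarding sections to include.
AUDIENCE_SECTIONS = {
    "junior": ["quick_start", "setup", "guardrails", "debugging"],
    "senior": ["quick_start", "architecture", "operations", "debugging"],
    "contractor": ["quick_start", "scope", "integration_boundaries"],
}

def sections_for(audience: str) -> list:
    """Return the onboarding sections to generate for an audience tier."""
    try:
        return AUDIENCE_SECTIONS[audience]
    except KeyError:
        raise ValueError(f"Unknown audience: {audience!r}") from None

print(sections_for("contractor"))  # ['quick_start', 'scope', 'integration_boundaries']
```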
## Onboarding Document Template

Detailed template and section examples live in:

- `references/onboarding-template.md`
- `references/output-format-templates.md`
## Common Pitfalls
- Writing docs without validating setup commands on a clean environment
- Mixing architecture deep-dives into contractor-oriented docs
- Omitting troubleshooting and verification steps
- Letting onboarding docs drift from current repo state
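The last pitfall (docs drifting from the repo) can be caught mechanically in CI. A minimal sketch, using file modification time as a freshness proxy; the 30-day threshold is an assumption, not project policy:

```python
import os
import tempfile
import time

def docs_are_stale(doc_path: str, max_age_days: int = 30) -> bool:
    """Return True if the doc's mtime is older than max_age_days."""
    age_seconds = time.time() - os.path.getmtime(doc_path)
    return age_seconds > max_age_days * 86400

# Demo: a file created just now is not stale.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    fresh_path = tmp.name
print(docs_are_stale(fresh_path))  # False
os.unlink(fresh_path)
```

A CI job could run this against the onboarding doc and fail the build when it returns `True`, prompting a review rather than a silent drift.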
## Best Practices
- Keep setup instructions executable and time-bounded.
- Document the “why” for key architectural decisions.
- Update docs in the same PR as behavior changes.
- Treat onboarding docs as living operational assets, not one-time deliverables.
# Onboarding Document Template

## README.md - Full Template
# [Project Name]
> One-sentence description of what this does and who uses it.
[![CI](https://github.com/org/repo/actions/workflows/ci.yml/badge.svg)](https://github.com/org/repo/actions/workflows/ci.yml)
[![Coverage](https://codecov.io/gh/org/repo/branch/main/graph/badge.svg)](https://codecov.io/gh/org/repo)
## What is this?
[2-3 sentences: problem it solves, who uses it, current state]
**Live:** https://myapp.com
**Staging:** https://staging.myapp.com
**Docs:** https://docs.myapp.com
---
## Quick Start
### Prerequisites
| Tool | Version | Install |
|------|---------|---------|
| Node.js | 20+ | `nvm install 20` |
| pnpm | 8+ | `npm i -g pnpm` |
| Docker | 24+ | [docker.com](https://docker.com) |
| PostgreSQL | 16+ | via Docker (see below) |
### Setup (5 minutes)

```bash
git clone https://github.com/org/repo
cd repo
pnpm install
docker compose up -d
cp .env.example .env
pnpm db:migrate
pnpm db:seed
pnpm dev
```

### Verify it works

```bash
pnpm test
```
## Architecture

### System Overview

```
Browser / Mobile
      |
      v
[Next.js App] <- [Auth]
      |
      +-> [PostgreSQL]
      +-> [Redis]
      +-> [S3]
```
### Tech Stack

| Layer | Technology | Why |
|-------|------------|-----|
| Frontend | Next.js | SSR + routing |
| Styling | Tailwind + shadcn/ui | Rapid UI |
| API | Route handlers | Co-location |
| Database | PostgreSQL | Relational |
| Queue | BullMQ + Redis | Background jobs |
### Key Files

| Path | Purpose |
|------|---------|
| `app/` | Pages and route handlers |
| `src/db/` | Schema and migrations |
| `src/lib/` | Shared utilities |
| `tests/` | Test suites and helpers |
| `.env.example` | Required variables |
## Common Developer Tasks

### Add a new API endpoint

```bash
touch app/api/my-resource/route.ts
touch tests/api/my-resource.test.ts
```

### Run a database migration

```bash
pnpm db:generate
pnpm db:migrate
```

### Add a background job

```bash
# Create worker module and enqueue path
```
## Debugging Guide

### Common Errors

- Missing environment variable
- Database connectivity failure
- Expired auth token
- Generic 500 in local dev

### Useful SQL Queries

- Slow query checks
- Connection status
- Table bloat checks
### Log Locations

| Environment | Logs |
|-------------|------|
| Local dev | local terminal |
| Production | platform logs |
| Worker | worker process logs |
## Contribution Guidelines

### Branch Strategy

- `main` protected
- feature/fix branches with ticket IDs

### PR Requirements

- CI green
- Tests updated
- "Why" documented
- Self-review completed

### Commit Convention

```
feat(scope): ...
fix(scope): ...
docs: ...
```
## Audience-Specific Notes

### Junior Developers

- Start with core auth/data modules
- Follow tests as executable examples

### Senior Engineers

- Read ADRs and scaling notes first
- Validate performance/security assumptions early

### Contractors

- Stay within scoped feature boundaries
- Use wrappers for external integrations
## Usage Notes
- Keep onboarding setup under 10 minutes where possible.
- Include executable verification checks after each setup phase.
- Prefer links to canonical docs instead of duplicating long content.
- Update this template when stack conventions or tooling change.
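To make the verification-check note concrete, here is a sketch of one such check: it flags variables declared in `.env.example` but unset in the current environment. The helper name and sample file content are illustrative:

```python
import os

def missing_env_keys(env_example_text: str, environ=os.environ) -> list:
    """Return keys declared in .env.example text that are unset in the environment."""
    missing = []
    for line in env_example_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key not in environ:
            missing.append(key)
    return missing

example = "DATABASE_URL=postgres://localhost/app\n# comment\nREDIS_URL=\n"
print(missing_env_keys(example, environ={}))  # ['DATABASE_URL', 'REDIS_URL']
```

Running this after `cp .env.example .env` gives new engineers an immediate, actionable list instead of a cryptic startup failure.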
# codebase-onboarding reference

## Output Formats
### Notion Export

```js
// Use the Notion API to create an onboarding page
const { Client } = require('@notionhq/client')

const notion = new Client({ auth: process.env.NOTION_TOKEN })

async function publishOnboarding(onboardingMarkdown) {
  const blocks = markdownToNotionBlocks(onboardingMarkdown) // use notion-to-md
  await notion.pages.create({
    parent: { page_id: ONBOARDING_PARENT_PAGE_ID },
    properties: { title: { title: [{ text: { content: 'Engineer Onboarding — MyApp' } }] } },
    children: blocks,
  })
}
```
### Confluence Export

```bash
# Using confluence-cli or the REST API
curl -X POST \
  -H "Content-Type: application/json" \
  -u "[email protected]:$CONFLUENCE_TOKEN" \
  "https://yourorg.atlassian.net/wiki/rest/api/content" \
  -d '{
    "type": "page",
    "title": "Codebase Onboarding",
    "space": {"key": "ENG"},
    "body": {
      "storage": {
        "value": "<p>Generated content...</p>",
        "representation": "storage"
      }
    }
  }'
```
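If you prefer building the request programmatically, the same JSON body can be constructed in Python. This sketch only builds the payload; sending it (endpoint, auth, space key) follows the curl example above and depends on your Confluence setup:

```python
import json

def confluence_page_payload(title: str, space_key: str, html_body: str) -> str:
    """Build the JSON body for Confluence's REST content-creation endpoint."""
    return json.dumps({
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {
            # "storage" is Confluence's XHTML-based storage representation.
            "storage": {"value": html_body, "representation": "storage"},
        },
    })

payload = confluence_page_payload("Codebase Onboarding", "ENG", "<p>Generated content...</p>")
print(json.loads(payload)["space"]["key"])  # ENG
```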
## scripts/codebase_analyzer.py

```python
#!/usr/bin/env python3
"""Generate a compact onboarding summary for a codebase (stdlib only)."""
from __future__ import annotations

import argparse
import json
import os
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List

IGNORED_DIRS = {
    ".git",
    "node_modules",
    ".next",
    "dist",
    "build",
    "coverage",
    "venv",
    ".venv",
    "__pycache__",
}

EXT_TO_LANG = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".kt": "Kotlin",
    ".rb": "Ruby",
    ".php": "PHP",
    ".cs": "C#",
    ".c": "C",
    ".cpp": "C++",
    ".h": "C/C++",
    ".swift": "Swift",
    ".sql": "SQL",
    ".sh": "Shell",
}

KEY_CONFIG_FILES = [
    "package.json",
    "pnpm-workspace.yaml",
    "turbo.json",
    "nx.json",
    "lerna.json",
    "tsconfig.json",
    "next.config.js",
    "next.config.mjs",
    "pyproject.toml",
    "requirements.txt",
    "go.mod",
    "Cargo.toml",
    "docker-compose.yml",
    "Dockerfile",
    ".github/workflows",
]


def iter_files(root: Path) -> Iterable[Path]:
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file():
                yield path


def detect_languages(paths: Iterable[Path]) -> Dict[str, int]:
    counts: Counter[str] = Counter()
    for path in paths:
        lang = EXT_TO_LANG.get(path.suffix.lower())
        if lang:
            counts[lang] += 1
    # Sort by descending count, then name, for stable output.
    return dict(sorted(counts.items(), key=lambda item: (-item[1], item[0])))


def find_key_configs(root: Path) -> List[str]:
    found: List[str] = []
    for rel in KEY_CONFIG_FILES:
        if (root / rel).exists():
            found.append(rel)
    return found


def top_level_structure(root: Path, max_depth: int) -> List[str]:
    lines: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        depth = 0 if str(rel) == "." else len(rel.parts)
        if depth > max_depth:
            dirnames[:] = []
            continue
        if any(part in IGNORED_DIRS for part in rel.parts):
            dirnames[:] = []
            continue
        indent = "  " * depth
        if str(rel) != ".":
            lines.append(f"{indent}{rel.name}/")
        visible_files = [f for f in sorted(filenames) if not f.startswith(".")]
        for filename in visible_files[:10]:
            lines.append(f"{indent}  {filename}")
        dirnames[:] = sorted([d for d in dirnames if d not in IGNORED_DIRS])
    return lines


def build_report(root: Path, max_depth: int) -> Dict[str, object]:
    files = list(iter_files(root))
    languages = detect_languages(files)
    total_files = len(files)
    file_count_by_ext: Counter[str] = Counter(p.suffix.lower() or "<no-ext>" for p in files)
    largest = sorted(
        ((str(p.relative_to(root)), p.stat().st_size) for p in files),
        key=lambda item: item[1],
        reverse=True,
    )[:20]
    return {
        "root": str(root),
        "file_count": total_files,
        "languages": languages,
        "key_config_files": find_key_configs(root),
        "top_extensions": dict(file_count_by_ext.most_common(12)),
        "largest_files": largest,
        "directory_structure": top_level_structure(root, max_depth),
    }


def format_size(num_bytes: int) -> str:
    units = ["B", "KB", "MB", "GB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f}{unit}"
        value /= 1024
    return f"{num_bytes}B"


def print_text(report: Dict[str, object]) -> None:
    print("Codebase Onboarding Summary")
    print(f"Root: {report['root']}")
    print(f"Total files: {report['file_count']}")
    print("")
    print("Languages detected")
    if report["languages"]:
        for lang, count in report["languages"].items():
            print(f"- {lang}: {count}")
    else:
        print("- No recognized source file extensions")
    print("")
    print("Key config files")
    configs = report["key_config_files"]
    if configs:
        for cfg in configs:
            print(f"- {cfg}")
    else:
        print("- None found from default checklist")
    print("")
    print("Largest files")
    for rel, size in report["largest_files"][:10]:
        print(f"- {rel}: {format_size(size)}")
    print("")
    print("Directory structure")
    for line in report["directory_structure"][:200]:
        print(line)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Scan a repository and generate onboarding summary facts.")
    parser.add_argument("path", help="Path to project directory")
    parser.add_argument("--max-depth", type=int, default=2, help="Max depth for structure output (default: 2)")
    parser.add_argument("--json", action="store_true", help="Print JSON output")
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    root = Path(args.path).expanduser().resolve()
    if not root.exists() or not root.is_dir():
        raise SystemExit(f"Path is not a directory: {root}")
    report = build_report(root, max_depth=max(1, args.max_depth))
    if args.json:
        print(json.dumps(report, indent=2))
    else:
        print_text(report)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```