Interview System Designer
Design structured technical and behavioral interview processes — question banks, evaluation rubrics, calibration guides, and fair assessment frameworks.
What this skill does
Design fair and consistent hiring processes with structured interview plans for any role. Create ready-to-use question lists, scoring guides, and bias-reduction checklists that keep your team aligned during candidate assessments. Reach for this whenever you need to standardize how you hire or improve your existing hiring system.
name: interview-system-designer
description: This skill should be used when the user asks to "design interview processes", "create hiring pipelines", "calibrate interview loops", "generate interview questions", "design competency matrices", "analyze interviewer bias", "create scoring rubrics", "build question banks", or "optimize hiring systems". Use for designing role-specific interview loops, competency assessments, and hiring calibration systems.
Comprehensive interview loop planning and calibration support for role-based hiring systems.
Overview
Use this skill to create structured interview loops, standardize question quality, and keep hiring signal consistent across interviewers.
Core Capabilities
- Interview loop planning by role and level
- Round-by-round focus and timing recommendations
- Suggested question sets by round type
- Framework support for scoring and calibration
- Bias-reduction and process consistency guidance
Quick Start
# Generate a loop plan for a role and level
python3 scripts/interview_planner.py --role "Senior Software Engineer" --level senior
# JSON output for integration with internal tooling
python3 scripts/interview_planner.py --role "Product Manager" --level mid --json
Recommended Workflow
- Run scripts/interview_planner.py to generate a baseline loop.
- Align rounds to role-specific competencies.
- Validate scoring rubric consistency with interview panel leads.
- Review for bias controls before rollout.
- Recalibrate quarterly using hiring outcome data.
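The first workflow step can be scripted for repeatable runs. A minimal sketch, assuming the documented `--json` flag and that the planner writes its loop plan to stdout (the `run_planner` helper is hypothetical):

```python
import json
import subprocess

def planner_command(role: str, level: str) -> list[str]:
    """Build the argv for a baseline loop run, using the Quick Start flags."""
    return [
        "python3", "scripts/interview_planner.py",
        "--role", role, "--level", level, "--json",
    ]

def run_planner(role: str, level: str) -> dict:
    # Assumption: the planner prints a JSON loop plan to stdout when
    # invoked with --json, as shown in Quick Start.
    out = subprocess.run(
        planner_command(role, level),
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)
```

The parsed plan can then be checked against role competencies before the panel review steps.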
References
- references/interview-frameworks.md
- references/bias_mitigation_checklist.md
- references/competency_matrix_templates.md
- references/debrief_facilitation_guide.md
Common Pitfalls
- Overweighting one round while ignoring other competency signals
- Using unstructured interviews without standardized scoring
- Skipping calibration sessions for interviewers
- Changing hiring bar without documenting rationale
Best Practices
- Keep round objectives explicit and non-overlapping.
- Require evidence for each score recommendation.
- Use the same baseline rubric across comparable roles.
- Revisit loop design based on quality-of-hire outcomes.
A comprehensive toolkit for designing, optimizing, and calibrating interview processes. This skill provides tools to create role-specific interview loops, generate competency-based question banks, and analyze hiring data for bias and calibration issues.
Overview
The Interview System Designer skill includes three powerful Python tools and comprehensive reference materials to help you build fair, effective, and scalable hiring processes:
- Interview Loop Designer - Generate calibrated interview loops for any role and level
- Question Bank Generator - Create competency-based interview questions with scoring rubrics
- Hiring Calibrator - Analyze interview data to detect bias and calibration issues
Tools
1. Interview Loop Designer (loop_designer.py)
Generates complete interview loops tailored to specific roles, levels, and teams.
Features:
- Role-specific competency mapping (SWE, PM, Designer, Data, DevOps, Leadership)
- Level-appropriate interview rounds (junior through principal)
- Optimized scheduling and time allocation
- Interviewer skill requirements
- Standardized scorecard templates
Usage:
# Basic usage
python3 loop_designer.py --role "Senior Software Engineer" --level senior
# With team and custom competencies
python3 loop_designer.py --role "Product Manager" --level mid --team growth --competencies leadership,strategy,analytics
# Using JSON input file
python3 loop_designer.py --input assets/sample_role_definitions.json --output loops/
# Specify output format
python3 loop_designer.py --role "Staff Data Scientist" --level staff --format json --output data_scientist_loop.json
Input Options:
- --role: Job role title (e.g., "Senior Software Engineer", "Product Manager")
- --level: Experience level (junior, mid, senior, staff, principal)
- --team: Team or department (optional)
- --competencies: Comma-separated list of specific competencies to focus on
- --input: JSON file with role definition
- --output: Output directory or file path
- --format: Output format (json, text, both) - default: both
Example Output:
Interview Loop Design for Senior Software Engineer (Senior Level)
============================================================
Total Duration: 300 minutes (5h 0m)
Total Rounds: 5
INTERVIEW ROUNDS
----------------------------------------
Round 1: Technical Phone Screen
Duration: 45 minutes
Format: Virtual
Focus Areas: Coding Fundamentals, Problem Solving
Round 2: System Design
Duration: 75 minutes
Format: Collaborative Whiteboard
Focus Areas: System Thinking, Architectural Reasoning
...
2. Question Bank Generator (question_bank_generator.py)
Creates comprehensive interview question banks organized by competency area.
Features:
- Competency-based question organization
- Level-appropriate difficulty progression
- Multiple question types (technical, behavioral, situational)
- Detailed scoring rubrics with calibration examples
- Follow-up probes and conversation guides
Usage:
# Generate questions for specific competencies
python3 question_bank_generator.py --role "Frontend Engineer" --competencies react,typescript,system-design
# Create behavioral question bank
python3 question_bank_generator.py --role "Product Manager" --question-types behavioral,leadership --num-questions 15
# Generate questions for multiple levels
python3 question_bank_generator.py --role "DevOps Engineer" --levels junior,mid,senior --output questions/
Input Options:
- --role: Job role title
- --level: Experience level (default: senior)
- --competencies: Comma-separated list of competencies to focus on
- --question-types: Types to include (technical, behavioral, situational)
- --num-questions: Number of questions to generate (default: 20)
- --input: JSON file with role requirements
- --output: Output directory or file path
- --format: Output format (json, text, both) - default: both
Question Types:
- Technical: Coding problems, system design, domain-specific challenges
- Behavioral: STAR method questions focusing on past experiences
- Situational: Hypothetical scenarios testing decision-making
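For the behavioral type, the generator pairs each question with a STAR-oriented scoring rubric. An illustrative entry shape (field names are representative of the generator's JSON output, not a guaranteed schema):

```python
# Representative generated behavioral question with its scoring anchors.
behavioral_entry = {
    "question": ("Tell me about a time when you had to lead a team "
                 "through a significant change or challenge."),
    "competency": "leadership",
    "type": "behavioral",
    "method": "STAR",  # Situation, Task, Action, Result
    "focus_areas": ["change_management", "team_motivation", "communication"],
    "scoring_criteria": {
        # 4-point anchors keep scoring consistent across interviewers;
        # only the top and bottom anchors are shown here.
        "situation_clarity": {4: "Clear, specific situation with stakes",
                              1: "Vague or unclear situation"},
        "action_quality": {4: "Specific, thoughtful actions",
                           1: "Weak or inappropriate actions"},
        "result_impact": {4: "Measurable positive impact",
                          1: "Little or no impact shown"},
    },
}
```

Anchored criteria like these are what let two interviewers score the same answer within a point of each other.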
3. Hiring Calibrator (hiring_calibrator.py)
Analyzes interview scores to detect bias and calibration issues, and provides recommendations.
Features:
- Statistical bias detection across demographics
- Interviewer calibration analysis
- Score distribution and trending analysis
- Specific coaching recommendations
- Comprehensive reporting with actionable insights
Usage:
# Comprehensive analysis
python3 hiring_calibrator.py --input assets/sample_interview_results.json --analysis-type comprehensive
# Focus on specific areas
python3 hiring_calibrator.py --input interview_data.json --analysis-type bias --competencies technical,leadership
# Trend analysis over time
python3 hiring_calibrator.py --input historical_data.json --trend-analysis --period quarterly
Input Options:
- --input: JSON file with interview results data (required)
- --analysis-type: Type of analysis (comprehensive, bias, calibration, interviewer, scoring)
- --competencies: Comma-separated list of competencies to focus on
- --trend-analysis: Enable trend analysis over time
- --period: Time period for trends (daily, weekly, monthly, quarterly)
- --output: Output file path
- --format: Output format (json, text, both) - default: both
Analysis Types:
- Comprehensive: Full analysis including bias, calibration, and recommendations
- Bias: Focus on demographic and interviewer bias patterns
- Calibration: Interviewer consistency and agreement analysis
- Interviewer: Individual interviewer performance and coaching needs
- Scoring: Score distribution and pattern analysis
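At its core, calibration analysis compares per-interviewer score distributions. A minimal sketch of the idea against the interview-results format shown below (not the calibrator's actual implementation):

```python
from collections import defaultdict
from statistics import mean

def interviewer_means(results: list[dict]) -> dict[str, float]:
    """Average score per interviewer across all competencies.

    Large gaps between interviewers on comparable candidate pools
    suggest a calibration issue worth targeted coaching.
    """
    by_interviewer = defaultdict(list)
    for record in results:
        by_interviewer[record["interviewer_id"]].extend(record["scores"].values())
    return {i: round(mean(s), 2) for i, s in by_interviewer.items()}

results = [
    {"interviewer_id": "interviewer_alice",
     "scores": {"coding_fundamentals": 3.5, "system_design": 4.0}},
    {"interviewer_id": "interviewer_bob",
     "scores": {"coding_fundamentals": 2.5, "system_design": 3.0}},
]
print(interviewer_means(results))
# {'interviewer_alice': 3.75, 'interviewer_bob': 2.75}
```

The real tool layers demographic slicing and trend analysis on top of this kind of aggregation.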
Data Formats
Role Definition Input (JSON)
{
"role": "Senior Software Engineer",
"level": "senior",
"team": "platform",
"competencies": ["system_design", "technical_leadership", "mentoring"],
"requirements": {
"years_experience": "5-8",
"technical_skills": ["Python", "AWS", "Kubernetes"],
"leadership_experience": true
}
}
Interview Results Input (JSON)
[
{
"candidate_id": "candidate_001",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_alice",
"date": "2024-01-15T09:00:00Z",
"scores": {
"coding_fundamentals": 3.5,
"system_design": 4.0,
"technical_leadership": 3.0,
"communication": 3.5
},
"overall_recommendation": "Hire",
"gender": "male",
"ethnicity": "asian",
"years_experience": 6
}
]
Reference Materials
Competency Matrix Templates (references/competency_matrix_templates.md)
- Comprehensive competency matrices for all engineering roles
- Level-specific expectations (junior through principal)
- Assessment criteria and growth paths
- Customization guidelines for different company stages and industries
Bias Mitigation Checklist (references/bias_mitigation_checklist.md)
- Pre-interview preparation checklist
- Interview process bias prevention strategies
- Real-time bias interruption techniques
- Legal compliance reminders
- Emergency response protocols
Debrief Facilitation Guide (references/debrief_facilitation_guide.md)
- Structured debrief meeting frameworks
- Evidence-based discussion techniques
- Bias interruption strategies
- Decision documentation standards
- Common challenges and solutions
Sample Data
The assets/ directory contains sample data for testing:
- sample_role_definitions.json: Example role definitions for various positions
- sample_interview_results.json: Sample interview data with multiple candidates and interviewers
Expected Outputs
The expected_outputs/ directory contains examples of tool outputs:
- Interview loop designs in both JSON and human-readable formats
- Question banks with scoring rubrics and calibration examples
- Calibration analysis reports with bias detection and recommendations
Best Practices
Interview Loop Design
- Competency Focus: Align interview rounds with role-critical competencies
- Level Calibration: Adjust expectations and question difficulty based on experience level
- Time Optimization: Balance thoroughness with candidate experience
- Interviewer Training: Ensure interviewers are qualified and calibrated
Question Bank Development
- Evidence-Based: Focus on observable behaviors and concrete examples
- Bias Mitigation: Use structured questions that minimize subjective interpretation
- Calibration: Include examples of different quality responses for consistency
- Continuous Improvement: Regularly update questions based on predictive validity
Calibration Analysis
- Regular Monitoring: Analyze hiring data quarterly for bias patterns
- Prompt Action: Address calibration issues immediately with targeted coaching
- Data Quality: Ensure complete and consistent data collection
- Legal Compliance: Monitor for discriminatory patterns and document corrections
Installation & Setup
No external dependencies required - uses Python 3 standard library only.
# Clone or download the skill directory
cd interview-system-designer/
# Make scripts executable (optional)
chmod +x *.py
# Test with sample data
python3 loop_designer.py --role "Senior Software Engineer" --level senior
python3 question_bank_generator.py --role "Product Manager" --level mid
python3 hiring_calibrator.py --input assets/sample_interview_results.json
Integration
With Existing Systems
- ATS Integration: Export interview loops as structured data for applicant tracking systems
- Calendar Systems: Use scheduling outputs to auto-create interview blocks
- HR Analytics: Import calibration reports into broader diversity and inclusion dashboards
Custom Workflows
- Batch Processing: Process multiple roles or historical data sets
- Automated Reporting: Schedule regular calibration analysis
- Custom Competencies: Extend frameworks with company-specific competencies
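Batch processing can be a thin wrapper over the CLI. A sketch that builds one loop_designer.py invocation per entry in a role-definitions file, assuming the JSON layout of assets/sample_role_definitions.json and the flags documented above (commands are returned rather than executed so they can be inspected or scheduled):

```python
import json
import subprocess

def batch_commands(defs_path: str, out_dir: str) -> list[list[str]]:
    """Build one loop_designer.py command per role definition."""
    with open(defs_path) as f:
        role_defs = json.load(f)
    return [
        ["python3", "loop_designer.py",
         "--role", d["role"], "--level", d["level"],
         "--format", "json", "--output", out_dir]
        for d in role_defs
    ]

# To actually run the batch:
# for cmd in batch_commands("assets/sample_role_definitions.json", "loops/"):
#     subprocess.run(cmd, check=True)
```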
Troubleshooting
Common Issues
"Role not found" errors:
- The tool will map common variations (engineer → software_engineer)
- For custom roles, use the closest standard role and specify custom competencies
"Insufficient data" errors:
- Minimum 5 interviews required for statistical analysis
- Ensure interview data includes required fields (candidate_id, interviewer_id, scores, date)
Missing output files:
- Check file permissions in output directory
- Ensure adequate disk space
- Verify JSON input file format is valid
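The last two checks can be combined into a quick pre-flight validation of the results file. A generic sketch (the required field names follow the data format above; this is not part of the shipped tools):

```python
import json

REQUIRED = {"candidate_id", "interviewer_id", "scores", "date"}

def validate_results_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    try:
        with open(path) as f:
            records = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot parse {path}: {exc}"]
    if not isinstance(records, list):
        return ["top-level value must be a JSON array of interview records"]
    problems = []
    if len(records) < 5:
        problems.append("fewer than 5 interviews; statistical analysis needs at least 5")
    for i, rec in enumerate(records):
        missing = REQUIRED - rec.keys()
        if missing:
            problems.append(f"record {i} missing fields: {sorted(missing)}")
    return problems
```

Running this before hiring_calibrator.py turns vague "insufficient data" failures into specific, fixable messages.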
Performance Considerations
- Interview loop generation: < 1 second
- Question bank generation: 1-3 seconds for 20 questions
- Calibration analysis: 1-5 seconds for 50 interviews, scales linearly
Contributing
To extend this skill:
- New Roles: Add competency frameworks in _init_competency_frameworks()
- New Question Types: Extend question templates in respective generators
- New Analysis Types: Add analysis methods to hiring calibrator
- Custom Outputs: Modify formatting functions for different output needs
License & Usage
This skill is designed for internal company use in hiring process optimization. All bias detection and mitigation features should be reviewed with legal counsel to ensure compliance with local employment laws.
For questions or support, refer to the comprehensive documentation in each script's docstring and the reference materials provided.
"2": "Some positive impact demonstrated",
"1": "Little or no positive impact shown"
},
"self_awareness": {
"4": "Excellent self-reflection, learns from experience, acknowledges growth areas",
"3": "Good self-awareness and learning orientation",
"2": "Some self-reflection demonstrated",
"1": "Limited self-awareness or reflection"
}
},
"weight": "high",
"time_limit": 30
}
},
"follow_up_probes": {
"question_1": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_2": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_3": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_4": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_5": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_6": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_7": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_8": [
"What would you do differently if you faced this situation again?",
"How did you handle team members who were resistant to the change?",
"What metrics did you use to measure success?",
"How did you communicate progress to stakeholders?",
"What did you learn from this experience?"
],
"question_9": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_10": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_11": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_12": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_13": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_14": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_15": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_16": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_17": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_18": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_19": [
"What would you do differently if you faced this situation again?",
"How did you handle team members who were resistant to the change?",
"What metrics did you use to measure success?",
"How did you communicate progress to stakeholders?",
"What did you learn from this experience?"
],
"question_20": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
]
},
"calibration_examples": {
"question_1": {
"question": "What challenges have you faced related to p&l responsibility and how did you overcome them?",
"competency": "p&l_responsibility",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for p&l_responsibility question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for p&l_responsibility question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for p&l_responsibility question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of p&l responsibility competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_2": {
"question": "Analyze conversion funnel data to identify the biggest drop-off point and propose solutions.",
"competency": "data_analysis",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for data_analysis question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for data_analysis question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for data_analysis question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of data analysis competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_3": {
"question": "What challenges have you faced related to team leadership and how did you overcome them?",
"competency": "team_leadership",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for team_leadership question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for team_leadership question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for team_leadership question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of team leadership competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_4": {
"question": "Design a go-to-market strategy for a new B2B SaaS product entering a competitive market.",
"competency": "product_strategy",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for product_strategy question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for product_strategy question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for product_strategy question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of product strategy competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_5": {
"question": "What challenges have you faced related to business strategy and how did you overcome them?",
"competency": "business_strategy",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for business_strategy question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for business_strategy question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for business_strategy question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of business strategy competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
}
},
"usage_guidelines": {
"interview_flow": {
"warm_up": "Start with 1-2 easier questions to build rapport",
"core_assessment": "Focus majority of time on core competency questions",
"closing": "End with questions about candidate's questions/interests"
},
"time_management": {
"technical_questions": "Allow extra time for coding/design questions",
"behavioral_questions": "Keep to time limits but allow for follow-ups",
"total_recommendation": "45-75 minutes per interview round"
},
"question_selection": {
"variety": "Mix question types within each competency area",
"difficulty": "Adjust based on candidate responses and energy",
"customization": "Adapt questions based on candidate's background"
},
"common_mistakes": [
"Don't ask all questions mechanically",
"Don't skip follow-up questions",
"Don't forget to assess cultural fit alongside competencies",
"Don't let one strong/weak area bias overall assessment"
],
"calibration_reminders": [
"Compare against role standard, not other candidates",
"Focus on evidence demonstrated, not potential",
"Consider level-appropriate expectations",
"Document specific examples in feedback"
]
}
}

Interview Question Bank: Product Manager (Senior Level)
======================================================================
Generated: 2026-02-16T13:27:41.303329
Total Questions: 20
Question Types: technical, behavioral, situational
Target Competencies: strategy, analytics, business_strategy, product_strategy, stakeholder_management, p&l_responsibility, leadership, team_leadership, user_research, data_analysis
INTERVIEW QUESTIONS
--------------------------------------------------
1. What challenges have you faced related to p&l responsibility and how did you overcome them?
Competency: P&L Responsibility
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
2. Analyze conversion funnel data to identify the biggest drop-off point and propose solutions.
Competency: Data Analysis
Type: Analytical
Time Limit: 45 minutes
3. What challenges have you faced related to team leadership and how did you overcome them?
Competency: Team Leadership
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
4. Design a go-to-market strategy for a new B2B SaaS product entering a competitive market.
Competency: Product Strategy
Type: Strategic
Time Limit: 60 minutes
5. What challenges have you faced related to business strategy and how did you overcome them?
Competency: Business Strategy
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
6. Describe your experience with business strategy in your current or previous role.
Competency: Business Strategy
Type: Experience
Focus Areas: experience_depth, practical_application
7. Describe your experience with team leadership in your current or previous role.
Competency: Team Leadership
Type: Experience
Focus Areas: experience_depth, practical_application
8. Describe a situation where you had to influence someone without having direct authority over them.
Competency: Leadership
Type: Behavioral
Focus Areas: influence, persuasion, stakeholder_management
9. Given a dataset of user activities, calculate the daily active users for the past month.
Competency: Data Analysis
Type: Analytical
Time Limit: 30 minutes
10. Describe your experience with analytics in your current or previous role.
Competency: Analytics
Type: Experience
Focus Areas: experience_depth, practical_application
11. How would you prioritize features for a mobile app with limited engineering resources?
Competency: Product Strategy
Type: Case_Study
Time Limit: 45 minutes
12. Describe your experience with stakeholder management in your current or previous role.
Competency: Stakeholder Management
Type: Experience
Focus Areas: experience_depth, practical_application
13. What challenges have you faced related to stakeholder management and how did you overcome them?
Competency: Stakeholder Management
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
14. What challenges have you faced related to user research and how did you overcome them?
Competency: User Research
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
15. What challenges have you faced related to strategy and how did you overcome them?
Competency: Strategy
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
16. Describe your experience with user research in your current or previous role.
Competency: User Research
Type: Experience
Focus Areas: experience_depth, practical_application
17. Describe your experience with p&l responsibility in your current or previous role.
Competency: P&L Responsibility
Type: Experience
Focus Areas: experience_depth, practical_application
18. Describe your experience with strategy in your current or previous role.
Competency: Strategy
Type: Experience
Focus Areas: experience_depth, practical_application
19. Tell me about a time when you had to lead a team through a significant change or challenge.
Competency: Leadership
Type: Behavioral
Focus Areas: change_management, team_motivation, communication
20. What challenges have you faced related to analytics and how did you overcome them?
Competency: Analytics
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
SCORING RUBRICS
--------------------------------------------------
Sample Scoring Criteria (behavioral questions):
Situation Clarity:
4: Clear, specific situation with relevant context and stakes
3: Good situation description with adequate context
2: Situation described but lacks some specifics
1: Vague or unclear situation description
Action Quality:
4: Specific, thoughtful actions showing strong competency
3: Good actions demonstrating competency
2: Adequate actions but could be stronger
1: Weak or inappropriate actions
Result Impact:
4: Significant positive impact with measurable results
3: Good positive impact with clear outcomes
2: Some positive impact demonstrated
1: Little or no positive impact shown
Self Awareness:
4: Excellent self-reflection, learns from experience, acknowledges growth areas
3: Good self-awareness and learning orientation
2: Some self-reflection demonstrated
1: Limited self-awareness or reflection
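The four behavioral criteria above are each rated on the same 1-4 scale, which makes per-question aggregation straightforward. A minimal sketch, assuming equal weighting across the four criteria (the generated rubric does not specify per-criterion weights, so adjust to match your own calibration):

```python
# Sketch: aggregate the four behavioral criteria into one question score.
# The 1-4 scales come from the rubric above; equal weighting of the four
# criteria is an assumption -- adjust the weights to match your rubric.

CRITERIA = ("situation_clarity", "action_quality", "result_impact", "self_awareness")

def score_behavioral_answer(ratings: dict) -> float:
    """Average the per-criterion ratings (each 1-4) for one question."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for c in CRITERIA:
        if not 1 <= ratings[c] <= 4:
            raise ValueError(f"{c} must be between 1 and 4")
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)

print(score_behavioral_answer({
    "situation_clarity": 4,
    "action_quality": 3,
    "result_impact": 3,
    "self_awareness": 4,
}))  # 3.5
```

Keeping the raw per-criterion ratings alongside the aggregate (rather than recording only the average) preserves the signal needed for later calibration reviews.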
FOLLOW-UP PROBE EXAMPLES
--------------------------------------------------
Sample follow-up questions:
• Can you provide more specific details about your approach?
• What would you do differently if you had to do this again?
• What challenges did you face and how did you overcome them?
USAGE GUIDELINES
--------------------------------------------------
Interview Flow:
• Warm Up: Start with 1-2 easier questions to build rapport
• Core Assessment: Focus majority of time on core competency questions
• Closing: End with questions about candidate's questions/interests
Time Management:
• Technical Questions: Allow extra time for coding/design questions
• Behavioral Questions: Keep to time limits but allow for follow-ups
• Total Recommendation: 45-75 minutes per interview round
Common Mistakes to Avoid:
• Don't ask all questions mechanically
• Don't skip follow-up questions
• Don't forget to assess cultural fit alongside competencies
CALIBRATION EXAMPLES
--------------------------------------------------
Question: What challenges have you faced related to p&l responsibility and how did you overcome them?
Sample Answer Quality Levels:
Poor Answer (Score 1-2):
Issues: Vague response, Limited evidence of competency, Poor structure
Good Answer (Score 3):
Strengths: Clear structure, Demonstrates competency, Adequate detail
Great Answer (Score 4):
Strengths: Exceptional detail, Strong evidence, Strategic thinking, Goes beyond requirements

{
"role": "Senior Software Engineer",
"level": "senior",
"team": "platform",
"generated_at": "2026-02-16T13:27:37.925680",
"total_duration_minutes": 300,
"total_rounds": 5,
"rounds": {
"round_1_technical_phone_screen": {
"name": "Technical Phone Screen",
"duration_minutes": 45,
"format": "virtual",
"objectives": [
"Assess coding fundamentals",
"Evaluate problem-solving approach",
"Screen for basic technical competency"
],
"question_types": [
"coding_problems",
"technical_concepts",
"experience_questions"
],
"evaluation_criteria": [
"technical_accuracy",
"problem_solving_process",
"communication_clarity"
],
"order": 1,
"focus_areas": [
"coding_fundamentals",
"problem_solving",
"technical_leadership",
"system_architecture",
"people_development"
]
},
"round_2_coding_deep_dive": {
"name": "Coding Deep Dive",
"duration_minutes": 75,
"format": "in_person_or_virtual",
"objectives": [
"Evaluate coding skills in depth",
"Assess code quality and testing",
"Review debugging approach"
],
"question_types": [
"complex_coding_problems",
"code_review",
"testing_strategy"
],
"evaluation_criteria": [
"code_quality",
"testing_approach",
"debugging_skills",
"optimization_thinking"
],
"order": 2,
"focus_areas": [
"technical_execution",
"code_quality",
"technical_leadership",
"system_architecture",
"people_development"
]
},
"round_3_system_design": {
"name": "System Design",
"duration_minutes": 75,
"format": "collaborative_whiteboard",
"objectives": [
"Assess architectural thinking",
"Evaluate scalability considerations",
"Review trade-off analysis"
],
"question_types": [
"system_architecture",
"scalability_design",
"trade_off_analysis"
],
"evaluation_criteria": [
"architectural_thinking",
"scalability_awareness",
"trade_off_reasoning"
],
"order": 3,
"focus_areas": [
"system_thinking",
"architectural_reasoning",
"technical_leadership",
"system_architecture",
"people_development"
]
},
"round_4_behavioral": {
"name": "Behavioral Interview",
"duration_minutes": 45,
"format": "conversational",
"objectives": [
"Assess cultural fit",
"Evaluate past experiences",
"Review leadership examples"
],
"question_types": [
"star_method_questions",
"situational_scenarios",
"values_alignment"
],
"evaluation_criteria": [
"communication_skills",
"leadership_examples",
"cultural_alignment"
],
"order": 4,
"focus_areas": [
"cultural_fit",
"communication",
"teamwork",
"technical_leadership",
"system_architecture"
]
},
"round_5_technical_leadership": {
"name": "Technical Leadership",
"duration_minutes": 60,
"format": "discussion_based",
"objectives": [
"Evaluate mentoring capability",
"Assess technical decision making",
"Review cross-team collaboration"
],
"question_types": [
"leadership_scenarios",
"technical_decisions",
"mentoring_examples"
],
"evaluation_criteria": [
"leadership_potential",
"technical_judgment",
"influence_skills"
],
"order": 5,
"focus_areas": [
"leadership",
"mentoring",
"influence",
"technical_leadership",
"system_architecture"
]
}
},
"suggested_schedule": {
"type": "multi_day",
"total_duration_minutes": 300,
"recommended_breaks": [
{
"type": "short_break",
"duration": 15,
"after_minutes": 90
},
{
"type": "lunch_break",
"duration": 60,
"after_minutes": 180
}
],
"day_structure": {
"day_1": {
"date": "TBD",
"start_time": "09:00",
"end_time": "12:45",
"rounds": [
{
"type": "interview",
"round_name": "round_1_technical_phone_screen",
"title": "Technical Phone Screen",
"start_time": "09:00",
"end_time": "09:45",
"duration_minutes": 45,
"format": "virtual"
},
{
"type": "interview",
"round_name": "round_2_coding_deep_dive",
"title": "Coding Deep Dive",
"start_time": "10:00",
"end_time": "11:15",
"duration_minutes": 75,
"format": "in_person_or_virtual"
},
{
"type": "interview",
"round_name": "round_3_system_design",
"title": "System Design",
"start_time": "11:30",
"end_time": "12:45",
"duration_minutes": 75,
"format": "collaborative_whiteboard"
}
]
},
"day_2": {
"date": "TBD",
"start_time": "09:00",
"end_time": "11:00",
"rounds": [
{
"type": "interview",
"round_name": "round_4_behavioral",
"title": "Behavioral Interview",
"start_time": "09:00",
"end_time": "09:45",
"duration_minutes": 45,
"format": "conversational"
},
{
"type": "interview",
"round_name": "round_5_technical_leadership",
"title": "Technical Leadership",
"start_time": "10:00",
"end_time": "11:00",
"duration_minutes": 60,
"format": "discussion_based"
}
]
}
},
"logistics_notes": [
"Coordinate interviewer availability before scheduling",
"Ensure all interviewers have access to job description and competency requirements",
"Prepare interview rooms/virtual links for all rounds",
"Share candidate resume and application with all interviewers",
"Test video conferencing setup before virtual interviews",
"Share virtual meeting links with candidate 24 hours in advance",
"Prepare whiteboard or collaborative online tool for design sessions"
]
},
"scorecard_template": {
"scoring_scale": {
"4": "Exceeds Expectations - Demonstrates mastery beyond required level",
"3": "Meets Expectations - Solid performance meeting all requirements",
"2": "Partially Meets - Shows potential but has development areas",
"1": "Does Not Meet - Significant gaps in required competencies"
},
"dimensions": [
{
"dimension": "system_architecture",
"weight": "high",
"scale": "1-4",
"description": "Assessment of system architecture competency"
},
{
"dimension": "technical_leadership",
"weight": "high",
"scale": "1-4",
"description": "Assessment of technical leadership competency"
},
{
"dimension": "mentoring",
"weight": "high",
"scale": "1-4",
"description": "Assessment of mentoring competency"
},
{
"dimension": "cross_team_collab",
"weight": "high",
"scale": "1-4",
"description": "Assessment of cross team collab competency"
},
{
"dimension": "technology_evaluation",
"weight": "medium",
"scale": "1-4",
"description": "Assessment of technology evaluation competency"
},
{
"dimension": "process_improvement",
"weight": "medium",
"scale": "1-4",
"description": "Assessment of process improvement competency"
},
{
"dimension": "hiring_contribution",
"weight": "medium",
"scale": "1-4",
"description": "Assessment of hiring contribution competency"
},
{
"dimension": "communication",
"weight": "high",
"scale": "1-4"
},
{
"dimension": "cultural_fit",
"weight": "medium",
"scale": "1-4"
},
{
"dimension": "learning_agility",
"weight": "medium",
"scale": "1-4"
}
],
"overall_recommendation": {
"options": [
"Strong Hire",
"Hire",
"No Hire",
"Strong No Hire"
],
"criteria": "Based on weighted average and minimum thresholds"
},
"calibration_notes": {
"required": true,
"min_length": 100,
"sections": [
"strengths",
"areas_for_development",
"specific_examples"
]
}
},
"interviewer_requirements": {
"round_1_technical_phone_screen": {
"required_skills": [
"technical_assessment",
"coding_evaluation"
],
"preferred_experience": [
"same_domain",
"senior_level"
],
"calibration_level": "standard",
"suggested_interviewers": [
"senior_engineer",
"tech_lead"
]
},
"round_2_coding_deep_dive": {
"required_skills": [
"advanced_technical",
"code_quality_assessment"
],
"preferred_experience": [
"senior_engineer",
"system_design"
],
"calibration_level": "high",
"suggested_interviewers": [
"senior_engineer",
"staff_engineer"
]
},
"round_3_system_design": {
"required_skills": [
"architecture_design",
"scalability_assessment"
],
"preferred_experience": [
"senior_architect",
"large_scale_systems"
],
"calibration_level": "high",
"suggested_interviewers": [
"senior_architect",
"staff_engineer"
]
},
"round_4_behavioral": {
"required_skills": [
"behavioral_interviewing",
"competency_assessment"
],
"preferred_experience": [
"hiring_manager",
"people_leadership"
],
"calibration_level": "standard",
"suggested_interviewers": [
"hiring_manager",
"people_manager"
]
},
"round_5_technical_leadership": {
"required_skills": [
"leadership_assessment",
"technical_mentoring"
],
"preferred_experience": [
"engineering_manager",
"tech_lead"
],
"calibration_level": "high",
"suggested_interviewers": [
"engineering_manager",
"senior_staff"
]
}
},
"competency_framework": {
"required": [
"system_architecture",
"technical_leadership",
"mentoring",
"cross_team_collab"
],
"preferred": [
"technology_evaluation",
"process_improvement",
"hiring_contribution"
],
"focus_areas": [
"technical_leadership",
"system_architecture",
"people_development"
]
},
"calibration_notes": {
"hiring_bar_notes": "Calibrated for senior level software engineer role",
"common_pitfalls": [
"Avoid comparing candidates to each other rather than to the role standard",
"Don't let one strong/weak area overshadow overall assessment",
"Ensure consistent application of evaluation criteria"
],
"calibration_checkpoints": [
"Review score distribution after every 5 candidates",
"Conduct monthly interviewer calibration sessions",
"Track correlation with 6-month performance reviews"
],
"escalation_criteria": [
"Any candidate receiving all 4s or all 1s",
"Significant disagreement between interviewers (>1.5 point spread)",
"Unusual circumstances or accommodations needed"
]
}
}

Interview Loop Design for Senior Software Engineer (Senior Level)
============================================================
Team: platform
Generated: 2026-02-16T13:27:37.925680
Total Duration: 300 minutes (5h 0m)
Total Rounds: 5
INTERVIEW ROUNDS
----------------------------------------
Round 1: Technical Phone Screen
Duration: 45 minutes
Format: Virtual
Objectives:
• Assess coding fundamentals
• Evaluate problem-solving approach
• Screen for basic technical competency
Focus Areas:
• Coding Fundamentals
• Problem Solving
• Technical Leadership
• System Architecture
• People Development
Round 2: Coding Deep Dive
Duration: 75 minutes
Format: In Person Or Virtual
Objectives:
• Evaluate coding skills in depth
• Assess code quality and testing
• Review debugging approach
Focus Areas:
• Technical Execution
• Code Quality
• Technical Leadership
• System Architecture
• People Development
Round 3: System Design
Duration: 75 minutes
Format: Collaborative Whiteboard
Objectives:
• Assess architectural thinking
• Evaluate scalability considerations
• Review trade-off analysis
Focus Areas:
• System Thinking
• Architectural Reasoning
• Technical Leadership
• System Architecture
• People Development
Round 4: Behavioral Interview
Duration: 45 minutes
Format: Conversational
Objectives:
• Assess cultural fit
• Evaluate past experiences
• Review leadership examples
Focus Areas:
• Cultural Fit
• Communication
• Teamwork
• Technical Leadership
• System Architecture
Round 5: Technical Leadership
Duration: 60 minutes
Format: Discussion Based
Objectives:
• Evaluate mentoring capability
• Assess technical decision making
• Review cross-team collaboration
Focus Areas:
• Leadership
• Mentoring
• Influence
• Technical Leadership
• System Architecture
SUGGESTED SCHEDULE
----------------------------------------
Schedule Type: Multi Day
Day 1:
Time: 09:00 - 12:45
09:00-09:45: Technical Phone Screen (45min)
10:00-11:15: Coding Deep Dive (75min)
11:30-12:45: System Design (75min)
Day 2:
Time: 09:00 - 11:00
09:00-09:45: Behavioral Interview (45min)
10:00-11:00: Technical Leadership (60min)
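Generated schedules like the one above are worth sanity-checking before sending invites. A minimal sketch that verifies rounds do not overlap and that each round's start/end times match its stated duration (the dictionary shape mirrors the planner's JSON `day_structure` output above):

```python
# Sketch: sanity-check one day of a generated schedule -- rounds in
# order, no overlaps, durations consistent with start/end times.
# Times use the "HH:MM" format the planner output shows above.
from datetime import datetime

def check_day(rounds: list) -> int:
    """Return total interview minutes; raise on overlap or bad duration."""
    fmt = "%H:%M"
    total = 0
    prev_end = None
    for r in rounds:
        start = datetime.strptime(r["start_time"], fmt)
        end = datetime.strptime(r["end_time"], fmt)
        minutes = int((end - start).total_seconds() // 60)
        if minutes != r["duration_minutes"]:
            raise ValueError(f"{r['title']}: duration mismatch")
        if prev_end is not None and start < prev_end:
            raise ValueError(f"{r['title']}: overlaps previous round")
        prev_end = end
        total += minutes
    return total

day_1 = [
    {"title": "Technical Phone Screen", "start_time": "09:00",
     "end_time": "09:45", "duration_minutes": 45},
    {"title": "Coding Deep Dive", "start_time": "10:00",
     "end_time": "11:15", "duration_minutes": 75},
    {"title": "System Design", "start_time": "11:30",
     "end_time": "12:45", "duration_minutes": 75},
]
print(check_day(day_1))  # 195
```

A check like this catches hand-edits that shift one round without moving its neighbors, before the conflict reaches the candidate.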
INTERVIEWER REQUIREMENTS
----------------------------------------
Technical Phone Screen:
Required Skills: technical_assessment, coding_evaluation
Suggested Interviewers: senior_engineer, tech_lead
Calibration Level: Standard
Coding Deep Dive:
Required Skills: advanced_technical, code_quality_assessment
Suggested Interviewers: senior_engineer, staff_engineer
Calibration Level: High
System Design:
Required Skills: architecture_design, scalability_assessment
Suggested Interviewers: senior_architect, staff_engineer
Calibration Level: High
Behavioral:
Required Skills: behavioral_interviewing, competency_assessment
Suggested Interviewers: hiring_manager, people_manager
Calibration Level: Standard
Technical Leadership:
Required Skills: leadership_assessment, technical_mentoring
Suggested Interviewers: engineering_manager, senior_staff
Calibration Level: High
SCORECARD TEMPLATE
----------------------------------------
Scoring Scale:
4: Exceeds Expectations - Demonstrates mastery beyond required level
3: Meets Expectations - Solid performance meeting all requirements
2: Partially Meets - Shows potential but has development areas
1: Does Not Meet - Significant gaps in required competencies
Evaluation Dimensions:
• System Architecture (Weight: high)
• Technical Leadership (Weight: high)
• Mentoring (Weight: high)
• Cross-Team Collaboration (Weight: high)
• Technology Evaluation (Weight: medium)
• Process Improvement (Weight: medium)
• Hiring Contribution (Weight: medium)
• Communication (Weight: high)
• Cultural Fit (Weight: medium)
• Learning Agility (Weight: medium)
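The scale and weighted dimensions above combine into a single panel score. A minimal sketch in Python, assuming numeric weights of high = 3 and medium = 2 (an assumption for illustration; the template does not specify weight values):

```python
# Sketch: collapse per-dimension ratings into one weighted panel score.
# The high=3 / medium=2 mapping is assumed, not defined by the template.
WEIGHTS = {"high": 3, "medium": 2}

def weighted_score(ratings):
    """ratings maps dimension name -> (score on the 1-4 scale, weight label)."""
    total = sum(score * WEIGHTS[weight] for score, weight in ratings.values())
    weight_sum = sum(WEIGHTS[weight] for _, weight in ratings.values())
    return round(total / weight_sum, 2)

example = {
    "System Architecture": (3, "high"),
    "Technical Leadership": (4, "high"),
    "Communication": (3, "high"),
    "Cultural Fit": (2, "medium"),
}
print(weighted_score(example))  # dimensions weighted "high" dominate the result
```

High-weight dimensions pull the average harder than medium-weight ones, which is the intended behavior of the rubric.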
CALIBRATION NOTES
----------------------------------------
Hiring Bar: Calibrated for senior level software engineer role
Common Pitfalls:
• Avoid comparing candidates to each other rather than to the role standard
• Don't let one strong/weak area overshadow overall assessment
• Ensure consistent application of evaluation criteria
#!/usr/bin/env python3
"""
Hiring Calibrator
Analyzes interview scores from multiple candidates and interviewers to detect bias,
calibration issues, and inconsistent rubric application. Generates calibration reports
with specific recommendations for interviewer coaching and process improvements.
Usage:
python3 hiring_calibrator.py --input interview_results.json --analysis-type comprehensive
python3 hiring_calibrator.py --input data.json --competencies technical,leadership --output report.json
python3 hiring_calibrator.py --input historical_data.json --trend-analysis --period quarterly
"""
import os
import sys
import json
import argparse
import statistics
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any, Tuple
from collections import defaultdict, Counter
import math
class HiringCalibrator:
"""Analyzes interview data for bias detection and calibration issues."""
def __init__(self):
self.bias_thresholds = self._init_bias_thresholds()
self.calibration_standards = self._init_calibration_standards()
self.demographic_categories = self._init_demographic_categories()
def _init_bias_thresholds(self) -> Dict[str, float]:
"""Initialize statistical thresholds for bias detection."""
return {
"score_variance_threshold": 1.5, # Standard deviations
"pass_rate_difference_threshold": 0.15, # 15% difference
"interviewer_consistency_threshold": 0.8, # Correlation coefficient
"demographic_parity_threshold": 0.10, # 10% difference
"score_inflation_threshold": 0.3, # 30% above historical average
"score_deflation_threshold": 0.3, # 30% below historical average
"minimum_sample_size": 5 # Minimum candidates per analysis
}
def _init_calibration_standards(self) -> Dict[str, Dict]:
"""Initialize expected calibration standards."""
return {
"score_distribution": {
"target_mean": 2.8, # Expected average score (1-4 scale)
"target_std": 0.9, # Expected standard deviation
"expected_distribution": {
"1": 0.10, # 10% score 1 (does not meet)
"2": 0.25, # 25% score 2 (partially meets)
"3": 0.45, # 45% score 3 (meets expectations)
"4": 0.20 # 20% score 4 (exceeds expectations)
}
},
"interviewer_agreement": {
"minimum_correlation": 0.70, # Minimum correlation between interviewers
"maximum_std_deviation": 0.8, # Maximum std dev in scores for same candidate
"agreement_threshold": 0.75 # % of time interviewers should agree within 1 point
},
"pass_rates": {
"junior_level": 0.25, # 25% pass rate for junior roles
"mid_level": 0.20, # 20% pass rate for mid roles
"senior_level": 0.15, # 15% pass rate for senior roles
"staff_level": 0.10, # 10% pass rate for staff+ roles
"leadership": 0.12 # 12% pass rate for leadership roles
}
}
def _init_demographic_categories(self) -> List[str]:
"""Initialize demographic categories to analyze for bias."""
return [
"gender", "ethnicity", "education_level", "previous_company_size",
"years_experience", "university_tier", "geographic_location"
]
def analyze_hiring_calibration(self, interview_data: List[Dict[str, Any]],
analysis_type: str = "comprehensive",
competencies: Optional[List[str]] = None,
trend_analysis: bool = False,
period: str = "monthly") -> Dict[str, Any]:
"""Perform comprehensive hiring calibration analysis."""
# Validate and preprocess data
processed_data = self._preprocess_interview_data(interview_data)
if len(processed_data) < self.bias_thresholds["minimum_sample_size"]:
return {
"error": "Insufficient data for analysis",
"minimum_required": self.bias_thresholds["minimum_sample_size"],
"actual_samples": len(processed_data)
}
# Perform different types of analysis based on request
analysis_results = {
"analysis_type": analysis_type,
"data_summary": self._generate_data_summary(processed_data),
"generated_at": datetime.now().isoformat()
}
if analysis_type in ["comprehensive", "bias"]:
analysis_results["bias_analysis"] = self._analyze_bias_patterns(processed_data, competencies)
if analysis_type in ["comprehensive", "calibration"]:
analysis_results["calibration_analysis"] = self._analyze_calibration_consistency(processed_data, competencies)
if analysis_type in ["comprehensive", "interviewer"]:
analysis_results["interviewer_analysis"] = self._analyze_interviewer_bias(processed_data)
if analysis_type in ["comprehensive", "scoring"]:
analysis_results["scoring_analysis"] = self._analyze_scoring_patterns(processed_data, competencies)
if trend_analysis:
analysis_results["trend_analysis"] = self._analyze_trends_over_time(processed_data, period)
# Generate recommendations
analysis_results["recommendations"] = self._generate_recommendations(analysis_results)
# Calculate overall calibration health score
analysis_results["calibration_health_score"] = self._calculate_health_score(analysis_results)
return analysis_results
def _preprocess_interview_data(self, raw_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Clean and validate interview data."""
processed_data = []
for record in raw_data:
if self._validate_interview_record(record):
processed_record = self._standardize_record(record)
processed_data.append(processed_record)
return processed_data
def _validate_interview_record(self, record: Dict[str, Any]) -> bool:
"""Validate that an interview record has required fields."""
required_fields = ["candidate_id", "interviewer_id", "scores", "overall_recommendation", "date"]
for field in required_fields:
if field not in record or record[field] is None:
return False
# Validate scores format
if not isinstance(record["scores"], dict):
return False
# Validate score values are numeric and in valid range (1-4)
for competency, score in record["scores"].items():
if not isinstance(score, (int, float)) or not (1 <= score <= 4):
return False
return True
def _standardize_record(self, record: Dict[str, Any]) -> Dict[str, Any]:
"""Standardize record format and add computed fields."""
standardized = record.copy()
# Calculate average score
scores = list(record["scores"].values())
standardized["average_score"] = statistics.mean(scores)
# Standardize recommendation to binary
recommendation = record["overall_recommendation"].lower()
standardized["hire_decision"] = recommendation in ["hire", "strong hire", "yes"]
# Parse date if string
if isinstance(record["date"], str):
try:
standardized["date"] = datetime.fromisoformat(record["date"].replace("Z", "+00:00"))
except ValueError:
# Drop unparseable dates instead of defaulting to now(), which would skew date ranges and trends
standardized["date"] = None
# Add demographic info if available
for category in self.demographic_categories:
if category not in standardized:
standardized[category] = "unknown"
# Add level normalization
role = record.get("role", "").lower()
if any(level in role for level in ["junior", "associate", "entry"]):
standardized["normalized_level"] = "junior"
elif any(level in role for level in ["senior", "sr"]):
standardized["normalized_level"] = "senior"
elif any(level in role for level in ["staff", "principal", "lead"]):
standardized["normalized_level"] = "staff"
else:
standardized["normalized_level"] = "mid"
return standardized
def _generate_data_summary(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Generate summary statistics for the dataset."""
if not data:
return {}
# Candidates can appear in multiple records (one per interviewer), so count unique IDs
total_candidates = len(set(record["candidate_id"] for record in data))
unique_interviewers = len(set(record["interviewer_id"] for record in data))
# Score statistics
all_scores = []
all_average_scores = []
hire_decisions = []
for record in data:
all_scores.extend(record["scores"].values())
all_average_scores.append(record["average_score"])
hire_decisions.append(record["hire_decision"])
# Date range
dates = [record["date"] for record in data if record["date"]]
date_range = {
"start_date": min(dates).isoformat() if dates else None,
"end_date": max(dates).isoformat() if dates else None,
"total_days": (max(dates) - min(dates)).days if len(dates) > 1 else 0
}
# Role distribution
roles = [record.get("role", "unknown") for record in data]
role_distribution = dict(Counter(roles))
return {
"total_candidates": total_candidates,
"unique_interviewers": unique_interviewers,
"candidates_per_interviewer": round(total_candidates / unique_interviewers, 2),
"date_range": date_range,
"score_statistics": {
"mean_individual_scores": round(statistics.mean(all_scores), 2),
"std_individual_scores": round(statistics.stdev(all_scores) if len(all_scores) > 1 else 0, 2),
"mean_average_scores": round(statistics.mean(all_average_scores), 2),
"std_average_scores": round(statistics.stdev(all_average_scores) if len(all_average_scores) > 1 else 0, 2)
},
"hire_rate": round(sum(hire_decisions) / len(hire_decisions), 3),
"role_distribution": role_distribution
}
def _analyze_bias_patterns(self, data: List[Dict[str, Any]],
target_competencies: Optional[List[str]]) -> Dict[str, Any]:
"""Analyze potential bias patterns in interview decisions."""
bias_analysis = {
"demographic_bias": {},
"interviewer_bias": {},
"competency_bias": {},
"overall_bias_score": 0
}
# Analyze demographic bias
for demographic in self.demographic_categories:
if all(record.get(demographic) == "unknown" for record in data):
continue
demographic_analysis = self._analyze_demographic_bias(data, demographic)
if demographic_analysis["bias_detected"]:
bias_analysis["demographic_bias"][demographic] = demographic_analysis
# Analyze interviewer bias
bias_analysis["interviewer_bias"] = self._analyze_interviewer_bias(data)
# Analyze competency bias if specified
if target_competencies:
bias_analysis["competency_bias"] = self._analyze_competency_bias(data, target_competencies)
# Calculate overall bias score
bias_analysis["overall_bias_score"] = self._calculate_bias_score(bias_analysis)
return bias_analysis
def _analyze_demographic_bias(self, data: List[Dict[str, Any]],
demographic: str) -> Dict[str, Any]:
"""Analyze bias for a specific demographic category."""
# Group data by demographic values
demographic_groups = defaultdict(list)
for record in data:
demo_value = record.get(demographic, "unknown")
if demo_value != "unknown":
demographic_groups[demo_value].append(record)
if len(demographic_groups) < 2:
return {"bias_detected": False, "reason": "insufficient_groups"}
# Calculate statistics for each group
group_stats = {}
for group, records in demographic_groups.items():
if len(records) >= self.bias_thresholds["minimum_sample_size"]:
scores = [r["average_score"] for r in records]
hire_rate = sum(r["hire_decision"] for r in records) / len(records)
group_stats[group] = {
"count": len(records),
"mean_score": statistics.mean(scores),
"hire_rate": hire_rate,
"std_score": statistics.stdev(scores) if len(scores) > 1 else 0
}
if len(group_stats) < 2:
return {"bias_detected": False, "reason": "insufficient_sample_sizes"}
# Detect statistical differences
bias_detected = False
bias_details = {}
# Check for significant differences in hire rates
hire_rates = [stats["hire_rate"] for stats in group_stats.values()]
max_hire_rate_diff = max(hire_rates) - min(hire_rates)
if max_hire_rate_diff > self.bias_thresholds["demographic_parity_threshold"]:
bias_detected = True
bias_details["hire_rate_disparity"] = {
"max_difference": round(max_hire_rate_diff, 3),
"threshold": self.bias_thresholds["demographic_parity_threshold"],
"group_stats": group_stats
}
# Check for significant differences in scoring
mean_scores = [stats["mean_score"] for stats in group_stats.values()]
max_score_diff = max(mean_scores) - min(mean_scores)
if max_score_diff > 0.5: # Half point difference threshold
bias_detected = True
bias_details["scoring_disparity"] = {
"max_difference": round(max_score_diff, 3),
"group_stats": group_stats
}
return {
"bias_detected": bias_detected,
"demographic": demographic,
"group_statistics": group_stats,
"bias_details": bias_details,
"recommendation": self._generate_demographic_bias_recommendation(demographic, bias_details) if bias_detected else None
}
def _analyze_interviewer_bias(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Analyze bias patterns across different interviewers."""
interviewer_stats = defaultdict(list)
# Group by interviewer
for record in data:
interviewer_id = record["interviewer_id"]
interviewer_stats[interviewer_id].append(record)
# Calculate statistics per interviewer
interviewer_analysis = {}
for interviewer_id, records in interviewer_stats.items():
if len(records) >= self.bias_thresholds["minimum_sample_size"]:
scores = [r["average_score"] for r in records]
hire_rate = sum(r["hire_decision"] for r in records) / len(records)
interviewer_analysis[interviewer_id] = {
"total_interviews": len(records),
"mean_score": statistics.mean(scores),
"std_score": statistics.stdev(scores) if len(scores) > 1 else 0,
"hire_rate": hire_rate,
"score_inflation": self._detect_score_inflation(scores),
"consistency_score": self._calculate_interviewer_consistency(records)
}
# Identify outlier interviewers
if len(interviewer_analysis) > 1:
overall_mean_score = statistics.mean([stats["mean_score"] for stats in interviewer_analysis.values()])
overall_hire_rate = statistics.mean([stats["hire_rate"] for stats in interviewer_analysis.values()])
outlier_interviewers = {}
for interviewer_id, stats in interviewer_analysis.items():
issues = []
# Check for score inflation/deflation
if stats["mean_score"] > overall_mean_score * (1 + self.bias_thresholds["score_inflation_threshold"]):
issues.append("score_inflation")
elif stats["mean_score"] < overall_mean_score * (1 - self.bias_thresholds["score_deflation_threshold"]):
issues.append("score_deflation")
# Check for hire rate deviation
hire_rate_diff = abs(stats["hire_rate"] - overall_hire_rate)
if hire_rate_diff > self.bias_thresholds["pass_rate_difference_threshold"]:
issues.append("hire_rate_deviation")
# Check for low consistency
if stats["consistency_score"] < self.bias_thresholds["interviewer_consistency_threshold"]:
issues.append("low_consistency")
if issues:
outlier_interviewers[interviewer_id] = {
"issues": issues,
"statistics": stats,
"severity": len(issues) # More issues = higher severity
}
return {
"interviewer_statistics": interviewer_analysis,
"outlier_interviewers": outlier_interviewers if len(interviewer_analysis) > 1 else {},
"overall_consistency": self._calculate_overall_interviewer_consistency(data),
"recommendations": self._generate_interviewer_recommendations(outlier_interviewers if len(interviewer_analysis) > 1 else {})
}
def _analyze_competency_bias(self, data: List[Dict[str, Any]],
competencies: List[str]) -> Dict[str, Any]:
"""Analyze bias patterns within specific competencies."""
competency_analysis = {}
for competency in competencies:
# Extract scores for this competency
competency_scores = []
for record in data:
if competency in record["scores"]:
competency_scores.append({
"score": record["scores"][competency],
"interviewer": record["interviewer_id"],
"candidate": record["candidate_id"],
"overall_decision": record["hire_decision"]
})
if len(competency_scores) < self.bias_thresholds["minimum_sample_size"]:
continue
# Analyze scoring patterns
scores = [item["score"] for item in competency_scores]
score_variance = statistics.variance(scores) if len(scores) > 1 else 0
# Analyze by interviewer
interviewer_competency_scores = defaultdict(list)
for item in competency_scores:
interviewer_competency_scores[item["interviewer"]].append(item["score"])
interviewer_variations = {}
if len(interviewer_competency_scores) > 1:
interviewer_means = {interviewer: statistics.mean(scores)
for interviewer, scores in interviewer_competency_scores.items()
if len(scores) >= 3}
if len(interviewer_means) > 1:
mean_of_means = statistics.mean(interviewer_means.values())
for interviewer, mean_score in interviewer_means.items():
deviation = abs(mean_score - mean_of_means)
if deviation > 0.5: # More than half point deviation
interviewer_variations[interviewer] = {
"mean_score": round(mean_score, 2),
"deviation_from_average": round(deviation, 2),
"sample_size": len(interviewer_competency_scores[interviewer])
}
competency_analysis[competency] = {
"total_scores": len(competency_scores),
"mean_score": round(statistics.mean(scores), 2),
"score_variance": round(score_variance, 2),
"interviewer_variations": interviewer_variations,
"bias_detected": len(interviewer_variations) > 0
}
return competency_analysis
def _analyze_calibration_consistency(self, data: List[Dict[str, Any]],
target_competencies: Optional[List[str]]) -> Dict[str, Any]:
"""Analyze calibration consistency across interviews."""
# Group candidates by those interviewed by multiple people
candidate_interviewers = defaultdict(list)
for record in data:
candidate_interviewers[record["candidate_id"]].append(record)
multi_interviewer_candidates = {
candidate: records for candidate, records in candidate_interviewers.items()
if len(records) > 1
}
if not multi_interviewer_candidates:
return {
"error": "No candidates with multiple interviewers found",
"single_interviewer_analysis": self._analyze_single_interviewer_consistency(data)
}
# Calculate agreement statistics
agreement_stats = []
score_correlations = []
for candidate, records in multi_interviewer_candidates.items():
candidate_scores = []
interviewer_pairs = []
for record in records:
avg_score = record["average_score"]
candidate_scores.append(avg_score)
interviewer_pairs.append(record["interviewer_id"])
if len(candidate_scores) > 1:
# Calculate standard deviation of scores for this candidate
score_std = statistics.stdev(candidate_scores)
agreement_stats.append(score_std)
# Check if all interviewers agree within 1 point
score_range = max(candidate_scores) - min(candidate_scores)
agreement_within_one = score_range <= 1.0
score_correlations.append({
"candidate": candidate,
"scores": candidate_scores,
"interviewers": interviewer_pairs,
"score_std": score_std,
"score_range": score_range,
"agreement_within_one": agreement_within_one
})
# Calculate overall calibration metrics
mean_score_std = statistics.mean(agreement_stats) if agreement_stats else 0
agreement_rate = sum(1 for corr in score_correlations if corr["agreement_within_one"]) / len(score_correlations) if score_correlations else 0
calibration_quality = "good"
if mean_score_std > self.calibration_standards["interviewer_agreement"]["maximum_std_deviation"]:
calibration_quality = "poor"
elif agreement_rate < self.calibration_standards["interviewer_agreement"]["agreement_threshold"]:
calibration_quality = "fair"
return {
"multi_interviewer_candidates": len(multi_interviewer_candidates),
"mean_score_standard_deviation": round(mean_score_std, 3),
"agreement_within_one_point_rate": round(agreement_rate, 3),
"calibration_quality": calibration_quality,
"candidate_agreement_details": score_correlations,
"target_standards": self.calibration_standards["interviewer_agreement"],
"recommendations": self._generate_calibration_recommendations(mean_score_std, agreement_rate)
}
def _analyze_scoring_patterns(self, data: List[Dict[str, Any]],
target_competencies: Optional[List[str]]) -> Dict[str, Any]:
"""Analyze overall scoring patterns and distributions."""
# Overall score distribution
all_individual_scores = []
all_average_scores = []
score_distribution = defaultdict(int)
for record in data:
avg_score = record["average_score"]
all_average_scores.append(avg_score)
for competency, score in record["scores"].items():
if not target_competencies or competency in target_competencies:
all_individual_scores.append(score)
score_distribution[str(int(score))] += 1
# Calculate distribution percentages
total_scores = sum(score_distribution.values())
score_percentages = {score: count / total_scores for score, count in score_distribution.items()} if total_scores else {}
# Compare against expected distribution
expected_dist = self.calibration_standards["score_distribution"]["expected_distribution"]
distribution_analysis = {}
for score in ["1", "2", "3", "4"]:
expected_pct = expected_dist.get(score, 0)
actual_pct = score_percentages.get(score, 0)
difference = actual_pct - expected_pct
distribution_analysis[score] = {
"expected_percentage": expected_pct,
"actual_percentage": round(actual_pct, 3),
"difference": round(difference, 3),
"significant_deviation": abs(difference) > 0.05 # 5% threshold
}
# Calculate scoring statistics
mean_score = statistics.mean(all_individual_scores) if all_individual_scores else 0
std_score = statistics.stdev(all_individual_scores) if len(all_individual_scores) > 1 else 0
target_mean = self.calibration_standards["score_distribution"]["target_mean"]
target_std = self.calibration_standards["score_distribution"]["target_std"]
# Analyze pass rates by level
level_pass_rates = {}
level_groups = defaultdict(list)
for record in data:
level = record.get("normalized_level", "unknown")
level_groups[level].append(record["hire_decision"])
for level, decisions in level_groups.items():
if len(decisions) >= self.bias_thresholds["minimum_sample_size"]:
pass_rate = sum(decisions) / len(decisions)
expected_rate = self.calibration_standards["pass_rates"].get(f"{level}_level", 0.15)
level_pass_rates[level] = {
"actual_pass_rate": round(pass_rate, 3),
"expected_pass_rate": expected_rate,
"difference": round(pass_rate - expected_rate, 3),
"sample_size": len(decisions)
}
return {
"score_statistics": {
"mean_score": round(mean_score, 2),
"std_score": round(std_score, 2),
"target_mean": target_mean,
"target_std": target_std,
"mean_deviation": round(abs(mean_score - target_mean), 2),
"std_deviation": round(abs(std_score - target_std), 2)
},
"score_distribution": distribution_analysis,
"level_pass_rates": level_pass_rates,
"overall_assessment": self._assess_scoring_health(distribution_analysis, mean_score, target_mean)
}
def _analyze_trends_over_time(self, data: List[Dict[str, Any]], period: str) -> Dict[str, Any]:
"""Analyze trends in hiring patterns over time."""
# Sort data by date
dated_data = [record for record in data if record.get("date")]
dated_data.sort(key=lambda x: x["date"])
if len(dated_data) < 10: # Need minimum data for trend analysis
return {"error": "Insufficient data for trend analysis", "minimum_required": 10}
# Group by time period
period_groups = defaultdict(list)
for record in dated_data:
date = record["date"]
if period == "weekly":
period_key = date.strftime("%Y-W%U")
elif period == "monthly":
period_key = date.strftime("%Y-%m")
elif period == "quarterly":
quarter = (date.month - 1) // 3 + 1
period_key = f"{date.year}-Q{quarter}"
else: # daily
period_key = date.strftime("%Y-%m-%d")
period_groups[period_key].append(record)
# Calculate metrics for each period
period_metrics = {}
for period_key, records in period_groups.items():
if len(records) >= 3: # Minimum for meaningful metrics
scores = [r["average_score"] for r in records]
hire_rate = sum(r["hire_decision"] for r in records) / len(records)
period_metrics[period_key] = {
"count": len(records),
"mean_score": statistics.mean(scores),
"hire_rate": hire_rate,
"std_score": statistics.stdev(scores) if len(scores) > 1 else 0
}
if len(period_metrics) < 3:
return {"error": "Insufficient periods for trend analysis"}
# Analyze trends
sorted_periods = sorted(period_metrics.keys())
mean_scores = [period_metrics[p]["mean_score"] for p in sorted_periods]
hire_rates = [period_metrics[p]["hire_rate"] for p in sorted_periods]
# Simple linear trend calculation
score_trend = self._calculate_linear_trend(mean_scores)
hire_rate_trend = self._calculate_linear_trend(hire_rates)
return {
"period": period,
"total_periods": len(period_metrics),
"period_metrics": period_metrics,
"trends": {
"score_trend": {
"direction": "increasing" if score_trend > 0.01 else "decreasing" if score_trend < -0.01 else "stable",
"slope": round(score_trend, 4),
"significance": "significant" if abs(score_trend) > 0.05 else "minor"
},
"hire_rate_trend": {
"direction": "increasing" if hire_rate_trend > 0.005 else "decreasing" if hire_rate_trend < -0.005 else "stable",
"slope": round(hire_rate_trend, 4),
"significance": "significant" if abs(hire_rate_trend) > 0.02 else "minor"
}
},
"insights": self._generate_trend_insights(score_trend, hire_rate_trend, period_metrics)
}
def _calculate_linear_trend(self, values: List[float]) -> float:
"""Calculate simple linear trend slope."""
if len(values) < 2:
return 0
n = len(values)
x = list(range(n))
# Calculate slope using least squares
x_mean = statistics.mean(x)
y_mean = statistics.mean(values)
numerator = sum((x[i] - x_mean) * (values[i] - y_mean) for i in range(n))
denominator = sum((x[i] - x_mean) ** 2 for i in range(n))
return numerator / denominator if denominator != 0 else 0
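# Worked example (sketch): for mean scores [2.0, 2.5, 3.0] over three periods,
# x = [0, 1, 2], x_mean = 1, y_mean = 2.5, so the slope is
# ((0-1)(2.0-2.5) + (1-1)(2.5-2.5) + (2-1)(3.0-2.5)) / ((0-1)^2 + 0 + (2-1)^2)
# = (0.5 + 0 + 0.5) / 2 = 0.5 per period, i.e. a clearly "increasing" trend.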
def _detect_score_inflation(self, scores: List[float]) -> Dict[str, Any]:
"""Detect if an interviewer shows score inflation patterns."""
if len(scores) < 5:
return {"insufficient_data": True}
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
# Check against expected mean (2.8)
expected_mean = self.calibration_standards["score_distribution"]["target_mean"]
deviation = mean_score - expected_mean
# High scores with low variance might indicate inflation
high_scores_low_variance = mean_score > 3.2 and std_score < 0.5
# Check distribution - too many 4s might indicate inflation
score_counts = Counter([int(score) for score in scores])
four_count_ratio = score_counts.get(4, 0) / len(scores)
return {
"mean_score": round(mean_score, 2),
"expected_mean": expected_mean,
"deviation": round(deviation, 2),
"high_scores_low_variance": high_scores_low_variance,
"four_count_ratio": round(four_count_ratio, 2),
"inflation_detected": deviation > 0.3 or high_scores_low_variance or four_count_ratio > 0.4
}
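# Worked example (sketch): a mean score of 3.4 against the expected 2.8 gives a
# deviation of 0.6 > 0.3, so inflation_detected is True even before the
# low-variance and share-of-4s checks are considered.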
def _calculate_interviewer_consistency(self, records: List[Dict[str, Any]]) -> float:
"""Calculate consistency score for an interviewer."""
if len(records) < 3:
return 0.5 # Neutral score for insufficient data
# Look at variance in scoring
avg_scores = [r["average_score"] for r in records]
score_variance = statistics.variance(avg_scores)
# Look at decision consistency relative to scores
decisions = [r["hire_decision"] for r in records]
scores_of_hires = [r["average_score"] for r in records if r["hire_decision"]]
scores_of_no_hires = [r["average_score"] for r in records if not r["hire_decision"]]
# Good consistency means hires have higher average scores
decision_consistency = 0.5
if scores_of_hires and scores_of_no_hires:
hire_mean = statistics.mean(scores_of_hires)
no_hire_mean = statistics.mean(scores_of_no_hires)
score_gap = hire_mean - no_hire_mean
decision_consistency = min(1.0, max(0.0, score_gap / 2.0)) # Normalize to 0-1
# Combine metrics (lower variance = higher consistency)
variance_consistency = max(0.0, 1.0 - (score_variance / 2.0))
return (decision_consistency + variance_consistency) / 2
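# Worked example (sketch): if an interviewer's hires average 3.4 and their
# no-hires average 2.2, the score gap is 1.2, so decision_consistency =
# min(1.0, 1.2 / 2.0) = 0.6. With a score variance of 0.4,
# variance_consistency = 1.0 - 0.4 / 2.0 = 0.8, giving an overall
# consistency of (0.6 + 0.8) / 2 = 0.7.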
def _calculate_overall_interviewer_consistency(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Calculate overall consistency across all interviewers."""
interviewer_consistency_scores = []
interviewer_records = defaultdict(list)
for record in data:
interviewer_records[record["interviewer_id"]].append(record)
for interviewer_id, records in interviewer_records.items():
if len(records) >= 3:
consistency = self._calculate_interviewer_consistency(records)
interviewer_consistency_scores.append(consistency)
if not interviewer_consistency_scores:
return {"error": "Insufficient data per interviewer for consistency analysis"}
return {
"mean_consistency": round(statistics.mean(interviewer_consistency_scores), 3),
"std_consistency": round(statistics.stdev(interviewer_consistency_scores) if len(interviewer_consistency_scores) > 1 else 0, 3),
"min_consistency": round(min(interviewer_consistency_scores), 3),
"max_consistency": round(max(interviewer_consistency_scores), 3),
"interviewers_analyzed": len(interviewer_consistency_scores),
"target_threshold": self.bias_thresholds["interviewer_consistency_threshold"]
}
def _calculate_bias_score(self, bias_analysis: Dict[str, Any]) -> float:
"""Calculate overall bias score (0-1, where 1 is most biased)."""
bias_factors = []
# Demographic bias factors
demographic_bias = bias_analysis.get("demographic_bias", {})
for demo, analysis in demographic_bias.items():
if analysis.get("bias_detected"):
bias_factors.append(0.3) # Each demographic bias adds 0.3
# Interviewer bias factors
interviewer_bias = bias_analysis.get("interviewer_bias", {})
outlier_interviewers = interviewer_bias.get("outlier_interviewers", {})
if outlier_interviewers:
# Scale by severity and number of outliers
total_severity = sum(info["severity"] for info in outlier_interviewers.values())
bias_factors.append(min(0.5, total_severity * 0.1))
# Competency bias factors
competency_bias = bias_analysis.get("competency_bias", {})
for comp, analysis in competency_bias.items():
if analysis.get("bias_detected"):
bias_factors.append(0.2) # Each competency bias adds 0.2
return min(1.0, sum(bias_factors))
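# Worked example (sketch): one flagged demographic category (0.3) plus outlier
# interviewers with a combined severity of 3 (min(0.5, 3 * 0.1) = 0.3) yields
# an overall bias score of 0.6 on the 0-1 scale.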
def _calculate_health_score(self, analysis: Dict[str, Any]) -> Dict[str, Any]:
"""Calculate overall calibration health score."""
health_factors = []
# Bias score (lower is better)
bias_analysis = analysis.get("bias_analysis", {})
bias_score = bias_analysis.get("overall_bias_score", 0)
bias_health = max(0, 1 - bias_score)
health_factors.append(("bias", bias_health, 0.3))
# Calibration consistency
calibration_analysis = analysis.get("calibration_analysis", {})
if "calibration_quality" in calibration_analysis:
quality_map = {"good": 1.0, "fair": 0.7, "poor": 0.3}
calibration_health = quality_map.get(calibration_analysis["calibration_quality"], 0.5)
health_factors.append(("calibration", calibration_health, 0.25))
# Interviewer consistency
interviewer_analysis = analysis.get("interviewer_analysis", {})
overall_consistency = interviewer_analysis.get("overall_consistency", {})
if "mean_consistency" in overall_consistency:
consistency_health = overall_consistency["mean_consistency"]
health_factors.append(("interviewer_consistency", consistency_health, 0.25))
# Scoring patterns health
scoring_analysis = analysis.get("scoring_analysis", {})
if "overall_assessment" in scoring_analysis:
assessment_map = {"healthy": 1.0, "concerning": 0.6, "poor": 0.2}
scoring_health = assessment_map.get(scoring_analysis["overall_assessment"], 0.5)
health_factors.append(("scoring_patterns", scoring_health, 0.2))
# Calculate weighted average
if health_factors:
weighted_sum = sum(score * weight for _, score, weight in health_factors)
total_weight = sum(weight for _, _, weight in health_factors)
overall_score = weighted_sum / total_weight
else:
overall_score = 0.5 # Neutral if no data
# Categorize health
if overall_score >= 0.8:
health_category = "excellent"
elif overall_score >= 0.7:
health_category = "good"
elif overall_score >= 0.5:
health_category = "fair"
else:
health_category = "poor"
return {
"overall_score": round(overall_score, 3),
"health_category": health_category,
"component_scores": {name: round(score, 3) for name, score, _ in health_factors},
"improvement_priority": self._identify_improvement_priorities(health_factors)
}
def _identify_improvement_priorities(self, health_factors: List[Tuple[str, float, float]]) -> List[str]:
"""Identify areas that need the most improvement."""
        # Impact = weighted deficit: low scores with high weights rank highest.
        impacts = {name: (1 - score) * weight for name, score, weight in health_factors}
        priorities = [name for name, impact in impacts.items() if impact > 0.15]  # significant-impact threshold
        # Sort by impact, highest first
        priorities.sort(key=lambda name: impacts[name], reverse=True)
        return priorities
def _generate_recommendations(self, analysis: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Generate actionable recommendations based on analysis results."""
recommendations = []
# Bias-related recommendations
bias_analysis = analysis.get("bias_analysis", {})
# Demographic bias recommendations
for demo, demo_analysis in bias_analysis.get("demographic_bias", {}).items():
if demo_analysis.get("bias_detected"):
recommendations.append({
"priority": "high",
"category": "bias_mitigation",
"title": f"Address {demo.replace('_', ' ').title()} Bias",
"description": demo_analysis.get("recommendation", f"Implement bias mitigation strategies for {demo}"),
"actions": [
"Conduct unconscious bias training focused on this demographic",
"Review and standardize interview questions",
"Implement diverse interview panels",
"Monitor hiring metrics by demographic group"
]
})
# Interviewer-specific recommendations
interviewer_analysis = bias_analysis.get("interviewer_bias", {})
outlier_interviewers = interviewer_analysis.get("outlier_interviewers", {})
for interviewer_id, outlier_info in outlier_interviewers.items():
issues = outlier_info["issues"]
priority = "high" if outlier_info["severity"] >= 3 else "medium"
actions = []
if "score_inflation" in issues:
actions.extend([
"Provide calibration training on scoring standards",
"Shadow experienced interviewers for recalibration",
"Review examples of each score level"
])
if "score_deflation" in issues:
actions.extend([
"Review expectations for role level",
"Calibrate against recent successful hires",
"Discuss evaluation criteria with hiring manager"
])
if "hire_rate_deviation" in issues:
actions.extend([
"Review hiring bar standards",
"Participate in calibration sessions",
"Compare decision criteria with team"
])
if "low_consistency" in issues:
actions.extend([
"Practice structured interviewing techniques",
"Use standardized scorecards",
"Document specific examples for each score"
])
recommendations.append({
"priority": priority,
"category": "interviewer_coaching",
"title": f"Coach Interviewer {interviewer_id}",
"description": f"Address issues: {', '.join(issues)}",
"actions": list(set(actions)) # Remove duplicates
})
# Calibration recommendations
calibration_analysis = analysis.get("calibration_analysis", {})
if calibration_analysis.get("calibration_quality") in ["fair", "poor"]:
recommendations.append({
"priority": "high",
"category": "calibration_improvement",
"title": "Improve Interview Calibration",
"description": f"Current calibration quality: {calibration_analysis.get('calibration_quality')}",
"actions": [
"Conduct monthly calibration sessions",
"Create shared examples of good/poor answers",
"Implement mandatory interviewer shadowing",
"Standardize scoring rubrics across all interviewers",
"Review and align on role expectations"
]
})
# Scoring pattern recommendations
scoring_analysis = analysis.get("scoring_analysis", {})
if scoring_analysis.get("overall_assessment") in ["concerning", "poor"]:
recommendations.append({
"priority": "medium",
"category": "scoring_standards",
"title": "Adjust Scoring Standards",
"description": "Scoring patterns deviate significantly from expected distribution",
"actions": [
"Review and communicate target score distributions",
"Provide examples for each score level",
"Monitor pass rates by role level",
"Adjust hiring bar if consistently too high/low"
]
})
# Health score recommendations
health_score = analysis.get("calibration_health_score", {})
priorities = health_score.get("improvement_priority", [])
if "bias" in priorities:
recommendations.append({
"priority": "critical",
"category": "bias_mitigation",
"title": "Implement Comprehensive Bias Mitigation",
"description": "Multiple bias indicators detected across the hiring process",
"actions": [
"Mandatory unconscious bias training for all interviewers",
"Implement structured interview protocols",
"Diversify interview panels",
"Regular bias audits and monitoring",
"Create accountability metrics for fair hiring"
]
})
# Sort by priority
priority_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
recommendations.sort(key=lambda x: priority_order.get(x["priority"], 3))
return recommendations
def _generate_demographic_bias_recommendation(self, demographic: str, bias_details: Dict[str, Any]) -> str:
"""Generate specific recommendation for demographic bias."""
if "hire_rate_disparity" in bias_details:
return f"Significant hire rate disparity detected for {demographic}. Implement structured interviews and diverse panels."
elif "scoring_disparity" in bias_details:
return f"Scoring disparity detected for {demographic}. Provide unconscious bias training and standardize evaluation criteria."
else:
return f"Potential bias detected for {demographic}. Monitor closely and implement bias mitigation strategies."
def _generate_interviewer_recommendations(self, outlier_interviewers: Dict[str, Any]) -> List[str]:
"""Generate recommendations for interviewer issues."""
if not outlier_interviewers:
return ["All interviewers performing within expected ranges"]
recommendations = []
for interviewer, info in outlier_interviewers.items():
issues = info["issues"]
if len(issues) >= 2:
recommendations.append(f"Interviewer {interviewer}: Requires comprehensive recalibration - multiple issues detected")
elif "score_inflation" in issues:
recommendations.append(f"Interviewer {interviewer}: Provide calibration training on scoring standards")
elif "hire_rate_deviation" in issues:
recommendations.append(f"Interviewer {interviewer}: Review hiring bar standards and decision criteria")
return recommendations
def _generate_calibration_recommendations(self, mean_std: float, agreement_rate: float) -> List[str]:
"""Generate calibration improvement recommendations."""
recommendations = []
if mean_std > self.calibration_standards["interviewer_agreement"]["maximum_std_deviation"]:
recommendations.append("High score variance detected - implement regular calibration sessions")
recommendations.append("Create shared examples of scoring standards for each competency")
if agreement_rate < self.calibration_standards["interviewer_agreement"]["agreement_threshold"]:
recommendations.append("Low interviewer agreement rate - standardize interview questions and evaluation criteria")
recommendations.append("Implement mandatory interviewer training on consistent evaluation")
if not recommendations:
recommendations.append("Calibration appears healthy - maintain current practices")
return recommendations
def _assess_scoring_health(self, distribution: Dict[str, Any], mean_score: float, target_mean: float) -> str:
"""Assess overall health of scoring patterns."""
issues = 0
# Check distribution deviations
        for analysis in distribution.values():
            if analysis.get("significant_deviation"):
                issues += 1
# Check mean deviation
if abs(mean_score - target_mean) > 0.3:
issues += 1
if issues == 0:
return "healthy"
elif issues <= 2:
return "concerning"
else:
return "poor"
def _generate_trend_insights(self, score_trend: float, hire_rate_trend: float, period_metrics: Dict[str, Any]) -> List[str]:
"""Generate insights from trend analysis."""
insights = []
if abs(score_trend) > 0.05:
direction = "increasing" if score_trend > 0 else "decreasing"
insights.append(f"Significant {direction} trend in average scores over time")
if score_trend > 0:
insights.append("May indicate score inflation or improving candidate quality")
else:
insights.append("May indicate stricter evaluation or declining candidate quality")
if abs(hire_rate_trend) > 0.02:
direction = "increasing" if hire_rate_trend > 0 else "decreasing"
insights.append(f"Significant {direction} trend in hire rates over time")
            if hire_rate_trend > 0:
                insights.append("Consider whether the hiring bar has been lowered or the candidate pool has improved")
            else:
                insights.append("Consider whether the hiring bar has been raised or the candidate pool has declined")
# Check for consistency
period_values = list(period_metrics.values())
hire_rates = [p["hire_rate"] for p in period_values]
hire_rate_variance = statistics.variance(hire_rates) if len(hire_rates) > 1 else 0
if hire_rate_variance > 0.01: # High variance in hire rates
insights.append("High variance in hire rates across periods - consider process standardization")
if not insights:
insights.append("Hiring patterns appear stable over time")
return insights
def _analyze_single_interviewer_consistency(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Analyze consistency for single-interviewer candidates."""
# Look at consistency within individual interviewers
interviewer_scores = defaultdict(list)
for record in data:
interviewer_scores[record["interviewer_id"]].extend(record["scores"].values())
consistency_analysis = {}
for interviewer, scores in interviewer_scores.items():
if len(scores) >= 10: # Need sufficient data
consistency_analysis[interviewer] = {
"mean_score": round(statistics.mean(scores), 2),
"std_score": round(statistics.stdev(scores), 2),
"coefficient_of_variation": round(statistics.stdev(scores) / statistics.mean(scores), 2),
"total_scores": len(scores)
}
return consistency_analysis
def format_human_readable(calibration_report: Dict[str, Any]) -> str:
"""Format calibration report in human-readable format."""
output = []
# Header
output.append("HIRING CALIBRATION ANALYSIS REPORT")
output.append("=" * 60)
output.append(f"Analysis Type: {calibration_report.get('analysis_type', 'N/A').title()}")
output.append(f"Generated: {calibration_report.get('generated_at', 'N/A')}")
if "error" in calibration_report:
output.append(f"\nError: {calibration_report['error']}")
return "\n".join(output)
# Data Summary
data_summary = calibration_report.get("data_summary", {})
if data_summary:
output.append(f"\nDATA SUMMARY")
output.append("-" * 30)
output.append(f"Total Candidates: {data_summary.get('total_candidates', 0)}")
output.append(f"Unique Interviewers: {data_summary.get('unique_interviewers', 0)}")
output.append(f"Overall Hire Rate: {data_summary.get('hire_rate', 0):.1%}")
score_stats = data_summary.get("score_statistics", {})
output.append(f"Average Score: {score_stats.get('mean_average_scores', 0):.2f}")
output.append(f"Score Std Dev: {score_stats.get('std_average_scores', 0):.2f}")
# Health Score
health_score = calibration_report.get("calibration_health_score", {})
if health_score:
output.append(f"\nCALIBRATION HEALTH SCORE")
output.append("-" * 30)
output.append(f"Overall Score: {health_score.get('overall_score', 0):.3f}")
output.append(f"Health Category: {health_score.get('health_category', 'Unknown').title()}")
if health_score.get("improvement_priority"):
output.append(f"Priority Areas: {', '.join(health_score['improvement_priority'])}")
# Bias Analysis
bias_analysis = calibration_report.get("bias_analysis", {})
if bias_analysis:
output.append(f"\nBIAS ANALYSIS")
output.append("-" * 30)
output.append(f"Overall Bias Score: {bias_analysis.get('overall_bias_score', 0):.3f}")
# Demographic bias
demographic_bias = bias_analysis.get("demographic_bias", {})
if demographic_bias:
output.append(f"\nDemographic Bias Issues:")
for demo, analysis in demographic_bias.items():
output.append(f" • {demo.replace('_', ' ').title()}: {analysis.get('bias_details', {}).keys()}")
# Interviewer bias
interviewer_bias = bias_analysis.get("interviewer_bias", {})
outlier_interviewers = interviewer_bias.get("outlier_interviewers", {})
if outlier_interviewers:
output.append(f"\nOutlier Interviewers:")
for interviewer, info in outlier_interviewers.items():
issues = ", ".join(info["issues"])
output.append(f" • {interviewer}: {issues}")
# Calibration Analysis
calibration_analysis = calibration_report.get("calibration_analysis", {})
if calibration_analysis and "error" not in calibration_analysis:
output.append(f"\nCALIBRATION CONSISTENCY")
output.append("-" * 30)
output.append(f"Quality: {calibration_analysis.get('calibration_quality', 'Unknown').title()}")
output.append(f"Agreement Rate: {calibration_analysis.get('agreement_within_one_point_rate', 0):.1%}")
output.append(f"Score Std Dev: {calibration_analysis.get('mean_score_standard_deviation', 0):.3f}")
# Scoring Analysis
scoring_analysis = calibration_report.get("scoring_analysis", {})
if scoring_analysis:
output.append(f"\nSCORING PATTERNS")
output.append("-" * 30)
output.append(f"Overall Assessment: {scoring_analysis.get('overall_assessment', 'Unknown').title()}")
score_stats = scoring_analysis.get("score_statistics", {})
output.append(f"Mean Score: {score_stats.get('mean_score', 0):.2f} (Target: {score_stats.get('target_mean', 0):.2f})")
# Distribution analysis
distribution = scoring_analysis.get("score_distribution", {})
if distribution:
output.append(f"\nScore Distribution vs Expected:")
for score in ["1", "2", "3", "4"]:
if score in distribution:
actual = distribution[score]["actual_percentage"]
expected = distribution[score]["expected_percentage"]
output.append(f" Score {score}: {actual:.1%} (Expected: {expected:.1%})")
# Top Recommendations
recommendations = calibration_report.get("recommendations", [])
if recommendations:
output.append(f"\nTOP RECOMMENDATIONS")
output.append("-" * 30)
for i, rec in enumerate(recommendations[:5], 1): # Show top 5
output.append(f"{i}. {rec['title']} ({rec['priority'].title()} Priority)")
output.append(f" {rec['description']}")
if rec.get('actions'):
output.append(f" Actions: {len(rec['actions'])} specific action items")
return "\n".join(output)
def main():
parser = argparse.ArgumentParser(description="Analyze interview data for bias and calibration issues")
parser.add_argument("--input", type=str, required=True, help="Input JSON file with interview results data")
parser.add_argument("--analysis-type", type=str, choices=["comprehensive", "bias", "calibration", "interviewer", "scoring"],
default="comprehensive", help="Type of analysis to perform")
parser.add_argument("--competencies", type=str, help="Comma-separated list of competencies to focus on")
parser.add_argument("--trend-analysis", action="store_true", help="Perform trend analysis over time")
parser.add_argument("--period", type=str, choices=["daily", "weekly", "monthly", "quarterly"],
default="monthly", help="Time period for trend analysis")
parser.add_argument("--output", type=str, help="Output file path")
parser.add_argument("--format", choices=["json", "text", "both"], default="both", help="Output format")
args = parser.parse_args()
# Load input data
try:
with open(args.input, 'r') as f:
interview_data = json.load(f)
if not isinstance(interview_data, list):
print("Error: Input data must be a JSON array of interview records")
sys.exit(1)
except FileNotFoundError:
print(f"Error: Input file '{args.input}' not found")
sys.exit(1)
except json.JSONDecodeError as e:
print(f"Error: Invalid JSON in input file: {e}")
sys.exit(1)
except Exception as e:
print(f"Error reading input file: {e}")
sys.exit(1)
# Initialize calibrator and run analysis
calibrator = HiringCalibrator()
    competencies = [c.strip() for c in args.competencies.split(",")] if args.competencies else None
try:
results = calibrator.analyze_hiring_calibration(
interview_data=interview_data,
analysis_type=args.analysis_type,
competencies=competencies,
trend_analysis=args.trend_analysis,
period=args.period
)
# Handle output
        if args.output:
            output_path = args.output
            json_path = output_path if output_path.endswith(".json") else f"{output_path}.json"
            # Swap only the trailing extension (str.replace would rewrite every '.json' occurrence)
            text_path = f"{output_path[:-len('.json')]}.txt" if output_path.endswith(".json") else f"{output_path}.txt"
else:
base_filename = f"calibration_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
json_path = f"{base_filename}.json"
text_path = f"{base_filename}.txt"
# Write outputs
if args.format in ["json", "both"]:
with open(json_path, 'w') as f:
json.dump(results, f, indent=2, default=str)
print(f"JSON report written to: {json_path}")
if args.format in ["text", "both"]:
with open(text_path, 'w') as f:
f.write(format_human_readable(results))
print(f"Text report written to: {text_path}")
# Print summary
print(f"\nCalibration Analysis Summary:")
if "error" in results:
print(f"Error: {results['error']}")
else:
health_score = results.get("calibration_health_score", {})
print(f"Health Score: {health_score.get('overall_score', 0):.3f} ({health_score.get('health_category', 'Unknown').title()})")
bias_score = results.get("bias_analysis", {}).get("overall_bias_score", 0)
print(f"Bias Score: {bias_score:.3f} (Lower is better)")
recommendations = results.get("recommendations", [])
print(f"Recommendations Generated: {len(recommendations)}")
if recommendations:
print(f"Top Priority: {recommendations[0]['title']} ({recommendations[0]['priority'].title()})")
except Exception as e:
print(f"Error during analysis: {e}")
sys.exit(1)
if __name__ == "__main__":
main() #!/usr/bin/env python3
"""
Interview Loop Designer
Generates calibrated interview loops tailored to specific roles, levels, and teams.
Creates complete interview loops with rounds, focus areas, time allocation,
interviewer skill requirements, and scorecard templates.
Usage:
python loop_designer.py --role "Senior Software Engineer" --level senior --team platform
python loop_designer.py --role "Product Manager" --level mid --competencies leadership,strategy
python loop_designer.py --input role_definition.json --output loops/
"""
import os
import sys
import json
import argparse
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any, Tuple
from collections import defaultdict
class InterviewLoopDesigner:
"""Designs comprehensive interview loops based on role requirements."""
def __init__(self):
self.competency_frameworks = self._init_competency_frameworks()
self.role_templates = self._init_role_templates()
self.interviewer_skills = self._init_interviewer_skills()
def _init_competency_frameworks(self) -> Dict[str, Dict]:
"""Initialize competency frameworks for different roles."""
return {
"software_engineer": {
"junior": {
"required": ["coding_fundamentals", "debugging", "testing_basics", "version_control"],
"preferred": ["system_understanding", "code_review", "collaboration"],
"focus_areas": ["technical_execution", "learning_agility", "team_collaboration"]
},
"mid": {
"required": ["advanced_coding", "system_design_basics", "testing_strategy", "debugging_complex"],
"preferred": ["mentoring_basics", "technical_communication", "project_ownership"],
"focus_areas": ["technical_depth", "system_thinking", "ownership"]
},
"senior": {
"required": ["system_architecture", "technical_leadership", "mentoring", "cross_team_collab"],
"preferred": ["technology_evaluation", "process_improvement", "hiring_contribution"],
"focus_areas": ["technical_leadership", "system_architecture", "people_development"]
},
"staff": {
"required": ["architectural_vision", "organizational_impact", "technical_strategy", "team_building"],
"preferred": ["industry_influence", "innovation_leadership", "executive_communication"],
"focus_areas": ["organizational_impact", "technical_vision", "strategic_influence"]
},
"principal": {
"required": ["company_wide_impact", "technical_vision", "talent_development", "strategic_planning"],
"preferred": ["industry_leadership", "board_communication", "market_influence"],
"focus_areas": ["strategic_leadership", "organizational_transformation", "external_influence"]
}
},
"product_manager": {
"junior": {
"required": ["product_execution", "user_research", "data_analysis", "stakeholder_comm"],
"preferred": ["market_awareness", "technical_understanding", "project_management"],
"focus_areas": ["execution_excellence", "user_focus", "analytical_thinking"]
},
"mid": {
"required": ["product_strategy", "cross_functional_leadership", "metrics_design", "market_analysis"],
"preferred": ["team_building", "technical_collaboration", "competitive_analysis"],
"focus_areas": ["strategic_thinking", "leadership", "business_impact"]
},
"senior": {
"required": ["business_strategy", "team_leadership", "p&l_ownership", "market_positioning"],
"preferred": ["hiring_leadership", "board_communication", "partnership_development"],
"focus_areas": ["business_leadership", "market_strategy", "organizational_impact"]
},
"staff": {
"required": ["portfolio_management", "organizational_leadership", "strategic_planning", "market_creation"],
"preferred": ["executive_presence", "investor_relations", "acquisition_strategy"],
"focus_areas": ["strategic_leadership", "market_innovation", "organizational_transformation"]
}
},
"designer": {
"junior": {
"required": ["design_fundamentals", "user_research", "prototyping", "design_tools"],
"preferred": ["user_empathy", "visual_design", "collaboration"],
"focus_areas": ["design_execution", "user_research", "creative_problem_solving"]
},
"mid": {
"required": ["design_systems", "user_testing", "cross_functional_collab", "design_strategy"],
"preferred": ["mentoring", "process_improvement", "business_understanding"],
"focus_areas": ["design_leadership", "system_thinking", "business_impact"]
},
"senior": {
"required": ["design_leadership", "team_building", "strategic_design", "stakeholder_management"],
"preferred": ["design_culture", "hiring_leadership", "executive_communication"],
"focus_areas": ["design_strategy", "team_leadership", "organizational_impact"]
}
},
"data_scientist": {
"junior": {
"required": ["statistical_analysis", "python_r", "data_visualization", "sql"],
"preferred": ["machine_learning", "business_understanding", "communication"],
"focus_areas": ["analytical_skills", "technical_execution", "business_impact"]
},
"mid": {
"required": ["advanced_ml", "experiment_design", "data_engineering", "stakeholder_comm"],
"preferred": ["mentoring", "project_leadership", "product_collaboration"],
"focus_areas": ["advanced_analytics", "project_leadership", "cross_functional_impact"]
},
"senior": {
"required": ["data_strategy", "team_leadership", "ml_systems", "business_strategy"],
"preferred": ["hiring_leadership", "executive_communication", "technology_evaluation"],
"focus_areas": ["strategic_leadership", "technical_vision", "organizational_impact"]
}
},
"devops_engineer": {
"junior": {
"required": ["infrastructure_basics", "scripting", "monitoring", "troubleshooting"],
"preferred": ["automation", "cloud_platforms", "security_awareness"],
"focus_areas": ["operational_excellence", "automation_mindset", "problem_solving"]
},
"mid": {
"required": ["ci_cd_design", "infrastructure_as_code", "security_implementation", "performance_optimization"],
"preferred": ["team_collaboration", "incident_management", "capacity_planning"],
"focus_areas": ["system_reliability", "automation_leadership", "cross_team_collaboration"]
},
"senior": {
"required": ["platform_architecture", "team_leadership", "security_strategy", "organizational_impact"],
"preferred": ["hiring_contribution", "technology_evaluation", "executive_communication"],
"focus_areas": ["platform_leadership", "strategic_thinking", "organizational_transformation"]
}
},
"engineering_manager": {
"junior": {
"required": ["team_leadership", "technical_background", "people_management", "project_coordination"],
"preferred": ["hiring_experience", "performance_management", "technical_mentoring"],
"focus_areas": ["people_leadership", "team_building", "execution_excellence"]
},
"senior": {
"required": ["organizational_leadership", "strategic_planning", "talent_development", "cross_functional_leadership"],
"preferred": ["technical_vision", "culture_building", "executive_communication"],
"focus_areas": ["organizational_impact", "strategic_leadership", "talent_development"]
},
"staff": {
"required": ["multi_team_leadership", "organizational_strategy", "executive_presence", "cultural_transformation"],
"preferred": ["board_communication", "market_understanding", "acquisition_integration"],
"focus_areas": ["organizational_transformation", "strategic_leadership", "cultural_evolution"]
}
}
}
def _init_role_templates(self) -> Dict[str, Dict]:
"""Initialize role-specific interview templates."""
return {
"software_engineer": {
"core_rounds": ["technical_phone_screen", "coding_deep_dive", "system_design", "behavioral"],
"optional_rounds": ["technical_leadership", "domain_expertise", "culture_fit"],
"total_duration_range": (180, 360), # 3-6 hours
"required_competencies": ["coding", "problem_solving", "communication"]
},
"product_manager": {
"core_rounds": ["product_sense", "analytical_thinking", "execution_process", "behavioral"],
"optional_rounds": ["strategic_thinking", "technical_collaboration", "leadership"],
"total_duration_range": (180, 300), # 3-5 hours
"required_competencies": ["product_strategy", "analytical_thinking", "stakeholder_management"]
},
"designer": {
"core_rounds": ["portfolio_review", "design_challenge", "collaboration_process", "behavioral"],
"optional_rounds": ["design_system_thinking", "research_methodology", "leadership"],
"total_duration_range": (180, 300), # 3-5 hours
"required_competencies": ["design_process", "user_empathy", "visual_communication"]
},
"data_scientist": {
"core_rounds": ["technical_assessment", "case_study", "statistical_thinking", "behavioral"],
"optional_rounds": ["ml_systems", "business_strategy", "technical_leadership"],
"total_duration_range": (210, 330), # 3.5-5.5 hours
"required_competencies": ["statistical_analysis", "programming", "business_acumen"]
},
"devops_engineer": {
"core_rounds": ["technical_assessment", "system_design", "troubleshooting", "behavioral"],
"optional_rounds": ["security_assessment", "automation_design", "leadership"],
"total_duration_range": (180, 300), # 3-5 hours
"required_competencies": ["infrastructure", "automation", "problem_solving"]
},
"engineering_manager": {
"core_rounds": ["leadership_assessment", "technical_background", "people_management", "behavioral"],
"optional_rounds": ["strategic_thinking", "hiring_assessment", "culture_building"],
"total_duration_range": (240, 360), # 4-6 hours
"required_competencies": ["people_leadership", "technical_understanding", "strategic_thinking"]
}
}
def _init_interviewer_skills(self) -> Dict[str, Dict]:
"""Initialize interviewer skill requirements for different round types."""
return {
"technical_phone_screen": {
"required_skills": ["technical_assessment", "coding_evaluation"],
"preferred_experience": ["same_domain", "senior_level"],
"calibration_level": "standard"
},
"coding_deep_dive": {
"required_skills": ["advanced_technical", "code_quality_assessment"],
"preferred_experience": ["senior_engineer", "system_design"],
"calibration_level": "high"
},
"system_design": {
"required_skills": ["architecture_design", "scalability_assessment"],
"preferred_experience": ["senior_architect", "large_scale_systems"],
"calibration_level": "high"
},
"behavioral": {
"required_skills": ["behavioral_interviewing", "competency_assessment"],
"preferred_experience": ["hiring_manager", "people_leadership"],
"calibration_level": "standard"
},
"technical_leadership": {
"required_skills": ["leadership_assessment", "technical_mentoring"],
"preferred_experience": ["engineering_manager", "tech_lead"],
"calibration_level": "high"
},
"product_sense": {
"required_skills": ["product_evaluation", "market_analysis"],
"preferred_experience": ["product_manager", "product_leadership"],
"calibration_level": "high"
},
"analytical_thinking": {
"required_skills": ["data_analysis", "metrics_evaluation"],
"preferred_experience": ["data_analyst", "product_manager"],
"calibration_level": "standard"
},
"design_challenge": {
"required_skills": ["design_evaluation", "user_experience"],
"preferred_experience": ["senior_designer", "design_manager"],
"calibration_level": "high"
}
}
def generate_interview_loop(self, role: str, level: str, team: Optional[str] = None,
competencies: Optional[List[str]] = None) -> Dict[str, Any]:
"""Generate a complete interview loop for the specified role and level."""
# Normalize inputs
role_key = role.lower().replace(" ", "_").replace("-", "_")
level_key = level.lower()
# Get role template and competency requirements
if role_key not in self.competency_frameworks:
role_key = self._find_closest_role(role_key)
if level_key not in self.competency_frameworks[role_key]:
level_key = self._find_closest_level(role_key, level_key)
competency_req = self.competency_frameworks[role_key][level_key]
role_template = self.role_templates.get(role_key, self.role_templates["software_engineer"])
# Design the interview loop
rounds = self._design_rounds(role_key, level_key, competency_req, role_template, competencies)
schedule = self._create_schedule(rounds)
scorecard = self._generate_scorecard(role_key, level_key, competency_req)
interviewer_requirements = self._define_interviewer_requirements(rounds)
return {
"role": role,
"level": level,
"team": team,
"generated_at": datetime.now().isoformat(),
"total_duration_minutes": sum(round_info["duration_minutes"] for round_info in rounds.values()),
"total_rounds": len(rounds),
"rounds": rounds,
"suggested_schedule": schedule,
"scorecard_template": scorecard,
"interviewer_requirements": interviewer_requirements,
"competency_framework": competency_req,
"calibration_notes": self._generate_calibration_notes(role_key, level_key)
}
def _find_closest_role(self, role_key: str) -> str:
"""Find the closest matching role template."""
role_mappings = {
"engineer": "software_engineer",
"developer": "software_engineer",
"swe": "software_engineer",
"backend": "software_engineer",
"frontend": "software_engineer",
"fullstack": "software_engineer",
"pm": "product_manager",
"product": "product_manager",
"ux": "designer",
"ui": "designer",
"graphic": "designer",
"data": "data_scientist",
"analyst": "data_scientist",
"ml": "data_scientist",
"ops": "devops_engineer",
"sre": "devops_engineer",
"infrastructure": "devops_engineer",
"manager": "engineering_manager",
"lead": "engineering_manager"
}
for key_part in role_key.split("_"):
if key_part in role_mappings:
return role_mappings[key_part]
return "software_engineer" # Default fallback
def _find_closest_level(self, role_key: str, level_key: str) -> str:
"""Find the closest matching level for the role."""
available_levels = list(self.competency_frameworks[role_key].keys())
level_mappings = {
"entry": "junior",
"associate": "junior",
"jr": "junior",
"mid": "mid",
"middle": "mid",
"sr": "senior",
"senior": "senior",
"staff": "staff",
"principal": "principal",
"lead": "senior",
"manager": "senior"
}
mapped_level = level_mappings.get(level_key, level_key)
if mapped_level in available_levels:
return mapped_level
elif "senior" in available_levels:
return "senior"
else:
return available_levels[0]
def _design_rounds(self, role_key: str, level_key: str, competency_req: Dict,
role_template: Dict, custom_competencies: Optional[List[str]]) -> Dict[str, Dict]:
"""Design the specific interview rounds based on role and level."""
rounds = {}
# Determine which rounds to include
core_rounds = role_template["core_rounds"].copy()
optional_rounds = role_template["optional_rounds"].copy()
# Add optional rounds based on level
if level_key in ["senior", "staff", "principal"]:
if "technical_leadership" in optional_rounds and role_key in ["software_engineer", "engineering_manager"]:
core_rounds.append("technical_leadership")
if "strategic_thinking" in optional_rounds and role_key in ["product_manager", "engineering_manager"]:
core_rounds.append("strategic_thinking")
if "design_system_thinking" in optional_rounds and role_key == "designer":
core_rounds.append("design_system_thinking")
if level_key in ["staff", "principal"]:
if "domain_expertise" in optional_rounds:
core_rounds.append("domain_expertise")
# Define round details
round_definitions = self._get_round_definitions()
for i, round_type in enumerate(core_rounds, 1):
if round_type in round_definitions:
round_def = round_definitions[round_type].copy()
round_def["order"] = i
round_def["focus_areas"] = self._customize_focus_areas(round_type, competency_req, custom_competencies)
rounds[f"round_{i}_{round_type}"] = round_def
return rounds
def _get_round_definitions(self) -> Dict[str, Dict]:
"""Get predefined round definitions with standard durations and formats."""
return {
"technical_phone_screen": {
"name": "Technical Phone Screen",
"duration_minutes": 45,
"format": "virtual",
"objectives": ["Assess coding fundamentals", "Evaluate problem-solving approach", "Screen for basic technical competency"],
"question_types": ["coding_problems", "technical_concepts", "experience_questions"],
"evaluation_criteria": ["technical_accuracy", "problem_solving_process", "communication_clarity"]
},
"coding_deep_dive": {
"name": "Coding Deep Dive",
"duration_minutes": 75,
"format": "in_person_or_virtual",
"objectives": ["Evaluate coding skills in depth", "Assess code quality and testing", "Review debugging approach"],
"question_types": ["complex_coding_problems", "code_review", "testing_strategy"],
"evaluation_criteria": ["code_quality", "testing_approach", "debugging_skills", "optimization_thinking"]
},
"system_design": {
"name": "System Design",
"duration_minutes": 75,
"format": "collaborative_whiteboard",
"objectives": ["Assess architectural thinking", "Evaluate scalability considerations", "Review trade-off analysis"],
"question_types": ["system_architecture", "scalability_design", "trade_off_analysis"],
"evaluation_criteria": ["architectural_thinking", "scalability_awareness", "trade_off_reasoning"]
},
"behavioral": {
"name": "Behavioral Interview",
"duration_minutes": 45,
"format": "conversational",
"objectives": ["Assess cultural fit", "Evaluate past experiences", "Review leadership examples"],
"question_types": ["star_method_questions", "situational_scenarios", "values_alignment"],
"evaluation_criteria": ["communication_skills", "leadership_examples", "cultural_alignment"]
},
"technical_leadership": {
"name": "Technical Leadership",
"duration_minutes": 60,
"format": "discussion_based",
"objectives": ["Evaluate mentoring capability", "Assess technical decision making", "Review cross-team collaboration"],
"question_types": ["leadership_scenarios", "technical_decisions", "mentoring_examples"],
"evaluation_criteria": ["leadership_potential", "technical_judgment", "influence_skills"]
},
"product_sense": {
"name": "Product Sense",
"duration_minutes": 75,
"format": "case_study",
"objectives": ["Assess product intuition", "Evaluate user empathy", "Review market understanding"],
"question_types": ["product_scenarios", "feature_prioritization", "user_journey_analysis"],
"evaluation_criteria": ["product_intuition", "user_empathy", "analytical_thinking"]
},
"analytical_thinking": {
"name": "Analytical Thinking",
"duration_minutes": 60,
"format": "data_analysis",
"objectives": ["Evaluate data interpretation", "Assess metric design", "Review experiment planning"],
"question_types": ["data_interpretation", "metric_design", "experiment_analysis"],
"evaluation_criteria": ["analytical_rigor", "metric_intuition", "experimental_thinking"]
},
"design_challenge": {
"name": "Design Challenge",
"duration_minutes": 90,
"format": "hands_on_design",
"objectives": ["Assess design process", "Evaluate user-centered thinking", "Review iteration approach"],
"question_types": ["design_problems", "user_research", "design_critique"],
"evaluation_criteria": ["design_process", "user_focus", "visual_communication"]
},
"portfolio_review": {
"name": "Portfolio Review",
"duration_minutes": 75,
"format": "presentation_discussion",
"objectives": ["Review past work", "Assess design thinking", "Evaluate impact measurement"],
"question_types": ["portfolio_walkthrough", "design_decisions", "impact_stories"],
"evaluation_criteria": ["design_quality", "process_thinking", "business_impact"]
}
}
def _customize_focus_areas(self, round_type: str, competency_req: Dict,
custom_competencies: Optional[List[str]]) -> List[str]:
"""Customize focus areas based on role competency requirements."""
base_focus_areas = competency_req.get("focus_areas", [])
round_focus_mapping = {
"technical_phone_screen": ["coding_fundamentals", "problem_solving"],
"coding_deep_dive": ["technical_execution", "code_quality"],
"system_design": ["system_thinking", "architectural_reasoning"],
"behavioral": ["cultural_fit", "communication", "teamwork"],
"technical_leadership": ["leadership", "mentoring", "influence"],
"product_sense": ["product_intuition", "user_empathy"],
"analytical_thinking": ["data_analysis", "metric_design"],
"design_challenge": ["design_process", "user_focus"]
}
focus_areas = round_focus_mapping.get(round_type, [])
# Add custom competencies if specified
if custom_competencies:
focus_areas.extend([comp for comp in custom_competencies if comp not in focus_areas])
# Add role-specific focus areas
focus_areas.extend([area for area in base_focus_areas if area not in focus_areas])
return focus_areas[:5] # Limit to top 5 focus areas
def _create_schedule(self, rounds: Dict[str, Dict]) -> Dict[str, Any]:
"""Create a suggested interview schedule."""
sorted_rounds = sorted(rounds.items(), key=lambda x: x[1]["order"])
# Calculate optimal scheduling
total_duration = sum(round_info["duration_minutes"] for _, round_info in sorted_rounds)
if total_duration <= 240: # 4 hours or less - single day
schedule_type = "single_day"
day_structure = self._create_single_day_schedule(sorted_rounds)
else: # Multi-day schedule
schedule_type = "multi_day"
day_structure = self._create_multi_day_schedule(sorted_rounds)
return {
"type": schedule_type,
"total_duration_minutes": total_duration,
"recommended_breaks": self._calculate_breaks(total_duration),
"day_structure": day_structure,
"logistics_notes": self._generate_logistics_notes(sorted_rounds)
}
def _create_single_day_schedule(self, rounds: List[Tuple[str, Dict]]) -> Dict[str, Any]:
"""Create a single-day interview schedule."""
start_time = datetime.strptime("09:00", "%H:%M")
current_time = start_time
        schedule = []
        minutes_since_break = 0
        for round_name, round_info in rounds:
            # Insert a 15-minute break after roughly 90 minutes of back-to-back
            # interviews, then reset the counter so breaks don't repeat between
            # every remaining round.
            if minutes_since_break >= 90:
                schedule.append({
                    "type": "break",
                    "start_time": current_time.strftime("%H:%M"),
                    "duration_minutes": 15,
                    "end_time": (current_time + timedelta(minutes=15)).strftime("%H:%M")
                })
                current_time += timedelta(minutes=15)
                minutes_since_break = 0
            # Add the interview round
            end_time = current_time + timedelta(minutes=round_info["duration_minutes"])
            schedule.append({
                "type": "interview",
                "round_name": round_name,
                "title": round_info["name"],
                "start_time": current_time.strftime("%H:%M"),
                "end_time": end_time.strftime("%H:%M"),
                "duration_minutes": round_info["duration_minutes"],
                "format": round_info["format"]
            })
            current_time = end_time
            minutes_since_break += round_info["duration_minutes"]
return {
"day_1": {
"date": "TBD",
"start_time": start_time.strftime("%H:%M"),
"end_time": current_time.strftime("%H:%M"),
"rounds": schedule
}
}
def _create_multi_day_schedule(self, rounds: List[Tuple[str, Dict]]) -> Dict[str, Any]:
"""Create a multi-day interview schedule."""
# Split rounds across days (max 4 hours per day)
max_daily_minutes = 240
days = {}
current_day = 1
current_day_duration = 0
current_day_rounds = []
for round_name, round_info in rounds:
duration = round_info["duration_minutes"] + 15 # Add buffer time
if current_day_duration + duration > max_daily_minutes and current_day_rounds:
# Finalize current day
days[f"day_{current_day}"] = self._finalize_day_schedule(current_day_rounds)
current_day += 1
current_day_duration = 0
current_day_rounds = []
current_day_rounds.append((round_name, round_info))
current_day_duration += duration
# Finalize last day
if current_day_rounds:
days[f"day_{current_day}"] = self._finalize_day_schedule(current_day_rounds)
return days
def _finalize_day_schedule(self, day_rounds: List[Tuple[str, Dict]]) -> Dict[str, Any]:
"""Finalize the schedule for a specific day."""
start_time = datetime.strptime("09:00", "%H:%M")
current_time = start_time
schedule = []
for round_name, round_info in day_rounds:
end_time = current_time + timedelta(minutes=round_info["duration_minutes"])
schedule.append({
"type": "interview",
"round_name": round_name,
"title": round_info["name"],
"start_time": current_time.strftime("%H:%M"),
"end_time": end_time.strftime("%H:%M"),
"duration_minutes": round_info["duration_minutes"],
"format": round_info["format"]
})
current_time = end_time + timedelta(minutes=15) # 15-min buffer
return {
"date": "TBD",
"start_time": start_time.strftime("%H:%M"),
"end_time": (current_time - timedelta(minutes=15)).strftime("%H:%M"),
"rounds": schedule
}
def _calculate_breaks(self, total_duration: int) -> List[Dict[str, Any]]:
"""Calculate recommended breaks based on total duration."""
breaks = []
if total_duration >= 120: # 2+ hours
breaks.append({"type": "short_break", "duration": 15, "after_minutes": 90})
if total_duration >= 240: # 4+ hours
breaks.append({"type": "lunch_break", "duration": 60, "after_minutes": 180})
if total_duration >= 360: # 6+ hours
breaks.append({"type": "short_break", "duration": 15, "after_minutes": 300})
return breaks
def _generate_scorecard(self, role_key: str, level_key: str, competency_req: Dict) -> Dict[str, Any]:
"""Generate a scorecard template for the interview loop."""
scoring_dimensions = []
# Add competency-based scoring dimensions
for competency in competency_req["required"]:
scoring_dimensions.append({
"dimension": competency,
"weight": "high",
"scale": "1-4",
"description": f"Assessment of {competency.replace('_', ' ')} competency"
})
for competency in competency_req.get("preferred", []):
scoring_dimensions.append({
"dimension": competency,
"weight": "medium",
"scale": "1-4",
"description": f"Assessment of {competency.replace('_', ' ')} competency"
})
# Add standard dimensions
standard_dimensions = [
{"dimension": "communication", "weight": "high", "scale": "1-4"},
{"dimension": "cultural_fit", "weight": "medium", "scale": "1-4"},
{"dimension": "learning_agility", "weight": "medium", "scale": "1-4"}
]
scoring_dimensions.extend(standard_dimensions)
return {
"scoring_scale": {
"4": "Exceeds Expectations - Demonstrates mastery beyond required level",
"3": "Meets Expectations - Solid performance meeting all requirements",
"2": "Partially Meets - Shows potential but has development areas",
"1": "Does Not Meet - Significant gaps in required competencies"
},
"dimensions": scoring_dimensions,
"overall_recommendation": {
"options": ["Strong Hire", "Hire", "No Hire", "Strong No Hire"],
"criteria": "Based on weighted average and minimum thresholds"
},
"calibration_notes": {
"required": True,
"min_length": 100,
"sections": ["strengths", "areas_for_development", "specific_examples"]
}
}
def _define_interviewer_requirements(self, rounds: Dict[str, Dict]) -> Dict[str, Dict]:
"""Define interviewer skill requirements for each round."""
requirements = {}
for round_name, round_info in rounds.items():
round_type = round_name.split("_", 2)[-1] # Extract round type
if round_type in self.interviewer_skills:
skill_req = self.interviewer_skills[round_type].copy()
skill_req["suggested_interviewers"] = self._suggest_interviewer_profiles(round_type)
requirements[round_name] = skill_req
else:
# Default requirements
requirements[round_name] = {
"required_skills": ["interviewing_basics", "evaluation_skills"],
"preferred_experience": ["relevant_domain"],
"calibration_level": "standard",
"suggested_interviewers": ["experienced_interviewer"]
}
return requirements
def _suggest_interviewer_profiles(self, round_type: str) -> List[str]:
"""Suggest specific interviewer profiles for different round types."""
profile_mapping = {
"technical_phone_screen": ["senior_engineer", "tech_lead"],
"coding_deep_dive": ["senior_engineer", "staff_engineer"],
"system_design": ["senior_architect", "staff_engineer"],
"behavioral": ["hiring_manager", "people_manager"],
"technical_leadership": ["engineering_manager", "senior_staff"],
"product_sense": ["senior_pm", "product_leader"],
"analytical_thinking": ["senior_analyst", "data_scientist"],
"design_challenge": ["senior_designer", "design_manager"]
}
return profile_mapping.get(round_type, ["experienced_interviewer"])
def _generate_calibration_notes(self, role_key: str, level_key: str) -> Dict[str, Any]:
"""Generate calibration notes and best practices."""
return {
"hiring_bar_notes": f"Calibrated for {level_key} level {role_key.replace('_', ' ')} role",
"common_pitfalls": [
"Avoid comparing candidates to each other rather than to the role standard",
"Don't let one strong/weak area overshadow overall assessment",
"Ensure consistent application of evaluation criteria"
],
"calibration_checkpoints": [
"Review score distribution after every 5 candidates",
"Conduct monthly interviewer calibration sessions",
"Track correlation with 6-month performance reviews"
],
"escalation_criteria": [
"Any candidate receiving all 4s or all 1s",
"Significant disagreement between interviewers (>1.5 point spread)",
"Unusual circumstances or accommodations needed"
]
}
def _generate_logistics_notes(self, rounds: List[Tuple[str, Dict]]) -> List[str]:
"""Generate logistics and coordination notes."""
notes = [
"Coordinate interviewer availability before scheduling",
"Ensure all interviewers have access to job description and competency requirements",
"Prepare interview rooms/virtual links for all rounds",
"Share candidate resume and application with all interviewers"
]
# Add format-specific notes
formats_used = {round_info["format"] for _, round_info in rounds}
if "virtual" in formats_used:
notes.append("Test video conferencing setup before virtual interviews")
notes.append("Share virtual meeting links with candidate 24 hours in advance")
if "collaborative_whiteboard" in formats_used:
notes.append("Prepare whiteboard or collaborative online tool for design sessions")
if "hands_on_design" in formats_used:
notes.append("Provide design tools access or ensure candidate can screen share their preferred tools")
return notes
def format_human_readable(loop_data: Dict[str, Any]) -> str:
"""Format the interview loop data in a human-readable format."""
output = []
# Header
output.append(f"Interview Loop Design for {loop_data['role']} ({loop_data['level'].title()} Level)")
output.append("=" * 60)
if loop_data.get('team'):
output.append(f"Team: {loop_data['team']}")
output.append(f"Generated: {loop_data['generated_at']}")
output.append(f"Total Duration: {loop_data['total_duration_minutes']} minutes ({loop_data['total_duration_minutes']//60}h {loop_data['total_duration_minutes']%60}m)")
output.append(f"Total Rounds: {loop_data['total_rounds']}")
output.append("")
# Interview Rounds
output.append("INTERVIEW ROUNDS")
output.append("-" * 40)
sorted_rounds = sorted(loop_data['rounds'].items(), key=lambda x: x[1]['order'])
for round_name, round_info in sorted_rounds:
output.append(f"\nRound {round_info['order']}: {round_info['name']}")
output.append(f"Duration: {round_info['duration_minutes']} minutes")
output.append(f"Format: {round_info['format'].replace('_', ' ').title()}")
output.append("Objectives:")
for obj in round_info['objectives']:
output.append(f" • {obj}")
output.append("Focus Areas:")
for area in round_info['focus_areas']:
output.append(f" • {area.replace('_', ' ').title()}")
# Suggested Schedule
output.append("\nSUGGESTED SCHEDULE")
output.append("-" * 40)
schedule = loop_data['suggested_schedule']
output.append(f"Schedule Type: {schedule['type'].replace('_', ' ').title()}")
for day_name, day_info in schedule['day_structure'].items():
output.append(f"\n{day_name.replace('_', ' ').title()}:")
output.append(f"Time: {day_info['start_time']} - {day_info['end_time']}")
for item in day_info['rounds']:
if item['type'] == 'interview':
output.append(f" {item['start_time']}-{item['end_time']}: {item['title']} ({item['duration_minutes']}min)")
else:
output.append(f" {item['start_time']}-{item['end_time']}: {item['type'].title()} ({item['duration_minutes']}min)")
# Interviewer Requirements
output.append("\nINTERVIEWER REQUIREMENTS")
output.append("-" * 40)
for round_name, requirements in loop_data['interviewer_requirements'].items():
round_display = round_name.split("_", 2)[-1].replace("_", " ").title()
output.append(f"\n{round_display}:")
output.append(f"Required Skills: {', '.join(requirements['required_skills'])}")
output.append(f"Suggested Interviewers: {', '.join(requirements['suggested_interviewers'])}")
output.append(f"Calibration Level: {requirements['calibration_level'].title()}")
# Scorecard Overview
output.append("\nSCORECARD TEMPLATE")
output.append("-" * 40)
scorecard = loop_data['scorecard_template']
output.append("Scoring Scale:")
for score, description in scorecard['scoring_scale'].items():
output.append(f" {score}: {description}")
output.append("\nEvaluation Dimensions:")
for dim in scorecard['dimensions']:
output.append(f" • {dim['dimension'].replace('_', ' ').title()} (Weight: {dim['weight']})")
# Calibration Notes
output.append("\nCALIBRATION NOTES")
output.append("-" * 40)
calibration = loop_data['calibration_notes']
output.append(f"Hiring Bar: {calibration['hiring_bar_notes']}")
output.append("\nCommon Pitfalls:")
for pitfall in calibration['common_pitfalls']:
output.append(f" • {pitfall}")
return "\n".join(output)
def main():
parser = argparse.ArgumentParser(description="Generate calibrated interview loops for specific roles and levels")
parser.add_argument("--role", type=str, help="Job role title (e.g., 'Senior Software Engineer')")
parser.add_argument("--level", type=str, help="Experience level (junior, mid, senior, staff, principal)")
parser.add_argument("--team", type=str, help="Team or department (optional)")
parser.add_argument("--competencies", type=str, help="Comma-separated list of specific competencies to focus on")
parser.add_argument("--input", type=str, help="Input JSON file with role definition")
parser.add_argument("--output", type=str, help="Output directory or file path")
parser.add_argument("--format", choices=["json", "text", "both"], default="both", help="Output format")
args = parser.parse_args()
designer = InterviewLoopDesigner()
# Handle input
if args.input:
try:
with open(args.input, 'r') as f:
role_data = json.load(f)
role = role_data.get('role') or role_data.get('title', '')
level = role_data.get('level', 'senior')
team = role_data.get('team')
competencies = role_data.get('competencies')
except Exception as e:
print(f"Error reading input file: {e}")
sys.exit(1)
else:
if not args.role or not args.level:
print("Error: --role and --level are required when not using --input")
sys.exit(1)
role = args.role
level = args.level
team = args.team
        competencies = [c.strip() for c in args.competencies.split(',')] if args.competencies else None
# Generate interview loop
try:
loop_data = designer.generate_interview_loop(role, level, team, competencies)
# Handle output
if args.output:
output_path = args.output
if os.path.isdir(output_path):
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_interview_loop"
json_path = os.path.join(output_path, f"{base_filename}.json")
text_path = os.path.join(output_path, f"{base_filename}.txt")
else:
# Use provided path as base
json_path = output_path if output_path.endswith('.json') else f"{output_path}.json"
text_path = output_path.replace('.json', '.txt') if output_path.endswith('.json') else f"{output_path}.txt"
else:
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_interview_loop"
json_path = f"{base_filename}.json"
text_path = f"{base_filename}.txt"
# Write outputs
if args.format in ["json", "both"]:
with open(json_path, 'w') as f:
json.dump(loop_data, f, indent=2, default=str)
print(f"JSON output written to: {json_path}")
if args.format in ["text", "both"]:
with open(text_path, 'w') as f:
f.write(format_human_readable(loop_data))
print(f"Text output written to: {text_path}")
# Always print summary to stdout
print("\nInterview Loop Summary:")
print(f"Role: {loop_data['role']} ({loop_data['level'].title()})")
print(f"Total Duration: {loop_data['total_duration_minutes']} minutes")
print(f"Number of Rounds: {loop_data['total_rounds']}")
print(f"Schedule Type: {loop_data['suggested_schedule']['type'].replace('_', ' ').title()}")
except Exception as e:
print(f"Error generating interview loop: {e}")
sys.exit(1)
if __name__ == "__main__":
    main()
#!/usr/bin/env python3
"""
Question Bank Generator
Generates comprehensive, competency-based interview questions with detailed scoring criteria.
Creates structured question banks organized by competency area with scoring rubrics,
follow-up probes, and calibration examples.
Usage:
    python3 question_bank_generator.py --role "Frontend Engineer" --competencies react,typescript,system-design
    python3 question_bank_generator.py --role "Product Manager" --question-types behavioral,leadership
    python3 question_bank_generator.py --input role_requirements.json --output questions/
"""
import os
import sys
import json
import argparse
import random
from datetime import datetime
from typing import Dict, List, Optional, Any, Tuple
from collections import defaultdict
class QuestionBankGenerator:
"""Generates comprehensive interview question banks with scoring criteria."""
def __init__(self):
self.technical_questions = self._init_technical_questions()
self.behavioral_questions = self._init_behavioral_questions()
self.competency_mapping = self._init_competency_mapping()
self.scoring_rubrics = self._init_scoring_rubrics()
self.follow_up_strategies = self._init_follow_up_strategies()
def _init_technical_questions(self) -> Dict[str, Dict]:
"""Initialize technical questions by competency area and level."""
return {
"coding_fundamentals": {
"junior": [
{
"question": "Write a function to reverse a string without using built-in reverse methods.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "easy",
"time_limit": 15,
"key_concepts": ["loops", "string_manipulation", "basic_algorithms"]
},
{
"question": "Implement a function to check if a string is a palindrome.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "easy",
"time_limit": 15,
"key_concepts": ["string_processing", "comparison", "edge_cases"]
},
{
"question": "Find the largest element in an array without using built-in max functions.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "easy",
"time_limit": 10,
"key_concepts": ["arrays", "iteration", "comparison"]
}
],
"mid": [
{
"question": "Implement a function to find the first non-repeating character in a string.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "medium",
"time_limit": 20,
"key_concepts": ["hash_maps", "string_processing", "efficiency"]
},
{
"question": "Write a function to merge two sorted arrays into one sorted array.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "medium",
"time_limit": 25,
"key_concepts": ["merge_algorithms", "two_pointers", "optimization"]
}
],
"senior": [
{
                    "question": "Implement an LRU (Least Recently Used) cache with O(1) operations.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "hard",
"time_limit": 35,
"key_concepts": ["data_structures", "hash_maps", "doubly_linked_lists"]
}
]
},
"system_design": {
"mid": [
{
"question": "Design a URL shortener service like bit.ly for 10K users.",
"competency": "system_design",
"type": "design",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["database_design", "hashing", "basic_scalability"]
}
],
"senior": [
{
"question": "Design a real-time chat system supporting 1M concurrent users.",
"competency": "system_design",
"type": "design",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["websockets", "load_balancing", "database_sharding", "caching"]
},
{
"question": "Design a distributed cache system like Redis with high availability.",
"competency": "system_design",
"type": "design",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["distributed_systems", "replication", "consistency", "partitioning"]
}
],
"staff": [
{
"question": "Design the architecture for a global content delivery network (CDN).",
"competency": "system_design",
"type": "design",
"difficulty": "expert",
"time_limit": 75,
"key_concepts": ["global_architecture", "edge_computing", "content_optimization", "network_protocols"]
}
]
},
"frontend_development": {
"junior": [
{
"question": "Create a responsive navigation menu using HTML, CSS, and vanilla JavaScript.",
"competency": "frontend_development",
"type": "coding",
"difficulty": "easy",
"time_limit": 30,
"key_concepts": ["html_css", "responsive_design", "dom_manipulation"]
}
],
"mid": [
{
"question": "Build a React component that fetches and displays paginated data from an API.",
"competency": "frontend_development",
"type": "coding",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["react_hooks", "api_integration", "state_management", "pagination"]
}
],
"senior": [
{
"question": "Design and implement a custom React hook for managing complex form state with validation.",
"competency": "frontend_development",
"type": "coding",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["custom_hooks", "form_validation", "state_management", "performance"]
}
]
},
"data_analysis": {
"junior": [
{
"question": "Given a dataset of user activities, calculate the daily active users for the past month.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "easy",
"time_limit": 30,
"key_concepts": ["sql_basics", "date_functions", "aggregation"]
}
],
"mid": [
{
"question": "Analyze conversion funnel data to identify the biggest drop-off point and propose solutions.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["funnel_analysis", "conversion_optimization", "statistical_significance"]
}
],
"senior": [
{
"question": "Design an A/B testing framework to measure the impact of a new recommendation algorithm.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["experiment_design", "statistical_power", "bias_mitigation", "causal_inference"]
}
]
},
"machine_learning": {
"mid": [
{
"question": "Explain how you would build a recommendation system for an e-commerce platform.",
"competency": "machine_learning",
"type": "conceptual",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["collaborative_filtering", "content_based", "cold_start", "evaluation_metrics"]
}
],
"senior": [
{
"question": "Design a real-time fraud detection system for financial transactions.",
"competency": "machine_learning",
"type": "design",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["anomaly_detection", "real_time_ml", "feature_engineering", "model_monitoring"]
}
]
},
"product_strategy": {
"mid": [
{
"question": "How would you prioritize features for a mobile app with limited engineering resources?",
"competency": "product_strategy",
"type": "case_study",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["prioritization_frameworks", "resource_allocation", "impact_estimation"]
}
],
"senior": [
{
"question": "Design a go-to-market strategy for a new B2B SaaS product entering a competitive market.",
"competency": "product_strategy",
"type": "strategic",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["market_analysis", "competitive_positioning", "pricing_strategy", "channel_strategy"]
}
]
}
}
def _init_behavioral_questions(self) -> Dict[str, List[Dict]]:
"""Initialize behavioral questions by competency area."""
return {
"leadership": [
{
"question": "Tell me about a time when you had to lead a team through a significant change or challenge.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["change_management", "team_motivation", "communication"]
},
{
"question": "Describe a situation where you had to influence someone without having direct authority over them.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["influence", "persuasion", "stakeholder_management"]
},
{
"question": "Give me an example of when you had to make a difficult decision that affected your team.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["decision_making", "team_impact", "communication"]
}
],
"collaboration": [
{
"question": "Describe a time when you had to work with a difficult colleague or stakeholder.",
"competency": "collaboration",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["conflict_resolution", "relationship_building", "professionalism"]
},
{
"question": "Tell me about a project where you had to coordinate across multiple teams or departments.",
"competency": "collaboration",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["cross_functional_work", "communication", "project_coordination"]
}
],
"problem_solving": [
{
"question": "Walk me through a complex problem you solved recently. What was your approach?",
"competency": "problem_solving",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["analytical_thinking", "methodology", "creativity"]
},
{
"question": "Describe a time when you had to solve a problem with limited information or resources.",
"competency": "problem_solving",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["resourcefulness", "ambiguity_tolerance", "decision_making"]
}
],
"communication": [
{
"question": "Tell me about a time when you had to present complex technical information to a non-technical audience.",
"competency": "communication",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["technical_communication", "audience_adaptation", "clarity"]
},
{
"question": "Describe a situation where you had to deliver difficult feedback to a colleague.",
"competency": "communication",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["feedback_delivery", "empathy", "constructive_criticism"]
}
],
"adaptability": [
{
"question": "Tell me about a time when you had to quickly learn a new technology or skill for work.",
"competency": "adaptability",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["learning_agility", "growth_mindset", "knowledge_acquisition"]
},
{
"question": "Describe how you handled a situation when project requirements changed significantly mid-way.",
"competency": "adaptability",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["flexibility", "change_management", "resilience"]
}
],
"innovation": [
{
"question": "Tell me about a time when you came up with a creative solution to improve a process or solve a problem.",
"competency": "innovation",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["creative_thinking", "process_improvement", "initiative"]
}
]
}
def _init_competency_mapping(self) -> Dict[str, Dict]:
"""Initialize role to competency mapping."""
return {
"software_engineer": {
"core_competencies": ["coding_fundamentals", "system_design", "problem_solving", "collaboration"],
"level_specific": {
"junior": ["coding_fundamentals", "debugging", "learning_agility"],
"mid": ["advanced_coding", "system_design", "mentoring_basics"],
"senior": ["system_architecture", "technical_leadership", "innovation"],
"staff": ["architectural_vision", "organizational_impact", "strategic_thinking"]
}
},
"frontend_engineer": {
"core_competencies": ["frontend_development", "ui_ux_understanding", "problem_solving", "collaboration"],
"level_specific": {
"junior": ["html_css_js", "responsive_design", "basic_frameworks"],
"mid": ["react_vue_angular", "state_management", "performance_optimization"],
"senior": ["frontend_architecture", "team_leadership", "cross_functional_collaboration"],
"staff": ["frontend_strategy", "technology_evaluation", "organizational_impact"]
}
},
"backend_engineer": {
"core_competencies": ["backend_development", "database_design", "api_design", "system_design"],
"level_specific": {
"junior": ["server_side_programming", "database_basics", "api_consumption"],
"mid": ["microservices", "caching", "security_basics"],
"senior": ["distributed_systems", "performance_optimization", "technical_leadership"],
"staff": ["system_architecture", "technology_strategy", "cross_team_influence"]
}
},
"product_manager": {
"core_competencies": ["product_strategy", "user_research", "data_analysis", "stakeholder_management"],
"level_specific": {
"junior": ["feature_specification", "user_stories", "basic_analytics"],
"mid": ["product_roadmap", "cross_functional_leadership", "market_research"],
"senior": ["business_strategy", "team_leadership", "p&l_responsibility"],
"staff": ["portfolio_management", "organizational_strategy", "market_creation"]
}
},
"data_scientist": {
"core_competencies": ["statistical_analysis", "machine_learning", "data_analysis", "business_acumen"],
"level_specific": {
"junior": ["python_r", "sql", "basic_ml", "data_visualization"],
"mid": ["advanced_ml", "experiment_design", "model_evaluation"],
"senior": ["ml_systems", "data_strategy", "stakeholder_communication"],
"staff": ["data_platform", "ai_strategy", "organizational_impact"]
}
},
"designer": {
"core_competencies": ["design_process", "user_research", "visual_design", "collaboration"],
"level_specific": {
"junior": ["design_tools", "user_empathy", "visual_communication"],
"mid": ["design_systems", "user_testing", "cross_functional_work"],
"senior": ["design_strategy", "team_leadership", "business_impact"],
"staff": ["design_vision", "organizational_design", "strategic_influence"]
}
},
"devops_engineer": {
"core_competencies": ["infrastructure", "automation", "monitoring", "troubleshooting"],
"level_specific": {
"junior": ["scripting", "basic_cloud", "ci_cd_basics"],
"mid": ["infrastructure_as_code", "container_orchestration", "security"],
"senior": ["platform_design", "reliability_engineering", "team_leadership"],
"staff": ["platform_strategy", "organizational_infrastructure", "technology_vision"]
}
}
}
def _init_scoring_rubrics(self) -> Dict[str, Dict]:
"""Initialize scoring rubrics for different question types."""
return {
"coding": {
"correctness": {
"4": "Solution is completely correct, handles all edge cases, optimal complexity",
"3": "Solution is correct for main cases, good complexity, minor edge case issues",
"2": "Solution works but has some bugs or suboptimal approach",
"1": "Solution has significant issues or doesn't work"
},
"code_quality": {
"4": "Clean, readable, well-structured code with excellent naming and comments",
"3": "Good code structure, readable with appropriate naming",
"2": "Code works but has style/structure issues",
"1": "Poor code quality, hard to understand"
},
"problem_solving_approach": {
"4": "Excellent problem breakdown, clear thinking process, considers alternatives",
"3": "Good approach, logical thinking, systematic problem solving",
"2": "Decent approach but some confusion or inefficiency",
"1": "Poor approach, unclear thinking process"
},
"communication": {
"4": "Excellent explanation of approach, asks clarifying questions, clear reasoning",
"3": "Good communication, explains thinking well",
"2": "Adequate communication, some explanation",
"1": "Poor communication, little explanation"
}
},
"behavioral": {
"situation_clarity": {
"4": "Clear, specific situation with relevant context and stakes",
"3": "Good situation description with adequate context",
"2": "Situation described but lacks some specifics",
"1": "Vague or unclear situation description"
},
"action_quality": {
"4": "Specific, thoughtful actions showing strong competency",
"3": "Good actions demonstrating competency",
"2": "Adequate actions but could be stronger",
"1": "Weak or inappropriate actions"
},
"result_impact": {
"4": "Significant positive impact with measurable results",
"3": "Good positive impact with clear outcomes",
"2": "Some positive impact demonstrated",
"1": "Little or no positive impact shown"
},
"self_awareness": {
"4": "Excellent self-reflection, learns from experience, acknowledges growth areas",
"3": "Good self-awareness and learning orientation",
"2": "Some self-reflection demonstrated",
"1": "Limited self-awareness or reflection"
}
},
"design": {
"system_thinking": {
"4": "Comprehensive system view, considers all components and interactions",
"3": "Good system understanding with most components identified",
"2": "Basic system thinking with some gaps",
"1": "Limited system thinking, misses key components"
},
"scalability": {
"4": "Excellent scalability considerations, multiple strategies discussed",
"3": "Good scalability awareness with practical solutions",
"2": "Basic scalability understanding",
"1": "Little to no scalability consideration"
},
"trade_offs": {
"4": "Excellent trade-off analysis, considers multiple dimensions",
"3": "Good trade-off awareness with clear reasoning",
"2": "Some trade-off consideration",
"1": "Limited trade-off analysis"
},
"technical_depth": {
"4": "Deep technical knowledge with implementation details",
"3": "Good technical knowledge with solid understanding",
"2": "Adequate technical knowledge",
"1": "Limited technical depth"
}
}
}
def _init_follow_up_strategies(self) -> Dict[str, List[str]]:
"""Initialize follow-up question strategies by competency."""
return {
"coding_fundamentals": [
"How would you optimize this solution for better time complexity?",
"What edge cases should we consider for this problem?",
"How would you test this function?",
"What would happen if the input size was very large?"
],
"system_design": [
"How would you handle if the system needed to scale 10x?",
"What would you do if one of your services went down?",
"How would you monitor this system in production?",
"What security considerations would you implement?"
],
"leadership": [
"What would you do differently if you faced this situation again?",
"How did you handle team members who were resistant to the change?",
"What metrics did you use to measure success?",
"How did you communicate progress to stakeholders?"
],
"problem_solving": [
"Walk me through your thought process step by step",
"What alternative approaches did you consider?",
"How did you validate your solution worked?",
"What did you learn from this experience?"
],
"collaboration": [
"How did you build consensus among the different stakeholders?",
"What communication channels did you use to keep everyone aligned?",
"How did you handle disagreements or conflicts?",
"What would you do to improve collaboration in the future?"
]
}
def generate_question_bank(self, role: str, level: str = "senior",
competencies: Optional[List[str]] = None,
question_types: Optional[List[str]] = None,
num_questions: int = 20) -> Dict[str, Any]:
"""Generate a comprehensive question bank for the specified role and competencies."""
# Normalize inputs
role_key = self._normalize_role(role)
level_key = level.lower()
# Get competency requirements
role_competencies = self._get_role_competencies(role_key, level_key, competencies)
# Determine question types to include
if question_types is None:
question_types = ["technical", "behavioral", "situational"]
# Generate questions
questions = self._generate_questions(role_competencies, question_types, level_key, num_questions)
# Create scoring rubrics
scoring_rubrics = self._create_scoring_rubrics(questions)
# Generate follow-up probes
follow_up_probes = self._generate_follow_up_probes(questions)
# Create calibration examples
calibration_examples = self._create_calibration_examples(questions[:5]) # Sample for first 5 questions
return {
"role": role,
"level": level,
"competencies": role_competencies,
"question_types": question_types,
"generated_at": datetime.now().isoformat(),
"total_questions": len(questions),
"questions": questions,
"scoring_rubrics": scoring_rubrics,
"follow_up_probes": follow_up_probes,
"calibration_examples": calibration_examples,
"usage_guidelines": self._generate_usage_guidelines(role_key, level_key)
}
def _normalize_role(self, role: str) -> str:
"""Normalize role name to match competency mapping keys."""
role_lower = role.lower().replace(" ", "_").replace("-", "_")
# Exact matches against known role keys take priority over the substring
# checks below, so e.g. "frontend_engineer" is not captured by the broader
# "engineer" variation and misrouted to "software_engineer"
if role_lower in self.competency_mapping:
    return role_lower
# Map variations to standard roles
role_mappings = {
"software_engineer": ["engineer", "developer", "swe", "software_developer"],
"frontend_engineer": ["frontend", "front_end", "ui_engineer", "web_developer"],
"backend_engineer": ["backend", "back_end", "server_engineer", "api_developer"],
"product_manager": ["pm", "product", "product_owner", "po"],
"data_scientist": ["ds", "data", "analyst", "ml_engineer"],
"designer": ["ux", "ui", "ux_ui", "product_designer", "visual_designer"],
"devops_engineer": ["devops", "sre", "platform_engineer", "infrastructure"]
}
for standard_role, variations in role_mappings.items():
if any(var in role_lower for var in variations):
return standard_role
# Default fallback
return "software_engineer"
def _get_role_competencies(self, role_key: str, level_key: str,
custom_competencies: Optional[List[str]]) -> List[str]:
"""Get competencies for the role and level."""
if role_key not in self.competency_mapping:
role_key = "software_engineer"
role_mapping = self.competency_mapping[role_key]
competencies = role_mapping["core_competencies"].copy()
# Add level-specific competencies
if level_key in role_mapping["level_specific"]:
competencies.extend(role_mapping["level_specific"][level_key])
elif "senior" in role_mapping["level_specific"]:
competencies.extend(role_mapping["level_specific"]["senior"])
# Add custom competencies if specified
if custom_competencies:
competencies.extend([comp.strip() for comp in custom_competencies if comp.strip() not in competencies])
return list(dict.fromkeys(competencies))  # Remove duplicates while preserving order
def _generate_questions(self, competencies: List[str], question_types: List[str],
level: str, num_questions: int) -> List[Dict[str, Any]]:
"""Generate questions based on competencies and types."""
questions = []
questions_per_competency = max(1, num_questions // len(competencies))
for competency in competencies:
competency_questions = []
# Add technical questions if requested and available
if "technical" in question_types and competency in self.technical_questions:
tech_questions = []
# Get questions for current level and below
level_order = ["junior", "mid", "senior", "staff", "principal"]
current_level_idx = level_order.index(level) if level in level_order else 2
for lvl_idx in range(current_level_idx + 1):
lvl = level_order[lvl_idx]
if lvl in self.technical_questions[competency]:
tech_questions.extend(self.technical_questions[competency][lvl])
competency_questions.extend(tech_questions[:questions_per_competency])
# Add behavioral questions if requested
if "behavioral" in question_types and competency in self.behavioral_questions:
behavioral_q = self.behavioral_questions[competency][:questions_per_competency]
competency_questions.extend(behavioral_q)
# Add situational questions (variations of behavioral)
if "situational" in question_types:
situational_q = self._generate_situational_questions(competency, questions_per_competency)
competency_questions.extend(situational_q)
# Ensure we have enough questions for this competency
while len(competency_questions) < questions_per_competency:
competency_questions.extend(self._generate_fallback_questions(competency, level))
if len(competency_questions) >= questions_per_competency:
break
questions.extend(competency_questions[:questions_per_competency])
# Shuffle and limit to requested number
random.shuffle(questions)
return questions[:num_questions]
def _generate_situational_questions(self, competency: str, count: int) -> List[Dict[str, Any]]:
"""Generate situational questions for a competency."""
situational_templates = {
"leadership": [
{
"question": "You're leading a project that's behind schedule and the client is unhappy. How do you handle this situation?",
"competency": competency,
"type": "situational",
"focus_areas": ["crisis_management", "client_communication", "team_leadership"]
}
],
"collaboration": [
{
"question": "You're working on a cross-functional project and two team members have opposing views on the technical approach. How do you resolve this?",
"competency": competency,
"type": "situational",
"focus_areas": ["conflict_resolution", "technical_decision_making", "facilitation"]
}
],
"problem_solving": [
{
"question": "You've been assigned to improve the performance of a critical system, but you have limited time and budget. Walk me through your approach.",
"competency": competency,
"type": "situational",
"focus_areas": ["prioritization", "resource_constraints", "systematic_approach"]
}
]
}
if competency in situational_templates:
return situational_templates[competency][:count]
return []
def _generate_fallback_questions(self, competency: str, level: str) -> List[Dict[str, Any]]:
"""Generate fallback questions when specific ones aren't available."""
fallback_questions = [
{
"question": f"Describe your experience with {competency.replace('_', ' ')} in your current or previous role.",
"competency": competency,
"type": "experience",
"focus_areas": ["experience_depth", "practical_application"]
},
{
"question": f"What challenges have you faced related to {competency.replace('_', ' ')} and how did you overcome them?",
"competency": competency,
"type": "challenge_based",
"focus_areas": ["problem_solving", "learning_from_experience"]
}
]
return fallback_questions
def _create_scoring_rubrics(self, questions: List[Dict[str, Any]]) -> Dict[str, Dict]:
"""Create scoring rubrics for the generated questions."""
rubrics = {}
for i, question in enumerate(questions, 1):
question_key = f"question_{i}"
question_type = question.get("type", "behavioral")
if question_type in self.scoring_rubrics:
rubrics[question_key] = {
"question": question["question"],
"competency": question["competency"],
"type": question_type,
"scoring_criteria": self.scoring_rubrics[question_type],
"weight": self._determine_question_weight(question),
"time_limit": question.get("time_limit", 30)
}
return rubrics
def _determine_question_weight(self, question: Dict[str, Any]) -> str:
"""Determine the weight/importance of a question."""
competency = question.get("competency", "")
question_type = question.get("type", "")
difficulty = question.get("difficulty", "medium")
# Core competencies, hands-on formats, and hard questions get higher weight
core_competencies = ["coding_fundamentals", "system_design", "leadership", "problem_solving"]
if competency in core_competencies:
    return "high"
if question_type in ["coding", "design"] or difficulty == "hard":
    return "high"
return "medium"
def _generate_follow_up_probes(self, questions: List[Dict[str, Any]]) -> Dict[str, List[str]]:
"""Generate follow-up probes for each question."""
probes = {}
for i, question in enumerate(questions, 1):
question_key = f"question_{i}"
competency = question.get("competency", "")
# Get competency-specific follow-ups
if competency in self.follow_up_strategies:
competency_probes = self.follow_up_strategies[competency].copy()
else:
competency_probes = [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
]
# Add question-type specific probes
question_type = question.get("type", "")
if question_type == "coding":
competency_probes.extend([
"How would you test this solution?",
"What's the time and space complexity of your approach?",
"Can you think of any optimizations?"
])
elif question_type == "behavioral":
competency_probes.extend([
"What did you learn from this experience?",
"How did others react to your approach?",
"What metrics did you use to measure success?"
])
elif question_type == "design":
competency_probes.extend([
"How would you handle failure scenarios?",
"What monitoring would you implement?",
"How would this scale to 10x the load?"
])
probes[question_key] = competency_probes[:5] # Limit to 5 follow-ups
return probes
def _create_calibration_examples(self, sample_questions: List[Dict[str, Any]]) -> Dict[str, Dict]:
"""Create calibration examples with poor/good/great answers."""
examples = {}
for i, question in enumerate(sample_questions, 1):
question_key = f"question_{i}"
examples[question_key] = {
"question": question["question"],
"competency": question["competency"],
"sample_answers": {
"poor_answer": self._generate_sample_answer(question, "poor"),
"good_answer": self._generate_sample_answer(question, "good"),
"great_answer": self._generate_sample_answer(question, "great")
},
"scoring_rationale": self._generate_scoring_rationale(question)
}
return examples
def _generate_sample_answer(self, question: Dict[str, Any], quality: str) -> Dict[str, str]:
"""Generate sample answers of different quality levels."""
competency = question.get("competency", "")
question_type = question.get("type", "")
if quality == "poor":
return {
"answer": f"Sample poor answer for {competency} question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": ["Vague response", "Limited evidence of competency", "Poor structure"]
}
elif quality == "good":
return {
"answer": f"Sample good answer for {competency} question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": ["Clear structure", "Demonstrates competency", "Adequate detail"]
}
else: # great
return {
"answer": f"Sample excellent answer for {competency} question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": ["Exceptional detail", "Strong evidence", "Strategic thinking", "Goes beyond requirements"]
}
def _generate_scoring_rationale(self, question: Dict[str, Any]) -> Dict[str, str]:
"""Generate rationale for scoring this question."""
competency = question.get("competency", "")
return {
"key_indicators": f"Look for evidence of {competency.replace('_', ' ')} competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
def _generate_usage_guidelines(self, role_key: str, level_key: str) -> Dict[str, Any]:
"""Generate usage guidelines for the question bank."""
return {
"interview_flow": {
"warm_up": "Start with 1-2 easier questions to build rapport",
"core_assessment": "Focus majority of time on core competency questions",
"closing": "End with questions about candidate's questions/interests"
},
"time_management": {
"technical_questions": "Allow extra time for coding/design questions",
"behavioral_questions": "Keep to time limits but allow for follow-ups",
"total_recommendation": "45-75 minutes per interview round"
},
"question_selection": {
"variety": "Mix question types within each competency area",
"difficulty": "Adjust based on candidate responses and energy",
"customization": "Adapt questions based on candidate's background"
},
"common_mistakes": [
"Don't ask all questions mechanically",
"Don't skip follow-up questions",
"Don't forget to assess cultural fit alongside competencies",
"Don't let one strong/weak area bias overall assessment"
],
"calibration_reminders": [
"Compare against role standard, not other candidates",
"Focus on evidence demonstrated, not potential",
"Consider level-appropriate expectations",
"Document specific examples in feedback"
]
}
def format_human_readable(question_bank: Dict[str, Any]) -> str:
"""Format question bank data in human-readable format."""
output = []
# Header
output.append(f"Interview Question Bank: {question_bank['role']} ({question_bank['level'].title()} Level)")
output.append("=" * 70)
output.append(f"Generated: {question_bank['generated_at']}")
output.append(f"Total Questions: {question_bank['total_questions']}")
output.append(f"Question Types: {', '.join(question_bank['question_types'])}")
output.append(f"Target Competencies: {', '.join(question_bank['competencies'])}")
output.append("")
# Questions
output.append("INTERVIEW QUESTIONS")
output.append("-" * 50)
for i, question in enumerate(question_bank['questions'], 1):
output.append(f"\n{i}. {question['question']}")
output.append(f" Competency: {question['competency'].replace('_', ' ').title()}")
output.append(f" Type: {question.get('type', 'N/A').title()}")
if 'time_limit' in question:
output.append(f" Time Limit: {question['time_limit']} minutes")
if 'focus_areas' in question:
output.append(f" Focus Areas: {', '.join(question['focus_areas'])}")
# Scoring Guidelines
output.append("\n\nSCORING RUBRICS")
output.append("-" * 50)
# Show sample scoring criteria
if question_bank['scoring_rubrics']:
first_question = list(question_bank['scoring_rubrics'].keys())[0]
sample_rubric = question_bank['scoring_rubrics'][first_question]
output.append(f"Sample Scoring Criteria ({sample_rubric['type']} questions):")
for criterion, scores in sample_rubric['scoring_criteria'].items():
output.append(f"\n{criterion.replace('_', ' ').title()}:")
for score, description in scores.items():
output.append(f" {score}: {description}")
# Follow-up Probes
output.append("\n\nFOLLOW-UP PROBE EXAMPLES")
output.append("-" * 50)
if question_bank['follow_up_probes']:
first_question = list(question_bank['follow_up_probes'].keys())[0]
sample_probes = question_bank['follow_up_probes'][first_question]
output.append("Sample follow-up questions:")
for probe in sample_probes[:3]: # Show first 3
output.append(f" • {probe}")
# Usage Guidelines
output.append("\n\nUSAGE GUIDELINES")
output.append("-" * 50)
guidelines = question_bank['usage_guidelines']
output.append("Interview Flow:")
for phase, description in guidelines['interview_flow'].items():
output.append(f" • {phase.replace('_', ' ').title()}: {description}")
output.append("\nTime Management:")
for aspect, recommendation in guidelines['time_management'].items():
output.append(f" • {aspect.replace('_', ' ').title()}: {recommendation}")
output.append("\nCommon Mistakes to Avoid:")
for mistake in guidelines['common_mistakes'][:3]: # Show first 3
output.append(f" • {mistake}")
# Calibration Examples (if available)
if question_bank['calibration_examples']:
output.append("\n\nCALIBRATION EXAMPLES")
output.append("-" * 50)
first_example = list(question_bank['calibration_examples'].values())[0]
output.append(f"Question: {first_example['question']}")
output.append("\nSample Answer Quality Levels:")
for quality, details in first_example['sample_answers'].items():
output.append(f" {quality.replace('_', ' ').title()} (Score {details['score']}):")
if 'issues' in details:
output.append(f" Issues: {', '.join(details['issues'])}")
if 'strengths' in details:
output.append(f" Strengths: {', '.join(details['strengths'])}")
return "\n".join(output)
def main():
parser = argparse.ArgumentParser(description="Generate comprehensive interview question banks with scoring criteria")
parser.add_argument("--role", type=str, help="Job role title (e.g., 'Frontend Engineer')")
parser.add_argument("--level", type=str, default="senior", help="Experience level (junior, mid, senior, staff, principal)")
parser.add_argument("--competencies", type=str, help="Comma-separated list of competencies to focus on")
parser.add_argument("--question-types", type=str, help="Comma-separated list of question types (technical, behavioral, situational)")
parser.add_argument("--num-questions", type=int, default=20, help="Number of questions to generate")
parser.add_argument("--input", type=str, help="Input JSON file with role requirements")
parser.add_argument("--output", type=str, help="Output directory or file path")
parser.add_argument("--format", choices=["json", "text", "both"], default="both", help="Output format")
args = parser.parse_args()
generator = QuestionBankGenerator()
# Handle input
if args.input:
try:
with open(args.input, 'r') as f:
role_data = json.load(f)
role = role_data.get('role') or role_data.get('title', '')
level = role_data.get('level', 'senior')
competencies = role_data.get('competencies')
question_types = role_data.get('question_types')
num_questions = role_data.get('num_questions', 20)
except Exception as e:
print(f"Error reading input file: {e}")
sys.exit(1)
else:
if not args.role:
print("Error: --role is required when not using --input")
sys.exit(1)
role = args.role
level = args.level
competencies = args.competencies.split(',') if args.competencies else None
question_types = args.question_types.split(',') if args.question_types else None
num_questions = args.num_questions
# Generate question bank
try:
question_bank = generator.generate_question_bank(
role=role,
level=level,
competencies=competencies,
question_types=question_types,
num_questions=num_questions
)
# Handle output
if args.output:
output_path = args.output
if os.path.isdir(output_path):
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_questions"
json_path = os.path.join(output_path, f"{base_filename}.json")
text_path = os.path.join(output_path, f"{base_filename}.txt")
else:
    base, ext = os.path.splitext(output_path)
    # splitext avoids replacing every ".json" substring elsewhere in the path
    json_path = output_path if ext == ".json" else f"{output_path}.json"
    text_path = f"{base}.txt" if ext == ".json" else f"{output_path}.txt"
else:
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_questions"
json_path = f"{base_filename}.json"
text_path = f"{base_filename}.txt"
# Write outputs
if args.format in ["json", "both"]:
with open(json_path, 'w') as f:
json.dump(question_bank, f, indent=2, default=str)
print(f"JSON output written to: {json_path}")
if args.format in ["text", "both"]:
with open(text_path, 'w') as f:
f.write(format_human_readable(question_bank))
print(f"Text output written to: {text_path}")
# Print summary
print(f"\nQuestion Bank Summary:")
print(f"Role: {question_bank['role']} ({question_bank['level'].title()})")
print(f"Total Questions: {question_bank['total_questions']}")
print(f"Competencies Covered: {len(question_bank['competencies'])}")
print(f"Question Types: {', '.join(question_bank['question_types'])}")
except Exception as e:
print(f"Error generating question bank: {e}")
sys.exit(1)
if __name__ == "__main__":
    main()

Interview Bias Mitigation Checklist
This comprehensive checklist helps identify, prevent, and mitigate various forms of bias in the interview process. Use this as a systematic guide to ensure fair and equitable hiring practices.
Pre-Interview Phase
Job Description & Requirements
- Remove unnecessary requirements that don't directly relate to job performance
- Avoid gendered language (e.g., masculine-coded "competitive," "aggressive" vs. feminine-coded "collaborative," "detail-oriented")
- Remove university prestige requirements unless absolutely necessary for role
- Focus on skills and outcomes rather than years of experience in specific technologies
- Use inclusive language and avoid cultural assumptions
- Specify only essential requirements vs. nice-to-have qualifications
- Remove location/commute assumptions for remote-eligible positions
- Review requirements for unconscious bias (e.g., assuming continuous work history)
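The gendered-language item above can be checked mechanically before a human review. A minimal sketch, assuming hypothetical term lists (`MASCULINE_CODED`, `FEMININE_CODED`) rather than a vetted lexicon:

```python
# Illustrative only: a minimal screen for gender-coded wording in a job
# description. These term sets are hypothetical examples, not a maintained
# lexicon; treat flagged terms as prompts for human review, not verdicts.
MASCULINE_CODED = {"aggressive", "competitive", "dominant", "rockstar", "ninja"}
FEMININE_CODED = {"collaborative", "supportive", "nurturing", "detail-oriented"}

def flag_coded_terms(text: str) -> dict:
    """Return coded terms found in the text, grouped by category."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return {
        "masculine_coded": sorted(words & MASCULINE_CODED),
        "feminine_coded": sorted(words & FEMININE_CODED),
    }

print(flag_coded_terms("Seeking an aggressive, competitive self-starter."))
# {'masculine_coded': ['aggressive', 'competitive'], 'feminine_coded': []}
```

A real screen would lemmatize words and use a published gender-decoder word list instead of the short sets shown here.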
Sourcing & Pipeline
- Diversify sourcing channels beyond traditional networks
- Partner with diverse professional organizations and communities
- Use bias-minimizing sourcing tools and platforms
- Track sourcing effectiveness by demographic groups
- Train recruiters on bias awareness and inclusive outreach
- Review referral patterns for potential network bias
- Expand university partnerships beyond elite institutions
- Use structured outreach messages to reduce individual bias
Resume Screening
- Implement blind resume review (remove names, photos, university names initially)
- Use standardized screening criteria applied consistently
- Use multiple screeners for each resume with independent scoring
- Focus on relevant skills and achievements over pedigree indicators
- Avoid assumptions about career gaps or non-traditional backgrounds
- Consider alternative paths to skills (bootcamps, self-taught, career changes)
- Track screening pass rates by demographic groups
- Hold regular screener calibration sessions on bias awareness
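The pass-rate tracking item above amounts to a simple aggregation over screening records. A sketch, where the record field names (`group`, `passed`) are assumptions and demographic data is self-reported and optional:

```python
# Illustrative sketch: screening pass rate per demographic group.
# Records without a disclosed group are bucketed as "undisclosed".
from collections import defaultdict

def pass_rates_by_group(records):
    """Compute pass rate per group from screening record dicts."""
    totals = defaultdict(lambda: [0, 0])  # group -> [passed, total]
    for r in records:
        group = r.get("group", "undisclosed")
        totals[group][1] += 1
        if r["passed"]:
            totals[group][0] += 1
    return {g: passed / total for g, (passed, total) in totals.items()}

records = [
    {"group": "A", "passed": True},
    {"group": "A", "passed": False},
    {"group": "B", "passed": True},
]
print(pass_rates_by_group(records))  # {'A': 0.5, 'B': 1.0}
```

Large gaps between groups are a signal to audit the screening criteria and screener calibration, not proof of bias on their own.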
Interview Panel Composition
Diversity Requirements
- Ensure diverse interview panels (gender, ethnicity, seniority levels)
- Include at least one underrepresented interviewer when possible
- Rotate panel assignments to prevent bias patterns
- Balance seniority levels on panels (not all senior or all junior)
- Include cross-functional perspectives when relevant
- Avoid panels of only one demographic group when possible
- Consider panel member unconscious bias training status
- Document panel composition rationale for future review
Interviewer Selection
- Choose interviewers based on relevant competency assessment ability
- Ensure interviewers have completed bias training within last 12 months
- Select interviewers with consistent calibration history
- Avoid interviewers with known bias patterns (flagged in previous analyses)
- Include at least one interviewer familiar with the candidate's background type
- Balance perspectives (technical depth, cultural fit, growth potential)
- Consider interviewer availability for proper preparation time
- Ensure interviewers understand role requirements and standards
Interview Process Design
Question Standardization
- Use standardized question sets for each competency area
- Develop questions that assess skills, not culture fit stereotypes
- Avoid questions about personal background unless directly job-relevant
- Remove questions that could reveal protected characteristics
- Focus on behavioral examples using STAR method
- Include scenario-based questions with clear evaluation criteria
- Test questions for potential bias with diverse interviewers
- Regularly update question bank based on effectiveness data
Structured Interview Protocol
- Define clear time allocations for each question/section
- Establish consistent interview flow across all candidates
- Create standardized intro/outro processes
- Use identical technical setup and tools for all candidates
- Provide same background information to all interviewers
- Standardize note-taking format and requirements
- Define clear handoff procedures between interviewers
- Document any deviations from standard protocol
Accommodation Preparation
- Proactively offer accommodations without requiring disclosure
- Provide multiple interview format options (phone, video, in-person)
- Ensure accessibility of interview locations and tools
- Allow extended time when requested or needed
- Provide materials in advance when helpful
- Train interviewers on accommodation protocols
- Test all technology for accessibility compliance
- Have backup plans for technical issues
During the Interview
Interviewer Behavior
- Use welcoming, professional tone with all candidates
- Avoid assumptions based on appearance or background
- Give equal encouragement and support to all candidates
- Allow equal time for candidate questions
- Avoid leading questions that suggest desired answers
- Listen actively without interrupting unnecessarily
- Take detailed notes focusing on responses, not impressions
- Avoid small talk that could reveal irrelevant personal information
Question Delivery
- Ask questions as written without improvisation that could introduce bias
- Provide equal clarification when candidates ask for it
- Use consistent follow-up probing across candidates
- Allow reasonable thinking time before expecting responses
- Avoid rephrasing questions in ways that give hints
- Stay focused on defined competencies being assessed
- Give equal encouragement for elaboration when needed
- Maintain professional demeanor regardless of candidate background
Real-time Bias Checking
- Notice first impressions but don't let them drive assessment
- Question gut reactions - are they based on competency evidence?
- Focus on specific examples and evidence provided
- Avoid pattern matching to existing successful employees
- Notice cultural assumptions in interpretation of responses
- Check for confirmation bias - seeking evidence to support initial impressions
- Consider alternative explanations for candidate responses
- Stay aware of fatigue effects on judgment throughout the day
Evaluation & Scoring
Scoring Consistency
- Use defined rubrics consistently across all candidates
- Score immediately after interview while details are fresh
- Focus scoring on demonstrated competencies, not potential or personality
- Provide specific evidence for each score given
- Avoid comparative scoring (comparing candidates to each other)
- Use calibrated examples of each score level
- Score independently before discussing with other interviewers
- Document reasoning for all scores, especially extreme ones (1s and 4s)
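The evidence rule above can be enforced mechanically. The sketch below flags extreme scores (1s and 4s on the 1-4 scale this guide uses) that lack documented evidence; the scorecard structure and the minimum-length threshold are illustrative assumptions, not an existing internal schema.

```python
# Flag extreme scores that lack documented evidence.
# Scorecard shape and MIN_EVIDENCE_CHARS are hypothetical.

EXTREME_SCORES = {1, 4}
MIN_EVIDENCE_CHARS = 40  # assumed threshold for "specific evidence"

def flag_missing_evidence(scorecard):
    """Return competencies whose extreme scores need stronger documentation."""
    flags = []
    for competency, entry in scorecard.items():
        evidence = entry.get("evidence", "").strip()
        if entry["score"] in EXTREME_SCORES and len(evidence) < MIN_EVIDENCE_CHARS:
            flags.append(competency)
    return flags

scorecard = {
    "Coding & Algorithms": {
        "score": 4,
        "evidence": "Solved both problems, explained trade-offs, handled edge cases.",
    },
    "Communication": {"score": 1, "evidence": "unclear"},
}
print(flag_missing_evidence(scorecard))  # → ['Communication']
```

A check like this runs well as a gate before the debrief: scorecards with flagged competencies go back to the interviewer for evidence before scores are shared.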
Bias Check Questions
- "Would I score this differently if the candidate looked different?"
- "Am I basing this on evidence or assumptions?"
- "Would this response get the same score from a different demographic?"
- "Am I penalizing non-traditional backgrounds or approaches?"
- "Is my scoring consistent with the defined rubric?"
- "Am I letting one strong/weak area bias overall assessment?"
- "Are my cultural assumptions affecting interpretation?"
- "Would I want to work with this person?" (Check if this is biasing assessment)
Documentation Requirements
- Record specific examples supporting each competency score
- Avoid subjective language like "seems like," "appears to be"
- Focus on observable behaviors and concrete responses
- Note exact quotes when relevant to assessment
- Distinguish between facts and interpretations
- Provide improvement suggestions that are skill-based, not person-based
- Avoid comparative language to other candidates or employees
- Use neutral language free from cultural assumptions
Debrief Process
Structured Discussion
- Start with independent score sharing before discussion
- Focus discussion on evidence, not impressions or feelings
- Address significant score discrepancies with evidence review
- Challenge biased language or assumptions in discussion
- Ensure all voices are heard in group decision making
- Document reasons for final decision with specific evidence
- Avoid personality-based discussions ("culture fit" should be evidence-based)
- Consider multiple perspectives on candidate responses
Decision-Making Process
- Use weighted scoring system based on role requirements
- Require minimum scores in critical competency areas
- Avoid veto power unless based on clear, documented evidence
- Consider growth potential fairly across all candidates
- Document dissenting opinions and reasoning
- Use tie-breaking criteria that are predetermined and fair
- Consider additional data collection if team is split
- Make final decision based on role requirements, not team preferences
Final Recommendations
- Provide specific, actionable feedback for development areas
- Focus recommendations on skills and competencies
- Avoid language that could reflect bias in written feedback
- Consider onboarding needs based on actual skill gaps, not assumptions
- Provide coaching recommendations that are evidence-based
- Avoid personal judgments about candidate character or personality
- Make hiring recommendation based solely on job-relevant criteria
- Document any concerns with specific, observable evidence
Post-Interview Monitoring
Data Collection
- Track interviewer scoring patterns for consistency analysis
- Monitor pass rates by demographic groups
- Collect candidate experience feedback on interview fairness
- Analyze score distributions for potential bias indicators
- Track time-to-decision across different candidate types
- Monitor offer acceptance rates by demographics
- Collect new hire performance data for process validation
- Document any bias incidents or concerns raised
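One common way to operationalize pass-rate monitoring is the "four-fifths" adverse-impact heuristic: flag any group whose pass rate falls below 80% of the highest group's rate. The sketch below assumes that heuristic; the group labels and counts are illustrative.

```python
# Flag potential adverse impact using the four-fifths heuristic.
# Group names and counts are illustrative placeholders.

def adverse_impact_flags(outcomes, threshold=0.8):
    """outcomes: {group: (passed, total)} -> groups below the threshold ratio."""
    rates = {g: passed / total for g, (passed, total) in outcomes.items()}
    top = max(rates.values())
    return sorted(g for g, r in rates.items() if r < threshold * top)

outcomes = {
    "group_a": (30, 100),
    "group_b": (18, 100),
    "group_c": (28, 100),
}
print(adverse_impact_flags(outcomes))  # → ['group_b']
```

A flag from this heuristic is a signal to investigate which rounds or interviewers drive the gap, not a verdict on its own; small samples in particular produce noisy rates.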
Regular Analysis
- Conduct quarterly bias audits of interview data
- Review interviewer calibration and identify outliers
- Analyze demographic trends in hiring outcomes
- Compare candidate experience surveys across groups
- Track correlation between interview scores and job performance
- Review and update bias mitigation strategies based on data
- Share findings with interview teams for continuous improvement
- Update training programs based on identified bias patterns
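Interviewer calibration outliers can be surfaced with a simple comparison of each interviewer's mean score against the panel mean. The z-score approach and cutoff below are assumptions for illustration; real audits should also control for differences in the candidate pools each interviewer saw.

```python
# Identify interviewer calibration outliers via z-scores on per-interviewer
# mean scores. The 1.2 cutoff and the sample data are illustrative.
from statistics import mean, pstdev

def calibration_outliers(scores_by_interviewer, z_cutoff=1.2):
    """Return interviewers whose mean score deviates notably from the panel."""
    means = {name: mean(s) for name, s in scores_by_interviewer.items()}
    overall = mean(means.values())
    spread = pstdev(means.values())
    if spread == 0:
        return []
    return sorted(n for n, m in means.items() if abs(m - overall) / spread > z_cutoff)

panel = {
    "alice": [3, 3, 2, 3, 4],
    "bob":   [2, 3, 3, 2, 3],
    "carol": [4, 4, 4, 4, 4],  # consistently high scorer
}
print(calibration_outliers(panel))  # → ['carol']
```

Flagged interviewers are candidates for a calibration session or shadow interviews, not automatic removal; a high mean can also reflect a genuinely stronger candidate slate.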
Bias Types to Watch For
Affinity Bias
- Definition: Favoring candidates similar to yourself
- Watch for: Over-positive response to shared backgrounds, interests, or experiences
- Mitigation: Focus on job-relevant competencies, diversify interview panels
Halo/Horn Effect
- Definition: One positive/negative trait influencing overall assessment
- Watch for: Strong performance in one area affecting scores in unrelated areas
- Mitigation: Score each competency independently, use structured evaluation
Confirmation Bias
- Definition: Seeking information that confirms initial impressions
- Watch for: Asking follow-ups that lead candidate toward expected responses
- Mitigation: Use standardized questions, consider alternative interpretations
Attribution Bias
- Definition: Attributing success/failure to different causes based on candidate demographics
- Watch for: Assuming women are "lucky" vs. men are "skilled" for the same achievements
- Mitigation: Focus on candidate's role in achievements, avoid assumptions
Cultural Bias
- Definition: Judging candidates based on cultural differences rather than job performance
- Watch for: Penalizing communication styles, work approaches, or values that differ from the team norm
- Mitigation: Define job-relevant criteria clearly, consider diverse perspectives valuable
Educational Bias
- Definition: Over-weighting prestigious educational credentials
- Watch for: Assuming higher capability based on school rank rather than demonstrated skills
- Mitigation: Focus on skills demonstration, consider alternative learning paths
Experience Bias
- Definition: Requiring specific company or industry experience unnecessarily
- Watch for: Discounting transferable skills from different industries or company sizes
- Mitigation: Define core skills needed, assess adaptability and learning ability
Emergency Bias Response Protocol
During Interview
- Pause the interview if significant bias is observed
- Privately address bias with interviewer if possible
- Document the incident for review
- Continue with fair assessment of candidate
- Flag for debrief discussion if interview continues
Post-Interview
- Report bias incidents to hiring manager/HR immediately
- Document specific behaviors observed
- Consider additional interviewer for second opinion
- Review candidate assessment for bias impact
- Implement corrective actions for future interviews
Interviewer Coaching
- Provide immediate feedback on bias observed
- Schedule bias training refresher if needed
- Monitor future interviews for improvement
- Consider removing from interview rotation if bias persists
- Document coaching provided for performance management
Legal Compliance Reminders
Protected Characteristics
- Age, race, color, religion, sex, national origin, disability status, veteran status
- Pregnancy, genetic information, sexual orientation, gender identity
- Any other characteristics protected by local/state/federal law
Prohibited Questions
- Questions about family planning, marital status, pregnancy
- Age-related questions (unless a bona fide occupational qualification, or BFOQ, applies)
- Religious or political affiliations
- Disability status (unless voluntary disclosure for accommodation)
- Arrest records (unless a related conviction is job-relevant)
- Financial status or credit (unless job-relevant)
Documentation Requirements
- Keep all interview materials for required retention period
- Ensure consistent documentation across all candidates
- Avoid documenting protected characteristic observations
- Focus documentation on job-relevant observations only
Training & Certification
Required Training Topics
- Unconscious bias awareness and mitigation
- Structured interviewing techniques
- Legal compliance in hiring
- Company-specific bias mitigation protocols
- Role-specific competency assessment
- Accommodation and accessibility requirements
Ongoing Development
- Annual bias training refresher
- Quarterly calibration sessions
- Regular updates on legal requirements
- Peer feedback and coaching
- Industry best practice updates
- Data-driven process improvements
This checklist should be reviewed and updated regularly based on legal requirements, industry best practices, and internal bias analysis results.
Competency Matrix Templates
This document provides comprehensive competency matrix templates for different engineering roles and levels. Use these matrices to design role-specific interview loops and evaluation criteria.
Software Engineering Competency Matrix
Technical Competencies
| Competency | Junior (L1-L2) | Mid (L3-L4) | Senior (L5-L6) | Staff+ (L7+) |
|---|---|---|---|---|
| Coding & Algorithms | Basic data structures, simple algorithms, language syntax | Advanced algorithms, complexity analysis, optimization | Complex problem solving, algorithm design, performance tuning | Architecture-level algorithmic decisions, novel approach design |
| System Design | Component interactions, basic scalability concepts | Service design, database modeling, API design | Distributed systems, scalability patterns, trade-off analysis | Large-scale architecture, cross-system design, technology strategy |
| Code Quality | Readable code, basic testing, follows conventions | Maintainable code, comprehensive testing, design patterns | Code reviews, quality standards, refactoring leadership | Engineering standards, quality culture, technical debt management |
| Debugging & Problem Solving | Basic debugging, structured problem approach | Complex debugging, root cause analysis, performance issues | System-wide debugging, production issues, incident response | Cross-system troubleshooting, preventive measures, tooling design |
| Domain Knowledge | Learning role-specific technologies | Proficiency in domain tools/frameworks | Deep domain expertise, technology evaluation | Domain leadership, technology roadmap, innovation |
Behavioral Competencies
| Competency | Junior (L1-L2) | Mid (L3-L4) | Senior (L5-L6) | Staff+ (L7+) |
|---|---|---|---|---|
| Communication | Clear status updates, asks good questions | Technical explanations, stakeholder updates | Cross-functional communication, technical writing | Executive communication, external representation, thought leadership |
| Collaboration | Team participation, code reviews | Cross-team projects, knowledge sharing | Team leadership, conflict resolution | Cross-org collaboration, culture building, strategic partnerships |
| Leadership & Influence | Peer mentoring, positive attitude | Junior mentoring, project ownership | Team guidance, technical decisions, hiring | Org-wide influence, vision setting, culture change |
| Growth & Learning | Skill development, feedback receptivity | Proactive learning, teaching others | Continuous improvement, trend awareness | Learning culture, industry leadership, innovation adoption |
| Ownership & Initiative | Task completion, quality focus | Project ownership, process improvement | Feature/service ownership, strategic thinking | Product/platform ownership, business impact, market influence |
Product Management Competency Matrix
Product Competencies
| Competency | Associate PM (L1-L2) | PM (L3-L4) | Senior PM (L5-L6) | Principal PM (L7+) |
|---|---|---|---|---|
| Product Strategy | Feature requirements, user stories | Product roadmaps, market analysis | Business strategy, competitive positioning | Portfolio strategy, market creation, platform vision |
| User Research & Analytics | Basic user interviews, metrics tracking | Research design, data interpretation | Research strategy, advanced analytics | Research culture, measurement frameworks, insight generation |
| Technical Understanding | Basic tech concepts, API awareness | System architecture, technical trade-offs | Technical strategy, platform decisions | Technology vision, architectural influence, innovation leadership |
| Execution & Process | Feature delivery, stakeholder coordination | Project management, cross-functional leadership | Process optimization, team scaling | Operational excellence, org design, strategic execution |
| Business Acumen | Revenue awareness, customer understanding | P&L understanding, business case development | Business strategy, market dynamics | Corporate strategy, board communication, investor relations |
Leadership Competencies
| Competency | Associate PM (L1-L2) | PM (L3-L4) | Senior PM (L5-L6) | Principal PM (L7+) |
|---|---|---|---|---|
| Stakeholder Management | Team collaboration, clear communication | Cross-functional alignment, expectation management | Executive communication, influence without authority | Board interaction, external partnerships, industry influence |
| Team Development | Peer learning, feedback sharing | Junior mentoring, knowledge transfer | Team building, hiring, performance management | Talent development, culture building, org leadership |
| Decision Making | Data-driven decisions, priority setting | Complex trade-offs, strategic choices | Ambiguous situations, high-stakes decisions | Strategic vision, transformational decisions, risk management |
| Innovation & Vision | Creative problem solving, user empathy | Market opportunity identification, feature innovation | Product vision, market strategy | Industry vision, disruptive thinking, platform creation |
Design Competency Matrix
Design Competencies
| Competency | Junior Designer (L1-L2) | Mid Designer (L3-L4) | Senior Designer (L5-L6) | Principal Designer (L7+) |
|---|---|---|---|---|
| Visual Design | UI components, typography, color theory | Design systems, visual hierarchy | Brand integration, advanced layouts | Visual strategy, brand evolution, design innovation |
| User Experience | User flows, wireframing, prototyping | Interaction design, usability testing | Experience strategy, journey mapping | UX vision, service design, behavioral insights |
| Research & Validation | User interviews, usability tests | Research planning, data synthesis | Research strategy, methodology design | Research culture, insight frameworks, market research |
| Design Systems | Component usage, style guides | System contribution, pattern creation | System architecture, governance | System strategy, scalable design, platform thinking |
| Tools & Craft | Design software proficiency, asset creation | Advanced techniques, workflow optimization | Tool evaluation, process design | Technology integration, future tooling, craft evolution |
Collaboration Competencies
| Competency | Junior Designer (L1-L2) | Mid Designer (L3-L4) | Senior Designer (L5-L6) | Principal Designer (L7+) |
|---|---|---|---|---|
| Cross-functional Partnership | Engineering collaboration, handoff quality | Product partnership, stakeholder alignment | Leadership collaboration, strategic alignment | Executive partnership, business strategy integration |
| Communication & Advocacy | Design rationale, feedback integration | Design presentations, user advocacy | Executive communication, design thinking evangelism | Industry thought leadership, external representation |
| Mentorship & Growth | Peer learning, skill sharing | Junior mentoring, critique facilitation | Team development, hiring, career guidance | Design culture, talent strategy, industry leadership |
| Business Impact | User-centered thinking, design quality | Feature success, user satisfaction | Business metrics, strategic impact | Market influence, competitive advantage, innovation leadership |
Data Science Competency Matrix
Technical Competencies
| Competency | Junior DS (L1-L2) | Mid DS (L3-L4) | Senior DS (L5-L6) | Principal DS (L7+) |
|---|---|---|---|---|
| Statistical Analysis | Descriptive stats, hypothesis testing | Advanced statistics, experimental design | Causal inference, advanced modeling | Statistical strategy, methodology innovation |
| Machine Learning | Basic ML algorithms, model training | Advanced ML, feature engineering | ML systems, model deployment | ML strategy, AI platform, research direction |
| Data Engineering | SQL, basic ETL, data cleaning | Pipeline design, data modeling | Platform architecture, scalable systems | Data strategy, infrastructure vision, governance |
| Programming & Tools | Python/R proficiency, visualization | Advanced programming, tool integration | Software engineering, system design | Technology strategy, platform development, innovation |
| Domain Expertise | Business understanding, metric interpretation | Domain modeling, insight generation | Strategic analysis, business integration | Market expertise, competitive intelligence, thought leadership |
Impact & Leadership Competencies
| Competency | Junior DS (L1-L2) | Mid DS (L3-L4) | Senior DS (L5-L6) | Principal DS (L7+) |
|---|---|---|---|---|
| Business Impact | Metric improvement, insight delivery | Project leadership, business case development | Strategic initiatives, P&L impact | Business transformation, market advantage, innovation |
| Communication | Technical reporting, visualization | Stakeholder presentations, executive briefings | Board communication, external representation | Industry leadership, thought leadership, market influence |
| Team Leadership | Peer collaboration, knowledge sharing | Junior mentoring, project management | Team building, hiring, culture development | Organizational leadership, talent strategy, vision setting |
| Innovation & Research | Algorithm implementation, experimentation | Research projects, publication | Research strategy, academic partnerships | Research vision, industry influence, breakthrough innovation |
DevOps Engineering Competency Matrix
Technical Competencies
| Competency | Junior DevOps (L1-L2) | Mid DevOps (L3-L4) | Senior DevOps (L5-L6) | Principal DevOps (L7+) |
|---|---|---|---|---|
| Infrastructure | Basic cloud services, server management | Infrastructure automation, containerization | Platform architecture, multi-cloud strategy | Infrastructure vision, emerging technologies, industry standards |
| CI/CD & Automation | Pipeline basics, script writing | Advanced pipelines, deployment automation | Platform design, workflow optimization | Automation strategy, developer experience, productivity platforms |
| Monitoring & Observability | Basic monitoring, log analysis | Advanced monitoring, alerting systems | Observability strategy, SLA/SLI design | Monitoring vision, reliability engineering, performance culture |
| Security & Compliance | Security basics, access management | Security automation, compliance frameworks | Security architecture, risk management | Security strategy, governance, industry leadership |
| Performance & Scalability | Performance monitoring, basic optimization | Capacity planning, performance tuning | Scalability architecture, cost optimization | Performance strategy, efficiency platforms, innovation |
Leadership & Impact Competencies
| Competency | Junior DevOps (L1-L2) | Mid DevOps (L3-L4) | Senior DevOps (L5-L6) | Principal DevOps (L7+) |
|---|---|---|---|---|
| Developer Experience | Tool support, documentation | Platform development, self-service tools | Developer productivity, workflow design | Developer platform vision, industry best practices |
| Incident Management | Incident response, troubleshooting | Incident coordination, root cause analysis | Incident strategy, prevention systems | Reliability culture, organizational resilience |
| Team Collaboration | Cross-team support, knowledge sharing | Process improvement, training delivery | Culture building, practice evangelism | Organizational transformation, industry influence |
| Strategic Impact | Operational excellence, cost awareness | Efficiency improvements, platform adoption | Strategic initiatives, business enablement | Technology strategy, competitive advantage, market leadership |
Engineering Management Competency Matrix
People Leadership Competencies
| Competency | Manager (L1-L2) | Senior Manager (L3-L4) | Director (L5-L6) | VP+ (L7+) |
|---|---|---|---|---|
| Team Building | Hiring, onboarding, 1:1s | Team culture, performance management | Multi-team coordination, org design | Organizational culture, talent strategy |
| Performance Management | Individual development, feedback | Performance systems, coaching | Calibration across teams, promotion standards | Talent development, succession planning |
| Communication | Team updates, stakeholder management | Executive communication, cross-functional alignment | Board updates, external communication | Industry representation, thought leadership |
| Conflict Resolution | Team conflicts, process improvements | Cross-team issues, organizational friction | Strategic alignment, cultural challenges | Corporate-level conflicts, crisis management |
Technical Leadership Competencies
| Competency | Manager (L1-L2) | Senior Manager (L3-L4) | Director (L5-L6) | VP+ (L7+) |
|---|---|---|---|---|
| Technical Vision | Team technical decisions, architecture input | Platform strategy, technology choices | Technical roadmap, innovation strategy | Technology vision, industry standards |
| System Ownership | Feature/service ownership, quality standards | Platform ownership, scalability planning | System portfolio, technical debt management | Technology strategy, competitive advantage |
| Process & Practice | Team processes, development practices | Engineering standards, quality systems | Process innovation, best practices | Engineering culture, industry influence |
| Technology Strategy | Tool evaluation, team technology choices | Platform decisions, technical investments | Technology portfolio, strategic architecture | Corporate technology strategy, market leadership |
Usage Guidelines
Assessment Approach
- Level Calibration: Use these matrices to calibrate expectations for each level within your organization
- Interview Design: Select competencies most relevant to the specific role and level being hired for
- Evaluation Consistency: Ensure all interviewers understand and apply the same competency standards
- Growth Planning: Use matrices for career development and promotion discussions
Customization Tips
- Industry Adaptation: Modify competencies based on your industry (fintech, healthcare, etc.)
- Company Stage: Adjust expectations based on startup vs. enterprise environment
- Team Needs: Emphasize competencies most critical for current team challenges
- Cultural Fit: Add company-specific values and cultural competencies
Common Pitfalls
- Unrealistic Expectations: Don't expect senior-level competencies from junior candidates
- One-Size-Fits-All: Customize competency emphasis based on role requirements
- Static Assessment: Regularly update matrices based on changing business needs
- Bias Introduction: Ensure competencies are measurable and don't introduce unconscious bias
Matrix Validation Process
Regular Review Cycle
- Quarterly: Review competency relevance and adjust weights
- Semi-annually: Update level expectations based on market standards
- Annually: Comprehensive review with stakeholder feedback
Stakeholder Input
- Hiring Managers: Validate role-specific competency requirements
- Current Team Members: Confirm level expectations match reality
- Recent Hires: Gather feedback on assessment accuracy
- HR Partners: Ensure legal compliance and bias mitigation
Continuous Improvement
- Performance Correlation: Track new hire performance against competency assessments
- Market Benchmarking: Compare standards with industry peers
- Feedback Integration: Incorporate interviewer and candidate feedback
- Bias Monitoring: Regular analysis of assessment patterns across demographics
Interview Debrief Facilitation Guide
This guide provides a comprehensive framework for conducting effective, unbiased interview debriefs that lead to consistent hiring decisions. Use this to facilitate productive discussions that focus on evidence-based evaluation.
Pre-Debrief Preparation
Facilitator Responsibilities
- Review all interviewer feedback before the meeting
- Identify significant score discrepancies that need discussion
- Prepare discussion agenda with time allocations
- Gather role requirements and competency framework
- Review any flags or special considerations noted during interviews
- Ensure all required materials are available (scorecards, rubrics, candidate resume)
- Set up meeting logistics (room, video conference, screen sharing)
- Send the agenda to participants 30 minutes before the meeting
Required Materials Checklist
- Candidate resume and application materials
- Job description and competency requirements
- Individual interviewer scorecards
- Scoring rubrics and competency definitions
- Interview notes and documentation
- Any technical assessments or work samples
- Company hiring standards and calibration examples
- Bias mitigation reminders and prompts
Participant Preparation Requirements
- All interviewers must complete independent scoring before debrief
- Submit written feedback with specific evidence for each competency
- Review scoring rubrics to ensure consistent interpretation
- Prepare specific examples to support scoring decisions
- Flag any concerns or unusual circumstances that affected assessment
- Avoid discussing candidate with other interviewers before debrief
- Come prepared to defend scores with concrete evidence
- Be ready to adjust scores based on additional evidence shared
Debrief Meeting Structure
Opening (5 minutes)
- State meeting purpose: Make hiring decision based on evidence
- Review agenda and time limits: Keep discussion focused and productive
- Remind of bias mitigation principles: Focus on competencies, not personality
- Confirm confidentiality: Discussion stays within hiring team
- Establish ground rules: One person speaks at a time, evidence-based discussion
Individual Score Sharing (10-15 minutes)
- Go around the room systematically - each interviewer shares scores independently
- No discussion or challenges yet - just data collection
- Record scores on shared document visible to all participants
- Note any abstentions or "insufficient data" responses
- Identify clear patterns and discrepancies without commentary
- Flag any scores requiring explanation (1s or 4s typically need strong evidence)
Competency-by-Competency Discussion (30-40 minutes)
For Each Core Competency:
1. Present Score Distribution (2 minutes)
- Display all scores for this competency
- Note range and any outliers
- Identify if consensus exists or discussion needed
2. Evidence Sharing (5-8 minutes per competency)
- Start with interviewers who assessed this competency directly
- Share specific examples and observations
- Focus on what candidate said/did, not interpretations
- Allow questions for clarification (not challenges yet)
3. Discussion and Calibration (3-5 minutes)
- Address significant discrepancies (>1 point difference)
- Challenge vague or potentially biased language
- Seek additional evidence if needed
- Allow score adjustments based on new information
- Reach consensus or note dissenting views
Structured Discussion Questions:
- "What specific evidence supports this score?"
- "Can you provide the exact example or quote?"
- "How does this compare to our rubric definition?"
- "Would this response receive the same score regardless of who gave it?"
- "Are we evaluating the competency or making assumptions?"
- "What would need to change for this to be the next level up/down?"
Overall Recommendation Discussion (10-15 minutes)
Weighted Score Calculation
- Apply competency weights based on role requirements
- Calculate overall weighted average
- Check minimum threshold requirements
- Consider any veto criteria (critical competency failures)
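The calculation steps above can be sketched as a small function. The specific weights, the 1-4 scale, and the minimum thresholds below are assumptions for illustration; substitute the role's actual competency framework.

```python
# Weighted overall score with minimum-threshold (veto criteria) checks.
# Weights, scale, and thresholds are illustrative assumptions.

def overall_recommendation(scores, weights, minimums):
    """Return (weighted_average, list of competencies below their minimum)."""
    total_weight = sum(weights.values())
    weighted = sum(scores[c] * w for c, w in weights.items()) / total_weight
    failures = sorted(c for c, floor in minimums.items() if scores[c] < floor)
    return round(weighted, 2), failures

scores   = {"coding": 3, "system_design": 4, "communication": 2}
weights  = {"coding": 0.4, "system_design": 0.4, "communication": 0.2}
minimums = {"coding": 3, "communication": 3}  # critical competencies

avg, failures = overall_recommendation(scores, weights, minimums)
print(avg, failures)  # → 3.2 ['communication']
```

Note that a threshold failure here vetoes the hire regardless of the weighted average, which matches the intent of requiring minimum scores in critical competency areas rather than letting strengths average out a critical gap.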
Final Recommendation Options
- Strong Hire: Exceeds requirements in most areas, clear value-add
- Hire: Meets requirements with growth potential
- No Hire: Doesn't meet minimum requirements for success
- Strong No Hire: Significant gaps that would impact team/company
Decision Rationale Documentation
- Summarize key strengths with specific evidence
- Identify development areas with specific examples
- Explain final recommendation with competency-based reasoning
- Note any dissenting opinions and reasoning
- Document onboarding considerations if hiring
Closing and Next Steps (5 minutes)
- Confirm final decision and documentation
- Assign follow-up actions (feedback delivery, offer preparation, etc.)
- Schedule any additional interviews if needed
- Review timeline for candidate communication
- Remind participants of the confidentiality of the discussion and decision
Facilitation Best Practices
Creating Psychological Safety
- Encourage honest feedback without fear of judgment
- Validate different perspectives and assessment approaches
- Address power dynamics - ensure junior voices are heard
- Model vulnerability - admit when evidence changes your mind
- Focus on learning and calibration, not winning arguments
- Thank participants for thorough preparation and thoughtful input
Managing Difficult Conversations
When Scores Vary Significantly
- Acknowledge the discrepancy without judgment
- Ask for specific evidence from each scorer
- Look for different interpretations of the same data
- Consider if different questions revealed different competency levels
- Check for bias patterns in reasoning
- Allow time for reflection and potential score adjustments
When Someone Uses Biased Language
- Pause the conversation gently but firmly
- Ask for specific evidence behind the assessment
- Reframe in competency terms - "What specific skills did this demonstrate?"
- Challenge assumptions - "Help me understand how we know that"
- Redirect to rubric - "How does this align with our scoring criteria?"
- Document and follow up privately if bias persists
When the Discussion Gets Off Track
- Redirect to competencies: "Let's focus on the technical skills demonstrated"
- Ask for evidence: "What specific example supports that assessment?"
- Reference rubrics: "How does this align with our level 3 definition?"
- Manage time: "We have 5 minutes left on this competency"
- Table unrelated issues: "That's important but separate from this hire decision"
Encouraging Evidence-Based Discussion
Good Evidence Examples
- Direct quotes: "When asked about debugging, they said..."
- Specific behaviors: "They organized their approach by first..."
- Observable outcomes: "Their code compiled on first run and handled edge cases"
- Process descriptions: "They walked through their problem-solving step by step"
- Measurable results: "They identified 3 optimization opportunities"
Poor Evidence Examples
- Gut feelings: "They just seemed off"
- Comparisons: "Not as strong as our last hire"
- Assumptions: "Probably wouldn't fit our culture"
- Vague impressions: "Didn't seem passionate"
- Irrelevant factors: "Their background is different from ours"
Managing Group Dynamics
Ensuring Equal Participation
- Direct questions to quieter participants
- Prevent interrupting and ensure everyone finishes thoughts
- Balance speaking time across all interviewers
- Validate minority opinions even if not adopted
- Check for unheard perspectives before finalizing decisions
Handling Strong Personalities
- Set time limits for individual speaking
- Redirect monopolizers: "Let's hear from others on this"
- Challenge confidently stated opinions that lack evidence
- Support less assertive voices in expressing dissenting views
- Focus on data, not personality or seniority in decision making
Bias Interruption Strategies
Affinity Bias Interruption
- Notice pattern: Positive assessment seems based on shared background/interests
- Interrupt with: "Let's focus on the job-relevant skills they demonstrated"
- Redirect to: Specific competency evidence and measurable outcomes
- Document: Note if personal connection affected professional assessment
Halo/Horn Effect Interruption
- Notice pattern: One area strongly influencing assessment of unrelated areas
- Interrupt with: "Let's score each competency independently"
- Redirect to: Specific evidence for each individual competency area
- Recalibrate: Ask for separate examples supporting each score
Confirmation Bias Interruption
- Notice pattern: Only seeking/discussing evidence that supports initial impression
- Interrupt with: "What evidence might suggest a different assessment?"
- Redirect to: Consider alternative interpretations of the same data
- Challenge: "How might we be wrong about this assessment?"
Attribution Bias Interruption
- Notice pattern: Attributing success to luck/help for some demographics, skill for others
- Interrupt with: "What role did the candidate play in achieving this outcome?"
- Redirect to: Candidate's specific contributions and decision-making
- Standardize: Apply same attribution standards across all candidates
Decision Documentation Framework
Required Documentation Elements
- Final scores for each assessed competency
- Overall recommendation with supporting rationale
- Key strengths with specific evidence
- Development areas with specific examples
- Dissenting opinions if any, with reasoning
- Special considerations or accommodation needs
- Next steps and timeline for decision communication
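The required elements above map naturally onto a simple record type, which makes it harder to skip a field during documentation. A minimal sketch, where the field names and the `recommendation` values are illustrative assumptions rather than a prescribed schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DebriefRecord:
    """One candidate's debrief outcome, mirroring the required elements."""
    candidate_id: str
    competency_scores: Dict[str, int]          # e.g. {"technical_depth": 3}
    recommendation: str                        # "hire" | "no_hire" | "more_data"
    rationale: str
    strengths: List[str] = field(default_factory=list)
    development_areas: List[str] = field(default_factory=list)
    dissenting_opinions: List[str] = field(default_factory=list)
    accommodations: Optional[str] = None       # special considerations, if any
    next_steps: str = ""


record = DebriefRecord(
    candidate_id="C-1042",
    competency_scores={"technical_depth": 3, "collaboration": 4},
    recommendation="hire",
    rationale="Consistent evidence of level-appropriate design tradeoffs.",
    strengths=["Walked through debugging approach step by step"],
)
```

Because every field is either required or defaulted, a record can be serialized directly into your tracking system without ad hoc formatting.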
Evidence Quality Standards
- Specific and observable: What exactly did the candidate do or say?
- Job-relevant: How does this relate to success in the role?
- Measurable: Can this be quantified or clearly described?
- Unbiased: Would this evidence be interpreted the same way regardless of candidate demographics?
- Complete: Does this represent the full picture of their performance in this area?
Writing Guidelines
- Use active voice and specific language
- Avoid assumptions about motivations or personality
- Focus on behaviors demonstrated during the interview
- Provide context for any unusual circumstances
- Be constructive in describing development areas
- Maintain professionalism and respect for candidate
Common Debrief Challenges and Solutions
Challenge: "I just don't think they'd fit our culture"
Solution:
- Ask for specific, observable evidence
- Define what "culture fit" means in job-relevant terms
- Challenge assumptions about cultural requirements
- Focus on ability to collaborate and contribute effectively
Challenge: Scores vary widely with no clear explanation
Solution:
- Review if different interviewers assessed different competencies
- Look for question differences that might explain variance
- Consider if candidate performance varied across interviews
- May need additional data gathering or interview
Challenge: Everyone loved/hated the candidate but can't articulate why
Solution:
- Push for specific evidence supporting emotional reactions
- Review competency rubrics together
- Look for halo/horn effects influencing overall impression
- Consider unconscious bias training for team
Challenge: Technical vs. non-technical interviewers disagree
Solution:
- Clarify which competencies each interviewer was assessing
- Ensure technical assessments carry appropriate weight
- Look for different perspectives on same evidence
- Consider specialist input for technical decisions
Challenge: Senior interviewer dominates decision making
Solution:
- Structure discussion to hear from all levels first
- Ask direct questions to junior interviewers
- Challenge opinions that lack supporting evidence
- Remember that assessment ability doesn't correlate with seniority
Challenge: Team wants to hire but scores don't support it
Solution:
- Review if rubrics match actual job requirements
- Check for consistent application of scoring standards
- Consider if additional competencies need assessment
- May indicate need for rubric calibration or role requirement review
Post-Debrief Actions
Immediate Actions (Same Day)
- Finalize decision documentation with all evidence
- Communicate decision to recruiting team
- Schedule candidate feedback delivery if applicable
- Update interview scheduling based on decision
- Note any process improvements needed for future loops
Follow-up Actions (Within 1 Week)
- Deliver candidate feedback (internal or external)
- Update interview feedback in tracking system
- Schedule any additional interviews if needed
- Begin offer process if hiring
- Document lessons learned for process improvement
Long-term Actions (Monthly/Quarterly)
- Analyze debrief effectiveness and decision quality
- Review interviewer calibration based on decisions
- Update rubrics based on debrief insights
- Provide additional training if bias patterns identified
- Share successful practices with other hiring teams
Continuous Improvement Framework
Debrief Effectiveness Metrics
- Decision consistency: Are similar candidates receiving similar decisions?
- Time to decision: Are debriefs completing within planned time?
- Participation quality: Are all interviewers contributing evidence-based input?
- Bias incidents: How often are bias interruptions needed?
- Decision satisfaction: Do participants feel good about the process and outcome?
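The decision-consistency metric above can be approximated as a panel agreement rate. A sketch assuming each debrief's individual recommendations are logged per candidate (the data shape and recommendation labels are illustrative assumptions):

```python
from typing import Dict, List


def agreement_rate(debriefs: Dict[str, List[str]]) -> float:
    """Fraction of debriefs where all interviewers gave the same recommendation.

    `debriefs` maps candidate id -> list of per-interviewer recommendations.
    """
    if not debriefs:
        return 0.0
    unanimous = sum(1 for recs in debriefs.values() if len(set(recs)) == 1)
    return unanimous / len(debriefs)


history = {
    "C-1": ["hire", "hire", "hire"],
    "C-2": ["hire", "no_hire", "hire"],
    "C-3": ["no_hire", "no_hire"],
    "C-4": ["hire", "hire"],
}
print(agreement_rate(history))  # 0.75
```

A falling agreement rate over a quarter is one signal that rubrics or interviewer calibration need attention; interpret it alongside the other metrics, since unanimity alone does not prove a decision was correct.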
Regular Review Process
- Monthly: Review debrief facilitation effectiveness and interviewer feedback
- Quarterly: Analyze decision patterns and potential bias indicators
- Semi-annually: Update debrief processes based on hiring outcome data
- Annually: Comprehensive review of debrief framework and training needs
Training and Calibration
- New facilitators: Shadow 3-5 debriefs before leading independently
- All facilitators: Quarterly calibration sessions on bias interruption
- Interviewer training: Include debrief participation expectations
- Leadership training: Ensure hiring managers can facilitate effectively
This guide should be adapted to your organization's specific needs while maintaining focus on evidence-based, unbiased decision making.
Interview Frameworks
Loop Design by Level
Junior/Mid
- Emphasize fundamentals, debugging, and growth potential.
- Keep loops concise with coding + behavioral validation.
Senior
- Add system design and leadership rounds.
- Evaluate tradeoff quality, mentoring, and cross-team collaboration.
Staff+
- Focus on architecture direction and organizational impact.
- Assess strategy, influence, and long-term technical judgment.
Competency Areas
- Technical depth (implementation, design, quality)
- Problem solving (ambiguity handling, prioritization)
- Collaboration (communication, stakeholder alignment)
- Leadership (ownership, mentoring, influence)
Scoring Rubric Baseline
- 4: exceeds level expectations with strong evidence
- 3: meets expectations consistently
- 2: partial signal with notable gaps
- 1: does not meet baseline requirements
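Once each competency has an agreed 1-4 score, the panel can roll scores up into a coarse overall signal. A minimal sketch; the "any score of 1 is disqualifying" rule and the 3.0 hire threshold are illustrative assumptions, not a mandated policy:

```python
from statistics import mean
from typing import Dict


def overall_signal(scores: Dict[str, int]) -> str:
    """Roll up 1-4 competency scores into a coarse recommendation.

    Assumed policy: any score of 1 is disqualifying, an average of
    3.0 or higher supports a hire, and anything in between needs discussion.
    """
    if any(s < 1 or s > 4 for s in scores.values()):
        raise ValueError("Scores must use the 1-4 rubric")
    if min(scores.values()) == 1:
        return "no_hire"
    if mean(scores.values()) >= 3.0:
        return "hire"
    return "discuss"


print(overall_signal({"technical_depth": 3, "problem_solving": 4,
                      "collaboration": 3, "leadership": 3}))  # hire
```

Treat the rolled-up signal as an input to the debrief, not a replacement for it; dissenting evidence should still be discussed even when the average clears the threshold.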
Calibration Guidelines
- Run recurring interviewer calibration sessions.
- Compare interviewer scoring variance across rounds.
- Track interview signal against new-hire outcomes.
- Use structured debriefs with independent scoring before discussion.
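Scoring variance across interviewers can be checked with a small script once scores are exported as (interviewer, score) pairs. A sketch under those assumptions; the 0.5-point deviation threshold is illustrative and should be tuned to your panel size:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def calibration_outliers(scores: List[Tuple[str, int]],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Flag interviewers whose average score deviates from the panel mean.

    `scores` is a list of (interviewer, score) pairs across many loops.
    Returns {interviewer: deviation} for anyone past `threshold`.
    """
    by_interviewer = defaultdict(list)
    for interviewer, score in scores:
        by_interviewer[interviewer].append(score)
    panel_mean = mean(s for _, s in scores)
    return {
        name: round(mean(vals) - panel_mean, 2)
        for name, vals in by_interviewer.items()
        if abs(mean(vals) - panel_mean) > threshold
    }


data = [("alice", 3), ("alice", 4), ("bob", 2), ("bob", 2), ("cara", 3)]
print(calibration_outliers(data))  # {'alice': 0.7, 'bob': -0.8}
```

A flagged interviewer is not necessarily wrong; they may simply see a different candidate pool, so use the output to prompt a calibration conversation rather than to override scores.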
Bias-Reduction Baseline
- Standardize question banks per competency area.
- Keep scorecards evidence-based and behavior-specific.
- Use diverse interviewer panels where possible.
- Require written rationale for strong yes/no recommendations.
#!/usr/bin/env python3
"""Generate an interview loop plan by role and level."""
from __future__ import annotations

import argparse
import json
from typing import Dict, List

BASE_ROUNDS = {
    "junior": [
        ("Screen", 45, "Fundamentals and communication"),
        ("Coding", 60, "Problem solving and code quality"),
        ("Behavioral", 45, "Collaboration and growth mindset"),
    ],
    "mid": [
        ("Screen", 45, "Fundamentals and ownership"),
        ("Coding", 60, "Implementation quality"),
        ("System Design", 60, "Service/component design"),
        ("Behavioral", 45, "Stakeholder collaboration"),
    ],
    "senior": [
        ("Screen", 45, "Depth and tradeoff reasoning"),
        ("Coding", 60, "Code quality and testing"),
        ("System Design", 75, "Scalability and reliability"),
        ("Leadership", 60, "Mentoring and decision making"),
        ("Behavioral", 45, "Cross-functional influence"),
    ],
    "staff": [
        ("Screen", 45, "Strategic and technical depth"),
        ("Architecture", 90, "Org-level design decisions"),
        ("Technical Strategy", 60, "Long-term tradeoffs"),
        ("Influence", 60, "Cross-team leadership"),
        ("Behavioral", 45, "Values and executive communication"),
    ],
}

QUESTION_BANK = {
    "coding": [
        "Walk through your approach before coding and identify tradeoffs.",
        "How would you test this implementation for edge cases?",
        "What would you refactor if this code became a shared library?",
    ],
    "system": [
        "Design this system for 10x traffic growth in 12 months.",
        "Where are the main failure modes and how would you detect them?",
        "What components would you scale first and why?",
    ],
    "leadership": [
        "Describe a time you changed technical direction with incomplete information.",
        "How do you raise the bar for code quality across a team?",
        "How do you handle disagreement between product and engineering priorities?",
    ],
    "behavioral": [
        "Tell me about a high-stakes mistake and what changed afterward.",
        "Describe a conflict where you had to influence without authority.",
        "How do you support underperforming teammates?",
    ],
}


def normalize_level(level: str) -> str:
    level = level.strip().lower()
    if level in {"staff+", "principal", "lead"}:
        return "staff"
    if level not in BASE_ROUNDS:
        raise ValueError(f"Unsupported level: {level}")
    return level


def suggested_questions(round_name: str) -> List[str]:
    name = round_name.lower()
    if "coding" in name:
        return QUESTION_BANK["coding"]
    if "system" in name or "architecture" in name:
        return QUESTION_BANK["system"]
    if "lead" in name or "influence" in name or "strategy" in name:
        return QUESTION_BANK["leadership"]
    return QUESTION_BANK["behavioral"]


def generate_plan(role: str, level: str) -> Dict[str, object]:
    normalized = normalize_level(level)
    rounds = []
    for idx, (name, minutes, focus) in enumerate(BASE_ROUNDS[normalized], start=1):
        rounds.append(
            {
                "round": idx,
                "name": name,
                "duration_minutes": minutes,
                "focus": focus,
                "suggested_questions": suggested_questions(name),
            }
        )
    return {
        "role": role,
        "level": normalized,
        "total_rounds": len(rounds),
        "total_minutes": sum(r["duration_minutes"] for r in rounds),
        "rounds": rounds,
    }


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate an interview loop plan for a role and level.")
    parser.add_argument("--role", required=True, help="Role name (e.g., Senior Software Engineer)")
    parser.add_argument("--level", required=True, help="Level: junior|mid|senior|staff")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    plan = generate_plan(args.role, args.level)
    if args.json:
        print(json.dumps(plan, indent=2))
    else:
        print(f"Interview Plan: {plan['role']} ({plan['level']})")
        print(f"Total rounds: {plan['total_rounds']} | Total time: {plan['total_minutes']} minutes")
        print("")
        for r in plan["rounds"]:
            print(f"Round {r['round']}: {r['name']} ({r['duration_minutes']} min)")
            print(f"Focus: {r['focus']}")
            for q in r["suggested_questions"]:
                print(f"- {q}")
            print("")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
Details
- Category: Productivity
- License: MIT
- Author: @alirezarezvani
- Source file: engineering/interview-system-designer/SKILL.md