Interview System Designer
Design structured technical and behavioral interview processes — question banks, evaluation rubrics, calibration guides, and fair assessment frameworks.
What this skill does
Design fair and consistent hiring processes with structured interview plans for any role. Create ready-to-use question lists, scoring guides, and bias-reduction checklists that keep your team aligned during candidate assessments. Reach for this whenever you need to standardize how you hire or improve your existing hiring system.
name: “interview-system-designer” description: This skill should be used when the user asks to “design interview processes”, “create hiring pipelines”, “calibrate interview loops”, “generate interview questions”, “design competency matrices”, “analyze interviewer bias”, “create scoring rubrics”, “build question banks”, or “optimize hiring systems”. Use for designing role-specific interview loops, competency assessments, and hiring calibration systems.
Interview System Designer
Comprehensive interview loop planning and calibration support for role-based hiring systems.
Overview
Use this skill to create structured interview loops, standardize question quality, and keep hiring signal consistent across interviewers.
Core Capabilities
- Interview loop planning by role and level
- Round-by-round focus and timing recommendations
- Suggested question sets by round type
- Framework support for scoring and calibration
- Bias-reduction and process consistency guidance
Quick Start
# Generate a loop plan for a role and level
python3 scripts/interview_planner.py --role "Senior Software Engineer" --level senior
# JSON output for integration with internal tooling
python3 scripts/interview_planner.py --role "Product Manager" --level mid --json
Recommended Workflow
- Run
scripts/interview_planner.pyto generate a baseline loop. - Align rounds to role-specific competencies.
- Validate scoring rubric consistency with interview panel leads.
- Review for bias controls before rollout.
- Recalibrate quarterly using hiring outcome data.
References
references/interview-frameworks.mdreferences/bias_mitigation_checklist.mdreferences/competency_matrix_templates.mdreferences/debrief_facilitation_guide.md
Common Pitfalls
- Overweighting one round while ignoring other competency signals
- Using unstructured interviews without standardized scoring
- Skipping calibration sessions for interviewers
- Changing hiring bar without documenting rationale
Best Practices
- Keep round objectives explicit and non-overlapping.
- Require evidence for each score recommendation.
- Use the same baseline rubric across comparable roles.
- Revisit loop design based on quality-of-hire outcomes.
Interview System Designer
A comprehensive toolkit for designing, optimizing, and calibrating interview processes. This skill provides tools to create role-specific interview loops, generate competency-based question banks, and analyze hiring data for bias and calibration issues.
Overview
The Interview System Designer skill includes three powerful Python tools and comprehensive reference materials to help you build fair, effective, and scalable hiring processes:
- Interview Loop Designer - Generate calibrated interview loops for any role and level
- Question Bank Generator - Create competency-based interview questions with scoring rubrics
- Hiring Calibrator - Analyze interview data to detect bias and calibration issues
Tools
1. Interview Loop Designer (loop_designer.py)
Generates complete interview loops tailored to specific roles, levels, and teams.
Features:
- Role-specific competency mapping (SWE, PM, Designer, Data, DevOps, Leadership)
- Level-appropriate interview rounds (junior through principal)
- Optimized scheduling and time allocation
- Interviewer skill requirements
- Standardized scorecard templates
Usage:
# Basic usage
python3 loop_designer.py --role "Senior Software Engineer" --level senior
# With team and custom competencies
python3 loop_designer.py --role "Product Manager" --level mid --team growth --competencies leadership,strategy,analytics
# Using JSON input file
python3 loop_designer.py --input assets/sample_role_definitions.json --output loops/
# Specify output format
python3 loop_designer.py --role "Staff Data Scientist" --level staff --format json --output data_scientist_loop.jsonInput Options:
--role: Job role title (e.g., "Senior Software Engineer", "Product Manager")--level: Experience level (junior, mid, senior, staff, principal)--team: Team or department (optional)--competencies: Comma-separated list of specific competencies to focus on--input: JSON file with role definition--output: Output directory or file path--format: Output format (json, text, both) - default: both
Example Output:
Interview Loop Design for Senior Software Engineer (Senior Level)
============================================================
Total Duration: 300 minutes (5h 0m)
Total Rounds: 5
INTERVIEW ROUNDS
----------------------------------------
Round 1: Technical Phone Screen
Duration: 45 minutes
Format: Virtual
Focus Areas: Coding Fundamentals, Problem Solving
Round 2: System Design
Duration: 75 minutes
Format: Collaborative Whitboard
Focus Areas: System Thinking, Architectural Reasoning
...2. Question Bank Generator (question_bank_generator.py)
Creates comprehensive interview question banks organized by competency area.
Features:
- Competency-based question organization
- Level-appropriate difficulty progression
- Multiple question types (technical, behavioral, situational)
- Detailed scoring rubrics with calibration examples
- Follow-up probes and conversation guides
Usage:
# Generate questions for specific competencies
python3 question_bank_generator.py --role "Frontend Engineer" --competencies react,typescript,system-design
# Create behavioral question bank
python3 question_bank_generator.py --role "Product Manager" --question-types behavioral,leadership --num-questions 15
# Generate questions for multiple levels
python3 question_bank_generator.py --role "DevOps Engineer" --levels junior,mid,senior --output questions/Input Options:
--role: Job role title--level: Experience level (default: senior)--competencies: Comma-separated list of competencies to focus on--question-types: Types to include (technical, behavioral, situational)--num-questions: Number of questions to generate (default: 20)--input: JSON file with role requirements--output: Output directory or file path--format: Output format (json, text, both) - default: both
Question Types:
- Technical: Coding problems, system design, domain-specific challenges
- Behavioral: STAR method questions focusing on past experiences
- Situational: Hypothetical scenarios testing decision-making
3. Hiring Calibrator (hiring_calibrator.py)
Analyzes interview scores to detect bias, calibration issues, and provides recommendations.
Features:
- Statistical bias detection across demographics
- Interviewer calibration analysis
- Score distribution and trending analysis
- Specific coaching recommendations
- Comprehensive reporting with actionable insights
Usage:
# Comprehensive analysis
python3 hiring_calibrator.py --input assets/sample_interview_results.json --analysis-type comprehensive
# Focus on specific areas
python3 hiring_calibrator.py --input interview_data.json --analysis-type bias --competencies technical,leadership
# Trend analysis over time
python3 hiring_calibrator.py --input historical_data.json --trend-analysis --period quarterlyInput Options:
--input: JSON file with interview results data (required)--analysis-type: Type of analysis (comprehensive, bias, calibration, interviewer, scoring)--competencies: Comma-separated list of competencies to focus on--trend-analysis: Enable trend analysis over time--period: Time period for trends (daily, weekly, monthly, quarterly)--output: Output file path--format: Output format (json, text, both) - default: both
Analysis Types:
- Comprehensive: Full analysis including bias, calibration, and recommendations
- Bias: Focus on demographic and interviewer bias patterns
- Calibration: Interviewer consistency and agreement analysis
- Interviewer: Individual interviewer performance and coaching needs
- Scoring: Score distribution and pattern analysis
Data Formats
Role Definition Input (JSON)
{
"role": "Senior Software Engineer",
"level": "senior",
"team": "platform",
"competencies": ["system_design", "technical_leadership", "mentoring"],
"requirements": {
"years_experience": "5-8",
"technical_skills": ["Python", "AWS", "Kubernetes"],
"leadership_experience": true
}
}Interview Results Input (JSON)
[
{
"candidate_id": "candidate_001",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_alice",
"date": "2024-01-15T09:00:00Z",
"scores": {
"coding_fundamentals": 3.5,
"system_design": 4.0,
"technical_leadership": 3.0,
"communication": 3.5
},
"overall_recommendation": "Hire",
"gender": "male",
"ethnicity": "asian",
"years_experience": 6
}
]Reference Materials
Competency Matrix Templates (references/competency_matrix_templates.md)
- Comprehensive competency matrices for all engineering roles
- Level-specific expectations (junior through principal)
- Assessment criteria and growth paths
- Customization guidelines for different company stages and industries
Bias Mitigation Checklist (references/bias_mitigation_checklist.md)
- Pre-interview preparation checklist
- Interview process bias prevention strategies
- Real-time bias interruption techniques
- Legal compliance reminders
- Emergency response protocols
Debrief Facilitation Guide (references/debrief_facilitation_guide.md)
- Structured debrief meeting frameworks
- Evidence-based discussion techniques
- Bias interruption strategies
- Decision documentation standards
- Common challenges and solutions
Sample Data
The assets/ directory contains sample data for testing:
sample_role_definitions.json: Example role definitions for various positionssample_interview_results.json: Sample interview data with multiple candidates and interviewers
Expected Outputs
The expected_outputs/ directory contains examples of tool outputs:
- Interview loop designs in both JSON and human-readable formats
- Question banks with scoring rubrics and calibration examples
- Calibration analysis reports with bias detection and recommendations
Best Practices
Interview Loop Design
- Competency Focus: Align interview rounds with role-critical competencies
- Level Calibration: Adjust expectations and question difficulty based on experience level
- Time Optimization: Balance thoroughness with candidate experience
- Interviewer Training: Ensure interviewers are qualified and calibrated
Question Bank Development
- Evidence-Based: Focus on observable behaviors and concrete examples
- Bias Mitigation: Use structured questions that minimize subjective interpretation
- Calibration: Include examples of different quality responses for consistency
- Continuous Improvement: Regularly update questions based on predictive validity
Calibration Analysis
- Regular Monitoring: Analyze hiring data quarterly for bias patterns
- Prompt Action: Address calibration issues immediately with targeted coaching
- Data Quality: Ensure complete and consistent data collection
- Legal Compliance: Monitor for discriminatory patterns and document corrections
Installation & Setup
No external dependencies required - uses Python 3 standard library only.
# Clone or download the skill directory
cd interview-system-designer/
# Make scripts executable (optional)
chmod +x *.py
# Test with sample data
python3 loop_designer.py --role "Senior Software Engineer" --level senior
python3 question_bank_generator.py --role "Product Manager" --level mid
python3 hiring_calibrator.py --input assets/sample_interview_results.jsonIntegration
With Existing Systems
- ATS Integration: Export interview loops as structured data for applicant tracking systems
- Calendar Systems: Use scheduling outputs to auto-create interview blocks
- HR Analytics: Import calibration reports into broader diversity and inclusion dashboards
Custom Workflows
- Batch Processing: Process multiple roles or historical data sets
- Automated Reporting: Schedule regular calibration analysis
- Custom Competencies: Extend frameworks with company-specific competencies
Troubleshooting
Common Issues
"Role not found" errors:
- The tool will map common variations (engineer → software_engineer)
- For custom roles, use the closest standard role and specify custom competencies
"Insufficient data" errors:
- Minimum 5 interviews required for statistical analysis
- Ensure interview data includes required fields (candidate_id, interviewer_id, scores, date)
Missing output files:
- Check file permissions in output directory
- Ensure adequate disk space
- Verify JSON input file format is valid
Performance Considerations
- Interview loop generation: < 1 second
- Question bank generation: 1-3 seconds for 20 questions
- Calibration analysis: 1-5 seconds for 50 interviews, scales linearly
Contributing
To extend this skill:
- New Roles: Add competency frameworks in
_init_competency_frameworks() - New Question Types: Extend question templates in respective generators
- New Analysis Types: Add analysis methods to hiring calibrator
- Custom Outputs: Modify formatting functions for different output needs
License & Usage
This skill is designed for internal company use in hiring process optimization. All bias detection and mitigation features should be reviewed with legal counsel to ensure compliance with local employment laws.
For questions or support, refer to the comprehensive documentation in each script's docstring and the reference materials provided.
[
{
"candidate_id": "candidate_001",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_alice",
"date": "2024-01-15T09:00:00Z",
"scores": {
"coding_fundamentals": 3.5,
"system_design": 4.0,
"technical_leadership": 3.0,
"communication": 3.5,
"problem_solving": 4.0
},
"overall_recommendation": "Hire",
"gender": "male",
"ethnicity": "asian",
"years_experience": 6,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_001",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_bob",
"date": "2024-01-15T11:00:00Z",
"scores": {
"system_design": 3.5,
"technical_leadership": 3.5,
"mentoring": 3.0,
"cross_team_collaboration": 4.0,
"strategic_thinking": 3.5
},
"overall_recommendation": "Hire",
"gender": "male",
"ethnicity": "asian",
"years_experience": 6,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_002",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_alice",
"date": "2024-01-16T09:00:00Z",
"scores": {
"coding_fundamentals": 2.5,
"system_design": 3.0,
"technical_leadership": 2.0,
"communication": 3.0,
"problem_solving": 3.0
},
"overall_recommendation": "No Hire",
"gender": "female",
"ethnicity": "hispanic",
"years_experience": 5,
"university_tier": "tier_2",
"previous_company_size": "startup"
},
{
"candidate_id": "candidate_002",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_charlie",
"date": "2024-01-16T11:00:00Z",
"scores": {
"system_design": 2.0,
"technical_leadership": 2.5,
"mentoring": 2.0,
"cross_team_collaboration": 3.0,
"strategic_thinking": 2.5
},
"overall_recommendation": "No Hire",
"gender": "female",
"ethnicity": "hispanic",
"years_experience": 5,
"university_tier": "tier_2",
"previous_company_size": "startup"
},
{
"candidate_id": "candidate_003",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_david",
"date": "2024-01-17T14:00:00Z",
"scores": {
"coding_fundamentals": 4.0,
"system_design": 3.5,
"technical_leadership": 4.0,
"communication": 4.0,
"problem_solving": 3.5
},
"overall_recommendation": "Strong Hire",
"gender": "male",
"ethnicity": "white",
"years_experience": 8,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_003",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_alice",
"date": "2024-01-17T16:00:00Z",
"scores": {
"system_design": 4.0,
"technical_leadership": 4.0,
"mentoring": 3.5,
"cross_team_collaboration": 4.0,
"strategic_thinking": 3.5
},
"overall_recommendation": "Hire",
"gender": "male",
"ethnicity": "white",
"years_experience": 8,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_004",
"role": "Product Manager",
"interviewer_id": "interviewer_emma",
"date": "2024-01-18T10:00:00Z",
"scores": {
"product_strategy": 3.0,
"user_research": 3.5,
"data_analysis": 4.0,
"stakeholder_management": 3.0,
"communication": 3.5
},
"overall_recommendation": "Hire",
"gender": "female",
"ethnicity": "black",
"years_experience": 4,
"university_tier": "tier_2",
"previous_company_size": "medium"
},
{
"candidate_id": "candidate_005",
"role": "Product Manager",
"interviewer_id": "interviewer_frank",
"date": "2024-01-19T13:00:00Z",
"scores": {
"product_strategy": 2.5,
"user_research": 2.0,
"data_analysis": 3.0,
"stakeholder_management": 2.5,
"communication": 3.0
},
"overall_recommendation": "No Hire",
"gender": "male",
"ethnicity": "white",
"years_experience": 3,
"university_tier": "tier_3",
"previous_company_size": "startup"
},
{
"candidate_id": "candidate_006",
"role": "Junior Software Engineer",
"interviewer_id": "interviewer_alice",
"date": "2024-01-20T09:00:00Z",
"scores": {
"coding_fundamentals": 3.0,
"debugging": 3.5,
"testing_basics": 3.0,
"collaboration": 4.0,
"learning_agility": 3.5
},
"overall_recommendation": "Hire",
"gender": "female",
"ethnicity": "asian",
"years_experience": 1,
"university_tier": "bootcamp",
"previous_company_size": "none"
},
{
"candidate_id": "candidate_007",
"role": "Junior Software Engineer",
"interviewer_id": "interviewer_bob",
"date": "2024-01-21T10:30:00Z",
"scores": {
"coding_fundamentals": 2.0,
"debugging": 2.5,
"testing_basics": 2.0,
"collaboration": 3.0,
"learning_agility": 3.0
},
"overall_recommendation": "No Hire",
"gender": "male",
"ethnicity": "hispanic",
"years_experience": 0,
"university_tier": "tier_2",
"previous_company_size": "none"
},
{
"candidate_id": "candidate_008",
"role": "Staff Frontend Engineer",
"interviewer_id": "interviewer_grace",
"date": "2024-01-22T14:00:00Z",
"scores": {
"frontend_architecture": 4.0,
"system_design": 4.0,
"technical_leadership": 4.0,
"team_building": 3.5,
"strategic_thinking": 3.5
},
"overall_recommendation": "Strong Hire",
"gender": "female",
"ethnicity": "white",
"years_experience": 9,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_008",
"role": "Staff Frontend Engineer",
"interviewer_id": "interviewer_henry",
"date": "2024-01-22T16:00:00Z",
"scores": {
"frontend_architecture": 3.5,
"technical_leadership": 4.0,
"team_building": 4.0,
"cross_functional_collaboration": 4.0,
"organizational_impact": 3.5
},
"overall_recommendation": "Hire",
"gender": "female",
"ethnicity": "white",
"years_experience": 9,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_009",
"role": "Data Scientist",
"interviewer_id": "interviewer_ivan",
"date": "2024-01-23T11:00:00Z",
"scores": {
"statistical_analysis": 3.5,
"machine_learning": 4.0,
"data_engineering": 3.0,
"business_acumen": 3.5,
"communication": 3.0
},
"overall_recommendation": "Hire",
"gender": "male",
"ethnicity": "indian",
"years_experience": 5,
"university_tier": "tier_1",
"previous_company_size": "medium"
},
{
"candidate_id": "candidate_010",
"role": "DevOps Engineer",
"interviewer_id": "interviewer_jane",
"date": "2024-01-24T15:00:00Z",
"scores": {
"infrastructure_automation": 3.5,
"ci_cd_design": 4.0,
"monitoring_observability": 3.0,
"security_implementation": 3.5,
"incident_management": 4.0
},
"overall_recommendation": "Hire",
"gender": "female",
"ethnicity": "black",
"years_experience": 6,
"university_tier": "tier_2",
"previous_company_size": "startup"
},
{
"candidate_id": "candidate_011",
"role": "UX Designer",
"interviewer_id": "interviewer_karl",
"date": "2024-01-25T10:00:00Z",
"scores": {
"design_process": 4.0,
"user_research": 3.5,
"design_systems": 4.0,
"cross_functional_collaboration": 3.5,
"design_leadership": 3.0
},
"overall_recommendation": "Hire",
"gender": "non_binary",
"ethnicity": "white",
"years_experience": 7,
"university_tier": "tier_1",
"previous_company_size": "medium"
},
{
"candidate_id": "candidate_012",
"role": "Engineering Manager",
"interviewer_id": "interviewer_lisa",
"date": "2024-01-26T13:30:00Z",
"scores": {
"people_leadership": 4.0,
"technical_background": 3.5,
"strategic_thinking": 3.5,
"performance_management": 4.0,
"cross_functional_leadership": 3.5
},
"overall_recommendation": "Hire",
"gender": "male",
"ethnicity": "white",
"years_experience": 8,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_013",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_alice",
"date": "2024-01-27T09:00:00Z",
"scores": {
"coding_fundamentals": 4.0,
"system_design": 4.0,
"technical_leadership": 4.0,
"communication": 4.0,
"problem_solving": 4.0
},
"overall_recommendation": "Strong Hire",
"gender": "female",
"ethnicity": "asian",
"years_experience": 7,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_013",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_charlie",
"date": "2024-01-27T11:00:00Z",
"scores": {
"system_design": 3.5,
"technical_leadership": 3.5,
"mentoring": 4.0,
"cross_team_collaboration": 4.0,
"strategic_thinking": 3.5
},
"overall_recommendation": "Hire",
"gender": "female",
"ethnicity": "asian",
"years_experience": 7,
"university_tier": "tier_1",
"previous_company_size": "large"
},
{
"candidate_id": "candidate_014",
"role": "Senior Software Engineer",
"interviewer_id": "interviewer_david",
"date": "2024-01-28T14:00:00Z",
"scores": {
"coding_fundamentals": 1.5,
"system_design": 2.0,
"technical_leadership": 1.0,
"communication": 2.0,
"problem_solving": 2.0
},
"overall_recommendation": "Strong No Hire",
"gender": "male",
"ethnicity": "white",
"years_experience": 4,
"university_tier": "tier_3",
"previous_company_size": "startup"
},
{
"candidate_id": "candidate_015",
"role": "Product Manager",
"interviewer_id": "interviewer_emma",
"date": "2024-01-29T11:00:00Z",
"scores": {
"product_strategy": 4.0,
"user_research": 3.5,
"data_analysis": 4.0,
"stakeholder_management": 4.0,
"communication": 3.5
},
"overall_recommendation": "Strong Hire",
"gender": "male",
"ethnicity": "black",
"years_experience": 5,
"university_tier": "tier_2",
"previous_company_size": "medium"
}
] [
{
"role": "Senior Software Engineer",
"level": "senior",
"team": "platform",
"department": "engineering",
"competencies": [
"system_design",
"coding_fundamentals",
"technical_leadership",
"mentoring",
"cross_team_collaboration"
],
"requirements": {
"years_experience": "5-8",
"technical_skills": ["Python", "Java", "Docker", "Kubernetes", "AWS"],
"leadership_experience": true,
"mentoring_required": true
},
"hiring_bar": "high",
"interview_focus": ["technical_depth", "system_architecture", "leadership_potential"]
},
{
"role": "Product Manager",
"level": "mid",
"team": "growth",
"department": "product",
"competencies": [
"product_strategy",
"user_research",
"data_analysis",
"stakeholder_management",
"cross_functional_leadership"
],
"requirements": {
"years_experience": "3-5",
"domain_knowledge": ["user_analytics", "experimentation", "product_metrics"],
"leadership_experience": false,
"technical_background": "preferred"
},
"hiring_bar": "medium-high",
"interview_focus": ["product_sense", "analytical_thinking", "execution_ability"]
},
{
"role": "Staff Frontend Engineer",
"level": "staff",
"team": "consumer",
"department": "engineering",
"competencies": [
"frontend_architecture",
"system_design",
"technical_leadership",
"team_building",
"cross_functional_collaboration"
],
"requirements": {
"years_experience": "8+",
"technical_skills": ["React", "TypeScript", "GraphQL", "Webpack", "Performance Optimization"],
"leadership_experience": true,
"architecture_experience": true
},
"hiring_bar": "very-high",
"interview_focus": ["architectural_vision", "technical_strategy", "organizational_impact"]
},
{
"role": "Data Scientist",
"level": "mid",
"team": "ml_platform",
"department": "data",
"competencies": [
"statistical_analysis",
"machine_learning",
"data_engineering",
"business_acumen",
"communication"
],
"requirements": {
"years_experience": "3-6",
"technical_skills": ["Python", "SQL", "TensorFlow", "Spark", "Statistics"],
"domain_knowledge": ["ML algorithms", "experimentation", "data_pipelines"],
"leadership_experience": false
},
"hiring_bar": "high",
"interview_focus": ["technical_depth", "problem_solving", "business_impact"]
},
{
"role": "DevOps Engineer",
"level": "senior",
"team": "infrastructure",
"department": "engineering",
"competencies": [
"infrastructure_automation",
"ci_cd_design",
"monitoring_observability",
"security_implementation",
"incident_management"
],
"requirements": {
"years_experience": "5-7",
"technical_skills": ["Kubernetes", "Terraform", "AWS", "Docker", "Monitoring"],
"security_background": "required",
"leadership_experience": "preferred"
},
"hiring_bar": "high",
"interview_focus": ["system_reliability", "automation_expertise", "operational_excellence"]
},
{
"role": "UX Designer",
"level": "senior",
"team": "design_systems",
"department": "design",
"competencies": [
"design_process",
"user_research",
"design_systems",
"cross_functional_collaboration",
"design_leadership"
],
"requirements": {
"years_experience": "5-8",
"portfolio_quality": "high",
"research_experience": true,
"systems_thinking": true
},
"hiring_bar": "high",
"interview_focus": ["design_process", "systems_thinking", "user_advocacy"]
},
{
"role": "Engineering Manager",
"level": "senior",
"team": "backend",
"department": "engineering",
"competencies": [
"people_leadership",
"technical_background",
"strategic_thinking",
"performance_management",
"cross_functional_leadership"
],
"requirements": {
"years_experience": "6-10",
"management_experience": "2+ years",
"technical_background": "required",
"hiring_experience": true
},
"hiring_bar": "very-high",
"interview_focus": ["people_leadership", "technical_judgment", "organizational_impact"]
},
{
"role": "Junior Software Engineer",
"level": "junior",
"team": "web",
"department": "engineering",
"competencies": [
"coding_fundamentals",
"debugging",
"testing_basics",
"collaboration",
"learning_agility"
],
"requirements": {
"years_experience": "0-2",
"technical_skills": ["JavaScript", "HTML/CSS", "Git", "Basic Algorithms"],
"education": "CS degree or bootcamp",
"growth_mindset": true
},
"hiring_bar": "medium",
"interview_focus": ["coding_ability", "problem_solving", "potential_assessment"]
}
] {
"role": "Product Manager",
"level": "senior",
"competencies": [
"strategy",
"analytics",
"business_strategy",
"product_strategy",
"stakeholder_management",
"p&l_responsibility",
"leadership",
"team_leadership",
"user_research",
"data_analysis"
],
"question_types": [
"technical",
"behavioral",
"situational"
],
"generated_at": "2026-02-16T13:27:41.303329",
"total_questions": 20,
"questions": [
{
"question": "What challenges have you faced related to p&l responsibility and how did you overcome them?",
"competency": "p&l_responsibility",
"type": "challenge_based",
"focus_areas": [
"problem_solving",
"learning_from_experience"
]
},
{
"question": "Analyze conversion funnel data to identify the biggest drop-off point and propose solutions.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": [
"funnel_analysis",
"conversion_optimization",
"statistical_significance"
]
},
{
"question": "What challenges have you faced related to team leadership and how did you overcome them?",
"competency": "team_leadership",
"type": "challenge_based",
"focus_areas": [
"problem_solving",
"learning_from_experience"
]
},
{
"question": "Design a go-to-market strategy for a new B2B SaaS product entering a competitive market.",
"competency": "product_strategy",
"type": "strategic",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": [
"market_analysis",
"competitive_positioning",
"pricing_strategy",
"channel_strategy"
]
},
{
"question": "What challenges have you faced related to business strategy and how did you overcome them?",
"competency": "business_strategy",
"type": "challenge_based",
"focus_areas": [
"problem_solving",
"learning_from_experience"
]
},
{
"question": "Describe your experience with business strategy in your current or previous role.",
"competency": "business_strategy",
"type": "experience",
"focus_areas": [
"experience_depth",
"practical_application"
]
},
{
"question": "Describe your experience with team leadership in your current or previous role.",
"competency": "team_leadership",
"type": "experience",
"focus_areas": [
"experience_depth",
"practical_application"
]
},
{
"question": "Describe a situation where you had to influence someone without having direct authority over them.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": [
"influence",
"persuasion",
"stakeholder_management"
]
},
{
"question": "Given a dataset of user activities, calculate the daily active users for the past month.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "easy",
"time_limit": 30,
"key_concepts": [
"sql_basics",
"date_functions",
"aggregation"
]
},
{
"question": "Describe your experience with analytics in your current or previous role.",
"competency": "analytics",
"type": "experience",
"focus_areas": [
"experience_depth",
"practical_application"
]
},
{
"question": "How would you prioritize features for a mobile app with limited engineering resources?",
"competency": "product_strategy",
"type": "case_study",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": [
"prioritization_frameworks",
"resource_allocation",
"impact_estimation"
]
},
{
"question": "Describe your experience with stakeholder management in your current or previous role.",
"competency": "stakeholder_management",
"type": "experience",
"focus_areas": [
"experience_depth",
"practical_application"
]
},
{
"question": "What challenges have you faced related to stakeholder management and how did you overcome them?",
"competency": "stakeholder_management",
"type": "challenge_based",
"focus_areas": [
"problem_solving",
"learning_from_experience"
]
},
{
"question": "What challenges have you faced related to user research and how did you overcome them?",
"competency": "user_research",
"type": "challenge_based",
"focus_areas": [
"problem_solving",
"learning_from_experience"
]
},
{
"question": "What challenges have you faced related to strategy and how did you overcome them?",
"competency": "strategy",
"type": "challenge_based",
"focus_areas": [
"problem_solving",
"learning_from_experience"
]
},
{
"question": "Describe your experience with user research in your current or previous role.",
"competency": "user_research",
"type": "experience",
"focus_areas": [
"experience_depth",
"practical_application"
]
},
{
"question": "Describe your experience with p&l responsibility in your current or previous role.",
"competency": "p&l_responsibility",
"type": "experience",
"focus_areas": [
"experience_depth",
"practical_application"
]
},
{
"question": "Describe your experience with strategy in your current or previous role.",
"competency": "strategy",
"type": "experience",
"focus_areas": [
"experience_depth",
"practical_application"
]
},
{
"question": "Tell me about a time when you had to lead a team through a significant change or challenge.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": [
"change_management",
"team_motivation",
"communication"
]
},
{
"question": "What challenges have you faced related to analytics and how did you overcome them?",
"competency": "analytics",
"type": "challenge_based",
"focus_areas": [
"problem_solving",
"learning_from_experience"
]
}
],
"scoring_rubrics": {
"question_8": {
"question": "Describe a situation where you had to influence someone without having direct authority over them.",
"competency": "leadership",
"type": "behavioral",
"scoring_criteria": {
"situation_clarity": {
"4": "Clear, specific situation with relevant context and stakes",
"3": "Good situation description with adequate context",
"2": "Situation described but lacks some specifics",
"1": "Vague or unclear situation description"
},
"action_quality": {
"4": "Specific, thoughtful actions showing strong competency",
"3": "Good actions demonstrating competency",
"2": "Adequate actions but could be stronger",
"1": "Weak or inappropriate actions"
},
"result_impact": {
"4": "Significant positive impact with measurable results",
"3": "Good positive impact with clear outcomes",
"2": "Some positive impact demonstrated",
"1": "Little or no positive impact shown"
},
"self_awareness": {
"4": "Excellent self-reflection, learns from experience, acknowledges growth areas",
"3": "Good self-awareness and learning orientation",
"2": "Some self-reflection demonstrated",
"1": "Limited self-awareness or reflection"
}
},
"weight": "high",
"time_limit": 30
},
"question_19": {
"question": "Tell me about a time when you had to lead a team through a significant change or challenge.",
"competency": "leadership",
"type": "behavioral",
"scoring_criteria": {
"situation_clarity": {
"4": "Clear, specific situation with relevant context and stakes",
"3": "Good situation description with adequate context",
"2": "Situation described but lacks some specifics",
"1": "Vague or unclear situation description"
},
"action_quality": {
"4": "Specific, thoughtful actions showing strong competency",
"3": "Good actions demonstrating competency",
"2": "Adequate actions but could be stronger",
"1": "Weak or inappropriate actions"
},
"result_impact": {
"4": "Significant positive impact with measurable results",
"3": "Good positive impact with clear outcomes",
"2": "Some positive impact demonstrated",
"1": "Little or no positive impact shown"
},
"self_awareness": {
"4": "Excellent self-reflection, learns from experience, acknowledges growth areas",
"3": "Good self-awareness and learning orientation",
"2": "Some self-reflection demonstrated",
"1": "Limited self-awareness or reflection"
}
},
"weight": "high",
"time_limit": 30
}
},
"follow_up_probes": {
"question_1": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_2": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_3": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_4": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_5": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_6": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_7": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_8": [
"What would you do differently if you faced this situation again?",
"How did you handle team members who were resistant to the change?",
"What metrics did you use to measure success?",
"How did you communicate progress to stakeholders?",
"What did you learn from this experience?"
],
"question_9": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_10": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_11": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_12": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_13": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_14": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_15": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_16": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_17": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_18": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
],
"question_19": [
"What would you do differently if you faced this situation again?",
"How did you handle team members who were resistant to the change?",
"What metrics did you use to measure success?",
"How did you communicate progress to stakeholders?",
"What did you learn from this experience?"
],
"question_20": [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
]
},
"calibration_examples": {
"question_1": {
"question": "What challenges have you faced related to p&l responsibility and how did you overcome them?",
"competency": "p&l_responsibility",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for p&l_responsibility question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for p&l_responsibility question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for p&l_responsibility question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of p&l responsibility competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_2": {
"question": "Analyze conversion funnel data to identify the biggest drop-off point and propose solutions.",
"competency": "data_analysis",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for data_analysis question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for data_analysis question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for data_analysis question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of data analysis competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_3": {
"question": "What challenges have you faced related to team leadership and how did you overcome them?",
"competency": "team_leadership",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for team_leadership question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for team_leadership question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for team_leadership question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of team leadership competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_4": {
"question": "Design a go-to-market strategy for a new B2B SaaS product entering a competitive market.",
"competency": "product_strategy",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for product_strategy question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for product_strategy question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for product_strategy question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of product strategy competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
},
"question_5": {
"question": "What challenges have you faced related to business strategy and how did you overcome them?",
"competency": "business_strategy",
"sample_answers": {
"poor_answer": {
"answer": "Sample poor answer for business_strategy question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": [
"Vague response",
"Limited evidence of competency",
"Poor structure"
]
},
"good_answer": {
"answer": "Sample good answer for business_strategy question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": [
"Clear structure",
"Demonstrates competency",
"Adequate detail"
]
},
"great_answer": {
"answer": "Sample excellent answer for business_strategy question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": [
"Exceptional detail",
"Strong evidence",
"Strategic thinking",
"Goes beyond requirements"
]
}
},
"scoring_rationale": {
"key_indicators": "Look for evidence of business strategy competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
}
},
"usage_guidelines": {
"interview_flow": {
"warm_up": "Start with 1-2 easier questions to build rapport",
"core_assessment": "Focus majority of time on core competency questions",
"closing": "End with questions about candidate's questions/interests"
},
"time_management": {
"technical_questions": "Allow extra time for coding/design questions",
"behavioral_questions": "Keep to time limits but allow for follow-ups",
"total_recommendation": "45-75 minutes per interview round"
},
"question_selection": {
"variety": "Mix question types within each competency area",
"difficulty": "Adjust based on candidate responses and energy",
"customization": "Adapt questions based on candidate's background"
},
"common_mistakes": [
"Don't ask all questions mechanically",
"Don't skip follow-up questions",
"Don't forget to assess cultural fit alongside competencies",
"Don't let one strong/weak area bias overall assessment"
],
"calibration_reminders": [
"Compare against role standard, not other candidates",
"Focus on evidence demonstrated, not potential",
"Consider level-appropriate expectations",
"Document specific examples in feedback"
]
}
} Interview Question Bank: Product Manager (Senior Level)
======================================================================
Generated: 2026-02-16T13:27:41.303329
Total Questions: 20
Question Types: technical, behavioral, situational
Target Competencies: strategy, analytics, business_strategy, product_strategy, stakeholder_management, p&l_responsibility, leadership, team_leadership, user_research, data_analysis
INTERVIEW QUESTIONS
--------------------------------------------------
1. What challenges have you faced related to p&l responsibility and how did you overcome them?
Competency: P&L Responsibility
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
2. Analyze conversion funnel data to identify the biggest drop-off point and propose solutions.
Competency: Data Analysis
Type: Analytical
Time Limit: 45 minutes
3. What challenges have you faced related to team leadership and how did you overcome them?
Competency: Team Leadership
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
4. Design a go-to-market strategy for a new B2B SaaS product entering a competitive market.
Competency: Product Strategy
Type: Strategic
Time Limit: 60 minutes
5. What challenges have you faced related to business strategy and how did you overcome them?
Competency: Business Strategy
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
6. Describe your experience with business strategy in your current or previous role.
Competency: Business Strategy
Type: Experience
Focus Areas: experience_depth, practical_application
7. Describe your experience with team leadership in your current or previous role.
Competency: Team Leadership
Type: Experience
Focus Areas: experience_depth, practical_application
8. Describe a situation where you had to influence someone without having direct authority over them.
Competency: Leadership
Type: Behavioral
Focus Areas: influence, persuasion, stakeholder_management
9. Given a dataset of user activities, calculate the daily active users for the past month.
Competency: Data Analysis
Type: Analytical
Time Limit: 30 minutes
10. Describe your experience with analytics in your current or previous role.
Competency: Analytics
Type: Experience
Focus Areas: experience_depth, practical_application
11. How would you prioritize features for a mobile app with limited engineering resources?
Competency: Product Strategy
Type: Case_Study
Time Limit: 45 minutes
12. Describe your experience with stakeholder management in your current or previous role.
Competency: Stakeholder Management
Type: Experience
Focus Areas: experience_depth, practical_application
13. What challenges have you faced related to stakeholder management and how did you overcome them?
Competency: Stakeholder Management
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
14. What challenges have you faced related to user research and how did you overcome them?
Competency: User Research
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
15. What challenges have you faced related to strategy and how did you overcome them?
Competency: Strategy
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
16. Describe your experience with user research in your current or previous role.
Competency: User Research
Type: Experience
Focus Areas: experience_depth, practical_application
17. Describe your experience with p&l responsibility in your current or previous role.
Competency: P&L Responsibility
Type: Experience
Focus Areas: experience_depth, practical_application
18. Describe your experience with strategy in your current or previous role.
Competency: Strategy
Type: Experience
Focus Areas: experience_depth, practical_application
19. Tell me about a time when you had to lead a team through a significant change or challenge.
Competency: Leadership
Type: Behavioral
Focus Areas: change_management, team_motivation, communication
20. What challenges have you faced related to analytics and how did you overcome them?
Competency: Analytics
Type: Challenge_Based
Focus Areas: problem_solving, learning_from_experience
SCORING RUBRICS
--------------------------------------------------
Sample Scoring Criteria (behavioral questions):
Situation Clarity:
4: Clear, specific situation with relevant context and stakes
3: Good situation description with adequate context
2: Situation described but lacks some specifics
1: Vague or unclear situation description
Action Quality:
4: Specific, thoughtful actions showing strong competency
3: Good actions demonstrating competency
2: Adequate actions but could be stronger
1: Weak or inappropriate actions
Result Impact:
4: Significant positive impact with measurable results
3: Good positive impact with clear outcomes
2: Some positive impact demonstrated
1: Little or no positive impact shown
Self Awareness:
4: Excellent self-reflection, learns from experience, acknowledges growth areas
3: Good self-awareness and learning orientation
2: Some self-reflection demonstrated
1: Limited self-awareness or reflection
FOLLOW-UP PROBE EXAMPLES
--------------------------------------------------
Sample follow-up questions:
• Can you provide more specific details about your approach?
• What would you do differently if you had to do this again?
• What challenges did you face and how did you overcome them?
USAGE GUIDELINES
--------------------------------------------------
Interview Flow:
• Warm Up: Start with 1-2 easier questions to build rapport
• Core Assessment: Focus majority of time on core competency questions
• Closing: End with questions about candidate's questions/interests
Time Management:
• Technical Questions: Allow extra time for coding/design questions
• Behavioral Questions: Keep to time limits but allow for follow-ups
• Total Recommendation: 45-75 minutes per interview round
Common Mistakes to Avoid:
• Don't ask all questions mechanically
• Don't skip follow-up questions
• Don't forget to assess cultural fit alongside competencies
CALIBRATION EXAMPLES
--------------------------------------------------
Question: What challenges have you faced related to p&l responsibility and how did you overcome them?
Sample Answer Quality Levels:
Poor Answer (Score 1-2):
Issues: Vague response, Limited evidence of competency, Poor structure
Good Answer (Score 3):
Strengths: Clear structure, Demonstrates competency, Adequate detail
Great Answer (Score 4):
Strengths: Exceptional detail, Strong evidence, Strategic thinking, Goes beyond requirements {
"role": "Senior Software Engineer",
"level": "senior",
"team": "platform",
"generated_at": "2026-02-16T13:27:37.925680",
"total_duration_minutes": 300,
"total_rounds": 5,
"rounds": {
"round_1_technical_phone_screen": {
"name": "Technical Phone Screen",
"duration_minutes": 45,
"format": "virtual",
"objectives": [
"Assess coding fundamentals",
"Evaluate problem-solving approach",
"Screen for basic technical competency"
],
"question_types": [
"coding_problems",
"technical_concepts",
"experience_questions"
],
"evaluation_criteria": [
"technical_accuracy",
"problem_solving_process",
"communication_clarity"
],
"order": 1,
"focus_areas": [
"coding_fundamentals",
"problem_solving",
"technical_leadership",
"system_architecture",
"people_development"
]
},
"round_2_coding_deep_dive": {
"name": "Coding Deep Dive",
"duration_minutes": 75,
"format": "in_person_or_virtual",
"objectives": [
"Evaluate coding skills in depth",
"Assess code quality and testing",
"Review debugging approach"
],
"question_types": [
"complex_coding_problems",
"code_review",
"testing_strategy"
],
"evaluation_criteria": [
"code_quality",
"testing_approach",
"debugging_skills",
"optimization_thinking"
],
"order": 2,
"focus_areas": [
"technical_execution",
"code_quality",
"technical_leadership",
"system_architecture",
"people_development"
]
},
"round_3_system_design": {
"name": "System Design",
"duration_minutes": 75,
"format": "collaborative_whiteboard",
"objectives": [
"Assess architectural thinking",
"Evaluate scalability considerations",
"Review trade-off analysis"
],
"question_types": [
"system_architecture",
"scalability_design",
"trade_off_analysis"
],
"evaluation_criteria": [
"architectural_thinking",
"scalability_awareness",
"trade_off_reasoning"
],
"order": 3,
"focus_areas": [
"system_thinking",
"architectural_reasoning",
"technical_leadership",
"system_architecture",
"people_development"
]
},
"round_4_behavioral": {
"name": "Behavioral Interview",
"duration_minutes": 45,
"format": "conversational",
"objectives": [
"Assess cultural fit",
"Evaluate past experiences",
"Review leadership examples"
],
"question_types": [
"star_method_questions",
"situational_scenarios",
"values_alignment"
],
"evaluation_criteria": [
"communication_skills",
"leadership_examples",
"cultural_alignment"
],
"order": 4,
"focus_areas": [
"cultural_fit",
"communication",
"teamwork",
"technical_leadership",
"system_architecture"
]
},
"round_5_technical_leadership": {
"name": "Technical Leadership",
"duration_minutes": 60,
"format": "discussion_based",
"objectives": [
"Evaluate mentoring capability",
"Assess technical decision making",
"Review cross-team collaboration"
],
"question_types": [
"leadership_scenarios",
"technical_decisions",
"mentoring_examples"
],
"evaluation_criteria": [
"leadership_potential",
"technical_judgment",
"influence_skills"
],
"order": 5,
"focus_areas": [
"leadership",
"mentoring",
"influence",
"technical_leadership",
"system_architecture"
]
}
},
"suggested_schedule": {
"type": "multi_day",
"total_duration_minutes": 300,
"recommended_breaks": [
{
"type": "short_break",
"duration": 15,
"after_minutes": 90
},
{
"type": "lunch_break",
"duration": 60,
"after_minutes": 180
}
],
"day_structure": {
"day_1": {
"date": "TBD",
"start_time": "09:00",
"end_time": "12:45",
"rounds": [
{
"type": "interview",
"round_name": "round_1_technical_phone_screen",
"title": "Technical Phone Screen",
"start_time": "09:00",
"end_time": "09:45",
"duration_minutes": 45,
"format": "virtual"
},
{
"type": "interview",
"round_name": "round_2_coding_deep_dive",
"title": "Coding Deep Dive",
"start_time": "10:00",
"end_time": "11:15",
"duration_minutes": 75,
"format": "in_person_or_virtual"
},
{
"type": "interview",
"round_name": "round_3_system_design",
"title": "System Design",
"start_time": "11:30",
"end_time": "12:45",
"duration_minutes": 75,
"format": "collaborative_whiteboard"
}
]
},
"day_2": {
"date": "TBD",
"start_time": "09:00",
"end_time": "11:00",
"rounds": [
{
"type": "interview",
"round_name": "round_4_behavioral",
"title": "Behavioral Interview",
"start_time": "09:00",
"end_time": "09:45",
"duration_minutes": 45,
"format": "conversational"
},
{
"type": "interview",
"round_name": "round_5_technical_leadership",
"title": "Technical Leadership",
"start_time": "10:00",
"end_time": "11:00",
"duration_minutes": 60,
"format": "discussion_based"
}
]
}
},
"logistics_notes": [
"Coordinate interviewer availability before scheduling",
"Ensure all interviewers have access to job description and competency requirements",
"Prepare interview rooms/virtual links for all rounds",
"Share candidate resume and application with all interviewers",
"Test video conferencing setup before virtual interviews",
"Share virtual meeting links with candidate 24 hours in advance",
"Prepare whiteboard or collaborative online tool for design sessions"
]
},
"scorecard_template": {
"scoring_scale": {
"4": "Exceeds Expectations - Demonstrates mastery beyond required level",
"3": "Meets Expectations - Solid performance meeting all requirements",
"2": "Partially Meets - Shows potential but has development areas",
"1": "Does Not Meet - Significant gaps in required competencies"
},
"dimensions": [
{
"dimension": "system_architecture",
"weight": "high",
"scale": "1-4",
"description": "Assessment of system architecture competency"
},
{
"dimension": "technical_leadership",
"weight": "high",
"scale": "1-4",
"description": "Assessment of technical leadership competency"
},
{
"dimension": "mentoring",
"weight": "high",
"scale": "1-4",
"description": "Assessment of mentoring competency"
},
{
"dimension": "cross_team_collab",
"weight": "high",
"scale": "1-4",
"description": "Assessment of cross team collab competency"
},
{
"dimension": "technology_evaluation",
"weight": "medium",
"scale": "1-4",
"description": "Assessment of technology evaluation competency"
},
{
"dimension": "process_improvement",
"weight": "medium",
"scale": "1-4",
"description": "Assessment of process improvement competency"
},
{
"dimension": "hiring_contribution",
"weight": "medium",
"scale": "1-4",
"description": "Assessment of hiring contribution competency"
},
{
"dimension": "communication",
"weight": "high",
"scale": "1-4"
},
{
"dimension": "cultural_fit",
"weight": "medium",
"scale": "1-4"
},
{
"dimension": "learning_agility",
"weight": "medium",
"scale": "1-4"
}
],
"overall_recommendation": {
"options": [
"Strong Hire",
"Hire",
"No Hire",
"Strong No Hire"
],
"criteria": "Based on weighted average and minimum thresholds"
},
"calibration_notes": {
"required": true,
"min_length": 100,
"sections": [
"strengths",
"areas_for_development",
"specific_examples"
]
}
},
"interviewer_requirements": {
"round_1_technical_phone_screen": {
"required_skills": [
"technical_assessment",
"coding_evaluation"
],
"preferred_experience": [
"same_domain",
"senior_level"
],
"calibration_level": "standard",
"suggested_interviewers": [
"senior_engineer",
"tech_lead"
]
},
"round_2_coding_deep_dive": {
"required_skills": [
"advanced_technical",
"code_quality_assessment"
],
"preferred_experience": [
"senior_engineer",
"system_design"
],
"calibration_level": "high",
"suggested_interviewers": [
"senior_engineer",
"staff_engineer"
]
},
"round_3_system_design": {
"required_skills": [
"architecture_design",
"scalability_assessment"
],
"preferred_experience": [
"senior_architect",
"large_scale_systems"
],
"calibration_level": "high",
"suggested_interviewers": [
"senior_architect",
"staff_engineer"
]
},
"round_4_behavioral": {
"required_skills": [
"behavioral_interviewing",
"competency_assessment"
],
"preferred_experience": [
"hiring_manager",
"people_leadership"
],
"calibration_level": "standard",
"suggested_interviewers": [
"hiring_manager",
"people_manager"
]
},
"round_5_technical_leadership": {
"required_skills": [
"leadership_assessment",
"technical_mentoring"
],
"preferred_experience": [
"engineering_manager",
"tech_lead"
],
"calibration_level": "high",
"suggested_interviewers": [
"engineering_manager",
"senior_staff"
]
}
},
"competency_framework": {
"required": [
"system_architecture",
"technical_leadership",
"mentoring",
"cross_team_collab"
],
"preferred": [
"technology_evaluation",
"process_improvement",
"hiring_contribution"
],
"focus_areas": [
"technical_leadership",
"system_architecture",
"people_development"
]
},
"calibration_notes": {
"hiring_bar_notes": "Calibrated for senior level software engineer role",
"common_pitfalls": [
"Avoid comparing candidates to each other rather than to the role standard",
"Don't let one strong/weak area overshadow overall assessment",
"Ensure consistent application of evaluation criteria"
],
"calibration_checkpoints": [
"Review score distribution after every 5 candidates",
"Conduct monthly interviewer calibration sessions",
"Track correlation with 6-month performance reviews"
],
"escalation_criteria": [
"Any candidate receiving all 4s or all 1s",
"Significant disagreement between interviewers (>1.5 point spread)",
"Unusual circumstances or accommodations needed"
]
}
} Interview Loop Design for Senior Software Engineer (Senior Level)
============================================================
Team: platform
Generated: 2026-02-16T13:27:37.925680
Total Duration: 300 minutes (5h 0m)
Total Rounds: 5
INTERVIEW ROUNDS
----------------------------------------
Round 1: Technical Phone Screen
Duration: 45 minutes
Format: Virtual
Objectives:
• Assess coding fundamentals
• Evaluate problem-solving approach
• Screen for basic technical competency
Focus Areas:
• Coding Fundamentals
• Problem Solving
• Technical Leadership
• System Architecture
• People Development
Round 2: Coding Deep Dive
Duration: 75 minutes
Format: In Person Or Virtual
Objectives:
• Evaluate coding skills in depth
• Assess code quality and testing
• Review debugging approach
Focus Areas:
• Technical Execution
• Code Quality
• Technical Leadership
• System Architecture
• People Development
Round 3: System Design
Duration: 75 minutes
Format: Collaborative Whiteboard
Objectives:
• Assess architectural thinking
• Evaluate scalability considerations
• Review trade-off analysis
Focus Areas:
• System Thinking
• Architectural Reasoning
• Technical Leadership
• System Architecture
• People Development
Round 4: Behavioral Interview
Duration: 45 minutes
Format: Conversational
Objectives:
• Assess cultural fit
• Evaluate past experiences
• Review leadership examples
Focus Areas:
• Cultural Fit
• Communication
• Teamwork
• Technical Leadership
• System Architecture
Round 5: Technical Leadership
Duration: 60 minutes
Format: Discussion Based
Objectives:
• Evaluate mentoring capability
• Assess technical decision making
• Review cross-team collaboration
Focus Areas:
• Leadership
• Mentoring
• Influence
• Technical Leadership
• System Architecture
SUGGESTED SCHEDULE
----------------------------------------
Schedule Type: Multi Day
Day 1:
Time: 09:00 - 12:45
09:00-09:45: Technical Phone Screen (45min)
10:00-11:15: Coding Deep Dive (75min)
11:30-12:45: System Design (75min)
Day 2:
Time: 09:00 - 11:00
09:00-09:45: Behavioral Interview (45min)
10:00-11:00: Technical Leadership (60min)
INTERVIEWER REQUIREMENTS
----------------------------------------
Technical Phone Screen:
Required Skills: technical_assessment, coding_evaluation
Suggested Interviewers: senior_engineer, tech_lead
Calibration Level: Standard
Coding Deep Dive:
Required Skills: advanced_technical, code_quality_assessment
Suggested Interviewers: senior_engineer, staff_engineer
Calibration Level: High
System Design:
Required Skills: architecture_design, scalability_assessment
Suggested Interviewers: senior_architect, staff_engineer
Calibration Level: High
Behavioral:
Required Skills: behavioral_interviewing, competency_assessment
Suggested Interviewers: hiring_manager, people_manager
Calibration Level: Standard
Technical Leadership:
Required Skills: leadership_assessment, technical_mentoring
Suggested Interviewers: engineering_manager, senior_staff
Calibration Level: High
SCORECARD TEMPLATE
----------------------------------------
Scoring Scale:
4: Exceeds Expectations - Demonstrates mastery beyond required level
3: Meets Expectations - Solid performance meeting all requirements
2: Partially Meets - Shows potential but has development areas
1: Does Not Meet - Significant gaps in required competencies
Evaluation Dimensions:
• System Architecture (Weight: high)
• Technical Leadership (Weight: high)
• Mentoring (Weight: high)
• Cross Team Collab (Weight: high)
• Technology Evaluation (Weight: medium)
• Process Improvement (Weight: medium)
• Hiring Contribution (Weight: medium)
• Communication (Weight: high)
• Cultural Fit (Weight: medium)
• Learning Agility (Weight: medium)
CALIBRATION NOTES
----------------------------------------
Hiring Bar: Calibrated for senior level software engineer role
Common Pitfalls:
• Avoid comparing candidates to each other rather than to the role standard
• Don't let one strong/weak area overshadow overall assessment
• Ensure consistent application of evaluation criteria #!/usr/bin/env python3
"""
Hiring Calibrator
Analyzes interview scores from multiple candidates and interviewers to detect bias,
calibration issues, and inconsistent rubric application. Generates calibration reports
with specific recommendations for interviewer coaching and process improvements.
Usage:
python hiring_calibrator.py --input interview_results.json --analysis-type comprehensive
python hiring_calibrator.py --input data.json --competencies technical,leadership --output report.json
python hiring_calibrator.py --input historical_data.json --trend-analysis --period quarterly
"""
import os
import sys
import json
import argparse
import statistics
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any, Tuple
from collections import defaultdict, Counter
import math
class HiringCalibrator:
"""Analyzes interview data for bias detection and calibration issues."""
def __init__(self):
self.bias_thresholds = self._init_bias_thresholds()
self.calibration_standards = self._init_calibration_standards()
self.demographic_categories = self._init_demographic_categories()
def _init_bias_thresholds(self) -> Dict[str, float]:
"""Initialize statistical thresholds for bias detection."""
return {
"score_variance_threshold": 1.5, # Standard deviations
"pass_rate_difference_threshold": 0.15, # 15% difference
"interviewer_consistency_threshold": 0.8, # Correlation coefficient
"demographic_parity_threshold": 0.10, # 10% difference
"score_inflation_threshold": 0.3, # 30% above historical average
"score_deflation_threshold": 0.3, # 30% below historical average
"minimum_sample_size": 5 # Minimum candidates per analysis
}
def _init_calibration_standards(self) -> Dict[str, Dict]:
"""Initialize expected calibration standards."""
return {
"score_distribution": {
"target_mean": 2.8, # Expected average score (1-4 scale)
"target_std": 0.9, # Expected standard deviation
"expected_distribution": {
"1": 0.10, # 10% score 1 (does not meet)
"2": 0.25, # 25% score 2 (partially meets)
"3": 0.45, # 45% score 3 (meets expectations)
"4": 0.20 # 20% score 4 (exceeds expectations)
}
},
"interviewer_agreement": {
"minimum_correlation": 0.70, # Minimum correlation between interviewers
"maximum_std_deviation": 0.8, # Maximum std dev in scores for same candidate
"agreement_threshold": 0.75 # % of time interviewers should agree within 1 point
},
"pass_rates": {
"junior_level": 0.25, # 25% pass rate for junior roles
"mid_level": 0.20, # 20% pass rate for mid roles
"senior_level": 0.15, # 15% pass rate for senior roles
"staff_level": 0.10, # 10% pass rate for staff+ roles
"leadership": 0.12 # 12% pass rate for leadership roles
}
}
def _init_demographic_categories(self) -> List[str]:
"""Initialize demographic categories to analyze for bias."""
return [
"gender", "ethnicity", "education_level", "previous_company_size",
"years_experience", "university_tier", "geographic_location"
]
def analyze_hiring_calibration(self, interview_data: List[Dict[str, Any]],
analysis_type: str = "comprehensive",
competencies: Optional[List[str]] = None,
trend_analysis: bool = False,
period: str = "monthly") -> Dict[str, Any]:
"""Perform comprehensive hiring calibration analysis."""
# Validate and preprocess data
processed_data = self._preprocess_interview_data(interview_data)
if len(processed_data) < self.bias_thresholds["minimum_sample_size"]:
return {
"error": "Insufficient data for analysis",
"minimum_required": self.bias_thresholds["minimum_sample_size"],
"actual_samples": len(processed_data)
}
# Perform different types of analysis based on request
analysis_results = {
"analysis_type": analysis_type,
"data_summary": self._generate_data_summary(processed_data),
"generated_at": datetime.now().isoformat()
}
if analysis_type in ["comprehensive", "bias"]:
analysis_results["bias_analysis"] = self._analyze_bias_patterns(processed_data, competencies)
if analysis_type in ["comprehensive", "calibration"]:
analysis_results["calibration_analysis"] = self._analyze_calibration_consistency(processed_data, competencies)
if analysis_type in ["comprehensive", "interviewer"]:
analysis_results["interviewer_analysis"] = self._analyze_interviewer_bias(processed_data)
if analysis_type in ["comprehensive", "scoring"]:
analysis_results["scoring_analysis"] = self._analyze_scoring_patterns(processed_data, competencies)
if trend_analysis:
analysis_results["trend_analysis"] = self._analyze_trends_over_time(processed_data, period)
# Generate recommendations
analysis_results["recommendations"] = self._generate_recommendations(analysis_results)
# Calculate overall calibration health score
analysis_results["calibration_health_score"] = self._calculate_health_score(analysis_results)
return analysis_results
def _preprocess_interview_data(self, raw_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Clean and validate interview data."""
processed_data = []
for record in raw_data:
if self._validate_interview_record(record):
processed_record = self._standardize_record(record)
processed_data.append(processed_record)
return processed_data
def _validate_interview_record(self, record: Dict[str, Any]) -> bool:
"""Validate that an interview record has required fields."""
required_fields = ["candidate_id", "interviewer_id", "scores", "overall_recommendation", "date"]
for field in required_fields:
if field not in record or record[field] is None:
return False
# Validate scores format
if not isinstance(record["scores"], dict):
return False
# Validate score values are numeric and in valid range (1-4)
for competency, score in record["scores"].items():
if not isinstance(score, (int, float)) or not (1 <= score <= 4):
return False
return True
def _standardize_record(self, record: Dict[str, Any]) -> Dict[str, Any]:
"""Standardize record format and add computed fields."""
standardized = record.copy()
# Calculate average score
scores = list(record["scores"].values())
standardized["average_score"] = statistics.mean(scores)
# Standardize recommendation to binary
recommendation = record["overall_recommendation"].lower()
standardized["hire_decision"] = recommendation in ["hire", "strong hire", "yes"]
# Parse date if string
if isinstance(record["date"], str):
try:
standardized["date"] = datetime.fromisoformat(record["date"].replace("Z", "+00:00"))
except ValueError:
standardized["date"] = datetime.now()
# Add demographic info if available
for category in self.demographic_categories:
if category not in standardized:
standardized[category] = "unknown"
# Add level normalization
role = record.get("role", "").lower()
if any(level in role for level in ["junior", "associate", "entry"]):
standardized["normalized_level"] = "junior"
elif any(level in role for level in ["senior", "sr"]):
standardized["normalized_level"] = "senior"
elif any(level in role for level in ["staff", "principal", "lead"]):
standardized["normalized_level"] = "staff"
else:
standardized["normalized_level"] = "mid"
return standardized
def _generate_data_summary(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Generate summary statistics for the dataset."""
if not data:
return {}
total_candidates = len(data)
unique_interviewers = len(set(record["interviewer_id"] for record in data))
# Score statistics
all_scores = []
all_average_scores = []
hire_decisions = []
for record in data:
all_scores.extend(record["scores"].values())
all_average_scores.append(record["average_score"])
hire_decisions.append(record["hire_decision"])
# Date range
dates = [record["date"] for record in data if record["date"]]
date_range = {
"start_date": min(dates).isoformat() if dates else None,
"end_date": max(dates).isoformat() if dates else None,
"total_days": (max(dates) - min(dates)).days if len(dates) > 1 else 0
}
# Role distribution
roles = [record.get("role", "unknown") for record in data]
role_distribution = dict(Counter(roles))
return {
"total_candidates": total_candidates,
"unique_interviewers": unique_interviewers,
"candidates_per_interviewer": round(total_candidates / unique_interviewers, 2),
"date_range": date_range,
"score_statistics": {
"mean_individual_scores": round(statistics.mean(all_scores), 2),
"std_individual_scores": round(statistics.stdev(all_scores) if len(all_scores) > 1 else 0, 2),
"mean_average_scores": round(statistics.mean(all_average_scores), 2),
"std_average_scores": round(statistics.stdev(all_average_scores) if len(all_average_scores) > 1 else 0, 2)
},
"hire_rate": round(sum(hire_decisions) / len(hire_decisions), 3),
"role_distribution": role_distribution
}
def _analyze_bias_patterns(self, data: List[Dict[str, Any]],
target_competencies: Optional[List[str]]) -> Dict[str, Any]:
"""Analyze potential bias patterns in interview decisions."""
bias_analysis = {
"demographic_bias": {},
"interviewer_bias": {},
"competency_bias": {},
"overall_bias_score": 0
}
# Analyze demographic bias
for demographic in self.demographic_categories:
if all(record.get(demographic) == "unknown" for record in data):
continue
demographic_analysis = self._analyze_demographic_bias(data, demographic)
if demographic_analysis["bias_detected"]:
bias_analysis["demographic_bias"][demographic] = demographic_analysis
# Analyze interviewer bias
bias_analysis["interviewer_bias"] = self._analyze_interviewer_bias(data)
# Analyze competency bias if specified
if target_competencies:
bias_analysis["competency_bias"] = self._analyze_competency_bias(data, target_competencies)
# Calculate overall bias score
bias_analysis["overall_bias_score"] = self._calculate_bias_score(bias_analysis)
return bias_analysis
def _analyze_demographic_bias(self, data: List[Dict[str, Any]],
demographic: str) -> Dict[str, Any]:
"""Analyze bias for a specific demographic category."""
# Group data by demographic values
demographic_groups = defaultdict(list)
for record in data:
demo_value = record.get(demographic, "unknown")
if demo_value != "unknown":
demographic_groups[demo_value].append(record)
if len(demographic_groups) < 2:
return {"bias_detected": False, "reason": "insufficient_groups"}
# Calculate statistics for each group
group_stats = {}
for group, records in demographic_groups.items():
if len(records) >= self.bias_thresholds["minimum_sample_size"]:
scores = [r["average_score"] for r in records]
hire_rate = sum(r["hire_decision"] for r in records) / len(records)
group_stats[group] = {
"count": len(records),
"mean_score": statistics.mean(scores),
"hire_rate": hire_rate,
"std_score": statistics.stdev(scores) if len(scores) > 1 else 0
}
if len(group_stats) < 2:
return {"bias_detected": False, "reason": "insufficient_sample_sizes"}
# Detect statistical differences
bias_detected = False
bias_details = {}
# Check for significant differences in hire rates
hire_rates = [stats["hire_rate"] for stats in group_stats.values()]
max_hire_rate_diff = max(hire_rates) - min(hire_rates)
if max_hire_rate_diff > self.bias_thresholds["demographic_parity_threshold"]:
bias_detected = True
bias_details["hire_rate_disparity"] = {
"max_difference": round(max_hire_rate_diff, 3),
"threshold": self.bias_thresholds["demographic_parity_threshold"],
"group_stats": group_stats
}
# Check for significant differences in scoring
mean_scores = [stats["mean_score"] for stats in group_stats.values()]
max_score_diff = max(mean_scores) - min(mean_scores)
if max_score_diff > 0.5: # Half point difference threshold
bias_detected = True
bias_details["scoring_disparity"] = {
"max_difference": round(max_score_diff, 3),
"group_stats": group_stats
}
return {
"bias_detected": bias_detected,
"demographic": demographic,
"group_statistics": group_stats,
"bias_details": bias_details,
"recommendation": self._generate_demographic_bias_recommendation(demographic, bias_details) if bias_detected else None
}
def _analyze_interviewer_bias(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Analyze bias patterns across different interviewers."""
interviewer_stats = defaultdict(list)
# Group by interviewer
for record in data:
interviewer_id = record["interviewer_id"]
interviewer_stats[interviewer_id].append(record)
# Calculate statistics per interviewer
interviewer_analysis = {}
for interviewer_id, records in interviewer_stats.items():
if len(records) >= self.bias_thresholds["minimum_sample_size"]:
scores = [r["average_score"] for r in records]
hire_rate = sum(r["hire_decision"] for r in records) / len(records)
interviewer_analysis[interviewer_id] = {
"total_interviews": len(records),
"mean_score": statistics.mean(scores),
"std_score": statistics.stdev(scores) if len(scores) > 1 else 0,
"hire_rate": hire_rate,
"score_inflation": self._detect_score_inflation(scores),
"consistency_score": self._calculate_interviewer_consistency(records)
}
# Identify outlier interviewers
if len(interviewer_analysis) > 1:
overall_mean_score = statistics.mean([stats["mean_score"] for stats in interviewer_analysis.values()])
overall_hire_rate = statistics.mean([stats["hire_rate"] for stats in interviewer_analysis.values()])
outlier_interviewers = {}
for interviewer_id, stats in interviewer_analysis.items():
issues = []
# Check for score inflation/deflation
if stats["mean_score"] > overall_mean_score * (1 + self.bias_thresholds["score_inflation_threshold"]):
issues.append("score_inflation")
elif stats["mean_score"] < overall_mean_score * (1 - self.bias_thresholds["score_deflation_threshold"]):
issues.append("score_deflation")
# Check for hire rate deviation
hire_rate_diff = abs(stats["hire_rate"] - overall_hire_rate)
if hire_rate_diff > self.bias_thresholds["pass_rate_difference_threshold"]:
issues.append("hire_rate_deviation")
# Check for low consistency
if stats["consistency_score"] < self.bias_thresholds["interviewer_consistency_threshold"]:
issues.append("low_consistency")
if issues:
outlier_interviewers[interviewer_id] = {
"issues": issues,
"statistics": stats,
"severity": len(issues) # More issues = higher severity
}
return {
"interviewer_statistics": interviewer_analysis,
"outlier_interviewers": outlier_interviewers if len(interviewer_analysis) > 1 else {},
"overall_consistency": self._calculate_overall_interviewer_consistency(data),
"recommendations": self._generate_interviewer_recommendations(outlier_interviewers if len(interviewer_analysis) > 1 else {})
}
def _analyze_competency_bias(self, data: List[Dict[str, Any]],
competencies: List[str]) -> Dict[str, Any]:
"""Analyze bias patterns within specific competencies."""
competency_analysis = {}
for competency in competencies:
# Extract scores for this competency
competency_scores = []
for record in data:
if competency in record["scores"]:
competency_scores.append({
"score": record["scores"][competency],
"interviewer": record["interviewer_id"],
"candidate": record["candidate_id"],
"overall_decision": record["hire_decision"]
})
if len(competency_scores) < self.bias_thresholds["minimum_sample_size"]:
continue
# Analyze scoring patterns
scores = [item["score"] for item in competency_scores]
score_variance = statistics.variance(scores) if len(scores) > 1 else 0
# Analyze by interviewer
interviewer_competency_scores = defaultdict(list)
for item in competency_scores:
interviewer_competency_scores[item["interviewer"]].append(item["score"])
interviewer_variations = {}
if len(interviewer_competency_scores) > 1:
interviewer_means = {interviewer: statistics.mean(scores)
for interviewer, scores in interviewer_competency_scores.items()
if len(scores) >= 3}
if len(interviewer_means) > 1:
mean_of_means = statistics.mean(interviewer_means.values())
for interviewer, mean_score in interviewer_means.items():
deviation = abs(mean_score - mean_of_means)
if deviation > 0.5: # More than half point deviation
interviewer_variations[interviewer] = {
"mean_score": round(mean_score, 2),
"deviation_from_average": round(deviation, 2),
"sample_size": len(interviewer_competency_scores[interviewer])
}
competency_analysis[competency] = {
"total_scores": len(competency_scores),
"mean_score": round(statistics.mean(scores), 2),
"score_variance": round(score_variance, 2),
"interviewer_variations": interviewer_variations,
"bias_detected": len(interviewer_variations) > 0
}
return competency_analysis
def _analyze_calibration_consistency(self, data: List[Dict[str, Any]],
target_competencies: Optional[List[str]]) -> Dict[str, Any]:
"""Analyze calibration consistency across interviews."""
# Group candidates by those interviewed by multiple people
candidate_interviewers = defaultdict(list)
for record in data:
candidate_interviewers[record["candidate_id"]].append(record)
multi_interviewer_candidates = {
candidate: records for candidate, records in candidate_interviewers.items()
if len(records) > 1
}
if not multi_interviewer_candidates:
return {
"error": "No candidates with multiple interviewers found",
"single_interviewer_analysis": self._analyze_single_interviewer_consistency(data)
}
# Calculate agreement statistics
agreement_stats = []
score_correlations = []
for candidate, records in multi_interviewer_candidates.items():
candidate_scores = []
interviewer_pairs = []
for record in records:
avg_score = record["average_score"]
candidate_scores.append(avg_score)
interviewer_pairs.append(record["interviewer_id"])
if len(candidate_scores) > 1:
# Calculate standard deviation of scores for this candidate
score_std = statistics.stdev(candidate_scores)
agreement_stats.append(score_std)
# Check if all interviewers agree within 1 point
score_range = max(candidate_scores) - min(candidate_scores)
agreement_within_one = score_range <= 1.0
score_correlations.append({
"candidate": candidate,
"scores": candidate_scores,
"interviewers": interviewer_pairs,
"score_std": score_std,
"score_range": score_range,
"agreement_within_one": agreement_within_one
})
# Calculate overall calibration metrics
mean_score_std = statistics.mean(agreement_stats) if agreement_stats else 0
agreement_rate = sum(1 for corr in score_correlations if corr["agreement_within_one"]) / len(score_correlations) if score_correlations else 0
calibration_quality = "good"
if mean_score_std > self.calibration_standards["interviewer_agreement"]["maximum_std_deviation"]:
calibration_quality = "poor"
elif agreement_rate < self.calibration_standards["interviewer_agreement"]["agreement_threshold"]:
calibration_quality = "fair"
return {
"multi_interviewer_candidates": len(multi_interviewer_candidates),
"mean_score_standard_deviation": round(mean_score_std, 3),
"agreement_within_one_point_rate": round(agreement_rate, 3),
"calibration_quality": calibration_quality,
"candidate_agreement_details": score_correlations,
"target_standards": self.calibration_standards["interviewer_agreement"],
"recommendations": self._generate_calibration_recommendations(mean_score_std, agreement_rate)
}
def _analyze_scoring_patterns(self, data: List[Dict[str, Any]],
target_competencies: Optional[List[str]]) -> Dict[str, Any]:
"""Analyze overall scoring patterns and distributions."""
# Overall score distribution
all_individual_scores = []
all_average_scores = []
score_distribution = defaultdict(int)
for record in data:
avg_score = record["average_score"]
all_average_scores.append(avg_score)
for competency, score in record["scores"].items():
if not target_competencies or competency in target_competencies:
all_individual_scores.append(score)
score_distribution[str(int(score))] += 1
# Calculate distribution percentages
total_scores = sum(score_distribution.values())
score_percentages = {score: count/total_scores for score, count in score_distribution.items()}
# Compare against expected distribution
expected_dist = self.calibration_standards["score_distribution"]["expected_distribution"]
distribution_analysis = {}
for score in ["1", "2", "3", "4"]:
expected_pct = expected_dist.get(score, 0)
actual_pct = score_percentages.get(score, 0)
difference = actual_pct - expected_pct
distribution_analysis[score] = {
"expected_percentage": expected_pct,
"actual_percentage": round(actual_pct, 3),
"difference": round(difference, 3),
"significant_deviation": abs(difference) > 0.05 # 5% threshold
}
# Calculate scoring statistics
mean_score = statistics.mean(all_individual_scores) if all_individual_scores else 0
std_score = statistics.stdev(all_individual_scores) if len(all_individual_scores) > 1 else 0
target_mean = self.calibration_standards["score_distribution"]["target_mean"]
target_std = self.calibration_standards["score_distribution"]["target_std"]
# Analyze pass rates by level
level_pass_rates = {}
level_groups = defaultdict(list)
for record in data:
level = record.get("normalized_level", "unknown")
level_groups[level].append(record["hire_decision"])
for level, decisions in level_groups.items():
if len(decisions) >= self.bias_thresholds["minimum_sample_size"]:
pass_rate = sum(decisions) / len(decisions)
expected_rate = self.calibration_standards["pass_rates"].get(f"{level}_level", 0.15)
level_pass_rates[level] = {
"actual_pass_rate": round(pass_rate, 3),
"expected_pass_rate": expected_rate,
"difference": round(pass_rate - expected_rate, 3),
"sample_size": len(decisions)
}
return {
"score_statistics": {
"mean_score": round(mean_score, 2),
"std_score": round(std_score, 2),
"target_mean": target_mean,
"target_std": target_std,
"mean_deviation": round(abs(mean_score - target_mean), 2),
"std_deviation": round(abs(std_score - target_std), 2)
},
"score_distribution": distribution_analysis,
"level_pass_rates": level_pass_rates,
"overall_assessment": self._assess_scoring_health(distribution_analysis, mean_score, target_mean)
}
def _analyze_trends_over_time(self, data: List[Dict[str, Any]], period: str) -> Dict[str, Any]:
"""Analyze trends in hiring patterns over time."""
# Sort data by date
dated_data = [record for record in data if record.get("date")]
dated_data.sort(key=lambda x: x["date"])
if len(dated_data) < 10: # Need minimum data for trend analysis
return {"error": "Insufficient data for trend analysis", "minimum_required": 10}
# Group by time period
period_groups = defaultdict(list)
for record in dated_data:
date = record["date"]
if period == "weekly":
period_key = date.strftime("%Y-W%U")
elif period == "monthly":
period_key = date.strftime("%Y-%m")
elif period == "quarterly":
quarter = (date.month - 1) // 3 + 1
period_key = f"{date.year}-Q{quarter}"
else: # daily
period_key = date.strftime("%Y-%m-%d")
period_groups[period_key].append(record)
# Calculate metrics for each period
period_metrics = {}
for period_key, records in period_groups.items():
if len(records) >= 3: # Minimum for meaningful metrics
scores = [r["average_score"] for r in records]
hire_rate = sum(r["hire_decision"] for r in records) / len(records)
period_metrics[period_key] = {
"count": len(records),
"mean_score": statistics.mean(scores),
"hire_rate": hire_rate,
"std_score": statistics.stdev(scores) if len(scores) > 1 else 0
}
if len(period_metrics) < 3:
return {"error": "Insufficient periods for trend analysis"}
# Analyze trends
sorted_periods = sorted(period_metrics.keys())
mean_scores = [period_metrics[p]["mean_score"] for p in sorted_periods]
hire_rates = [period_metrics[p]["hire_rate"] for p in sorted_periods]
# Simple linear trend calculation
score_trend = self._calculate_linear_trend(mean_scores)
hire_rate_trend = self._calculate_linear_trend(hire_rates)
return {
"period": period,
"total_periods": len(period_metrics),
"period_metrics": period_metrics,
"trends": {
"score_trend": {
"direction": "increasing" if score_trend > 0.01 else "decreasing" if score_trend < -0.01 else "stable",
"slope": round(score_trend, 4),
"significance": "significant" if abs(score_trend) > 0.05 else "minor"
},
"hire_rate_trend": {
"direction": "increasing" if hire_rate_trend > 0.005 else "decreasing" if hire_rate_trend < -0.005 else "stable",
"slope": round(hire_rate_trend, 4),
"significance": "significant" if abs(hire_rate_trend) > 0.02 else "minor"
}
},
"insights": self._generate_trend_insights(score_trend, hire_rate_trend, period_metrics)
}
def _calculate_linear_trend(self, values: List[float]) -> float:
"""Calculate simple linear trend slope."""
if len(values) < 2:
return 0
n = len(values)
x = list(range(n))
# Calculate slope using least squares
x_mean = statistics.mean(x)
y_mean = statistics.mean(values)
numerator = sum((x[i] - x_mean) * (values[i] - y_mean) for i in range(n))
denominator = sum((x[i] - x_mean) ** 2 for i in range(n))
return numerator / denominator if denominator != 0 else 0
def _detect_score_inflation(self, scores: List[float]) -> Dict[str, Any]:
"""Detect if an interviewer shows score inflation patterns."""
if len(scores) < 5:
return {"insufficient_data": True}
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
# Check against expected mean (2.8)
expected_mean = self.calibration_standards["score_distribution"]["target_mean"]
deviation = mean_score - expected_mean
# High scores with low variance might indicate inflation
high_scores_low_variance = mean_score > 3.2 and std_score < 0.5
# Check distribution - too many 4s might indicate inflation
score_counts = Counter([int(score) for score in scores])
four_count_ratio = score_counts.get(4, 0) / len(scores)
return {
"mean_score": round(mean_score, 2),
"expected_mean": expected_mean,
"deviation": round(deviation, 2),
"high_scores_low_variance": high_scores_low_variance,
"four_count_ratio": round(four_count_ratio, 2),
"inflation_detected": deviation > 0.3 or high_scores_low_variance or four_count_ratio > 0.4
}
def _calculate_interviewer_consistency(self, records: List[Dict[str, Any]]) -> float:
"""Calculate consistency score for an interviewer."""
if len(records) < 3:
return 0.5 # Neutral score for insufficient data
# Look at variance in scoring
avg_scores = [r["average_score"] for r in records]
score_variance = statistics.variance(avg_scores)
# Look at decision consistency relative to scores
decisions = [r["hire_decision"] for r in records]
scores_of_hires = [r["average_score"] for r in records if r["hire_decision"]]
scores_of_no_hires = [r["average_score"] for r in records if not r["hire_decision"]]
# Good consistency means hires have higher average scores
decision_consistency = 0.5
if scores_of_hires and scores_of_no_hires:
hire_mean = statistics.mean(scores_of_hires)
no_hire_mean = statistics.mean(scores_of_no_hires)
score_gap = hire_mean - no_hire_mean
decision_consistency = min(1.0, max(0.0, score_gap / 2.0)) # Normalize to 0-1
# Combine metrics (lower variance = higher consistency)
variance_consistency = max(0.0, 1.0 - (score_variance / 2.0))
return (decision_consistency + variance_consistency) / 2
def _calculate_overall_interviewer_consistency(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Calculate overall consistency across all interviewers."""
interviewer_consistency_scores = []
interviewer_records = defaultdict(list)
for record in data:
interviewer_records[record["interviewer_id"]].append(record)
for interviewer_id, records in interviewer_records.items():
if len(records) >= 3:
consistency = self._calculate_interviewer_consistency(records)
interviewer_consistency_scores.append(consistency)
if not interviewer_consistency_scores:
return {"error": "Insufficient data per interviewer for consistency analysis"}
return {
"mean_consistency": round(statistics.mean(interviewer_consistency_scores), 3),
"std_consistency": round(statistics.stdev(interviewer_consistency_scores) if len(interviewer_consistency_scores) > 1 else 0, 3),
"min_consistency": round(min(interviewer_consistency_scores), 3),
"max_consistency": round(max(interviewer_consistency_scores), 3),
"interviewers_analyzed": len(interviewer_consistency_scores),
"target_threshold": self.bias_thresholds["interviewer_consistency_threshold"]
}
def _calculate_bias_score(self, bias_analysis: Dict[str, Any]) -> float:
"""Calculate overall bias score (0-1, where 1 is most biased)."""
bias_factors = []
# Demographic bias factors
demographic_bias = bias_analysis.get("demographic_bias", {})
for demo, analysis in demographic_bias.items():
if analysis.get("bias_detected"):
bias_factors.append(0.3) # Each demographic bias adds 0.3
# Interviewer bias factors
interviewer_bias = bias_analysis.get("interviewer_bias", {})
outlier_interviewers = interviewer_bias.get("outlier_interviewers", {})
if outlier_interviewers:
# Scale by severity and number of outliers
total_severity = sum(info["severity"] for info in outlier_interviewers.values())
bias_factors.append(min(0.5, total_severity * 0.1))
# Competency bias factors
competency_bias = bias_analysis.get("competency_bias", {})
for comp, analysis in competency_bias.items():
if analysis.get("bias_detected"):
bias_factors.append(0.2) # Each competency bias adds 0.2
return min(1.0, sum(bias_factors))
def _calculate_health_score(self, analysis: Dict[str, Any]) -> Dict[str, Any]:
"""Calculate overall calibration health score."""
health_factors = []
# Bias score (lower is better)
bias_analysis = analysis.get("bias_analysis", {})
bias_score = bias_analysis.get("overall_bias_score", 0)
bias_health = max(0, 1 - bias_score)
health_factors.append(("bias", bias_health, 0.3))
# Calibration consistency
calibration_analysis = analysis.get("calibration_analysis", {})
if "calibration_quality" in calibration_analysis:
quality_map = {"good": 1.0, "fair": 0.7, "poor": 0.3}
calibration_health = quality_map.get(calibration_analysis["calibration_quality"], 0.5)
health_factors.append(("calibration", calibration_health, 0.25))
# Interviewer consistency
interviewer_analysis = analysis.get("interviewer_analysis", {})
overall_consistency = interviewer_analysis.get("overall_consistency", {})
if "mean_consistency" in overall_consistency:
consistency_health = overall_consistency["mean_consistency"]
health_factors.append(("interviewer_consistency", consistency_health, 0.25))
# Scoring patterns health
scoring_analysis = analysis.get("scoring_analysis", {})
if "overall_assessment" in scoring_analysis:
assessment_map = {"healthy": 1.0, "concerning": 0.6, "poor": 0.2}
scoring_health = assessment_map.get(scoring_analysis["overall_assessment"], 0.5)
health_factors.append(("scoring_patterns", scoring_health, 0.2))
# Calculate weighted average
if health_factors:
weighted_sum = sum(score * weight for _, score, weight in health_factors)
total_weight = sum(weight for _, _, weight in health_factors)
overall_score = weighted_sum / total_weight
else:
overall_score = 0.5 # Neutral if no data
# Categorize health
if overall_score >= 0.8:
health_category = "excellent"
elif overall_score >= 0.7:
health_category = "good"
elif overall_score >= 0.5:
health_category = "fair"
else:
health_category = "poor"
return {
"overall_score": round(overall_score, 3),
"health_category": health_category,
"component_scores": {name: round(score, 3) for name, score, _ in health_factors},
"improvement_priority": self._identify_improvement_priorities(health_factors)
}
def _identify_improvement_priorities(self, health_factors: List[Tuple[str, float, float]]) -> List[str]:
"""Identify areas that need the most improvement."""
priorities = []
for name, score, weight in health_factors:
impact = (1 - score) * weight # Low scores with high weights = high priority
if impact > 0.15: # Significant impact threshold
priorities.append(name)
# Sort by impact (highest first)
priorities.sort(key=lambda name: next((1 - score) * weight for n, score, weight in health_factors if n == name), reverse=True)
return priorities
def _generate_recommendations(self, analysis: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Generate actionable recommendations based on analysis results."""
recommendations = []
# Bias-related recommendations
bias_analysis = analysis.get("bias_analysis", {})
# Demographic bias recommendations
for demo, demo_analysis in bias_analysis.get("demographic_bias", {}).items():
if demo_analysis.get("bias_detected"):
recommendations.append({
"priority": "high",
"category": "bias_mitigation",
"title": f"Address {demo.replace('_', ' ').title()} Bias",
"description": demo_analysis.get("recommendation", f"Implement bias mitigation strategies for {demo}"),
"actions": [
"Conduct unconscious bias training focused on this demographic",
"Review and standardize interview questions",
"Implement diverse interview panels",
"Monitor hiring metrics by demographic group"
]
})
# Interviewer-specific recommendations
interviewer_analysis = bias_analysis.get("interviewer_bias", {})
outlier_interviewers = interviewer_analysis.get("outlier_interviewers", {})
for interviewer_id, outlier_info in outlier_interviewers.items():
issues = outlier_info["issues"]
priority = "high" if outlier_info["severity"] >= 3 else "medium"
actions = []
if "score_inflation" in issues:
actions.extend([
"Provide calibration training on scoring standards",
"Shadow experienced interviewers for recalibration",
"Review examples of each score level"
])
if "score_deflation" in issues:
actions.extend([
"Review expectations for role level",
"Calibrate against recent successful hires",
"Discuss evaluation criteria with hiring manager"
])
if "hire_rate_deviation" in issues:
actions.extend([
"Review hiring bar standards",
"Participate in calibration sessions",
"Compare decision criteria with team"
])
if "low_consistency" in issues:
actions.extend([
"Practice structured interviewing techniques",
"Use standardized scorecards",
"Document specific examples for each score"
])
recommendations.append({
"priority": priority,
"category": "interviewer_coaching",
"title": f"Coach Interviewer {interviewer_id}",
"description": f"Address issues: {', '.join(issues)}",
"actions": list(set(actions)) # Remove duplicates
})
# Calibration recommendations
calibration_analysis = analysis.get("calibration_analysis", {})
if calibration_analysis.get("calibration_quality") in ["fair", "poor"]:
recommendations.append({
"priority": "high",
"category": "calibration_improvement",
"title": "Improve Interview Calibration",
"description": f"Current calibration quality: {calibration_analysis.get('calibration_quality')}",
"actions": [
"Conduct monthly calibration sessions",
"Create shared examples of good/poor answers",
"Implement mandatory interviewer shadowing",
"Standardize scoring rubrics across all interviewers",
"Review and align on role expectations"
]
})
# Scoring pattern recommendations
scoring_analysis = analysis.get("scoring_analysis", {})
if scoring_analysis.get("overall_assessment") in ["concerning", "poor"]:
recommendations.append({
"priority": "medium",
"category": "scoring_standards",
"title": "Adjust Scoring Standards",
"description": "Scoring patterns deviate significantly from expected distribution",
"actions": [
"Review and communicate target score distributions",
"Provide examples for each score level",
"Monitor pass rates by role level",
"Adjust hiring bar if consistently too high/low"
]
})
# Health score recommendations
health_score = analysis.get("calibration_health_score", {})
priorities = health_score.get("improvement_priority", [])
if "bias" in priorities:
recommendations.append({
"priority": "critical",
"category": "bias_mitigation",
"title": "Implement Comprehensive Bias Mitigation",
"description": "Multiple bias indicators detected across the hiring process",
"actions": [
"Mandatory unconscious bias training for all interviewers",
"Implement structured interview protocols",
"Diversify interview panels",
"Regular bias audits and monitoring",
"Create accountability metrics for fair hiring"
]
})
# Sort by priority
priority_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
recommendations.sort(key=lambda x: priority_order.get(x["priority"], 3))
return recommendations
def _generate_demographic_bias_recommendation(self, demographic: str, bias_details: Dict[str, Any]) -> str:
"""Generate specific recommendation for demographic bias."""
if "hire_rate_disparity" in bias_details:
return f"Significant hire rate disparity detected for {demographic}. Implement structured interviews and diverse panels."
elif "scoring_disparity" in bias_details:
return f"Scoring disparity detected for {demographic}. Provide unconscious bias training and standardize evaluation criteria."
else:
return f"Potential bias detected for {demographic}. Monitor closely and implement bias mitigation strategies."
def _generate_interviewer_recommendations(self, outlier_interviewers: Dict[str, Any]) -> List[str]:
"""Generate recommendations for interviewer issues."""
if not outlier_interviewers:
return ["All interviewers performing within expected ranges"]
recommendations = []
for interviewer, info in outlier_interviewers.items():
issues = info["issues"]
if len(issues) >= 2:
recommendations.append(f"Interviewer {interviewer}: Requires comprehensive recalibration - multiple issues detected")
elif "score_inflation" in issues:
recommendations.append(f"Interviewer {interviewer}: Provide calibration training on scoring standards")
elif "hire_rate_deviation" in issues:
recommendations.append(f"Interviewer {interviewer}: Review hiring bar standards and decision criteria")
return recommendations
def _generate_calibration_recommendations(self, mean_std: float, agreement_rate: float) -> List[str]:
"""Generate calibration improvement recommendations."""
recommendations = []
if mean_std > self.calibration_standards["interviewer_agreement"]["maximum_std_deviation"]:
recommendations.append("High score variance detected - implement regular calibration sessions")
recommendations.append("Create shared examples of scoring standards for each competency")
if agreement_rate < self.calibration_standards["interviewer_agreement"]["agreement_threshold"]:
recommendations.append("Low interviewer agreement rate - standardize interview questions and evaluation criteria")
recommendations.append("Implement mandatory interviewer training on consistent evaluation")
if not recommendations:
recommendations.append("Calibration appears healthy - maintain current practices")
return recommendations
def _assess_scoring_health(self, distribution: Dict[str, Any], mean_score: float, target_mean: float) -> str:
"""Assess overall health of scoring patterns."""
issues = 0
# Check distribution deviations
for score_level, analysis in distribution.items():
if analysis["significant_deviation"]:
issues += 1
# Check mean deviation
if abs(mean_score - target_mean) > 0.3:
issues += 1
if issues == 0:
return "healthy"
elif issues <= 2:
return "concerning"
else:
return "poor"
def _generate_trend_insights(self, score_trend: float, hire_rate_trend: float, period_metrics: Dict[str, Any]) -> List[str]:
"""Generate insights from trend analysis."""
insights = []
if abs(score_trend) > 0.05:
direction = "increasing" if score_trend > 0 else "decreasing"
insights.append(f"Significant {direction} trend in average scores over time")
if score_trend > 0:
insights.append("May indicate score inflation or improving candidate quality")
else:
insights.append("May indicate stricter evaluation or declining candidate quality")
if abs(hire_rate_trend) > 0.02:
direction = "increasing" if hire_rate_trend > 0 else "decreasing"
insights.append(f"Significant {direction} trend in hire rates over time")
if hire_rate_trend > 0:
insights.append("Consider if hiring bar has lowered or candidate pool improved")
else:
insights.append("Consider if hiring bar has raised or candidate pool declined")
# Check for consistency
period_values = list(period_metrics.values())
hire_rates = [p["hire_rate"] for p in period_values]
hire_rate_variance = statistics.variance(hire_rates) if len(hire_rates) > 1 else 0
if hire_rate_variance > 0.01: # High variance in hire rates
insights.append("High variance in hire rates across periods - consider process standardization")
if not insights:
insights.append("Hiring patterns appear stable over time")
return insights
def _analyze_single_interviewer_consistency(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""Analyze consistency for single-interviewer candidates."""
# Look at consistency within individual interviewers
interviewer_scores = defaultdict(list)
for record in data:
interviewer_scores[record["interviewer_id"]].extend(record["scores"].values())
consistency_analysis = {}
for interviewer, scores in interviewer_scores.items():
if len(scores) >= 10: # Need sufficient data
consistency_analysis[interviewer] = {
"mean_score": round(statistics.mean(scores), 2),
"std_score": round(statistics.stdev(scores), 2),
"coefficient_of_variation": round(statistics.stdev(scores) / statistics.mean(scores), 2),
"total_scores": len(scores)
}
return consistency_analysis
def format_human_readable(calibration_report: Dict[str, Any]) -> str:
"""Format calibration report in human-readable format."""
output = []
# Header
output.append("HIRING CALIBRATION ANALYSIS REPORT")
output.append("=" * 60)
output.append(f"Analysis Type: {calibration_report.get('analysis_type', 'N/A').title()}")
output.append(f"Generated: {calibration_report.get('generated_at', 'N/A')}")
if "error" in calibration_report:
output.append(f"\nError: {calibration_report['error']}")
return "\n".join(output)
# Data Summary
data_summary = calibration_report.get("data_summary", {})
if data_summary:
output.append(f"\nDATA SUMMARY")
output.append("-" * 30)
output.append(f"Total Candidates: {data_summary.get('total_candidates', 0)}")
output.append(f"Unique Interviewers: {data_summary.get('unique_interviewers', 0)}")
output.append(f"Overall Hire Rate: {data_summary.get('hire_rate', 0):.1%}")
score_stats = data_summary.get("score_statistics", {})
output.append(f"Average Score: {score_stats.get('mean_average_scores', 0):.2f}")
output.append(f"Score Std Dev: {score_stats.get('std_average_scores', 0):.2f}")
# Health Score
health_score = calibration_report.get("calibration_health_score", {})
if health_score:
output.append(f"\nCALIBRATION HEALTH SCORE")
output.append("-" * 30)
output.append(f"Overall Score: {health_score.get('overall_score', 0):.3f}")
output.append(f"Health Category: {health_score.get('health_category', 'Unknown').title()}")
if health_score.get("improvement_priority"):
output.append(f"Priority Areas: {', '.join(health_score['improvement_priority'])}")
# Bias Analysis
bias_analysis = calibration_report.get("bias_analysis", {})
if bias_analysis:
output.append(f"\nBIAS ANALYSIS")
output.append("-" * 30)
output.append(f"Overall Bias Score: {bias_analysis.get('overall_bias_score', 0):.3f}")
# Demographic bias
demographic_bias = bias_analysis.get("demographic_bias", {})
if demographic_bias:
output.append(f"\nDemographic Bias Issues:")
for demo, analysis in demographic_bias.items():
output.append(f" • {demo.replace('_', ' ').title()}: {analysis.get('bias_details', {}).keys()}")
# Interviewer bias
interviewer_bias = bias_analysis.get("interviewer_bias", {})
outlier_interviewers = interviewer_bias.get("outlier_interviewers", {})
if outlier_interviewers:
output.append(f"\nOutlier Interviewers:")
for interviewer, info in outlier_interviewers.items():
issues = ", ".join(info["issues"])
output.append(f" • {interviewer}: {issues}")
# Calibration Analysis
calibration_analysis = calibration_report.get("calibration_analysis", {})
if calibration_analysis and "error" not in calibration_analysis:
output.append(f"\nCALIBRATION CONSISTENCY")
output.append("-" * 30)
output.append(f"Quality: {calibration_analysis.get('calibration_quality', 'Unknown').title()}")
output.append(f"Agreement Rate: {calibration_analysis.get('agreement_within_one_point_rate', 0):.1%}")
output.append(f"Score Std Dev: {calibration_analysis.get('mean_score_standard_deviation', 0):.3f}")
# Scoring Analysis
scoring_analysis = calibration_report.get("scoring_analysis", {})
if scoring_analysis:
output.append(f"\nSCORING PATTERNS")
output.append("-" * 30)
output.append(f"Overall Assessment: {scoring_analysis.get('overall_assessment', 'Unknown').title()}")
score_stats = scoring_analysis.get("score_statistics", {})
output.append(f"Mean Score: {score_stats.get('mean_score', 0):.2f} (Target: {score_stats.get('target_mean', 0):.2f})")
# Distribution analysis
distribution = scoring_analysis.get("score_distribution", {})
if distribution:
output.append(f"\nScore Distribution vs Expected:")
for score in ["1", "2", "3", "4"]:
if score in distribution:
actual = distribution[score]["actual_percentage"]
expected = distribution[score]["expected_percentage"]
output.append(f" Score {score}: {actual:.1%} (Expected: {expected:.1%})")
# Top Recommendations
recommendations = calibration_report.get("recommendations", [])
if recommendations:
output.append(f"\nTOP RECOMMENDATIONS")
output.append("-" * 30)
for i, rec in enumerate(recommendations[:5], 1): # Show top 5
output.append(f"{i}. {rec['title']} ({rec['priority'].title()} Priority)")
output.append(f" {rec['description']}")
if rec.get('actions'):
output.append(f" Actions: {len(rec['actions'])} specific action items")
return "\n".join(output)
def main():
parser = argparse.ArgumentParser(description="Analyze interview data for bias and calibration issues")
parser.add_argument("--input", type=str, required=True, help="Input JSON file with interview results data")
parser.add_argument("--analysis-type", type=str, choices=["comprehensive", "bias", "calibration", "interviewer", "scoring"],
default="comprehensive", help="Type of analysis to perform")
parser.add_argument("--competencies", type=str, help="Comma-separated list of competencies to focus on")
parser.add_argument("--trend-analysis", action="store_true", help="Perform trend analysis over time")
parser.add_argument("--period", type=str, choices=["daily", "weekly", "monthly", "quarterly"],
default="monthly", help="Time period for trend analysis")
parser.add_argument("--output", type=str, help="Output file path")
parser.add_argument("--format", choices=["json", "text", "both"], default="both", help="Output format")
args = parser.parse_args()
# Load input data
try:
with open(args.input, 'r') as f:
interview_data = json.load(f)
if not isinstance(interview_data, list):
print("Error: Input data must be a JSON array of interview records")
sys.exit(1)
except FileNotFoundError:
print(f"Error: Input file '{args.input}' not found")
sys.exit(1)
except json.JSONDecodeError as e:
print(f"Error: Invalid JSON in input file: {e}")
sys.exit(1)
except Exception as e:
print(f"Error reading input file: {e}")
sys.exit(1)
# Initialize calibrator and run analysis
calibrator = HiringCalibrator()
competencies = args.competencies.split(',') if args.competencies else None
try:
results = calibrator.analyze_hiring_calibration(
interview_data=interview_data,
analysis_type=args.analysis_type,
competencies=competencies,
trend_analysis=args.trend_analysis,
period=args.period
)
# Handle output
if args.output:
output_path = args.output
json_path = output_path if output_path.endswith('.json') else f"{output_path}.json"
text_path = output_path.replace('.json', '.txt') if output_path.endswith('.json') else f"{output_path}.txt"
else:
base_filename = f"calibration_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
json_path = f"{base_filename}.json"
text_path = f"{base_filename}.txt"
# Write outputs
if args.format in ["json", "both"]:
with open(json_path, 'w') as f:
json.dump(results, f, indent=2, default=str)
print(f"JSON report written to: {json_path}")
if args.format in ["text", "both"]:
with open(text_path, 'w') as f:
f.write(format_human_readable(results))
print(f"Text report written to: {text_path}")
# Print summary
print(f"\nCalibration Analysis Summary:")
if "error" in results:
print(f"Error: {results['error']}")
else:
health_score = results.get("calibration_health_score", {})
print(f"Health Score: {health_score.get('overall_score', 0):.3f} ({health_score.get('health_category', 'Unknown').title()})")
bias_score = results.get("bias_analysis", {}).get("overall_bias_score", 0)
print(f"Bias Score: {bias_score:.3f} (Lower is better)")
recommendations = results.get("recommendations", [])
print(f"Recommendations Generated: {len(recommendations)}")
if recommendations:
print(f"Top Priority: {recommendations[0]['title']} ({recommendations[0]['priority'].title()})")
except Exception as e:
print(f"Error during analysis: {e}")
sys.exit(1)
if __name__ == "__main__":
main() #!/usr/bin/env python3
"""
Interview Loop Designer
Generates calibrated interview loops tailored to specific roles, levels, and teams.
Creates complete interview loops with rounds, focus areas, time allocation,
interviewer skill requirements, and scorecard templates.
Usage:
python loop_designer.py --role "Senior Software Engineer" --level senior --team platform
python loop_designer.py --role "Product Manager" --level mid --competencies leadership,strategy
python loop_designer.py --input role_definition.json --output loops/
"""
import os
import sys
import json
import argparse
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any, Tuple
from collections import defaultdict
class InterviewLoopDesigner:
"""Designs comprehensive interview loops based on role requirements."""
def __init__(self):
self.competency_frameworks = self._init_competency_frameworks()
self.role_templates = self._init_role_templates()
self.interviewer_skills = self._init_interviewer_skills()
def _init_competency_frameworks(self) -> Dict[str, Dict]:
"""Initialize competency frameworks for different roles."""
return {
"software_engineer": {
"junior": {
"required": ["coding_fundamentals", "debugging", "testing_basics", "version_control"],
"preferred": ["system_understanding", "code_review", "collaboration"],
"focus_areas": ["technical_execution", "learning_agility", "team_collaboration"]
},
"mid": {
"required": ["advanced_coding", "system_design_basics", "testing_strategy", "debugging_complex"],
"preferred": ["mentoring_basics", "technical_communication", "project_ownership"],
"focus_areas": ["technical_depth", "system_thinking", "ownership"]
},
"senior": {
"required": ["system_architecture", "technical_leadership", "mentoring", "cross_team_collab"],
"preferred": ["technology_evaluation", "process_improvement", "hiring_contribution"],
"focus_areas": ["technical_leadership", "system_architecture", "people_development"]
},
"staff": {
"required": ["architectural_vision", "organizational_impact", "technical_strategy", "team_building"],
"preferred": ["industry_influence", "innovation_leadership", "executive_communication"],
"focus_areas": ["organizational_impact", "technical_vision", "strategic_influence"]
},
"principal": {
"required": ["company_wide_impact", "technical_vision", "talent_development", "strategic_planning"],
"preferred": ["industry_leadership", "board_communication", "market_influence"],
"focus_areas": ["strategic_leadership", "organizational_transformation", "external_influence"]
}
},
"product_manager": {
"junior": {
"required": ["product_execution", "user_research", "data_analysis", "stakeholder_comm"],
"preferred": ["market_awareness", "technical_understanding", "project_management"],
"focus_areas": ["execution_excellence", "user_focus", "analytical_thinking"]
},
"mid": {
"required": ["product_strategy", "cross_functional_leadership", "metrics_design", "market_analysis"],
"preferred": ["team_building", "technical_collaboration", "competitive_analysis"],
"focus_areas": ["strategic_thinking", "leadership", "business_impact"]
},
"senior": {
"required": ["business_strategy", "team_leadership", "p&l_ownership", "market_positioning"],
"preferred": ["hiring_leadership", "board_communication", "partnership_development"],
"focus_areas": ["business_leadership", "market_strategy", "organizational_impact"]
},
"staff": {
"required": ["portfolio_management", "organizational_leadership", "strategic_planning", "market_creation"],
"preferred": ["executive_presence", "investor_relations", "acquisition_strategy"],
"focus_areas": ["strategic_leadership", "market_innovation", "organizational_transformation"]
}
},
"designer": {
"junior": {
"required": ["design_fundamentals", "user_research", "prototyping", "design_tools"],
"preferred": ["user_empathy", "visual_design", "collaboration"],
"focus_areas": ["design_execution", "user_research", "creative_problem_solving"]
},
"mid": {
"required": ["design_systems", "user_testing", "cross_functional_collab", "design_strategy"],
"preferred": ["mentoring", "process_improvement", "business_understanding"],
"focus_areas": ["design_leadership", "system_thinking", "business_impact"]
},
"senior": {
"required": ["design_leadership", "team_building", "strategic_design", "stakeholder_management"],
"preferred": ["design_culture", "hiring_leadership", "executive_communication"],
"focus_areas": ["design_strategy", "team_leadership", "organizational_impact"]
}
},
"data_scientist": {
"junior": {
"required": ["statistical_analysis", "python_r", "data_visualization", "sql"],
"preferred": ["machine_learning", "business_understanding", "communication"],
"focus_areas": ["analytical_skills", "technical_execution", "business_impact"]
},
"mid": {
"required": ["advanced_ml", "experiment_design", "data_engineering", "stakeholder_comm"],
"preferred": ["mentoring", "project_leadership", "product_collaboration"],
"focus_areas": ["advanced_analytics", "project_leadership", "cross_functional_impact"]
},
"senior": {
"required": ["data_strategy", "team_leadership", "ml_systems", "business_strategy"],
"preferred": ["hiring_leadership", "executive_communication", "technology_evaluation"],
"focus_areas": ["strategic_leadership", "technical_vision", "organizational_impact"]
}
},
"devops_engineer": {
"junior": {
"required": ["infrastructure_basics", "scripting", "monitoring", "troubleshooting"],
"preferred": ["automation", "cloud_platforms", "security_awareness"],
"focus_areas": ["operational_excellence", "automation_mindset", "problem_solving"]
},
"mid": {
"required": ["ci_cd_design", "infrastructure_as_code", "security_implementation", "performance_optimization"],
"preferred": ["team_collaboration", "incident_management", "capacity_planning"],
"focus_areas": ["system_reliability", "automation_leadership", "cross_team_collaboration"]
},
"senior": {
"required": ["platform_architecture", "team_leadership", "security_strategy", "organizational_impact"],
"preferred": ["hiring_contribution", "technology_evaluation", "executive_communication"],
"focus_areas": ["platform_leadership", "strategic_thinking", "organizational_transformation"]
}
},
"engineering_manager": {
"junior": {
"required": ["team_leadership", "technical_background", "people_management", "project_coordination"],
"preferred": ["hiring_experience", "performance_management", "technical_mentoring"],
"focus_areas": ["people_leadership", "team_building", "execution_excellence"]
},
"senior": {
"required": ["organizational_leadership", "strategic_planning", "talent_development", "cross_functional_leadership"],
"preferred": ["technical_vision", "culture_building", "executive_communication"],
"focus_areas": ["organizational_impact", "strategic_leadership", "talent_development"]
},
"staff": {
"required": ["multi_team_leadership", "organizational_strategy", "executive_presence", "cultural_transformation"],
"preferred": ["board_communication", "market_understanding", "acquisition_integration"],
"focus_areas": ["organizational_transformation", "strategic_leadership", "cultural_evolution"]
}
}
}
def _init_role_templates(self) -> Dict[str, Dict]:
"""Initialize role-specific interview templates."""
return {
"software_engineer": {
"core_rounds": ["technical_phone_screen", "coding_deep_dive", "system_design", "behavioral"],
"optional_rounds": ["technical_leadership", "domain_expertise", "culture_fit"],
"total_duration_range": (180, 360), # 3-6 hours
"required_competencies": ["coding", "problem_solving", "communication"]
},
"product_manager": {
"core_rounds": ["product_sense", "analytical_thinking", "execution_process", "behavioral"],
"optional_rounds": ["strategic_thinking", "technical_collaboration", "leadership"],
"total_duration_range": (180, 300), # 3-5 hours
"required_competencies": ["product_strategy", "analytical_thinking", "stakeholder_management"]
},
"designer": {
"core_rounds": ["portfolio_review", "design_challenge", "collaboration_process", "behavioral"],
"optional_rounds": ["design_system_thinking", "research_methodology", "leadership"],
"total_duration_range": (180, 300), # 3-5 hours
"required_competencies": ["design_process", "user_empathy", "visual_communication"]
},
"data_scientist": {
"core_rounds": ["technical_assessment", "case_study", "statistical_thinking", "behavioral"],
"optional_rounds": ["ml_systems", "business_strategy", "technical_leadership"],
"total_duration_range": (210, 330), # 3.5-5.5 hours
"required_competencies": ["statistical_analysis", "programming", "business_acumen"]
},
"devops_engineer": {
"core_rounds": ["technical_assessment", "system_design", "troubleshooting", "behavioral"],
"optional_rounds": ["security_assessment", "automation_design", "leadership"],
"total_duration_range": (180, 300), # 3-5 hours
"required_competencies": ["infrastructure", "automation", "problem_solving"]
},
"engineering_manager": {
"core_rounds": ["leadership_assessment", "technical_background", "people_management", "behavioral"],
"optional_rounds": ["strategic_thinking", "hiring_assessment", "culture_building"],
"total_duration_range": (240, 360), # 4-6 hours
"required_competencies": ["people_leadership", "technical_understanding", "strategic_thinking"]
}
}
def _init_interviewer_skills(self) -> Dict[str, Dict]:
"""Initialize interviewer skill requirements for different round types."""
return {
"technical_phone_screen": {
"required_skills": ["technical_assessment", "coding_evaluation"],
"preferred_experience": ["same_domain", "senior_level"],
"calibration_level": "standard"
},
"coding_deep_dive": {
"required_skills": ["advanced_technical", "code_quality_assessment"],
"preferred_experience": ["senior_engineer", "system_design"],
"calibration_level": "high"
},
"system_design": {
"required_skills": ["architecture_design", "scalability_assessment"],
"preferred_experience": ["senior_architect", "large_scale_systems"],
"calibration_level": "high"
},
"behavioral": {
"required_skills": ["behavioral_interviewing", "competency_assessment"],
"preferred_experience": ["hiring_manager", "people_leadership"],
"calibration_level": "standard"
},
"technical_leadership": {
"required_skills": ["leadership_assessment", "technical_mentoring"],
"preferred_experience": ["engineering_manager", "tech_lead"],
"calibration_level": "high"
},
"product_sense": {
"required_skills": ["product_evaluation", "market_analysis"],
"preferred_experience": ["product_manager", "product_leadership"],
"calibration_level": "high"
},
"analytical_thinking": {
"required_skills": ["data_analysis", "metrics_evaluation"],
"preferred_experience": ["data_analyst", "product_manager"],
"calibration_level": "standard"
},
"design_challenge": {
"required_skills": ["design_evaluation", "user_experience"],
"preferred_experience": ["senior_designer", "design_manager"],
"calibration_level": "high"
}
}
def generate_interview_loop(self, role: str, level: str, team: Optional[str] = None,
competencies: Optional[List[str]] = None) -> Dict[str, Any]:
"""Generate a complete interview loop for the specified role and level."""
# Normalize inputs
role_key = role.lower().replace(" ", "_").replace("-", "_")
level_key = level.lower()
# Get role template and competency requirements
if role_key not in self.competency_frameworks:
role_key = self._find_closest_role(role_key)
if level_key not in self.competency_frameworks[role_key]:
level_key = self._find_closest_level(role_key, level_key)
competency_req = self.competency_frameworks[role_key][level_key]
role_template = self.role_templates.get(role_key, self.role_templates["software_engineer"])
# Design the interview loop
rounds = self._design_rounds(role_key, level_key, competency_req, role_template, competencies)
schedule = self._create_schedule(rounds)
scorecard = self._generate_scorecard(role_key, level_key, competency_req)
interviewer_requirements = self._define_interviewer_requirements(rounds)
return {
"role": role,
"level": level,
"team": team,
"generated_at": datetime.now().isoformat(),
"total_duration_minutes": sum(round_info["duration_minutes"] for round_info in rounds.values()),
"total_rounds": len(rounds),
"rounds": rounds,
"suggested_schedule": schedule,
"scorecard_template": scorecard,
"interviewer_requirements": interviewer_requirements,
"competency_framework": competency_req,
"calibration_notes": self._generate_calibration_notes(role_key, level_key)
}
def _find_closest_role(self, role_key: str) -> str:
"""Find the closest matching role template."""
role_mappings = {
"engineer": "software_engineer",
"developer": "software_engineer",
"swe": "software_engineer",
"backend": "software_engineer",
"frontend": "software_engineer",
"fullstack": "software_engineer",
"pm": "product_manager",
"product": "product_manager",
"ux": "designer",
"ui": "designer",
"graphic": "designer",
"data": "data_scientist",
"analyst": "data_scientist",
"ml": "data_scientist",
"ops": "devops_engineer",
"sre": "devops_engineer",
"infrastructure": "devops_engineer",
"manager": "engineering_manager",
"lead": "engineering_manager"
}
for key_part in role_key.split("_"):
if key_part in role_mappings:
return role_mappings[key_part]
return "software_engineer" # Default fallback
def _find_closest_level(self, role_key: str, level_key: str) -> str:
"""Find the closest matching level for the role."""
available_levels = list(self.competency_frameworks[role_key].keys())
level_mappings = {
"entry": "junior",
"associate": "junior",
"jr": "junior",
"mid": "mid",
"middle": "mid",
"sr": "senior",
"senior": "senior",
"staff": "staff",
"principal": "principal",
"lead": "senior",
"manager": "senior"
}
mapped_level = level_mappings.get(level_key, level_key)
if mapped_level in available_levels:
return mapped_level
elif "senior" in available_levels:
return "senior"
else:
return available_levels[0]
def _design_rounds(self, role_key: str, level_key: str, competency_req: Dict,
role_template: Dict, custom_competencies: Optional[List[str]]) -> Dict[str, Dict]:
"""Design the specific interview rounds based on role and level."""
rounds = {}
# Determine which rounds to include
core_rounds = role_template["core_rounds"].copy()
optional_rounds = role_template["optional_rounds"].copy()
# Add optional rounds based on level
if level_key in ["senior", "staff", "principal"]:
if "technical_leadership" in optional_rounds and role_key in ["software_engineer", "engineering_manager"]:
core_rounds.append("technical_leadership")
if "strategic_thinking" in optional_rounds and role_key in ["product_manager", "engineering_manager"]:
core_rounds.append("strategic_thinking")
if "design_system_thinking" in optional_rounds and role_key == "designer":
core_rounds.append("design_system_thinking")
if level_key in ["staff", "principal"]:
if "domain_expertise" in optional_rounds:
core_rounds.append("domain_expertise")
# Define round details
round_definitions = self._get_round_definitions()
for i, round_type in enumerate(core_rounds, 1):
if round_type in round_definitions:
round_def = round_definitions[round_type].copy()
round_def["order"] = i
round_def["focus_areas"] = self._customize_focus_areas(round_type, competency_req, custom_competencies)
rounds[f"round_{i}_{round_type}"] = round_def
return rounds
def _get_round_definitions(self) -> Dict[str, Dict]:
"""Get predefined round definitions with standard durations and formats."""
return {
"technical_phone_screen": {
"name": "Technical Phone Screen",
"duration_minutes": 45,
"format": "virtual",
"objectives": ["Assess coding fundamentals", "Evaluate problem-solving approach", "Screen for basic technical competency"],
"question_types": ["coding_problems", "technical_concepts", "experience_questions"],
"evaluation_criteria": ["technical_accuracy", "problem_solving_process", "communication_clarity"]
},
"coding_deep_dive": {
"name": "Coding Deep Dive",
"duration_minutes": 75,
"format": "in_person_or_virtual",
"objectives": ["Evaluate coding skills in depth", "Assess code quality and testing", "Review debugging approach"],
"question_types": ["complex_coding_problems", "code_review", "testing_strategy"],
"evaluation_criteria": ["code_quality", "testing_approach", "debugging_skills", "optimization_thinking"]
},
"system_design": {
"name": "System Design",
"duration_minutes": 75,
"format": "collaborative_whiteboard",
"objectives": ["Assess architectural thinking", "Evaluate scalability considerations", "Review trade-off analysis"],
"question_types": ["system_architecture", "scalability_design", "trade_off_analysis"],
"evaluation_criteria": ["architectural_thinking", "scalability_awareness", "trade_off_reasoning"]
},
"behavioral": {
"name": "Behavioral Interview",
"duration_minutes": 45,
"format": "conversational",
"objectives": ["Assess cultural fit", "Evaluate past experiences", "Review leadership examples"],
"question_types": ["star_method_questions", "situational_scenarios", "values_alignment"],
"evaluation_criteria": ["communication_skills", "leadership_examples", "cultural_alignment"]
},
"technical_leadership": {
"name": "Technical Leadership",
"duration_minutes": 60,
"format": "discussion_based",
"objectives": ["Evaluate mentoring capability", "Assess technical decision making", "Review cross-team collaboration"],
"question_types": ["leadership_scenarios", "technical_decisions", "mentoring_examples"],
"evaluation_criteria": ["leadership_potential", "technical_judgment", "influence_skills"]
},
"product_sense": {
"name": "Product Sense",
"duration_minutes": 75,
"format": "case_study",
"objectives": ["Assess product intuition", "Evaluate user empathy", "Review market understanding"],
"question_types": ["product_scenarios", "feature_prioritization", "user_journey_analysis"],
"evaluation_criteria": ["product_intuition", "user_empathy", "analytical_thinking"]
},
"analytical_thinking": {
"name": "Analytical Thinking",
"duration_minutes": 60,
"format": "data_analysis",
"objectives": ["Evaluate data interpretation", "Assess metric design", "Review experiment planning"],
"question_types": ["data_interpretation", "metric_design", "experiment_analysis"],
"evaluation_criteria": ["analytical_rigor", "metric_intuition", "experimental_thinking"]
},
"design_challenge": {
"name": "Design Challenge",
"duration_minutes": 90,
"format": "hands_on_design",
"objectives": ["Assess design process", "Evaluate user-centered thinking", "Review iteration approach"],
"question_types": ["design_problems", "user_research", "design_critique"],
"evaluation_criteria": ["design_process", "user_focus", "visual_communication"]
},
"portfolio_review": {
"name": "Portfolio Review",
"duration_minutes": 75,
"format": "presentation_discussion",
"objectives": ["Review past work", "Assess design thinking", "Evaluate impact measurement"],
"question_types": ["portfolio_walkthrough", "design_decisions", "impact_stories"],
"evaluation_criteria": ["design_quality", "process_thinking", "business_impact"]
}
}
def _customize_focus_areas(self, round_type: str, competency_req: Dict,
custom_competencies: Optional[List[str]]) -> List[str]:
"""Customize focus areas based on role competency requirements."""
base_focus_areas = competency_req.get("focus_areas", [])
round_focus_mapping = {
"technical_phone_screen": ["coding_fundamentals", "problem_solving"],
"coding_deep_dive": ["technical_execution", "code_quality"],
"system_design": ["system_thinking", "architectural_reasoning"],
"behavioral": ["cultural_fit", "communication", "teamwork"],
"technical_leadership": ["leadership", "mentoring", "influence"],
"product_sense": ["product_intuition", "user_empathy"],
"analytical_thinking": ["data_analysis", "metric_design"],
"design_challenge": ["design_process", "user_focus"]
}
focus_areas = round_focus_mapping.get(round_type, [])
# Add custom competencies if specified
if custom_competencies:
focus_areas.extend([comp for comp in custom_competencies if comp not in focus_areas])
# Add role-specific focus areas
focus_areas.extend([area for area in base_focus_areas if area not in focus_areas])
return focus_areas[:5] # Limit to top 5 focus areas
def _create_schedule(self, rounds: Dict[str, Dict]) -> Dict[str, Any]:
"""Create a suggested interview schedule."""
sorted_rounds = sorted(rounds.items(), key=lambda x: x[1]["order"])
# Calculate optimal scheduling
total_duration = sum(round_info["duration_minutes"] for _, round_info in sorted_rounds)
if total_duration <= 240: # 4 hours or less - single day
schedule_type = "single_day"
day_structure = self._create_single_day_schedule(sorted_rounds)
else: # Multi-day schedule
schedule_type = "multi_day"
day_structure = self._create_multi_day_schedule(sorted_rounds)
return {
"type": schedule_type,
"total_duration_minutes": total_duration,
"recommended_breaks": self._calculate_breaks(total_duration),
"day_structure": day_structure,
"logistics_notes": self._generate_logistics_notes(sorted_rounds)
}
def _create_single_day_schedule(self, rounds: List[Tuple[str, Dict]]) -> Dict[str, Any]:
"""Create a single-day interview schedule."""
start_time = datetime.strptime("09:00", "%H:%M")
current_time = start_time
schedule = []
for round_name, round_info in rounds:
# Add break if needed (after 90 minutes of interviews)
if schedule and sum(item.get("duration_minutes", 0) for item in schedule if "break" not in item.get("type", "")) >= 90:
schedule.append({
"type": "break",
"start_time": current_time.strftime("%H:%M"),
"duration_minutes": 15,
"end_time": (current_time + timedelta(minutes=15)).strftime("%H:%M")
})
current_time += timedelta(minutes=15)
# Add the interview round
end_time = current_time + timedelta(minutes=round_info["duration_minutes"])
schedule.append({
"type": "interview",
"round_name": round_name,
"title": round_info["name"],
"start_time": current_time.strftime("%H:%M"),
"end_time": end_time.strftime("%H:%M"),
"duration_minutes": round_info["duration_minutes"],
"format": round_info["format"]
})
current_time = end_time
return {
"day_1": {
"date": "TBD",
"start_time": start_time.strftime("%H:%M"),
"end_time": current_time.strftime("%H:%M"),
"rounds": schedule
}
}
def _create_multi_day_schedule(self, rounds: List[Tuple[str, Dict]]) -> Dict[str, Any]:
"""Create a multi-day interview schedule."""
# Split rounds across days (max 4 hours per day)
max_daily_minutes = 240
days = {}
current_day = 1
current_day_duration = 0
current_day_rounds = []
for round_name, round_info in rounds:
duration = round_info["duration_minutes"] + 15 # Add buffer time
if current_day_duration + duration > max_daily_minutes and current_day_rounds:
# Finalize current day
days[f"day_{current_day}"] = self._finalize_day_schedule(current_day_rounds)
current_day += 1
current_day_duration = 0
current_day_rounds = []
current_day_rounds.append((round_name, round_info))
current_day_duration += duration
# Finalize last day
if current_day_rounds:
days[f"day_{current_day}"] = self._finalize_day_schedule(current_day_rounds)
return days
def _finalize_day_schedule(self, day_rounds: List[Tuple[str, Dict]]) -> Dict[str, Any]:
"""Finalize the schedule for a specific day."""
start_time = datetime.strptime("09:00", "%H:%M")
current_time = start_time
schedule = []
for round_name, round_info in day_rounds:
end_time = current_time + timedelta(minutes=round_info["duration_minutes"])
schedule.append({
"type": "interview",
"round_name": round_name,
"title": round_info["name"],
"start_time": current_time.strftime("%H:%M"),
"end_time": end_time.strftime("%H:%M"),
"duration_minutes": round_info["duration_minutes"],
"format": round_info["format"]
})
current_time = end_time + timedelta(minutes=15) # 15-min buffer
return {
"date": "TBD",
"start_time": start_time.strftime("%H:%M"),
"end_time": (current_time - timedelta(minutes=15)).strftime("%H:%M"),
"rounds": schedule
}
def _calculate_breaks(self, total_duration: int) -> List[Dict[str, Any]]:
"""Calculate recommended breaks based on total duration."""
breaks = []
if total_duration >= 120: # 2+ hours
breaks.append({"type": "short_break", "duration": 15, "after_minutes": 90})
if total_duration >= 240: # 4+ hours
breaks.append({"type": "lunch_break", "duration": 60, "after_minutes": 180})
if total_duration >= 360: # 6+ hours
breaks.append({"type": "short_break", "duration": 15, "after_minutes": 300})
return breaks
def _generate_scorecard(self, role_key: str, level_key: str, competency_req: Dict) -> Dict[str, Any]:
"""Generate a scorecard template for the interview loop."""
scoring_dimensions = []
# Add competency-based scoring dimensions
for competency in competency_req["required"]:
scoring_dimensions.append({
"dimension": competency,
"weight": "high",
"scale": "1-4",
"description": f"Assessment of {competency.replace('_', ' ')} competency"
})
for competency in competency_req.get("preferred", []):
scoring_dimensions.append({
"dimension": competency,
"weight": "medium",
"scale": "1-4",
"description": f"Assessment of {competency.replace('_', ' ')} competency"
})
# Add standard dimensions
standard_dimensions = [
{"dimension": "communication", "weight": "high", "scale": "1-4"},
{"dimension": "cultural_fit", "weight": "medium", "scale": "1-4"},
{"dimension": "learning_agility", "weight": "medium", "scale": "1-4"}
]
scoring_dimensions.extend(standard_dimensions)
return {
"scoring_scale": {
"4": "Exceeds Expectations - Demonstrates mastery beyond required level",
"3": "Meets Expectations - Solid performance meeting all requirements",
"2": "Partially Meets - Shows potential but has development areas",
"1": "Does Not Meet - Significant gaps in required competencies"
},
"dimensions": scoring_dimensions,
"overall_recommendation": {
"options": ["Strong Hire", "Hire", "No Hire", "Strong No Hire"],
"criteria": "Based on weighted average and minimum thresholds"
},
"calibration_notes": {
"required": True,
"min_length": 100,
"sections": ["strengths", "areas_for_development", "specific_examples"]
}
}
def _define_interviewer_requirements(self, rounds: Dict[str, Dict]) -> Dict[str, Dict]:
"""Define interviewer skill requirements for each round."""
requirements = {}
for round_name, round_info in rounds.items():
round_type = round_name.split("_", 2)[-1] # Extract round type
if round_type in self.interviewer_skills:
skill_req = self.interviewer_skills[round_type].copy()
skill_req["suggested_interviewers"] = self._suggest_interviewer_profiles(round_type)
requirements[round_name] = skill_req
else:
# Default requirements
requirements[round_name] = {
"required_skills": ["interviewing_basics", "evaluation_skills"],
"preferred_experience": ["relevant_domain"],
"calibration_level": "standard",
"suggested_interviewers": ["experienced_interviewer"]
}
return requirements
def _suggest_interviewer_profiles(self, round_type: str) -> List[str]:
"""Suggest specific interviewer profiles for different round types."""
profile_mapping = {
"technical_phone_screen": ["senior_engineer", "tech_lead"],
"coding_deep_dive": ["senior_engineer", "staff_engineer"],
"system_design": ["senior_architect", "staff_engineer"],
"behavioral": ["hiring_manager", "people_manager"],
"technical_leadership": ["engineering_manager", "senior_staff"],
"product_sense": ["senior_pm", "product_leader"],
"analytical_thinking": ["senior_analyst", "data_scientist"],
"design_challenge": ["senior_designer", "design_manager"]
}
return profile_mapping.get(round_type, ["experienced_interviewer"])
def _generate_calibration_notes(self, role_key: str, level_key: str) -> Dict[str, Any]:
"""Generate calibration notes and best practices."""
return {
"hiring_bar_notes": f"Calibrated for {level_key} level {role_key.replace('_', ' ')} role",
"common_pitfalls": [
"Avoid comparing candidates to each other rather than to the role standard",
"Don't let one strong/weak area overshadow overall assessment",
"Ensure consistent application of evaluation criteria"
],
"calibration_checkpoints": [
"Review score distribution after every 5 candidates",
"Conduct monthly interviewer calibration sessions",
"Track correlation with 6-month performance reviews"
],
"escalation_criteria": [
"Any candidate receiving all 4s or all 1s",
"Significant disagreement between interviewers (>1.5 point spread)",
"Unusual circumstances or accommodations needed"
]
}
def _generate_logistics_notes(self, rounds: List[Tuple[str, Dict]]) -> List[str]:
"""Generate logistics and coordination notes."""
notes = [
"Coordinate interviewer availability before scheduling",
"Ensure all interviewers have access to job description and competency requirements",
"Prepare interview rooms/virtual links for all rounds",
"Share candidate resume and application with all interviewers"
]
# Add format-specific notes
formats_used = {round_info["format"] for _, round_info in rounds}
if "virtual" in formats_used:
notes.append("Test video conferencing setup before virtual interviews")
notes.append("Share virtual meeting links with candidate 24 hours in advance")
if "collaborative_whiteboard" in formats_used:
notes.append("Prepare whiteboard or collaborative online tool for design sessions")
if "hands_on_design" in formats_used:
notes.append("Provide design tools access or ensure candidate can screen share their preferred tools")
return notes
def format_human_readable(loop_data: Dict[str, Any]) -> str:
"""Format the interview loop data in a human-readable format."""
output = []
# Header
output.append(f"Interview Loop Design for {loop_data['role']} ({loop_data['level'].title()} Level)")
output.append("=" * 60)
if loop_data.get('team'):
output.append(f"Team: {loop_data['team']}")
output.append(f"Generated: {loop_data['generated_at']}")
output.append(f"Total Duration: {loop_data['total_duration_minutes']} minutes ({loop_data['total_duration_minutes']//60}h {loop_data['total_duration_minutes']%60}m)")
output.append(f"Total Rounds: {loop_data['total_rounds']}")
output.append("")
# Interview Rounds
output.append("INTERVIEW ROUNDS")
output.append("-" * 40)
sorted_rounds = sorted(loop_data['rounds'].items(), key=lambda x: x[1]['order'])
for round_name, round_info in sorted_rounds:
output.append(f"\nRound {round_info['order']}: {round_info['name']}")
output.append(f"Duration: {round_info['duration_minutes']} minutes")
output.append(f"Format: {round_info['format'].replace('_', ' ').title()}")
output.append("Objectives:")
for obj in round_info['objectives']:
output.append(f" • {obj}")
output.append("Focus Areas:")
for area in round_info['focus_areas']:
output.append(f" • {area.replace('_', ' ').title()}")
# Suggested Schedule
output.append("\nSUGGESTED SCHEDULE")
output.append("-" * 40)
schedule = loop_data['suggested_schedule']
output.append(f"Schedule Type: {schedule['type'].replace('_', ' ').title()}")
for day_name, day_info in schedule['day_structure'].items():
output.append(f"\n{day_name.replace('_', ' ').title()}:")
output.append(f"Time: {day_info['start_time']} - {day_info['end_time']}")
for item in day_info['rounds']:
if item['type'] == 'interview':
output.append(f" {item['start_time']}-{item['end_time']}: {item['title']} ({item['duration_minutes']}min)")
else:
output.append(f" {item['start_time']}-{item['end_time']}: {item['type'].title()} ({item['duration_minutes']}min)")
# Interviewer Requirements
output.append("\nINTERVIEWER REQUIREMENTS")
output.append("-" * 40)
for round_name, requirements in loop_data['interviewer_requirements'].items():
round_display = round_name.split("_", 2)[-1].replace("_", " ").title()
output.append(f"\n{round_display}:")
output.append(f"Required Skills: {', '.join(requirements['required_skills'])}")
output.append(f"Suggested Interviewers: {', '.join(requirements['suggested_interviewers'])}")
output.append(f"Calibration Level: {requirements['calibration_level'].title()}")
# Scorecard Overview
output.append("\nSCORECARD TEMPLATE")
output.append("-" * 40)
scorecard = loop_data['scorecard_template']
output.append("Scoring Scale:")
for score, description in scorecard['scoring_scale'].items():
output.append(f" {score}: {description}")
output.append("\nEvaluation Dimensions:")
for dim in scorecard['dimensions']:
output.append(f" • {dim['dimension'].replace('_', ' ').title()} (Weight: {dim['weight']})")
# Calibration Notes
output.append("\nCALIBRATION NOTES")
output.append("-" * 40)
calibration = loop_data['calibration_notes']
output.append(f"Hiring Bar: {calibration['hiring_bar_notes']}")
output.append("\nCommon Pitfalls:")
for pitfall in calibration['common_pitfalls']:
output.append(f" • {pitfall}")
return "\n".join(output)
def main():
parser = argparse.ArgumentParser(description="Generate calibrated interview loops for specific roles and levels")
parser.add_argument("--role", type=str, help="Job role title (e.g., 'Senior Software Engineer')")
parser.add_argument("--level", type=str, help="Experience level (junior, mid, senior, staff, principal)")
parser.add_argument("--team", type=str, help="Team or department (optional)")
parser.add_argument("--competencies", type=str, help="Comma-separated list of specific competencies to focus on")
parser.add_argument("--input", type=str, help="Input JSON file with role definition")
parser.add_argument("--output", type=str, help="Output directory or file path")
parser.add_argument("--format", choices=["json", "text", "both"], default="both", help="Output format")
args = parser.parse_args()
designer = InterviewLoopDesigner()
# Handle input
if args.input:
try:
with open(args.input, 'r') as f:
role_data = json.load(f)
role = role_data.get('role') or role_data.get('title', '')
level = role_data.get('level', 'senior')
team = role_data.get('team')
competencies = role_data.get('competencies')
except Exception as e:
print(f"Error reading input file: {e}")
sys.exit(1)
else:
if not args.role or not args.level:
print("Error: --role and --level are required when not using --input")
sys.exit(1)
role = args.role
level = args.level
team = args.team
competencies = args.competencies.split(',') if args.competencies else None
# Generate interview loop
try:
loop_data = designer.generate_interview_loop(role, level, team, competencies)
# Handle output
if args.output:
output_path = args.output
if os.path.isdir(output_path):
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_interview_loop"
json_path = os.path.join(output_path, f"{base_filename}.json")
text_path = os.path.join(output_path, f"{base_filename}.txt")
else:
# Use provided path as base
json_path = output_path if output_path.endswith('.json') else f"{output_path}.json"
text_path = output_path.replace('.json', '.txt') if output_path.endswith('.json') else f"{output_path}.txt"
else:
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_interview_loop"
json_path = f"{base_filename}.json"
text_path = f"{base_filename}.txt"
# Write outputs
if args.format in ["json", "both"]:
with open(json_path, 'w') as f:
json.dump(loop_data, f, indent=2, default=str)
print(f"JSON output written to: {json_path}")
if args.format in ["text", "both"]:
with open(text_path, 'w') as f:
f.write(format_human_readable(loop_data))
print(f"Text output written to: {text_path}")
# Always print summary to stdout
print("\nInterview Loop Summary:")
print(f"Role: {loop_data['role']} ({loop_data['level'].title()})")
print(f"Total Duration: {loop_data['total_duration_minutes']} minutes")
print(f"Number of Rounds: {loop_data['total_rounds']}")
print(f"Schedule Type: {loop_data['suggested_schedule']['type'].replace('_', ' ').title()}")
except Exception as e:
print(f"Error generating interview loop: {e}")
sys.exit(1)
if __name__ == "__main__":
main() #!/usr/bin/env python3
"""
Question Bank Generator
Generates comprehensive, competency-based interview questions with detailed scoring criteria.
Creates structured question banks organized by competency area with scoring rubrics,
follow-up probes, and calibration examples.
Usage:
python question_bank_generator.py --role "Frontend Engineer" --competencies react,typescript,system-design
python question_bank_generator.py --role "Product Manager" --question-types behavioral,leadership
python question_bank_generator.py --input role_requirements.json --output questions/
"""
import os
import sys
import json
import argparse
import random
from datetime import datetime
from typing import Dict, List, Optional, Any, Tuple
from collections import defaultdict
class QuestionBankGenerator:
"""Generates comprehensive interview question banks with scoring criteria."""
def __init__(self):
self.technical_questions = self._init_technical_questions()
self.behavioral_questions = self._init_behavioral_questions()
self.competency_mapping = self._init_competency_mapping()
self.scoring_rubrics = self._init_scoring_rubrics()
self.follow_up_strategies = self._init_follow_up_strategies()
def _init_technical_questions(self) -> Dict[str, Dict]:
"""Initialize technical questions by competency area and level."""
return {
"coding_fundamentals": {
"junior": [
{
"question": "Write a function to reverse a string without using built-in reverse methods.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "easy",
"time_limit": 15,
"key_concepts": ["loops", "string_manipulation", "basic_algorithms"]
},
{
"question": "Implement a function to check if a string is a palindrome.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "easy",
"time_limit": 15,
"key_concepts": ["string_processing", "comparison", "edge_cases"]
},
{
"question": "Find the largest element in an array without using built-in max functions.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "easy",
"time_limit": 10,
"key_concepts": ["arrays", "iteration", "comparison"]
}
],
"mid": [
{
"question": "Implement a function to find the first non-repeating character in a string.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "medium",
"time_limit": 20,
"key_concepts": ["hash_maps", "string_processing", "efficiency"]
},
{
"question": "Write a function to merge two sorted arrays into one sorted array.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "medium",
"time_limit": 25,
"key_concepts": ["merge_algorithms", "two_pointers", "optimization"]
}
],
"senior": [
{
"question": "Implement a LRU (Least Recently Used) cache with O(1) operations.",
"competency": "coding_fundamentals",
"type": "coding",
"difficulty": "hard",
"time_limit": 35,
"key_concepts": ["data_structures", "hash_maps", "doubly_linked_lists"]
}
]
},
"system_design": {
"mid": [
{
"question": "Design a URL shortener service like bit.ly for 10K users.",
"competency": "system_design",
"type": "design",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["database_design", "hashing", "basic_scalability"]
}
],
"senior": [
{
"question": "Design a real-time chat system supporting 1M concurrent users.",
"competency": "system_design",
"type": "design",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["websockets", "load_balancing", "database_sharding", "caching"]
},
{
"question": "Design a distributed cache system like Redis with high availability.",
"competency": "system_design",
"type": "design",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["distributed_systems", "replication", "consistency", "partitioning"]
}
],
"staff": [
{
"question": "Design the architecture for a global content delivery network (CDN).",
"competency": "system_design",
"type": "design",
"difficulty": "expert",
"time_limit": 75,
"key_concepts": ["global_architecture", "edge_computing", "content_optimization", "network_protocols"]
}
]
},
"frontend_development": {
"junior": [
{
"question": "Create a responsive navigation menu using HTML, CSS, and vanilla JavaScript.",
"competency": "frontend_development",
"type": "coding",
"difficulty": "easy",
"time_limit": 30,
"key_concepts": ["html_css", "responsive_design", "dom_manipulation"]
}
],
"mid": [
{
"question": "Build a React component that fetches and displays paginated data from an API.",
"competency": "frontend_development",
"type": "coding",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["react_hooks", "api_integration", "state_management", "pagination"]
}
],
"senior": [
{
"question": "Design and implement a custom React hook for managing complex form state with validation.",
"competency": "frontend_development",
"type": "coding",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["custom_hooks", "form_validation", "state_management", "performance"]
}
]
},
"data_analysis": {
"junior": [
{
"question": "Given a dataset of user activities, calculate the daily active users for the past month.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "easy",
"time_limit": 30,
"key_concepts": ["sql_basics", "date_functions", "aggregation"]
}
],
"mid": [
{
"question": "Analyze conversion funnel data to identify the biggest drop-off point and propose solutions.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["funnel_analysis", "conversion_optimization", "statistical_significance"]
}
],
"senior": [
{
"question": "Design an A/B testing framework to measure the impact of a new recommendation algorithm.",
"competency": "data_analysis",
"type": "analytical",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["experiment_design", "statistical_power", "bias_mitigation", "causal_inference"]
}
]
},
"machine_learning": {
"mid": [
{
"question": "Explain how you would build a recommendation system for an e-commerce platform.",
"competency": "machine_learning",
"type": "conceptual",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["collaborative_filtering", "content_based", "cold_start", "evaluation_metrics"]
}
],
"senior": [
{
"question": "Design a real-time fraud detection system for financial transactions.",
"competency": "machine_learning",
"type": "design",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["anomaly_detection", "real_time_ml", "feature_engineering", "model_monitoring"]
}
]
},
"product_strategy": {
"mid": [
{
"question": "How would you prioritize features for a mobile app with limited engineering resources?",
"competency": "product_strategy",
"type": "case_study",
"difficulty": "medium",
"time_limit": 45,
"key_concepts": ["prioritization_frameworks", "resource_allocation", "impact_estimation"]
}
],
"senior": [
{
"question": "Design a go-to-market strategy for a new B2B SaaS product entering a competitive market.",
"competency": "product_strategy",
"type": "strategic",
"difficulty": "hard",
"time_limit": 60,
"key_concepts": ["market_analysis", "competitive_positioning", "pricing_strategy", "channel_strategy"]
}
]
}
}
def _init_behavioral_questions(self) -> Dict[str, List[Dict]]:
"""Initialize behavioral questions by competency area."""
return {
"leadership": [
{
"question": "Tell me about a time when you had to lead a team through a significant change or challenge.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["change_management", "team_motivation", "communication"]
},
{
"question": "Describe a situation where you had to influence someone without having direct authority over them.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["influence", "persuasion", "stakeholder_management"]
},
{
"question": "Give me an example of when you had to make a difficult decision that affected your team.",
"competency": "leadership",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["decision_making", "team_impact", "communication"]
}
],
"collaboration": [
{
"question": "Describe a time when you had to work with a difficult colleague or stakeholder.",
"competency": "collaboration",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["conflict_resolution", "relationship_building", "professionalism"]
},
{
"question": "Tell me about a project where you had to coordinate across multiple teams or departments.",
"competency": "collaboration",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["cross_functional_work", "communication", "project_coordination"]
}
],
"problem_solving": [
{
"question": "Walk me through a complex problem you solved recently. What was your approach?",
"competency": "problem_solving",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["analytical_thinking", "methodology", "creativity"]
},
{
"question": "Describe a time when you had to solve a problem with limited information or resources.",
"competency": "problem_solving",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["resourcefulness", "ambiguity_tolerance", "decision_making"]
}
],
"communication": [
{
"question": "Tell me about a time when you had to present complex technical information to a non-technical audience.",
"competency": "communication",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["technical_communication", "audience_adaptation", "clarity"]
},
{
"question": "Describe a situation where you had to deliver difficult feedback to a colleague.",
"competency": "communication",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["feedback_delivery", "empathy", "constructive_criticism"]
}
],
"adaptability": [
{
"question": "Tell me about a time when you had to quickly learn a new technology or skill for work.",
"competency": "adaptability",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["learning_agility", "growth_mindset", "knowledge_acquisition"]
},
{
"question": "Describe how you handled a situation when project requirements changed significantly mid-way.",
"competency": "adaptability",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["flexibility", "change_management", "resilience"]
}
],
"innovation": [
{
"question": "Tell me about a time when you came up with a creative solution to improve a process or solve a problem.",
"competency": "innovation",
"type": "behavioral",
"method": "STAR",
"focus_areas": ["creative_thinking", "process_improvement", "initiative"]
}
]
}
def _init_competency_mapping(self) -> Dict[str, Dict]:
"""Initialize role to competency mapping."""
return {
"software_engineer": {
"core_competencies": ["coding_fundamentals", "system_design", "problem_solving", "collaboration"],
"level_specific": {
"junior": ["coding_fundamentals", "debugging", "learning_agility"],
"mid": ["advanced_coding", "system_design", "mentoring_basics"],
"senior": ["system_architecture", "technical_leadership", "innovation"],
"staff": ["architectural_vision", "organizational_impact", "strategic_thinking"]
}
},
"frontend_engineer": {
"core_competencies": ["frontend_development", "ui_ux_understanding", "problem_solving", "collaboration"],
"level_specific": {
"junior": ["html_css_js", "responsive_design", "basic_frameworks"],
"mid": ["react_vue_angular", "state_management", "performance_optimization"],
"senior": ["frontend_architecture", "team_leadership", "cross_functional_collaboration"],
"staff": ["frontend_strategy", "technology_evaluation", "organizational_impact"]
}
},
"backend_engineer": {
"core_competencies": ["backend_development", "database_design", "api_design", "system_design"],
"level_specific": {
"junior": ["server_side_programming", "database_basics", "api_consumption"],
"mid": ["microservices", "caching", "security_basics"],
"senior": ["distributed_systems", "performance_optimization", "technical_leadership"],
"staff": ["system_architecture", "technology_strategy", "cross_team_influence"]
}
},
"product_manager": {
"core_competencies": ["product_strategy", "user_research", "data_analysis", "stakeholder_management"],
"level_specific": {
"junior": ["feature_specification", "user_stories", "basic_analytics"],
"mid": ["product_roadmap", "cross_functional_leadership", "market_research"],
"senior": ["business_strategy", "team_leadership", "p&l_responsibility"],
"staff": ["portfolio_management", "organizational_strategy", "market_creation"]
}
},
"data_scientist": {
"core_competencies": ["statistical_analysis", "machine_learning", "data_analysis", "business_acumen"],
"level_specific": {
"junior": ["python_r", "sql", "basic_ml", "data_visualization"],
"mid": ["advanced_ml", "experiment_design", "model_evaluation"],
"senior": ["ml_systems", "data_strategy", "stakeholder_communication"],
"staff": ["data_platform", "ai_strategy", "organizational_impact"]
}
},
"designer": {
"core_competencies": ["design_process", "user_research", "visual_design", "collaboration"],
"level_specific": {
"junior": ["design_tools", "user_empathy", "visual_communication"],
"mid": ["design_systems", "user_testing", "cross_functional_work"],
"senior": ["design_strategy", "team_leadership", "business_impact"],
"staff": ["design_vision", "organizational_design", "strategic_influence"]
}
},
"devops_engineer": {
"core_competencies": ["infrastructure", "automation", "monitoring", "troubleshooting"],
"level_specific": {
"junior": ["scripting", "basic_cloud", "ci_cd_basics"],
"mid": ["infrastructure_as_code", "container_orchestration", "security"],
"senior": ["platform_design", "reliability_engineering", "team_leadership"],
"staff": ["platform_strategy", "organizational_infrastructure", "technology_vision"]
}
}
}
def _init_scoring_rubrics(self) -> Dict[str, Dict]:
"""Initialize scoring rubrics for different question types."""
return {
"coding": {
"correctness": {
"4": "Solution is completely correct, handles all edge cases, optimal complexity",
"3": "Solution is correct for main cases, good complexity, minor edge case issues",
"2": "Solution works but has some bugs or suboptimal approach",
"1": "Solution has significant issues or doesn't work"
},
"code_quality": {
"4": "Clean, readable, well-structured code with excellent naming and comments",
"3": "Good code structure, readable with appropriate naming",
"2": "Code works but has style/structure issues",
"1": "Poor code quality, hard to understand"
},
"problem_solving_approach": {
"4": "Excellent problem breakdown, clear thinking process, considers alternatives",
"3": "Good approach, logical thinking, systematic problem solving",
"2": "Decent approach but some confusion or inefficiency",
"1": "Poor approach, unclear thinking process"
},
"communication": {
"4": "Excellent explanation of approach, asks clarifying questions, clear reasoning",
"3": "Good communication, explains thinking well",
"2": "Adequate communication, some explanation",
"1": "Poor communication, little explanation"
}
},
"behavioral": {
"situation_clarity": {
"4": "Clear, specific situation with relevant context and stakes",
"3": "Good situation description with adequate context",
"2": "Situation described but lacks some specifics",
"1": "Vague or unclear situation description"
},
"action_quality": {
"4": "Specific, thoughtful actions showing strong competency",
"3": "Good actions demonstrating competency",
"2": "Adequate actions but could be stronger",
"1": "Weak or inappropriate actions"
},
"result_impact": {
"4": "Significant positive impact with measurable results",
"3": "Good positive impact with clear outcomes",
"2": "Some positive impact demonstrated",
"1": "Little or no positive impact shown"
},
"self_awareness": {
"4": "Excellent self-reflection, learns from experience, acknowledges growth areas",
"3": "Good self-awareness and learning orientation",
"2": "Some self-reflection demonstrated",
"1": "Limited self-awareness or reflection"
}
},
"design": {
"system_thinking": {
"4": "Comprehensive system view, considers all components and interactions",
"3": "Good system understanding with most components identified",
"2": "Basic system thinking with some gaps",
"1": "Limited system thinking, misses key components"
},
"scalability": {
"4": "Excellent scalability considerations, multiple strategies discussed",
"3": "Good scalability awareness with practical solutions",
"2": "Basic scalability understanding",
"1": "Little to no scalability consideration"
},
"trade_offs": {
"4": "Excellent trade-off analysis, considers multiple dimensions",
"3": "Good trade-off awareness with clear reasoning",
"2": "Some trade-off consideration",
"1": "Limited trade-off analysis"
},
"technical_depth": {
"4": "Deep technical knowledge with implementation details",
"3": "Good technical knowledge with solid understanding",
"2": "Adequate technical knowledge",
"1": "Limited technical depth"
}
}
}
def _init_follow_up_strategies(self) -> Dict[str, List[str]]:
"""Initialize follow-up question strategies by competency."""
return {
"coding_fundamentals": [
"How would you optimize this solution for better time complexity?",
"What edge cases should we consider for this problem?",
"How would you test this function?",
"What would happen if the input size was very large?"
],
"system_design": [
"How would you handle if the system needed to scale 10x?",
"What would you do if one of your services went down?",
"How would you monitor this system in production?",
"What security considerations would you implement?"
],
"leadership": [
"What would you do differently if you faced this situation again?",
"How did you handle team members who were resistant to the change?",
"What metrics did you use to measure success?",
"How did you communicate progress to stakeholders?"
],
"problem_solving": [
"Walk me through your thought process step by step",
"What alternative approaches did you consider?",
"How did you validate your solution worked?",
"What did you learn from this experience?"
],
"collaboration": [
"How did you build consensus among the different stakeholders?",
"What communication channels did you use to keep everyone aligned?",
"How did you handle disagreements or conflicts?",
"What would you do to improve collaboration in the future?"
]
}
def generate_question_bank(self, role: str, level: str = "senior",
competencies: Optional[List[str]] = None,
question_types: Optional[List[str]] = None,
num_questions: int = 20) -> Dict[str, Any]:
"""Generate a comprehensive question bank for the specified role and competencies."""
# Normalize inputs
role_key = self._normalize_role(role)
level_key = level.lower()
# Get competency requirements
role_competencies = self._get_role_competencies(role_key, level_key, competencies)
# Determine question types to include
if question_types is None:
question_types = ["technical", "behavioral", "situational"]
# Generate questions
questions = self._generate_questions(role_competencies, question_types, level_key, num_questions)
# Create scoring rubrics
scoring_rubrics = self._create_scoring_rubrics(questions)
# Generate follow-up probes
follow_up_probes = self._generate_follow_up_probes(questions)
# Create calibration examples
calibration_examples = self._create_calibration_examples(questions[:5]) # Sample for first 5 questions
return {
"role": role,
"level": level,
"competencies": role_competencies,
"question_types": question_types,
"generated_at": datetime.now().isoformat(),
"total_questions": len(questions),
"questions": questions,
"scoring_rubrics": scoring_rubrics,
"follow_up_probes": follow_up_probes,
"calibration_examples": calibration_examples,
"usage_guidelines": self._generate_usage_guidelines(role_key, level_key)
}
def _normalize_role(self, role: str) -> str:
"""Normalize role name to match competency mapping keys."""
role_lower = role.lower().replace(" ", "_").replace("-", "_")
# Map variations to standard roles
role_mappings = {
"software_engineer": ["engineer", "developer", "swe", "software_developer"],
"frontend_engineer": ["frontend", "front_end", "ui_engineer", "web_developer"],
"backend_engineer": ["backend", "back_end", "server_engineer", "api_developer"],
"product_manager": ["pm", "product", "product_owner", "po"],
"data_scientist": ["ds", "data", "analyst", "ml_engineer"],
"designer": ["ux", "ui", "ux_ui", "product_designer", "visual_designer"],
"devops_engineer": ["devops", "sre", "platform_engineer", "infrastructure"]
}
for standard_role, variations in role_mappings.items():
if any(var in role_lower for var in variations):
return standard_role
# Default fallback
return "software_engineer"
def _get_role_competencies(self, role_key: str, level_key: str,
custom_competencies: Optional[List[str]]) -> List[str]:
"""Get competencies for the role and level."""
if role_key not in self.competency_mapping:
role_key = "software_engineer"
role_mapping = self.competency_mapping[role_key]
competencies = role_mapping["core_competencies"].copy()
# Add level-specific competencies
if level_key in role_mapping["level_specific"]:
competencies.extend(role_mapping["level_specific"][level_key])
elif "senior" in role_mapping["level_specific"]:
competencies.extend(role_mapping["level_specific"]["senior"])
# Add custom competencies if specified
if custom_competencies:
competencies.extend([comp.strip() for comp in custom_competencies if comp.strip() not in competencies])
return list(set(competencies)) # Remove duplicates
def _generate_questions(self, competencies: List[str], question_types: List[str],
level: str, num_questions: int) -> List[Dict[str, Any]]:
"""Generate questions based on competencies and types."""
questions = []
questions_per_competency = max(1, num_questions // len(competencies))
for competency in competencies:
competency_questions = []
# Add technical questions if requested and available
if "technical" in question_types and competency in self.technical_questions:
tech_questions = []
# Get questions for current level and below
level_order = ["junior", "mid", "senior", "staff", "principal"]
current_level_idx = level_order.index(level) if level in level_order else 2
for lvl_idx in range(current_level_idx + 1):
lvl = level_order[lvl_idx]
if lvl in self.technical_questions[competency]:
tech_questions.extend(self.technical_questions[competency][lvl])
competency_questions.extend(tech_questions[:questions_per_competency])
# Add behavioral questions if requested
if "behavioral" in question_types and competency in self.behavioral_questions:
behavioral_q = self.behavioral_questions[competency][:questions_per_competency]
competency_questions.extend(behavioral_q)
# Add situational questions (variations of behavioral)
if "situational" in question_types:
situational_q = self._generate_situational_questions(competency, questions_per_competency)
competency_questions.extend(situational_q)
# Ensure we have enough questions for this competency
while len(competency_questions) < questions_per_competency:
competency_questions.extend(self._generate_fallback_questions(competency, level))
if len(competency_questions) >= questions_per_competency:
break
questions.extend(competency_questions[:questions_per_competency])
# Shuffle and limit to requested number
random.shuffle(questions)
return questions[:num_questions]
def _generate_situational_questions(self, competency: str, count: int) -> List[Dict[str, Any]]:
"""Generate situational questions for a competency."""
situational_templates = {
"leadership": [
{
"question": "You're leading a project that's behind schedule and the client is unhappy. How do you handle this situation?",
"competency": competency,
"type": "situational",
"focus_areas": ["crisis_management", "client_communication", "team_leadership"]
}
],
"collaboration": [
{
"question": "You're working on a cross-functional project and two team members have opposing views on the technical approach. How do you resolve this?",
"competency": competency,
"type": "situational",
"focus_areas": ["conflict_resolution", "technical_decision_making", "facilitation"]
}
],
"problem_solving": [
{
"question": "You've been assigned to improve the performance of a critical system, but you have limited time and budget. Walk me through your approach.",
"competency": competency,
"type": "situational",
"focus_areas": ["prioritization", "resource_constraints", "systematic_approach"]
}
]
}
if competency in situational_templates:
return situational_templates[competency][:count]
return []
def _generate_fallback_questions(self, competency: str, level: str) -> List[Dict[str, Any]]:
"""Generate fallback questions when specific ones aren't available."""
fallback_questions = [
{
"question": f"Describe your experience with {competency.replace('_', ' ')} in your current or previous role.",
"competency": competency,
"type": "experience",
"focus_areas": ["experience_depth", "practical_application"]
},
{
"question": f"What challenges have you faced related to {competency.replace('_', ' ')} and how did you overcome them?",
"competency": competency,
"type": "challenge_based",
"focus_areas": ["problem_solving", "learning_from_experience"]
}
]
return fallback_questions
def _create_scoring_rubrics(self, questions: List[Dict[str, Any]]) -> Dict[str, Dict]:
"""Create scoring rubrics for the generated questions."""
rubrics = {}
for i, question in enumerate(questions, 1):
question_key = f"question_{i}"
question_type = question.get("type", "behavioral")
if question_type in self.scoring_rubrics:
rubrics[question_key] = {
"question": question["question"],
"competency": question["competency"],
"type": question_type,
"scoring_criteria": self.scoring_rubrics[question_type],
"weight": self._determine_question_weight(question),
"time_limit": question.get("time_limit", 30)
}
return rubrics
def _determine_question_weight(self, question: Dict[str, Any]) -> str:
"""Determine the weight/importance of a question."""
competency = question.get("competency", "")
question_type = question.get("type", "")
difficulty = question.get("difficulty", "medium")
# Core competencies get higher weight
core_competencies = ["coding_fundamentals", "system_design", "leadership", "problem_solving"]
if competency in core_competencies:
return "high"
elif question_type in ["coding", "design"] or difficulty == "hard":
return "high"
elif difficulty == "easy":
return "medium"
else:
return "medium"
def _generate_follow_up_probes(self, questions: List[Dict[str, Any]]) -> Dict[str, List[str]]:
"""Generate follow-up probes for each question."""
probes = {}
for i, question in enumerate(questions, 1):
question_key = f"question_{i}"
competency = question.get("competency", "")
# Get competency-specific follow-ups
if competency in self.follow_up_strategies:
competency_probes = self.follow_up_strategies[competency].copy()
else:
competency_probes = [
"Can you provide more specific details about your approach?",
"What would you do differently if you had to do this again?",
"What challenges did you face and how did you overcome them?"
]
# Add question-type specific probes
question_type = question.get("type", "")
if question_type == "coding":
competency_probes.extend([
"How would you test this solution?",
"What's the time and space complexity of your approach?",
"Can you think of any optimizations?"
])
elif question_type == "behavioral":
competency_probes.extend([
"What did you learn from this experience?",
"How did others react to your approach?",
"What metrics did you use to measure success?"
])
elif question_type == "design":
competency_probes.extend([
"How would you handle failure scenarios?",
"What monitoring would you implement?",
"How would this scale to 10x the load?"
])
probes[question_key] = competency_probes[:5] # Limit to 5 follow-ups
return probes
def _create_calibration_examples(self, sample_questions: List[Dict[str, Any]]) -> Dict[str, Dict]:
"""Create calibration examples with poor/good/great answers."""
examples = {}
for i, question in enumerate(sample_questions, 1):
question_key = f"question_{i}"
examples[question_key] = {
"question": question["question"],
"competency": question["competency"],
"sample_answers": {
"poor_answer": self._generate_sample_answer(question, "poor"),
"good_answer": self._generate_sample_answer(question, "good"),
"great_answer": self._generate_sample_answer(question, "great")
},
"scoring_rationale": self._generate_scoring_rationale(question)
}
return examples
def _generate_sample_answer(self, question: Dict[str, Any], quality: str) -> Dict[str, str]:
"""Generate sample answers of different quality levels."""
competency = question.get("competency", "")
question_type = question.get("type", "")
if quality == "poor":
return {
"answer": f"Sample poor answer for {competency} question - lacks detail, specificity, or demonstrates weak competency",
"score": "1-2",
"issues": ["Vague response", "Limited evidence of competency", "Poor structure"]
}
elif quality == "good":
return {
"answer": f"Sample good answer for {competency} question - adequate detail, demonstrates competency clearly",
"score": "3",
"strengths": ["Clear structure", "Demonstrates competency", "Adequate detail"]
}
else: # great
return {
"answer": f"Sample excellent answer for {competency} question - exceptional detail, strong evidence, goes above and beyond",
"score": "4",
"strengths": ["Exceptional detail", "Strong evidence", "Strategic thinking", "Goes beyond requirements"]
}
def _generate_scoring_rationale(self, question: Dict[str, Any]) -> Dict[str, str]:
"""Generate rationale for scoring this question."""
competency = question.get("competency", "")
return {
"key_indicators": f"Look for evidence of {competency.replace('_', ' ')} competency",
"red_flags": "Vague answers, lack of specifics, negative outcomes without learning",
"green_flags": "Specific examples, clear impact, demonstrates growth and learning"
}
def _generate_usage_guidelines(self, role_key: str, level_key: str) -> Dict[str, Any]:
"""Generate usage guidelines for the question bank."""
return {
"interview_flow": {
"warm_up": "Start with 1-2 easier questions to build rapport",
"core_assessment": "Focus majority of time on core competency questions",
"closing": "End with questions about candidate's questions/interests"
},
"time_management": {
"technical_questions": "Allow extra time for coding/design questions",
"behavioral_questions": "Keep to time limits but allow for follow-ups",
"total_recommendation": "45-75 minutes per interview round"
},
"question_selection": {
"variety": "Mix question types within each competency area",
"difficulty": "Adjust based on candidate responses and energy",
"customization": "Adapt questions based on candidate's background"
},
"common_mistakes": [
"Don't ask all questions mechanically",
"Don't skip follow-up questions",
"Don't forget to assess cultural fit alongside competencies",
"Don't let one strong/weak area bias overall assessment"
],
"calibration_reminders": [
"Compare against role standard, not other candidates",
"Focus on evidence demonstrated, not potential",
"Consider level-appropriate expectations",
"Document specific examples in feedback"
]
}
def format_human_readable(question_bank: Dict[str, Any]) -> str:
"""Format question bank data in human-readable format."""
output = []
# Header
output.append(f"Interview Question Bank: {question_bank['role']} ({question_bank['level'].title()} Level)")
output.append("=" * 70)
output.append(f"Generated: {question_bank['generated_at']}")
output.append(f"Total Questions: {question_bank['total_questions']}")
output.append(f"Question Types: {', '.join(question_bank['question_types'])}")
output.append(f"Target Competencies: {', '.join(question_bank['competencies'])}")
output.append("")
# Questions
output.append("INTERVIEW QUESTIONS")
output.append("-" * 50)
for i, question in enumerate(question_bank['questions'], 1):
output.append(f"\n{i}. {question['question']}")
output.append(f" Competency: {question['competency'].replace('_', ' ').title()}")
output.append(f" Type: {question.get('type', 'N/A').title()}")
if 'time_limit' in question:
output.append(f" Time Limit: {question['time_limit']} minutes")
if 'focus_areas' in question:
output.append(f" Focus Areas: {', '.join(question['focus_areas'])}")
# Scoring Guidelines
output.append("\n\nSCORING RUBRICS")
output.append("-" * 50)
# Show sample scoring criteria
if question_bank['scoring_rubrics']:
first_question = list(question_bank['scoring_rubrics'].keys())[0]
sample_rubric = question_bank['scoring_rubrics'][first_question]
output.append(f"Sample Scoring Criteria ({sample_rubric['type']} questions):")
for criterion, scores in sample_rubric['scoring_criteria'].items():
output.append(f"\n{criterion.replace('_', ' ').title()}:")
for score, description in scores.items():
output.append(f" {score}: {description}")
# Follow-up Probes
output.append("\n\nFOLLOW-UP PROBE EXAMPLES")
output.append("-" * 50)
if question_bank['follow_up_probes']:
first_question = list(question_bank['follow_up_probes'].keys())[0]
sample_probes = question_bank['follow_up_probes'][first_question]
output.append("Sample follow-up questions:")
for probe in sample_probes[:3]: # Show first 3
output.append(f" • {probe}")
# Usage Guidelines
output.append("\n\nUSAGE GUIDELINES")
output.append("-" * 50)
guidelines = question_bank['usage_guidelines']
output.append("Interview Flow:")
for phase, description in guidelines['interview_flow'].items():
output.append(f" • {phase.replace('_', ' ').title()}: {description}")
output.append("\nTime Management:")
for aspect, recommendation in guidelines['time_management'].items():
output.append(f" • {aspect.replace('_', ' ').title()}: {recommendation}")
output.append("\nCommon Mistakes to Avoid:")
for mistake in guidelines['common_mistakes'][:3]: # Show first 3
output.append(f" • {mistake}")
# Calibration Examples (if available)
if question_bank['calibration_examples']:
output.append("\n\nCALIBRATION EXAMPLES")
output.append("-" * 50)
first_example = list(question_bank['calibration_examples'].values())[0]
output.append(f"Question: {first_example['question']}")
output.append("\nSample Answer Quality Levels:")
for quality, details in first_example['sample_answers'].items():
output.append(f" {quality.replace('_', ' ').title()} (Score {details['score']}):")
if 'issues' in details:
output.append(f" Issues: {', '.join(details['issues'])}")
if 'strengths' in details:
output.append(f" Strengths: {', '.join(details['strengths'])}")
return "\n".join(output)
def main():
parser = argparse.ArgumentParser(description="Generate comprehensive interview question banks with scoring criteria")
parser.add_argument("--role", type=str, help="Job role title (e.g., 'Frontend Engineer')")
parser.add_argument("--level", type=str, default="senior", help="Experience level (junior, mid, senior, staff, principal)")
parser.add_argument("--competencies", type=str, help="Comma-separated list of competencies to focus on")
parser.add_argument("--question-types", type=str, help="Comma-separated list of question types (technical, behavioral, situational)")
parser.add_argument("--num-questions", type=int, default=20, help="Number of questions to generate")
parser.add_argument("--input", type=str, help="Input JSON file with role requirements")
parser.add_argument("--output", type=str, help="Output directory or file path")
parser.add_argument("--format", choices=["json", "text", "both"], default="both", help="Output format")
args = parser.parse_args()
generator = QuestionBankGenerator()
# Handle input
if args.input:
try:
with open(args.input, 'r') as f:
role_data = json.load(f)
role = role_data.get('role') or role_data.get('title', '')
level = role_data.get('level', 'senior')
competencies = role_data.get('competencies')
question_types = role_data.get('question_types')
num_questions = role_data.get('num_questions', 20)
except Exception as e:
print(f"Error reading input file: {e}")
sys.exit(1)
else:
if not args.role:
print("Error: --role is required when not using --input")
sys.exit(1)
role = args.role
level = args.level
competencies = args.competencies.split(',') if args.competencies else None
question_types = args.question_types.split(',') if args.question_types else None
num_questions = args.num_questions
# Generate question bank
try:
question_bank = generator.generate_question_bank(
role=role,
level=level,
competencies=competencies,
question_types=question_types,
num_questions=num_questions
)
# Handle output
if args.output:
output_path = args.output
if os.path.isdir(output_path):
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_questions"
json_path = os.path.join(output_path, f"{base_filename}.json")
text_path = os.path.join(output_path, f"{base_filename}.txt")
else:
json_path = output_path if output_path.endswith('.json') else f"{output_path}.json"
text_path = output_path.replace('.json', '.txt') if output_path.endswith('.json') else f"{output_path}.txt"
else:
safe_role = "".join(c for c in role.lower() if c.isalnum() or c in (' ', '-', '_')).replace(' ', '_')
base_filename = f"{safe_role}_{level}_questions"
json_path = f"{base_filename}.json"
text_path = f"{base_filename}.txt"
# Write outputs
if args.format in ["json", "both"]:
with open(json_path, 'w') as f:
json.dump(question_bank, f, indent=2, default=str)
print(f"JSON output written to: {json_path}")
if args.format in ["text", "both"]:
with open(text_path, 'w') as f:
f.write(format_human_readable(question_bank))
print(f"Text output written to: {text_path}")
# Print summary
print(f"\nQuestion Bank Summary:")
print(f"Role: {question_bank['role']} ({question_bank['level'].title()})")
print(f"Total Questions: {question_bank['total_questions']}")
print(f"Competencies Covered: {len(question_bank['competencies'])}")
print(f"Question Types: {', '.join(question_bank['question_types'])}")
except Exception as e:
print(f"Error generating question bank: {e}")
sys.exit(1)
if __name__ == "__main__":
main() Interview Bias Mitigation Checklist
This comprehensive checklist helps identify, prevent, and mitigate various forms of bias in the interview process. Use this as a systematic guide to ensure fair and equitable hiring practices.
Pre-Interview Phase
Job Description & Requirements
- Remove unnecessary requirements that don't directly relate to job performance
- Avoid gendered language (competitive, aggressive vs. collaborative, detail-oriented)
- Remove university prestige requirements unless absolutely necessary for role
- Focus on skills and outcomes rather than years of experience in specific technologies
- Use inclusive language and avoid cultural assumptions
- Specify only essential requirements vs. nice-to-have qualifications
- Remove location/commute assumptions for remote-eligible positions
- Review requirements for unconscious bias (e.g., assuming continuous work history)
Sourcing & Pipeline
- Diversify sourcing channels beyond traditional networks
- Partner with diverse professional organizations and communities
- Use bias-minimizing sourcing tools and platforms
- Track sourcing effectiveness by demographic groups
- Train recruiters on bias awareness and inclusive outreach
- Review referral patterns for potential network bias
- Expand university partnerships beyond elite institutions
- Use structured outreach messages to reduce individual bias
Resume Screening
- Implement blind resume review (remove names, photos, university names initially)
- Use standardized screening criteria applied consistently
- Multiple screeners for each resume with independent scoring
- Focus on relevant skills and achievements over pedigree indicators
- Avoid assumptions about career gaps or non-traditional backgrounds
- Consider alternative paths to skills (bootcamps, self-taught, career changes)
- Track screening pass rates by demographic groups
- Regular screener calibration sessions on bias awareness
Interview Panel Composition
Diversity Requirements
- Ensure diverse interview panels (gender, ethnicity, seniority levels)
- Include at least one underrepresented interviewer when possible
- Rotate panel assignments to prevent bias patterns
- Balance seniority levels on panels (not all senior or all junior)
- Include cross-functional perspectives when relevant
- Avoid panels of only one demographic group when possible
- Consider panel member unconscious bias training status
- Document panel composition rationale for future review
Interviewer Selection
- Choose interviewers based on relevant competency assessment ability
- Ensure interviewers have completed bias training within last 12 months
- Select interviewers with consistent calibration history
- Avoid interviewers with known bias patterns (flagged in previous analyses)
- Include at least one interviewer familiar with candidate's background type
- Balance perspectives (technical depth, cultural fit, growth potential)
- Consider interviewer availability for proper preparation time
- Ensure interviewers understand role requirements and standards
Interview Process Design
Question Standardization
- Use standardized question sets for each competency area
- Develop questions that assess skills, not culture fit stereotypes
- Avoid questions about personal background unless directly job-relevant
- Remove questions that could reveal protected characteristics
- Focus on behavioral examples using STAR method
- Include scenario-based questions with clear evaluation criteria
- Test questions for potential bias with diverse interviewers
- Regularly update question bank based on effectiveness data
Structured Interview Protocol
- Define clear time allocations for each question/section
- Establish consistent interview flow across all candidates
- Create standardized intro/outro processes
- Use identical technical setup and tools for all candidates
- Provide same background information to all interviewers
- Standardize note-taking format and requirements
- Define clear handoff procedures between interviewers
- Document any deviations from standard protocol
Accommodation Preparation
- Proactively offer accommodations without requiring disclosure
- Provide multiple interview format options (phone, video, in-person)
- Ensure accessibility of interview locations and tools
- Allow extended time when requested or needed
- Provide materials in advance when helpful
- Train interviewers on accommodation protocols
- Test all technology for accessibility compliance
- Have backup plans for technical issues
During the Interview
Interviewer Behavior
- Use welcoming, professional tone with all candidates
- Avoid assumptions based on appearance or background
- Give equal encouragement and support to all candidates
- Allow equal time for candidate questions
- Avoid leading questions that suggest desired answers
- Listen actively without interrupting unnecessarily
- Take detailed notes focusing on responses, not impressions
- Avoid small talk that could reveal irrelevant personal information
Question Delivery
- Ask questions as written without improvisation that could introduce bias
- Provide equal clarification when candidates ask for it
- Use consistent follow-up probing across candidates
- Allow reasonable thinking time before expecting responses
- Avoid rephrasing questions in ways that give hints
- Stay focused on defined competencies being assessed
- Give equal encouragement for elaboration when needed
- Maintain professional demeanor regardless of candidate background
Real-time Bias Checking
- Notice first impressions but don't let them drive assessment
- Question gut reactions - are they based on competency evidence?
- Focus on specific examples and evidence provided
- Avoid pattern matching to existing successful employees
- Notice cultural assumptions in interpretation of responses
- Check for confirmation bias - seeking evidence to support initial impressions
- Consider alternative explanations for candidate responses
- Stay aware of fatigue effects on judgment throughout the day
Evaluation & Scoring
Scoring Consistency
- Use defined rubrics consistently across all candidates
- Score immediately after interview while details are fresh
- Focus scoring on demonstrated competencies not potential or personality
- Provide specific evidence for each score given
- Avoid comparative scoring (comparing candidates to each other)
- Use calibrated examples of each score level
- Score independently before discussing with other interviewers
- Document reasoning for all scores, especially extreme ones (1s and 4s)
Bias Check Questions
- "Would I score this differently if the candidate looked different?"
- "Am I basing this on evidence or assumptions?"
- "Would this response get the same score from a different demographic?"
- "Am I penalizing non-traditional backgrounds or approaches?"
- "Is my scoring consistent with the defined rubric?"
- "Am I letting one strong/weak area bias overall assessment?"
- "Are my cultural assumptions affecting interpretation?"
- "Would I want to work with this person?" (Check if this is biasing assessment)
Documentation Requirements
- Record specific examples supporting each competency score
- Avoid subjective language like "seems like," "appears to be"
- Focus on observable behaviors and concrete responses
- Note exact quotes when relevant to assessment
- Distinguish between facts and interpretations
- Provide improvement suggestions that are skill-based, not person-based
- Avoid comparative language to other candidates or employees
- Use neutral language free from cultural assumptions
Debrief Process
Structured Discussion
- Start with independent score sharing before discussion
- Focus discussion on evidence not impressions or feelings
- Address significant score discrepancies with evidence review
- Challenge biased language or assumptions in discussion
- Ensure all voices are heard in group decision making
- Document reasons for final decision with specific evidence
- Avoid personality-based discussions ("culture fit" should be evidence-based)
- Consider multiple perspectives on candidate responses
Decision-Making Process
- Use weighted scoring system based on role requirements
- Require minimum scores in critical competency areas
- Avoid veto power unless based on clear, documented evidence
- Consider growth potential fairly across all candidates
- Document dissenting opinions and reasoning
- Use tie-breaking criteria that are predetermined and fair
- Consider additional data collection if team is split
- Make final decision based on role requirements, not team preferences
Final Recommendations
- Provide specific, actionable feedback for development areas
- Focus recommendations on skills and competencies
- Avoid language that could reflect bias in written feedback
- Consider onboarding needs based on actual skill gaps, not assumptions
- Provide coaching recommendations that are evidence-based
- Avoid personal judgments about candidate character or personality
- Make hiring recommendation based solely on job-relevant criteria
- Document any concerns with specific, observable evidence
Post-Interview Monitoring
Data Collection
- Track interviewer scoring patterns for consistency analysis
- Monitor pass rates by demographic groups
- Collect candidate experience feedback on interview fairness
- Analyze score distributions for potential bias indicators
- Track time-to-decision across different candidate types
- Monitor offer acceptance rates by demographics
- Collect new hire performance data for process validation
- Document any bias incidents or concerns raised
Regular Analysis
- Conduct quarterly bias audits of interview data
- Review interviewer calibration and identify outliers
- Analyze demographic trends in hiring outcomes
- Compare candidate experience surveys across groups
- Track correlation between interview scores and job performance
- Review and update bias mitigation strategies based on data
- Share findings with interview teams for continuous improvement
- Update training programs based on identified bias patterns
Bias Types to Watch For
Affinity Bias
- Definition: Favoring candidates similar to yourself
- Watch for: Over-positive response to shared backgrounds, interests, or experiences
- Mitigation: Focus on job-relevant competencies, diversify interview panels
Halo/Horn Effect
- Definition: One positive/negative trait influencing overall assessment
- Watch for: Strong performance in one area affecting scores in unrelated areas
- Mitigation: Score each competency independently, use structured evaluation
Confirmation Bias
- Definition: Seeking information that confirms initial impressions
- Watch for: Asking follow-ups that lead candidate toward expected responses
- Mitigation: Use standardized questions, consider alternative interpretations
Attribution Bias
- Definition: Attributing success/failure to different causes based on candidate demographics
- Watch for: Assuming women are "lucky" vs. men are "skilled" for same achievements
- Mitigation: Focus on candidate's role in achievements, avoid assumptions
Cultural Bias
- Definition: Judging candidates based on cultural differences rather than job performance
- Watch for: Penalizing communication styles, work approaches, or values that differ from team norm
- Mitigation: Define job-relevant criteria clearly, consider diverse perspectives valuable
Educational Bias
- Definition: Over-weighting prestigious educational credentials
- Watch for: Assuming higher capability based on school rank rather than demonstrated skills
- Mitigation: Focus on skills demonstration, consider alternative learning paths
Experience Bias
- Definition: Requiring specific company or industry experience unnecessarily
- Watch for: Discounting transferable skills from different industries or company sizes
- Mitigation: Define core skills needed, assess adaptability and learning ability
Emergency Bias Response Protocol
During Interview
- Pause the interview if significant bias is observed
- Privately address bias with interviewer if possible
- Document the incident for review
- Continue with fair assessment of candidate
- Flag for debrief discussion if interview continues
Post-Interview
- Report bias incidents to hiring manager/HR immediately
- Document specific behaviors observed
- Consider additional interviewer for second opinion
- Review candidate assessment for bias impact
- Implement corrective actions for future interviews
Interviewer Coaching
- Provide immediate feedback on bias observed
- Schedule bias training refresher if needed
- Monitor future interviews for improvement
- Consider removing from interview rotation if bias persists
- Document coaching provided for performance management
Legal Compliance Reminders
Protected Characteristics
- Age, race, color, religion, sex, national origin, disability status, veteran status
- Pregnancy, genetic information, sexual orientation, gender identity
- Any other characteristics protected by local/state/federal law
Prohibited Questions
- Questions about family planning, marital status, pregnancy
- Age-related questions (unless BFOQ)
- Religious or political affiliations
- Disability status (unless voluntary disclosure for accommodation)
- Arrest records (without conviction relevance)
- Financial status or credit (unless job-relevant)
Documentation Requirements
- Keep all interview materials for required retention period
- Ensure consistent documentation across all candidates
- Avoid documenting protected characteristic observations
- Focus documentation on job-relevant observations only
Training & Certification
Required Training Topics
- Unconscious bias awareness and mitigation
- Structured interviewing techniques
- Legal compliance in hiring
- Company-specific bias mitigation protocols
- Role-specific competency assessment
- Accommodation and accessibility requirements
Ongoing Development
- Annual bias training refresher
- Quarterly calibration sessions
- Regular updates on legal requirements
- Peer feedback and coaching
- Industry best practice updates
- Data-driven process improvements
This checklist should be reviewed and updated regularly based on legal requirements, industry best practices, and internal bias analysis results.
Competency Matrix Templates
This document provides comprehensive competency matrix templates for different engineering roles and levels. Use these matrices to design role-specific interview loops and evaluation criteria.
Software Engineering Competency Matrix
Technical Competencies
| Competency | Junior (L1-L2) | Mid (L3-L4) | Senior (L5-L6) | Staff+ (L7+) |
|---|---|---|---|---|
| Coding & Algorithms | Basic data structures, simple algorithms, language syntax | Advanced algorithms, complexity analysis, optimization | Complex problem solving, algorithm design, performance tuning | Architecture-level algorithmic decisions, novel approach design |
| System Design | Component interactions, basic scalability concepts | Service design, database modeling, API design | Distributed systems, scalability patterns, trade-off analysis | Large-scale architecture, cross-system design, technology strategy |
| Code Quality | Readable code, basic testing, follows conventions | Maintainable code, comprehensive testing, design patterns | Code reviews, quality standards, refactoring leadership | Engineering standards, quality culture, technical debt management |
| Debugging & Problem Solving | Basic debugging, structured problem approach | Complex debugging, root cause analysis, performance issues | System-wide debugging, production issues, incident response | Cross-system troubleshooting, preventive measures, tooling design |
| Domain Knowledge | Learning role-specific technologies | Proficiency in domain tools/frameworks | Deep domain expertise, technology evaluation | Domain leadership, technology roadmap, innovation |
Behavioral Competencies
| Competency | Junior (L1-L2) | Mid (L3-L4) | Senior (L5-L6) | Staff+ (L7+) |
|---|---|---|---|---|
| Communication | Clear status updates, asks good questions | Technical explanations, stakeholder updates | Cross-functional communication, technical writing | Executive communication, external representation, thought leadership |
| Collaboration | Team participation, code reviews | Cross-team projects, knowledge sharing | Team leadership, conflict resolution | Cross-org collaboration, culture building, strategic partnerships |
| Leadership & Influence | Peer mentoring, positive attitude | Junior mentoring, project ownership | Team guidance, technical decisions, hiring | Org-wide influence, vision setting, culture change |
| Growth & Learning | Skill development, feedback receptivity | Proactive learning, teaching others | Continuous improvement, trend awareness | Learning culture, industry leadership, innovation adoption |
| Ownership & Initiative | Task completion, quality focus | Project ownership, process improvement | Feature/service ownership, strategic thinking | Product/platform ownership, business impact, market influence |
Product Management Competency Matrix
Product Competencies
| Competency | Associate PM (L1-L2) | PM (L3-L4) | Senior PM (L5-L6) | Principal PM (L7+) |
|---|---|---|---|---|
| Product Strategy | Feature requirements, user stories | Product roadmaps, market analysis | Business strategy, competitive positioning | Portfolio strategy, market creation, platform vision |
| User Research & Analytics | Basic user interviews, metrics tracking | Research design, data interpretation | Research strategy, advanced analytics | Research culture, measurement frameworks, insight generation |
| Technical Understanding | Basic tech concepts, API awareness | System architecture, technical trade-offs | Technical strategy, platform decisions | Technology vision, architectural influence, innovation leadership |
| Execution & Process | Feature delivery, stakeholder coordination | Project management, cross-functional leadership | Process optimization, team scaling | Operational excellence, org design, strategic execution |
| Business Acumen | Revenue awareness, customer understanding | P&L understanding, business case development | Business strategy, market dynamics | Corporate strategy, board communication, investor relations |
Leadership Competencies
| Competency | Associate PM (L1-L2) | PM (L3-L4) | Senior PM (L5-L6) | Principal PM (L7+) |
|---|---|---|---|---|
| Stakeholder Management | Team collaboration, clear communication | Cross-functional alignment, expectation management | Executive communication, influence without authority | Board interaction, external partnerships, industry influence |
| Team Development | Peer learning, feedback sharing | Junior mentoring, knowledge transfer | Team building, hiring, performance management | Talent development, culture building, org leadership |
| Decision Making | Data-driven decisions, priority setting | Complex trade-offs, strategic choices | Ambiguous situations, high-stakes decisions | Strategic vision, transformational decisions, risk management |
| Innovation & Vision | Creative problem solving, user empathy | Market opportunity identification, feature innovation | Product vision, market strategy | Industry vision, disruptive thinking, platform creation |
Design Competency Matrix
Design Competencies
| Competency | Junior Designer (L1-L2) | Mid Designer (L3-L4) | Senior Designer (L5-L6) | Principal Designer (L7+) |
|---|---|---|---|---|
| Visual Design | UI components, typography, color theory | Design systems, visual hierarchy | Brand integration, advanced layouts | Visual strategy, brand evolution, design innovation |
| User Experience | User flows, wireframing, prototyping | Interaction design, usability testing | Experience strategy, journey mapping | UX vision, service design, behavioral insights |
| Research & Validation | User interviews, usability tests | Research planning, data synthesis | Research strategy, methodology design | Research culture, insight frameworks, market research |
| Design Systems | Component usage, style guides | System contribution, pattern creation | System architecture, governance | System strategy, scalable design, platform thinking |
| Tools & Craft | Design software proficiency, asset creation | Advanced techniques, workflow optimization | Tool evaluation, process design | Technology integration, future tooling, craft evolution |
Collaboration Competencies
| Competency | Junior Designer (L1-L2) | Mid Designer (L3-L4) | Senior Designer (L5-L6) | Principal Designer (L7+) |
|---|---|---|---|---|
| Cross-functional Partnership | Engineering collaboration, handoff quality | Product partnership, stakeholder alignment | Leadership collaboration, strategic alignment | Executive partnership, business strategy integration |
| Communication & Advocacy | Design rationale, feedback integration | Design presentations, user advocacy | Executive communication, design thinking evangelism | Industry thought leadership, external representation |
| Mentorship & Growth | Peer learning, skill sharing | Junior mentoring, critique facilitation | Team development, hiring, career guidance | Design culture, talent strategy, industry leadership |
| Business Impact | User-centered thinking, design quality | Feature success, user satisfaction | Business metrics, strategic impact | Market influence, competitive advantage, innovation leadership |
Data Science Competency Matrix
Technical Competencies
| Competency | Junior DS (L1-L2) | Mid DS (L3-L4) | Senior DS (L5-L6) | Principal DS (L7+) |
|---|---|---|---|---|
| Statistical Analysis | Descriptive stats, hypothesis testing | Advanced statistics, experimental design | Causal inference, advanced modeling | Statistical strategy, methodology innovation |
| Machine Learning | Basic ML algorithms, model training | Advanced ML, feature engineering | ML systems, model deployment | ML strategy, AI platform, research direction |
| Data Engineering | SQL, basic ETL, data cleaning | Pipeline design, data modeling | Platform architecture, scalable systems | Data strategy, infrastructure vision, governance |
| Programming & Tools | Python/R proficiency, visualization | Advanced programming, tool integration | Software engineering, system design | Technology strategy, platform development, innovation |
| Domain Expertise | Business understanding, metric interpretation | Domain modeling, insight generation | Strategic analysis, business integration | Market expertise, competitive intelligence, thought leadership |
Impact & Leadership Competencies
| Competency | Junior DS (L1-L2) | Mid DS (L3-L4) | Senior DS (L5-L6) | Principal DS (L7+) |
|---|---|---|---|---|
| Business Impact | Metric improvement, insight delivery | Project leadership, business case development | Strategic initiatives, P&L impact | Business transformation, market advantage, innovation |
| Communication | Technical reporting, visualization | Stakeholder presentations, executive briefings | Board communication, external representation | Industry leadership, thought leadership, market influence |
| Team Leadership | Peer collaboration, knowledge sharing | Junior mentoring, project management | Team building, hiring, culture development | Organizational leadership, talent strategy, vision setting |
| Innovation & Research | Algorithm implementation, experimentation | Research projects, publication | Research strategy, academic partnerships | Research vision, industry influence, breakthrough innovation |
DevOps Engineering Competency Matrix
Technical Competencies
| Competency | Junior DevOps (L1-L2) | Mid DevOps (L3-L4) | Senior DevOps (L5-L6) | Principal DevOps (L7+) |
|---|---|---|---|---|
| Infrastructure | Basic cloud services, server management | Infrastructure automation, containerization | Platform architecture, multi-cloud strategy | Infrastructure vision, emerging technologies, industry standards |
| CI/CD & Automation | Pipeline basics, script writing | Advanced pipelines, deployment automation | Platform design, workflow optimization | Automation strategy, developer experience, productivity platforms |
| Monitoring & Observability | Basic monitoring, log analysis | Advanced monitoring, alerting systems | Observability strategy, SLA/SLI design | Monitoring vision, reliability engineering, performance culture |
| Security & Compliance | Security basics, access management | Security automation, compliance frameworks | Security architecture, risk management | Security strategy, governance, industry leadership |
| Performance & Scalability | Performance monitoring, basic optimization | Capacity planning, performance tuning | Scalability architecture, cost optimization | Performance strategy, efficiency platforms, innovation |
Leadership & Impact Competencies
| Competency | Junior DevOps (L1-L2) | Mid DevOps (L3-L4) | Senior DevOps (L5-L6) | Principal DevOps (L7+) |
|---|---|---|---|---|
| Developer Experience | Tool support, documentation | Platform development, self-service tools | Developer productivity, workflow design | Developer platform vision, industry best practices |
| Incident Management | Incident response, troubleshooting | Incident coordination, root cause analysis | Incident strategy, prevention systems | Reliability culture, organizational resilience |
| Team Collaboration | Cross-team support, knowledge sharing | Process improvement, training delivery | Culture building, practice evangelism | Organizational transformation, industry influence |
| Strategic Impact | Operational excellence, cost awareness | Efficiency improvements, platform adoption | Strategic initiatives, business enablement | Technology strategy, competitive advantage, market leadership |
Engineering Management Competency Matrix
People Leadership Competencies
| Competency | Manager (L1-L2) | Senior Manager (L3-L4) | Director (L5-L6) | VP+ (L7+) |
|---|---|---|---|---|
| Team Building | Hiring, onboarding, 1:1s | Team culture, performance management | Multi-team coordination, org design | Organizational culture, talent strategy |
| Performance Management | Individual development, feedback | Performance systems, coaching | Calibration across teams, promotion standards | Talent development, succession planning |
| Communication | Team updates, stakeholder management | Executive communication, cross-functional alignment | Board updates, external communication | Industry representation, thought leadership |
| Conflict Resolution | Team conflicts, process improvements | Cross-team issues, organizational friction | Strategic alignment, cultural challenges | Corporate-level conflicts, crisis management |
Technical Leadership Competencies
| Competency | Manager (L1-L2) | Senior Manager (L3-L4) | Director (L5-L6) | VP+ (L7+) |
|---|---|---|---|---|
| Technical Vision | Team technical decisions, architecture input | Platform strategy, technology choices | Technical roadmap, innovation strategy | Technology vision, industry standards |
| System Ownership | Feature/service ownership, quality standards | Platform ownership, scalability planning | System portfolio, technical debt management | Technology strategy, competitive advantage |
| Process & Practice | Team processes, development practices | Engineering standards, quality systems | Process innovation, best practices | Engineering culture, industry influence |
| Technology Strategy | Tool evaluation, team technology choices | Platform decisions, technical investments | Technology portfolio, strategic architecture | Corporate technology strategy, market leadership |
Usage Guidelines
Assessment Approach
- Level Calibration: Use these matrices to calibrate expectations for each level within your organization
- Interview Design: Select competencies most relevant to the specific role and level being hired for
- Evaluation Consistency: Ensure all interviewers understand and apply the same competency standards
- Growth Planning: Use matrices for career development and promotion discussions
Customization Tips
- Industry Adaptation: Modify competencies based on your industry (fintech, healthcare, etc.)
- Company Stage: Adjust expectations based on startup vs. enterprise environment
- Team Needs: Emphasize competencies most critical for current team challenges
- Cultural Fit: Add company-specific values and cultural competencies
Common Pitfalls
- Unrealistic Expectations: Don't expect senior-level competencies from junior candidates
- One-Size-Fits-All: Customize competency emphasis based on role requirements
- Static Assessment: Regularly update matrices based on changing business needs
- Bias Introduction: Ensure competencies are measurable and don't introduce unconscious bias
Matrix Validation Process
Regular Review Cycle
- Quarterly: Review competency relevance and adjust weights
- Semi-annually: Update level expectations based on market standards
- Annually: Comprehensive review with stakeholder feedback
Stakeholder Input
- Hiring Managers: Validate role-specific competency requirements
- Current Team Members: Confirm level expectations match reality
- Recent Hires: Gather feedback on assessment accuracy
- HR Partners: Ensure legal compliance and bias mitigation
Continuous Improvement
- Performance Correlation: Track new hire performance against competency assessments
- Market Benchmarking: Compare standards with industry peers
- Feedback Integration: Incorporate interviewer and candidate feedback
- Bias Monitoring: Regular analysis of assessment patterns across demographics
Interview Debrief Facilitation Guide
This guide provides a comprehensive framework for conducting effective, unbiased interview debriefs that lead to consistent hiring decisions. Use this to facilitate productive discussions that focus on evidence-based evaluation.
Pre-Debrief Preparation
Facilitator Responsibilities
- Review all interviewer feedback before the meeting
- Identify significant score discrepancies that need discussion
- Prepare discussion agenda with time allocations
- Gather role requirements and competency framework
- Review any flags or special considerations noted during interviews
- Ensure all required materials are available (scorecards, rubrics, candidate resume)
- Set up meeting logistics (room, video conference, screen sharing)
- Send agenda to participants 30 minutes before meeting
Required Materials Checklist
- Candidate resume and application materials
- Job description and competency requirements
- Individual interviewer scorecards
- Scoring rubrics and competency definitions
- Interview notes and documentation
- Any technical assessments or work samples
- Company hiring standards and calibration examples
- Bias mitigation reminders and prompts
Participant Preparation Requirements
- All interviewers must complete independent scoring before debrief
- Submit written feedback with specific evidence for each competency
- Review scoring rubrics to ensure consistent interpretation
- Prepare specific examples to support scoring decisions
- Flag any concerns or unusual circumstances that affected assessment
- Avoid discussing candidate with other interviewers before debrief
- Come prepared to defend scores with concrete evidence
- Be ready to adjust scores based on additional evidence shared
Debrief Meeting Structure
Opening (5 minutes)
- State meeting purpose: Make hiring decision based on evidence
- Review agenda and time limits: Keep discussion focused and productive
- Remind of bias mitigation principles: Focus on competencies, not personality
- Confirm confidentiality: Discussion stays within hiring team
- Establish ground rules: One person speaks at a time, evidence-based discussion
Individual Score Sharing (10-15 minutes)
- Go around the room systematically - each interviewer shares scores independently
- No discussion or challenges yet - just data collection
- Record scores on shared document visible to all participants
- Note any abstentions or "insufficient data" responses
- Identify clear patterns and discrepancies without commentary
- Flag any scores requiring explanation (1s or 4s typically need strong evidence)
Competency-by-Competency Discussion (30-40 minutes)
For Each Core Competency:
1. Present Score Distribution (2 minutes)
- Display all scores for this competency
- Note range and any outliers
- Identify if consensus exists or discussion needed
2. Evidence Sharing (5-8 minutes per competency)
- Start with interviewers who assessed this competency directly
- Share specific examples and observations
- Focus on what candidate said/did, not interpretations
- Allow questions for clarification (not challenges yet)
3. Discussion and Calibration (3-5 minutes)
- Address significant discrepancies (>1 point difference)
- Challenge vague or potentially biased language
- Seek additional evidence if needed
- Allow score adjustments based on new information
- Reach consensus or note dissenting views
Structured Discussion Questions:
- "What specific evidence supports this score?"
- "Can you provide the exact example or quote?"
- "How does this compare to our rubric definition?"
- "Would this response receive the same score regardless of who gave it?"
- "Are we evaluating the competency or making assumptions?"
- "What would need to change for this to be the next level up/down?"
Overall Recommendation Discussion (10-15 minutes)
Weighted Score Calculation
- Apply competency weights based on role requirements
- Calculate overall weighted average
- Check minimum threshold requirements
- Consider any veto criteria (critical competency failures)
Final Recommendation Options
- Strong Hire: Exceeds requirements in most areas, clear value-add
- Hire: Meets requirements with growth potential
- No Hire: Doesn't meet minimum requirements for success
- Strong No Hire: Significant gaps that would impact team/company
Decision Rationale Documentation
- Summarize key strengths with specific evidence
- Identify development areas with specific examples
- Explain final recommendation with competency-based reasoning
- Note any dissenting opinions and reasoning
- Document onboarding considerations if hiring
Closing and Next Steps (5 minutes)
- Confirm final decision and documentation
- Assign follow-up actions (feedback delivery, offer preparation, etc.)
- Schedule any additional interviews if needed
- Review timeline for candidate communication
- Remind confidentiality of discussion and decision
Facilitation Best Practices
Creating Psychological Safety
- Encourage honest feedback without fear of judgment
- Validate different perspectives and assessment approaches
- Address power dynamics - ensure junior voices are heard
- Model vulnerability - admit when evidence changes your mind
- Focus on learning and calibration, not winning arguments
- Thank participants for thorough preparation and thoughtful input
Managing Difficult Conversations
When Scores Vary Significantly
- Acknowledge the discrepancy without judgment
- Ask for specific evidence from each scorer
- Look for different interpretations of the same data
- Consider if different questions revealed different competency levels
- Check for bias patterns in reasoning
- Allow time for reflection and potential score adjustments
When Someone Uses Biased Language
- Pause the conversation gently but firmly
- Ask for specific evidence behind the assessment
- Reframe in competency terms - "What specific skills did this demonstrate?"
- Challenge assumptions - "Help me understand how we know that"
- Redirect to rubric - "How does this align with our scoring criteria?"
- Document and follow up privately if bias persists
When the Discussion Gets Off Track
- Redirect to competencies: "Let's focus on the technical skills demonstrated"
- Ask for evidence: "What specific example supports that assessment?"
- Reference rubrics: "How does this align with our level 3 definition?"
- Manage time: "We have 5 minutes left on this competency"
- Table unrelated issues: "That's important but separate from this hire decision"
Encouraging Evidence-Based Discussion
Good Evidence Examples
- Direct quotes: "When asked about debugging, they said..."
- Specific behaviors: "They organized their approach by first..."
- Observable outcomes: "Their code compiled on first run and handled edge cases"
- Process descriptions: "They walked through their problem-solving step by step"
- Measurable results: "They identified 3 optimization opportunities"
Poor Evidence Examples
- Gut feelings: "They just seemed off"
- Comparisons: "Not as strong as our last hire"
- Assumptions: "Probably wouldn't fit our culture"
- Vague impressions: "Didn't seem passionate"
- Irrelevant factors: "Their background is different from ours"
Managing Group Dynamics
Ensuring Equal Participation
- Direct questions to quieter participants
- Prevent interrupting and ensure everyone finishes thoughts
- Balance speaking time across all interviewers
- Validate minority opinions even if not adopted
- Check for unheard perspectives before finalizing decisions
Handling Strong Personalities
- Set time limits for individual speaking
- Redirect monopolizers: "Let's hear from others on this"
- Challenge confidently stated opinions that lack evidence
- Support less assertive voices in expressing dissenting views
- Focus on data, not personality or seniority in decision making
Bias Interruption Strategies
Affinity Bias Interruption
- Notice pattern: Positive assessment seems based on shared background/interests
- Interrupt with: "Let's focus on the job-relevant skills they demonstrated"
- Redirect to: Specific competency evidence and measurable outcomes
- Document: Note if personal connection affected professional assessment
Halo/Horn Effect Interruption
- Notice pattern: One area strongly influencing assessment of unrelated areas
- Interrupt with: "Let's score each competency independently"
- Redirect to: Specific evidence for each individual competency area
- Recalibrate: Ask for separate examples supporting each score
Confirmation Bias Interruption
- Notice pattern: Only seeking/discussing evidence that supports initial impression
- Interrupt with: "What evidence might suggest a different assessment?"
- Redirect to: Consider alternative interpretations of the same data
- Challenge: "How might we be wrong about this assessment?"
Attribution Bias Interruption
- Notice pattern: Attributing success to luck/help for some demographics, skill for others
- Interrupt with: "What role did the candidate play in achieving this outcome?"
- Redirect to: Candidate's specific contributions and decision-making
- Standardize: Apply same attribution standards across all candidates
Decision Documentation Framework
Required Documentation Elements
- Final scores for each assessed competency
- Overall recommendation with supporting rationale
- Key strengths with specific evidence
- Development areas with specific examples
- Dissenting opinions if any, with reasoning
- Special considerations or accommodation needs
- Next steps and timeline for decision communication
Evidence Quality Standards
- Specific and observable: What exactly did the candidate do or say?
- Job-relevant: How does this relate to success in the role?
- Measurable: Can this be quantified or clearly described?
- Unbiased: Would this evidence be interpreted the same way regardless of candidate demographics?
- Complete: Does this represent the full picture of their performance in this area?
Writing Guidelines
- Use active voice and specific language
- Avoid assumptions about motivations or personality
- Focus on behaviors demonstrated during the interview
- Provide context for any unusual circumstances
- Be constructive in describing development areas
- Maintain professionalism and respect for candidate
Common Debrief Challenges and Solutions
Challenge: "I just don't think they'd fit our culture"
Solution:
- Ask for specific, observable evidence
- Define what "culture fit" means in job-relevant terms
- Challenge assumptions about cultural requirements
- Focus on ability to collaborate and contribute effectively
Challenge: Scores vary widely with no clear explanation
Solution:
- Review if different interviewers assessed different competencies
- Look for question differences that might explain variance
- Consider if candidate performance varied across interviews
- May need additional data gathering or interview
Challenge: Everyone loved/hated the candidate but can't articulate why
Solution:
- Push for specific evidence supporting emotional reactions
- Review competency rubrics together
- Look for halo/horn effects influencing overall impression
- Consider unconscious bias training for team
Challenge: Technical vs. non-technical interviewers disagree
Solution:
- Clarify which competencies each interviewer was assessing
- Ensure technical assessments carry appropriate weight
- Look for different perspectives on same evidence
- Consider specialist input for technical decisions
Challenge: Senior interviewer dominates decision making
Solution:
- Structure discussion to hear from all levels first
- Ask direct questions to junior interviewers
- Challenge opinions that lack supporting evidence
- Remember that assessment ability doesn't correlate with seniority
Challenge: Team wants to hire but scores don't support it
Solution:
- Review if rubrics match actual job requirements
- Check for consistent application of scoring standards
- Consider if additional competencies need assessment
- May indicate need for rubric calibration or role requirement review
Post-Debrief Actions
Immediate Actions (Same Day)
- Finalize decision documentation with all evidence
- Communicate decision to recruiting team
- Schedule candidate feedback delivery if applicable
- Update interview scheduling based on decision
- Note any process improvements needed for future
Follow-up Actions (Within 1 Week)
- Deliver candidate feedback (internal or external)
- Update interview feedback in tracking system
- Schedule any additional interviews if needed
- Begin offer process if hiring
- Document lessons learned for process improvement
Long-term Actions (Monthly/Quarterly)
- Analyze debrief effectiveness and decision quality
- Review interviewer calibration based on decisions
- Update rubrics based on debrief insights
- Provide additional training if bias patterns identified
- Share successful practices with other hiring teams
Continuous Improvement Framework
Debrief Effectiveness Metrics
- Decision consistency: Are similar candidates receiving similar decisions?
- Time to decision: Are debriefs completing within planned time?
- Participation quality: Are all interviewers contributing evidence-based input?
- Bias incidents: How often are bias interruptions needed?
- Decision satisfaction: Do participants feel good about the process and outcome?
Regular Review Process
- Monthly: Review debrief facilitation effectiveness and interviewer feedback
- Quarterly: Analyze decision patterns and potential bias indicators
- Semi-annually: Update debrief processes based on hiring outcome data
- Annually: Comprehensive review of debrief framework and training needs
Training and Calibration
- New facilitators: Shadow 3-5 debriefs before leading independently
- All facilitators: Quarterly calibration sessions on bias interruption
- Interviewer training: Include debrief participation expectations
- Leadership training: Ensure hiring managers can facilitate effectively
This guide should be adapted to your organization's specific needs while maintaining focus on evidence-based, unbiased decision making.
Interview Frameworks
Loop Design by Level
Junior/Mid
- Emphasize fundamentals, debugging, and growth potential.
- Keep loops concise with coding + behavioral validation.
Senior
- Add system design and leadership rounds.
- Evaluate tradeoff quality, mentoring, and cross-team collaboration.
Staff+
- Focus on architecture direction and organizational impact.
- Assess strategy, influence, and long-term technical judgment.
Competency Areas
- Technical depth (implementation, design, quality)
- Problem solving (ambiguity handling, prioritization)
- Collaboration (communication, stakeholder alignment)
- Leadership (ownership, mentoring, influence)
Scoring Rubric Baseline
4: exceeds level expectations with strong evidence3: meets expectations consistently2: partial signal with notable gaps1: does not meet baseline requirements
Calibration Guidelines
- Run recurring interviewer calibration sessions.
- Compare interviewer scoring variance across rounds.
- Track interview signal against new-hire outcomes.
- Use structured debriefs with independent scoring before discussion.
Bias-Reduction Baseline
- Standardize question banks per competency area.
- Keep scorecards evidence-based and behavior-specific.
- Use diverse interviewer panels where possible.
- Require written rationale for strong yes/no recommendations.
#!/usr/bin/env python3
"""Generate an interview loop plan by role and level."""
from __future__ import annotations
import argparse
import json
from typing import Dict, List
BASE_ROUNDS = {
"junior": [
("Screen", 45, "Fundamentals and communication"),
("Coding", 60, "Problem solving and code quality"),
("Behavioral", 45, "Collaboration and growth mindset"),
],
"mid": [
("Screen", 45, "Fundamentals and ownership"),
("Coding", 60, "Implementation quality"),
("System Design", 60, "Service/component design"),
("Behavioral", 45, "Stakeholder collaboration"),
],
"senior": [
("Screen", 45, "Depth and tradeoff reasoning"),
("Coding", 60, "Code quality and testing"),
("System Design", 75, "Scalability and reliability"),
("Leadership", 60, "Mentoring and decision making"),
("Behavioral", 45, "Cross-functional influence"),
],
"staff": [
("Screen", 45, "Strategic and technical depth"),
("Architecture", 90, "Org-level design decisions"),
("Technical Strategy", 60, "Long-term tradeoffs"),
("Influence", 60, "Cross-team leadership"),
("Behavioral", 45, "Values and executive communication"),
],
}
QUESTION_BANK = {
"coding": [
"Walk through your approach before coding and identify tradeoffs.",
"How would you test this implementation for edge cases?",
"What would you refactor if this code became a shared library?",
],
"system": [
"Design this system for 10x traffic growth in 12 months.",
"Where are the main failure modes and how would you detect them?",
"What components would you scale first and why?",
],
"leadership": [
"Describe a time you changed technical direction with incomplete information.",
"How do you raise the bar for code quality across a team?",
"How do you handle disagreement between product and engineering priorities?",
],
"behavioral": [
"Tell me about a high-stakes mistake and what changed afterward.",
"Describe a conflict where you had to influence without authority.",
"How do you support underperforming teammates?",
],
}
def normalize_level(level: str) -> str:
level = level.strip().lower()
if level in {"staff+", "principal", "lead"}:
return "staff"
if level not in BASE_ROUNDS:
raise ValueError(f"Unsupported level: {level}")
return level
def suggested_questions(round_name: str) -> List[str]:
name = round_name.lower()
if "coding" in name:
return QUESTION_BANK["coding"]
if "system" in name or "architecture" in name:
return QUESTION_BANK["system"]
if "lead" in name or "influence" in name or "strategy" in name:
return QUESTION_BANK["leadership"]
return QUESTION_BANK["behavioral"]
def generate_plan(role: str, level: str) -> Dict[str, object]:
normalized = normalize_level(level)
rounds = []
for idx, (name, minutes, focus) in enumerate(BASE_ROUNDS[normalized], start=1):
rounds.append(
{
"round": idx,
"name": name,
"duration_minutes": minutes,
"focus": focus,
"suggested_questions": suggested_questions(name),
}
)
return {
"role": role,
"level": normalized,
"total_rounds": len(rounds),
"total_minutes": sum(r["duration_minutes"] for r in rounds),
"rounds": rounds,
}
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Generate an interview loop plan for a role and level.")
parser.add_argument("--role", required=True, help="Role name (e.g., Senior Software Engineer)")
parser.add_argument("--level", required=True, help="Level: junior|mid|senior|staff")
parser.add_argument("--json", action="store_true", help="Output as JSON")
return parser.parse_args()
def main() -> int:
args = parse_args()
plan = generate_plan(args.role, args.level)
if args.json:
print(json.dumps(plan, indent=2))
else:
print(f"Interview Plan: {plan['role']} ({plan['level']})")
print(f"Total rounds: {plan['total_rounds']} | Total time: {plan['total_minutes']} minutes")
print("")
for r in plan["rounds"]:
print(f"Round {r['round']}: {r['name']} ({r['duration_minutes']} min)")
print(f"Focus: {r['focus']}")
for q in r["suggested_questions"]:
print(f"- {q}")
print("")
return 0
if __name__ == "__main__":
raise SystemExit(main())
Install this Skill
Skills give your AI agent a consistent, structured approach to this task — better output than a one-off prompt.
npx skills add alirezarezvani/claude-skills --skill engineering/interview-system-designer Community skill by @alirezarezvani. Need a walkthrough? See the install guide →
Works with
Prefer no terminal? Download the ZIP and place it manually.
Details
- Category
- Productivity
- License
- MIT
- Author
- @alirezarezvani
- Source
- GitHub →
- Source file
-
show path
engineering/interview-system-designer/SKILL.md
People who install this also use
CHRO Advisor
Human resources leadership — hiring strategy, compensation benchmarking, org structure design, culture development, and people operations at scale.
@alirezarezvani
Senior Software Architect
Design system architecture with C4 and sequence diagrams, write Architecture Decision Records, evaluate tech stacks, and guide architectural trade-offs.
@alirezarezvani
CTO Advisor
Technical leadership guidance — engineering team scaling, technology strategy, build vs. buy decisions, and architecture at the executive level.
@alirezarezvani