xAI Human Data - Software Engineering
Contract work evaluating AI-generated code for coding-model training and benchmarking
Overview
I contract with xAI's Human Data software-engineering track to create and review high-quality coding data for training, benchmarking, and improving large language models.
Work Scope
- Evaluate AI-generated code across correctness, maintainability, performance, security, test coverage, task alignment, and production plausibility.
- Rank model outputs using evidence from prompts, diffs, traces, responses, tests, and container checks.
- Write concise, evidence-grounded justifications for those judgments, for use in model training and benchmarking workflows.
- Review coding tasks in Python, TypeScript/JavaScript, Java, Go, Rust, and C/C++, spanning databases, distributed systems, AI/ML, security, and performance.
Why It Matters
This work sits close to the quality layer of AI coding systems. It means identifying which outputs actually satisfy real software-engineering tasks, explaining why they work or fail, and grounding every evaluation in observed code behavior rather than vibes.