xAI Human Data - Software Engineering

Overview

I contract with xAI's Human Data software-engineering track to create and review high-quality coding data for training, benchmarking, and improving large language models.

Work Scope

Evaluate AI-generated code across correctness, maintainability, performance, security, test coverage, task alignment, and production plausibility.
Rank model outputs using evidence from prompts, diffs, traces, responses, tests, and container checks.
Write concise, evidence-grounded justifications for model training and benchmarking workflows.
Review coding tasks in Python, TypeScript/JavaScript, Java, Go, Rust, and C/C++, spanning databases, distributed systems, AI/ML, security, and performance domains.

Why It Matters

This work sits close to the quality layer of AI coding systems: identifying which outputs actually satisfy real software-engineering tasks, explaining why they work or fail, and grounding evaluation evidence in code behavior rather than vibes.