xAI Human Data - Software Engineering

Contract work evaluating AI-generated code for coding-model training and benchmarking

Overview

I contract with xAI's Human Data software-engineering track to create and review high-quality coding data for training, benchmarking, and improving large language models.

Work Scope

  • Evaluate AI-generated code across correctness, maintainability, performance, security, test coverage, task alignment, and production plausibility.
  • Rank model outputs using evidence from prompts, diffs, traces, responses, tests, and container checks.
  • Write concise, evidence-grounded justifications for each evaluation and ranking, for use in model training and benchmarking workflows.
  • Review coding tasks in Python, TypeScript/JavaScript, Java, Go, Rust, and C/C++, spanning databases, distributed systems, AI/ML, security, and performance.

Why It Matters

This work sits close to the quality layer of AI coding systems: it identifies which outputs actually satisfy real software-engineering tasks and why they work or fail, and it grounds evaluation evidence in code behavior rather than vibes.