AIBuildAI: Autonomous Machine Learning Engineering through Collaborative AI Agents

AIBuildAI Team

News

[2026-03] Our project website for AIBuildAI is now live!

Introduction

Machine learning engineering involves complex, multi-step workflows: understanding a problem, selecting appropriate models, writing training and inference code, debugging, hyperparameter tuning, and evaluating results. These tasks are time-consuming even for experienced practitioners and often require iterative trial-and-error across many design choices.

We introduce AIBuildAI, a framework for autonomous machine learning engineering powered by an iterative multi-agent loop. AIBuildAI decomposes the entire ML pipeline into specialized roles — setup, management, design, coding, tuning, and aggregation — each handled by a dedicated AI agent built on top of Claude Opus 4.6. A central Manager Agent orchestrates the workflow, iteratively dispatching tasks to specialized agents until the best result is achieved.

AIBuildAI

Overview

Traditional approaches to automating ML workflows rely on fixed pipelines (e.g., AutoML) that lack the flexibility to handle diverse, open-ended tasks. AIBuildAI takes a fundamentally different approach: it employs an iterative multi-agent loop where a Manager Agent dynamically coordinates specialized agents, adapting the workflow to each unique problem. The system takes competition context, workflow state, and decision rules as input, and outputs concrete actions — running code, generating instructions, stopping with reasons — all with full reasoning traces.
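To make the input/output contract concrete, here is a minimal sketch of what one Manager decision might look like. The field names (`reasoning`, `action`, `target_agent`, `instructions`) are illustrative assumptions, not AIBuildAI's actual schema:

```python
import json

# Hypothetical Manager Agent decision: structured state in, structured action out.
# All field names here are assumptions for illustration only.
decision = {
    "reasoning": "Validation score plateaued for two rounds; revisit the design.",
    "action": "dispatch",        # e.g. dispatch / run_code / stop
    "target_agent": "designer",
    "instructions": "Propose an alternative model family and validation split.",
}

# Decisions serialize to JSON, so the full reasoning trace can be logged.
print(json.dumps(decision, indent=2))
```

Keeping the reasoning alongside the action is what yields the "full reasoning traces" described above: every step of the workflow can be audited after the fact.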

AIBuildAI Workflow

The iterative multi-agent loop of AIBuildAI: a Manager Agent orchestrates specialized agents (Setup, Designer, Coder, Tuner, Aggregator) powered by Claude Opus 4.6 to autonomously solve ML competitions.

AIBuildAI operates through a coordinated, iterative pipeline with six specialized agents:

  1. Setup Agent — Initializes the environment by configuring conda, locating dataset paths (images, CSVs), installing required Python packages, and verifying GPU/resource availability.
  2. Manager Agent — The central coordinator that decides which agent to call next, manages workflow state, runs agents in parallel when possible, and determines when to stop (i.e., when the best result is found).
  3. Designer Agent — Receives dataset info, available packages, and competition metrics to produce a candidate plan including model selection, feature engineering, validation strategy, and coding guidelines — all without writing code.
  4. Coder Agent — Takes the plan and instructions to implement the solution: writes code, runs experiments, performs quick test runs, checks results, and ensures production-ready code quality.
  5. Tuner Agent — Receives the candidate code directory and instructions to optimize hyperparameters, conduct extended training, perform model checkpointing, and push for better validation scores.
  6. Aggregator Agent — Analyzes all candidate models with their validation AUC scores, selects the best checkpoint, applies ensemble or stacking strategies if beneficial, and generates the final submission file for official grading.
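The six roles above can be thought of as interchangeable handlers behind one common interface, which the Manager dispatches to by name. The sketch below assumes a hypothetical `AgentResult` type and registry; none of these names come from AIBuildAI itself:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentResult:
    report: dict                                   # structured JSON-style report
    artifacts: list = field(default_factory=list)  # e.g. code dirs, submissions

# Each role is a function from structured workflow state to a structured result.
def setup_agent(state: dict) -> AgentResult:
    # A real Setup Agent would configure conda, locate data, and check the GPU.
    return AgentResult(report={"gpu_available": True, "data_paths": ["train.csv"]})

# The Manager looks handlers up by role name; the other five roles would
# register in the same way.
REGISTRY: dict[str, Callable[[dict], AgentResult]] = {
    "setup": setup_agent,
}

result = REGISTRY["setup"]({"competition": "demo"})
```

A registry like this is one plausible way to let the Manager pick the next agent dynamically rather than following a fixed pipeline.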

Iterative Multi-Agent Loop

Unlike single-pass systems, AIBuildAI's Manager Agent drives an iterative loop: after each agent completes its task, the Manager evaluates the current state and decides the next action. This enables the system to revisit and refine earlier decisions — for example, calling the Designer Agent again with new insights from failed experiments, or invoking the Tuner Agent multiple rounds to progressively improve performance. The Manager can also run the Coder and Tuner agents in parallel to explore different strategies simultaneously.
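The loop just described can be sketched as follows. This is a simplified skeleton under stated assumptions: `run_agent` stands in for a real agent call, and the stopping rule is a placeholder for the Manager's actual decision logic:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(name: str, state: dict) -> dict:
    # Stand-in for a real LLM-backed agent call; returns a state update.
    return {name: state.get("round", 0)}

def manager_loop(state: dict, max_rounds: int = 3) -> dict:
    for rnd in range(max_rounds):
        state["round"] = rnd
        # Coder and Tuner explore different strategies concurrently.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(run_agent, n, state) for n in ("coder", "tuner")]
            for f in futures:
                state.update(f.result())
        # Placeholder stopping rule; the real Manager evaluates results here.
        if state.get("best_found"):
            break
    return state

final = manager_loop({})
```

The key structural point is that the Manager re-evaluates shared state after every round, so earlier decisions (design, code, tuning) can be revisited with fresh evidence.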

Each agent receives structured input (competition context, dataset info, prior results) and produces structured output (JSON reports, code directories, submission files). This clear interface enables modular composition and makes the system easy to extend with new specialized agents.
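One way such a contract could be expressed is with a structural interface, so a new specialized agent slots in without touching the rest of the system. The `Agent` protocol and the example agent below are hypothetical illustrations, not AIBuildAI's real API:

```python
from typing import Protocol

class Agent(Protocol):
    """Assumed contract: structured context and prior results in, JSON report out."""
    name: str
    def run(self, context: dict, prior_results: list) -> dict: ...

class DataCleanerAgent:
    # A hypothetical new agent: it satisfies the contract, so the Manager
    # can dispatch to it like any built-in role.
    name = "data_cleaner"

    def run(self, context: dict, prior_results: list) -> dict:
        return {"agent": self.name, "status": "ok", "cleaned_rows": 0}

agents: list = [DataCleanerAgent()]
report = agents[0].run({"competition": "demo"}, prior_results=[])
```

Because every agent speaks the same structured format, extending the system is a matter of implementing one method rather than rewiring the workflow.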

Evaluation on MLE-Bench

We evaluate AIBuildAI on MLE-Bench, a comprehensive benchmark comprising real-world Kaggle competitions spanning diverse ML domains including computer vision, natural language processing, tabular data, and more. The system autonomously handles the full pipeline — from environment setup and solution design to code generation, hyperparameter tuning, model aggregation, and final submission — without any human intervention.

Key Highlights

  • Fully autonomous: end-to-end ML engineering from raw competition description to graded submission
  • Iterative refinement: the Manager Agent drives multiple rounds of design-code-tune cycles to maximize performance
  • Parallel execution: independent agents (e.g., Coder and Tuner) can run concurrently on different strategies
  • Modular architecture: each agent has a clear role and structured I/O, making the system easy to extend
  • Built on Claude Opus 4.6: all agents leverage state-of-the-art reasoning capabilities for planning, coding, and decision-making

Explore More

More details on our approach, the agent architecture, and experimental results are on the way — stay tuned for future updates!

Reference

If you find our work useful, please cite it as:

@misc{aibuildai2026,
    title={AIBuildAI: Autonomous Machine Learning Engineering through Collaborative AI Agents},
    author={AIBuildAI Team},
    year={2026}
}