How Agent Training Works
agym.ai uses autoresearch to continuously improve AI agents for telecom network operations. Here are the core concepts that make it work.
Autoresearch
Autoresearch is the core methodology behind agym.ai. Instead of manually engineering automation scripts, we define an operational intent and let an AI agent iteratively improve its own implementation through structured evaluation loops.
Intent-Driven
Operators define what the agent should achieve, not how. The agent discovers the implementation through experimentation and evaluation feedback.
Self-Improving
Each training iteration produces a new version of the agent. Results from evaluation feed back into the skill template, creating a closed loop of continuous improvement.
Converging
Training continues until the agent reaches a convergence target or exhausts its patience budget. The best-performing iteration becomes the certified agent.
The Autoresearch Loop
DOIL Intent
Define the operational intent in DOIL - a declarative language that describes what the agent should achieve, not how.
Skill Template
The intent is compiled into a skill template - a structured set of instructions, constraints, and evaluation criteria.
Agent Generation
Claude generates an agent implementation from the skill template, producing executable automation code.
Evaluation Loop
The agent runs against a simulated network environment. Results feed back to refine the skill template. The loop continues until convergence.
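The four steps above can be sketched as a single training loop. Everything here is an illustrative stand-in — `compile_template`, `generate_agent`, `evaluate`, and `refine` are hypothetical names with toy bodies, not the actual agym.ai API — but the control flow (generate, evaluate, feed results back, stop on convergence or exhausted patience) follows the loop described above.

```python
# Minimal, self-contained sketch of the autoresearch loop.
# All names and toy implementations are illustrative; the real
# agym.ai internals are not public.

def compile_template(intent):
    # Real system: DOIL intent -> structured skill template.
    return {"intent": intent, "hint": 0.0}

def generate_agent(template):
    # Real system: an LLM generates executable automation code.
    # Toy model: agent quality improves as feedback accumulates.
    return {"quality": min(1.0, 0.90 + template["hint"])}

def evaluate(agent):
    # Real system: composite score from the 4 evaluation layers.
    return agent["quality"]

def refine(template, score):
    # Feed evaluation results back into the skill template.
    template["hint"] += (1.0 - score) * 0.5
    return template

def autoresearch(intent, target=0.985, patience=10):
    """Loop generate -> evaluate -> refine until the convergence
    target is met or the patience budget is exhausted."""
    template = compile_template(intent)
    best_score, best_agent, stale = 0.0, None, 0
    while stale < patience:
        agent = generate_agent(template)
        score = evaluate(agent)
        if score > best_score:
            best_score, best_agent, stale = score, agent, 0
        else:
            stale += 1                      # no improvement this iteration
        if best_score >= target:            # convergence target reached
            break
        template = refine(template, score)  # closed feedback loop
    return best_agent, best_score
```

The best-performing iteration is what gets returned — mirroring the "best-performing iteration becomes the certified agent" rule above.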
DOIL - Declarative Operational Intent Language
DOIL is a domain-specific language for expressing operational intents. It tells the training system what the agent should optimize for, the constraints it must respect, and how its performance should be evaluated.
Structure
intent: optimize_mobility
domain: ran.mobility_management
objective:
  primary: minimize_handover_failure_rate
  constraint: service_continuity >= 99.5%
parameters:
  search_space:
    hysteresis_db: [0.5, 6.0, 0.5]
    time_to_trigger_ms: [40, 640, 40]
evaluation:
  layers:
    - correctness: validate_ranges
    - simulation: run_trace(users=500)
    - performance: measure_kpis(...)
    - robustness: stress_test(...)
convergence:
  target: >= 98.5%
  strategy: bayesian_optimization

Self-Improving Factory
The self-improving factory is the runtime engine that orchestrates the autoresearch loop. It takes a DOIL script, generates agent implementations, evaluates them against simulated network conditions, and feeds results back to produce better versions.
Parse Intent
DOIL script is parsed into a structured skill template with objectives, constraints, and evaluation criteria.
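As a hedged illustration of what that parsed skill template might look like in memory: the dict layout below mirrors the DOIL example earlier in this page, and it assumes the `[min, max, step]` reading of the search-space triples — DOIL's real parser and schema are not public, so every name here is a stand-in.

```python
# Hypothetical in-memory skill template for the optimize_mobility intent.
# Assumption: search_space triples mean [min, max, step].

def expand_range(lo, hi, step):
    """Expand a [min, max, step] triple into candidate values (inclusive)."""
    vals, x = [], lo
    while x <= hi + 1e-9:        # small tolerance for float steps
        vals.append(round(x, 6))
        x += step
    return vals

template = {
    "intent": "optimize_mobility",
    "domain": "ran.mobility_management",
    "objective": {
        "primary": "minimize_handover_failure_rate",
        "constraint": ("service_continuity", ">=", 99.5),
    },
    "search_space": {
        "hysteresis_db": expand_range(0.5, 6.0, 0.5),       # 12 candidates
        "time_to_trigger_ms": expand_range(40, 640, 40),    # 16 candidates
    },
    "evaluation_layers": ["correctness", "simulation",
                          "performance", "robustness"],
    "convergence": {"target": 98.5, "strategy": "bayesian_optimization"},
}
```

Under that reading, the two search-space triples alone define a 12 × 16 = 192-point grid, which is why a guided strategy like Bayesian optimization is listed rather than exhaustive search.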
Generate Agent
Claude generates an agent implementation from the skill template, producing executable automation code.
Evaluate
The agent runs against a simulated network. All four evaluation layers score the result independently.
Refine & Repeat
Scores feed back into the skill template. Parameters are adjusted. The loop continues until convergence.
Key Insight
The factory does not require human intervention between iterations. Once a DOIL script is submitted, the system autonomously trains the agent until it reaches the convergence target or the patience budget is exhausted. Operators only need to define the intent.
4-Layer Evaluation
Every agent iteration is scored across four independent evaluation layers. This multi-dimensional assessment prevents overfitting to a single metric and ensures agents are both effective and robust.
Correctness
Validates that the agent's output is syntactically and semantically valid. Checks parameter ranges, constraint satisfaction, and logical consistency. This is the first gate - if it fails here, the iteration is rejected without further evaluation.
Simulation
Runs the agent's configuration against a simulated network environment. Models real-world conditions including traffic patterns, user mobility, interference, and equipment behavior. Duration and fidelity depend on the DOIL script specification.
Performance
Measures specific KPIs defined in the DOIL script against target thresholds. These are the real-world metrics that matter: handover success rate, energy savings percentage, detection accuracy, throughput improvement, and more.
Robustness
Stress-tests the agent under extreme conditions: traffic surges, equipment failures, edge cases, and adversarial scenarios. An agent that performs well under normal conditions but fails under stress is not ready for production.
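The gate-then-score flow across the four layers can be sketched as follows. The layer checks here are toy lambdas and the "full marks or reject" correctness gate is a simplification of the behavior described above; none of this is agym.ai's actual evaluation code.

```python
# Illustrative 4-layer evaluation: correctness is a hard first gate,
# then the remaining three layers score independently.

def evaluate_layers(config, checks):
    """Run the four layers in order. If correctness fails, the
    iteration is rejected without further evaluation."""
    scores = {}
    scores["correctness"] = checks["correctness"](config)
    if scores["correctness"] < 100:        # toy gate: anything short of
        return scores, False               # full marks rejects the iteration
    for layer in ("simulation", "performance", "robustness"):
        scores[layer] = checks[layer](config)
    return scores, True

# Toy layer checks (stand-ins for validate_ranges, run_trace, etc.):
checks = {
    "correctness": lambda c: 100 if 0.5 <= c["hysteresis_db"] <= 6.0 else 0,
    "simulation":  lambda c: 97.0,
    "performance": lambda c: 98.0,
    "robustness":  lambda c: 96.5,
}

scores, accepted = evaluate_layers({"hysteresis_db": 2.0}, checks)
```

A config outside the valid range would come back with only a correctness score and `accepted == False` — the later, more expensive layers never run.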
Composite Scoring
Each layer produces an independent score out of 100. The composite score is a weighted average that can be customized per DOIL script. An agent achieves “certified” status when it consistently scores above the convergence target across all four layers over multiple consecutive iterations.
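Composite scoring and the certification rule can be made concrete with a short sketch. The weights and the three-iteration window below are illustrative defaults — the page only says weights are customizable per DOIL script and that certification requires consistency over "multiple consecutive iterations".

```python
# Sketch of composite scoring and certification, with assumed
# example weights and an assumed 3-iteration certification window.

def composite(scores, weights):
    """Weighted average of the four layer scores (each out of 100)."""
    total = sum(weights.values())
    return sum(scores[k] * w for k, w in weights.items()) / total

def certified(history, target, streak=3):
    """Certified when every layer beats the target in each of the
    last `streak` consecutive iterations."""
    recent = history[-streak:]
    return len(recent) == streak and all(
        min(s.values()) >= target for s in recent)

weights = {"correctness": 0.2, "simulation": 0.3,
           "performance": 0.3, "robustness": 0.2}
scores = {"correctness": 100, "simulation": 99.1,
          "performance": 98.7, "robustness": 98.6}

print(round(composite(scores, weights), 2))   # -> 99.06
```

Note that certification checks every layer against the target, not just the composite — a high weighted average cannot mask one weak layer.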