How Agent Training Works
agym.ai uses autoresearch to continuously improve AI agents for telecom network operations. Here are the core concepts that make it work.
Autoresearch
Autoresearch is the core methodology behind agym.ai. Instead of manually engineering automation scripts, we define an operational intent and let an AI agent iteratively improve its own implementation through structured evaluation loops.
Intent-Driven
Operators define what the agent should achieve, not how. The agent discovers the implementation through experimentation and evaluation feedback.
Self-Improving
Each training iteration produces a new version of the agent. Results from evaluation feed back into the skill template, creating a closed loop of continuous improvement.
Converging
Training continues until the agent reaches a convergence target or exhausts its patience budget. The best-performing iteration becomes the certified agent.
The Autoresearch Loop
DOIL Intent
Define the operational intent in DOIL - a declarative language that describes what the agent should achieve, not how.
Skill Template
The intent is compiled into a skill template - a structured set of instructions, constraints, and evaluation criteria.
Agent Generation
Claude generates an agent implementation from the skill template, producing executable automation code.
Evaluation Loop
The agent runs against a simulated network environment. Results feed back to refine the skill template. The loop continues until convergence.
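The four steps above can be sketched as a single training loop. Everything here is an illustrative stand-in — `compile_template`, `generate_agent`, `evaluate`, and `refine` are hypothetical names with toy bodies, not the actual agym.ai API — but the control flow (generate, evaluate, feed results back, stop on convergence or exhausted patience) follows the loop described above.

```python
# Minimal, self-contained sketch of the autoresearch loop.
# All names and toy implementations are illustrative; the real
# agym.ai internals are not public.

def compile_template(intent):
    # Real system: DOIL intent -> structured skill template.
    return {"intent": intent, "hint": 0.0}

def generate_agent(template):
    # Real system: an LLM generates executable automation code.
    # Toy model: agent quality improves as feedback accumulates.
    return {"quality": min(1.0, 0.90 + template["hint"])}

def evaluate(agent):
    # Real system: composite score from the 4 evaluation layers.
    return agent["quality"]

def refine(template, score):
    # Feed evaluation results back into the skill template.
    template["hint"] += (1.0 - score) * 0.5
    return template

def autoresearch(intent, target=0.985, patience=10):
    """Loop generate -> evaluate -> refine until the convergence
    target is met or the patience budget is exhausted."""
    template = compile_template(intent)
    best_score, best_agent, stale = 0.0, None, 0
    while stale < patience:
        agent = generate_agent(template)
        score = evaluate(agent)
        if score > best_score:
            best_score, best_agent, stale = score, agent, 0
        else:
            stale += 1                      # no improvement this iteration
        if best_score >= target:            # convergence target reached
            break
        template = refine(template, score)  # closed feedback loop
    return best_agent, best_score
```

The best-performing iteration is what gets returned — mirroring the "best-performing iteration becomes the certified agent" rule above.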
DOIL - Declarative Operational Intent Language
DOIL is a domain-specific language for expressing operational intents. It tells the training system what the agent should optimize for, the constraints it must respect, and how its performance should be evaluated.
Structure
intent: optimize_mobility
domain: ran.mobility_management
objective:
  primary: minimize_handover_failure_rate
  constraint: service_continuity >= 99.5%
parameters:
  search_space:
    hysteresis_db: [0.5, 6.0, 0.5]
    time_to_trigger_ms: [40, 640, 40]
evaluation:
  layers:
    - correctness: validate_ranges
    - simulation: run_trace(users=500)
    - performance: measure_kpis(...)
    - robustness: stress_test(...)
convergence:
  target: >= 98.5%
  strategy: bayesian_optimization

Self-Improving Factory
The self-improving factory is the runtime engine that orchestrates the autoresearch loop. It takes a DOIL script, generates agent implementations, evaluates them against simulated network conditions, and feeds results back to produce better versions.
Parse Intent
DOIL script is parsed into a structured skill template with objectives, constraints, and evaluation criteria.
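As a hedged illustration of what that parsed skill template might look like in memory: the dict layout below mirrors the DOIL example earlier in this page, and it assumes the `[min, max, step]` reading of the search-space triples — DOIL's real parser and schema are not public, so every name here is a stand-in.

```python
# Hypothetical in-memory skill template for the optimize_mobility intent.
# Assumption: search_space triples mean [min, max, step].

def expand_range(lo, hi, step):
    """Expand a [min, max, step] triple into candidate values (inclusive)."""
    vals, x = [], lo
    while x <= hi + 1e-9:        # small tolerance for float steps
        vals.append(round(x, 6))
        x += step
    return vals

template = {
    "intent": "optimize_mobility",
    "domain": "ran.mobility_management",
    "objective": {
        "primary": "minimize_handover_failure_rate",
        "constraint": ("service_continuity", ">=", 99.5),
    },
    "search_space": {
        "hysteresis_db": expand_range(0.5, 6.0, 0.5),       # 12 candidates
        "time_to_trigger_ms": expand_range(40, 640, 40),    # 16 candidates
    },
    "evaluation_layers": ["correctness", "simulation",
                          "performance", "robustness"],
    "convergence": {"target": 98.5, "strategy": "bayesian_optimization"},
}
```

Under that reading, the two search-space triples alone define a 12 × 16 = 192-point grid, which is why a guided strategy like Bayesian optimization is listed rather than exhaustive search.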
Generate Agent
Claude generates an agent implementation from the skill template, producing executable automation code.
Evaluate
The agent runs against a simulated network. All four evaluation layers score the result independently.
Refine & Repeat
Scores feed back into the skill template. Parameters are adjusted. The loop continues until convergence.
Key Insight
The factory does not require human intervention between iterations. Once a DOIL script is submitted, the system autonomously trains the agent until it reaches the convergence target or the patience budget is exhausted. Operators only need to define the intent.
4-Layer Evaluation
Every agent iteration is scored across four independent evaluation layers. This multi-dimensional assessment prevents overfitting to a single metric and ensures agents are both effective and robust.
Correctness
Validates that the agent's output is syntactically and semantically valid. Checks parameter ranges, constraint satisfaction, and logical consistency. This is the first gate - if it fails here, the iteration is rejected without further evaluation.
Simulation
Runs the agent's configuration against a simulated network environment. Models real-world conditions including traffic patterns, user mobility, interference, and equipment behavior. Duration and fidelity depend on the DOIL script specification.
Performance
Measures specific KPIs defined in the DOIL script against target thresholds. These are the real-world metrics that matter: handover success rate, energy savings percentage, detection accuracy, throughput improvement, and more.
Robustness
Stress-tests the agent under extreme conditions: traffic surges, equipment failures, edge cases, and adversarial scenarios. An agent that performs well under normal conditions but fails under stress is not ready for production.
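The gate-then-score flow across the four layers can be sketched as follows. The layer checks here are toy lambdas and the "full marks or reject" correctness gate is a simplification of the behavior described above; none of this is agym.ai's actual evaluation code.

```python
# Illustrative 4-layer evaluation: correctness is a hard first gate,
# then the remaining three layers score independently.

def evaluate_layers(config, checks):
    """Run the four layers in order. If correctness fails, the
    iteration is rejected without further evaluation."""
    scores = {}
    scores["correctness"] = checks["correctness"](config)
    if scores["correctness"] < 100:        # toy gate: anything short of
        return scores, False               # full marks rejects the iteration
    for layer in ("simulation", "performance", "robustness"):
        scores[layer] = checks[layer](config)
    return scores, True

# Toy layer checks (stand-ins for validate_ranges, run_trace, etc.):
checks = {
    "correctness": lambda c: 100 if 0.5 <= c["hysteresis_db"] <= 6.0 else 0,
    "simulation":  lambda c: 97.0,
    "performance": lambda c: 98.0,
    "robustness":  lambda c: 96.5,
}

scores, accepted = evaluate_layers({"hysteresis_db": 2.0}, checks)
```

A config outside the valid range would come back with only a correctness score and `accepted == False` — the later, more expensive layers never run.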
Composite Scoring
Each layer produces an independent score out of 100. The composite score is a weighted average that can be customized per DOIL script. An agent achieves “certified” status when it consistently scores above the convergence target across all four layers over multiple consecutive iterations.
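Composite scoring and the certification rule can be made concrete with a short sketch. The weights and the three-iteration window below are illustrative defaults — the page only says weights are customizable per DOIL script and that certification requires consistency over "multiple consecutive iterations".

```python
# Sketch of composite scoring and certification, with assumed
# example weights and an assumed 3-iteration certification window.

def composite(scores, weights):
    """Weighted average of the four layer scores (each out of 100)."""
    total = sum(weights.values())
    return sum(scores[k] * w for k, w in weights.items()) / total

def certified(history, target, streak=3):
    """Certified when every layer beats the target in each of the
    last `streak` consecutive iterations."""
    recent = history[-streak:]
    return len(recent) == streak and all(
        min(s.values()) >= target for s in recent)

weights = {"correctness": 0.2, "simulation": 0.3,
           "performance": 0.3, "robustness": 0.2}
scores = {"correctness": 100, "simulation": 99.1,
          "performance": 98.7, "robustness": 98.6}

print(round(composite(scores, weights), 2))   # -> 99.06
```

Note that certification checks every layer against the target, not just the composite — a high weighted average cannot mask one weak layer.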