In this tutorial, we build a Meta-Agent that designs other agents automatically from a simple task description. We implement a system that analyzes the task, selects tools, chooses a memory architecture, configures a planner, and then instantiates a fully working agent runtime. We go beyond static agent templates and instead build a dynamic, self-configuring architecture that can evaluate its own performance and refine itself as needed. We also demonstrate how agent design automation, tool selection, memory strategy, and iterative self-improvement can be unified into a cohesive, Colab-ready framework.
import os, re, json, math, time, textwrap, traceback, random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Callable, Tuple

def _pip_install():
    try:
        import pydantic
        import transformers
        return
    except Exception:
        pass
    import sys, subprocess
    pkgs = [
        "pydantic>=2.6.0",
        "transformers>=4.41.0",
        "accelerate>=0.30.0",
        "sentencepiece",
        "torch",
        "numpy",
        "scikit-learn",
        "pandas",
    ]
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

_pip_install()

import numpy as np
import pandas as pd
from pydantic import BaseModel, Field
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

try:
    from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
    _HAS_TRANSFORMERS = True
except Exception:
    _HAS_TRANSFORMERS = False

class ToolSpec(BaseModel):
    name: str
    description: str
    inputs_schema: Dict[str, Any] = Field(default_factory=dict)

class MemorySpec(BaseModel):
    kind: str = Field(default="scratchpad", description="scratchpad | retrieval_tfidf")
    max_items: int = 200
    retrieval_k: int = 5

class PlannerSpec(BaseModel):
    kind: str = Field(default="react", description="react | plan_execute")
    max_steps: int = 10
    temperature: float = 0.2

class AgentConfig(BaseModel):
    agent_name: str = "DesignedAgent"
    objective: str
    planner: PlannerSpec
    memory: MemorySpec
    tools: List[ToolSpec]
    output_style: str = "concise"
    safety_rules: List[str] = Field(default_factory=lambda: [
        "Do not execute arbitrary OS commands.",
        "Refuse harmful/illegal instructions; suggest safe alternatives.",
        "If uncertain, ask for missing inputs or state assumptions.",
    ])
We set up the complete foundational environment for our meta-agent system. We install required dependencies, import all necessary libraries, and define the core configuration schemas using Pydantic. We formalize structured specifications for tools, memory, planner, and the overall agent configuration to enable typed, automated agent construction.
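To see what the typed-config approach buys us, here is a minimal, standalone sketch of the same Pydantic pattern. It redefines a trimmed-down MemorySpec rather than importing the one above, so it runs on its own:

```python
# Minimal sketch of the Pydantic config pattern used above. This redefines a
# trimmed-down MemorySpec so the snippet is self-contained.
from pydantic import BaseModel, Field, ValidationError

class MemorySpec(BaseModel):
    kind: str = Field(default="scratchpad", description="scratchpad | retrieval_tfidf")
    max_items: int = 200
    retrieval_k: int = 5

# Valid data becomes a typed object with defaults filled in...
spec = MemorySpec(kind="retrieval_tfidf", max_items=250)
print(spec.model_dump())  # {'kind': 'retrieval_tfidf', 'max_items': 250, 'retrieval_k': 5}

# ...while malformed data is rejected before any agent gets built.
try:
    MemorySpec(max_items="lots")
except ValidationError as e:
    print("rejected:", e.errors()[0]["type"])
```

Because every design decision is a validated model, the meta-agent can serialize, diff, and mutate configurations safely during refinement.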
class LocalLLM:
    def __init__(self, model_name: str = "google/flan-t5-small", device: int = -1):
        self.model_name = model_name
        self.device = device
        self._pipe = None
        self._fallback = False
        if not _HAS_TRANSFORMERS:
            self._fallback = True
            return
        try:
            tok = AutoTokenizer.from_pretrained(model_name)
            mdl = AutoModelForSeq2SeqLM.from_pretrained(model_name)
            self._pipe = pipeline(
                "text2text-generation",
                model=mdl,
                tokenizer=tok,
                device=device,
            )
        except Exception:
            self._fallback = True

    def generate(self, prompt: str, max_new_tokens: int = 256, temperature: float = 0.2) -> str:
        if self._fallback or self._pipe is None:
            return self._heuristic(prompt)
        try:
            out = self._pipe(
                prompt,
                max_new_tokens=max_new_tokens,
                do_sample=temperature > 0,
                temperature=max(temperature, 1e-6),
                num_return_sequences=1,
            )[0]["generated_text"]
            return out.strip()
        except Exception:
            return self._heuristic(prompt)

    def _heuristic(self, prompt: str) -> str:
        p = prompt.lower()
        if "tool" in p and "json" in p:
            return '{"action":"final","final":"(fallback) I can’t load the model. Provide more details or enable internet in Colab to download the model."}'
        return "(fallback) I can’t load the model. Please ensure Colab has internet access and retry."

class ScratchpadMemory:
    def __init__(self, max_items: int = 200):
        self.max_items = max_items
        self.items: List[Dict[str, str]] = []

    def add(self, role: str, content: str):
        self.items.append({"role": role, "content": content})
        if len(self.items) > self.max_items:
            self.items = self.items[-self.max_items:]

    def recent(self, n: int = 12) -> List[Dict[str, str]]:
        return self.items[-n:]

    def retrieve(self, query: str, k: int = 5) -> List[Dict[str, str]]:
        return self.recent(k)

class TfidfRetrievalMemory:
    def __init__(self, max_items: int = 200, retrieval_k: int = 5):
        self.max_items = max_items
        self.retrieval_k = retrieval_k
        self.items: List[Dict[str, str]] = []
        self._vectorizer = TfidfVectorizer(stop_words="english")
        self._nn = None
        self._X = None

    def add(self, role: str, content: str):
        self.items.append({"role": role, "content": content})
        if len(self.items) > self.max_items:
            self.items = self.items[-self.max_items:]
        self._rebuild_index()

    def _rebuild_index(self):
        docs = [it["content"] for it in self.items] or [""]
        self._X = self._vectorizer.fit_transform(docs)
        n_neighbors = min(self.retrieval_k, self._X.shape[0])
        self._nn = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine")
        self._nn.fit(self._X)

    def recent(self, n: int = 12) -> List[Dict[str, str]]:
        return self.items[-n:]

    def retrieve(self, query: str, k: Optional[int] = None) -> List[Dict[str, str]]:
        if not self.items:
            return []
        if self._nn is None:
            self._rebuild_index()
        k = k or self.retrieval_k
        q = self._vectorizer.transform([query])
        n_neighbors = min(k, self._X.shape[0])
        dists, idx = self._nn.kneighbors(q, n_neighbors=n_neighbors)
        hits = [self.items[i] for i in idx[0].tolist()]
        return hits
We implement the LocalLLM wrapper that powers reasoning and tool-selection behavior. We configure a lightweight open-source model with a safe fallback mechanism to ensure robustness in Colab. We also define both scratchpad and retrieval-based memory systems to support contextual and semantic recall.
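The retrieval memory boils down to TF-IDF vectors plus cosine nearest neighbors. This standalone sketch shows that core behavior on a few hypothetical memory notes (the notes are illustrative, not output from the tutorial):

```python
# Standalone sketch of the TF-IDF retrieval idea behind TfidfRetrievalMemory.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

notes = [
    "loan payment computed with the annuity formula",
    "csv profiling showed three numeric columns",
    "user asked for a concise summary of the meeting",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(notes)

# Cosine nearest-neighbor search over the note vectors.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)
q = vec.transform(["profile the csv columns"])
_, idx = nn.kneighbors(q, n_neighbors=2)
print(notes[idx[0][0]])  # the CSV note ranks first
```

The full class rebuilds this index on every `add`, which is fine at the 200-item scale used here but would warrant incremental indexing for larger memories.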
class ToolResult(BaseModel):
    ok: bool
    output: str
    data: Optional[Any] = None

class Tool:
    def __init__(self, name: str, description: str, fn: Callable[..., ToolResult], inputs_schema: Dict[str, Any]):
        self.name = name
        self.description = description
        self.fn = fn
        self.inputs_schema = inputs_schema

    def call(self, **kwargs) -> ToolResult:
        try:
            return self.fn(**kwargs)
        except Exception as e:
            return ToolResult(ok=False, output=f"Tool error: {e}\n{traceback.format_exc()}")

class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool):
        self._tools[tool.name] = tool

    def has(self, name: str) -> bool:
        return name in self._tools

    def specs(self) -> List[ToolSpec]:
        return [
            ToolSpec(name=t.name, description=t.description, inputs_schema=t.inputs_schema)
            for t in self._tools.values()
        ]

    def call(self, name: str, args: Dict[str, Any]) -> ToolResult:
        if name not in self._tools:
            return ToolResult(ok=False, output=f"Unknown tool: {name}")
        return self._tools[name].call(**args)

_ALLOWED_MATH = {
    "abs": abs, "round": round, "min": min, "max": max,
    "sqrt": math.sqrt, "log": math.log, "exp": math.exp,
    "sin": math.sin, "cos": math.cos, "tan": math.tan,
    "pi": math.pi, "e": math.e,
}

def tool_calc(expression: str) -> ToolResult:
    expr = expression.strip()
    if not expr:
        return ToolResult(ok=False, output="Empty expression.")
    if re.search(r"[A-Za-z_]\w*", expr):
        names = set(re.findall(r"[A-Za-z_]\w*", expr))
        bad = [n for n in names if n not in _ALLOWED_MATH]
        if bad:
            return ToolResult(ok=False, output=f"Disallowed names in expression: {bad}")
    if re.search(r"__|import|exec|eval|open|os\.|sys\.", expr):
        return ToolResult(ok=False, output="Disallowed tokens in expression.")
    try:
        val = eval(expr, {"__builtins__": {}}, dict(_ALLOWED_MATH))
        return ToolResult(ok=True, output=str(val), data=val)
    except Exception as e:
        return ToolResult(ok=False, output=f"Failed to evaluate: {e}")

def tool_text_stats(text: str) -> ToolResult:
    s = text or ""
    words = re.findall(r"\w+", s)
    lines = s.splitlines() if s else []
    out = {
        "chars": len(s),
        "words": len(words),
        "lines": len(lines),
        "unique_words": len(set(w.lower() for w in words)),
    }
    return ToolResult(ok=True, output=json.dumps(out, indent=2), data=out)

def tool_csv_profile(path: str, n_rows: int = 5) -> ToolResult:
    try:
        df = pd.read_csv(path)
    except Exception as e:
        return ToolResult(ok=False, output=f"Could not read CSV: {e}")
    head = df.head(n_rows)
    desc = df.describe(include="all").transpose().head(30)
    out = (
        f"Shape: {df.shape}\n\n"
        f"Columns: {list(df.columns)}\n\n"
        f"Head({n_rows}):\n{head}\n\n"
        f"Describe(top 30 cols):\n{desc}\n"
    )
    return ToolResult(ok=True, output=out, data={"shape": df.shape, "columns": list(df.columns)})

def default_tool_registry() -> ToolRegistry:
    reg = ToolRegistry()
    reg.register(Tool(
        name="calc",
        description="Evaluate a safe mathematical expression (no arbitrary code).",
        fn=lambda expression: tool_calc(expression),
        inputs_schema={"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]},
    ))
    reg.register(Tool(
        name="text_stats",
        description="Compute basic statistics about a text blob (words, lines, unique words).",
        fn=lambda text: tool_text_stats(text),
        inputs_schema={"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
    ))
    reg.register(Tool(
        name="csv_profile",
        description="Load a CSV from a local path and print a quick profile (head, describe).",
        fn=lambda path, n_rows=5: tool_csv_profile(path, n_rows),
        inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]},
    ))
    return reg
We build the full tool infrastructure including tool registration, safe execution, and structured outputs. We implement secure mathematical evaluation, text statistics analysis, and CSV profiling capabilities. We design the ToolRegistry abstraction to allow the meta-agent to dynamically select and invoke tools during runtime.
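The safety of `calc` rests on an allow-list plus an emptied `__builtins__`. This standalone sketch isolates that pattern so you can see what gets through and what gets blocked:

```python
# Standalone sketch of the allow-list eval pattern used by tool_calc.
import math, re

ALLOWED = {"sqrt": math.sqrt, "pi": math.pi, "min": min, "max": max}

def safe_eval(expr: str):
    """Evaluate a math expression, rejecting any name outside the allow-list."""
    names = set(re.findall(r"[A-Za-z_]\w*", expr))
    bad = names - set(ALLOWED)
    if bad:
        raise ValueError(f"disallowed names: {sorted(bad)}")
    # An empty __builtins__ dict blocks open(), __import__(), etc.
    return eval(expr, {"__builtins__": {}}, dict(ALLOWED))

print(safe_eval("sqrt(16) + max(1, 2)"))  # 6.0
try:
    safe_eval("__import__('os').system('ls')")
except ValueError as e:
    print("blocked:", e)
```

Note that the name scan runs before `eval`, so an injection attempt is rejected without ever being evaluated; the restricted globals are a second layer, not the only one.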
class AgentRuntime:
    def __init__(self, config: AgentConfig, llm: LocalLLM, tools: ToolRegistry, memory):
        self.config = config
        self.llm = llm
        self.tools = tools
        self.memory = memory

    def _tool_prompt(self) -> str:
        specs = self.config.tools
        lines = []
        for t in specs:
            lines.append(f"- {t.name}: {t.description} | inputs_schema={json.dumps(t.inputs_schema)}")
        return "\n".join(lines)

    def _format_context(self, task: str) -> str:
        retrieved = self.memory.retrieve(task, k=getattr(self.config.memory, "retrieval_k", 5))
        recent = self.memory.recent(8)

        def pack(items):
            return "\n".join([f"[{it['role']}] {it['content']}" for it in items])

        return (
            f"OBJECTIVE:\n{self.config.objective}\n\n"
            f"TASK:\n{task}\n\n"
            f"SAFETY RULES:\n- " + "\n- ".join(self.config.safety_rules) + "\n\n"
            f"AVAILABLE TOOLS:\n{self._tool_prompt()}\n\n"
            f"RETRIEVED MEMORY (may be relevant):\n{pack(retrieved) if retrieved else '(none)'}\n\n"
            f"RECENT CONTEXT:\n{pack(recent) if recent else '(none)'}\n"
        )

    def _react_step_prompt(self, task: str, scratch: str) -> str:
        ctx = self._format_context(task)
        return textwrap.dedent(f"""
        You are an expert tool-using agent.
        Use the following JSON-only protocol (no extra text):
        {{
          "action": "tool" | "final",
          "tool_name": "name" (if action=tool),
          "tool_args": {{...}} (if action=tool),
          "final": "answer" (if action=final)
        }}
        Rules:
        - If a tool is needed, pick ONE tool call per step.
        - Keep args strictly matching the tool schema.
        - If you can answer directly, output action="final".
        - Output valid JSON only.
        {ctx}
        SCRATCHPAD (internal notes, may be incomplete):
        {scratch}
        """).strip()

    def run(self, task: str, verbose: bool = True) -> str:
        scratch = ""
        self.memory.add("user", task)
        for step in range(1, self.config.planner.max_steps + 1):
            prompt = self._react_step_prompt(task, scratch)
            raw = self.llm.generate(prompt, max_new_tokens=256, temperature=self.config.planner.temperature)
            m = re.search(r"\{.*\}", raw, re.DOTALL)
            raw_json = m.group(0).strip() if m else raw.strip()
            try:
                action = json.loads(raw_json)
            except Exception:
                final = (
                    "(Parser fallback) I couldn't parse a tool plan. Here is what I can do:\n"
                    "- Clarify your goal\n"
                    f"- Use available tools: {[t.name for t in self.config.tools]}\n"
                    f"Raw model output:\n{raw}"
                )
                self.memory.add("assistant", final)
                return final
            if verbose:
                print(f"\n--- Step {step}/{self.config.planner.max_steps} ---")
                print("Model JSON:", json.dumps(action, indent=2))
            if action.get("action") == "tool":
                name = action.get("tool_name", "")
                args = action.get("tool_args", {}) or {}
                res = self.tools.call(name, args)
                if verbose:
                    print(f"Tool call: {name}({args})")
                    print("Tool ok:", res.ok)
                    print("Tool output:\n", res.output[:2000])
                scratch += f"\n[tool:{name}] args={args}\nresult_ok={res.ok}\nresult={res.output}\n"
                self.memory.add("tool", f"{name} args={args}\n{res.output}")
                if not res.ok:
                    scratch += "\nNOTE: tool failed; consider alternative approach or ask for missing input.\n"
            elif action.get("action") == "final":
                final = action.get("final", "").strip()
                if not final:
                    final = "I’m missing the final answer text. Please restate the task or provide more details."
                self.memory.add("assistant", final)
                return final
            else:
                final = f"Unknown action type in model output: {action}"
                self.memory.add("assistant", final)
                return final
        final = "Reached max steps without a final answer. Provide missing inputs or simplify the request."
        self.memory.add("assistant", final)
        return final
We implement the core AgentRuntime that executes the designed agent configuration. We construct the structured ReAct-style prompting loop, enforce a strict JSON-based tool-calling protocol, and integrate memory retrieval into reasoning. We manage iterative use of tools, scratchpad updates, and controlled final answer generation.
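A small but load-bearing detail in the loop is how the runtime tolerates chatty model output: it extracts the first-to-last brace span before parsing. That step can be sketched in isolation (the sample `raw` string is illustrative):

```python
# Standalone sketch of the JSON-action extraction step used in AgentRuntime.run.
import json, re

def extract_action(raw: str) -> dict:
    """Pull the outermost {...} span out of model output and parse it as JSON."""
    m = re.search(r"\{.*\}", raw, re.DOTALL)  # greedy: first '{' to last '}'
    candidate = m.group(0) if m else raw
    return json.loads(candidate)

raw = 'Sure, here is my plan:\n{"action": "tool", "tool_name": "calc", "tool_args": {"expression": "2+2"}}'
action = extract_action(raw)
print(action["tool_name"])  # calc
```

The greedy match is what lets nested objects like `tool_args` survive; a non-greedy `\{.*?\}` would truncate at the first closing brace and break parsing.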
class MetaAgent:
    def __init__(self, llm: Optional[LocalLLM] = None):
        self.llm = llm or LocalLLM()

    def _capability_heuristics(self, task: str) -> Dict[str, Any]:
        t = task.lower()
        needs_data = any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table", "excel"])
        needs_math = any(k in t for k in ["calculate", "compute", "probability", "equation", "optimize", "derivative", "integral"])
        needs_writing = any(k in t for k in ["write", "draft", "email", "cover letter", "proposal", "summarize", "rewrite"])
        needs_analysis = any(k in t for k in ["analyze", "insights", "trend", "compare", "benchmark"])
        needs_memory = any(k in t for k in ["long", "multi-step", "remember", "plan", "workflow", "pipeline"])
        return {
            "needs_data": needs_data,
            "needs_math": needs_math,
            "needs_writing": needs_writing,
            "needs_analysis": needs_analysis,
            "needs_memory": needs_memory,
        }

    def design(self, task_description: str) -> AgentConfig:
        caps = self._capability_heuristics(task_description)
        tools = default_tool_registry()
        selected: List[ToolSpec] = []
        selected.append(ToolSpec(
            name="calc",
            description="Evaluate a safe mathematical expression (no arbitrary code).",
            inputs_schema={"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]},
        ))
        selected.append(ToolSpec(
            name="text_stats",
            description="Compute basic statistics about a text blob (words, lines, unique words).",
            inputs_schema={"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
        ))
        if caps["needs_data"]:
            selected.append(ToolSpec(
                name="csv_profile",
                description="Load a CSV from a local path and print a quick profile (head, describe).",
                inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]},
            ))
        if caps["needs_memory"] or caps["needs_analysis"] or caps["needs_data"]:
            mem = MemorySpec(kind="retrieval_tfidf", max_items=250, retrieval_k=6)
        else:
            mem = MemorySpec(kind="scratchpad", max_items=120, retrieval_k=5)
        if caps["needs_analysis"] or caps["needs_data"] or caps["needs_memory"]:
            planner = PlannerSpec(kind="react", max_steps=12, temperature=0.2)
        else:
            planner = PlannerSpec(kind="react", max_steps=8, temperature=0.2)
        objective = "Solve the user task with tool use when helpful; produce a clean final response."
        cfg = AgentConfig(
            agent_name="AutoDesignedAgent",
            objective=objective,
            planner=planner,
            memory=mem,
            tools=selected,
            output_style="concise",
        )
        for ts in selected:
            if not tools.has(ts.name):
                raise RuntimeError(f"Tool selected but not registered: {ts.name}")
        return cfg

    def instantiate(self, cfg: AgentConfig) -> AgentRuntime:
        tools = default_tool_registry()
        if cfg.memory.kind == "retrieval_tfidf":
            mem = TfidfRetrievalMemory(max_items=cfg.memory.max_items, retrieval_k=cfg.memory.retrieval_k)
        else:
            mem = ScratchpadMemory(max_items=cfg.memory.max_items)
        return AgentRuntime(config=cfg, llm=self.llm, tools=tools, memory=mem)

    def evaluate(self, task: str, answer: str) -> Dict[str, Any]:
        a = (answer or "").strip().lower()
        flags = {
            "empty": len(a) == 0,
            "generic": any(p in a for p in ["i can't", "cannot", "missing", "provide more details", "parser fallback"]),
            "mentions_max_steps": "max steps" in a,
        }
        score = 1.0
        if flags["empty"]:
            score -= 0.6
        if flags["generic"]:
            score -= 0.25
        if flags["mentions_max_steps"]:
            score -= 0.2
        score = max(0.0, min(1.0, score))
        return {"score": score, "flags": flags}

    def refine(self, cfg: AgentConfig, eval_report: Dict[str, Any], task: str) -> AgentConfig:
        new_cfg = cfg.model_copy(deep=True)
        if eval_report["flags"]["generic"] or eval_report["flags"]["mentions_max_steps"]:
            new_cfg.planner.max_steps = min(18, new_cfg.planner.max_steps + 6)
            new_cfg.planner.temperature = min(0.35, new_cfg.planner.temperature + 0.05)
            if new_cfg.memory.kind != "retrieval_tfidf":
                new_cfg.memory.kind = "retrieval_tfidf"
                new_cfg.memory.max_items = max(new_cfg.memory.max_items, 200)
                new_cfg.memory.retrieval_k = max(new_cfg.memory.retrieval_k, 6)
        t = task.lower()
        if any(k in t for k in ["csv", "dataframe", "pandas", "dataset", "table"]):
            if not any(ts.name == "csv_profile" for ts in new_cfg.tools):
                new_cfg.tools.append(ToolSpec(
                    name="csv_profile",
                    description="Load a CSV from a local path and print a quick profile (head, describe).",
                    inputs_schema={"type": "object", "properties": {"path": {"type": "string"}, "n_rows": {"type": "integer"}}, "required": ["path"]},
                ))
        return new_cfg

    def build_and_run(self, task: str, improve_rounds: int = 1, verbose: bool = True) -> Tuple[str, AgentConfig]:
        cfg = self.design(task)
        agent = self.instantiate(cfg)
        if verbose:
            print("\n==============================")
            print("META-AGENT: DESIGNED CONFIG")
            print("==============================")
            print(cfg.model_dump_json(indent=2))
        ans = agent.run(task, verbose=verbose)
        report = self.evaluate(task, ans)
        if verbose:
            print("\n==============================")
            print("EVALUATION REPORT")
            print("==============================")
            print(json.dumps(report, indent=2))
            print("\n==============================")
            print("FINAL ANSWER")
            print("==============================")
            print(ans)
        for r in range(improve_rounds):
            if report["score"] >= 0.85:
                break
            cfg = self.refine(cfg, report, task)
            agent = self.instantiate(cfg)
            if verbose:
                print("\n\n==============================")
                print(f"SELF-IMPROVEMENT ROUND {r+1}: UPDATED CONFIG")
                print("==============================")
                print(cfg.model_dump_json(indent=2))
            ans = agent.run(task, verbose=verbose)
            report = self.evaluate(task, ans)
            if verbose:
                print("\nEVAL:", json.dumps(report, indent=2))
                print("\nANSWER:\n", ans)
        return ans, cfg

meta = MetaAgent()

examples = [
    "Design an agent workflow to summarize a long meeting transcript and extract action items. Keep it concise.",
    "I have a local CSV at /content/sample.csv. Profile it and tell me the top 3 insights.",
    "Compute the monthly payment for a $12,000 loan at 8% APR over 36 months. Show the formula briefly.",
]

print("\n==============================")
print("RUNNING A QUICK DEMO TASK")
print("==============================")
demo_task = examples[2]
_ = meta.build_and_run(demo_task, improve_rounds=1, verbose=True)
We implement MetaAgent, which analyzes tasks, designs agent configurations, instantiates runtimes, evaluates performance, and refines the architecture as needed. We apply capability heuristics to dynamically choose tools, memory strategy, and planner depth. We then demonstrate the full build-and-run pipeline, including optional self-improvement, to complete the automated agent design lifecycle.
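As an independent sanity check on the demo task (not output from the agent), the standard amortization formula the agent is expected to apply gives roughly $376 per month for the stated loan:

```python
# Independent check of the demo loan task using the standard annuity formula:
#   M = P * r * (1 + r)^n / ((1 + r)^n - 1)
P = 12_000           # principal in dollars
annual_rate = 0.08   # 8% APR
n = 36               # term in months

r = annual_rate / 12                                # monthly rate
M = P * r * (1 + r) ** n / ((1 + r) ** n - 1)       # monthly payment
print(f"monthly payment: ${M:.2f}")                 # about $376
```

Having a known-good reference value like this makes it easy to spot when the designed agent's `calc` tool chain produces a wrong final answer.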
In conclusion, we demonstrated how a Meta-Agent can move from passive task execution to active architecture construction. We designed agents programmatically, instantiated them automatically, evaluated their outputs, and refined their configurations through a self-improvement loop. We showed that agentic systems can reason not only about tasks but also about their own structure, capabilities, and limitations. This approach pushes us toward self-evolving AI systems in which the architecture becomes adaptive, automated, and increasingly autonomous, bringing us closer to fully self-designing agent ecosystems.
The post How to Build a Self-Designing Meta-Agent That Automatically Constructs, Instantiates, and Refines Task-Specific AI Agents appeared first on MarkTechPost.
