How to Build an Advanced Agentic AI System with Planning, Tool Calling, Memory, and Self-Critique Using OpenAI API

In this tutorial, we build an advanced agentic AI system using the OpenAI API and a hidden terminal prompt for the API key. We design the agent as a small pipeline of specialized roles: planner, tool-using executor, and critic, so that we can separate strategy, action, and quality control. We also integrate structured tools (calculator, mini knowledge-base search, JSON extraction, and file writing) so the agent can reliably compute, retrieve guidance, produce structured outputs, and save artifacts as deliverables.

!pip -q install -U openai


import os, json, re, math, hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List
from getpass import getpass
from openai import OpenAI


if not os.environ.get("OPENAI_API_KEY"):
   os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY (hidden): ").strip()


assert os.environ["OPENAI_API_KEY"], "OPENAI_API_KEY required"


client = OpenAI()
MODEL = "gpt-5.2"

We install the OpenAI SDK and import only what we need to keep the notebook lightweight and reproducible in Colab. We take the API key via getpass() so it stays hidden and never appears in the notebook output or code. We then create an OpenAI client and set the model string once so the rest of the system can reuse it consistently.

KB = [
   {"title": "Agent Protocol: Execution", "text": "Use tools only when necessary. Prefer short intermediate notes. Always verify numeric results."},
   {"title": "Policy: Output Quality", "text": "Final answers must include steps, checks, and deliverables. Emails must include subject and next steps."},
   {"title": "Playbook: Meeting Follow-up", "text": "Summarize decisions. List action items with owner and due date. Draft concise follow-up."},
]


def _safe_calc(expr: str):
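   # Whitelist digits, arithmetic operators, parentheses, and e/E for scientific notation (e.g. 1.5e3).
   # A separate "no letters" check would reject the e/E allowed here, so the whitelist is the only filter.
   allowed = set("0123456789+-*/().% eE")
   if any(ch not in allowed for ch in expr): return {"ok": False, "error": "Invalid characters"}
   try:
       # No names are exposed to eval; a bare "e" raises NameError and is reported below.
       # Note: "**" is still permitted, so very large powers can be slow to compute.
       val = eval(expr, {"__builtins__": {}}, {})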
       return {"ok": True, "expression": expr, "value": val}
   except Exception as e:
       return {"ok": False, "error": str(e)}


def _kb_search(query: str, k: int = 3):
   q = query.lower()
   scored = []
   for item in KB:
       hay = (item["title"] + " " + item["text"]).lower()
       score = sum(1 for tok in set(re.findall(r"\w+", q)) if tok in hay)
       scored.append((score, item))
   scored.sort(key=lambda x: x[0], reverse=True)
   return {"ok": True, "results": [it for _, it in scored[:k]]}


def _extract_json(text: str):
   m = re.search(r"\{.*\}", text, flags=re.DOTALL)
   if not m: return {"ok": False, "error": "No JSON found"}
   try:
       return {"ok": True, "json": json.loads(m.group(0))}
   except Exception as e:
       return {"ok": False, "error": str(e), "raw": m.group(0)[:1500]}


def _write_file(path: str, content: str):
   os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
   with open(path, "w", encoding="utf-8") as f: f.write(content)
   sha = hashlib.sha256(content.encode()).hexdigest()[:16]
   return {"ok": True, "path": path, "sha16": sha, "bytes": len(content.encode("utf-8"))}

We define a small internal “knowledge base” to simulate playbooks or team documentation that the agent can consult via a tool call. We implement tools that return structured dictionaries to keep tool outputs machine-readable and robust. We include a safe calculator, a keyword-based KB search, a JSON extractor for structured parsing, and a file writer to save final deliverables as artifacts.
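The calculator above relies on a character whitelist plus eval(). As a stricter alternative, here is a minimal sketch that walks the parsed syntax tree and evaluates only numeric literals and a fixed set of operators. The name ast_calc and the operator tables are our own illustration and are not part of the tutorial code; it returns the same dictionary shape as the other tools.

```python
import ast
import operator

# Hypothetical helper for illustration: evaluates arithmetic by walking the AST.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def ast_calc(expr: str):
    """Evaluate +, -, *, /, % and unary signs over int/float literals only."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Exact type check so that booleans and strings are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, and "**" all end up here.
        raise ValueError(f"Unsupported syntax: {type(node).__name__}")

    try:
        tree = ast.parse(expr, mode="eval")
        return {"ok": True, "expression": expr, "value": _eval(tree)}
    except Exception as e:
        return {"ok": False, "error": str(e)}


print(ast_calc("2 * (3 + 4)"))
print(ast_calc("__import__('os')"))
```

Because unsupported nodes are rejected before anything runs, function calls and exponentiation never execute. The trade-off is that each new operator has to be added to the tables explicitly.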

TOOLS = {
   "calc": lambda expression: _safe_calc(expression),
   "kb_search": lambda query, k=3: _kb_search(query, int(k)),
   "extract_json": lambda text: _extract_json(text),
   "write_file": lambda path, content: _write_file(path, content),
}


TOOL_SCHEMAS = [
   {"type": "function","function":{"name":"calc","description":"Safely compute a numeric expression.","parameters":{"type":"object","properties":{"expression":{"type":"string"}},"required":["expression"]}}},
   {"type": "function","function":{"name":"kb_search","description":"Search internal mini knowledge base.","parameters":{"type":"object","properties":{"query":{"type":"string"},"k":{"type":"integer","default":3}},"required":["query"]}}},
   {"type": "function","function":{"name":"extract_json","description":"Extract and parse first JSON object from text.","parameters":{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}}},
   {"type": "function","function":{"name":"write_file","description":"Write content to a file path.","parameters":{"type":"object","properties":{"path":{"type":"string"},"content":{"type":"string"}},"required":["path","content"]}}},
]


@dataclass
class AgentState:
   goal: str
   memory: List[str] = field(default_factory=list)
   trace: List[Dict[str, Any]] = field(default_factory=list)


def chat(messages, tools=None, tool_choice="auto", temperature=0.2):
   kwargs = dict(
       model=MODEL,
       messages=messages,
       temperature=temperature,
   )
   if tools is not None:
       kwargs["tools"] = tools
       kwargs["tool_choice"] = tool_choice
   return client.chat.completions.create(**kwargs)


def run_tool(name, args):
   fn = TOOLS.get(name)
   if not fn: return {"ok": False, "error": f"Unknown tool: {name}"}
   try:
       return fn(**args)
   except Exception as e:
       return {"ok": False, "error": str(e), "args": args}

We register our Python tools in a mapping so we can call them by name during function calling. We declare tool schemas so the model can call tools with correct argument structures. We define AgentState to store the goal, memory, and tool-call trace, allowing us to inspect what happened and debug the agent’s behavior. We implement a chat() wrapper that includes tool_choice only when tools are provided, since the API rejects a request that sets tool_choice without tools with a 400 error. Finally, run_tool() catches exceptions and returns them as structured errors, so a failing tool does not crash the agent loop.
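To make the dispatch step concrete, here is a small offline sketch of name-based tool dispatch from a JSON-encoded argument string, which is the form in which function-calling arguments arrive. DEMO_TOOLS and dispatch are hypothetical names used only for this illustration; the registry mirrors the shape of the TOOLS mapping but contains a single toy tool.

```python
import json

# Stand-in registry for illustration; it has the same shape as a name -> callable mapping.
DEMO_TOOLS = {
    "add": lambda a, b: {"ok": True, "value": a + b},
}


def dispatch(name: str, raw_args: str):
    """Parse JSON-encoded arguments and call the named tool without raising."""
    fn = DEMO_TOOLS.get(name)
    if fn is None:
        return {"ok": False, "error": f"Unknown tool: {name}"}
    try:
        args = json.loads(raw_args or "{}")
    except json.JSONDecodeError as e:
        return {"ok": False, "error": f"Bad arguments JSON: {e}"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "Arguments must be a JSON object"}
    try:
        return fn(**args)
    except Exception as e:  # missing or unexpected keyword arguments, tool failures
        return {"ok": False, "error": str(e), "args": args}


print(dispatch("add", '{"a": 2, "b": 3}'))
print(dispatch("add", '{"a": 2}'))
print(dispatch("missing", "{}"))
```

Returning an error dictionary in every failure path means the result can always be serialized and sent back to the model, which can then correct its arguments on the next turn.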

PLANNER_SYS = """You are a senior planner.
Return STRICT JSON with keys:
objective (string), steps (array of strings), tool_checkpoints (array of strings)."""


EXECUTOR_SYS = """You are a tool-using executor.
Use tools when needed. Keep intermediate notes short.
When done, return:
1) DRAFT output
2) Verification checklist"""


CRITIC_SYS = """You are a critic.
Given goal + draft, return:
- Issues (bullets)
- Fixes (bullets)
- Improved final answer (clean)"""


def plan(state: AgentState):
   r = chat(
       [{"role":"system","content":PLANNER_SYS},{"role":"user","content":state.goal}],
       tools=None,
       temperature=0.1,
   )
   txt = r.choices[0].message.content or ""
   parsed = _extract_json(txt)
   if not parsed.get("ok"):
       return {"objective": state.goal, "steps": ["Proceed directly (planner JSON parse failed)."], "tool_checkpoints": []}
   return parsed["json"]


def execute(state: AgentState, plan_obj: Dict[str, Any]):
   msgs = [
       {"role":"system","content":EXECUTOR_SYS},
       {"role":"user","content":f"GOAL:\n{state.goal}\n\nPLAN:\n{json.dumps(plan_obj, indent=2)}\n\nMEMORY:\n" + "\n".join(f"- {m}" for m in state.memory[-10:])}
   ]
   for _ in range(12):
       r = chat(msgs, tools=TOOL_SCHEMAS, tool_choice="auto", temperature=0.2)
       msg = r.choices[0].message
       tool_calls = getattr(msg, "tool_calls", None)
       if tool_calls:
           msgs.append({"role":"assistant","content":msg.content or "", "tool_calls": tool_calls})
           for tc in tool_calls:
               name = tc.function.name
               args = json.loads(tc.function.arguments or "{}")
               out = run_tool(name, args)
               state.trace.append({"tool": name, "args": args, "out": out})
               msgs.append({"role":"tool","tool_call_id": tc.id, "content": json.dumps(out)})
           continue
       return msg.content or ""
   return "Executor stopped (iteration limit reached)."

We define three role prompts to separate the agent’s responsibilities: the planner produces a structured plan, the executor performs the task and uses tools as needed, and the critic improves the final output. We implement plan() to request strict JSON, falling back to a single-step plan if parsing fails. We implement execute() as a loop that calls the model, detects tool calls, runs them in Python, and feeds their outputs back to the model, stopping after at most 12 iterations. This creates a true tool-using agent rather than a single-shot text generator.
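The control flow of the executor loop can be traced offline with a scripted stand-in for the model. The sketch below is an assumption-laden illustration, not the tutorial's code: fake_model, calc_tool, and run_loop are hypothetical names, plain dictionaries replace the SDK response objects, and the toy tool only multiplies two integers.

```python
import json


def fake_model(messages):
    """Scripted stand-in for the chat call: requests one tool, then answers."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"content": None, "tool_calls": [
            {"id": "call_1", "name": "calc",
             "arguments": json.dumps({"expression": "6*7"})}
        ]}
    value = json.loads(tool_msgs[-1]["content"])["value"]
    return {"content": f"The answer is {value}.", "tool_calls": None}


def calc_tool(expression):
    # Toy tool for the demo: only handles "a*b" with integers.
    a, b = expression.split("*")
    return {"ok": True, "value": int(a) * int(b)}


def run_loop(goal, max_iters=5):
    msgs = [{"role": "user", "content": goal}]
    trace = []
    for _ in range(max_iters):
        reply = fake_model(msgs)
        if reply["tool_calls"]:
            # Record the assistant turn, run each requested tool, feed results back.
            msgs.append({"role": "assistant", "content": reply["content"] or ""})
            for tc in reply["tool_calls"]:
                args = json.loads(tc["arguments"])
                out = calc_tool(**args)
                trace.append({"tool": tc["name"], "args": args, "out": out})
                msgs.append({"role": "tool", "tool_call_id": tc["id"],
                             "content": json.dumps(out)})
            continue
        # No tool calls means the model has produced its final text.
        return reply["content"], trace
    return "Stopped (iteration limit reached).", trace


answer, trace = run_loop("What is 6*7?")
print(answer)
print(trace)
```

The loop ends on the first reply without tool calls, and the iteration cap guards against a model that keeps requesting tools indefinitely.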

def critique(state: AgentState, draft: str):
   r = chat(
       [{"role":"system","content":CRITIC_SYS},{"role":"user","content":f"GOAL:\n{state.goal}\n\nDRAFT:\n{draft}\n\nTRACE:\n{json.dumps(state.trace, indent=2)[:9000]}"}],
       tools=None,
       temperature=0.2,
   )
   return r.choices[0].message.content or draft


def run_agent(goal: str):
   state = AgentState(goal=goal)
   state.memory.append("Use kb_search if you need internal guidance or formatting playbooks.")
   plan_obj = plan(state)
   draft = execute(state, plan_obj)
   final = critique(state, draft)
   return {"plan": plan_obj, "draft": draft, "final": final, "trace": state.trace}


demo_goal = """
From this transcript, produce:
A) concise meeting summary
B) action items as JSON array with fields: owner, action, due_date (or null)
C) follow-up email (subject + body)
D) Save output to /content/meeting_followup.md using write_file


Transcript:
- Decision: Ship v2 dashboard on March 15.
- Risk: Data latency might spike; Priya will run load tests.
- Amir will update the KPI definitions doc and share with finance.
- Next check-in: Tuesday. Owner: Nikhil.
"""


result = run_agent(demo_goal)
print(result["final"])

We implement critique() to review the draft and produce a polished final response, passing the tool trace (truncated to 9,000 characters) as additional evidence for debugging and accountability. We implement run_agent() to orchestrate the full loop: initialize state, plan, execute with tools, then critique and finalize. Finally, we provide a demo goal that forces a realistic deliverable: a summary, structured action items as JSON, a follow-up email, and saving the output to a file via the write_file tool.
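Slicing the serialized trace at a fixed character count can cut an entry in half. One possible alternative, sketched here under our own assumptions, renders each tool call on one line and drops whole entries once a budget is reached. compact_trace and its parameters are hypothetical and are not part of the tutorial code; the entries are assumed to have the tool, args, and out keys recorded by the executor.

```python
import json


def compact_trace(trace, max_chars=9000, max_field=200):
    """Render one line per tool call, omitting whole entries once the budget is hit."""
    lines = []
    used = 0
    for i, entry in enumerate(trace, start=1):
        args = json.dumps(entry.get("args", {}))[:max_field]
        out = entry.get("out", {})
        status = "ok" if out.get("ok") else f"error: {out.get('error', 'unknown')}"
        line = f"{i}. {entry.get('tool')}({args}) -> {status}"
        if used + len(line) + 1 > max_chars:
            # Summarize what was left out instead of cutting mid-entry.
            lines.append(f"... {len(trace) - i + 1} more call(s) omitted")
            break
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)


demo_trace = [
    {"tool": "calc", "args": {"expression": "1+1"}, "out": {"ok": True, "value": 2}},
    {"tool": "write_file", "args": {"path": "x"}, "out": {"ok": False, "error": "boom"}},
]
print(compact_trace(demo_trace))
```

This keeps the critic's input readable and surfaces failed calls explicitly, at the cost of hiding the full tool outputs.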

In conclusion, we implemented a practical agentic architecture that cleanly separates planning, tool execution, and critique-based refinement. We connected the model to real tools with strict schemas, recorded a transparent tool trace for debugging, and produced artifacts by saving the final output to a file. With this structure, we can extend the agent to more production-grade workflows by adding tool retry policies, parallel sub-agents, richer memory (vector + symbolic), and evaluation harnesses to measure how well the agent plans, uses tools, and improves output over time.
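As one example of the extensions mentioned above, a tool retry policy can be layered on top of any tool that returns the {"ok": ...} dictionaries used throughout this tutorial. The wrapper below is a minimal sketch under that assumption; with_retries, attempts, and base_delay are hypothetical names, and the exponential backoff schedule is just one reasonable choice.

```python
import time


def with_retries(fn, attempts=3, base_delay=0.0):
    """Wrap a tool so it is retried until it returns ok=True or attempts run out."""
    def wrapped(**kwargs):
        last = {"ok": False, "error": "not attempted"}
        for i in range(attempts):
            try:
                last = fn(**kwargs)
            except Exception as e:
                last = {"ok": False, "error": str(e)}
            if last.get("ok"):
                return last
            if i < attempts - 1:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                time.sleep(base_delay * (2 ** i))
        return {**last, "attempts": attempts}
    return wrapped


# Toy flaky tool for the demo: fails twice, then succeeds.
calls = {"n": 0}

def flaky(x):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return {"ok": True, "value": x * 2}


print(with_retries(flaky)(x=21))
```

A wrapped tool keeps the same keyword-argument interface, so it could replace the original entry in a name-to-callable registry without changes to the dispatch code.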

The post How to Build an Advanced Agentic AI System with Planning, Tool Calling, Memory, and Self-Critique Using OpenAI API appeared first on MarkTechPost.