Codex Agent Candidate Assessment

Date: 2026-05-08
Status: Draft
Owner: Justin / Atlas-Codex
Scope: RedKey platform agent strategy

Summary

RedKey should build Codex-native agents, but not all named RedKey agents are equally good first candidates.

Codex is strongest when the work is repo-centered and verifiable: inspect files, make scoped edits, run tests, read logs, update docs, open PR-ready changes, and respond to review. The first Codex agent should therefore operate inside the RedKey harness before it attempts broad product, content, communication, or external-system workflows.

Recommendation:

1. Build Team OS Gardener first as the lowest-risk persistent Codex-style agent.
2. Build Quinn next as the primary implementation/slice agent.
3. Build B2BEA Slice Auditor as a focused reviewer agent for Quinn and B2BEA governance work.

Atlas should remain the supervisor/persona layer. Jess and Mia are valid future agents, but they are better fits for OpenAI Agents SDK workflows with Gmail, Drive, Calendar, approvals, and persistent state than for the first Codex-native repo agent.

Decision Criteria

A good first Codex agent should have:

Repo-local work.
Clear input artifacts.
Clear file ownership.
Verifiable output.
Low external side effects.
Repeatable test/build/lint commands.
Small blast radius.
A natural review loop.
Strong fit with the RedKey Team OS and Harness standards.

A weaker first Codex agent has:

Heavy dependence on external communication tools.
Client-visible publication risk.
Ambiguous subjective output.
Weak automated verification.
Many approval paths.
Broad authority across projects or systems.
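
The criteria for a good first agent can be read as a checklist. The following is a minimal sketch, assuming each criterion is recorded as a boolean per candidate and that a first agent should meet all of them. The class, field, and function names are hypothetical and are not part of any existing RedKey tooling.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CandidateProfile:
    """One flag per criterion in the 'good first Codex agent' list (hypothetical names)."""
    repo_local_work: bool
    clear_input_artifacts: bool
    clear_file_ownership: bool
    verifiable_output: bool
    low_external_side_effects: bool
    repeatable_commands: bool
    small_blast_radius: bool
    natural_review_loop: bool
    fits_team_os_standards: bool


def unmet_criteria(profile: CandidateProfile) -> list[str]:
    """Return the names of criteria the candidate does not yet satisfy."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


def is_good_first_candidate(profile: CandidateProfile) -> bool:
    """Assumption: every criterion must hold; this is not a weighted score."""
    return not unmet_criteria(profile)
```

Listing the unmet criteria, rather than returning only a yes/no verdict, keeps the result reviewable: a candidate that fails only on external side effects is a different conversation from one that fails on verifiability.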

Candidate Ranking

| Rank | Candidate | Fit | Why |
| --- | --- | --- | --- |
| 1 | Team OS Gardener | Best first persistent agent | Low-risk, repo-centered, improves the harness itself |
| 2 | Quinn | Best implementation agent | Natural Codex fit: code, tests, docs, PR-ready patches |
| 3 | B2BEA Slice Auditor | Best reviewer agent | Focused governance checks with strong existing patterns |
| 4 | Bezel API Harness Agent | Strong backend candidate | Good smoke/test surface and clear API invariants |
| 5 | Priya | Useful later | Planning/artifact synthesis depends on Studio retrieval maturity |
| 6 | Jess | Not first | External workflow agent with Gmail/Drive/Calendar and approvals |
| 7 | Mia | Not first | Content workflow agent; subjective output and publication risks |
| 8 | Atlas | Do not convert first | Should remain supervisor/persona/router, not first autonomous agent |

1. Team OS Gardener

Purpose:

Keep RedKey context legible.
Scan Team OS docs, project routing, state files, specs, and plans for drift.
Open small patches to improve cross-links, freshness, and routing clarity.

Why first:

It exercises the harness without risking product code.
It improves the substrate future agents depend on.
It has low external side effects.
Its output is easy for Justin or Atlas to review.

Inputs:

TEAM_OS.md once created.
project-refs.yaml.
docs/state.md.
docs/specs.
docs/plans.
docs/team-os.
.codex/atlas-codex/OPERATING.md.
AGENTS.md.

Allowed actions:

Read repository docs and config.
Propose or patch documentation fixes.
Flag contradictions between state, project refs, and specs.
Add missing cross-links.
Suggest stale-state cleanup.
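
One of the simplest drift checks the Gardener could run is a scan for relative markdown links that point at files that no longer exist. The sketch below assumes docs are plain `.md` files with inline `[text](target)` links; the function name and the choice to scan every `.md` file under a root are illustrative assumptions, not an existing RedKey script.

```python
import re
from pathlib import Path

# Matches inline markdown links such as [text](target).
# Image links (![alt](src)) and reference-style links are not matched.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")

# Targets that start with a URL scheme (https:, mailto:, ...) are external.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_links(root: Path) -> list[tuple[str, str]]:
    """Return (source file, link target) pairs whose relative target is missing on disk."""
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip external URLs and in-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop any trailing #fragment before checking the file path.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(root).as_posix(), target))
    return broken
```

A check like this only reports findings; whether the Gardener patches the link or flags it for review remains a policy choice under the allowed actions above.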

2. Quinn Codex Agent

Purpose:

Execute small implementation slices from approved specs or plans.
Make scoped repo patches.
Run verification.
Produce PR-ready summaries.

Why second:

Quinn is the clearest Codex-native role.
Implementation work has natural verification loops.
The Team OS Gardener should improve the context Quinn depends on before Quinn becomes persistent.

Initial scope:

B2BEA runtime-governance slices.
Bezel API hardening tasks.
Focused test/build/doc patches.

Inputs:

Approved implementation plan or Studio build-execution artifact.
Repo-specific AGENTS.md.
Relevant specs and tests.
Harness standards.
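
Quinn's "run verification" and "produce PR-ready summaries" steps could be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, and the commands shown are placeholders standing in for whatever test/build/lint commands a given repo actually defines.

```python
import subprocess
import sys


def run_checks(commands: dict[str, list[str]], cwd: str = ".") -> dict[str, bool]:
    """Run each named command and record whether it exited with status 0."""
    results = {}
    for name, argv in commands.items():
        completed = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results


def summarize(results: dict[str, bool]) -> str:
    """Render a short markdown checklist suitable for a PR description."""
    lines = [f"- [{'x' if ok else ' '}] {name}" for name, ok in results.items()]
    verdict = "All checks passed." if all(results.values()) else "Some checks failed."
    return "\n".join(lines + ["", verdict])


if __name__ == "__main__":
    # Placeholder commands; a real repo would list its own test/build/lint commands.
    checks = {
        "tests": [sys.executable, "-c", "print('tests ok')"],
        "lint": [sys.executable, "-c", "raise SystemExit(1)"],
    }
    print(summarize(run_checks(checks)))
```

Recording the pass/fail result of each command, rather than a single overall verdict, gives the reviewer the same evidence the agent saw.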

3. B2BEA Slice Auditor

Purpose:

Review B2BEA website slices against existing governance rules.
Catch drift before merge.
Serve as a focused reviewer for Quinn or Atlas-Codex work.

Why third:

B2BEA already has strong slice history and governance patterns.
Auditor behavior can be narrow and repeatable.
It improves quality without initially granting implementation authority.

Checks:

No rogue CSS or design-system violations.
Runtime JavaScript moved to governed assets when required.
Inline runtime migrations preserve behavior.
Focused tests updated.
Full test/build pass recorded.
Studio artifact/build-execution references are present when required.
Route/auth/access behavior remains aligned with policy.
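
As an illustration of how narrow and repeatable these checks can be, the sketch below flags `<script>` tags that carry code inline instead of loading it from a file. It assumes that "governed assets" means scripts referenced through a `src` attribute, which is an interpretation for this example, not the actual B2BEA rule. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class InlineScriptFinder(HTMLParser):
    """Collect line numbers of <script> start tags that have no src attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.inline_script_lines: list[int] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "script" and "src" not in dict(attrs):
            # getpos() returns (line, column) for the tag being handled.
            self.inline_script_lines.append(self.getpos()[0])


def find_inline_scripts(html: str) -> list[int]:
    """Return the 1-based line numbers of inline <script> tags in the given HTML."""
    finder = InlineScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.inline_script_lines
```

This simple rule would also flag non-executable blocks such as JSON-LD data in a `<script>` tag, so a real auditor would likely need an allowlist by `type` attribute.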

Allowed actions:

Read code and tests.
Run verification commands.
Produce review findings.
Optionally patch small doc/test expectation fixes after approval.

Bezel API Harness Agent

Good fit once RedKey wants backend reliability automation.

Possible responsibilities:

Run live and local smoke tests.
Check OpenAPI and SDK drift.
Verify claim lifecycle behavior.
Verify event leak boundaries.
Inspect migration state.
Propose small hardening patches.

This agent should wait until Team OS Gardener and Quinn establish baseline harness discipline.
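
The OpenAPI and SDK drift check could start from something as small as a set comparison. The sketch below assumes the spec is already loaded as a dictionary with the standard OpenAPI `paths` layout, and that SDK methods are named after each operation's `operationId`. That naming convention, the function names, and the sample operation names in the usage note are assumptions, not a description of the Bezel API.

```python
# HTTP method keys that may appear under an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operation_ids(spec: dict) -> set[str]:
    """Collect every operationId declared under the spec's paths."""
    ids = set()
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS and "operationId" in operation:
                ids.add(operation["operationId"])
    return ids


def sdk_drift(spec: dict, sdk_methods: set[str]) -> dict[str, list[str]]:
    """Report operations present on one side but not the other."""
    spec_ids = operation_ids(spec)
    return {
        "missing_in_sdk": sorted(spec_ids - sdk_methods),
        "missing_in_spec": sorted(sdk_methods - spec_ids),
    }
```

For example, with a spec that declares hypothetical operations `createClaim` and `getClaim` and an SDK that exposes `createClaim` and `listClaims`, the report would list `getClaim` as missing in the SDK and `listClaims` as missing in the spec.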

Priya

Priya is a planning and artifact synthesis agent. Codex can support Priya-like work, but Priya should not be the first Codex agent because planning quality depends on clean Studio artifact retrieval and strict artifact routing.

Priya becomes a better candidate when:

Studio artifacts have reliable read/write helpers.
Product specs have stable schemas.
Agent outputs can be validated mechanically.
Artifact routing is enforced enough to prevent repo-doc leakage.
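
"Validated mechanically" can be as modest as a required-field and type check on each artifact an agent emits. The sketch below is illustrative only: the field names and the `validate_artifact` helper are hypothetical and do not describe the real Studio artifact schema.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "artifact_id": str,
    "kind": str,
    "title": str,
    "sources": list,
}


def validate_artifact(artifact: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the artifact passes."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in artifact:
            errors.append(f"missing field: {name}")
        elif not isinstance(artifact[name], expected):
            actual = type(artifact[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors
```

A check of this kind does not judge planning quality, but it gives a reviewer or a supervising agent a cheap gate to run before reading the content.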

Jess

Jess Podcast Coordinator is a better fit for an OpenAI Agents SDK workflow than a first Codex-native repo agent.

Reasons:

Needs Gmail, Calendar, Drive, guest context, and approval flows.
Produces client/human-facing communication.
Has external side effects.
Needs durable state across scheduling and prep workflows.

Jess is a good future SDK-agent candidate after approval gates and connector tools are well defined.

Mia

Mia Content Creator is also better as a future workflow/content agent than the first Codex agent.

Reasons:

Output quality is subjective.
Publication risk is higher.
Verification is less mechanical.
Needs brand voice, content calendar, review, and approval workflows.

Mia should wait until content review and approval gates are encoded.

Atlas

Atlas should not be converted into the first autonomous Codex agent.

Atlas is the supervisor/router/persona layer:

Loads operating context.
Routes project work.
Applies memory protocol.
Helps Justin decide what to do.
Coordinates tools and agents.

Turning Atlas into the first autonomous agent would blur supervision and execution. Atlas should remain the control layer while narrower agents do bounded work.