platform-notes

February 26, 2026 • 7 min read

Why Your Internal Platform Is Competing With ChatGPT

Conversational AI changed developer expectations: platform teams now compete on interaction cost and flow, not just capability.

I noticed it on a Tuesday, five minutes before standup.

I opened our internal platform docs, typed “create a new service with standard logging”, and got the familiar result: a wiki page last edited “recently” with twenty links and a diagram nobody could explain anymore.

Then I opened ChatGPT and asked the same thing.

In less than a minute I had a working scaffold, a logging config, and a short explanation of why our existing template was failing in Kubernetes. No ticket. No waiting.

That’s when it clicked. Our platform wasn’t competing with other platforms. It was competing with immediacy.

The promise, and the new baseline

Internal platforms are supposed to reduce accidental complexity. Golden paths for service creation. Standard observability. Secure networking. Repeatable CI.

The hidden trade is friction. Safety and consistency require opinionated abstractions. Abstractions create steps. Steps create context switches. Context switches kill momentum.

Conversational AI changed the baseline. Engineers now have a default interface that answers quickly, in the shape they asked for, without making them learn your portal’s mental model first.

Even when the answer is imperfect, it keeps them in flow. Flow has become the developer experience metric that beats everything else.

Why it matters

When engineers bypass your platform, they don’t just skip docs.

They skip the operating model: templates, policy checks, sanctioned libraries, incident patterns, cost controls. You get short term velocity and long term fragmentation.

You can see it in soft signals:

  • Doc page views decline even as you “improve” them.
  • Support channels get quieter, not because things are better, but because people stopped asking.
  • Retros start including “we copied a snippet from a chat” and “nobody owns this module”.

None of that looks like a platform outage. It looks like entropy.

Constraints you can’t wish away

Most platforms operate under the same constraints:

  • Security and compliance need audit trails and least privilege.
  • Legacy systems are real and weird for historical reasons.
  • Budget and headcount are finite, so scaling must come from automation.
  • Teams have mixed maturity, so you need guardrails and escape hatches.

Now add the newest constraint: developers expect a conversational workflow.

If your platform can’t participate in that workflow, engineers will route around it.

Our first attempts, and why they failed

We did the obvious things first.

We “fixed the docs”. Better structure. Better “Start Here” pages. More examples. Engagement still fell.

We “fixed the portal”. Fewer fields. Better defaults. A wizard. Adoption rose briefly, then plateaued. Seniors still bypassed it. Newer engineers still hit edge cases and ended up in support anyway.

The dead end was thinking our problem was UI polish.

The real problem was interaction cost. ChatGPT wasn’t better at platform tasks. It was better at keeping people moving.

The turning point: AI became the new developer UX

The turning point came from a boring correlation exercise.

When LLM usage (measured in anonymized, aggregated signals) spiked, platform portal usage dipped. People were getting answers elsewhere.

That gave me a cleaner framing: internal platforms used to compete on capability. Now they compete on conversational ergonomics.

So the adaptation isn’t “add a chatbot to the portal”. That’s mostly cosmetic.

The adaptation is to treat the platform as an execution layer that conversational tools can call.

Recent “tool protocol” discussions, including MCP style tooling, are interesting for this reason. The protocol will change. The architectural direction is stable.

Designing for conversational workflows

A conversational workflow is a loop:

  1. Ask in natural language.
  2. Get a plan plus a small set of actions.
  3. Execute those actions against real systems.
  4. Observe results and iterate.
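This loop can be sketched in a few lines. Everything here is illustrative: the `plan` function stands in for an LLM planning step, and the tool handler is a stub, not a real platform API.

```python
# A minimal sketch of the conversational loop. The planner and the tool
# implementation are stand-ins, not a real LLM or platform API.

def plan(request: str) -> list[dict]:
    # Stand-in for step 2: an LLM maps natural-language intent to tool calls.
    return [{"tool": "create_service", "args": {"service_name": "payments-api"}}]

TOOLS = {
    # Stand-in for step 3: a platform-owned handler that executes for real.
    "create_service": lambda args: {
        "status": "ok",
        "repo_url": f"git@example.internal:{args['service_name']}.git",
    },
}

def run(request: str) -> list[dict]:
    results = []
    for step in plan(request):          # step 2: plan plus a small set of actions
        handler = TOOLS[step["tool"]]   # step 3: execute against real systems
        results.append(handler(step["args"]))
    return results                      # step 4: observe results and iterate
```

The point of the sketch is the division of labor: the model proposes steps, but every execution goes through a handler the platform owns.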

If your platform is only a portal and docs, it can’t execute in that loop. The LLM will fill the gap by generating scripts, Terraform, YAML, or kubectl commands, and engineers will run them manually.

That’s where drift and policy bypass show up.

The better move is to expose your platform capabilities as explicit tools: well defined inputs, deterministic outputs, enforcement built in, and an audit trail by default.

Implementation: make the platform own the verbs

We started with three high leverage verbs:

  • Create a service scaffold with approved defaults.
  • Create or update an environment with required tags and guardrails.
  • Wire observability with standard dashboards and alerts.

Then we exposed them as a small internal “platform tools” API. Multiple conversational surfaces could call the same primitives: chat, IDE assistants, or a CLI wrapper.

The principle was strict: the assistant never decides permissions. Humans do. The platform enforces.
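One way to sketch that principle, assuming nothing about our actual API: a tool registry whose dispatcher checks authorization before any handler runs. The `is_allowed` function here is a placeholder for a real policy engine, and the tool names are illustrative.

```python
# Sketch: a tool registry where authorization is decided by the platform,
# never inferred by the assistant. `is_allowed` is a placeholder for a real
# IAM / policy-engine call; tool names are illustrative.

from typing import Callable

REGISTRY: dict[str, Callable[..., dict]] = {}

def tool(name: str):
    """Register a function as a callable platform tool."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

def is_allowed(actor: str, tool_name: str) -> bool:
    # Placeholder policy: destructive tools are restricted to admins.
    return tool_name != "delete_environment" or actor in {"platform-admin"}

def dispatch(actor: str, tool_name: str, args: dict) -> dict:
    # The assistant can only request; the platform decides and executes.
    if not is_allowed(actor, tool_name):
        raise PermissionError(f"{actor} may not call {tool_name}")
    return REGISTRY[tool_name](**args)

@tool("create_service")
def create_service(service_name: str) -> dict:
    return {"created": service_name}
```

Because every surface (chat, IDE assistant, CLI) goes through `dispatch`, the permission check cannot be skipped by any of them.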

Tool contract

We wrote explicit tool schemas (OpenAPI or JSON Schema works). Here’s a simplified example:

{
  "name": "create_service",
  "description": "Scaffold a new service using approved templates and defaults",
  "input_schema": {
    "type": "object",
    "properties": {
      "service_name": { "type": "string" },
      "language": { "type": "string", "enum": ["python", "node", "dotnet"] },
      "owner_team": { "type": "string" },
      "exposure": { "type": "string", "enum": ["internal", "public"] }
    },
    "required": ["service_name", "language", "owner_team"]
  }
}

This forces decisions you might have kept informal: naming, ownership, exposure classification. Those decisions are the platform.
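A schema is only a contract if something checks it. Here is a minimal, stdlib-only sketch of validating a tool call against the contract above; a real implementation would more likely use a JSON Schema library, and the error format here is my own invention.

```python
# Sketch: minimal stdlib validation of a tool call against the contract
# above. A production system would likely use a JSON Schema validator;
# the error-message format here is illustrative.

SCHEMA = {
    "required": ["service_name", "language", "owner_team"],
    "enums": {
        "language": {"python", "node", "dotnet"},
        "exposure": {"internal", "public"},
    },
}

def validate(args: dict) -> list[str]:
    """Return a list of violations; empty means the call is well-formed."""
    errors = [f"missing: {k}" for k in SCHEMA["required"] if k not in args]
    for field, allowed in SCHEMA["enums"].items():
        if field in args and args[field] not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}")
    return errors
```

Rejecting a malformed call before it reaches a handler is what turns the schema from documentation into enforcement.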

Tool handler with policy and audit

A handler is where your platform stops being advice and becomes an enforcement point:

from dataclasses import dataclass
from typing import Literal
import re

Language = Literal["python", "node", "dotnet"]
Exposure = Literal["internal", "public"]

@dataclass(frozen=True)
class CreateServiceRequest:
    service_name: str
    language: Language
    owner_team: str
    exposure: Exposure = "internal"

SERVICE_RE = re.compile(r"^[a-z][a-z0-9-]{2,30}$")

def create_service(req: CreateServiceRequest, actor: str) -> dict:
    if not SERVICE_RE.match(req.service_name):
        raise ValueError("service_name must be kebab-case, 3-31 chars")

    # allowed_public_teams, scaffold_repo, and emit_audit_event are platform
    # internals (policy lookup, template engine, audit log) defined elsewhere.
    if req.exposure == "public" and req.owner_team not in allowed_public_teams():
        raise PermissionError("team not approved for public exposure")

    repo_url = scaffold_repo(req.service_name, req.language, owner=req.owner_team)
    emit_audit_event(action="create_service", actor=actor, resource=req.service_name,
                     metadata={"repo_url": repo_url, "owner_team": req.owner_team})

    return {"repo_url": repo_url, "next_steps": ["open_pr", "run_ci", "deploy_dev"]}

Two things matter here:

Guardrails live in the execution path, not in a wiki.

Every action emits an auditable event tied to the human identity.

Without those, you’ve built a faster bypass, not a better platform.

Validation: what we measured

We avoided making up benchmarks. We tracked signals we could trust:

  • Time to first successful scaffold.
  • Support pings for “how do I start”.
  • Policy violations (missing tags, wrong exposure, missing observability wiring).

What improved first wasn’t speed. It was consistency. The assistant handled intent and clarification. The platform applied defaults deterministically.

Support load dropped because people stopped asking “where is the doc” questions. Security trust improved because audit trails were automatic.

Tradeoffs and alternatives

Portal first with an “AI search” box helps discovery, but it doesn’t fix execution drift.

Letting the LLM generate raw Terraform and hoping policy catches it later optimizes for speed, but it turns rollback into a blame game.

Banning LLMs is usually unenforceable and pushes work into the shadows.

“Platform as tools” balanced speed and correctness for us, but it comes with real costs: schema versioning, backward compatibility, careful deprecation, and proper SLOs for the tool API.

Production hardening: treat it like an execution surface

Once you connect conversational interfaces to real actions, you inherit a new class of risk.

  • Authorization must be external to the model. Never let the assistant infer permissions.
  • Destructive actions need explicit confirmation with a clear summary.
  • Rate limits matter. A helpful assistant can become an efficient outage generator.
  • Dry run modes (“plan”) are worth the time.
  • Tool calls need observability like any other production API: structured logs, metrics, traces.
  • Fallback behavior must be safe. When a tool fails, the assistant should not improvise a workaround that bypasses policy.
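The dry-run and confirmation points can be combined into one pattern: every tool call first produces a plan, and destructive plans refuse to execute without explicit confirmation. This is a sketch under my own naming; the `Plan` shape and tool here are illustrative.

```python
# Sketch of a plan-then-confirm pattern for destructive actions.
# The Plan dataclass and the example tool are illustrative, not a real API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    action: str
    summary: str       # human-readable description shown before confirmation
    destructive: bool

def plan_delete_environment(env: str) -> Plan:
    # "Dry run" step: describe what would happen without doing it.
    return Plan(
        action="delete_environment",
        summary=f"Delete environment '{env}' and all workloads in it",
        destructive=True,
    )

def execute(plan: Plan, confirmed: bool = False) -> str:
    # Destructive plans never run on the assistant's say-so alone.
    if plan.destructive and not confirmed:
        raise RuntimeError(f"confirmation required: {plan.summary}")
    return f"executed {plan.action}"
```

The assistant can call the planning step freely; only a human-supplied confirmation unlocks execution, and the summary gives them something concrete to confirm.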

The misconception to avoid is thinking this is “just a bot”. It’s an execution surface.

Lessons that held up

  • Flow beats features. If your platform interrupts flow, people will route around it.
  • Docs are not a primary UX anymore. They’re reference material the system should retrieve, not a maze humans must navigate.
  • The platform should own the verbs. Humans can express intent. The platform executes safely.
  • Guardrails must move into the handler. Guidelines don’t scale when speed increases.
  • Measure friction, not satisfaction. The real signal is what engineers do when nobody is watching.

Closing reflection

I still care about clean docs and a usable portal. But I no longer think they’re the center of platform value.

The center is whether the platform can safely do what engineers want, in the interface they will actually use.

ChatGPT didn’t replace our platform. It replaced the waiting and wandering around it.

If you’re on a platform team, you don’t need to fight that shift. You can harness it.

Expose the verbs. Make them safe. Make them observable. Make them composable.

Then let the conversation happen where engineers already are.

Final takeaways

  • Competing with conversational AI is mostly competing with interaction cost, not capability.
  • Treat the platform as an execution layer with tool contracts, not a portal with pages.
  • Put policy and audit in tool handlers, not in guidelines and review queues.
  • Measure friction in real workflows, then design for flow without losing control.
