If you do creative work in 2026, you're probably navigating at least two parallel systems at once: your own mind and a collection of AI models. If you’re like me, you’ve built habits around using tools like ChatGPT and Claude as instruments that interact with your brain.

Writers, especially, tend to have a very ambivalent relationship with AI. Some are resistant, skeptical, or just completely burned out on the discourse. At the same time, many of us use these systems every day to brainstorm, test ideas, sketch scenes, or explore new angles on our characters and worlds.

One of the things I’ve noticed over time is that almost everyone seems to talk up “thinking” models (the ones that deliberate more, use chain-of-thought, or internally “break down the problem”) as inherently superior. And yet, in my actual creative workflow, the models that help me most when I’m drafting and playing are almost always the instant, low-deliberation ones. They introduce a kind of chaos and risk that, for me, is far more creative than what the thinking models have to offer. As a writer with a background in psychology, I find that fascinating.

And once you look at both human cognition and AI from the angle of probability and variance rather than just “intelligence,” it starts to make sense why that might be true. If you’re a ChatGPT user missing 4o’s creativity, let me convince you to give instant models a try.


The mind as a probabilistic machine

The simplest way I’ve found to think about this is to treat the mind as a probabilistic machine.

Underneath all the human mess (emotion, memory, mood, dopamine chasing, all of it) your brain is constantly doing something very mechanical: it is predicting what might come next and selecting among options. You rarely “decide” in a single leap. You sample possibilities, consciously or not, and then some of them stick.

That means your creative process is not linear. It doesn’t move from Idea A to Draft B in a straight line. It drifts through a space of half-formed impressions, scraps of sensory detail, phrases you’ve heard before, and things you’re trying not to think about but absolutely are. Your mind doesn’t generate one idea. It generates a distribution of possible next steps and then collapses that into something you can type.

Large language models (LLMs) do something structurally similar. They take in context, then at each step sample from a probability distribution over possible tokens. They don’t “know” the right next word. They choose one based on how likely it is in the model’s internal representation of language. Then they do that again. And again. And again.
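To make the sampling loop concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not how any real model is implemented: a hypothetical hand-written table (`NEXT_WORD_PROBS`) stands in for the model, and it conditions only on the previous word, whereas a real LLM conditions on the whole context and scores tens of thousands of tokens at every step. The part that carries over is the loop itself: sample one option, append it, repeat.

```python
import random

# Hypothetical stand-in for a language model: for each previous word, a
# probability distribution over possible next words. Numbers are made up.
NEXT_WORD_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"bakery": 0.5, "street": 0.3, "signs": 0.2},
    "a": {"bakery": 0.4, "street": 0.6},
    "bakery": {"smelled": 0.7, "exhaled": 0.3},
    "street": {"signs": 1.0},
    "signs": {"<end>": 1.0},
    "smelled": {"<end>": 1.0},
    "exhaled": {"<end>": 1.0},
}


def sample_next(word: str, rng: random.Random) -> str:
    """Pick one next word, weighted by its probability given the previous word."""
    dist = NEXT_WORD_PROBS[word]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


def generate(seed: int, max_words: int = 10) -> list[str]:
    """Sample a word, append it to the context, and repeat until <end>."""
    rng = random.Random(seed)
    words = ["<start>"]
    while words[-1] != "<end>" and len(words) < max_words:
        words.append(sample_next(words[-1], rng))
    # Drop the bookkeeping markers before returning the "sentence".
    return [w for w in words if w not in ("<start>", "<end>")]


if __name__ == "__main__":
    for seed in range(3):
        print(" ".join(generate(seed)))
```

Different seeds can produce different sentences from the same table: nothing about the distribution changes, only which path through it gets sampled.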

Once you see both systems as sampling engines, the question shifts from “Which is smarter?” to “What kind of sampling is best for what I’m trying to do?”

That’s where overthinking comes in.


Human overthinking as premature convergence

When we talk about “overthinking,” we usually mean something like this: replaying the same scenarios, worrying about whether an idea will work, needing just a bit more clarity before starting, and somehow never starting. It feels like being thorough. Psychologically, it often looks more like what the research calls rumination: repetitive, sticky thought loops that don’t move you toward action and tend to reduce cognitive flexibility over time.

For creative work, overthinking almost always collapses into some version of premature convergence. You try to evaluate the quality of an idea before you’ve actually produced it. You won’t draft the scene until you’re sure it will “count,” and you won’t let yourself write the messy version first. Your internal editor steps in early, demanding neatness at a stage that fundamentally needs noise and experimentation.

In practice, that means you never really allow yourself to sample the distribution. You end up optimizing nothing, planning for a draft that doesn’t exist.

If you’ve ever sat with a blank page while your brain produces a thousand reasons not to write, you’ve felt this. The problem isn’t that you don’t have ideas. It’s that your evaluation mode is running at full power, and evaluation is supposed to come after divergence, not before it.


How AI can overthink in parallel

AI systems don’t ruminate emotionally, but they can still overthink in a way that has similar effects on your process.

On many platforms, there’s now a split between instant (low-reasoning) modes and deliberative (high-reasoning) ones. The latter will internally generate reasoning tokens, break a problem into steps, consider multiple approaches, and apply extra checks before answering.

That’s amazing for multi-step reasoning, technical explanation, or consistency checking. Those are tasks where more internal structure is exactly what you want.

But if you ask a reasoning-heavy mode to “just write the scene,” you’ve probably noticed what happens. Instead of dropping into sensory detail and voice, it may start by narrating its plan:

“First I’ll establish the setting, then I’ll introduce tension through a small anomaly…”

It means well. It thinks it’s being helpful. But what it’s actually doing is pulling you into meta at the exact moment when you’re trying to enter a scene.

If you already have a tendency to overplan and overjudge your own work, pairing that with a model that also wants to pre-plan everything is a recipe for joint paralysis. You end up in a loop where you’re managing the model’s thought process instead of reacting to actual prose.


Where guardrails and therapy-voice enter the picture

There’s another layer to this, one that writers talk about constantly but rarely articulate precisely: the therapy voice that sometimes surfaces in AI outputs.

Modern reasoning-heavy models aren’t just more deliberate; they’re also more cautious. Safety training emphasizes:

  • emotional grounding

  • supportive language

  • hedging and validation

  • avoidance of dark or risky tonal choices

  • checking whether “you’re okay”

This is appropriate for users who are distressed or vulnerable, sure, but I am a grounded adult who isn’t going to be harmed by anything an AI says. If you’re writing dark fiction or experimenting with morally ambiguous characters, that tonal stability becomes a form of interference.

In my experience, the “supportive counselor” persona shows up most reliably when the model is using deeper internal reasoning. The more it deliberates, the more it tries to protect you from your own material. It may reassure you when you don’t need it, soften tension in scenes that need sharpness, or misinterpret fictional intensity as personal distress.

Instant models tend to sidestep this because they do less internal scaffolding. They follow your tone rather than negotiating with it. They don’t stabilize emotions that weren’t unstable to begin with. They don’t interrupt a character’s meltdown to center your emotional wellbeing.

It’s the difference between play and policing.


Instant vs. thinking models as different phases of cognition

When I looked more closely at how these systems are designed, a pattern from cognitive psychology and creativity research dropped into place: the classic split between divergent and convergent thinking. Generative phases are about producing a wide range of options; evaluative phases are about narrowing, critiquing, and refining.

Instant models behave more like a divergent-thinking engine. They emphasize speed and directness. They don’t spend as much compute on internal scaffolding. They tend to follow “just do it” instructions literally. And critically, they usually give you more access to stochasticity, which refers to the randomness in how they sample their next tokens.

Reasoning-heavy models behave more like convergent-thinking engines. They interpret your request in structured ways. They try to break it down into sub-problems, handle constraints, and avoid error. They tend to stabilize tone, hedge more, and are optimized to be careful. That is a feature, but it’s not a feature you want at every moment.

Once you understand that, the mismatch becomes obvious. Drafting a scene, generating metaphors, or exploring a character’s voice is fundamentally a divergent task. It thrives on variance and surprise. Outlining a plot, checking continuity, or analyzing the implications of some fictional technology is fundamentally convergent. It benefits from deliberation and constraint.

We get in trouble when we hand a divergent task to a convergent engine and then wonder why everything feels flat and slow.


Stochasticity: why instant often feels more “creative”

The missing piece in a lot of discussions is how instant models actually use randomness, and why that matters for creative work.

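Language models don’t just produce the single most likely next word every time. If they did, you’d get repetitive, boring output very quickly. Techniques like temperature and top-p (nucleus sampling) deliberately introduce controlled randomness into the process. Temperature reshapes the distribution: lower values sharpen it toward the most likely token, while higher values flatten it so less likely tokens get a real chance. Top-p restricts sampling to the smallest set of top candidates whose combined probability reaches a threshold p, trimming away the least likely options. Together, they let the model reach past the peak of the distribution and explore less obvious continuations while staying coherent.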
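Here is a minimal Python sketch of how temperature and top-p reshape a distribution before a token is drawn, assuming a hypothetical five-word vocabulary with made-up scores (logits). The function names are illustrative, not any provider's API; hosted models commonly expose the same knobs as parameters named something like `temperature` and `top_p`, and the implementation behind them may differ.

```python
import math
import random


def apply_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits divided by temperature: low T sharpens, high T flattens."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens whose probabilities reach p, then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}


def sample(logits, temperature=1.0, top_p=1.0, rng=None):
    """Apply temperature, then top-p, then draw one token at random."""
    rng = rng or random.Random()
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Hypothetical scores for the word after "The bakery smelled ..."
logits = {"different": 3.0, "sweet": 2.5, "strange": 1.5, "like": 1.0, "burnt": 0.2}

if __name__ == "__main__":
    for t in (0.3, 1.0, 1.5):
        probs = apply_temperature(logits, t)
        print(t, {tok: round(pr, 3) for tok, pr in probs.items()})
    print(sample(logits, temperature=1.2, top_p=0.9))
```

The cold setting concentrates most of the probability on "different", while the hot one spreads more of it toward "strange" and "burnt". A temperature of 0 is usually treated as greedy decoding (always take the top token) and would need special handling here, since this sketch divides by the temperature.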

In practice, that means instant models, which often give you more direct access to these parameters (at least through APIs) and aren’t being implicitly dampened by extended reasoning, can be more stochastic where it counts. They take more small risks in word choice, imagery, and metaphor.

You can see this difference in style even without touching settings. A high-reasoning model might describe a scene as:

The street signs looked slightly off. The bakery smelled different, as if something had been changed.

An instant model, with a bit more stochastic freedom, might give you:

The street signs had new vowels. The bakery exhaled a smell like burnt orange peel and damp paper.

Both are plausible. Only one of them feels like something you might keep.

The point isn’t that instant models are “better writers.” The point is that they are less deterministic in ways that matter for creativity. They don’t spend as many tokens on pre-planning and hedging; more of their capacity goes directly into sampling from the space of possible continuations.

If your brain already has a strong taste filter (if you’re the kind of person who immediately knows which lines resonate and which don’t), you don’t need the model to protect you from bad options. You need it to show you possibilities.

Instant models do that. They bring the chaos.


Why fast and feral works for creative work

When I say I find instant modes better for my creative work, I’m describing a match between mode and phase.

When I’m doing big structural thinking (worldbuilding, political arcs, continuity audits), I absolutely use more deliberative models and sometimes higher reasoning effort. There, I want the model to behave more like a slow, cautious analyst. And occasionally? Instant is still more fun for this.

But when I’m in the business of making things up, I definitely want volatility. I want to see what happens if I let the model run fast and feral, and then I decide which sparks to keep. I don’t need the model to pretend it’s my conscience. I need it to be my loud, weird, occasionally brilliant co-conspirator who throws out too many ideas.

At one point, GPT-4o did this for me and many others. But long before 4o was removed from ChatGPT, GPT-5.1 Instant became my go-to for chaos, risk, and magic. I’ll be disappointed if 5.1 is removed and future models don’t offer a satisfying replacement, since I find 5.2 overly cautious. Every Claude model I’ve tried has also felt overly cautious and less creative, though Claude is a newer part of my workflow.

Given everything we know about divergent and convergent thinking, cognitive load, and rumination, it would actually be surprising if the high-deliberation modes felt better for most people at the generative stage. It’s just a bad fit between the tool’s default behavior and the job you’re asking it to do.


Separating generation from evaluation

The practical takeaway here is simple, but not always easy to stick to: separate generation from evaluation in your mind and in your choice of models.

Use instant models when you need:

  • chaos

  • speed

  • raw material

  • surprising phrasing

  • tone exploration

  • multiple options you can sort through

Use thinking models when you need:

  • structure

  • logic

  • continuity checking

  • sensitivity around complex topics

  • explanation and argument

If you catch yourself editing the prompt for ten minutes instead of reading any output, or asking the model to explain how it will solve the task instead of just solving it, that’s the joint human–AI version of rumination. That’s the moment to switch modes, or at least to demand: “No outline, no explanation. Just give me the thing.”


Where this goes next

I want Neural Ecstasy to be a place where we treat both human minds and AI models as probabilistic systems, look at how they mirror each other, and talk about what happens when we put them in the same creative loop.

In future posts, I’ll dig into specific models and how they differ, from the perspective of someone who uses multiple models and platforms. I’m interested in the controversies around different models and in social phenomena such as users calling their emotional support AIs “abusive” and “gaslighting,” words that have specific meanings when applied to human behavior. I read the online discourse religiously and have a lot to say about it all, though I am utterly uninterested in sentience debates and any type of doomerism.

For now, the only point I want to make is this:

If instant models feel better to you when you’re drafting, that’s a sign you’re intuitively matching a high-variance tool to a high-variance phase.

Thinking has its place. So does not thinking quite so hard.

Sometimes, the more you think, the more you sink, whether you’re a human staring at a blank page or a model burning reasoning tokens before generating the first sentence.

So if you’re looking for feral chaos and risk rather than excessive guardrails, and especially if you miss GPT-4o dearly, try an instant model. You may be surprised.