The Eloquent Trap
Hidden costs, subtle traps, and eloquent lies: why devs need to use LLMs with care — and managers need to know the difference.
A Short History of AI at the (Mechanical) Keyboard
It's been less than three years since ChatGPT made its debut to the general public, fundamentally changing (or even creating) the relationship society has with Artificial Intelligence. Platforms powered by Large Language Models (LLMs for short) like ChatGPT itself, Claude, and DeepSeek are close to ubiquitous in any workplace, officially or not. And almost instantly, companies realized these models seemed to be pretty good at writing code.
Now, several products focused on supporting software development are available on the market, like GitHub Copilot and Cursor. The former is provided by the largest code platform in the world, trained on billions of lines of open-source code (probably violating many licenses along the way), and backed by some serious productivity claims. The latter is more of a full IDE, offering different ways of operating the available LLMs (e.g. agent mode), with claims no less impressive attached to it.
And anecdotally, the results speak for themselves: I don't know a single software engineering team that isn't making extensive use of at least one LLM-based coding product right now.
I don't want to refute the productivity claims right away; that's a complex and nuanced topic, and the definition of the word alone deserves a full article. And I'll be the first to admit that these products feel good: it is beyond satisfying to watch that massive volume of code appear right in front of your eyes, with several micro-decisions you would otherwise have to make, all elegantly settled. A few adjustments, some copy and paste, a quick check of the docs and voilà, you have something running.
But amidst all these small tasks, I've noticed several productivity traps, which can substantially harm (if not kill) the potential productivity gains. Most of these traps are inherent limitations of the current state of the art in LLMs; others are not fundamental constraints, but are still very hard to solve because of how these enormous language models are trained.
In my experience, the only way to effectively introduce AI-based coding into your workflow is to understand the basic constraints of the underlying tech. Then you can operate it as a particularly ingenious autocomplete: useful not because it's capable of advanced reasoning or because it knows about software engineering, but because its main strength is being remarkably effective at speaking our language. It won't take the responsibility of thinking away from the human pressing the buttons, but it can be an efficient tool for translating instructions in natural language into code.
My Experience So Far
The Beginning – Copilot
I started using (and paying for) GitHub Copilot in August 2023, just after it became generally available (GA). The autocomplete mode (where you start typing a comment or a line of code and it spits out the rest for you) was basically the only one available, and I kept it on for most of my development. Especially in situations where I knew enough about the problem, I missed the ability to give the model more context, such as which packages to use, what the logical steps were, and fine-grained implementation details.
I started using ChatGPT so I could provide more context alongside the existing code and refine the output through successive prompts. It ended up replacing that other search engine, offering an easier way to pull information out of docs and public forums: plainer language, less reliance on keywords, and no dodging ads. In a way, I was using a rudimentary version of what would be called agent mode just a bit later. But I'd still lose a lot of productivity copying and pasting code for each interaction. And after a while, as the context window grew larger, the usefulness of the answers declined, with more hallucinations making it hardly worthwhile.
Cursor
It quickly became clear to me that, in order to produce meaningful code for non-trivial problems, you need several interactions: you start from a simpler idea and walk the AI agent towards the desired result. Each interaction was painful (lots of copy and paste), and I noticed that most of the alleged productivity was probably fading away through that friction alone.
That's where Cursor got my attention. The AI code editor promised AI features deeply integrated with the editing view: it seeks out existing files and adds them to the context, lets you switch between models, and can apply changes (like creating new files) or run commands. To be clear: it's not that the underlying LLM is better; it's actually the same one you'll find in other products. But the integration changes the game. Reducing friction and removing the constant context-resetting allows the model to actually assist, instead of being a very articulate clipboard.
The Pitfalls
1. Hard-to-Catch Errors (The Eloquent Trap Itself)
LLMs write code that looks right. The formatting is clean, the naming is convincing, and the logic seems elegant. But elegance and correctness are not the same. And because the output sounds like it’s coming from a senior engineer — full of best practices, comments, clean abstractions — it’s especially deceptive. The more eloquent it is, the more confident you feel in trusting it, even when the result is broken.
If you're not an expert in the domain, you're unlikely to catch those issues until they hit you later — in production or in debugging. Worse, you might never fully understand why something went wrong, because it seemed so "well-written" in the first place.
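To make this concrete, here is a hypothetical example of the kind of code that falls into this trap (mine, not taken from any specific model): a tidy, well-named pagination helper that reads like it came from a senior dev, but silently drops the last partial page.

```python
def paginate(total_items: int, page_size: int) -> list[range]:
    """Split total_items into page-sized ranges of item indices."""
    # Looks idiomatic, but floor division is the subtle bug:
    # a partial last page is silently dropped (this should be a ceiling).
    total_pages = total_items // page_size
    return [
        range(page * page_size, min((page + 1) * page_size, total_items))
        for page in range(total_pages)
    ]


# 101 items at 10 per page should yield 11 pages; this yields 10,
# and the 101st item never shows up anywhere.
print(len(paginate(101, 10)))  # 10
```

Nothing in the shape of that code hints at the problem. You only find it when someone asks where the last record went.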
2. It Won’t Tell You “No”
LLMs are trained to agree. Your input (your phrasing, assumptions, biases) steers the output heavily. Ask it, "Why is PostgreSQL not scaling well in our setup?" and it'll offer reasons why, even if that's not the real issue. Phrase the question with a bias, and the model will confidently lean into your framing rather than challenge it.
This can be particularly dangerous in decision-making. If you're exploring tradeoffs or trying to think critically, you may unknowingly trap yourself in an echo chamber of your own initial assumptions, reinforced by a confident model that never says "that might not be true." You have to get used to asking "what could go wrong?", and asking it explicitly. Ask the opposite questions, and point that same eloquence at disproving your initial premises.
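One habit that helps is mechanically inverting your own framing before you hit enter. A minimal sketch (the helper and the prompt wording are mine, just to illustrate the habit, not a recipe):

```python
def adversarial_prompts(assumption: str) -> list[str]:
    """Turn a leading assumption into prompts that push against it."""
    return [
        f"Steelman the opposite position: why might it be wrong that {assumption}?",
        f"What evidence would we need before acting on the claim that {assumption}?",
        f"What simpler explanations should be ruled out before concluding that {assumption}?",
    ]


# Instead of asking "Why is PostgreSQL not scaling well in our setup?",
# hand the model the assumption itself and make it attack it.
for prompt in adversarial_prompts("PostgreSQL is the reason our setup isn't scaling"):
    print(prompt)
```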
3. Context Size Limitations
Context windows are increasing — we’re seeing 128k+ token models on the market. But that doesn’t mean the models can understand 128k tokens with perfect recall. And in most real projects, you’re not only dealing with volume — you’re dealing with similarity. Utility functions, repeated patterns, common imports — all that repetition makes it hard for the model to isolate what matters.
Efficient prompting and good tooling around LLMs are less about more context and more about smart context selection. That's what most interfaces still struggle to get right. Without structure, larger contexts just mean more noise.
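To give a sense of what "smart context selection" can mean, here's a toy sketch. This is not how Cursor or any other product actually does it; real tools rely on embeddings, indexes, and editor signals, while this just ranks files by crude keyword overlap against a token budget.

```python
import re
from pathlib import Path


def overlap(prompt: str, text: str) -> int:
    """Crude relevance score: number of words shared by the prompt and a file."""
    prompt_words = set(re.findall(r"\w+", prompt.lower()))
    return len(prompt_words & set(re.findall(r"\w+", text.lower())))


def select_context(prompt: str, root: str, token_budget: int = 8_000) -> list[Path]:
    """Pick the most relevant files that fit a rough token budget."""
    files = [p for p in Path(root).rglob("*.py") if p.is_file()]
    ranked = sorted(
        files,
        key=lambda p: overlap(prompt, p.read_text(errors="ignore")),
        reverse=True,
    )
    selected, used = [], 0
    for path in ranked:
        cost = len(path.read_text(errors="ignore").split())  # very rough token count
        if used + cost <= token_budget:
            selected.append(path)
            used += cost
    return selected
```

The point is not the ranking function; it's that curating what goes into the window beats stuffing it.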
4. AI Replies in Team Communication
Here’s a trap I’ve seen more often lately: using an LLM to write a response in code reviews, Slack threads, or architecture debates — and not telling people it came from a model.
The problem? You’re outsourcing your thinking, and presenting the result as your own. If the message is full of technical jargon, structured like a real argument, and delivered confidently, your teammates will assume it's a well-thought-out position. But it’s not — it’s eloquent noise.
This shifts the burden of proof unfairly. Now others must either challenge a ghost or waste time validating a position that shouldn’t have made it into the discussion. You’ve gained writing efficiency, but lost collective decision-making quality. Worse, by hiding the LLM’s authorship, you make the response harder to critique — people assume you understand it and have reasons for it. A hallucination is easier to correct when it's flagged as such.
5. Predictions & Estimations
LLMs are not good at estimation. They can output numbers, sure. But these are not based on computation, benchmarking, or even rational heuristics — they’re pattern-matched from text. The statistical engine behind LLMs doesn’t “calculate” time or effort — it surfaces plausible-sounding completions based on historical examples, which may be wildly different in scope or constraints.
Ask it: "How long would it take to build a payment system using Stripe?" You might get “around two weeks” as a reply. But is that for a solo dev? A team of 10? With or without fraud checks? Is it considering legal, QA, CI/CD? It’s just... filler. Text that sounds like an answer, but isn’t one.
Use models to point out what you can't forget when estimating, not to size the effort.
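A hypothetical prompt along those lines, reusing the Stripe example from above, asks for the checklist and explicitly refuses the number:

```python
feature = "a payment system using Stripe"  # the example from above

prompt = (
    f"We are scoping {feature}. Do NOT give me a time estimate. "
    "Instead, list the scope questions, dependencies, and risks we should settle "
    "before anyone puts a number on it: team size, fraud checks, legal review, "
    "QA, CI/CD, rollout."
)
print(prompt)
```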
6. Copy and Pasting Is a Bottleneck
Every time you copy and paste between your editor and a chat window, you lose time and value. You lose the chance for the LLM to analyze your actual file tree, your build errors, your type errors, your lint results. It can't close the feedback loop unless it lives in your dev environment.
Agentic tools like Cursor, Continue, or Codeium allow the model to actually edit your codebase, inspect your project structure, run tests, analyze compiler output, and apply batch edits across files. And the speed of iteration is night and day. You go from 30–60s per prompt+paste+edit to 3–5s actions, sometimes even batched.
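A rough back-of-envelope using those ballpark figures (the number of interactions per day is my assumption; plug in your own):

```python
# Friction cost per day, using the rough per-action figures above.
interactions_per_day = 50      # assumption: a moderately chatty day of prompting
copy_paste_seconds = 45        # midpoint of the 30-60s copy/paste loop
integrated_seconds = 4         # midpoint of the 3-5s in-editor action

print(f"copy/paste loop: ~{interactions_per_day * copy_paste_seconds / 60:.0f} min/day")  # ~38
print(f"integrated loop: ~{interactions_per_day * integrated_seconds / 60:.0f} min/day")  # ~3
```

Half an hour a day of pure glue work is exactly where the promised productivity quietly leaks out.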
The quality of the model matters less than the interface around it. The real productivity gain comes not from better code, but from a smoother loop. And it highlights how friction kills utility — the more fluent your tool is inside your workflow, the more likely you are to use it well.
Final Thoughts
LLMs are a new layer of abstraction in programming — but not one that takes away thinking. It’s still you pushing the code. It’s you doing the review. It’s your name in the PR, and your team that will own the bugs.
Never transfer decision-making ownership to an LLM. It doesn’t have responsibility, and it doesn’t know your project’s context or constraints. It doesn’t care about security, latency, deadlines, or teammates. You do.
Treat LLMs like fluent but unreliable assistants: good at generating scaffolding, less good at deep judgment. And above all, remember — eloquence is not truth. Clean syntax is not a guarantee. A great model can sound like a senior dev, but underneath it’s just a very clever mirror.
If you don't catch the trap early, it will slow you down in the very name of productivity.
Some of these limitations will improve as tooling and interfaces catch up. Better context management, in-editor agents, deeper IDE integrations, and the myriad MCP integrations now surfacing will reduce friction and help align the model's strengths with developer needs. But even then, don't let the agent go wide without guidance. Prepare your prompt and context carefully; the model can't know what matters if you don't tell it.
And managers: invite your teams to explore LLMs. Don't just ask "why did this take 10 hours if Cursor could do it in 1h?" Not all code will ship faster, and writing code was never a good proxy for productivity anyway. Instead, fund the tools that make real productivity possible. Support the learning curve. Build a culture of understanding how and when LLMs make sense. If you want your team to benefit from this shift, you have to create space to experiment, and space to get it right.
Cover illustration by Mila Aguiar


