In my previous post, I went over my history with AI-assisted coding and some general lessons learned. While you can definitely read this on its own, the extra context might be helpful.
Having said that, I spend a lot of time working with AI tools to build things, and I’ve said it before: both the models and the tools are getting better, and with that more and more capable. At the same time, it helps a great deal to build a general sense of what these models can and can’t do, and of how to communicate with them.
AI Tools Are Improving Themselves
A few days ago, Microsoft’s CEO Satya Nadella pointed out how they’re using OpenAI’s o1 model with Copilot to improve Copilot, making it slightly better and faster. The AI tool is “using itself” to improve. I don’t want to overstate this as some breakthrough, especially since other tools have been doing similar things for a while - take Aider (an open-source AI programming tool), whose recent update had 77% (!!) of its new code written by Aider itself.
Now, this isn’t the version of AI recursively improving upon itself that some see as the point where AI will take off. It falls short of that in a few ways. For starters, it’s slightly improving the tool it uses, not the model itself. Second, it lacks initiative - these actions are only taken by the model after the user asks it to. Still, I do think this is a glimpse of the trajectory things will take. Maybe we never get the hyper-intelligent AI model that recursively improves upon itself, but I do think this is the earliest version of it that we’re getting.
Current State and Limitations
Talking about early versions - I think that’s fair to say for AI-assisted coding (or AI pair programming) in general. There are a lot of tools out there that all do things in slightly different ways, and we’re all still figuring out the best strategy for talking to AI models and for dealing with their limitations.
I think one claim is safe to make: there has NEVER been a tool as capable as LLMs at turning natural language into actual code. We can discuss whether Anthropic’s Claude 3.5 Sonnet or OpenAI’s o1 models are best, but I think that’s beside the point. It’s pretty incredible to see, but there are still some key limitations:
Context: Despite recent massive increases in context windows, we still can’t simply feed entire codebases to AI models. At roughly 10 tokens per line, even a 60k-token context only covers about 6,000 lines of code. While this is a huge improvement over before, it’s insufficient for medium-to-large codebases. Tools are improving, though - Cursor has made great strides in this direction, but we’re not quite there yet.
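The arithmetic behind that estimate is simple enough to sketch. The 10-tokens-per-line figure is a rough heuristic, not a tokenizer measurement, so treat the numbers as back-of-envelope only:

```python
# Back-of-envelope estimate of how many lines of code fit in a context
# window, assuming ~10 tokens per line (a rough heuristic, not exact).
TOKENS_PER_LINE = 10

def lines_that_fit(context_tokens: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Approximate number of code lines that fit in a context window."""
    return context_tokens // tokens_per_line

print(lines_that_fit(60_000))   # 6000
print(lines_that_fit(200_000))  # 20000 - still short of many real codebases
```

Even at the larger window sizes, a project with hundreds of thousands of lines clearly can’t be pasted in wholesale, which is why tools invest in retrieval and filtering instead.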
Lazy coding: We’ve seen big steps here in the last 6 months, in two ways:
Models are getting better at full outputs: o1 models can consistently output much larger code blocks (700-800 lines reliably, potentially up to 2000)
Tools are getting smarter: Aider uses search-and-replace patterns while Cursor uses a fast specialized model to compare code line-by-line
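To make the search-and-replace idea concrete, here is a minimal sketch - not Aider’s actual edit format - of why it sidesteps lazy outputs: the model only has to emit the snippet to find and its replacement, never the whole file:

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply a single search-and-replace edit, failing loudly if the
    search text is missing or ambiguous (a common safety check)."""
    count = source.count(search)
    if count == 0:
        raise ValueError("search block not found in source")
    if count > 1:
        raise ValueError(f"search block is ambiguous ({count} matches)")
    return source.replace(search, replace, 1)

code = "def greet():\n    print('hi')\n"
patched = apply_search_replace(code, "print('hi')", "print('hello')")
```

Failing on zero or multiple matches matters: a silently misapplied edit is much worse than an edit the tool refuses and retries.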
Code quality: This might be the most important factor. If you’re new to programming and want to build something (think a snake game that plays itself), the AI can already vastly outperform you. Take Bolt.new or Replit Agent - both can build basic apps across multiple files with just a few instructions. But when you go deeper into system architecture and larger codebases, the limitations become clear. While models can write functioning code, they often miss important architectural considerations and struggle with complex design patterns.
Hallucinations: Models sometimes reference non-existent functions, modules, or APIs, especially with specialized libraries. This seems to happen less with better models, but for now all tools can really do is make undoing (Ctrl+Z) as easy as possible.
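One cheap guard against hallucinated APIs - a sketch of what a tool could do, not something any of these tools is confirmed to do - is to check that a referenced module attribute actually exists before trusting generated code:

```python
import importlib

def symbol_exists(module_name: str, attr: str) -> bool:
    """Check whether a module attribute referenced by generated code
    actually exists - a cheap guard against hallucinated APIs."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

print(symbol_exists("json", "dumps"))      # True
print(symbol_exists("json", "to_string"))  # False: likely hallucinated
```

This only catches references to installed libraries, of course - wrong arguments or subtly misused APIs still slip through, which is why the undo button remains the real safety net.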
Being up-to-date: Even the newest models lag behind current tech trends and framework versions. Adding documentation to the context, or letting these tools pull in webpages, helps to handle this.
Where Tools Are Heading
Tools are evolving in two interesting directions:
1. Working Alongside You
Models are getting much better at outputting complete solutions, and tools are getting smarter at breaking down large changes. Cursor is tackling context management with a fast reranker model that can filter 500k tokens down to the most relevant 8k. They’re also expanding to predict your next action across the entire editor.
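That filtering step can be imagined as a rank-then-trim loop. The sketch below uses naive keyword overlap as a stand-in for Cursor’s actual reranker model, which isn’t public - the point is only the shape of the pipeline: score every chunk, then keep the best until the token budget is spent:

```python
def rerank_and_trim(chunks, query_terms, token_budget=8_000):
    """Keep the highest-scoring code chunks until the token budget is
    spent. Scoring is naive keyword overlap - a stand-in for a real
    reranker model."""
    def score(chunk):
        words = set(chunk["text"].lower().split())
        return len(words & query_terms)

    ranked = sorted(chunks, key=score, reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        if used + chunk["tokens"] > token_budget:
            continue  # too big for the remaining budget, skip it
        kept.append(chunk)
        used += chunk["tokens"]
    return kept

chunks = [
    {"text": "alpha beta", "tokens": 5_000},
    {"text": "alpha beta gamma", "tokens": 4_000},
    {"text": "delta", "tokens": 2_000},
]
selected = rerank_and_trim(chunks, {"alpha", "beta", "gamma"})
```

A real reranker replaces the `score` function with a learned relevance model, but the budget-trimming logic around it stays essentially the same.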
GitHub Copilot has been focusing on integration, adding features like searching across GitHub entities and new skills in VS Code. They’re also working on custom models for enterprise customers to better understand company-specific patterns.
Aider takes an interesting approach by splitting the process: using an “Architect” model to describe how to solve the problem, and an “Editor” model to turn that solution into actual code changes. This works really well with the o1 models combined with a different model like Sonnet or DeepSeek, where o1 does the thinking and the other model does the actual writing of the code.
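The architect/editor split is easy to sketch as a two-call pipeline. Everything below is illustrative: `call_model` stands in for whatever LLM client you use, and the prompts and model names are made up, not Aider’s actual ones:

```python
from typing import Callable

def architect_then_edit(
    task: str,
    code: str,
    call_model: Callable[[str, str], str],  # (model_name, prompt) -> reply
    architect: str = "o1",
    editor: str = "sonnet",
) -> str:
    """Two-step pipeline: a reasoning model writes the plan, then a
    second model turns that plan into concrete code. `call_model` is
    injected so any LLM client can be plugged in."""
    plan = call_model(architect, f"Plan how to solve this task:\n{task}\n\nCode:\n{code}")
    return call_model(editor, f"Rewrite the code following this plan:\n{plan}\n\nCode:\n{code}")

# Usage with a stubbed client:
fake = lambda model, prompt: f"[{model}] reply"
print(architect_then_edit("rename foo to bar", "def foo(): pass", fake))
```

The appeal of the split is that each model does what it’s best at: the reasoning-heavy model never has to produce well-formed edits, and the editing model never has to do the hard thinking.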
2. Building Everything For You
Tools like Replit Agent and Bolt.new are trying something completely different - instead of helping you write code, they’re trying to build entire applications from scratch. These tools are making coding more accessible: you describe what you want to build, and they handle everything from choosing the tech stack to writing the code.
Here’s the catch though - while these tools are impressive at getting something up and running quickly, you tend to hit a wall pretty fast. As your application grows in size and complexity, small changes start introducing more and more errors. It’s not just about writing code anymore - it’s about understanding the entire codebase, managing dependencies, and keeping everything consistent.
This really feels like the early days of two different approaches to AI-assisted development. We’re seeing rapid improvements in both directions - either making pair programming tools smarter and more context-aware, or building autonomous agents that can handle entire projects. It’s fascinating to watch, and it’s not clear yet which approach (or combination of approaches) will ultimately prove most effective.