4 Lessons Learned from 2 Years of AI-Assisted Coding
This post kicks off a series on AI-assisted coding. In this first part, I’ll cover the general lessons I’ve learned from two years of working together with AI to write code.
Over the past two years I’ve done a bunch of different freelance jobs, building automations that either leverage AI or are built with the help of AI. I can’t really code, but with each improvement in LLMs and each app I create with their help, I’m learning more, developing faster and handling more complexity. Before I dive into the how, I’d like to share the journey my projects have taken over the last 1.5–2 years. It started off small, then grew bigger and more elaborate over time, and the progress I’ve made comes down to a combination of four things:
LLMs getting better
Tooling around LLMs getting better
Learning how to work with LLMs
Learning about coding in general
One of the very first projects I did for myself was some basic classification of data using LLMs. A client had a scraped list of a few thousand homepages from all kinds of companies. They wanted to classify them so they could easily filter by company type.
In the past you probably would have had to classify them using keywords based on the contents of each page, but with LLMs you can simply write a decent prompt, feed it the contents of the webpage, and have the model decide how it should be classified. It might not be 100% accurate, but that was fine for this purpose.
So in short, I wrote a little script that went through the list, grabbed some text content from each webpage, passed it to GPT-3.5-turbo to classify what type of company it was, and stored the result.
A simple, one-time job that didn’t need to be optimized. Perfect to get started!
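For illustration, the core of that kind of script can be sketched roughly like this, assuming the OpenAI Python client; the category names, prompt and file handling are made up:

```python
# Minimal sketch of a homepage-classification script. The categories, prompt
# and CSV layout are illustrative, not the ones from the actual project.
import csv
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
CATEGORIES = "e-commerce, SaaS, agency, manufacturing, other"  # made-up labels

def classify_homepage(url: str) -> str:
    # Grab some text content from the page
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)[:4000]

    # Let the model decide which category fits best
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Classify this company as one of: {CATEGORIES}. Reply with the category only."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

with open("homepages.csv") as infile, open("classified.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        url = row[0]
        writer.writerow([url, classify_homepage(url)])
```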
That little project alone taught me about coding with the help of an LLM, scraping, and classifying data. It rolled over into other projects, like a no-code chatbot I built that also classified user messages with the help of LLMs.
This got bigger and bigger, until I was working on a project that could monitor any news site, automatically check for new articles, classify each article based on just a prompt, and show it on a front-end where the end user would see the filtered, classified results. This project involved much more elaborate scraping methods, workers for managing long-running tasks, schedules, and efficient database operations.
Hell, this would get too expensive to run on a model like GPT-4 anyway, so I took to training a much smaller model built specifically for classification (BERT). I’d have ChatGPT write all the code for me, pass back any errors, get stuck, waste six hours going in circles, realize I’d need to learn how to use git, figure out the problem, learn to prompt around it, and so on.
Basically, a LOT of trial and error. In the end, though, my code worked. The news monitoring project I just mentioned? I’ve had that code running for six months straight without issues.
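For a sense of what that smaller model involves, the inference side of that kind of classifier boils down to something like the sketch below, using Hugging Face’s transformers library. The labels are made up, and the model only becomes useful after fine-tuning it on labelled articles.

```python
# Rough sketch of the inference side of a small BERT classifier.
# Labels are hypothetical; fine-tuning on labelled data comes first in practice.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["politics", "tech", "sports", "other"]  # made-up categories

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

def classify(text: str) -> str:
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[logits.argmax(dim=-1).item()]
```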
Using AI to learn from my interactions with AI
NotebookLM is a great tool, and in this case I used it for analysis. I exported my conversation history from ChatGPT, Claude and Aider (usually my AI programming tool of choice) and loaded it into NotebookLM to analyze it at scale. So let’s dive into some of the results.
Learning how to work effectively with LLMs is, in short, knowing their pitfalls as well as their strengths. It's recognizing when you can pass a task to an LLM and trust it'll solve it, and when you're digging yourself into a hole. With how good these models have become at writing code, the quality of the output largely depends on the quality of the input. But how do you get that quality up? That depends on how these models function.
So I’d like to dive into a few of these pitfalls, starting with:
AI models “jump to solutions”
All of these language models share a similar "trait": whether it's Sonnet 3.5, GPT-4o or o1-mini, they all aim to please. This means that when you ask for a solution, they will almost always attempt to solve it straight away, even if you haven't given the right context, parts are unclear, or there's ambiguity in the question.
While working on some data analysis, I got an error that a column was missing: a change had been introduced somewhere along the way and some column names got messed up. I notified the model that I was receiving the error. The proposed solution? Add a check to see if the column is there; if not, log it and continue with the rest of the code. At least the code didn’t break anymore, but it did not solve my problem.
These sorts of moments make you realize that the model seemingly missed the intent behind my message. I wanted it to make sure the code referred to the right column, which I should have said. Sometimes the model picks up on the intent behind a message, sometimes it doesn’t. Getting used to giving context and sharing what the end goal is makes the chances of the model outputting proper code go up significantly. On top of that, be as specific as you can be.
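To make that concrete, here’s a rough reconstruction with made-up column names and data: the model’s check-and-continue band-aid first, followed by the kind of fix I actually wanted.

```python
import pandas as pd

# Hypothetical data where an upstream change renamed "revenue" to "rev"
df = pd.DataFrame({"region": ["EU", "US"], "rev": [100, 200]})

# The band-aid the model proposed: check, log, and move on.
# Nothing crashes anymore, but the analysis silently skips what I needed.
if "revenue" in df.columns:
    totals = df.groupby("region")["revenue"].sum()
else:
    print("Column 'revenue' missing, skipping this step")

# What I actually wanted: point the code back at the renamed column.
df = df.rename(columns={"rev": "revenue"})
totals = df.groupby("region")["revenue"].sum()
print(totals)
```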
The unintentional few-shot prompt
A few-shot prompt is when you give the model examples of input and output. This is generally a great prompting technique for getting consistent results out of a model. However, it can also be introduced by accident, with the model getting stuck in a bad pattern.
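For reference, a deliberate few-shot prompt looks something like this in OpenAI-style chat messages (the categories and examples here are made up):

```python
# A deliberate few-shot prompt: example input/output pairs steer the model
# toward a consistent output format. The examples are illustrative.
messages = [
    {"role": "system", "content": "Classify the company behind this webpage text."},
    {"role": "user", "content": "We build payroll software for small businesses."},
    {"role": "assistant", "content": "Category: SaaS"},
    {"role": "user", "content": "Family-run bakery, serving fresh bread since 1952."},
    {"role": "assistant", "content": "Category: Local retail"},
    {"role": "user", "content": "<webpage text to classify goes here>"},
]
```

The accidental version has exactly the same shape: a conversation full of “here’s a fix” / “that didn’t work” pairs becomes a set of examples the model starts to imitate.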
This can occur in different ways, and one of the most common is the model getting stuck in a loop while trying to fix a bug. So there’s a bug, and you need the model to solve it. It has access to the file and the error, and you ask it to propose a fix. The model proposes a solution, that doesn’t work, so you point that out, and the model tries again. It’s looking for a band-aid fix, and every model (even Sonnet and o1-preview) can have a tendency to fall into the following pattern:
1. User: Error X occurs
2. Model: Solution A
3. User: Error X still occurs
4. Model: Solution B
5. User: Error X still occurs
6. Model: Solution C
7. User: Error X still occurs
8. Model: Solution A
9. User: Error X still occurs
10. Model: Solution B
etc.
I’ve had a particularly silly version of this exact loop with o1-mini.
There are a few ways to deal with this issue. The first is to take a quick note of the attempted solutions, go back up to the message where the model first introduced the “pattern”, and edit it to add context that the fix isn’t any of the solutions you noted down. Basically, going back to where the pattern started and telling the model NOT to go down that path is a great way of handling it. In other words, go back to message 1 in the example above and edit your message to: “Error X occurs. I’ve tried solutions A, B and C and those didn’t work. Can you solve this for me?”
A different method is trying to break out of the bad pattern. If it’s a particularly stubborn bug, instruct the model to log its previous attempts and explain why they didn’t work. This can help guide the model to a solution that actually works, and it gives you, as the user, something clearer to point at.
LLMs struggle with large adjustments
As I shared above, if the instructions aren’t very clear or you get into a bad pattern, it’s easy to veer off course. Generally, if you’ve got very clear instructions for a model, this isn’t much of an issue. However, if there’s a large change, update or refactor you would like to implement, things get a lot more difficult.
“Refactor this” is a bad approach. First of all, you’ll quickly run into the output limit of the models, and there’s too much room for errors. Trying to do too much at once can turn your code into a mess, and leave you trying for hours to get it back to a working state again.
It’s much better to take a step back and figure out what you want. Use a model to brainstorm, especially if you’re trying to do something you don’t know much about. Ask for help, and give the LLM room to make suggestions without asking for code. It’s important to come up with a step-by-step plan, with checks along the way, that you can then execute.
o1-preview shines here. It’s by far the best model at planning ahead, taking into account little things you’ll come across along the way. It might not be the absolute best for making quick changes to code or small bug fixes as it’s very slow, but what it comes up with is generally very solid.
Once you have a plan, pass it on to your model of preference and get to work, making small adjustments along the way. Large refactors will remain a challenge for the foreseeable future, but it’s better than ever with tooling like Aider or Cursor combined with models like o1-mini and o1-preview. I’ve had decent success with Aider’s architect mode, which in simple terms uses o1-mini or o1-preview to write a plan and instructions, and then uses a cheaper, faster model to write the actual code and make the adjustments. The combination of o1-preview and DeepSeek Coder 2.5 is very slow, but it can get 90% of the way there, and then it just takes a bit of effort to fix the small mistakes. Definitely worth it; just make sure to grab a cup of coffee while these models are doing their thing.
My takeaways after two years
There have been a lot of changes over the past two years. It’s been a constant learning process with a significant amount of experimentation. One question keeps popping up in my mind: am I actually getting better at this, or are the AI models and tools just doing more of the heavy lifting? Honestly, it's probably a bit of both.
As I've gotten more comfortable working with these AI tools, they've also gotten a lot smarter. It's like we've been improving together. This tag-team effort has let me take on projects that would've made my head spin when I first started.
When I look back at my old conversations with AI models, I see a lot of the same issues cropping up. They're not as bad as they used to be, but they're still there:
The AI still jumps to solutions
It occasionally misses the point of what I'm asking and goes off on a tangent.
It still gets caught in bad patterns
The good news? These hiccups happen less often now, and when they do, they're usually easier to fix. Plus, I've gotten better at spotting them.
If there's one big lesson I've learned, it's that working with AI isn't about sitting back and letting the computer do all the work. While we’re slowly moving towards that point, so far the quality of the output has been dependent on the quality of the input. In other words: the better you know what you want, the better you can put the AI to work.
In the near future I’ll be sharing which tools I use and how I use them. Copying and pasting from ChatGPT or Claude only gets you so far, and some of these tools really push what you can achieve with these models. Subscribe or follow to keep up!