Can AI help with tedious dev tasks like cleaning up your codebase? I tested Cursor, one of the most promising AI coding assistants, to find out.
The results were impressive at first, but they showed that the tool isn’t yet ready to take regular development tasks off a dev team’s plate.
Why Cursor?
At RST Software, we constantly test emerging tools to see how they fit into real-world development.
Cursor caught my attention because of its tight integration with your codebase and its promise of context-aware “vibe coding” – writing code by describing what you want, not how to do it (see the F.A.Q.s at the bottom of the page for more details).
So I decided to throw it a simple but time-consuming task:
Refactor all files with the client suffix into a client folder and remove the redundant suffixes from filenames and class names.
This is the kind of mechanical task that should be perfect for an LLM – repetitive, uncreative, and easy to define. Or so I thought.
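To make the scope concrete, here is roughly what a single move-and-rename was supposed to look like. The file and class names below are illustrative, not taken from the actual codebase:

```typescript
// Before: geocoder.client.ts, sitting next to the other modules
export class GeocoderClient {
  // ... client methods
}

// After: moved to client/geocoder.ts, suffix dropped from the filename and the class
export class Geocoder {
  // ... same methods, untouched
}
```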
The prompt
I kept it simple:
Move all modules with the suffix 'client' to a subfolder named 'client'. Then remove the suffixes from the filenames and classes.
Cursor responded quickly. At first, I was impressed.

What Cursor got right
Cursor actually performed better than I expected. Here’s what it surprised me with:
- It generated a working mv script that moved all the files in one go. I had to give it permission to execute this command.
- It renamed files and classes, stripping the redundant suffixes. I didn’t have time to verify each change as it went, so I approved them.

At first glance, the results were mostly what I expected – roughly 80% of the way there. I knew the imports still needed fixing, since the paths to many files had changed, so for the moment I ignored the errors that lit up across literally the entire project.
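The breakage itself was unsurprising and mechanical: once a file moves into the client subfolder and loses its suffix, every relative import pointing at its old location goes stale. Sticking with the illustrative names from above:

```typescript
// companies.service.ts – before the move, this import resolved fine:
import { GeocoderClient } from './geocoder.client';

// After the move and rename, it has to be updated to:
import { Geocoder } from './client/geocoder';
```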
But then things started to break – and not just build-wise.
What Cursor got wrong
I was surprised by the number of unnecessary additions to the code – all of them completely unwarranted, as I hadn’t prompted the model to make any extra changes:
- When moving a service, e.g. CompaniesService, Cursor added new empty methods like getEmployees() and getCompany() to my services – functions I never asked for. Did it decide they might be useful? If so, for what? It didn’t know my domain or business logic, so that was a bold move.
- It rewrote my unit tests completely, throwing away my carefully crafted test cases and replacing them with sprawling, less precise versions. I couldn’t trust those new tests, and there is no way I would let them run.
- It inserted useless safety checks, like verifying a string before calling replaceAll() on it – adding complexity without value and only obfuscating the code.
- It refactored critical logic incorrectly, like moving the fallback logic in a GeocoderService out of its catch block, which broke the error handling: instead of reacting to the failure, the service now returned an empty response and only later checked whether its length was > 0. A rough reconstruction of this change, together with the replaceAll() guard, follows this list.
- Worst of all: it missed parts of the renaming entirely, causing the build to fail.
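To show what I mean by those last two problems, here is a rough sketch of the patterns involved. The variable, method and property names are hypothetical – only the shape of the change reflects what Cursor actually did. First, the pointless guard around replaceAll():

```typescript
// `name` is already typed as string – the original one-liner:
const name: string = 'Acme Inc';
const slug = name.replaceAll(' ', '-');

// Cursor’s “safer” version – extra branching, zero extra safety:
const slugGuarded =
  typeof name === 'string' && name.length > 0 ? name.replaceAll(' ', '-') : name;
```

And the broken fallback in the GeocoderService. In the original, the fallback provider ran only when the primary lookup actually threw; after the refactor, errors are silently swallowed into an empty result, and the fallback is triggered by a length check – which also fires on legitimate empty responses:

```typescript
// Hypothetical shape of the service, for illustration only
interface GeocoderBackend {
  lookup(address: string): Promise<string[]>;
}

class GeocoderService {
  constructor(
    private primary: GeocoderBackend,
    private fallback: GeocoderBackend,
  ) {}

  // Original pattern: fallback only when the primary lookup actually fails
  async geocode(address: string): Promise<string[]> {
    try {
      return await this.primary.lookup(address);
    } catch {
      return this.fallback.lookup(address);
    }
  }

  // After the refactor: errors disappear into an empty array, and an
  // empty-but-valid primary response now also triggers the fallback
  async geocodeRefactored(address: string): Promise<string[]> {
    const results = await this.primary.lookup(address).catch(() => []);
    if (results.length > 0) {
      return results;
    }
    return this.fallback.lookup(address);
  }
}
```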
The moment I realized I couldn’t trust the changes without reviewing everything line-by-line was the moment this stopped being a time-saver.
Second attempt: tighter prompt, same problem
Learning from the previous mistakes, I decided to start over. After rolling back changes with git stash, I gave Cursor another shot – this time explicitly telling it to do nothing except what I asked.
I instructed it to correct the import paths in the files.
It worked a bit better… until it didn’t.
At first, it followed the prompt. Then it started drifting again:
- Changing file content instead of just fixing imports
- Adding code I didn’t ask for
- Losing context from earlier instructions
Why? Most likely due to context window limitations. Once you pass a certain number of files and steps, the model starts “forgetting” earlier commands.
Eventually, it got caught in a loop.
It started adding aliases for everything, even for "src", whose relative path looked different depending on where you were in the project, e.g. "./../../../src" vs "./../src". So it fixed one import, broke the next, and so on, and so forth.
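The underlying problem is easy to see with two files at different depths importing the same module – the relative prefix is different in each, so no single blanket rewrite (and no hastily bolted-on alias) satisfies both at once. The paths below are illustrative:

```typescript
// src/modules/companies/client/geocoder.ts – several levels deep:
import { config } from '../../../config';

// src/app.ts – right next to the config module:
import { config } from './config';

// Globally rewriting one prefix into the other fixes one file and
// immediately breaks the other – exactly the loop Cursor fell into.
```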

Final workaround: prompt by prompt, file by file
Eventually, I gave up on big-batch prompts and switched to step-by-step micromanagement.
For every action – move a file, rename a class, update an import – I gave Cursor one small instruction at a time and reviewed its output. Painstaking? Yes. Effective? Barely.
All in all, I spent over two hours on a task I expected would take 15 minutes with AI. Had I done it manually from the start, I would have finished in 30.
I deliberately avoided “showing it the right solution” to see if it could learn from its mistakes. It didn’t. Instead, it spiraled into a loop of confusing path resolutions and unnecessary rewrites.

My verdict – not ready for production tasks
Cursor is like a junior dev – fast, willing, but it doesn’t “know when to stop,” and that’s dangerous.
Jokes aside, what I’m trying to highlight here is that it loses focus and isn’t able to follow the instructions it’s given.
Why does it bother me?
Because more and more AI-generated code is showing up during code review. I personally feel this is becoming a problem: I’m seeing a lot of unnecessary modifications that obscure the essence of the task and actually drag out the code review process itself.
Tools like Cursor are powerful, but they’re not yet trustworthy in environments where precision matters.
So when would I use Cursor?
Would I use Cursor in any projects? Only in a few narrow scenarios:
- In brand-new projects requiring quick prototyping of isolated components
- For early-stage demos
But for tasks within production projects – especially those involving existing logic and dependencies – I’d rather do the work myself than spend twice the time cleaning up after a tool that lost the plot halfway through.