AI Tools for Developers: Part 2 - Case Study

In the first part of our research on AI tools for developers, we conducted a general analysis of tools and compared the features and pricing of the most popular ones in each category.

In this article, I’d like to explore specific use cases for AI Code Assistants. We’ll select the model that best solves the tasks and the most user-friendly interface.

We’ll evaluate the three most popular extensions for VS Code: GitHub Copilot, Tabnine, and Codeium, using models GPT-4o, o1-mini, Claude 3.5 Sonnet, and Codeium Premier. Each model will be asked to perform the same tasks, and we’ll determine which one delivers the best results and which interface offers the best user experience.

Case 1: A Simple Business Task

We’ll use a typical web application built with Ruby 3.3, Ruby on Rails 7, and a PostgreSQL database.

In our project, there is an Account model to which we’ll ask the assistants to add a feature for writing notes. We won’t provide specific details about the required functionality or implementation. Instead, we’ll let the models interpret the task and compare their approaches.

GitHub Copilot with GPT-4o

GitHub Copilot uses the GPT-4o model by default, so we’ll utilize it for this test.

GPT-4o is OpenAI’s high-performance flagship model designed for complex, multi-step tasks. It matches GPT-4 Turbo in intelligence but is significantly more efficient, generating text 2x faster.

We asked GPT-4o to implement a feature for adding notes and attached the project’s codebase. This is done by including @workspace in the prompt or clicking the "Attach" button and selecting "Codebase."

Results

➡️ The model generated an implementation plan:

➡️ It also provided necessary commands, conveniently accompanied by a "Run in Terminal" button:

➡️ To edit files, you can click on their names to open them in the editor. The "Apply in Editor" button allows changes to be applied directly, displaying them in the editor like this:

After completing this step, GPT suggested asking how to display the notes. A blue prompt appeared in the bottom-left corner, and clicking it sent the request automatically.

➡️ However, GPT-4o assumed that Account already had multiple notes, even though the previous step generated a migration and corresponding code for note as a single field in Account:

There’s a "Regenerate" button that creates a new response. In the regenerated solution, GPT suggested verifying if the model already had the note field (though it could have done this itself). It also proposed controller changes. However, GitHub Copilot failed to open the correct file and simply inserted the content into the currently active file.

➡️ If the proper file was opened, the changes merged seamlessly:

When generating views, GPT-4o overlooked the modular structure of our files (Main for the client side and System for the admin panel), even though the project context was attached.

➡️ In a new chat, I asked for editable notes. This time, the solution was fully functional:

💡Summary

A bug in GitHub Copilot caused it to freeze during file saving if changes were applied to a file other than the one expected. Reopening the file didn’t resolve the issue, and the file remained unsavable. The only workaround was to restart VS Code.

GitHub Copilot with GPT-4o demonstrated powerful functionality and convenience but was prone to context misinterpretation and occasional bugs that interrupted workflow.

Tabnine with Claude 3.5 Sonnet

➡️ Tabnine allows switching between models, offering options such as Claude 3.5 Sonnet, GPT-4o, and others:

➡️ According to Anthropic, Claude 3.5 Sonnet excels in coding tasks, so we chose it for this test:

Image: Introducing Claude 3.5 Sonnet

Claude 3.5 Sonnet sets industry benchmarks in graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval). It can independently write, edit, and execute code with advanced reasoning and troubleshooting capabilities. Additionally, it excels at code translations, making it particularly effective for updating legacy applications and migrating codebases.

Tabnine also offers Jira integration, which allows Jira tickets to be viewed and tasks performed based on them—an appealing feature. The codebase is automatically attached via a slider in the bottom-right corner.

Implementation Process

I asked Claude 3.5 Sonnet to add note-taking functionality for accounts. The suggested solution involved creating a separate table to store notes, including the user who wrote each note.

➡️ However, it didn’t provide a command for generating the migration:

➡️ Then I requested the migration command, and a "Run" button allowed immediate execution in the terminal:

New files are created in the correct directory with appropriate names.

When editing an existing file, changes are displayed upon clicking "Apply," but there’s no option to merge changes.

➡️ For example, I needed to combine new code with existing logic, but I could only choose one version:

➡️ Applying changes doesn’t always work reliably:

Similar to GPT-4o, Claude 3.5 Sonnet didn’t consider the modular structure of the project (Main for the client side and System for the admin panel) when generating controllers and views.

➡️ After opening the page, I encountered an error and asked for a fix. Claude highlighted the problematic lines for me:

➡️ Another issue arose, and Claude initially failed to resolve it. After I provided a hint by highlighting the relevant controller code, it completed the fix correctly:

➡️ Once I adjusted the view file location, the feature worked as expected.

💡Summary

Tabnine with Claude 3.5 Sonnet offers solid coding capabilities, an intuitive interface, and useful features like Jira integration. However, it struggles with complex project contexts, and its inability to merge changes can hinder workflow. While it provides helpful fixes, some issues require additional guidance to resolve effectively.

Codeium with Codeium Premier

After installing Codeium, I found it initially unclear how to utilize the project’s codebase. It turns out Local Indexing must be enabled manually.

Codeium provides its proprietary models, and I decided to test Codeium Premier. I activated the free trial and selected this model.

Codeium Premier is available on the paid tier with unlimited usage. It is based on Meta’s Llama 3.1 405B, offering the highest performance among Codeium’s models. This is due to its large size, integration with Codeium’s reasoning engine, and compatibility with native workflows.

The codebase is attached by submitting the request with command + Enter. This method differs from other extensions, and it took me a while to figure out how to link the project’s code.

Implementation Process

File modifications in Codeium aren’t as intuitive as in other extensions.

➡️ You must manually open the target file, and pressing the "Insert" button places the changes at the cursor’s location:

Despite this, Codeium Premier recognized the project’s modular file structure. However, the solution it generated lacked data migrations.

➡️ Upon requesting migrations, it provided a complete set of instructions, with commands ready for direct insertion into the terminal:

➡️ Next, I asked it to display the text field in a form and render the notes in the view:

Unfortunately, the changes to the form didn’t save correctly, and Codeium Premier couldn’t resolve the issue independently.

➡️ After I hinted that the issue might be related to a policy, it provided a solution:

Outcome

The feature was functional after the adjustments.

📌 Case 1 Summary

Models GPT-4o, Claude 3.5 Sonnet, and Codeium Premier successfully implemented the requested functionality, albeit with some assistance. Each faced challenges during the process, which were resolved after pointing out potential problem areas.

While all three models completed the task, the ease of use, troubleshooting efficiency, and interface functionality varied:
- GPT-4o excelled in speed and integration but occasionally failed to interpret context correctly.
- Claude 3.5 Sonnet demonstrated reasoning skills but required guidance for certain fixes.
- Codeium Premier handled the project structure well but lacked an intuitive interface for managing file changes.
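
The two data designs that came up in this case, a single note field on the account (GPT-4o's first answer) and a separate notes table that records the author (Claude 3.5 Sonnet), can be contrasted with a minimal plain-Ruby sketch. This is not the code the assistants generated: the structs stand in for database tables, and every name here (`AccountWithNote`, `Note`, `user_id`, and so on) is a hypothetical chosen for illustration.

```ruby
# Plain-Ruby stand-ins for database tables; all names are hypothetical.

# Design 1: the note is a single text attribute on the account.
AccountWithNote = Struct.new(:id, :name, :note, keyword_init: true)

# Design 2: notes live in their own collection and record their author.
Account = Struct.new(:id, :name, keyword_init: true)
Note = Struct.new(:account_id, :user_id, :body, keyword_init: true)

single = AccountWithNote.new(id: 1, name: "Acme", note: "Called on Monday")
single.note = "Called on Tuesday" # overwrites the previous note

account = Account.new(id: 1, name: "Acme")
notes = [
  Note.new(account_id: account.id, user_id: 10, body: "Called on Monday"),
  Note.new(account_id: account.id, user_id: 11, body: "Called on Tuesday")
]

# Look up every note that belongs to the account, keeping the history.
account_notes = notes.select { |n| n.account_id == account.id }

puts single.note
puts account_notes.map { |n| "user #{n.user_id}: #{n.body}" }
```

The single field is simpler to add but keeps only the latest text, while the separate collection preserves history and authorship at the cost of an extra table and association.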

Case 2: Writing Tests

After adding some validations to the Account model, I asked the assistants to generate tests for it. Although many tests are routine, writing them often takes developers considerable time. Utilizing AI assistants here can save significant time.
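
To illustrate what "typical" validation tests look like, here is a minimal framework-free sketch. It is an assumption-laden stand-in: the real Account model and its validation rules are not shown in this article, so the `name` and `email` rules, the `build_account` factory helper, and the check descriptions are all hypothetical. In a Rails project the same pattern would normally be written with a test framework and a factory library.

```ruby
# Hypothetical stand-in for a model with validations; the real Account model
# and its rules are not shown in the article.
class Account
  attr_reader :name, :email, :errors

  def initialize(name:, email:)
    @name = name
    @email = email
    @errors = []
  end

  # Rebuilds the error list and reports whether the record passes all rules.
  def valid?
    @errors = []
    @errors << "name can't be blank" if name.to_s.strip.empty?
    @errors << "email is invalid" unless email.to_s.match?(/\A[^@\s]+@[^@\s]+\z/)
    @errors.empty?
  end
end

# A tiny factory: valid defaults that each check overrides selectively.
def build_account(overrides = {})
  attrs = { name: "Acme", email: "owner@example.com" }.merge(overrides)
  Account.new(**attrs)
end

# One check per rule: the happy path plus one failure case per validation.
checks = {
  "valid with default attributes" => build_account.valid?,
  "invalid without a name" => !build_account(name: "").valid?,
  "invalid with a malformed email" => !build_account(email: "nope").valid?
}

checks.each do |description, passed|
  puts "#{passed ? 'PASS' : 'FAIL'}: #{description}"
end
```

The repetitive shape (valid defaults, then one overridden attribute per rule) is what makes this kind of test a good candidate for generation.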

GitHub Copilot with Claude 3.5 Sonnet

I switched the GitHub Copilot model to Claude 3.5 Sonnet for this case.

GitHub Copilot offers the "Copilot Edits" feature: Use Copilot Edits to start an AI-powered code editing session, allowing quick iterations on code changes using natural language. It proposes code changes across multiple files in your workspace, applied directly in the editor for easy review.

I opened the Copilot Edits tab with the model file and requested test generation. The changes for the test file and factory were automatically applied.

➡️ All I needed to do was validate and approve them:

The generated test worked correctly and covered all aspects of the model.

Tabnine with GPT-4o

For Tabnine, I switched to the GPT-4o model.

Tabnine’s "Test Agent" generates detailed test plans and cases. Invoke the test agent using the CodeLens quick access link, select an existing test file (or request a new one), and Tabnine generates comprehensive test cases. These include plain English descriptions for easy review and selection.

➡️ Following the instructions, it generated this test plan:

Using the "Smart Insert" option, individual test cases were placed in the correct locations within the test file. Each test case had a dedicated input field for further modifications. However, inserting all test cases at once was not possible.

The test structure was slightly less cohesive than GitHub Copilot’s output, likely because each test was generated individually rather than as a batch. Additionally, it did not suggest updates to the factory.

Codeium with Codeium Premier

Codeium does not have a dedicated feature for test generation, so I used its chat function.

➡️ The generated test required manually opening the target file and inserting the content at the cursor’s position using the "Insert" button:

This process meant manually removing any existing file content, pasting the generated content, and removing the filename suggestion.

The test was functional and included factory updates, but the structure was less refined than Claude 3.5 Sonnet's output.

📌 Conclusions for Case 2

All models performed well in generating tests. Comparing convenience and functionality:
- GitHub Copilot provided the best experience with its cohesive test generation, seamless change application, and automatic factory updates.
- Tabnine excelled at creating a verbal test plan, but it lacked the option to insert all tests at once, and its structure was slightly less polished.
- Codeium delivered usable tests and factory updates but lacked a dedicated feature for testing, requiring more manual intervention.

GitHub Copilot stood out as the top choice for those prioritizing ease and efficiency. If a verbal plan is preferred, Tabnine is the better option. Codeium is a viable alternative but less specialized in this area.

Case 3: Fixing a Bug

Let’s ask AI assistants to fix a simple bug in an application.

The issue is with a form that allows multiple selections. The bug occurs when all checkboxes are deselected, and changes are saved — the previous selection persists. Deselecting works only if at least one checkbox remains active.
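
A common cause of this symptom in web forms is that browsers do not submit unchecked checkboxes at all, so when nothing is selected the parameter is missing and the update leaves the old value untouched. Whether that is the exact cause in this project is an assumption; the sketch below only reproduces the behavior in plain Ruby. The `tag_ids` key and all helper names are hypothetical.

```ruby
# Hypothetical, framework-free sketch. Names such as `tag_ids` are
# illustrative and do not come from the project under test.

# Mimics an update that only touches keys present in the submitted params.
def apply_update(record, params)
  record.merge(params)
end

# Browsers omit unchecked checkboxes, so deselecting everything yields no key.
def submitted_params(checked_ids)
  checked_ids.empty? ? {} : { tag_ids: checked_ids }
end

# One common remedy: always send the key (for example via a hidden empty
# field), so an empty selection arrives as an empty list.
def submitted_params_with_default(checked_ids)
  { tag_ids: checked_ids }
end

record = { name: "Acme", tag_ids: [1, 2] }

buggy = apply_update(record, submitted_params([]))
fixed = apply_update(record, submitted_params_with_default([]))
partial = apply_update(record, submitted_params([2]))

puts buggy[:tag_ids].inspect   # previous selection persists
puts fixed[:tag_ids].inspect   # selection is cleared
puts partial[:tag_ids].inspect # deselecting works while one box stays checked
```

This matches the reported behavior: deselecting works as long as one checkbox remains active, because only then is the key present in the request.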

GitHub Copilot with GPT-4o

I submitted a detailed request outlining the steps to reproduce, actual and expected results, providing the model with a comprehensive context.

➡️ The proposed solution looked like this:

Clicking "Apply" isn’t an option this time, as the code is poorly structured. However, file names are highlighted in blue, allowing me to open the relevant files directly. The solution included unnecessary steps, such as searching for an entity already handled in the controller. Additionally, some lines in the existing code needed removal, which GPT-4o failed to address. The view file chosen for editing was incorrect, but the solution works if you know the correct file and apply GPT-4o’s changes to it.

Tabnine with Claude 3.5 Sonnet

I sent the same request to Tabnine with Claude 3.5 Sonnet.

➡️ The model correctly used the appropriate files and moved the solution from the controller to the model:

Claude also accounted for additional factors in the application, providing a more comprehensive fix that prevents the bug from recurring. Since the correct files were identified, changes could be applied with a single click of "Apply Changes."

Codeium with Base Model

Finally, I tested the free base model of Codeium, which offers unlimited free usage.

➡️ I submitted the same request and attached the codebase:

The solution included correct file paths, allowing the files to be opened directly, but changes couldn’t be applied automatically. The suggested fix wasn’t functional, as the bug persisted, and the model took an incorrect approach to the problem.

📌 Conclusions for Case 3

Claude 3.5 Sonnet provided the most comprehensive solution. GPT-4o also performed well, but its fix could introduce a new bug in the future. The free Codeium Base model offered a non-functional solution, but its cost-free availability remains a potential advantage for more straightforward tasks.

Extra Case: Technical Task

The next challenge for AI assistants is simplifying the launch process for our application. We will ask the models to containerize the project using Docker and Dev Containers in VS Code. This task is more complex but often essential for streamlining workflows.

GitHub Copilot with o1-mini

We used OpenAI’s new o1-mini model. This model excels at coding, nearly matching OpenAI’s o1 on benchmarks like AIME and Codeforces. OpenAI positions o1-mini as a faster, cost-efficient model for reasoning tasks that don’t require extensive world knowledge.

Using the @workspace command to attach the entire project’s codebase, I prompted: "Setup Dev Containers."

➡️ The response highlighted a mismatch between the Ruby version in the .ruby-version file and the version used in the environment:

The first command that creates the folder for Dev Containers can be easily executed in the terminal with a single click. When clicked, the extension automatically inserts the command into the Terminal and runs it.

However, if you need to create a new file, as in step 2 of the model's answer, you have to open the drop-down menu and select the appropriate command, which is inconvenient.

➡️ The file is created without a name and in the root folder of the project, so you have to save it manually in the correct directory:

After creating two files according to the instructions, I sent GitHub Copilot a follow-up request indicating that the Ruby version did not match, and attached the .ruby-version file.

➡️ As a result, I got the corrected file:

➡️ After loading images, an error appeared:

Of course, we are here to solve it with GitHub Copilot! I like the "Apply" functionality here.

➡️ After clicking the button, the contents are inserted into the file, and the changes made are highlighted:

After about 10 attempts to fix the code using the instructions from o1-mini, it was still impossible to launch the application in Dev Containers. The problem is in the PostgreSQL configuration, a typical task when setting up Ruby on Rails applications.

Also, at some point the model seemed to run out of context memory, and Copilot rewrote the Dockerfile to use Ruby 2.7, forgetting that the project uses Ruby 3.3.
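
Version drift like this is easy to catch mechanically. The following is a minimal sketch, not part of the assistants' output: it compares the version in a .ruby-version file with the tag of a `FROM ruby:<tag>` line in a Dockerfile. The file contents are inlined, hypothetical examples (the article only states that the project uses Ruby 3.3 and that the generated Dockerfile used 2.7), and the helper names are made up for illustration.

```ruby
# Hypothetical file contents; in a real project these would be read with
# File.read(".ruby-version") and File.read on the generated Dockerfile.
ruby_version_file = "3.3.0\n"
dockerfile = <<~DOCKERFILE
  FROM ruby:2.7
  WORKDIR /app
DOCKERFILE

# Extracts "major.minor" from a version string such as "3.3.0" or "2.7-slim".
def major_minor(version)
  version[/(\d+\.\d+)/, 1]
end

# Finds the tag of the first `FROM ruby:<tag>` line, or nil if there is none.
def dockerfile_ruby_tag(contents)
  contents[/^FROM\s+ruby:(\S+)/, 1]
end

expected = major_minor(ruby_version_file.strip)
actual_tag = dockerfile_ruby_tag(dockerfile)
actual = actual_tag && major_minor(actual_tag)

if expected == actual
  puts "Ruby versions match (#{expected})"
else
  puts "Mismatch: .ruby-version wants #{expected}, Dockerfile uses #{actual.inspect}"
end
```

Running a check like this after each regenerated Dockerfile would flag the regression before the container is rebuilt.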

Tabnine with Claude 3.5 Sonnet

In Tabnine with Claude 3.5 Sonnet, I entered a request to set up Dev Containers.

➡️ The response was essentially the same as in Copilot with o1-mini:

I see that the Ruby version is the same. All the code is automatically added to the context, but this can be disabled. Under each code fragment there is an "Apply" button that automatically creates a file in the desired directory, which is very convenient.

➡️ An error occurred on startup. I asked Tabnine to fix it, and the log was attached automatically. After clicking "Apply", the changes are also visible directly in the file:

After four iterations of fixing errors related to the database connection, the task was completed with Claude 3.5 Sonnet in Tabnine.

Tabnine was easier to use because it creates the necessary files with proper names and locations. However, after the app is opened in the Dev Container, VS Code turns off the Tabnine extension, which then has to be installed and logged in to manually once again. I think this is a limitation of VS Code and not connected with Tabnine.

Codeium with Codeium Premier and GPT-4o

First, I’ll complete the task with Codeium Premier.

I asked it to set up Dev Containers. The Ruby version is slightly inconsistent, but I’m more concerned about the lack of a Dockerfile. The "Insert" button simply inserts the contents where your cursor is.

➡️ You’ll have to create and name the file manually:

VS Code generates a log file that is not in the project directory, and Codeium cannot access it. I pasted the error from the log and asked it to fix the problem.

➡️ In response, Codeium asked for more information about the project, although this chat already uses the project codebase as context:

After making several more requests, I could not get a clear result from Codeium Premier.

Since we need to select not only the model but also the interface, I decided to change the model to GPT-4o to test the Codeium extension for VS Code.

The result was better. The interface is inconvenient in that the proposed solution only has the "Copy" and "Insert" buttons.

➡️ To replace a file’s contents, you need to select it completely and click the "Insert" button in the Codeium tab.

After about 5 attempts to solve the problems, Postgres was added. However, there was also a problem with npm that was never solved. The cause lies in which files Codeium selects for the context; other extensions cope with this task better.

📌 Results of the Extra Case

Claude 3.5 Sonnet was the only model to fully complete the task. Codeium Premier, OpenAI’s o1-mini, and GPT-4o fell short.

While Codeium offers access to Claude 3.5 Sonnet and GPT-4o, its functionality and usability lag behind other extensions.

Conclusion on Models and Extensions

In conclusion, the different models and extensions explored in this evaluation each have strengths and limitations when assisting with tasks such as setting up Dev Containers, fixing errors, and generating code.

While some models, like Claude 3.5 Sonnet, proved more effective in completing tasks accurately and efficiently, others, like Codeium Premier, OpenAI’s o1-mini, and GPT-4o, fell short. Tabnine with Claude 3.5 Sonnet stood out for its ease of use and its ability to create files with proper names and locations.

At the same time, GitHub Copilot has native integration with VS Code, and its functionality is constantly improving. Because GitHub Copilot is developed in conjunction with the VS Code team, its developers have more integration options than those of other extensions. So, if you are choosing an extension for the long term and do not want to learn a new interface or conduct periodic tests in the future, give preference to GitHub Copilot.

The choice of model and extension can greatly impact the success of a task, and it is important to consider functionality and usability when selecting the most suitable tool for the job.