Is ChatGPT Getting Worse Over Time?

Krissy Davis

OpenAI launched ChatGPT at the end of 2022, and while many would agree that it's one of the best chatbots available, a few people have been noticing a change in the output quality. We want to preface this discussion by saying that ChatGPT is still awesome, and this article is in no way a critique of the technology, simply a discussion of its performance.

We've heard the question ("is ChatGPT getting worse?") raised a number of times on social media and online forums, and thought it might be interesting to take a closer look at what people have been experiencing and what the likely causes are.

Is ChatGPT getting worse?

Large language models like ChatGPT can improve over time as they are retrained and updated. OpenAI uses a two-step process to build ChatGPT's capabilities: pre-training on large amounts of text, followed by fine-tuning.

However, despite these efforts, new research indicates that ChatGPT may be worse at certain tasks than it was just a few months earlier.

A recent study by researchers from Stanford University and UC Berkeley found accuracy issues in two of the models behind ChatGPT, GPT-3.5 and GPT-4.

The researchers tested the models on various tasks, such as solving math problems, answering sensitive questions, and generating code. They found that the models gave different answers to the same questions at different points in time, and that GPT-4 performed worse on math problems in June 2023 than it did in March 2023.

For example, when asked to identify prime numbers using a certain method, GPT-4's accuracy dropped from 84% in March to 51% in June, while GPT-3.5's accuracy improved from 49% to 76%.

Overall, the study suggests that these AI models may not always be reliable and accurate and that more work needs to be done to improve their performance.

Why is ChatGPT getting worse?

The Stanford and UC Berkeley study highlights that improving the overall abilities of language models is a challenging task. Attempts to enhance a model's performance, such as providing it with more data, can sometimes result in a decline in its ability to perform other tasks. The study found that both GPT-3.5 and GPT-4 showed improvements in some areas, but their performance declined in others.

However, there are also several other reasons why ChatGPT's accuracy may have dropped. Here are some of the most likely factors:

1. Changes to the model

OpenAI is constantly updating and improving its GPT models, and these changes can sometimes have unintended consequences. For example, a recent update may have introduced a bug that is causing the model to generate inaccurate or nonsensical responses.

2. Sampling

ChatGPT uses a technique called sampling to generate its responses. This means that the model does not always choose the most likely or accurate response. Sometimes, it may choose a response that is less likely, but still plausible. This can result in responses that are incorrect or unclear.
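To make the idea concrete, here is a minimal Python sketch of sampling from a set of candidate answers. Everything in it is an assumption made for illustration: the candidate answers, their scores, and the helper functions `softmax`, `sample_token` and `greedy_token` are invented, and this is not OpenAI's implementation. It uses temperature, a common sampling control, to show how often a less likely answer can be picked.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Pick one candidate at random, weighted by its probability."""
    probabilities = softmax(logits, temperature)
    return rng.choices(tokens, weights=probabilities, k=1)[0]


def greedy_token(tokens, logits):
    """Always pick the highest-scoring candidate (no sampling)."""
    best_index = max(range(len(logits)), key=logits.__getitem__)
    return tokens[best_index]


# Hypothetical candidates and scores, invented for this example.
tokens = ["yes", "no", "maybe"]
logits = [2.0, 1.0, 0.1]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_token(tokens, logits, temperature=1.0, rng=rng) for _ in range(1000)]
counts = {token: draws.count(token) for token in tokens}

print("greedy choice:", greedy_token(tokens, logits))
print("sampled counts over 1000 draws:", counts)
```

In this sketch the greedy choice is always the top-scoring answer, while sampling returns the lower-scoring answers a noticeable share of the time. That is why the same question can produce different answers from one attempt to the next.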

3. Data quality

The efficacy of ChatGPT is heavily reliant on the quality of data used for its training. The presence of biased or inaccurate data can lead to distorted responses, thereby affecting the overall performance of the model.

4. Compute resources

Running large language models like ChatGPT can be very computationally expensive. OpenAI may be limiting the amount of compute resources that are available to ChatGPT in order to save money or to improve the performance of other AI models.

5. Data drift

As the world around us changes, the data that ChatGPT was trained on falls out of date. This can lead to a phenomenon called data drift, where the questions the model receives no longer resemble its training data and it can no longer respond to them accurately.

6. Hallucination

Large language models like ChatGPT are sometimes prone to hallucinations, where they generate text that is not grounded in reality. This can be caused by a number of factors, such as the model's training data or the way that it is used.

What’s the future for ChatGPT?

OpenAI hasn’t publicly addressed the findings that ChatGPT is getting worse. However, they have made some general statements about their commitment to improving the quality of their models. For example, in a recent blog post, they stated that they're "continuously working to improve the quality and safety of our models." They also stated that they are "committed to being transparent about our work and sharing our progress with the public."

Currently, ChatGPT's future seems uncertain. However, over the past few months alone, other AI chatbots have emerged and are thriving and outperforming the once-dominant GPT. And with the increasing buzz around Artificial Intelligence and Large Language Models, it's inevitable that more AI chatbots will emerge in the near future, better and bigger than their older relatives.

We did our own research

We asked GPT-3.5 five high school Level 3 math questions. Here is how it did:

Statistics level 3 question: “A survey of 100 students found that 60 students liked pizza, 35 students liked hamburgers, and 15 students liked both pizza and hamburgers. How many students liked pizza or hamburgers?”

The chatbot's intermediate calculations were correct, but it made a simple arithmetic error in the final step. Regardless, it showed that it could set up and work through the problem, which is noteworthy.
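For reference, the expected answer follows from the inclusion-exclusion rule: add the two groups, then subtract the students counted twice. The short Python sketch below works it out and cross-checks the result with explicit sets. The function name `count_union` and the student ID ranges are our own choices for illustration.

```python
def count_union(size_a, size_b, size_both):
    """Inclusion-exclusion for two groups: |A or B| = |A| + |B| - |A and B|."""
    return size_a + size_b - size_both


liked_pizza = 60
liked_hamburgers = 35
liked_both = 15

liked_either = count_union(liked_pizza, liked_hamburgers, liked_both)
print(liked_either)  # 60 + 35 - 15 = 80

# Cross-check with explicit sets of made-up student IDs.
pizza_fans = set(range(0, 60))        # 60 students
hamburger_fans = set(range(45, 80))   # 35 students, 15 of them also in pizza_fans
assert len(pizza_fans & hamburger_fans) == liked_both
assert len(pizza_fans | hamburger_fans) == liked_either
```

So 80 of the 100 students liked pizza or hamburgers.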

However, despite its rocky start, GPT-3.5 was able to redeem itself by accurately answering the other four questions.

Geometry level 3 question: “A triangle has side lengths of 3 cm, 4 cm, and 5 cm. Is this triangle a right triangle?”

Level 3 math question: “Find the greatest common factor of 12 and 18.”

Algebra level 3 question: “Factor the expression: x^2 + 5x + 6”

Logic and reasoning level 3 question: “If it is raining, then the ground is wet. The ground is wet. Therefore, it is raining. Is this a valid argument?”
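These four questions have short answers that are easy to check by hand or by machine. The Python sketch below is our own way of verifying them, not the chatbot's output, and the helper names `factor_monic_quadratic` and `implies` are invented for this example.

```python
import math
from itertools import product

# Geometry: a triangle is a right triangle if a^2 + b^2 == c^2 for its longest side c.
a, b, c = sorted([3, 4, 5])
is_right_triangle = a**2 + b**2 == c**2  # 9 + 16 == 25


# Greatest common factor of 12 and 18.
gcf = math.gcd(12, 18)


def factor_monic_quadratic(b_coeff, c_coeff):
    """Find integers (p, q) with p + q == b and p * q == c, so that
    x^2 + b*x + c == (x + p)(x + q). Assumes c is non-zero.
    Returns None if no integer pair exists."""
    for p in range(-abs(c_coeff), abs(c_coeff) + 1):
        if p == 0 or c_coeff % p != 0:
            continue
        q = c_coeff // p
        if p + q == b_coeff and p <= q:
            return (p, q)
    return None


# Algebra: x^2 + 5x + 6 factors as (x + 2)(x + 3).
factors = factor_monic_quadratic(5, 6)


def implies(p, q):
    """Truth value of 'if p then q'."""
    return (not p) or q


# Logic: look for cases where both premises hold but the conclusion fails.
# raining -> wet, wet, therefore raining.
counterexamples = [
    (raining, wet)
    for raining, wet in product([False, True], repeat=2)
    if implies(raining, wet) and wet and not raining
]
argument_is_valid = not counterexamples

print(is_right_triangle, gcf, factors, argument_is_valid)
```

The triangle is a right triangle, the greatest common factor is 6, the expression factors as (x + 2)(x + 3), and the argument is not valid: the ground can be wet without rain, a mistake known as affirming the consequent.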
