2023.03.29
When I wrote my last op-ed for the Devil’s Advocate, I thought it was a comprehensive review of all of the cutting-edge machine learning innovations of the time. Two days later, OpenAI released ChatGPT, their take on a chatbot, and it seemed like every corner of the internet was ablaze with the new thing. Suddenly everyone and their mother knew about OpenAI and ChatGPT, and machine learning was thrust into the forefront of the public sphere. As I write this article, OpenAI has just released GPT-4, a large multimodal model with many more capabilities than its predecessors (it scored a 1410 on the SAT), but I’ll leave that for another day and another article. While there has been a lot of talk about the resounding impact ChatGPT is going to have on academia and beyond, I believe the long-term effects of this chatbot will be small.
For those who have conveniently missed the last three months, OpenAI is an artificial intelligence research company focused on developing “friendly” AI, or AI that would help humanity. With a billion dollars in backing from just about every big name in tech, they set out pushing the frontier of machine learning. In 2020 they released GPT-3, a large language model (LLM) capable of answering natural language questions and prompts with human-like text. In 2021 they released DALL-E, which could generate images from natural language descriptions, such as “A hyper realistic rendition of a treehouse on the moon.” And on November 30th, 2022, they released ChatGPT, a chatbot based on a more refined version of GPT-3, or “GPT-3.5.” One big difference between GPT-3 and ChatGPT is that ChatGPT can remember questions previously asked and build upon those answers in subsequent responses. Very quickly the internet picked up on its true capacity, and we were all flooded with a wave of ChatGPT content. Give it a prompt about your book report or history source, and it could spit back something good enough to turn in. The news pronounced that “The College Essay Is Dead” and that ChatGPT would soon replace all written work. This was mostly sensational, as a lot of news is, but ChatGPT still holds a lot of potential. It could not just write prose and poetry, but also write software, compose music, play board games, and more.
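That “memory,” by the way, is less mysterious than it sounds: the client simply resends the entire conversation with every turn. Here is a minimal sketch of that pattern using OpenAI’s Python package (the pre-1.0 interface available as of this writing); the key placeholder and the example prompts are my own illustrative assumptions, not anything OpenAI prescribes.

import openai

openai.api_key = "YOUR_API_KEY"  # assumed placeholder; a real key is required

# The "memory" is just the running transcript: each turn, the client
# sends the whole message history back to the model.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(prompt: str) -> str:
    messages.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,  # full history, so replies can build on earlier turns
    )
    answer = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": answer})
    return answer

print(ask("Summarize the causes of World War I."))
print(ask("Now condense that into one sentence."))  # "that" resolves because the history is resent

The model itself is stateless; drop the earlier messages from the list and the follow-up question would make no sense to it.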
Now, I’m sure almost all of you have tried ChatGPT at some point, perhaps even using it to aid in schoolwork. Even I have put in my HUSH assessment questions, to see if a machine could really produce, in seconds, a better-crafted, more fluid piece of writing than I could in hours. Spoilers: it didn’t. While the content was there, logically connecting the dots between major events and ideologies, it was bland and generic. After a few more questions, it became clear that when prompted for an academic essay, it fell back on the same formulaic structures of evidence and reasoning, producing a milquetoast piece that sounded exactly like a thousand others. The voice I spent all of English II crafting and developing was nowhere to be found. Worse yet, LLMs are prone to “hallucinations,” where a model produces an answer with high confidence despite being wrong. This can lead to answers that make seemingly trivial mistakes, or that fabricate details outright to keep a narrative coherent. Prompt ChatGPT with a false premise, such as a person who never existed, and it will happily conjure up ideas and descriptions of that person despite having no actual evidence to base them on. Now, could I have turned this in for a decent grade? Potentially, and that seems like the real issue at heart. What concerned me even more was not the piece itself, but that many academic institutions across the world were finding this kind of writing to be a sufficient replacement for student-generated work. I would argue that the writing ChatGPT produces does not employ high-level critical reasoning; any student at UHS, with time and effort, could replicate and exceed it. But if so much of school can be satisfied by the bare minimum of thought, is it really succeeding at teaching, or is it just busywork? It seems like ChatGPT has revealed how much of our school system is built on meaningless or contrived tasks, designed simply to occupy time in a student’s life. I spoke with English teacher Adrian Acu on this topic, and here’s what he had to say: “I think [ChatGPT] forces us to be better teachers. I think what ChatGPT makes us vulnerable to is really uninspired and unoriginal, and very basic assignment work. Like asking you to write a five-page paper on a text that has x and y features is kind of silly. But again, trying to get you to think originally about a specific encounter you have in the text, that’s a little bit more useful.” While ChatGPT has exposed this side of schoolwork and it seems like a problem now, it can’t stay like this forever, and I’m confident schools everywhere will shift to more original assignments, as our English department already has.
A big issue that has surfaced is the morality of using ChatGPT for schoolwork. First off, it is obviously not ethical to copy text from ChatGPT directly into an assignment. Artificial intelligence tends to get a reputation in the news and media of having a mind of its own, but we are not nearly at that point yet. There is no hidden consciousness lying deep within the bowels of OpenAI’s servers, and ChatGPT’s responses are humanlike, not human. At this point in technological advancement, artificial intelligence is still a tool that we wield, and its uses reflect our own personal values. While the originality of large language models is a hotly debated topic, it is no more ethical to pass off the work of an LLM as your own than it is to use Google Translate on a language assignment. Still, schools are having a hard time adapting to the change. As of this writing there is still no explicit integrity violation clause in the UHS Student Handbook covering machine-generated homework; the closest fit is a tenuous stretch of “misrepresent the work of others as your own” (whether an LLM counts as an “other” opens a whole different can of worms). There are still no reliable ways to detect AI-generated text, with tools like GPTZero mislabeling text left and right. But the biggest issue comes not from the technology itself, but from how it is used, and from the culture of cheating that already exists in academia. It’s not merely that ChatGPT is a brand-new technology, because GPT-3, the underlying model, has existed since 2020. But the ease of use of ChatGPT threw a new wrench in the mix: now anyone could get answers from a web browser in seconds, instead of having to make API requests. And so while ChatGPT does pose a threat to the integrity of schoolwork, it’s part of a larger culture that is not so easily shifted. ChatGPT is just another tool to be misused, like Google Translate or CliffsNotes, which has been around since the ’50s. Once again, it seems like a big cultural shift now, but in the broader context, it’s just another cog in a much larger system.
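To make that ease-of-use point concrete, here is roughly what “making API requests” to GPT-3 looked like before ChatGPT put a chat box in everyone’s browser: a short Python sketch against OpenAI’s old Completion endpoint (again the pre-1.0 package), where the model name, prompt, and key placeholder are illustrative assumptions on my part.

import openai

openai.api_key = "YOUR_API_KEY"  # assumed placeholder; a real key was required

# One-shot completion: no chat interface and no remembered context —
# you write a prompt, send a request, and get raw text back.
response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3-series completion model
    prompt="Write a short paragraph on the causes of World War I.",
    max_tokens=200,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())

Nothing here is hard, but it does require an account, an API key, and a few lines of code; ChatGPT collapsed all of that into a text box, which is most of why it spread the way it did.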
Reflecting on the four months since ChatGPT was released, it seems like a lot of the fanfare has died down, although I might be speaking prematurely given the release of GPT-4. Just like with any new technology, everyone was quick to pounce on it, and ChatGPT has already become integrated into many people’s lives. But when it comes to schoolwork, while ChatGPT has had a massive short-term impact, I can’t see it having any large long-term effects on school and homework. Schools will have to adapt their assignments or accept the futility of their work, and it will become yet another tool people use to cut corners. And in the end, the machines can’t do anything on their own right now. It all comes down to how we use this technology, and whether we wield it for good or for ill.