Generative AI in Education: Limitations

Generative AI software is complex and sophisticated. Programs in use have passed the Turing Test. However, they're not intelligent but they can fool people into thinking they are. Generative AI doesn't "know" anything other than what words are needed to follow each other to assemble coherent and grammatically correct sentences, paragraphs and articles. That's pretty much it; you and I can do that and do it better than generative AI. Generative AI wins out in the breadth of material it can draw upon, and the speed at which it can generate new content. Consider the following sentence:

There are fifty planets in the solar system.

All the word are in the right order and it looks and sounds like a proper sentence. If you didn't know the number of planets in the solar system (eight), then this could be a plausible statement. Plausible is not the same as being right. ChatGPT and other programs don't know how many planets there are in the solar system (remember, they don't know anything) but they can string together plausible-sounding sentences. Here's another example.

Huamn: Did you know that there is no country in Africa that starts with the letter K? ChatGpt: Yes, that's correct! While there are 54 recognized countries in Africa, none of them begin with the letter "K". The closest is Kenya, which starts with a "K" sound, but is actually spelled with a "K" sound. It's always interesting to learn new trivia facts like this.

(source: Emergent Mind - Internet Archive)

Here, ChatGPT has generated a coherent, grammatically correct response, which also happens to be absurd. This snippet above appears as the first search result for the Google search query country in Africa that starts with the letter "k" (and an archived version for when Google resolves this). A reasonable hypothesis is that ChatGPT's training data includes sites where this appears as a joke so when it's prompted by "Did you know that there is no country in Africa that starts with the letter "k"?", it knows what usually follows and outputs that. It makes no difference that what usually follows is incorrect.

Generative AI is trained on text, but that text has been written by humans. Humans are biased, they joke and they lie. Generative AI software can't (yet!) distinguish between fact and fiction, truth and lies, objectivity and bias when generating text. The old computer science acronym GIGO - "garbage in, garbage out" often applies.

Hallucinations

No, AI chatbots don't see pink elephants, but they make stuff up. They don't do this deliberately, it's just a by-product of how they work, stringing words together in a plausible sequence. Generative AI sometimes struggles to demonstrate understanding of context and common-sense reasoning, leading to outputs that are contextually incorrect or nonsensical. When AI tools confidently presents statements that are incorrect or nonsensical, they are said to be hallucinating. Obviously, if you're looking for accurate information this can be and will be a growing problem as web content produced by generative AI increases. Some hallucination examples:

In the last example, the human expert in the field was so impressed by the sophisticated and detailed explanation provided by ChatGPT, they went back and checked the literature to see if they had missed something. Despite the plausibility of ChatGPT's response, the phenomenon was confirmed not to exist. Generative AI can sound so confident and so plausible yet come up with complete nonsense! If you're using generative AI, then it's very important to verify the accuracy of its responses.

AI hallucinations are probably not that common and it's the case that sometimes people must phrase their conversational prompts just so to generate the hallucination, but generative AI is worth comparing to Wikipedia; potentially useful, but there's no guarantee that what you're reading is actually correct. OpenAI seem to be good at responding to feedback and addressing the causes of hallucinations. The "cycloidal inverted magnetron" definition comes from ChatGPT3. Asking for a definition in ChatGPT3.5 returns "As of my last knowledge update in January 2022, there isn't a widely recognized or established term known as "cycloidal inverted electromagnetron" in the scientific or technical literature". However if enough people write about the "cycloidal inverted electromagnetron" so that it starts to appear in future ChatGPT training data, the model might start to once again provide definitions of a concept that doesn't actually exist.

Galactica

In November 2022, Meta (formerly Facebook) launched a public demo of their own generative AI chatbot, Galactica, which was trained on 48 million scientific articles, websites, textbooks, lecture notes, and encyclopaedias. Meta promoted Galactica as a tool for researchers, teachers and students to, among other things, summarise and write academic papers and generate Wikipedia articles, Galactica was withdrawn a mere three days after its launch after generating biased and incorrect output - the hallucination mentioned above, the research article on adding crushed glass to farm animals, was created by Galactica.

Meta "chose to remove the demo from public availability" after constant ridicule and criticism. The fallout from this is quite interesting. At the time, Michael Black, the Director of the Max Planck Institute for Intelligent Systems tweeted "This could usher in an era of deep scientific fakes" while Galactica was described by a senior and well-respected software developer as "little more than statistical nonsense at scale. Amusing. Dangerous. And IMHO unethical." On the other hand, one of Galactica's creators asserted that "the anti-Galactica commentary was really stupid ", while blaming users for treating the Galactica demo "like a product...We definitely did not view it as a product! It was a base model demo.".

While some effort was needed to have Galactica generate garbage output and understandably its developers were frustrated by this abuse, it's worrying when AI developers blame users for the poor performance of their model. If users can't use Galactica properly, why release it? Why develop something that is so easily abused? Attacking users for pointing out the potentially harmful nature of an AI model does not inspire confidence in either the model or the company behind it.

Ironically, earlier in 2022, Meta disbanded its "Responsible Innovation Team". This was a group intended to address "potential harms to society" created by Meta's products. Perhaps they might have urged a little more caution with Galactica.

Other Limitations

A lack of reasoning capability

ChatGPT dialogue - user: Barbara, Mary and Harry had red balloons. Ted had a blue balloon. Sam also had a red balloon. How many children had red balloons? ChatGPT: Barbara, Mary, and Sam had red balloons. So, three children had red balloons.

AI models can write plausible statements, but they do not think or reason. They don't understand context and since they can't think, they certainly can't think outside the box. ChatGPT3/3.5 is quite poor at solving original puzzles that involve reasoning skills and is hit and miss with lateral thinking puzzles. However, 3.5 is better than 3 and ChatGPT4 (subscription only at time of writing) is better than both. Future iterations of ChatGPT may be even better at solving puzzles. Because training data is likely to include puzzles, AI models may "know" the answers because they have access to the text of questions and answers, much like the "country in Africa that starts with the letter "k"" example above. In real life "memorising" answers doesn't make you clever, it just means you have a good memory.

Lack of Real-World Understanding

Generative AI models lack a true understanding of the world. They generate responses based on patterns learned during training but do not comprehend concepts in the same way that humans do.

Bias and Fairness

Generative AI models can inherit biases from the data they are trained on, which can result in unfair or discriminatory outputs. Addressing bias in AI remains a significant challenge.

Limited Creativity

While generative AI can produce impressive content, it lacks true creativity and the ability to "think outside the box"

Over-Reliance on Training Data

Generative AI tools heavily rely on the quality and quantity of their training data. They may produce inaccurate or biased results if the training data is insufficient or unrepresentative. Also, these tools may struggle to generalise to data that significantly differs from their training data - they are really good at generating content that's similar to what they've seen before but not so good at creating content that's very different from their training data. It's like a chef who learned to cook by only making pizzas, you might be very good at making different types of pizzas, but you might struggle to make casseroles because making casseroles was never part of your training.

Out-of-Date Training Data

User:What is your training data cut-off date? ChatGPT: My training data goes up to January 2022. This means that while I can provide information and answer questions based on knowledge available up to that date, I may not be aware of events or developments that occurred after that point. If you have any questions or need information on topics up to January 2022, feel free to ask!

Generative AI tools have been trained on a vast corpus of training data, but it's not necessarily current. As of March 2025, ChatGPT's free version training data doesn't go beyond June 2024. This will change as new training data is fed to ChatGPT and more current versions of the platform made available to all user, not just those who pay for it. There are several problems with using out-of-date training data:

If AI is trained on old information, it may give answers that are outdated or wrong, which is a big problem in fast-changing fields. It may also not know about recent news or events, making its responses less accurate.
Language and culture change over time, so AI trained on older data might not understand new words, expressions, or social attitudes, leading to mistakes or misunderstandings. If the original data had biases, the AI might continue to show those biases, even if society has changed.
For technology-related topics, outdated AI may not recognize new security threats or the best ways to stay safe online, making it less reliable.
Overall, AI trained on old data may not work well for tasks that need up-to-date knowledge, making it less useful in real-world situations.

Before using generative AI, always read and understand what it is you're actually dealing with.