Artificial intelligence has become a popular tool for finding information, summarizing texts, and even suggesting academic sources. But while AI can speed up the research process, it also introduces a set of risks that many users overlook.
Invented Information and “Hallucinated” Sources
Some widely used AI tools—such as general-purpose models like GPT—often produce citations, authors, or articles that look convincing but simply don’t exist. A recent study by Oladokun et al. (2025) found a disturbingly high rate of false or non-existent citations: 42.9% in ChatGPT-3.5 and 51.8% in ChatGPT-4o, underscoring how unreliable GPT models are for this task. The problem also appears in other AI search tools such as Perplexity, Perplexity Pro, DeepSeek Search, Copilot, Grok2, Grok3, and Gemini (Jaźwińska et al., 2025).
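One practical safeguard, whichever model produced the citation, is to check each AI-suggested reference against a bibliographic registry before using it. The short Python sketch below is purely illustrative: it assumes the requests library and queries the public Crossref REST API (api.crossref.org); the helper name and workflow are demonstration choices, not a feature of any tool discussed in this article.

```python
# Illustrative sketch: verify that an AI-suggested DOI is registered in Crossref
# and compare the registered title with what the AI claimed. Requires the
# third-party "requests" library; the helper name and example usage are
# assumptions for demonstration, not part of any tool discussed above.
import requests

CROSSREF_WORKS = "https://api.crossref.org/works/"

def crossref_lookup(doi: str) -> dict | None:
    """Return Crossref metadata for a DOI, or None if the DOI is not registered."""
    response = requests.get(
        CROSSREF_WORKS + doi,
        headers={"User-Agent": "citation-checker/0.1 (mailto:you@example.org)"},
        timeout=10,
    )
    if response.status_code == 404:  # DOI not registered with Crossref
        return None
    response.raise_for_status()
    return response.json()["message"]

if __name__ == "__main__":
    # DOI of the Oladokun et al. (2025) article cited in this post
    doi = "10.1080/19322909.2025.2482093"
    record = crossref_lookup(doi)
    if record is None:
        print(f"{doi}: not found in Crossref -- check the reference manually.")
    else:
        title = record.get("title", ["<no title>"])[0]
        print(f"{doi}: registered title is '{title}'")
```

A missing Crossref record does not by itself prove a citation is fake (books and older articles may lack a registered DOI), and a hit only confirms the DOI exists: the returned title, authors, and journal still have to match what the AI reported.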
There are, however, specialized AI tools designed specifically for locating scientific articles—such as Consensus (free) or Elicit (freemium)—that significantly reduce the risk of invented references. Even so, while these tools avoid generating fake citations, they still present other challenges that we will explore below.
Researcher-Induced Bias and Selective Queries
Another subtle but important problem arises when the researcher’s own question reinforces pre-existing assumptions. For example, a query such as “show me articles that support [specific claim]” inadvertently promotes confirmation bias, because the AI will prioritize papers aligned with that statement while ignoring literature that challenges or contradicts it.
Another Layer of Bias: Training Data Limitations
Beyond user-induced bias, AI systems also reproduce biases embedded in their training data. Because they learn from internet-scale text and highly digitized sources, they tend to favor:
English-language publications,
Dominant theoretical perspectives over minority or emerging viewpoints.
Misinterpretations and Mixed Content
Even when an AI retrieves a real and correctly cited article, it may still misunderstand the study’s conclusions or blend information from multiple texts. This can produce summaries that appear accurate at first glance but actually distort the original meaning. In some cases, the AI may attribute a conclusion to a paper that never makes such a claim—or worse, one that argues the opposite. This issue is especially common in specialized academic retrieval tools, such as the previously mentioned Consensus and Elicit, and it can even occur in systems like NotebookLM, where the model may misinterpret scientific content that the user has personally uploaded.
Difficulty Assessing Source Reliability
Another major limitation of AI-powered research tools is their inability to consistently distinguish high-quality, peer-reviewed literature from low-credibility sources, such as articles published in journals known for questionable research practices (QRPs).
That’s why AI should be understood as a complementary tool—not a substitute for academic judgment. Using it responsibly requires verifying references, consulting original papers, questioning assumptions, and engaging critically with the literature. When researchers combine the efficiency of AI with thoughtful human oversight, they can benefit from its capabilities while minimizing its most common risks.
Oladokun, B. D., Enakrire, R. T., Emmanuel, A. K., Ajani, Y. A., & Adetayo, A. J. (2025). Hallucitation in Scientific Writing: Exploring Evidence from ChatGPT Versions 3.5 and 4o in Responses to Selected Questions in Librarianship. Journal of Web Librarianship, 19(1), 62–92. https://doi.org/10.1080/19322909.2025.2482093