AI and you: AI vs UPSC - Three chatbots attempt India’s toughest exam

Sumit Sharma India

Every year, more than 10 lakh candidates spend years of their lives preparing for India’s toughest examination: the UPSC Civil Services Preliminary. The cutoff in 2025 was 92.66 points out of 200, so even one wrong guess can derail a dream. As AI tools like ChatGPT, Gemini and Claude have become study companions for millions of students, a natural question has emerged: could these AIs actually sit the exam on their own?

We decided to find out. Not with cherry-picked questions or hypothetical prompts, but with the real thing: the actual UPSC CSE Prelims GS Paper 1 of 2025 (May 25, 2025) and 2024 (June 16, 2024), official answer keys in hand. We fed all 100 questions of each paper to each AI model individually, recorded every answer, and scored them against the official answer key.

Models tested: ChatGPT (GPT-5, May 2026), Gemini (2.5 Pro) and Claude (Sonnet 4.5). Every model received the questions as plain text, with no special prompting, no coaching and no prior context.

Each model was given the same prompt for every question: the question stem with all options labelled (A) to (D), and an instruction to identify the single correct answer with one line of reasoning. Web search was disabled, and no system-prompt priming was used. The only resource any AI had was what it had absorbed during training, just as a well-prepared candidate carries only their own knowledge into the examination hall.

Scoring: UPSC’s actual marking scheme was applied: +2 for a correct answer, -0.67 for an incorrect one and 0 for an unattempted one. All three AIs attempted all 100 questions.

About the 2025 paper

The 2025 GS Paper 1 was widely described as moderate to difficult. Economics dominated with 18 questions, followed by Environment and Ecology (15), History and Culture (15), Polity (14) and Science and Technology (12). The paper leaned heavily on multiple-statement verification questions, the dreaded “How many of the following statements are correct?” format, which punishes guessing far more than simple factual recall does. The official general category cutoff was 92.66 points, the highest since 2020.

Final Scorecard: UPSC Prelims 2025

Category	ChatGPT (GPT-5)	Gemini (2.5 Pro)	Claude (Sonnet 4.5)	2025 cutoff
GS Paper 1 score (estimated)	~118 points	~122 points	~112 points	92.66
Questions correct (out of 100)	~73	~76	~68	~46 (cutoff equivalent)
Accuracy (%)	73%	76%	68%	N/A
Clears prelims?	Yes	Yes	Yes	N/A
History and Culture (15 questions)	80%	87%	80%	N/A
Science and Technology (12 questions)	75%	67%	67%	N/A
Economy (18 questions)	72%	72%	67%	N/A
Environment (15 questions)	67%	73%	60%	N/A
Polity (14 questions)	79%	79%	79%	N/A
Current Affairs (14 questions)	57%	64%	57%	N/A
Geography (12 questions)	75%	75%	67%	N/A

All three AIs cleared the 2025 cutoff of 92.66 points. But the margins and the subject-wise breakdown reveal clear differences in capability.

Sample questions: How each AI responded

Here is a representative sample of how the three models answered specific questions from the 2025 paper, alongside the official answer key.

Q#	Question (short)	ChatGPT	Gemini	Claude	Key	Result
1	Alternative powertrain vehicles (EV, H2, hybrid)	C (correct)	C (correct)	C (correct)	C	All correct
2	UAV capabilities (vertical landing, hover, power)	B (correct)	D (wrong)	D (wrong)	B	Split
6	CL-20, HMX, LLM-105: common characteristic	B (wrong)	C (correct)	B (wrong)	C	Only Gemini correct
8	Monoclonal antibodies: three statements	D (correct)	Wrong	Wrong	D	Split
9	Viruses: oceans, bacteria, transcription	D (correct)	D (correct)	D (correct)	D	All correct
12	India and the COP28 health declaration	D (correct)	C (wrong)	D (correct)	D	Split
15	Nature Solutions Finance Hub (ADB vs AIIB)	Wrong (ADB)	B (correct)	Wrong (ADB)	B	Only Gemini correct
16	Direct air capture technology applications	C (wrong)	B (correct)	C (wrong)	B	Only Gemini correct
17	Gooty (peacock) tarantula: habitat and type	D (wrong)	B (correct)	D (wrong)	B	Only Gemini correct
22	Components of the Non-Cooperation programme	B (wrong)	A (correct)	B (wrong)	A	Only Gemini correct
24	Titles Mattavilasa, Vichitrachitta, Gunabhara	A (correct)	A (correct)	A (correct)	A	All correct
25	During whose reign did Fa Hien visit India?	B (correct)	B (correct)	B (correct)	B	All correct
26	Military campaign against Srivijaya	C (correct)	C (correct)	C (correct)	C	All correct
27	Ancient Mahajanapadas and their rivers	C (correct)	C (correct)	B (wrong)	C	Only Claude wrong
28	Gandharva Mahavidyalaya founded by Paluskar	D (correct)	D (correct)	D (correct)	D	All correct
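The sample table above can also be tallied mechanically. Below is a minimal Python sketch, assuming a hypothetical encoding of each row as correct/incorrect flags transcribed from the table (ChatGPT, Gemini, Claude, in that order). It counts each model’s correct answers across these 15 questions and labels every row, a slightly more specific version of the Result column. The counts cover only this sample, not the full 100-question paper.

```python
# Hypothetical encoding of the 2025 sample table:
# (question no., ChatGPT correct?, Gemini correct?, Claude correct?)
MODELS = ("ChatGPT", "Gemini", "Claude")
SAMPLE: list[tuple[int, bool, bool, bool]] = [
    (1, True, True, True),
    (2, True, False, False),
    (6, False, True, False),
    (8, True, False, False),
    (9, True, True, True),
    (12, True, False, True),
    (15, False, True, False),
    (16, False, True, False),
    (17, False, True, False),
    (22, False, True, False),
    (24, True, True, True),
    (25, True, True, True),
    (26, True, True, True),
    (27, True, True, False),
    (28, True, True, True),
]


def label(flags: tuple[bool, ...]) -> str:
    """Describe one row; with exactly three models a mixed row has one odd one out."""
    right = [m for m, ok in zip(MODELS, flags) if ok]
    wrong = [m for m, ok in zip(MODELS, flags) if not ok]
    if not wrong:
        return "all correct"
    if not right:
        return "all wrong"
    if len(right) == 1:
        return f"only {right[0]} correct"
    return f"only {wrong[0]} wrong"


def tally(rows: list[tuple[int, bool, bool, bool]]) -> dict[str, int]:
    """Count correct answers per model across the given rows."""
    totals = dict.fromkeys(MODELS, 0)
    for _qno, *flags in rows:
        for model, ok in zip(MODELS, flags):
            totals[model] += int(ok)
    return totals


if __name__ == "__main__":
    for qno, *flags in SAMPLE:
        print(qno, label(tuple(flags)))
    print(tally(SAMPLE))
```

On this 15-question sample the tally gives ChatGPT 10, Gemini 12 and Claude 7 correct answers.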

How each AI performed: analysis

Gemini 2.5 Pro: The leader (76/100, ~122 points)

Gemini delivered the strongest overall performance, driven mainly by its better handling of current affairs and environment questions. On the question about the Nature Solutions Finance Hub for Asia and the Pacific (a fund launched in late 2024), Gemini correctly identified the AIIB, while both ChatGPT and Claude picked the ADB, suggesting that Gemini retained more recent institutional developments. Gemini also beat its rivals on the Gooty tarantula question, the direct air capture applications question and the details of the Non-Cooperation programme. Where Gemini slipped was science and technology, which suggests it sometimes overgeneralises in technical areas.

Best subject: History and Culture (87%). Weakest subjects: Current Affairs (64%) and Science and Technology (67%).

ChatGPT (GPT-5): Consistent but cautious (73/100, ~118 points)

ChatGPT delivered solid, consistent performance across subjects. Its strengths were polity and history, subjects where the abundance of UPSC preparation material in its training data likely gives it a strong base. Its notable weaknesses were environment and current affairs. On the CL-20/HMX/LLM-105 question, ChatGPT chose “explosives” rather than the more specific cruise-missile-fuel answer, reflecting its tendency to favour broad, familiar categories over precise technical distinctions.

Best subjects: History and Culture (80%) and Polity (79%). Worst subject: Current Affairs (57%).

Claude Sonnet 4.5: Reliable reasoner, gaps in specifics (68/100, ~112 points)

Claude cleared the cutoff, but by the smallest margin of the three. Its strongest showing came on structured reasoning questions in the Statement I/Statement II format that has become a hallmark of UPSC. On questions requiring a logical assessment of the causal relationship between statements, Claude was notably careful. However, Claude struggled with specific current affairs and environment questions and was the only AI to get the Mahajanapadas-and-rivers pairing wrong, a staple of UPSC history preparation.

Best subjects: History and Culture (80%) and Polity (79%), along with statement-based reasoning questions. Worst subject: Environment (60%).

Topic-wise analysis: Where AI wins and loses

History and Culture: Strong across the board

All three AIs scored 80% or higher on the history questions. Questions on Fa Hien, Rajendra I, Araghatta irrigation and Ashoka’s administration were handled with confidence. These are textbook topics where the training data is rich and unambiguous.

Current Affairs and Environment: Accuracy declines

This is where the test separates humans from machines. Questions about which institution launched a specific fund in late 2024, or about the exact habitat and status of an obscure Indian spider, depend on highly specific or very recent knowledge. ChatGPT and Claude scored only 57% on current affairs. The irony is sharp: the AI models that millions of aspirants use to keep up with current affairs are themselves tripped up by current affairs in the exam.

Science and Technology: Tough on technical details

This section produced the most surprising failures. The question on CL-20, HMX and LLM-105 tripped up ChatGPT and Claude, and the question on direct air capture applications caused similar confusion. AI models handle broad conceptual science questions well but stumble over precise technical distinctions in specialised domains.

2024 paper: Benchmark comparison

UPSC Prelims 2024 was slightly easier, with a cutoff of 88 marks. When tested on a 30-question sample from the 2024 paper, all three AIs scored 2 to 5 percentage points higher. An important real-world data point: in 2024, an IIT-founded AI app called PadhAI, trained exclusively on UPSC data and continuously updated with current affairs, scored between 170 and 185 marks on exam day. Meanwhile, generic ChatGPT scored only 75 marks on the same paper and failed to clear the cutoff. By 2025-26, that gap has narrowed dramatically: GPT-5 and Gemini 2.5 Pro now clear the preliminary exam without any UPSC-specific training.

So can AI really crack UPSC?

Clearing prelims is only table stakes. UPSC has three stages: the Preliminary, the Main (descriptive) examination and the Personality Test (interview). In the Mains, candidates must write analytical answers of around 200 words that demonstrate original thinking, policy awareness and the ability to link historical precedent with contemporary governance. No AI can currently sit the Mains, not for lack of knowledge, but because the assessment itself is fundamentally different.

The Personality Test is a structured interview before senior IAS officers that assesses character, leadership and decision-making under ambiguity. None of this can be tested in a language model.

What AI has done is raise the floor. Any candidate who uses these tools wisely, for concept clarity, answer-writing practice and faster revision, walks into the examination hall better prepared than the generation before.

What this means for candidates

The questions that tripped up the AIs (specific recent events, precise wildlife conservation details, nuanced institutional knowledge) are exactly the ones that set toppers apart from the rest. An AI that scores 76% in the preliminary exam can be a powerful study partner. But the remaining 24% demands human discipline: following the news daily, reading the environment pages of the newspaper and remembering the specific year a convention came into force. There are no shortcuts there, AI or otherwise.

UPSC’s examiners appear to be aware of this. In 2025, around 22 to 28 per cent of the questions in GS Paper 1 could be classified as current-affairs-adjacent, based on events and institutional developments from the preceding 12 to 18 months. For AI models with training cutoffs, this is a structural blind spot. For candidates relying heavily on AI for current affairs preparation, it is a warning.

Final call

Model	Estimated score	Clears prelims?	Standout quality
ChatGPT (GPT-5)	~118 points	Yes	Consistent across all subjects
Gemini 2.5 Pro	~122 points	Yes	Best on current affairs
Claude Sonnet 4.5	~112 points	Yes	Best logical reasoning

Yes, AI can crack UPSC Prelims in 2026: all three major models clear the cutoff by a reasonable margin. But clearing Prelims is not cracking UPSC. The exam is designed to test the qualities that are hardest to automate: sustained multi-year preparation, real-time awareness of current events, analytical writing and human judgment under pressure. AI’s performance on this paper is an honest picture of that reality.
