Google’s Gemini AI may be less accurate after new update
The new policy guidelines appear to lower the bar for quality control.
A recent controversy highlights the potential challenges in maintaining human oversight in AI systems, particularly in Google’s Gemini project.
Human involvement is often touted as a safeguard against AI errors; tasks such as writing code, managing datasets, and evaluating model outputs all depend on it.
However, these safeguards are only as strong as the policies guiding them. A new report raises concerns about Google's approach, specifically its use of outsourced labor through contractors such as GlobalLogic.
Google’s Gemini raises accuracy concerns
Historically, GlobalLogic reviewers were instructed to skip prompts requiring expertise they lacked, such as coding or mathematics.
This policy seemed reasonable, aiming to prevent non-experts from inadvertently influencing AI evaluation.
However, a recent shift directs reviewers to no longer skip such prompts, even if they lack the requisite domain knowledge.
Instead, reviewers are asked to rate the parts of the prompt they do understand and add a note acknowledging the limits of their expertise.
This change has sparked concern. While evaluating AI responses involves more than just technical accuracy — style, format, and relevance are also critical — the new guidelines appear to lower the bar for quality control.
Critics argue this could undermine the integrity of AI oversight, with some reviewers reportedly voicing similar worries in internal discussions.
Google spokesperson Shira McNamara responded to TechCrunch about the situation, emphasizing that raters contribute across various tasks.
She noted that their input doesn’t directly affect algorithms but serves as aggregated feedback for system evaluation. However, this explanation might not fully alleviate public skepticism.
The controversy highlights broader anxieties about balancing efficiency and accuracy in AI development.
By potentially prioritizing data volume over specialized scrutiny, the policy shift raises questions about the depth of human oversight and the implications for AI reliability.
Given humans’ critical role in curbing undesirable AI behavior, any perception of lowered standards is likely to amplify existing fears about AI systems’ ethical and functional ramifications.
What are your thoughts on this policy update for Google Gemini AI? Are you concerned about the output issues that may follow? Let us know in the comments below, or reach out via our Twitter or Facebook.