Google Upgrades Gemini Live with Visual Guidance and App Control

Google is rolling out major updates to Gemini Live: it can now highlight items the camera sees, interact with the Messages, Phone, and Clock apps to draft texts or place calls, and speak with an improved audio model that offers more natural intonation and adjustable speaking speed. Pixel 10 owners get it at launch, with a wider Android and iOS rollout to follow.

Published August 20, 2025 at 06:30 PM EDT in Artificial Intelligence (AI)

What Google announced

Google is expanding Gemini Live with three practical upgrades that bring the assistant closer to the physical world and everyday workflows. Starting August 28, Pixel 10 owners will get a visual guidance mode that can place a box on-screen around objects the camera detects. Google will roll the feature out to more Android devices and then to iOS in the coming weeks.

Key feature highlights

  • Visual guidance: Gemini Live can highlight specific items in your camera view (think pointing out the right tool in a toolbox); a rough overlay sketch follows this list.
  • Cross-app interaction: Gemini Live will soon be able to draft texts, place calls, and work with the Messages, Phone, and Clock apps on your behalf during a conversation.
  • Richer audio: an updated speech model improves intonation, rhythm, and pitch, and lets Gemini adjust its tone for sensitive topics, change speaking speed, and even adopt accents for narrative flair.
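
Google hasn't published an API for the visual guidance mode, but the on-screen effect is easy to picture. As a rough Kotlin sketch only (HighlightOverlayView and its normalized-box input are our assumptions, not anything Gemini exposes), an Android overlay view that draws a box over a region a detector has flagged might look like this:

```kotlin
// Hypothetical overlay view: draws a highlight box over the camera preview
// for an object the assistant has identified. The RectF is assumed to come
// from whatever detection result the assistant surfaces (normalized 0..1).
import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import android.view.View

class HighlightOverlayView(context: Context) : View(context) {

    private val boxPaint = Paint(Paint.ANTI_ALIAS_FLAG).apply {
        style = Paint.Style.STROKE
        strokeWidth = 8f
        color = Color.YELLOW
    }

    // Normalized box (0..1 on both axes); null means nothing to highlight.
    private var normalizedBox: RectF? = null

    fun showHighlight(box: RectF) {
        normalizedBox = box
        invalidate()  // redraw with the new box
    }

    fun clearHighlight() {
        normalizedBox = null
        invalidate()
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        val box = normalizedBox ?: return
        // Scale the normalized coordinates to this view's pixel size.
        canvas.drawRect(
            box.left * width,
            box.top * height,
            box.right * width,
            box.bottom * height,
            boxPaint
        )
    }
}
```

In practice such an overlay would sit on top of the camera preview and be refreshed whenever the assistant identifies a new object.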

Why this matters

These upgrades push conversational AI from passive chat toward active, context-aware assistance. Visual guidance reduces ambiguity when instructions rely on physical objects. Cross-app actions mean fewer context switches — a single assistant session can route calls or notify contacts. Better speech makes interactions feel more human and appropriate to the moment.

Practical use cases

  • Field service: technicians use the camera to confirm the correct replacement part and have Gemini guide next steps hands-free.
  • Retail and shopping: shoppers point at shirts or gadgets and get instant product identification or inventory checks highlighted on-screen.
  • Accessibility: visual cues combined with clearer speech can help users with vision or cognitive challenges follow instructions more easily.

Risks and considerations

Powerful as these features are, they raise questions about privacy, permissioned access, and reliability. Visual guidance requires careful handling of on-device camera data and clear user consent. Cross-app actions need tight controls so automated messages or calls never happen unexpectedly.
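
One way to keep automated actions predictable is to gate every assistant-initiated message or call behind an explicit user confirmation and to record each decision. The Kotlin sketch below is a hypothetical pattern, not Gemini's implementation; ActionGate, AssistantAction, and AuditEntry are illustrative names.

```kotlin
// Hypothetical guardrail for assistant-initiated actions: every cross-app
// action must pass explicit user confirmation and leaves an audit record.
import java.time.Instant

enum class AssistantAction { SEND_TEXT, PLACE_CALL, SET_ALARM }

data class AuditEntry(
    val timestamp: Instant,
    val action: AssistantAction,
    val target: String,
    val approved: Boolean
)

class ActionGate(
    // Injected prompt so the confirmation UI stays under the app's control.
    private val confirmWithUser: (AssistantAction, String) -> Boolean
) {
    private val auditLog = mutableListOf<AuditEntry>()

    // Returns true only if the user explicitly approved the action.
    fun authorize(action: AssistantAction, target: String): Boolean {
        val approved = confirmWithUser(action, target)
        auditLog.add(AuditEntry(Instant.now(), action, target, approved))
        return approved
    }

    fun auditTrail(): List<AuditEntry> = auditLog.toList()
}

fun main() {
    // Demo: approve only calls, deny everything else, then inspect the trail.
    val gate = ActionGate { action, _ -> action == AssistantAction.PLACE_CALL }
    gate.authorize(AssistantAction.SEND_TEXT, "Alex")
    gate.authorize(AssistantAction.PLACE_CALL, "Alex")
    gate.auditTrail().forEach(::println)
}
```

The same audit trail doubles as the evidence base for the governance steps discussed below.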

What organizations should do next

  • Map workflows that benefit from camera-aware guidance and prototype small, controllable pilots.
  • Define permissions and audit trails for cross-app actions to prevent unwanted messages or calls.
  • Benchmark speech model variants for tone and comprehension in target user groups, especially for high-stress contexts; a simple scoring sketch follows this list.
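
For the benchmarking item, a small scoring harness is often enough to start: collect listener ratings per speech variant and compare the averages. The Kotlin sketch below assumes a simple 1-to-5 rating scale and made-up variant names; it illustrates the approach and is not a Google tool.

```kotlin
// Minimal scoring harness: aggregate listener ratings for tone
// appropriateness and comprehension per speech model variant.
data class Rating(val variant: String, val tone: Int, val comprehension: Int)  // 1..5 scales

data class VariantScore(
    val variant: String,
    val meanTone: Double,
    val meanComprehension: Double,
    val samples: Int
)

fun summarize(ratings: List<Rating>): List<VariantScore> =
    ratings.groupBy { it.variant }.map { (variant, rs) ->
        VariantScore(
            variant = variant,
            meanTone = rs.map { it.tone }.average(),
            meanComprehension = rs.map { it.comprehension }.average(),
            samples = rs.size
        )
    }.sortedByDescending { it.meanComprehension }

fun main() {
    // Illustrative ratings collected from a small pilot panel.
    val ratings = listOf(
        Rating("default", tone = 4, comprehension = 4),
        Rating("default", tone = 3, comprehension = 5),
        Rating("slower-calm", tone = 5, comprehension = 5),
        Rating("slower-calm", tone = 4, comprehension = 4)
    )
    summarize(ratings).forEach(::println)
}
```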

In short, Gemini Live's updates make assistants more action-oriented and context-aware. For developers and product leaders, this opens new interaction models, but it also requires thoughtful UX, privacy controls, and robust testing before wide deployment.

At QuarkyByte we translate these advances into practical roadmaps: mapping use cases, measuring accuracy and user trust, and designing governance guardrails so organizations can adopt camera-aware assistants safely and effectively.
