Posted: 2024-05-19 03:30:00

In a further change of direction away from OpenAI’s roots as a not-for-profit, the company’s chief executive Sam Altman said in a blog post that the future of GPT is likely not one product, but a set of technologies that other companies will use to develop chatbots of their own. This also rings alarm bells: an assistant that speaks as naturally and smoothly as a person, but that is fallible in indecipherable ways and must somehow be serving the commercial interests of its developers.

A day after OpenAI’s announcement, Google shared a look at a very similar voice assistant based on its Gemini models, called Gemini Live. It also showed an early prototype called Project Astra, a vision of the kind of conversational AI we’ll be using a few years down the track.

Gemini has always been multi-modal, but the early versions can have a long lag between input and output, and there’s a limit to how much data they can hold on to when considering a prompt. In a demonstration of Astra, however, those issues seem significantly reduced.

The user walks around the office firing off questions like “what does this code do?”, “what neighbourhood am I in?” and “do you remember where you’ve seen my glasses?” Seeing through the user’s smartphone camera, Astra answers quickly and accurately each time.

Then the user puts the glasses on. These are apparently glasses with cameras, microphones and speakers attached, like Meta and Ray-Ban’s smart glasses, and Astra can work through them. The user points to a plan for a computer server and asks Astra how to optimise it, which it does.

It’s clear that this much more intuitive and human way of accessing online data and expertise could be tremendously useful. But there’s also the feeling that it’s all happening very fast – Google only announced Gemini a year ago. Is there a danger that enough people could become used to using digital assistants like this, and not want to go back, before we’ve thought enough about the pros and cons?

Google says its latest Gemini model has a context window of two million tokens, meaning it could consider every frame of an entire movie when coming up with a response.
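
As a rough sense of scale, here is a back-of-envelope sketch of how much video a two-million-token window could hold. The per-frame token count and one-frame-per-second sampling rate are assumptions (figures that have been reported for Gemini 1.5’s video tokenizer), not claims from this article:

```python
# Back-of-envelope: how much video fits in a 2-million-token context window.
# Assumptions (not from the article): video sampled at 1 frame per second,
# ~258 tokens per frame -- figures reported for Gemini 1.5's video handling.
TOKENS_PER_FRAME = 258
FRAMES_PER_SECOND = 1
CONTEXT_WINDOW = 2_000_000

seconds = CONTEXT_WINDOW // (TOKENS_PER_FRAME * FRAMES_PER_SECOND)
hours = seconds / 3600
print(f"~{seconds:,} seconds of video, roughly {hours:.1f} hours")
```

Under those assumptions the window covers a bit over two hours of footage, which is consistent with the “entire movie” claim.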

That ends up being the troubling thing about many current AI advances: it’s not that they’re bad, it’s that they’re happening so quickly.

Also at I/O, Google detailed another experimental feature that would listen in on your phone calls and alert you if it thought the person on the other end was trying to scam you, by identifying common conversation patterns. In an example, a caller identifies themselves as a bank representative and asks the victim to transfer funds, prompting a notification that says, “banks will never ask you to move your money to keep it safe”, along with an option to end the call.
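
To illustrate the idea of matching common scam conversation patterns, here is a toy sketch. The real feature runs an on-device language model (Gemini Nano), not a keyword list; the phrases and warnings below are invented for illustration:

```python
# Toy sketch of pattern-based scam flagging. A real system uses an
# on-device language model, not keyword matching; this only illustrates
# the concept of flagging common scam conversation patterns.
RED_FLAGS = {
    "transfer your funds": "banks will never ask you to move your money to keep it safe",
    "gift card": "legitimate businesses don't demand payment in gift cards",
    "verification code": "never share one-time codes with a caller",
}

def scan_transcript(transcript: str) -> list[str]:
    """Return warning messages for any red-flag phrases found in the call."""
    text = transcript.lower()
    return [warning for phrase, warning in RED_FLAGS.items() if phrase in text]

warnings = scan_transcript(
    "Hi, this is your bank. Please transfer your funds to a safe account."
)
print(warnings)
```

The gap between this sketch and a language model is the point: a model can flag a scam phrased in words it has never seen before, which a keyword list cannot.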

This feature would make use of Gemini Nano, meaning the entire language model is on the phone, so it wouldn’t be recording your calls or sharing information outside your device. In a security sense, it’s not necessarily more concerning than an AI program that monitors your battery usage. And we already have computer programs keeping us safe from things like malicious software downloads, so why not have AI keep us safe from human attacks?

But are we headed towards a future where our data assistants will be assessing personal threats to us? Combined with GPT-4o’s ability to read faces, could they tell us about a person’s intent, whether they’re lying to us or being manipulative? Do human interactions need that?

Like OpenAI, Google is also planning to sell other companies and developers the ability to create bots and agents based on Gemini, and it’s beginning to roll out an AI module in its Google Search results that summarises answers and points to products before showing the traditional list of links.

That likely won’t do anything to satisfy the mass of creators and companies who are already complaining that it’s getting harder for people to find their companies and content through search, or that chatbots are effectively ripping them off without attribution.

Tellingly, Google now features an option next to “images”, “video”, “news”, “shopping”, and “flights” called “web”, which you can click if you want to look for results from websites specifically. It doesn’t seem like that long ago that every Google Search was a search through websites, and I’m not certain that option will be available when you’re asking Gemini about it through your smart glasses.

Get news and reviews on technology, gadgets and gaming in our Technology newsletter every Friday. Sign up here.
