ReALM: Apple presents AI model comparable to GPT-4

Apple presented [PDF] a new artificial intelligence model that promises to revolutionize Siri's capabilities. It is called ReALM (Reference Resolution As Language Modeling).

Running on-device, it aims to make Siri smarter by allowing the assistant to understand "entities" on the user's screen, as well as conversations and background processes such as alarms and music, so it can respond more accurately.

These entities fall into three categories: on-screen (what is being displayed), conversational (data relevant to the conversation, drawn from previous interactions with the user or the virtual assistant), and background (processes running in the background that influence the context of the interaction).

A recently published research paper explores ReALM and demonstrates that it can outperform existing systems, supporting its effectiveness in enhancing Siri’s usefulness through advanced language modeling.

We demonstrated major improvements over an existing system with similar functionality across different reference types, with our most compact model achieving absolute gains of over 5% for on-screen references. We also performed comparisons with GPT-3.5 and GPT-4, with our more compact model achieving comparable performance to GPT-4, and our larger models substantially outperforming it.

Furthermore, benchmark tests against OpenAI's GPT-3.5 and GPT-4 show that even ReALM's smallest model achieves performance comparable to GPT-4, with its larger models considerably outperforming it.

Our goal is to have both variants predict a list of entities from an available set. In the case of GPT-3.5, which only accepts text, our input consists of just the prompt; however, in the case of GPT-4, which also has the ability to contextualize images, we provided the system with a screenshot for the on-screen reference resolution task, which we found helped substantially improve performance.
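To make the text-only setup concrete, here is a minimal sketch of how candidate entities from the three categories might be serialized into a single prompt for a text-only model to resolve against. This is purely illustrative: the `build_prompt` helper, the label names, and the example entities are invented for this article, not Apple's implementation.

```python
# Hypothetical sketch: turning candidate entities into a text prompt,
# in the spirit of ReALM's text-only reference resolution setup.

def build_prompt(user_query, entities):
    """Number each candidate entity and append the user's request,
    so a text-only model can answer with the matching entity index."""
    lines = ["Candidate entities:"]
    for i, (kind, description) in enumerate(entities, start=1):
        lines.append(f"{i}. [{kind}] {description}")
    lines.append(f"User request: {user_query}")
    lines.append("Which entity does the request refer to? Answer with its number.")
    return "\n".join(lines)

# Example: one entity from each of the three categories described above.
entities = [
    ("on-screen", "Phone number 555-0134 shown on the open web page"),
    ("conversational", "Pharmacy address mentioned earlier in the chat"),
    ("background", "Alarm currently set for 07:00"),
]
prompt = build_prompt("Call that number", entities)
print(prompt)
```

A model receiving this prompt would be expected to answer "1", since "that number" refers to the on-screen phone number.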

Even with far fewer parameters than GPT-4, everything indicates that ReALM outperforms it at textual tasks and at understanding user commands, despite GPT-4's ability to contextualize images.

It is very likely that we will see news related to these technologies this June at WWDC24. After all, this year's event is expected to place special emphasis on integrating artificial intelligence features into Apple's main operating systems. Let's wait!

via 9to5Mac
