OpenAI has recently introduced new multimodal capabilities for ChatGPT, enabling users to hold voice conversations with the model and share images with it in real time. These features will initially roll out to Plus and Enterprise subscribers, with availability for free users to follow.

Voice Input for Mobile Users

The voice input feature, similar to mobile voice assistants such as Siri and Google Assistant, lets users speak their queries. The system transcribes the speech into text, processes it, and responds aloud. The feature is available on both iOS and Android and could support a wide range of applications.
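
The three stages described above (speech to text, processing, spoken reply) can be sketched as a simple pipeline. This is only an illustrative outline, not how ChatGPT is implemented: the function name voice_turn and the three placeholder stages are hypothetical, and each placeholder stands in for a real speech-recognition, language-model, or speech-synthesis component.

```python
from typing import Callable


def voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one voice turn: speech -> text -> reply text -> speech."""
    query_text = transcribe(audio)    # stage 1: speech to text
    reply_text = respond(query_text)  # stage 2: generate a text reply
    return synthesize(reply_text)     # stage 3: text back to speech


# Hypothetical placeholder stages so the sketch runs without any external
# service. Here "audio" is just UTF-8 text encoded as bytes.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You asked: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = voice_turn(
    b"how do I lower my bike seat?",
    fake_transcribe,
    fake_respond,
    fake_synthesize,
)
```

Passing the stages in as functions keeps the outline independent of any particular speech or language service; real components could be swapped in for the placeholders.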

Image Input for Visual Queries

Similarly, the image input feature, which resembles Google Lens, lets users ask questions using images. ChatGPT analyzes the uploaded images and provides relevant responses. Users can also use drawing tools to highlight specific parts of an image, or clarify their questions through text or voice, which makes the conversation more dynamic. As OpenAI has demonstrated, this can be especially helpful for tasks like bicycle repair or cooking, where users can upload images of objects or instruction manuals and receive explanations and solutions.

OpenAI’s decision to integrate voice and image capabilities into ChatGPT not only broadens the range of interactions but also opens up exciting creative possibilities. However, OpenAI is aware of the potential for misuse and is taking steps to prevent unethical applications of these advanced features.

How to use the features

To start using voice with ChatGPT, go to Settings in the mobile app, open New Features, and opt into voice conversations. Then tap the headphone icon in the top-right corner of the home screen and select your preferred voice from the five options.

For image prompts, tap the plus button, then capture or select an image. You can use the drawing tools to point the assistant to the relevant part of the image.

OpenAI has also recently launched DALL-E 3, an upgraded AI art tool that integrates with ChatGPT, making it easier for users to create detailed prompts. DALL-E 3 addresses the challenge of generating realistic human hands and handles complex prompts better than its predecessor, DALL-E 2.
