A VoiceOver user can access all the tools available to non-VoiceOver users. In addition, VoiceOver provides tools that speak image descriptions and text automatically.
Let's begin with what should happen every time a VoiceOver user encounters an image. Every image should have something called alt text provided with it. Alt text is a text description provided by the person publishing the image. When VoiceOver encounters an image with alt text, it will speak the word "image" followed by the description provided in the alt text. This is wonderful, since the person who published the image will know better than any AI what the intended message of the image is. The good news is that many organisations and individuals now provide alt text with images on websites, social networks, emails and in apps. The bad news is that alt text still isn't universal and, in particular, social network posts by individuals frequently omit it.
So what can be done if all you hear is "image" or, worse still, a long and meaningless image file name? VoiceOver has a built-in Text Recognition feature which will work with text in fancy fonts and with handwritten text.
So long as you are running iOS 14 or later on an iPhone XS, XS Max, XR or later, you will be able to use VoiceOver Recognition. It is designed to improve the accessibility of images, websites and apps where the author has failed to provide accessibility information for blind people, such as alt text for images and labels for icons.
The settings for VoiceOver Recognition are at Settings / Accessibility / VoiceOver / VoiceOver Recognition. The available features will depend on your iPhone or iPad model, but the features described here should be available on all supported iPhones and iPads. The Screen Recognition feature can help to make inaccessible apps and websites more accessible, but it is beyond the scope of this FAQ. If you don't understand it, please leave the Screen Recognition switch turned off.
VoiceOver Recognition uses on-device intelligence to generate image descriptions and discover text. Image descriptions will be basic and sometimes wide of the mark, but identifying and speaking text is a much simpler task and is usually accurate.
The relevant switches for describing images and for identifying text in images are Image Descriptions and Text Recognition. I normally have both of these turned on, but if your vision is good enough to identify most images, you may prefer to turn on just Text Recognition.
The other setting you need to consider is Feedback Style. Whichever option you choose, you will hear the description and text spoken. The Speak and Play Sound options additionally indicate when VoiceOver Recognition is providing the information. Experiment with these. You may prefer the Do Nothing option.
So exactly what happens if you turn on Image Descriptions or Text Recognition? You need to be patient. VoiceOver will usually speak everything it would normally speak, including the word "image", and at the very end it will speak the image description and any text it discovers. Text discovery is normally excellent, but image descriptions are basic and sometimes inaccurate.
If you need a more detailed analysis of an image, you can use the techniques for low-vision users. These include asking Siri to describe the image, if your device supports Apple Intelligence, or getting the image described by apps such as Be My Eyes or Seeing AI.