We’ll introduce two additional Amazon Nova models in 2025: a speech-to-speech model and a native multimodal-to-multimodal, or “any-to-any”, modality model. Our speech-to-speech model will understand streaming speech input in natural language, interpreting verbal and nonverbal cues (like tone and cadence) and delivering natural, humanlike interactions, while our any-to-any model will be capable of processing text, images, audio, and video as both input and output. It will simplify the development of applications where the same model can be used to perform a wide variety of tasks, such as translating content from one modality to another, editing content, and powering AI agents that can understand and generate all modalities.