Google researchers are working on a way to translate speech directly into another language, without first converting it into text. The system, called Translatotron, can also preserve the speaker's voice.
The technique uses a neural network that analyzes spectrograms of the source speech and converts them into spectrograms of speech in the target language. According to the researchers, Translatotron is the first end-to-end model that can directly translate speech into another language.
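For readers unfamiliar with the term, a spectrogram describes how much energy a sound has at each frequency over time. The following is a minimal sketch of how one can be computed with NumPy. It is not Google's code, and the function name, frame length, hop size, and test tone are all assumptions chosen for illustration; the article does not say which spectrogram variant or parameters Translatotron uses.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames x frequency bins) magnitude spectrogram.

    Hypothetical helper for illustration: the signal is cut into
    overlapping frames, each frame is tapered with a Hann window,
    and a Fourier transform is taken of every frame.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed test input: one second of a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = magnitude_spectrogram(tone)

# The strongest frequency bin, averaged over time, converted to hertz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_hz)
```

Each row of the result is one short slice of time and each column is one frequency band. A model of the kind described here would take such a grid as input and produce a new grid as output, which is then turned back into audio.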
It is already possible to translate speech and have it spoken in another language, but existing systems first convert the speech into text, translate that text, and then convert the translation back into speech. That is also how Google Translate works now.
Because the speech is translated directly, without an intermediate text step, the speaker's voice can also be preserved, according to Google. For this, an optional speaker encoder is used to ensure that the characteristics of the speaker's voice are retained in the translated speech. Whether and when Translatotron will be used in practice is not yet known.