Transcribe & Translate Audio Between Languages with OpenAI & S3 Storage
Description
This workflow automatically transcribes audio files, translates the content between languages, and generates natural-sounding speech from the translated text - all in one seamless process.
Who's it for
Content creators, educators, and businesses needing to make their audio content accessible across language barriers. Perfect for translating podcasts, voice messages, lectures, or any audio content while preserving the spoken format.
How it works
The workflow receives an audio file through a webhook, transcribes it using OpenAI's Whisper, translates and structures the text with GPT-4, generates new audio in the target language, and stores it in S3 for easy access. For short clips the whole run typically takes seconds, and the response contains both the transcribed/translated text and a URL to the translated audio file.
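The stages described above can be sketched as a small orchestration function. This is only an illustration of the data flow, not the workflow's actual node configuration: the four stage functions (transcribe, translate, synthesize, upload) are hypothetical callables you would back with your own OpenAI and S3 clients, the "Source, Target" reading of the languages value is an assumption, and the URL format assumes a publicly readable object addressed in virtual-hosted style.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import quote


@dataclass
class TranslationResult:
    transcript: str   # text transcribed from the source audio
    translation: str  # transcript rendered in the target language
    audio_url: str    # URL of the generated speech stored in S3


def parse_languages(languages: str) -> tuple[str, str]:
    """Split a value such as "English, Spanish" into (source, target)."""
    parts = [p.strip() for p in languages.split(",") if p.strip()]
    if len(parts) != 2:
        raise ValueError(f"expected 'Source, Target', got {languages!r}")
    return parts[0], parts[1]


def run_pipeline(
    audio: bytes,
    languages: str,
    bucket: str,
    key: str,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str], bytes],
    upload: Callable[[str, str, bytes], None],
) -> TranslationResult:
    """Run the four stages in order; every stage is injected by the caller."""
    source, target = parse_languages(languages)
    transcript = transcribe(audio)                       # speech to text
    translation = translate(transcript, source, target)  # text to text
    speech = synthesize(translation)                     # text to speech
    upload(bucket, key, speech)                          # store in S3
    # Assumed URL shape for a publicly readable object (virtual-hosted style).
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
    return TranslationResult(transcript, translation, url)
```

Injecting the stages keeps the sketch independent of any particular SDK version and makes the ordering easy to test with stand-in functions.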
How to set up
Configure OpenAI credentials - Add your OpenAI API key for Whisper transcription and GPT-4 translation
Set up AWS S3 - Create a bucket with public read permissions for audio storage
Update configuration - Replace 'YOUR-BUCKET-NAME' with your actual S3 bucket name
Activate webhook - Deploy and copy your webhook URL for receiving audio files
Send a POST request with:
Binary audio file (as 'audiofile')
Languages parameter (e.g., "English, Spanish")
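As a rough illustration of the request in the last step, the sketch below builds a multipart/form-data POST using only the Python standard library. The 'audiofile' field name comes from the step above; the 'languages' field name, the helper name build_webhook_request, and any webhook URL you pass in are assumptions for the example, so check them against your own workflow before relying on them.

```python
import urllib.request
import uuid


def build_webhook_request(
    webhook_url: str,
    audio: bytes,
    filename: str,
    languages: str,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the webhook."""
    boundary = uuid.uuid4().hex
    audio_header = (
        f'Content-Disposition: form-data; name="audiofile"; filename="{filename}"'
    )
    body = b"\r\n".join([
        # Text part: the language pair, e.g. "English, Spanish".
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="languages"',  # assumed field name
        b"",
        languages.encode("utf-8"),
        # Binary part: the audio file itself.
        f"--{boundary}".encode(),
        audio_header.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        audio,
        # Closing boundary, followed by a final CRLF.
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        webhook_url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the returned object to urllib.request.urlopen and read the response body, which should carry the text and the audio URL described under "How it works".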
Requirements
OpenAI API account with access to Whisper and GPT-4
AWS account with S3 bucket configured
Basic understanding of webhooks and API requests
How to customize
Add language detection - Automatically detect source language if not specified
Customize voice settings - Adjust speech speed, pitch, or select different voices
Add file validation - Implement size limits and format checks
Enhance security - Add webhook authentication and rate limiting
Extend functionality - Add subtitle generation or multiple output formats
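As one example of these customizations, file validation could look roughly like the sketch below. The 25 MB cap, the extension list, and the function name validate_upload are illustrative assumptions rather than part of the workflow; adjust them to the limits of the services you actually call.

```python
from pathlib import PurePath

# Illustrative limits; tune them for your own deployment.
MAX_BYTES = 25 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".webm", ".flac"}


def validate_upload(filename: str, data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the upload is acceptable."""
    problems = []
    # Size check: reject empty payloads and anything over the cap.
    if not data:
        problems.append("file is empty")
    elif len(data) > MAX_BYTES:
        problems.append(f"file exceeds {MAX_BYTES} bytes")
    # Format check: compare the lower-cased extension against the allow-list.
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    return problems
```

Running a check like this before transcription lets the webhook reject bad uploads early instead of failing partway through the pipeline. An extension check alone does not prove the content is audio, so treat it as a first filter only.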