To achieve the best transcription results with OpenAI’s Whisper model, it’s essential to capture audio in a format that aligns with Whisper's specifications:
- Format: Mono, uncompressed (WAV) or losslessly compressed (FLAC)
- Sample Depth: 16-bit
- Sampling Rate: 16 kHz
While the MediaStream Recording API (MediaRecorder) primarily records audio in compressed formats like WebM or Ogg with codecs such as Opus or Vorbis, you can optimize its settings to approximate Whisper’s requirements as closely as possible. Additionally, post-processing steps (e.g., using ffmpeg) can convert the recorded audio into the desired format.
This guide outlines how to configure MediaRecorder for optimal audio quality, focusing on audio-related options, and provides steps for converting the recorded audio to WAV or FLAC.
- Understanding MediaRecorder Limitations
- Configuring MediaRecorder for High-Quality Audio
- Recording Audio
- Converting Recorded Audio to WAV or FLAC
- Complete Example
- Best Practices and Considerations
- Conclusion
- Supported Formats: MediaRecorder typically supports formats like audio/webm;codecs=opus, audio/ogg;codecs=vorbis, and audio/mpeg, though the exact set varies by browser.
- Direct Output: It does not natively support uncompressed formats like WAV or losslessly compressed formats like FLAC.
- Sample Rate and Channels: While you can request specific audio constraints, MediaRecorder relies on the underlying hardware and browser to honor these requests, which may not always be possible.
Given these limitations, the strategy involves:
- Capturing the highest possible quality with MediaRecorder.
- Converting the recorded audio to WAV or FLAC using additional tools or libraries.
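Because the supported containers and codecs differ between browsers, it can help to probe a short list of candidates before constructing the recorder. The following is a minimal sketch: the helper name `pickSupportedMimeType` and the candidate list are illustrative assumptions, while `MediaRecorder.isTypeSupported` is the standard static method. The predicate is injectable so the function can also be exercised outside a browser.

```javascript
// Candidate MIME types in order of preference (an illustrative list, not exhaustive).
const CANDIDATE_MIME_TYPES = [
  'audio/webm;codecs=opus',
  'audio/ogg;codecs=opus',
  'audio/mp4',
];

// Returns the first candidate accepted by `isSupported`, or null if none is.
// `isSupported` defaults to MediaRecorder.isTypeSupported when that API exists,
// and reports "unsupported" for everything when it does not.
function pickSupportedMimeType(
  candidates = CANDIDATE_MIME_TYPES,
  isSupported = (type) =>
    typeof MediaRecorder !== 'undefined' && MediaRecorder.isTypeSupported(type)
) {
  for (const type of candidates) {
    if (isSupported(type)) return type;
  }
  return null; // The caller can omit mimeType and let the browser choose.
}
```

If the function returns null, leaving `mimeType` out of the recorder options lets the browser fall back to its own default.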
To guide the browser towards capturing audio with a 16 kHz sampling rate and mono channels, use getUserMedia with appropriate constraints:
const audioConstraints = {
audio: {
channelCount: 1, // Mono
sampleRate: 16000, // 16 kHz
sampleSize: 16, // Request 16-bit samples (browser support varies; often ignored)
echoCancellation: false, // Disable echo cancellation for raw audio
}
};
navigator.mediaDevices.getUserMedia(audioConstraints)
.then(stream => {
// Proceed with creating MediaRecorder
})
.catch(error => {
console.error('Error accessing microphone:', error);
});
Notes:
- channelCount: 1: Requests mono audio. However, not all browsers honor this constraint.
- sampleRate: 16000: Requests a 16 kHz sampling rate. Browser support varies, and some browsers fall back to the device default of 44.1 kHz or 48 kHz.
- sampleSize: 16: Requests 16-bit samples, but browsers may ignore it. The bit depth of the final file is typically determined by the codec and by the conversion step.
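Since these constraints are only requests, it is worth checking what the browser actually applied. The sketch below compares the requested values with a settings object; the helper name `findConstraintMismatches` is a hypothetical one introduced here, and in a browser the `applied` argument would come from the standard `MediaStreamTrack.getSettings()` method.

```javascript
// Compares requested audio constraints with the settings the browser applied
// and returns one entry for every property whose value differs.
function findConstraintMismatches(requested, applied) {
  const mismatches = [];
  for (const [name, wanted] of Object.entries(requested)) {
    const actual = applied[name]; // undefined if the browser does not report it
    if (actual !== wanted) {
      mismatches.push({ name, wanted, actual });
    }
  }
  return mismatches;
}

// In a browser, `applied` would come from the live audio track:
//   const [track] = stream.getAudioTracks();
//   const mismatches = findConstraintMismatches(
//     { channelCount: 1, sampleRate: 16000 },
//     track.getSettings()
//   );
```

Any reported mismatch (for example, a 48 kHz sample rate) indicates that the conversion step has to do the resampling or downmixing.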
Configure MediaRecorder with the highest possible audio quality settings:
const options = {
mimeType: 'audio/webm;codecs=opus', // Prefer Opus codec for high quality and compression efficiency
audioBitsPerSecond: 256000, // 256 kbps for high-quality audio
audioBitrateMode: 'constant', // Constant bitrate for consistent quality
};
Explanation of Options:
- mimeType: 'audio/webm;codecs=opus' is chosen for its superior audio quality at lower bitrates and broad browser support. Ensure this MIME type is supported before using it.
const desiredMimeType = 'audio/webm;codecs=opus';
if (MediaRecorder.isTypeSupported(desiredMimeType)) {
  options.mimeType = desiredMimeType;
} else {
  console.warn(`${desiredMimeType} is not supported. Falling back to default MIME type.`);
  delete options.mimeType; // Use default MIME type
}
- audioBitsPerSecond: Setting this to 256000 bps (256 kbps) ensures high audio fidelity, which is beneficial for accurate transcription. The browser treats this value as a target and may adjust it.
- audioBitrateMode: 'constant' requests a constant bitrate, so the encoder spends the same number of bits on every part of the recording. Not every browser supports this option.
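To put the 256 kbps figure in context, the arithmetic below computes the bitrate of uncompressed PCM at Whisper's target format and the approximate size of a recording. The function names are illustrative, and container overhead is ignored, so treat the results as rough estimates.

```javascript
// Bitrate of uncompressed PCM audio, in bits per second.
function pcmBitsPerSecond(sampleRate, bitDepth, channels) {
  return sampleRate * bitDepth * channels;
}

// Approximate payload size in bytes for a given bitrate and duration
// (container and header overhead are ignored).
function approximateBytes(bitsPerSecond, seconds) {
  return Math.ceil((bitsPerSecond * seconds) / 8);
}

// 16 kHz, 16-bit, mono PCM: 16000 * 16 * 1 = 256000 bits per second.
const whisperPcmRate = pcmBitsPerSecond(16000, 16, 1);

// A 30-second clip at that rate: 256000 * 30 / 8 = 960000 bytes.
const thirtySecondClipBytes = approximateBytes(whisperPcmRate, 30);
```

In other words, 256 kbps is the same rate as raw 16 kHz, 16-bit mono PCM, which suggests the setting is generous for an Opus recording of speech.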
Complete MediaRecorder Configuration:
let mediaRecorder;
const recordedChunks = [];
const desiredMimeType = 'audio/webm;codecs=opus';
if (MediaRecorder.isTypeSupported(desiredMimeType)) {
options.mimeType = desiredMimeType;
} else {
console.warn(`${desiredMimeType} is not supported. Falling back to default MIME type.`);
delete options.mimeType;
}
mediaRecorder = new MediaRecorder(stream, options);
mediaRecorder.ondataavailable = event => {
if (event.data.size > 0) {
recordedChunks.push(event.data);
}
};
mediaRecorder.onstop = () => {
const recordedBlob = new Blob(recordedChunks, { type: mediaRecorder.mimeType });
// Proceed with conversion to WAV or FLAC
};
Implement the recording logic, ensuring that the recording aligns with Whisper’s specifications:
// Start recording; without a timeslice, all data is delivered in a single dataavailable event when recording stops
mediaRecorder.start(); // Alternatively, mediaRecorder.start(1000) delivers a chunk roughly every second
// Optionally, stop recording after a specific duration
const recordingDuration = 30000; // 30 seconds
setTimeout(() => {
mediaRecorder.stop();
}, recordingDuration);
Handling Events:
mediaRecorder.onstart = () => {
console.log('Recording started');
};
mediaRecorder.onpause = () => {
console.log('Recording paused');
};
mediaRecorder.onresume = () => {
console.log('Recording resumed');
};
mediaRecorder.onerror = event => {
console.error('Recording error:', event.error);
};
mediaRecorder.onstop = () => {
console.log('Recording stopped');
const recordedBlob = new Blob(recordedChunks, { type: mediaRecorder.mimeType });
// Proceed with conversion
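};
Since MediaRecorder does not support WAV or FLAC directly, you need to convert the recorded Blob to the desired format. Below are two approaches: server-side conversion with ffmpeg, and client-side conversion in the browser.
Server-side conversion steps: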
- Upload the Recorded Blob to the Server:
const uploadAudio = async (blob) => {
  const formData = new FormData();
  formData.append('audio', blob, 'recording.webm');
  const response = await fetch('/upload-audio', { // Replace with your server endpoint
    method: 'POST',
    body: formData,
  });
  if (!response.ok) {
    throw new Error('Failed to upload audio');
  }
  const data = await response.json();
  console.log('Audio uploaded and converted:', data);
};
mediaRecorder.onstop = () => {
  const recordedBlob = new Blob(recordedChunks, { type: mediaRecorder.mimeType });
  uploadAudio(recordedBlob).catch(console.error);
};
- Convert to WAV or FLAC on the Server Using ffmpeg:
Example Using Node.js and fluent-ffmpeg:
const express = require('express');
const multer = require('multer');
const ffmpeg = require('fluent-ffmpeg');
const fs = require('fs');
const path = require('path');
const app = express();
const upload = multer({ dest: 'uploads/' });
// Make sure the output directory exists before ffmpeg writes to it
fs.mkdirSync('converted', { recursive: true });
app.post('/upload-audio', upload.single('audio'), (req, res) => {
  const inputPath = req.file.path;
  const outputFormat = 'flac'; // or 'wav'
  const outputPath = path.join('converted', `${req.file.filename}.${outputFormat}`);
  ffmpeg(inputPath)
    .audioChannels(1)
    .audioFrequency(16000)
    .format(outputFormat)
    .on('end', () => {
      fs.unlinkSync(inputPath); // Clean up the uploaded file
      res.json({ message: 'Conversion successful', file: outputPath });
    })
    .on('error', (err) => {
      console.error('Conversion error:', err);
      res.status(500).json({ error: 'Conversion failed' });
    })
    .save(outputPath);
});
app.listen(3000, () => {
  console.log('Server listening on port 3000');
});
Explanation:
- audioChannels(1): Ensures the audio is mono.
- audioFrequency(16000): Sets the sampling rate to 16 kHz.
- format(outputFormat): Converts the audio to FLAC or WAV.
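The same conversion can be expressed as a plain ffmpeg command line, which also makes it possible to pin the 16-bit sample format explicitly. The sketch below only builds the argument list; the function name `buildFfmpegArgs` is a hypothetical helper, and it assumes an `ffmpeg` binary is available on the server. The flags used (`-y`, `-i`, `-ac`, `-ar`, `-c:a`, `-sample_fmt`) are standard ffmpeg options.

```javascript
// Builds an ffmpeg argument list that converts `inputPath` to 16 kHz, mono,
// 16-bit audio in either WAV or FLAC.
function buildFfmpegArgs(inputPath, outputPath, format = 'wav') {
  let codecArgs;
  switch (format) {
    case 'wav':
      codecArgs = ['-c:a', 'pcm_s16le']; // 16-bit little-endian PCM
      break;
    case 'flac':
      codecArgs = ['-c:a', 'flac', '-sample_fmt', 's16']; // 16-bit FLAC
      break;
    default:
      throw new Error(`Unsupported output format: ${format}`);
  }
  // -y overwrites an existing output file, -ac sets channels, -ar sets sample rate.
  return ['-y', '-i', inputPath, '-ac', '1', '-ar', '16000', ...codecArgs, outputPath];
}

// Usage with Node's child_process (assumes ffmpeg is on PATH):
//   const { execFile } = require('child_process');
//   execFile('ffmpeg', buildFfmpegArgs('in.webm', 'out.wav'), (err) => { /* ... */ });
```

Passing the arguments as an array to `execFile` avoids shell quoting issues with uploaded file names.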
For the second approach, client-side conversion, you can use a library like ffmpeg.js to convert the recorded Blob, or Recorder.js to capture PCM samples through the Web Audio API and export them as WAV directly. However, client-side conversion can be resource-intensive and may not support FLAC out of the box.
Example Using Recorder.js for WAV Conversion:
- Include Recorder.js:
<script src="https://cdnjs.cloudflare.com/ajax/libs/recorderjs/0.1.0/recorder.min.js"></script>
- Capture and Convert Audio:
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const audioContext = new (window.AudioContext || window.webkitAudioContext)();
    const source = audioContext.createMediaStreamSource(stream);
    const recorder = new Recorder(source, { numChannels: 1 });
    recorder.record();
    // Stop recording after 30 seconds
    setTimeout(() => {
      recorder.stop();
      // Export to WAV
      recorder.exportWAV(blob => {
        // Handle the WAV Blob
        const url = URL.createObjectURL(blob);
        const audio = new Audio(url);
        audio.play();
        // Optionally, upload or process the WAV file
      });
      // Clear the recorder for the next recording
      recorder.clear();
    }, 30000);
  })
  .catch(error => {
    console.error('Error accessing microphone:', error);
  });
Notes:
- numChannels: 1: Ensures mono audio.
- Sample Rate: Recorder.js uses the AudioContext’s sample rate, which can be set to 16 kHz when initializing AudioContext. However, not all browsers allow setting the sample rate explicitly.
const audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
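When the browser does not honor the requested rate, the captured samples can be resampled in JavaScript before encoding. The following is a minimal sketch using linear interpolation; `resampleLinear` is an illustrative name, and the function applies no low-pass filter, so downsampling this way can introduce aliasing. A production pipeline would more likely rely on ffmpeg or on the browser's `OfflineAudioContext` for resampling.

```javascript
// Resamples mono Float32 PCM from `fromRate` to `toRate` by linear interpolation.
// Note: no anti-aliasing filter is applied, so this is only a rough approximation.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return Float32Array.from(input);
  const outputLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outputLength);
  const step = fromRate / toRate; // Input samples advanced per output sample
  for (let i = 0; i < outputLength; i++) {
    const position = i * step;
    const left = Math.floor(position);
    const right = Math.min(left + 1, input.length - 1);
    const fraction = position - left;
    output[i] = input[left] * (1 - fraction) + input[right] * fraction;
  }
  return output;
}

// Example: one second of 48 kHz audio becomes 16000 samples at 16 kHz.
//   const samples16k = resampleLinear(samples48k, 48000, 16000);
```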
Limitations:
- Browser Support: Not all browsers support setting the AudioContext sample rate.
- Performance: Client-side conversion can be CPU-intensive, especially for large or long recordings.
- Format Support: Recorder.js primarily supports WAV. FLAC support would require additional libraries or custom encoding.
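For readers who want to see what a WAV export involves, or who prefer not to depend on a library, the sketch below writes a 44-byte RIFF/WAVE header followed by 16-bit PCM samples. It is an illustration of the standard WAV layout rather than a description of how Recorder.js is implemented, and `encodeWav16` is a hypothetical name. It assumes mono Float32 samples in the range [-1, 1].

```javascript
// Encodes mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file (ArrayBuffer).
function encodeWav16(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // Audio format: PCM
  view.setUint16(22, 1, true);                           // Channels: mono
  view.setUint32(24, sampleRate, true);                  // Sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // Byte rate
  view.setUint16(32, bytesPerSample, true);              // Block align
  view.setUint16(34, 16, true);                          // Bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);                    // Data chunk size

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}

// In a browser, the result can be wrapped for upload or playback:
//   const wavBlob = new Blob([encodeWav16(samples, 16000)], { type: 'audio/wav' });
```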
Below is a complete example that combines microphone access with client-side WAV capture using Recorder.js. It does not use MediaRecorder; instead, it records PCM through the Web Audio API and exports WAV files suitable for Whisper.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>High-Quality Audio Recorder for Whisper</title>
</head>
<body>
<h1>High-Quality Audio Recorder for Whisper</h1>
<button id="start-btn">Start Recording</button>
<button id="stop-btn" disabled>Stop Recording</button>
<audio id="audio-playback" controls></audio>
<!-- Include Recorder.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/recorderjs/0.1.0/recorder.min.js"></script>
<script>
const startBtn = document.getElementById('start-btn');
const stopBtn = document.getElementById('stop-btn');
const audioPlayback = document.getElementById('audio-playback');
let audioContext;
let recorder;
let micStream;
startBtn.addEventListener('click', async () => {
  startBtn.disabled = true;
  stopBtn.disabled = false;
  try {
    micStream = await navigator.mediaDevices.getUserMedia({
      audio: {
        channelCount: 1, // Mono
        sampleRate: 16000, // 16 kHz
        echoCancellation: false,
      }
    });
    audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
    const source = audioContext.createMediaStreamSource(micStream);
    recorder = new Recorder(source, { numChannels: 1 });
    recorder.record();
    console.log('Recording started');
  } catch (error) {
    console.error('Error accessing microphone:', error);
    startBtn.disabled = false;
    stopBtn.disabled = true;
  }
});
stopBtn.addEventListener('click', () => {
  stopBtn.disabled = true;
  startBtn.disabled = false;
  recorder.stop();
  console.log('Recording stopped');
  // Export to WAV
  recorder.exportWAV(blob => {
    const url = URL.createObjectURL(blob);
    audioPlayback.src = url;
    // Optionally, upload the WAV file to your server for further processing
    // uploadWav(blob);
    // Clear the buffers and release the microphone once the export has finished
    recorder.clear();
    micStream.getTracks().forEach(track => track.stop());
    audioContext.close();
  });
});
// Optional: Function to upload WAV to the server
/*
const uploadWav = async (blob) => {
const formData = new FormData();
formData.append('audio', blob, 'recording.wav');
const response = await fetch('/upload-wav', { // Replace with your server endpoint
method: 'POST',
body: formData,
});
if (response.ok) {
console.log('WAV file uploaded successfully');
} else {
console.error('Failed to upload WAV file');
}
};
*/
</script>
</body>
</html>
Additional resources:
- MediaRecorder API Documentation: Comprehensive guide with detailed explanations and browser compatibility.
- Recorder.js: A library for recording audio directly in the browser and exporting it as WAV files.
- ffmpeg Documentation: Official documentation for the versatile audio and video conversion tool.
- OpenAI Whisper Documentation: Official repository with guidelines and usage examples for the Whisper model.
- MediaStream Recording specification: https://www.w3.org/TR/mediastream-recording