@yigitkonur
Last active October 31, 2024 04:48
How to Optimize MediaRecorder for High-Quality Audio Suitable for OpenAI’s Whisper Model

To get the best transcription results from OpenAI’s Whisper model, capture audio as close as possible to the format Whisper works with internally, since the model resamples its input to 16 kHz mono before transcription:

  • Format: Mono, uncompressed (WAV) or losslessly compressed (FLAC)
  • Sample Depth: 16-bit
  • Sampling Rate: 16 kHz

While the MediaStream Recording API (MediaRecorder) primarily records audio in compressed formats like WebM or Ogg with codecs such as Opus or Vorbis, you can optimize its settings to approximate Whisper’s requirements as closely as possible. Additionally, post-processing steps (e.g., using ffmpeg) can convert the recorded audio into the desired format.

This guide outlines how to configure MediaRecorder for optimal audio quality, focusing on audio-related options, and provides steps for converting the recorded audio to WAV or FLAC.

Table of Contents

  1. Understanding MediaRecorder Limitations
  2. Configuring MediaRecorder for High-Quality Audio
  3. Recording Audio
  4. Converting Recorded Audio to WAV or FLAC
  5. Complete Example
  6. Best Practices and Considerations
  7. Conclusion

Understanding MediaRecorder Limitations

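  • Supported Formats: Output formats depend on the browser. Chromium-based browsers typically record audio/webm;codecs=opus, Firefox typically records audio/ogg;codecs=opus, and Safari typically records audio/mp4. Check with MediaRecorder.isTypeSupported() rather than assuming any one of them.
  • Direct Output: Browsers generally do not offer uncompressed formats like WAV or losslessly compressed formats like FLAC as MediaRecorder output.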
  • Sample Rate and Channels: While you can request specific audio constraints, MediaRecorder relies on the underlying hardware and browser to honor these requests, which may not always be possible.

Given these limitations, the strategy involves:

  1. Capturing the highest possible quality with MediaRecorder.
  2. Converting the recorded audio to WAV or FLAC using additional tools or libraries.

Configuring MediaRecorder for High-Quality Audio

Accessing the Microphone with Specific Constraints

To guide the browser towards capturing audio with a 16 kHz sampling rate and mono channels, use getUserMedia with appropriate constraints:

const audioConstraints = {
  audio: {
    channelCount: 1,          // Mono
    sampleRate: 16000,        // 16 kHz
    sampleSize: 16,           // 16-bit samples (a valid constraint, but browsers may ignore it)
    echoCancellation: false,  // Disable echo cancellation for raw audio
  }
};

navigator.mediaDevices.getUserMedia(audioConstraints)
  .then(stream => {
    // Proceed with creating MediaRecorder
  })
  .catch(error => {
    console.error('Error accessing microphone:', error);
  });

Notes:

  • channelCount: 1: Requests mono audio. Not all browsers honor this constraint.
  • sampleRate: 16000: Requests a 16 kHz sampling rate. Browser support varies, and many devices stay at their default of 44.1 kHz or 48 kHz.
  • sampleSize: 16: Requests 16-bit samples. Support varies and browsers may ignore it; the bit depth of the final file is decided by the codec and the conversion step.
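Because these constraints are only requests, it can help to compare what was asked for with what the track actually reports through MediaStreamTrack.getSettings(). The sketch below does that with a pure function; the name compareAudioSettings is hypothetical, and the browser call is shown only as a comment so the example runs anywhere.

```javascript
// Hypothetical helper: compare requested constraints with the values a track
// reports from MediaStreamTrack.getSettings(). It is a pure function, so it
// can be exercised with plain objects.
function compareAudioSettings(requested, granted) {
  const mismatches = [];
  for (const [key, wanted] of Object.entries(requested)) {
    const actual = granted[key];
    if (actual === undefined) {
      mismatches.push(`${key}: not reported by this browser`);
    } else if (actual !== wanted) {
      mismatches.push(`${key}: requested ${wanted}, got ${actual}`);
    }
  }
  return mismatches;
}

// In a browser (not run here):
//   const [track] = stream.getAudioTracks();
//   console.warn(compareAudioSettings(audioConstraints.audio, track.getSettings()));

// Example where the device stayed at 48 kHz despite the 16 kHz request.
const report = compareAudioSettings(
  { channelCount: 1, sampleRate: 16000, echoCancellation: false },
  { channelCount: 1, sampleRate: 48000, echoCancellation: false }
);
console.log(report); // [ 'sampleRate: requested 16000, got 48000' ]
```

If the report is non-empty, the conversion step is what brings the audio to 16 kHz mono, not the capture settings.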

Setting MediaRecorderOptions

Configure MediaRecorder with the highest possible audio quality settings:

const options = {
  mimeType: 'audio/webm;codecs=opus', // Prefer Opus codec for high quality and compression efficiency
  audioBitsPerSecond: 256000,         // 256 kbps for high-quality audio
  audioBitrateMode: 'constant',       // Constant bitrate for a predictable data rate (not supported everywhere)
};

Explanation of Options:

  • mimeType: 'audio/webm;codecs=opus' is chosen because Opus delivers good audio quality even at low bitrates. Not every browser can record this type, so confirm that it is supported before using it.

    const desiredMimeType = 'audio/webm;codecs=opus';
    if (MediaRecorder.isTypeSupported(desiredMimeType)) {
      options.mimeType = desiredMimeType;
    } else {
      console.warn(`${desiredMimeType} is not supported. Falling back to default MIME type.`);
      delete options.mimeType; // Use default MIME type
    }
  • audioBitsPerSecond: 256000 bps (256 kbps) requests a high bitrate to preserve detail for transcription. The browser treats the value as a target and may adjust it to what the codec supports.

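  • audioBitrateMode: 'constant' asks the encoder to hold a constant bitrate, which makes the data rate predictable; 'variable' lets the encoder spend more bits on complex passages. Not all browsers support this option.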
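The single isTypeSupported check above can be extended to a list of fallbacks. The sketch below is one way to do that; pickMimeType and the candidate order are assumptions made for illustration, and the support check is passed in as a function so the logic can run outside a browser.

```javascript
// Hypothetical helper: return the first candidate the predicate accepts, or
// undefined so the caller can omit mimeType and take the browser default.
function pickMimeType(candidates, isSupported) {
  return candidates.find(type => isSupported(type));
}

// The order is an assumption: Opus in WebM first, then Opus in Ogg, then MP4
// for browsers that only record into that container.
const candidates = ['audio/webm;codecs=opus', 'audio/ogg;codecs=opus', 'audio/mp4'];

// In a browser, pass the real check (not run here):
//   const mimeType = pickMimeType(candidates, t => MediaRecorder.isTypeSupported(t));
//   const options = { audioBitsPerSecond: 256000 };
//   if (mimeType) options.mimeType = mimeType;

// Stand-in predicate simulating a browser that only records MP4.
const onlyMp4 = type => type.startsWith('audio/mp4');
console.log(pickMimeType(candidates, onlyMp4)); // audio/mp4
```

Whatever type is chosen, the file name and the server-side conversion should follow mediaRecorder.mimeType rather than a hard-coded extension.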

Complete MediaRecorder Configuration:

// Assumes `stream` (from getUserMedia) and `options` (defined above) are in scope.
let mediaRecorder;
const recordedChunks = [];

const desiredMimeType = 'audio/webm;codecs=opus';
if (MediaRecorder.isTypeSupported(desiredMimeType)) {
  options.mimeType = desiredMimeType;
} else {
  console.warn(`${desiredMimeType} is not supported. Falling back to default MIME type.`);
  delete options.mimeType;
}

mediaRecorder = new MediaRecorder(stream, options);

mediaRecorder.ondataavailable = event => {
  if (event.data.size > 0) {
    recordedChunks.push(event.data);
  }
};

mediaRecorder.onstop = () => {
  const recordedBlob = new Blob(recordedChunks, { type: mediaRecorder.mimeType });
  // Proceed with conversion to WAV or FLAC
};

Recording Audio

Start the recorder and stop it once you have captured enough audio:

// Start recording; with no timeslice, all data arrives in a single chunk when recording stops
mediaRecorder.start(); // Alternatively, mediaRecorder.start(1000) to receive a chunk roughly every second

// Optionally, stop recording after a specific duration
const recordingDuration = 30000; // 30 seconds
setTimeout(() => {
  mediaRecorder.stop();
}, recordingDuration);
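The requested bitrate and the recording duration give a rough idea of how large the recording will be. The sketch below is a back-of-the-envelope calculation only: it ignores container overhead, assumes the encoder honors the requested bitrate, and uses hypothetical function names.

```javascript
// Estimated payload size in bytes for a constant-bitrate recording.
// Ignores container overhead; the encoder may not honor the requested rate.
function estimateEncodedBytes(bitsPerSecond, durationMs) {
  return Math.round((bitsPerSecond / 8) * (durationMs / 1000));
}

// Size of raw PCM sample data (no header) for the same duration.
function estimatePcmBytes(sampleRate, bitsPerSample, channels, durationMs) {
  return Math.round(sampleRate * (bitsPerSample / 8) * channels * (durationMs / 1000));
}

const opusBytes = estimateEncodedBytes(256000, 30000);   // 256 kbps for 30 s
const pcmBytes = estimatePcmBytes(16000, 16, 1, 30000);  // 16 kHz, 16-bit, mono for 30 s
console.log(opusBytes, pcmBytes); // 960000 960000
```

The two numbers match because 256 kbps is the same data rate as 16 kHz, 16-bit mono PCM, so at this setting the compressed recording is no smaller than the WAV it will be converted to.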

Handling Events:

mediaRecorder.onstart = () => {
  console.log('Recording started');
};

mediaRecorder.onpause = () => {
  console.log('Recording paused');
};

mediaRecorder.onresume = () => {
  console.log('Recording resumed');
};

mediaRecorder.onerror = event => {
  console.error('Recording error:', event.error);
};

// Note: assigning onstop again replaces any handler set earlier
mediaRecorder.onstop = () => {
  console.log('Recording stopped');
  const recordedBlob = new Blob(recordedChunks, { type: mediaRecorder.mimeType });
  // Proceed with conversion
};
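Since each handler property holds only one function, it can be tidier to wire the data, error, and stop handlers in one place. The sketch below shows one possible shape; collectRecording is a hypothetical helper, and a small stand-in object replaces MediaRecorder so the flow can be followed outside a browser.

```javascript
// Hypothetical helper: wire the handlers once and hand the collected chunks
// to a callback when recording stops. Works with any object that exposes the
// MediaRecorder-style ondataavailable / onstop / onerror properties.
function collectRecording(recorder, onDone, onError = console.error) {
  const chunks = [];
  recorder.ondataavailable = event => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };
  recorder.onerror = event => onError(event.error);
  recorder.onstop = () => onDone(chunks.splice(0)); // hand over and reset
  return recorder;
}

// Minimal stand-in for MediaRecorder, used only for this demonstration.
const fakeRecorder = {
  start() {},
  stop() {
    this.ondataavailable({ data: { size: 3 } });
    this.ondataavailable({ data: { size: 0 } }); // empty chunk is skipped
    this.onstop();
  },
};

let received = null;
collectRecording(fakeRecorder, chunks => { received = chunks; });
fakeRecorder.start();
fakeRecorder.stop();
console.log(received.length); // 1
```

In a browser, the callback would typically build the Blob, for example `chunks => new Blob(chunks, { type: mediaRecorder.mimeType })`.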

Converting Recorded Audio to WAV or FLAC

Since MediaRecorder does not support WAV or FLAC directly, you need to convert the recorded Blob to the desired format. Below are two approaches:

Using ffmpeg Server-Side

Steps:

  1. Upload the Recorded Blob to the Server:

    const uploadAudio = async (blob) => {
      const formData = new FormData();
      formData.append('audio', blob, 'recording.webm');
      
      const response = await fetch('/upload-audio', { // Replace with your server endpoint
        method: 'POST',
        body: formData,
      });
      
      if (!response.ok) {
        throw new Error('Failed to upload audio');
      }
      
      const data = await response.json();
      console.log('Audio uploaded and converted:', data);
    };
    
    mediaRecorder.onstop = () => {
      const recordedBlob = new Blob(recordedChunks, { type: mediaRecorder.mimeType });
      uploadAudio(recordedBlob).catch(console.error);
    };
  2. Convert to WAV or FLAC on the Server Using ffmpeg:

    Example Using Node.js and fluent-ffmpeg:

    const express = require('express');
    const multer = require('multer');
    const ffmpeg = require('fluent-ffmpeg');
    const fs = require('fs');
    const path = require('path');

    const app = express();
    const upload = multer({ dest: 'uploads/' });

    // ffmpeg does not create missing directories, so make sure the output folder exists
    fs.mkdirSync('converted', { recursive: true });

    app.post('/upload-audio', upload.single('audio'), (req, res) => {
      if (!req.file) {
        return res.status(400).json({ error: 'No audio file uploaded' });
      }

      const inputPath = req.file.path;
      const outputFormat = 'flac'; // or 'wav'
      const outputPath = path.join('converted', `${req.file.filename}.${outputFormat}`);

      ffmpeg(inputPath)
        .audioChannels(1)
        .audioFrequency(16000)
        .format(outputFormat)
        .on('end', () => {
          fs.unlink(inputPath, () => {}); // Clean up the uploaded file
          res.json({ message: 'Conversion successful', file: outputPath });
        })
        .on('error', (err) => {
          console.error('Conversion error:', err);
          fs.unlink(inputPath, () => {}); // Clean up even when conversion fails
          res.status(500).json({ error: 'Conversion failed' });
        })
        .save(outputPath);
    });

    app.listen(3000, () => {
      console.log('Server listening on port 3000');
    });

    Explanation:

    • audioChannels(1): Ensures the audio is mono.
    • audioFrequency(16000): Sets the sampling rate to 16 kHz.
    • format(outputFormat): Converts the audio to FLAC or WAV.
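For readers who call the ffmpeg binary directly instead of going through fluent-ffmpeg, the same conversion can be expressed as command-line arguments. The sketch below builds such an argument list; buildFfmpegArgs is a hypothetical helper, and the explicit codec and sample-format flags are an addition that pins the output to 16-bit samples rather than relying on ffmpeg's defaults.

```javascript
// Hypothetical helper: build ffmpeg CLI arguments for 16 kHz, mono, 16-bit output.
function buildFfmpegArgs(inputPath, outputPath, format = 'wav') {
  const codecArgs = format === 'flac'
    ? ['-c:a', 'flac', '-sample_fmt', 's16'] // FLAC with 16-bit samples
    : ['-c:a', 'pcm_s16le'];                 // WAV with 16-bit little-endian PCM
  // -y overwrites an existing output file, -ac sets channels, -ar sets the sample rate.
  return ['-y', '-i', inputPath, '-ac', '1', '-ar', '16000', ...codecArgs, outputPath];
}

console.log(buildFfmpegArgs('recording.webm', 'recording.wav').join(' '));
// -y -i recording.webm -ac 1 -ar 16000 -c:a pcm_s16le recording.wav

// In Node.js the list could be passed to the binary (not run here):
//   require('child_process').spawn('ffmpeg', buildFfmpegArgs(inputPath, outputPath));
```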

Using Client-Side Libraries

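If you prefer to handle conversion on the client side, you can use ffmpeg.js to convert the recorded Blob, or skip MediaRecorder and capture raw PCM through the Web Audio API with a library like Recorder.js, which encodes it directly as WAV. However, client-side conversion can be resource-intensive and may not support FLAC out of the box. Note also that Recorder.js is no longer actively maintained.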

Example Using Recorder.js for WAV Conversion:

  1. Include Recorder.js:

    <script src="https://cdnjs.cloudflare.com/ajax/libs/recorderjs/0.1.0/recorder.min.js"></script>
  2. Capture and Convert Audio:

    navigator.mediaDevices.getUserMedia({ audio: true })
      .then(stream => {
        const audioContext = new (window.AudioContext || window.webkitAudioContext)();
        const source = audioContext.createMediaStreamSource(stream);
        const recorder = new Recorder(source, { numChannels: 1 });
    
        recorder.record();
    
        // Stop recording after 30 seconds
        setTimeout(() => {
          recorder.stop();

          // Release the microphone
          stream.getTracks().forEach(track => track.stop());

          // Export to WAV
          recorder.exportWAV(blob => {
            // Handle the WAV Blob
            const url = URL.createObjectURL(blob);
            const audio = new Audio(url);
            audio.play();

            // Optionally, upload or process the WAV file

            // Clear the recorder's buffers once the export has finished
            recorder.clear();
          });
        }, 30000);
      })
      .catch(error => {
        console.error('Error accessing microphone:', error);
      });

    Notes:

    • numChannels: 1: Ensures mono audio.
    • Sample Rate: Recorder.js uses the AudioContext’s sample rate. You can request 16 kHz when creating the AudioContext, as in the line below, but not all browsers accept or honor an explicit sample rate:
    const audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
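When the AudioContext stays at 48 kHz, the captured samples can be reduced to 16 kHz in code before encoding. The sketch below is a deliberately crude approach that averages each group of three samples; the function name is hypothetical, it only handles integer ratios (so 44.1 kHz input is rejected), and a proper resampler with a low-pass filter would give cleaner results.

```javascript
// Crude downsampler: averages each group of `ratio` input samples into one
// output sample. Assumes inputRate is an integer multiple of outputRate.
function downsampleByAveraging(samples, inputRate, outputRate) {
  const ratio = inputRate / outputRate;
  if (!Number.isInteger(ratio) || ratio < 1) {
    throw new Error('inputRate must be an integer multiple of outputRate');
  }
  const out = new Float32Array(Math.floor(samples.length / ratio));
  for (let i = 0; i < out.length; i++) {
    let sum = 0;
    for (let j = 0; j < ratio; j++) sum += samples[i * ratio + j];
    out[i] = sum / ratio;
  }
  return out;
}

// Six samples at 48 kHz become two samples at 16 kHz.
const input = new Float32Array([0, 0.5, 1, 1, 0.5, 0]);
const output = downsampleByAveraging(input, 48000, 16000);
console.log(Array.from(output)); // [ 0.5, 0.5 ]
```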

    Limitations:

    • Browser Support: Not all browsers support setting the AudioContext sample rate.
    • Performance: Client-side conversion can be CPU-intensive, especially for large or long recordings.
    • Format Support: Recorder.js primarily supports WAV. FLAC support would require additional libraries or custom encoding.
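To make clear what "WAV support" involves, the sketch below writes the standard 44-byte PCM header followed by 16-bit samples. It is an independent illustration rather than Recorder.js's own code, it handles mono input only, and encodeWav16 is a hypothetical name.

```javascript
// Encode mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav16(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format 1 = PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    view.setInt16(44 + i * 2, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
  }
  return buffer;
}

const wav = encodeWav16(new Float32Array([0, 1, -1]), 16000);
console.log(wav.byteLength); // 50

// In a browser the buffer can be wrapped for upload or playback (not run here):
//   const blob = new Blob([wav], { type: 'audio/wav' });
```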

Complete Example

Below is a complete example that combines constrained microphone access with client-side WAV capture using Recorder.js. Note that it records through the Web Audio API rather than MediaRecorder, so the output is already WAV and needs no separate conversion step.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>High-Quality Audio Recorder for Whisper</title>
</head>
<body>
  <h1>High-Quality Audio Recorder for Whisper</h1>
  <button id="start-btn">Start Recording</button>
  <button id="stop-btn" disabled>Stop Recording</button>
  <audio id="audio-playback" controls></audio>

  <!-- Include Recorder.js -->
  <script src="https://cdnjs.cloudflare.com/ajax/libs/recorderjs/0.1.0/recorder.min.js"></script>
  <script>
    const startBtn = document.getElementById('start-btn');
    const stopBtn = document.getElementById('stop-btn');
    const audioPlayback = document.getElementById('audio-playback');

    let audioContext;
    let recorder;
    let micStream;

    startBtn.addEventListener('click', async () => {
      startBtn.disabled = true;
      stopBtn.disabled = false;

      try {
        micStream = await navigator.mediaDevices.getUserMedia({
          audio: {
            channelCount: 1,    // Mono
            sampleRate: 16000,  // 16 kHz
            echoCancellation: false,
          }
        });

        audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
        const source = audioContext.createMediaStreamSource(micStream);
        recorder = new Recorder(source, { numChannels: 1 });

        recorder.record();
        console.log('Recording started');
      } catch (error) {
        console.error('Error accessing microphone:', error);
        startBtn.disabled = false;
        stopBtn.disabled = true;
      }
    });

    stopBtn.addEventListener('click', () => {
      stopBtn.disabled = true;
      startBtn.disabled = false;

      recorder.stop();
      console.log('Recording stopped');

      // Release the microphone so the browser's recording indicator turns off
      micStream.getTracks().forEach(track => track.stop());

      // Export to WAV
      recorder.exportWAV(blob => {
        const url = URL.createObjectURL(blob);
        audioPlayback.src = url;

        // Optionally, upload the WAV file to your server for further processing
        // uploadWav(blob);

        // Free the buffers and the audio context once the export has finished
        recorder.clear();
        audioContext.close();
      });
    });

    // Optional: Function to upload WAV to the server
    /*
    const uploadWav = async (blob) => {
      const formData = new FormData();
      formData.append('audio', blob, 'recording.wav');

      const response = await fetch('/upload-wav', { // Replace with your server endpoint
        method: 'POST',
        body: formData,
      });

      if (response.ok) {
        console.log('WAV file uploaded successfully');
      } else {
        console.error('Failed to upload WAV file');
      }
    };
    */
  </script>
</body>
</html>
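Since the browser may not honor the 16 kHz request, it can be worth checking the exported file before uploading it. The sketch below reads the format fields from a WAV header; readWavFormat and isWhisperTarget are hypothetical helpers, and the parser assumes the common layout where the "fmt " chunk immediately follows "WAVE", which the RIFF format does not guarantee.

```javascript
// Read the format fields from a canonical 44-byte PCM WAV header.
// Assumes the "fmt " chunk immediately follows "WAVE".
function readWavFormat(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = offset => String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3));
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE' || tag(12) !== 'fmt ') {
    throw new Error('Not a canonical WAV header');
  }
  return {
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}

// True when the file already matches the 16 kHz, mono, 16-bit target.
function isWhisperTarget(format) {
  return format.channels === 1 && format.sampleRate === 16000 && format.bitsPerSample === 16;
}

// Build a header-only buffer by hand to demonstrate the check.
const header = new DataView(new ArrayBuffer(44));
[...'RIFF'].forEach((c, i) => header.setUint8(i, c.charCodeAt(0)));
[...'WAVE'].forEach((c, i) => header.setUint8(8 + i, c.charCodeAt(0)));
[...'fmt '].forEach((c, i) => header.setUint8(12 + i, c.charCodeAt(0)));
header.setUint16(22, 1, true);     // mono
header.setUint32(24, 48000, true); // 48 kHz: the browser ignored the request
header.setUint16(34, 16, true);    // 16-bit
console.log(isWhisperTarget(readWavFormat(header.buffer))); // false

// In the page above, the check could run inside the exportWAV callback (not run here):
//   const format = readWavFormat(await blob.arrayBuffer());
```

A file that fails the check can still be sent to the server-side conversion step, which resamples it to 16 kHz.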

Additional Resources
