Scribeberry Docs

Audio Format

Audio format requirements for Scribeberry realtime and async transcription.

Realtime Transcription

For realtime WebSocket streaming, audio must be sent in exactly this format:

| Property | Requirement |
|---|---|
| Encoding | PCM 16-bit signed little-endian (pcm_s16le) |
| Sample rate | 16,000 Hz (16 kHz) |
| Channels | 1 (mono) |
| Byte order | Little-endian |
| Container | Raw PCM (no headers, no container) |
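Because each sample occupies 2 bytes and there is no header, a chunk's byte length alone determines how many samples it holds and how long it lasts. The sketch below encodes the table as constants and checks a raw chunk against it. The names `PCM_FORMAT` and `describePcmChunk` are hypothetical and not part of any Scribeberry SDK.

```typescript
// Format constants taken from the table above (names are hypothetical).
const PCM_FORMAT = {
  sampleRate: 16_000, // Hz
  channels: 1, // mono
  bytesPerSample: 2, // 16-bit signed integers
} as const;

// Report how many samples a raw PCM chunk holds and how long it lasts.
// Throws if the byte length cannot hold a whole number of 16-bit mono frames.
function describePcmChunk(bytes: ArrayBuffer): { samples: number; durationMs: number } {
  const frameSize = PCM_FORMAT.bytesPerSample * PCM_FORMAT.channels;
  if (bytes.byteLength % frameSize !== 0) {
    throw new Error(`Chunk length ${bytes.byteLength} is not a multiple of ${frameSize} bytes`);
  }
  const samples = bytes.byteLength / frameSize;
  const durationMs = (samples * 1000) / PCM_FORMAT.sampleRate;
  return { samples, durationMs };
}

console.log(describePcmChunk(new ArrayBuffer(8192))); // { samples: 4096, durationMs: 256 }
```

An odd byte length usually means a chunk was split mid-sample somewhere in your pipeline.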

Why These Requirements?

  • PCM 16-bit: uncompressed audio, which avoids compression artifacts and decoding latency
  • 16 kHz: the standard sample rate for speech recognition models; higher rates generally don't improve accuracy and only add bandwidth
  • Mono: speech recognition works on a single channel, so stereo doubles bandwidth for no benefit
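Browser capture often runs at 44.1 or 48 kHz and may be stereo. Where the browser supports it, the simplest route is to create the context with `new AudioContext({ sampleRate: 16000 })` and let the browser resample. If you need to do it yourself, the sketch below averages channels to mono and then resamples by linear interpolation. The function names are hypothetical, and linear interpolation applies no anti-aliasing filter, so treat this as a baseline rather than a production-quality resampler.

```typescript
// Average one or more equal-length channels into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighboring input samples.
// No low-pass filter is applied, so downsampling may introduce some aliasing.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Example: 4800 stereo samples at 48 kHz become 1600 mono samples at 16 kHz.
const leftChannel = new Float32Array(4800).fill(0.5);
const rightChannel = new Float32Array(4800).fill(-0.5);
const mono16k = resampleLinear(downmixToMono([leftChannel, rightChannel]), 48_000, 16_000);
console.log(mono16k.length); // 1600
```

The output is still Float32, so it goes through the Int16 conversion described next.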

Converting Float32 to Int16 PCM

The Web Audio API (for example, an AudioContext fed by a getUserMedia stream) produces Float32 samples in the range [-1.0, 1.0]. Convert them to Int16 ([-32768, 32767]) before sending:

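```typescript
// Convert Web Audio Float32 samples in [-1.0, 1.0] to 16-bit signed PCM.
function float32ToInt16(float32: Float32Array): Int16Array {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Scale to the Int16 range, then clamp: Int16Array wraps out-of-range
    // values instead of saturating, so a sample slightly above 1.0 would
    // otherwise turn into a large negative value.
    int16[i] = Math.max(
      -32768,
      Math.min(32767, Math.round(float32[i] * 32767)),
    );
  }
  return int16;
}
```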
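An Int16Array stores samples in the host's native byte order. Nearly all browsers run on little-endian hardware, so sending `int16.buffer` directly usually works, but if you want the little-endian requirement guaranteed regardless of platform you can write each sample through a `DataView`. The helper name `toPcmBytes` is hypothetical.

```typescript
// Serialize Int16 samples as raw little-endian PCM bytes (no header),
// independent of the host platform's native byte order.
function toPcmBytes(samples: Int16Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(i * 2, samples[i], true); // true = little-endian
  }
  return buffer;
}

const bytes = new Uint8Array(toPcmBytes(new Int16Array([1, -2])));
console.log(Array.from(bytes)); // [ 1, 0, 254, 255 ]
```

The resulting ArrayBuffer can be passed straight to `WebSocket.send()`, which accepts binary data.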

Buffer Size

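Send audio in chunks of 4096 samples for a good balance between latency and efficiency. With ScriptProcessorNode, pass 4096 as the buffer size. AudioWorkletNode has no buffer-size option (it always processes 128-frame blocks), so accumulate its output into 4096-sample chunks before sending.

  • 4096 samples at 16 kHz = 256 ms per chunk (recommended)
  • 2048 samples = 128 ms (lower latency, more overhead)
  • 8192 samples = 512 ms (higher latency, less overhead)

These durations assume audio at 16 kHz; if your AudioContext runs at 48 kHz, a 4096-sample buffer lasts only about 85 ms.

```typescript
// Arguments: buffer size (samples), input channels, output channels.
// ScriptProcessorNode is deprecated in favor of AudioWorkletNode but remains widely supported.
const processor = audioContext.createScriptProcessor(4096, 1, 1);
```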
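With an AudioWorkletNode you receive 128-frame blocks and have to do the buffering yourself. The class below is a hypothetical, framework-free sketch of that step: you would call `push()` with each block forwarded from the worklet (for example through its `port` messages) and send every chunk it returns. It does not flush a final partial chunk; add that if you need the last fraction of a second when recording stops.

```typescript
// Accumulates incoming Float32 blocks and emits fixed-size chunks.
class ChunkAccumulator {
  private readonly chunkSize: number;
  private buffer: Float32Array;
  private filled = 0;

  constructor(chunkSize = 4096) {
    this.chunkSize = chunkSize;
    this.buffer = new Float32Array(chunkSize);
  }

  // Add a block of samples; returns every chunk that became complete.
  push(block: Float32Array): Float32Array[] {
    const ready: Float32Array[] = [];
    let offset = 0;
    while (offset < block.length) {
      const take = Math.min(this.chunkSize - this.filled, block.length - offset);
      this.buffer.set(block.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.chunkSize) {
        ready.push(this.buffer);
        this.buffer = new Float32Array(this.chunkSize);
        this.filled = 0;
      }
    }
    return ready;
  }
}

// 32 blocks of 128 frames make exactly one 4096-sample chunk (256 ms at 16 kHz).
const acc = new ChunkAccumulator();
let chunkCount = 0;
for (let i = 0; i < 32; i++) {
  chunkCount += acc.push(new Float32Array(128)).length;
}
console.log(chunkCount); // 1
```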

Async Transcription (Audio Files)

For file-based transcription via the notes API (audioUrl parameter), Scribeberry accepts a wider range of formats:

| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Preferred; lossless, any sample rate |
| MP3 | .mp3 | Widely supported, lossy compression |
| WebM | .webm | Common browser recording format |
| OGG/Opus | .ogg, .opus | Good compression for speech |
| FLAC | .flac | Lossless compression |
| M4A/AAC | .m4a | Apple ecosystem standard |
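If you want to reject unsupported uploads before calling the API, a client-side check against the extensions in the table is enough. `isSupportedAudioFile` is a hypothetical helper whose list simply mirrors the table above; the server remains the authority on what it accepts.

```typescript
// Extensions listed in the async transcription table.
const SUPPORTED_EXTENSIONS = new Set([
  '.wav', '.mp3', '.webm', '.ogg', '.opus', '.flac', '.m4a',
]);

// Case-insensitive check of a file name's extension.
function isSupportedAudioFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}

console.log(isSupportedAudioFile('visit.M4A')); // true
console.log(isSupportedAudioFile('notes.txt')); // false
```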

ℹ️ Info: For async transcription, the server automatically resamples and converts audio as needed. You don't need to preprocess files to a specific format.

Recording from Browser

MediaRecorder (for file upload)

If you're recording audio for file upload (not realtime), use the browser's MediaRecorder API:

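```typescript
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Prefer WebM/Opus, but fall back to the browser's default format if it is unsupported.
const preferredType = 'audio/webm;codecs=opus';
const recorder = new MediaRecorder(
  stream,
  MediaRecorder.isTypeSupported(preferredType) ? { mimeType: preferredType } : undefined,
);
const chunks: Blob[] = [];

recorder.ondataavailable = (e) => {
  if (e.data.size > 0) chunks.push(e.data);
};
recorder.onstop = () => {
  // Label the blob with the format the recorder actually produced.
  const blob = new Blob(chunks, { type: recorder.mimeType });
  // Release the microphone once recording is finished.
  stream.getTracks().forEach((track) => track.stop());
  // Upload blob to your storage, then pass the URL to sb.notes.generate()
};

recorder.start();
// ... later
recorder.stop();
```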
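`recorder.mimeType` reports the format actually being recorded, which is not always WebM (Safari, for instance, typically records audio/mp4). To give the uploaded file an extension that matches the supported-formats table, you could map the MIME type as sketched below. `extensionForMimeType` and its mapping are hypothetical; extend the map for any other types your target browsers produce.

```typescript
// Map a MediaRecorder MIME type (parameters such as ";codecs=opus" are ignored)
// to a file extension from the supported-formats table.
function extensionForMimeType(mimeType: string): string | undefined {
  const base = mimeType.split(';')[0].trim().toLowerCase();
  const map: Record<string, string> = {
    'audio/webm': '.webm',
    'audio/ogg': '.ogg',
    'audio/mp4': '.m4a',
    'audio/mpeg': '.mp3',
    'audio/wav': '.wav',
    'audio/flac': '.flac',
  };
  return map[base];
}

console.log(extensionForMimeType('audio/webm;codecs=opus')); // .webm
console.log(extensionForMimeType('audio/mp4')); // .m4a
```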

Web Audio API (for realtime streaming)

For realtime streaming, use the Web Audio API to get raw PCM samples. See the Browser Integration guide for a complete example.

Bandwidth Estimates

| Audio format | Bitrate | Data per minute |
|---|---|---|
| PCM 16-bit, 16 kHz, mono | 256 kbps | ~1.9 MB |
| PCM 16-bit, 44.1 kHz, stereo | 1.41 Mbps | ~10.6 MB |
| WebM Opus, 16 kHz | ~32 kbps | ~240 KB |

For realtime streaming, you'll send approximately 1.9 MB per minute of raw PCM audio.
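The PCM rows in the table follow from bitrate = sample rate × bit depth × channels, and bytes per minute = bitrate × 60 ÷ 8. A minimal sketch of that arithmetic (function names are hypothetical; sizes are in decimal megabytes):

```typescript
// Uncompressed PCM bitrate in bits per second.
function pcmBitrate(sampleRate: number, bitDepth: number, channels: number): number {
  return sampleRate * bitDepth * channels;
}

// Bytes needed for one minute of audio at a given bitrate.
function bytesPerMinute(bitsPerSecond: number): number {
  return (bitsPerSecond * 60) / 8;
}

const realtime = pcmBitrate(16_000, 16, 1); // 256,000 bps
console.log(bytesPerMinute(realtime) / 1e6); // 1.92 (MB)
const cdQuality = pcmBitrate(44_100, 16, 2); // 1,411,200 bps
console.log(bytesPerMinute(cdQuality) / 1e6); // 10.584 (MB)
```

The same `bytesPerMinute` applies to compressed formats if you plug in their bitrate, for example 32,000 bps for the Opus row gives 240,000 bytes.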
