Audio Format

Realtime Transcription

For realtime WebSocket streaming, audio must match the following format exactly:

Property	Requirement
Encoding	PCM 16-bit signed little-endian (`pcm_s16le`)
Sample rate	16,000 Hz (16 kHz)
Channels	1 (mono)
Byte order	Little-endian
Container	Raw PCM (no headers, no container)

Why These Requirements?

PCM 16-bit — uncompressed audio for maximum transcription quality and minimum latency
16 kHz — the standard sample rate for speech recognition models; higher rates don't improve accuracy and waste bandwidth
Mono — speech recognition works on a single channel; stereo doubles bandwidth for no benefit
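
Microphone capture in the browser commonly runs at 44.1 kHz or 48 kHz, so meeting the 16 kHz mono requirement may involve downmixing and resampling first. The following is a minimal sketch under stated assumptions, not part of the Scribeberry SDK: `downmixToMono` and `resampleLinear` are hypothetical helper names, and linear interpolation is used only for brevity. It applies no low-pass filter, so a production pipeline would likely want a proper anti-aliasing resampler.

```typescript
// Hypothetical helpers for illustration; not part of any SDK.

/** Average two channels into a single mono channel. */
function downmixToMono(left: Float32Array, right: Float32Array): Float32Array {
  const length = Math.min(left.length, right.length);
  const mono = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    mono[i] = (left[i] + right[i]) / 2;
  }
  return mono;
}

/** Resample by linear interpolation between neighbouring input samples. */
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Fractional position of output sample i in the input signal.
    const pos = (i * fromRate) / toRate;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

For example, `resampleLinear(samples, 48000, 16000)` returns one output sample for every three input samples.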

Converting Float32 to Int16 PCM

The Web Audio API (an AudioContext fed by a getUserMedia stream) delivers Float32 samples in the range [-1.0, 1.0]. Convert them to Int16 ([-32768, 32767]) before sending:

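// Convert Float32 samples in [-1.0, 1.0] to 16-bit signed integers.
function float32ToInt16(float32: Float32Array): Int16Array {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Scale to the Int16 range, then clamp so out-of-range input cannot wrap around.
    int16[i] = Math.max(
      -32768,
      Math.min(32767, Math.round(float32[i] * 32767)),
    );
  }
  return int16;
}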
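
An `Int16Array` stores values in the platform's native byte order. Nearly every device that runs a browser is little-endian, so this usually matches the required format, but writing through a `DataView` makes the byte order explicit. The sketch below is one way to do that; `float32ToPcm16leBytes` is a hypothetical name, not an SDK function.

```typescript
// Hypothetical helper: convert Float32 samples straight to little-endian PCM bytes.
function float32ToPcm16leBytes(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2); // 2 bytes per 16-bit sample
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before scaling so the result always fits in 16 bits.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const value = Math.round(clamped * 32767);
    // The third argument `true` selects little-endian byte order.
    view.setInt16(i * 2, value, true);
  }
  return bytes;
}
```

The returned `Uint8Array` (or its underlying `ArrayBuffer`) can then be sent as a binary WebSocket message.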

Buffer Size

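With ScriptProcessorNode (deprecated, but still widely supported), a buffer size of 4096 samples gives a good balance between latency and efficiency. AudioWorkletNode has no buffer-size setting: it processes audio in fixed 128-frame blocks, so accumulate blocks in the worklet until a full chunk is ready. Chunk durations at 16 kHz:

4096 samples = 256 ms per chunk (recommended)
2048 samples = 128 ms (lower latency, more overhead)
8192 samples = 512 ms (higher latency, less overhead)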

const processor = audioContext.createScriptProcessor(4096, 1, 1);
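
The chunk durations above depend on the sample rate the audio context actually runs at. As a rough sketch of the arithmetic (the helper names `chunkDurationMs` and `chunkSizeBytes` are hypothetical), the same 4096-sample buffer covers far less time at 48 kHz than at 16 kHz:

```typescript
// Hypothetical helpers for reasoning about chunk size; not part of any SDK.

/** Duration of a chunk in milliseconds. */
function chunkDurationMs(samples: number, sampleRate: number): number {
  return (samples * 1000) / sampleRate;
}

/** Size of a mono 16-bit PCM chunk in bytes (2 bytes per sample). */
function chunkSizeBytes(samples: number): number {
  return samples * 2;
}

const at16k = chunkDurationMs(4096, 16000); // 256 ms
const at48k = chunkDurationMs(4096, 48000); // about 85 ms
const bytes = chunkSizeBytes(4096); // 8192 bytes
```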

Async Transcription (Audio Files)

For file-based transcription via the notes API (audioUrl parameter), Scribeberry accepts a wider range of formats:

Format	Extension	Notes
WAV	`.wav`	Preferred — lossless, any sample rate
MP3	`.mp3`	Widely supported, lossy compression
WebM	`.webm`	Common browser recording format
OGG/Opus	`.ogg`, `.opus`	Good compression for speech
FLAC	`.flac`	Lossless compression
M4A/AAC	`.m4a`	Apple ecosystem standard

ℹ️ Info: For async transcription, the server automatically resamples and converts audio as needed. You don't need to preprocess files to a specific format.

Recording from Browser

MediaRecorder (for file upload)

If you're recording audio for file upload (not realtime), use the browser's MediaRecorder API:

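// Ask for microphone access; the promise rejects if the user denies permission.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm;codecs=opus' });
const chunks: Blob[] = [];

// Collect encoded audio as the recorder produces it.
recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = () => {
  const blob = new Blob(chunks, { type: 'audio/webm' });
  // Release the microphone once recording has finished.
  stream.getTracks().forEach((track) => track.stop());
  // Upload blob to your storage, then pass the URL to sb.notes.generate()
};

recorder.start();
// ... later
recorder.stop();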
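
Not every browser can record `audio/webm;codecs=opus`, and the `MediaRecorder` constructor throws when given an unsupported type. One hedged approach is to probe a list of candidates with `MediaRecorder.isTypeSupported` first. In the sketch below, `pickMimeType` and the candidate list are illustrative assumptions; the support check is passed in as a function so the helper can also run outside a browser.

```typescript
// Hypothetical helper: return the first candidate the environment supports.
function pickMimeType(
  candidates: readonly string[],
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  return candidates.find((candidate) => isSupported(candidate));
}

// Assumed preference order, chosen from the formats listed for async transcription.
const CANDIDATE_MIME_TYPES = [
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/ogg;codecs=opus',
  'audio/mp4',
] as const;

// In a browser:
//   const mimeType = pickMimeType(CANDIDATE_MIME_TYPES, (t) =>
//     MediaRecorder.isTypeSupported(t),
//   );
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

When no candidate is supported, omitting `mimeType` lets the browser fall back to its default recording format.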

Web Audio API (for realtime streaming)

For realtime streaming, use the Web Audio API to get raw PCM samples. See the Browser Integration guide for a complete example.

Bandwidth Estimates

Audio Format	Bitrate	Per Minute
PCM 16-bit, 16 kHz, mono	256 kbps	~1.9 MB
PCM 16-bit, 44.1 kHz, stereo	1.41 Mbps	~10.6 MB
WebM Opus, 16 kHz	~32 kbps	~240 KB

For realtime streaming, expect to send about 1.9 MB of raw PCM audio per minute (32 KB per second).
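
The PCM rows in the table follow directly from sample rate × bit depth × channel count. A small sketch of that arithmetic, using hypothetical helper names and decimal units (1 MB = 1,000,000 bytes):

```typescript
// Hypothetical helpers for estimating uncompressed PCM bandwidth.

/** Bitrate of raw PCM in bits per second. */
function pcmBitrateBps(
  sampleRate: number,
  bitsPerSample: number,
  channels: number,
): number {
  return sampleRate * bitsPerSample * channels;
}

/** Bytes transferred in one minute at the given bitrate. */
function bytesPerMinute(bitrateBps: number): number {
  return (bitrateBps / 8) * 60;
}

const realtimeBps = pcmBitrateBps(16000, 16, 1); // 256,000 bps = 256 kbps
const realtimePerMinute = bytesPerMinute(realtimeBps); // 1,920,000 bytes, about 1.9 MB
const cdStereoPerMinute = bytesPerMinute(pcmBitrateBps(44100, 16, 2)); // 10,584,000 bytes, about 10.6 MB
```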
