Audio Format
Audio format requirements for Scribeberry realtime and async transcription.
Realtime Transcription
For realtime WebSocket streaming, audio must match this format exactly:
| Property | Requirement |
|---|---|
| Encoding | PCM 16-bit signed little-endian (pcm_s16le) |
| Sample rate | 16,000 Hz (16 kHz) |
| Channels | 1 (mono) |
| Byte order | Little-endian |
| Container | Raw PCM (no headers, no container) |
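Because the stream carries no header describing it, each sample must go out as exactly two bytes, least significant byte first. The sketch below (packPcm16le is a hypothetical helper name, not part of a Scribeberry SDK) writes Int16 samples through a DataView so the little-endian byte order is explicit rather than inherited from the host platform, which is what Int16Array does.

```typescript
// Sketch: pack signed 16-bit samples into a raw pcm_s16le byte buffer.
// DataView makes the byte order explicit; Int16Array would use the
// host platform's native byte order instead.
function packPcm16le(samples: ArrayLike<number>): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per 16-bit sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // `true` selects little-endian. Nothing else is written:
    // no WAV header, no container, just consecutive samples.
    view.setInt16(i * 2, samples[i], true);
  }
  return buffer;
}
```

For example, the samples 1 and -2 become the bytes 01 00 FE FF.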
Why These Requirements?
- PCM 16-bit: uncompressed audio has no codec artifacts and no encoding delay, giving maximum transcription quality and minimum latency
- 16 kHz: the standard sample rate for speech recognition models; higher rates add bandwidth without improving accuracy
- Mono: speech recognition works on a single channel, so stereo would double bandwidth with no benefit
Converting Float32 to Int16 PCM
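Audio captured through the Web Audio API (for example, a getUserMedia microphone stream routed through an AudioContext) arrives as Float32 samples in the range [-1.0, 1.0]. Convert each sample to a signed 16-bit integer in the range [-32768, 32767] before sending it.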
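A minimal sketch of the conversion, assuming the input is a Float32Array taken from the Web Audio API; float32ToInt16 is an illustrative name, not a Scribeberry SDK function. It clamps each sample first and scales negative and positive values separately so that both ends of the Int16 range are reachable.

```typescript
// Sketch: convert Web Audio Float32 samples in [-1.0, 1.0] to 16-bit signed PCM.
function float32ToInt16(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first: out-of-range floats would otherwise wrap around
    // when stored into the Int16Array.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Scale negatives by 32768 and positives by 32767 so -1.0 maps to
    // -32768 and 1.0 maps to 32767. Storing into an Int16Array
    // truncates any fractional part toward zero.
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}
```

The result's underlying buffer could then be sent as a binary WebSocket message (for example ws.send(pcm.buffer)), assuming the realtime endpoint accepts binary frames. Int16Array stores samples in the platform's native byte order, which is little-endian on the hardware browsers commonly run on.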
Buffer Size
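ScriptProcessorNode (deprecated, but still widely supported) lets you request a 4096-sample buffer directly. AudioWorkletNode always processes audio in 128-frame blocks, so accumulate samples until you have 4096 before sending. A 4096-sample buffer is a good balance between latency and efficiency: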
- 4096 samples at 16 kHz = 256 ms per chunk (recommended)
- 2048 samples = 128 ms (lower latency, more overhead)
- 8192 samples = 512 ms (higher latency, less overhead)
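The chunk durations above follow from samples ÷ sample rate. The sketch below computes them and shows one way to collect fixed-size chunks when audio arrives in smaller blocks, as it does with AudioWorkletNode (128 frames per block). chunkDurationMs and ChunkAccumulator are hypothetical helpers, and the input is assumed to be 16 kHz samples that have already been converted to Int16.

```typescript
// Sketch: chunk duration, plus a buffer that groups small blocks of
// Int16 samples into fixed-size chunks (4096 by default) for sending.
const SAMPLE_RATE_HZ = 16000;

function chunkDurationMs(samplesPerChunk: number, sampleRateHz: number = SAMPLE_RATE_HZ): number {
  // Multiply before dividing so whole-millisecond results stay exact.
  return (samplesPerChunk * 1000) / sampleRateHz;
}

class ChunkAccumulator {
  private readonly chunkSize: number;
  private readonly onChunk: (chunk: Int16Array) => void;
  private pending: Int16Array;
  private filled = 0;

  constructor(onChunk: (chunk: Int16Array) => void, chunkSize: number = 4096) {
    this.onChunk = onChunk;
    this.chunkSize = chunkSize;
    this.pending = new Int16Array(chunkSize);
  }

  // Append a block of any length; call onChunk each time a full chunk is ready.
  // Leftover samples stay pending until the next push.
  push(samples: Int16Array): void {
    let offset = 0;
    while (offset < samples.length) {
      const take = Math.min(this.chunkSize - this.filled, samples.length - offset);
      this.pending.set(samples.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.chunkSize) {
        this.onChunk(this.pending);
        // Allocate a fresh array so the emitted chunk is never overwritten.
        this.pending = new Int16Array(this.chunkSize);
        this.filled = 0;
      }
    }
  }
}
```

Each 4096-sample chunk is 8,192 bytes (2 bytes per sample), a size that could be sent as a single WebSocket message.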
Async Transcription (Audio Files)
For file-based transcription via the notes API (audioUrl parameter), Scribeberry accepts a wider range of formats:
| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Preferred — lossless, any sample rate |
| MP3 | .mp3 | Widely supported, lossy compression |
| WebM | .webm | Common browser recording format |
| OGG/Opus | .ogg, .opus | Good compression for speech |
| FLAC | .flac | Lossless compression |
| M4A/AAC | .m4a | Apple ecosystem standard |
ℹ️ Info: For async transcription, the server automatically resamples and converts audio as needed. You don't need to preprocess files to a specific format.
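If you want to fail fast before submitting an audioUrl, one option is a client-side check against the extensions in the table above. This sketch (hasSupportedAudioExtension is a hypothetical helper) only inspects the name, ignoring any query string or fragment, so it cannot detect a mislabeled file, and the table may not be exhaustive.

```typescript
// Sketch: check a file name or URL against the extensions listed in the
// async format table. Name-based only; the file contents are not inspected.
const ASYNC_AUDIO_EXTENSIONS = new Set([".wav", ".mp3", ".webm", ".ogg", ".opus", ".flac", ".m4a"]);

function hasSupportedAudioExtension(fileNameOrUrl: string): boolean {
  // Drop a query string or fragment (e.g. from a signed URL) first.
  const path = fileNameOrUrl.split(/[?#]/, 1)[0] ?? "";
  const dot = path.lastIndexOf(".");
  // No dot, or the last dot belongs to a directory/host name rather than the file.
  if (dot === -1 || dot < path.lastIndexOf("/")) return false;
  return ASYNC_AUDIO_EXTENSIONS.has(path.slice(dot).toLowerCase());
}
```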
Recording from Browser
MediaRecorder (for file upload)
If you're recording audio for file upload rather than realtime streaming, use the browser's MediaRecorder API.
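MediaRecorder container support varies between browsers, so a common pattern is to probe candidates with MediaRecorder.isTypeSupported and keep the first one that also appears in the async format table. In this sketch the probe is passed in as a function so the selection logic can run outside a browser; pickRecorderMimeType and the candidate order are assumptions, not Scribeberry requirements.

```typescript
// Sketch: choose a MediaRecorder MIME type that maps to an accepted upload format.
const RECORDER_MIME_CANDIDATES = [
  "audio/webm;codecs=opus", // uploads as .webm
  "audio/ogg;codecs=opus", // uploads as .ogg
  "audio/mp4", // uploads as .m4a
] as const;

function pickRecorderMimeType(isTypeSupported: (mimeType: string) => boolean): string | undefined {
  // First candidate the browser reports as supported, or undefined if none.
  return RECORDER_MIME_CANDIDATES.find((t) => isTypeSupported(t));
}

// Browser usage (illustrative, not executed here):
//   const mimeType = pickRecorderMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
//   const parts: Blob[] = [];
//   recorder.ondataavailable = (e) => parts.push(e.data);
//   recorder.onstop = () => { const file = new Blob(parts, { type: recorder.mimeType }); /* upload */ };
//   recorder.start(); /* ...later... */ recorder.stop();
```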
Web Audio API (for realtime streaming)
For realtime streaming, use the Web Audio API to get raw PCM samples. See the Browser Integration guide for a complete example.
Bandwidth Estimates
| Audio Format | Bitrate | Data per Minute |
|---|---|---|
| PCM 16-bit, 16kHz, mono | 256 kbps | ~1.9 MB |
| PCM 16-bit, 44.1kHz, stereo | 1.41 Mbps | ~10.6 MB |
| WebM Opus, 16kHz | ~32 kbps | ~240 KB |
For realtime streaming, you'll send approximately 1.9 MB per minute of raw PCM audio.
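The PCM rows in the table follow from sample rate × bits per sample × channels, and the per-minute size from dividing by 8 and multiplying by 60. The sketch below reproduces that arithmetic with hypothetical helper names; sizes are in decimal megabytes (1 MB = 1,000,000 bytes). The Opus row is a nominal encoder bitrate, so only the bytesPerMinute step applies to it.

```typescript
// Sketch: bitrate and data volume for uncompressed PCM audio.
function pcmBitrateBps(sampleRateHz: number, bitsPerSample: number, channels: number): number {
  return sampleRateHz * bitsPerSample * channels;
}

function bytesPerMinute(bitrateBps: number): number {
  return (bitrateBps / 8) * 60; // bits -> bytes, seconds -> minutes
}

const realtimeBps = pcmBitrateBps(16000, 16, 1); // 256,000 bit/s = 256 kbps
console.log(bytesPerMinute(realtimeBps) / 1e6); // 1.92 (decimal MB per minute)
```

The same formula gives 44,100 × 16 × 2 = 1,411,200 bit/s, or 10,584,000 bytes (about 10.6 MB) per minute, for CD-quality stereo, and 32 kbps Opus comes to 240,000 bytes per minute.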