Realtime Transcription

Stream live audio to Scribeberry and receive instant transcript segments via WebSocket.

Realtime transcription lets you stream audio from a microphone (or any other audio source) and receive transcript segments as speech is recognized, with sub-second latency.

How It Works

Realtime transcription uses a WebSocket connection between your application and the Scribeberry API. The flow is:

1. Connect: open a WebSocket to wss://api.scribeberry.com/ws/realtime
2. Start: send a start command with your session configuration
3. Stream: send raw audio chunks as binary WebSocket frames
4. Receive: get partial (interim) and final (confirmed) transcript events
5. Stop: send a stop command and receive the final transcript, plus a note if one was requested
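
The client side of the five steps above can be sketched without the SDK. This is only an illustration: the JSON shape of the control messages (`{ type: 'start', config }` and `{ type: 'stop' }`) and the `SocketLike` interface are assumptions made for the example, not the documented wire format. In practice the SDK handles these messages for you.

```typescript
// Minimal socket surface so the sketch works with a browser WebSocket
// or a test double. Hypothetical helper type, not part of the SDK.
interface SocketLike {
  send(data: string | ArrayBuffer | Uint8Array): void;
}

// ASSUMPTION: control messages are JSON objects with a `type` field.
// The real wire format may differ.
interface StartCommand {
  type: 'start';
  config: { language: string; enableDiarization?: boolean; templateId?: string };
}

interface StopCommand {
  type: 'stop';
}

// Step 2: send the start command with the session configuration.
function sendStart(socket: SocketLike, config: StartCommand['config']): void {
  const command: StartCommand = { type: 'start', config };
  socket.send(JSON.stringify(command));
}

// Step 3: send each raw audio chunk as a binary frame.
// Returns the number of frames sent.
function sendAudioChunks(socket: SocketLike, chunks: Iterable<Uint8Array>): number {
  let framesSent = 0;
  for (const chunk of chunks) {
    socket.send(chunk);
    framesSent += 1;
  }
  return framesSent;
}

// Step 5: ask the server to flush pending audio and finish the session.
function sendStop(socket: SocketLike): void {
  const command: StopCommand = { type: 'stop' };
  socket.send(JSON.stringify(command));
}
```

Steps 1 and 4 (opening the connection and handling incoming events) are left out because they depend on the WebSocket implementation you use.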

Quick Example (Node.js)

realtime-node.ts

import { Scribeberry } from '@scribeberry/sdk';
 
const sb = new Scribeberry({ apiKey: 'sk_test_...' });
 
const session = sb.realtime.transcribe({
  language: 'en-US',
  enableDiarization: true,
});
 
session.on('partial', (text) => {
  process.stdout.write(`\r  Hearing: ${text}`);
});
 
session.on('final', (segment) => {
  console.log(`\n✓ ${segment.text}`);
});
 
session.on('error', (err) => {
  console.error('Error:', err.message);
});
 
// Stream audio from your source (e.g., a file or microphone).
// Audio must be 16-bit PCM, 16 kHz, mono.
// `audioChunk` is a placeholder for one chunk of raw audio bytes from that source.
session.sendAudio(audioChunk);
 
// When done
const result = await session.stop();
console.log('Full transcript:', result.transcript);
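
Many audio sources (the Web Audio API, for example) produce floating-point samples rather than the 16-bit PCM mentioned in the example above. The following sketch shows one common way to convert them and to work out how many bytes a chunk of a given duration occupies. The helper names are hypothetical and not part of the SDK, and the source is assumed to already be 16 kHz mono.

```typescript
const SAMPLE_RATE_HZ = 16000; // 16 kHz, per the format note in the example
const BYTES_PER_SAMPLE = 2;   // 16-bit samples, one channel

// Convert float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, (sample) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    return clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  });
}

// Number of bytes needed for `ms` milliseconds of 16 kHz, 16-bit mono audio.
function bytesForDuration(ms: number): number {
  return Math.round((SAMPLE_RATE_HZ * ms) / 1000) * BYTES_PER_SAMPLE;
}
```

For instance, `bytesForDuration(100)` is 3200, so a 100 ms chunk holds 1600 samples. Note that `Int16Array` uses the platform's byte order (little-endian on virtually all current hardware); whether the service requires a particular byte order is not stated here.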

Session Lifecycle

idle → connecting → active → stopping → stopped
                      ↕
                    paused

State	Description
`idle`	Session created, not yet connected
`connecting`	WebSocket open, waiting for server acknowledgment
`active`	Streaming audio, receiving transcripts
`paused`	Audio paused, connection alive
`stopping`	Stop requested, waiting for server to flush
`stopped`	Session complete, final results available
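
If your application mirrors the session state (for example, to enable or disable UI controls), the diagram can be expressed as a small transition table. This sketch encodes only the arrows drawn above. Whether the SDK permits other transitions, such as going from `paused` directly to `stopping`, is not stated, so they are left out. The type and function names are illustrative, not SDK exports.

```typescript
type SessionState =
  | 'idle'
  | 'connecting'
  | 'active'
  | 'paused'
  | 'stopping'
  | 'stopped';

// Allowed next states, read directly off the lifecycle diagram.
const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  idle: ['connecting'],
  connecting: ['active'],
  active: ['paused', 'stopping'],
  paused: ['active'],
  stopping: ['stopped'],
  stopped: [],
};

function canTransition(from: SessionState, to: SessionState): boolean {
  return TRANSITIONS[from].includes(to);
}

// ASSUMPTION: audio is only meaningful while the session is active,
// based on the state descriptions in the table.
function canSendAudio(state: SessionState): boolean {
  return state === 'active';
}
```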

Events

`partial` — Interim Transcript

Fired rapidly as speech is recognized. Each partial replaces the previous one. Use this for live display of what the user is saying.

session.on('partial', (text: string, speaker?: number) => {
  // Update the UI with the current interim text
  interimElement.textContent = text;
});

`final` — Confirmed Segment

Fired when a segment of speech is fully recognized. This text is stable — it won't change. Accumulate final segments to build the complete transcript.

session.on('final', (segment: TranscriptSegment) => {
  // segment.text — confirmed text
  // segment.speaker — speaker ID (if diarization enabled)
  // segment.startMs — start time in ms
  // segment.endMs — end time in ms
  transcriptDiv.textContent += segment.text + ' ';
});
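
When diarization is enabled, accumulated segments can be grouped into speaker turns instead of one running string. The sketch below is one way to do that. The `TranscriptSegment` interface is reconstructed from the fields listed in the handler above, and the `Speaker N:` label format is just an illustrative choice.

```typescript
// Fields as listed in the comments of the `final` handler.
interface TranscriptSegment {
  text: string;
  speaker?: number;
  startMs: number;
  endMs: number;
}

// Merge consecutive segments from the same speaker into one line each.
function formatBySpeaker(segments: TranscriptSegment[]): string {
  const turns: { speaker: number | undefined; text: string }[] = [];
  for (const segment of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === segment.speaker) {
      last.text += ' ' + segment.text.trim();
    } else {
      turns.push({ speaker: segment.speaker, text: segment.text.trim() });
    }
  }
  return turns
    .map((turn) =>
      turn.speaker === undefined
        ? turn.text
        : `Speaker ${turn.speaker}: ${turn.text}`,
    )
    .join('\n');
}
```

A handler could push each `segment` into an array and re-render with `formatBySpeaker(segments)` after every `final` event.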

`endpoint` — Utterance Boundary

Fired when a natural pause in speech is detected. Use this to insert paragraph breaks or punctuation.

session.on('endpoint', () => {
  transcriptDiv.textContent += '\n';
});
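
The three transcript events work together: partials replace each other, finals accumulate, and endpoints mark breaks. A minimal sketch of a display model that combines them is shown below. `LiveTranscript` is a hypothetical helper, not an SDK class, and clearing the interim text when a final arrives is an assumption about how you would want the display to behave.

```typescript
class LiveTranscript {
  private confirmed = ''; // text from `final` events; never rewritten
  private interim = '';   // latest `partial`; replaced on every event

  onPartial(text: string): void {
    this.interim = text;
  }

  onFinal(text: string): void {
    this.confirmed += text.trim() + ' ';
    // ASSUMPTION: the confirmed segment supersedes the current partial.
    this.interim = '';
  }

  onEndpoint(): void {
    if (this.confirmed === '') return; // nothing to break yet
    // Swap the trailing space for a line break.
    this.confirmed = this.confirmed.replace(/ $/, '') + '\n';
  }

  // Confirmed text followed by whatever is currently being heard.
  render(): string {
    return this.confirmed + this.interim;
  }
}
```

Wiring it up would look like `session.on('partial', (text) => view.onPartial(text))`, and similarly for `final` (passing `segment.text`) and `endpoint`, followed by writing `view.render()` to the page.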

`started` — Session Ready

session.on('started', (sessionId: string) => {
  console.log(`Session ${sessionId} is ready`);
});

`stopped` — Session Complete

session.on('stopped', (result: RealtimeSessionResult) => {
  console.log(`Transcript: ${result.transcript}`);
  console.log(`Duration: ${result.durationSeconds}s`);
  console.log(`Segments: ${result.segments.length}`);
});
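
Once the session has stopped, the segment timings can be used for simple summaries. As an illustration, the sketch below totals speaking time per speaker. It assumes the entries in `result.segments` have the same `speaker`, `startMs`, and `endMs` fields as the segments delivered by the `final` event, which this page does not state explicitly.

```typescript
// ASSUMPTION: same shape as the segments passed to the `final` handler.
interface TranscriptSegment {
  text: string;
  speaker?: number;
  startMs: number;
  endMs: number;
}

// Total speaking time in milliseconds, keyed by speaker ID.
// Segments without a speaker ID are collected under 'unknown'.
function talkTimeBySpeaker(
  segments: TranscriptSegment[],
): Map<number | 'unknown', number> {
  const totals = new Map<number | 'unknown', number>();
  for (const { speaker, startMs, endMs } of segments) {
    const key = speaker ?? 'unknown';
    const duration = Math.max(0, endMs - startMs); // ignore inverted ranges
    totals.set(key, (totals.get(key) ?? 0) + duration);
  }
  return totals;
}
```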

`note` — Note Generated

Fired only if you provided a templateId in the session config. The server generates a note from the accumulated transcript after you stop the session.

session.on('note', (note: Note) => {
  console.log(note.markdown);
});

`error` — Error Occurred

session.on('error', (error: Error) => {
  console.error(`Realtime error: ${error.message}`);
});

Session Methods

Method	Description
`sendAudio(data)`	Send a binary audio chunk
`sendStream(iterable)`	Stream from an async iterable
`getTranscript()`	Get accumulated transcript text so far
`getSegments()`	Get all confirmed segments so far
`pause()`	Pause audio (connection stays alive)
`resume()`	Resume after pause
`finalize()`	Force-flush pending audio
`stop()`	Stop the session and get final results
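
To feed a prerecorded buffer through `sendStream`, you need an async iterable of chunks. The sketch below shows one way to build one. The helper names are hypothetical, and it is an assumption that `sendStream` accepts `Uint8Array` chunks and that pacing them at roughly real-time speed is appropriate for your use case.

```typescript
// Split a buffer into fixed-size chunks; the last chunk may be shorter.
// `subarray` creates views, so no audio data is copied.
function* chunkBytes(data: Uint8Array, chunkSize: number): Generator<Uint8Array> {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    yield data.subarray(offset, offset + chunkSize);
  }
}

// Yield the same chunks asynchronously, waiting `intervalMs` after each one
// so the buffer is delivered at a steady pace instead of all at once.
async function* pacedChunks(
  data: Uint8Array,
  chunkSize: number,
  intervalMs: number,
): AsyncGenerator<Uint8Array> {
  for (const chunk of chunkBytes(data, chunkSize)) {
    yield chunk;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

At 16 kHz, 16-bit mono, 3200 bytes is 100 ms of audio, so `session.sendStream(pacedChunks(pcmBytes, 3200, 100))` would approximate real-time delivery, where `pcmBytes` is your own buffer of raw PCM audio.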

Configuration

const session = sb.realtime.transcribe({
  language: 'en-US',        // Language code
  enableDiarization: true,   // Identify speakers (default: true)
  templateId: 'template-id', // Auto-generate note on stop (optional)
});

Auto Note Generation

If you pass a templateId, the server automatically generates a note from the accumulated transcript when you stop the session:

const session = sb.realtime.transcribe({
  language: 'en-US',
  templateId: 'soap-note-template-id',
});
 
// ... stream audio ...
 
session.on('note', (note) => {
  // Fired after stop, once the note is ready
  console.log(note.markdown);
});
 
const result = await session.stop();
// result.note is also available here

Next Steps

Browser Integration: Set up realtime transcription in a web browser with temporary tokens.
Audio Format: Detailed audio format requirements for realtime streaming.
