Scribeberry Docs

Realtime Transcription

Stream live audio to Scribeberry and receive instant transcript segments via WebSocket.

Realtime transcription lets you stream audio from a microphone (or any other audio source) and receive transcript segments as speech is recognized, with sub-second latency.

How It Works

Realtime transcription uses a WebSocket connection between your application and the Scribeberry API. The flow is:

  1. Connect — open a WebSocket to wss://api.scribeberry.com/ws/realtime
  2. Start — send a start command with your session configuration
  3. Stream — send raw audio chunks as binary WebSocket frames
  4. Receive — get partial (interim) and final (confirmed) transcript events
  5. Stop — send a stop command and receive the final transcript + optional note

Quick Example (Node.js)

realtime-node.ts
import { Scribeberry } from '@scribeberry/sdk';
 
const sb = new Scribeberry({ apiKey: 'sk_test_...' });
 
const session = sb.realtime.transcribe({
  language: 'en-US',
  enableDiarization: true,
});
 
session.on('partial', (text) => {
  process.stdout.write(`\r  Hearing: ${text}`);
});
 
session.on('final', (segment) => {
  console.log(`\n✓ ${segment.text}`);
});
 
session.on('error', (err) => {
  console.error('Error:', err.message);
});
 
// Stream audio from your source (e.g., file, microphone).
// Audio must be 16-bit PCM, 16 kHz, mono.
// `audioChunk` is a placeholder for one binary chunk of that audio.
session.sendAudio(audioChunk);
 
// When done
const result = await session.stop();
console.log('Full transcript:', result.transcript);
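The quick example expects 16-bit PCM at 16 kHz, mono, while many capture APIs (the Web Audio API, for instance) deliver Float32 samples in the range [-1, 1]. The sketch below shows one way to convert them. It assumes the samples are already mono at 16 kHz and that little-endian byte order is wanted, which this page does not state. `floatTo16BitPcm` is a hypothetical helper, not part of the SDK.

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
// Assumption: little-endian byte order; confirm against the Audio Format page.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, sample));
    // The negative range is one step wider than the positive range.
    const value =
      clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian
  });
  return new Uint8Array(view.buffer);
}
```

The returned bytes could then be passed to `session.sendAudio(...)`, provided the SDK accepts a `Uint8Array` as a binary chunk.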

Session Lifecycle

idle → connecting → active → stopping → stopped
                      ↕
                    paused

State       Description
idle        Session created, not yet connected
connecting  WebSocket open, waiting for server acknowledgment
active      Streaming audio, receiving transcripts
paused      Audio paused, connection alive
stopping    Stop requested, waiting for server to flush
stopped     Session complete, final results available
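
If your application mirrors the session state (for example, to enable or disable UI controls), the lifecycle can be modelled as a small transition table. This is a sketch based on the states listed above, not the SDK's internal implementation. The names `SessionState`, `TRANSITIONS`, `canTransition`, and `canSendAudio` are hypothetical, and allowing `paused → stopping` is an assumption.

```typescript
type SessionState =
  | 'idle'
  | 'connecting'
  | 'active'
  | 'paused'
  | 'stopping'
  | 'stopped';

// Allowed transitions, read off the lifecycle states.
// Assumption: a paused session may be stopped without resuming first.
const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  idle: ['connecting'],
  connecting: ['active'],
  active: ['paused', 'stopping'],
  paused: ['active', 'stopping'],
  stopping: ['stopped'],
  stopped: [],
};

function canTransition(from: SessionState, to: SessionState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Audio is only streamed while the session is active.
function canSendAudio(state: SessionState): boolean {
  return state === 'active';
}
```

A record button, for instance, could stay disabled until `canSendAudio(state)` returns true.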

Events

partial — Interim Transcript

Fired repeatedly while speech is being recognized. Each partial replaces the previous one, so display only the latest. Use this for a live view of what the speaker is saying.

session.on('partial', (text: string, speaker?: number) => {
  // Update the UI with the current interim text
  interimElement.textContent = text;
});

final — Confirmed Segment

Fired when a segment of speech is fully recognized. This text is stable — it won't change. Accumulate final segments to build the complete transcript.

session.on('final', (segment: TranscriptSegment) => {
  // segment.text — confirmed text
  // segment.speaker — speaker ID (if diarization enabled)
  // segment.startMs — start time in ms
  // segment.endMs — end time in ms
  transcriptDiv.textContent += segment.text + ' ';
});

endpoint — Utterance Boundary

Fired when a natural pause in speech is detected. Use this to insert paragraph breaks or punctuation.

session.on('endpoint', () => {
  transcriptDiv.textContent += '\n';
});
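
The partial, final, and endpoint events are usually combined into a single view: confirmed text accumulates, and the latest interim text is shown after it. The sketch below keeps that logic separate from the DOM. `TranscriptView` is a hypothetical class, and clearing the interim text when a final segment arrives is an assumption about how partials and finals interleave.

```typescript
class TranscriptView {
  private confirmed = '';
  private interim = '';

  // partial: each interim result replaces the previous one.
  onPartial(text: string): void {
    this.interim = text;
  }

  // final: append the stable text.
  // Assumption: the final segment supersedes the current interim text.
  onFinal(segment: { text: string }): void {
    this.confirmed += segment.text + ' ';
    this.interim = '';
  }

  // endpoint: swap the trailing space for a line break.
  onEndpoint(): void {
    this.confirmed = this.confirmed.replace(/ $/, '') + '\n';
  }

  // Confirmed text followed by whatever is still interim.
  render(): string {
    return this.confirmed + this.interim;
  }
}
```

Each handler maps onto one event, for example `session.on('partial', (text) => view.onPartial(text))`, with `view.render()` written to the page after every event.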

started — Session Ready

Fired once the server acknowledges the start command and the session is ready to receive audio.

session.on('started', (sessionId: string) => {
  console.log(`Session ${sessionId} is ready`);
});

stopped — Session Complete

Fired when the session is complete and the final results are available.

session.on('stopped', (result: RealtimeSessionResult) => {
  console.log(`Transcript: ${result.transcript}`);
  console.log(`Duration: ${result.durationSeconds}s`);
  console.log(`Segments: ${result.segments.length}`);
});
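
When diarization is enabled, `result.segments` can be turned into a speaker-labelled transcript. The sketch below merges consecutive segments from the same speaker into one line. The `TranscriptSegment` interface here is a local stand-in with only the fields documented for the final event, and `formatTimestamp` and `formatBySpeaker` are hypothetical helpers.

```typescript
// Local stand-in for the SDK type: only the fields documented on this page.
interface TranscriptSegment {
  text: string;
  speaker?: number;
  startMs: number;
  endMs: number;
}

// Format milliseconds as mm:ss.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = String(Math.floor(totalSeconds / 60)).padStart(2, '0');
  const seconds = String(totalSeconds % 60).padStart(2, '0');
  return `${minutes}:${seconds}`;
}

// Merge consecutive segments from the same speaker into one line each.
function formatBySpeaker(segments: readonly TranscriptSegment[]): string {
  const turns: { speaker: number | undefined; startMs: number; texts: string[] }[] = [];
  for (const seg of segments) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === seg.speaker) {
      last.texts.push(seg.text);
    } else {
      turns.push({ speaker: seg.speaker, startMs: seg.startMs, texts: [seg.text] });
    }
  }
  return turns
    .map((turn) => {
      const label = turn.speaker === undefined ? 'Speaker ?' : `Speaker ${turn.speaker}`;
      return `[${formatTimestamp(turn.startMs)}] ${label}: ${turn.texts.join(' ')}`;
    })
    .join('\n');
}
```

Calling `formatBySpeaker(result.segments)` in the stopped handler would produce one line per speaker turn, each prefixed with the turn's start time.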

note — Note Generated

Fired only if you provided a templateId in the session config. The server generates a note from the accumulated transcript after you stop the session.

session.on('note', (note: Note) => {
  console.log(note.markdown);
});

error — Error Occurred

session.on('error', (error: Error) => {
  console.error(`Realtime error: ${error.message}`);
});

Session Methods

Method                Description
sendAudio(data)       Send a binary audio chunk
sendStream(iterable)  Stream from an async iterable
getTranscript()       Get accumulated transcript text so far
getSegments()         Get all confirmed segments so far
pause()               Pause audio (connection stays alive)
resume()              Resume after pause
finalize()            Force-flush pending audio
stop()                Stop the session and get final results
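
To use `sendStream` with audio you already hold in memory, you need an async iterable of chunks. The sketch below slices a PCM buffer into fixed-duration chunks and yields them, optionally pausing between chunks. `chunkSizeBytes`, `splitPcm`, and `streamChunks` are hypothetical helpers. Whether the server needs chunks paced at real-time speed, and which chunk duration works best, are not stated on this page.

```typescript
// Bytes per chunk for 16-bit mono PCM at the given sample rate.
function chunkSizeBytes(chunkMs: number, sampleRate = 16000): number {
  const samples = Math.floor((sampleRate * chunkMs) / 1000);
  return samples * 2; // 2 bytes per 16-bit sample
}

// Slice a PCM buffer into fixed-size views; the last one may be shorter.
function splitPcm(pcm: Uint8Array, chunkBytes: number): Uint8Array[] {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError('chunkBytes must be a positive integer');
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    chunks.push(pcm.subarray(offset, offset + chunkBytes)); // view, no copy
  }
  return chunks;
}

// Async iterable of chunks, with an optional delay after each one.
async function* streamChunks(
  pcm: Uint8Array,
  chunkBytes: number,
  delayMs = 0,
): AsyncGenerator<Uint8Array> {
  for (const chunk of splitPcm(pcm, chunkBytes)) {
    yield chunk;
    if (delayMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With 100 ms chunks (an arbitrary choice), the call would look like `session.sendStream(streamChunks(pcm, chunkSizeBytes(100)))`, assuming `sendStream` accepts `Uint8Array` items.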

Configuration

const session = sb.realtime.transcribe({
  language: 'en-US',        // Language code
  enableDiarization: true,   // Identify speakers (default: true)
  templateId: 'template-id', // Auto-generate note on stop (optional)
});

Auto Note Generation

If you pass a templateId, the server automatically generates a note from the accumulated transcript when you stop the session:

const session = sb.realtime.transcribe({
  language: 'en-US',
  templateId: 'soap-note-template-id',
});
 
// ... stream audio ...
 
session.on('note', (note) => {
  // Fired after stop, once the note is ready
  console.log(note.markdown);
});
 
const result = await session.stop();
// result.note is also available here

Next Steps

  • Browser Integration: Set up realtime transcription in a web browser with temporary tokens.

  • Audio Format: Detailed audio format requirements for realtime streaming.