Browser Integration

Set up realtime transcription in a web browser using the getRealtimeToken callback and the Web Audio API.

This guide walks through a complete browser-side realtime transcription integration. The SDK manages token lifecycle automatically — you just provide a callback that fetches tokens from your server.

Architecture

┌──────────────────────────────────────────────────────┐
│                    Your Backend                      │
│                                                      │
│  POST /api/realtime-token                            │
│    → sb.realtime.createToken()                       │
│    ← { token: "sb_rt_...", expiresAt: "..." }        │
└──────────────────────┬───────────────────────────────┘
                       │ SDK calls your callback
                       │ automatically (on connect
                       │ and before token expiry)
                       ▼
┌──────────────────────────────────────────────────────┐
│                    Browser                           │
│                                                      │
│  new Scribeberry({                                   │
│    getRealtimeToken: () =>                           │
│      fetch('/api/realtime-token')                    │
│  })                                                  │
│                                                      │
│  sb.realtime.transcribe({ language: 'en-US' })       │
│  → SDK fetches token → connects WS → streams audio   │
│  → auto-refreshes token before expiry                 │
└──────────────────────────────────────────────────────┘

Step 1: Server — Token Endpoint

Create an endpoint on your server that generates temporary tokens for authenticated users.

Express

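import express from 'express';
import { Scribeberry } from '@scribeberry/sdk';

const app = express();
const sb = new Scribeberry({ apiKey: process.env.SCRIBEBERRY_API_KEY! });

app.post('/api/realtime-token', async (req, res) => {
  // Verify your own user is authenticated first!
  try {
    const { token, wsUrl, expiresAt } = await sb.realtime.createToken({
      expiresInSeconds: 3600,
    });
    res.json({ token, wsUrl, expiresAt });
  } catch (err) {
    // Express 4 does not catch rejections from async handlers, so respond explicitly
    console.error('Failed to create realtime token:', err);
    res.status(500).json({ error: 'Failed to create realtime token' });
  }
});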

Next.js

import { Scribeberry } from '@scribeberry/sdk';
import { NextResponse } from 'next/server';
 
const sb = new Scribeberry({ apiKey: process.env.SCRIBEBERRY_API_KEY! });
 
export async function POST() {
  // Verify your own user is authenticated first!
  const { token, wsUrl, expiresAt } = await sb.realtime.createToken({
    expiresInSeconds: 3600,
  });
  return NextResponse.json({ token, wsUrl, expiresAt });
}

Fastify

import Fastify from 'fastify';
import { Scribeberry } from '@scribeberry/sdk';
 
const app = Fastify();
const sb = new Scribeberry({ apiKey: process.env.SCRIBEBERRY_API_KEY! });
 
app.post('/api/realtime-token', async (req, reply) => {
  // Verify your own user is authenticated first!
  const { token, wsUrl, expiresAt } = await sb.realtime.createToken({
    expiresInSeconds: 3600,
  });
  return { token, wsUrl, expiresAt };
});

⚠️ Warning: Always authenticate your own users before issuing Scribeberry tokens. Don't expose this endpoint without your own auth layer.
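As a starting point for that auth layer, the sketch below pulls the caller's credential out of the Authorization: Bearer header that the browser code in Step 2 sends. extractBearerToken is a hypothetical helper, not part of the Scribeberry SDK, and verifying the JWT itself (signature, expiry, user lookup) is left to whatever auth system you already use.

```typescript
// Hypothetical helper (not part of the Scribeberry SDK): extracts the
// credential from an "Authorization: Bearer <jwt>" header value.
// Returns null when the header is missing or is not a Bearer credential.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Possible use inside the Express handler above, where verifyUserJwt is a
// placeholder for your own verification logic:
//
//   const jwt = extractBearerToken(req.headers.authorization);
//   if (!jwt || !(await verifyUserJwt(jwt))) {
//     return res.status(401).json({ error: 'Unauthorized' });
//   }
```

Rejecting the request before calling createToken means unauthenticated callers never receive a Scribeberry token.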

Step 2: Browser — Initialize SDK with Token Callback

Instead of manually managing tokens, provide a getRealtimeToken callback. The SDK calls it automatically when it needs a token (on connect and before expiry).

import { Scribeberry } from '@scribeberry/sdk';
 
const sb = new Scribeberry({
  baseUrl: 'https://sandbox.api.scribeberry.com',
  getRealtimeToken: async () => {
    const response = await fetch('/api/realtime-token', {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${yourUserJwt}` }, // yourUserJwt: your app's own user session token
    });
 
    if (!response.ok) {
      throw new Error('Failed to get realtime token');
    }
 
    return response.json(); // must return { token, expiresAt }
  },
});

The SDK will:

Call your callback on first connect
Cache the token until it's close to expiry
Auto-refresh ~60 seconds before expiry (no interruption to active sessions)
Retry if refresh fails
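
To make the caching behaviour concrete, here is a minimal sketch of how such a cache could work. It is not the SDK's actual implementation: RealtimeToken, isNearExpiry, createTokenCache, and the injectable now clock are illustrative names, and the 60-second margin is taken from the description above. Retry on failure is omitted for brevity.

```typescript
interface RealtimeToken {
  token: string;
  expiresAt: string; // timestamp string, as returned by the token endpoint
}

// True when the token expires within `marginMs` of `nowMs` (or already has).
function isNearExpiry(
  current: RealtimeToken,
  nowMs: number,
  marginMs = 60_000,
): boolean {
  return Date.parse(current.expiresAt) - nowMs <= marginMs;
}

// Wraps a token callback so repeated calls reuse the cached token until it
// nears expiry. Concurrent callers may trigger duplicate fetches; a
// production cache would share the in-flight promise instead.
function createTokenCache(
  fetchToken: () => Promise<RealtimeToken>,
  now: () => number = Date.now,
): () => Promise<RealtimeToken> {
  let cached: RealtimeToken | null = null;
  return async () => {
    if (cached === null || isNearExpiry(cached, now())) {
      cached = await fetchToken();
    }
    return cached;
  };
}
```

With a one-hour token, this sketch would call fetchToken once on first use and again only when fewer than 60 seconds remain.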

Step 3: Browser — Start Session

async function startTranscription() {
  // Start realtime session — token is fetched automatically
  const session = sb.realtime.transcribe({
    language: 'en-US',
    enableDiarization: true,
  });
 
  // Set up event handlers
  session.on('partial', (text) => {
    document.getElementById('interim')!.textContent = text;
  });
 
  session.on('final', (segment) => {
    const el = document.getElementById('transcript')!;
    el.textContent += segment.text + ' ';
    document.getElementById('interim')!.textContent = '';
  });
 
  session.on('error', (err) => {
    console.error('Transcription error:', err.message);
  });
 
  return session;
}
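
The handlers above combine two kinds of events: partial results replace the interim text, while final segments are appended to the transcript and clear it. One way to make that logic testable without a DOM is a small pure reducer. TranscriptState, TranscriptEvent, and reduceTranscript are illustrative names rather than SDK types, and the event shapes only mirror the fields used in the handlers above.

```typescript
interface TranscriptState {
  finalText: string; // committed text from 'final' events
  interim: string;   // latest 'partial' text, replaced on every update
}

type TranscriptEvent =
  | { kind: 'partial'; text: string }
  | { kind: 'final'; text: string };

// Returns the next state without mutating the previous one.
function reduceTranscript(
  state: TranscriptState,
  ev: TranscriptEvent,
): TranscriptState {
  switch (ev.kind) {
    case 'partial':
      return { ...state, interim: ev.text };
    case 'final':
      return { finalText: state.finalText + ev.text + ' ', interim: '' };
  }
}

// Wiring sketch (state and render are your own):
//   session.on('partial', (text) => {
//     state = reduceTranscript(state, { kind: 'partial', text });
//   });
//   session.on('final', (segment) => {
//     state = reduceTranscript(state, { kind: 'final', text: segment.text });
//   });
```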

Step 4: Browser — Capture Microphone Audio

Scribeberry expects 16-bit signed little-endian PCM audio at 16 kHz, mono. Here's how to capture microphone audio with the browser's Web Audio API and convert it to that format:

import type { RealtimeTranscriptionSession } from '@scribeberry/sdk';

async function startAudioCapture(session: RealtimeTranscriptionSession) {
  // Request microphone access
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
      sampleRate: 16000,
      channelCount: 1,
      echoCancellation: true,
      noiseSuppression: true,
    },
  });
 
  // Create audio processing pipeline
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const source = audioContext.createMediaStreamSource(stream);
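  // Note: ScriptProcessorNode is deprecated in favor of AudioWorklet, but it
  // is still widely supported and keeps this example short.
  const processor = audioContext.createScriptProcessor(4096, 1, 1);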
 
  processor.onaudioprocess = (event) => {
    if (session.state !== 'active') return;
 
    // Convert Float32 samples to Int16 PCM
    const float32 = event.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      int16[i] = Math.max(
        -32768,
        Math.min(32767, Math.round(float32[i] * 32767)),
      );
    }
 
    session.sendAudio(int16.buffer);
  };
 
  source.connect(processor);
  processor.connect(audioContext.destination);
 
  // Return cleanup function
  return () => {
    processor.disconnect();
    source.disconnect();
    audioContext.close();
    stream.getTracks().forEach((track) => track.stop());
  };
}
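
The Float32-to-Int16 conversion inside onaudioprocess can also be pulled out into a pure function, which makes it easy to unit test and to reuse elsewhere (for example in an AudioWorklet). The sketch below uses the same scaling and clamping as the handler above; floatToInt16Pcm and chunkDurationMs are illustrative names. The second helper shows why buffer size matters for latency: at 16 kHz, a 4096-sample buffer holds 256 ms of audio.

```typescript
// Converts Web Audio samples in [-1, 1] to 16-bit signed PCM,
// clamping anything that falls outside the Int16 range.
function floatToInt16Pcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    pcm[i] = Math.max(-32768, Math.min(32767, Math.round(sample * 32767)));
  });
  return pcm;
}

// Duration of one audio chunk in milliseconds at the given sample rate.
function chunkDurationMs(sampleCount: number, sampleRate = 16000): number {
  return (sampleCount * 1000) / sampleRate;
}

// Inside onaudioprocess this would become:
//   session.sendAudio(floatToInt16Pcm(event.inputBuffer.getChannelData(0)).buffer);
```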

Step 5: Putting It All Together

let session: RealtimeTranscriptionSession | null = null;
let cleanup: (() => void) | null = null;
 
async function onStartClick() {
  session = await startTranscription();
  cleanup = await startAudioCapture(session);
  console.log('Recording started');
}
 
async function onStopClick() {
  // Stop microphone
  cleanup?.();
  cleanup = null;
 
  // Stop session and get results
  if (session) {
    const result = await session.stop();
    console.log('Final transcript:', result.transcript);
    console.log('Duration:', result.durationSeconds, 'seconds');
    session = null;
  }
}

Complete React Example

import { useState, useRef, useCallback, useMemo } from 'react';
import { Scribeberry, RealtimeTranscriptionSession } from '@scribeberry/sdk';
 
export function TranscriptionWidget() {
  const [isRecording, setIsRecording] = useState(false);
  const [transcript, setTranscript] = useState('');
  const [interim, setInterim] = useState('');
  const sessionRef = useRef<RealtimeTranscriptionSession | null>(null);
  const cleanupRef = useRef<(() => void) | null>(null);
 
  // Create SDK once with token callback — tokens auto-refresh
  const sb = useMemo(() => new Scribeberry({
    getRealtimeToken: async () => {
      const res = await fetch('/api/realtime-token', { method: 'POST' });
      if (!res.ok) throw new Error('Failed to get realtime token');
      return res.json();
    },
  }), []);
 
  const start = useCallback(async () => {
    const session = sb.realtime.transcribe({ language: 'en-US' });
 
    session.on('partial', (text) => setInterim(text));
    session.on('final', (seg) => {
      setTranscript((prev) => prev + seg.text + ' ');
      setInterim('');
    });
 
    sessionRef.current = session;
 
    // Start microphone capture (simplified)
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const ctx = new AudioContext({ sampleRate: 16000 });
    const src = ctx.createMediaStreamSource(stream);
    const proc = ctx.createScriptProcessor(4096, 1, 1);
 
    proc.onaudioprocess = (e) => {
      if (session.state !== 'active') return;
      const f32 = e.inputBuffer.getChannelData(0);
      const i16 = new Int16Array(f32.length);
      for (let i = 0; i < f32.length; i++) {
        i16[i] = Math.max(-32768, Math.min(32767, Math.round(f32[i] * 32767)));
      }
      session.sendAudio(i16.buffer);
    };
 
    src.connect(proc);
    proc.connect(ctx.destination);
    cleanupRef.current = () => {
      proc.disconnect(); src.disconnect(); ctx.close();
      stream.getTracks().forEach((t) => t.stop());
    };
 
    setIsRecording(true);
  }, [sb]);
 
  const stop = useCallback(async () => {
    // Release the microphone first, then close the session
    cleanupRef.current?.();
    cleanupRef.current = null;
    await sessionRef.current?.stop();
    sessionRef.current = null;
    setIsRecording(false);
    setInterim('');
  }, []);
 
  return (
    <div>
      <button onClick={isRecording ? stop : start}>
        {isRecording ? '⏹ Stop' : '🎙 Start'}
      </button>
      <div style={{ whiteSpace: 'pre-wrap' }}>
        {transcript}
        <span style={{ color: 'gray' }}>{interim}</span>
      </div>
    </div>
  );
}

Token Lifecycle

When using the getRealtimeToken callback, the SDK handles token lifecycle automatically:

First connect — SDK calls your callback to get a token
During session — SDK tracks the expiresAt timestamp
Before expiry — SDK calls your callback again ~60s before the token expires
Seamless refresh — active sessions continue uninterrupted

You don't need to write any token refresh logic. If you need manual control, you can still pass a temporary token yourself using the static apiKey: 'sb_rt_...' pattern instead.
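
Because refresh timing depends on expiresAt, a malformed response from your token endpoint is worth catching early inside your callback. The sketch below is one hedged way to do that. isRealtimeToken and RealtimeTokenResponse are hypothetical names, and the check assumes expiresAt is a date string that Date.parse can read, which this guide implies but does not confirm.

```typescript
interface RealtimeTokenResponse {
  token: string;
  expiresAt: string;
}

// Hypothetical runtime check for the JSON returned by your token endpoint.
// Assumes expiresAt is a parseable date string (an assumption, see above).
function isRealtimeToken(value: unknown): value is RealtimeTokenResponse {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const token = candidate['token'];
  const expiresAt = candidate['expiresAt'];
  return (
    typeof token === 'string' &&
    token.length > 0 &&
    typeof expiresAt === 'string' &&
    !Number.isNaN(Date.parse(expiresAt))
  );
}

// Possible use inside getRealtimeToken:
//   const body: unknown = await response.json();
//   if (!isRealtimeToken(body)) throw new Error('Malformed token response');
//   return body;
```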

Troubleshooting

Issue	Solution
No audio data received	Check that microphone permission for `getUserMedia` has been granted and that the `AudioContext` sample rate is 16000 Hz
WebSocket closes immediately	Verify the token is valid and hasn't expired. Check CORS configuration.
Transcript is garbled	Ensure audio is PCM 16-bit, 16kHz, mono. Float32 audio will not work.
"Session not active" errors	Wait for the `started` event before calling `sendAudio()`
High latency	Use a buffer size of 4096 or smaller in `createScriptProcessor()` (4096 samples is 256 ms of audio at 16 kHz)
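
For the "Session not active" case, an alternative to dropping early audio is to queue chunks until the session reports that it has started. The sketch below relies only on what this guide shows: a state property, a started event, and sendAudio(). SessionLike and createBufferedSender are illustrative names, the handler signature for started is an assumption, and the queue limit is arbitrary.

```typescript
// Minimal shape of the session used here; the real SDK type has more members.
interface SessionLike {
  state: string;
  on(event: 'started', handler: () => void): void;
  sendAudio(chunk: ArrayBuffer): void;
}

// Returns a send function that queues chunks until the session has started,
// then flushes the queue in order. Chunks beyond the limit are dropped.
function createBufferedSender(
  session: SessionLike,
  maxQueuedChunks = 50,
): (chunk: ArrayBuffer) => void {
  const queue: ArrayBuffer[] = [];

  session.on('started', () => {
    // splice(0) empties the queue and returns its previous contents
    for (const chunk of queue.splice(0)) {
      session.sendAudio(chunk);
    }
  });

  return (chunk) => {
    if (session.state === 'active') {
      session.sendAudio(chunk);
    } else if (queue.length < maxQueuedChunks) {
      queue.push(chunk);
    }
  };
}
```

At 256 ms per 4096-sample chunk, a limit of 50 chunks would hold roughly 12.8 seconds of audio, which bounds memory use if the session never starts.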
