Scribeberry Docs

Browser Integration

Set up realtime transcription in a web browser using the getRealtimeToken callback and the Web Audio API.

This guide walks through a complete browser-side realtime transcription integration. The SDK manages token lifecycle automatically — you just provide a callback that fetches tokens from your server.

Architecture

┌──────────────────────────────────────────────────────┐
│                    Your Backend                      │
│                                                      │
│  POST /api/realtime-token                            │
│    → sb.realtime.createToken()                       │
│    ← { token: "sb_rt_...", expiresAt: "..." }        │
└──────────────────────┬───────────────────────────────┘
                       │ SDK calls your callback
                       │ automatically (on connect
                       │ and before token expiry)
                       ▼
┌──────────────────────────────────────────────────────┐
│                    Browser                           │
│                                                      │
│  new Scribeberry({                                   │
│    getRealtimeToken: () =>                           │
│      fetch('/api/realtime-token')                    │
│  })                                                  │
│                                                      │
│  sb.realtime.transcribe({ language: 'en-US' })       │
│  → SDK fetches token → connects WS → streams audio   │
│  → auto-refreshes token before expiry                │
└──────────────────────────────────────────────────────┘

Step 1: Server — Token Endpoint

Create an endpoint on your server that generates temporary tokens for authenticated users.

Express

import express from 'express';
import { Scribeberry } from '@scribeberry/sdk';
 
const app = express();
const sb = new Scribeberry({ apiKey: process.env.SCRIBEBERRY_API_KEY! });
 
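app.post('/api/realtime-token', async (req, res) => {
  // Verify your own user is authenticated first!
  try {
    const { token, wsUrl, expiresAt } = await sb.realtime.createToken({
      expiresInSeconds: 3600,
    });
    res.json({ token, wsUrl, expiresAt });
  } catch (err) {
    // Express 4 does not catch rejected promises from async handlers
    console.error('Failed to create realtime token:', err);
    res.status(500).json({ error: 'Failed to create realtime token' });
  }
});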

Next.js

import { Scribeberry } from '@scribeberry/sdk';
import { NextResponse } from 'next/server';
 
const sb = new Scribeberry({ apiKey: process.env.SCRIBEBERRY_API_KEY! });
 
export async function POST() {
  // Verify your own user is authenticated first!
  const { token, wsUrl, expiresAt } = await sb.realtime.createToken({
    expiresInSeconds: 3600,
  });
  return NextResponse.json({ token, wsUrl, expiresAt });
}

Fastify

import Fastify from 'fastify';
import { Scribeberry } from '@scribeberry/sdk';
 
const app = Fastify();
const sb = new Scribeberry({ apiKey: process.env.SCRIBEBERRY_API_KEY! });
 
app.post('/api/realtime-token', async (req, reply) => {
  // Verify your own user is authenticated first!
  const { token, wsUrl, expiresAt } = await sb.realtime.createToken({
    expiresInSeconds: 3600,
  });
  return { token, wsUrl, expiresAt };
});

⚠️ Warning: Always authenticate your own users before issuing Scribeberry tokens. Don't expose this endpoint without your own auth layer.
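
One way to act on this warning is to put a small guard in front of the token call. The following is a minimal, framework-agnostic sketch, not part of the Scribeberry SDK: extractBearerToken, issueTokenForRequest, verifyUser, and createToken are hypothetical names introduced for illustration. The verifier and the token factory are passed in so the sketch does not assume any particular auth library; in a real handler, createToken would wrap sb.realtime.createToken().

```typescript
// Hypothetical helper: reads the JWT from an "Authorization: Bearer <jwt>" header.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

type RealtimeToken = { token: string; expiresAt: string };

type TokenResult =
  | { status: 200; body: RealtimeToken }
  | { status: 401; body: { error: string } };

// Hypothetical guard: only calls createToken once verifyUser accepts the JWT.
// verifyUser stands in for your own auth check; createToken stands in for
// a wrapper around sb.realtime.createToken().
async function issueTokenForRequest(
  authorizationHeader: string | undefined,
  verifyUser: (jwt: string) => Promise<boolean>,
  createToken: () => Promise<RealtimeToken>,
): Promise<TokenResult> {
  const jwt = extractBearerToken(authorizationHeader);
  if (!jwt || !(await verifyUser(jwt))) {
    return { status: 401, body: { error: 'Unauthorized' } };
  }
  return { status: 200, body: await createToken() };
}
```

In any of the three handlers above, the returned status and body can be written straight to the response, so unauthenticated requests never reach the Scribeberry API.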

Step 2: Browser — Initialize SDK with Token Callback

Instead of manually managing tokens, provide a getRealtimeToken callback. The SDK calls it automatically when it needs a token (on connect and before expiry).

import { Scribeberry } from '@scribeberry/sdk';
 
const sb = new Scribeberry({
  baseUrl: 'https://sandbox.api.scribeberry.com',
  getRealtimeToken: async () => {
    const response = await fetch('/api/realtime-token', {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${yourUserJwt}` },
    });
 
    if (!response.ok) {
      throw new Error('Failed to get realtime token');
    }
 
    return response.json(); // must return { token, expiresAt }
  },
});

The SDK will:

  • Call your callback on first connect
  • Cache the token until it's close to expiry
  • Auto-refresh ~60 seconds before expiry (no interruption to active sessions)
  • Retry if refresh fails
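
Since the SDK schedules refreshes from the expiresAt value your callback returns, it can help to validate the server response before handing it back. The sketch below is one possible check, not SDK behavior: parseRealtimeToken is a hypothetical helper, and it assumes expiresAt is a date string that Date.parse understands (for example ISO 8601).

```typescript
type RealtimeTokenPayload = { token: string; expiresAt: string };

// Hypothetical validator for the JSON returned by your token endpoint.
// Throws a descriptive error instead of passing a malformed token to the SDK.
function parseRealtimeToken(
  json: unknown,
  now: number = Date.now(),
): RealtimeTokenPayload {
  if (typeof json !== 'object' || json === null) {
    throw new Error('Token response is not an object');
  }
  const { token, expiresAt } = json as Record<string, unknown>;
  if (typeof token !== 'string' || token.length === 0) {
    throw new Error('Token response is missing "token"');
  }
  if (typeof expiresAt !== 'string' || Number.isNaN(Date.parse(expiresAt))) {
    throw new Error('Token response has an invalid "expiresAt"');
  }
  if (Date.parse(expiresAt) <= now) {
    throw new Error('Token response is already expired');
  }
  // Extra fields (such as wsUrl) are dropped here; keep them if you need them.
  return { token, expiresAt };
}
```

Inside getRealtimeToken, the last line would then read `return parseRealtimeToken(await response.json());`.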

Step 3: Browser — Start Session

async function startTranscription() {
  // Start realtime session — token is fetched automatically
  const session = sb.realtime.transcribe({
    language: 'en-US',
    enableDiarization: true,
  });
 
  // Set up event handlers
  session.on('partial', (text) => {
    document.getElementById('interim')!.textContent = text;
  });
 
  session.on('final', (segment) => {
    const el = document.getElementById('transcript')!;
    el.textContent += segment.text + ' ';
    document.getElementById('interim')!.textContent = '';
  });
 
  session.on('error', (err) => {
    console.error('Transcription error:', err.message);
  });
 
  return session;
}
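
The handlers above follow a simple rule: a partial result replaces the interim text, while a final segment is appended to the transcript and clears the interim text. As a sketch, the same rule can be kept in a small state object that is independent of the DOM. TranscriptBuffer is a hypothetical helper, and it assumes only the payload shapes used above (a string for partial, an object with a text field for final).

```typescript
// Hypothetical helper that mirrors the partial/final handling from Step 3.
class TranscriptBuffer {
  private committed: string[] = [];
  private interim = '';

  // A partial result overwrites the previous interim text.
  onPartial(text: string): void {
    this.interim = text;
  }

  // A final segment is committed and the interim text is cleared.
  onFinal(segment: { text: string }): void {
    const text = segment.text.trim();
    if (text) this.committed.push(text);
    this.interim = '';
  }

  // Committed text only.
  get finalText(): string {
    return this.committed.join(' ');
  }

  // Committed text followed by the current interim text, if any.
  render(): string {
    return [this.finalText, this.interim].filter(Boolean).join(' ');
  }
}

// Wiring it to a session might look like:
//   const buffer = new TranscriptBuffer();
//   session.on('partial', (text) => buffer.onPartial(text));
//   session.on('final', (segment) => buffer.onFinal(segment));
```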

Step 4: Browser — Capture Microphone Audio

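Scribeberry expects 16-bit signed little-endian PCM audio at 16 kHz, mono. The example below captures microphone input with the Web Audio API and converts it to that format. It uses createScriptProcessor() for brevity; that API is deprecated in favor of AudioWorklet, although browsers still support it.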

import type { RealtimeTranscriptionSession } from '@scribeberry/sdk';
 
async function startAudioCapture(session: RealtimeTranscriptionSession) {
  // Request microphone access
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
      sampleRate: 16000,
      channelCount: 1,
      echoCancellation: true,
      noiseSuppression: true,
    },
  });
 
  // Create audio processing pipeline
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const source = audioContext.createMediaStreamSource(stream);
  const processor = audioContext.createScriptProcessor(4096, 1, 1);
 
  processor.onaudioprocess = (event) => {
    if (session.state !== 'active') return;
 
    // Convert Float32 samples to Int16 PCM
    const float32 = event.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      int16[i] = Math.max(
        -32768,
        Math.min(32767, Math.round(float32[i] * 32767)),
      );
    }
 
    session.sendAudio(int16.buffer);
  };
 
  source.connect(processor);
  processor.connect(audioContext.destination);
 
  // Return cleanup function
  return () => {
    processor.disconnect();
    source.disconnect();
    audioContext.close();
    stream.getTracks().forEach((track) => track.stop());
  };
}
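
Int16Array stores samples in the platform's native byte order, which is little-endian on the hardware browsers commonly run on. If you prefer to make the byte order explicit, the conversion can be written with a DataView instead. The sketch below is an alternative to the loop above, not something the SDK requires; encodePcm16LE and chunkDurationMs are hypothetical helper names.

```typescript
// Converts Float32 samples in [-1, 1] to 16-bit signed PCM,
// writing each sample explicitly as little-endian.
function encodePcm16LE(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const scaled = Math.round((samples[i] ?? 0) * 32767);
    // Clamp out-of-range input so it saturates instead of wrapping around.
    const clamped = Math.max(-32768, Math.min(32767, scaled));
    view.setInt16(i * 2, clamped, true); // true = little-endian
  }
  return buffer;
}

// Duration of one audio chunk in milliseconds.
function chunkDurationMs(frameCount: number, sampleRate: number): number {
  return (frameCount * 1000) / sampleRate;
}
```

With the settings used in this guide, chunkDurationMs(4096, 16000) is 256, so each onaudioprocess callback carries roughly a quarter of a second of audio.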

Step 5: Putting It All Together

let session: RealtimeTranscriptionSession | null = null;
let cleanup: (() => void) | null = null;
 
async function onStartClick() {
  session = await startTranscription();
  cleanup = await startAudioCapture(session);
  console.log('Recording started');
}
 
async function onStopClick() {
  // Stop microphone
  cleanup?.();
  cleanup = null;
 
  // Stop session and get results
  if (session) {
    const result = await session.stop();
    console.log('Final transcript:', result.transcript);
    console.log('Duration:', result.durationSeconds, 'seconds');
    session = null;
  }
}
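
In onStartClick, if startAudioCapture() throws (for example because the user denies microphone access), the session that was just opened is never stopped. A minimal sketch of one way to handle this is shown below. startWithRollback is a hypothetical helper; it assumes only that the session exposes the stop() method used above.

```typescript
type StoppableSession = { stop(): Promise<unknown> };

// Hypothetical helper: opens a session, starts capture, and closes the
// session again if capture could not be started.
async function startWithRollback<S extends StoppableSession>(
  openSession: () => S | Promise<S>,
  startCapture: (session: S) => Promise<() => void>,
): Promise<{ session: S; cleanup: () => void }> {
  const session = await openSession();
  try {
    const cleanup = await startCapture(session);
    return { session, cleanup };
  } catch (err) {
    // Capture failed: stop the session so it is not left open,
    // ignore errors from stop(), then surface the original error.
    await session.stop().catch(() => undefined);
    throw err;
  }
}
```

With the functions from Steps 3 and 4, the body of onStartClick could become `({ session, cleanup } = await startWithRollback(startTranscription, startAudioCapture));`.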

Complete React Example

import { useState, useRef, useCallback, useMemo } from 'react';
import { Scribeberry, RealtimeTranscriptionSession } from '@scribeberry/sdk';
 
export function TranscriptionWidget() {
  const [isRecording, setIsRecording] = useState(false);
  const [transcript, setTranscript] = useState('');
  const [interim, setInterim] = useState('');
  const sessionRef = useRef<RealtimeTranscriptionSession | null>(null);
  const cleanupRef = useRef<(() => void) | null>(null);
 
  // Create SDK once with token callback — tokens auto-refresh
  const sb = useMemo(() => new Scribeberry({
    getRealtimeToken: async () => {
      const res = await fetch('/api/realtime-token', { method: 'POST' });
      if (!res.ok) throw new Error('Failed to get realtime token');
      return res.json();
    },
  }), []);
 
  const start = useCallback(async () => {
    const session = sb.realtime.transcribe({ language: 'en-US' });
 
    session.on('partial', (text) => setInterim(text));
    session.on('final', (seg) => {
      setTranscript((prev) => prev + seg.text + ' ');
      setInterim('');
    });
 
    sessionRef.current = session;
 
    // Start microphone capture (simplified)
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const ctx = new AudioContext({ sampleRate: 16000 });
    const src = ctx.createMediaStreamSource(stream);
    const proc = ctx.createScriptProcessor(4096, 1, 1);
 
    proc.onaudioprocess = (e) => {
      if (session.state !== 'active') return;
      const f32 = e.inputBuffer.getChannelData(0);
      const i16 = new Int16Array(f32.length);
      for (let i = 0; i < f32.length; i++) {
        i16[i] = Math.max(-32768, Math.min(32767, Math.round(f32[i] * 32767)));
      }
      session.sendAudio(i16.buffer);
    };
 
    src.connect(proc);
    proc.connect(ctx.destination);
    cleanupRef.current = () => {
      proc.disconnect(); src.disconnect(); ctx.close();
      stream.getTracks().forEach((t) => t.stop());
    };
 
    setIsRecording(true);
  }, [sb]);
 
  const stop = useCallback(async () => {
    // Release the microphone first, then close the session
    cleanupRef.current?.();
    cleanupRef.current = null;
    await sessionRef.current?.stop();
    sessionRef.current = null;
    setIsRecording(false);
    setInterim('');
  }, []);
 
  return (
    <div>
      <button onClick={isRecording ? stop : start}>
        {isRecording ? '⏹ Stop' : '🎙 Start'}
      </button>
      <div style={{ whiteSpace: 'pre-wrap' }}>
        {transcript}
        <span style={{ color: 'gray' }}>{interim}</span>
      </div>
    </div>
  );
}

Token Lifecycle

When using the getRealtimeToken callback, the SDK handles token lifecycle automatically:

  1. First connect — SDK calls your callback to get a token
  2. During session — SDK tracks the expiresAt timestamp
  3. Before expiry — SDK calls your callback again ~60s before the token expires
  4. Seamless refresh — active sessions continue uninterrupted

You don't need to write any token refresh logic. If you need manual control, you can still use the static apiKey: 'sb_rt_...' pattern instead.
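
To make the timing concrete, the sketch below computes when a refresh would fall due for a given expiresAt. It only illustrates the arithmetic described above and is not the SDK's internal code: msUntilRefresh and REFRESH_LEAD_MS are hypothetical names, the 60-second lead is taken from the "~60s" figure in this section, and expiresAt is assumed to be a string that Date.parse understands.

```typescript
// Assumed lead time, based on the "~60s before expiry" behavior described above.
const REFRESH_LEAD_MS = 60000;

// Milliseconds from `now` until a refresh would be due.
// Returns 0 when the token is already inside the refresh window.
function msUntilRefresh(
  expiresAt: string,
  now: number = Date.now(),
  leadMs: number = REFRESH_LEAD_MS,
): number {
  const expiry = Date.parse(expiresAt);
  if (Number.isNaN(expiry)) {
    throw new Error(`Invalid expiresAt: ${expiresAt}`);
  }
  return Math.max(0, expiry - leadMs - now);
}
```

For a token created with expiresInSeconds: 3600, this gives 3,540,000 ms, so under these assumptions your callback would be invoked again about 59 minutes after the first call.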

Troubleshooting

| Issue | Solution |
| --- | --- |
| No audio data received | Check that getUserMedia permissions are granted and the AudioContext sample rate is 16000. |
| WebSocket closes immediately | Verify the token is valid and hasn't expired. Check CORS configuration. |
| Transcript is garbled | Ensure audio is PCM 16-bit, 16 kHz, mono. Float32 audio will not work. |
| "Session not active" errors | Wait for the started event before calling sendAudio(). |
| High latency | Use a buffer size of 4096 or smaller in createScriptProcessor(). |
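
For the "Session not active" case, one option is to wrap the started event in a promise and await it before wiring up audio capture. The sketch below is an assumption-laden illustration: waitForStarted is a hypothetical helper, and it assumes the session's on() method accepts 'started' and 'error' handlers in the same way as the 'partial' and 'final' handlers shown earlier.

```typescript
// Minimal structural type for what this helper needs from a session (assumed shape).
type StartableSession = {
  on(event: 'started' | 'error', handler: (payload?: unknown) => void): unknown;
};

// Hypothetical helper: resolves once the session reports 'started',
// rejects on 'error' or when the timeout elapses first.
function waitForStarted(session: StartableSession, timeoutMs = 10000): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Session did not start within ${timeoutMs} ms`)),
      timeoutMs,
    );
    session.on('started', () => {
      clearTimeout(timer);
      resolve();
    });
    session.on('error', (err) => {
      clearTimeout(timer);
      reject(err instanceof Error ? err : new Error(String(err)));
    });
  });
}
```

Calling `await waitForStarted(session)` between startTranscription() and startAudioCapture() would delay microphone capture until the session is ready. The session.state check in the audio callback remains useful as a second line of defense.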
