(04) — EMBEDDABLE AI CHAT / MULTI-TENANT SAAS WIDGET
A multi-tenant embeddable AI chat widget with a FastAPI backend, React admin dashboard, and serverless deployment. Each client gets their own widget instance, system prompt, styling tokens, and API keys — managed through a three-panel admin interface.
Most embeddable chat widgets are either configuration-less drop-ins with no backend control, or enterprise platforms with six-figure contracts. I wanted to build the middle ground: a lightweight widget that any site owner could embed with a single script tag, backed by a full multi-tenant platform for managing clients, conversations, and AI behavior.
The system is designed as a SaaS platform, not a single-use widget. Each client (organization) gets their own configured instance: a unique slug, API keys with per-key rate limits, a custom system prompt that controls the bot's personality, allowed CORS origins, and a full design token system for styling the widget to match their brand. Conversations are isolated per client. The admin dashboard manages everything through a three-panel interface — client list, configuration editor, and live widget preview.
The system is three services working in concert: a Web Component widget compiled to a standalone JS bundle via esbuild, a FastAPI backend serving the chat API and admin endpoints, and a React dashboard for client management. Each service is independently deployable — the widget ships as a CDN-hosted script, the backend runs on AWS Lambda via Mangum, and the dashboard is a static Vite build.
A. Dual authentication
Two different audiences need two different auth mechanisms. The widget authenticates via API key — sent as an X-API-Key header on every request, SHA-256 hashed server-side for lookup. I chose SHA-256 over bcrypt deliberately: bcrypt is intentionally slow (good for login protection), but per-request widget auth needs sub-millisecond verification. The dashboard uses JWT — bcrypt-hashed password at login, then HS256-signed tokens with 1-hour expiry for subsequent requests.
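The API-key half of this can be sketched in a few lines. This is a minimal illustration, not the project's code: the `KEY_TABLE` dictionary stands in for the database table, and `issue_api_key`, `authenticate`, and the `bc_` prefix are hypothetical names. It also assumes keys are generated randomly with high entropy, which is what makes a fast unsalted hash reasonable here (unlike for human-chosen passwords).

```python
import hashlib
import secrets
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the API-key table: digest -> client slug.
KEY_TABLE: Dict[str, str] = {}


def hash_api_key(raw_key: str) -> str:
    """Return the hex SHA-256 digest that is stored instead of the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def issue_api_key(client_slug: str) -> str:
    """Create a random key, store only its digest, and return the raw key once."""
    raw_key = "bc_" + secrets.token_urlsafe(32)  # "bc_" prefix is illustrative
    KEY_TABLE[hash_api_key(raw_key)] = client_slug
    return raw_key


def authenticate(raw_key: str) -> Optional[str]:
    """Resolve an X-API-Key header value to a client slug, or None if unknown."""
    return KEY_TABLE.get(hash_api_key(raw_key))
```

Because the digest is the lookup key, verification is one hash plus one indexed read, and the raw key never needs to be stored.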
B. Streaming
Chat responses stream token-by-token via Server-Sent Events. The backend receives a message, loads the conversation history (capped at 50 messages), and streams the Gemini response through an async generator that emits SSE events. Each event is typed: token, done, or error. The assistant message is persisted to the database only after the stream completes.
```python
import json

from fastapi.responses import StreamingResponse


async def stream_chat(message: str, session_id: str, client: Client, db: AsyncSession):
    """Stream AI response token-by-token via SSE."""
    conversation = await get_or_create_conversation(db, client.id, session_id)
    await save_message(db, conversation.id, "user", message)
    history = await get_history(db, conversation.id, limit=MAX_HISTORY)
    system_prompt = client.system_prompt  # never sent to frontend

    async def event_generator():
        full_response = ""
        async for chunk in gemini.stream_response(history, system_prompt):
            full_response += chunk
            yield f"event: token\ndata: {json.dumps({'content': chunk})}\n\n"
        # Persist the assistant message only after the stream completes.
        await save_message(db, conversation.id, "assistant", full_response)
        yield f"event: done\ndata: {json.dumps({'status': 'complete'})}\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```
C. Async-native
The backend is async throughout, not as an afterthought but as an architectural constraint. It uses SQLAlchemy async sessions created with async_sessionmaker, async generators for SSE streaming, and FastAPI's dependency injection for auth and database access. The Gemini client initializes lazily so the app starts without an API key in dev. Rate limiting uses a sliding window algorithm that tracks per-key request timestamps, with an abstraction layer that swaps between in-memory (single instance) and Redis (distributed) backends.
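A sliding window limiter with a swappable backend might look like the sketch below. The `RateLimiter` protocol, the `InMemorySlidingWindow` class, and the injectable `clock` parameter are assumptions made for illustration. The sketch is synchronous for brevity and shows only the in-memory backend; a Redis-backed class would implement the same `allow` method.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Protocol


class RateLimiter(Protocol):
    """Interface that both the in-memory and a distributed backend would satisfy."""

    def allow(self, key: str) -> bool: ...


class InMemorySlidingWindow:
    """Allow at most `limit` requests per key within the trailing window."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Passing the clock in as a parameter keeps the limiter deterministic under test, since a fake clock can be advanced without sleeping.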
The widget is a <chat-widget> custom HTML element that attaches a Shadow DOM root. Shadow DOM was non-negotiable — the widget embeds on third-party sites where host CSS could break the layout and widget styles could pollute the host page. The Shadow DOM boundary provides complete isolation. Styling is driven by CSS variables injected from the server's design token system.
The design token system is a structured Pydantic model with five token groups: brand (primary and secondary colors), typography (font family, sizes, weights), shape (border radius, width), layout (position, dimensions), and motion (animation style and speed). The admin dashboard edits these tokens through a visual interface with a live preview panel. On the widget side, tokens are converted to CSS custom properties and injected into the Shadow DOM at initialization.
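The token-to-CSS conversion can be illustrated with a short sketch. To stay dependency-free it uses standard-library dataclasses in place of the Pydantic model, shows only two of the five groups, and every field name and default value here is hypothetical. It also assumes token values have already been validated, since they are written into the stylesheet verbatim.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class BrandTokens:
    primary: str = "#2563eb"
    secondary: str = "#64748b"


@dataclass
class ShapeTokens:
    border_radius: str = "12px"
    border_width: str = "1px"


@dataclass
class DesignTokens:
    brand: BrandTokens = field(default_factory=BrandTokens)
    shape: ShapeTokens = field(default_factory=ShapeTokens)


def tokens_to_css(tokens: DesignTokens, selector: str = ":host") -> str:
    """Flatten each token group into --group-name custom properties."""
    lines = []
    for group, values in asdict(tokens).items():
        for name, value in values.items():
            lines.append(f"  --{group}-{name.replace('_', '-')}: {value};")
    return selector + " {\n" + "\n".join(lines) + "\n}"
```

Targeting `:host` scopes the variables to the custom element itself, so rules inside the shadow root can read them with `var(--brand-primary)` and similar.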
Conversations are scoped per browser tab. A UUID is generated and stored in sessionStorage — not localStorage — so each tab gets its own conversation thread. Closing the tab discards the session. This was a deliberate UX choice: returning visitors start fresh rather than resuming a potentially stale conversation from days ago.
The system prompt is injected server-side only. It never appears in any client-facing response or configuration payload. This closes off the simplest form of prompt extraction: a visitor can't inspect network traffic or DOM state to discover the bot's instructions.
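One way to enforce this is to build the widget's configuration payload from an explicit allowlist rather than by serializing the whole client record. The sketch below assumes a hypothetical `ClientRecord` shape and a hypothetical `public_widget_config` helper; the actual fields exposed to the widget are not specified here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ClientRecord:
    """Hypothetical server-side record; the real model likely has more fields."""

    slug: str
    system_prompt: str
    allowed_origins: List[str]
    design_tokens: Dict[str, Any] = field(default_factory=dict)


# Explicit allowlist: a new server-side field stays private unless added here.
PUBLIC_FIELDS = ("slug", "design_tokens")


def public_widget_config(client: ClientRecord) -> Dict[str, Any]:
    """Build the payload the widget receives, copying only allowlisted fields."""
    return {name: getattr(client, name) for name in PUBLIC_FIELDS}
```

An allowlist fails closed: forgetting to update it hides a field from the widget rather than leaking one.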
Streaming SSE over POST
The browser's EventSource API only supports GET requests. Chat messages need to be sent as POST with a JSON body, since you can't stuff a conversation payload into query parameters. I had to build a manual SSE parser in the widget that uses fetch() with a readable stream, splits the incoming bytes on double newlines, parses event types and data fields, and dispatches tokens to the DOM for the typing effect. Error handling had to account for partial chunks, network interruptions mid-stream, and the backend's error events.
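The widget's parser runs in the browser, but the buffering logic is language-independent. The sketch below shows it in Python for consistency with the backend listing. The `SSEParser` class is hypothetical, and it assumes events are separated by `\n\n` only, as the backend emits them, rather than handling every line-ending variant the SSE specification allows.

```python
from typing import List, Tuple


class SSEParser:
    """Incremental parser: feed raw byte chunks, get back complete (event, data) pairs."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> List[Tuple[str, str]]:
        self._buffer += chunk
        events: List[Tuple[str, str]] = []
        # A blank line terminates an event. Anything after the last terminator
        # is an incomplete event and stays buffered until more bytes arrive.
        while b"\n\n" in self._buffer:
            raw, self._buffer = self._buffer.split(b"\n\n", 1)
            event_type = "message"  # default type when no "event:" field is sent
            data_lines: List[str] = []
            for line in raw.decode("utf-8").split("\n"):
                if line.startswith("event:"):
                    event_type = line[len("event:"):].strip()
                elif line.startswith("data:"):
                    value = line[len("data:"):]
                    # Drop the single optional space after the colon.
                    data_lines.append(value[1:] if value.startswith(" ") else value)
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
        return events
```

Buffering raw bytes and decoding only complete events means a chunk boundary that falls inside a JSON payload, or inside a multi-byte UTF-8 character, does not corrupt the output.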
Widget CSS isolation
Shadow DOM solved the isolation problem, but introduced a new one: the design token system needed to inject styles into the shadow root without breaking the boundary. Global stylesheets don't penetrate Shadow DOM. I built a CSS generation layer that converts the Pydantic token model into a complete stylesheet string, injects it as a <style> element inside the shadow root, and re-renders when the admin updates tokens in real time. The live preview in the dashboard renders the actual widget component, not a mockup, so what the admin sees is exactly what deploys.
BubbleChat is a working side project that covers the full stack: a compiled Web Component, an async Python API, a React admin interface, and container-based deployment to AWS Lambda. The database layer supports both SQLite (dev) and PostgreSQL (production) through a single connection URL swap, with Alembic managing schema migrations.
The project was built to demonstrate end-to-end product engineering — not just a frontend or just an API, but a complete multi-tenant platform with auth, rate limiting, real-time streaming, and infrastructure-as-code. Every service is containerized, the backend is tested with pytest and async fixtures, and the deployment is defined in an AWS SAM template.