Inside Heyyo: Architecture Deep Dive

This is a technical walkthrough of how Heyyo is put together. We’ll cover the stack choices (Node.js + TypeScript + Express + Socket.IO + PostgreSQL + Redis) and the system design decisions that make a multi-tenant chat product sane.

Contents

System overview
Why Socket.IO (vs raw WebSockets)
PostgreSQL for persistence
Redis for pub/sub + caching
Multi-tenant isolation (RLS + app scoping)
Webhooks: integration boundary
Rate limiting and abuse controls
Usage + billing (MAU)

System overview

At a high level, Heyyo is two planes:

Control plane (REST): create channels, list messages, manage uploads, configure webhooks.
Realtime plane (Socket.IO): low-latency events for message delivery, typing, and presence.

Architecture sketch:

Clients (web/mobile)
  |
  |---- REST (Express) ----------\
  |                               \
  \---- Socket.IO (realtime) ----> API Service (Node.js)
                                     |
                                     |-- Redis (pub/sub, cache)
                                     |
                                     |-- PostgreSQL (durable storage)
                                     |     - apps, users, channels, messages
                                     |     - row-level security (RLS)
                                     |
                                     |-- Webhooks worker (deliver events)
                                     |
                                     \-- Billing/usage (MAU-based)

Why Socket.IO (vs raw WebSockets)

Raw WebSockets are a great building block, but a chat product needs a lot of behavior on top:

Authentication and re-auth on reconnect
Reliable reconnect + backoff
Room semantics (channels, threads)
Ack/callback patterns (e.g. send message and receive an ID)
Cross-instance fanout when you run multiple servers

Socket.IO gives a pragmatic set of primitives for those problems. You still need good server-side validation and persistence, but Socket.IO reduces the amount of custom protocol surface area you have to maintain.
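To make the ack/callback pattern concrete, here is a minimal sketch of a handler that validates an incoming payload and replies through an acknowledgement callback with either a message ID or an error. It is written as a plain function so it runs without Socket.IO installed; in a Socket.IO server the same logic would sit inside a listener whose last argument is the ack callback. The payload fields, error codes, length limit, and the persist helper are all assumptions for illustration, not Heyyo's actual API.

```typescript
// Hypothetical payload and ack shapes; field names are illustrative only.
interface SendMessagePayload {
  channelId: string;
  text: string;
}

type SendMessageAck =
  | { ok: true; messageId: string }
  | { ok: false; error: string };

type AckFn = (response: SendMessageAck) => void;

// Stand-in for a database insert that returns the new row's ID.
type Persist = (channelId: string, text: string) => string;

const MAX_TEXT_LENGTH = 4000; // assumed limit

function isSendMessagePayload(value: unknown): value is SendMessagePayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.channelId === "string" && typeof v.text === "string";
}

// Validate first, persist second, acknowledge last.
function handleSendMessage(payload: unknown, ack: AckFn, persist: Persist): void {
  if (!isSendMessagePayload(payload)) {
    ack({ ok: false, error: "invalid_payload" });
    return;
  }
  const text = payload.text.trim();
  if (text.length === 0 || text.length > MAX_TEXT_LENGTH) {
    ack({ ok: false, error: "invalid_text" });
    return;
  }
  const messageId = persist(payload.channelId, text);
  ack({ ok: true, messageId });
}
```

The client only treats a message as sent once the ack arrives with an ID, which is what lets an SDK show pending and failed states without inventing its own request/response protocol.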

Event shape

Events are named for chat-level primitives (e.g. message.new) rather than transport-level ones. That keeps the client SDK ergonomic and the React package easy to wire up.
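One way to model events like this in TypeScript is a discriminated union keyed on the event name. The sketch below is illustrative: only message.new appears in this post, so the other event names and every payload field are hypothetical.

```typescript
// Only "message.new" is named in the text; the other events and all
// payload fields are hypothetical.
interface MessageNewEvent {
  type: "message.new";
  channelId: string;
  message: { id: string; userId: string; text: string };
}

interface TypingStartEvent {
  type: "typing.start";
  channelId: string;
  userId: string;
}

interface PresenceChangedEvent {
  type: "presence.changed";
  userId: string;
  online: boolean;
}

type ChatEvent = MessageNewEvent | TypingStartEvent | PresenceChangedEvent;

// Switching on `type` narrows the payload, and the `never` check makes the
// compiler flag any event type added later but not handled here.
function describeEvent(event: ChatEvent): string {
  switch (event.type) {
    case "message.new":
      return `${event.message.userId} posted in ${event.channelId}`;
    case "typing.start":
      return `${event.userId} is typing in ${event.channelId}`;
    case "presence.changed":
      return `${event.userId} is ${event.online ? "online" : "offline"}`;
    default: {
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

Because consumers switch on a domain-level name, nothing in client code has to know whether the event arrived over a WebSocket, long polling, or a test harness.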

PostgreSQL for persistence

Chat data is durable product data, not cache. PostgreSQL is the system of record for:

Apps and API credentials
Users and identity mappings
Channels and membership metadata
Messages, threads, reactions
Audit logs and webhook configuration

The critical design constraint is multi-tenancy. We model tenant ownership explicitly (e.g. app_id) and ensure that every query is app-scoped.
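A simple way to make app scoping hard to forget is to route reads through builders that require the tenant ID as an argument. The sketch below assumes a messages table with app_id, channel_id, body, and created_at columns; those names, the helper name, and the page-size cap are hypothetical. The returned object uses a text/values shape so it can be passed to a parameterized-query client.

```typescript
interface ScopedQuery {
  text: string;
  values: unknown[];
}

const DEFAULT_LIMIT = 50; // assumed page size
const MAX_LIMIT = 200;    // assumed upper bound

// Hypothetical helper: the tenant ID is a required argument, so an
// unscoped message query cannot be built by accident.
function listMessagesQuery(appId: string, channelId: string, limit: number = DEFAULT_LIMIT): ScopedQuery {
  if (!appId) throw new Error("appId is required for every query");
  const requested = Number.isFinite(limit) ? Math.floor(limit) : DEFAULT_LIMIT;
  const safeLimit = Math.min(Math.max(1, requested), MAX_LIMIT);
  return {
    // Values are passed as parameters, never interpolated into the SQL text.
    text:
      "SELECT id, user_id, body, created_at FROM messages " +
      "WHERE app_id = $1 AND channel_id = $2 " +
      "ORDER BY created_at DESC LIMIT $3",
    values: [appId, channelId, safeLimit],
  };
}
```

Leading with app_id in the WHERE clause also suggests the natural index shape for this access pattern, something like (app_id, channel_id, created_at), though the actual indexing choices are not covered in this post.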

Redis for pub/sub + caching

Redis plays two roles:

Pub/sub for realtime fanout when running multiple API instances. An event emitted on one node is published to Redis, and every other node delivers it to its own connected clients.
Caching for hot, bounded state: rate limit counters, session-ish ephemeral values, and cheap lookups.
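The fanout role can be sketched without a Redis server by substituting an in-memory bus. In the example below, FakeBus stands in for Redis pub/sub and ApiNode stands in for one API instance; both classes are hypothetical. With Socket.IO this wiring is normally handled by a Redis adapter rather than hand-written code.

```typescript
type Listener = (room: string, payload: string) => void;

// In-memory stand-in for Redis pub/sub so the sketch runs without a server.
class FakeBus {
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  publish(room: string, payload: string): void {
    for (const listener of this.listeners) listener(room, payload);
  }
}

// One API instance. It only knows about clients connected to itself.
class ApiNode {
  private readonly bus: FakeBus;
  // room -> clientId -> that client's received payloads
  private readonly rooms = new Map<string, Map<string, string[]>>();

  constructor(bus: FakeBus) {
    this.bus = bus;
    // Every node subscribes and delivers to its local clients only.
    bus.subscribe((room, payload) => this.deliverLocal(room, payload));
  }

  connect(room: string, clientId: string): string[] {
    const inbox: string[] = [];
    let members = this.rooms.get(room);
    if (!members) {
      members = new Map<string, string[]>();
      this.rooms.set(room, members);
    }
    members.set(clientId, inbox);
    return inbox;
  }

  // Emitting publishes to the bus instead of writing to sockets directly,
  // so clients on other nodes receive the event too.
  emit(room: string, payload: string): void {
    this.bus.publish(room, payload);
  }

  private deliverLocal(room: string, payload: string): void {
    const members = this.rooms.get(room);
    if (!members) return;
    members.forEach((inbox) => { inbox.push(payload); });
  }
}
```

The important property is that no node needs a global view of connections: each one tracks its own sockets and relies on the bus to reach everyone else.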

Why not store everything in Redis?

Redis is fantastic for ephemeral coordination, but it’s not our durability story. Messages live in Postgres so you can fetch history, build search, and enforce strong invariants.

Multi-tenant isolation (RLS + app scoping)

The fastest way to break trust is a tenant data leak. Heyyo uses “defense in depth”:

Application-level scoping: every request is associated with an app; queries filter by app_id.
Database-level enforcement: PostgreSQL Row Level Security (RLS) policies prevent cross-tenant access even if a bug slips into application logic.
JWT claims: tokens contain the app context and user identity (e.g. app_id, user_id) so authorization is always explicit.
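A common way to connect the application and database layers is to set the tenant ID as a transaction-local Postgres setting and have the RLS policy read it back. The sketch below shows that pattern under several assumptions: the setting name app.current_app_id, the policy name, the messages table, and a uuid app_id column are all hypothetical, and Queryable is a minimal interface that a Postgres client is expected to satisfy. It is not presented as Heyyo's actual schema.

```typescript
// Minimal client shape for this sketch (hypothetical interface).
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
}

// Assumed one-time DDL for a tenant-owned table. FORCE makes the policy
// apply to the table owner as well, who would otherwise bypass RLS.
const MESSAGES_RLS_DDL = `
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
ALTER TABLE messages FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON messages
  USING (app_id = current_setting('app.current_app_id')::uuid);
`;

// Runs `work` inside a transaction whose tenant context is fixed to appId.
async function withTenant<T>(
  client: Queryable,
  appId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // The third argument (is_local = true) limits the setting to this
    // transaction, so a pooled connection cannot carry one tenant's ID
    // into the next request.
    await client.query("SELECT set_config('app.current_app_id', $1, true)", [appId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

In this arrangement the appId passed to withTenant would come from the verified JWT claims, and application queries still filter by app_id. The policy is the backstop: a query that forgets the filter returns only the current tenant's rows instead of everyone's.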

Webhooks: integration boundary

Webhooks exist because chat is rarely an island. Most products need to integrate chat events into other systems:

Notify a support workflow when a message arrives in a “support” channel
Trigger moderation pipelines
Update analytics or data warehouses

The design goal is simple: webhook delivery should be reliable and non-blocking. Requests that create messages shouldn’t wait on external HTTP calls.
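One way to get both properties is to split delivery into an enqueue step on the request path and a retry loop in a separate worker. The sketch below models only the queue and the retry bookkeeping; the HTTP call itself is left to the worker. The class and function names, the attempt cap, and the backoff constants are assumptions, and a production queue would be persisted rather than held in memory.

```typescript
type JobStatus = "pending" | "delivered" | "dead";

interface WebhookJob {
  id: number;
  url: string;
  event: string;
  payload: string;       // serialized JSON body
  attempts: number;
  nextAttemptAt: number; // epoch milliseconds
  status: JobStatus;
}

const MAX_ATTEMPTS = 5;     // assumed
const BASE_DELAY_MS = 1000; // assumed

class WebhookQueue {
  private jobs: WebhookJob[] = [];
  private nextId = 1;

  // Called from the request path: records the job and returns at once,
  // with no network I/O.
  enqueue(url: string, event: string, data: unknown, now: number): WebhookJob {
    const job: WebhookJob = {
      id: this.nextId++,
      url,
      event,
      payload: JSON.stringify(data),
      attempts: 0,
      nextAttemptAt: now,
      status: "pending",
    };
    this.jobs.push(job);
    return job;
  }

  // Called by the worker: pending jobs whose retry time has arrived.
  due(now: number): WebhookJob[] {
    return this.jobs.filter((j) => j.status === "pending" && j.nextAttemptAt <= now);
  }
}

// The worker reports each attempt's outcome. Failures back off
// exponentially (1s, 2s, 4s, ...) until the attempt cap is reached.
function recordAttempt(job: WebhookJob, succeeded: boolean, now: number): void {
  job.attempts += 1;
  if (succeeded) {
    job.status = "delivered";
    return;
  }
  if (job.attempts >= MAX_ATTEMPTS) {
    job.status = "dead";
    return;
  }
  job.nextAttemptAt = now + BASE_DELAY_MS * Math.pow(2, job.attempts - 1);
}
```

A slow or failing customer endpoint then only delays its own jobs; message creation latency is unaffected because the request handler never calls out.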

Rate limiting and abuse controls

A chat API is a magnet for “unfriendly traffic.” Rate limiting is not just about performance; it’s also a security and cost-control boundary.

App-level limits: protect the platform from accidental loops and runaway clients.
User/IP limits: protect individual tenants from abuse.
Plan-aware enforcement: higher tiers can carry higher burst capacity.
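The three layers above can share one mechanism with different keys and budgets. Below is a minimal token bucket sketch in which the plan determines burst capacity and refill rate. The plan names and numbers are invented for illustration, and the in-memory Map stands in for the Redis counters mentioned earlier.

```typescript
type Plan = "free" | "pro" | "scale"; // hypothetical plan names

interface BucketConfig {
  capacity: number;        // maximum burst size
  refillPerSecond: number; // sustained rate
}

// Illustrative numbers only.
const PLAN_LIMITS: Record<Plan, BucketConfig> = {
  free: { capacity: 10, refillPerSecond: 1 },
  pro: { capacity: 50, refillPerSecond: 10 },
  scale: { capacity: 200, refillPerSecond: 50 },
};

interface BucketState {
  tokens: number;
  updatedAt: number; // epoch milliseconds
}

class RateLimiter {
  private buckets = new Map<string, BucketState>();

  // `key` selects the layer, e.g. "app:<appId>" or "user:<appId>:<userId>".
  allow(key: string, plan: Plan, nowMs: number): boolean {
    const cfg = PLAN_LIMITS[plan];
    const state = this.buckets.get(key) ?? { tokens: cfg.capacity, updatedAt: nowMs };

    // Refill in proportion to elapsed time, capped at the plan's capacity.
    const elapsedSec = Math.max(0, nowMs - state.updatedAt) / 1000;
    const tokens = Math.min(cfg.capacity, state.tokens + elapsedSec * cfg.refillPerSecond);

    const allowed = tokens >= 1;
    this.buckets.set(key, { tokens: allowed ? tokens - 1 : tokens, updatedAt: nowMs });
    return allowed;
  }
}
```

A token bucket fits chat traffic reasonably well because it tolerates short bursts, such as a user pasting several messages, while still bounding the sustained rate.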

Usage + billing (MAU)

Heyyo is priced by monthly active users (MAU) because that aligns with how chat creates value: the value is directly tied to the number of end users you serve.

Internally, that means we track “active” users per app over a billing period and enforce plan limits in a few strategic places (e.g. token minting / auth, and connection initialization).
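As a rough sketch of that bookkeeping, the tracker below counts distinct users per app per period and refuses to admit a new user once the plan's limit is reached. It assumes a calendar-month billing period in UTC and a hard cap, neither of which is stated in this post; the class and method names are hypothetical, and real counts would live in durable storage rather than in memory.

```typescript
class MauTracker {
  // "<appId>:<period>" -> set of user IDs seen in that period
  private active = new Map<string, Set<string>>();

  // Assumption: the billing period is a calendar month in UTC, e.g. "2024-01".
  private static periodOf(date: Date): string {
    const month = date.getUTCMonth() + 1;
    return `${date.getUTCFullYear()}-${month < 10 ? "0" : ""}${month}`;
  }

  // Call at an enforcement point such as token minting. Returns false when
  // admitting a not-yet-counted user would exceed the plan limit.
  admit(appId: string, userId: string, mauLimit: number, now: Date): boolean {
    const key = `${appId}:${MauTracker.periodOf(now)}`;
    let users = this.active.get(key);
    if (!users) {
      users = new Set<string>();
      this.active.set(key, users);
    }
    if (users.has(userId)) return true; // already counted this period
    if (users.size >= mauLimit) return false;
    users.add(userId);
    return true;
  }

  count(appId: string, now: Date): number {
    const users = this.active.get(`${appId}:${MauTracker.periodOf(now)}`);
    return users ? users.size : 0;
  }
}
```

Checking at token minting means a returning user is never locked out mid-period: once counted, they stay admitted, and only users who would push the app over its limit are refused.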

If you want more detail

The best next step is the API docs and quickstart. They show the concrete endpoints and event names used in the SDK and React package.

Docs → Quickstart 5-minute tutorial

We’ll continue to publish deeper implementation details (schema choices, indexing, retention, and search) as the platform evolves.