
Architecture & data flow

This page documents Pensum’s full architecture: which code runs on your machine, what runs on Pensum’s servers, and exactly what data crosses the network and when. If you’re evaluating Pensum for privacy, security, or compliance reasons, this should answer every question you’d ask.

This page covers the public-facing architecture. Internal implementation details (database schema, deployment pipeline, admin tooling) aren’t covered here because they’re not user-facing and aren’t relevant to the trust questions.

Pensum is mostly software that runs on your own computer, plus a small cloud backend that exists almost entirely to validate licenses. Your vault content — tasks, notes, projects, meeting transcripts — lives on your machine and never touches our servers. The only time data leaves your device is when you explicitly use a feature that requires it (license validation, AI features, transcription).

The whole picture in one diagram. Every component, every network edge.

flowchart TB
    classDef user fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    classDef cf fill:#fed7aa,stroke:#f97316,color:#7c2d12
    classDef third fill:#e9d5ff,stroke:#a855f7,color:#581c87
    classDef vault fill:#d1fae5,stroke:#10b981,color:#064e3b

    subgraph User["Your machine"]
        Plugin["Pensum Plugin"]:::user
        CLI["Pensum CLI"]:::user
        MCP["Pensum MCP"]:::user
        Vault[("Your vault")]:::vault
    end

    subgraph CF["Pensum on Cloudflare"]
        API["api.pensum.dev<br/>Worker"]:::cf
        Backend["D1 · R2 ·<br/>Analytics Engine"]:::cf
    end

    subgraph Third["Third parties"]
        AI["OpenAI / Anthropic"]:::third
        DG["Deepgram"]:::third
        MoR["Polar.sh"]:::third
    end

    Plugin --> Vault
    CLI --> Vault
    MCP --> Vault
    API --- Backend

    Plugin -- "license<br/>+ managed AI" --> API
    Plugin -. "BYO direct" .-> AI
    Plugin -. "BYO direct" .-> DG
    API --> AI
    API --> DG
    MoR -- "webhooks" --> API

Three pieces of software run on your machine (the plugin, the CLI, and Pensum MCP), and all of them read and write the same local vault. The cloud backend exists primarily to validate licenses; anything else it does is opt-in based on your plan.

Legend:

  • Blue boxes run on your machine
  • Orange boxes are Pensum’s cloud infrastructure on Cloudflare
  • Purple boxes are third-party services
  • Green is your vault — Pensum reads and writes it, but it stays on your machine
  • Solid arrows are network calls Pensum’s servers participate in
  • Dotted arrows are direct calls from your device that bypass Pensum entirely (BYO mode)

A different cut of the same picture: not by component, but by data category. What’s on your machine, what’s on our servers, and what’s on third parties.

flowchart TB
    classDef yours fill:#d1fae5,stroke:#10b981,color:#064e3b
    classDef ours fill:#fed7aa,stroke:#f97316,color:#7c2d12
    classDef third fill:#e9d5ff,stroke:#a855f7,color:#581c87
    classDef nope fill:#fef3c7,stroke:#eab308,color:#713f12

    subgraph On["On your machine"]
        V["Your vault<br/>tasks, projects, notes, transcripts"]:::yours
        S["Pensum's index<br/>.pensum/index.json"]:::yours
        K["AI provider keys<br/>OS keychain via SecretStorage"]:::yours
        A["Meeting audio files"]:::yours
    end

    subgraph CF["On Pensum's servers"]
        CFnote[/"No vault content.<br/>No audio. No transcripts.<br/>No file names."/]:::nope
        Acc["Account email"]:::ours
        Sub["Subscription state"]:::ours
        Lic["License key + device IDs"]:::ours
        Use["Usage metadata<br/>call count, duration, timestamp"]:::ours
    end

    subgraph Third["On third-party services"]
        MOR["Merchant of record:<br/>payment information"]:::third
        Prov["AI / transcription providers:<br/>request content during call"]:::third
    end

The yellow note is load-bearing: we have no way to see your vault content because we never receive it. AI providers see request content for the duration of the call they’re processing, but our servers don’t store it.

When you have a Pro subscription, your plugin checks in with our license server approximately every 7 days to confirm your license is still active. This is the only “phone home” behavior in the plugin.

sequenceDiagram
    autonumber
    box rgb(219, 234, 254) Your machine
    participant P as Plugin
    end
    box rgb(254, 215, 170) Pensum (Cloudflare)
    participant W as Worker
    participant D as D1 database
    end

    P->>W: POST /v1/validate<br/>{ license_key, device_id }
    W->>D: Look up license + plan + features<br/>(indexed read, ~10ms)
    D-->>W: License record
    Note over W: Verify status,<br/>bind device if new,<br/>sign fresh JWT
    W-->>P: { valid: true, jwt, features }
    Note over P: Verify JWT signature locally<br/>using embedded public key<br/>(no further server calls<br/>until next 7-day refresh)

What’s sent in the request: your license key and a randomly generated device ID (a UUID created the first time you activate Pro). What’s not sent: your vault content, file names, task descriptions, settings, or anything about how you use the plugin.

If our license server is unavailable, the plugin continues to work for up to 30 days using the cached JWT (the “offline grace period”).
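The refresh cadence and the offline grace period can be sketched as a small decision function. This is an illustration only, not Pensum's actual code: `CachedLicense`, `licenseState`, and the field names are hypothetical, signature verification is omitted, and the sketch assumes the 30-day grace window is counted from the last successful validation, which this page does not specify.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not Pensum's real API.
const DAY_MS = 24 * 60 * 60 * 1000;
const REFRESH_INTERVAL_MS = 7 * DAY_MS; // "approximately every 7 days"
const OFFLINE_GRACE_MS = 30 * DAY_MS; // "up to 30 days" on the cached JWT

interface CachedLicense {
  jwt: string; // signed token returned by POST /v1/validate
  lastValidatedAt: number; // epoch ms of the last successful check-in
}

type LicenseState = "fresh" | "needs-refresh" | "expired";

// Decide what the plugin should do with its cached token at time `now`.
// Assumption: the grace period is counted from the last successful validation.
function licenseState(cached: CachedLicense, now: number): LicenseState {
  const age = now - cached.lastValidatedAt;
  if (age < REFRESH_INTERVAL_MS) return "fresh"; // no network call needed
  if (age < OFFLINE_GRACE_MS) return "needs-refresh"; // keep working, try to check in
  return "expired"; // grace period exhausted
}
```

In this sketch, a plugin that cannot reach the license server stays in `needs-refresh` and keeps working until the grace window runs out.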

When you’re on the Pro All-in-One plan and record a meeting under 20 minutes, your audio is sent to our worker, which streams it directly to Deepgram and returns the transcript. Nothing is written to disk on our side.

sequenceDiagram
    autonumber
    box rgb(219, 234, 254) Your machine
    participant P as Plugin
    end
    box rgb(254, 215, 170) Pensum (Cloudflare)
    participant W as Worker
    end
    box rgb(233, 213, 255) Third party
    participant D as Deepgram
    end

    P->>W: POST /v1/transcribe<br/>(audio bytes, < 20 min)
    Note over W: Verify license,<br/>check monthly quota
    W->>D: Forward audio bytes
    Note over D: Transcribe + diarize<br/>(speaker labels)
    D-->>W: Transcript JSON
    Note over W: Log usage event:<br/>account, duration, timestamp.<br/>No content stored.
    W-->>P: Transcript
    Note over P: Render into vault.<br/>Plugin owns the transcript.

The worker holds the audio in memory only for the duration of the request. No R2 bucket, no D1 row referencing content, no caching of transcripts. After the response is returned to your plugin, the audio is gone from our infrastructure.

For audio over 20 minutes, the synchronous path doesn’t work because of Cloudflare Worker request limits. Instead, we use an asynchronous flow with R2 storage as a staging area, and we delete both the audio and the transcript as soon as your plugin confirms receipt.

sequenceDiagram
    autonumber
    box rgb(219, 234, 254) Your machine
    participant P as Plugin
    end
    box rgb(254, 215, 170) Pensum (Cloudflare)
    participant W as Worker
    participant R as R2 bucket<br/>(1h TTL backstop)
    end
    box rgb(233, 213, 255) Third party
    participant D as Deepgram
    end

    P->>W: POST /v1/transcribe/start
    Note over W: Verify, check quota,<br/>create job row (metadata only)
    W-->>P: { job_id, upload_url }
    P->>R: PUT audio (signed URL)
    P->>W: POST /v1/transcribe/{id}/submit
    W->>D: Submit with R2 URL + webhook
    D->>R: Fetch audio
    Note over D: Process asynchronously<br/>(typically 1-3 min for 1h audio)
    D->>W: POST /v1/webhooks/deepgram<br/>(transcript)
    Note over W: Verify signature,<br/>log usage event
    W-->>P: Push transcript<br/>(SSE or short-lived fetch URL)
    P->>W: Confirm receipt
    Note over W,R: Delete audio + transcript<br/>from R2 immediately

Two retention windows worth understanding:

  • Audio in R2: From upload until Deepgram fetches it, plus a 1-hour TTL backstop in case the plugin disconnects. Under normal flow: minutes, not hours.
  • Transcript: From the moment Deepgram returns it until your plugin confirms receipt. Under normal flow: seconds.

After deletion, the only record on our side is a metadata row: (job_id, user_id, status=completed, duration_seconds, timestamp). No content references.
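As a rough sketch of the two ideas above, the snippet below picks a transcription path from the recording length and models the metadata row that remains after deletion. All names (`choosePath`, `JobRecord`, `completeJob`) are hypothetical. The page only says "under" and "over" 20 minutes, so treating a recording of exactly 20 minutes as asynchronous is an assumption, as is the `"pending"` status.

```typescript
// Hypothetical sketch: type and function names are assumptions for illustration.
const SYNC_LIMIT_SECONDS = 20 * 60; // the 20-minute threshold described above

type TranscriptionPath = "sync" | "async";

// Recordings under 20 minutes take the in-memory path; longer ones are staged in R2.
// Assumption: exactly 20 minutes falls on the async side.
function choosePath(durationSeconds: number): TranscriptionPath {
  return durationSeconds < SYNC_LIMIT_SECONDS ? "sync" : "async";
}

// The only record kept after deletion: metadata, with no content references.
interface JobRecord {
  jobId: string;
  userId: string;
  status: "pending" | "completed"; // only "completed" is named on this page
  durationSeconds: number;
  timestamp: number; // epoch ms
}

// Mark a job as done. Note there is no field that could hold audio or transcript text.
function completeJob(job: JobRecord, durationSeconds: number, now: number): JobRecord {
  return { ...job, status: "completed", durationSeconds, timestamp: now };
}
```

Keeping content out of the record type entirely means a row cannot reference audio or transcript data even by accident.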

We log server-side events for billing, abuse detection, and product analytics. These are stored in our analytics layer and feed our admin dashboard. We do not run analytics inside the plugin and do not embed analytics SDKs in your Obsidian installation.

| Event | When | What’s recorded |
| --- | --- | --- |
| license_validated | Plugin checks in (~weekly) | Your account, device ID, plan, success/failure |
| subscription_created | Payment webhook fires | Account, plan, timestamp |
| subscription_canceled | Payment webhook fires | Account, plan, total months active |
| smart_capture_call | Managed text AI request | Account, model, latency, success |
| transcription_completed | Deepgram webhook returns | Account, duration in seconds, timestamp |
| quota_exceeded | You hit a daily cap | Account, feature, attempted action |

What’s not logged: prompt text, task descriptions, transcript content, audio, file paths, settings values, or anything about how you use Pensum locally.
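One way to enforce a "metadata only" rule like this is an allowlist applied when the event is built, so content fields are dropped before anything is written. The sketch below is illustrative and not taken from Pensum's code: the event names come from the table above, while `buildUsageEvent`, `ALLOWED_FIELDS`, and the camel-case field names are assumptions.

```typescript
// Hypothetical sketch: event names mirror the table above; everything else is assumed.
type EventName =
  | "license_validated"
  | "subscription_created"
  | "subscription_canceled"
  | "smart_capture_call"
  | "transcription_completed"
  | "quota_exceeded";

// Metadata fields that may be recorded. Content fields (prompt text, transcript,
// audio, file paths) are deliberately absent from this list.
const ALLOWED_FIELDS = [
  "account", "deviceId", "plan", "success", "model", "latencyMs",
  "durationSeconds", "totalMonthsActive", "feature", "attemptedAction", "timestamp",
] as const;

type AllowedField = (typeof ALLOWED_FIELDS)[number];
type UsageEvent = { event: EventName } &
  Partial<Record<AllowedField, string | number | boolean>>;

// Copy only allowlisted scalar fields; anything else in `fields` is silently dropped.
function buildUsageEvent(event: EventName, fields: Record<string, unknown>): UsageEvent {
  const out: UsageEvent = { event };
  for (const key of ALLOWED_FIELDS) {
    const value = fields[key];
    if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      out[key] = value;
    }
  }
  return out;
}
```

For example, passing `{ account: "a1", prompt: "..." }` yields an event with `account` but no `prompt`, because `prompt` is not on the allowlist.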

Pensum supports three different privacy postures for AI features. Mode is plan-driven, not a per-feature toggle: your plan picks the mode, then a per-feature dropdown chooses which model handles each call within that mode.

| Mode | Plan | Audio / prompts |
| --- | --- | --- |
| BYO key | Pro $9 | Goes directly from your machine to your chosen provider. Pensum’s servers are not involved. |
| Managed | Pro All-in-One $29 / Trial | Passes through our worker transiently. Not stored. Logged metadata only. |
| Local | Future, no committed date | Runs entirely on your machine via Whisper.cpp or similar. Nothing leaves your device. |

Pro All-in-One users who want to override a specific feature with their own key can still configure a BYO key for the relevant provider — the per-feature model dropdown then routes that feature through BYO while the rest stay managed. Pensum never silently swaps modes; any fallback is initiated by you from a failure-notice action link.
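The routing rule described here (the plan picks the mode, and a per-feature override can move a single feature to BYO) can be sketched as follows. The plan identifiers, `FeatureSetting`, and `resolveMode` are hypothetical names chosen for illustration, and the future Local mode is left out.

```typescript
// Hypothetical sketch: plan identifiers and function names are assumptions.
type Plan = "pro" | "pro-all-in-one" | "trial";
type Mode = "byo" | "managed";
type Provider = "openai" | "anthropic" | "deepgram";

interface FeatureSetting {
  provider: Provider;
  useOwnKey: boolean; // per-feature override the user picked in the model dropdown
}

// The plan picks the default mode: Pro is BYO, Pro All-in-One and Trial are managed.
function defaultMode(plan: Plan): Mode {
  return plan === "pro" ? "byo" : "managed";
}

function resolveMode(
  plan: Plan,
  setting: FeatureSetting,
  byoKeys: ReadonlySet<Provider>,
): Mode {
  if (defaultMode(plan) === "byo") return "byo";
  // Managed plans route a feature through BYO only when the user asked for it
  // and a key for that provider is configured. Nothing switches silently.
  return setting.useOwnKey && byoKeys.has(setting.provider) ? "byo" : "managed";
}
```

Because the function has no automatic fallback branch, a failed managed call cannot quietly turn into a BYO call, or the reverse, without the user changing a setting.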

If you want to verify any of the claims on this page:

  • Plugin source code is public: github.com/craigmccaskill/pensum
  • License server source code is in the same repo under apps/license-server/
  • Network traffic from the plugin can be inspected with any HTTP proxy (Charles, mitmproxy)
  • Privacy claims for third parties are linked from our Privacy Policy
  • Specific concerns can be sent to support@pensum.dev — we’ll get back to you
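If you capture traffic with a proxy, a small helper like the one below can flag request fields that this page does not document. It is a sketch under assumptions: `unexpectedKeys` is a hypothetical helper, it only inspects top-level JSON keys, and the expected key list is taken from the `/v1/validate` request shown earlier on this page.

```typescript
// Hypothetical helper for checking a captured request body; not part of Pensum.
// Top-level keys documented on this page for POST /v1/validate.
const VALIDATE_KEYS: readonly string[] = ["license_key", "device_id"];

// Return any top-level keys in the captured JSON body that are not in `expected`.
function unexpectedKeys(capturedBody: string, expected: readonly string[]): string[] {
  const parsed: unknown = JSON.parse(capturedBody);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("expected a JSON object body");
  }
  return Object.keys(parsed).filter((key) => expected.indexOf(key) === -1);
}
```

An empty result means the captured body contains only the documented fields; any returned key is worth raising with support.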

This page changes as the system evolves. Each material change is announced in our release notes.