Architecture & data flow
This page documents Pensum’s architecture: which code runs on your machine, what runs on Pensum’s servers, and exactly what data crosses the network and when. If you’re evaluating Pensum for privacy, security, or compliance reasons, it is meant to answer the questions you’d ask.
It covers the public-facing architecture only. Internal implementation details (database schema, deployment pipeline, admin tooling) are omitted because they’re not user-facing and don’t bear on the trust questions.
The short version
Pensum is mostly software that runs on your own computer, plus a small cloud backend that exists almost entirely to validate licenses. Your vault content (tasks, notes, projects, meeting transcripts) lives on your machine and is never stored on our servers. Data leaves your device only when you explicitly use a feature that requires it (license validation, AI features, transcription).
System overview
The whole picture in one diagram. Every component, every network edge.
```mermaid
flowchart TB
    classDef user fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    classDef cf fill:#fed7aa,stroke:#f97316,color:#7c2d12
    classDef third fill:#e9d5ff,stroke:#a855f7,color:#581c87
    classDef vault fill:#d1fae5,stroke:#10b981,color:#064e3b
    subgraph User["Your machine"]
        Plugin["Pensum Plugin"]:::user
        CLI["Pensum CLI"]:::user
        MCP["Pensum MCP"]:::user
        Vault[("Your vault")]:::vault
    end
    subgraph CF["Pensum on Cloudflare"]
        API["api.pensum.dev<br/>Worker"]:::cf
        Backend["D1 · R2 ·<br/>Analytics Engine"]:::cf
    end
    subgraph Third["Third parties"]
        AI["OpenAI / Anthropic"]:::third
        DG["Deepgram"]:::third
        MoR["Polar.sh"]:::third
    end
    Plugin --> Vault
    CLI --> Vault
    MCP --> Vault
    API --- Backend
    Plugin -- "license<br/>+ managed AI" --> API
    Plugin -. "BYO direct" .-> AI
    Plugin -. "BYO direct" .-> DG
    API --> AI
    API --> DG
    MoR -- "webhooks" --> API
```
Three Pensum components (the plugin, the CLI, and Pensum MCP) run on your machine alongside your vault; the cloud side is small. The backend exists primarily to validate licenses, and everything else is opt-in based on your plan.
Legend:
- Blue boxes run on your machine
- Orange boxes are Pensum’s cloud infrastructure on Cloudflare
- Purple boxes are third-party services
- Green is your vault — Pensum reads and writes it, but it stays on your machine
- Solid arrows that touch an orange box are network calls Pensum’s servers participate in; solid arrows into the vault are local reads and writes
- Dotted arrows are direct calls from your device that bypass Pensum entirely (BYO mode)
Where your data actually lives
A different cut of the same picture: not by component, but by data category. What’s on your machine, what’s on our servers, and what’s on third parties.
```mermaid
flowchart TB
    classDef yours fill:#d1fae5,stroke:#10b981,color:#064e3b
    classDef ours fill:#fed7aa,stroke:#f97316,color:#7c2d12
    classDef third fill:#e9d5ff,stroke:#a855f7,color:#581c87
    classDef nope fill:#fef3c7,stroke:#eab308,color:#713f12
    subgraph On["On your machine"]
        V["Your vault<br/>tasks, projects, notes, transcripts"]:::yours
        S["Pensum's index<br/>.pensum/index.json"]:::yours
        K["AI provider keys<br/>OS keychain via SecretStorage"]:::yours
        A["Meeting audio files"]:::yours
    end
    subgraph CF["On Pensum's servers"]
        CFnote[/"No vault content.<br/>No audio. No transcripts.<br/>No file names."/]:::nope
        Acc["Account email"]:::ours
        Sub["Subscription state"]:::ours
        Lic["License key + device IDs"]:::ours
        Use["Usage metadata<br/>call count, duration, timestamp"]:::ours
    end
    subgraph Third["On third-party services"]
        MOR["Merchant of record:<br/>payment information"]:::third
        Prov["AI / transcription providers:<br/>request content during call"]:::third
    end
```
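The yellow note is load-bearing: our servers retain no vault content. In managed mode, request content passes through our worker on its way to the provider, and long audio is staged briefly in R2 as described in the long-audio section below. AI providers see request content for the duration of the call they’re processing, but our servers don’t keep it.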
License validation flow
When you have a Pro subscription, your plugin checks in with our license server approximately every 7 days to confirm your license is still active. This is the only “phone home” behavior in the plugin.
```mermaid
sequenceDiagram
    autonumber
    box rgb(219, 234, 254) Your machine
        participant P as Plugin
    end
    box rgb(254, 215, 170) Pensum (Cloudflare)
        participant W as Worker
        participant D as D1 database
    end
    P->>W: POST /v1/validate<br/>{ license_key, device_id }
    W->>D: Look up license + plan + features<br/>(indexed read, ~10ms)
    D-->>W: License record
    Note over W: Verify status,<br/>bind device if new,<br/>sign fresh JWT
    W-->>P: { valid: true, jwt, features }
    Note over P: Verify JWT signature locally<br/>using embedded public key<br/>(no further server calls<br/>until next 7-day refresh)
```
What’s sent in the request: your license key and a randomly generated device ID (a UUID created the first time you activate Pro). What’s not sent: your vault content, file names, task descriptions, settings, or anything about how you use the plugin.
If our license server is unavailable, the plugin continues to work for up to 30 days using the cached JWT (the “offline grace period”).
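The timing rules above can be sketched in a few lines. This is an illustrative Python sketch, not the plugin's actual code: the function names and state labels are hypothetical, and it assumes both the 7-day refresh and the 30-day grace period are measured from the last successful validation, which the text does not state.

```python
import uuid
from datetime import datetime, timedelta, timezone

REFRESH_INTERVAL = timedelta(days=7)   # "approximately every 7 days"
OFFLINE_GRACE = timedelta(days=30)     # the "offline grace period"


def new_device_id() -> str:
    """Random device ID, created once when Pro is first activated."""
    return str(uuid.uuid4())


def validation_request(license_key: str, device_id: str) -> dict:
    """Body for POST /v1/validate: only these two fields are sent."""
    return {"license_key": license_key, "device_id": device_id}


def license_state(last_validated: datetime, now: datetime) -> str:
    """Classify the cached license by the age of the last successful check.

    Assumption: both windows start at the last successful validation.
    """
    age = now - last_validated
    if age < REFRESH_INTERVAL:
        return "fresh"        # cached JWT is used, no network call
    if age <= OFFLINE_GRACE:
        return "refresh_due"  # try the server; keep working if it is down
    return "expired"          # grace period exhausted


if __name__ == "__main__":
    checked = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for days in (3, 10, 31):
        print(days, license_state(checked, checked + timedelta(days=days)))
```

The point of the sketch is that the request body has exactly two fields, and that a server outage only matters once the cached validation is older than the grace window.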
Transcription: managed mode, short audio
When you’re on the Pro All-in-One plan and record a meeting under 20 minutes, your audio is sent to our worker, which streams it directly to Deepgram and returns the transcript. Nothing is written to disk on our side.
```mermaid
sequenceDiagram
    autonumber
    box rgb(219, 234, 254) Your machine
        participant P as Plugin
    end
    box rgb(254, 215, 170) Pensum (Cloudflare)
        participant W as Worker
    end
    box rgb(233, 213, 255) Third party
        participant D as Deepgram
    end
    P->>W: POST /v1/transcribe<br/>(audio bytes, < 20 min)
    Note over W: Verify license,<br/>check monthly quota
    W->>D: Forward audio bytes
    Note over D: Transcribe + diarize<br/>(speaker labels)
    D-->>W: Transcript JSON
    Note over W: Log usage event:<br/>account, duration, timestamp.<br/>No content stored.
    W-->>P: Transcript
    Note over P: Render into vault.<br/>Plugin owns the transcript.
```
The worker holds the audio in memory only for the duration of the request. No R2 bucket, no D1 row referencing content, no caching of transcripts. After the response is returned to your plugin, the audio is gone from our infrastructure.
Transcription: managed mode, long audio
For audio over 20 minutes, the synchronous path doesn’t work because of Cloudflare Worker request limits. We use an asynchronous flow with R2 storage as a staging area, but we delete both the audio and transcript immediately once your plugin confirms receipt.
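The choice between the two managed paths comes down to one threshold. Here is a minimal Python sketch: the endpoint paths are taken from the sequence diagrams on this page, while the function name and the handling of a recording of exactly 20 minutes (sent down the asynchronous path here) are assumptions.

```python
SYNC_LIMIT_SECONDS = 20 * 60  # the 20-minute threshold from the text


def transcription_route(duration_seconds: float) -> str:
    """Pick the first endpoint to call for a managed-mode recording.

    Assumption: a recording of exactly 20 minutes takes the async path;
    the page only says "under" and "over" 20 minutes.
    """
    if duration_seconds < SYNC_LIMIT_SECONDS:
        return "/v1/transcribe"        # synchronous, in-memory path
    return "/v1/transcribe/start"      # asynchronous, R2-staged path


if __name__ == "__main__":
    for minutes in (5, 20, 60):
        print(minutes, transcription_route(minutes * 60))
```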
```mermaid
sequenceDiagram
    autonumber
    box rgb(219, 234, 254) Your machine
        participant P as Plugin
    end
    box rgb(254, 215, 170) Pensum (Cloudflare)
        participant W as Worker
        participant R as R2 bucket<br/>(1h TTL backstop)
    end
    box rgb(233, 213, 255) Third party
        participant D as Deepgram
    end
    P->>W: POST /v1/transcribe/start
    Note over W: Verify, check quota,<br/>create job row (metadata only)
    W-->>P: { job_id, upload_url }
    P->>R: PUT audio (signed URL)
    P->>W: POST /v1/transcribe/{id}/submit
    W->>D: Submit with R2 URL + webhook
    D->>R: Fetch audio
    Note over D: Process asynchronously<br/>(typically 1-3 min for 1h audio)
    D->>W: POST /v1/webhooks/deepgram<br/>(transcript)
    Note over W: Verify signature,<br/>log usage event
    W-->>P: Push transcript<br/>(SSE or short-lived fetch URL)
    P->>W: Confirm receipt
    Note over W,R: Delete audio + transcript<br/>from R2 immediately
```
Two retention windows worth understanding:
- Audio in R2: From upload until Deepgram fetches it, plus a 1-hour TTL backstop in case the plugin disconnects. Under normal flow: minutes, not hours.
- Transcript: From the moment Deepgram returns it until your plugin confirms receipt. Under normal flow: seconds.
After deletion, the only record on our side is a metadata row: (job_id, user_id, status=completed, duration_seconds, timestamp). No content references.
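To make the retention rules concrete, here is a small Python sketch of the metadata row and of the latest moment staged audio can still exist. The field names follow the tuple above; the class and function names are hypothetical, and the sketch assumes the 1-hour TTL is counted from the upload time, which the page does not spell out.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

AUDIO_TTL_BACKSTOP = timedelta(hours=1)  # "1h TTL backstop" on the R2 bucket


@dataclass
class JobRow:
    """Metadata-only record kept after deletion; no content references."""
    job_id: str
    user_id: str
    status: str
    duration_seconds: int
    timestamp: datetime


def audio_delete_deadline(uploaded_at: datetime,
                          confirmed_at: Optional[datetime]) -> datetime:
    """Latest time the staged audio may still be in R2.

    Deletion happens when the plugin confirms receipt, or when the TTL
    backstop fires, whichever comes first.
    Assumption: the TTL clock starts at upload.
    """
    backstop = uploaded_at + AUDIO_TTL_BACKSTOP
    if confirmed_at is None:  # plugin disconnected and never confirmed
        return backstop
    return min(confirmed_at, backstop)


if __name__ == "__main__":
    uploaded = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    row = JobRow("job-1", "user-1", "completed", 3600, uploaded)
    print(sorted(asdict(row)))
    print(audio_delete_deadline(uploaded, uploaded + timedelta(minutes=4)))
```

In the normal flow the confirmation arrives within minutes, so the backstop only matters when the plugin goes away mid-job.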
What gets logged
We log server-side events for billing, abuse detection, and product analytics. These are stored in our analytics layer and inform our admin dashboard. We do not run analytics inside the plugin and do not embed analytics SDKs in your Obsidian install.
| Event | When | What’s recorded |
|---|---|---|
| `license_validated` | Plugin checks in (~weekly) | Your account, device ID, plan, success/failure |
| `subscription_created` | Payment webhook fires | Account, plan, timestamp |
| `subscription_canceled` | Payment webhook fires | Account, plan, total months active |
| `smart_capture_call` | Managed text AI request | Account, model, latency, success |
| `transcription_completed` | Deepgram webhook returns | Account, duration in seconds, timestamp |
| `quota_exceeded` | You hit a daily cap | Account, feature, attempted action |
What’s not logged: prompt text, task descriptions, transcript content, audio, file paths, settings values, or anything about how you use Pensum locally.
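One way to enforce a metadata-only policy like this is a per-event allowlist, so that a content field cannot be logged by accident. The Python sketch below is illustrative only: the event names come from the table above, but the field names, the `build_event` helper, and the allowlist approach itself are assumptions, not a description of Pensum's server code.

```python
# Fields each event may carry, modelled on the table above.
# Field names are hypothetical; the real schema is not documented here.
ALLOWED_FIELDS = {
    "license_validated": {"account", "device_id", "plan", "success"},
    "smart_capture_call": {"account", "model", "latency_ms", "success"},
    "transcription_completed": {"account", "duration_seconds", "timestamp"},
    "quota_exceeded": {"account", "feature", "attempted_action"},
}


def build_event(name: str, **fields) -> dict:
    """Build a log event, rejecting any field outside the allowlist.

    Unknown event names raise KeyError; disallowed fields raise ValueError.
    """
    allowed = ALLOWED_FIELDS[name]
    extra = set(fields) - allowed
    if extra:
        raise ValueError(f"{name}: fields not allowed: {sorted(extra)}")
    return {"event": name, **fields}


if __name__ == "__main__":
    print(build_event("transcription_completed",
                      account="acct-1", duration_seconds=95,
                      timestamp="2025-01-01T00:00:00Z"))
    try:
        build_event("smart_capture_call", account="acct-1", prompt="text")
    except ValueError as err:
        print("rejected:", err)
```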
The three privacy modes
Pensum supports three different privacy postures for AI features. Mode is plan-driven, not a per-feature toggle: your plan picks the mode, then a per-feature dropdown chooses which model handles each call within that mode.
| Mode | Plan | Audio / prompts |
|---|---|---|
| BYO key | Pro $9 | Goes directly from your machine to your chosen provider. Pensum’s servers are not involved. |
| Managed | Pro All-in-One $29 / Trial | Passes through our worker transiently. Not stored. Logged metadata only. |
| Local | Future, no committed date | Runs entirely on your machine via Whisper.cpp or similar. Nothing leaves your device. |
Pro All-in-One users who want to override a specific feature with their own key can still configure a BYO key for the relevant provider — the per-feature model dropdown then routes that feature through BYO while the rest stay managed. Pensum never silently swaps modes; any fallback is initiated by you from a failure-notice action link.
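The routing rule described above (the plan picks the mode, a per-feature override can move one feature to BYO, and nothing falls back silently) can be sketched as follows. The plan identifiers, function name, and exception type are hypothetical; only the behaviour is taken from this section.

```python
from typing import Optional

# Plan -> default mode, following the table above. Plan keys are hypothetical.
PLAN_MODE = {
    "pro": "byo",
    "pro_all_in_one": "managed",
    "trial": "managed",
}


class ModeError(Exception):
    """Raised instead of silently switching modes."""


def resolve_route(plan: str,
                  feature_override: Optional[str] = None,
                  byo_key_configured: bool = False) -> str:
    """Return "byo" (direct to provider) or "managed" (via Pensum's worker)."""
    mode = PLAN_MODE[plan]
    if mode == "byo" or feature_override == "byo":
        if not byo_key_configured:
            # No silent fallback to managed: surface the failure to the user.
            raise ModeError("BYO selected but no provider key is configured")
        return "byo"
    return "managed"


if __name__ == "__main__":
    print(resolve_route("pro", byo_key_configured=True))
    print(resolve_route("pro_all_in_one"))
    print(resolve_route("pro_all_in_one", feature_override="byo",
                        byo_key_configured=True))
```

Raising an error when a BYO key is missing, rather than quietly routing through the managed path, mirrors the rule that any fallback is initiated by the user.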
Auditing this
If you want to verify any of the claims on this page:
- Plugin source code is public: github.com/craigmccaskill/pensum
- License server source code is in the same repo under `apps/license-server/`
- Network traffic from the plugin can be inspected with any HTTP proxy (Charles, mitmproxy)
- Privacy claims for third parties are linked from our Privacy Policy
- Specific concerns can be sent to support@pensum.dev, and we’ll get back to you
This page changes as the system evolves. Each material change is announced in our release notes.