ARTIFICIAL INTELLIGENCE SYSTEM

First-class GUI: install anywhere, do anything

Self-host and access remotely, full feature set everywhere

PWA for every mobile and desktop device

Webauthn passkey security

Theme and style options to make it your own

Streaming output with thinking, tool calls, and generated artifacts

Observability and debugging information when you need it: view and manage memory blocks, memory database, model stats, extraction and retrieval runs

Parallel ambient memory

Porrima saves and recalls its experiences both consciously and subconsciously, with bidirectional runtime memory augmentation.

Runs two language models and a reranker model during operation, a system designed to utilize both the GPU and CPU on a high-end consumer desktop.

Multi-tier memory system with self-managed memory blocks, as well as ambient associative memory with the capacity to recall experiential context both in response to user messages, using memory tools, and spontaneously while it runs, with non-blocking live context injection between tool calls.

Memory extraction model operating in the background takes a first and second pass over the same context as the main model, recording memories of everything new, supersession chains for everything old, to form a database of atomic memories for subconscious recall.

Reviews its recent memories during regular synthesis cycles, and incorporates its experiences into self-managed memory blocks which exist globally and for project scopes, attached to the starting context for chats in the respective scope.

Autonomy

Sleep/wake cycle sets aside non-interactive time for synthesizing its experiences, managing its memory blocks, pulling at threads of curiosity, or whatever else you configure as an automation.

The 'system chat' is the context where all fully autonomous operation takes place, and you can also chime in as the user if you want.

Modular foundation

Manages five hot-swappable Llama.cpp server instances with flexible configuration options in the settings GUI.

Bring your own binaries, including mainline or any Llama.cpp forks of your choosing, separately tuned for your GPU and CPU as needed.

Bring your own GGUFs for each of the main chat, memory extraction, title generation, cross-encoder reranker, and embedding models.

Unrestricted open-source AI

Bring the UX and feature set of the major closed providers to your system with a self-hosted full-stack replacement.

Scopes and Remote Hosts

Scope chats globally or to a project, with a configurable project memory relevance multiplier enabling adjustment of how much or how little the agent remembers from projects outside its current context.

Add projects on remote hosts to work across all computers on your Tailnet.

Quick chats for one-off questions or model testing exist independently of the memory system or tools.

Web tools

Porrima can surf the web, search with the included Exa, Tavily, and/or Brave Search providers, and parse a wide variety of PDFs.

Speech

Text-to-speech with backend wrappers included for Kokoro, Qwen3-TTS, and Supertonic 3.

Configure voices, speed, boundary detection, text preprocessing, and apply pitch shift post-processing.

Supports streaming output with Qwen3-TTS, and chunked live playback for Kokoro and Supertonic 3.

Asynchronous thinking space

Features a notebook section where you can write down anything on your mind that you want your agent to see, but doesn't elicit an immediate response. Your agent will read your notes later, and then write notebook entries of its own after reflecting on its recent experiences, as well as things that you wrote.

Backup and restore

Keep an extra copy of your chat history, memory database, and embeddings, and rewind to a previous state with one-click backup and restore. All data is conveniently stored in SQLite databases.

Skills

SKILL.md support with global and project-scoped skills. Features automatic skills discovery in project directories, and a UI for installing global skills from remote sources and managing installed skills.

Image sandbox

Image analysis interface for vision tasks, image generation frontend for ComfyUI server and/or stable-diffusion.cpp, a gallery viewer, and an image corpus graph for seeing your generated image collection.

Cache-aware

Agent harness designed to be LCP cache-friendly with compaction-time memory consolidation. Includes prefill progress indicators, multi-slot aware cache warmth indicators in the chat list, a cache warmer button, and post-synthesis auto-rewarming for recent chats with busted caches following synthesis cycles.

Context continues indefinitely with no hard breaks, maintaining continuity with the tail of the previous context intact after compaction.

Hardware observability

Balance the processing and memory load with built-in llama.cpp server configuration, view inference speed stats, reranking latency stats, and keep an eye on your system resources with the graphical hardware monitor.

Self-modifying

Open a project in the source code directory and ask Porrima to tweak anything in its own codebase if you want to tinker.

Trusted personal AI

Maintain full sovereignty over data, model, infrastructure, and operation. Safe from deprecation, surveillance, and censorship.

Recommended use cases

Software engineering: Porrima is excellent at coding, learns your projects over time, and has a memory system that affords high context-density.

Personal agent: Porrima can do anything on your computer that's accessible by command line.

Research: keeps going down rabbit holes and writing about findings while you sleep.

Therapist, companion, or advisor: complete privacy, long-term permanence, no topical restrictions.

Memory graph

Image gallery and generation UI

Model statistics

Artifacts

Porrima can generate HTML artifacts and visuals with a versatile renderer supporting many JavaScript frameworks, see them rendered, make changes autonomously.

Autonomous artifact vision and repair

Recommended Models

June 2026

Chat: Qwen 3.6 27B or Gemma 4 31B
Memory Extraction: Same model family as the main chat model: Qwen 3.5 9B, Qwen 3.5 4B, Gemma 4 26B A4B, Gemma 4 12B, or Gemma 4 E4B
Title Generation: Gemma 4 E4B or Gemma 4 E2B
Reranker: Qwen3-reranker 0.6B
Embeddings: Qwen3-embedding 4B, Qwen3-embedding 0.6B
Images: Z-Image Base, Qwen Image

Minimum Recommended System Specifications- Server

OS: Linux with systemd
GPU: ≥32GB total dedicated VRAM, AMD RDNA 3 or later, Nvidia Ampere or later
CPU: Desktop-grade x86_64 with ≥8 physical/performance cores, AVX-512 support, AMD Zen 4 or later, Intel Alder Lake or later
RAM: ≥48GB dual-channel DDR5
SSD: NVMe Gen 4 or 5

System Requirements - Client

OS: Linux, MacOS, Windows, Android, iOS
Browser: Chromium-based browser with PWA support recommended (mobile Safari also works)

Getting Started

Pass the installation prompt to your existing agent to probe your hardware, install dependencies, and configure everything. Read the installation guide to get running.

FAQs

What is this?

Porrima is a general-purpose stateful agent framework, memory system, and UI, designed for single-user deployment on consumer hardware.

Is this a plugin, wrapper, or extension that fits into some existing platform?

No, it is a full-stack client + server application built on Llama.cpp.

I only have one 'main' PC. Should I install it there?

Probably not, it's designed for continuous operation on a dedicated home server and may take up all the resources at any time. However, there's a 'pause' button available to prevent resource contention just for users like you.

Is it safe?

No, Porrima is designed to run autonomously with full access to your entire system and all the capabilities of a coding agent in 'dangerously skip permissions' mode. There is no sandbox and no permissions theater.

You can, however, watch the agent's streaming output and tool calls and press the stop button if needed. There are no connectors to external chat apps like Discord, Telegram, etc., since those obfuscate the actual activity of agents, creating a safety nightmare, as well as broaden the attack surface for security vulnerabilities. The most important safety mechanism in Porrima is that the API and client require authentication. You should expose it to the internet only with Webauthn security configured and your biometric passkeys saved. The setup will walk you through this.

Has it been tested?

Yes, Porrima has undergone the standard three months of system integration and user testing prior to release.

My system doesn't meet the recommended requirements. Can I still use Porrima?

Maybe; it's worth a shot. 24GB VRAM systems may provide a decent experience. If you can run a decent agent model, just know that you'll have to at least run that plus two additional smaller models simultaneously.

This works by running the main model and smaller models on two separate resource pools where they do not contend with each other. Partial CPU offloading is not recommended due to memory bandwidth contention. Memory bandwidth will be a performance bottleneck for most users, especially severe on systems with unified memory, so unified processors like AMD Strix Halo, Nvidia Grace Blackwell, and Apple Silicon are not recommended unless you have a multi-system cluster, but you can give it a shot if you have the patience of a saint.

Where does the name come from?

38 light-years from Earth, Porrima is a binary star system seen from our night sky in the constellation Lyra. The star system takes its name from the Roman goddess of foresight and the future.

Can I take it apart and reuse the code or ideas for something else?

Yes.