REDSTAR
Foundry

We build intelligent systems — from AI-driven game engines and autonomous character networks to LLM-powered compliance platforms for global banks.

View Our Work · Get In Touch

Portfolio

Two production systems. One shared obsession: building AI that works at the technical layer — not as a wrapper around an API, but as deeply integrated architecture where models, evaluation frameworks, and domain logic are inseparable.

Disendya
Multi-Agent AI Dungeon Master

AI-Driven RPG Engine

Cyberpunk × High Fantasy

We don't build NPCs that wait for players to show up. We build characters that live lives when you're not watching.

A multi-agent AI Dungeon Master that runs a procedurally generated cyberpunk-fantasy RPG for multiple human players. Nine specialized AI agents coordinate in real time — from narrative generation and tactical combat to NPC relationships and moral consequence tracking.

Features a canon story mode with intelligent divergence tracking, where the system gracefully adapts when players go off-script, alongside a fully procedural freeplay mode. AI companions have real personalities, opinions, and the ability to disagree.

LangGraph · Claude API · React / Next.js · WebSockets · FastAPI · Redis · PostgreSQL · LangSmith · React Native
9 AI Agents
1K+ Agentic Characters
5 Personality Axes
4 Memory Layers
Explore Project · Story & World
PATHLIGHTER
AI-Powered Risk Intelligence

Compliance Monitoring Platform

Financial Services × Generative AI

Monitoring, testing, issues, and reporting should be instant with AI-enabled help — not buried in six weeks of spreadsheets.

A full-stack AI-powered compliance monitoring and control testing platform. Global Systemically Important Banks run massive control inventories across dozens of regulatory domains — PathLighter automates the decision-making layer, from data ingestion through AI-assisted evaluation and workpaper generation.

The platform flags compliance exceptions, identifies design and operating effectiveness deficiencies, and surfaces control gaps that manual review would take weeks to find. An LLM-as-a-Judge evaluation engine scores controls across seven testability dimensions, while the monitoring module runs continuous surveillance with exception queues, health score gauges, and threshold alerting across 950+ controls spanning 10 regulatory archetypes.

Monitoring · Control Testing · Issue Intelligence · Reports & Analytics
React / Next.js · TypeScript · Python / FastAPI · PostgreSQL · Claude API · Tailwind CSS · LLM-as-a-Judge
950+ Controls Mapped
10 Archetypes
7 Test Dimensions
4 Modules
Explore Project

Capabilities

01

Multi-Agent AI Systems

Specialized agent architectures where each AI handles a distinct responsibility — orchestrated through graph-based workflows with full observability, tracing, and tiered processing that scales cost to complexity.

In Practice
  • LangGraph workflow orchestration with conditional routing
  • Tiered processing: frontier models for critical decisions, lightweight for routine
  • LLM-as-a-Judge evaluation across structured scoring dimensions
  • Full LangSmith observability across agent decision chains
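The orchestration pattern above can be sketched in plain Python, without the LangGraph dependency: nodes update shared state, and a conditional router decides which model tier handles the task. The node names, the length-based complexity heuristic, and the 0.5 threshold are illustrative stand-ins, not the production system's logic.

```python
# Toy state graph with conditional routing: each node is a function that
# mutates shared state; a router picks the next node based on that state.

def classify(state):
    # Cheap pre-pass: crude complexity estimate in [0.0, 1.0].
    state["complexity"] = min(len(state["task"]) / 200, 1.0)
    return state

def frontier_model(state):
    state["model"] = "frontier"      # top-tier model for critical decisions
    return state

def lightweight_model(state):
    state["model"] = "lightweight"   # cheaper model for routine work
    return state

def route(state):
    # Conditional edge: scale cost to complexity.
    return "frontier" if state["complexity"] > 0.5 else "lightweight"

NODES = {
    "classify": classify,
    "frontier": frontier_model,
    "lightweight": lightweight_model,
}

def run(task):
    state = {"task": task}
    state = NODES["classify"](state)
    state = NODES[route(state)](state)
    return state
```

In the real graph each node would call a model and emit traces; the shape — typed state, pure-ish node functions, routing as data — is what makes the workflow observable and testable.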
02

LLM Evaluation & Governance

Production-grade LLM evaluation frameworks that go beyond vibes — structured scoring rubrics, bias quantification, domain calibration, and human-in-the-loop governance for high-stakes AI outputs.

In Practice
  • Multi-dimension scoring with weighted aggregation
  • Bias detection: position bias, verbosity inflation, self-enhancement
  • Domain-calibrated rubrics for specialized language and context
  • Automated output generation with regulatory-ready formatting
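A minimal sketch of the scoring side, under assumed dimension names and weights: per-dimension judge scores are combined by weighted mean, and position bias is detected by judging both orderings of a pairwise comparison and checking whether the verdict flips.

```python
# LLM-as-a-Judge aggregation sketch. Dimensions, weights, and the 1-5
# scale are illustrative, not the platform's actual rubric.

WEIGHTS = {"clarity": 0.2, "testability": 0.5, "completeness": 0.3}

def aggregate(scores):
    # Weighted mean across structured scoring dimensions.
    return sum(WEIGHTS[dim] * score for dim, score in scores.items())

def position_bias(winner_ab, winner_ba):
    # Judge the pair in both presentation orders; a stable judge prefers
    # the same answer either way. A flipped verdict signals position bias.
    return winner_ab != winner_ba
```

Verbosity inflation and self-enhancement checks follow the same template: perturb the input (pad an answer, swap in the judge's own output) and measure how much the score moves when the substance has not changed.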
03

Persistent Memory Systems

AI that remembers — episodic, emotional, and semantic memory layers with compression strategies that keep context windows manageable while preserving meaningful history across extended runtime.

In Practice
  • Episodic memory with timestamps, participants, and emotional tags
  • Tiered decay: emotional memory persists longer than factual by design
  • Compression pipelines that distill old memories into semantic knowledge
  • Embedding-based retrieval surfaces relevant context on demand
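The tiered-decay idea can be shown in a few lines: every memory carries a timestamp, participants, and a tier tag, and salience decays exponentially with a per-tier half-life, so emotional memories outlive factual ones. The half-life values here are illustrative assumptions.

```python
# Tiered memory decay sketch: emotionally tagged memories lose salience
# more slowly than neutral factual ones. Half-lives are made up.
import time

HALF_LIFE_SECONDS = {"emotional": 30 * 86400, "factual": 3 * 86400}

def make_memory(text, participants, emotional=False, now=None):
    return {
        "text": text,
        "participants": participants,
        "kind": "emotional" if emotional else "factual",
        "created": time.time() if now is None else now,
    }

def salience(memory, now):
    # Exponential decay with a per-tier half-life.
    age = now - memory["created"]
    return 0.5 ** (age / HALF_LIFE_SECONDS[memory["kind"]])
```

A week after the event, a betrayal tagged emotional still scores high while the price of ammo has mostly faded — which is the behavior you want from a character, and it falls out of one decay constant per tier.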
04

LLM Production Architecture

Production-grade LLM applications with context window management, tool use, structured outputs, rolling memory, and cost optimization strategies that make complex AI systems economically viable.

In Practice
  • Claude API with structured outputs and tool-use patterns
  • Context window management with token-aware memory compression
  • Cost tiering: frontier models for critical decisions, lightweight for routine
  • Prompt engineering for personality and domain consistency across sessions
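One of those techniques, token-aware rolling memory, can be sketched as: keep the newest turns verbatim, and fold everything older into a running summary once a token budget is exceeded. The 4-characters-per-token estimate and the join-based summarizer are placeholders for a real tokenizer and a summarization model call.

```python
# Token-aware rolling memory sketch. The token estimate and summarizer
# are illustrative stand-ins, not production components.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude proxy for a real tokenizer

def compress(history, budget,
             summarize=lambda msgs: "Earlier: " + "; ".join(msgs)):
    kept, total = [], 0
    # Walk backwards so the newest turns survive verbatim.
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    kept.reverse()
    older = history[: len(history) - len(kept)]
    # Older turns collapse into one summary slot at the front.
    return ([summarize(older)] if older else []) + kept
```

The same budget logic drives cost tiering: the compressed window is what gets sent on every call, so context stays bounded no matter how long the session runs.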
05

Full-Stack Delivery

End-to-end system architecture from database schema to production UI — React frontends, Python backends, PostgreSQL data layers, and deployment pipelines built for real users, not demos.

In Practice
  • React + Next.js frontends with TypeScript and Tailwind
  • Python FastAPI backends with PostgreSQL + Redis
  • WebSocket real-time sync and REST API design
  • Complete SDLC: architecture, build, test, deploy, iterate

About

RedStar Foundry is led by Taiin — a terrible gamer, obsessive data nerd, movie lover, voracious reader, and chaotic energy all wrapped in one.

By day, AI governance and model risk for Tier 1 financial institutions. By night, building worlds where AI characters live their own lives. The through-line is the same: we build systems where AI isn't a feature — it's the architecture.

Our philosophy: build things that work at the technical layer at scale. Not wrappers. Not demos. Not pitch decks. Working software that solves hard problems — whether that's a thousand NPCs with genuine personalities or a compliance platform that turns six weeks of manual data work into one autonomous agent executing instantly inside a governed framework.