
One of the biggest challenges AI engineers face today is memory management. Large language models are stateless. Every request must carry all the information the model needs to respond well.
This is the art of context engineering. Building an agent is largely a matter of shaping context: previous messages, available tools, defined roles, and, just as importantly, information that must persist across sessions. In other words, memory.
But memory is more than storage. Like humans, agents need to form memories from everyday interactions and manage them over time. What matters should stand out. What does not should fade. And when the moment comes, the right memory should surface naturally.
At Redis, we work closely with teams building agents. And regardless of industry or use case, the same memory challenges keep appearing. So instead of only documenting solutions, we built one. An open source platform designed to make agent memory practical, scalable, and most importantly, fast.
That is how the Redis Agent Memory Server was born.
A Two-Tier Memory Architecture
The Redis Agent Memory Server addresses agent memory through a two-tier architecture: working memory and long-term memory.
Working memory operates at the session level. It maintains conversation state and session metadata.
As conversations grow, the system can summarize earlier context to stay within model token limits while preserving coherence. Sessions are persisted in Redis, allowing them to survive restarts and support multiple concurrent conversations.
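The summarization loop above can be sketched in a few lines. This is a minimal illustration, not the server's implementation: the word-count "tokenizer" and the join-based "summarizer" stand in for a real tokenizer and an LLM summarization call.

```python
# Sketch of session-scoped working memory with rolling summarization.
# The token counter and summarizer are toy stand-ins for illustration.
class WorkingMemory:
    def __init__(self, token_budget=50, keep_recent=4):
        self.messages = []            # list of (role, text) turns
        self.summary = ""             # rolling summary of older turns
        self.token_budget = token_budget
        self.keep_recent = keep_recent

    def _tokens(self, text):
        return len(text.split())      # stand-in for a real tokenizer

    def add(self, role, text):
        self.messages.append((role, text))
        total = sum(self._tokens(t) for _, t in self.messages)
        if total > self.token_budget and len(self.messages) > self.keep_recent:
            older = self.messages[:-self.keep_recent]
            # Stand-in summarizer: in practice an LLM condenses these turns.
            self.summary += " " + " | ".join(t for _, t in older)
            self.messages = self.messages[-self.keep_recent:]

    def context(self):
        parts = [f"Summary: {self.summary.strip()}"] if self.summary else []
        parts += [f"{role}: {text}" for role, text in self.messages]
        return "\n".join(parts)

wm = WorkingMemory(token_budget=10, keep_recent=2)
for i in range(6):
    wm.add("user", f"message number {i} with words")
```

The key property is that the prompt stays bounded: recent turns remain verbatim while older ones collapse into the summary, preserving coherence within the token budget.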
Long-term memory stores persistent, searchable knowledge beyond a single session. Each memory includes content, embeddings, and structured metadata. By combining semantic similarity search with metadata filtering, the system retrieves memories based on meaning and context, not just keywords.
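Combining similarity with metadata filtering can be sketched as follows. The word-overlap "embedding" is a placeholder for a real embedding model, and the record fields are illustrative rather than the server's actual schema.

```python
# Sketch of long-term memory retrieval: metadata filters narrow the
# candidate set, then a similarity score ranks what remains. The
# Jaccard word overlap stands in for embedding-based similarity.
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    text: str
    metadata: dict = field(default_factory=dict)

def embed(text):
    return set(text.lower().split())      # toy "embedding"

def similarity(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def search(store, query, filters=None, k=3):
    q = embed(query)
    candidates = [m for m in store
                  if all(m.metadata.get(key) == val
                         for key, val in (filters or {}).items())]
    return sorted(candidates,
                  key=lambda m: similarity(q, embed(m.text)),
                  reverse=True)[:k]

store = [
    MemoryRecord("Alice prefers dark mode", {"user": "alice", "topic": "preferences"}),
    MemoryRecord("Alice works at Acme Corp", {"user": "alice", "topic": "profile"}),
    MemoryRecord("Bob prefers light mode", {"user": "bob", "topic": "preferences"}),
]
hits = search(store, "what UI mode does she prefer", filters={"user": "alice"})
```

Filtering first guarantees scoping (Bob's memories can never leak into Alice's results), while the similarity pass handles meaning rather than exact wording.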
The two layers are connected through automatic promotion. As conversations evolve, Agent Memory Server extracts and stores important information in long-term memory asynchronously, keeping interactions responsive while allowing knowledge to accumulate over time.
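The asynchronous promotion pattern looks roughly like this. The importance check is a stub; in the real server an LLM-driven extraction step decides what gets promoted.

```python
# Sketch of async promotion: the reply returns without waiting for
# extraction, which runs as a background task. The keyword-based
# "importance" check is a stand-in for LLM extraction.
import asyncio

long_term = []

async def extract_and_store(turn):
    await asyncio.sleep(0)                 # yield, as real I/O would
    if "prefer" in turn:                   # stub importance check
        long_term.append(turn)

async def handle_turn(turn):
    asyncio.create_task(extract_and_store(turn))  # promotion off the hot path
    return f"ack: {turn}"                  # respond immediately

async def main():
    reply = await handle_turn("I prefer aisle seats")
    await asyncio.sleep(0.01)              # let the background task finish
    return reply

reply = asyncio.run(main())
```

The point of the design is latency: the user-facing response never blocks on memory writes, yet knowledge still accumulates in the long-term store.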
Together, these components provide agents with structured, persistent, and context-aware memory, without sacrificing performance.
Intelligent Retrieval
Storage preserves information. Retrieval creates intelligence.
The server uses semantic search to retrieve memories based on meaning rather than exact wording. Because this capability is built directly into Redis, retrieval remains fast even as the memory store grows.
But semantic similarity alone is not enough. Searches can also be constrained using structured metadata such as user, session, topics, entities, or time ranges. This ensures that results are not only relevant, but precisely scoped.
The system also supports recency-aware ranking, allowing newer memories to carry more weight when appropriate. For more advanced use cases, the server can optimize queries and apply hybrid retrieval strategies to fine-tune results.
Together, these capabilities ensure that memory retrieval is not only fast, but context-aware and production ready.
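One common way to implement recency-aware ranking is to blend the similarity score with an exponential decay on memory age. The half-life and blend weight below are illustrative choices, not the server's defaults.

```python
# Sketch of recency-aware ranking: final score blends similarity with
# an exponential decay on age. Weights and half-life are illustrative.
import math

def ranked(memories, now, half_life_days=30.0, recency_weight=0.3):
    def score(m):
        age_days = (now - m["created_at"]) / 86400
        recency = math.exp(-math.log(2) * age_days / half_life_days)
        return (1 - recency_weight) * m["similarity"] + recency_weight * recency
    return sorted(memories, key=score, reverse=True)

now = 1_700_000_000
memories = [
    {"text": "old but very similar", "similarity": 0.90,
     "created_at": now - 120 * 86400},
    {"text": "recent and fairly similar", "similarity": 0.80,
     "created_at": now - 1 * 86400},
]
order = [m["text"] for m in ranked(memories, now)]
```

With these weights, a fresh memory at 0.80 similarity outranks a four-month-old memory at 0.90, which is usually the right call for preferences and other facts that drift over time.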
Smart Memory Management
Creating and retrieving memories is essential. Managing them over time is what makes a memory system truly intelligent.
The server includes configurable extraction strategies that automatically transform conversations into structured memories. Extraction captures important details as facts, summaries, preferences, or domain-specific knowledge.
Extraction is powered by LLM calls that analyze conversation context and return structured memory data, including text and metadata. This process runs asynchronously, keeping the main interaction responsive.
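The shape of that extraction call can be sketched as below. The `fake_llm` function and the JSON schema it returns are stand-ins for illustration; the real server's prompts and memory schema differ.

```python
# Sketch of LLM-powered extraction: the model receives conversation
# context and returns structured memory data as JSON. fake_llm is a
# canned stand-in for a real model call.
import json

def fake_llm(prompt):
    # A real implementation would send `prompt` to an LLM.
    return json.dumps([{"type": "preference",
                        "text": "User prefers vegetarian restaurants",
                        "topics": ["food"]}])

def extract_memories(conversation):
    prompt = "Extract durable facts and preferences as JSON:\n" + conversation
    return json.loads(fake_llm(prompt))

memories = extract_memories("user: Book somewhere vegetarian, please.")
```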
Before storing memories, the system performs contextual grounding. References and pronouns are resolved so that stored memories become self-contained and meaningful outside their original conversation. This ensures that memories remain clear and reusable over time.
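To make the idea concrete, here is a toy grounding step. Real grounding uses an LLM with the full conversation as context; the dictionary-based substitution below only illustrates the before/after effect.

```python
# Toy sketch of contextual grounding: replace pronouns with the
# entities they refer to so the stored memory stands alone.
def ground(memory_text, referents):
    words = []
    for word in memory_text.split():
        stripped = word.strip(".,").lower()
        words.append(referents.get(stripped, word))
    return " ".join(words)

grounded = ground("She renewed it yesterday",
                  {"she": "Dana", "it": "the Acme support contract"})
```

"She renewed it yesterday" is useless six months later; "Dana renewed the Acme support contract yesterday" is a memory an agent can actually retrieve and use.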
To maintain quality, the server implements deduplication using both content hashing and semantic similarity checks. If a new memory closely matches an existing one, it can be skipped or merged. This prevents redundancy and keeps the memory store clean.
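The two-stage check can be sketched as follows: an exact content hash catches verbatim repeats cheaply, and a similarity threshold catches near-duplicates. The word-overlap similarity here is a stand-in for embedding distance, and the 0.8 threshold is illustrative.

```python
# Sketch of deduplication: content hashing for exact repeats, then a
# similarity check for near-duplicates. Word overlap stands in for
# embedding-based similarity.
import hashlib

def content_hash(text):
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()

def overlap(a, b):
    wa = set(a.lower().replace(",", "").split())
    wb = set(b.lower().replace(",", "").split())
    return len(wa & wb) / len(wa | wb)

def add_memory(store, text, threshold=0.8):
    h = content_hash(text)
    if any(m["hash"] == h for m in store):
        return "skipped: exact duplicate"
    if any(overlap(m["text"], text) >= threshold for m in store):
        return "skipped: near duplicate"   # or merge, depending on policy
    store.append({"text": text, "hash": h})
    return "stored"

store = []
r1 = add_memory(store, "User lives in Berlin")
r2 = add_memory(store, "user lives in berlin")          # caught by hash
r3 = add_memory(store, "User lives in Berlin, Germany") # caught by similarity
```

Hashing first keeps the common case cheap; the similarity pass only matters for paraphrases, which is exactly where merging rather than skipping may be the better policy.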
Memories fully support creation, updates, and deletion. When a memory is updated, its embedding and index are refreshed to keep retrieval accurate and consistent.
The system also enriches memories with structured metadata. The server can generate topics automatically through topic modeling or rely on an LLM to extract them. Named entity recognition identifies people, organizations, and other entities, making them queryable fields. This allows precise filtering such as retrieving all memories related to a specific company or person.
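A toy version of that enrichment step is sketched below. The keyword lists and the capitalization heuristic are placeholders; real deployments use topic models or LLMs and a proper NER model.

```python
# Toy enrichment sketch: derive topic and entity fields that can later
# serve as query filters. Keyword topics and capitalization-based
# "NER" are illustrative stand-ins only.
TOPIC_KEYWORDS = {"billing": {"invoice", "payment", "refund"},
                  "travel": {"flight", "hotel", "trip"}}

def enrich(text):
    words = text.replace(",", "").split()
    lowered = {w.lower() for w in words}
    topics = [t for t, kws in TOPIC_KEYWORDS.items() if lowered & kws]
    entities = [w for w in words if w[0].isupper()]   # crude entity guess
    return {"text": text, "topics": topics, "entities": entities}

m = enrich("Carla asked Acme about a refund for her flight")
```

Once topics and entities are stored as structured fields, "all memories mentioning Acme" becomes a simple metadata filter rather than a fuzzy text search.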
Production-Ready and Flexible
The Redis Agent Memory Server is built for real world deployment.
It provides both a REST API and a Model Context Protocol interface, powered by a shared memory engine. Security is enforced through token-based authentication and strict data isolation. Heavy tasks such as embedding and memory extraction run asynchronously to keep the system responsive.
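For a sense of what a client-side call looks like, the sketch below constructs the headers and payload such a request might carry. The field names and bearer-token scheme are assumptions for illustration, not the server's documented contract, and no request is actually sent; consult the project's API documentation for the real endpoints.

```python
# Illustrative shape of a memory-creation request. Field names and the
# auth scheme are hypothetical; nothing is transmitted here.
import json

API_TOKEN = "example-token"           # hypothetical bearer token
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}
payload = json.dumps({
    "text": "User prefers email follow-ups",
    "user_id": "user-123",            # scopes the memory for data isolation
    "metadata": {"topics": ["communication"]},
})
```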
The architecture is modular and configurable, with Redis as the default backend and flexible model configuration. Cloud ready features such as environment based settings, structured logging, and health checks are built in.
Get Started and Start Building
Agent memory is a universal challenge. Redis Agent Memory Server provides a practical, open source foundation to solve it.
Deploy in minutes, integrate with Python, JavaScript, or Java, and move from prototype to production with confidence.
Explore the documentation and start building.