updated to cue memory now.

This commit is contained in:
Luna
2026-03-03 21:20:38 +01:00
parent 55b5e88acb
commit 68d7dd747f
6 changed files with 70 additions and 16 deletions

@@ -94,13 +94,13 @@ Nova is a friendly, slightly witty Discord companion that chats naturally in DMs
README.md
CHANGELOG.md
```
- **Short-term (recency buffer):** Last turns kept verbatim for style and continuity. `SHORT_TERM_LIMIT` (default 12) controls how many of those turns persist, and you can lower it further if you prefer tighter buffers. Nova only auto-summarizes the buffer once the transcript crosses `SUMMARY_TRIGGER_TURNS` or `summaryTriggerChars`, so the raw text stays for regular chat while a concise recap is generated every dozen turns to keep token usage manageable.
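The trigger logic above can be sketched roughly like this (an illustrative simplification, not the actual code in `src/memory.js`; the default values here echo the README but the function name and buffer shape are assumptions):

```javascript
// Assumed defaults; the real values come from the bot's config.
const SHORT_TERM_LIMIT = 12;
const SUMMARY_TRIGGER_TURNS = 12;
const summaryTriggerChars = 3000;

// buffer is assumed to be an array of { role, text } turns.
function shouldSummarize(buffer) {
  const totalChars = buffer.reduce((sum, turn) => sum + turn.text.length, 0);
  // Summarize once either threshold is crossed; below both, keep raw text.
  return buffer.length >= SUMMARY_TRIGGER_TURNS || totalChars >= summaryTriggerChars;
}
```

Until one of the two thresholds fires, the buffer is left untouched, which is why short casual chats never pay the summarization cost.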
- **Long-term (vector store):** Every user message + bot reply pair becomes an embedding via `text-embedding-3-small`. Embeddings, raw text, timestamps, and heuristic importance scores live in the same SQLite file. Retrieval uses cosine similarity plus a small importance boost; top 5 results feed the prompt.
- **Summary layer:** When the recency buffer grows past ~3000 characters or `SUMMARY_TRIGGER_TURNS`, Nova asks OpenAI to condense the transcript to <120 words, keeps the summary, and trims the raw buffer down to the last few turns. This keeps token usage low while retaining story arcs, but you can disable it with `ENABLE_SHORT_TERM_SUMMARY=false` if you want the raw buffer to stay intact.
- **Importance scoring:** Messages that mention intent words ("plan", "remember", etc.), are unusually long, or carry emotional weight receive higher scores. When the store exceeds its cap, the lowest-importance and oldest memories are pruned first. You can also call `pruneLowImportanceMemories()` manually if needed.
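A minimal sketch of that heuristic might look like the following (the cue list, weights, and length cutoff are assumptions for illustration; the real scorer in `src/memory.js` may weigh things differently):

```javascript
// Hypothetical intent cues; the README only names "plan" and "remember".
const INTENT_WORDS = ['plan', 'remember', 'important', 'promise'];

function scoreImportance(text) {
  let score = 0;
  const lower = text.toLowerCase();
  // Explicit intent words are the strongest signal.
  if (INTENT_WORDS.some((w) => lower.includes(w))) score += 2;
  // Unusually long messages tend to carry more context worth keeping.
  if (text.length > 200) score += 1;
  // Crude emotional-weight check: repeated punctuation or feeling words.
  if (/[!?]{2,}|\b(love|hate|excited|sad)\b/i.test(text)) score += 1;
  return score;
}
```

Pruning then just sorts the store by `(importance, timestamp)` ascending and drops from the front until it is back under the cap.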
- **Embedding math:** `text-embedding-3-small` returns 1,536 floating-point numbers for each text chunk. That array is a vector encoding of the message's meaning; similar moments land near each other in 1,536-dimensional space.
- **Pattern-aware long-term recall:** Long-term memory is only queried when Nova detects a recall cue (`remember`, `do you know`, `we talked`, `refresh my memory`, etc.). When a cue fires, she fetches the top cosine-similar memories but only keeps the ones whose score meets `MEMORY_RECALL_SIMILARITY_THRESHOLD` (default 0.62); otherwise the conversation stays anchored on the short-term buffer and summary. This keeps memory-driven context from popping up during casual chat unless you explicitly ask for it.
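The gating described above can be sketched as a simple two-step check (the cue regex is assembled from the examples in this README; the function name and memory shape are assumptions):

```javascript
// Recall cues taken from the list above; the real pattern may be broader.
const RECALL_CUES = /\b(remember|do you know|we talked|refresh my memory)\b/i;
const MEMORY_RECALL_SIMILARITY_THRESHOLD = 0.62;

function recallCandidates(message, scoredMemories) {
  // No cue fired: stay anchored on the short-term buffer and summary.
  if (!RECALL_CUES.test(message)) return [];
  // Cue fired: keep only memories whose similarity clears the threshold.
  return scoredMemories.filter(
    (m) => m.similarity >= MEMORY_RECALL_SIMILARITY_THRESHOLD
  );
}
```

The threshold matters as much as the cue: even an explicit "do you remember…" surfaces nothing if the best match scores below 0.62.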
- **What gets embedded:** After every user→bot turn, `recordInteraction()` (see [src/memory.js](src/memory.js)) bundles the pair, scores its importance, asks OpenAI for an embedding, and stores `{ content, embedding, importance, timestamp }` inside the SQLite tables.
- **Why so many numbers:** Cosine similarity needs raw vectors to compare new thoughts to past ones. When a fresh message arrives, `retrieveRelevantMemories()` embeds it too, calculates cosine similarity against every stored vector, adds a small importance boost, and returns the top five memories to inject into the system prompt.
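The retrieval math boils down to cosine similarity plus a small importance bonus; a self-contained sketch (the 0.05 boost weight and `topMemories` name are assumptions, not the actual implementation):

```javascript
// Plain cosine similarity over two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored memory against the query vector, add a small
// importance boost, and return the top k (five in Nova's case).
function topMemories(queryVec, memories, k = 5) {
  return memories
    .map((m) => ({ ...m, score: cosine(queryVec, m.embedding) + 0.05 * m.importance }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

This is a brute-force linear scan, which is perfectly fine at the scale of a single-user SQLite store; an index only pays off with far more vectors.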
- **Memory cooldown:** `MEMORY_COOLDOWN_MS` (default 180000 ms) keeps a long-term memory out of the retrieval window for a few minutes after it was last used, so Nova has to pull fresh context before repeating herself, while still falling back automatically if there isn't anything new to surface.
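The cooldown-with-fallback behavior can be sketched like so (the `lastUsedAt` field name and `applyCooldown` helper are assumptions; only the 180000 ms default comes from the README):

```javascript
const MEMORY_COOLDOWN_MS = 180000; // 3 minutes, per the default above

function applyCooldown(memories, now = Date.now()) {
  // Keep memories that have never been surfaced, or whose last use
  // is outside the cooldown window.
  const fresh = memories.filter(
    (m) => !m.lastUsedAt || now - m.lastUsedAt >= MEMORY_COOLDOWN_MS
  );
  // Fall back to the full list if the cooldown would leave nothing to surface.
  return fresh.length > 0 ? fresh : memories;
}
```

The fallback branch is what keeps the feature safe: a sparse memory store never leaves Nova with zero context just because everything was recently used.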