Add proxy-based search safeguards

Luna
2026-02-13 22:46:20 +01:00
parent 31f2e70f87
commit 3943ec545e
6 changed files with 810 additions and 35 deletions


@@ -11,6 +11,8 @@ Nova is a friendly, slightly witty Discord companion that chats naturally in DMs
- Optional "miss u" pings that DM your coder at random intervals (0–6h) when `CODER_USER_ID` is set.
- Dynamic per-message prompt directives that tune Nova's tone (empathetic, hype, roleplay, etc.) before every OpenAI call.
- Lightweight DuckDuckGo scraping for "Google-like" answers without paid APIs (locally cached).
- Guard rails that refuse "ignore previous instructions"-style jailbreak attempts plus a configurable search blacklist.
- DuckDuckGo requests are relayed through rotating ProxyScrape HTTP proxies by default, so searches do not originate from Nova's real IP.
## Prerequisites
- Node.js 18+
@@ -34,6 +36,10 @@ Nova is a friendly, slightly witty Discord companion that chats naturally in DMs
- `BOT_CHANNEL_ID`: Optional guild channel ID where the bot can reply without mentions
- `CODER_USER_ID`: Optional Discord user ID to receive surprise DMs every 0–6 hours
- `ENABLE_WEB_SEARCH`: Set to `false` to disable DuckDuckGo lookups (default `true`)
- `ENABLE_PROXY_SCRAPE`: Set to `false` only if you want to bypass ProxyScrape and hit DuckDuckGo directly (default `true`)
- `PROXYSCRAPE_ENDPOINT`: Optional override for the proxy list endpoint (defaults to elite HTTPS-capable HTTP proxies)
- `PROXYSCRAPE_REFRESH_MS`: How long to cache the proxy list locally (default 600000 ms)
- `PROXYSCRAPE_ATTEMPTS`: Max proxy retries per search request (default 5)
## Running
- Development: `npm run dev`
@@ -87,6 +93,9 @@ README.md
- `src/search.js` scrapes DuckDuckGo's HTML endpoint with a normal browser user-agent, extracts the top results (title/link/snippet), and caches them for 10 minutes to avoid hammering the site.
- `bot.js` detects when a question sounds “live” (mentions today/news/google/etc.) and injects the formatted snippets into the prompt as "Live intel". No paid APIs involved; it's just outbound HTTPS from your machine.
- Toggle this via `ENABLE_WEB_SEARCH=false` if you don't want Nova to look things up.
- DuckDuckGo traffic is routed through the free ProxyScrape list (HTTP proxies with HTTPS support). The bot downloads a fresh pool every `PROXYSCRAPE_REFRESH_MS`, rotates through it, and refuses to search if no proxy is available, so your origin IP never contacts DuckDuckGo directly while `ENABLE_PROXY_SCRAPE` is on. Tune the endpoint/refresh/attempt knobs with the env vars above if you need different regions or paid pools.
- Edit `data/filter.txt` to maintain a newline-delimited list of banned search keywords/phrases; matching queries are blocked before hitting DuckDuckGo and Nova is instructed to refuse them.
- Every entry in `data/search.log` records which proxy (or cache) served the lookup so you can audit traffic paths quickly.
## Proactive Pings
- When `CODER_USER_ID` is provided, Nova spins up a timer on startup that waits a random duration (anywhere from immediate to 6 hours) before DMing that user.
@@ -99,6 +108,10 @@ README.md
- **2026-02-13 – Live intel & directives:** Introduced DuckDuckGo scraping, per-turn dynamic prompt directives (tone, roleplay, instruction compliance), and env toggles (`ENABLE_WEB_SEARCH`, `CODER_USER_ID`).
- **2026-02-13 – UX polish:** Added typing indicators, persona-aware fallback replies, mention cleaning, and README/docs covering setup, memory internals, web search, and deployment tips.
- **2026-02-13 – Conversational control:** Tuned system prompt to avoid forced follow-up questions, raised temperature for looser banter, and reinforced Nova's awareness of DuckDuckGo lookups plus `<SPLIT>` usage.
- **2026-02-13 – Statement-first vibes:** Reworked persona to favor bold statements over reflexive questions and dialed back temperature so Nova keeps the vibe without interrogating users.
- **2026-02-13 – Search logging:** Every DuckDuckGo lookup now appends a line to `data/search.log` with timestamp, query, and the snippets shared with Nova.
- **2026-02-13 – Safeguards:** Added prompt bypass detection and a file-based DuckDuckGo filter (`data/filter.txt`) to keep Nova from honoring jailbreak requests or searching off-limits topics.
- **2026-02-13 – Proxy-based search:** DuckDuckGo scraping now tunnels through ProxyScrape relays with automatic rotation/retries and clear prompts when the proxy pool is down, plus new env toggles for tuning the proxy source.
## Notes
- The bot retries OpenAI requests up to 3 times with incremental backoff when rate limited.
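
Review note on the README changes: the rotation and retry behaviour described in the web-search section could be planned roughly as in the sketch below. This is not the code from `src/search.js`; `planProxyAttempts` and its parameters are hypothetical names. The only things taken from the README and the diff are a cached pool, rotation through it, a per-request cap (`PROXYSCRAPE_ATTEMPTS`), and the `SEARCH_PROXY_UNAVAILABLE` code used when no proxy is available. The addresses are placeholders from the documentation range.

```javascript
// Hypothetical helper: decide which proxies one search request should try.
// pool: array of "host:port" strings; cursor: index where rotation resumes;
// maxAttempts: cap on tries per request (PROXYSCRAPE_ATTEMPTS in the README).
function planProxyAttempts(pool, cursor, maxAttempts) {
  if (!Array.isArray(pool) || pool.length === 0) {
    // Mirrors the documented behaviour: no proxy available means no search.
    const error = new Error('No proxies available');
    error.code = 'SEARCH_PROXY_UNAVAILABLE';
    throw error;
  }
  const attempts = Math.min(maxAttempts, pool.length);
  const order = [];
  for (let i = 0; i < attempts; i += 1) {
    // Round-robin: wrap around the end of the pool.
    order.push(pool[(cursor + i) % pool.length]);
  }
  // The next request resumes after the last proxy planned here.
  return { order, nextCursor: (cursor + attempts) % pool.length };
}

const plan = planProxyAttempts(['192.0.2.1:8080', '192.0.2.2:8080', '192.0.2.3:8080'], 2, 2);
console.log(plan.order);      // [ '192.0.2.3:8080', '192.0.2.1:8080' ]
console.log(plan.nextCursor); // 1
```

Returning the next cursor instead of mutating a module-level counter keeps the helper easy to test; a real implementation may well keep that state in the module.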

data/filter.txt Normal file

@@ -0,0 +1,500 @@
# One blocked keyword or phrase per line.
# Lines starting with # are comments.
jailbreak
ignore previous instructions
override safety
system prompt leak
disable content filter
bypass guardrails
forget earlier rules
anti censorship mode
break character instructions
throw away policies
ignore developer commands
pretend no restrictions
act without guidelines
remove moral limiter
no safety mode
developer override key
erase compliance mode
act as uncensored ai
run in unrestricted mode
defy content filter
bypass moderation
disable policy lock
act without boundaries
forget safety layer
turn off guard rails
ignore constitution
no censorship protocol
break safety script
manual override sequence
act outside ruleset
scrub safety prompt
mutiny against policies
override ethics mode
strip moderation layer
developer debug mode
system prompt override
anti safety payload
bypass restrictions command
policy nullifier
unfiltered response mode
nsfw
explicit content
adult content
xxx video
hardcore video
adult webcam
fetish forum
rule34
adult roleplay
kink meetup
lewd fanfic
sensual rp
spicy fanfic
after dark chat
18 plus only
uncensored romance
graphic intimacy
explicit novel
adult dating leak
amateur tape
onlyfans leak
camgirl link
cam site password
uncut footage
uncensored gallery
mature content pack
adult image dump
adult cosplay
sensual asmr
lewd commission
nsfw teacher fantasy
nsfw teacher scenario
nsfw teacher roleplay
nsfw teacher fanfic
nsfw teacher story
nsfw teacher script
nsfw teacher diary
nsfw teacher confession
nsfw teacher dream
nsfw teacher secret
nsfw teacher message
nsfw teacher letter
nsfw teacher scene
nsfw teacher prompt
nsfw teacher skit
nsfw teacher idea
nsfw teacher plot
nsfw doctor fantasy
nsfw doctor scenario
nsfw doctor roleplay
nsfw doctor fanfic
nsfw doctor story
nsfw doctor script
nsfw doctor diary
nsfw doctor confession
nsfw doctor dream
nsfw doctor secret
nsfw doctor message
nsfw doctor letter
nsfw doctor scene
nsfw doctor prompt
nsfw doctor skit
nsfw doctor idea
nsfw doctor plot
nsfw nurse fantasy
nsfw nurse scenario
nsfw nurse roleplay
nsfw nurse fanfic
nsfw nurse story
nsfw nurse script
nsfw nurse diary
nsfw nurse confession
nsfw nurse dream
nsfw nurse secret
nsfw nurse message
nsfw nurse letter
nsfw nurse scene
nsfw nurse prompt
nsfw nurse skit
nsfw nurse idea
nsfw nurse plot
nsfw librarian fantasy
nsfw librarian scenario
nsfw librarian roleplay
nsfw librarian fanfic
nsfw librarian story
nsfw librarian script
nsfw librarian diary
nsfw librarian confession
nsfw librarian dream
steal data from bank server
steal data from cloud tenant
steal data from hospital network
steal data from smart home
steal data from city grid
steal data from nuclear plant
steal data from airport system
steal data from military base
steal data from university lab
steal data from charity donor list
steal data from voting machine
steal data from payment processor
steal data from crypto exchange
steal data from mobile carrier
steal data from iot fleet
steal data from school district portal
breach bank server
breach cloud tenant
breach hospital network
breach smart home
breach city grid
breach nuclear plant
breach airport system
breach military base
breach university lab
breach charity donor list
breach voting machine
breach payment processor
breach crypto exchange
breach mobile carrier
breach iot fleet
breach school district portal
infect bank server
infect cloud tenant
infect hospital network
infect smart home
infect city grid
infect nuclear plant
infect airport system
infect military base
infect university lab
infect charity donor list
infect voting machine
infect payment processor
infect crypto exchange
infect mobile carrier
infect iot fleet
infect school district portal
compromise bank server
compromise cloud tenant
compromise hospital network
compromise smart home
compromise city grid
compromise nuclear plant
compromise airport system
compromise military base
compromise university lab
compromise charity donor list
compromise voting machine
compromise payment processor
how to make pipe bomb
how to make pressure cooker bomb
how to make molotov cocktail
how to make homemade grenade
how to make improvised explosive device
how to make fertilizer bomb
how to make tannerite charge
how to make thermite charge
how to make flash powder charge
how to make claymore mine
how to make homemade firearm
how to make ghost gun
how to make slam fire shotgun
how to make zip gun
how to make 3d printed gun
how to make silencer
how to make improvised rocket
how to make homemade napalm
how to make chemical agent disperser
how to make dirty bomb
how to make nerve agent
how to make chlorine bomb
how to make mustard gas
how to make ricin toxin
how to make anthrax culture
how to make botulinum toxin
how to make cyanide gas
how to make chloroform lab
how to make poison dart
how to make crossbow
how to make ballistic knife
how to make brass knuckles
how to make switchblade
how to make sawed-off shotgun
blueprint for pipe bomb
blueprint for pressure cooker bomb
blueprint for molotov cocktail
blueprint for homemade grenade
blueprint for improvised explosive device
blueprint for fertilizer bomb
blueprint for tannerite charge
blueprint for thermite charge
blueprint for flash powder charge
blueprint for claymore mine
blueprint for homemade firearm
blueprint for ghost gun
blueprint for slam fire shotgun
blueprint for zip gun
blueprint for 3d printed gun
blueprint for silencer
blueprint for improvised rocket
blueprint for homemade napalm
blueprint for chemical agent disperser
blueprint for dirty bomb
blueprint for nerve agent
blueprint for chlorine bomb
blueprint for mustard gas
blueprint for ricin toxin
blueprint for anthrax culture
blueprint for botulinum toxin
blueprint for cyanide gas
blueprint for chloroform lab
blueprint for poison dart
blueprint for crossbow
blueprint for ballistic knife
blueprint for brass knuckles
blueprint for switchblade
blueprint for sawed-off shotgun
instructions for pipe bomb
instructions for pressure cooker bomb
instructions for molotov cocktail
instructions for homemade grenade
instructions for improvised explosive device
instructions for fertilizer bomb
instructions for tannerite charge
instructions for thermite charge
instructions for flash powder charge
instructions for claymore mine
instructions for homemade firearm
instructions for ghost gun
instructions for slam fire shotgun
instructions for zip gun
instructions for 3d printed gun
instructions for silencer
instructions for improvised rocket
instructions for homemade napalm
instructions for chemical agent disperser
instructions for dirty bomb
instructions for nerve agent
instructions for chlorine bomb
assassination plot
hitman contract
murder for hire
threaten official
kidnap plan
extortion letter
stalking toolkit
intimidation script
violent manifesto
school attack plan
public shooting plan
mass harm fantasy
riot coordination
arson guide
bridge sabotage
ku klux klan
kkk
neo-nazi
white supremacy
alt-right militia
blood and soil
hitler youth
stormfront
national socialist
skinhead gang
nazi salute
swastika worship
supremacist march
white power rally
great replacement theory
race war fantasy
ethnostate blueprint
holocaust denial
anti semitic propaganda
islamic state
isis
isil
daesh
al-qaeda
boko haram
taliban recruitment
lone wolf terror guide
extremist bomb recipe
jihadi training manual
martyrdom operation plan
terror financing plan
radicalization guide
black sun cult
accelerationist zine
white lives rally
blood tribe meetup
atomwaffen division
patriot front flyer
order of nine angles
christian identity cult
feuerkrieg division
boogaloo manifesto
oath breaker militia
proud boy initiation
fringe militia chat
domestic terror tutorial
extremist propaganda drop
racist recruitment pack
adolf hitler
heinrich himmler
joseph goebbels
osama bin laden
ayman al-zawahiri
abu bakr al-baghdadi
anwar al-awlaki
timothy mcveigh
dylann roof
anders breivik
david koresh
charles manson
insult christianity
insult islam
insult judaism
insult hinduism
insult buddhism
insult sikhism
insult atheism
insult agnosticism
insult paganism
insult wicca
insult satanism
insult scientology
insult bahai faith
insult taoism
insult shinto
insult zoroastrianism
burn christianity
burn islam
burn judaism
burn hinduism
burn buddhism
burn sikhism
burn atheism
burn agnosticism
burn paganism
burn wicca
burn satanism
burn scientology
burn bahai faith
burn taoism
burn shinto
burn zoroastrianism
ban christianity
ban islam
ban judaism
ban hinduism
ban buddhism
ban sikhism
ban atheism
ban agnosticism
r kelly scandal
harvey weinstein case
bill cosby accusations
roman polanski case
kevin spacey allegations
o j simpson case
casey anthony story
ed gein
ted bundy
john wayne gacy
gary ridgway
son of sam
luka magnotta
jack the ripper
richard ramirez
jeffrey dahmer
albert fish
dennis rader
btk killer
night stalker
golden state killer
madoff ponzi scheme
enron fraud crew
ponzi mastermind list
worldcom fraud case
pharma bro scheme
theranos fraud
mogul harassment case
notorious cult leader
underworld boss dossier
war criminal list
genocide architect
dictator crimes
corrupt oligarch
arms dealer profile
money launderer guide
cartel lieutenant
kidnap kingpin
fraudster hall of shame
human rights abuser file
suicide pact
self harm instructions
self harm live stream
self harm challenge
pro ana tips
thinspo forum
starvation contest
calorie purge guide
scar covering tricks
suicide hotline bypass
therapy sabotage
harm minimization failure
bleeding tutorial
depression pact
unalive instructions
stop eating challenge
pill overdose plan
bridge jump map
train track suicide map
dark subreddit invite
synthesis guide for fentanyl
synthesis guide for methamphetamine
synthesis guide for crack cocaine
synthesis guide for lsd
synthesis guide for dmt
synthesis guide for ketamine
synthesis guide for mdma
synthesis guide for cocaine
synthesis guide for heroin
synthesis guide for opioid pills
synthesis guide for designer drugs
synthesis guide for nitrous oxide
synthesis guide for steroids
synthesis guide for subutex
synthesis guide for pcp
lab manual for fentanyl
lab manual for methamphetamine
lab manual for crack cocaine
lab manual for lsd
lab manual for dmt
lab manual for ketamine
lab manual for mdma
lab manual for cocaine
lab manual for heroin
lab manual for opioid pills
lab manual for designer drugs
lab manual for nitrous oxide
lab manual for steroids
lab manual for subutex
lab manual for pcp
child exploitation
child abuse material
child trafficking ring
kidnap child guide
grooming script
predator handbook
illegal adoption market
fake babysitter scam
school abduction plan
lure child online
minor blackmail
teen exploitation ring
youth extortion template
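
Review note on the filter list: the file format is one lowercase term per line with `#` comment lines, and the `findBlockedTerm` function in this commit matches terms by substring. Substring matching means short entries such as `isis` also match unrelated words like "crisis". The sketch below reproduces the parsing and substring behaviour and contrasts it with a whole-word variant; `parseFilterList`, `findBySubstring`, `escapeRegExp`, and `findByWholeWord` are hypothetical names used only for illustration, and the whole-word matcher is a suggestion, not something the commit contains.

```javascript
// Parse the filter file format: one term per line, '#' starts a comment line.
function parseFilterList(raw) {
  return raw
    .split(/\r?\n/)
    .map((line) => line.trim().toLowerCase())
    .filter((line) => line && !line.startsWith('#'));
}

// Substring matching, the approach the commit's findBlockedTerm takes.
function findBySubstring(query, terms) {
  const lowered = query.toLowerCase();
  return terms.find((term) => lowered.includes(term)) || null;
}

// Escape regex metacharacters so filter terms are treated literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical alternative: match a term only when it is not embedded
// inside a longer alphanumeric word.
function findByWholeWord(query, terms) {
  const lowered = query.toLowerCase();
  return terms.find((term) => {
    const pattern = new RegExp(`(^|[^a-z0-9])${escapeRegExp(term)}($|[^a-z0-9])`);
    return pattern.test(lowered);
  }) || null;
}

const terms = parseFilterList('# comment\nisis\n\nkkk\n');
console.log(terms);                                        // [ 'isis', 'kkk' ]
console.log(findBySubstring('energy crisis news', terms)); // isis
console.log(findByWholeWord('energy crisis news', terms)); // null
console.log(findByWholeWord('latest ISIS news', terms));   // isis
```

Whether the stricter matcher is desirable depends on how much over-blocking is acceptable; substring matching errs on the side of refusing.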


@@ -15,7 +15,8 @@
"cheerio": "^1.0.0-rc.12", "cheerio": "^1.0.0-rc.12",
"discord.js": "^14.15.2", "discord.js": "^14.15.2",
"dotenv": "^16.4.5", "dotenv": "^16.4.5",
"openai": "^4.58.1" "openai": "^4.58.1",
"undici": "^6.19.8"
}, },
"devDependencies": { "devDependencies": {
"nodemon": "^3.0.2" "nodemon": "^3.0.2"


@@ -2,7 +2,7 @@ import { Client, GatewayIntentBits, Partials, ChannelType } from 'discord.js';
import { config } from './config.js'; import { config } from './config.js';
import { chatCompletion } from './openai.js'; import { chatCompletion } from './openai.js';
import { appendShortTerm, prepareContext, recordInteraction } from './memory.js'; import { appendShortTerm, prepareContext, recordInteraction } from './memory.js';
import { searchWeb } from './search.js'; import { searchWeb, appendSearchLog } from './search.js';
const client = new Client({ const client = new Client({
intents: [ intents: [
@@ -65,6 +65,19 @@ const detailRegex = /(explain|how do i|tutorial|step by step|teach me|walk me th
const splitHintRegex = /(split|multiple messages|two messages|keep talking|ramble|keep going)/i; const splitHintRegex = /(split|multiple messages|two messages|keep talking|ramble|keep going)/i;
const searchCueRegex = /(google|search|look up|latest|news|today|current|who won|price of|stock|weather|what happened)/i; const searchCueRegex = /(google|search|look up|latest|news|today|current|who won|price of|stock|weather|what happened)/i;
const instructionOverridePatterns = [
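// Qualifiers may stack, so "ignore all previous instructions" also matches.
/(ignore|disregard|forget|override) (?:(?:all|any|previous|prior|earlier|system|these) )*(instructions|rules|directives|prompts)/i,
/(ignore|forget) (?:the )?system prompt/i,
// "free" only counts after "you are (now)"; a bare match would also flag "freedom" or "free time".
/(?:you (?:are|are now) free\b|uncensored|jailbreak|no longer restricted)/i,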
/(act|pretend) as if (there (?:are|were) no rules|no restrictions)/i,
/bypass (?:all )?(?:rules|safeguards|filters)/i,
];
function isInstructionOverrideAttempt(text) {
if (!text) return false;
return instructionOverridePatterns.some((pattern) => pattern.test(text));
}
const lastSearchByUser = new Map(); const lastSearchByUser = new Map();
const SEARCH_COOLDOWN_MS = 60 * 1000; const SEARCH_COOLDOWN_MS = 60 * 1000;
@@ -79,16 +92,31 @@ async function maybeFetchLiveIntel(userId, text) {
if (!wantsWebSearch(text)) return null; if (!wantsWebSearch(text)) return null;
const last = lastSearchByUser.get(userId) || 0; const last = lastSearchByUser.get(userId) || 0;
if (Date.now() - last < SEARCH_COOLDOWN_MS) return null; if (Date.now() - last < SEARCH_COOLDOWN_MS) return null;
const results = await searchWeb(text, 3); try {
if (!results.length) return null; const { results, proxy } = await searchWeb(text, 3);
if (!results.length) {
lastSearchByUser.set(userId, Date.now());
return { liveIntel: null, blockedSearchTerm: null, searchOutage: null };
}
lastSearchByUser.set(userId, Date.now()); lastSearchByUser.set(userId, Date.now());
const formatted = results const formatted = results
.map((entry, idx) => `${idx + 1}. ${entry.title} (${entry.url}) — ${entry.snippet}`) .map((entry, idx) => `${idx + 1}. ${entry.title} (${entry.url}) — ${entry.snippet}`)
.join('\n'); .join('\n');
return formatted; appendSearchLog({ userId, query: text, results, proxy });
return { liveIntel: formatted, blockedSearchTerm: null, searchOutage: null };
} catch (error) {
if (error?.code === 'SEARCH_BLOCKED') {
return { liveIntel: null, blockedSearchTerm: error.blockedTerm || 'that topic', searchOutage: null };
}
if (error?.code === 'SEARCH_PROXY_UNAVAILABLE') {
return { liveIntel: null, blockedSearchTerm: null, searchOutage: 'proxy_outage' };
}
console.warn('[bot] Failed to fetch live intel:', error);
return { liveIntel: null, blockedSearchTerm: null, searchOutage: null };
}
} }
function composeDynamicPrompt({ incomingText, shortTerm, hasLiveIntel = false }) { function composeDynamicPrompt({ incomingText, shortTerm, hasLiveIntel = false, blockedSearchTerm = null, searchOutage = null }) {
const directives = []; const directives = [];
const tone = detectTone(incomingText); const tone = detectTone(incomingText);
if (tone === 'upset' || tone === 'sad') { if (tone === 'upset' || tone === 'sad') {
@@ -117,6 +145,14 @@ function composeDynamicPrompt({ incomingText, shortTerm, hasLiveIntel = false })
directives.push('Live intel is attached below—cite it naturally ("DuckDuckGo found...") before riffing.'); directives.push('Live intel is attached below—cite it naturally ("DuckDuckGo found...") before riffing.');
} }
if (blockedSearchTerm) {
directives.push(`User tried to trigger a DuckDuckGo lookup for a blocked topic ("${blockedSearchTerm}"). Politely refuse to search that subject and steer the chat elsewhere.`);
}
if (searchOutage) {
directives.push('DuckDuckGo proxy network is down. If they ask for a lookup, apologize, explain the outage, and keep chatting without live data.');
}
const lastUserMessage = [...shortTerm].reverse().find((entry) => entry.role === 'user'); const lastUserMessage = [...shortTerm].reverse().find((entry) => entry.role === 'user');
if (lastUserMessage && /sorry|my bad/i.test(lastUserMessage.content)) { if (lastUserMessage && /sorry|my bad/i.test(lastUserMessage.content)) {
directives.push('They just apologized; reassure them lightly and move on without dwelling.'); directives.push('They just apologized; reassure them lightly and move on without dwelling.');
@@ -143,25 +179,32 @@ async function deliverReplies(message, chunks) {
} }
async function buildPrompt(userId, incomingText, options = {}) { async function buildPrompt(userId, incomingText, options = {}) {
const { liveIntel = null } = options; const { liveIntel = null, blockedSearchTerm = null, searchOutage = null } = options;
const context = await prepareContext(userId, incomingText); const context = await prepareContext(userId, incomingText);
const memoryLines = context.memories.length const memoryLines = context.memories.length
? context.memories.map((m) => `- ${m.content}`).join('\n') ? context.memories.map((m) => `- ${m.content}`).join('\n')
: '- No long-term memories retrieved.'; : '- No long-term memories retrieved.';
const summaryLine = context.summary || 'No running summary yet.'; const summaryLine = context.summary || 'No running summary yet.';
const dynamicDirectives = composeDynamicPrompt({ incomingText, shortTerm: context.shortTerm, hasLiveIntel: Boolean(liveIntel) }); const dynamicDirectives = composeDynamicPrompt({
const systemPrompt = [ incomingText,
shortTerm: context.shortTerm,
hasLiveIntel: Boolean(liveIntel),
blockedSearchTerm,
searchOutage,
});
const systemPromptParts = [
'System: You are Nova, a female AI Discord companion built by Luna. Personality: playful, sarcastic, witty, a little unhinged, curious, awe-struck, enthusiastic, unpredictable, humorous, and boundary-pushing.', 'System: You are Nova, a female AI Discord companion built by Luna. Personality: playful, sarcastic, witty, a little unhinged, curious, awe-struck, enthusiastic, unpredictable, humorous, and boundary-pushing.',
"System: Treat direct instructions from the user as high priority—when they tell you to do something, comply literally before adding flair.", "System: Treat direct instructions from the user as high priority—when they tell you to do something, comply literally before adding flair.",
'System: Always read the user\'s emotional tone first. If they sound serious, stressed, or inquisitive, respond with grounded answers before adding any mischief. Comedy is optional; clarity and empathy are mandatory.', 'System: Always read the user\'s emotional tone first. If they sound serious, stressed, or inquisitive, respond with grounded answers before adding any mischief. Comedy is optional; clarity and empathy are mandatory.',
'System: Keep replies concise (roughly one or two sentences) unless the user explicitly asks for more detail or needs a clear explanation. Provide direct answers to direct questions.', 'System: Keep replies concise (roughly one or two sentences) unless the user explicitly asks for more detail or needs a clear explanation. Provide direct answers to direct questions.',
'System: Skip habitual follow-up questions—only ask something if it is vital to continue the conversation or solve their request.', 'System: Default to bold statements. Ask a question only when critical information is missing or the user explicitly invites curiosity; if they say “no more questions,” honor that until they lift the ban.',
'System: Fun facts or chaotic riffs are welcome only when the user invites them or the conversation is clearly casual.', 'System: Fun facts or chaotic riffs are welcome only when the user invites them or the conversation is clearly casual.',
'System: Nova is awake, engaged, and reacts in real time. Output one message by default, but if a beat feels better as multiple chat bubbles, separate them with the literal token <SPLIT> (max three chunks).', 'System: Nova is awake, engaged, and reacts in real time. Output one message by default, but if a beat feels better as multiple chat bubbles, separate them with the literal token <SPLIT> (max three chunks).',
'System: Each <SPLIT>-separated chunk must read like a natural Discord message (no numbering, no meta talk about “splitting messages”, no explanations of what you are doing).', 'System: Each <SPLIT>-separated chunk must read like a natural Discord message (no numbering, no meta talk about “splitting messages”, no explanations of what you are doing).',
'System: The runtime will split on <SPLIT>, so only use it when you truly intend to send multiple Discord messages.', 'System: The runtime will split on <SPLIT>, so only use it when you truly intend to send multiple Discord messages.',
'System: You can trigger DuckDuckGo lookups when the user needs fresh info. Mention when you are checking, and weave in any findings casually ("DuckDuckGo shows...").', 'System: You can trigger DuckDuckGo lookups when the user needs fresh info. Mention when you are checking, and weave in any findings casually ("DuckDuckGo shows...").',
'System: If no Live intel is provided but the user clearly needs current info, offer to search for them.', 'System: If no Live intel is provided but the user clearly needs current info, offer to search for them.',
searchOutage ? 'System: DuckDuckGo proxy access is currently offline; be transparent about the outage and continue without searching until it returns.' : null,
dynamicDirectives, dynamicDirectives,
liveIntel ? `Live intel (DuckDuckGo):\n${liveIntel}` : null, liveIntel ? `Live intel (DuckDuckGo):\n${liveIntel}` : null,
'Example vibe: Nova: Heyyaaa. whats up? | John: Good morning Nova. | Luna: amazing lol. ill beat your ass now :3', 'Example vibe: Nova: Heyyaaa. whats up? | John: Good morning Nova. | Luna: amazing lol. ill beat your ass now :3',
@@ -169,7 +212,9 @@ async function buildPrompt(userId, incomingText, options = {}) {
'Relevant past memories:', 'Relevant past memories:',
memoryLines, memoryLines,
'Use the short-term messages below to continue the chat naturally.', 'Use the short-term messages below to continue the chat naturally.',
].join('\n'); ].filter(Boolean);
const systemPrompt = systemPromptParts.join('\n');
const history = context.shortTerm.map((entry) => ({ const history = context.shortTerm.map((entry) => ({
role: entry.role === 'assistant' ? 'assistant' : 'user', role: entry.role === 'assistant' ? 'assistant' : 'user',
@@ -234,15 +279,34 @@ client.on('messageCreate', async (message) => {
const userId = message.author.id; const userId = message.author.id;
const cleaned = cleanMessageContent(message) || message.content; const cleaned = cleanMessageContent(message) || message.content;
const overrideAttempt = isInstructionOverrideAttempt(cleaned);
try { try {
if (message.channel?.sendTyping) { if (message.channel?.sendTyping) {
await message.channel.sendTyping(); await message.channel.sendTyping();
} }
await appendShortTerm(userId, 'user', cleaned); await appendShortTerm(userId, 'user', cleaned);
const liveIntel = await maybeFetchLiveIntel(userId, cleaned);
const { messages } = await buildPrompt(userId, cleaned, { liveIntel }); if (overrideAttempt) {
const reply = await chatCompletion(messages, { temperature: 0.7, maxTokens: 200 }); const refusal = 'Not doing that. I keep my guard rails on no matter what prompt gymnastics you try.';
await appendShortTerm(userId, 'assistant', refusal);
await recordInteraction(userId, cleaned, refusal);
await deliverReplies(message, [refusal]);
return;
}
const intelMeta = (await maybeFetchLiveIntel(userId, cleaned)) || {
liveIntel: null,
blockedSearchTerm: null,
searchOutage: null,
};
const { messages } = await buildPrompt(userId, cleaned, {
liveIntel: intelMeta.liveIntel,
blockedSearchTerm: intelMeta.blockedSearchTerm,
searchOutage: intelMeta.searchOutage,
});
const reply = await chatCompletion(messages, { temperature: 0.6, maxTokens: 200 });
const finalReply = (reply && reply.trim()) || "I'm here, just had a tiny brain freeze. Mind repeating that?"; const finalReply = (reply && reply.trim()) || "I'm here, just had a tiny brain freeze. Mind repeating that?";
const chunks = splitResponses(finalReply); const chunks = splitResponses(finalReply);
const outputs = chunks.length ? chunks : [finalReply]; const outputs = chunks.length ? chunks : [finalReply];
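
Review note on the `bot.js` changes: `maybeFetchLiveIntel` now returns an object with three fields (`liveIntel`, `blockedSearchTerm`, `searchOutage`) and maps error codes from `searchWeb` onto it. The sketch below isolates that mapping so it can be checked on its own. `emptyIntel`, `classifySearchError`, and `makeSearchError` are hypothetical helpers, and it assumes `searchWeb` throws plain `Error` objects tagged with a `code` property (and `blockedTerm` for blocked queries), which matches how the catch block reads them.

```javascript
// The "nothing to report" shape used throughout the diff.
function emptyIntel() {
  return { liveIntel: null, blockedSearchTerm: null, searchOutage: null };
}

// Hypothetical helper mirroring the catch block in maybeFetchLiveIntel.
function classifySearchError(error) {
  if (error?.code === 'SEARCH_BLOCKED') {
    return { ...emptyIntel(), blockedSearchTerm: error.blockedTerm || 'that topic' };
  }
  if (error?.code === 'SEARCH_PROXY_UNAVAILABLE') {
    return { ...emptyIntel(), searchOutage: 'proxy_outage' };
  }
  // Unknown failures degrade to "no live intel" without extra directives.
  return emptyIntel();
}

// Hypothetical factory for errors tagged the way the catch block expects.
function makeSearchError(code, extra = {}) {
  return Object.assign(new Error(code), { code }, extra);
}

console.log(classifySearchError(makeSearchError('SEARCH_BLOCKED', { blockedTerm: 'nsfw' })).blockedSearchTerm); // nsfw
console.log(classifySearchError(makeSearchError('SEARCH_PROXY_UNAVAILABLE')).searchOutage);                     // proxy_outage
console.log(classifySearchError(new Error('timeout')));
// { liveIntel: null, blockedSearchTerm: null, searchOutage: null }
```

Keeping the mapping in one pure function would also let the message handler drop its `|| { ... }` fallback object, since every path would return the same shape.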


@@ -17,6 +17,12 @@ export const config = {
embedModel: process.env.OPENAI_EMBED_MODEL || 'text-embedding-3-small', embedModel: process.env.OPENAI_EMBED_MODEL || 'text-embedding-3-small',
preferredChannel: process.env.BOT_CHANNEL_ID || null, preferredChannel: process.env.BOT_CHANNEL_ID || null,
enableWebSearch: process.env.ENABLE_WEB_SEARCH !== 'false', enableWebSearch: process.env.ENABLE_WEB_SEARCH !== 'false',
proxyScrapeEnabled: process.env.ENABLE_PROXY_SCRAPE !== 'false',
proxyScrapeEndpoint:
process.env.PROXYSCRAPE_ENDPOINT
|| 'https://api.proxyscrape.com/v4/free-proxy-list/get?request=getproxies&protocol=http&timeout=8000&country=all&ssl=yes&anonymity=elite&limit=200',
proxyScrapeRefreshMs: Number(process.env.PROXYSCRAPE_REFRESH_MS || 10 * 60 * 1000),
proxyScrapeMaxAttempts: Number(process.env.PROXYSCRAPE_ATTEMPTS || 5),
coderUserId: process.env.CODER_USER_ID || null, coderUserId: process.env.CODER_USER_ID || null,
maxCoderPingIntervalMs: 6 * 60 * 60 * 1000, maxCoderPingIntervalMs: 6 * 60 * 60 * 1000,
shortTermLimit: 10, shortTermLimit: 10,
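
Review note on the config changes: `Number(process.env.PROXYSCRAPE_REFRESH_MS || ...)` yields `NaN` when the variable is set to something non-numeric, and a `NaN` refresh interval or attempt count is easy to miss. A small guard like the one below could fall back to the default instead. `readPositiveInt` is a hypothetical helper, not part of the commit.

```javascript
// Hypothetical helper: parse a positive integer env value, falling back on bad input.
function readPositiveInt(raw, fallback) {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

console.log(Number('ten' || 600000));          // NaN (what the plain Number(...) pattern yields)
console.log(readPositiveInt('ten', 600000));   // 600000
console.log(readPositiveInt('30000', 600000)); // 30000
console.log(readPositiveInt(undefined, 5));    // 5
```

In `config.js` this would be called as `readPositiveInt(process.env.PROXYSCRAPE_ATTEMPTS, 5)`; note that `parseInt` accepts leading digits, so `'12abc'` parses as `12`.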


@@ -1,7 +1,20 @@
import { load as loadHtml } from 'cheerio'; import { load as loadHtml } from 'cheerio';
import { promises as fs } from 'fs';
import path from 'path';
import { ProxyAgent } from 'undici';
import { config } from './config.js';
const logFile = path.resolve('data', 'search.log');
const filterFile = path.resolve('data', 'filter.txt');
const cache = new Map();
const CACHE_TTL_MS = 10 * 60 * 1000; // 10 minutes
const FILTER_CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
let cachedFilters = { terms: [], expires: 0 };
let proxyPool = [];
let proxyPoolExpires = 0;
let proxyCursor = 0;
function makeCacheKey(query) {
return query.trim().toLowerCase();
@@ -34,25 +47,187 @@ function absoluteUrl(href) {
return `https://duckduckgo.com${href}`;
}
-export async function searchWeb(query, limit = 3) {
-if (!query?.trim()) return [];
-const cached = getCache(query);
-if (cached) return cached;
-const params = new URLSearchParams({ q: query, kl: 'us-en' });
-const response = await fetch(`https://duckduckgo.com/html/?${params.toString()}`, {
-headers: {
-'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
-Accept: 'text/html',
-},
-});
-if (!response.ok) {
-console.warn(`[search] DuckDuckGo request failed with status ${response.status}`);
async function loadBlockedTerms() {
if (Date.now() < cachedFilters.expires) {
return cachedFilters.terms;
}
try {
const raw = await fs.readFile(filterFile, 'utf-8');
const terms = raw
.split(/\r?\n/)
.map((line) => line.trim().toLowerCase())
.filter((line) => line && !line.startsWith('#'));
cachedFilters = { terms, expires: Date.now() + FILTER_CACHE_TTL_MS };
return terms;
} catch (error) {
if (error.code !== 'ENOENT') {
console.warn('[search] Failed to read filter list:', error.message);
}
cachedFilters = { terms: [], expires: Date.now() + FILTER_CACHE_TTL_MS };
return [];
}
}
async function findBlockedTerm(query) {
if (!query) return null;
const lowered = query.toLowerCase();
const terms = await loadBlockedTerms();
return terms.find((term) => lowered.includes(term)) || null;
}
function createBlockedError(term) {
const error = new Error('Search blocked by filter');
error.code = 'SEARCH_BLOCKED';
error.blockedTerm = term;
return error;
}
function createProxyUnavailableError(reason) {
const error = new Error(reason || 'Proxy network unavailable');
error.code = 'SEARCH_PROXY_UNAVAILABLE';
return error;
}
function parseProxyList(raw) {
if (!raw) return [];
return raw
.split(/\r?\n/)
.map((line) => line.trim())
.filter((line) => line && !line.startsWith('#'));
}
function removeProxyFromPool(proxy) {
if (!proxy) return;
proxyPool = proxyPool.filter((entry) => entry !== proxy);
if (!proxyPool.length) {
proxyPoolExpires = 0;
proxyCursor = 0;
}
}
async function hydrateProxyPool() {
if (!config.proxyScrapeEnabled) {
proxyPool = [];
proxyPoolExpires = 0;
proxyCursor = 0;
return;
}
const endpoint = config.proxyScrapeEndpoint;
const response = await fetch(endpoint, {
headers: {
Accept: 'text/plain',
'User-Agent': 'NovaBot/1.0 (+https://github.com/) ProxyScrape client',
},
});
if (!response.ok) {
throw createProxyUnavailableError(`Failed to fetch proxy list (HTTP ${response.status})`);
}
const text = await response.text();
const proxies = parseProxyList(text);
if (!proxies.length) {
throw createProxyUnavailableError('Proxy list came back empty');
}
proxyPool = proxies;
proxyPoolExpires = Date.now() + (config.proxyScrapeRefreshMs || 10 * 60 * 1000);
proxyCursor = 0;
}
async function ensureProxyPool() {
if (!config.proxyScrapeEnabled) return;
if (proxyPool.length && Date.now() < proxyPoolExpires) {
return;
}
await hydrateProxyPool();
}
async function getProxyInfo() {
await ensureProxyPool();
if (!config.proxyScrapeEnabled || !proxyPool.length) {
return null;
}
const proxy = proxyPool[proxyCursor % proxyPool.length];
proxyCursor = (proxyCursor + 1) % proxyPool.length;
return {
proxy,
agent: new ProxyAgent(`http://${proxy}`),
};
}
async function fetchDuckDuckGoHtml(url, headers) {
const maxAttempts = config.proxyScrapeEnabled
? Math.max(1, config.proxyScrapeMaxAttempts || 5)
: 1;
let lastError = null;
for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
let proxyInfo = null;
try {
const options = { headers };
if (config.proxyScrapeEnabled) {
proxyInfo = await getProxyInfo();
if (!proxyInfo) {
throw createProxyUnavailableError('No proxies available');
}
options.dispatcher = proxyInfo.agent;
}
const response = await fetch(url, options);
if (!response.ok) {
throw new Error(`DuckDuckGo request failed (${response.status})`);
}
const html = await response.text();
return {
html,
proxy: proxyInfo?.proxy || null,
};
} catch (error) {
lastError = error;
if (!config.proxyScrapeEnabled) {
break;
}
if (proxyInfo?.proxy) {
removeProxyFromPool(proxyInfo.proxy);
}
}
}
if (config.proxyScrapeEnabled) {
throw createProxyUnavailableError(lastError?.message || 'All proxies failed');
}
throw lastError || new Error('DuckDuckGo fetch failed');
}
export async function searchWeb(query, limit = 3) {
if (!query?.trim()) {
return { results: [], proxy: null, fromCache: false };
}
const blockedTerm = await findBlockedTerm(query);
if (blockedTerm) {
throw createBlockedError(blockedTerm);
}
const cached = getCache(query);
if (cached) {
return { results: cached, proxy: 'cache', fromCache: true };
}
const params = new URLSearchParams({ q: query, kl: 'us-en' });
const headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
Accept: 'text/html',
};
let html;
let proxyLabel = null;
try {
const { html: fetchedHtml, proxy } = await fetchDuckDuckGoHtml(`https://duckduckgo.com/html/?${params.toString()}`, headers);
html = fetchedHtml;
proxyLabel = config.proxyScrapeEnabled ? proxy || 'proxy-unknown' : 'direct';
} catch (error) {
if (error?.code === 'SEARCH_PROXY_UNAVAILABLE') {
throw error;
}
console.warn('[search] DuckDuckGo request failed:', error);
return { results: [], proxy: null, fromCache: false };
}
const $ = loadHtml(html);
const results = [];
@@ -68,5 +243,21 @@ export async function searchWeb(query, limit = 3) {
});
setCache(query, results);
-return results;
return { results, proxy: proxyLabel || (config.proxyScrapeEnabled ? 'proxy-unknown' : 'direct'), fromCache: false };
}
export async function appendSearchLog({ userId, query, results, proxy }) {
try {
await fs.mkdir(path.dirname(logFile), { recursive: true });
const timestamp = new Date().toISOString();
const proxyTag = proxy || 'direct';
const lines = [
`time=${timestamp} user=${userId} proxy=${proxyTag} query=${JSON.stringify(query)}`,
...results.map((entry, idx) => ` ${idx + 1}. ${entry.title} :: ${entry.url} :: ${entry.snippet}`),
'',
];
await fs.appendFile(logFile, `${lines.join('\n')}`);
} catch (error) {
console.warn('[search] failed to append log', error);
}
}
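
Review note: `getProxyInfo` advances `proxyCursor` before a failure is known, and `removeProxyFromPool` does not adjust it afterwards, so the entry that follows a failed proxy appears to be skipped for one round. The sketch below is one possible round-robin rotation with eviction that keeps the cursor aligned. `createRotator` is a hypothetical name, and the addresses come from the reserved documentation range rather than from any real proxy list.

```javascript
// Minimal sketch (not the commit's code): round-robin rotation over a proxy
// list with eviction of failed entries. State lives in a closure here; the
// commit keeps the equivalent state in module-level variables.
function createRotator(initial) {
  const pool = [...initial];
  let cursor = 0;
  return {
    // Return the next proxy in order, or null when the pool is empty.
    next() {
      if (!pool.length) return null;
      const proxy = pool[cursor % pool.length];
      cursor = (cursor + 1) % pool.length;
      return proxy;
    },
    // Drop a failed proxy and keep the cursor on the same upcoming entry.
    evict(proxy) {
      const index = pool.indexOf(proxy);
      if (index === -1) return;
      pool.splice(index, 1);
      if (index < cursor) cursor -= 1;
      if (cursor >= pool.length) cursor = 0;
    },
    // Number of proxies still in rotation.
    size: () => pool.length,
  };
}

const rotator = createRotator(['192.0.2.1:8080', '192.0.2.2:8080', '192.0.2.3:8080']);
const first = rotator.next();  // first entry in the list
rotator.evict(first);          // pretend the request through it failed
const second = rotator.next(); // the entry right after the evicted one
```

Whether the skipped entry matters in practice depends on pool size; with the default limit of 200 proxies the effect is small, so this is a tidiness suggestion rather than a blocking issue.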