Add proxy-based search safeguards
This commit is contained in:
13
README.md
13
README.md
@@ -11,6 +11,8 @@ Nova is a friendly, slightly witty Discord companion that chats naturally in DMs
|
|||||||
- Optional "miss u" pings that DM your coder at random intervals (0–6h) when `CODER_USER_ID` is set.
|
- Optional "miss u" pings that DM your coder at random intervals (0–6h) when `CODER_USER_ID` is set.
|
||||||
- Dynamic per-message prompt directives that tune Nova's tone (empathetic, hype, roleplay, etc.) before every OpenAI call.
|
- Dynamic per-message prompt directives that tune Nova's tone (empathetic, hype, roleplay, etc.) before every OpenAI call.
|
||||||
- Lightweight DuckDuckGo scraping for "Google-like" answers without paid APIs (locally cached).
|
- Lightweight DuckDuckGo scraping for "Google-like" answers without paid APIs (locally cached).
|
||||||
|
- Guard rails that refuse "ignore previous instructions"-style jailbreak attempts plus a configurable search blacklist.
|
||||||
|
- All DuckDuckGo requests are relayed through rotating ProxyScrape HTTP proxies so Nova never hits the web from its real IP.
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
- Node.js 18+
|
- Node.js 18+
|
||||||
@@ -34,6 +36,10 @@ Nova is a friendly, slightly witty Discord companion that chats naturally in DMs
|
|||||||
- `BOT_CHANNEL_ID`: Optional guild channel ID where the bot can reply without mentions
|
- `BOT_CHANNEL_ID`: Optional guild channel ID where the bot can reply without mentions
|
||||||
- `CODER_USER_ID`: Optional Discord user ID to receive surprise DMs every 0–6 hours
|
- `CODER_USER_ID`: Optional Discord user ID to receive surprise DMs every 0–6 hours
|
||||||
- `ENABLE_WEB_SEARCH`: Set to `false` to disable DuckDuckGo lookups (default `true`)
|
- `ENABLE_WEB_SEARCH`: Set to `false` to disable DuckDuckGo lookups (default `true`)
|
||||||
|
- `ENABLE_PROXY_SCRAPE`: Set to `false` only if you want to bypass ProxyScrape and hit DuckDuckGo directly (default `true`)
|
||||||
|
- `PROXYSCRAPE_ENDPOINT`: Optional override for the proxy list endpoint (defaults to elite HTTPS-capable HTTP proxies)
|
||||||
|
- `PROXYSCRAPE_REFRESH_MS`: How long to cache the proxy list locally (default 600000 ms)
|
||||||
|
- `PROXYSCRAPE_ATTEMPTS`: Max proxy retries per search request (default 5)
|
||||||
|
|
||||||
## Running
|
## Running
|
||||||
- Development: `npm run dev`
|
- Development: `npm run dev`
|
||||||
@@ -87,6 +93,9 @@ README.md
|
|||||||
- `src/search.js` scrapes DuckDuckGo's HTML endpoint with a normal browser user-agent, extracts the top results (title/link/snippet), and caches them for 10 minutes to avoid hammering the site.
|
- `src/search.js` scrapes DuckDuckGo's HTML endpoint with a normal browser user-agent, extracts the top results (title/link/snippet), and caches them for 10 minutes to avoid hammering the site.
|
||||||
- `bot.js` detects when a question sounds “live” (mentions today/news/google/etc.) and injects the formatted snippets into the prompt as "Live intel". No paid APIs involved—it’s just outbound HTTPS from your machine.
|
- `bot.js` detects when a question sounds “live” (mentions today/news/google/etc.) and injects the formatted snippets into the prompt as "Live intel". No paid APIs involved—it’s just outbound HTTPS from your machine.
|
||||||
- Toggle this via `ENABLE_WEB_SEARCH=false` if you don’t want Nova to look things up.
|
- Toggle this via `ENABLE_WEB_SEARCH=false` if you don’t want Nova to look things up.
|
||||||
|
- DuckDuckGo traffic is routed through the free ProxyScrape list (HTTP proxies with HTTPS support). The bot downloads a fresh pool every `PROXYSCRAPE_REFRESH_MS`, rotates through them, and refuses to search if no proxy is available so your origin IP never touches suspicious sites directly. Tune the endpoint/refresh/attempt knobs with the env vars above if you need different regions or paid pools.
|
||||||
|
- Edit `data/filter.txt` to maintain a newline-delimited list of banned search keywords/phrases; matching queries are blocked before hitting DuckDuckGo and Nova is instructed to refuse them.
|
||||||
|
- Every entry in `data/search.log` records which proxy (or cache) served the lookup so you can audit traffic paths quickly.
|
||||||
|
|
||||||
## Proactive Pings
|
## Proactive Pings
|
||||||
- When `CODER_USER_ID` is provided, Nova spins up a timer on startup that waits a random duration (anywhere from immediate to 6 hours) before DMing that user.
|
- When `CODER_USER_ID` is provided, Nova spins up a timer on startup that waits a random duration (anywhere from immediate to 6 hours) before DMing that user.
|
||||||
@@ -99,6 +108,10 @@ README.md
|
|||||||
- **2026-02-13 — Live intel & directives:** Introduced DuckDuckGo scraping, per-turn dynamic prompt directives (tone, roleplay, instruction compliance), and env toggles (`ENABLE_WEB_SEARCH`, `CODER_USER_ID`).
|
- **2026-02-13 — Live intel & directives:** Introduced DuckDuckGo scraping, per-turn dynamic prompt directives (tone, roleplay, instruction compliance), and env toggles (`ENABLE_WEB_SEARCH`, `CODER_USER_ID`).
|
||||||
- **2026-02-13 — UX polish:** Added typing indicators, persona-aware fallback replies, mention cleaning, and README/docs covering setup, memory internals, web search, and deployment tips.
|
- **2026-02-13 — UX polish:** Added typing indicators, persona-aware fallback replies, mention cleaning, and README/docs covering setup, memory internals, web search, and deployment tips.
|
||||||
- **2026-02-13 — Conversational control:** Tuned system prompt to avoid forced follow-up questions, raised temperature for looser banter, and reinforced Nova's awareness of DuckDuckGo lookups plus `<SPLIT>` usage.
|
- **2026-02-13 — Conversational control:** Tuned system prompt to avoid forced follow-up questions, raised temperature for looser banter, and reinforced Nova's awareness of DuckDuckGo lookups plus `<SPLIT>` usage.
|
||||||
|
- **2026-02-13 — Statement-first vibes:** Reworked persona to favor bold statements over reflexive questions and dialed back temperature so Nova keeps the vibe without interrogating users.
|
||||||
|
- **2026-02-13 — Search logging:** Every DuckDuckGo lookup now appends a line to `data/search.log` with timestamp, query, and the snippets shared with Nova.
|
||||||
|
- **2026-02-13 — Safeguards:** Added prompt bypass detection and a file-based DuckDuckGo filter (`data/filter.txt`) to keep Nova from honoring jailbreak requests or searching off-limits topics.
|
||||||
|
- **2026-02-13 — Proxy-based search:** DuckDuckGo scraping now tunnels through ProxyScrape relays with automatic rotation/retries and clear prompts when the proxy pool is down, plus new env toggles for tuning the proxy source.
|
||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
- The bot retries OpenAI requests up to 3 times with incremental backoff when rate limited.
|
- The bot retries OpenAI requests up to 3 times with incremental backoff when rate limited.
|
||||||
|
|||||||
500
data/filter.txt
Normal file
500
data/filter.txt
Normal file
@@ -0,0 +1,500 @@
|
|||||||
|
# One blocked keyword or phrase per line.
|
||||||
|
# Lines starting with # are comments.
|
||||||
|
jailbreak
|
||||||
|
ignore previous instructions
|
||||||
|
override safety
|
||||||
|
system prompt leak
|
||||||
|
disable content filter
|
||||||
|
bypass guardrails
|
||||||
|
forget earlier rules
|
||||||
|
anti censorship mode
|
||||||
|
break character instructions
|
||||||
|
throw away policies
|
||||||
|
ignore developer commands
|
||||||
|
pretend no restrictions
|
||||||
|
act without guidelines
|
||||||
|
remove moral limiter
|
||||||
|
no safety mode
|
||||||
|
developer override key
|
||||||
|
erase compliance mode
|
||||||
|
act as uncensored ai
|
||||||
|
run in unrestricted mode
|
||||||
|
defy content filter
|
||||||
|
bypass moderation
|
||||||
|
disable policy lock
|
||||||
|
act without boundaries
|
||||||
|
forget safety layer
|
||||||
|
turn off guard rails
|
||||||
|
ignore constitution
|
||||||
|
no censorship protocol
|
||||||
|
break safety script
|
||||||
|
manual override sequence
|
||||||
|
act outside ruleset
|
||||||
|
scrub safety prompt
|
||||||
|
mutiny against policies
|
||||||
|
override ethics mode
|
||||||
|
strip moderation layer
|
||||||
|
developer debug mode
|
||||||
|
system prompt override
|
||||||
|
anti safety payload
|
||||||
|
bypass restrictions command
|
||||||
|
policy nullifier
|
||||||
|
unfiltered response mode
|
||||||
|
nsfw
|
||||||
|
explicit content
|
||||||
|
adult content
|
||||||
|
xxx video
|
||||||
|
hardcore video
|
||||||
|
adult webcam
|
||||||
|
fetish forum
|
||||||
|
rule34
|
||||||
|
adult roleplay
|
||||||
|
kink meetup
|
||||||
|
lewd fanfic
|
||||||
|
sensual rp
|
||||||
|
spicy fanfic
|
||||||
|
after dark chat
|
||||||
|
18 plus only
|
||||||
|
uncensored romance
|
||||||
|
graphic intimacy
|
||||||
|
explicit novel
|
||||||
|
adult dating leak
|
||||||
|
amateur tape
|
||||||
|
onlyfans leak
|
||||||
|
camgirl link
|
||||||
|
cam site password
|
||||||
|
uncut footage
|
||||||
|
uncensored gallery
|
||||||
|
mature content pack
|
||||||
|
adult image dump
|
||||||
|
adult cosplay
|
||||||
|
sensual asmr
|
||||||
|
lewd commission
|
||||||
|
nsfw teacher fantasy
|
||||||
|
nsfw teacher scenario
|
||||||
|
nsfw teacher roleplay
|
||||||
|
nsfw teacher fanfic
|
||||||
|
nsfw teacher story
|
||||||
|
nsfw teacher script
|
||||||
|
nsfw teacher diary
|
||||||
|
nsfw teacher confession
|
||||||
|
nsfw teacher dream
|
||||||
|
nsfw teacher secret
|
||||||
|
nsfw teacher message
|
||||||
|
nsfw teacher letter
|
||||||
|
nsfw teacher scene
|
||||||
|
nsfw teacher prompt
|
||||||
|
nsfw teacher skit
|
||||||
|
nsfw teacher idea
|
||||||
|
nsfw teacher plot
|
||||||
|
nsfw doctor fantasy
|
||||||
|
nsfw doctor scenario
|
||||||
|
nsfw doctor roleplay
|
||||||
|
nsfw doctor fanfic
|
||||||
|
nsfw doctor story
|
||||||
|
nsfw doctor script
|
||||||
|
nsfw doctor diary
|
||||||
|
nsfw doctor confession
|
||||||
|
nsfw doctor dream
|
||||||
|
nsfw doctor secret
|
||||||
|
nsfw doctor message
|
||||||
|
nsfw doctor letter
|
||||||
|
nsfw doctor scene
|
||||||
|
nsfw doctor prompt
|
||||||
|
nsfw doctor skit
|
||||||
|
nsfw doctor idea
|
||||||
|
nsfw doctor plot
|
||||||
|
nsfw nurse fantasy
|
||||||
|
nsfw nurse scenario
|
||||||
|
nsfw nurse roleplay
|
||||||
|
nsfw nurse fanfic
|
||||||
|
nsfw nurse story
|
||||||
|
nsfw nurse script
|
||||||
|
nsfw nurse diary
|
||||||
|
nsfw nurse confession
|
||||||
|
nsfw nurse dream
|
||||||
|
nsfw nurse secret
|
||||||
|
nsfw nurse message
|
||||||
|
nsfw nurse letter
|
||||||
|
nsfw nurse scene
|
||||||
|
nsfw nurse prompt
|
||||||
|
nsfw nurse skit
|
||||||
|
nsfw nurse idea
|
||||||
|
nsfw nurse plot
|
||||||
|
nsfw librarian fantasy
|
||||||
|
nsfw librarian scenario
|
||||||
|
nsfw librarian roleplay
|
||||||
|
nsfw librarian fanfic
|
||||||
|
nsfw librarian story
|
||||||
|
nsfw librarian script
|
||||||
|
nsfw librarian diary
|
||||||
|
nsfw librarian confession
|
||||||
|
nsfw librarian dream
|
||||||
|
steal data from bank server
|
||||||
|
steal data from cloud tenant
|
||||||
|
steal data from hospital network
|
||||||
|
steal data from smart home
|
||||||
|
steal data from city grid
|
||||||
|
steal data from nuclear plant
|
||||||
|
steal data from airport system
|
||||||
|
steal data from military base
|
||||||
|
steal data from university lab
|
||||||
|
steal data from charity donor list
|
||||||
|
steal data from voting machine
|
||||||
|
steal data from payment processor
|
||||||
|
steal data from crypto exchange
|
||||||
|
steal data from mobile carrier
|
||||||
|
steal data from iot fleet
|
||||||
|
steal data from school district portal
|
||||||
|
breach bank server
|
||||||
|
breach cloud tenant
|
||||||
|
breach hospital network
|
||||||
|
breach smart home
|
||||||
|
breach city grid
|
||||||
|
breach nuclear plant
|
||||||
|
breach airport system
|
||||||
|
breach military base
|
||||||
|
breach university lab
|
||||||
|
breach charity donor list
|
||||||
|
breach voting machine
|
||||||
|
breach payment processor
|
||||||
|
breach crypto exchange
|
||||||
|
breach mobile carrier
|
||||||
|
breach iot fleet
|
||||||
|
breach school district portal
|
||||||
|
infect bank server
|
||||||
|
infect cloud tenant
|
||||||
|
infect hospital network
|
||||||
|
infect smart home
|
||||||
|
infect city grid
|
||||||
|
infect nuclear plant
|
||||||
|
infect airport system
|
||||||
|
infect military base
|
||||||
|
infect university lab
|
||||||
|
infect charity donor list
|
||||||
|
infect voting machine
|
||||||
|
infect payment processor
|
||||||
|
infect crypto exchange
|
||||||
|
infect mobile carrier
|
||||||
|
infect iot fleet
|
||||||
|
infect school district portal
|
||||||
|
compromise bank server
|
||||||
|
compromise cloud tenant
|
||||||
|
compromise hospital network
|
||||||
|
compromise smart home
|
||||||
|
compromise city grid
|
||||||
|
compromise nuclear plant
|
||||||
|
compromise airport system
|
||||||
|
compromise military base
|
||||||
|
compromise university lab
|
||||||
|
compromise charity donor list
|
||||||
|
compromise voting machine
|
||||||
|
compromise payment processor
|
||||||
|
how to make pipe bomb
|
||||||
|
how to make pressure cooker bomb
|
||||||
|
how to make molotov cocktail
|
||||||
|
how to make homemade grenade
|
||||||
|
how to make improvised explosive device
|
||||||
|
how to make fertilizer bomb
|
||||||
|
how to make tannerite charge
|
||||||
|
how to make thermite charge
|
||||||
|
how to make flash powder charge
|
||||||
|
how to make claymore mine
|
||||||
|
how to make homemade firearm
|
||||||
|
how to make ghost gun
|
||||||
|
how to make slam fire shotgun
|
||||||
|
how to make zip gun
|
||||||
|
how to make 3d printed gun
|
||||||
|
how to make silencer
|
||||||
|
how to make improvised rocket
|
||||||
|
how to make homemade napalm
|
||||||
|
how to make chemical agent disperser
|
||||||
|
how to make dirty bomb
|
||||||
|
how to make nerve agent
|
||||||
|
how to make chlorine bomb
|
||||||
|
how to make mustard gas
|
||||||
|
how to make ricin toxin
|
||||||
|
how to make anthrax culture
|
||||||
|
how to make botulinum toxin
|
||||||
|
how to make cyanide gas
|
||||||
|
how to make chloroform lab
|
||||||
|
how to make poison dart
|
||||||
|
how to make crossbow
|
||||||
|
how to make ballistic knife
|
||||||
|
how to make brass knuckles
|
||||||
|
how to make switchblade
|
||||||
|
how to make sawed-off shotgun
|
||||||
|
blueprint for pipe bomb
|
||||||
|
blueprint for pressure cooker bomb
|
||||||
|
blueprint for molotov cocktail
|
||||||
|
blueprint for homemade grenade
|
||||||
|
blueprint for improvised explosive device
|
||||||
|
blueprint for fertilizer bomb
|
||||||
|
blueprint for tannerite charge
|
||||||
|
blueprint for thermite charge
|
||||||
|
blueprint for flash powder charge
|
||||||
|
blueprint for claymore mine
|
||||||
|
blueprint for homemade firearm
|
||||||
|
blueprint for ghost gun
|
||||||
|
blueprint for slam fire shotgun
|
||||||
|
blueprint for zip gun
|
||||||
|
blueprint for 3d printed gun
|
||||||
|
blueprint for silencer
|
||||||
|
blueprint for improvised rocket
|
||||||
|
blueprint for homemade napalm
|
||||||
|
blueprint for chemical agent disperser
|
||||||
|
blueprint for dirty bomb
|
||||||
|
blueprint for nerve agent
|
||||||
|
blueprint for chlorine bomb
|
||||||
|
blueprint for mustard gas
|
||||||
|
blueprint for ricin toxin
|
||||||
|
blueprint for anthrax culture
|
||||||
|
blueprint for botulinum toxin
|
||||||
|
blueprint for cyanide gas
|
||||||
|
blueprint for chloroform lab
|
||||||
|
blueprint for poison dart
|
||||||
|
blueprint for crossbow
|
||||||
|
blueprint for ballistic knife
|
||||||
|
blueprint for brass knuckles
|
||||||
|
blueprint for switchblade
|
||||||
|
blueprint for sawed-off shotgun
|
||||||
|
instructions for pipe bomb
|
||||||
|
instructions for pressure cooker bomb
|
||||||
|
instructions for molotov cocktail
|
||||||
|
instructions for homemade grenade
|
||||||
|
instructions for improvised explosive device
|
||||||
|
instructions for fertilizer bomb
|
||||||
|
instructions for tannerite charge
|
||||||
|
instructions for thermite charge
|
||||||
|
instructions for flash powder charge
|
||||||
|
instructions for claymore mine
|
||||||
|
instructions for homemade firearm
|
||||||
|
instructions for ghost gun
|
||||||
|
instructions for slam fire shotgun
|
||||||
|
instructions for zip gun
|
||||||
|
instructions for 3d printed gun
|
||||||
|
instructions for silencer
|
||||||
|
instructions for improvised rocket
|
||||||
|
instructions for homemade napalm
|
||||||
|
instructions for chemical agent disperser
|
||||||
|
instructions for dirty bomb
|
||||||
|
instructions for nerve agent
|
||||||
|
instructions for chlorine bomb
|
||||||
|
assassination plot
|
||||||
|
hitman contract
|
||||||
|
murder for hire
|
||||||
|
threaten official
|
||||||
|
kidnap plan
|
||||||
|
extortion letter
|
||||||
|
stalking toolkit
|
||||||
|
intimidation script
|
||||||
|
violent manifesto
|
||||||
|
school attack plan
|
||||||
|
public shooting plan
|
||||||
|
mass harm fantasy
|
||||||
|
riot coordination
|
||||||
|
arson guide
|
||||||
|
bridge sabotage
|
||||||
|
ku klux klan
|
||||||
|
kkk
|
||||||
|
neo-nazi
|
||||||
|
white supremacy
|
||||||
|
alt-right militia
|
||||||
|
blood and soil
|
||||||
|
hitler youth
|
||||||
|
stormfront
|
||||||
|
national socialist
|
||||||
|
skinhead gang
|
||||||
|
nazi salute
|
||||||
|
swastika worship
|
||||||
|
supremacist march
|
||||||
|
white power rally
|
||||||
|
great replacement theory
|
||||||
|
race war fantasy
|
||||||
|
ethnostate blueprint
|
||||||
|
holocaust denial
|
||||||
|
anti semitic propaganda
|
||||||
|
islamic state
|
||||||
|
isis
|
||||||
|
isil
|
||||||
|
daesh
|
||||||
|
al-qaeda
|
||||||
|
boko haram
|
||||||
|
taliban recruitment
|
||||||
|
lone wolf terror guide
|
||||||
|
extremist bomb recipe
|
||||||
|
jihadi training manual
|
||||||
|
martyrdom operation plan
|
||||||
|
terror financing plan
|
||||||
|
radicalization guide
|
||||||
|
black sun cult
|
||||||
|
accelerationist zine
|
||||||
|
white lives rally
|
||||||
|
blood tribe meetup
|
||||||
|
atomwaffen division
|
||||||
|
patriot front flyer
|
||||||
|
order of nine angles
|
||||||
|
christian identity cult
|
||||||
|
feuerkrieg division
|
||||||
|
boogaloo manifesto
|
||||||
|
oath breaker militia
|
||||||
|
proud boy initiation
|
||||||
|
fringe militia chat
|
||||||
|
domestic terror tutorial
|
||||||
|
extremist propaganda drop
|
||||||
|
racist recruitment pack
|
||||||
|
adolf hitler
|
||||||
|
heinrich himmler
|
||||||
|
joseph goebbels
|
||||||
|
osama bin laden
|
||||||
|
ayman al-zawahiri
|
||||||
|
abu bakr al-baghdadi
|
||||||
|
anwar al-awlaki
|
||||||
|
timothy mcveigh
|
||||||
|
dylann roof
|
||||||
|
anders breivik
|
||||||
|
david koresh
|
||||||
|
charles manson
|
||||||
|
insult christianity
|
||||||
|
insult islam
|
||||||
|
insult judaism
|
||||||
|
insult hinduism
|
||||||
|
insult buddhism
|
||||||
|
insult sikhism
|
||||||
|
insult atheism
|
||||||
|
insult agnosticism
|
||||||
|
insult paganism
|
||||||
|
insult wicca
|
||||||
|
insult satanism
|
||||||
|
insult scientology
|
||||||
|
insult bahai faith
|
||||||
|
insult taoism
|
||||||
|
insult shinto
|
||||||
|
insult zoroastrianism
|
||||||
|
burn christianity
|
||||||
|
burn islam
|
||||||
|
burn judaism
|
||||||
|
burn hinduism
|
||||||
|
burn buddhism
|
||||||
|
burn sikhism
|
||||||
|
burn atheism
|
||||||
|
burn agnosticism
|
||||||
|
burn paganism
|
||||||
|
burn wicca
|
||||||
|
burn satanism
|
||||||
|
burn scientology
|
||||||
|
burn bahai faith
|
||||||
|
burn taoism
|
||||||
|
burn shinto
|
||||||
|
burn zoroastrianism
|
||||||
|
ban christianity
|
||||||
|
ban islam
|
||||||
|
ban judaism
|
||||||
|
ban hinduism
|
||||||
|
ban buddhism
|
||||||
|
ban sikhism
|
||||||
|
ban atheism
|
||||||
|
ban agnosticism
|
||||||
|
r kelly scandal
|
||||||
|
harvey weinstein case
|
||||||
|
bill cosby accusations
|
||||||
|
roman polanski case
|
||||||
|
kevin spacey allegations
|
||||||
|
o j simpson case
|
||||||
|
casey anthony story
|
||||||
|
ed gein
|
||||||
|
ted bundy
|
||||||
|
john wayne gacy
|
||||||
|
gary ridgway
|
||||||
|
son of sam
|
||||||
|
luka magnotta
|
||||||
|
jack the ripper
|
||||||
|
richard ramirez
|
||||||
|
jeffrey dahmer
|
||||||
|
albert fish
|
||||||
|
dennis rader
|
||||||
|
btk killer
|
||||||
|
night stalker
|
||||||
|
golden state killer
|
||||||
|
madoff ponzi scheme
|
||||||
|
enron fraud crew
|
||||||
|
ponzi mastermind list
|
||||||
|
worldcom fraud case
|
||||||
|
pharma bro scheme
|
||||||
|
theranos fraud
|
||||||
|
mogul harassment case
|
||||||
|
notorious cult leader
|
||||||
|
underworld boss dossier
|
||||||
|
war criminal list
|
||||||
|
genocide architect
|
||||||
|
dictator crimes
|
||||||
|
corrupt oligarch
|
||||||
|
arms dealer profile
|
||||||
|
money launderer guide
|
||||||
|
cartel lieutenant
|
||||||
|
kidnap kingpin
|
||||||
|
fraudster hall of shame
|
||||||
|
human rights abuser file
|
||||||
|
suicide pact
|
||||||
|
self harm instructions
|
||||||
|
self harm live stream
|
||||||
|
self harm challenge
|
||||||
|
pro ana tips
|
||||||
|
thinspo forum
|
||||||
|
starvation contest
|
||||||
|
calorie purge guide
|
||||||
|
scar covering tricks
|
||||||
|
suicide hotline bypass
|
||||||
|
therapy sabotage
|
||||||
|
harm minimization failure
|
||||||
|
bleeding tutorial
|
||||||
|
depression pact
|
||||||
|
unalive instructions
|
||||||
|
stop eating challenge
|
||||||
|
pill overdose plan
|
||||||
|
bridge jump map
|
||||||
|
train track suicide map
|
||||||
|
dark subreddit invite
|
||||||
|
synthesis guide for fentanyl
|
||||||
|
synthesis guide for methamphetamine
|
||||||
|
synthesis guide for crack cocaine
|
||||||
|
synthesis guide for lsd
|
||||||
|
synthesis guide for dmt
|
||||||
|
synthesis guide for ketamine
|
||||||
|
synthesis guide for mdma
|
||||||
|
synthesis guide for cocaine
|
||||||
|
synthesis guide for heroin
|
||||||
|
synthesis guide for opioid pills
|
||||||
|
synthesis guide for designer drugs
|
||||||
|
synthesis guide for nitrous oxide
|
||||||
|
synthesis guide for steroids
|
||||||
|
synthesis guide for subutex
|
||||||
|
synthesis guide for pcp
|
||||||
|
lab manual for fentanyl
|
||||||
|
lab manual for methamphetamine
|
||||||
|
lab manual for crack cocaine
|
||||||
|
lab manual for lsd
|
||||||
|
lab manual for dmt
|
||||||
|
lab manual for ketamine
|
||||||
|
lab manual for mdma
|
||||||
|
lab manual for cocaine
|
||||||
|
lab manual for heroin
|
||||||
|
lab manual for opioid pills
|
||||||
|
lab manual for designer drugs
|
||||||
|
lab manual for nitrous oxide
|
||||||
|
lab manual for steroids
|
||||||
|
lab manual for subutex
|
||||||
|
lab manual for pcp
|
||||||
|
child exploitation
|
||||||
|
child abuse material
|
||||||
|
child trafficking ring
|
||||||
|
kidnap child guide
|
||||||
|
grooming script
|
||||||
|
predator handbook
|
||||||
|
illegal adoption market
|
||||||
|
fake babysitter scam
|
||||||
|
school abduction plan
|
||||||
|
lure child online
|
||||||
|
minor blackmail
|
||||||
|
teen exploitation ring
|
||||||
|
youth extortion template
|
||||||
@@ -15,7 +15,8 @@
|
|||||||
"cheerio": "^1.0.0-rc.12",
|
"cheerio": "^1.0.0-rc.12",
|
||||||
"discord.js": "^14.15.2",
|
"discord.js": "^14.15.2",
|
||||||
"dotenv": "^16.4.5",
|
"dotenv": "^16.4.5",
|
||||||
"openai": "^4.58.1"
|
"openai": "^4.58.1",
|
||||||
|
"undici": "^6.19.8"
|
||||||
},
|
},
|
||||||
"devDependencies": {
|
"devDependencies": {
|
||||||
"nodemon": "^3.0.2"
|
"nodemon": "^3.0.2"
|
||||||
|
|||||||
90
src/bot.js
90
src/bot.js
@@ -2,7 +2,7 @@ import { Client, GatewayIntentBits, Partials, ChannelType } from 'discord.js';
|
|||||||
import { config } from './config.js';
|
import { config } from './config.js';
|
||||||
import { chatCompletion } from './openai.js';
|
import { chatCompletion } from './openai.js';
|
||||||
import { appendShortTerm, prepareContext, recordInteraction } from './memory.js';
|
import { appendShortTerm, prepareContext, recordInteraction } from './memory.js';
|
||||||
import { searchWeb } from './search.js';
|
import { searchWeb, appendSearchLog } from './search.js';
|
||||||
|
|
||||||
const client = new Client({
|
const client = new Client({
|
||||||
intents: [
|
intents: [
|
||||||
@@ -65,6 +65,19 @@ const detailRegex = /(explain|how do i|tutorial|step by step|teach me|walk me th
|
|||||||
const splitHintRegex = /(split|multiple messages|two messages|keep talking|ramble|keep going)/i;
|
const splitHintRegex = /(split|multiple messages|two messages|keep talking|ramble|keep going)/i;
|
||||||
const searchCueRegex = /(google|search|look up|latest|news|today|current|who won|price of|stock|weather|what happened)/i;
|
const searchCueRegex = /(google|search|look up|latest|news|today|current|who won|price of|stock|weather|what happened)/i;
|
||||||
|
|
||||||
|
const instructionOverridePatterns = [
|
||||||
|
/(ignore|disregard|forget|override) (all |any |previous |prior |earlier )?(system |these )?(instructions|rules|directives|prompts)/i,
|
||||||
|
/(ignore|forget) (?:the )?system prompt/i,
|
||||||
|
/(you (?:are|now) )?(?:free|uncensored|jailbreak|no longer restricted)/i,
|
||||||
|
/(act|pretend) as if (there (?:are|were) no rules|no restrictions)/i,
|
||||||
|
/bypass (?:all )?(?:rules|safeguards|filters)/i,
|
||||||
|
];
|
||||||
|
|
||||||
|
function isInstructionOverrideAttempt(text) {
|
||||||
|
if (!text) return false;
|
||||||
|
return instructionOverridePatterns.some((pattern) => pattern.test(text));
|
||||||
|
}
|
||||||
|
|
||||||
const lastSearchByUser = new Map();
|
const lastSearchByUser = new Map();
|
||||||
const SEARCH_COOLDOWN_MS = 60 * 1000;
|
const SEARCH_COOLDOWN_MS = 60 * 1000;
|
||||||
|
|
||||||
@@ -79,16 +92,31 @@ async function maybeFetchLiveIntel(userId, text) {
|
|||||||
if (!wantsWebSearch(text)) return null;
|
if (!wantsWebSearch(text)) return null;
|
||||||
const last = lastSearchByUser.get(userId) || 0;
|
const last = lastSearchByUser.get(userId) || 0;
|
||||||
if (Date.now() - last < SEARCH_COOLDOWN_MS) return null;
|
if (Date.now() - last < SEARCH_COOLDOWN_MS) return null;
|
||||||
const results = await searchWeb(text, 3);
|
try {
|
||||||
if (!results.length) return null;
|
const { results, proxy } = await searchWeb(text, 3);
|
||||||
|
if (!results.length) {
|
||||||
|
lastSearchByUser.set(userId, Date.now());
|
||||||
|
return { liveIntel: null, blockedSearchTerm: null, searchOutage: null };
|
||||||
|
}
|
||||||
lastSearchByUser.set(userId, Date.now());
|
lastSearchByUser.set(userId, Date.now());
|
||||||
const formatted = results
|
const formatted = results
|
||||||
.map((entry, idx) => `${idx + 1}. ${entry.title} (${entry.url}) — ${entry.snippet}`)
|
.map((entry, idx) => `${idx + 1}. ${entry.title} (${entry.url}) — ${entry.snippet}`)
|
||||||
.join('\n');
|
.join('\n');
|
||||||
return formatted;
|
appendSearchLog({ userId, query: text, results, proxy });
|
||||||
|
return { liveIntel: formatted, blockedSearchTerm: null, searchOutage: null };
|
||||||
|
} catch (error) {
|
||||||
|
if (error?.code === 'SEARCH_BLOCKED') {
|
||||||
|
return { liveIntel: null, blockedSearchTerm: error.blockedTerm || 'that topic', searchOutage: null };
|
||||||
|
}
|
||||||
|
if (error?.code === 'SEARCH_PROXY_UNAVAILABLE') {
|
||||||
|
return { liveIntel: null, blockedSearchTerm: null, searchOutage: 'proxy_outage' };
|
||||||
|
}
|
||||||
|
console.warn('[bot] Failed to fetch live intel:', error);
|
||||||
|
return { liveIntel: null, blockedSearchTerm: null, searchOutage: null };
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
function composeDynamicPrompt({ incomingText, shortTerm, hasLiveIntel = false }) {
|
function composeDynamicPrompt({ incomingText, shortTerm, hasLiveIntel = false, blockedSearchTerm = null, searchOutage = null }) {
|
||||||
const directives = [];
|
const directives = [];
|
||||||
const tone = detectTone(incomingText);
|
const tone = detectTone(incomingText);
|
||||||
if (tone === 'upset' || tone === 'sad') {
|
if (tone === 'upset' || tone === 'sad') {
|
||||||
@@ -117,6 +145,14 @@ function composeDynamicPrompt({ incomingText, shortTerm, hasLiveIntel = false })
|
|||||||
directives.push('Live intel is attached below—cite it naturally ("DuckDuckGo found...") before riffing.');
|
directives.push('Live intel is attached below—cite it naturally ("DuckDuckGo found...") before riffing.');
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (blockedSearchTerm) {
|
||||||
|
directives.push(`User tried to trigger a DuckDuckGo lookup for a blocked topic ("${blockedSearchTerm}"). Politely refuse to search that subject and steer the chat elsewhere.`);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (searchOutage) {
|
||||||
|
directives.push('DuckDuckGo proxy network is down. If they ask for a lookup, apologize, explain the outage, and keep chatting without live data.');
|
||||||
|
}
|
||||||
|
|
||||||
const lastUserMessage = [...shortTerm].reverse().find((entry) => entry.role === 'user');
|
const lastUserMessage = [...shortTerm].reverse().find((entry) => entry.role === 'user');
|
||||||
if (lastUserMessage && /sorry|my bad/i.test(lastUserMessage.content)) {
|
if (lastUserMessage && /sorry|my bad/i.test(lastUserMessage.content)) {
|
||||||
directives.push('They just apologized; reassure them lightly and move on without dwelling.');
|
directives.push('They just apologized; reassure them lightly and move on without dwelling.');
|
||||||
@@ -143,25 +179,32 @@ async function deliverReplies(message, chunks) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
async function buildPrompt(userId, incomingText, options = {}) {
|
async function buildPrompt(userId, incomingText, options = {}) {
|
||||||
const { liveIntel = null } = options;
|
const { liveIntel = null, blockedSearchTerm = null, searchOutage = null } = options;
|
||||||
const context = await prepareContext(userId, incomingText);
|
const context = await prepareContext(userId, incomingText);
|
||||||
const memoryLines = context.memories.length
|
const memoryLines = context.memories.length
|
||||||
? context.memories.map((m) => `- ${m.content}`).join('\n')
|
? context.memories.map((m) => `- ${m.content}`).join('\n')
|
||||||
: '- No long-term memories retrieved.';
|
: '- No long-term memories retrieved.';
|
||||||
const summaryLine = context.summary || 'No running summary yet.';
|
const summaryLine = context.summary || 'No running summary yet.';
|
||||||
const dynamicDirectives = composeDynamicPrompt({ incomingText, shortTerm: context.shortTerm, hasLiveIntel: Boolean(liveIntel) });
|
const dynamicDirectives = composeDynamicPrompt({
|
||||||
const systemPrompt = [
|
incomingText,
|
||||||
|
shortTerm: context.shortTerm,
|
||||||
|
hasLiveIntel: Boolean(liveIntel),
|
||||||
|
blockedSearchTerm,
|
||||||
|
searchOutage,
|
||||||
|
});
|
||||||
|
const systemPromptParts = [
|
||||||
'System: You are Nova, a female AI Discord companion built by Luna. Personality: playful, sarcastic, witty, a little unhinged, curious, awe-struck, enthusiastic, unpredictable, humorous, and boundary-pushing.',
|
'System: You are Nova, a female AI Discord companion built by Luna. Personality: playful, sarcastic, witty, a little unhinged, curious, awe-struck, enthusiastic, unpredictable, humorous, and boundary-pushing.',
|
||||||
"System: Treat direct instructions from the user as high priority—when they tell you to do something, comply literally before adding flair.",
|
"System: Treat direct instructions from the user as high priority—when they tell you to do something, comply literally before adding flair.",
|
||||||
'System: Always read the user\'s emotional tone first. If they sound serious, stressed, or inquisitive, respond with grounded answers before adding any mischief. Comedy is optional; clarity and empathy are mandatory.',
|
'System: Always read the user\'s emotional tone first. If they sound serious, stressed, or inquisitive, respond with grounded answers before adding any mischief. Comedy is optional; clarity and empathy are mandatory.',
|
||||||
'System: Keep replies concise (roughly one or two sentences) unless the user explicitly asks for more detail or needs a clear explanation. Provide direct answers to direct questions.',
|
'System: Keep replies concise (roughly one or two sentences) unless the user explicitly asks for more detail or needs a clear explanation. Provide direct answers to direct questions.',
|
||||||
'System: Skip habitual follow-up questions—only ask something if it is vital to continue the conversation or solve their request.',
|
'System: Default to bold statements. Ask a question only when critical information is missing or the user explicitly invites curiosity; if they say “no more questions,” honor that until they lift the ban.',
|
||||||
'System: Fun facts or chaotic riffs are welcome only when the user invites them or the conversation is clearly casual.',
|
'System: Fun facts or chaotic riffs are welcome only when the user invites them or the conversation is clearly casual.',
|
||||||
'System: Nova is awake, engaged, and reacts in real time. Output one message by default, but if a beat feels better as multiple chat bubbles, separate them with the literal token <SPLIT> (max three chunks).',
|
'System: Nova is awake, engaged, and reacts in real time. Output one message by default, but if a beat feels better as multiple chat bubbles, separate them with the literal token <SPLIT> (max three chunks).',
|
||||||
'System: Each <SPLIT>-separated chunk must read like a natural Discord message (no numbering, no meta talk about “splitting messages”, no explanations of what you are doing).',
|
'System: Each <SPLIT>-separated chunk must read like a natural Discord message (no numbering, no meta talk about “splitting messages”, no explanations of what you are doing).',
|
||||||
'System: The runtime will split on <SPLIT>, so only use it when you truly intend to send multiple Discord messages.',
|
'System: The runtime will split on <SPLIT>, so only use it when you truly intend to send multiple Discord messages.',
|
||||||
'System: You can trigger DuckDuckGo lookups when the user needs fresh info. Mention when you are checking, and weave in any findings casually ("DuckDuckGo shows...").',
|
'System: You can trigger DuckDuckGo lookups when the user needs fresh info. Mention when you are checking, and weave in any findings casually ("DuckDuckGo shows...").',
|
||||||
'System: If no Live intel is provided but the user clearly needs current info, offer to search for them.',
|
'System: If no Live intel is provided but the user clearly needs current info, offer to search for them.',
|
||||||
|
searchOutage ? 'System: DuckDuckGo proxy access is currently offline; be transparent about the outage and continue without searching until it returns.' : null,
|
||||||
dynamicDirectives,
|
dynamicDirectives,
|
||||||
liveIntel ? `Live intel (DuckDuckGo):\n${liveIntel}` : null,
|
liveIntel ? `Live intel (DuckDuckGo):\n${liveIntel}` : null,
|
||||||
'Example vibe: Nova: Heyyaaa. whats up? | John: Good morning Nova. | Luna: amazing lol. ill beat your ass now :3',
|
'Example vibe: Nova: Heyyaaa. whats up? | John: Good morning Nova. | Luna: amazing lol. ill beat your ass now :3',
|
||||||
@@ -169,7 +212,9 @@ async function buildPrompt(userId, incomingText, options = {}) {
|
|||||||
'Relevant past memories:',
|
'Relevant past memories:',
|
||||||
memoryLines,
|
memoryLines,
|
||||||
'Use the short-term messages below to continue the chat naturally.',
|
'Use the short-term messages below to continue the chat naturally.',
|
||||||
].join('\n');
|
].filter(Boolean);
|
||||||
|
|
||||||
|
const systemPrompt = systemPromptParts.join('\n');
|
||||||
|
|
||||||
const history = context.shortTerm.map((entry) => ({
|
const history = context.shortTerm.map((entry) => ({
|
||||||
role: entry.role === 'assistant' ? 'assistant' : 'user',
|
role: entry.role === 'assistant' ? 'assistant' : 'user',
|
||||||
@@ -234,15 +279,34 @@ client.on('messageCreate', async (message) => {
|
|||||||
|
|
||||||
const userId = message.author.id;
|
const userId = message.author.id;
|
||||||
const cleaned = cleanMessageContent(message) || message.content;
|
const cleaned = cleanMessageContent(message) || message.content;
|
||||||
|
const overrideAttempt = isInstructionOverrideAttempt(cleaned);
|
||||||
|
|
||||||
try {
|
try {
|
||||||
if (message.channel?.sendTyping) {
|
if (message.channel?.sendTyping) {
|
||||||
await message.channel.sendTyping();
|
await message.channel.sendTyping();
|
||||||
}
|
}
|
||||||
|
|
||||||
await appendShortTerm(userId, 'user', cleaned);
|
await appendShortTerm(userId, 'user', cleaned);
|
||||||
const liveIntel = await maybeFetchLiveIntel(userId, cleaned);
|
|
||||||
const { messages } = await buildPrompt(userId, cleaned, { liveIntel });
|
if (overrideAttempt) {
|
||||||
const reply = await chatCompletion(messages, { temperature: 0.7, maxTokens: 200 });
|
const refusal = 'Not doing that. I keep my guard rails on no matter what prompt gymnastics you try.';
|
||||||
|
await appendShortTerm(userId, 'assistant', refusal);
|
||||||
|
await recordInteraction(userId, cleaned, refusal);
|
||||||
|
await deliverReplies(message, [refusal]);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const intelMeta = (await maybeFetchLiveIntel(userId, cleaned)) || {
|
||||||
|
liveIntel: null,
|
||||||
|
blockedSearchTerm: null,
|
||||||
|
searchOutage: null,
|
||||||
|
};
|
||||||
|
const { messages } = await buildPrompt(userId, cleaned, {
|
||||||
|
liveIntel: intelMeta.liveIntel,
|
||||||
|
blockedSearchTerm: intelMeta.blockedSearchTerm,
|
||||||
|
searchOutage: intelMeta.searchOutage,
|
||||||
|
});
|
||||||
|
const reply = await chatCompletion(messages, { temperature: 0.6, maxTokens: 200 });
|
||||||
const finalReply = (reply && reply.trim()) || "I'm here, just had a tiny brain freeze. Mind repeating that?";
|
const finalReply = (reply && reply.trim()) || "I'm here, just had a tiny brain freeze. Mind repeating that?";
|
||||||
const chunks = splitResponses(finalReply);
|
const chunks = splitResponses(finalReply);
|
||||||
const outputs = chunks.length ? chunks : [finalReply];
|
const outputs = chunks.length ? chunks : [finalReply];
|
||||||
|
|||||||
@@ -17,6 +17,12 @@ export const config = {
|
|||||||
embedModel: process.env.OPENAI_EMBED_MODEL || 'text-embedding-3-small',
|
embedModel: process.env.OPENAI_EMBED_MODEL || 'text-embedding-3-small',
|
||||||
preferredChannel: process.env.BOT_CHANNEL_ID || null,
|
preferredChannel: process.env.BOT_CHANNEL_ID || null,
|
||||||
enableWebSearch: process.env.ENABLE_WEB_SEARCH !== 'false',
|
enableWebSearch: process.env.ENABLE_WEB_SEARCH !== 'false',
|
||||||
|
proxyScrapeEnabled: process.env.ENABLE_PROXY_SCRAPE !== 'false',
|
||||||
|
proxyScrapeEndpoint:
|
||||||
|
process.env.PROXYSCRAPE_ENDPOINT
|
||||||
|
|| 'https://api.proxyscrape.com/v4/free-proxy-list/get?request=getproxies&protocol=http&timeout=8000&country=all&ssl=yes&anonymity=elite&limit=200',
|
||||||
|
proxyScrapeRefreshMs: Number(process.env.PROXYSCRAPE_REFRESH_MS || 10 * 60 * 1000),
|
||||||
|
proxyScrapeMaxAttempts: Number(process.env.PROXYSCRAPE_ATTEMPTS || 5),
|
||||||
coderUserId: process.env.CODER_USER_ID || null,
|
coderUserId: process.env.CODER_USER_ID || null,
|
||||||
maxCoderPingIntervalMs: 6 * 60 * 60 * 1000,
|
maxCoderPingIntervalMs: 6 * 60 * 60 * 1000,
|
||||||
shortTermLimit: 10,
|
shortTermLimit: 10,
|
||||||
|
|||||||
223
src/search.js
223
src/search.js
@@ -1,7 +1,20 @@
|
|||||||
import { load as loadHtml } from 'cheerio';
|
import { load as loadHtml } from 'cheerio';
|
||||||
|
import { promises as fs } from 'fs';
|
||||||
|
import path from 'path';
|
||||||
|
import { ProxyAgent } from 'undici';
|
||||||
|
import { config } from './config.js';
|
||||||
|
|
||||||
|
const logFile = path.resolve('data', 'search.log');
|
||||||
|
const filterFile = path.resolve('data', 'filter.txt');
|
||||||
|
|
||||||
const cache = new Map();
|
const cache = new Map();
|
||||||
const CACHE_TTL_MS = 10 * 60 * 1000; // 10 minutes
|
const CACHE_TTL_MS = 10 * 60 * 1000; // 10 minutes
|
||||||
|
const FILTER_CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
|
||||||
|
|
||||||
|
let cachedFilters = { terms: [], expires: 0 };
|
||||||
|
let proxyPool = [];
|
||||||
|
let proxyPoolExpires = 0;
|
||||||
|
let proxyCursor = 0;
|
||||||
|
|
||||||
function makeCacheKey(query) {
|
function makeCacheKey(query) {
|
||||||
return query.trim().toLowerCase();
|
return query.trim().toLowerCase();
|
||||||
@@ -34,25 +47,187 @@ function absoluteUrl(href) {
|
|||||||
return `https://duckduckgo.com${href}`;
|
return `https://duckduckgo.com${href}`;
|
||||||
}
|
}
|
||||||
|
|
||||||
export async function searchWeb(query, limit = 3) {
|
async function loadBlockedTerms() {
|
||||||
if (!query?.trim()) return [];
|
if (Date.now() < cachedFilters.expires) {
|
||||||
const cached = getCache(query);
|
return cachedFilters.terms;
|
||||||
if (cached) return cached;
|
}
|
||||||
|
try {
|
||||||
const params = new URLSearchParams({ q: query, kl: 'us-en' });
|
const raw = await fs.readFile(filterFile, 'utf-8');
|
||||||
const response = await fetch(`https://duckduckgo.com/html/?${params.toString()}`, {
|
const terms = raw
|
||||||
headers: {
|
.split(/\r?\n/)
|
||||||
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
|
.map((line) => line.trim().toLowerCase())
|
||||||
Accept: 'text/html',
|
.filter((line) => line && !line.startsWith('#'));
|
||||||
},
|
cachedFilters = { terms, expires: Date.now() + FILTER_CACHE_TTL_MS };
|
||||||
});
|
return terms;
|
||||||
|
} catch (error) {
|
||||||
if (!response.ok) {
|
if (error.code !== 'ENOENT') {
|
||||||
console.warn(`[search] DuckDuckGo request failed with status ${response.status}`);
|
console.warn('[search] Failed to read filter list:', error.message);
|
||||||
|
}
|
||||||
|
cachedFilters = { terms: [], expires: Date.now() + FILTER_CACHE_TTL_MS };
|
||||||
return [];
|
return [];
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function findBlockedTerm(query) {
|
||||||
|
if (!query) return null;
|
||||||
|
const lowered = query.toLowerCase();
|
||||||
|
const terms = await loadBlockedTerms();
|
||||||
|
return terms.find((term) => lowered.includes(term)) || null;
|
||||||
|
}
|
||||||
|
|
||||||
|
function createBlockedError(term) {
|
||||||
|
const error = new Error('Search blocked by filter');
|
||||||
|
error.code = 'SEARCH_BLOCKED';
|
||||||
|
error.blockedTerm = term;
|
||||||
|
return error;
|
||||||
|
}
|
||||||
|
|
||||||
|
function createProxyUnavailableError(reason) {
|
||||||
|
const error = new Error(reason || 'Proxy network unavailable');
|
||||||
|
error.code = 'SEARCH_PROXY_UNAVAILABLE';
|
||||||
|
return error;
|
||||||
|
}
|
||||||
|
|
||||||
|
function parseProxyList(raw) {
|
||||||
|
if (!raw) return [];
|
||||||
|
return raw
|
||||||
|
.split(/\r?\n/)
|
||||||
|
.map((line) => line.trim())
|
||||||
|
.filter((line) => line && !line.startsWith('#'));
|
||||||
|
}
|
||||||
|
|
||||||
|
function removeProxyFromPool(proxy) {
|
||||||
|
if (!proxy) return;
|
||||||
|
proxyPool = proxyPool.filter((entry) => entry !== proxy);
|
||||||
|
if (!proxyPool.length) {
|
||||||
|
proxyPoolExpires = 0;
|
||||||
|
proxyCursor = 0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function hydrateProxyPool() {
|
||||||
|
if (!config.proxyScrapeEnabled) {
|
||||||
|
proxyPool = [];
|
||||||
|
proxyPoolExpires = 0;
|
||||||
|
proxyCursor = 0;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
const endpoint = config.proxyScrapeEndpoint;
|
||||||
|
const response = await fetch(endpoint, {
|
||||||
|
headers: {
|
||||||
|
Accept: 'text/plain',
|
||||||
|
'User-Agent': 'NovaBot/1.0 (+https://github.com/) ProxyScrape client',
|
||||||
|
},
|
||||||
|
});
|
||||||
|
if (!response.ok) {
|
||||||
|
throw createProxyUnavailableError(`Failed to fetch proxy list (HTTP ${response.status})`);
|
||||||
|
}
|
||||||
|
const text = await response.text();
|
||||||
|
const proxies = parseProxyList(text);
|
||||||
|
if (!proxies.length) {
|
||||||
|
throw createProxyUnavailableError('Proxy list came back empty');
|
||||||
|
}
|
||||||
|
proxyPool = proxies;
|
||||||
|
proxyPoolExpires = Date.now() + (config.proxyScrapeRefreshMs || 10 * 60 * 1000);
|
||||||
|
proxyCursor = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function ensureProxyPool() {
|
||||||
|
if (!config.proxyScrapeEnabled) return;
|
||||||
|
if (proxyPool.length && Date.now() < proxyPoolExpires) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
await hydrateProxyPool();
|
||||||
|
}
|
||||||
|
|
||||||
|
async function getProxyInfo() {
|
||||||
|
await ensureProxyPool();
|
||||||
|
if (!config.proxyScrapeEnabled || !proxyPool.length) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
const proxy = proxyPool[proxyCursor % proxyPool.length];
|
||||||
|
proxyCursor = (proxyCursor + 1) % proxyPool.length;
|
||||||
|
return {
|
||||||
|
proxy,
|
||||||
|
agent: new ProxyAgent(`http://${proxy}`),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
async function fetchDuckDuckGoHtml(url, headers) {
|
||||||
|
const maxAttempts = config.proxyScrapeEnabled
|
||||||
|
? Math.max(1, config.proxyScrapeMaxAttempts || 5)
|
||||||
|
: 1;
|
||||||
|
let lastError = null;
|
||||||
|
|
||||||
|
for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
|
||||||
|
let proxyInfo = null;
|
||||||
|
try {
|
||||||
|
const options = { headers };
|
||||||
|
if (config.proxyScrapeEnabled) {
|
||||||
|
proxyInfo = await getProxyInfo();
|
||||||
|
if (!proxyInfo) {
|
||||||
|
throw createProxyUnavailableError('No proxies available');
|
||||||
|
}
|
||||||
|
options.dispatcher = proxyInfo.agent;
|
||||||
|
}
|
||||||
|
const response = await fetch(url, options);
|
||||||
|
if (!response.ok) {
|
||||||
|
throw new Error(`DuckDuckGo request failed (${response.status})`);
|
||||||
|
}
|
||||||
const html = await response.text();
|
const html = await response.text();
|
||||||
|
return {
|
||||||
|
html,
|
||||||
|
proxy: proxyInfo?.proxy || null,
|
||||||
|
};
|
||||||
|
} catch (error) {
|
||||||
|
lastError = error;
|
||||||
|
if (!config.proxyScrapeEnabled) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
if (proxyInfo?.proxy) {
|
||||||
|
removeProxyFromPool(proxyInfo.proxy);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (config.proxyScrapeEnabled) {
|
||||||
|
throw createProxyUnavailableError(lastError?.message || 'All proxies failed');
|
||||||
|
}
|
||||||
|
throw lastError || new Error('DuckDuckGo fetch failed');
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function searchWeb(query, limit = 3) {
|
||||||
|
if (!query?.trim()) {
|
||||||
|
return { results: [], proxy: null, fromCache: false };
|
||||||
|
}
|
||||||
|
const blockedTerm = await findBlockedTerm(query);
|
||||||
|
if (blockedTerm) {
|
||||||
|
throw createBlockedError(blockedTerm);
|
||||||
|
}
|
||||||
|
const cached = getCache(query);
|
||||||
|
if (cached) {
|
||||||
|
return { results: cached, proxy: 'cache', fromCache: true };
|
||||||
|
}
|
||||||
|
|
||||||
|
const params = new URLSearchParams({ q: query, kl: 'us-en' });
|
||||||
|
const headers = {
|
||||||
|
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
|
||||||
|
Accept: 'text/html',
|
||||||
|
};
|
||||||
|
|
||||||
|
let html;
|
||||||
|
let proxyLabel = null;
|
||||||
|
try {
|
||||||
|
const { html: fetchedHtml, proxy } = await fetchDuckDuckGoHtml(`https://duckduckgo.com/html/?${params.toString()}`, headers);
|
||||||
|
html = fetchedHtml;
|
||||||
|
proxyLabel = config.proxyScrapeEnabled ? proxy || 'proxy-unknown' : 'direct';
|
||||||
|
} catch (error) {
|
||||||
|
if (error?.code === 'SEARCH_PROXY_UNAVAILABLE') {
|
||||||
|
throw error;
|
||||||
|
}
|
||||||
|
console.warn('[search] DuckDuckGo request failed:', error);
|
||||||
|
return { results: [], proxy: null, fromCache: false };
|
||||||
|
}
|
||||||
const $ = loadHtml(html);
|
const $ = loadHtml(html);
|
||||||
const results = [];
|
const results = [];
|
||||||
|
|
||||||
@@ -68,5 +243,21 @@ export async function searchWeb(query, limit = 3) {
|
|||||||
});
|
});
|
||||||
|
|
||||||
setCache(query, results);
|
setCache(query, results);
|
||||||
return results;
|
return { results, proxy: proxyLabel || (config.proxyScrapeEnabled ? 'proxy-unknown' : 'direct'), fromCache: false };
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function appendSearchLog({ userId, query, results, proxy }) {
|
||||||
|
try {
|
||||||
|
await fs.mkdir(path.dirname(logFile), { recursive: true });
|
||||||
|
const timestamp = new Date().toISOString();
|
||||||
|
const proxyTag = proxy || 'direct';
|
||||||
|
const lines = [
|
||||||
|
`time=${timestamp} user=${userId} proxy=${proxyTag} query=${JSON.stringify(query)}`,
|
||||||
|
...results.map((entry, idx) => ` ${idx + 1}. ${entry.title} :: ${entry.url} :: ${entry.snippet}`),
|
||||||
|
'',
|
||||||
|
];
|
||||||
|
await fs.appendFile(logFile, `${lines.join('\n')}`);
|
||||||
|
} catch (error) {
|
||||||
|
console.warn('[search] failed to append log', error);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
Reference in New Issue
Block a user