discrawl 🛰️ — Mirror Discord into SQLite; search server history locally
discrawl mirrors Discord guild data into local SQLite so you can search, inspect, and query server history without depending on Discord search.
It is a bot-token crawler. No user-token hacks. Data stays local.
What It Does
- discovers every guild the configured bot can access
- syncs channels, threads, members, and message history into SQLite
- maintains FTS5 search indexes for fast local text search
- extracts small text-like attachments into the local search index
- records structured user and role mentions for direct querying
- tails Gateway events for live updates, with periodic repair syncs
- exposes read-only SQL for ad hoc analysis
- keeps schema multi-guild ready while preserving a simple single-guild default UX
Search defaults to all guilds. sync and tail default to the configured default guild when one exists, otherwise they fan out to all discovered guilds.
Requirements
- Go
1.26+ - a Discord bot token the bot can use to read the target guilds
- bot permissions for the channels you want archived
Discord Bot Setup
discrawl needs a real bot token. Not a user token.
Minimum practical setup:
- Create or reuse a Discord application in the Discord developer portal.
- Add a bot user to that application.
- Invite the bot to the target guilds.
- Enable these intents for the bot:
Server Members IntentMessage Content Intent
- Ensure the bot can at least:
- view channels
- read message history
Without those intents/permissions, sync, tail, member snapshots, or message content archiving will be partial or fail.
Bot Token Sources
Token resolution:
- OpenClaw config, if
discord.token_sourceis notenv DISCORD_BOT_TOKENor the configureddiscord.token_env
discrawl accepts either raw token text or a value prefixed with Bot . It normalizes that automatically.
Fastest env-only path:
export DISCORD_BOT_TOKEN="your-bot-token"
bin/discrawl doctor
bin/discrawl init
If you keep shell secrets in ~/.profile, add:
export DISCORD_BOT_TOKEN="your-bot-token"
Then reload your shell before running discrawl.
If you already use OpenClaw, discrawl can reuse the Discord token from ~/.openclaw/openclaw.json by default.
Default runtime paths:
- config:
~/.discrawl/config.toml - database:
~/.discrawl/discrawl.db - cache:
~/.discrawl/cache/ - logs:
~/.discrawl/logs/
Install
Build from source:
git clone https://github.com/steipete/discrawl.git
cd discrawl
go build -o bin/discrawl ./cmd/discrawl
./bin/discrawl --version
Homebrew tap:
brew tap steipete/tap
brew install steipete/tap/discrawl
Quick Start
Reuse an existing OpenClaw Discord bot config:
bin/discrawl init --from-openclaw ~/.openclaw/openclaw.json
bin/discrawl doctor
bin/discrawl sync --full
bin/discrawl search "panic: nil pointer"
bin/discrawl tail
Env-only setup:
export DISCORD_BOT_TOKEN="..."
bin/discrawl doctor
bin/discrawl init
bin/discrawl sync --full
init discovers accessible guilds and writes ~/.discrawl/config.toml. If exactly one guild is available, that guild becomes the default automatically.
doctor is the fastest sanity check:
- confirms config can be loaded
- shows where the token was resolved from
- verifies bot auth
- shows how many guilds the bot can access
- verifies DB + FTS wiring
Commands
init
Creates the local config and discovers accessible guilds.
bin/discrawl init
bin/discrawl init --from-openclaw ~/.openclaw/openclaw.json
bin/discrawl init --guild 123456789012345678
bin/discrawl init --db ~/data/discrawl.db
sync
Backfills guild state into SQLite.
bin/discrawl sync --full
bin/discrawl sync --guild 123456789012345678
bin/discrawl sync --guilds 123,456 --concurrency 8
bin/discrawl sync --channels 111,222 --since 2026-03-01T00:00:00Z
sync already uses parallel channel workers. --concurrency overrides the default, and the default is auto-sized from GOMAXPROCS with a floor of 8 and a cap of 32.
When --channels includes a forum channel id, discrawl expands that forum's threads and syncs their messages as part of the targeted run.
tail
Runs the live Gateway tail and periodic repair loop.
bin/discrawl tail
bin/discrawl tail --guild 123456789012345678
bin/discrawl tail --repair-every 30m
search
Runs FTS search over archived messages.
bin/discrawl search "panic: nil pointer"
bin/discrawl search --guild 123456789012345678 "payment failed"
bin/discrawl search --channel billing --author steipete --limit 50 "invoice"
bin/discrawl search --include-empty "GitHub"
bin/discrawl --json search "websocket closed"
By default, search skips rows with no searchable content. Attachment text, attachment filenames, embeds, and replies still count as content. Use --include-empty to opt back in.
messages
Lists exact message slices by channel, author, and time range.
bin/discrawl messages --channel maintainers --days 7 --all
bin/discrawl messages --channel maintainers --hours 6 --all
bin/discrawl messages --channel "#maintainers" --since 2026-03-01T00:00:00Z
bin/discrawl messages --channel 1456744319972282449 --author steipete --limit 50
bin/discrawl messages --channel maintainers --last 100 --sync
bin/discrawl messages --channel maintainers --days 7 --all --include-empty
bin/discrawl --json messages --channel maintainers --days 3
Notes:
--channelaccepts a channel id, exact name,#name, or partial name match--hoursis shorthand for "since now minus N hours"--daysis shorthand for "since now minus N days"--lastreturns the newestNmatching messages, then prints them oldest-to-newest--allremoves the safety limit; default is200--syncruns a blocking pre-query sync for the matching channel or guild scope before reading the local DB- rows with no displayable/searchable content are skipped by default;
--include-emptyopts back in - at least one filter is required
mentions
Lists structured user and role mentions.
bin/discrawl mentions --channel maintainers --days 7
bin/discrawl mentions --target steipete --type user --limit 50
bin/discrawl mentions --target 1456406468898197625
bin/discrawl --json mentions --type role --days 1
Notes:
--targetaccepts an id, exact name, or partial name match--typecan beuserorrole- same guild/time filters as
messages
sql
Runs read-only SQL against the local database.
bin/discrawl sql 'select count(*) as messages from messages'
echo 'select guild_id, count(*) from messages group by guild_id' | bin/discrawl sql -
members
bin/discrawl members list
bin/discrawl members show 123456789012345678
bin/discrawl members show --messages 10 steipete
bin/discrawl members search "peter"
bin/discrawl members search "github"
Notes:
searchmatches names plus any offline profile fields present in the archived member payloadshowaccepts a user id or query; if it resolves to one member, it also shows recent messages
channels
bin/discrawl channels list
bin/discrawl channels show 123456789012345678
status
Shows local archive status.
bin/discrawl status
doctor
Checks config, auth, DB, and FTS wiring.
bin/discrawl doctor
Configuration
init writes a complete config, so most users should not hand-edit anything initially.
Typical config shape:
version = 1
default_guild_id = ""
guild_ids = []
db_path = "~/.discrawl/discrawl.db"
cache_dir = "~/.discrawl/cache"
log_dir = "~/.discrawl/logs"
[discord]
token_source = "openclaw"
openclaw_config = "~/.openclaw/openclaw.json"
account = "default"
token_env = "DISCORD_BOT_TOKEN"
[sync]
concurrency = 16
repair_every = "6h"
full_history = true
attachment_text = true
[search]
default_mode = "fts"
[search.embeddings]
enabled = false
provider = "openai"
model = "text-embedding-3-small"
api_key_env = "OPENAI_API_KEY"
batch_size = 64
The value above is an example. init writes an auto-sized default based on the host: min(32, max(8, GOMAXPROCS*2)).
Config override rules:
--configbeats everythingDISCRAWL_CONFIGoverrides the default config pathdiscord.token_source = "env"forces env-only token lookup
Embeddings
Embeddings are optional. FTS is the default search path and the primary verification target.
If enabled, embeddings are intended to enrich recall in background batches, not block the hot sync path.
export OPENAI_API_KEY="..."
bin/discrawl init --with-embeddings
bin/discrawl sync --with-embeddings
Data Stored Locally
- guild metadata
- channels and threads in one table
- current member snapshot
- canonical message rows
- append-only message event records
- FTS index rows
- optional embedding backlog metadata
Attachment binaries are not stored in SQLite.
Set sync.attachment_text = false if you want to keep attachment metadata and filenames but disable attachment body fetches for text indexing.
Security
- do not commit bot tokens or API keys
- default config lives in your home directory, not inside the repo
- CI runs secret scanning with
gitleaks doctorreports token source, not token contents
Development
Local gate:
go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.1 run
go test ./... -coverprofile=/tmp/discrawl.cover
go tool cover -func=/tmp/discrawl.cover | tail -n 1
go build ./cmd/discrawl
Target coverage is >= 80%.
CI runs:
golangci-lintgo testwith coverage threshold enforcementgo build ./cmd/discrawlgitleaksagainst git history and the working tree
Notes
- the schema is multi-guild ready even when the common UX stays single-guild simple
- threads are stored as channels because that matches the Discord model
- archived threads are part of the sync surface
- live sync is resumable; large guilds still take time because Discord rate limits history backfill
License
MIT. See LICENSE.