Built over 30 crypto platforms across 12 countries
Every time a client comes to us asking for an "AI trading bot", the first question we ask is: what kind of AI? Because the answer determines whether the system will actually work or just look impressive in a pitch deck.
Crypto markets combine properties that defeat most conventional approaches simultaneously. They operate 24/7 with no circuit breakers. They react violently to unstructured signals (a regulatory tweet, a whale wallet movement, a macroeconomic print) that pure numerical models simply cannot see. And they switch market regimes without warning: a strategy that achieves 60% accuracy in a trending market achieves only 41% in a sideways one. Apply the wrong model to the wrong regime and you lose money with high confidence.
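The three approaches clients typically arrive with each have a structural blind spot: pure ML models cannot process unstructured signals such as regulatory news, sentiment shifts, or whale wallet movements; pure LLM systems have no memory between calls and no mechanism to learn from outcomes over time; and pure rule-based systems do not adapt when market regimes change.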
The architecture we designed solves all three weaknesses at once — not by picking the best of the three, but by combining them so each compensates for the others' limitations.
A client approached us with a clear goal: build a decision-support platform that generates BTC and ETH long/short signals with full reasoning transparency, tracks every decision's outcome, and gets measurably better over time. Not an auto-trader — a research-grade signal engine for internal use by a trading team.
The constraint was equally clear: deliver a working POC in 4–6 weeks, operate entirely in paper-trading mode, and provide honest accuracy metrics validated against real historical data — not the inflated numbers you see in most backtest reports.
This is that system. Below is the architecture as we designed and built it — layer by layer, decision by decision.
The data layer is where most AI trading projects quietly fail. Teams reach for MongoDB or Redis because they're familiar, then spend months wrestling with time-series queries that SQL handles natively. We made a different choice early, and it saved weeks of engineering time downstream.
A single PostgreSQL 16 instance with two extensions covers every data workload without operational fragmentation:
| Extension | Role | Key Capability |
| --- | --- | --- |
| TimescaleDB | Time-series workloads | Hypertables with automatic time partitioning; continuous aggregates for pre-computed rollups; 10–20× disk compression on historical data; 1M+ inserts/sec; sub-millisecond aggregation on years of OHLCV data |
| pgvector | Semantic / similarity search | Historical pattern retrieval (find past market situations similar to current); agent memory (retrieve past decisions and outcomes before generating a new signal); news deduplication and clustering |
The case against NoSQL here is straightforward: market data is fundamentally relational and time-ordered. The operations you need most — time-bucketed aggregations, multi-table JOINs on timestamp, window functions for technical indicators — are SQL-native. TimescaleDB outperforms document-oriented stores on this exact workload by a wide margin. NoSQL wins on schema-flexible documents and eventual consistency, neither of which applies here.
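To make "time-bucketed aggregation" concrete, here is a minimal sketch of rolling 1-hour candles up into 4-hour candles. The SQL string uses TimescaleDB's `time_bucket`, `first`, and `last` functions against a hypothetical `candles` table (the table and column names are our assumption for illustration). The pure-Python function below it performs the same rollup in memory so the logic can be checked without a database.

```python
from datetime import datetime, timedelta, timezone

# Reference query only. The `candles` table and its columns are assumed names.
FOUR_HOUR_ROLLUP_SQL = """
SELECT time_bucket('4 hours', ts) AS bucket,
       first(open, ts) AS open,
       max(high)       AS high,
       min(low)        AS low,
       last(close, ts) AS close,
       sum(volume)     AS volume
FROM candles
WHERE symbol = %s
GROUP BY bucket
ORDER BY bucket;
"""


def bucket_start(ts, hours=4):
    """Floor a timestamp to the start of its N-hour bucket (N must divide 24)."""
    return ts.replace(hour=ts.hour - ts.hour % hours,
                      minute=0, second=0, microsecond=0)


def rollup(candles, hours=4):
    """Aggregate (ts, open, high, low, close, volume) rows into N-hour candles."""
    buckets = {}
    for ts, o, h, l, c, v in sorted(candles):
        key = bucket_start(ts, hours)
        if key not in buckets:
            buckets[key] = [o, h, l, c, v]  # first row sets the open
        else:
            b = buckets[key]
            b[1] = max(b[1], h)   # highest high
            b[2] = min(b[2], l)   # lowest low
            b[3] = c              # latest close wins
            b[4] += v             # volume accumulates
    return [(k, *vals) for k, vals in sorted(buckets.items())]
```

In a deployment, the same rollup would more likely live in a continuous aggregate so it is pre-computed rather than recalculated on each query.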
The case for pgvector is subtler. LLM agents have no built-in memory between API calls. Without vector search, every signal decision is made from scratch. With it, the Synthesizer agent can query: "Has this market configuration appeared before? What happened next?" — turning the entire historical dataset into queryable agent memory. That capability is impossible with time-series storage alone.
Data sources feeding the layer in real time:
| Data Type | Source | Feed |
| --- | --- | --- |
| Price data (OHLCV) | Binance + Bybit | WebSocket API, 1-hour candles, free tier |
| On-chain metrics | Glassnode + CryptoQuant | Exchange flows, whale transactions, stablecoin supply changes, funding rates |
| Social signals | LunarCrush + Santiment | Sentiment scores, social volume, engagement metrics |
| News | CryptoPanic | Categorized crypto news with importance ranking |
| Macro data | Yahoo Finance | DXY, S&P 500, gold, VIX for risk-on/risk-off context |
Understanding how crypto trading platforms handle data at scale is essential before designing any signal system on top of exchange infrastructure — the same principles of partitioning, replication, and latency management apply.
A single LLM agent trying to do everything — technical analysis, sentiment, on-chain, macro, news — produces mediocre outputs in all domains. The architecture that works is specialization: six agents, each with a narrow domain, each returning a structured confidence score with explicit reasoning.
| Agent | Domain | Input | Note |
| --- | --- | --- | --- |
| Technical | Price action | Pre-computed RSI, MACD, Bollinger Bands, EMAs, volume profiles | Indicators calculated in Python via pandas-ta — not by the LLM — for precision and speed |
| Sentiment | Social signals | Top crypto Twitter accounts, Reddit hot threads | Detects extreme sentiment states as contrarian signals |
| On-Chain | Institutional behavior | Exchange netflows, whale movements, stablecoin supply, funding rates, open interest changes | Best signal of smart money positioning |
| News | Event classification | Regulatory decisions, hacks, ETF news, macro announcements | Uses pgvector to deduplicate against recent stories before scoring |
| Macro | Risk environment | Dollar strength, equity correlation, volatility regime | Critical for filtering signals during macro stress events |
| Synthesizer | Final signal | All five agent outputs + ML predictions + similar historical situations from vector store + current accuracy weights per agent | Issues the weighted long/short signal with full reasoning chain |
The Synthesizer is where the architecture becomes more than the sum of its parts. It doesn't average the agents — it weights them dynamically based on their demonstrated accuracy in the current market regime. An agent that performs well in trending markets gets downweighted automatically when the Regime Classifier determines we've entered a sideways phase. This is the mechanism that prevents the system from applying trending-market strategies to a ranging market.
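The weighting idea can be sketched in a few lines. This is an illustration under our own assumptions, not the production Synthesizer: the `AgentOutput` fields, the "accuracy minus 0.5" weight formula, and the decision threshold are all hypothetical choices made to show how regime-specific accuracy can change the outcome of the same set of agent votes.

```python
from dataclasses import dataclass


@dataclass
class AgentOutput:
    agent: str
    direction: int      # +1 long, -1 short, 0 neutral
    confidence: float   # 0.0 to 1.0
    reasoning: str


def regime_weight(accuracy):
    """Assumed weighting rule: an agent no better than a coin flip gets zero weight."""
    return max(0.0, accuracy - 0.5)


def synthesize(outputs, accuracy_by_regime, regime, threshold=0.1):
    """Combine agent votes, weighting each by its accuracy in the current regime."""
    score = 0.0
    total_weight = 0.0
    for out in outputs:
        weight = regime_weight(accuracy_by_regime[out.agent][regime])
        score += weight * out.direction * out.confidence
        total_weight += weight
    if total_weight == 0.0:
        return "neutral", 0.0
    score /= total_weight
    if score > threshold:
        return "long", score
    if score < -threshold:
        return "short", score
    return "neutral", score
```

With a Sentiment agent at 67% accuracy in trending markets and 41% in ranging ones, a confident long vote from Sentiment dominates in a trend but drops to zero weight in a range, so a weaker opposing vote from another agent can flip the final signal.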
For teams evaluating the cost side of this architecture, the breakdown of AI agent development costs in 2026 covers how compute, API calls, and orchestration layer expenses scale with the number of specialized agents.
LLM agents reason about context. ML models learn from numbers. The system needs both — and they train and infer independently before the Synthesizer combines their outputs.
The Direction Predictor predicts price direction over the next 24 hours: up, down, or neutral. It is trained on 40–60 engineered features across five categories:
| Feature Category | Examples |
| --- | --- |
| Technical | Multi-timeframe returns, RSI, MACD, Bollinger position, ATR volatility, distance from EMA |
| On-chain | Exchange netflow (24h & 7d), whale transaction count, stablecoin supply change, funding rates, open interest delta |
| Sentiment (numerical) | Fear & Greed Index, social volume, sentiment score, Twitter engagement rate |
| Macro | DXY return, S&P 500 return, gold return, VIX level, BTC dominance |
| Cross-asset | BTC/ETH correlation, BTC–S&P 500 30-day correlation |
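As an illustration of the Technical row, the sketch below derives two of the listed features, multi-timeframe returns and distance from EMA, from a list of hourly closes. The lookback windows, the EMA span, and the feature names are assumptions for the example. It uses plain Python rather than pandas-ta so it runs anywhere.

```python
def pct_return(closes, lookback):
    """Return over the last `lookback` bars, or None if history is too short."""
    if len(closes) <= lookback:
        return None
    return closes[-1] / closes[-1 - lookback] - 1.0


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg


def build_features(closes, lookbacks=(1, 4, 24), ema_span=20):
    """Build a feature dict from hourly closes (oldest first)."""
    feats = {f"ret_{n}h": pct_return(closes, n) for n in lookbacks}
    # Positive when price sits above its EMA, negative when below.
    feats["dist_from_ema"] = closes[-1] / ema(closes, ema_span) - 1.0
    return feats
```

Every feature here depends only on data available at or before the current bar, which is the property that keeps look-ahead bias out of the training set.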
Training methodology: walk-forward validation on 2–3 years of historical data. This is a non-negotiable choice. A random train/test split lets the model train on data from after the periods it is tested on, which creates look-ahead bias and produces artificially high accuracy numbers that evaporate in live trading. Walk-forward validation honestly simulates real deployment: the model is trained on past data, tested on the next period it has never seen, then the window advances. The resulting accuracy numbers are lower, and they are real.
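A minimal sketch of the split logic, assuming a fixed-size rolling training window (the function name and parameters are ours, chosen for illustration). Each fold trains on one window and tests on the period immediately after it, so no test index ever precedes a training index.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs that advance through time.

    Samples are assumed to be sorted oldest first. `step` defaults to
    `test_size`, so consecutive test windows do not overlap.
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step
```

For example, `walk_forward_splits(10, 4, 2)` yields three folds: train on samples 0–3 and test on 4–5, then train on 2–5 and test on 6–7, then train on 4–7 and test on 8–9. scikit-learn's `TimeSeriesSplit` offers a similar scheme with an expanding training window by default.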
The Regime Classifier classifies the current market state into one of four regimes: trending up, trending down, ranging, or high volatility. This model is the routing layer for the entire system. Sentiment-following strategies work in trends. Mean-reversion works in ranges. Without regime awareness, the system applies the wrong strategy at the wrong time, which is exactly how most single-model systems lose money.
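To show what the four regimes mean in numbers, here is a simple threshold heuristic. It is explicitly a stand-in, not the trained classifier described above: the rule itself and both thresholds are hypothetical, and a rule like this would at most serve as a sanity-check baseline for the real model.

```python
from enum import Enum
from statistics import pstdev


class Regime(Enum):
    TRENDING_UP = "trending_up"
    TRENDING_DOWN = "trending_down"
    RANGING = "ranging"
    HIGH_VOLATILITY = "high_volatility"


def label_regime(closes, trend_threshold=0.05, vol_threshold=0.03):
    """Label a window of closes (at least two, oldest first) with a regime.

    Assumed rule: volatility is checked first, then net drift over the window.
    """
    returns = [b / a - 1.0 for a, b in zip(closes, closes[1:])]
    if pstdev(returns) > vol_threshold:
        return Regime.HIGH_VOLATILITY
    net_move = closes[-1] / closes[0] - 1.0
    if net_move > trend_threshold:
        return Regime.TRENDING_UP
    if net_move < -trend_threshold:
        return Regime.TRENDING_DOWN
    return Regime.RANGING
```

A steady 1% climb per bar over ten bars is labeled trending up, a series oscillating between 100 and 100.5 is labeled ranging, and a series swinging 10% or more per bar is labeled high volatility regardless of its net direction.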
Both models retrain weekly on new data. The learning loop is scheduled, not manual.
The vector memory layer is what separates systems that merely "use AI" from systems that actually learn from history. Every use case runs on the same pgvector extension, with no separate vector database and no additional infrastructure.
Use Case 1: Historical Pattern Search. Every 4 hours, the current market situation is encoded as a feature vector and embedded. Before generating a new signal, the Synthesizer queries the vector store for the top-N most similar historical market configurations and retrieves what happened afterward. This grounds every decision in concrete historical precedent rather than abstract model inference.
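The retrieval step can be sketched as a cosine-similarity top-N search. The SQL string shows how this might look with pgvector's `<=>` cosine-distance operator against a hypothetical `market_snapshots` table (the table and column names are assumptions). The Python function reproduces the same ranking in memory for illustration.

```python
import math

# Reference query only. `market_snapshots`, `embedding`, and `outcome` are assumed names.
SIMILAR_SITUATIONS_SQL = """
SELECT outcome, 1 - (embedding <=> %s::vector) AS similarity
FROM market_snapshots
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_similar(query, history, n=3):
    """Return the n most similar (similarity, outcome) pairs.

    `history` is a list of (feature_vector, outcome) tuples.
    """
    scored = [(cosine_similarity(query, vec), outcome) for vec, outcome in history]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

The retrieved outcomes are what the Synthesizer would see alongside the agent outputs, for example that the closest past configurations were both followed by upward moves.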
Use Case 2: Agent Memory. Every signal — with its full context, agent reasoning, and final outcome — is embedded and stored. Before new decisions, the Synthesizer retrieves the most contextually similar past decisions and their results. This gives the LLM system the long-term memory it lacks by default: the ability to say "the last three times we saw this configuration, two of the signals were correct and one was wrong — here is why."
Use Case 3: News Deduplication and Clustering. Incoming news is embedded and clustered against recent stories. Duplicates and rewrites are filtered. Genuinely novel stories are flagged for elevated attention. This prevents a single event from being counted multiple times in sentiment scoring — a surprisingly common failure mode in news-driven signal systems.
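A minimal sketch of threshold-based deduplication. To keep the example self-contained, `embed` is a toy bag-of-words stand-in for a real embedding model, and the 0.8 similarity threshold is an assumed value that would need tuning against real headlines.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_novel(headlines, threshold=0.8):
    """Keep a headline only if it is not too similar to any already-kept one."""
    kept, kept_vectors = [], []
    for headline in headlines:
        vector = embed(headline)
        if all(cosine(vector, seen) < threshold for seen in kept_vectors):
            kept.append(headline)
            kept_vectors.append(vector)
    return kept
```

Given "SEC approves spot Bitcoin ETF", a light rewrite of the same headline, and an unrelated exchange-hack story, only the first and third survive, so the ETF event is scored once.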
The learning loop is the component that makes the system measurably better over time rather than degrading silently as market conditions change. It operates on three timescales:
| Frequency | Action |
| --- | --- |
| Hourly | Full end-to-end pipeline runs. All six agents and both ML models produce outputs. Synthesizer generates the final signal. Everything logged to PostgreSQL with timestamp and current price. |
| Daily | Evaluator checks signals from 24, 48, and 72 hours ago against actual price movement. Marks each as correct, incorrect, or neutral. Updates per-agent, per-regime accuracy statistics. |
| Weekly | ML models retrain on new data. Agent accuracy weights recalculated by market regime. Synthesizer weighting logic updates automatically. New baseline metrics published to dashboard. |
The practical result: the system always knows that, for example, the Sentiment Agent achieves 67% accuracy in trending markets but only 41% in sideways markets — and automatically reduces its influence when the Regime Classifier reports a ranging environment. This is the difference between a system that uses AI as a feature and a system that genuinely adapts.
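The daily evaluation step might look like the sketch below. The 0.5% neutral band and the class and function names are our assumptions for illustration. The point is the shape of the bookkeeping: each verdict is tallied per agent and per regime, and neutral outcomes do not count toward accuracy.

```python
from collections import defaultdict


def evaluate(direction, entry_price, exit_price, neutral_band=0.005):
    """Mark a long/short signal as correct, incorrect, or neutral.

    Moves smaller than `neutral_band` (an assumed 0.5%) are treated as noise.
    """
    move = exit_price / entry_price - 1.0
    if abs(move) < neutral_band:
        return "neutral"
    went_up = move > 0
    return "correct" if (direction == "long") == went_up else "incorrect"


class AccuracyTracker:
    """Running per-agent, per-regime accuracy statistics."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"correct": 0, "incorrect": 0})

    def record(self, agent, regime, verdict):
        if verdict != "neutral":
            self.counts[(agent, regime)][verdict] += 1

    def accuracy(self, agent, regime):
        """Fraction of non-neutral signals that were correct, or None if no data."""
        tally = self.counts[(agent, regime)]
        total = tally["correct"] + tally["incorrect"]
        return tally["correct"] / total if total else None
```

The weekly job would then read these accuracies to recompute the weights the Synthesizer applies in each regime.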
| Component | Technology | Role |
| --- | --- | --- |
| Backend | Python 3.11, FastAPI, Celery | Core API, background jobs, task queue |
| Database | PostgreSQL 16 + TimescaleDB + pgvector | Time-series storage + semantic vector search |
| LLM Provider | Anthropic Claude API (Sonnet for analytical agents, Haiku for lightweight classification) | All six specialized agents |
| Embeddings | OpenAI text-embedding-3-small or sentence-transformers (self-hosted) | Vector generation for pgvector |
| ML Framework | scikit-learn, XGBoost, pandas, pandas-ta | Direction Predictor, Regime Classifier, technical indicators |
| Orchestration | n8n | Pipeline scheduling and coordination |
| Frontend | Next.js 15, React 19, shadcn/ui, Recharts | Signal dashboard with drill-down reasoning |
| Notifications | Telegram Bot API | Real-time signal delivery with reasoning to private channel |
| Hosting (POC) | Hetzner VPS or Railway | Documented migration path to production included |
| Monitoring | Grafana + Sentry | System health monitoring + error tracking |
For teams exploring how LLM infrastructure is designed and priced at the component level, our LLM development practice covers model selection, prompt engineering pipelines, and production deployment patterns.
The 4–6 week timeline is achievable because the architecture is defined upfront and the stack is one we work with daily. Each week produces testable deliverables — not just code commits.
| Week | Milestone | Deliverables |
| --- | --- | --- |
| 1 | Data Layer | PostgreSQL with TimescaleDB + pgvector deployed. All API integrations live. 2-year historical backfill complete. Feature engineering pipeline operational. |
| 2 | ML Models + Vector Memory | Direction Predictor and Regime Classifier trained with walk-forward validation. Historical situations embedded into vector store. Baseline accuracy report ready. |
| 3 | LLM Agents | All six agents implemented and tested on historical scenarios. Synthesizer integrates ML outputs, vector retrievals, and agent outputs. Prompt library versioned. |
| 4 | Learning Loop + Dashboard | Hourly pipeline running live. Evaluator and weekly retraining jobs scheduled. Web dashboard and Telegram bot operational. Paper-trading P&L simulation active. |
| 5–6 | Hardening + Handover | Live observation refinements. Performance reporting. Production deployment. Knowledge transfer. Final stakeholder presentation. |
Weeks 5–6 are a flexible buffer. If the system stabilizes in week 4, they become optimization and polish time. If integration issues arise — data provider outages, API changes — the buffer absorbs them without breaking the core milestone schedule.
For reference on what end-to-end AI system development looks like from a process and cost standpoint, our guide on how to create an AI app breaks down architecture decisions, team composition, and realistic pricing across complexity tiers.
The development investment is one-time. The recurring costs to keep the system running are predictable and modest:
| Item | Type | Estimated Monthly (USD) |
| --- | --- | --- |
| Claude API (Sonnet + Haiku) | Recurring | $50–100 |
| Embedding API (OpenAI) or self-hosted | Recurring | $20–50 |
| VPS + database infrastructure | Recurring | $50–100 |
| Data APIs (Glassnode, LunarCrush paid tiers) | Recurring | ~$150 |
| News, Telegram, macro data | Recurring | Free tier sufficient |
| Total recurring | | ~$270–400/month |
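The total can be reproduced by summing the low and high ends of each line item. This is only a quick arithmetic check of the table: the dictionary keys are shorthand labels, the ~$150 data API figure is treated as a fixed value, and free-tier items are entered as zero.

```python
# (low, high) monthly estimates in USD, taken from the table above.
MONTHLY_COSTS = {
    "claude_api": (50, 100),
    "embeddings": (20, 50),
    "vps_and_database": (50, 100),
    "data_apis": (150, 150),        # approximate figure, treated as fixed
    "news_telegram_macro": (0, 0),  # free tier
}


def monthly_range(costs):
    """Sum the low and high ends of every (low, high) estimate."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high
```

Calling `monthly_range(MONTHLY_COSTS)` gives the same 270 to 400 USD range stated in the total row.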
One note on data costs: approximately 25–30% of ongoing maintenance effort is not ML logic but data resilience — on-chain data providers change methodologies, exchanges have outages, sentiment APIs adjust their scoring models. This is real operational overhead and should be budgeted for explicitly.
At the end of 4–6 weeks, the client receives a fully operational system with the following artifacts:
Full decision history in PostgreSQL — every agent output, every model prediction, every historical pattern retrieved, every final signal, every outcome evaluation. Complete audit trail.
Web dashboard with live signal feed, per-agent accuracy metrics by regime, paper-trading P&L simulation, drill-down reasoning for each decision, and visualization of matched historical situations.
Telegram bot delivering signals with full reasoning to a private stakeholder channel — for teams interested in the notification architecture, our guide on building a Telegram trading bot covers webhook setup, message formatting, and delivery reliability patterns.
Walk-forward backtest report on 2–3 years of historical data — no look-ahead bias, no survivorship bias.
Production migration roadmap covering auto-trading integration, asset expansion, mobile app, and regulatory considerations.
Complete technical documentation including architecture diagrams, API contracts, deployment instructions, and operational runbooks.
We include a limitations section in every proposal we write. A client who understands the limitations makes better decisions about how to use the system and does not attribute normal market uncertainty to system failure.
Accuracy expectations. A realistic target for 24-hour directional prediction under walk-forward validation is 54–58%. Backtests showing 70%+ almost always contain look-ahead bias, survivorship bias, or overfitting. Walk-forward numbers are lower — and they are honest. Any vendor quoting you 75%+ directional accuracy on crypto has a methodology problem.
Edge decay. Crypto markets are an adversarial environment dominated by algorithmic participants. Any edge the system finds will be partially arbitraged by other players over time. The adaptive learning loop compensates through continuous retraining, but no edge is permanent. This is not a flaw in the system — it is the nature of competitive markets.
POC does not guarantee profit. A successful POC delivers a working system with measurable accuracy — not guaranteed returns. The system should run in paper-trading mode for at least 2–3 months after delivery before real capital is considered, and validated across at least one full market cycle.
Scope boundary. This architecture is designed for swing trading on a 4-hour timeframe for BTC and ETH. It is not designed for high-frequency trading (HFT requires sub-millisecond latency that LLM API calls cannot provide) and not for low-liquidity altcoins where signal quality degrades rapidly.
Regulatory exposure. If the system is later offered as a service to third parties, it may constitute regulated financial advice in many jurisdictions. The POC is designed for internal stakeholder use. Regulatory analysis is required before any commercial deployment.
Teams considering whether to build a custom system or start with an existing foundation can compare approaches in our overview of AI trading bot development options, which covers architecture patterns, build vs. buy decisions, and cost structures across different complexity levels.
Under walk-forward validation — which is the only methodology that honestly simulates real deployment — a well-built hybrid system achieves 54–58% directional accuracy on a 24-hour BTC/ETH forecast. Backtests showing 70%+ almost always contain look-ahead bias or overfitting. Lower numbers from honest validation are more useful than inflated numbers from flawed methodology.
Pure ML models cannot process unstructured signals like regulatory news, sentiment shifts, or whale wallet movements. Pure LLM systems have no memory between calls and no mechanism to learn from outcomes over time. Pure rule-based systems don't adapt when market regimes change. The hybrid architecture assigns each component to the workload it handles best, with each compensating for the others' blind spots.
A fully operational POC — including data infrastructure, ML models, six specialized LLM agents, vector memory, adaptive learning loop, web dashboard, and Telegram delivery — takes 4–6 weeks. The timeline is achievable because the architecture is defined upfront and the stack is proven. After delivery, we recommend 2–3 months of paper-trading validation before any live capital allocation.
Market data is fundamentally relational and time-ordered. The primary operations — time-bucketed aggregations, multi-table JOINs on timestamps, window functions for technical indicators — are SQL-native. Adding a separate vector database introduces operational complexity with no performance benefit over pgvector for this workload. One PostgreSQL 16 instance with TimescaleDB and pgvector covers all data needs without fragmentation.
Recurring infrastructure costs run approximately $270–400 per month: LLM API usage ($50–100), embeddings ($20–50), VPS and database ($50–100), and data API subscriptions including Glassnode and LunarCrush paid tiers (~$150). News, Telegram, and macro data APIs are covered under free tiers. Note that 25–30% of maintenance effort is data resilience work — handling provider outages, API changes, and methodology shifts — rather than core ML logic.
The POC is designed as a decision-support platform, not an auto-trader. Every signal includes full reasoning transparency and is logged for outcome evaluation. The production migration roadmap covers auto-trading integration as an explicit next step — but this requires separate validation over at least one full market cycle in paper-trading mode before any live execution is connected.