Why I Built a Recommendation Engine Without GPT — And It's Better

▲Next.js ⚛️React 🔷TypeScript 🎨Tailwind CSS ⚡Vite 🚀FastAPI 🐍Python 🐘PostgreSQL 🐳Docker 🗺️Leaflet

The Problem

Coffee recommendation systems typically rely on either basic filtering (too simple) or LLM-powered suggestions (expensive, non-deterministic, and with unpredictable latency). Coffee Sommelier needed recommendations that feel personalized while remaining fast, predictable, and free of per-request LLM costs at scale.

The Approach

I built a deterministic scoring engine based on weighted cosine similarity between user preference vectors and product feature vectors. User vectors encode preferences for roast level, origin region, brew method, and flavor notes; product vectors are pre-computed from cafe menu data. To prevent recommendation homogeneity, I added Maximal Marginal Relevance (MMR) diversification: each successive recommendation is penalized for its similarity to already-selected items. The scoring weights are configurable via the admin dashboard, so business operators can tune the recommendation behavior. Geolocation filtering uses the haversine formula to pre-filter cafes within a configurable radius before scoring.
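The pre-filter and scoring steps could be sketched roughly as follows. This is a minimal illustration, not the production code: the function names (`haversine_km`, `weighted_cosine`, `rank_nearby`), the dictionary layout for cafes, and the choice of one weight per vector dimension are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_cosine(u, v, w):
    """Cosine similarity where dimension i contributes with non-negative weight w[i]."""
    dot = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    norm_u = math.sqrt(sum(wi * ui * ui for wi, ui in zip(w, u)))
    norm_v = math.sqrt(sum(wi * vi * vi for wi, vi in zip(w, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def rank_nearby(user_vec, user_loc, cafes, weights, radius_km):
    """Drop cafes outside the radius, then sort the rest by similarity (best first).

    `cafes` is a hypothetical list of dicts with keys: name, lat, lon, vec.
    """
    nearby = [
        c for c in cafes
        if haversine_km(user_loc[0], user_loc[1], c["lat"], c["lon"]) <= radius_km
    ]
    scored = [(weighted_cosine(user_vec, c["vec"], weights), c["name"]) for c in nearby]
    return sorted(scored, reverse=True)
```

Filtering by distance first keeps the similarity computation limited to cafes the user could plausibly visit, and because every step is plain arithmetic, the same inputs always produce the same ranking.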

The Outcome

The engine incurs zero LLM costs and returns recommendations in under 50 ms. The MMR diversification ensures users see variety rather than a cluster of similar cafes. Because the weights are configurable, the business can A/B test different scoring strategies without code changes. The multi-frontend architecture (consumer, admin, widget, B2B) lets the recommendation engine serve different contexts through a single API.
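To illustrate how MMR produces that variety, here is a small greedy selection sketch. It assumes relevance scores have already been computed for each candidate; the names `mmr_select` and `lam` and the dict-based inputs are hypothetical, and the trade-off parameter stands in for whatever the admin dashboard actually exposes.

```python
import math


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(relevance, vectors, k, lam=0.7):
    """Greedily pick up to k item ids, trading relevance against redundancy.

    relevance: dict of item id -> relevance score
    vectors:   dict of item id -> feature vector
    lam:       1.0 means pure relevance, 0.0 means pure diversity
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def mmr_score(item):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(vectors[item], vectors[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[item] - (1 - lam) * redundancy

        # Sorting the candidates first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Two near-identical cafes ("a", "b") and one different cafe ("c").
relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
vectors = {"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.0, 1.0]}
diverse = mmr_select(relevance, vectors, k=2, lam=0.5)
by_relevance = mmr_select(relevance, vectors, k=2, lam=1.0)
```

With `lam=0.5` the second pick is the dissimilar cafe `"c"`, because `"b"` is penalized for nearly duplicating `"a"`; with `lam=1.0` the penalty vanishes and the selection falls back to plain relevance order.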

Key Highlights

<50ms latency, $0 LLM cost

Cosine similarity + MMR diversification

Haversine geo-filtering

Admin-configurable scoring weights

Multi-frontend: consumer, admin, widget, B2B
