Technical Deep Dive

How Karinda Works

Built on Cloudflare's global edge network. Powered by Llama 3.1 and RAG. Runs entirely free on the Cloudflare free tier.

Architecture
The Full Stack

Everything runs on Cloudflare — no external APIs, no hidden costs, no vendor lock-in beyond Cloudflare's own free tier.

karinda.in (your domain)
│
├── Cloudflare Pages        ← Admin Dashboard + Widget JS (FREE · unlimited)
│
└── Cloudflare Workers      ← 3 API workers (FREE · 100K req/day)
    │
    ├── karinda-auth        ← Signup / Login / API Keys
    │   └── D1 (SQLite)     ← Business accounts + leads
    │
    ├── karinda-crawler     ← Reads SME websites
    │   ├── Workers AI      ← bge-base-en-v1.5 (embeddings)
    │   └── Vectorize       ← Stores 768-dim vectors
    │
    └── karinda-chat        ← RAG pipeline + LLM response
        ├── Workers AI      ← bge-base-en-v1.5 (query embed)
        ├── Vectorize       ← Semantic search (topK=5)
        └── Workers AI      ← Llama 3.1 8B (answer gen)
The Crawler

When a business owner enters their website URL, our crawler visits every page — like a very fast human reading an entire book in 2 minutes. It follows internal links, skips non-content files, strips all HTML/CSS/JS, and keeps only clean readable text.

Max pages crawled: 20 per website (enough for any SME)
Text chunking: 500-word chunks with 50-word overlap for context continuity
Technology: Native fetch() in Cloudflare Workers — no external libraries
// Extract clean text from raw HTML
function extractText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Chunk into 500-word pieces with 50-word overlap
function chunkText(text, size = 500, overlap = 50) {
  const words = text.split(' ');
  const chunks = [];
  let i = 0;
  while (i < words.length) {
    chunks.push(words.slice(i, i + size).join(' '));
    i += size - overlap; // overlap preserves context across chunk boundaries
  }
  return chunks;
}
// Embed each chunk with Cloudflare Workers AI
const result = await env.AI.run(
  '@cf/baai/bge-base-en-v1.5',
  { text: [chunkText] }
);
const vector = result.data[0]; // 768 dimensions

// Store in Vectorize with business metadata
await env.VECTORIZE.upsert([{
  id: `${businessId}_${pageNum}_${chunkIdx}`,
  values: vector,
  metadata: {
    business_id: businessId,
    url: pageUrl,
    chunk_text: chunkText.slice(0, 1000)
  }
}]);
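The crawl loop itself isn't shown above. A minimal sketch of what it could look like — `extractLinks`, `crawlSite`, and the injectable `fetchFn` parameter are illustrative names, not the production implementation:

```javascript
// Hypothetical sketch of the crawl loop: breadth-first over internal links,
// capped at MAX_PAGES = 20, skipping non-content assets.
const MAX_PAGES = 20;

// Pull href values out of raw HTML with a simple regex (no DOM in Workers)
function extractLinks(html, baseUrl) {
  const links = [];
  const re = /href="([^"#]+)"/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    try {
      const url = new URL(m[1], baseUrl);
      // keep only internal, non-asset pages
      if (url.origin === new URL(baseUrl).origin &&
          !/\.(png|jpe?g|gif|svg|css|js|pdf|zip)$/i.test(url.pathname)) {
        links.push(url.origin + url.pathname);
      }
    } catch { /* ignore malformed hrefs */ }
  }
  return links;
}

// fetchFn defaults to the native fetch() available in Workers
async function crawlSite(startUrl, fetchFn = fetch) {
  const seen = new Set();
  const queue = [startUrl];
  const pages = [];
  while (queue.length > 0 && pages.length < MAX_PAGES) {
    const url = queue.shift();
    if (seen.has(url)) continue; // never fetch the same page twice
    seen.add(url);
    const res = await fetchFn(url);
    if (!res.ok) continue;
    const html = await res.text();
    pages.push({ url, html });
    queue.push(...extractLinks(html, url));
  }
  return pages;
}
```

Each returned page would then be passed through `extractText()` and `chunkText()` from the snippet above.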
Embeddings + Vector DB

Each text chunk is converted into a 768-dimensional vector using bge-base-en-v1.5 — a BAAI embedding model running for free on Cloudflare AI. These vectors are stored in Cloudflare Vectorize, a purpose-built vector database.

A vector captures meaning — not just keywords. So when a customer asks "do you deliver near me?" it matches your content that says "we serve Pan-India via courier" even though those words don't overlap.
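The matching works by comparing the angle between vectors rather than shared words. A toy illustration of cosine similarity — the 3-dim vectors here are made up for readability; real embeddings are 768-dimensional:

```javascript
// Similarity is the cosine of the angle between two embedding vectors:
// close to 1 means "same meaning", close to 0 means "unrelated".
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical embeddings: the query and the delivery chunk point in a
// similar direction even though their words don't overlap.
const query    = [0.9, 0.1, 0.3]; // "do you deliver near me?"
const delivery = [0.8, 0.2, 0.4]; // "we serve Pan-India via courier"
const menu     = [0.1, 0.9, 0.2]; // "our menu features dosa and idli"

cosineSimilarity(query, delivery); // high → retrieved as context
cosineSimilarity(query, menu);     // low → filtered out
```

Vectorize runs this kind of comparison across all of a business's stored vectors and returns the closest matches.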

Free tier: 5 million vectors stored · 30M query dimensions/month — enough for ~500 businesses at MVP stage.
RAG — Retrieval Augmented Generation
RAG is the bridge between your stored knowledge and the AI's answer. It ensures the AI speaks from your actual website content — not hallucinated or generic responses.
1. Question arrives from visitor
   "Do you have vegetarian food options?" — embedded into a 768-dim vector using the same bge model.

2. Semantic search in Vectorize
   The query vector is matched against stored vectors for this business only (filtered by business_id). The top 5 most semantically similar chunks are returned.

3. Context + question sent to Llama 3.1
   The retrieved chunks + conversation history + system prompt are passed to @cf/meta/llama-3.1-8b-instruct. The model generates a grounded, accurate answer.

4. Answer streamed back to visitor
   The response is returned from the Cloudflare Worker to the widget in the visitor's browser. Average latency: under 2 seconds.

// RAG pipeline in the chat worker

// 1. Embed the user's question
const qEmbed = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [userMessage]
});

// 2. Search Vectorize — only this business's data
const results = await env.VECTORIZE.query(qEmbed.data[0], {
  topK: 5,
  filter: { business_id: businessId },
  returnMetadata: 'all'
});

// 3. Build context from top matches (score > 0.4)
const context = results.matches
  .filter(m => m.score > 0.4)
  .map(m => m.metadata.chunk_text)
  .join('\n\n---\n\n');

// 4. Call Llama 3.1 with grounded prompt
const answer = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  { messages: buildMessages(context, history, userMessage) }
);
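The `buildMessages()` helper referenced above isn't shown. A plausible sketch, assuming the standard chat-messages array that Workers AI instruct models accept — the system-prompt wording here is illustrative, not the production prompt:

```javascript
// Hypothetical sketch of buildMessages(): assembles the grounded prompt
// from retrieved context, prior turns, and the new question.
function buildMessages(context, history, userMessage) {
  return [
    {
      role: 'system',
      content:
        'You are a helpful assistant for this business. ' +
        'Answer ONLY from the website content below. ' +
        "If the answer isn't in the content, say you don't know.\n\n" +
        'WEBSITE CONTENT:\n' + context
    },
    // prior turns: alternating { role: 'user' } / { role: 'assistant' }
    ...history,
    { role: 'user', content: userMessage }
  ];
}
```

Putting the retrieved chunks in the system message, after an instruction to answer only from them, is what keeps the model grounded in the business's actual content.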
The Embeddable Widget

A single vanilla JavaScript file (~8KB) that injects a floating chat bubble into any webpage. No React, no bundler, no dependencies. Uses the native browser fetch() API to talk to the chat worker.

✅  Works on any website, any hosting provider
✅  Stores conversation history in localStorage
✅  Customisable color, name, position via data-* attributes
✅  Built-in lead capture — asks name + phone after 3 messages

Embed code — paste before </body>

<script
  src="https://karinda.in/widget.js"
  data-business-id="your_id_here"
  data-primary-color="#007AFF"
  data-bot-name="Karinda"
  data-api-url="https://api.karinda.in">
</script>
<!-- Widget auto-loads on page ready -->
<!-- Chat bubble appears bottom-right -->
<!-- No other setup required -->

Supported data-* attributes:

Attribute            Default    Description
data-business-id     required   Your unique business ID from the dashboard
data-primary-color   #007AFF    Widget accent color (hex)
data-bot-name        Karinda    Name shown in the chat header
data-api-url         auto       Override the chat API endpoint
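One way the widget could read this configuration from its own script tag is via `document.currentScript` and the `dataset` API — a sketch, assuming the defaults in the table above (the fallback API URL is an assumption, since the real default is "auto"):

```javascript
// Sketch: read widget configuration from the embedding <script> tag.
// dataset exposes data-* attributes in camelCase (data-bot-name → botName).
function readWidgetConfig(script) {
  const d = script.dataset;
  return {
    businessId:   d.businessId,                        // required, no default
    primaryColor: d.primaryColor || '#007AFF',
    botName:      d.botName      || 'Karinda',
    apiUrl:       d.apiUrl       || 'https://api.karinda.in' // assumed fallback
  };
}

// In the browser, widget.js would call this on its own tag while loading:
//   const config = readWidgetConfig(document.currentScript);
//   if (!config.businessId) console.error('Karinda: data-business-id missing');
```

Because `document.currentScript` only points at the tag while it executes, the widget must capture the config synchronously before deferring any other work.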
100% Free Until You Scale
Cloudflare's free tier is genuinely generous. Here's exactly when you'd hit limits.
Service              Free Limit      Usage @ 50 Clients   Usage @ 500 Clients   Paid From
Workers Requests     100,000 / day   ~2,500 / day ✅      ~25,000 / day ✅      $5/mo for 10M
Workers AI Neurons   10,000 / day    ~3,000 / day ✅      ~30,000 / day ⚠️     $0.011 / 1K neurons
Vectorize Stored     5M vectors      ~50K vectors ✅      ~500K vectors ✅      $0.05 / 1M vectors
D1 Reads             5M / day        ~5K / day ✅         ~50K / day ✅         $0.001 / 1M reads
Pages Hosting        Unlimited       —                    —                     Never

⚠️ At 500 clients you may need to upgrade Workers AI — cost would be approximately ₹1,500/month total.

Ready to build?

Create your free account and have a bot live on your website today.

Start Free → API Reference