Technical Deep Dive

How Karinda Works

Built on Cloudflare's global edge network. Powered by Llama 3.1 and RAG. Runs entirely free on the Cloudflare free tier.

Architecture
The Full Stack

Everything runs on Cloudflare — no external APIs, no hidden costs, no vendor lock-in beyond Cloudflare's own free tier.

karinda.in (your domain)
│
├── Cloudflare Pages        ← Admin Dashboard + Widget JS (FREE · unlimited)
│
└── Cloudflare Workers      ← 3 API workers (FREE · 100K req/day)
    │
    ├── karinda-auth        ← Signup / Login / API Keys
    │   └── D1 (SQLite)     ← Business accounts + leads
    │
    ├── karinda-crawler     ← Reads SME websites
    │   ├── Workers AI      ← bge-base-en-v1.5 (embeddings)
    │   └── Vectorize       ← Stores 768-dim vectors
    │
    └── karinda-chat        ← RAG pipeline + LLM response
        ├── Workers AI      ← bge-base-en-v1.5 (query embed)
        ├── Vectorize       ← Semantic search (topK=5)
        └── Workers AI      ← Llama 3.1 8B (answer gen)
The Crawler

When a business owner enters their website URL, our crawler visits every page — like a very fast human reading an entire book in 2 minutes. It follows internal links, skips non-content files, strips all HTML/CSS/JS, and keeps only clean readable text.

Max pages crawled: 20 per website (enough for any SME)
Text chunking: 500-word chunks with 50-word overlap for context continuity
Technology: Native fetch() in Cloudflare Workers — no external libraries
// Extract clean text from raw HTML
function extractText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Chunk into 500-word pieces with 50-word overlap
function chunkText(text, size = 500, overlap = 50) {
  const words = text.split(' ');
  const chunks = [];
  let i = 0;
  while (i < words.length) {
    chunks.push(words.slice(i, i + size).join(' '));
    i += size - overlap; // overlap preserves context across chunk boundaries
  }
  return chunks;
}
// Embed each chunk with Cloudflare Workers AI
const result = await env.AI.run(
  '@cf/baai/bge-base-en-v1.5',
  { text: [chunkText] }
);
const vector = result.data[0]; // 768 dimensions

// Store in Vectorize with business metadata
await env.VECTORIZE.upsert([{
  id: `${businessId}_${pageNum}_${chunkIdx}`,
  values: vector,
  metadata: {
    business_id: businessId,
    url: pageUrl,
    chunk_text: chunkText.slice(0, 1000)
  }
}]);
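The crawl loop itself isn't shown above. A minimal sketch of what it could look like — `extractLinks`, `crawlSite`, and the injectable `fetchFn` parameter are illustrative names, not the production implementation:

```javascript
// Hypothetical sketch of the crawl loop: breadth-first over internal links,
// capped at MAX_PAGES = 20, skipping non-content assets.
const MAX_PAGES = 20;

// Pull href values out of raw HTML with a simple regex (no DOM in Workers)
function extractLinks(html, baseUrl) {
  const links = [];
  const re = /href="([^"#]+)"/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    try {
      const url = new URL(m[1], baseUrl);
      // keep only internal, non-asset pages
      if (url.origin === new URL(baseUrl).origin &&
          !/\.(png|jpe?g|gif|svg|css|js|pdf|zip)$/i.test(url.pathname)) {
        links.push(url.origin + url.pathname);
      }
    } catch { /* ignore malformed hrefs */ }
  }
  return links;
}

// fetchFn defaults to the native fetch() available in Workers
async function crawlSite(startUrl, fetchFn = fetch) {
  const seen = new Set();
  const queue = [startUrl];
  const pages = [];
  while (queue.length > 0 && pages.length < MAX_PAGES) {
    const url = queue.shift();
    if (seen.has(url)) continue; // never fetch the same page twice
    seen.add(url);
    const res = await fetchFn(url);
    if (!res.ok) continue;
    const html = await res.text();
    pages.push({ url, html });
    queue.push(...extractLinks(html, url));
  }
  return pages;
}
```

Each returned page would then be passed through `extractText()` and `chunkText()` from the snippet above.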
Embeddings + Vector DB

Each text chunk is converted into a 768-dimensional vector using bge-base-en-v1.5 — a BAAI embedding model running for free on Cloudflare AI. These vectors are stored in Cloudflare Vectorize, a purpose-built vector database.

A vector captures meaning — not just keywords. So when a customer asks "do you deliver near me?" it matches your content that says "we serve Pan-India via courier" even though those words don't overlap.
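The matching works by comparing the angle between vectors rather than shared words. A toy illustration of cosine similarity — the 3-dim vectors here are made up for readability; real embeddings are 768-dimensional:

```javascript
// Similarity is the cosine of the angle between two embedding vectors:
// close to 1 means "same meaning", close to 0 means "unrelated".
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical embeddings: the query and the delivery chunk point in a
// similar direction even though their words don't overlap.
const query    = [0.9, 0.1, 0.3]; // "do you deliver near me?"
const delivery = [0.8, 0.2, 0.4]; // "we serve Pan-India via courier"
const menu     = [0.1, 0.9, 0.2]; // "our menu features dosa and idli"

cosineSimilarity(query, delivery); // high → retrieved as context
cosineSimilarity(query, menu);     // low → filtered out
```

Vectorize runs this kind of comparison across all of a business's stored vectors and returns the closest matches.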

Free tier: 5 million vectors stored · 30M query dimensions/month — enough for ~500 businesses at MVP stage.
RAG — Retrieval Augmented Generation
RAG is the bridge between your stored knowledge and the AI's answer. It ensures the AI speaks from your actual website content — not hallucinated or generic responses.
1. Question arrives from visitor
   "Do you have vegetarian food options?" — embedded into a 768-dim vector using the same bge model.

2. Semantic search in Vectorize
   The query vector is matched against stored vectors for this business only (filtered by business_id). The top 5 most semantically similar chunks are returned.

3. Context + question sent to Llama 3.1
   The retrieved chunks + conversation history + system prompt are passed to @cf/meta/llama-3.1-8b-instruct. The model generates a grounded, accurate answer.

4. Answer streamed back to visitor
   The response is returned from the Cloudflare Worker to the widget in the visitor's browser. Average latency: under 2 seconds.

// RAG pipeline in the chat worker

// 1. Embed the user's question
const qEmbed = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [userMessage]
});

// 2. Search Vectorize — only this business's data
const results = await env.VECTORIZE.query(qEmbed.data[0], {
  topK: 5,
  filter: { business_id: businessId },
  returnMetadata: 'all'
});

// 3. Build context from top matches (score > 0.4)
const context = results.matches
  .filter(m => m.score > 0.4)
  .map(m => m.metadata.chunk_text)
  .join('\n\n---\n\n');

// 4. Call Llama 3.1 with grounded prompt
const answer = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  { messages: buildMessages(context, history, userMessage) }
);
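The `buildMessages()` helper referenced above isn't shown. A plausible sketch, assuming the standard chat-messages array that Workers AI instruct models accept — the system-prompt wording here is illustrative, not the production prompt:

```javascript
// Hypothetical sketch of buildMessages(): assembles the grounded prompt
// from retrieved context, prior turns, and the new question.
function buildMessages(context, history, userMessage) {
  return [
    {
      role: 'system',
      content:
        'You are a helpful assistant for this business. ' +
        'Answer ONLY from the website content below. ' +
        "If the answer isn't in the content, say you don't know.\n\n" +
        'WEBSITE CONTENT:\n' + context
    },
    // prior turns: alternating { role: 'user' } / { role: 'assistant' }
    ...history,
    { role: 'user', content: userMessage }
  ];
}
```

Putting the retrieved chunks in the system message, after an instruction to answer only from them, is what keeps the model grounded in the business's actual content.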
The Embeddable Widget

A single vanilla JavaScript file (~8KB) that injects a floating chat bubble into any webpage. No React, no bundler, no dependencies. Uses the native browser fetch() API to talk to the chat worker.

✅  Works on any website, any hosting provider
✅  Stores conversation history in localStorage
✅  Customisable color, name, position via data-* attributes
✅  Built-in lead capture — asks name + phone after 3 messages

Embed code — paste before </body>

<script
  src="https://karinda.in/widget.js"
  data-business-id="your_id_here"
  data-primary-color="#007AFF"
  data-bot-name="Karinda"
  data-api-url="https://api.karinda.in">
</script>
<!-- Widget auto-loads on page ready -->
<!-- Chat bubble appears bottom-right -->
<!-- No other setup required -->

Supported data-* attributes:

Attribute            Default    Description
data-business-id     required   Your unique business ID from the dashboard
data-primary-color   #007AFF    Widget accent color (hex)
data-bot-name        Karinda    Name shown in the chat header
data-api-url         auto       Override the chat API endpoint
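One way the widget could read this configuration from its own script tag is via `document.currentScript` and the `dataset` API — a sketch, assuming the defaults in the table above (the fallback API URL is an assumption, since the real default is "auto"):

```javascript
// Sketch: read widget configuration from the embedding <script> tag.
// dataset exposes data-* attributes in camelCase (data-bot-name → botName).
function readWidgetConfig(script) {
  const d = script.dataset;
  return {
    businessId:   d.businessId,                        // required, no default
    primaryColor: d.primaryColor || '#007AFF',
    botName:      d.botName      || 'Karinda',
    apiUrl:       d.apiUrl       || 'https://api.karinda.in' // assumed fallback
  };
}

// In the browser, widget.js would call this on its own tag while loading:
//   const config = readWidgetConfig(document.currentScript);
//   if (!config.businessId) console.error('Karinda: data-business-id missing');
```

Because `document.currentScript` only points at the tag while it executes, the widget must capture the config synchronously before deferring any other work.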
100% Free Until You Scale
Cloudflare's free tier is genuinely generous. Here's exactly when you'd hit limits.
Service              Free Limit      Usage @ 50 Clients   Usage @ 500 Clients   Paid From
Workers Requests     100,000 / day   ~2,500 / day ✅      ~25,000 / day ✅      $5/mo for 10M
Workers AI Neurons   10,000 / day    ~3,000 / day ✅      ~30,000 / day ⚠️     $0.011 / 1K neurons
Vectorize Stored     5M vectors      ~50K vectors ✅      ~500K vectors ✅      $0.05 / 1M vectors
D1 Reads             5M / day        ~5K / day ✅         ~50K / day ✅         $0.001 / 1M reads
Pages Hosting        Unlimited       —                    —                     Never

⚠️ At 500 clients you may need to upgrade Workers AI — cost would be approximately ₹1,500/month total.

Ready to build?

Create your free account and have a bot live on your website today.

Start Free → API Reference