April 7, 2025

Turning eCommerce websites into working chatbots

We worked on an agentic RAG chatbot platform for eCommerce, with website-driven knowledge ingestion, hybrid search on Qdrant, and a real-time inbox for human handoff.

Goals

Doina is a chatbot platform for eCommerce store owners. A store owner signs up, drops in their storefront URL, and minutes later the chatbot is live on the site, already knowing the catalog. For a store with under a hundred products, the whole flow takes five to ten minutes. No spreadsheets to export, no documents to upload, nothing for the owner to prepare upfront.

Doina had to handle the conversations store teams have with shoppers in practice. Product questions, recommendations, order status lookups for WordPress stores, store policies. In pilots it generated coupon codes too. And it had to behave less like a bot and more like a teammate, with the owner watching from the inbox and stepping in whenever something got tricky.

Challenges

The first problem was ingestion. eCommerce sites come in every shape. Different CMSs, different page structures, different ways of laying out a product. For Doina to deliver on the minutes-to-live promise, the platform had to handle whatever a store happened to be running, with no per-client setup work. A growing share of those sites also render their catalog client-side, so a naive scraper opens the page and sees nothing.

Then there was the question of staying current. A store adds products on Monday and pulls others on Wednesday. If the chatbot is still answering Friday's questions against Sunday's catalog, the whole pitch falls apart. Knowledge had to refresh on its own, often enough that owners never had to think about it.

The conversations themselves had their own bar to clear. Returning a wall of text when the right answer was a specific product felt like a step backwards from a normal product page. The chat had to render products as cards, format links cleanly, and present information in a way that fit alongside the rest of the storefront.

The last piece was the human in the loop. Doina was meant to feel like a teammate, which only works if a real teammate can show up. Owners had to be able to watch a conversation in progress, take over mid-message, and pick up the thread without the shopper noticing a handoff or the bot losing context.

A multi-tenant foundation on Supabase

The platform layer is what makes Doina a single product rather than a series of one-off chatbots. It carries every tenant, isolates them from each other, and gives store owners a single place to do their work. This section covers the foundation Doina sits on, the surfaces owners interact with, and the onboarding flow that turns a fresh sign-up into a deployedwidget.

Doina runs on Supabase, Postgres for data, Supabase Auth for sign-in and team roles, WebSockets keep the owner inbox in sync as conversations happen and RabbitMQ handles the background work, since ingestion, syncs, and re-indexing all run asynchronously and shouldn't block the admin.

The admin is where store owners spend their time. Knowledge holds everything the chatbot can draw on, split into Products, Documents, and Q&A. Instructions is where owners shape how the chatbot speaks and what it should and shouldn't do. Aspect controls how the widget looks on the storefront. Actions covers what the chatbot is allowed to do beyond answering questions. Integrations connects the store, Inbox shows live conversations and history, and Analytics reports on what's been happening.

Customizer view inside Doina client panel

Setup is a short multi-step flow rather than a settings page. A new owner enters their website URL, sees a preview of the chat window already styled around their site, and picks a trial plan. By the time they finish those steps, ingestion is already running in the background and the widget is ready to embed. Five to ten minutes for a store with under a hundred products, depending mostly on how long the crawl takes.

Knowledge ingestion, from raw websites to structured products

Knowledge ingestion is what makes the minutes-to-live promise possible. The platform has to take whatever a store has on the public web and turn it into something a chatbot can reason about, without asking the owner to prepare anything in advance. This section covers how Doina crawls a storefront, how it pulls structured catalog data from WordPress, what owners can add by hand, and how everything stays current.

Crawling websites

The crawler works two ways depending on what a site exposes. If there's a sitemap, it follows that. If there isn't, it crawls recursively from the homepage. Most stores end up using a mix, since sitemaps tend to cover the catalog cleanly but miss things like policy pages or seasonal landing pages.

Playwright is the default rather than a fallback. A growing share of modern eCommerce sites render their catalog client-side, and a plain HTTP fetch on those returns a page with no products on it. Running a real headless browser everywhere is more expensive, but it's the only way to deliver on a promise that works for any storefront.

Once a page is loaded, it gets extracted as Markdown, which is the format the rest of the pipeline works with end to end. A small language model then runs a usefulness pass over the extracted content to drop obvious noise like cart fragments, account widgets, and footer boilerplate before anything reaches the index.

Product pages are handled separately from content pages. They go into their own index with their own retrieval path, since the way a chatbot reasons about a product (price, variants, availability, attributes) is different from how it reasons about a return policy. Owners see both kinds of ingested items in the admin and can remove anything they don't want included.

Direct integrations

For stores running WooCommerce, Doina skips the scrape and pulls catalog data directly through the WordPress REST API. Connecting is one button click from the Integrations panel. There's no API key to generate, no plugin to install on the WordPress side, no developer required. Integration guides exist for Shopify and custom websites as well, and a dedicated plugin handles the connection for PrestaShop.

The WordPress integration covers more than the catalog. It can look up an order by customer email and order number, which is what powers order status questions in the chat. It also reads metadata that other plugins have attached to orders, so if a shipping plugin has written the delivery address or carrier tracking onto an order, the chatbot can read those back to the shopper without any extra setup.

Manual uploads and curated QA

Some of what a chatbot needs to know isn't on the website. Internal policies, supplier-specific shipping rules, recently changed terms that haven't made it to the public site yet. The system accepts PDF uploads for these, including PDFs with tables, which get parsed structurally rather than flattened into a paragraph of numbers.

Owners can also write Q&A pairs directly. These are for cases where the owner wants the chatbot to give a specific answer in a specific situation, with no ambiguity. Q&A entries are the first thing the chatbot checks. If a curated answer applies, that's what the shopper gets; otherwise the chatbot moves on to product or content search.

The admin shows everything the AI has ingested for a store, grouped by source. Owners can browse what's been picked up and remove anything that shouldn't be there. The crawler runs again on a weekly schedule to catch new products and pages, and owners on higher plans can trigger an on-demand sync whenever they need to push a change through faster.

Agentic retrieval and prompting library

Once knowledge is in, the question is what to do with it. Doina runs as an agent rather than a one-shot retrieval pipeline, deciding for itself which tool to reach for and when to hand off to a human. This section covers how the chatbot reasons through a conversation, the hybrid search underneath it, and the prompting library that ties it all together.

The system is agentic, not a retrieval-then-generate pipeline. On every turn the conversational model decides what to do next: answer from what it already has, call a tool to find more, or flag the conversation for human handoff. The tools available to it are deliberately small in number. Q&A lookup, content search, product search, order lookup for WordPress stores, coupon generation in pilot deployments, and a handoff that surfaces the conversation in the owner's inbox.

The search tools query Qdrant with a hybrid of dense vector retrieval and BM25 lexical scoring, with separate indexes for products and content so the agent can call the right one for the question.

The agent's behavior is shaped by an in-house prompting library. It covers tool definitions, the decision prompts that steer when to retrieve and when to answer directly, conversation context handling across turns, and the response formatting that lets the chatbot return product cards and structured links instead of plain text. The conversational model is a configuration variable inside it, which means swapping providers on the backend doesn't require changes to any orchestration code.

Responses come back as full messages rather than streamed tokens, delivered over the same WebSocket channel that keeps the inbox in sync. The choice is deliberate. Streaming tokens looks impressive on plain-text answers, but it breaks down when a response includes rendered product cards or formatted links, where half-rendered output is worse than waiting an extra moment for the whole thing to arrive cleanly.

A widget that behaves like a teammate

The widget is the part of website shoppers actually see. It has to load fast on a host site without breaking it, present answers richly enough to feel like a storefront feature instead of a bolted-on bot, and step aside cleanly when the owner wants to take the conversation over. This section covers how the widget is embedded, what it knows about the page it's running on, and what handoff between bot and human looks like.

The widget ships as a single script tag the store owner pastes into their site. The script injects a launcher button on the page; clicking it opens the chat in an iframe. The iframe is doing real work here. It keeps the host site's CSS and JavaScript from interfering with the widget, and the widget from interfering with the storefront. The owner doesn't have to think about CSS conflicts or namespacing.

The widget also reads the page it's loaded on. A shopper who opens the chat from a product page and asks "what do you think about this" gets an answer about that specific product, without having to type its name. The same applies to category pages, blog posts, and anything else with a clear subject. The chat behaves like a colleague who walked up while the shopper was looking at something, not like a search box that just lost the thread.

Messages between the agent and the widget travel as structured JSON rather than plain text. That's what lets the chat render product cards with image, title, price, and link, and format outbound links cleanly. The set of things that can be rendered is deliberately small. Cards and links, nothing more. The widget isn't trying to be a second storefront inside the chat.

Conversations resume across page loads. A shopper who asks about returns on one page, navigates to another, and reopens the chat picks up the thread where they left it.

Handoff to a human is the part the section heading was getting at. Owners see every conversation in the inbox as it happens, not just the ones that get flagged. When something needs a person, the owner takes over with one click. The bot stops generating immediately, any reply in progress is cancelled, and no further auto-replies go out until the owner hands the conversation back. The shopper sees a small system message marking the change, the way a messaging app notes a participant entering or leaving. Hiding the handoff would feel deceptive when the tone of the conversation is about to shift. The owner sees exactly what the shopper sees: the full message history, no behind-the-scenes view of what the agent searched for or retrieved. The point is to walk into a conversation as a teammate, not to audit a bot.

Results

Doina has worked on more than 50 eCommerce stores. The deployments span different store sizes, catalog shapes, and underlying platforms, with no two configurations ending up identical. What "customized" means in practice is that each store's chatbot is tuned to its own catalog, voice, and policies entirely through the admin, with no per-store engineering work behind it.

The path from a public storefront to a chatbot that knows it stays the same for every store: paste a URL, watch the ingestion run, embed the widget.