TutorBot BD was my first run at the Bangla-tutor idea, and it stands on its own as a production WhatsApp bot. It answers text, photo, and voice questions over WhatsApp, grounded in the national curriculum. The brain is an 8-node LangGraph: a supervisor classifies the question, RAG retrieves the relevant textbook passage, then one of three complexity tiers fans 2–4 specialist agents out in parallel.
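The tier-based fan-out can be sketched in plain TypeScript. This is an illustration of the shape only, not LangGraph's API and not the project's code: the tier names, the agent names, and the 2/3/4 split across tiers are all assumptions made for the example.

```typescript
// Hypothetical tier labels; the write-up only says there are three tiers.
type Tier = "simple" | "moderate" | "complex";

// An agent takes the question plus the retrieved passage and returns text.
type Agent = (question: string, passage: string) => Promise<string>;

// Assumed mapping: each tier runs between 2 and 4 specialists.
// The agent names are placeholders, not the real ones.
const AGENTS_BY_TIER: Record<Tier, string[]> = {
  simple: ["explainer", "exampleGiver"],
  moderate: ["explainer", "exampleGiver", "stepSolver"],
  complex: ["explainer", "exampleGiver", "stepSolver", "checker"],
};

// Run every agent for the tier concurrently and collect answers by name.
async function fanOut(
  tier: Tier,
  question: string,
  passage: string,
  registry: Record<string, Agent>,
): Promise<Map<string, string>> {
  const names = AGENTS_BY_TIER[tier];
  const answers = await Promise.all(
    names.map((name) => {
      const agent = registry[name];
      if (!agent) throw new Error(`No agent registered for "${name}"`);
      return agent(question, passage);
    }),
  );
  return new Map(names.map((name, i) => [name, answers[i]] as [string, string]));
}
```

Because `Promise.all` starts every agent before awaiting any of them, the tier's latency is roughly that of its slowest agent rather than the sum of all of them.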
The messaging layer is the part I'm proudest of: an 11-state user lifecycle machine behind a BullMQ/Redis queue so webhooks ack instantly, a timing-safe HMAC-SHA256 signature guard, and a DI factory that swaps the Meta Cloud API and Twilio behind one DI token. Ingesting the curriculum was its own problem: standard Bengali text extraction from the NCTB PDFs is unreliable, so I screenshot every page in headless Chrome and OCR it with Groq's Llama 4 Scout.
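A timing-safe HMAC-SHA256 guard of this kind can be sketched with Node's built-in `crypto` module. This is a minimal sketch, not the project's code: the function name `isValidSignature` is hypothetical, and it assumes the signature arrives as a `sha256=<hex digest>` header computed over the raw request body.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: true when the header matches the HMAC-SHA256 of the
// raw body under the shared secret.
function isValidSignature(
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
  appSecret: string,
): boolean {
  if (!signatureHeader) return false;

  // Assumed header shape: "sha256=<hex digest>".
  const [scheme, receivedHex] = signatureHeader.split("=");
  if (scheme !== "sha256" || !receivedHex) return false;

  const expected = createHmac("sha256", appSecret).update(rawBody).digest();
  const received = Buffer.from(receivedHex, "hex");

  // timingSafeEqual throws on unequal lengths, so check them first.
  if (received.length !== expected.length) return false;

  // Constant-time comparison: run time does not depend on where bytes differ.
  return timingSafeEqual(received, expected);
}
```

The check has to run against the raw bytes of the request, before any JSON parsing, since re-serializing the body can change whitespace or key order and invalidate the digest.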
The hard part
OCR-ing a curriculum no parser could read
The NCTB textbooks are image-heavy PDFs, and standard Bengali text extraction returns garbage. Rather than fight the parser, I rendered each page to an image in headless Chrome and ran it through Groq's Llama 4 Scout vision model for OCR, turning a 441 MB, 13-PDF corpus into clean, chunked, embeddable text the tutor could actually ground its answers in.
Highlights
- Drove the WhatsApp layer with an 11-state lifecycle machine and a BullMQ/Redis queue so webhooks ack instantly, behind a timing-safe HMAC-SHA256 guard.
- Swapped Meta Cloud API and Twilio behind one DI token, and delivered answers as PDFs with properly shaped Bangla text, rendered through headless Chrome.
- Built an 8-node LangGraph tutor: supervisor classification, RAG retrieval, then 3 complexity tiers fanning 2–4 specialist agents out in parallel.
- Ingested a 441 MB, 13-PDF NCTB corpus by screenshotting each page in headless Chrome and OCRing with Groq Llama 4 Scout.
- Ordered a 3-provider LLM failover chain by latency (Groq → Gemini → OpenAI), filtering to vision-capable providers for image questions.
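
The latency-ordered failover with vision filtering from the last highlight can be sketched as follows. This is an illustration under assumptions, not the project's code: the `Provider` shape, the `supportsVision` flag, the `typicalLatencyMs` field, and both function names are hypothetical.

```typescript
type ProviderName = "groq" | "gemini" | "openai";

interface Provider {
  name: ProviderName;
  supportsVision: boolean;  // assumed capability flag
  typicalLatencyMs: number; // assumed measured or configured latency
  complete: (prompt: string, imageUrl?: string) => Promise<string>;
}

// Keep only providers that can handle the request, fastest first.
// filter() returns a new array, so the caller's list is not reordered.
function buildChain(providers: Provider[], needsVision: boolean): Provider[] {
  return providers
    .filter((p) => !needsVision || p.supportsVision)
    .sort((a, b) => a.typicalLatencyMs - b.typicalLatencyMs);
}

// Try each provider in order; fall through to the next one on any error.
async function completeWithFailover(
  chain: Provider[],
  prompt: string,
  imageUrl?: string,
): Promise<string> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      return await provider.complete(prompt, imageUrl);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Building the chain per request, rather than once at startup, is what lets an image question skip any provider that cannot accept images while a text question still uses all three.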
Stack