Context Arithmetic for Voice: A Technical Primer

How the Context Engine computes what your voice agent should know at call time.


Context arithmetic is the computational process by which Alchemyst's Kathan engine determines what a voice agent should know at call time. It's the difference between prompt stuffing (giving the agent everything and hoping it picks the right parts) and context engineering (systematically selecting, filtering, and ranking information so the agent receives only what's relevant).

This article is a technical primer for developers and technical evaluators who want to understand the architecture of Kathan (कथन). For a higher-level overview, see our articles on why context matters more than voice quality and what changed in voice AI since 2024.

The Formula

At its core, context arithmetic follows a set-algebra approach to information retrieval:

Final Context = (Semantic Matches) ∩ (groupName Scope) ∩ (Metadata Filters) − (Superseded) → rank → top K

Each operation in this pipeline narrows the context from a broad corpus to a focused, actionable set. Let's walk through each stage.
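The formula above can be sketched as plain set operations over document IDs. This is a toy illustration of the pipeline shape, not Kathan's implementation; the IDs, scores, and stage inputs are invented.

```python
# Toy sketch of the set-algebra pipeline. Each argument stands in for the
# output of a real stage (semantic search, scoping, metadata filters, etc.).

def context_arithmetic(semantic, scoped, metadata, superseded, scores, k):
    """Final Context = (semantic ∩ scoped ∩ metadata) − superseded → rank → top K."""
    candidates = (semantic & scoped & metadata) - superseded
    ranked = sorted(candidates, key=lambda d: scores[d], reverse=True)
    return ranked[:k]

final = context_arithmetic(
    semantic={"d1", "d2", "d3", "d4"},
    scoped={"d1", "d2", "d3"},
    metadata={"d1", "d2", "d4"},
    superseded={"d2"},
    scores={"d1": 0.9, "d2": 0.7, "d3": 0.5, "d4": 0.8},
    k=2,
)
# Only d1 survives intersection and supersession removal
```

Note that supersession removal happens before ranking, so an outdated document can never displace a current one in the top K.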

Stage 1: Semantic Similarity Search

The first stage retrieves context documents that are semantically relevant to the current call's objective. The query is constructed from the campaign objective, the lead's profile, and any specific instructions for this call.

For example, if the campaign objective is "CA Foundation enrollment follow-up" and the lead is a Gujarat-based parent who previously expressed interest in the January batch, the semantic query captures these dimensions. The search returns documents from the interaction log, CRM notes, and campaign metadata that are semantically similar to this query.

Input: Full corpus of context documents (potentially 500,000+ across all leads and campaigns).
Output: ~2,000 semantically relevant documents, ranked by cosine similarity.
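The retrieval step can be illustrated with cosine similarity over toy two-dimensional vectors. Real systems use high-dimensional embeddings from an embedding model; the document names and vectors here are invented.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query_vec, docs, top_n):
    # docs maps doc_id -> embedding; return IDs ranked by similarity to the query.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:top_n]

docs = {
    "crm_note": [0.9, 0.1],       # close to the query direction
    "campaign_meta": [0.7, 0.6],  # somewhat related
    "unrelated": [0.0, 1.0],      # orthogonal topic
}
print(semantic_search([1.0, 0.2], docs, 2))
```

In production this ranking runs against a vector index rather than a linear scan, but the scoring principle is the same.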

Stage 2: groupName Scoping

Not all semantically relevant documents are appropriate for this call. groupName scoping filters the semantic matches to only include documents that belong to the relevant scope — typically the current campaign, the lead's account, or a specific product line.

groupNames are hierarchical. A document tagged with jkshah/ca-foundation/gujarat is visible to queries scoped at jkshah/ca-foundation or jkshah, but not to queries scoped at jkshah/cs-executive. This prevents context leakage between unrelated campaigns while allowing shared context to flow where appropriate.

Input: ~2,000 semantic matches.
Output: ~200 documents within the relevant groupName scope.
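The hierarchical visibility rule described above reduces to a path-prefix check, sketched here under the assumption that groupNames are slash-delimited paths as in the example.

```python
# A document is visible to a query scope when the scope is a prefix of the
# document's groupName path, matched segment by segment (so "jkshah/ca" does
# not accidentally match "jkshah/ca-foundation").

def in_scope(doc_group: str, query_scope: str) -> bool:
    doc_parts = doc_group.split("/")
    scope_parts = query_scope.split("/")
    return doc_parts[: len(scope_parts)] == scope_parts

# The cases from the article:
assert in_scope("jkshah/ca-foundation/gujarat", "jkshah/ca-foundation")
assert in_scope("jkshah/ca-foundation/gujarat", "jkshah")
assert not in_scope("jkshah/ca-foundation/gujarat", "jkshah/cs-executive")
```

Segment-wise comparison (rather than a raw string `startswith`) is what prevents leakage between sibling campaigns with similar names.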

Stage 3: Metadata Filtering

Metadata filters apply structured constraints: lead ID, language, date range, interaction type, campaign phase. These are exact-match or range filters that further narrow the context to documents that are not just relevant and in-scope, but specifically applicable to this lead at this moment.

For our Gujarat CA student example, metadata filters would include: lead_id = "GJ-4521", language IN ("gu", "hi", "en"), date > "2026-01-01". This eliminates documents from other leads, irrelevant languages, and outdated interactions.

Input: ~200 scoped documents.
Output: ~50 documents matching all metadata constraints.
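A filter predicate of this shape is easy to express directly. The field names and document records below are hypothetical, mirroring the lead_id, language, and date constraints from the example.

```python
from datetime import date

def passes_filters(doc: dict, lead_id: str, languages: set, after: date) -> bool:
    # Exact match on lead_id, set membership on language, range check on date.
    return (
        doc["lead_id"] == lead_id
        and doc["language"] in languages
        and doc["date"] > after
    )

docs = [
    {"lead_id": "GJ-4521", "language": "gu", "date": date(2026, 2, 15)},
    {"lead_id": "GJ-4521", "language": "gu", "date": date(2025, 11, 2)},  # outdated
    {"lead_id": "MH-1007", "language": "hi", "date": date(2026, 2, 1)},   # other lead
]
kept = [d for d in docs if passes_filters(d, "GJ-4521", {"gu", "hi", "en"}, date(2026, 1, 1))]
# Only the first document survives all three constraints
```

Because these are exact-match and range checks, they are cheap; running them after semantic search and scoping keeps the candidate set small by the time they apply.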

Stage 4: Deduplication of Superseded Context

Context evolves. A lead who said "I'm interested in the January batch" in December may have said "I've decided on March instead" in February. Both statements are in the context store. The deduplication stage identifies superseded context — older information that has been updated by newer information — and removes it.

This isn't simple timestamp-based deduplication. It's semantic deduplication: Kathan's voice OS identifies when a newer document contradicts or updates an older one on the same topic, and keeps only the most current version. This prevents the agent from referencing outdated information.

Input: ~50 filtered documents.
Output: ~15 deduplicated, current documents.
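The keep-only-the-latest-per-topic behavior can be sketched as below. In the real system the "same topic" judgment is semantic; here a precomputed topic label stands in for that comparison, and the records are invented to match the January/March example.

```python
# For each topic, retain only the most recent document. ISO-format timestamp
# strings compare correctly as plain strings.

def drop_superseded(docs):
    latest = {}
    for doc in docs:
        current = latest.get(doc["topic"])
        if current is None or doc["ts"] > current["ts"]:
            latest[doc["topic"]] = doc
    return list(latest.values())

docs = [
    {"topic": "batch-preference", "ts": "2025-12-10", "text": "Interested in January batch"},
    {"topic": "batch-preference", "ts": "2026-02-15", "text": "Decided on March instead"},
    {"topic": "fees", "ts": "2026-02-15", "text": "Requested fee breakdown by email"},
]
current = drop_superseded(docs)
# The January statement is dropped; the March update and the fee note survive
```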

Stage 5: Ranking and Top-K Selection

The final stage ranks the remaining documents by a composite score that weighs recency, relevance to the call objective, and information density. The top K documents (typically 5–10) are selected and formatted into the agent's context window.

The ranking function balances three signals: recency, relevance to the call objective, and information density.

Input: ~15 deduplicated documents.
Output: 5 context documents, totaling ~400 tokens, delivered to the agent's prompt.
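A composite score over the three signals might look like the sketch below. The 0.5/0.3/0.2 weights and the per-document scores are invented for illustration; the article does not specify Kathan's actual weighting.

```python
# Weighted blend of relevance, recency, and information density, each
# assumed to be pre-normalized to [0, 1].

def composite_score(doc, w_rel=0.5, w_rec=0.3, w_den=0.2):
    return w_rel * doc["relevance"] + w_rec * doc["recency"] + w_den * doc["density"]

def top_k(docs, k=5):
    return sorted(docs, key=composite_score, reverse=True)[:k]

docs = [
    {"id": "fee-email", "relevance": 0.9, "recency": 0.8, "density": 0.7},
    {"id": "old-call", "relevance": 0.6, "recency": 0.2, "density": 0.5},
    {"id": "callback-note", "relevance": 0.8, "recency": 0.9, "density": 0.6},
]
print([d["id"] for d in top_k(docs, k=2)])
```

Tuning these weights per campaign is a natural extension: a retargeting campaign might weight recency more heavily than a cold-outreach one.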

A Concrete Example

Let's trace the full pipeline for a specific call: Agent dials Priya Mehta (lead GJ-4521), a Gujarat-based parent, for the third time across two campaigns. She's interested in CA Foundation for her son.

The pipeline narrows as follows, using the stage counts above: semantic search takes the 500,000+ document corpus down to ~2,000 matches for the enrollment follow-up objective; groupName scoping to the campaign narrows that to ~200; metadata filtering on lead_id = "GJ-4521" leaves ~50; deduplication of superseded context leaves ~15; and ranking selects the top 5 documents, totaling ~400 tokens.

Second Example: Unacademy NPS Feedback

Now, let's apply the same arithmetic to a different use case: collecting Net Promoter Score (NPS) feedback for Unacademy. The goal is to have a contextual conversation about a learner's experience, not just ask for a score.


Resulting Agent Context for NPS Call (450 tokens)

Learner: Ananya Sharma | Learner ID: 84321

Course Enrollment: UPSC CSE - GS Mains, Batch 3

Engagement Metrics: 78% video completion, 3/5 assignments submitted. Last active: 2 days ago.

Support History: 1 open ticket re: doubt-clearing session schedule. 2 closed tickets re: payment confirmation.

Previous NPS (3 months ago): 8/10. Comment: "Good content, but live class timings are difficult."

Recommended approach: Open with reference to UPSC course. Ask for feedback, and if score is low, probe on session timings and doubt-clearing experience.

The Resulting Agent Context for the Priya Mehta Call (400 tokens)

Lead: Priya Mehta (GJ-4521) | Parent | Gujarat | Preferred language: Gujarati

Prior interactions: 2 calls across campaigns "CA Foundation Gujarat" and "CA Foundation Retarget Q1"

Last call (Feb 15): Discussed March batch. Objection: fees too high. Requested fee breakdown by email. Call duration: 3m 42s.

Current status: Email sent Feb 16 with fee breakdown. No response. Callback requested for "after Holi."

Recommended approach: Open in Gujarati. Reference fee breakdown email. Address fee objection with installment option. Confirm March batch interest.

Compare: Prompt-Stuffed Agent (4,000 tokens)

A prompt-stuffed system would skip these filtering stages and hand the agent everything a coarse retrieval pass returns: campaign descriptions, other leads' interactions, general product information, and outdated batch schedules, with the relevant context buried somewhere in the middle. The agent would receive 4,000 tokens and need to figure out which 400 matter. In practice, it often picks the wrong ones.


"Context arithmetic isn't about giving the agent more information. It's about giving the agent the right information — systematically filtered, ranked, and delivered so every token in the context window earns its place. This is a core principle of the Kathan voice OS. Built in India, for the world."

Implementation Implications

For teams evaluating or building context-aware voice AI systems, the key architectural requirements are:

Indexed interaction storage. Every call generates context that must be indexed for semantic search, not just logged to a flat file. This requires an embedding pipeline that processes call transcripts, extracts key information, and stores it with appropriate metadata and groupName tags.

Hierarchical scoping. groupName-based scoping must support hierarchical relationships so that context can flow across related campaigns without leaking to unrelated ones. The scoping model should be configurable per deployment.

Semantic deduplication. Simple timestamp-based deduplication isn't sufficient. The system must identify when newer information supersedes older information on the same topic — even when the phrasing is different.

Latency budget. The entire context arithmetic pipeline must complete within the telephony latency budget, typically under 200ms. This is essential at the scale of 500,000+ calls per day.
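One way to enforce such a budget is to check elapsed time between stages and degrade gracefully rather than delay the call. This is an assumed design sketch, not Kathan's documented behavior; the stage functions and fallback payload are invented.

```python
import time

def run_with_budget(stages, payload, budget_ms=200.0):
    # Run pipeline stages in order; if total elapsed time exceeds the budget,
    # return a minimal fallback context instead of holding up call setup.
    start = time.monotonic()
    for stage in stages:
        payload = stage(payload)
        if (time.monotonic() - start) * 1000 > budget_ms:
            return {"fallback": True}
    return payload

# Two trivial stand-in stages that each annotate the payload.
stages = [
    lambda p: {**p, "semantic": True},
    lambda p: {**p, "scoped": True},
]
result = run_with_budget(stages, {})
```

A fallback context (for example, just the lead's name and campaign objective) keeps the call usable even when the full pipeline cannot finish in time.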

Context arithmetic is the technical foundation that makes contextual voice AI possible. It's what separates a voice agent that sounds natural from one that acts intelligently — knowing who it's talking to, what they've discussed before, and what the optimal next step is.

See context arithmetic in action — start a pilot with Alchemyst's Kathan, the enterprise voice OS that supports 12+ Indian languages and international languages like English, Arabic, Spanish, French, Mandarin, and Japanese.

Ready to build your next AI agent?