AI & Insurtech

NLP for Claims Intimation and FNOL Processing in India

Natural language processing is reshaping how Indian commercial insurers receive and process First Notice of Loss, extracting structured claim data from WhatsApp messages, emails, and call transcripts in seconds. This post examines how NLP works in practice across Indian non-life insurers and what IRDAI's claim intimation timelines mean for automation design.

Sarvada Editorial Team · Insurance Intelligence
13 min read
NLP, FNOL, claims intimation, AI, commercial insurance, claims automation

Last reviewed: May 2026

Why FNOL Processing Is a Bottleneck in Indian Commercial Insurance

First Notice of Loss is the moment an insured informs their insurer that a loss event has occurred. In Indian commercial insurance, this notification arrives through a remarkable variety of channels: a voice call to a broker, a WhatsApp message to a relationship manager, an email to a claims inbox, a complaint portal submission, or a handwritten letter from a remote manufacturing site. Each channel produces data in a different format, and each format demands a different extraction approach before a claim can even be opened in a management system.

The resulting manual effort is substantial. A mid-sized non-life insurer handling 8,000 to 15,000 commercial claims per year might employ ten or more data-entry staff whose primary function is reading incoming notifications and keying policy numbers, incident descriptions, and estimated losses into the claims system. This process is slow, error-prone, and adds no investigative value. A miskeyed policy number can delay surveyor appointment by a full working day, which matters enormously given IRDAI's prescribed timelines.

The IRDAI (Protection of Policyholders' Interests) Regulations, 2017, specifically Regulation 9, require non-life insurers to acknowledge claim intimation and begin processing within 48 hours of receiving notification. For most commercial lines, a surveyor must be appointed within 72 hours. These timelines are non-negotiable and apply regardless of how the intimation arrives. Manual processing under these constraints is increasingly untenable as commercial claim volumes grow.

Natural language processing changes the equation by treating every incoming intimation, regardless of channel, as a structured data extraction task. The insurer defines the information elements it needs (policy number, date of loss, cause of loss, location, estimated amount), and the NLP model reads the incoming text to extract those elements automatically, flagging gaps for human follow-up rather than requiring a human to read the entire document from scratch.
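In pipeline terms, the insurer's element definition is simply a schema with a gap-detection step: anything the model cannot fill goes to human follow-up. A minimal Python sketch, with illustrative field names rather than any insurer's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical FNOL extraction schema; field names are illustrative.
@dataclass
class FnolExtraction:
    policy_number: Optional[str] = None
    date_of_loss: Optional[str] = None      # ISO date once resolved
    cause_of_loss: Optional[str] = None
    location: Optional[str] = None
    estimated_amount_inr: Optional[int] = None

    def missing_fields(self) -> list:
        """Fields the model could not extract; these go to human follow-up."""
        return [name for name, value in vars(self).items() if value is None]

intake = FnolExtraction(policy_number="HDFC-COM-2024-78432", cause_of_loss="fire")
print(intake.missing_fields())  # ['date_of_loss', 'location', 'estimated_amount_inr']
```

The point of the gap list is that a human handles only the missing elements, not the whole document.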

How NLP Extracts Structured Data from Unstructured Claim Notifications

The core NLP task in FNOL processing is named entity recognition (NER): the identification of specific information types within free-form text. General-purpose NER models identify people, places, and organisations, but they are not trained to extract the specific entities that matter in a commercial insurance claim. Insurers either fine-tune pre-trained transformer models on labelled insurance claims data or use domain-specific models built from scratch on Indian insurance corpora.

A claim notification might read: "Our factory in Bhiwandi caught fire yesterday night around 10 PM. Policy number is HDFC-COM-2024-78432. We estimate around 2 crore loss in machinery." An NER model trained on insurance text extracts: policy number (HDFC-COM-2024-78432), incident type (fire), location (Bhiwandi), date of loss (resolved to a calendar date from "yesterday night"), and estimated loss (INR 2 crore, machinery category). The full claim pre-population takes seconds.
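A production system would use a fine-tuned NER model for this, but the target output shape can be illustrated with a toy rule-based extractor. The regexes and the relative-date resolution below are purely illustrative:

```python
import re
from datetime import date, timedelta

def extract_entities(text: str, received: date) -> dict:
    """Toy rule-based extraction over the sample notification.
    A real system would use a fine-tuned NER model; these patterns
    only illustrate the structured output a model produces."""
    out = {}
    m = re.search(r"\b([A-Z]+-[A-Z]+-\d{4}-\d+)\b", text)
    if m:
        out["policy_number"] = m.group(1)
    if re.search(r"\bfire\b", text, re.IGNORECASE):
        out["incident_type"] = "fire"
    # Relative-date resolution: "yesterday" anchored to the receipt date
    if "yesterday" in text.lower():
        out["date_of_loss"] = (received - timedelta(days=1)).isoformat()
    m = re.search(r"(\d+(?:\.\d+)?)\s*crore", text, re.IGNORECASE)
    if m:
        out["estimated_loss_inr"] = int(float(m.group(1)) * 1e7)
    return out

msg = ("Our factory in Bhiwandi caught fire yesterday night around 10 PM. "
       "Policy number is HDFC-COM-2024-78432. "
       "We estimate around 2 crore loss in machinery.")
print(extract_entities(msg, received=date(2025, 3, 10)))
```

Note how "yesterday" becomes a calendar date only because the receipt timestamp is part of the input; relative dates are unresolvable without it.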

Modern transformer-based models fine-tuned on insurance-domain text achieve entity extraction accuracy rates of 88 to 94% on well-formed claim notifications in standard Indian English. Accuracy drops for highly colloquial text, abbreviations, or regional transliterations. In a controlled test using 500 labelled email intimations from a mid-size Indian non-life insurer (reported at the 2025 Insurance Data and Analytics summit in Mumbai), a fine-tuned multilingual model achieved 91% accuracy on policy number extraction and 84% accuracy on incident type classification. Accuracy fell to 72% for estimated loss quantum extraction, reflecting the inherent imprecision with which policyholders describe amounts at FNOL stage.

Relation extraction, a more advanced NLP capability, goes beyond identifying individual entities to understanding how they relate to each other. For a complex commercial claim, this might mean recognising that fire started in Building A but spread to Building B (relevant for separate sum insured limits), or that transit damage occurred during a third-party carrier's leg rather than the insured's own fleet (relevant for subrogation rights). New India Assurance and United India Insurance have both indicated in published digital transformation roadmaps that relation extraction is a second-phase capability under exploration, following foundational NER deployment at intake.

WhatsApp, Email, and Call Transcript Processing

Indian commercial insurers today receive claim intimations through at least five distinct channels: the traditional call centre, email, WhatsApp Business API, the insurer's own mobile app or portal, and broker intermediary platforms. Each channel presents distinct NLP challenges that require specific pipeline design.

Call centre conversations are transcribed using automatic speech recognition (ASR) before NLP processing begins. Indian commercial insurers typically handle calls in Hindi and English as a baseline, but regional operations require ASR models tuned to Marathi, Tamil, Telugu, Bengali, and Kannada. HDFC ERGO General Insurance and Bajaj Allianz General Insurance have both invested in speech-to-text pipelines supporting at least four Indian languages, according to published investor briefings and industry conference presentations from 2024 and 2025. ASR error rates above 8% materially degrade downstream entity extraction quality, making ASR tuning a prerequisite for claims automation.

WhatsApp messages present a different problem. Business WhatsApp is increasingly the preferred intimation channel for small and medium commercial clients dealing through brokers. Messages may combine text, audio notes, images, and PDFs. NLP applied to the text layer must handle code-switching (mid-sentence language shifts), abbreviations, and incomplete sentences. A message reading "Sir fire hogaya godown mein, policy ka number nahi pata, aap check karo" contains the incident type and asset type, but the policy number must be retrieved by cross-referencing the sender's registered mobile number against the policy database. A well-designed pipeline handles this lookup automatically rather than flagging the intimation as incomplete.
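The automatic lookup described above reduces to a fallback chain: use the extracted policy number if present, otherwise key on the sender's registered mobile number, and only escalate when both fail. A sketch, with an in-memory dict standing in for the policy master:

```python
def resolve_policy(extraction: dict, sender_mobile: str, policy_db: dict) -> dict:
    """If no policy number was extracted, fall back to a lookup keyed on
    the sender's registered mobile number. policy_db is a stand-in for the
    insurer's policy master; structure and names are illustrative."""
    if extraction.get("policy_number"):
        extraction["policy_source"] = "extracted"
        return extraction
    match = policy_db.get(sender_mobile)
    if match:
        extraction["policy_number"] = match
        extraction["policy_source"] = "mobile_lookup"
    else:
        extraction["policy_source"] = "unmatched"  # route to human claims officer
    return extraction

db = {"+919812345678": "ABC-COM-2023-10045"}
print(resolve_policy({"incident_type": "fire", "asset": "godown"}, "+919812345678", db))
```

Recording how the policy was resolved (extracted versus looked up) also feeds the audit trail discussed later.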

Email intimations from brokers or corporate policyholders tend to be more structured, but they introduce a different challenge: they are often long, include multiple attachments, and embed the actual claim detail several paragraphs into the body of a message that also contains boilerplate broker letterhead and auto-signatures. NLP models trained specifically for claims intimation learn to locate the relevant sections quickly, filtering out unrelated content that is forwarded along with the intimation.

Multi-Language Support for Regional Indian Languages

India's linguistic diversity is a structural feature of the insurance market that directly affects claims intake quality and settlement speed. Commercial policyholders in Tamil Nadu communicate in Tamil. Those in West Bengal communicate in Bengali. Businesses in Andhra Pradesh and Telangana use Telugu dialects. A national insurer operating branch offices in each of these states faces a language coverage problem that a single-language NLP model cannot solve.

Large private insurers like HDFC ERGO have built or licensed multi-language NLP pipelines supporting seven or more Indian languages at the entity extraction layer. Public-sector insurers like New India Assurance, which has the broadest geographic footprint in the country through a 2,000-plus branch network, have historically relied on regional claims teams to translate intimations before feeding them into centralised NLP systems, creating a bottleneck that undercuts the speed advantage automation is intended to deliver.

A more scalable approach uses pre-trained multilingual transformer models such as IndicBERT and MuRIL, both developed with Indian language data, as the backbone, fine-tuned on insurance domain text. These models handle 12 or more Indian languages natively without requiring translation as an intermediate step. The trade-off is that fine-tuning multilingual models requires larger labelled datasets across each language, which is expensive to create. Hindi NLP is the most mature, achieving accuracy rates approaching those for English. Tamil and Telugu NLP achieves accuracy in the 72 to 80% range, still sufficient for automation with meaningful human review of low-confidence extractions.

Code-switching, writing that alternates between languages within a single message, is endemic in Indian business communication. "Fire ho gayi factory mein, policy number ABC-123, please send surveyor jaldi" requires a model that can process Hindi and English simultaneously. Multilingual models trained on mixed-language Indian text handle this reasonably well for common entity types, though idiomatic constructions remain challenging. For commercial insurance specifically, the added complexity is that policy documents and endorsements are almost always in English, even when the policyholder communicates in a regional language, requiring cross-language lookup between the intimation language and the policy record.

Automated Claims Routing After NLP Extraction

Extracting entities from an intimation is the first step; routing the claim correctly is the second and operationally more consequential step. The routing decision determines which surveyor team receives the assignment, which claims handler manages the file, and which reinsurance notification rules apply. Getting routing wrong at the outset creates delays that compound throughout the claim lifecycle.

NLP-driven routing uses extracted entities plus policy data to make these assignments automatically. An extracted cause of loss of "electrical short circuit" combined with a machinery breakdown policy triggers routing to the engineering claims team rather than the fire team, even if the visible damage was fire. An estimated loss exceeding a pre-defined threshold (commonly INR 50 lakh for commercial property claims) triggers automatic notification to the reinsurance desk. A loss location in a flood-prone district during monsoon season triggers a geographic risk flag for the assessor. The IRDAI (Surveyors and Loss Assessors) Regulations, 2015 mandate appointment of an IRDAI-licensed surveyor for claims above INR 75,000, a rule that routing logic must enforce automatically.
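A rules engine of this kind is straightforward once the entities are structured. The sketch below encodes the examples from this section; the team names and flood-district list are placeholders, while the INR 75,000 and INR 50 lakh thresholds come from the rules described above:

```python
SURVEYOR_THRESHOLD_INR = 75_000        # IRDAI-licensed surveyor mandated above this
REINSURANCE_THRESHOLD_INR = 5_000_000  # INR 50 lakh reinsurance-desk trigger

def route_claim(entities: dict) -> dict:
    """Rules-engine sketch for the routing waterfall described above.
    Team names and the flood-district set are illustrative placeholders."""
    decision = {"team": "general", "flags": []}
    if (entities.get("cause_of_loss") == "electrical short circuit"
            and entities.get("policy_type") == "machinery breakdown"):
        decision["team"] = "engineering"   # not fire, despite visible fire damage
    elif entities.get("incident_type") == "fire":
        decision["team"] = "fire"
    loss = entities.get("estimated_loss_inr", 0)
    if loss > REINSURANCE_THRESHOLD_INR:
        decision["flags"].append("notify_reinsurance_desk")
    if loss > SURVEYOR_THRESHOLD_INR:
        decision["flags"].append("appoint_licensed_surveyor")
    if entities.get("district") in {"FloodDistrictA"} and entities.get("monsoon"):
        decision["flags"].append("geographic_risk")
    return decision
```

Keeping thresholds as named constants rather than burying them in code makes the regulatory rules auditable and easy to update when circulars change.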

Bajaj Allianz General Insurance's commercial claims platform applies a routing waterfall: the NLP layer extracts claim characteristics, a business rules engine applies insurer-defined routing logic, and only ambiguous cases escalate to a human claims supervisor. The company reports that this approach handles over 60% of commercial FNOL intimations without any human routing decision, reducing average time-to-surveyor-appointment from 18 hours to under 4 hours for routable claims.

Routing accuracy is directly tied to entity extraction quality. If the NLP model extracts an incorrect cause of loss or misidentifies the policy type, the routing decision will be wrong regardless of how well the routing logic is designed. Routing error rates should be tracked separately from extraction error rates, as a single misclassified incident type can trigger a cascade of wrong process steps. Calibrating the confidence threshold for human review is an ongoing activity rather than a one-time configuration: as new types of claims appear, new failure modes emerge that require threshold adjustment and, periodically, model retraining.

IRDAI Claim Intimation Timeline Requirements and Compliance Monitoring

IRDAI's Protection of Policyholders' Interests Regulations, 2017, Regulation 9 specifies obligations on non-life insurers once a claim is intimated. The insurer must acknowledge receipt within 48 hours and, for claims requiring survey, appoint a licensed surveyor within 72 hours for losses above the de minimis threshold. Final claim settlement timelines vary by policy type but typically range from 30 to 90 days from intimation under IRDAI's master circulars on claims management.

NLP-automated FNOL processing directly supports compliance with these timelines. Automated systems can acknowledge an intimation within seconds of receipt, issue a claim reference number immediately, and log the intimation timestamp with precision. Manual systems create ambiguity about when exactly the insurer "received" the intimation, particularly for after-hours submissions. An automated system eliminates this ambiguity and creates an auditable record that can be produced to IRDAI examiners during inspection. IRDAI's 2024 Master Circular on Claims Management reinforced surveyor appointment timelines and added requirements for proactive policyholder communication on claim status, which NLP systems can support by triggering automated status updates using contact details extracted at FNOL.
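Because the clock runs from receipt regardless of business hours, the deadline arithmetic is trivial once the intimation timestamp is logged precisely. A minimal sketch of the Regulation 9 windows:

```python
from datetime import datetime, timedelta

ACK_WINDOW = timedelta(hours=48)       # acknowledgement deadline
SURVEYOR_WINDOW = timedelta(hours=72)  # surveyor appointment deadline

def compliance_deadlines(intimation_ts: datetime) -> dict:
    """Compute Regulation 9 deadlines from the logged intimation timestamp.
    The clock runs from receipt, including after-hours and weekend receipts."""
    return {
        "acknowledge_by": intimation_ts + ACK_WINDOW,
        "appoint_surveyor_by": intimation_ts + SURVEYOR_WINDOW,
    }

# A claim intimated at 11 PM on Friday 13 June 2025 needs a surveyor
# appointed by 11 PM on Monday 16 June, whatever the office hours.
d = compliance_deadlines(datetime(2025, 6, 13, 23, 0))
print(d["appoint_surveyor_by"])  # 2025-06-16 23:00:00
```

This is exactly why the precise receipt timestamp matters: the manual-process ambiguity about when an intimation was "received" translates directly into ambiguity about these deadlines.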

Compliance monitoring dashboards, fed by the NLP extraction output, give claims managers real-time visibility into pending timeline obligations. A claim intimated at 11 PM on a Friday that requires surveyor appointment by 11 PM on Monday appears on a Monday morning dashboard flagged as requiring same-day action. Without this visibility, such claims routinely fell through the cracks in manual workflows, generating regulatory compliance failures that attract IRDAI scrutiny and potential penalty.

Insurers operating under IRDAI's watch should note that the regulator has increased thematic inspections on claims settlement processes since 2023, with particular attention to timeline adherence in commercial lines. NLP-generated audit trails, showing the precise timestamp of each processing step from intimation receipt through surveyor appointment, provide documentation that satisfies inspection requirements and demonstrates systematic compliance rather than reactive exception management.

Integration with Legacy Claims Management Systems

NLP for claims intimation feeds existing claims management systems rather than replacing them. The integration architecture between an NLP processing layer and the core claims platform is where many Indian insurer implementations encounter friction, particularly given the age profile of legacy platforms in the non-life sector.

The dominant claims management platforms in use by Indian non-life insurers include in-house legacy systems built on Oracle or IBM databases, customised versions of international platforms such as Guidewire ClaimCenter, and a growing set of cloud-native claims platforms offered by insurtechs. NLP vendors operating in India, including Artivatic.ai, Mantra Labs, and international players who have localised for the Indian market, typically offer a middleware approach: the NLP service exposes REST APIs that the claims platform calls after an intimation event triggers.

Data flow in a well-integrated system works as follows. When a WhatsApp message is received through the Business API, a webhook triggers the NLP pipeline. The pipeline returns a structured JSON payload containing extracted entities, confidence scores for each entity, the classified incident type, a recommended routing tag, and a risk flag if anomalies were detected. The claims system ingests this payload, creates a claim shell record, assigns a claim number, and sends an automated acknowledgement to the policyholder or broker. The entire loop can complete in under 90 seconds for a text-based intimation, well within the 48-hour IRDAI window.
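The payload described above might look like the following. This is an assumed shape for illustration, not any vendor's actual schema; per-entity confidence scores are what let the claims platform decide downstream between auto-processing and review:

```python
import json

# Illustrative shape of the structured payload an NLP service could return
# to the claims platform; all field names here are assumptions.
payload = {
    "extracted_entities": {
        "policy_number": {"value": "HDFC-COM-2024-78432", "confidence": 0.97},
        "incident_type": {"value": "fire", "confidence": 0.91},
        "estimated_loss_inr": {"value": 20000000, "confidence": 0.68},
    },
    "routing_tag": "fire_commercial_property",
    "risk_flags": ["loss_above_reinsurance_threshold"],
    "source_channel": "whatsapp",
    "received_at": "2025-03-10T22:14:05+05:30",
}

print(json.dumps(payload, indent=2))
```

Note that confidence is carried per entity, not per message: a payload can have a high-confidence policy number alongside a low-confidence loss estimate, and the two should be treated differently downstream.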

For Indian public-sector insurers with older system architectures where direct API integration is not feasible, NLP output is written to a structured flat-file format that is batch-imported into the claims system at 15 or 30-minute intervals. This introduces delay but still represents a substantial improvement over manual data entry, and it preserves the 48-hour acknowledgement capability in all but the most extreme volume scenarios. Data quality feedback loops are essential for sustained performance: NLP extractions should be compared against the final adjudicated claim record periodically, identifying systematic errors and informing model retraining cycles.

Accuracy Governance and Human Oversight in Production

NLP automation in FNOL processing is a human-augmentation tool, not a replacement for trained claims professionals. IRDAI places regulatory accountability firmly with the insurer, not the technology vendor. Under IRDAI's Guidelines on Outsourcing of Activities by Indian Insurers (2017), any automation that touches the claim intake process must be auditable, and any regulatory breach resulting from a technology error remains the insurer's liability.

In practice, this means NLP-assisted claims intake requires a structured human oversight layer. Most Indian insurers implement a sample audit process: a claims quality team reviews a random sample of NLP-processed intimations weekly, comparing NLP output against the original raw input to measure accuracy and identify systematic errors. When model accuracy on critical entities drops below a defined threshold (typically 85% for policy number and incident type), the relevant entity type is moved to mandatory human review pending model correction.

Defining the right confidence threshold for automation versus human review is a calibration exercise with real financial stakes. A threshold set too high sends too many claims to human review, eliminating most of the efficiency gain. A threshold set too low allows incorrect extractions to enter the claims system, creating downstream errors that are more expensive to correct than the savings generated. Most Indian insurers in production deployments settle on thresholds that result in 15 to 25% of intimations requiring some degree of human review, with the remainder processed automatically. This division typically holds for standard commercial lines; catastrophe events or novel loss types temporarily shift the ratio toward higher human review until the model is exposed to sufficient new examples.
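The gate itself is simple; the hard part is choosing the threshold. A sketch of the review decision, assuming per-entity confidence scores in the extraction payload and using an illustrative 0.85 threshold:

```python
REVIEW_THRESHOLD = 0.85  # illustrative; tuned per entity in practice

def needs_human_review(entities: dict,
                       critical: tuple = ("policy_number", "incident_type")) -> bool:
    """Send an intimation to human review if any critical entity is
    missing or its model confidence falls below the threshold."""
    for name in critical:
        entity = entities.get(name)
        if entity is None or entity["confidence"] < REVIEW_THRESHOLD:
            return True
    return False

auto = {"policy_number": {"value": "P-1", "confidence": 0.97},
        "incident_type": {"value": "fire", "confidence": 0.91}}
print(needs_human_review(auto))  # False: both critical entities are confident
```

Lowering REVIEW_THRESHOLD shifts claims from review to automation and vice versa, which is why the 15 to 25% review share cited above is an outcome of calibration, not a setting chosen directly.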

Monthly governance reviews using entity-level precision and recall metrics (not just overall accuracy), escalation rate trends over time, and downstream error rates (errors discovered only later in the claims lifecycle) are the minimum standard for responsible deployment in a regulated insurance environment. Both the claims operations team and the technology team responsible for the NLP model should attend these reviews, ensuring that accuracy feedback reaches the teams who can act on it.
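Entity-level precision and recall can be computed directly from the audit sample by comparing predicted and gold-standard extractions as sets of (claim, entity type, value) triples. A minimal sketch of that governance metric:

```python
def entity_metrics(predictions: set, gold: set) -> tuple:
    """Precision/recall over (claim_id, entity_type, value) triples,
    the entity-level governance metric described above."""
    tp = len(predictions & gold)  # exact matches
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

pred = {(1, "policy_number", "ABC-1"), (1, "incident_type", "fire"),
        (2, "policy_number", "XYZ-9")}
gold = {(1, "policy_number", "ABC-1"), (1, "incident_type", "flood"),
        (2, "policy_number", "XYZ-9")}
print(entity_metrics(pred, gold))
```

Slicing the triples by entity type before computing the metric reveals exactly which entities are degrading, which an overall accuracy number hides.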

Frequently Asked Questions

Does NLP-based claims intimation processing satisfy IRDAI's 48-hour acknowledgement requirement?
Yes, provided the NLP pipeline is configured to generate and dispatch an automated acknowledgement to the policyholder or broker as soon as the intimation is received and a claim record is created. The acknowledgement must include the claim reference number. Insurers should log the timestamp of both the incoming intimation and the outgoing acknowledgement to maintain an auditable record in case of regulatory review. If the NLP pipeline is unavailable due to a technical failure, a manual fallback process must be in place to preserve the 48-hour obligation.
What happens when the NLP model cannot extract the policy number from an intimation?
When policy number extraction fails or confidence is below the configured threshold, the system should attempt a fallback lookup using the claimant's registered mobile number, email address, or broker code against the policy database. If this secondary lookup also fails, the intimation is flagged as an unmatched claim and routed to a human claims officer for manual investigation. The 48-hour clock under Regulation 9 still runs from the point of intimation receipt, so the exception workflow must generate an acknowledgement even without a matched policy record.
Can NLP systems process claim intimations received as voice messages on WhatsApp?
Yes, with an automatic speech recognition layer that transcribes the audio to text before NLP extraction. Indian-language ASR has improved substantially but remains weaker on proper nouns and policy numbers than on general speech. Production systems apply policy number validation after extraction to catch transcription errors, and voice-sourced intimations typically carry higher human review rates than text-sourced intimations. The key design principle is to treat ASR error as an expected input condition rather than an edge case, building validation steps into the pipeline for every entity that matters for routing and coverage decisions.
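The post-extraction validation step mentioned above can be as simple as normalising the spacing ASR tends to insert and checking the result against the insurer's policy number format. A sketch with a purely illustrative format pattern:

```python
import re

# Illustrative policy number format; real formats vary by insurer.
POLICY_PATTERN = re.compile(r"^[A-Z]{2,6}-[A-Z]{2,6}-\d{4}-\d{3,6}$")

def validate_transcribed_policy(raw: str) -> tuple:
    """Post-ASR validation: strip spaces the transcriber inserts between
    characters, then check the format. Returns (candidate, is_valid);
    invalid candidates go to human review rather than into the claim record."""
    candidate = raw.upper().replace(" ", "")
    return candidate, bool(POLICY_PATTERN.match(candidate))

print(validate_transcribed_policy("hdfc - com - 2024 - 78432"))
```

A failed validation does not block the claim; it simply routes the voice-sourced intimation into the higher-review-rate path described above.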
How do Indian insurers retrain NLP models as claims language patterns change over time?
Well-designed NLP systems for claims intimation include a continuous learning pipeline where reviewed and corrected NLP outputs are periodically added to the training dataset. Claims quality teams who audit NLP accuracy generate the labelled data needed for retraining. Most Indian insurer implementations retrain models quarterly or after any significant product change, regulatory update, or catastrophe event that introduces new vocabulary. The insurer retains ownership of the training data under IRDAI's outsourcing governance requirements, so model retraining schedules should be documented in the vendor contract.
Which Indian insurers have deployed NLP for claims intimation, and what results have they reported?
Bajaj Allianz General Insurance has reported publicly that conversational AI for claims intake processes over 60% of commercial FNOL intimations without human routing intervention, reducing time-to-surveyor-appointment substantially. HDFC ERGO has deployed WhatsApp-based claims notification using combined text and image NLP. New India Assurance and United India Insurance have announced NLP integrations as part of broader digital transformation programmes, with multilingual processing cited as a key requirement. Specific accuracy and throughput figures from public-sector insurers are less frequently published but available through IRDAI annual report data on claims settlement timelines.
