Why Loss Run Extraction Is the Quiet Bottleneck in Indian Commercial Renewals
Every commercial renewal in India begins with a loss run. The insurer issues a statement of claims experience for the expiring policy period, the broker collates statements across all incumbent insurers for the corporate client, and the data is consolidated into a renewal submission that drives underwriting decisions, pricing negotiations, and capacity allocation. The process is fundamental to the renewal economics of mid-market and large commercial accounts. It is also one of the slowest, most error-prone steps in the broker workflow.
The operational problem is structural. Indian general insurers issue loss runs in materially different formats: some as native PDFs from their policy administration systems, some as scanned printouts of internal MIS reports, some as Excel exports with insurer-specific column conventions, and some as free-form letters from claims teams. A mid-market broker placing a corporate client with eight to twelve insurer relationships across property, marine, liability, and motor lines typically receives fifteen to thirty distinct loss run documents per renewal cycle, with no two insurers using the same field layout, currency convention, or claim status taxonomy.
The traditional response has been manual data entry. A broker operations analyst opens each document, reads the fields, types them into a renewal spreadsheet, and reconciles the totals against insurer summary statements. For a mid-market account with a three-year claims history and twenty individual claim records, this consumes three to six analyst hours per renewal. For a large commercial account with multi-year history across multiple lines, the consolidation effort can reach forty to sixty analyst hours per renewal. Brokers servicing fifty to two hundred renewal accounts per quarter absorb this cost through dedicated operations teams or through deferred renewal preparation that compresses the underwriting timeline.
The quality cost is as serious as the time cost. Manual loss run consolidation introduces transcription errors that propagate into the renewal submission, the underwriting decision, and the final placement. A claim amount typed as INR 12,50,000 (12.5 lakh) instead of INR 1,25,00,000 (1.25 crore) understates the claim tenfold and is a renewal-defining error. A claim status recorded as 'paid' instead of 'reserved' materially changes the loss ratio the underwriter sees. At best, these errors are caught by an experienced renewal lead reviewing the consolidated submission; at worst, by the insurer's underwriter, who returns the submission for correction and resets the renewal timeline.
Document AI has matured to the point where loss run extraction is now a solved problem in principle for Indian brokers, though the operational deployment requires care. The combination of layout-aware OCR, structured extraction models tuned for insurance documents, and validation against insurer-specific schemas produces extracted loss runs with field-level accuracy in the 95 to 99 percent range on first pass for native PDFs, and 88 to 95 percent on scanned documents. The remaining error budget is handled by exception review rather than by full document re-entry, and the broker operations team shifts from data entry to validation.
The Field Schema That Survives Multi-Insurer PDF Chaos
The first design decision in any broker loss run extraction system is the target schema. The schema is the canonical structure into which every insurer's loss run is normalised. Without a stable target schema, extraction outputs are as fragmented as the source documents and the downstream analytics layer cannot operate.
The working schema that has emerged across Indian broker deployments includes the following core fields per claim record:
- policy reference: the insurer policy number under which the claim was registered
- claim reference: the insurer's internal claim number
- loss date: the date of loss as recorded by the insurer (not the date of intimation)
- intimation date: the date the claim was first notified
- claim type: the peril or cause code, normalised against a broker-side taxonomy
- claim status: open, reserved, paid, closed, repudiated, withdrawn
- gross paid amount: total paid to date before recoveries
- gross reserve amount: outstanding reserve at the reporting date
- recoveries: salvage, subrogation, reinsurance recoveries where disclosed
- net incurred: gross paid plus reserve minus recoveries
- deductible applied: where disclosed by the insurer
- closing date: where the claim is closed
- remarks: free text from the insurer, retained for context
The schema also carries claim-level metadata that the extraction system populates automatically: the source document identifier, the source insurer, the source line of business, the source policy period, and a confidence score for each extracted field. The metadata is essential for the downstream validation step and for the audit trail that the broker must maintain under the IRDAI (Insurance Brokers) Regulations, 2018 and the documentation expectations attached to the broker performance scorecard.
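As a minimal sketch, the canonical claim record and its metadata could be expressed as a Python dataclass. The field names, the choice of Decimal for rupee amounts, and the example values are illustrative assumptions, not a published standard; only the net incurred formula comes from the schema description above.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Dict, Optional


@dataclass
class ClaimRecord:
    # Core claim fields; all amounts are assumed to be in absolute rupees.
    policy_reference: str
    claim_reference: str
    loss_date: date
    intimation_date: date
    claim_type: str
    claim_status: str
    gross_paid: Decimal
    gross_reserve: Decimal
    recoveries: Decimal = Decimal("0")
    deductible_applied: Optional[Decimal] = None  # only where disclosed
    closing_date: Optional[date] = None  # only where the claim is closed
    remarks: str = ""
    # Metadata populated by the extraction system.
    source_document_id: str = ""
    source_insurer: str = ""
    source_line_of_business: str = ""
    source_policy_period: str = ""
    field_confidence: Dict[str, float] = field(default_factory=dict)

    @property
    def net_incurred(self) -> Decimal:
        # Net incurred = gross paid + gross reserve - recoveries.
        return self.gross_paid + self.gross_reserve - self.recoveries


# Hypothetical record, for illustration only.
example = ClaimRecord(
    policy_reference="POL-0001",
    claim_reference="CLM-0001",
    loss_date=date(2024, 6, 10),
    intimation_date=date(2024, 6, 12),
    claim_type="fire",
    claim_status="reserved",
    gross_paid=Decimal("500000"),
    gross_reserve=Decimal("750000"),
    recoveries=Decimal("100000"),
    source_insurer="Insurer A",
    field_confidence={"gross_paid": 0.98},
)
print(example.net_incurred)  # 1150000
```

Deriving net incurred as a property rather than storing it keeps the figure consistent with its components whenever an analyst corrects a paid or reserve amount during exception review.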
The taxonomy normalisation step is the part of the pipeline that requires the most insurance-specific judgement. Insurers use different terms for the same claim status (one insurer's 'pending' is another insurer's 'open' is a third insurer's 'reserved-not-paid'), different peril codes (the property peril taxonomy used by ICICI Lombard differs from that used by Tata AIG which differs from that used by Bajaj Allianz), and different currency conventions (some insurers report in lakhs, some in crores, some in absolute rupees). The normalisation layer maps insurer-specific terms to the broker's canonical taxonomy through a maintained mapping table that the broker operations head owns.
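The mapping-table approach can be sketched in a few lines of Python. The insurer names, the mapping entries, and the decision to map all three example terms to 'open' are hypothetical placeholders; a real table would be maintained by the broker operations head and would be far larger.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical mapping table keyed by (insurer, raw term as printed).
STATUS_MAP = {
    ("Insurer A", "pending"): "open",
    ("Insurer B", "open"): "open",
    ("Insurer C", "reserved-not-paid"): "open",
}

# Multipliers to convert each reporting convention to absolute rupees.
UNIT_MULTIPLIER = {
    "rupees": Decimal("1"),
    "lakhs": Decimal("100000"),
    "crores": Decimal("10000000"),
}


def normalise_status(insurer: str, raw_status: str) -> Optional[str]:
    """Return the canonical status, or None so the record can be flagged."""
    return STATUS_MAP.get((insurer, raw_status.strip().lower()))


def to_rupees(amount_text: str, unit: str) -> Decimal:
    """Convert a reported amount to absolute rupees."""
    # Removing commas handles both Indian (1,25,00,000) and Western grouping.
    cleaned = amount_text.replace("INR", "").replace(",", "").strip()
    return Decimal(cleaned) * UNIT_MULTIPLIER[unit]


print(normalise_status("Insurer A", " Pending "))  # open
print(to_rupees("INR 1,25,00,000", "rupees"))      # 12500000
```

Returning None for an unmapped term, rather than guessing, pushes unknown insurer vocabulary into exception review, where the mapping table can be extended deliberately.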
A secondary schema captures policy-level metadata that is typically presented in the loss run header: the policy period, the sum insured, the premium, the deductible structure, the named insured, and any endorsements that affect the claims interpretation. This metadata feeds the broker's policy register independently and provides a cross-check against the schedule the broker holds from the original placement.
Extraction Pipeline Architecture: OCR, LLM, and Validation
A working loss run extraction pipeline for Indian brokers has four stages. The architecture has stabilised across deployments at mid-market and large brokers over 2025 and into 2026, with vendor differences appearing mainly in the specific models used at each stage rather than in the stage structure itself.
The first stage is document intake. The pipeline accepts loss runs through the broker's standard inbox channels (email attachment, broker portal upload, insurer API where available), classifies each incoming document as a loss run versus other claims correspondence, and routes it into the extraction queue with the source metadata attached. Misclassification at this stage is rare for native PDFs from known insurers but more common for scanned documents and for hybrid documents that combine loss run data with other claims correspondence.
The second stage is layout analysis and OCR. For native PDF loss runs, the pipeline extracts the document text directly with positional information for each text fragment. For scanned loss runs, the pipeline runs OCR with a model tuned for Indian commercial document layouts, which handles the typical artefacts of insurer MIS printouts (faint reprints, partial scans, multi-column tables that wrap across pages, handwritten margin notes). The output of this stage is a structured document representation with text fragments, their positions, and an OCR confidence score per fragment.
The third stage is structured extraction. A language model with a tightly defined output schema reads the document representation and produces the canonical claim records described in the previous section. The model is prompted with the target schema, the insurer-specific layout hints where available, and the broker's taxonomy mapping table. It produces extracted records with field-level confidence scores. The extraction is run insurer-by-insurer rather than as a single pass across all sources, which preserves the layout-specific accuracy and isolates failures to a single insurer's documents.
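The insurer-by-insurer pattern, with failures isolated to one insurer's documents, might look like the following sketch. The model call itself is not shown: extract_fn is a hypothetical callable standing in for whatever extraction model a deployment uses, and fake_extract exists only to make the example runnable.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Document = Dict[str, str]
Record = Dict[str, str]


def extract_by_insurer(
    documents: Iterable[Document],
    extract_fn: Callable[[str, Document], List[Record]],
) -> Tuple[Dict[str, List[Record]], Dict[str, str]]:
    """Run extraction one insurer at a time and isolate failures per insurer."""
    grouped: Dict[str, List[Document]] = defaultdict(list)
    for doc in documents:
        grouped[doc["insurer"]].append(doc)

    results: Dict[str, List[Record]] = {}
    failures: Dict[str, str] = {}
    for insurer, docs in grouped.items():
        try:
            records: List[Record] = []
            for doc in docs:
                records.extend(extract_fn(insurer, doc))
            results[insurer] = records
        except Exception as exc:  # one insurer failing must not stop the rest
            failures[insurer] = str(exc)
    return results, failures


def fake_extract(insurer: str, doc: Document) -> List[Record]:
    # Stand-in for the real model call, which is not shown here.
    if insurer == "Insurer B":
        raise ValueError("unrecognised layout")
    return [{"claim_reference": doc["id"] + "-1", "source_insurer": insurer}]


docs = [
    {"id": "D1", "insurer": "Insurer A"},
    {"id": "D2", "insurer": "Insurer B"},
    {"id": "D3", "insurer": "Insurer A"},
]
results, failures = extract_by_insurer(docs, fake_extract)
print(sorted(results), failures)
```

In this sketch a failure on any document discards that insurer's whole batch for the run, which is one reasonable reading of "isolates failures to a single insurer's documents"; a deployment could equally isolate at document level.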
The fourth stage is validation. The extracted records are checked against rule-based validators (claim amounts are numeric, dates parse cleanly, status values map to the taxonomy, totals reconcile against insurer summary statements where present) and against a cross-reference layer that compares the extracted policy reference and claim reference against the broker's claims register. Any record failing validation is flagged for human review, with the failure reason and the underlying document fragment surfaced to the analyst.
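A minimal sketch of the rule-based validators is shown here, assuming records arrive as dictionaries of strings and dates have been normalised to DD/MM/YYYY. The field names, date format, and reconciliation tolerance are assumptions for illustration; the cross-reference check against the claims register is omitted.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, Iterable, List

CANONICAL_STATUSES = {"open", "reserved", "paid", "closed", "repudiated", "withdrawn"}
AMOUNT_FIELDS = ("gross_paid", "gross_reserve", "recoveries")
DATE_FIELDS = ("loss_date", "intimation_date")
DATE_FORMAT = "%d/%m/%Y"  # assumed format after normalisation


def validate_record(record: Dict[str, str]) -> List[str]:
    """Return a list of failure reasons; an empty list means the record passes."""
    failures: List[str] = []
    for name in AMOUNT_FIELDS:
        try:
            Decimal(record.get(name, ""))
        except InvalidOperation:
            failures.append(f"{name} is not numeric")
    for name in DATE_FIELDS:
        try:
            datetime.strptime(record.get(name, ""), DATE_FORMAT)
        except ValueError:
            failures.append(f"{name} does not parse")
    if record.get("claim_status") not in CANONICAL_STATUSES:
        failures.append("claim_status not in taxonomy")
    return failures


def totals_reconcile(
    records: Iterable[Dict[str, str]],
    summary_total: Decimal,
    tolerance: Decimal = Decimal("1"),
) -> bool:
    """Compare the sum of extracted paid amounts with the insurer's summary."""
    extracted = sum((Decimal(r["gross_paid"]) for r in records), Decimal("0"))
    return abs(extracted - summary_total) <= tolerance


good = {
    "gross_paid": "500000", "gross_reserve": "0", "recoveries": "0",
    "loss_date": "10/06/2024", "intimation_date": "12/06/2024",
    "claim_status": "paid",
}
bad = dict(good, gross_paid="12,50,000", claim_status="pending")
print(validate_record(good))  # []
print(validate_record(bad))
```

Returning the failure reasons as plain strings makes it straightforward to surface them to the analyst alongside the flagged record.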
The operational metrics that matter for this pipeline are first-pass extraction accuracy, exception rate, exception clearance time, and end-to-end loss run consolidation time per renewal. A working production pipeline at a mid-market Indian broker reports first-pass field accuracy of 96 to 98 percent on native PDFs from the top eight general insurers, exception rates of 4 to 12 percent of claim records requiring human review, exception clearance time of 2 to 5 minutes per flagged record, and end-to-end consolidation time of 30 to 90 minutes per renewal compared to the 3 to 6 hours under the manual baseline.
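The first three metrics reduce to simple ratios over counts that the pipeline already produces. The sketch below uses hypothetical counts for a twenty-claim account with thirteen fields per record; only the 2 to 5 minute clearance range is taken from the figures above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PipelineCounts:
    fields_checked: int   # fields compared against ground truth in a sample
    fields_correct: int   # fields the pipeline got right on first pass
    records_total: int    # claim records extracted
    records_flagged: int  # claim records sent to human review


def first_pass_accuracy(c: PipelineCounts) -> float:
    return c.fields_correct / c.fields_checked


def exception_rate(c: PipelineCounts) -> float:
    return c.records_flagged / c.records_total


def review_minutes(c: PipelineCounts, low: float = 2.0, high: float = 5.0) -> Tuple[float, float]:
    """Estimated exception clearance effort at 2 to 5 minutes per flagged record."""
    return (c.records_flagged * low, c.records_flagged * high)


# Hypothetical counts: 20 records x 13 fields = 260 fields, 252 correct, 2 flagged.
counts = PipelineCounts(fields_checked=260, fields_correct=252,
                        records_total=20, records_flagged=2)
print(round(first_pass_accuracy(counts), 3))  # 0.969
print(exception_rate(counts))                 # 0.1
print(review_minutes(counts))                 # (4.0, 10.0)
```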
The pipeline must also meet the consent and data protection requirements of the DPDP Act 2023. Loss runs contain personal data, some of it sensitive in nature (claim details that may reveal medical conditions in health policies, employee details in workers compensation, individual claim narratives). The extraction pipeline must process this data with appropriate access controls, retention policies aligned to the broker's consent terms, and audit logging that documents every access to the extracted records.
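Audit logging of record access can be sketched as an append-only list of access events. The class and field names here are hypothetical, and a production system would write to durable, tamper-evident storage rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AccessEvent:
    timestamp: datetime
    user_id: str
    record_id: str
    action: str  # e.g. "view", "edit", "export"


class AuditLog:
    """In-memory, append-only access log (illustrative only)."""

    def __init__(self) -> None:
        self._events: List[AccessEvent] = []

    def record(self, user_id: str, record_id: str, action: str) -> AccessEvent:
        # Timestamps are stored in UTC so entries sort consistently.
        event = AccessEvent(datetime.now(timezone.utc), user_id, record_id, action)
        self._events.append(event)
        return event

    def events_for(self, record_id: str) -> List[AccessEvent]:
        return [e for e in self._events if e.record_id == record_id]


log = AuditLog()
log.record("analyst-1", "CLM-0001", "view")
log.record("analyst-1", "CLM-0001", "edit")
log.record("lead-1", "CLM-0002", "export")
print(len(log.events_for("CLM-0001")))  # 2
```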
Workflow Shift: From Data Entry to Validation and Analytics
The introduction of AI extraction changes the broker operations workflow more than it changes the broker's hardware or software stack. The team composition, the role definitions, and the time allocation across the renewal cycle all shift in measurable ways.
The largest shift is in the operations analyst role. Under the manual baseline, the analyst spends the majority of renewal preparation time on data entry: reading documents, typing fields, reconciling totals, and producing the consolidated submission spreadsheet. Under the AI extraction baseline, the analyst spends a small share of time on exception review and the majority on analytical work: identifying loss trends, comparing actual claims experience against the underwriting assumptions in the expiring policy, preparing the narrative that supports the renewal submission, and flagging claims patterns that the renewal lead should discuss with the client.
The second shift is in the renewal lead role. Renewal leads at mid-market brokers historically reviewed the consolidated submission spreadsheet against the source documents as a quality check on the analyst's data entry. Under AI extraction, the quality check shifts to the validation exceptions surfaced by the pipeline rather than to a line-by-line manual review. Renewal leads spend more time on the substantive renewal strategy (market selection, capacity strategy, pricing benchmarks, coverage enhancements) and less time on data quality checking.
The third shift is in the timing of analytics. Manual consolidation typically pushes claims analytics to the end of the renewal preparation window, where the consolidated spreadsheet is available only days before the submission deadline. AI extraction makes the consolidated dataset available within hours of the loss runs arriving, which shifts analytics earlier in the window and allows the renewal team to use claims insights to shape the submission rather than to summarise it.
The fourth shift is in client communication. AI extraction produces consistent, structured loss data across all of a client's insurers and lines, which enables the broker to present an integrated claims picture rather than a series of insurer-specific reports. Corporate clients with insurance spend above INR 5 crore annually increasingly expect integrated claims analytics from their broker, and the consistent dataset from AI extraction makes this presentation feasible without additional operations effort.
The workflow shift also has implications for team sizing and skills. Brokers report that AI extraction reduces the operations headcount required per hundred renewal accounts by 30 to 50 percent, with the remaining headcount upskilling from data entry to claims analytics and validation. The transition is rarely a headcount reduction in absolute terms because Indian brokers are growing their account books in parallel; rather, it is a productivity expansion that allows the same team to handle more accounts with stronger analytical depth.
Insurer-Side Considerations and the Data Exchange Question
Broker-side AI extraction solves a problem that insurers create. The fundamental cause of the extraction effort is that insurers issue loss runs in inconsistent formats with no standard schema across the industry. The question that follows is whether the insurance industry should solve the problem at source through standardised data exchange rather than at the broker end through repeated extraction.
The arguments for industry-level standardisation are direct. A standard loss run schema, adopted across general insurers and exchanged through a structured API or file format, would eliminate the extraction step entirely. The Bima Sugam platform being developed under IRDAI's aegis is one candidate vehicle for such standardisation, since its policy and claims data exchange specifications include loss run schemas through which participating insurers can publish. Industry bodies including the General Insurance Council have discussed standard claims data schemas, with limited progress through 2025 and renewed attention as Bima Sugam reaches operational maturity.
The arguments against expecting near-term standardisation are equally direct. Insurer policy administration systems are heterogeneous, with each major general insurer running a different vintage of core system from a different vendor. Mapping these heterogeneous systems to a common output schema requires investment that insurers currently lack a commercial incentive to make. Loss runs are not a revenue-generating activity for insurers; they are a cost of customer service. The standardisation question has been on the industry agenda for over a decade with limited progress, and brokers should plan their operating model on the assumption that document AI is the more reliable path to consistent loss data over the next three to five years.
A related question is whether the broker's extracted dataset should be shared back with the insurer. The case for sharing is that the insurer's own MIS may have data quality gaps that the broker's extraction identifies (claims with missing fields, status updates that have not propagated, reconciliation differences between branches and head office). The case against is that the broker has no obligation to share the corrected dataset and that doing so may expose the broker's analytical methodology to the insurer. Most brokers currently do not share extracted datasets back, treating the consolidated dataset as a broker-side analytical asset that supports the renewal submission rather than a data quality service for the insurer.
The regulatory dimension matters. The IRDAI (Maintenance of Insurance Records) Regulations, 2015 and the broker-side documentation expectations under the IRDAI (Insurance Brokers) Regulations, 2018 require both parties to maintain accurate claims records. Where the broker's extraction identifies an insurer-side reporting error, the broker has an interest in raising the correction with the insurer to ensure the policyholder's record is accurate. The mechanism for raising corrections is currently insurer-by-insurer and informal; the introduction of structured loss data exchange would formalise this loop.
For the next several quarters, brokers should continue investing in document AI extraction as the primary mechanism for loss run consolidation, while engaging with insurer counterparts and IRDAI working groups on the path toward standardised exchange. The two investments are complementary rather than substitutes: the extraction infrastructure remains valuable for historical loss runs, for insurers not yet on standardised exchange, and for the validation layer that any structured data exchange will still require.
Vendor and Build Considerations for Indian Brokers
Brokers evaluating loss run extraction face the standard build-versus-buy decision, with three viable paths: building an internal extraction stack, adopting a specialist vendor solution, or using the extraction capabilities embedded in a broker operations platform. Each path has cost, control, and capability trade-offs that broker leadership should evaluate against the firm's account volume, technology maturity, and strategic positioning.
The build path involves assembling an internal team that combines document AI engineering, insurance domain expertise, and operations integration. Indian brokers building internal solutions report total first-year investment in the range of INR 1.5 to 4 crore including team costs, infrastructure, model licensing, and integration with existing broker management systems. The ongoing maintenance cost is significant because the pipeline must be updated as insurer document formats change, as the taxonomy mapping table expands, and as new insurance lines are added. Internal builds are typically only justified for brokers with renewal volumes above two hundred accounts per quarter and with existing technology teams capable of supporting the maintenance burden.
The specialist vendor path involves subscribing to a document AI platform purpose-built for insurance loss run extraction. Several vendors are now active in the Indian market, with annual subscription costs in the range of INR 25 lakh to INR 1.5 crore depending on volume and integration depth. The vendor handles the model maintenance, insurer format updates, and taxonomy mapping; the broker provides the source documents and consumes the extracted output. This path has the lowest implementation risk and the fastest time to operational value but requires the broker to accept the vendor's schema and workflow conventions.
The broker platform path involves using the loss run extraction capabilities embedded in an integrated broker operations platform that also covers policy administration, renewal management, claims tracking, and client reporting. This approach minimises the integration effort because extraction outputs flow directly into downstream workflows without separate integration work. The trade-off is platform lock-in: the loss run capability is bundled with the broader platform and cannot be unbundled without replacing the broker's core operations platform.
Brokers should evaluate vendor options on five operational criteria. First, accuracy on the broker's actual insurer mix, tested through a sample extraction across the broker's top eight to twelve insurers. Second, exception handling workflow, including how exceptions surface to operations analysts and how the analyst's correction feeds back into the model. Third, taxonomy management, including how insurer-specific terms map to the broker's canonical taxonomy and how the broker controls that mapping. Fourth, audit trail and data protection, including DPDP Act 2023 compliance and the retention policies that govern extracted records. Fifth, integration with the broker's existing systems, including the broker management system, the claims register, and the client reporting platform.
Platforms such as Sarvada are emerging in the Indian commercial broking market to consolidate loss run extraction with the broader renewal and client servicing workflow. Brokers evaluating their operations stack for the composite licence era should consider whether integrated platforms accelerate the multi-line workflow consolidation that the composite framework rewards.
Governance, Audit, and the Compliance Posture
AI extraction of loss runs is a regulated data processing activity. The broker operates under the IRDAI (Insurance Brokers) Regulations, 2018, the DPDP Act 2023, and the broker performance scorecard documentation expectations that have been formalised under the composite licence framework. Each of these regulatory anchors has implications for how the extraction pipeline must be governed.
The IRDAI broker regulations require the broker to maintain accurate records of every claim handled for a corporate client. Where the broker uses AI extraction to produce the consolidated claims dataset, the broker retains accountability for the accuracy of that dataset. The vendor's model is not the regulated party; the broker is. The governance posture must therefore include validation procedures that establish reasonable assurance of accuracy, exception review procedures that catch and correct errors before they propagate, and audit trail mechanisms that allow IRDAI inspectors to reconstruct any extracted record back to its source document.
The DPDP Act 2023 requires lawful processing of personal data with explicit purpose limitation, retention discipline, and data principal rights handling. Loss runs contain personal data and, in some lines, data of a sensitive nature. The extraction pipeline must process this data under a documented lawful basis (typically the broker's contractual relationship with the corporate client), retain it only for the period necessary for the renewal cycle and any subsequent claims advocacy, and respond to data principal requests for access, correction, or deletion within the prescribed timelines. The broker's data protection officer should review the extraction pipeline's data flows and retention rules at deployment and at each material change.
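The retention rule can be expressed as a small check that runs before any purge job. The 365-day window below is a hypothetical placeholder, not a figure from the Act or from this article; the actual period should come from the broker's consent terms and legal advice.

```python
from datetime import date, timedelta


def retention_expired(
    cycle_end: date,
    retention_days: int,
    today: date,
    advocacy_open: bool = False,
) -> bool:
    """True when extracted records for a renewal cycle are due for deletion.

    Records are held while claims advocacy is still open, whatever the date.
    """
    if advocacy_open:
        return False
    return today > cycle_end + timedelta(days=retention_days)


# Hypothetical policy: keep records for 365 days after the renewal cycle ends.
cycle_end = date(2025, 3, 31)
print(retention_expired(cycle_end, 365, date(2026, 3, 31)))  # False
print(retention_expired(cycle_end, 365, date(2026, 4, 1)))   # True
print(retention_expired(cycle_end, 365, date(2026, 4, 1), advocacy_open=True))  # False
```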
The broker scorecard documentation expectations require the broker to maintain evidence of effective claims handling, including the broker's role in claims advocacy and renewal preparation. Where AI extraction produces the loss data that drives renewal submission, the broker should document the extraction methodology, the validation procedures, and the exception handling protocol. This documentation is protective in the event of regulatory inquiry, client dispute, or insurer challenge to the broker's loss data.
The internal governance structure for the extraction pipeline should include a designated owner (typically the broker operations head or chief operating officer), a clear escalation path for systemic accuracy issues, a quarterly review of pipeline performance metrics against defined thresholds, and an annual review of the taxonomy mapping table. The board or board-level committee should receive an annual report on the extraction pipeline's performance, accuracy, and any material incidents.
The outlook for the next two years is a steady maturation of the broker-side extraction ecosystem, with insurer-side standardisation moving more slowly. Brokers that invest now in extraction infrastructure, taxonomy discipline, and governance frameworks will hold a defensible operational advantage during the period when the manual baseline still dominates the broking market. As the Indian commercial broking industry consolidates under the composite licence regime, operational scale economies become a material competitive factor, and loss run extraction is one of the earliest workflow areas where these economies are realised.

