What is the minimum data history required to build a credible risk selection model?

For commercial lines in India, a minimum of three to five years of claims data (both frequency and severity) is recommended to build a statistically meaningful model. The data should cover at least two full underwriting cycles to capture the effects of market hardening and softening. For catastrophe-exposed lines such as fire and property, a longer history of seven to ten years is preferable to capture tail events. If your own-book data is insufficient, supplement it with IIB industry benchmarks and GIC Re's published loss data to improve model credibility. The key requirement is that the data must be at policy-level granularity, not just aggregate line-of-business totals.
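Credibility weighting is one standard actuarial way to supplement thin own-book data with IIB or GIC Re benchmarks. The sketch below uses the classical limited-fluctuation approach with the square-root rule. The 1,082-claim full-credibility standard, the function names and the sample figures are assumptions for illustration. This is not the author's method or an IRDAI requirement.

```python
import math

# Assumption: classical limited-fluctuation standard for claim frequency,
# (1.645 / 0.05) ** 2, i.e. about 1,082 claims for 90% probability of being within 5%.
FULL_CREDIBILITY_CLAIMS = 1082


def credibility_factor(claim_count: int, full_standard: int = FULL_CREDIBILITY_CLAIMS) -> float:
    """Square-root rule: Z = min(1, sqrt(n / n_full)); zero claims give Z = 0."""
    if claim_count <= 0:
        return 0.0
    return min(1.0, math.sqrt(claim_count / full_standard))


def blended_loss_cost(own_loss_cost: float, benchmark_loss_cost: float, claim_count: int) -> float:
    """Weight own-book loss cost by Z and the industry benchmark by (1 - Z)."""
    z = credibility_factor(claim_count)
    return z * own_loss_cost + (1 - z) * benchmark_loss_cost


# Hypothetical figures: 270 own-book claims, own loss cost 0.62, benchmark 0.70.
print(round(blended_loss_cost(0.62, 0.70, 270), 4))
```

With 270 claims the credibility factor is roughly 0.5, so the blended figure sits about halfway between own-book experience and the benchmark. A different credibility method, for example Bühlmann credibility, would replace `credibility_factor`.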

How do you prevent algorithmic bias in a risk selection scoring model?

Algorithmic bias can arise when the model learns patterns from historically biased underwriting decisions — for example, systematically declining risks from certain geographies or industry segments based on outdated assumptions rather than actual loss experience. To mitigate this, test the model for disparate impact across protected categories, ensure training data includes a representative sample of risks from all target segments, and validate model outputs against actual loss outcomes rather than past underwriting decisions. Regular audits by the appointed actuary and compliance team, as required under IRDAI's governance framework, should specifically assess whether the model produces fair and actuarially justified risk selection.
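A first-pass disparate-impact test compares acceptance rates across segments. It then checks the flagged segments against actual loss outcomes rather than past decisions. This sketch borrows the "four-fifths" (0.8) ratio convention from US employment-selection practice as an assumed threshold. The segment labels and loss ratios are hypothetical.

```python
from collections import defaultdict


def acceptance_rates(decisions):
    """decisions: iterable of (segment, accepted) pairs. Returns segment -> acceptance rate."""
    totals, accepted = defaultdict(int), defaultdict(int)
    for segment, was_accepted in decisions:
        totals[segment] += 1
        accepted[segment] += int(was_accepted)
    return {seg: accepted[seg] / totals[seg] for seg in totals}


def disparate_impact_flags(decisions, threshold=0.8):
    """Segments whose acceptance rate is below `threshold` times the best segment's rate."""
    rates = acceptance_rates(decisions)
    best = max(rates.values(), default=0.0)
    if best == 0:
        return {}
    ratios = {seg: rate / best for seg, rate in rates.items()}
    return {seg: r for seg, r in ratios.items() if r < threshold}


def unjustified_disparities(decisions, segment_loss_ratios, portfolio_loss_ratio, threshold=0.8):
    """Flagged segments whose actual loss ratio does not justify the lower acceptance rate.
    Segments with no loss data are also returned, since they cannot be cleared."""
    flagged = disparate_impact_flags(decisions, threshold)
    return sorted(
        seg for seg in flagged
        if segment_loss_ratios.get(seg) is None or segment_loss_ratios[seg] <= portfolio_loss_ratio
    )
```

A segment that is accepted less often but runs a loss ratio at or below the portfolio average is the pattern the appointed actuary and compliance team would want to investigate.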

Can data-driven risk selection work for new product lines with no historical data?

For genuinely new product lines with no historical claims data, a pure data-driven model is not feasible at launch. However, a structured approach is still possible. Use proxy data from adjacent lines; for example, cyber insurance pricing can draw on professional indemnity and crime insurance loss experience for initial calibration. Engage reinsurers who have international data sets to provide indicative loss cost benchmarks. Launch with a conservative, expert-judgement-based framework and build in data collection from inception, so that the model can be calibrated once two to three years of experience accumulates.
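For the launch calibration, you can blend proxy loss costs from adjacent lines and reinsurer benchmarks using expert weights, then load the result for uncertainty. The source names, weights, 15% conservatism load and loss costs below are hypothetical placeholders for the expert judgement the paragraph describes.

```python
def proxy_loss_cost(proxy_costs: dict, weights: dict, conservatism_load: float = 0.15) -> float:
    """Weighted average of proxy loss costs, grossed up by a conservatism load."""
    if set(proxy_costs) != set(weights):
        raise ValueError("each proxy source needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    base = sum(proxy_costs[src] * weights[src] for src in proxy_costs) / total_weight
    return base * (1 + conservatism_load)


# Hypothetical inputs for an initial cyber calibration (loss cost per unit of exposure).
initial = proxy_loss_cost(
    proxy_costs={"professional_indemnity": 0.55, "crime": 0.40, "reinsurer_benchmark": 0.65},
    weights={"professional_indemnity": 0.4, "crime": 0.2, "reinsurer_benchmark": 0.4},
)
```

Once two to three years of own experience accumulate, the proxy figure would typically become the benchmark side of a credibility blend rather than being discarded.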

Data-Driven Risk Selection for Commercial Lines

How Indian commercial insurers can apply structured data, analytics, and scoring models to make better risk selection decisions and improve portfolio profitability.

Tarun Kumar Singh, Strategic Risk & Compliance Specialist (AIII · CRICP · CIAFP)

January 29, 2026

Tags: data-analytics, risk-selection, underwriting, commercial-lines, insurtech

Last reviewed: February 2026

The Shift from Intuition to Evidence

Traditional risk selection in Indian commercial insurance has relied heavily on underwriter experience, broker relationships, and market positioning. Institutional knowledge remains valuable, but relying on individual judgement produces inconsistent results: two underwriters evaluating the same proposal may reach different conclusions based on their individual experience and risk appetite.

Data-driven risk selection replaces this inconsistency with evidence-based decision-making. By analysing historical loss data, external data sources, and real-time risk indicators, insurers can identify which risks are likely to be profitable and which are likely to produce adverse loss experience, before binding coverage.

Data Sources Available to Indian Underwriters

Indian underwriters have access to a growing array of structured data sources. Government databases include MCA filings (financial statements, director details, charges), GST return data (revenue proxies and business activity verification), and the eCourts portal (litigation history). IRDAI's Insurance Information Bureau provides claims frequency and severity benchmarks by industry and geography.

External commercial data sources include CIBIL business credit scores, Dun & Bradstreet business reports, and satellite imagery providers. Emerging data sources include IoT sensor feeds from manufacturing equipment, weather data from IMD, and supply chain risk indices. The challenge is not data availability but structuring these diverse inputs into a coherent scoring framework.
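Structuring diverse inputs usually starts by putting each one on a common scale. The sketch below clips each raw value to an assumed plausible range and rescales it to a 0-100 sub-score, where 100 is the better risk. The field names (`credit_score`, `gst_revenue_growth_pct`, `open_litigation_cases`) and their ranges are hypothetical. They are not published scales from CIBIL, GST or eCourts data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical scaling rule for one raw input."""
    low: float                     # lower end of the assumed plausible range (values below are clipped)
    high: float                    # upper end of the assumed plausible range (values above are clipped)
    higher_is_better: bool = True  # False for inputs where a larger raw value means a worse risk

    def __post_init__(self):
        if self.high <= self.low:
            raise ValueError("high must exceed low")


def to_subscore(value: float, spec: FeatureSpec) -> float:
    """Clip a raw value to [low, high] and rescale to 0-100, where 100 is the best risk."""
    clipped = min(max(value, spec.low), spec.high)
    score = (clipped - spec.low) / (spec.high - spec.low) * 100
    return score if spec.higher_is_better else 100 - score


# Illustrative specs only; names and ranges are assumptions, not published standards.
SPECS = {
    "credit_score": FeatureSpec(300, 900),
    "gst_revenue_growth_pct": FeatureSpec(-20, 20),
    "open_litigation_cases": FeatureSpec(0, 10, higher_is_better=False),
}


def normalise(raw: dict) -> dict:
    """Convert a dict of raw inputs into comparable 0-100 sub-scores."""
    return {name: to_subscore(raw[name], spec) for name, spec in SPECS.items()}
```

Min-max scaling with clipping is only one option. Percentile ranks against your own portfolio are a common alternative when the raw distributions are heavily skewed.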

Building a Risk Selection Scoring Model

A practical scoring model for Indian commercial lines should incorporate four dimensions: financial risk (credit scores, utilisation ratios, revenue trend), operational risk (industry hazard grade, compliance status, loss prevention measures), claims risk (historical loss frequency and severity, both own-book and IIB data), and accumulation risk (geographic concentration, catastrophe exposure).

Assign weights to each dimension based on actuarial analysis of your own portfolio's loss drivers. For example, if financial distress is the strongest predictor of claims in your manufacturing book, weight the financial risk dimension at 35-40%. Validate the model against three to five years of historical data before deployment, and recalibrate annually as new loss experience becomes available.
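If each dimension has a 0-100 sub-score, with higher meaning a better risk, the composite score is a weighted sum. That scale is an assumption of this sketch. The weights below are illustrative only. Financial risk is set at 35% to match the manufacturing example. In practice the weights would come from your own actuarial analysis.

```python
# Illustrative weights only; derive real weights from your own portfolio's loss drivers.
WEIGHTS = {"financial": 0.35, "operational": 0.25, "claims": 0.25, "accumulation": 0.15}


def composite_score(subscores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of 0-100 dimension sub-scores (higher = better risk)."""
    if set(subscores) != set(weights):
        raise ValueError("sub-scores and weights must cover the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(subscores[dim] * weights[dim] for dim in weights)


print(composite_score({"financial": 60, "operational": 80, "claims": 70, "accumulation": 90}))
```

Keeping the weights in one dictionary makes the annual recalibration a data change rather than a code change.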

From Scoring to Decision Rules

A score alone is not useful unless it maps to clear decision rules. Define acceptance bands: for instance, scores above 75 qualify for automatic acceptance at standard rates, scores from 50 to 75 require senior underwriter review with potential loadings, and scores below 50 are declined or referred to the chief underwriter.
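The acceptance bands translate directly into a small rule function. This sketch assumes that scores of exactly 50 or 75 go to senior review. The enum and function names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    AUTO_ACCEPT = "automatic acceptance at standard rates"
    SENIOR_REVIEW = "senior underwriter review, possible loading"
    DECLINE_OR_REFER = "decline or refer to chief underwriter"


def decide(score: float, accept_above: float = 75.0, review_from: float = 50.0) -> Decision:
    """Map a 0-100 score to an acceptance band; boundary scores go to review."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in 0-100")
    if score > accept_above:
        return Decision.AUTO_ACCEPT
    if score >= review_from:
        return Decision.SENIOR_REVIEW
    return Decision.DECLINE_OR_REFER
```

Passing the thresholds as parameters lets each line of business carry its own bands without duplicating the rule.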

Build override controls that require documentation when an underwriter deviates from the model recommendation. Track override frequency and outcomes. If overrides consistently produce worse results than the model, the model is working and overrides should be reduced. If overrides consistently outperform, the model needs recalibration. This feedback loop is essential for continuous improvement.
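At its simplest, override tracking compares loss ratios on two groups of bound risks: those where the underwriter followed the model and those where they overrode it. Only bound risks produce loss outcomes, so overrides that declined a risk the model would have accepted cannot be evaluated this way. The record structure and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoundRisk:
    overridden: bool        # True if the underwriter deviated from the model recommendation
    earned_premium: float
    incurred_losses: float


def loss_ratio(risks) -> float:
    """Incurred losses / earned premium; NaN when there is no premium."""
    premium = sum(r.earned_premium for r in risks)
    if premium == 0:
        return float("nan")
    return sum(r.incurred_losses for r in risks) / premium


def override_report(risks) -> dict:
    """Override frequency plus loss ratios for overridden vs model-consistent business."""
    overrides = [r for r in risks if r.overridden]
    followed = [r for r in risks if not r.overridden]
    return {
        "override_rate": len(overrides) / len(risks) if risks else float("nan"),
        "override_loss_ratio": loss_ratio(overrides),
        "model_loss_ratio": loss_ratio(followed),
    }
```

A single period's figures are noisy. The comparison becomes meaningful when the gap between `override_loss_ratio` and `model_loss_ratio` persists across equivalent exposure periods.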

Implementation Challenges in the Indian Market

Data quality remains the primary obstacle. Indian commercial insurance data is often trapped in PDF proposal forms, handwritten surveyor reports, and unstructured claims files. Before a scoring model can function, this data must be digitised, cleaned, and standardised, a prerequisite investment that many Indian insurers have deferred.
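Here is one small example of the standardisation step. Proposal forms often record sums insured and turnover figures in mixed Indian formats. The sketch below converts strings such as "Rs. 12,50,000" or "1.5 crore" into a rupee amount. It rejects anything it cannot parse, so those records can be routed for manual review. The accepted spellings are assumptions, and real forms will need a broader list.

```python
import re

# Indian numbering units: 1 lakh = 100,000; 1 crore = 10,000,000.
UNIT_MULTIPLIERS = {
    "": 1,
    "lakh": 100_000, "lakhs": 100_000, "lac": 100_000, "lacs": 100_000,
    "crore": 10_000_000, "crores": 10_000_000, "cr": 10_000_000,
}
# Assumed currency markers: "rs"/"rs.", "inr" and the rupee sign.
CURRENCY_MARKERS = re.compile(r"\brs\b\.?|\binr\b|₹")
AMOUNT = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*")


def parse_inr_amount(text: str) -> float:
    """Convert strings such as 'Rs. 12,50,000', '₹ 1.5 crore' or '75 lakhs' to rupees."""
    # Lower-case, drop digit-group commas, then blank out currency markers.
    cleaned = CURRENCY_MARKERS.sub(" ", text.lower().replace(",", ""))
    match = AMOUNT.fullmatch(cleaned)
    if not match or match.group(2) not in UNIT_MULTIPLIERS:
        raise ValueError(f"unrecognised amount: {text!r}")
    number, unit = match.groups()
    return float(number) * UNIT_MULTIPLIERS[unit]
```

Raising an error, rather than guessing, keeps unparseable values out of the model until someone has checked the source document.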

Organisational resistance is the second challenge. Experienced underwriters may perceive data-driven models as a threat to their autonomy. Address this by positioning the model as a decision-support tool that augments rather than replaces underwriter judgement. Involve senior underwriters in model design and calibration to build ownership and trust.

Measuring Impact and Iterating

Track the impact of data-driven risk selection through clear metrics: shift in loss ratio by scored segment, hit ratio improvements (the proportion of quoted risks that bind), average premium adequacy, and portfolio mix changes. Compare the performance of model-selected risks against manually selected risks over equivalent exposure periods.
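You can compute the loss ratio by scored segment and the hit ratio from a quote-level extract. This sketch assumes each quote carries its score band at the time of quotation. The `Quote` fields are hypothetical, and loss ratios are calculated on bound business only.

```python
from collections import defaultdict
from typing import NamedTuple


class Quote(NamedTuple):
    band: str        # score band at quotation, e.g. "auto_accept"
    bound: bool      # did the quote convert into a bound policy?
    premium: float   # earned premium (used only if bound)
    losses: float    # incurred losses (used only if bound)


def segment_metrics(quotes):
    """Per score band: hit ratio (bound / quoted) and loss ratio on bound business."""
    agg = defaultdict(lambda: {"quoted": 0, "bound": 0, "premium": 0.0, "losses": 0.0})
    for q in quotes:
        a = agg[q.band]
        a["quoted"] += 1
        if q.bound:
            a["bound"] += 1
            a["premium"] += q.premium
            a["losses"] += q.losses
    return {
        band: {
            "hit_ratio": a["bound"] / a["quoted"],
            "loss_ratio": a["losses"] / a["premium"] if a["premium"] else None,
        }
        for band, a in agg.items()
    }
```

Running the same function over model-selected and manually selected cohorts gives a like-for-like comparison for the same exposure period.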

Expect the first version to be imperfect. The value lies in having a structured, measurable framework that can be improved iteratively. Review model performance quarterly, update weights based on emerging loss trends, and expand data inputs as new sources become available. Within two to three underwriting cycles, a well-maintained model should demonstrably outperform purely intuition-based selection.
