PRD: Entity Triage & Multi-Entity Validation

Introduction

In a multi-entity business environment, maintaining clean customer data is critical. The Entity Triage & Multi-Entity Validation module ensures that every incoming lead or enquiry is cross-referenced against existing masters across all legal entities using official identifiers (GST/PAN/CIN). This prevents duplicate efforts and helps maintain a single global view of a customer's relationship with the group.

Goals

  • Identity Verification: Auto-verify the validity of GST numbers for all new entries.
  • Global Deduplication: Detect existing records across all 3 legal entities to prevent duplicate creation.
  • Conflict Resolution: Prompt users to decide between merging with an existing account or creating a new group-linked entity when a match is found in another entity's database.
  • Data Integrity: Implement moderate fuzzy matching (80%+) to catch variations in company names during the ingestion phase.

Use Case Mapping

This PRD provides the implementation blueprint for the following functional specifications:

  • MST-002: Online Verification (GST/MCA) - Automatic identity validation.
  • PRE-004: Raw-to-Recheck Mapping (De-dupe) - Cross-entity deduplication and fuzzy matching.

User Stories

US-001: Automatic GST Verification (Read-Only)

Description: As a system, I want to verify a company's GST number during lead ingestion so that I can flag invalid data for the screening team.

Acceptance Criteria:

  • System extracts 15-character GSTIN using regex from email body or dedicated "GST" field.
  • System calls POST /api/verify/gst with the extracted identifier.
  • UI displays badge on Kanban card: Green = "GST: Verified", Yellow = "Name Mismatch", Red = "Invalid".
  • Verification does NOT stop ingestion; it only adds the badge.
  • Verification: Pass a mock GST to the ingestion API and verify the badge color in the browser.
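
The extraction and badge rules above can be sketched as follows. This is a minimal illustration, not the actual implementation: the regex is the commonly cited GSTIN layout (2-digit state code, 10-character PAN, entity code, the letter Z, and a check character), and the function names, the preference for the dedicated field over the email body, and the `valid` / `name_matches` inputs are assumptions, since the PRD does not define the response shape of POST /api/verify/gst.

```python
import re
from typing import Optional, Tuple

# Commonly cited GSTIN layout: state code (2 digits), PAN (5 letters,
# 4 digits, 1 letter), entity code, literal "Z", check character.
GSTIN_PATTERN = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]\b")


def extract_gstin(email_body: str, gst_field: Optional[str] = None) -> Optional[str]:
    """Return the first GSTIN-shaped token, or None if nothing matches.

    Assumption: the dedicated "GST" field is checked before the email body.
    """
    for candidate in (gst_field, email_body):
        if candidate:
            match = GSTIN_PATTERN.search(candidate.upper())
            if match:
                return match.group(0)
    return None


def badge_for(valid: bool, name_matches: bool) -> Tuple[str, str]:
    """Map a (hypothetical) verification result to the Kanban badge."""
    if not valid:
        return ("red", "Invalid")
    if not name_matches:
        return ("yellow", "Name Mismatch")
    return ("green", "GST: Verified")
```

Because verification is non-blocking, a caller would attach the returned badge to the card and continue ingestion regardless of the colour.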

US-002: Cross-Entity Match Detection (Global Dedupe)

Description: As a user, I want to be alerted if a lead in Entity A already exists in Entity B.

Acceptance Criteria:

  • For every new inquiry, system queries customer_masters across all schemas using GSTIN.
  • If a GSTIN match is found in a DIFFERENT schema, set metadata.duplicate_found = true and populate metadata.duplicate_entity and metadata.duplicate_record_id with the matching entity and record.
  • Verification: Create a customer in Entity B, then create same GST in Entity A. Ensure metadata is correct.
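
As a rough sketch of the lookup described above, the function below scans every entity except the one receiving the lead and fills in the three metadata fields. The in-memory dictionary stands in for the per-entity customer_masters schemas, and the record keys `gstin` and `id` are hypothetical; a real implementation would query the database instead.

```python
from typing import Any, Dict, List


def detect_cross_entity_duplicate(
    gstin: str,
    current_entity: str,
    masters_by_entity: Dict[str, List[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Return duplicate metadata for a new inquiry.

    masters_by_entity maps an entity/schema name to its customer master
    rows (an in-memory stand-in for the real schemas).
    """
    metadata: Dict[str, Any] = {"duplicate_found": False}
    for entity, records in masters_by_entity.items():
        if entity == current_entity:
            continue  # only matches in a DIFFERENT schema count here
        for record in records:
            if record.get("gstin") == gstin:
                metadata.update(
                    duplicate_found=True,
                    duplicate_entity=entity,
                    duplicate_record_id=record["id"],
                )
                return metadata
    return metadata
```

This sketch stops at the first match; whether to report every matching entity when a GSTIN exists in more than one is not specified in the acceptance criteria.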

US-003: Multi-Entity Link/Merge Workflow

Description: As a screening user, I want a clear choice when a cross-entity duplicate is found.

Acceptance Criteria:

  • If metadata.duplicate_found is true, card detail view shows "Duplicate Match Found" alert.
  • Alert box has two buttons: [Link to Group] maps to existing global ID; [Create Independent] forces new local record with warning.
  • Clicking [Link to Group] populates group_parent_id in the new record.
  • Verification: Click "Link to Group" and verify group_parent_id matches existing record in DB.
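
The two-button decision could be handled along these lines. This is only a sketch under stated assumptions: the action strings, the `warnings` field, and the choice to copy metadata.duplicate_record_id directly into group_parent_id are illustrative, and the real system may resolve a separate global group ID instead.

```python
from typing import Any, Dict


def resolve_duplicate(
    new_record: Dict[str, Any],
    metadata: Dict[str, Any],
    action: str,
) -> Dict[str, Any]:
    """Apply the user's choice to a copy of the new record."""
    if not metadata.get("duplicate_found"):
        raise ValueError("no cross-entity duplicate to resolve")

    resolved = dict(new_record)  # leave the caller's record untouched
    if action == "link_to_group":
        # Assumption: the existing record's ID serves as the group parent.
        resolved["group_parent_id"] = metadata["duplicate_record_id"]
    elif action == "create_independent":
        resolved["group_parent_id"] = None
        resolved["warnings"] = [
            "Created independently despite a match in "
            + str(metadata.get("duplicate_entity"))
        ]
    else:
        raise ValueError("unknown action: " + action)
    return resolved
```

Nothing is merged or linked until the user picks an action, which keeps the flow consistent with the requirement that no data is merged without user confirmation.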

US-004: Fuzzy Name Matching (80% Confidence)

Description: As a system, I want to catch duplicate company names using fuzzy logic when GST is missing.

Acceptance Criteria:

  • If no GST is found, system calculates the Jaro-Winkler similarity (0 to 1, where 1 is an exact match) between new_name and each of existing_names.
  • If the similarity is 0.8 or higher (the 80% threshold), set metadata.potential_duplicate = true and display "Similar to: [Name] (X%)" in UI.
  • Verification: Create "Pebble Pvt Ltd" when "Pebble Private Limited" exists. Ensure warning appears.
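
For reference, here is a self-contained sketch of Jaro-Winkler similarity and the threshold check. It uses the standard formulation (prefix scaling factor 0.1, prefix capped at 4 characters); the function names, the lowercase/trim normalisation, and the returned dictionary shape are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def find_similar_name(
    new_name: str, existing_names: Iterable[str], threshold: float = 0.8
) -> Dict[str, Any]:
    """Flag the closest existing name if it meets the threshold."""
    best_name, best_score = None, 0.0
    for name in existing_names:
        score = jaro_winkler(new_name.strip().lower(), name.strip().lower())
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return {
            "potential_duplicate": True,
            "message": f"Similar to: {best_name} ({best_score:.0%})",
        }
    return {"potential_duplicate": False}
```

With this implementation, "Pebble Pvt Ltd" against "Pebble Private Limited" scores roughly 0.93, so the verification case above would raise the warning. The sketch does not expand abbreviations such as "Pvt" or "Ltd"; whether to normalise legal suffixes before scoring is a separate design choice.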

Functional Requirements

  • FR-1: The GST Number MUST be treated as the primary "Source of Truth" for deduplication.
  • FR-2: Verification results must be stored in the customer_master or preleads metadata.
  • FR-3: The system MUST support a "Group Account" hierarchy where multiple entities can be linked under one parent.
  • FR-4: Fuzzy matching results must be generated in under 1 second per lead during the ingestion phase.

Non-Goals

  • No automated blocking of lead creation based on verification (Read-only flags only for POC).
  • No direct integration with government portals (GSTN/MCA) for the POC; use API mocks or reliable sandbox services.
  • No automated merging of data without user confirmation.

Technical Considerations

  • Primary Key for Dedupe: GST Number.
  • Fuzzy Threshold: 80%.
  • Verification Service: Sandbox/Mock API for POC phase.
  • Workflow: Non-blocking alerts.

Success Metrics

  • 100% of GST-provided leads undergo auto-check.
  • Zero "Silent" duplicates created across multi-entity databases when GST is present.
  • 90% accuracy in detecting variations of company names via fuzzy logic.

Open Questions

  • Should we also verify the "Status" of the GST (Active/Cancelled)?
  • How do we handle companies that do not provide a GST? (Fallback to PAN/Domain).