PRD: Entity Triage & Multi-Entity Validation
Introduction
In a multi-entity business environment, maintaining clean customer data is critical. The Entity Triage & Multi-Entity Validation module ensures that every incoming lead or enquiry is cross-referenced against existing masters across all legal entities using official identifiers (GST/PAN/CIN). This prevents duplicate effort and helps maintain a single global view of each customer's relationship with the group.
Goals
- Identity Verification: Auto-verify the validity of GST numbers for all new entries.
- Global Deduplication: Detect existing records across all 3 legal entities to prevent duplicate creation.
- Conflict Resolution: Prompt users to decide between merging with an existing account or creating a new group-linked entity when a match is found in another entity's database.
- Data Integrity: Apply fuzzy matching with a moderate threshold (80% similarity or higher) to catch variations in company names during the ingestion phase.
Use Case Mapping
This PRD provides the implementation blueprint for the following functional specifications:
- MST-002: Online Verification (GST/MCA) - Automatic identity validation.
- PRE-004: Raw-to-Recheck Mapping (De-dupe) - Cross-entity deduplication and fuzzy matching.
User Stories
US-001: Automatic GST Verification (Read-Only)
Description: As a system, I want to verify a company's GST number during lead ingestion so that I can flag invalid data for the screening team.
Acceptance Criteria:
- System extracts 15-character GSTIN using regex from email body or dedicated "GST" field.
- System calls POST /api/verify/gst with the extracted identifier.
- UI displays badge on Kanban card: Green = "GST: Verified", Yellow = "Name Mismatch", Red = "Invalid".
- Verification does NOT stop ingestion; it only adds the badge.
- Verification: Pass a mock GST to the ingestion API and verify the badge color in the browser.
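The extraction and badge rules above can be sketched as follows. This is a minimal illustration, not the final implementation: the regex assumes the usual GSTIN layout (2-digit state code, 10-character PAN, one entity character, a literal "Z", one check character), and the response fields `valid` and `name_match` are hypothetical names for whatever POST /api/verify/gst ends up returning.

```python
import re
from typing import Optional

# Assumed GSTIN layout: 2 digits, 5 letters, 4 digits, 1 letter (the PAN part),
# then 1 entity character, a literal "Z", and 1 check character.
GSTIN_RE = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]\b")


def extract_gstin(gst_field: Optional[str], email_body: str) -> Optional[str]:
    """Prefer the dedicated "GST" field, then fall back to the email body."""
    for text in (gst_field or "", email_body):
        match = GSTIN_RE.search(text.upper())
        if match:
            return match.group(0)
    return None


def badge_for(result: dict) -> dict:
    """Map a verification response (hypothetical shape) to a Kanban badge."""
    if not result.get("valid"):
        return {"color": "red", "label": "Invalid"}
    if not result.get("name_match"):
        return {"color": "yellow", "label": "Name Mismatch"}
    return {"color": "green", "label": "GST: Verified"}
```

Because verification is read-only, a caller would attach the badge to the card and continue ingestion regardless of the colour returned.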
US-002: Cross-Entity Match Detection (Global Dedupe)
Description: As a user, I want to be alerted if a lead in Entity A already exists in Entity B.
Acceptance Criteria:
- For every new inquiry, system queries customer_masters across all schemas using GSTIN.
- If GSTIN match found in a DIFFERENT schema, set metadata.duplicate_found = true and populate metadata.duplicate_entity and metadata.duplicate_record_id.
- Verification: Create a customer in Entity B, then create same GST in Entity A. Ensure the Entity A record's metadata points to Entity B and the existing record's ID.
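As a rough sketch of the lookup described above, the function below scans every entity except the one receiving the inquiry and returns the three metadata fields. The in-memory dictionary stands in for the per-entity customer_masters tables; the entity names, row shape (`id`, `gstin`, `name`), and function name are assumptions made for illustration only.

```python
from typing import Dict, List


def detect_cross_entity_duplicate(
    gstin: str,
    current_entity: str,
    masters_by_entity: Dict[str, List[dict]],
) -> dict:
    """Return the US-002 metadata fields for a new inquiry."""
    for entity, rows in masters_by_entity.items():
        if entity == current_entity:
            continue  # only a match in a DIFFERENT entity counts here
        for row in rows:
            if row.get("gstin") == gstin:
                return {
                    "duplicate_found": True,
                    "duplicate_entity": entity,
                    "duplicate_record_id": row["id"],
                }
    return {"duplicate_found": False}


# Hypothetical sample data: one existing customer in Entity B.
masters = {
    "entity_a": [],
    "entity_b": [
        {"id": 101, "gstin": "27AAPFU0939F1ZV", "name": "Pebble Private Limited"}
    ],
    "entity_c": [],
}

metadata = detect_cross_entity_duplicate("27AAPFU0939F1ZV", "entity_a", masters)
```

In a real deployment the inner loop would presumably be one indexed query per schema rather than a scan; the returned structure is the part that matters for the acceptance criteria.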
US-003: Multi-Entity Link/Merge Workflow
Description: As a screening user, I want a clear choice when a cross-entity duplicate is found.
Acceptance Criteria:
- If metadata.duplicate_found is true, card detail view shows "Duplicate Match Found" alert.
- Alert box has two buttons: [Link to Group] maps to existing global ID; [Create Independent] forces new local record with warning.
- Clicking [Link to Group] populates group_parent_id in the new record.
- Verification: Click "Link to Group" and verify group_parent_id matches existing record in DB.
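The two-button decision above might be handled along these lines. This sketch assumes that group_parent_id is set to the matched record's ID (metadata.duplicate_record_id) and that the warning for [Create Independent] is stored on the record; the choice strings, the `warnings` field, and the function name are hypothetical.

```python
def resolve_duplicate(new_record: dict, choice: str) -> dict:
    """Apply the user's [Link to Group] / [Create Independent] decision."""
    meta = new_record.get("metadata", {})
    if not meta.get("duplicate_found"):
        raise ValueError("no cross-entity duplicate to resolve")

    resolved = dict(new_record)  # shallow copy; the input is left unchanged
    if choice == "link_to_group":
        # Assumption: the existing record's ID acts as the group parent.
        resolved["group_parent_id"] = meta["duplicate_record_id"]
    elif choice == "create_independent":
        resolved["group_parent_id"] = None
        resolved["warnings"] = [
            f"Independent record created despite match in {meta['duplicate_entity']}"
        ]
    else:
        raise ValueError(f"unknown choice: {choice}")
    return resolved


lead = {
    "name": "Pebble Pvt Ltd",
    "metadata": {
        "duplicate_found": True,
        "duplicate_entity": "entity_b",
        "duplicate_record_id": 101,
    },
}
linked = resolve_duplicate(lead, "link_to_group")
```

Nothing is merged automatically here: the function only runs once a user has made an explicit choice, in line with the non-goal of merging without confirmation.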
US-004: Fuzzy Name Matching (80% Confidence)
Description: As a system, I want to catch duplicate company names using fuzzy logic when GST is missing.
Acceptance Criteria:
- If no GST found, system calculates Jaro-Winkler similarity between new_name and existing_names.
- If similarity is 0.8 or higher, set metadata.potential_duplicate = true and display "Similar to: [Name] (X%)" in UI.
- Verification: Create "Pebble Pvt Ltd" when "Pebble Private Limited" exists. Ensure warning appears.
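To make the fuzzy check concrete, here is a self-contained sketch of Jaro-Winkler similarity (a score from 0 to 1, higher meaning more alike) with the standard prefix scale of 0.1 and a 4-character prefix cap. It treats the 0.8 threshold as inclusive, in line with the "80%+" goal, and lowercases names before comparing; both choices, along with the `find_similar` helper and its return shape, are assumptions for illustration.

```python
from typing import List


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def find_similar(new_name: str, existing_names: List[str], threshold: float = 0.8) -> dict:
    """Return the best match at or above the threshold, if any."""
    best_name, best_score = None, 0.0
    for name in existing_names:
        score = jaro_winkler(new_name.lower(), name.lower())
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        label = f"Similar to: {best_name} ({round(best_score * 100)}%)"
        return {"potential_duplicate": True, "label": label}
    return {"potential_duplicate": False}


result = find_similar("Pebble Pvt Ltd", ["Pebble Private Limited", "Acme Traders"])
# result["label"] == "Similar to: Pebble Private Limited (93%)"
```

For the verification pair in the acceptance criteria, the lowercased names score about 0.927, comfortably above the 0.8 threshold.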
Functional Requirements
- FR-1: The GST Number MUST be treated as the primary "Source of Truth" for deduplication.
- FR-2: Verification results must be stored in the customer_master or preleads metadata.
- FR-3: The system MUST support a "Group Account" hierarchy where multiple entities can be linked under one parent.
- FR-4: Fuzzy matching results must be generated in under 1 second during the ingestion phase.
Non-Goals
- No automated blocking of lead creation based on verification (Read-only flags only for POC).
- No direct integration with government portals (GSTN/MCA) for the POC; use API mocks or reliable sandbox services.
- No automated merging of data without user confirmation.
Technical Considerations
- Primary Key for Dedupe: GST Number.
- Fuzzy Threshold: 80%.
- Verification Service: Sandbox/Mock API for POC phase.
- Workflow: Non-blocking alerts.
Success Metrics
- 100% of GST-provided leads undergo auto-check.
- Zero "Silent" duplicates created across multi-entity databases when GST is present.
- 90% accuracy in detecting variations of company names via fuzzy logic.
Open Questions
- Should we also verify the "Status" of the GST (Active/Cancelled)?
- How do we handle companies that do not provide a GST? (Fallback to PAN/Domain).