ORM Model Patterns¶
Purpose¶
Define SQLAlchemy ORM conventions for all Nextpoint service modules: naming, column types, composite keys for deduplication, soft deletes, timestamps, JSON columns, and the builder pattern for entity construction.
Model Catalog (documentloader)¶
18 models across three categories:
Core Entities¶
| Model | Table | Primary Key | Purpose |
|---|---|---|---|
| Exhibits | exhibits |
id (auto) |
Documents in the case |
| Attachments | attachments |
id (auto) |
File pages/versions of exhibits |
| Batches | batches |
id (auto) |
Import batch lifecycle |
| Npcases | npcases |
id (auto) |
Case (tenant) metadata |
| Labels | labels |
id (auto) |
Classification labels |
| Custodians | custodians |
id (auto) |
Document custodians |
| Tags | tags |
id (auto) |
Metadata tags |
| CustomFields | custom_fields |
id (auto) |
User-defined fields |
Relationship Tables¶
| Model | Table | Purpose |
|---|---|---|
| Taggings | taggings |
Exhibit ↔ Tag junction |
| CustodianExhibits | custodian_exhibits |
Custodian ↔ Exhibit junction |
| BatchSources | batch_sources |
Batch ↔ Source (polymorphic) |
| DocumentRelations | document_relations |
Exhibit parent/child relationships |
| DocumentRelationsInBatch | document_relations_in_batch |
Temp table for batch processing |
| DepositionVolumeExhibits | deposition_volume_exhibits |
Deposition ↔ Exhibit junction |
| ExhDesignations | exh_designations |
Exhibit ↔ Label with metadata |
Deduplication & Processing¶
| Model | Table | Composite PK | Purpose |
|---|---|---|---|
| ProcessCheckpoints | process_checkpoints |
(npcase_id, batch_id, nge_document_id) | Pipeline resume state |
| DocDedupe | doc_dedupe |
(npcase_id, message_id, bcc, md5, doc_type) | Content deduplication |
| TagDedupe | tag_dedupe |
(npcase_id, name, custom_field_id) | Tag deduplication (SHA256 name) |
Composite Primary Keys for Idempotency¶
Composite PKs serve as distributed locks and deduplication keys:
class ProcessCheckpoints(Base):
__tablename__ = "process_checkpoints"
# Composite PK — only one checkpoint per document per batch
npcase_id = Column(Integer, primary_key=True)
batch_id = Column(Integer, primary_key=True)
nge_document_id = Column(String(255), primary_key=True)
checkpoint_id = Column(Integer)
exhibit_ids = Column(String(2000)) # Comma-separated IDs
index_documents = Column(Boolean)
created_at_gmt = Column(DateTime)
class DocDedupe(Base):
__tablename__ = "doc_dedupe"
# 5-column composite PK for content deduplication
npcase_id = Column(Integer, primary_key=True, nullable=False)
message_id = Column(String(255), primary_key=True)
bcc = Column(String(2000), primary_key=True)
md5 = Column(String(255), primary_key=True)
doc_type = Column(String(255), primary_key=True, nullable=False)
created_at_gmt = Column(DateTime)
__table_args__ = (
UniqueConstraint('npcase_id', 'message_id', 'bcc', 'md5', 'doc_type',
name='_doc_dedupe_uc'),
)
class TagDedupe(Base):
__tablename__ = "tag_dedupe"
# Name column stores SHA256 hash of full tag content (255 char limit)
npcase_id = Column(Integer, primary_key=True)
name = Column(String(256), primary_key=True)
custom_field_id = Column(Integer, primary_key=True)
created_at_gmt = Column(DateTime)
Timestamp Conventions¶
Known inconsistency — the codebase has two timestamp naming patterns:
# Pattern 1: _gmt suffix (preferred for new code)
created_at_gmt = Column(DateTime)
updated_at_gmt = Column(DateTime)
deleted_at_gmt = Column(DateTime) # Soft delete
loader_status_updated_at_gmt = Column(DateTime)
# Pattern 2: Legacy Rails convention (existing tables)
created_on = Column(DateTime) # Exhibits
updated_on = Column(DateTime) # ExhDesignations, Attachments
created_at = Column(DateTime) # Batches (no _gmt suffix)
Rule for new models: Always use _gmt suffix to indicate UTC storage.
Soft Delete Pattern¶
Soft deletes use a nullable DateTime column. NULL = active, timestamp = deleted:
class Exhibits(Base):
# ...
delete_at_gmt = Column(DateTime) # NULL = active
class Npcases(Base):
# ...
deleted_at_gmt = Column(DateTime) # Note: different column name
class Batches(Base):
# ...
delete_at_gmt = Column(DateTime)
When querying: Always filter WHERE delete_at_gmt IS NULL unless
explicitly including deleted records.
JSON Columns¶
For flexible, schema-less metadata:
class Npcases(Base):
# ...
near_dupe_info = Column(JSON) # Near-duplicate configuration
bulk_redaction_info = Column(JSON) # Bulk redaction settings
Usage rules: - JSON columns for configuration/metadata that varies per case - NOT for data that needs to be queried or indexed - Always validate structure in Python before writing
Builder Pattern for Entity Construction¶
Map validated schema data to ORM model instances:
# core/models/db_models.py
def build_exhibit(exhibit_data: ExhibitData) -> Exhibits:
"""Construct an Exhibits instance from validated schema data."""
return Exhibits(
prefix=exhibit_data.prefix,
number=exhibit_data.number,
npcase_id=exhibit_data.npcase_id,
user_id=exhibit_data.user_id,
batch_id=exhibit_data.batch_id,
notes=exhibit_data.notes,
author=exhibit_data.author,
shortcut=exhibit_data.shortcut,
document_date=exhibit_data.document_date,
document_type=exhibit_data.document_type,
# ... all fields mapped
verified_page_count=None, # Set during processing, not creation
created_on=exhibit_data.created_on,
updated_at_gmt=exhibit_data.updated_at_gmt,
)
Why a builder? The ExhibitData schema has fields that don't map 1:1 to
the ORM model (e.g., file_name, display_name are used for processing
but not stored in the exhibits table). The builder function is the single
place that handles this mapping.
Validation Schemas¶
Input data validated with dataclasses before ORM construction:
# core/models/schemas.py
@dataclass
class ExhibitData:
"""Validated input for exhibit creation."""
npcase_id: int
prefix: Optional[str]
number: Optional[str]
user_id: int
batch_id: int
notes: Optional[str]
author: Optional[str]
# ... 40+ fields covering:
# - Document metadata (title, date, type)
# - Email metadata (message_id, author, sent/received dates)
# - Office metadata (last_author, app_name, created/modified dates)
# - Processing fields (family_id, nge_document_id, content_hash)
# - File info (extension, name, display_name)
@dataclass
class ProcessCheckpointParams:
"""Parameters for checkpoint create/update."""
npcase_id: int
batch_id: int
nge_document_id: str
created_at_gmt: Optional[datetime] = None
checkpoint_id: Optional[int] = None
exhibit_ids: Optional[str] = None
index_documents: Optional[bool] = None
class ExhibitCreateResponse(TypedDict):
"""Return type from exhibit creation."""
update_es: bool
exhibit_ids: list[int]
Index Conventions¶
Explicit indexes for query performance:
class DocumentRelations(Base):
__tablename__ = "document_relations"
# ...
__table_args__ = (
Index('index_document_relations_on_from_id_and_relation', 'from_id', 'relation'),
Index('index_document_relations_on_to_id_and_relation', 'to_id', 'relation'),
)
class DocumentRelationsInBatch(Base):
__tablename__ = "document_relations_in_batch"
# ...
__table_args__ = (
Index('index_document_relations_old_pk',
'npcase_id', 'batch_id', 'child_document_id', 'parent_document_id',
unique=True),
)
class CustodianExhibits(Base):
__tablename__ = "custodian_exhibits"
# ...
__table_args__ = (
Index('index_custodian_exhibits_on_custodian_id', 'custodian_id'),
Index('index_custodian_exhibits_on_exhibit_id', 'exhibit_id'),
Index('index_custodian_exhibits_on_user_id', 'user_id'),
)
Polymorphic Relationships¶
BatchSources uses a type column for polymorphic associations:
class BatchSources(Base):
__tablename__ = "batch_sources"
id = Column(Integer, primary_key=True, autoincrement=True)
batch_id = Column(Integer)
item_id = Column(Integer) # FK to different tables
item_type = Column(String(255)) # "Exhibit", "Attachment", etc.
Key Rules¶
- Composite PKs for deduplication — use database constraints, not application logic
_gmtsuffix on new timestamps — indicates UTC, distinguishes from legacy columns- Soft delete = nullable DateTime — never use boolean flags for soft deletes
- Builder functions for complex entities — keep mapping logic in one place
- Dataclass schemas for validation — validate before constructing ORM instances
- Explicit indexes — don't rely on ORM auto-indexing, declare what you need
- JSON columns for config only — not for queryable or indexable data
- Comma-separated IDs are technical debt — use JSON arrays or junction tables for new models
UniqueConstraint+ composite PK — belt and suspenders for deduplication tables
Ask questions about Nextpoint architecture, patterns, rules, or any module. Powered by Claude Opus 4.6.