Reference Implementation: shared_libs (Legacy)
Overview
The shared_libs repo is a flat collection of 77+ Ruby modules providing the
foundational infrastructure used by both the Rails monolith and the workers repo.
It is not a gem — files are loaded via require_relative or symlinked into
consuming repos as lib/shared/.
This library provides: the Nextpoint API client (HMAC-SHA1 authenticated), S3 operations with local caching, AWS credential management, document format parsers (DAT, LEF, CMS, MBOX), file conversion tools (Ghostscript, TIFF), connection pooling, token encryption, email sending, and Nutrient (PSPDFKit) integration.
Architecture
shared_libs/
├── Gemfile # aws-sdk-s3, charlock_holmes, gd2-ffij, nokogiri, oj
├── nextpoint.rb # Core module — environment detection, config access
├── nextpoint_api.rb # HMAC-SHA1 API client (1127 lines, XML-based RPC)
├── nextpoint_s3.rb # S3 operations (933 lines, caching, multipart, dedup)
├── nextpoint_s3_client.rb # Low-level S3 client wrapper (list, copy, download)
├── nextpoint_s3_deleter.rb # S3 deletion wrapper
├── aws_multipart_data_upload.rb # Multipart upload from StringIO (in-memory data)
├── nextpoint_aws_credentials.rb # AWS credential strategy (dev/test/prod)
├── nextpoint_sqs.rb # SQS message sending (small/large queue routing)
├── nextpoint_sqs_client.rb # Aws::SQS::Client wrapper
├── nextpoint_ecs.rb # ECS service scaling
├── nextpoint_ecs_client.rb # Aws::ECS::Client wrapper
├── nextpoint_ssm.rb # SSM Parameter Store access
├── nextpoint_sm_client.rb # Secrets Manager client
├── nextpoint_nutrient.rb # Nutrient (PSPDFKit) PDF rendering (25KB)
├── nextpoint_emailer.rb # SES/SMTP email with dedup and pirate mode
├── nextpoint_zendesk.rb # Zendesk ticket creation
├── resource_pool.rb # Thread-safe connection pooling
├── locksmith.rb # AES-128-CBC token encryption for URLs
├── hash_compactor.rb # Hash → encrypted token serialization
├── global_npcase_id_handler.rb # Per-EC2-instance case ID file locking
├── trapped_shell.rb # Safe forked process execution with timeout/memory
├── fasthttp.rb # Ruby Net::HTTP performance patches
├── load_file.rb # Litigation load file parser (CSV/DAT)
├── csv_parser.rb # CSV encoding detection (CharlockHolmes)
├── dat.rb # Concordance DAT format parser
├── lef_converter.rb # LiveNote LEF archive extraction
├── cms_converter.rb # Microsoft JET/Access CMS transcript extraction
├── livenote_parser.rb # LiveNote PTF binary format parser
├── ghostscript_converter.rb # PDF → TIFF/PNG via Ghostscript
├── tiff_converter.rb # TIFF manipulation (split, convert, compress)
├── exhibit_image_helper.rb # Image file management and format routing (471 lines)
├── gd2_extras.rb # GD2 image library extensions (quantization, palettes)
├── png_info.rb # PNG metadata extraction
├── utf8_encoder.rb # UTF-8 conversion with CharlockHolmes detection
├── expansive_hash_calculator.rb # MD5 hash calculation for deduplication
├── huffman_decoder.rb # Generic Huffman code decoding
├── denist.rb # NIST NSRL hash database lookup (de-NISTing)
├── thread_safe_singleton.rb # Mutex-protected singleton mixin
├── server_beacon.rb # Server identity tracking
├── pirate.rb # Pirate-speak for staging email subjects
├── roman_numerals.rb # Roman numeral conversion
├── extensioned_tempfile.rb # Tempfile with custom extensions
├── shared_constants.rb # Production template enums
├── production_custom_delimiters.rb # Load file delimiter config
├── preserve_logs_on_s3.rb # Log rotation to S3
├── dump_debug_info_on_alarm.rb # Debug info on SIGALRM
├── email/
│ └── mbox.rb # MBOX format parser (Enumerable)
├── disk/
│ └── image.rb # Disk image mounting (raw, EWF forensic)
└── test/
└── unit/ # 20 Minitest files
Pattern Mapping
| Pattern | shared_libs Implementation | NGE Equivalent |
|---|---|---|
| API authentication | HMAC-SHA1 signing (Date + method + path → Base64 digest) | Direct DB access; HMAC-SHA1 for documentpageservice API |
| API protocol | XML-based RPC via Net::HTTP with XmlSimple parsing | N/A (NGE uses direct database access) |
| Connection pooling | ResourcePool: mutex-based, configurable limit, block-based release | SQLAlchemy session pools, Lambda connection reuse |
| S3 operations | NextPointS3: local file cache (MD5 filenames), ETag freshness, mkdir locking | shell/utils/s3_ops.py with boto3 |
| S3 path convention | /bucket/case_{npcase_id}/{model_type}/{unique_id}/{filename} | Same convention preserved in NGE |
| S3 upload | Server-side AES256 encryption, 5 retries with 5*N backoff, multipart for StringIO | boto3 upload with SSE |
| S3 deletion prevention | delete raises an error; deletions must go through the PendingDelete table | N/A (S3 lifecycle policies) |
| AWS credentials | Dev: ~/.aws/credentials; Test: YAML; Prod: instance profile | Lambda execution role; ECS task role |
| Multi-tenant case ID | global_npcase_id_handler.rb: file-based per-EC2 case locking | Per-case DB schema: {base}_case_{case_id} |
| Token encryption | Locksmith: AES-128-CBC with URL-safe Base64 encoding | JWT tokens for Nutrient; Secrets Manager for keys |
| Process execution | TrappedShell: fork/exec with signal trapping, timeout, memory limit | Lambda timeout; ECS resource limits |
| Encoding detection | CharlockHolmes (ICU-based) with UTF-8 fallback | Same approach in documentloader |
| Document format parsing | DAT (Concordance), LEF (LiveNote), CMS (Access DB), MBOX | documentloader handles these via Hyland Filters |
| File conversion | Ghostscript (PDF→image), TIFF tools, GD2/ImageMagick | Hyland Filters, Apache PDFBox, Nutrient |
| Email sending | SES SMTP via MailFactory, dedup (10-minute window), pirate mode | SNS notifications; no direct email in NGE |
| Content hashing | MD5 for deduplication (expansive_hash_calculator.rb) | SHA256 in documentloader |
| NIST deduplication | denist.rb: SQLite3 NSRL hash lookup from S3 | Part of documentloader dedup pipeline |
| Structured logging | $nextpoint_global_logger with environment prefixes | JSON structured logging in CloudWatch |
| Configuration | YAML files per environment (nextpoint_global.yml, etc.) | Environment variables + Secrets Manager |
Key Components
NextPointAPI Client (nextpoint_api.rb)
The central API client that workers use to communicate with the Rails app (1127 lines).
Authentication: HMAC-SHA1 signing
- Signs: "#{date}#{method}#{path}" with shared secret key
- Header: API-Authorization: #{Base64.encode64(HMAC-SHA1(key, string))}
- Also supports AES-128-CBC encryption/decryption for sensitive payload data
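The signing and encryption schemes listed above can be sketched with Ruby's OpenSSL bindings. This is a minimal illustration, not the NextPointAPI code: the module and method names are hypothetical, and the token layout (random IV prepended to the ciphertext, then URL-safe Base64) is an assumption about how Locksmith-style tokens might be framed.

```ruby
require "base64"
require "openssl"

# Hypothetical helpers; names and token layout are assumptions.
module ApiAuthSketch
  # HMAC-SHA1 over "#{date}#{method}#{path}", Base64-encoded.
  # strict_encode64 is used because Base64.encode64 appends a trailing "\n".
  def self.signature(secret_key, date, method, path)
    digest = OpenSSL::HMAC.digest("SHA1", secret_key, "#{date}#{method}#{path}")
    Base64.strict_encode64(digest)
  end

  # AES-128-CBC with a random IV; assumed token layout is IV || ciphertext,
  # URL-safe Base64 encoded so it can travel in a URL.
  def self.encrypt_token(key16, plaintext)
    cipher = OpenSSL::Cipher.new("aes-128-cbc").encrypt
    cipher.key = key16
    iv = cipher.random_iv
    Base64.urlsafe_encode64(iv + cipher.update(plaintext) + cipher.final)
  end

  def self.decrypt_token(key16, token)
    raw = Base64.urlsafe_decode64(token)
    cipher = OpenSSL::Cipher.new("aes-128-cbc").decrypt
    cipher.key = key16
    cipher.iv = raw[0, 16] # first block is the IV
    cipher.update(raw[16..]) + cipher.final
  end
end
```

Presumably the receiving side recomputes the digest from the same date, method and path, so both sides have to agree byte-for-byte on those three strings.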
Protocol: XML-based RPC
- All requests: POST with XML body via XmlSimple.xml_out()
- All responses: XML parsed via XmlSimple.xml_in()
- Response wrapped in NextPointAPI::Record — dynamic attribute access via method_missing
- Type casting: XML types (integer, boolean, datetime, yaml, binary) → Ruby types
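A record with method_missing access and per-type casting might look like the sketch below. RecordSketch is a hypothetical name, and the input shape is an assumption: each field is either a plain string or a hash such as {"type" => "integer", "content" => "42"}; the exact structure XmlSimple returns depends on its options.

```ruby
require "time"
require "yaml"

# Hypothetical sketch, not NextPointAPI::Record itself.
class RecordSketch
  CASTS = {
    "integer"  => ->(s) { Integer(s, 10) },
    "boolean"  => ->(s) { s == "true" },
    "datetime" => ->(s) { Time.iso8601(s) },
    "yaml"     => ->(s) { YAML.safe_load(s) },
    "binary"   => ->(s) { s.unpack1("m") }, # assumes Base64-encoded payloads
  }.freeze

  def initialize(fields)
    # Dasherized XML names (created-at) become Ruby-friendly readers (created_at).
    @attrs = fields.to_h { |name, node| [name.to_s.tr("-", "_"), cast(node)] }
  end

  # Dynamic attribute access: rec.id, rec.created_at, ...
  def method_missing(name, *args)
    key = name.to_s
    return @attrs[key] if args.empty? && @attrs.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @attrs.key?(name.to_s) || super
  end

  private

  # Typed nodes are converted; untyped values are returned as strings.
  def cast(node)
    return node unless node.is_a?(Hash)
    caster = CASTS[node["type"]]
    text = node["content"].to_s
    caster ? caster.call(text) : text
  end
end
```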
Connection management: ResourcePool with mutex synchronization
- Retry: 5 attempts with exponential backoff for 502/503/504 responses
- Max response time: 180 seconds
- Background pinger thread for keepalive
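The pooling and retry behaviour above can be sketched as follows. ResourcePoolSketch and with_gateway_retries are hypothetical names; the 1-second base delay and the assumption that the retried block returns an integer HTTP status are illustrative choices, not values from the source.

```ruby
# Hypothetical sketch: a mutex-guarded pool with at most +limit+ resources.
class ResourcePoolSketch
  def initialize(limit, &factory)
    @limit = limit
    @factory = factory
    @available = []
    @created = 0
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Block-based release: the resource goes back even if the block raises.
  def with_resource
    resource = checkout
    yield resource
  ensure
    checkin(resource) if resource
  end

  private

  def checkout
    @mutex.synchronize do
      loop do
        return @available.pop unless @available.empty?
        if @created < @limit
          resource = @factory.call # e.g. a Net::HTTP connection (assumption)
          @created += 1
          return resource
        end
        @cond.wait(@mutex) # wait for another thread to check one in
      end
    end
  end

  def checkin(resource)
    @mutex.synchronize do
      @available.push(resource)
      @cond.signal
    end
  end
end

RETRYABLE_STATUSES = [502, 503, 504].freeze

# Retries while the block returns 502/503/504, doubling the delay each time.
# Returns the last status seen.
def with_gateway_retries(max_attempts: 5, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  loop do
    attempt += 1
    status = yield
    return status unless RETRYABLE_STATUSES.include?(status) && attempt < max_attempts
    sleeper.call(base_delay * 2**(attempt - 1))
  end
end
```

Injecting the sleeper keeps the backoff schedule testable without real delays.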
Domain methods:
- Worker lifecycle: register_as_worker, shutdown_worker, ping
- Jobs: get_next_job, create_job, update_job, buffer_box_work_request
- Documents: create_exhibit, update_exhibit, create_attachment, update_attachment
- Batches: create_batch, update_batch, add_batch_part
- Search: OCR text limit of 8MB (MAX_ALLOWED_SEARCH_TEXT)
NextPointS3 (nextpoint_s3.rb)
The S3 operations layer (933 lines).
S3 path convention (s3_path_for(info) method):
Detailed model type routing:
- attachment → /exhibits/{exhibit_id}/ or /wire/document_{id}/ or /zips/
- native_placeholder → /exhibits/{id}/native-placeholder-{uid}/
- deposition → /depositions/{uid}/
- transcript → /transcripts/{uid}/
- document/wire → /wire/document_{id}/{uid}/
- export, video, docshare, batch, ai_assistant, ai_summary
Filename sanitization: strips non-alphanumeric chars, prefixes reserved names
(current, source, preview, presentation) with original_
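The sanitization rule can be sketched together with the generic key layout used elsewhere on this page (case_{npcase_id}/{model_type}/{unique_id}/{filename}). Both helpers are hypothetical. Two assumptions are built in: ".", "_" and "-" survive so extensions are kept, and a name counts as reserved when its basename without extension matches one of the four words, ignoring case. The per-model-type routing listed above is not modeled.

```ruby
# Hypothetical sketch of the filename rules; not the real s3_path_for.
RESERVED_BASENAMES = %w[current source preview presentation].freeze

def sanitize_s3_filename(filename)
  cleaned = filename.gsub(/[^A-Za-z0-9._-]/, "") # drop all other characters
  base = File.basename(cleaned, ".*")
  RESERVED_BASENAMES.include?(base.downcase) ? "original_#{cleaned}" : cleaned
end

# Generic layout only; exhibits/, wire/ and zips/ routing is left out.
def s3_key_for(npcase_id, model_type, unique_id, filename)
  "case_#{npcase_id}/#{model_type}/#{unique_id}/#{sanitize_s3_filename(filename)}"
end
```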
Local caching: Downloads cached in /tmp/np_caches/np_s3_cache/ using MD5-based
filenames. Cross-process safety via mkdir-based locking (atomic on POSIX).
ETag-based freshness checks.
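The caching pattern can be sketched as below. S3CacheSketch is a hypothetical class. The ETag sidecar file, the lock timeout and the block-based download are assumptions; in real code the current ETag would presumably come from an S3 HEAD request and the block would perform the actual download.

```ruby
require "digest"
require "fileutils"

# Hypothetical sketch: MD5-named cache files, a mkdir lock, ETag sidecars.
class S3CacheSketch
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Returns the cached path for +s3_key+. The block (the download) runs only
  # when the cached ETag is missing or differs from +current_etag+.
  def fetch(s3_key, current_etag)
    path = File.join(@dir, Digest::MD5.hexdigest(s3_key))
    etag_path = "#{path}.etag"
    with_mkdir_lock("#{path}.lock") do
      fresh = File.exist?(path) && File.exist?(etag_path) &&
              File.read(etag_path) == current_etag
      unless fresh
        yield path
        File.write(etag_path, current_etag)
      end
    end
    path
  end

  private

  # Dir.mkdir is atomic: one process succeeds, the others see EEXIST and wait.
  def with_mkdir_lock(lock_dir, poll: 0.05, max_wait: 30)
    waited = 0.0
    begin
      Dir.mkdir(lock_dir)
    rescue Errno::EEXIST
      raise "timed out waiting for #{lock_dir}" if waited >= max_wait
      sleep poll
      waited += poll
      retry
    end
    begin
      yield
    ensure
      Dir.rmdir(lock_dir)
    end
  end
end
```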
Upload patterns:
- Server-side AES256 encryption on all uploads
- 5 retries with 5 * attempt second backoff
- Multipart upload for large files (automatic via SDK)
- aws_multipart_data_upload.rb extends SDK for StringIO (in-memory data from ZIP entries)
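The upload retry policy can be sketched as a small wrapper. with_upload_retries is a hypothetical name, and reading "5 retries" as five attempts in total is an assumption. The commented put_object call shows where server-side AES256 encryption would be requested with aws-sdk-s3.

```ruby
# Hypothetical sketch: up to five attempts, sleeping 5 * attempt seconds
# after each failure (5 s, 10 s, 15 s, 20 s), then re-raising.
def with_upload_retries(max_attempts: 5, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts
    sleeper.call(5 * attempt)
    retry
  end
end

# With aws-sdk-s3 the block would be something like:
#   s3.put_object(bucket: bucket, key: key, body: io,
#                 server_side_encryption: "AES256")
```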
Deletion prevention: delete and delete_objects methods raise errors. All
deletions must go through the PendingDelete database table — a safety mechanism
to prevent accidental data loss.
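The deletion guard can be sketched as a wrapper whose delete methods always raise, while callers record intent instead. GuardedS3Sketch is a hypothetical class, and the in-memory array stands in for the PendingDelete table, whose real columns are not documented here. Presumably a separate process works through the table later.

```ruby
# Hypothetical sketch of the deletion guard; not the real NextPointS3 API.
class DeletionForbidden < StandardError; end

class GuardedS3Sketch
  PendingDelete = Struct.new(:bucket, :key, :requested_at) # assumed columns

  attr_reader :pending_deletes

  def initialize
    @pending_deletes = []
  end

  # Direct deletes are never performed.
  def delete(*)
    raise DeletionForbidden, "direct S3 deletes are disabled; use request_delete"
  end
  alias delete_objects delete

  # Records the intent to delete instead of deleting.
  def request_delete(bucket, key)
    @pending_deletes << PendingDelete.new(bucket, key, Time.now)
  end
end
```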
IAM user management: Creates/deletes per-case IAM users with scoped S3 policies for direct case folder access.
Configuration Pattern
Three-tier configuration hierarchy:
1. nextpoint.rb — Core module: Nextpoint.config(), Nextpoint.domain,
Nextpoint.deployment_id, Nextpoint.deployment_name
2. nextpoint_shared_globals.rb — Loads nextpoint_shared_global.yml per environment
3. YAML config files — nextpoint_api.yml, nextpoint_s3.yml, nextpoint_mail.yml
(per-environment settings: host, port, credentials, buckets)
AWS region → deployment mapping (in nextpoint_ssm.rb):
- us-east-1 → c2
- us-west-1 → c4
- ca-central-1 → c5
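The per-environment YAML lookup and the region mapping can be sketched together. ConfigSketch and the NEXTPOINT_ENV variable are hypothetical. The assumed file layout (one top-level key per environment in files such as nextpoint_s3.yml) is illustrative, not taken from the real nextpoint.rb.

```ruby
require "yaml"

# Hypothetical sketch; not the real Nextpoint.config implementation.
module ConfigSketch
  DEPLOYMENT_FOR_REGION = {
    "us-east-1"    => "c2",
    "us-west-1"    => "c4",
    "ca-central-1" => "c5",
  }.freeze

  def self.environment
    ENV.fetch("NEXTPOINT_ENV", "development")
  end

  # Loads a YAML file and returns the section for the given environment.
  def self.config(path, env: environment)
    YAML.safe_load(File.read(path)).fetch(env)
  end

  # Raises KeyError for an unmapped region instead of guessing a deployment.
  def self.deployment_for(region)
    DEPLOYMENT_FOR_REGION.fetch(region)
  end
end
```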
Process Execution (trapped_shell.rb)
Safe external process execution for document conversion tools:
- Fork/exec with signal trapping (TERM/INT ignored in child)
- Configurable timeout enforcement
- Memory limit enforcement (checks RSS via ps)
- Non-blocking output reading
- Status callbacks for progress reporting
Used by workers for LibreOffice, Ghostscript, FFmpeg, Tesseract, etc.
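The execution model above can be sketched on a POSIX system as follows. TrappedShellSketch and its Result fields are hypothetical, not the TrappedShell API, and progress callbacks are omitted. The sketch forks, ignores TERM/INT in the child, execs the tool directly (no shell), drains output on a thread so the pipe never fills, and polls for exit, timeout or an RSS limit read via ps.

```ruby
# Hypothetical sketch of fork/exec with timeout and memory enforcement.
module TrappedShellSketch
  Result = Struct.new(:outcome, :output, :exit_code)
  POLL_INTERVAL = 0.05

  def self.run(argv, timeout:, memory_limit_kb: nil)
    reader, writer = IO.pipe
    pid = fork do
      trap("TERM", "IGNORE") # POSIX exec keeps ignored dispositions
      trap("INT", "IGNORE")
      reader.close
      $stdout.reopen(writer)
      $stderr.reopen(writer)
      exec(*argv) # array form: no shell, so pid is the tool itself
    end
    writer.close
    output_thread = Thread.new { reader.read } # keeps the pipe drained
    deadline = monotonic_now + timeout
    outcome = :completed
    status = nil
    until (status = Process.waitpid2(pid, Process::WNOHANG)&.last)
      if monotonic_now > deadline
        outcome = :timeout
      elsif memory_limit_kb && rss_kb(pid) > memory_limit_kb
        outcome = :memory_exceeded
      end
      unless outcome == :completed
        Process.kill("KILL", pid) # KILL cannot be trapped or ignored
        status = Process.waitpid2(pid).last
        break
      end
      sleep POLL_INTERVAL
    end
    output = output_thread.value
    reader.close
    Result.new(outcome, output, status.exitstatus) # exitstatus is nil if killed
  end

  def self.monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Resident set size in KB as reported by ps; 0 if the process is gone.
  def self.rss_kb(pid)
    out = `ps -o rss= -p #{pid}`.strip
    out.empty? ? 0 : Integer(out)
  end
end
```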
Document Format Parsers
eDiscovery-specific format support:
| Format | File | Purpose |
|---|---|---|
| Concordance DAT | dat.rb | Field separator \x14, quote char þ (thorn) → standard CSV |
| LiveNote LEF | lef_converter.rb | Password-protected ZIP (livenote), extracts PTF/TXT/VID/XML |
| LiveNote PTF | livenote_parser.rb | Binary format parsing (blocks, values, QuickMarks) |
| CMS (Access DB) | cms_converter.rb | JET database via extract_cms_transcript binary |
| MBOX | email/mbox.rb | Standard MBOX with From line splitting |
| CSV | csv_parser.rb | Encoding detection via CharlockHolmes |
| Load files | load_file.rb | Unified CSV/DAT parser with field mapping |
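Because a Concordance DAT file is CSV with a different separator and quote character, Ruby's CSV library can read it once those two options are set. The sketch below is a minimal illustration; parse_dat and dat_records are hypothetical helpers, not dat.rb's API. Other DAT conventions and encoding detection are out of scope.

```ruby
require "csv"

# Hypothetical sketch: DAT = CSV with \x14 between fields and þ as the quote.
def parse_dat(text)
  CSV.parse(text, col_sep: "\x14", quote_char: "þ")
end

# One Hash per document, keyed by the header row.
def dat_records(text)
  header, *rows = parse_dat(text)
  rows.map { |row| header.zip(row).to_h }
end
```

Because commas are ordinary characters under these settings, a value such as "Hello, world" needs no extra escaping.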
Integration Points with NGE
S3 Path Convention (Shared)
Both Legacy and NGE use the same S3 path structure. This is a critical integration point — NGE modules read/write files at paths that Legacy code also accesses:
s3://{bucket}/case_{npcase_id}/attachment/{unique_id}/{filename}
s3://{bucket}/case_{npcase_id}/export/{unique_id}/{filename}
HMAC-SHA1 Authentication (Shared)
The nextpoint_api.rb HMAC-SHA1 pattern is also used by NGE's documentpageservice
when calling back to the Rails API. The signing algorithm is:
Base64(HMAC-SHA1(secret_key, date + method + path))
AWS Credential Strategy
Legacy uses instance profiles on EC2. NGE uses Lambda execution roles and ECS task roles. Both rely on IAM for S3/SQS/SNS access — no hardcoded credentials.
Patterns to Preserve vs Deprecate
Preserve
- S3 path convention: consistent across Legacy and NGE
- HMAC-SHA1 auth: still used by documentpageservice API calls
- Content hashing for dedup: MD5 in Legacy, SHA256 in NGE (same concept)
- Encoding detection: CharlockHolmes approach carried forward
- Connection pooling (ResourcePool): the pattern is sound; the implementation differs
Deprecate
- XML-based API protocol: replaced by direct DB access in NGE
- YAML configuration files: replaced by env vars + Secrets Manager
- File-based case locking: replaced by per-case DB schema naming
- Local file caching with mkdir locks: Lambda has an ephemeral /tmp; ECS uses EFS
- PendingDelete table for S3: NGE uses S3 lifecycle policies
- IAM user per case: legacy pattern for direct S3 access; NGE uses presigned URLs
- GD2 image library: replaced by Nutrient (PSPDFKit)
- Ruby 1.8 compatibility patches (fasthttp.rb): dead code
Key File Locations
| File | Purpose |
|---|---|
| nextpoint_api.rb | HMAC-SHA1 API client (1127 lines) |
| nextpoint_s3.rb | S3 operations with caching (933 lines) |
| nextpoint_aws_credentials.rb | AWS credential strategy |
| resource_pool.rb | Thread-safe connection pooling |
| global_npcase_id_handler.rb | Per-EC2 case ID locking |
| locksmith.rb | AES-128-CBC token encryption |
| trapped_shell.rb | Safe process execution |
| exhibit_image_helper.rb | Image format routing (471 lines) |
| nextpoint_nutrient.rb | Nutrient/PSPDFKit integration (25KB) |
| load_file.rb | Load file parser |
| dat.rb | Concordance DAT format |
| lef_converter.rb | LiveNote LEF extraction |