System Design & Databases

190 bookmarks · Last synthesized Jun 29, 2026

This page is a reference for core system design methods, database administration tasks, and database performance principles drawn from common industry engineering practice.

1. System Design & Architectural Decision-Making

Designing scalable software systems requires structured decision-making, quantitative analysis, and rigorous documentation. The problem should always drive the technology, not the other way around.

Problem-First Approach: System design begins with a deep understanding of the problem and its requirements, not by immediately selecting a technology stack.
Quantitative Design Selection: Use back-of-the-envelope calculations to compare candidate designs against the requirements. This means estimating traffic patterns and resource needs before committing to a design.
Documenting Decisions: Architects and developers should always document the rationale behind database choices to ensure long-term maintainability and alignment across teams.
System Design Resources: Specialized architectures, such as online gaming protocols, serve as valuable system design refreshers for understanding low-latency, real-time data flow.
Architectural Mindset: Top-tier software architects act as amplifiers, making teams smarter rather than being oracles. They focus on making complex decisions understandable and lowering overall project risk.

Deep Dive: The 7-Step System Design Framework

A repeatable framework helps approach any system design problem systematically.

Understanding the Problem: Clearly define the core functionality. For example, a URL shortener takes a long URL, generates a short URL, stores the mapping, and redirects users when the short URL is accessed.
Identifying Requirements: Differentiate between functional requirements (what the system does) and non-functional requirements (performance, scalability, availability, consistency, etc.). Avoid silent assumptions; ask clarifying questions.
Predicting Traffic Patterns: Estimate request rates (reads vs. writes), data storage needs, and potential peaks to inform system capacity planning.
Designing High-Level Components: Break down the system into logical components (e.g., API servers, databases, caching layers, load balancers).
Defining APIs: Specify the contracts between components, outlining request and response formats.
Selecting a Tech Stack: Choose technologies that best meet the identified requirements and constraints. Consider trade-offs between performance, cost, and complexity.
Considering Implementation Constraints: Account for limitations like latency, availability, and specific technical challenges during development.
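
The traffic-prediction step can be sketched as a small calculation. The example below is a minimal sketch for the URL shortener mentioned above; every input (new URLs per day, read-to-write ratio, record size, retention, peak factor) is an assumed figure chosen for illustration, and `estimate_capacity` is a hypothetical helper name.

```ruby
# Back-of-the-envelope estimate for a hypothetical URL shortener.
# All inputs are assumed figures for illustration, not measurements.

SECONDS_PER_DAY = 86_400

def estimate_capacity(new_urls_per_day:, reads_per_write:, bytes_per_record:,
                      retention_years:, peak_factor:)
  writes_per_sec = new_urls_per_day.to_f / SECONDS_PER_DAY
  reads_per_sec  = writes_per_sec * reads_per_write
  total_records  = new_urls_per_day * 365 * retention_years
  storage_bytes  = total_records * bytes_per_record

  {
    writes_per_sec: writes_per_sec.round(1),
    reads_per_sec: reads_per_sec.round(1),
    peak_reads_per_sec: (reads_per_sec * peak_factor).round(1),
    total_records: total_records,
    storage_gb: (storage_bytes / 1e9).round(1)
  }
end

estimate = estimate_capacity(
  new_urls_per_day: 1_000_000, # assumed write volume
  reads_per_write: 100,        # assumed read-heavy workload
  bytes_per_record: 500,       # assumed size of one stored mapping
  retention_years: 5,          # assumed retention period
  peak_factor: 3               # assumed peak-to-average ratio
)

estimate.each { |name, value| puts "#{name}: #{value}" }
```

Under these assumptions the write rate is small while reads dominate, which would point toward a cache in front of the database rather than write sharding. Different inputs can reverse that conclusion, which is the reason to run the numbers first.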

Latency Numbers Everyone Should Know

L1 cache reference: 0.5 ns
Branch mispredict: 5 ns
L2 cache reference: 7 ns
Mutex lock/unlock: 100 ns
Main memory reference: 100 ns
Compress 1K bytes with Zippy: 10,000 ns (10 µs)
Send 2K bytes over 1 Gbps network: 20,000 ns (20 µs)
Read 1 MB sequentially from memory: 250,000 ns (250 µs)
Round trip within same datacenter: 500,000 ns (500 µs / 0.5 ms)
Disk seek: 10,000,000 ns (10 ms)
Read 1 MB sequentially from network: 10,000,000 ns (10 ms)
Read 1 MB sequentially from disk: 30,000,000 ns (30 ms)
Send packet CA -> Netherlands -> CA: 150,000,000 ns (150 ms)
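
These figures are most useful as ratios. The sketch below copies a few values from the list above into a hash and does some simple arithmetic with them. It assumes 1 GB is 1,000 MB and that sequential throughput scales linearly with size; `read_time_ms` and `slowdown` are hypothetical helper names.

```ruby
# Selected figures from the latency list above, in nanoseconds.
LATENCY_NS = {
  main_memory_reference: 100,
  read_1mb_memory: 250_000,
  datacenter_round_trip: 500_000,
  disk_seek: 10_000_000,
  read_1mb_network: 10_000_000,
  read_1mb_disk: 30_000_000
}.freeze

# Time in milliseconds to read `megabytes` sequentially from `source`,
# assuming cost scales linearly with size (a simplification).
def read_time_ms(megabytes, source)
  LATENCY_NS.fetch(source) * megabytes / 1_000_000.0
end

# How many times slower one operation is than another.
def slowdown(slow, fast)
  LATENCY_NS.fetch(slow).to_f / LATENCY_NS.fetch(fast)
end

puts "Read 1 GB from memory:  #{read_time_ms(1_000, :read_1mb_memory)} ms"
puts "Read 1 GB from network: #{read_time_ms(1_000, :read_1mb_network)} ms"
puts "Read 1 GB from disk:    #{read_time_ms(1_000, :read_1mb_disk)} ms"
puts "Disk vs. memory (1 MB sequential): #{slowdown(:read_1mb_disk, :read_1mb_memory)}x"
puts "Disk seek vs. datacenter round trip: #{slowdown(:disk_seek, :datacenter_round_trip)}x"
```

With these numbers, one disk seek costs as much as 20 round trips inside the datacenter, so fetching data from another machine's memory can be cheaper than reading it from a local disk.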

Key Architectural Principles:

Leverage Compression: Using a cheap compression algorithm (like Zippy) to reduce data size before transmission saves valuable network bandwidth (often by a factor of 2) with minimal CPU overhead.
Design for Write Scaling: Writes are roughly 40 times more expensive than reads. Systems should be designed to optimize for low write contention and scale "wide" by parallelizing writes as much as possible.
Avoid Global Shared Data: Lock contention in heavily-written shared objects kills distributed system performance as transactions serialize.
Understand Trade-offs: Every design decision involves balancing performance (latency), cost, and complexity.

The Architect's Role: Risk Management and Communication

Top-tier architects are distinguished by their ability to manage risk and communicate effectively across technical and non-technical stakeholders.

Risk Mitigation: A primary value proposition of an architect is identifying and mitigating technical risks early in the design process.
Simplifying Complexity: Architects should strive for simple, understandable architectures. Over-engineering often leads to maintainability issues.
Visualization Techniques: Tools like the "Phantom Sketch Artist" method can be employed to visualize and clarify unclear requirements and system behaviors.
Bridging the Gap: The "Architect Elevator" metaphor highlights the need to connect low-level technical details with high-level business strategy.
Technical Disagreement Resolution: Architects facilitate constructive technical discussions, aiming for consensus rather than imposing solutions.
Maintaining Technical Acumen: While moving into architecture, it's crucial to stay technically grounded. This can involve deep dives, code reviews, and hands-on problem-solving.
Amplifying the Team: Great architects focus on making their teams smarter by fostering understanding and clarity, rather than acting as solitary decision-makers.

Case Study: Scaling Data at Meta

Managing data at massive scale involves specific architectural patterns:
* Infrastructure Components: Data storage and processing at scale rely heavily on distributed systems and data warehouses.
* Schematization: Meta historically leveraged bespoke schematization and practices to understand and manage data across its vast ecosystem.

2. Database Fundamentals & Administration

Understanding where data resides and how it is managed is crucial for both software development and systems administration.

Relational Databases: Relational databases store data in tables of rows and columns and are designed to retrieve it efficiently through declarative queries (SQL).
Importing & Exporting: Exporting and importing databases are common tasks shared by software developers and system administrators alike.
User Credentials & Permissions (MySQL/MariaDB):
- User credentials and permissions are stored within a dedicated system database named mysql.
- These credentials and permissions are not stored within individual application databases, making it critical to manage the system database separately during migrations.

Resource Management & Data Visibility

When resources do not appear as expected, a common culprit is an audience-based filtering mechanism.

filter_by_user Scope: Many applications implement scopes that filter resources based on the current user's role and associated audiences.
- Provider users: Typically see resources with audiences all_audiences or providers.
- SPC users: Typically see resources with audiences all_audiences or states.
- Admin/SAMHSA/TA users: May see resources with all_audiences, providers, or states.
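
The mapping above can be modelled outside Rails to reason about which resources a role should see. This is a plain-Ruby sketch, not the application's actual scope: the role symbols, `ResourceRow`, `visible_resources`, and `hidden_resources` are all hypothetical names, and the sample rows are made up.

```ruby
# Plain-Ruby model of the role-to-audience rules described above.
# Role symbols and helper names are assumptions for illustration.
VISIBLE_AUDIENCES = {
  provider: %w[all_audiences providers],
  spc:      %w[all_audiences states],
  admin:    %w[all_audiences providers states],
  samhsa:   %w[all_audiences providers states],
  ta:       %w[all_audiences providers states]
}.freeze

ResourceRow = Struct.new(:id, :title, :audience)

# Resources whose audience is allowed for the given role.
def visible_resources(resources, role)
  allowed = VISIBLE_AUDIENCES.fetch(role, [])
  resources.select { |r| allowed.include?(r.audience) }
end

# Resources that exist but are filtered out for the given role.
def hidden_resources(resources, role)
  allowed = VISIBLE_AUDIENCES.fetch(role, [])
  resources.reject { |r| allowed.include?(r.audience) }
end

sample = [
  ResourceRow.new(1, "Shared guide", "all_audiences"),
  ResourceRow.new(2, "Provider handbook", "providers"),
  ResourceRow.new(3, "State checklist", "states")
]

%i[provider spc admin].each do |role|
  visible = visible_resources(sample, role).map(&:id)
  hidden  = hidden_resources(sample, role).map(&:id)
  puts "#{role}: visible=#{visible.inspect} hidden=#{hidden.inspect}"
end
```

A resource that shows up in the hidden list for a role still exists; it is simply excluded by its audience value.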

Diagnosis for Missing Resources:

Check Counts: Compare the total number of resources (Resource.count) with the number of resources a specific user can see (Resource.filter_by_user(user.id).count). Discrepancies indicate filtering is occurring.
Inspect Audiences: List all resources with their audiences (Resource.all.map { |r| [r.id, r.title, r.audience] }) and compare this to the audiences visible to a specific user.
Direct Database Query: Use SQL to inspect the resources table and verify the presence of resources and their audience settings:
```sql
-- Check all resources in database
SELECT id, title, audience, resource_type, created_at, updated_at
FROM resources
ORDER BY created_at DESC;

-- Check if resources have files or URLs
SELECT id, title,
       CASE WHEN url IS NOT NULL AND url != '' THEN 'has_url' ELSE 'no_url' END AS url_status,
       (SELECT COUNT(*) FROM active_storage_attachments
        WHERE record_type = 'Resource' AND record_id = resources.id) AS file_count
FROM resources;
```

Resource Validation and Integrity

Beyond audience filtering, other factors can prevent resources from being saved or displayed correctly.

Model Validations: The Resource model often enforces strict validation rules:
- Must have either a file OR a url (but not both or neither).
- URLs must conform to a valid format.
- A title must be present.
- description has a maximum character limit (e.g., 390 characters).
URL Validation Issues: Custom or updated URL format validations might inadvertently reject valid URLs due to edge cases.
ActiveStorage File Issues: If an associated file in ActiveStorage is purged or deleted, and the resource lacks a valid URL, it may become invalid.
Database Constraints: Underlying database-level constraints could also be a source of validation failures.
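
The rules listed above can be expressed as a standalone check, which is handy for testing edge cases without loading the application. This is a plain-Ruby sketch rather than the model's real validation code: `resource_errors` and `valid_http_url?` are hypothetical helpers, the error messages are invented, and the URL check (http or https with a host) is an assumed definition of "valid format".

```ruby
require "uri"

MAX_DESCRIPTION_LENGTH = 390 # example limit quoted in the text above

# Assumed definition of a valid URL: parses as http(s) and has a host.
def valid_http_url?(value)
  uri = URI.parse(value)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

# Returns a list of human-readable problems; an empty list means valid.
def resource_errors(title:, url: nil, file_attached: false, description: "")
  errors = []
  has_url = !url.to_s.strip.empty?

  errors << "Title can't be blank" if title.to_s.strip.empty?
  errors << "Provide a file or a URL, not both" if has_url && file_attached
  errors << "Provide either a file or a URL" unless has_url || file_attached
  errors << "URL format is invalid" if has_url && !valid_http_url?(url)
  if description.to_s.length > MAX_DESCRIPTION_LENGTH
    errors << "Description is too long (maximum #{MAX_DESCRIPTION_LENGTH} characters)"
  end
  errors
end

puts resource_errors(title: "Guide", url: "https://example.com/guide").inspect
puts resource_errors(title: "", url: nil).inspect
puts resource_errors(title: "Guide", url: "not a url").inspect
```

Returning a list of messages instead of a boolean mirrors how `errors.full_messages` is used in the console checks on this page, so several failures on one record are reported together.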

Diagnosis for Validation Failures:

Inspect Model Errors: Iterate through all resources in the Rails console to check for validation errors:
```ruby
# In Rails console
Resource.all.each do |r|
  unless r.valid?
    puts "Resource #{r.id} (#{r.title}) is invalid: #{r.errors.full_messages}"
  end
end
```
Database Check for URLs/Files: Query the database to confirm that resources have either a url or an associated file in active_storage_attachments.

Resource Deletion/Missing Diagnosis

When resources are not appearing or seem to have been deleted unexpectedly, the diagnosis often involves checking filtering mechanisms, validation rules, and direct database integrity.

Most Likely Cause: `filter_by_user` Scope

The resources index commonly uses Resource.filter_by_user(current_user.id). This scope filters resources based on the user's role and associated audiences.

The filter_by_user scope in app/models/resource.rb defines the following audience mappings:

Provider users: See resources with audience all_audiences or providers.
SPC users: See resources with audience all_audiences or states.
Admin/SAMHSA/TA users: See resources with audience all_audiences, providers, or states.

If a resource's audience attribute does not match the criteria for the current user's role, it will not appear in the list, even if it still exists in the database.

Other Potential Causes for Missing or Unsaved Resources

Validation Failures: The Resource model enforces strict validations:
- A resource must have either a file OR a url (not both, not neither).
- URLs must conform to a valid format.
- A title is mandatory.
- description has a maximum character limit (e.g., 390 characters).
Resources failing these validations might not save properly or could become invalid after an update.
URL Validation Issues: Custom or recently updated URL format validations could inadvertently reject valid URLs if edge cases are not properly handled.
ActiveStorage File Issues: If a resource's attached file is purged or deleted from ActiveStorage, and the resource lacks a valid URL, it may become invalid and unrecoverable.
Database Constraints: Underlying database-level constraints could also prevent resource creation or updates.

Diagnostic Steps for Missing or Invalid Resources

Check if resources exist but are filtered out:
- In Rails console:
```ruby
# Check total number of resources
Resource.count

# List all resources with their IDs, titles, and audiences
Resource.all.map { |r| [r.id, r.title, r.audience] }

# List resources visible to a specific user
user = User.find(YOUR_USER_ID) # Replace YOUR_USER_ID with the actual user ID
Resource.filter_by_user(user.id).map { |r| [r.id, r.title, r.audience] }
```

Compare the counts and lists. Discrepancies indicate filtering is occurring.
Check for validation errors:
- In Rails console:
```ruby
# Find resources that might be invalid
Resource.all.each do |r|
  unless r.valid?
    puts "Resource #{r.id} (#{r.title}) is invalid: #{r.errors.full_messages}"
  end
end
```
Check database directly:
- SQL Query:
```sql
-- Check all resources in the database, including their audience
SELECT id, title, audience, resource_type, created_at, updated_at
FROM resources
ORDER BY created_at DESC;

-- Check if resources have associated files or URLs
SELECT id, title,
       CASE WHEN url IS NOT NULL AND url != '' THEN 'has_url' ELSE 'no_url' END AS url_status,
       (SELECT COUNT(*) FROM active_storage_attachments
        WHERE record_type = 'Resource' AND record_id = resources.id) AS file_count
FROM resources;
```

3. Database Performance & Consistency

Different database engines come with varying capabilities and performance characteristics that developers must navigate.

Consistency Guarantees: Database consistency guarantees vary significantly between different products. Developers must not assume uniform consistency models across different relational or non-relational databases.
Sort Order Optimization: Databases perform significantly better when data is stored and accessed using matching sort orders. For example, in PostgreSQL an index whose column order and sort direction match a query's ORDER BY clause lets the planner return rows in index order instead of sorting them after retrieval.

Case Study: Horizontal Scaling at YouTube (MySQL & Vitess)

When scaling relational databases to support billions of users, physical hardware limitations, connection overhead, and monolithic table structures inevitably become critical bottlenecks. YouTube resolved these challenges at scale by creating Vitess, an open-source database clustering middleware system for MySQL.

The Problem with Native MySQL at Scale:
- Connection Limits: MySQL uses a thread-per-connection model. A massive influx of concurrent application connections rapidly exhausts memory and degrades performance.
- Sharding Complexity: Natively, MySQL does not support transparent horizontal sharding, forcing developers to write complex application logic to route queries to the correct database shards.

Key Architectural Components of Vitess:
- VTGate (SQL Proxy): A lightweight proxy server that routes application queries to the appropriate database shards. It parses SQL queries, orchestrates distributed transactions, and ensures the application layer remains agnostic of the underlying sharding topology.
- VTTablet (Connection Manager): Runs alongside each MySQL instance to manage performance. It acts as an aggressive connection pooler, dynamically managing active connections to prevent MySQL from running out of resources.
- Transparent Sharding: Enables seamless splitting (and re-sharding) of MySQL tables across multiple physical instances without requiring modifications to the core application code.
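
To make the routing idea concrete, here is a minimal sketch of a proxy choosing a shard from a sharding key, so that application code never names a shard itself. It is not Vitess's implementation or API: `ShardRouter`, the shard names, and the hash-modulo scheme are all assumptions chosen for brevity.

```ruby
require "digest"

# Toy query router: maps a sharding key to one of several shards.
# Hypothetical illustration only; not how VTGate is implemented.
class ShardRouter
  def initialize(shard_names)
    raise ArgumentError, "at least one shard is required" if shard_names.empty?

    @shard_names = shard_names.dup.freeze
  end

  # Hash the key so that rows spread evenly and the same key
  # always lands on the same shard.
  def shard_for(sharding_key)
    digest = Digest::MD5.hexdigest(sharding_key.to_s)
    @shard_names[digest.to_i(16) % @shard_names.size]
  end

  # Describe where a statement would be sent, without executing it.
  def route(sharding_key, sql)
    { shard: shard_for(sharding_key), sql: sql }
  end
end

router = ShardRouter.new(%w[shard_a shard_b shard_c shard_d])

[101, 102, 103].each do |user_id|
  plan = router.route(user_id, "SELECT * FROM videos WHERE user_id = #{user_id}")
  puts "user #{user_id} -> #{plan[:shard]}"
end
```

A plain modulo scheme like this one remaps most keys when the shard count changes, which is one reason production systems tend to use range-based or consistent-hashing schemes to support re-sharding.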
