<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Database on Vivek's Field Notes</title><link>https://heyyviv.github.io/tags/database/</link><description>Recent content in Database on Vivek's Field Notes</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 28 Feb 2026 20:16:56 +0530</lastBuildDate><atom:link href="https://heyyviv.github.io/tags/database/index.xml" rel="self" type="application/rss+xml"/><item><title>Scaling Databases with Sharding</title><link>https://heyyviv.github.io/blog/scaling-databases-with-sharding/</link><pubDate>Sat, 28 Feb 2026 20:16:56 +0530</pubDate><guid>https://heyyviv.github.io/blog/scaling-databases-with-sharding/</guid><description>&lt;h2 id="introduction-to-sharding">Introduction to Sharding&lt;/h2>
&lt;p>Sharding is the process of scaling a database by spreading data across multiple servers, or &lt;strong>shards&lt;/strong>. It is the go-to solution for large organizations managing data at a petabyte scale. Industry leaders like Uber, Shopify, Slack, and OpenAI all leverage sharding to manage their massive datasets.&lt;/p>
&lt;p>In a typical small-scale application, one or more app servers connect to a single, monolithic database. This server stores all persistent data, from user accounts to application state. As data grows, however, that single server becomes both a performance bottleneck and a single point of failure, and it must be addressed.&lt;/p>
&lt;h2 id="sharded-architecture">Sharded Architecture&lt;/h2>
&lt;p>In a sharded setup, we divide the total data into portions, each hosted on a separate database server.&lt;/p>
&lt;p>Initially, your application code might try to manage these shards directly—keeping track of which row lives where and maintaining multiple open connections. While manageable with two shards, this approach becomes a maintenance nightmare when dealing with hundreds.&lt;/p>
&lt;h3 id="the-proxy-layer">The Proxy Layer&lt;/h3>
&lt;p>A more robust solution is to use an &lt;strong>intermediary proxy&lt;/strong>. Application servers connect only to this proxy, which then routes queries to the correct shard.&lt;/p>
&lt;p>However, proxies introduce their own challenges:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Throughput Limits:&lt;/strong> If a proxy reaches its capacity, queries are queued, adding latency.&lt;/li>
&lt;li>&lt;strong>Scalability:&lt;/strong> To handle high volumes, you must deploy multiple proxy servers to prevent them from becoming the bottleneck.&lt;/li>
&lt;/ul>
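&lt;p>The routing role described above can be sketched in a few lines of Python (all names here are hypothetical; plain dicts stand in for real per-shard database connections):&lt;/p>

```python
# Minimal sketch of a shard-routing proxy. Hypothetical names; real
# proxies such as Vitess are far more involved. Plain dicts stand in
# for the per-shard database connections.

class ShardProxy:
    def __init__(self, shards):
        # shards: mapping of shard name to a "connection" (here, a dict)
        self.shards = list(shards.items())

    def route(self, shard_key):
        # Deterministically pick a shard from the key, so every query
        # for the same key lands on the same backing server.
        index = hash(shard_key) % len(self.shards)
        return self.shards[index]

    def put(self, shard_key, value):
        name, store = self.route(shard_key)
        store[shard_key] = value
        return name

    def get(self, shard_key):
        _, store = self.route(shard_key)
        return store.get(shard_key)


proxy = ShardProxy({"shard_a": {}, "shard_b": {}, "shard_c": {}})
placed_on = proxy.put(42, {"user": "vivek"})
# The same key always routes to the same shard, so the read finds it.
assert proxy.get(42) == {"user": "vivek"}
```

&lt;p>The application sees one connection target; the fan-out to hundreds of shards is hidden behind &lt;code>route&lt;/code>.&lt;/p>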
&lt;h2 id="sharding-strategies">Sharding Strategies&lt;/h2>
&lt;p>The sharding strategy—the rules determining data placement—is critical for performance and balance. This usually involves a &lt;strong>shard key&lt;/strong>: the column(s) used to route data.&lt;/p>
&lt;h3 id="1-range-sharding">1. Range Sharding&lt;/h3>
&lt;p>Data is routed based on predefined ranges of values. For example, IDs 1-25 might go to Shard A, 26-50 to Shard B, and so on.&lt;/p>
&lt;blockquote>
&lt;p>[!WARNING]
Naive range-based sharding with monotonically increasing IDs often leads to &lt;strong>&amp;ldquo;Hot Shards&amp;rdquo;&lt;/strong>. If you insert IDs 1 to 25 sequentially, only the first shard is active while others remain idle.&lt;/p>
&lt;/blockquote>
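&lt;p>A minimal sketch of range routing, with hypothetical shard boundaries, makes the hot-shard problem easy to see:&lt;/p>

```python
# Range-sharding sketch (boundaries are illustrative). Each shard owns
# a contiguous ID range; `in range(...)` membership checks are O(1).

RANGES = {
    "shard_a": range(1, 26),    # IDs 1-25
    "shard_b": range(26, 51),   # IDs 26-50
    "shard_c": range(51, 76),   # IDs 51-75
}

def shard_for(row_id):
    for name, id_range in RANGES.items():
        if row_id in id_range:
            return name
    raise ValueError(f"no shard owns id {row_id}")

# Sequential inserts demonstrate the hot-shard problem: the first 25
# writes all land on shard_a while the other shards sit idle.
hits = [shard_for(i) for i in range(1, 26)]
assert set(hits) == {"shard_a"}
```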
&lt;h3 id="2-hash-sharding">2. Hash Sharding&lt;/h3>
&lt;p>The proxy computes a deterministic hash of the shard key for each row (a fast non-cryptographic hash is typical here, since collision resistance is not the goal). Each shard is then responsible for a specific range of hash values.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Best Practice:&lt;/strong> Choose a key with &lt;strong>high cardinality&lt;/strong> (e.g., &lt;code>user_id&lt;/code>).&lt;/li>
&lt;li>&lt;strong>Avoid:&lt;/strong> Columns like &lt;code>name&lt;/code>, where popular values can still create hotspots despite hashing.&lt;/li>
&lt;li>&lt;strong>Optimization:&lt;/strong> Hashing fixed-size integers (&lt;code>user_id&lt;/code>) is generally faster than hashing variable-width strings.&lt;/li>
&lt;/ul>
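&lt;p>A hash-routing sketch, assuming simple modulo placement (the choice of &lt;code>blake2b&lt;/code> is illustrative, not a production recommendation):&lt;/p>

```python
# Hash-sharding sketch. A deterministic hash of the shard key is mapped
# onto a fixed number of shards via modulo.
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    digest = hashlib.blake2b(str(user_id).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_SHARDS

# Unlike naive range sharding, sequential IDs no longer cluster on a
# single shard: hashing spreads them across the keyspace.
placements = {shard_for(i) for i in range(1, 101)}
assert len(placements) > 1
```

&lt;p>Note that plain modulo placement forces most keys to move when &lt;code>NUM_SHARDS&lt;/code> changes; production systems typically map hash ranges to shards (or use consistent hashing) so that resharding moves less data.&lt;/p>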
&lt;h3 id="3-lookup-sharding">3. Lookup Sharding&lt;/h3>
&lt;p>A separate mapping table tracks exactly which data belongs on which shard. This offers maximum flexibility but requires an additional lookup for every query.&lt;/p>
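&lt;p>A lookup-table sketch (hypothetical keys and shard names; in production the mapping table would live in its own highly available store, not an in-process dict):&lt;/p>

```python
# Lookup-sharding sketch. A mapping table records exactly which shard
# owns each key; every query pays one extra lookup, but any key can be
# moved to any shard just by updating the table.

lookup_table = {
    "user_1001": "shard_a",
    "user_1002": "shard_c",
    "user_1003": "shard_a",
}

def shard_for(key, default_shard="shard_b"):
    return lookup_table.get(key, default_shard)

assert shard_for("user_1002") == "shard_c"

# Rebalancing is a metadata update, not a re-hash of the whole keyspace.
lookup_table["user_1002"] = "shard_b"
assert shard_for("user_1002") == "shard_b"
```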
&lt;hr>
&lt;h2 id="real-world-case-study-postgresql-and-chatgpt">Real-World Case Study: PostgreSQL and ChatGPT&lt;/h2>
&lt;p>While sharding solves many scale problems, specific database architectures like PostgreSQL&amp;rsquo;s &lt;strong>MVCC (Multiversion Concurrency Control)&lt;/strong> introduce unique write penalties that companies like OpenAI have had to navigate.&lt;/p>
&lt;h3 id="the-copy-on-write-penalty">The &amp;ldquo;Copy-on-Write&amp;rdquo; Penalty&lt;/h3>
&lt;p>In Postgres, updates are not performed &amp;ldquo;in-place.&amp;rdquo; Updating even one byte results in &lt;strong>Write Amplification&lt;/strong>, where the entire row is copied to create a new version. This strains I/O and leads to &lt;strong>Read Amplification&lt;/strong>, as queries must scan through &amp;ldquo;dead&amp;rdquo; versions (old rows) to find live ones.&lt;/p>
&lt;h3 id="the-bloat-problem">The &amp;ldquo;Bloat&amp;rdquo; Problem&lt;/h3>
&lt;p>Old row versions (Dead Tuples) don&amp;rsquo;t disappear instantly, leading to table bloat and increased &lt;code>autovacuum&lt;/code> overhead. If writes outpace reclamation, performance collapses. Every update also requires updating all indexes to point to the new physical row location, adding CPU stress.&lt;/p>
&lt;h3 id="strategies-from-the-openai-engineering-team">Strategies from the OpenAI Engineering Team&lt;/h3>
&lt;p>To ensure services like ChatGPT and their API remain responsive during massive write spikes, several strategies are employed:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Minimizing Primary Load:&lt;/strong> Read traffic is offloaded to replicas whenever possible. Queries that must remain on the primary (e.g., those part of write transactions) are strictly optimized for efficiency.&lt;/li>
&lt;li>&lt;strong>Selective Migration:&lt;/strong> Shardable, write-heavy workloads are migrated to systems like &lt;strong>Azure CosmosDB&lt;/strong>.&lt;/li>
&lt;li>&lt;strong>Application-Level Optimizations:&lt;/strong> Redundant writes are eliminated, and &amp;ldquo;lazy writes&amp;rdquo; are introduced to smooth out traffic spikes.&lt;/li>
&lt;li>&lt;strong>Rate Limiting:&lt;/strong> Strict limits are enforced during background tasks, such as backfilling table fields, to prevent excessive write pressure.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="optimization--best-practices">Optimization &amp;amp; Best Practices&lt;/h2>
&lt;h3 id="query-optimization">Query Optimization&lt;/h3>
&lt;p>Avoid &amp;ldquo;OLTP anti-patterns&amp;rdquo; that can degrade services:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Simplify Joins:&lt;/strong> A query joining 12 tables (as seen in some historical ChatGPT SEVs) can crash a service during a spike. Move complex join logic to the application layer.&lt;/li>
&lt;li>&lt;strong>ORM Awareness:&lt;/strong> Object-Relational Mapping tools can generate inefficient SQL; always review the output.&lt;/li>
&lt;li>&lt;strong>Timeout Management:&lt;/strong> Configure &lt;code>idle_in_transaction_session_timeout&lt;/code> to prevent idle queries from blocking critical processes like autovacuum.&lt;/li>
&lt;/ul>
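&lt;p>For reference, this timeout can be set per session or cluster-wide in PostgreSQL; the values below are illustrative only and should be tuned for your workload:&lt;/p>

```sql
-- Per session:
SET idle_in_transaction_session_timeout = '30s';

-- Or cluster-wide via ALTER SYSTEM (persisted to postgresql.auto.conf):
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30s';
SELECT pg_reload_conf();
```

&lt;p>Sessions that sit idle inside an open transaction longer than this are terminated, releasing the locks and the transaction-ID horizon that would otherwise stall autovacuum.&lt;/p>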
&lt;h3 id="cross-shard-penalties">Cross-Shard Penalties&lt;/h3>
&lt;p>Queries spanning multiple shards add excessive network and CPU overhead. Aim for single-shard queries whenever possible. Additionally, avoid shard keys that change frequently, as moving rows between shards to maintain strategy integrity is expensive.&lt;/p>
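&lt;p>A toy scatter-gather sketch (stand-in dicts instead of real shards) shows why fan-out cost grows with shard count:&lt;/p>

```python
# Sketch of why cross-shard ("scatter-gather") queries are costly: the
# proxy must fan out to every shard and merge results, paying one round
# trip per shard instead of one total. Dicts stand in for shard stores.

shards = {
    "shard_a": {1: "alice", 4: "dave"},
    "shard_b": {2: "bob"},
    "shard_c": {3: "carol"},
}

def single_shard_get(shard_name, key):
    # One network hop in a real deployment.
    return shards[shard_name].get(key), 1  # (result, shards touched)

def scatter_gather(predicate):
    # Touches every shard; network and CPU cost grows with shard count.
    results = []
    for store in shards.values():
        results.extend(v for v in store.values() if predicate(v))
    return results, len(shards)

_, touched = single_shard_get("shard_b", 2)
assert touched == 1
names, touched = scatter_gather(lambda name: name.startswith("a"))
assert touched == 3 and names == ["alice"]
```

&lt;p>Queries that include the shard key take the cheap path; queries that filter on anything else must take the expensive one, which is why shard-key choice shapes your whole query surface.&lt;/p>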
&lt;h2 id="infrastructure--latency">Infrastructure &amp;amp; Latency&lt;/h2>
&lt;p>Adding a proxy introduces a network hop, typically adding ~1ms of latency.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Server Proximity:&lt;/strong> If proxies and shards are in the same data center, this latency is negligible.&lt;/li>
&lt;li>&lt;strong>Proven Success:&lt;/strong> Slack uses Vitess to manage massive sharded clusters with an average query latency of just &lt;strong>2ms&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;h2 id="high-availability">High Availability&lt;/h2>
&lt;p>Replicas aren&amp;rsquo;t just for reads; they are your safety net. If a primary fails, traffic can be failed over to a promoted replica in seconds, preventing hours of downtime.&lt;/p>