What is Sharding?
Sharding is a strategy for partitioning a single data set across multiple databases, which can then be stored on multiple machines. This allows a large data set to be split into smaller chunks (shards) and stored on multiple data nodes, increasing the overall storage capacity of the system. Because the data and the query load are distributed across multiple machines, a sharded database can also handle more queries than a single machine.
Sharding is a form of horizontal scaling (scaling out), a technique in which more nodes are added to distribute the load. Horizontal scaling lets capacity keep growing with the data and the workload, well beyond what a single machine can hold. Vertical scaling (scaling up), on the other hand, means increasing the capacity of an individual machine or server by adding a more powerful processor, more RAM, or more storage.
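To make the idea of splitting one data set across several nodes concrete, here is a minimal sketch of hash-based sharding. It is an illustration only: the names `shard_for` and `ShardedStore` are hypothetical, each in-memory dict stands in for a database on its own machine, and real systems may shard by range or by a lookup table instead of a hash.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash."""
    # hashlib gives the same result on every run and every machine,
    # unlike the built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database node."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        # The key alone decides which shard receives the write.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        # The same rule finds the shard again on read.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

print([len(shard) for shard in store.shards])  # four sizes that sum to 1000
print(store.get("user:42"))                    # {'id': 42}
```

One design note: with plain modulo hashing, changing the number of shards remaps most keys, which is why production systems often use consistent hashing or an explicit shard map.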

Sharding Advantages:
Sharding lets your database handle massive amounts of data by spreading it out across multiple machines/databases. This gives you several benefits:
- Faster Reads and Writes: Splitting the data means each server handles a smaller chunk, so a query that targets a single shard gets quicker access to information, whether you’re retrieving existing data (reads) or adding new data (writes).
- Scale Like a Boss: Need more storage space? Just add more servers! Sharding lets you easily expand your database capacity as your data grows.
- Always Available (Almost): Even if a single server fails, your database stays up and running. This is because each shard is typically replicated, meaning the data is stored on multiple servers. Plus, other shards with different parts of the data are still accessible.
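The availability point can be sketched in a few lines. This is a simplified illustration, assuming each shard keeps full copies on several servers; `Replica` and `ReplicatedShard` are hypothetical names, and the `up` flag merely simulates whether a server is reachable.

```python
class Replica:
    """One copy of a shard's data on one server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True   # simulated health flag
        self.data = {}


class ReplicatedShard:
    """A shard whose data is stored on several replicas."""

    def __init__(self, replicas: list) -> None:
        self.replicas = replicas

    def write(self, key: str, value: object) -> None:
        # Copy the write to every reachable replica.
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value

    def read(self, key: str) -> object:
        # Serve the read from the first reachable replica.
        for replica in self.replicas:
            if replica.up:
                return replica.data.get(key)
        raise RuntimeError("all replicas of this shard are down")


shard = ReplicatedShard([Replica("a1"), Replica("a2")])
shard.write("user:1", "Alice")
shard.replicas[0].up = False     # simulate a server failure
print(shard.read("user:1"))      # Alice, served from replica a2
```

In this sketch a replica that is down simply misses writes; a real system also has to bring it back in sync when it recovers.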
Sharding Drawbacks:
Sharding offers impressive scalability, but it’s not without downsides. Consider these before you scale:
- Slower Queries: Every query first has to be routed to the right shard, which adds a step. Queries that involve multiple servers are hit hardest, because they must wait on several shards and then combine the results.
- Increased Admin Complexity: Managing a sharded system is more intricate. You’ll need to maintain individual shards and the software that routes queries to the correct shard. Replication for data safety adds another layer of complexity, requiring consistent updates across all copies.
- Higher Infrastructure Costs: Sharding necessitates additional servers, leading to increased expenses. While it allows for more data storage, each added instance raises your infrastructure costs. Tuning a sharded system for good performance can also be resource-intensive.
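The query cost described above shows up clearly when a lookup by key is compared with a query on some other field. The sketch below is illustrative only: the order data, `get_order`, and `orders_over` are hypothetical, and plain dicts stand in for separate database servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash of the key, reduced to a shard index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Three shards, each a dict standing in for one database server.
shards = [{} for _ in range(3)]
for i in range(30):
    key = f"order:{i}"
    shards[shard_for(key, len(shards))][key] = {"id": i, "total": i * 10}


def get_order(key: str):
    """Point lookup: the key identifies the shard, so one shard is queried."""
    return shards[shard_for(key, len(shards))].get(key)


def orders_over(min_total: int) -> list:
    """Filter on a non-key field: ask every shard, then merge the results."""
    partial_results = []
    for shard in shards:  # in a real system, one network call per shard
        partial_results.append(
            [order for order in shard.values() if order["total"] > min_total]
        )
    merged = [order for part in partial_results for order in part]
    return sorted(merged, key=lambda order: order["id"])


print(get_order("order:5"))                         # {'id': 5, 'total': 50}
print([order["id"] for order in orders_over(250)])  # [26, 27, 28, 29]
```

The second query is slower in practice because it is only as fast as the slowest shard, and the merge and sort happen after all shards have answered.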
Deepak Gogoi