Lessons learned from Downsizing Mongo 3.6 by removing Shards

Objective:
Remove 4 of the 12 shards from a production sharded MongoDB cluster without impacting production traffic, within a reasonable timeline.
- Why? Certain datasets were moved to other datastores such as Cassandra, which reduced the traffic and disk usage on the MongoDB cluster.
- How? https://docs.mongodb.com/manual/tutorial/remove-shards-from-cluster/
Steps in removing a shard:
- Chunk migration/draining: drain the chunks on the shard to the other shards (the chunks are migrated by the balancer, so it must be enabled).
db.adminCommand( { removeShard : "rs1" } );
- MovePrimary: if the draining shard is the primary for some databases, those databases must be moved to a different shard. (We chose shards for removal that are not primary for any database.)
db.adminCommand( { movePrimary : "fizz", to: "buzz"} );
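The draining step is usually driven by calling removeShard repeatedly and checking the reported state. Below is a minimal Python sketch of that polling loop, not the exact tool we used. It assumes `admin_command` is a callable such as PyMongo's `client.admin.command`, and that the response carries the `state`, `remaining` and `dbsToMove` fields described in the MongoDB documentation; the function name `drain_shard` and the poll interval are made up for illustration.

```python
import time


def drain_shard(admin_command, shard, poll_seconds=60, sleep=time.sleep):
    """Poll removeShard until the shard reports the 'completed' state.

    admin_command is assumed to behave like PyMongo's client.admin.command:
    it takes the command name and its value and returns the response dict.
    """
    while True:
        status = admin_command("removeShard", shard)
        if status.get("state") == "completed":
            return status
        # While draining, the response reports how much work is left.
        remaining = status.get("remaining", {})
        # Databases that still need movePrimary (empty for the shards we chose).
        dbs_to_move = status.get("dbsToMove", [])
        print(
            f"{shard}: state={status.get('state')} "
            f"chunks_left={remaining.get('chunks')} dbs_to_move={dbs_to_move}"
        )
        sleep(poll_seconds)
```

With a real cluster this could be called as `drain_shard(client.admin.command, "rs1")`, where `client` is a `MongoClient` connected to a mongos.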
Issues with the default way of removing a shard:
- On our version, only one shard can be drained at a time. From the MongoDB documentation: "Starting in MongoDB 4.4 (and 4.2.1), you can have more than one removeShard operation in progress. In MongoDB 4.2.0 and earlier, removeShard returns an error if another removeShard operation is in progress." (ref)
- Removing one shard at a time causes unnecessary chunk movement onto unwanted shards (the other shards that are due to be removed), which extends the timeline even further.
- Another non-trivial point about chunk migration is that a cluster with n shards can run at most n/2 chunk migrations in parallel, because each shard can participate in only one chunk migration at a time.
To get better control over chunk migration and its concurrency, we opted for a custom tool written in Python.
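To make the n/2 limit concrete, here is a small sketch of the rule a scheduler has to respect: a batch of migrations can run in parallel only if no shard appears in it more than once. The helper names are hypothetical and are not taken from our tool.

```python
def max_parallel_migrations(shard_count):
    """Each migration occupies two shards, so n shards allow at most n // 2."""
    return shard_count // 2


def is_valid_batch(migrations):
    """Check that no shard takes part in more than one migration.

    migrations is a list of (from_shard, to_shard) pairs.
    """
    busy = set()
    for from_shard, to_shard in migrations:
        if from_shard == to_shard or from_shard in busy or to_shard in busy:
            return False
        busy.update((from_shard, to_shard))
    return True
```

For a 12-shard cluster `max_parallel_migrations(12)` gives 6, and a batch such as `[(7, 0), (8, 0)]` is rejected because shard 0 would receive two chunks at once.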
Chunk Migration :
We have 12 shards (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11).
Out of these we decided to drain (7, 8, 9, 11) as they are not primary for any database.
Each fromShard (7, 8, 9, 11) is mapped to a couple of toShards as below.

With this approach, no shard is shared between any two migrations, and we achieve as much parallelism as possible (4x). For each fromShard, the chunks are load balanced across its toShards. This preserves the evenness of the distribution.
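The planning idea can be sketched in a few lines of Python. This is an illustration under assumptions, not our actual tool: the concrete pairing it produces is arbitrary (it just slices the list of remaining shards) and may differ from the mapping we used, and round-robin is only one simple way to load balance chunks across the toShards.

```python
from itertools import cycle


def assign_targets(from_shards, remaining_shards, per_source=2):
    """Give every draining shard its own, non-overlapping group of toShards."""
    if len(from_shards) * per_source > len(remaining_shards):
        raise ValueError("not enough remaining shards for disjoint groups")
    return {
        src: remaining_shards[i * per_source:(i + 1) * per_source]
        for i, src in enumerate(from_shards)
    }


def spread_chunks(chunks, to_shards):
    """Alternate chunks across the toShards so each receives an even share."""
    targets = cycle(to_shards)
    return [(chunk, next(targets)) for chunk in chunks]


# Shards 7, 8, 9 and 11 are drained; the other eight shards stay.
plan = assign_targets([7, 8, 9, 11], [0, 1, 2, 3, 4, 5, 6, 10])
```

Because the groups are disjoint, the four fromShards can migrate at the same time without ever competing for a toShard.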
Issues during custom chunk migration:
1. Jumbo chunks ("Cannot move chunk" exception):
So, what is a “jumbo chunk”? It is a chunk whose size exceeds the maximum amount specified in the chunk size configuration parameter (which has a default value of 64 MB). When the value is greater than the limit, the balancer won’t move it. — https://www.percona.com/blog/2020/06/11/dealing-with-jumbo-chunks-in-mongodb/
Solution: Split the chunk and migrate.
2. Jumbo chunks that cannot be split (the chunk holds a single value in the hashed key range):
Solution: Delete and reinsert the documents.
3. Splits cannot run in parallel within a collection (a collection-level lock is acquired to split a chunk in that collection):
Solution: Retry.
4. Due to the shard removal, expect a rise in load on the remaining nodes.
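The handling of the jumbo-chunk and parallel-split issues above can be sketched as follows. This is a simplified illustration, not our production code. It assumes `admin_command` behaves like PyMongo's `client.admin.command`, uses the `moveChunk` and `split` admin commands with their `bounds` form, and detects a jumbo chunk by looking for the text "Cannot move chunk" in the error, which is an assumption about the message. With PyMongo, `retry_on` would typically be `pymongo.errors.OperationFailure`.

```python
import time


def split_with_retry(admin_command, ns, bounds, attempts=5, wait_seconds=5,
                     sleep=time.sleep, retry_on=Exception):
    """Split a chunk at its median, retrying while another split holds the lock."""
    for attempt in range(1, attempts + 1):
        try:
            return admin_command("split", ns, bounds=bounds)
        except retry_on:
            if attempt == attempts:
                raise
            sleep(wait_seconds)


def move_or_split(admin_command, ns, bounds, to_shard, **split_options):
    """Move a chunk; if it is reported as too big to move, split it instead.

    Returns "moved" or "split". After a split the caller should re-read the
    chunk ranges from config.chunks and schedule the smaller chunks again.
    """
    try:
        admin_command("moveChunk", ns, bounds=bounds, to=to_shard)
        return "moved"
    except Exception as exc:
        # Assumption: a jumbo chunk shows up as a "Cannot move chunk" error.
        if "Cannot move chunk" not in str(exc):
            raise
    split_with_retry(admin_command, ns, bounds, **split_options)
    return "split"
```

A chunk that still cannot be split after this (issue 2) is not handled here and has to be dealt with separately by deleting and reinserting its documents.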
References:
- MongoDB draining shard but balancer not running? (removeShard taking too much time)
- https://dba.stackexchange.com/questions/203383/how-to-speed-up-mongodb-chunk-moving-between-shards
- Stop draining shard in mongodb
- https://docs.mongodb.com/manual/tutorial/remove-shards-from-cluster/
- https://jira.mongodb.org/browse/SERVER-11328