Lessons learned from Downsizing Mongo 3.6 by removing Shards

Saikumar Chintada
3 min readFeb 22, 2022

Objective:

Remove 4 out of 12 shards from production sharded mongo cluster without impacting production traffic within reasonable timeline.

Steps in Removing a Shard :

  1. ChunkMigration/Draining: Drain the chunks present in the shard to other shards (chunk migration by enabling balancer)
    db.adminCommand( { removeShard : "rs1" } );
  2. MovePrimary : If draining shard is primary for some databases, they should be moved to a different shard. (We choose shards for removal which are not primary for any dbs.)
    db.adminCommand( { movePrimary : "fizz", to: "buzz"} );

Issues with default way of removing shard:

  • Starting in MongoDB 4.4 (and 4.2.1), you can have more than one removeShard operation in progress. In MongoDB 4.2.0 and earlier, removeShard returns an error if another removeShard operation is in progress. (ref)
  • Removing one-shard-at-a -time implicates unnecessary chunk movement to unwanted shards (to other to-be removed shards), extending the timeline even more.
  • Another non-trivial point about chunk migration is that there can be at-most n/2 parallel chunks migrating across the cluster. Single node should participate in single chunk migration.

To have a better control over chunk migration and the concurrency, custom tool (in python) is opted.

Chunk Migration :

We have 12 shards (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11).
Out of these we decided to drain (7, 8, 9, 11) as they are not primary for any database.

Each of fromShard (7, 8, 9, 11) is mapped to a couple of toShards as below.

With this approach, we have avoided sharing any shards between any two migrations and achieved as much parallelism as possible (4x). For each fromShard the chunks are load balanced to each of toShards. This would preserve the evenness in the distribution.

Chunk migration script

Issues during Custom Chunk migration :

  1. JumboChunks Issue: (Cannot move chunk Exception)

So, what is a “jumbo chunk”? It is a chunk whose size exceeds the maximum amount specified in the chunk size configuration parameter (which has a default value of 64 MB). When the value is greater than the limit, the balancer won’t move it. — https://www.percona.com/blog/2020/06/11/dealing-with-jumbo-chunks-in-mongodb/
Solution: Split the chunk and migrate.

2. Jumbo chunks that are not splittable: (single value in hash key range).
Solution: Delete and reinsert the documents.

3. Cannot have parallel splits within a collection : (Lock is acquired at collection level for splitting a chunk within the collection.)
Solution: Retry.

4. Due to share removal, expect rise in load on remaining nodes.

References:

--

--