Streaming MongoExport to Blob Store (S3)

Use case: Take a one-time mongoexport of a 1 TB sharded dataset from a MongoDB cluster and upload it to Azure Blob Storage / S3.
Challenges:
- CPU: heavy CPU consumption
- Memory: moderate
- Disk: heavy IOPS consumption, plus provisioning a large disk
Solutions:
- Simplest solution: take the mongoexport dump onto disk and then upload it to Azure Blob Storage.
Problem: needs > 1 TB of disk space and far more IOPS. Slow.
Improvement: parallelise the mongoexport process per shard. Better still, stream the output of each shard's mongoexport directly to the blob store (one upload per shard); a parallel launcher sketch follows the command below.
mongoexport --host=$ip --db=$db --collection=$collection | python tap.py $db.$collection.$shard $location
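
The write-up does not show how the per-shard pipelines were launched; one possible driver, with placeholder shard hosts, database, collection and destination, is sketched here:

# launch_exports.py - hedged sketch: one mongoexport | tap.py pipeline per shard
import subprocess

SHARD_HOSTS = {                      # placeholder values, one entry per shard
    "shard0": "shard0-host:27018",
    "shard1": "shard1-host:27018",
}
DB, COLLECTION = "mydb", "events"    # placeholders
LOCATION = "s3://my-bucket/exports"  # placeholder destination prefix

procs = []
for shard, host in SHARD_HOSTS.items():
    cmd = (
        f"mongoexport --host={host} --db={DB} --collection={COLLECTION} "
        f"| python tap.py {DB}.{COLLECTION}.{shard} {LOCATION}"
    )
    procs.append(subprocess.Popen(cmd, shell=True))    # run all shards concurrently

exit_codes = [p.wait() for p in procs]                 # wait for every pipeline to finish
print("pipeline exit codes:", exit_codes)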

tap.py:
A Python script that reads the stdin stream and writes it to the S3/blob object without consuming a significant amount of resources (disk/memory). Library used: https://github.com/RaRe-Technologies/smart_open
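
The original tap.py is not included here; a minimal sketch of such a script, assuming its two arguments are an object name and a destination prefix (e.g. an s3:// URI; smart_open also handles Azure Blob Storage) and that credentials come from the environment:

# tap.py - hedged sketch, not the author's original script
import sys
from smart_open import open as so_open  # pip install smart_open[s3]

CHUNK = 4 * 1024 * 1024  # read stdin in 4 MiB chunks to keep memory flat

def main():
    name, location = sys.argv[1], sys.argv[2]       # e.g. mydb.events.shard0  s3://bucket/exports
    uri = f"{location.rstrip('/')}/{name}.json"     # destination key layout is an assumption
    with so_open(uri, "wb") as out:                 # smart_open streams the upload (multipart on S3)
        while True:
            chunk = sys.stdin.buffer.read(CHUNK)
            if not chunk:
                break
            out.write(chunk)

if __name__ == "__main__":
    main()

Because the upload is streamed, only a small buffer is ever held in memory and nothing is written to local disk, which matches the low disk/RAM footprint reported below.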

Conclusion:
This approach initially hit a problem: the CPU was not sufficient and the complete dataset was not ingested into the blob store. Over-provisioning the machine solved it.
With 12 shards and each mongoexport (one per shard) spawning 4 processes, a total of 48 processes were spawned, so I used a 48-core machine to stream the 1 TB collection from MongoDB to Azure Blob Storage.
Instance used: 48 cores, 96 GB memory
CPU utilised: 30%–33%
RAM utilised: 7%–10%
Total time: 4–5 hrs