Primary motivation: https://techblog.citystoragesystems.com/p/ml-infrastructure-doesnt-have-to

Idea

We have pretty much been relying on a narrow set of tools when it comes to MLOps. For example, we have only looked at ZenML as our orchestrator, Terraform for IaC, etc.

The idea here is to branch out and explore tools that complement the weaknesses of the tools above. For example, we might learn that ZenML is not well suited to training at large data scale (100GB of data). What do we do in that case? What is the best tool to address this need? Have we explored it? Can we, as MLOps experts, confidently tell our clients that we are applying the best solution we know of? Is working with raw Kubernetes a recommended approach, or would Argo be a better alternative?

Business Case

We can confidently recommend the set of tools that works best for each client. Not all clients require the same tools, although in most cases the same set might work.

Tangible Outputs

Document discussing strengths/weaknesses of our current choices
Project exploring new tools
Technical blog
LinkedIn posts