It is an accepted practice in the machine learning (ML) community to train multiple deep learning models and perform numerous ablation studies in order to find the best model for a certain task.
To the best of my knowledge, the workflow for training and experimenting with multiple deep-learning models is rarely discussed. Look at most open-source ML/AI papers and their accompanying GitHub repositories, and you will usually see only the production-ready version of the model. In this session, I will share our experience in tailoring "software version control" to maintain multiple deep learning models. Key topics include configuring network models to account for different network architectures, hyperparameter settings, data preparation and pre-processing steps, and different hardware configurations for training (a sketch of this configuration-driven approach follows below). Version control practices also benefit deep learning research: they let us track the performance of models more efficiently and train multiple models across environments (cloud and local PC clusters) at once.
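To make the configuration idea concrete, here is a minimal sketch, not the exact setup from the talk, of how a single YAML file can drive the architecture, hyperparameters, and training hardware; every name and value below (hidden sizes, dropout, learning rate, device) is purely illustrative.

```python
# Minimal sketch: one version-controlled YAML config per experiment drives
# the model architecture, hyperparameters, and training device.
# All keys and values here are hypothetical, not the talk's actual config.
import yaml
import torch
import torch.nn as nn

CONFIG = """
model:
  hidden_sizes: [128, 64]   # architecture knob
  dropout: 0.2
training:
  lr: 0.001
  batch_size: 32
  device: cuda              # falls back to CPU below if unavailable
"""

cfg = yaml.safe_load(CONFIG)

# Build the network from the config instead of hard-coding the architecture.
layers, in_features = [], 784
for h in cfg["model"]["hidden_sizes"]:
    layers += [nn.Linear(in_features, h), nn.ReLU(),
               nn.Dropout(cfg["model"]["dropout"])]
    in_features = h
layers.append(nn.Linear(in_features, 10))
model = nn.Sequential(*layers)

# Hardware selection also lives in the config, so the same file can be used
# on a cloud GPU node or a local CPU-only machine.
device = torch.device(cfg["training"]["device"]
                      if torch.cuda.is_available() else "cpu")
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=cfg["training"]["lr"])
print(model, device)
```

With this pattern, comparing two experiments is just a diff between two small config files, which is exactly what a version control system is good at.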
No gimmicky tools or paid software! We will simply make the most of freely available open-source libraries such as PyTorch, OpenCV, NumPy, Visdom, YAML, and Matplotlib.