Getting started with MLOps, Part 3: Advanced model training in Kubeflow

Sam Ibou September 12, 2023 2 min read

- Transfer learning and fine-tuning as a basis
- Training acceleration and outsourcing
- Use distributed training and the MPIJob resource
- Monitoring and other functions
- Outlook

Read article in iX 10/2023

To work with large language models such as ChatGPT, Bard or LLaMA, which have several billion parameters, you need to extend your MLOps approach with advanced techniques. To optimize training times and model quality, data scientists apply transfer learning and fine-tuning to pretrained base models, use additional or specialized hardware, rent resources from the cloud, and distribute training across multiple compute nodes.
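The idea behind transfer learning can be illustrated with a toy sketch in plain Python. It is not the article's Kubeflow setup: the "pretrained" base is a hypothetical fixed feature extractor (`BASE_WEIGHTS` is made up for illustration), and only a small classification head is trained on the new task while the base stays frozen.

```python
import math
import random

# Hypothetical "pretrained" base: fixed weights standing in for a model
# trained earlier on a large dataset. They are never updated below.
BASE_WEIGHTS = [[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4]]  # 3 hidden units, 2 inputs


def base_features(x):
    """Frozen feature extractor: tanh of a fixed linear map."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in BASE_WEIGHTS]


def predict(head, bias, x):
    """Probability of class 1 from the trainable head on top of the frozen base."""
    z = sum(h * f for h, f in zip(head, base_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=200, lr=0.5, seed=0):
    """Train only the head with stochastic gradient descent on the log loss."""
    rng = random.Random(seed)
    head = [rng.uniform(-0.1, 0.1) for _ in BASE_WEIGHTS]
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = base_features(x)
            err = predict(head, bias, x) - y  # gradient of the log loss w.r.t. the logit
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias


# Small, made-up target task: label 1 if the point lies in the lower right,
# label 0 if it lies in the upper left.
DATA = [
    ((1.0, -1.0), 1), ((2.0, -1.0), 1), ((1.5, -0.5), 1), ((1.0, -2.0), 1),
    ((-1.0, 1.0), 0), ((-2.0, 1.0), 0), ((-0.5, 1.5), 0), ((-1.0, 2.0), 0),
]

head, bias = train_head(DATA)
predictions = [round(predict(head, bias, x)) for x, _ in DATA]
```

Because only the head's few parameters are updated, training is cheap compared with retraining the whole model. Fine-tuning would go one step further and also update some or all of the base weights, typically with a smaller learning rate.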

Monitoring via TensorBoard gives developers continuous feedback on training status and model quality. After this coarse optimization, model training can be repeated on more recent training data if necessary, and the training parameters can be refined using AutoML and hyperparameter tuning. This last part of the tutorial shows how to integrate all of these techniques into Kubeflow.

Dr. Sebastian Lehrig leads MLOps with open source at IBM. His goal is to offer solutions optimized for IBM infrastructure that are as efficient, secure and reliable as possible.

The first two articles in this series showed how data scientists build, deploy, manage and use AI models with Kubeflow Pipelines. Classifying the Iris dataset was a useful example because it can be analyzed easily with few resources and simple models. However, the techniques presented so far are not suited to training more computationally intensive and complex models.