How to manage an end-to-end machine learning project with MLflow? part 2
In part 1, you learned what MLflow is and why you should use it. You also learned about MLflow Tracking, which is one of the 4 components of MLflow.
In this part, I will cover the other components of MLflow: models, model registry, and projects.
Contents of this article
- MLflow model example
- MLflow model registry example
- MLflow project example
MLflow model example
An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools — for example, real-time serving through a REST API or batch inference on Apache Spark. (from the MLflow official website)
A model can be saved in one or more formats, called “flavors”, that different downstream tools can understand.
All of the flavors that a particular model supports are defined in its MLmodel file in YAML format.
To make it clear, I will refer back to the MLflow tracking example in part 1. You don’t have to switch to that article. I copied the code for you here.
import mlflow
import mlflow.sklearn

# experiment_name, run_name, params, metrics, and classifier are defined in part 1
mlflow.set_experiment(experiment_name)
with mlflow.start_run(run_name=run_name):
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
    mlflow.log_artifact('./scatter.png')
    mlflow.sklearn.log_model(classifier, "model")
print('Run - %s is logged to Experiment - %s' % (run_name, experiment_name))
- The key function for MLflow Models here is mlflow.sklearn.log_model. This function logs a trained model to the currently active run.
- The flavor of this model is scikit-learn (sklearn), which is one of MLflow's built-in model flavors.
- You can view the model in the MLflow UI (run mlflow ui in the Anaconda Prompt).
When you open http://127.0.0.1:5000/ and select the run in which your model was saved, you can see the model in the Artifacts section.
As you can see, the MLflow model folder contains several files:
- MLmodel: specifies how the model can be loaded and used (including its flavors).
- conda.yaml: a conda environment specification listing the dependencies used by the model.
- model.pkl: the trained machine learning model serialized in pickle format.
- python_env.yaml: the information needed to restore the model environment using virtualenv.
- requirements.txt: created from the pip portion of the conda.yaml environment specification. It lists the libraries required to run the model.
MLflow models can include the following additional metadata about model inputs and outputs that can be used by downstream tooling:
- Model Signature — description of a model’s inputs and outputs.
- Model Input Example — example of a valid model input.
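As a minimal sketch of how these two pieces of metadata could be attached at logging time, the function below infers a signature from the training data and passes it, together with a small input example, to mlflow.sklearn.log_model. The function name log_model_with_metadata and the two-row slice are my own illustrative choices, not part of MLflow. The sketch assumes that X_train supports slicing (a NumPy array or pandas DataFrame), that the classifier is already fitted, and that no run is currently active.

```python
def log_model_with_metadata(classifier, X_train, artifact_path="model"):
    """Log a fitted scikit-learn model with a signature and an input example.

    Hypothetical helper for illustration; it is not part of MLflow itself.
    """
    # Imports are kept inside the function so the sketch is self-contained.
    import mlflow
    import mlflow.sklearn
    from mlflow.models import infer_signature

    # Infer the input/output schema from sample inputs and the model's predictions.
    signature = infer_signature(X_train, classifier.predict(X_train))

    with mlflow.start_run():
        mlflow.sklearn.log_model(
            classifier,
            artifact_path,
            signature=signature,
            input_example=X_train[:2],  # first two rows as a valid sample input
        )
```

After a run like this, the signature and the input example should show up alongside the other files in the model folder.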
After the model is logged, you have to serve it so that other users can send API requests to it and get predictions.
Go to the command line and run the command below.
mlflow models serve --model-uri runs:/c35d6853ec1048a09acce6ddf1eb6b41/model --no-conda --port 4000
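The value after --model-uri is the path to your model, which you can specify with a run ID or in other ways (for example, by model name for a registered model; more on that in the model registry section). You can easily find the run ID in the MLflow UI, or you can use mlflow.search_runs in Python as I described in part 1.
The number after --port is the port to serve on; you can use any free port. In this example, I used 4000, so the model will be served on http://127.0.0.1:4000. The --no-conda flag tells MLflow to serve the model in your current environment instead of creating a new conda environment; in recent MLflow versions, --env-manager local serves the same purpose.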
If the model is served successfully, you will see the message below.
INFO:waitress:Serving on http://127.0.0.1:4000
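If you would rather look up the run ID in code than copy it from the UI, the following sketch shows one way to do it. The helper names runs_uri and latest_run_uri are hypothetical, and the sketch assumes that the most recent run in the experiment is the one you want to serve and that the model was logged under the artifact path "model".

```python
def runs_uri(run_id, artifact_path="model"):
    """Build a runs:/ URI for an artifact logged under a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def latest_run_uri(experiment_name, artifact_path="model"):
    """Return the runs:/ URI of the most recent run in an experiment."""
    import mlflow  # imported here so the URI helper works without MLflow

    # search_runs returns a pandas DataFrame with one row per run.
    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=["attributes.start_time DESC"],
        max_results=1,
    )
    if runs.empty:
        raise ValueError(f"No runs found in experiment {experiment_name!r}")
    return runs_uri(runs.iloc[0]["run_id"], artifact_path)


print(runs_uri("c35d6853ec1048a09acce6ddf1eb6b41"))
# prints: runs:/c35d6853ec1048a09acce6ddf1eb6b41/model
```

The returned string can be passed to --model-uri in the serve command.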
To use the served model, you can use the requests library in a Python code editor or curl on the command line.
This is how to do it using the requests library.
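import requests

# Two samples with four feature values each, sent as dataframe records
request_data = {
    "dataframe_records": [[5, 5, 5, 5], [1, 2, 3, 4]]
}
endpoint = "http://localhost:4000/invocations"
# POST the JSON payload to the served model and print the raw response
response = requests.post(endpoint, json=request_data)
print(response.text)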
I send request_data, which is a dataframe encoded as JSON. The input has 4 columns (like the training data) and 2 samples (2 rows). It is sent to http://localhost:4000/invocations using the POST method.
The response contains the predicted class for each of the 2 input samples.
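The request above sends bare rows without column names. As an alternative sketch, the "dataframe_split" layout sends column names and data separately, which can make the payload easier to read. The column names below are assumptions for illustration only; they must match whatever names (if any) your model was trained with. I also use the standard library's urllib here instead of requests, purely to keep the sketch free of extra dependencies.

```python
import json
import urllib.request

# Assumed column names for illustration; use the names your model was trained with.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def build_split_payload(rows, columns=COLUMNS):
    """Wrap rows in the "dataframe_split" layout (column names plus data)."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(
                f"expected {len(columns)} values per row, got {len(row)}"
            )
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }


def predict(rows, endpoint="http://localhost:4000/invocations"):
    """POST the rows to a served model and return the decoded JSON response."""
    body = json.dumps(build_split_payload(rows)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the payload does not need a running server.
payload = build_split_payload([[5, 5, 5, 5], [1, 2, 3, 4]])
print(json.dumps(payload))
```

Calling predict([[5, 5, 5, 5], [1, 2, 3, 4]]) while the model is being served on port 4000 should return the same kind of prediction as the requests example.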
MLflow model registry example
The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. (from the MLflow official website)
Your team can store trained models in the model registry and assign each one a version and a stage. This lets the team work on a machine learning project in a more systematic manner.
Registering a model is simple. In the MLflow UI, go to the model you want to register.
Click “Register Model”. Then select “Create New Model”, enter a model name, and click “Register”. After that, you will see the Models tab at the top of the page.
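* There is a typo in the model name here. I meant to type iris_classifier, but the model was registered as iris_classier, so the code below uses that name.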
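The same registration can also be sketched in code with mlflow.register_model, which takes a model URI and a registry name. The helper name register_logged_model is hypothetical, and the sketch assumes the model was logged under the artifact path "model" in the given run.

```python
def register_logged_model(run_id, name, artifact_path="model"):
    """Register a model that was logged in a run under the given registry name.

    Hypothetical helper for illustration; it wraps mlflow.register_model.
    """
    import mlflow  # imported here to keep the sketch self-contained

    model_uri = f"runs:/{run_id}/{artifact_path}"
    # Creates the registered model if it does not exist, otherwise adds a new version.
    return mlflow.register_model(model_uri=model_uri, name=name)
```

Calling this function again with the same name should add a new version to the existing registered model rather than overwrite it.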
When you click the Models tab, you will see your registered model.
After that, you can add new versions of the model and change the stage of each version.
This is how to change the model stage to Production.
import mlflow

client = mlflow.tracking.MlflowClient()
# Move version 1 of the registered model to the Production stage
client.transition_model_version_stage(
    name="iris_classier",
    version=1,
    stage="Production"
)
After changing the model stage to Production, you will see your model version in the Production column.
You can serve the registered model using mlflow models serve, but with a models:/ URI instead of a runs:/ URI.
mlflow models serve --model-uri models:/iris_classier/Production -p 4000 --no-conda
Then, you can use this model with the requests library as in the previous section.
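If you only need predictions inside Python, serving is not strictly required. As a hedged sketch, a registered model can also be loaded directly with mlflow.pyfunc.load_model using the same models:/ URI. The helper names registry_uri and predict_with_registered_model are my own, and rows is assumed to be something the model accepts, such as a pandas DataFrame or NumPy array with 4 columns.

```python
def registry_uri(name, stage_or_version):
    """Build a models:/ URI from a registered model name and a stage or version."""
    return f"models:/{name}/{stage_or_version}"


def predict_with_registered_model(name, stage_or_version, rows):
    """Load a registered model as a generic Python function and predict."""
    import mlflow.pyfunc  # imported here so the URI helper works without MLflow

    model = mlflow.pyfunc.load_model(registry_uri(name, stage_or_version))
    return model.predict(rows)


print(registry_uri("iris_classier", "Production"))
# prints: models:/iris_classier/Production
```

Passing a version number instead of a stage, for example registry_uri("iris_classier", 1), pins the URI to one specific model version.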
MLflow project example
An MLflow Project is a format for packaging data science code in a reusable and reproducible way. (from the MLflow official website)
A project is simply a directory of files, or a Git repository, containing your code.
The project environments that MLflow supports include the Virtualenv environment, conda environment, Docker container environment, and system environment.
In your project, you can include an MLproject file that gives MLflow information about the project, such as the environment type and the entry points.
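To give a rough idea of what such a file can contain, the sketch below writes a minimal MLproject file (which is YAML) into a directory. The project name demo_project, the script train.py, and the alpha parameter are placeholders that I made up for illustration; a real project would point to its own environment file and training script.

```python
import tempfile
from pathlib import Path

# Contents of a minimal MLproject file (YAML). The project name, script name,
# and parameter are placeholders for illustration.
MLPROJECT_TEXT = """\
name: demo_project

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py {alpha}"
"""


def write_mlproject(project_dir):
    """Write the MLproject file into project_dir and return its path."""
    path = Path(project_dir) / "MLproject"
    path.write_text(MLPROJECT_TEXT, encoding="utf-8")
    return path


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = write_mlproject(tmp)
    print(created.name)  # prints: MLproject
```

Here conda_env selects the environment type, and the main entry point declares one parameter and the command that MLflow runs with it.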
I won’t go deep into MLflow Projects here, since this article would become too long. You can read more in the official MLflow Projects documentation.
You can try running the demo project by typing the commands below on the command line.
mlflow ui
mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0
If it runs successfully, you will see a result like the image below.
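The same demo project can also be launched from Python. The sketch below uses mlflow.projects.run with the repository URL and parameter from the command above; the wrapper name run_demo_project is hypothetical. It assumes that MLflow and Git are installed and that the environment the project asks for can be created on your machine.

```python
def run_demo_project(alpha=5.0):
    """Run the MLflow example project from GitHub with the given alpha value."""
    import mlflow.projects  # imported here to keep the sketch self-contained

    # Equivalent in spirit to: mlflow run <uri> -P alpha=<alpha>
    return mlflow.projects.run(
        uri="https://github.com/mlflow/mlflow-example.git",
        parameters={"alpha": alpha},
    )
```

The returned object describes the submitted run, so you can inspect it afterwards instead of reading the command-line output.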
Conclusion
MLflow consists of tracking, models, model registry, and projects. Tracking is used for logging. A model is saved in a flavor format so that it can be served and queried. The model registry makes model management more systematic. Finally, a project is a directory of files that packages code in a reusable and reproducible way.
Thank you for reading till the end. I hope you can learn the MLflow concepts from the examples in this article series and apply them in your job. This is just the beginning of MLflow. If you want to learn more, read the official MLflow website.
If this article is helpful, please clap and follow me for more data science articles.