Welcome back, fellow MLOps explorer! In our previous chapters, you mastered the fundamentals of setting up Trackio, initializing runs, and logging basic scalar metrics like loss and accuracy. That’s a fantastic start, giving you a real-time pulse on your model’s training performance. But what happens when you need to track more than just numbers?
In the real world of machine learning, experiments generate much more than simple metrics. You’ll produce trained models, preprocessed datasets, stunning visualizations, and custom data tables. Just logging numbers isn’t enough to fully reproduce an experiment or understand its nuances. This chapter is your gateway to “advanced logging” with Trackio, where we’ll learn to treat these critical outputs as first-class citizens: artifacts.
By the end of this chapter, you’ll not only understand what artifacts are and why they’re crucial for robust ML workflows but also how to effectively log them using Trackio. We’ll cover logging trained models, datasets, images, and other custom data, ensuring your experiments are fully reproducible and easy to debug. Let’s elevate your experiment tracking game!
Core Concepts: Beyond Metrics – Understanding Artifacts
Before we jump into code, let’s solidify our understanding of what “artifacts” are in the context of machine learning and how Trackio helps us manage them.
What Exactly are ML Artifacts?
Think of an artifact as any file or collection of files that is either an input to your experiment or a significant output generated by it, beyond just scalar metrics. These are the tangible pieces of your ML project that are often critical for reproducibility, deployment, or further analysis.
Common examples of ML artifacts include:
- Trained Models: the saved weights and architecture of your machine learning model (e.g., `.pt`, `.h5`, `.pkl` files).
- Datasets: preprocessed training, validation, or test datasets (e.g., `.csv`, `.parquet`, `.json` files).
- Configuration Files: `YAML` or `JSON` files detailing hyperparameter settings, model architecture, or data preprocessing steps.
- Visualizations: plots, charts, and images generated during training or evaluation (e.g., `.png`, `.jpg`, `.svg` for loss curves, confusion matrices, ROC curves).
- Evaluation Reports: text files or custom data tables summarizing performance beyond simple metrics.
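As a concrete example of the first category on that list, a hyperparameter configuration can be snapshotted to a small JSON file that you later log as an artifact. A minimal standard-library sketch (the filename and keys are illustrative, not prescribed by Trackio):

```python
import json
import os

# Hypothetical run configuration -- keys are illustrative
config = {
    "model": "LogisticRegression",
    "penalty": "l1",
    "C": 0.1,
    "random_state": 42,
}

os.makedirs("trackio_artifacts", exist_ok=True)
config_path = os.path.join("trackio_artifacts", "run_config.json")

# Snapshot the configuration to disk so it can be logged as an artifact
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Reading it back recovers the exact settings used for the run
with open(config_path) as f:
    restored = json.load(f)
```

Because JSON round-trips plain dictionaries exactly, the restored configuration matches what the run actually used.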
Why is Logging Artifacts So Important?
You might be thinking, “Can’t I just save these files to a folder?” And yes, you can. But simply saving them locally misses out on several key benefits that an experiment tracking system like Trackio provides:
- Reproducibility: To truly reproduce an experiment, you need not only the code and hyperparameters but also the exact data and model produced or consumed. Logging artifacts links them directly to a specific run, making it easy to retrieve them later.
- Version Control: Trackio, leveraging its integration with Hugging Face Datasets and Spaces, can help you implicitly manage versions of your artifacts. When you log an artifact, it’s associated with a specific experiment run, creating a historical record.
- Sharing & Collaboration: Easily share specific models or datasets with team members by pointing them to a Trackio run ID, rather than managing file paths or cloud storage links manually.
- Debugging & Auditing: If a model performs unexpectedly, having its exact weights, the data it was trained on, and all associated visualizations easily accessible through Trackio’s dashboard makes debugging much faster. It creates a clear audit trail.
- Deployment Readiness: A trained model logged as an artifact is a clear candidate for deployment. You can fetch the correct model version directly from your tracking system.
How Trackio Handles Artifacts
Trackio is designed to be lightweight and API-compatible with popular tracking libraries like Weights & Biases (WandB). This means its trackio.log() function is quite versatile. While it doesn’t have a dedicated log_artifact() function in the same way some heavier systems do, it effectively handles artifacts by:
- Logging File Paths: You save your artifact (model, data, image) to a local file, and then log the path to that file along with relevant metadata. Trackio’s backend then manages these files, potentially copying them to its local storage or preparing them for sync with Hugging Face Spaces.
- Specialized Data Types: Trackio can intelligently handle certain Python objects or data types passed to `trackio.log()`, serializing or processing them for display in the dashboard.
Regardless of type, each artifact follows the same workflow: create it, save it locally, and then tell Trackio about its location (and perhaps some descriptive metadata).
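In code, that create → save → log pattern looks roughly like this. The `log` function below is a stand-in for `trackio.log()` so the sketch runs without a tracking backend; with Trackio installed you would call `trackio.log(payload)` instead:

```python
import os

def log(payload: dict) -> dict:
    """Stand-in for trackio.log(): in a real run this would send the dict to Trackio."""
    print(f"logged keys: {sorted(payload)}")
    return payload

ARTIFACTS_DIR = "trackio_artifacts"
os.makedirs(ARTIFACTS_DIR, exist_ok=True)

# 1. Create the artifact (here, a trivial text report)
report = "accuracy: 0.91\n"

# 2. Save it locally
report_path = os.path.join(ARTIFACTS_DIR, "report.txt")
with open(report_path, "w") as f:
    f.write(report)

# 3. Tell the tracker where it lives, plus descriptive metadata
payload = log({
    "report_artifact": report_path,
    "artifact_type": "evaluation report",
})
```

The three numbered steps are the pattern every artifact in this chapter follows, whatever its file type.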
Step-by-Step Implementation: Logging Real-World Artifacts
Let’s get our hands dirty and log some actual artifacts. We’ll simulate a simple machine learning workflow.
Prerequisites
Make sure you have Trackio installed, along with scikit-learn for a simple model, matplotlib for plotting, and joblib for model serialization.
pip install trackio==0.2.1 scikit-learn==1.3.2 matplotlib==3.8.2 joblib==1.3.2 numpy==1.26.2
(Note: Versions are as of 2026-01-01. Always prefer the latest stable versions.)
Step 1: Initialize Your Experiment
First, let’s set up a new Python file, say advanced_logging_example.py, and initialize a Trackio run.
# advanced_logging_example.py
import trackio
import os
import joblib
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
# Ensure a directory for artifacts exists
ARTIFACTS_DIR = "trackio_artifacts"
os.makedirs(ARTIFACTS_DIR, exist_ok=True)
# 1. Initialize Trackio Run
# Remember to give your run a descriptive name!
run = trackio.init(project="Advanced_Artifact_Logging", name="logistic_regression_run_001")
print("Trackio run initialized.")
# Launch the local dashboard at any time with `trackio show` (CLI) or trackio.show()
# We'll add more code here!
Explanation:
- We import the necessary libraries: `os` for directory creation, `joblib` for saving models, `matplotlib` for plots, `numpy` for numerical operations, and `scikit-learn` for our ML task.
- `ARTIFACTS_DIR` is a local folder where we’ll temporarily save our artifacts before logging them.
- `trackio.init()` starts a new run. We give it a `project` name and a specific `name` for this run.
- You can view your experiment’s progress in the local dashboard (e.g., by running `trackio show` in a terminal). Open the printed URL in your browser!
Run this initial script: python advanced_logging_example.py. You should see a message confirming the run initialization and a dashboard URL.
Step 2: Prepare Data and Train a Simple Model
Now, let’s generate some synthetic data and train a basic logistic regression model.
Add the following code to advanced_logging_example.py, right after the print statements:
# ... (previous code) ...
# 2. Prepare Data
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Log hyperparameters (a good practice!)
hyperparameters = {
"solver": "liblinear",
"penalty": "l1",
"C": 0.1,
"random_state": 42
}
trackio.log(hyperparameters) # Log these as a dictionary
# 3. Train Model
model = LogisticRegression(**hyperparameters)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")
trackio.log({"test_accuracy": accuracy})
Explanation:
- `make_classification` creates a synthetic dataset, and `train_test_split` divides our data.
- We define `hyperparameters` as a dictionary and `trackio.log()` it. This is a great way to log configuration settings for easy retrieval later.
- A `LogisticRegression` model is trained.
- We calculate and `trackio.log()` the `test_accuracy`.
Run the script again. In your Trackio dashboard, you should now see the hyperparameters dictionary and the test_accuracy logged for this run!
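One practical aside on logging hyperparameters: dashboards display flat key/value pairs best, so if your configuration is nested you may want to flatten it before calling `trackio.log()`. The helper below is a sketch, not part of Trackio’s API:

```python
def flatten(d: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten a nested dict into dotted keys, e.g. {'opt': {'lr': 0.1}} -> {'opt.lr': 0.1}."""
    flat = {}
    for key, value in d.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing children with the parent key
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

nested = {"optimizer": {"name": "liblinear", "C": 0.1}, "random_state": 42}
flat = flatten(nested)
print(flat)  # {'optimizer.name': 'liblinear', 'optimizer.C': 0.1, 'random_state': 42}
# trackio.log(flat)  # each hyperparameter then becomes its own dashboard entry
```

With dotted keys, every individual setting is searchable and comparable across runs.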
Step 3: Logging a Trained Model as an Artifact
This is where advanced logging begins! We’ll save our trained LogisticRegression model to a file and then log its path.
Add this code to your script, after calculating accuracy:
# ... (previous code) ...
# 4. Log Trained Model as an Artifact
model_path = os.path.join(ARTIFACTS_DIR, "logistic_regression_model.pkl")
joblib.dump(model, model_path) # Save the model using joblib
print(f"Model saved locally to: {model_path}")
# Log the model artifact by providing its path and some metadata
trackio.log({
"model_artifact": model_path,
"model_name": "LogisticRegression",
"model_version": "1.0",
"model_framework": "scikit-learn"
})
Explanation:
- `os.path.join()` creates a robust file path for our model within `ARTIFACTS_DIR`.
- `joblib.dump(model, model_path)` serializes our `scikit-learn` model to a `.pkl` file. This is a common way to save Python objects.
- `trackio.log()` is used again, but this time the dictionary contains a `model_artifact` key pointing to our saved model, plus useful metadata like `model_name`, `model_version`, and `model_framework`. This metadata is crucial for understanding the artifact later.
Run the script. Check your dashboard! You’ll see the model_artifact entry. Depending on Trackio’s backend, it might display as a link to a file or indicate that a file has been tracked.
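Before relying on a logged model artifact, it’s worth confirming the file round-trips: reload it and check that it predicts identically to the in-memory model. A self-contained sketch (it trains its own tiny model rather than reusing the script’s):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=200).fit(X, y)

# Save, then immediately reload from the same path
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)
    # The reloaded model must agree with the original on every sample
    match = np.array_equal(model.predict(X), restored.predict(X))
```

If this check ever fails (e.g., after a library version change), you know the artifact is not trustworthy before it reaches a dashboard or a deployment.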
Step 4: Logging a Data Artifact
Next, let’s imagine our X_test dataset is a crucial piece of data we want to log alongside our model for reproducibility.
Add this code:
# ... (previous code) ...
# 5. Log Test Data as an Artifact (e.g., as a CSV)
test_data_path = os.path.join(ARTIFACTS_DIR, "test_data.csv")
np.savetxt(test_data_path, X_test, delimiter=",") # Save test data as CSV
print(f"Test data saved locally to: {test_data_path}")
trackio.log({
"test_data_artifact": test_data_path,
"data_description": "Features for model evaluation",
"data_format": "CSV",
"num_samples": X_test.shape[0],
"num_features": X_test.shape[1]
})
Explanation:
- `np.savetxt()` saves our `X_test` NumPy array into a CSV file. For pandas DataFrames, you would use `df.to_csv()`.
- Again, `trackio.log()` records the path (`test_data_artifact`) along with descriptive metadata.
Run the script and observe the new test_data_artifact entry in your Trackio dashboard.
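If your data lives in a pandas DataFrame rather than a bare NumPy array (assuming pandas is installed), `df.to_csv()` gives you named columns for free, which makes the logged CSV self-describing:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for the tutorial's X_test array
X_test = np.random.RandomState(0).rand(5, 3)

# Named columns make the CSV self-describing, unlike a bare np.savetxt dump
df = pd.DataFrame(X_test, columns=[f"feature_{i}" for i in range(X_test.shape[1])])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_data.csv")
    df.to_csv(path, index=False)  # index=False keeps row numbers out of the file
    restored = pd.read_csv(path)
    same_shape = restored.shape == df.shape
```

Anyone who later downloads this artifact sees meaningful column names instead of guessing what each unnamed column means.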
Step 5: Logging a Visualization Artifact (Image)
Visualizations are incredibly helpful for understanding model behavior. Let’s log a confusion matrix plot.
Add this code:
# ... (previous code) ...
# 6. Log a Visualization (Confusion Matrix) as an Artifact
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Class 0", "Class 1"])
fig, ax = plt.subplots(figsize=(6, 6))
disp.plot(cmap=plt.cm.Blues, ax=ax)
ax.set_title("Confusion Matrix")
confusion_matrix_path = os.path.join(ARTIFACTS_DIR, "confusion_matrix.png")
plt.savefig(confusion_matrix_path) # Save the plot as a PNG image
plt.close(fig) # Close the plot to free memory
print(f"Confusion matrix plot saved locally to: {confusion_matrix_path}")
trackio.log({
"confusion_matrix_plot": confusion_matrix_path,
"plot_type": "Confusion Matrix",
"evaluation_set": "Test"
})
Explanation:
- We use `sklearn.metrics.confusion_matrix` and `ConfusionMatrixDisplay` to create a visual representation of our model’s performance.
- `plt.savefig()` is crucial here: it saves the generated plot to a file (`.png` in this case).
- `plt.close(fig)` is good practice to prevent plots from accumulating in memory, especially in long-running scripts.
- Finally, we `trackio.log()` the path to our saved image, again with descriptive metadata.
Run the script one last time. Your dashboard should now be rich with scalar metrics, hyperparameters, and three distinct artifacts: the trained model, the test data, and the confusion matrix plot!
Step 6: End the Trackio Run
It’s good practice to explicitly end your Trackio run, though it will often terminate automatically when your script finishes.
Add this to the very end of your script:
# ... (previous code) ...
# 7. End the Trackio Run
trackio.finish()
print("Trackio run ended.")
Now, your advanced_logging_example.py is complete! You’ve successfully logged various types of artifacts.
Mini-Challenge: Log a Feature Importance Plot
You’ve done a great job logging models, data, and a basic plot. Now, it’s your turn to apply what you’ve learned.
Challenge: Extend the advanced_logging_example.py script to:
- Calculate feature importances for your `LogisticRegression` model. While `LogisticRegression` doesn’t have a direct `feature_importances_` attribute like tree-based models, its `coef_` attribute can be interpreted as importance (absolute value).
- Create a bar plot showing these feature importances.
- Save this plot as an image file (e.g., `feature_importance.png`) in your `trackio_artifacts` directory.
- Log the path to this feature importance plot as an artifact with Trackio, including relevant metadata.
Hint:
- Access coefficients using `model.coef_[0]` (for binary classification).
- Use `np.abs()` to get absolute values for importance.
- `plt.bar()` is useful for bar plots.
- Remember `plt.savefig()` and `trackio.log()`!
What to Observe/Learn:
- How to extract insights (like feature importance) from your model.
- The complete workflow of generating a visualization, saving it, and logging it as an artifact.
- The richness of information you can associate with a single experiment run in Trackio.
Take your time, try to solve it independently, and then check the solution if you get stuck!
Click for Mini-Challenge Solution
# ... (previous code in advanced_logging_example.py) ...
# Add this section after logging the confusion matrix plot
# 7. Mini-Challenge Solution: Log Feature Importance Plot
# For Logistic Regression, coefficients indicate feature importance
feature_importances = np.abs(model.coef_[0])
feature_names = [f"Feature {i}" for i in range(X.shape[1])]
# Create a bar plot
fig_fi, ax_fi = plt.subplots(figsize=(10, 6))
ax_fi.bar(feature_names, feature_importances)
ax_fi.set_xlabel("Feature")
ax_fi.set_ylabel("Absolute Coefficient Value (Importance)")
ax_fi.set_title("Feature Importances (Logistic Regression Coefficients)")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
feature_importance_path = os.path.join(ARTIFACTS_DIR, "feature_importance.png")
plt.savefig(feature_importance_path)
plt.close(fig_fi)
print(f"Feature importance plot saved locally to: {feature_importance_path}")
trackio.log({
"feature_importance_plot": feature_importance_path,
"plot_type": "Feature Importance",
"model_insight": "Coefficient-based importance"
})
# 8. End the Trackio Run (if not already added)
trackio.finish()
print("Trackio run ended.")
Common Pitfalls & Troubleshooting
Even with clear steps, logging artifacts can sometimes throw a curveball. Here are a few common issues and how to tackle them:
“File Not Found” Errors when Logging Artifacts:
- Pitfall: You tried to log a file path with Trackio, but the file either doesn’t exist at that location or the path is incorrect.
- Troubleshooting:
  - Double-check the `os.path.join()` calls. Are your directories and filenames correct?
  - Ensure `os.makedirs(ARTIFACTS_DIR, exist_ok=True)` runs before you try to save any files.
  - Verify that `plt.savefig()` or `joblib.dump()` (or similar save functions) execute successfully before `trackio.log()` is called for that artifact. Print `model_path`, `test_data_path`, etc., to confirm the files exist before logging.
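A small defensive wrapper can catch this class of bug before anything is logged. The helper below is hypothetical (not part of Trackio’s API) and uses a plain list as a stand-in logger so the sketch runs anywhere:

```python
import os
import tempfile

def log_artifact_path(logger, key: str, path: str, **metadata) -> bool:
    """Log `path` under `key` only if the file actually exists; returns True on success.

    `logger` is any callable taking a trackio.log-style dict (e.g. trackio.log itself).
    This is a hypothetical convenience wrapper, not part of Trackio's API.
    """
    if not os.path.exists(path):
        print(f"skipping {key}: {path} does not exist")
        return False
    logger({key: path, **metadata})
    return True

logged = []  # stand-in sink so the sketch runs without a tracking backend

# A bad path is caught instead of silently logging a dangling reference
ok = log_artifact_path(logged.append, "model_artifact", "no/such/file.pkl")

# A real file logs normally
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    real_path = f.name
ok2 = log_artifact_path(logged.append, "model_artifact", real_path,
                        model_name="LogisticRegression")
os.remove(real_path)
```

Routing every artifact through one guard like this turns a confusing dashboard gap into an immediate, readable message at the point of failure.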
Large Artifacts Slowing Down Your Workflow:
- Pitfall: Logging very large files (e.g., multi-GB datasets or high-resolution images) can consume significant local disk space and might slow down any potential syncing with remote services like Hugging Face Spaces.
- Troubleshooting:
- Be selective: Do you really need to log the entire raw dataset for every run? Perhaps log only preprocessed data, or a sample, and reference the raw data’s location in cloud storage.
- Optimize file formats: Use efficient formats like Parquet for tabular data, or compressed image formats (JPEG) when high fidelity isn’t strictly necessary.
- Consider versioning systems: For extremely large datasets, dedicated data versioning tools (like DVC) might be combined with Trackio, where Trackio logs the DVC pointer rather than the entire file.
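The pointer idea can be sketched with the standard library alone: log the data’s location plus a content hash, so any future run can verify it is reading the exact same bytes without the tracker ever storing the gigabytes itself:

```python
import hashlib
import os
import tempfile

def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so huge files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a tiny temp file; a real run would point at e.g. a cloud-storage URI
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"raw dataset contents")
    path = f.name

pointer = {
    "raw_data_location": path,                  # where the data actually lives
    "raw_data_sha256": file_fingerprint(path),  # cheap integrity check
}
os.remove(path)
# trackio.log(pointer)  # log the lightweight pointer, not the gigabytes
```

Comparing fingerprints across runs tells you immediately whether two experiments really consumed identical data.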
Missing or Unclear Artifacts in the Dashboard:
- Pitfall: You logged an artifact, but it doesn’t appear as expected in the Trackio dashboard, or its description is unhelpful.
- Troubleshooting:
  - Check your `trackio.log()` arguments: ensure you’re passing a dictionary where the key is descriptive (e.g., `"model_artifact"`) and the value is the correct file path.
  - Include sufficient metadata: always add extra keys to the logged dictionary (like `model_name`, `data_description`, `plot_type`) so the artifact is immediately understandable in the dashboard. Don’t just log the path by itself!
  - Refresh the dashboard: sometimes a simple refresh is all it takes for new logs to appear.
Summary
Congratulations! You’ve successfully ventured into the world of advanced logging with Trackio. Let’s quickly recap what you’ve learned:
- Artifacts are key: Beyond scalar metrics, artifacts like trained models, datasets, and visualizations are crucial for robust ML experiment tracking.
- Why log artifacts: They ensure reproducibility, provide version control, facilitate sharing, aid in debugging, and prepare models for deployment.
- Trackio’s approach: Trackio logs artifacts by associating local file paths (and their contents) with your experiment runs, using `trackio.log()` with descriptive metadata.
- Practical application: you’ve learned to save and log a trained `scikit-learn` model, a preprocessed dataset, and a `matplotlib` visualization as distinct artifacts.
By consistently logging these artifacts, you transform your Trackio dashboard from a simple metric tracker into a comprehensive repository for each experiment, making your machine learning workflow more organized, transparent, and reproducible.
What’s next? In the upcoming chapters, we’ll explore how to leverage Trackio’s dashboard for deeper analysis, delve into command-line tools for managing runs, and discover how to seamlessly sync your local experiments with Hugging Face Spaces for collaborative sharing and deployment.
References
- Trackio Official Documentation
- Hugging Face Spaces Documentation
- Scikit-learn User Guide
- Matplotlib Documentation
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.