ml learning path - yoonzh.com

date: 2025-02-17. Adapted from DeepSeek¹

goals

explain core ML concepts and deployment challenges
manipulate tensors and implement efficient data pipelines
build and train models using PyTorch and TensorFlow
convert and secure models using Safetensors
deploy models locally and at scale via APIs, cloud platforms, or edge devices
fine-tune and run local LLMs (e.g., DeepSeek)

Week 1: Introduction to ML & Deployment

ML fundamentals: Supervised, unsupervised, reinforcement learning.

Model deployment lifecycle: Training, validation, monitoring, scaling.

Ethical considerations: Bias, privacy, and environmental impact.

Project: Train a scikit-learn model (e.g., an Iris classifier) and serve it via a Flask API. Write a one-page report on potential deployment pitfalls.

Week 2: Tensors & Mathematical Foundations

Tensors as data structures: Scalars, vectors, matrices, n-dimensional arrays.

Tensor operations: Reshaping, broadcasting, einsum.

Automatic differentiation and gradient computation.

Exercise: Use NumPy to compute the Jacobian matrix of a simple function (e.g., \(f(x,y) = x^2 + 3y\), whose Jacobian is the \(1 \times 2\) row vector \([2x,\ 3]\)). Visualize tensor operations in a Jupyter notebook.
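NumPy has no automatic differentiation, so the sketch below approximates the Jacobian with central finite differences. This is a swapped-in technique; `torch.autograd.functional.jacobian` would compute the Jacobian exactly. The sketch also shows broadcasting and `einsum` from the topic list. The helper names `f` and `jacobian` are illustrative.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col + row  # shape (3, 4), grid[i, j] = i + j

# einsum: batched matrix multiply, equivalent to A @ B for each batch entry.
A = np.random.default_rng(0).normal(size=(2, 3, 4))
B = np.random.default_rng(1).normal(size=(2, 4, 5))
C = np.einsum("bij,bjk->bik", A, B)  # shape (2, 3, 5)


def f(v):
    """f(x, y) = x**2 + 3*y, returned as a length-1 vector."""
    x, y = v
    return np.array([x**2 + 3 * y])


def jacobian(func, v, eps=1e-6):
    """Central-difference Jacobian: J[i, j] approximates d func_i / d v_j."""
    v = np.asarray(v, dtype=float)
    out_dim = func(v).size
    J = np.zeros((out_dim, v.size))
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = eps
        J[:, j] = (func(v + step) - func(v - step)) / (2 * eps)
    return J


print(jacobian(f, [1.0, 2.0]))  # analytic answer at (1, 2): [[2x, 3]] = [[2, 3]]
```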

Week 3: Data Pipelines & Preprocessing

Data loading (CSV, JSON, Parquet), normalization, and augmentation.

Feature engineering: Embeddings, one-hot encoding, PCA.

Tools: pandas, Hugging Face Datasets, TensorFlow tf.data API.

Project: Build a data pipeline for the Titanic dataset. Implement a custom PyTorch Dataset class to handle missing values and categorical features.
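Below is a sketch of the custom Dataset. It assumes pandas and PyTorch, plus the column names of the Kaggle Titanic `train.csv` (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`, `Embarked`). The imputation strategy fills numeric columns with the median and gives `Embarked` an explicit `"missing"` category. That is one reasonable choice, not the only one. The tiny DataFrame stands in for the real CSV.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class TitanicDataset(Dataset):
    """Titanic rows -> (features, label) float tensors.

    Assumes Kaggle column names. Missing Age/Fare are filled with the median,
    missing Embarked becomes its own category, and Sex/Embarked are one-hot encoded.
    """

    NUMERIC = ["Pclass", "Age", "Fare"]
    CATEGORICAL = ["Sex", "Embarked"]

    def __init__(self, df: pd.DataFrame):
        df = df.copy()
        for col in ["Age", "Fare"]:
            df[col] = df[col].fillna(df[col].median())
        df["Embarked"] = df["Embarked"].fillna("missing")
        onehot = pd.get_dummies(df[self.CATEGORICAL], dtype=float)
        features = pd.concat([df[self.NUMERIC].astype(float), onehot], axis=1)
        self.columns = list(features.columns)
        self.X = torch.tensor(features.to_numpy(), dtype=torch.float32)
        self.y = torch.tensor(df["Survived"].to_numpy(), dtype=torch.float32)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]


# Tiny stand-in for train.csv, with missing Age, Fare, and Embarked values.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 35.0],
    "Fare": [7.25, 71.28, None, 13.0],
    "Embarked": ["S", "C", None, "S"],
})
ds = TitanicDataset(df)
loader = DataLoader(ds, batch_size=2, shuffle=True)
xb, yb = next(iter(loader))
print(ds.columns, xb.shape)
```

For real training, compute the medians on the training split only and reuse them for the validation and test data. Otherwise, statistics leak across splits.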

Week 4: PyTorch vs. TensorFlow Deep Dive

PyTorch: Dynamic computation graphs, torch.nn, GPU acceleration.

TensorFlow: Eager execution by default with graph compilation via tf.function, Keras layers, distributed training.

Exercise: Train a logistic regression model on MNIST using both frameworks. Compare code structure, debugging ease, and training speed.
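The PyTorch half of the exercise is sketched below. It assumes that random 784-dimensional inputs stand in for flattened 28×28 MNIST images; load the real data with `torchvision.datasets.MNIST`. Multinomial logistic regression is a single linear layer, and `nn.CrossEntropyLoss` applies the softmax internally.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for MNIST: random "images" flattened to 784 features, labels in 0..9.
X = torch.randn(512, 784)
y = torch.randint(0, 10, (512,))

# Multinomial logistic regression: one linear layer producing 10 logits.
model = nn.Linear(784, 10)
loss_fn = nn.CrossEntropyLoss()  # log-softmax + negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(20):  # full-batch gradient descent
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The Keras counterpart is roughly `tf.keras.Sequential([tf.keras.layers.Dense(10)])` compiled with `loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`. Having both versions side by side makes the comparison of code structure direct.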

Week 5: Building & Training Models

Architectures: CNNs (ResNet), RNNs (LSTM), Transformers.

Transfer learning with pretrained models (e.g., BERT, ResNet-50).

Project: Train a PyTorch CNN on CIFAR-10 and a TensorFlow Transformer for text generation. Profile GPU usage with nvtop or NVIDIA Nsight Systems.
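Here is a small CIFAR-10 CNN in PyTorch, assuming 3×32×32 inputs and 10 classes. It is deliberately simpler than ResNet; `torchvision.models.resnet18` is a ready-made alternative for the transfer-learning topic. The random batch only checks shapes.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Two conv blocks + linear head for 3x32x32 CIFAR-10 images."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))  # flatten all but batch dim


device = "cuda" if torch.cuda.is_available() else "cpu"
model = SmallCNN().to(device)
batch = torch.randn(4, 3, 32, 32, device=device)  # fake batch; use torchvision CIFAR10 in practice
logits = model(batch)
print(logits.shape)  # torch.Size([4, 10])
```

Running a training loop on the GPU while `nvtop` is open shows memory and utilization over time. The per-layer shape comments make it easier to reason about where activation memory goes.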

Week 6: Model Evaluation & Optimization

Metrics: Precision, recall, F1, ROC-AUC.

Hyperparameter tuning: Grid search, Bayesian optimization.

Explainability: SHAP, LIME.

Exercise: Use Optuna to optimize hyperparameters for a Random Forest model. Generate a SHAP summary plot for feature importance.
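Before reaching for Optuna and SHAP, it can help to compute the metrics from the topic list by hand. This NumPy sketch assumes binary labels with 1 as the positive class. `sklearn.metrics` provides the production versions (`precision_score`, `recall_score`, `f1_score`, `roc_auc_score`).

```python
import numpy as np


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1 with 1 as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC = P(random positive scores higher than random negative); ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


print(precision_recall_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))  # each value is 2/3
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))                  # 0.75
```

The pairwise ROC-AUC needs memory proportional to (number of positives) × (number of negatives). `roc_auc_score` instead builds the ROC curve from sorted scores, which scales to large datasets.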

Week 7: Safetensors & Model Security

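Safetensors format: Security benefits over pickle (loading a Safetensors file only reads tensor data and cannot execute arbitrary code, whereas unpickling can).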

Model serialization/deserialization in PyTorch/TensorFlow.

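Vulnerability scanning with tools like safety (dependencies with known vulnerabilities) or bandit (insecure patterns in Python code, such as unsafe pickle loading).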

Project: Convert a PyTorch model to Safetensors. Write a script to detect unsafe operations in a model’s inference pipeline.
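The project has two halves. The conversion half is typically a call like `safetensors.torch.save_file(model.state_dict(), "model.safetensors")`, where the file name is illustrative. For the "detect unsafe operations" half, one approach is assumed here, similar in spirit to dedicated pickle scanners: walk a pickle stream with the standard-library `pickletools.genops` and flag opcodes that import or call Python objects, without ever unpickling it. The names `scan_pickle` and `Malicious` are illustrative.

```python
import pickle
import pickletools

# Opcodes that import callables or invoke them while unpickling.
UNSAFE_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD"}


def scan_pickle(data: bytes) -> list[str]:
    """Return 'position: OPCODE [arg]' for every risky opcode in a pickle stream.

    Static scan only: nothing is unpickled. An empty result means the stream
    contains only plain containers and primitives.
    """
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in UNSAFE_OPCODES:
            label = opcode.name if arg is None else f"{opcode.name} {arg}"
            findings.append(f"{pos}: {label}")
    return findings


class Malicious:
    def __reduce__(self):
        # Unpickling this would call print(...); an attacker could call os.system instead.
        return (print, ("code ran during unpickling",))


safe_blob = pickle.dumps({"weights": [0.1, 0.2], "bias": [0.0]})
evil_blob = pickle.dumps(Malicious())  # pickling is harmless; only loading runs the call
print(scan_pickle(safe_blob))  # []
print(scan_pickle(evil_blob))
```

Legitimate PyTorch checkpoints also contain `GLOBAL`/`REDUCE` opcodes, because tensors are rebuilt through functions. A practical scanner therefore checks the imported names against an allowlist. `torch.load(..., weights_only=True)` applies a similar restriction at load time.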

Week 8: Deployment with Flask/FastAPI

REST API design, request handling, async endpoints.

Scalability: Load balancing, Redis caching.

Project: Deploy a sentiment analysis model with FastAPI. Use Docker to containerize the app and Locust for load testing.

Week 9: Local LLMs & Fine-Tuning

Quantization (GPTQ; GGUF-format models for llama.cpp), LoRA and QLoRA for parameter-efficient fine-tuning.

Tools: Ollama, llama.cpp, Hugging Face Transformers.

Project: Fine-tune Mistral-7B on a custom dataset (e.g., legal documents) using QLoRA. Benchmark inference speed on a CPU vs. GPU.
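Both ideas from the topic line can be shown on a single weight matrix with NumPy. The sketch assumes illustrative sizes: `d_out=64`, `d_in=128`, rank `r=8`, and `alpha=16`. The quantizer is a simplified per-tensor symmetric int8 scheme, not GPTQ and not a GGUF quantization type. QLoRA combines the two ideas by training LoRA adapters on top of a frozen 4-bit quantized base model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 128, 8, 16

# LoRA: keep W0 frozen and learn a low-rank update B @ A, scaled by alpha / r.
W0 = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init: no change at step 0


def lora_forward(x):
    return W0 @ x + (alpha / r) * (B @ (A @ x))


x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W0 @ x))             # True, since B starts at zero
print(f"trainable {A.size + B.size} vs full {W0.size}")  # trainable 1536 vs full 8192


# Simplified symmetric int8 quantization with one scale for the whole tensor.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


q, scale = quantize_int8(W0)
err = np.abs(q.astype(np.float64) * scale - W0).max()  # rounding error is at most scale / 2
print(f"int8 bytes {q.nbytes} vs float32 bytes {W0.astype(np.float32).nbytes}, max error {err:.4f}")
```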

Week 10: Advanced Deployment Strategies

Cloud platforms: Amazon SageMaker, Google Cloud Vertex AI (the successor to AI Platform).

Edge devices: TensorFlow Lite, ONNX Runtime.

Project: Deploy a TFLite model to a Raspberry Pi. Collect real-time sensor data and run inferences.
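The sketch below covers the conversion and inference steps, assuming TensorFlow 2.x on the development machine. `TinyModel` is a hypothetical stand-in for a trained model, and the `reading` array stands in for real sensor data. Recent TensorFlow releases deprecate `tf.lite.Interpreter` in favor of the LiteRT (`ai_edge_litert`) package. On a Raspberry Pi, the lighter `tflite_runtime` package exposes the same `Interpreter` methods.

```python
import numpy as np
import tensorflow as tf


class TinyModel(tf.Module):
    """Hypothetical stand-in: one dense layer mapping 3 sensor readings to 2 outputs."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([3, 2], seed=0))
        self.b = tf.Variable(tf.zeros([2]))

    @tf.function(input_signature=[tf.TensorSpec([1, 3], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


# Convert on the development machine; write tflite_bytes to model.tflite for the Pi.
model = TinyModel()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.__call__.get_concrete_function()], model
)
tflite_bytes = converter.convert()

# Inference: the same calls work with tflite_runtime.interpreter.Interpreter on the Pi.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

reading = np.array([[0.5, -1.0, 2.0]], dtype=np.float32)  # hypothetical sensor sample
interpreter.set_tensor(inp["index"], reading)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```

A real-time version would wrap the `set_tensor`/`invoke`/`get_tensor` calls in a loop that reads a new sensor sample on each iteration.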

Week 11: Capstone Project

End-to-end project: From dataset selection to deployed model. Example:

Train a GAN to generate synthetic art.
Deploy it as a Gradio app with user input controls.
Write a CI/CD pipeline for model updates.
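For the CI/CD item, one common pattern is a quality gate; it is assumed here, not prescribed by the project. After the pipeline evaluates a retrained model, a script compares its metrics with the deployed baseline. It exits non-zero on regression, which stops the deploy step. The metric names (`fid` for GAN sample quality, `latency_ms_p95`), the tolerances, and the JSON file layout are all hypothetical.

```python
import json
import sys
from pathlib import Path

# Hypothetical gate: both metrics are "lower is better"; values are the allowed
# ratio of candidate to baseline (1.05 = at most 5% worse).
TOLERANCE = {"fid": 1.05, "latency_ms_p95": 1.10}


def check(baseline: dict, candidate: dict) -> list[str]:
    """Return a message for every metric where the candidate exceeds its limit."""
    failures = []
    for name, ratio in TOLERANCE.items():
        limit = baseline[name] * ratio
        if candidate[name] > limit:
            failures.append(f"{name}: {candidate[name]:.2f} exceeds limit {limit:.2f}")
    return failures


def main(baseline_path: str, candidate_path: str) -> int:
    baseline = json.loads(Path(baseline_path).read_text())
    candidate = json.loads(Path(candidate_path).read_text())
    failures = check(baseline, candidate)
    for msg in failures:
        print("FAIL", msg)
    return 1 if failures else 0  # a non-zero exit code fails the CI job


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

In a CI job, this would run as `python check_metrics.py baseline.json candidate.json` (hypothetical file names), right before the step that publishes the new model to the Gradio app.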

Week 12: Next Steps & Emerging Trends

MLOps: MLflow, Kubeflow, monitoring with Prometheus.

Trends: Federated learning, neuromorphic computing.

Exercise: Present a research paper (e.g., “Attention Is All You Need”) and propose a deployment strategy for its architecture.
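If the chosen paper is “Attention Is All You Need”, its core operation is scaled dot-product attention, \(\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V\). The NumPy sketch below uses illustrative shapes, a single head, and an optional boolean mask. It is a useful starting point for reasoning about deployment cost.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V; mask=False entries are blocked."""
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # decoder-style mask
out, w = scaled_dot_product_attention(Q, K, V, mask=causal)
print(out.shape)  # (5, 8)
```

The `scores` matrix is `seq_len × seq_len`, so memory grows quadratically with sequence length. For autoregressive serving, caching `K` and `V` from previous steps (a KV cache) avoids recomputing them. Both points are natural starting points for a deployment strategy.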

Tools: Docker, Hugging Face Hub, Weights & Biases.

Communities: Kaggle, MLflow Slack, PyTorch Forums.

Credits

¹ DeepSeek.