How to Run Open-Source LLMs Locally on a Mac with MLX-LM
Run open-source LLMs locally on Apple Silicon with Apple’s MLX-LM: `pip install mlx-lm`, then `load()` a Hugging Face model and call `generate()` (or `stream_generate()` for live tokens). This post shows 117B on an M4 Max (≈79 tokens/sec, ~63 GB peak) and 3B on a MacBook Air (≈35.5 tokens/sec, ~2.6 GB).