On-Device Apple VLM + Windows Speech 🚀
Issue #7 · Week of October 20
We pulled the standouts from the last two weeks: lots of agent and tool integration, multimodal work, and RAG builds tied to the broader push (Qwen3‑VL/Qwen3‑Omni and mobile Moondream wins). Highlights: Marko Budisic (Raleigh) showed EVA, a video‑transcript RAG with precise timestamps; Eugene Yan (Seattle) demoed an LLM‑RecSys hybrid that embeds product IDs into Qwen3‑8B; Leonard Lin put Tokyo on the board with his Strix Halo tests; and Rach Pradhan (Singapore) showed the way to sub-4ms code search for AI agents. These are real, vetted builds, so read on.
Testing and Benchmarking AMD Strix Halo's (Ryzen 395) AI Capabilities
Leonard Lin, CTO of Shisa.AI, showed off the new Framework Desktop, which runs AMD's latest Ryzen AI Max 395 (Strix Halo) APU. What sets this small machine apart is its 128GB of unified memory, a relatively capable GPU (theoretical 60 FP16 TFLOPS), and support for both Vulkan and ROCm. He walked through getting PyTorch with AOTriton/FA and vLLM running, and shared llama.cpp benchmark sweeps across models, including large MoEs like the new gpt-oss-120b running locally at >50 tok/s.
SiftDB: Grep-Native Code Search
Rach Pradhan from Menlo.ai presented SiftDB, a grep-native database that lets AI agents search codebases with sub-4ms latency. It pairs a grep-native indexing approach with a lightweight architecture that speeds up queries and plugs into developer workflows through tool calls. The project is open source with runnable code and docs, giving builders a practical blueprint for production-ready, agent-centric code search. Survey feedback hinted at strong community interest in tooling that speeds up debugging and knowledge discovery.
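For readers new to indexed grep-style search, here is a minimal sketch of one common technique: a trigram inverted index that narrows the candidate lines before doing an exact substring check. This recap does not describe SiftDB's internals, so treat this as a generic illustration and not SiftDB's implementation; the `TrigramIndex` class and its method names are hypothetical.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical toy index; not SiftDB's actual data structure."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> ids of lines containing it
        self.lines = []                   # line id -> (path, line number, text)

    def add(self, path, lineno, line):
        line_id = len(self.lines)
        self.lines.append((path, lineno, line))
        for gram in trigrams(line):
            self.postings[gram].add(line_id)

    def search(self, needle):
        grams = trigrams(needle)
        if not grams:
            # Needle is shorter than 3 characters: nothing to filter on, scan everything.
            candidates = range(len(self.lines))
        else:
            # A matching line must contain every trigram of the needle.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Trigram overlap is necessary but not sufficient, so verify each candidate.
        return [self.lines[i] for i in sorted(candidates) if needle in self.lines[i][2]]
```

An agent tool call could wrap `search` and return the `(path, line number, text)` tuples directly, which is the same shape of output that grep produces.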
Building an Uber for RoboTaxis
Abhimanyu 'Chitra' Selvan from DigitalOcean presented "What I Learned Building a Ride Share Platform for RoboTaxis," a live demo of hundreds of agents coordinating thousands of rides. The talk highlights an event-driven architecture that keeps agents asynchronous and loosely coupled, with lightweight messaging and a scalable dispatcher. It offers production-minded patterns that go beyond demos and hints at real-world robo-taxi potential. Takeaway: it shows builders how to scale multi-agent systems in production. The audience appreciated the practical blueprint and the linked demo resources.
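To make the event-driven pattern concrete, here is a minimal sketch using Python's asyncio: a dispatcher drops ride requests onto per-agent queues, and the agents never call each other directly. This is an illustration under our own assumptions and not the code from the talk; `RideRequest`, `taxi_agent`, `dispatcher`, `simulate`, and the round-robin assignment are all hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RideRequest:
    ride_id: int
    pickup: str


async def taxi_agent(name, inbox, completed):
    # Each agent reacts only to messages in its own inbox (loose coupling).
    while True:
        ride = await inbox.get()
        if ride is None:  # sentinel message: no more rides, shut down
            break
        await asyncio.sleep(0)  # stand-in for the actual trip; yields to other agents
        completed.append((name, ride.ride_id))


async def dispatcher(requests, inboxes):
    # Assumption: simple round-robin; a real dispatcher would weigh location and load.
    for i, ride in enumerate(requests):
        await inboxes[i % len(inboxes)].put(ride)
    for inbox in inboxes:
        await inbox.put(None)


async def simulate(num_agents, num_rides):
    inboxes = [asyncio.Queue() for _ in range(num_agents)]
    completed = []
    agents = [
        asyncio.create_task(taxi_agent(f"taxi-{i}", inboxes[i], completed))
        for i in range(num_agents)
    ]
    requests = [RideRequest(i, f"stop-{i}") for i in range(num_rides)]
    await dispatcher(requests, inboxes)
    await asyncio.gather(*agents)
    return completed
```

Running `asyncio.run(simulate(3, 10))` returns one `(agent name, ride id)` pair per completed ride. Swapping the in-memory queues for a message broker is the usual next step when agents live in separate processes.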
Too Big to Think
Josh Barron presented "Too Big to Think," a condensed ICML-inspired demo on capacity, memorization, and generalization in pre-trained transformers. The approach trains small, capacity-limited transformers from scratch on two synthetic tasks to isolate generalization from memorization, showing that tiny models can extrapolate where bigger ones memorize. The work is backed by open code in a GitHub repo and a linked arXiv paper, and it is relevant to edge compute for robotics and on-device AI. Takeaway: model capacity shapes what a model learns, which matters for practical deployment.
Fantasy Football MCP Bot
Brian Matzelle from SageSure presented "How to lose in fantasy football," an MCP app that lets Claude autonomously manage ESPN Fantasy teams through natural conversation. The demo uses a four-component stack (browser extension, Next.js client, FastMCP server, ESPN APIs) and a dynamic memory system with real-time monitoring and 36 tools for rosters, trades, and waivers. It highlights 77% faster responses via context injection, plus safe API writes with smart tool discovery, a pattern that survey feedback liked. It is a blueprint for end-to-end MCP apps.
How to Ship Complex Features 10x Faster with AI Agents | Dex Horthy (HumanLayer)
How to Run Open-Source LLMs Locally on a Mac with MLX-LM
You are one of 95,000+ readers from Anthropic, OpenAI, Google, Microsoft, Meta, Apple, Amazon, Nvidia, Netflix, Stripe, Databricks, Snowflake, and others — spanning frontier labs, big tech, startups, and top universities.

