On-Device Apple VLM + Windows Speech 🚀 [AI Tinkerers - Post-Training]

Issue #7 · Week of October 20

Joe Heitzeberg • Founder at AI Tinkerers • ⏱️ 1 min read
Creating space for leading builders to share ideas, grow, and make an impact.

We pulled the standouts from the last 2 weeks: lots of agent/tool integration, multimodal work, and RAG builds tied to the broader push (Qwen3‑VL/Qwen3‑Omni and mobile Moondream wins). Highlights: Marko Budisic (Raleigh) showed EVA, a video‑transcript RAG system with precise timestamps; Eugene Yan (Seattle) demoed an LLM‑RecSys hybrid that embeds product IDs into Qwen3‑8B; Leonard Lin put Tokyo on the board with his Strix Halo tests; and Rach Pradhan from Singapore showed the way to sub-4ms code search for AI agents. These are real, vetted builds, so read on.

Top 5 Picks (October 20)
1 TOP PICK

Testing and Benchmarking AMD Strix Halo's (Ryzen 395) AI Capabilities

Leonard Lin

CTO at Shisa.AI

Leonard Lin, CTO of Shisa.AI, showed off the new Framework Desktop, which runs AMD's latest Ryzen AI Max 395 (Strix Halo) APU. The unique thing about this small machine is that it has 128GB of unified memory, a relatively capable GPU (theoretical 60 FP16 TFLOPS) and Vulkan and ROCm support. He showed his work getting PyTorch with AOTriton/FA and vLLM running, as well as llama.cpp benchmark sweeps for models, including large MoEs like the new gpt-oss-120b inferencing locally at >50 tok/s.
2 RUNNER UP

SiftDB: Grep-Native Code Search

Rach Pradhan

Angel Investor at Angel Investor

Rach Pradhan from Menlo.ai presented SiftDB, a grep-native database that enables AI agents to search codebases with sub-4ms latency. It showcases a grep-native indexing approach and a lightweight architecture that accelerates queries and supports tool-call integration into developer workflows. The project is open-source with runnable code and docs, giving builders a practical blueprint for production-ready, agent-centric code search. Survey feedback hinted at strong community interest in tooling that speeds up debugging and knowledge discovery.
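
To make the idea of an index that accelerates grep-style queries concrete, here is a minimal sketch of a trigram index, a common technique for substring search over code. It is not SiftDB's implementation; the `TrigramIndex` class and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical in-memory index: trigram -> lines that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> {(path, line_no)}
        self.lines = {}                   # (path, line_no) -> line text

    def add_file(self, path, content):
        for line_no, line in enumerate(content.splitlines(), start=1):
            key = (path, line_no)
            self.lines[key] = line
            for gram in trigrams(line):
                self.postings[gram].add(key)

    def search(self, needle):
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: no filter possible, scan all lines.
            candidates = set(self.lines)
        else:
            # Only lines containing every trigram of the needle can match.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Confirm each candidate with a real substring check.
        return sorted(k for k in candidates if needle in self.lines[k])


index = TrigramIndex()
index.add_file("app.py", "def load_config():\n    return {}\n")
index.add_file("util.py", "import os\nconfig = load_config()\n")
hits = index.search("load_config")
print(hits)
```

The index narrows the search to a small candidate set before any line is actually scanned, which is where the speedup over a plain grep comes from.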
3 COMMUNITY FAVORITE

Building an Uber for RoboTaxis

Abhimanyu Selvan

Developer Relations Leader | Engineer at heartbyte.io

Abhimanyu 'Chitra' Selvan from DigitalOcean presented What I Learned Building a Ride Share Platform for RoboTaxis, a live demo of hundreds of agents coordinating thousands of rides. The talk highlights an event-driven architecture that keeps agents asynchronous and loosely coupled, with lightweight messaging and a scalable dispatcher. It offers production-minded patterns beyond demos and hints at real-world robo-taxi potential. Takeaway: it shows builders how to scale multi-agent systems in production (audience appreciated the practical blueprint and linked demo resources).
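
As a rough illustration of the event-driven pattern described above, the sketch below uses `asyncio` queues so that a dispatcher and vehicle agents communicate only through messages. The function names, the round-robin assignment, and the `None` shutdown sentinel are assumptions made for this example, not details from the talk.

```python
import asyncio


async def dispatcher(ride_requests, vehicle_inboxes):
    """Route each ride request to a vehicle inbox, round-robin (assumed policy)."""
    i = 0
    while True:
        ride = await ride_requests.get()
        if ride is None:  # sentinel: tell every agent to stop, then exit
            for inbox in vehicle_inboxes:
                await inbox.put(None)
            return
        await vehicle_inboxes[i % len(vehicle_inboxes)].put(ride)
        i += 1


async def vehicle_agent(name, inbox, completed):
    """Consume ride events from an inbox; agents never call each other directly."""
    while True:
        ride = await inbox.get()
        if ride is None:
            return
        await asyncio.sleep(0)  # stand-in for the actual trip
        completed.append((name, ride))


async def simulate(num_vehicles, num_rides):
    ride_requests = asyncio.Queue()
    inboxes = [asyncio.Queue() for _ in range(num_vehicles)]
    completed = []
    agents = [
        asyncio.create_task(vehicle_agent(f"taxi-{n}", inboxes[n], completed))
        for n in range(num_vehicles)
    ]
    dispatch = asyncio.create_task(dispatcher(ride_requests, inboxes))
    for ride_id in range(num_rides):
        await ride_requests.put(ride_id)
    await ride_requests.put(None)
    await asyncio.gather(dispatch, *agents)
    return completed


completed = asyncio.run(simulate(num_vehicles=3, num_rides=10))
print(len(completed), "rides completed")
```

Because every component only reads from and writes to queues, agents can be added or replaced without touching the dispatcher, which is the loose coupling the talk emphasizes.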
4 STANDOUT

Too Big to Think

Josh Barron

Applied Scientist at Amazon

Josh Barron presented Too Big to Think, a condensed ICML-inspired demo on capacity, memorization, and generalization in pre-trained transformers. The approach trains small, capacity-limited transformers from scratch on two synthetic tasks to isolate generalization from memorization, showing that tiny models can extrapolate where bigger ones memorize. The work is backed by open code in a GitHub repo and a linked arXiv paper. It highlights edge compute relevance for robotics and on-device AI. Takeaway: model capacity shapes learning for practical deployment.
5 NOTABLE

Fantasy Football MCP Bot

Brian Matzelle

Software Engineer at SageSure

Brian Matzelle from SageSure presented How to lose in fantasy football, an MCP app that lets Claude autonomously manage ESPN Fantasy teams through natural conversation. The demo uses a four-component stack (browser extension, Next.js client, FastMCP server, ESPN APIs) and a dynamic memory system with real-time monitoring and 36 tools for rosters, trades, and waivers. It highlights 77% faster responses via context injection and safe API writes with smart tool discovery, a pattern survey feedback liked. A blueprint for end-to-end MCP apps.

More Great Builds
Quick hits from the community — demos worth bookmarking:
David Cournapeau • AI Tinkerers - Tokyo • Oct 10
David Cournapeau demonstrated Test2Synth, a hands-on demo of controlling a hardware synth from text. It shows an LLM-to-MIDI pipeline that translates natural language into MIDI commands via Python. Depending on gear, the setup runs as video or live hardware, mapping prompts to sound parameters. The project earned a nomination note for pairing a powerful model with a MIDI synthesizer. This hands-on approach shows how accessible pipelines turn devices into programmable instruments for builders.
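
For a sense of what the last step of such a pipeline might look like, here is a minimal sketch that turns a model's JSON output into raw MIDI Control Change messages. The JSON shape, the `CC_MAP` table, and the function names are assumptions for illustration; the controller numbers a real synth responds to depend on the gear, and this is not the code from the demo.

```python
import json

# Hypothetical parameter-to-controller map; real assignments depend on the synth.
CC_MAP = {"cutoff": 74, "resonance": 71, "volume": 7}


def control_change(channel, controller, value):
    """Build a 3-byte MIDI Control Change message (status 0xB0 + channel)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= controller <= 127 or not 0 <= value <= 127:
        raise ValueError("controller and value must be 0-127")
    return bytes([0xB0 | channel, controller, value])


def params_to_midi(llm_json, channel=0):
    """Convert assumed LLM output like {"cutoff": 0.8} into MIDI messages."""
    params = json.loads(llm_json)
    messages = []
    for name, level in params.items():
        if name not in CC_MAP:
            continue  # ignore parameters the synth map does not know
        clamped = min(max(float(level), 0.0), 1.0)
        messages.append(control_change(channel, CC_MAP[name], round(clamped * 127)))
    return messages


messages = params_to_midi('{"cutoff": 1.0, "resonance": 0.0, "wobble": 0.5}')
print([m.hex() for m in messages])
```

Sending the resulting bytes to hardware would need a MIDI output library or driver, which is left out here.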
Changyu Hu from Japan Communication Inc presented a browser-based DIY Wedding Translator that delivers real-time captions in English, Mandarin, and Japanese at a Tokyo wedding. The demo features a heartbeat timer, a lightweight operator dashboard, and safety checks; it slices audio on a steady beat and streams subtitles to a shared screen. This hands-on approach highlights practical tricks for speech tools and hints at a privacy-conscious, low-cost live-translation product for events.
Ron Jailall, an ML engineer based in Raleigh, NC, presented Edge AI: exploring the capabilities of Apple’s VLM. The demo shows a quantized fine-tuned Apple VLM for iOS/macOS that runs on Apple Silicon, probing Q&A quality and multilingual visual/text support. It also examines prompt responsiveness and cross-hardware resource use. Survey feedback suggests strong interest from the audience. For builders, it shows on-device, multimodal inference is ready for practical experiments.
Osman Ramadan presented When Context Becomes Memory: Building a Self-Improving Agentic System, a CodeWords demo that turns chat interactions into continuous agentic learning. Built with Python and Claude, it orchestrates modular microservices through an MCP-based workflow, letting an agent read its own code, detect recurring failures, propose a pull request, and validate fixes in real time. The approach shows self-reflective reasoning and practical runtime context engineering. Builders responded with interest, hinting at autonomous workflow automation.
Raj Bala, a founder and former AWS/Google/Gartner analyst, presented Google Maps for Vector Embeddings. The demo visualizes high-dimensional embeddings by projecting them to 2D with UMAP, creating a zoomable, interactive map of semantic space. It emphasizes practical visualization for debugging and comparing embeddings, with a public demo and open resources for quick experimentation. Audience feedback suggested appreciation for how tangible and accessible it is, and the approach could be a helpful layer for RAG tooling in production.
Marko Budisic from Framatome presented EVA - Enhanced Video Archive, a system that turns legacy video into a searchable knowledge base by indexing transcripts, descriptions, and keyframes and using RAG to answer queries. It returns a report with linked screenshots and exact timestamps, plus an in-line video player and screenshot tooling. Marko, a PhD mechanical engineer and robotics/AI tech lead at Framatome, brings hands-on prototyping and a developing OCR pipeline.
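
The retrieval step that returns exact timestamps can be sketched as follows. This is a simplified stand-in, assuming transcript segments stored with a start time in seconds and using plain keyword overlap instead of the embedding search a real RAG system would likely use. The segment format and function names are hypothetical and not taken from EVA.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def retrieve(query, segments, top_k=2):
    """Return (timestamp, text) for the segments sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for seg in segments:
        overlap = len(query_words & set(seg["text"].lower().split()))
        if overlap:
            scored.append((overlap, seg))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps transcript order on ties
    return [(format_timestamp(seg["start"]), seg["text"]) for _, seg in scored[:top_k]]


# Made-up transcript segments for illustration only.
segments = [
    {"start": 12.0, "text": "welcome to the maintenance overview"},
    {"start": 754.5, "text": "inspect the pump seal before restarting the pump"},
    {"start": 3725.0, "text": "final checklist and sign off"},
]
results = retrieve("pump seal inspection", segments)
print(results)
```

Keeping the start time attached to every indexed segment is what lets the answer link straight back to the right moment in the video.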
Shekhar Upadhaya • AI Tinkerers - Tokyo • Oct 10
Shekhar Upadhaya, a co-founder of Skyhost, presented MapScroll: Prompt to Maps within minutes, a copilot for maps that lets creators build and share narrative maps. MapScroll uses Mapbox GL for AI-driven, customized geospatial data visualization and showcases an LLM-to-map pipeline driving interactive narrative generation, with a public demo and code for reproducibility. The talk highlights new experimental features and practical use cases for educators and explorers, hinting at scalable location-aware storytelling. It shows end-to-end demos can become production tools.
Dan Moore, founder of Tarka, presented AutoLearn: What happens when we make Agents deterministic. He showed how agent reasoning loops move off the transformer into deterministic code. The project crystallizes reasoning into persistent, executable Python skills via a FastAPI MCP server, with the code on GitHub at https://github.com/tarkaai/autolearn and the MCP server at https://www.agentr9y.com/. Survey feedback suggests people appreciate reproducibility and testability. For builders, it hints at a production-ready path for auditable autonomous agents.
Ash Tewari from Applied Information Sciences showed SPCHR, an AI helper that adds speech input to Windows apps. It uses lightweight Windows API hooks to transcribe speech locally or via Azure/Whisper and feed the text into apps in real time. The project is open source (GitHub) with a walkthrough, and survey feedback notes its practicality. The on-device option preserves privacy, and the approach hints at broader enterprise use without refactoring software. For builders, it shows a concrete path to voice-first workflows.
Eugene Yan from Amazon presented How to Train an LLM-RecSys Hybrid for Steerable Recs, showing a finetuned Qwen3-8B that understands product IDs. He demonstrated a bilingual model that treats items as vocabulary and steers recommendations via chat. The project runs end-to-end in a single model and maps items into token space with RQ-VAE. The finetuned model emits structured product IDs alongside natural language, with open-source code at eugeneyan.com and GitHub. Survey feedback hinted at interest, and the approach points toward practical, steerable LLM recommenders. Takeaway for builders: a practical blueprint for production-ready, end-to-end LLM-powered recommendations.
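
To illustrate how an item can become a short sequence of tokens, here is a minimal sketch of residual quantization, the quantization step at the heart of an RQ-VAE. In a real RQ-VAE the codebooks are learned jointly with an encoder and decoder. Here they are small fixed tables, and the `<item_L{level}_{index}>` token format is a hypothetical naming choice, not the one used in the talk.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def residual_quantize(vec, codebooks):
    """Quantize vec level by level, each level encoding what the previous one missed."""
    codes, residual = [], list(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def to_tokens(codes):
    """Map per-level code indices to hypothetical vocabulary tokens."""
    return [f"<item_L{level}_{idx}>" for level, idx in enumerate(codes)]


# Fixed toy codebooks (assumed values); a trained model would learn these.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)],   # coarse level
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25)],  # fine level
]
codes, residual = residual_quantize((1.25, 1.0), CODEBOOKS)
tokens = to_tokens(codes)
print(codes, tokens)
```

Similar items share their coarse-level code and differ only at finer levels, which is what makes these IDs behave like a vocabulary a language model can learn.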

🏆 Hackathon Spotlight
Recent AI Tinkerers Hackathon Winners
🥇 1st /shipit
ShipIt brings Cursor-style autocomplete and ADK-powered automation to Figma—using Gemini 2.5‑Flash via Vertex AI to provide real-time component suggestions, AI-driven critiques, and human-in-the-loop acceptance, cutting landing page time 3×.
🥈 2nd Actual Code
ActualCode built a seven-agent A2A system on Vertex AI (Gemini Pro and Flash) that converts any GitHub repository into validated, repo-specific coding assessments in 2–3 minutes—a first-of-its-kind A2A hackathon demo that earned 2nd place.
🥉 3rd V.I.S.I.O.N
V.I.S.I.O.N. leverages the Google ADK and Vertex AI to transform YouTube tutorials into functional code within local IDEs through multimodal analysis of visual and audio context. This collaborative effort features engineers and researchers from Shopify, Walmart, and GHY International with deep expertise in production ML and autonomous multi-agent pipelines.
🥇 1st Discovery
Discovery built a real‑time, camera-based AI travel companion that self-hosts a vision model on Google Cloud Run's L4 GPUs and pipelines Gemma/Magistral, Gemini 2.5, and ElevenLabs TTS into scalable, interactive audio guides.
SoccerVision built a scalable Cloud Run pipeline combining YOLO-based player/ball detection, object tracking and a multimodal LLM (Gemini) to generate structured event JSON and annotated, subtitle-overlaid match videos—turning raw clips into real-time football analysis.

🎬 Latest Content

How to Ship Complex Features 10x Faster with AI Agents | Dex Horthy (HumanLayer)

One-Shot • Mar 04
Dex Horthy (HumanLayer) breaks down the “12 Factor Agents” approach to shipping multi-step agentic workflows faster: structured outputs, ...

How to Run Open-Source LLMs Locally on a Mac with MLX-LM

Deep Dive Series • Jun 12
Run open-source LLMs locally on Apple Silicon with Apple’s MLX-LM: `pip install mlx-lm`, then `load()` a Hugging Face model and call `gen...

💼 Top Job Matches
Matched based on your meetup activity and profile
Paxos Health • New York & Toronto • $110k - $175k (varies w/ location/level); generous equity
Stanford-founded Seed-stage healthcare AI startup with >$5M in VC funding and AI agents deployed in production with cu...
Dex • London (5 days on-site) • £250,000
Frontier AI engineering role building the AI tooling layer for complex financial modelling.
Jakib AI • Columbus, OH
Jakib is a profitable, growing applied AI firm embedded with operator-led companies in logistics, manufacturing, and c...

You are one of 95,000+ readers from Anthropic, OpenAI, Google, Microsoft, Meta, Apple, Amazon, Nvidia, Netflix, Stripe, Databricks, Snowflake, and others — spanning frontier labs, big tech, startups, and top universities.
