On-Device Apple VLM + Windows Speech 🚀

Joe Heitzeberg • Founder at AI Tinkerers • ⏱️ 1 min read

Creating space for leading builders to share ideas, grow, and make an impact.

We pulled the standouts from the last two weeks: lots of agent/tool integration, multimodal work, and RAG builds tied to the broader push (Qwen3‑VL/Qwen3‑Omni and mobile Moondream wins). Highlights: Marko Budisic (Raleigh) showed EVA, a video‑transcript RAG system with precise timestamps; Eugene Yan (Seattle) demoed an LLM‑RecSys hybrid that embeds product IDs into Qwen3‑8B; Leonard Lin put Tokyo on the board with his Strix Halo tests; and Rach Pradhan (Singapore) showed the way to sub-4ms code search for AI agents. These are real, vetted builds. Read on.

Top 5 Picks (October 20)

1 TOP PICK

Testing and Benchmarking AMD Strix Halo's (Ryzen 395) AI Capabilities

Leonard Lin

CTO at Shisa.AI

📍 AI Tinkerers - Tokyo • Oct 10

Leonard Lin, CTO of Shisa.AI, showed off the new Framework Desktop, which runs AMD's latest Ryzen AI Max+ 395 (Strix Halo) APU. What sets this small machine apart is its 128GB of unified memory, a relatively capable GPU (a theoretical 60 FP16 TFLOPS), and support for both Vulkan and ROCm. He walked through getting PyTorch with AOTriton/FA and vLLM running, and shared llama.cpp benchmark sweeps across models, including large MoEs such as the new gpt-oss-120b running locally at >50 tok/s.

2 RUNNER UP

SiftDB: Grep-Native Code Search

Rach Pradhan

Angel Investor at Angel Investor

📍 AI Tinkerers - Singapore • Oct 07

Rach Pradhan from Menlo.ai presented SiftDB, a grep-native database that lets AI agents search codebases with sub-4ms latency. It pairs a grep-native indexing approach with a lightweight architecture to speed up queries, and it supports tool-call integration into developer workflows. The project is open source, with runnable code and docs, giving builders a practical blueprint for production-ready, agent-centric code search. Survey feedback hinted at strong community interest in tooling that speeds up debugging and knowledge discovery.
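To make the idea of index-backed, grep-style search concrete, here is a minimal sketch using a trigram inverted index. This is a well-known technique swapped in for illustration; the blurb does not say how SiftDB is built, and the class and method names below are hypothetical.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return the set of 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index: trigram -> set of (path, line_number). Hypothetical design."""

    def __init__(self) -> None:
        self.postings: dict[str, set[tuple[str, int]]] = defaultdict(set)
        self.lines: dict[tuple[str, int], str] = {}

    def add_file(self, path: str, content: str) -> None:
        for number, line in enumerate(content.splitlines(), start=1):
            key = (path, number)
            self.lines[key] = line
            for gram in trigrams(line):
                self.postings[gram].add(key)

    def search(self, needle: str) -> list[tuple[str, int, str]]:
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: fall back to checking every line.
            candidates = set(self.lines)
        else:
            # Only lines containing every trigram can contain the needle.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Confirm each candidate with a real substring check, as grep would.
        return sorted(
            (path, number, self.lines[(path, number)])
            for path, number in candidates
            if needle in self.lines[(path, number)]
        )


index = TrigramIndex()
index.add_file("app.py", "def load_config():\n    return {}\n")
index.add_file("util.py", "x = load_config()\n")
hits = index.search("load_config")
```

The index narrows the search to a few candidate lines before the exact match runs, which is the general reason indexed search can answer faster than scanning every file.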

3 COMMUNITY FAVORITE

Building an Uber for RoboTaxis

Abhimanyu Selvan

Developer Relations Leader | Engineer at heartbyte.io

📍 AI Tinkerers - Amsterdam • Oct 10

Abhimanyu 'Chitra' Selvan from DigitalOcean presented What I Learned Building a Ride Share Platform for RoboTaxis, a live demo of hundreds of agents coordinating thousands of rides. The talk highlights an event-driven architecture that keeps agents asynchronous and loosely coupled, with lightweight messaging and a scalable dispatcher. It offers production-minded patterns beyond demos and hints at real-world robo-taxi potential. Takeaway: it shows builders how to scale multi-agent systems in production (audience appreciated the practical blueprint and linked demo resources).
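The event-driven pattern described above can be sketched with a shared queue: riders publish requests and return immediately, while taxi agents consume events at their own pace. This is a minimal illustration under assumed names (RideRequested, rider, taxi, simulate), not the architecture from the talk.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RideRequested:
    ride_id: int
    pickup: str


async def rider(queue: asyncio.Queue, ride_id: int, pickup: str) -> None:
    """Publish a ride request and return without waiting for a taxi."""
    await queue.put(RideRequested(ride_id, pickup))


async def taxi(name: str, queue: asyncio.Queue, completed: list) -> None:
    """Pull events off the shared queue until cancelled."""
    while True:
        event = await queue.get()
        await asyncio.sleep(0)  # stand-in for driving; yields to other agents
        completed.append((event.ride_id, name))
        queue.task_done()


async def simulate(num_taxis: int, num_rides: int) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    completed: list = []
    taxis = [
        asyncio.create_task(taxi(f"taxi-{i}", queue, completed))
        for i in range(num_taxis)
    ]
    await asyncio.gather(*(rider(queue, i, f"stop-{i}") for i in range(num_rides)))
    await queue.join()  # wait until every request has been handled
    for task in taxis:
        task.cancel()  # shut down the now-idle workers
    await asyncio.gather(*taxis, return_exceptions=True)
    return completed


completed = asyncio.run(simulate(num_taxis=3, num_rides=20))
```

Riders and taxis never reference each other directly; the queue is the only coupling, so either side can be scaled independently.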

4 STANDOUT

Too Big to Think

Josh Barron

Applied Scientist at Amazon

📍 AI Tinkerers - Austin • Oct 09

Josh Barron presented Too Big to Think, a condensed ICML-inspired demo on capacity, memorization, and generalization in pre-trained transformers. The approach trains small, capacity-limited transformers from scratch on two synthetic tasks to isolate generalization from memorization, showing that tiny models can extrapolate where bigger ones memorize. The work is backed by open code in a GitHub repo and a linked arXiv paper. It highlights edge compute relevance for robotics and on-device AI. Takeaway: model capacity shapes learning for practical deployment.

5 NOTABLE

Fantasy Football MCP Bot

Brian Matzelle

Software Engineer at SageSure

📍 AI Tinkerers - New York City • Oct 02

Brian Matzelle from SageSure presented How to lose in fantasy football, an MCP app that lets Claude autonomously manage ESPN Fantasy teams through natural conversation. The demo uses a four-component stack (browser extension, Next.js client, FastMCP server, ESPN APIs) and a dynamic memory system with real-time monitoring and 36 tools for rosters, trades, and waivers. It reports 77% faster responses via context injection, plus safe API writes and smart tool discovery, a pattern that survey feedback favored. A blueprint for end-to-end MCP apps.
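One way to picture "safe API writes" alongside tool discovery is a registry that marks each tool as read-only or writing and refuses writes without explicit confirmation. The sketch below is plain Python with hypothetical names (Tool, ToolRegistry, get_roster, drop_player); it does not use FastMCP or the ESPN APIs and is not the project's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[..., str]
    writes: bool = False  # True when the tool would modify league state


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def discover(self, query: str) -> list[str]:
        """Return names of tools whose description shares a word with the query."""
        words = set(query.lower().split())
        return sorted(
            tool.name
            for tool in self._tools.values()
            if words & set(tool.description.lower().split())
        )

    def call(self, name: str, confirmed: bool = False, **kwargs) -> str:
        tool = self._tools[name]
        if tool.writes and not confirmed:
            raise PermissionError(f"{name} writes data; explicit confirmation required")
        return tool.func(**kwargs)


registry = ToolRegistry()
# Stub tools: they return strings instead of calling any real API.
registry.register(Tool("get_roster", "list the current roster", lambda: "QB, RB, WR"))
registry.register(
    Tool("drop_player", "drop a player from the team",
         lambda player: f"dropped {player}", writes=True)
)
```

Exposing only the tools returned by discover keeps the prompt small, and the confirmed flag forces a deliberate step before anything is changed.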

TECH STACK

Claude Sonnet 4

Anthropic TypeScript SDK 0.61.0

More Great Builds

Quick hits from the community — demos worth bookmarking:

Test2Synth: text-to-MIDI

David Cournapeau • AI Tinkerers - Tokyo • Oct 10

David Cournapeau demonstrated Test2Synth, a hands-on demo of controlling a hardware synth from text. It shows an LLM-to-MIDI pipeline, written in Python, that translates natural language into MIDI commands. Depending on the gear available, the setup runs as a video or on live hardware, mapping prompts to sound parameters. The project earned a nomination note for pairing a powerful model with a MIDI synthesizer. This hands-on approach shows how accessible pipelines turn devices into programmable instruments for builders.
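The last hop of such a pipeline, turning a structured plan into raw MIDI bytes, might look like the sketch below. The status bytes (0x90 for Note On, 0xB0 for Control Change) come from the MIDI 1.0 specification; the plan format, the function names, and the parameter-to-controller mapping are assumptions for illustration, and a given synth may map controllers differently.

```python
def _check(data1: int, data2: int, channel: int) -> None:
    if not (0 <= data1 <= 127 and 0 <= data2 <= 127):
        raise ValueError("MIDI data bytes must be in 0..127")
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be in 0..15")


def note_on(note: int, velocity: int = 100, channel: int = 0) -> bytes:
    """Build a 3-byte MIDI Note On message."""
    _check(note, velocity, channel)
    return bytes([0x90 | channel, note, velocity])


def control_change(controller: int, value: int, channel: int = 0) -> bytes:
    """Build a 3-byte MIDI Control Change message."""
    _check(controller, value, channel)
    return bytes([0xB0 | channel, controller, value])


# Hypothetical mapping from parameter names a model might emit to CC numbers.
# CC 74 (brightness) and CC 71 (timbre) are common defaults, not guarantees.
PARAM_TO_CC = {"cutoff": 74, "resonance": 71}


def plan_to_midi(plan: list[dict]) -> list[bytes]:
    """Translate a structured plan (e.g. parsed from model JSON output) to MIDI bytes."""
    messages = []
    for step in plan:
        if step["type"] == "note":
            messages.append(note_on(step["note"], step.get("velocity", 100)))
        elif step["type"] == "param":
            # Clamp to 0.0..1.0, then scale to the 7-bit MIDI range.
            value = round(max(0.0, min(1.0, step["value"])) * 127)
            messages.append(control_change(PARAM_TO_CC[step["name"]], value))
        else:
            raise ValueError(f"unknown step type: {step['type']}")
    return messages


messages = plan_to_midi([
    {"type": "param", "name": "cutoff", "value": 1.0},
    {"type": "note", "note": 60},
])
```

Asking the model for a constrained plan and validating every byte in code keeps malformed output from reaching the hardware.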

Wedding Captioner: 3-Lang Live

Changyu Hu • AI Tinkerers - Tokyo • Oct 10

Changyu Hu from Japan Communication Inc presented a browser-based DIY Wedding Translator that delivers real-time captions in English, Mandarin, and Japanese at a Tokyo wedding. The demo features a heartbeat timer, a lightweight operator dashboard, and safety checks; it slices audio on a steady beat and streams subtitles to a shared screen. This hands-on approach highlights practical tricks for speech tools and hints at a privacy-conscious, low-cost live-translation product for events.
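Slicing audio on a steady beat can be sketched as fixed-length chunking, with each chunk tagged by its start time so captions can be ordered on screen. The function names and the transcribe callback are hypothetical stand-ins for whatever speech and translation service the real tool calls.

```python
from typing import Callable, Iterator


def slice_on_beat(
    samples: list[int], sample_rate: int, beat_seconds: float
) -> list[tuple[float, list[int]]]:
    """Split samples into fixed-length chunks, each tagged with its start time."""
    chunk = int(sample_rate * beat_seconds)
    if chunk <= 0:
        raise ValueError("beat must cover at least one sample")
    return [
        (start / sample_rate, samples[start:start + chunk])
        for start in range(0, len(samples), chunk)
    ]


def caption_stream(
    samples: list[int],
    sample_rate: int,
    beat_seconds: float,
    transcribe: Callable[[list[int]], str],
) -> Iterator[tuple[float, str]]:
    """Yield (start_time, caption) for each chunk that produced text."""
    for start, chunk in slice_on_beat(samples, sample_rate, beat_seconds):
        text = transcribe(chunk)  # hypothetical speech-to-text callback
        if text:
            yield start, text


# Tiny fake signal: 10 samples at 4 samples per second, sliced every second.
slices = slice_on_beat(list(range(10)), sample_rate=4, beat_seconds=1.0)
```

A fixed beat trades some accuracy at word boundaries for predictable latency, which matters when subtitles must keep pace with a live speaker.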

live-translate.app

Apple VLM: on-device test

Ron Jailall • AI Tinkerers - Raleigh • Sep 30

Ron Jailall, an ML engineer based in Raleigh, NC, presented Edge AI: exploring the capabilities of Apple’s VLM. The demo shows a quantized fine-tuned Apple VLM for iOS/macOS that runs on Apple Silicon, probing Q&A quality and multilingual visual/text support. It also examines prompt responsiveness and cross-hardware resource use. Survey feedback suggests strong interest from the audience. For builders, it shows on-device, multimodal inference is ready for practical experiments.

CodeWords Self-Improving Agent

Osman Ramadan • AI Tinkerers - San Francisco • Oct 09

Osman Ramadan presented When Context Becomes Memory: Building a Self-Improving Agentic System, a CodeWords demo that turns chat interactions into continuous agentic learning. It is built with Python and Claude, and it orchestrates modular microservices through an MCP-based workflow, letting an agent read its own code, detect recurring failures, propose a pull request, and validate fixes in real time. The approach shows self-reflective reasoning and practical runtime context engineering. Builders responded with interest, hinting at autonomous workflow automation.
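The "detect recurring failures" step can be illustrated by grouping error messages under a normalized signature and counting repeats. This is a simple heuristic with hypothetical function names, offered as a sketch of the idea rather than the CodeWords implementation.

```python
import re
from collections import Counter


def signature(message: str) -> str:
    """Collapse volatile details so similar failures share one signature."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    message = re.sub(r"\d+", "<n>", message)
    return message.strip().lower()


def recurring_failures(messages: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (signature, count) pairs seen at least min_count times, most common first."""
    counts = Counter(signature(m) for m in messages)
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]


logs = [
    "Timeout after 30s calling job 41",
    "Timeout after 45s calling job 97",
    "KeyError: 'user'",
]
repeated = recurring_failures(logs)
```

A signature that crosses the threshold is a reasonable trigger for the agent to inspect the related code and draft a fix.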

codewords.ai

Video

ArXiv Paper

UMAP Embedding Explorer

Raj Bala • AI Tinkerers - Boston • Sep 29

Raj Bala, a founder and former AWS/Google/Gartner analyst, presented Google Maps for Vector Embeddings. The demo visualizes high-dimensional embeddings by projecting them to 2D with UMAP, creating a zoomable, interactive map of semantic space. It emphasizes practical visualization for debugging and comparing embeddings, with a public demo and open resources for quick experimentation. Audience feedback pointed to its accessibility as a strength, and the approach could be a helpful layer for RAG tooling in production.

milliondollarvectors.com

ragwalla.com

EVA: Searchable Video Archive

Marko Budisic • AI Tinkerers - Raleigh • Sep 30

Marko Budisic from Framatome presented EVA - Enhanced Video Archive, a system that turns legacy video into a searchable knowledge base by indexing transcripts, descriptions, and keyframes and using RAG to answer queries. It returns a report with linked screenshots and exact timestamps, plus an in-line video player and screenshot tooling. Marko, a PhD mechanical engineer and robotics/AI tech lead at Framatome, brings hands-on prototyping and a developing OCR pipeline.
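The retrieval half of a timestamped video archive can be sketched as follows: transcript segments carry their start time, a query ranks segments, and the hits are reported with formatted timestamps. Word overlap stands in for embedding search here, and all names (Segment, retrieve, timestamp) are hypothetical rather than taken from EVA.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    video: str
    start: float  # seconds from the beginning of the video
    text: str


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, segments: list[Segment], k: int = 3) -> list[Segment]:
    """Rank segments by word overlap with the query (stand-in for vector search)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(s.text)), s) for s in segments]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps original order on ties
    return [s for _, s in scored[:k]]


def timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


segments = [
    Segment("tour.mp4", 12.0, "Welcome to the plant tour"),
    Segment("tour.mp4", 95.5, "The valve inspection procedure starts here"),
    Segment("tour.mp4", 240.0, "Close the valve after inspection"),
]
hits = retrieve("valve inspection procedure", segments)
report = [f"{s.video} @ {timestamp(s.start)}: {s.text}" for s in hits]
```

Keeping the start time on every indexed segment is what lets a generated answer link back to the exact moment in the source video.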

github.com

huggingface.co

Video

MapScroll: Prompt-to-Map Stories

Shekhar Upadhaya • AI Tinkerers - Tokyo • Oct 10

Shekhar Upadhaya, a co-founder of Skyhost, presented MapScroll: Prompt to Maps within minutes, a copilot for maps that lets creators build and share narrative maps. MapScroll uses Mapbox GL for AI-driven, customized geospatial data visualization and showcases an LLM-to-map pipeline that drives interactive narrative generation, with a public demo and code for reproducibility. The talk highlights new experimental features and practical use cases for educators and explorers, hinting at scalable location-aware storytelling. It shows that end-to-end demos can become production tools.

mapscroll.ai

AutoLearn: Deterministic Agents

Dan Moore • AI Tinkerers - Seattle • Oct 06

Dan Moore, founder of Tarka, presented AutoLearn: What happens when we make Agents deterministic. He showed how agent reasoning loops can move off the transformer and into deterministic code. The project crystallizes reasoning into persistent, executable Python skills via a FastAPI MCP server; the code is on GitHub at https://github.com/tarkaai/autolearn and the MCP server is at https://www.agentr9y.com/. Survey feedback suggests people appreciate the reproducibility and testability. For builders, it hints at a production-ready path for auditable autonomous agents.
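The idea of persisting a solved task as an executable skill can be sketched with a small store that saves function source to disk and reruns it later without a model call. SkillStore and its methods are hypothetical names for illustration; this is not AutoLearn's API, and it omits the FastAPI and MCP layers entirely.

```python
import json
import os
import tempfile
from pathlib import Path


class SkillStore:
    """Persist named Python functions as source text and run them on demand."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.skills: dict[str, str] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def save(self, name: str, source: str) -> None:
        compile(source, f"<skill {name}>", "exec")  # reject source that does not parse
        self.skills[name] = source
        self.path.write_text(json.dumps(self.skills, indent=2))

    def run(self, name: str, *args):
        namespace: dict = {}
        # exec runs arbitrary code: only load skills that have been reviewed.
        exec(self.skills[name], namespace)
        return namespace[name](*args)


# The source string stands in for code a model produced once for a task.
store_path = os.path.join(tempfile.mkdtemp(), "skills.json")
store = SkillStore(store_path)
store.save("to_celsius", "def to_celsius(f):\n    return (f - 32) * 5 / 9\n")
result = SkillStore(store_path).run("to_celsius", 212)  # reloaded from disk
```

Once a skill is stored, the same input always yields the same output, and the skill can be unit tested like any other function.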

github.com

autolearn.dev

Video

AI Speech Inject for Windows

Ash Tewari • AI Tinkerers - Raleigh • Sep 30

Ash Tewari of Applied Information Sciences showed SPCHR, an AI helper that adds speech input to Windows apps. It uses lightweight Windows API hooks to transcribe speech, either locally or via Azure/Whisper, and feeds the text into apps in real time. The project is open source (GitHub) with a walkthrough, and survey feedback notes its practicality. Run locally, it keeps audio on the device for privacy, and it hints at broader enterprise use without refactoring software. For builders, it shows a concrete path to voice-first workflows.

github.com

tewari.info

Qwen3-RecSys Steerable Recs

Eugene Yan • AI Tinkerers - Seattle • Sep 30

Eugene Yan from Amazon presented How to Train an LLM-RecSys Hybrid for Steerable Recs, showing a finetuned Qwen3-8B that understands product IDs. He demonstrated a bilingual model that treats items as vocabulary and steers recommendations via chat. The project runs end-to-end in a single model and maps items into token space with RQ-VAE. Finetuning emits structured product IDs alongside natural language, with open-source code at eugeneyan.com and GitHub. Survey feedback hinted at interest, and the approach points toward practical, steerable LLM recommenders. Takeaway for builders: a practical blueprint for production-ready, end-to-end LLM-powered recommendations.
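The quantization step behind mapping an item to a short sequence of tokens can be sketched with residual quantization: each level picks the nearest codebook entry, subtracts it, and passes the remainder to the next level. An RQ-VAE learns its encoder and codebooks during training; the fixed toy codebooks, function names, and token format below are assumptions for illustration only.

```python
def nearest(vector: list[float], codebook: list[list[float]]) -> int:
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def residual_quantize(vector: list[float], codebooks: list[list[list[float]]]) -> list[int]:
    """Return one code per level; each level quantizes what the previous level left over."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def to_tokens(codes: list[int]) -> list[str]:
    # Hypothetical token format; a real vocabulary would define its own.
    return [f"<sid_{level}_{code}>" for level, code in enumerate(codes)]


# Toy codebooks: level 0 captures the coarse position, level 1 the remainder.
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[0.0, 0.0], [1.0, -1.0]],
]
item_embedding = [11.0, 9.0]
tokens = to_tokens(residual_quantize(item_embedding, codebooks))
```

Because similar items share their early codes, the resulting tokens form a coarse-to-fine ID that a language model can learn to read and emit like ordinary vocabulary.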

eugeneyan.com

github.com

Video

🏆 Hackathon Spotlight

Recent AI Tinkerers Hackathon Winners

🗓️ AI Tinkerers & Google Cloud: Agents Hackathon Toronto
Toronto • Sep 30, 2025

🥇 1st /shipit

ShipIt brings Cursor-style autocomplete and ADK-powered automation to Figma—using Gemini 2.5‑Flash via Vertex AI to provide real-time component suggestions, AI-driven critiques, and human-in-the-loop acceptance, cutting landing page time 3×.

🥈 2nd Actual Code

ActualCode built a seven-agent A2A system on Vertex AI (Gemini Pro and Flash) that converts any GitHub repository into validated, repo-specific coding assessments in 2–3 minutes—a first-of-its-kind A2A hackathon demo that earned 2nd place.

🥉 3rd V.I.S.I.O.N

V.I.S.I.O.N. leverages the Google ADK and Vertex AI to transform YouTube tutorials into functional code within local IDEs through multimodal analysis of visual and audio context. This collaborative effort features engineers and researchers from Shopify, Walmart, and GHY International with deep expertise in production ML and autonomous multi-agent pipelines.

🗓️ AI Tinkerers Paris Hackathon – October 11, 2025
Paris • Oct 11, 2025

🥇 1st Discovery

Discovery built a real‑time, camera-based AI travel companion that self-hosts a vision model on Google Cloud Run's L4 GPUs and pipelines Gemma/Magistral, Gemini 2.5, and ElevenLabs TTS into scalable, interactive audio guides.

🥈 2nd SoccerVision by PeterDrury

SoccerVision built a scalable Cloud Run pipeline combining YOLO-based player/ball detection, object tracking and a multimodal LLM (Gemini) to generate structured event JSON and annotated, subtitle-overlaid match videos—turning raw clips into real-time football analysis.

🎬 Latest Content

How to Ship Complex Features 10x Faster with AI Agents | Dex Horthy (HumanLayer)

One-Shot • Mar 04

Dex Horthy (HumanLayer) breaks down the “12 Factor Agents” approach to shipping multi-step agentic workflows faster: structured outputs, ...

Watch Now →

How to Run Open-Source LLMs Locally on a Mac with MLX-LM

Deep Dive Series • Jun 12

Run open-source LLMs locally on Apple Silicon with Apple’s MLX-LM: `pip install mlx-lm`, then `load()` a Hugging Face model and call `gen...

💼 Top Job Matches

Matched based on your meetup activity and profile

Founding Applied AI Lead

Paxos Health • New York & Toronto • $110k - $175k (varies w/ location/level); generous equity

Stanford-founded Seed-stage healthcare AI startup with >$5M in VC funding and AI agents deployed in production with cu...

Apply Now →

AI Engineer: Build Frontier Agents Reinventing Financial Modelling

Dex • London (5 days on-site) • £250,000

Frontier AI engineering role building the AI tooling layer for complex financial modelling.

Apply Now →

Software Engineer (All Levels) — Applied AI

Jakib AI • Columbus, OH

Jakib is a profitable, growing applied AI firm embedded with operator-led companies in logistics, manufacturing, and c...

Apply Now →


You are one of 95,000+ readers from Anthropic, OpenAI, Google, Microsoft, Meta, Apple, Amazon, Nvidia, Netflix, Stripe, Databricks, Snowflake, and others — spanning frontier labs, big tech, startups, and top universities.
