We also saw builders pushing AI deeper into developer workflows. Asha Somayajula’s git-cli: Ollama Git Command Generator lets you generate Git commands with natural language, and Matthew Mirman shared techniques for Coding on the Phone.
Finally, several demos took fresh approaches to data access and local processing. Mike Schirtzinger’s Silent Notetaker: local browser transcription keeps audio on-device, and Boris demoed LinkUp for real-time web access.
Matthew Mirman, CEO of chat.dev, presented Hacking on the Phone, showing how coding over mobile can feel surprisingly fast. The demo uses chat.dev to run sandboxed, persistent Codex and Claude agent sessions on cloud-hosted Linux VMs, so the workflow keeps context while you iterate from a phone. Builders liked the live WIP feel and the hands-on clarity of how the loop works. It directly tackles the everyday pain of desktop-only tooling, hinting at a practical path for “agentic dev” as a portable product.
Mike Schirtzinger presented Silent Notetaker, a meeting notetaker that transcribes, labels speakers, and extracts decisions, action items, and open questions entirely in the browser with a single HTML file, no backend or sign-in. It runs the heavy transcription model on GPU via WebGPU while speaker and question models run on CPU with multithreaded WebAssembly, avoiding GPU contention and the “invisible memory leak” issues that show up in long-running streams. People seemed especially excited about the near feature parity with tools they already use, plus the open-source code they could tweak. It felt timely for the shift toward efficient local, privacy-first agent UX, and it could become a practical product for teams that want meeting memory without cloud costs.
Taylor Johnson presented SearchBench, a harness that runs controlled evaluation rounds over coding-agent search behavior, using a bounded ablation workflow to test whether agents locate the exact files human fixes changed. SearchBench compares incumbent strategies against challengers, emitting an evidence bundle with exact-hit, hop-distance, token usage, and failure artifacts, while estimating cost before provider calls. We liked how the “separable knobs” framing made search policy feel debuggable and budget-safe, and people seemed to enjoy the practical results. It matched today’s agent evaluation push toward evidence over vibes.
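To make the exact-hit and hop-distance idea concrete, here is a minimal sketch of how one run could be scored. It is an illustration, not SearchBench's actual code: the adjacency-dict `graph` (for example, files linked by imports), the function names, and the definition of a hop as one edge in that graph are all assumptions.

```python
from collections import deque


def hop_distance(graph, start, targets):
    """Fewest edges from start to any target file; None if unreachable."""
    targets = set(targets)
    if start in targets:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in seen:
                continue
            if neighbor in targets:
                return dist + 1
            seen.add(neighbor)
            queue.append((neighbor, dist + 1))
    return None


def score_round(graph, predicted, changed):
    """Score the files an agent surfaced against the files a human fix changed."""
    changed = set(changed)
    exact_hit = bool(changed & set(predicted))
    distances = [
        d for d in (hop_distance(graph, p, changed) for p in predicted)
        if d is not None
    ]
    return {
        "exact_hit": exact_hit,
        "min_hops": min(distances) if distances else None,
    }
```

A near miss (right neighborhood, wrong file) then shows up as a small `min_hops` instead of a flat failure, which is the kind of signal that makes a search policy easier to debug.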
Sebastian Muriel from PostHog presented how he enables customer-facing agents to do their best work by turning PostHog event data into queryable signals the agent can reason over. The demo runs an interactive terminal flow where Claude debugs a test GTM question, pulling from mechanics (product source code), engagement telemetry, and instance state, then explaining each contribution. Keeping those context sources separate made the reasoning auditable, and people seemed to enjoy that clarity. It felt especially timely for today’s agent enablement push, and the open-source PostHog/gtm-toolkit gives builders a practical starting point for agentized customer ops.
Travis Johnson showed an autonomous development bake-off where the same long-horizon health-dashboard build task ran under different agentic coding setups, keeping the spec constant while only the harness and workflow changed. He compared Claude Code vs Codex alongside Ultracode vs Compound Engineering, then walked through task specs, run logs and trace diffs, including the runs that quietly derailed. It stood out because it turned agentic iteration into measurable post-mortems, which developers could reproduce and learn from (people seemed to value the transparency). For real product teams building health agents, that kind of controlled evaluation is the shortest path from “works” to “reliable,” especially as autonomous tooling keeps scaling.
Asha Somayajula from CarNow Inc presented git-cli, a Rust-based CLI that turns natural-language requests into safe git commands by querying a local Ollama LLM. The tool runs entirely offline, is published on crates.io for easy installs, and ships with a clear GitHub repo for inspection and tweaks. People loved the practical, guardrail-friendly workflow. We liked how it made LLM command generation feel like everyday developer tooling, and it showed how local agentic UX can scale into real product-grade automation.
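One way to picture a guardrail for model-generated commands is a vetting step that runs before anything executes. The sketch below is written in Python for brevity and is not taken from git-cli (which is written in Rust). The allowlist, the risky-flag set, and the `vet_git_command` name are hypothetical, and the call to the local model is left out.

```python
import shlex

# Hypothetical allowlist and flag list, chosen only for illustration.
SAFE_SUBCOMMANDS = {
    "status", "log", "diff", "branch", "add", "commit",
    "fetch", "pull", "push", "switch", "checkout", "stash",
}
RISKY_FLAGS = {"--force", "-f", "--hard"}


def vet_git_command(command):
    """Return the command as an argv list, or raise ValueError if it looks unsafe."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        raise ValueError(f"unparseable command: {exc}") from exc
    if len(argv) < 2 or argv[0] != "git":
        raise ValueError("not a git command")
    if argv[1] not in SAFE_SUBCOMMANDS:
        raise ValueError(f"subcommand not allowed: {argv[1]}")
    risky = RISKY_FLAGS.intersection(argv[2:])
    if risky:
        raise ValueError(f"risky flag(s): {sorted(risky)}")
    return argv
```

The returned list is meant to be run without a shell, for example with `subprocess.run(argv, check=True)`, so operators such as `;` or `&&` in the model's output are never interpreted as shell syntax.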
Boris from Linkup presented LinkUp, a live demo of structured, real-time web access for AI agents that aims to replace brittle scraping wrappers and hallucination-prone “guessing” loops. The demo focuses on a reusable retrieval layer pattern so agents can fetch fresh sources during execution, using clear agent loop boundaries instead of pure model memory. It felt especially practical for builders wrestling with stale training data, and people seemed to like how cleanly the pattern could be reused in new tools. We liked it because it points toward web-connected agents becoming everyday infrastructure, not a one-off hack.
John S. Moh presented Amplify Your Ideas, Pitches, and Stories with AI Digital Media, showing how custom motion-graphics workflows turn “stupid idea” drafts into polished contest entries, accelerator pitches, and even a NotebookLM-style podcast. He used Claude Code to programmatically align music to animation beats, and ElevenLabs voiceover to turn visuals into a coherent narrative. We liked it because the playful iteration made the lessons stick (people seemed to love the vibe), and it maps neatly onto today’s agentic skilling by turning drafts into customer-ready artifacts.
Iordanis Kerenidis presented Reading Microstructure with Temporal Fusion Transformers, showing how a Temporal Fusion Transformer models equity-index futures intraday microstructure and learns regime, trend vs mean reversion, feature relevance, and adaptive lookback decisions. The talk leans on transformer-based time-series modeling to replace brittle static rules with learned “judgment” signals. Since people seemed especially taken with the interpretability of those dynamic decisions, it felt broadly useful for anyone building agentic trading or analytics. It’s a strong template for turning complex market behavior into actionable features.
Matt Mireles demoed Roast.fm, a site where you upload files, images, or links and it generates personalized, live audience roasts. Under the hood, it uses an LLM as a judge to score and refine humor, turning messy inputs into something consistently punchy. He’s the kind of builder who thinks in eval loops and human-computer symbiosis, and it shows. People seemed to really enjoy how practical the approach felt, plus the cautionary lesson about safety filters and what they still miss. We liked it because it makes evaluation and iteration teachable, not mysterious.
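The judge-and-refine loop can be sketched in a few lines. This is a generic illustration of the LLM-as-judge pattern, not Roast.fm's implementation: `rewrite` and `judge` are hypothetical callables standing in for model calls, and the round limit and score threshold are made-up defaults.

```python
def refine_with_judge(draft, rewrite, judge, rounds=3, threshold=8.0):
    """Keep the best-scoring text across a few rewrite rounds.

    rewrite(text, score) returns a new candidate; judge(text) returns a
    numeric score. Both are placeholders for model calls.
    """
    best, best_score = draft, judge(draft)
    for _ in range(rounds):
        if best_score >= threshold:
            break  # good enough, stop spending calls
        candidate = rewrite(best, best_score)
        score = judge(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Keeping only strict improvements means a bad rewrite can never make the output worse, and the threshold caps how many judge calls a single input can consume.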
Laurent Fabre, Field CTO at Databricks and a Guest Lecturer at HEC Paris, showcased Databricks & NeuralK: a demo where tabular foundation models predict financial outcomes with minimal fine-tuning. The workflow runs inside a Databricks environment, using Databricks-first experimentation loops to bring pre-trained signal to structured datasets faster. It stood out because the blueprint felt repeatable for enterprise teams working with messy, regulated data (people loved the practical path from model to metrics). We liked it as a real-world example of getting from prototype to product-ready ML sooner.
Luke Freiler from Centercode presented UserVolley, an AI agent that drives a real validation project by volleying between real users and synthetic users to sharpen ground truth. UserVolley simulates realistic product testing at scale using AI, then coordinates iterative feedback loops to improve what teams ship. It also reframed validation as an AI-era workflow for enterprise teams, and people seemed to love how practical it felt. We liked it because it shows agentic evaluation moving from theory to daily utility, the same direction builders are heading.
Saeed Amen of Turnleaf Analytics presented a walkthrough of decomposing, forecasting, and trading inflation using machine learning and alternative data. The approach breaks inflation into components, forecasts either each part or the aggregate with large datasets, then maps those forecasts into systematic trades across macro asset classes. It kept a crisp end-to-end signal-to-execution pipeline, which many people seemed to really enjoy, judging by the feedback. We liked it because it showed a reusable forecasting pattern builders can adapt to their own alt-data workflows.
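For the bottom-up route (forecast each part, then combine), the aggregation step is a weighted average of the component forecasts. The sketch below shows only that arithmetic. The component names and the `aggregate_forecast` function are hypothetical, and nothing here reflects Turnleaf's actual models or data.

```python
def aggregate_forecast(component_forecasts, weights):
    """Combine per-component inflation forecasts into a headline figure.

    Both arguments are dicts keyed by component name. Weights are
    normalized, so they do not need to sum to exactly 1.
    """
    if set(component_forecasts) != set(weights):
        raise ValueError("forecasts and weights must cover the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(component_forecasts[name] * weights[name] for name in weights)
    return weighted / total_weight
```

Comparing this bottom-up number with a direct forecast of the aggregate is one simple way to see which route holds up better on a given dataset.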
Mike Scherbakov presented a Chrome extension demo where a 4B in-browser Gemma 4 model turns everyday browsing into a SKILL.md. The extension logs activity into a local browser database, then runs via WebGPU without leaving the device, using chunked model downloads (Range fetches) and a Service Worker alarm fallback to keep timing reliable. Reconstructing intent from copy-paste patterns and app switching made the flow feel genuinely useful (people loved it). It also showed how privacy-first agentic patterns can become practical day to day, not just demos.
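Chunked downloads with Range fetches come down to splitting a large file into inclusive byte spans and requesting each one with an HTTP `Range` header. The sketch below shows that bookkeeping in Python purely for illustration. The extension itself runs in the browser, and the function names and chunk sizes here are assumptions.

```python
def byte_ranges(total_size, chunk_size):
    """Split total_size bytes into inclusive (start, end) spans of at most chunk_size."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_header(start, end):
    """Build the HTTP Range header for one inclusive byte span."""
    return {"Range": f"bytes={start}-{end}"}
```

Because each span is independent, a failed or interrupted chunk can be retried on its own instead of restarting a multi-gigabyte model download.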
Michael Geiger demoed Vibe-Coded Plug-Ins, where a user asks for a custom dashboard tile and the system generates a Vue component plus a typed props schema. It resolves live props through an agent using tenant-scoped tools, then renders the result inside a sandboxed iframe and only persists after the component mounts with non-blank output and an empty error log, feeding compile and runtime errors into a repair loop. As a product-minded full-stack builder at Our Tools LLC, he focused on making generated UI trustworthy enough to become durable state, and it clicked with folks who like real guardrails over magic.
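The persist-only-if-it-really-rendered rule, plus the repair loop, can be sketched as a small gate function. This is an illustration under assumptions, not the demo's code: `RenderResult`, `generate`, and `render` are hypothetical stand-ins for the component generator and the sandboxed iframe, and the attempt limit is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class RenderResult:
    mounted: bool
    output: str
    errors: list = field(default_factory=list)


def should_persist(result):
    """Persist only if the component mounted, drew something, and logged no errors."""
    return result.mounted and bool(result.output.strip()) and not result.errors


def generate_until_valid(generate, render, max_attempts=3):
    """Regenerate with error feedback until a component passes the gate.

    generate(feedback) returns component source; render(component) returns
    a RenderResult. Returns the accepted component, or None if none passed.
    """
    feedback = []
    for _ in range(max_attempts):
        component = generate(feedback)
        result = render(component)
        if should_persist(result):
            return component
        feedback = result.errors or ["component did not mount or rendered blank output"]
    return None
```

Returning `None` after the last attempt keeps a component that never rendered cleanly out of durable state, which is the trust property the demo emphasized.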
You are one of 95,000+ readers from OpenAI, Anthropic, Google DeepMind, Mistral AI, Cohere, ElevenLabs, Perplexity, Midjourney, Scale AI, Together AI, Groq, Weaviate, and others — spanning frontier labs, big tech, startups, and top universities.
This isn't a newsletter. It's a build log.
AI Tinkerers is a global community of people who ship real stuff with AI.
Top AI Demos #31: On-Device AI, Agent Code Search, and Phone Coding