Top AI Demos #31: On-Device AI, Agent Code Search, and Phone Coding [AI Tinkerers - Post-Training]

AI Tinkerers

Issue #31 · Week of June 15

Joe Heitzeberg • Founder at AI Tinkerers
Creating space for leading builders to share ideas, grow, and make an impact.

This week’s AI Tinkerers roundup highlights builders focused on making AI agents more reliable and useful. For instance, Taylor Johnson’s SearchBench: Evaluating Agent Code Search measures how well coding agents find the right files, while Luke Freiler’s UserVolley: AI-Driven Product Validation uses agents for product testing.

We also saw builders pushing AI deeper into developer workflows. Asha Somayajula’s git-cli: Ollama Git Command Generator lets you generate Git commands with natural language, and Matthew Mirman shared techniques for Coding on the Phone.

Finally, several builders showed new approaches to data access and local processing. Mike Schirtzinger’s Silent Notetaker: local browser transcription keeps audio on-device, and Boris demoed LinkUp for real-time web access.

Top 5 Picks (June 15)
1 TOP PICK

Coding on the Phone

Matthew Mirman

CEO at chat.dev

Matthew Mirman, CEO of chat.dev, presented Hacking on the Phone, showing how coding from a phone can feel surprisingly fast. The demo uses chat.dev to run sandboxed, persistent Codex and Claude agent sessions on cloud-hosted Linux VMs, so each session keeps its context while you iterate from your phone. Builders liked the live, work-in-progress feel and how clearly the demo showed the loop in action. It tackles the everyday pain of desktop-only tooling and hints at a practical path for “agentic dev” as a portable product.
2 RUNNER UP

Silent Notetaker: local browser transcription

Mike Schirtzinger

Founder at Brevity

Mike Schirtzinger presented Silent Notetaker, a meeting notetaker that transcribes audio, labels speakers, and extracts decisions, action items, and open questions entirely in the browser from a single HTML file, with no backend or sign-in. It runs the heavy transcription model on the GPU via WebGPU, while the speaker and question models run on the CPU with multithreaded WebAssembly, which avoids GPU contention and the “invisible memory leak” issues that show up in long-running streams. People seemed especially excited about its near feature parity with tools they already use, plus the open-source code they could tweak. It felt timely given the shift toward efficient, local, privacy-first agent UX, and it could become a practical product for teams that want meeting memory without cloud costs.
PROJECT LINKS
github.com
3 COMMUNITY FAVORITE

SearchBench: Evaluating Agent Code Search

Taylor Johnson

Full-Stack Engineer (Infra-Focused) at becker63.digital

Taylor Johnson presented SearchBench, a harness that runs controlled evaluation rounds on coding-agent search behavior, using a bounded ablation workflow to test whether agents locate the exact files that human fixes changed. SearchBench compares incumbent strategies against challengers and emits an evidence bundle with exact-hit, hop-distance, token-usage, and failure artifacts, while estimating cost before any provider calls. We liked how the “separable knobs” framing made search policy feel debuggable and budget-safe, and the audience seemed to appreciate the practical results. It fits today’s push in agent evaluation toward evidence over vibes.
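To make the exact-hit and hop-distance metrics concrete, here is a minimal sketch of how one search round could be scored. It assumes the agent returns a list of repo-relative file paths and that hop distance means tree distance between paths (directory steps up to the common ancestor and back down); `hop_distance` and `score_round` are hypothetical names, not SearchBench’s actual API or definitions.

```python
from pathlib import PurePosixPath


def hop_distance(a: str, b: str) -> int:
    """Assumed metric: tree distance between two repo paths (edges up to the common ancestor and back down)."""
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def score_round(retrieved: list[str], gold: set[str]) -> dict:
    """Score one search round against the files the human fix actually changed (hypothetical scoring)."""
    hits = [f for f in retrieved if f in gold]
    # Closest any retrieved file got to any gold file; 0 means an exact hit, None means nothing retrieved.
    nearest = min((hop_distance(r, g) for r in retrieved for g in gold), default=None)
    return {
        "exact_hit": bool(hits),
        "recall": len(set(hits)) / len(gold) if gold else 0.0,
        "min_hop_distance": nearest,
    }
```

A near miss (right package, wrong file) then shows up as a small hop distance instead of a flat failure, which is what makes the metric useful for comparing an incumbent strategy against a challenger.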
PROJECT LINKS
github.com
4 STANDOUT

PostHog: Context-Rich AI Agents

Sebastian Muriel

Technical CSM at PostHog

Sebastian Muriel from PostHog showed how he helps customer-facing agents do their best work by turning PostHog event data into queryable signals the agent can reason over. The demo runs an interactive terminal flow in which Claude debugs a test GTM (go-to-market) question, pulling from three context sources: mechanics (product source code), engagement telemetry, and instance state, then explaining what each one contributed. Keeping those sources separate made the reasoning auditable, and people seemed to appreciate that clarity. It felt especially timely for today’s agent enablement push, and the open-source PostHog/gtm-toolkit gives builders a practical starting point for agent-driven customer ops.
5 NOTABLE

Autonomous Development: Agentic Bake-Off

Travis Johnson

Co-Founder at Aurapath AI

Travis Johnson showed an autonomous development bake-off in which the same long-horizon task, building a health dashboard, ran under different agentic coding setups; the spec stayed constant while only the harness and workflow changed. He compared Claude Code against Codex, and Ultracode against Compound Engineering, then walked through task specs, run logs, and trace diffs, including the runs that quietly derailed. It stood out because it turned agentic iteration into measurable post-mortems that developers could reproduce and learn from, and people seemed to value the transparency. For product teams building health agents, that kind of controlled evaluation is the shortest path from “works” to “reliable,” especially as autonomous tooling keeps scaling.
PROJECT LINKS
@travcjohnson

More Great Builds
Quick hits from the community — demos worth bookmarking:
Asha Somayajula from CarNow Inc presented git-cli, a Rust-based CLI that turns natural-language requests into safe git commands by querying a local Ollama LLM. The tool runs entirely offline, is published on crates.io for easy installation, and has a readable GitHub repo for inspection and tweaks. People loved the practical, guardrail-friendly workflow. We liked how it made LLM command generation feel like everyday developer tooling, and it showed how local agentic UX can grow into product-grade automation.
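One way to make LLM-generated git commands “safe” is to validate each suggestion before running it. The sketch below is a hypothetical guardrail in Python, not git-cli’s actual Rust logic: the allowlist of subcommands and the set of risky flags are assumptions chosen for illustration, and it presumes the command will be executed as an argument list without a shell.

```python
import shlex

# Hypothetical policy: subcommands the assistant may run without asking the user.
SAFE_SUBCOMMANDS = {"status", "log", "diff", "show", "branch", "add", "commit", "switch", "fetch", "pull"}
# Hypothetical policy: flags that can discard work or rewrite history, so they always need a human.
DANGEROUS_FLAGS = {"--force", "-f", "--hard", "-D", "--force-with-lease"}


def check_git_command(suggestion: str) -> tuple[bool, str]:
    """Return (allowed, reason) for one LLM-suggested git command line."""
    try:
        argv = shlex.split(suggestion)
    except ValueError as exc:  # e.g. an unbalanced quote in the model output
        return False, f"unparseable: {exc}"
    if len(argv) < 2 or argv[0] != "git":
        return False, "not a git command"
    if any(tok in {";", "&&", "||", "|"} for tok in argv):
        return False, "command chaining is not allowed"
    sub = argv[1]
    if sub not in SAFE_SUBCOMMANDS:
        return False, f"subcommand '{sub}' needs confirmation"
    risky = DANGEROUS_FLAGS.intersection(argv[2:])
    if risky:
        return False, f"risky flags: {sorted(risky)}"
    return True, "ok"
```

Rejected suggestions would fall back to an explicit confirmation prompt rather than being silently dropped; passing the parsed `argv` straight to a process launcher (no shell) is what keeps stray `;` characters inside arguments harmless.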
Boris • AI Tinkerers - Paris • Jun 03
Boris from Linkup presented LinkUp, a live demo of structured, real-time web access for AI agents that aims to replace brittle scraping wrappers and hallucination-prone “guessing” loops. The demo focuses on a reusable retrieval-layer pattern that lets agents fetch fresh sources during execution, with clear boundaries in the agent loop instead of relying on model memory alone. It felt especially practical for builders wrestling with stale training data, and people seemed to like how cleanly the pattern could be reused in new tools. We liked it because it points toward web-connected agents becoming everyday infrastructure, not a one-off hack.
John S. Moh presented Amplify Your Ideas, Pitches, and Stories with AI Digital Media, showing how custom motion-graphics workflows turn “stupid idea” drafts into polished contest entries, accelerator pitches, and even a NotebookLM-style podcast. He used Claude Code to programmatically align music to animation beats, and ElevenLabs voiceovers to tie the visuals into a coherent narrative. We liked it because the playful iteration made the lessons stick (people seemed to love the vibe), and it maps neatly onto today’s agentic workflows for turning rough drafts into customer-ready artifacts.
Iordanis Kerenidis presented Reading Microstructure with Temporal Fusion Transformers, showing how a Temporal Fusion Transformer models the intraday microstructure of equity-index futures and learns regime decisions (trend vs. mean reversion), feature relevance, and adaptive lookback windows. The talk leans on transformer-based time-series modeling to replace brittle static rules with learned “judgment” signals. People seemed especially taken with the interpretability of those dynamic decisions, and the approach felt broadly useful for anyone building agentic trading or analytics tools. It’s a strong template for turning complex market behavior into actionable features.
Matt Mireles demoed Roast.fm, a site where you upload files, images, or links and it generates personalized roasts for a live audience. Under the hood, it uses an LLM as a judge to score and refine the humor, turning messy inputs into something consistently punchy. He’s the kind of builder who thinks in eval loops and human-computer symbiosis, and it shows. People seemed to really enjoy how practical the approach felt, plus the cautionary lesson about safety filters and what they still miss. We liked it because it makes evaluation and iteration teachable, not mysterious.
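The LLM-as-a-judge pattern can be sketched as a small generate, score, and refine loop. Here `draft` and `judge` stand in for two LLM calls; they are hypothetical, since Roast.fm’s prompts, scoring scale, and stopping rule weren’t described, and the 0 to 10 scale and threshold are assumptions.

```python
from typing import Callable


def refine_with_judge(
    draft: Callable[[str, str], str],  # hypothetical generator LLM call: (source, feedback) -> candidate
    judge: Callable[[str], float],     # hypothetical judge LLM call: candidate -> score on an assumed 0-10 scale
    source: str,
    threshold: float = 8.0,
    max_rounds: int = 3,
) -> tuple[str, float]:
    """Keep the best-scoring candidate; stop early once one clears the threshold."""
    feedback = ""
    best_text, best_score = "", float("-inf")
    for _ in range(max_rounds):
        candidate = draft(source, feedback)
        score = judge(candidate)
        if score > best_score:
            best_text, best_score = candidate, score
        if score >= threshold:
            break
        # Feed the judge's verdict back into the next generation attempt.
        feedback = f"Previous attempt scored {score:.1f}/10; make it sharper."
    return best_text, best_score
```

Capping the rounds bounds cost, and returning the best candidate seen (not just the last one) keeps a late regression from replacing a good earlier draft.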
Laurent Fabre • AI Tinkerers - Paris • Jun 03
Laurent Fabre, Field CTO at Databricks and a Guest Lecturer at HEC Paris, showcased Databricks & NeuralK: a demo where tabular foundation models predict financial outcomes with minimal fine-tuning. The workflow runs inside a Databricks environment, using Databricks-first experimentation loops to bring pre-trained signal to structured datasets faster. It stood out because the blueprint felt repeatable for enterprise teams working with messy, regulated data (people loved the practical path from model to metrics). We liked it as a real-world example of getting from prototype to product-ready ML sooner.
Luke Freiler from Centercode presented UserVolley, an AI agent that drives a real validation project by volleying between real users and synthetic users to sharpen ground truth. UserVolley uses AI to simulate realistic product testing at scale, then coordinates iterative feedback loops to improve what teams ship. It also reframed validation as an AI-era workflow for enterprise teams (people seemed to love how practical it felt). We liked it because it shows agentic evaluation moving from theory to daily utility, the same direction builders are heading.
Saeed Amen of Turnleaf Analytics presented a walkthrough of decomposing, forecasting, and trading inflation using machine learning and alternative data. The approach breaks inflation into components, forecasts either each component or the aggregate using large datasets, then maps those forecasts into systematic trades across macro asset classes. The talk laid out a crisp end-to-end pipeline from signal to execution, which many attendees seemed to enjoy. We liked it because it showed a reusable forecasting pattern builders can adapt to their own alt-data workflows.
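The bottom-up route (forecast each component, then aggregate) and the last step (forecast to trade signal) can be sketched in a few lines. This is a simplified illustration, not Turnleaf’s method: it assumes the headline is a basket-weighted sum of component forecasts, the weights in the usage example are made up rather than real CPI basket shares, and `inflation_signal` is a hypothetical rule that only trades when the model disagrees with consensus by more than a band.

```python
def aggregate_forecast(component_forecasts: dict[str, float], weights: dict[str, float]) -> float:
    """Bottom-up headline forecast: basket-weighted sum of component forecasts (YoY %)."""
    if set(component_forecasts) != set(weights):
        raise ValueError("every component needs both a forecast and a weight")
    total = sum(weights.values())
    # Normalise so weights sum to 1 even if basket shares are given in percent.
    return sum(component_forecasts[k] * weights[k] / total for k in weights)


def inflation_signal(model: float, consensus: float, band: float = 0.1) -> int:
    """Hypothetical rule: +1 if the model is above consensus by more than `band` points, -1 if below, else 0."""
    surprise = model - consensus
    if surprise > band:
        return 1
    if surprise < -band:
        return -1
    return 0


# Illustrative numbers only: energy, food, and core forecasts with made-up weights.
headline = aggregate_forecast({"energy": 5.0, "food": 3.0, "core": 2.5}, {"energy": 10, "food": 15, "core": 75})
```

With these illustrative inputs the headline works out to about 2.8%, and against a 2.6% consensus the rule would return +1. The band keeps small, noise-level disagreements from turning into positions.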
Mike Scherbakov presented a Chrome extension demo in which a 4B-parameter Gemma 4 model running in the browser turns everyday browsing into a SKILL.md. The extension logs activity into a local browser database, and the model runs via WebGPU so nothing leaves the device; chunked model downloads (Range fetches) and a Service Worker alarm fallback keep loading and timing reliable. Reconstructing intent from copy-paste patterns and app switching made the flow feel genuinely useful (people loved it). It also showed how privacy-first agentic patterns can become practical for day-to-day use, not just demos.
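Chunked downloads with HTTP Range requests let a multi-gigabyte model arrive in pieces that can be retried individually. In the extension this is presumably browser `fetch` code sending a `Range` header and receiving `206 Partial Content`; the Python sketch below shows only the range arithmetic and reassembly. The 8 MiB chunk size and the `fetch_range` callable are assumptions, and `fetch_range` is a hypothetical stand-in for the actual network call.

```python
from typing import Callable, Iterator


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets covering total_size bytes, as HTTP Range expects."""
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


def download_in_chunks(
    fetch_range: Callable[[str], bytes],  # hypothetical: sends "Range: <value>" and returns the body
    total_size: int,
    chunk_size: int = 8 * 1024 * 1024,   # assumed chunk size
) -> bytes:
    parts = []
    for start, end in byte_ranges(total_size, chunk_size):
        chunk = fetch_range(f"bytes={start}-{end}")
        if len(chunk) != end - start + 1:
            raise OSError(f"short read for bytes {start}-{end}")
        parts.append(chunk)  # a real client could persist each part so a retry resumes here
    return b"".join(parts)
```

Because HTTP byte ranges are inclusive on both ends, the last offset is `end`, not `end - 1`, which is the usual off-by-one to watch for.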
Michael Geiger • AI Tinkerers - Columbus • Jun 01
Michael Geiger demoed Vibe-Coded Plug-Ins: a user asks for a custom dashboard tile, and the system generates a Vue component plus a typed props schema. An agent resolves live props using tenant-scoped tools, and the result renders inside a sandboxed iframe. The component is persisted only after it mounts with non-blank output and an empty error log; compile and runtime errors are fed into a repair loop. As a product-minded full-stack builder at Our Tools LLC, he focused on making generated UI trustworthy enough to become durable state, and it clicked with folks who prefer real guardrails over magic.
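The persist gate and repair loop can be sketched as follows. The gate mirrors the three conditions described (mounted, non-blank output, empty error log); everything else is an assumption: `generate` and `render` are hypothetical stand-ins for the LLM call and the sandboxed iframe render, and `RenderResult` and the attempt limit are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RenderResult:  # hypothetical shape of what the sandbox reports back
    mounted: bool
    output_text: str
    errors: list[str] = field(default_factory=list)


def passes_gate(result: RenderResult) -> bool:
    """Persist only if the component mounted, produced non-blank output, and logged no errors."""
    return result.mounted and result.output_text.strip() != "" and not result.errors


def generate_with_repair(
    generate: Callable[[str, list[str]], str],  # hypothetical LLM call: (request, prior errors) -> component source
    render: Callable[[str], RenderResult],      # hypothetical sandboxed render
    request: str,
    max_attempts: int = 3,                      # assumed retry budget
) -> Optional[str]:
    errors: list[str] = []
    for _ in range(max_attempts):
        source = generate(request, errors)  # prior compile/runtime errors go back in as repair context
        result = render(source)
        if passes_gate(result):
            return source  # only now is it safe to persist as durable state
        errors = result.errors or ["component rendered blank output"]
    return None  # nothing persisted; surface the failure instead of saving broken UI
```

Treating a blank render as an error, even when nothing threw, is the detail that keeps silently empty tiles out of saved state.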

🎬 Latest Content

How to Ship Complex Features 10x Faster with AI Agents | Dex Horthy (HumanLayer)

One-Shot • Mar 04
Dex Horthy (HumanLayer) breaks down the “12 Factor Agents” approach to shipping multi-step agentic workflows faster: structured outputs, ...
Watch Now →

How to Run Open-Source LLMs Locally on a Mac with MLX-LM

Deep Dive Series • Jun 12
Run open-source LLMs locally on Apple Silicon with Apple’s MLX-LM: `pip install mlx-lm`, then `load()` a Hugging Face model and call `gen...
Read More →

💼 Top Job Matches
Matched based on your meetup activity and profile
Paxos Health • New York & Toronto • $110k - $175k (varies w/ location/level); generous equity
Stanford-founded Seed-stage healthcare AI startup with >$5M in VC funding and AI agents deployed in production with cu...
Apply Now →
Dex • London (5 days on-site) • £250,000
Frontier AI engineering role building the AI tooling layer for complex financial modelling.
Apply Now →
Jakib AI • Columbus, OH
Jakib is a profitable, growing applied AI firm embedded with operator-led companies in logistics, manufacturing, and c...
Apply Now →

You are one of 95,000+ readers from OpenAI, Anthropic, Google DeepMind, Mistral AI, Cohere, ElevenLabs, Perplexity, Midjourney, Scale AI, Together AI, Groq, Weaviate, and others — spanning frontier labs, big tech, startups, and top universities.