Top AI Demos #31: On-Device AI, Agent Code Search, and Phone Coding

Joe Heitzeberg • Founder at AI Tinkerers • ⏱️ 1 min read

Creating space for leading builders to share ideas, grow, and make an impact.

This week’s AI Tinkerers roundup highlights builders focused on making AI agents more reliable and useful. For instance, SearchBench: Evaluating Agent Code Search by Taylor Johnson tackles agent search, while Luke Freiler’s UserVolley: AI-Driven Product Validation uses agents for product testing.

We also saw builders pushing AI deeper into developer workflows. Asha Somayajula’s git-cli: Ollama Git Command Generator lets you generate Git commands with natural language, and Matthew Mirman shared techniques for Coding on the Phone.

Finally, there were several notable approaches to data access and local processing. Mike Schirtzinger’s Silent Notetaker: local browser transcription keeps audio on-device, and Boris demoed LinkUp for real-time web access.

Top 5 Picks (June 15)

1 TOP PICK

Coding on the Phone

Matthew Mirman

CEO at chat.dev

📍 AI Tinkerers - New York City • Jun 03

Matthew Mirman, CEO of chat.dev, presented Hacking on the Phone, showing how coding from a phone can feel surprisingly fast. The demo uses chat.dev to run sandboxed, persistent Codex and Claude agent sessions on cloud-hosted Linux VMs, so the workflow keeps its context while you iterate from a phone. Builders liked the live WIP feel and the hands-on clarity of how the loop works. It directly tackles the everyday pain of desktop-only tooling, hinting at a practical path for “agentic dev” as a portable product.

2 RUNNER UP

Silent Notetaker: local browser transcription

Mike Schirtzinger

Founder at Brevity

📍 AI Tinkerers - Columbus • Jun 01

Mike Schirtzinger presented Silent Notetaker, a meeting notetaker that transcribes, labels speakers, and extracts decisions, action items, and open questions entirely in the browser with a single HTML file, no backend or sign-in. It runs the heavy transcription model on GPU via WebGPU while speaker and question models run on CPU with multithreaded WebAssembly, avoiding GPU contention and the “invisible memory leak” issues that show up in long-running streams. People seemed especially excited about the near feature parity with tools they already use, plus the open-source code they could tweak. It felt timely for the shift toward efficient local, privacy-first agent UX, and it could become a practical product for teams that want meeting memory without cloud costs.

3 COMMUNITY FAVORITE

SearchBench: Evaluating Agent Code Search

Taylor Johnson

Full-Stack Engineer (Infra-Focused) at becker63.digital

📍 AI Tinkerers - Columbus • Jun 01

Taylor Johnson presented SearchBench, a harness that runs controlled evaluation rounds over coding-agent search behavior, using a bounded ablation workflow to test whether agents locate the exact files human fixes changed. SearchBench compares incumbent strategies against challengers, emitting an evidence bundle with exact-hit, hop-distance, token usage, and failure artifacts, while estimating cost before provider calls. We liked how the “separable knobs” framing made search policy feel debuggable and budget-safe, and people seemed to enjoy the practical results. It matched today’s push in agent evaluation toward evidence over vibes.
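
To make the metrics concrete, here is a minimal Python sketch of how exact-hit, hop-distance, and a pre-call cost estimate could be computed. It is not SearchBench's code: the function names, the file-dependency graph used for hop distance, and the per-1k-token prices are all assumptions for illustration.

```python
from collections import deque
from typing import Dict, Optional, Set


def exact_hit(predicted: Set[str], gold: Set[str]) -> bool:
    """True when the agent surfaced every file the human fix changed."""
    return gold <= predicted


def hop_distance(graph: Dict[str, Set[str]],
                 predicted: Set[str],
                 gold: Set[str]) -> Optional[int]:
    """Fewest edges from any predicted file to any gold file.

    Assumption: `graph` is an adjacency map over files (for example,
    import edges). Returns None when no gold file is reachable.
    """
    frontier = deque((path, 0) for path in sorted(predicted))
    seen = set(predicted)
    while frontier:
        node, dist = frontier.popleft()
        if node in gold:
            return dist
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return None


def estimate_cost(prompt_tokens: int, max_output_tokens: int,
                  usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    """Upper-bound cost of one call, computed before any provider request."""
    return (prompt_tokens / 1000 * usd_per_1k_in
            + max_output_tokens / 1000 * usd_per_1k_out)


# Hypothetical three-file repository with symmetric edges.
graph = {"a.py": {"b.py"}, "b.py": {"a.py", "c.py"}, "c.py": {"b.py"}}
miss_is_hit = exact_hit({"a.py"}, {"c.py"})
miss_hops = hop_distance(graph, {"a.py"}, {"c.py"})
```

In this toy graph the agent picked `a.py` while the fix touched `c.py`, so the round is not an exact hit, but the hop distance shows the agent was two edges away from the right file.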

TECH STACK

MCP

PROJECT LINKS

github.com

4 STANDOUT

PostHog: Context-Rich AI Agents

Sebastian Muriel

Technical CSM at PostHog

📍 AI Tinkerers - Orange County • Jun 02

Sebastian Muriel from PostHog presented how he enables customer-facing agents to do their best work by turning PostHog event data into queryable signals the agent can reason over. The demo runs an interactive terminal flow where Claude debugs a test GTM question, pulling from mechanics (product source code), engagement telemetry, and instance state, then explaining each contribution. Keeping those context sources separate made the reasoning auditable, and people seemed to enjoy that clarity. It felt especially timely for today’s agent enablement push, and the open-source PostHog/gtm-toolkit gives builders a practical starting point for agentized customer ops.

5 NOTABLE

Autonomous Development: Agentic Bake-Off

Travis Johnson

Co-Founder at Aurapath AI

📍 AI Tinkerers - Orange County • Jun 02

Travis Johnson showed an autonomous development bake-off where the same long-horizon health-dashboard build task ran under different agentic coding setups, keeping the spec constant while only the harness and workflow changed. He compared Claude Code vs Codex alongside Ultracode vs Compound Engineering, then walked through task specs, run logs and trace diffs, including the runs that quietly derailed. It stood out because it turned agentic iteration into measurable post-mortems, which developers could reproduce and learn from (people seemed to value the transparency). For real product teams building health agents, that kind of controlled evaluation is the shortest path from “works” to “reliable,” especially as autonomous tooling keeps scaling.

More Great Builds

Quick hits from the community — demos worth bookmarking:

git-cli: Ollama Git Command Generator

Asha Somayajula • AI Tinkerers - St. Louis • Jun 03

Asha Somayajula from CarNow Inc presented git-cli, a Rust-based CLI that turns natural-language requests into safe git commands by querying a local Ollama LLM. The tool runs entirely offline, is published on crates.io for easy installs, and ships with a clear GitHub repo for inspection and tweaks. People loved the practical, guardrail-friendly workflow. We liked how it made LLM command generation feel like everyday developer tooling, and it showed how local agentic UX can scale into real product-grade automation.

github.com

crates.io

LinkUp

Boris • AI Tinkerers - Paris • Jun 03

Boris from Linkup presented LinkUp, a live demo of structured, real-time web access for AI agents that aims to replace brittle scraping wrappers and hallucination-prone “guessing” loops. The demo focuses on a reusable retrieval layer pattern so agents can fetch fresh sources during execution, using clear agent loop boundaries instead of pure model memory. It felt especially practical for builders wrestling with stale training data, and people seemed to like how cleanly the pattern could be reused in new tools. We liked it because it points toward web-connected agents becoming everyday infrastructure, not a one-off hack.

AI Digital Media Pitching

John S. Moh • AI Tinkerers - Orange County • Jun 02

John S. Moh presented Amplify Your Ideas, Pitches, and Stories with AI Digital Media, showing how custom motion-graphics workflows turn “stupid idea” drafts into polished contest entries, accelerator pitches, and even a NotebookLM-style podcast. He used Claude Code to programmatically align music to animation beats, and ElevenLabs voiceover to turn visuals into a coherent narrative. We liked it because the playful iteration made the lessons stick (people seemed to love the vibe), and it maps neatly onto today’s agentic skilling by turning drafts into customer-ready artifacts.

Temporal Fusion Transformers in Microstructure

Iordanis Kerenidis • AI Tinkerers - Paris • Jun 03

Iordanis Kerenidis presented Reading Microstructure with Temporal Fusion Transformers, showing how a Temporal Fusion Transformer models equity-index futures intraday microstructure and learns regime, trend vs mean reversion, feature relevance, and adaptive lookback decisions. The talk leans on transformer-based time-series modeling to replace brittle static rules with learned “judgment” signals. People seemed especially taken with the interpretability of those dynamic decisions, and it felt broadly useful for anyone building agentic trading or analytics. It’s a strong template for turning complex market behavior into actionable features.

Roast.fm: AI-Powered Roasts

Matt Mireles • AI Tinkerers - Orange County • Jun 02

Matt Mireles demoed Roast.fm, a site where you upload files, images, or links and it generates personalized, live audience roasts. Under the hood, it uses an LLM as a judge to score and refine humor, turning messy inputs into something consistently punchy. He’s the kind of builder who thinks in eval loops and human-computer symbiosis, and it shows. People seemed to really enjoy how practical the approach felt, plus the cautionary lesson about safety filters and what they still miss. We liked it because it makes evaluation and iteration teachable, not mysterious.
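
For anyone who wants to picture the judge loop, here is a minimal Python sketch of a score-and-refine cycle. It is an illustration under stated assumptions, not Roast.fm's implementation: `judge` and `rewrite` are hypothetical callables standing in for LLM calls, and the threshold and round limit are made-up defaults.

```python
from typing import Callable, Tuple


def refine_with_judge(draft: str,
                      judge: Callable[[str], float],
                      rewrite: Callable[[str, float], str],
                      threshold: float = 0.8,
                      max_rounds: int = 3) -> Tuple[str, float]:
    """Keep the best-scoring draft, rewriting until the judge is satisfied.

    `judge` returns a score in [0, 1]; `rewrite` proposes a new draft
    given the current best draft and its score. Both are hypothetical.
    """
    best, best_score = draft, judge(draft)
    for _ in range(max_rounds):
        if best_score >= threshold:
            break  # good enough, stop spending calls
        candidate = rewrite(best, best_score)
        score = judge(candidate)
        if score > best_score:  # only accept strict improvements
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins so the loop runs without a model: longer text scores higher.
def toy_judge(text: str) -> float:
    return min(len(text) / 10, 1.0)


def toy_rewrite(text: str, score: float) -> str:
    return text + "!!"


final_draft, final_score = refine_with_judge("roast", toy_judge, toy_rewrite)
```

Keeping the judge separate from the rewriter is what makes the loop easy to evaluate: either piece can be swapped out and the scores compared across runs.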

roast.fm

Video

Databricks & NeuralK

Laurent Fabre • AI Tinkerers - Paris • Jun 03

Laurent Fabre, Field CTO at Databricks and a Guest Lecturer at HEC Paris, showcased Databricks & NeuralK: a demo where tabular foundation models predict financial outcomes with minimal fine-tuning. The workflow runs inside a Databricks environment, using Databricks-first experimentation loops to bring pre-trained signal to structured datasets faster. It stood out because the blueprint felt repeatable for enterprise teams working with messy, regulated data (people loved the practical path from model to metrics). We liked it as a real-world example of getting from prototype to product-ready ML sooner.

UserVolley: AI-Driven Product Validation

Luke Freiler • AI Tinkerers - Orange County • Jun 02

Luke Freiler from Centercode presented UserVolley, an AI agent that drives a real validation project by volleying between real users and synthetic users to sharpen ground truth. UserVolley simulates realistic product testing at scale using AI, then coordinates iterative feedback loops to improve what teams ship. It also reframed validation as an AI-era workflow for enterprise teams, and people seemed to love how practical it felt. We liked it because it shows agentic evaluation moving from theory to daily utility, the same direction builders are heading.

uservolley.com

Forecasting and Trading Inflation with ML

Saeed Amen • AI Tinkerers - Paris • Jun 03

Saeed Amen of Turnleaf Analytics presented a walkthrough of decomposing, forecasting, and trading inflation using machine learning and alternative data. The approach breaks inflation into components, forecasts either each part or the aggregate with large datasets, then maps those forecasts into systematic trades across macro asset classes. It kept a crisp end-to-end signal-to-execution pipeline, which many people seemed to really enjoy. We liked it because it showed a reusable forecasting pattern builders can adapt to their own alt-data workflows.

In-Browser LLM Workflow Analyzer

Mike Scherbakov • AI Tinkerers - New York City • Jun 03

Mike Scherbakov presented a Chrome extension demo where a 4B in-browser Gemma 4 model turns everyday browsing into a SKILL.md. The extension logs activity into a local browser database, then runs via WebGPU without leaving the device, using chunked model downloads (Range fetches) and a Service Worker alarm fallback to keep timing reliable. Reconstructing intent from copy-paste patterns and app switching made the flow feel genuinely useful (people loved it). It also showed how privacy-first agentic patterns can become practical day to day, not just demos.
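
The chunked-download idea can be sketched in a few lines. The extension itself would do this in browser JavaScript; the Python below only illustrates how a large file splits into HTTP `Range` header values, and the function name and chunk sizes are assumptions rather than anything from the demo.

```python
from typing import List


def range_headers(total_bytes: int, chunk_bytes: int) -> List[str]:
    """Split a download into HTTP Range header values.

    HTTP byte ranges are inclusive on both ends, so a 4-byte chunk
    starting at offset 0 is written as bytes=0-3.
    """
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes > 0")
    headers = []
    for start in range(0, total_bytes, chunk_bytes):
        end = min(start + chunk_bytes, total_bytes) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


# A 10-byte file fetched in 4-byte chunks.
chunks = range_headers(10, 4)  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each value would go into the `Range` request header of one fetch, which lets an interrupted download resume from the last completed chunk instead of starting over.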

assete.ai

Vibe-Coded Plug-Ins

Michael Geiger • AI Tinkerers - Columbus • Jun 01

Michael Geiger demoed Vibe-Coded Plug-Ins, where a user asks for a custom dashboard tile and the system generates a Vue component plus a typed props schema. It resolves live props through an agent using tenant-scoped tools, then renders the result inside a sandboxed iframe and only persists after the component mounts with non-blank output and an empty error log, feeding compile and runtime errors into a repair loop. As a product-minded full-stack builder at Our Tools LLC, he focused on making generated UI trustworthy enough to become durable state, and it clicked with folks who like real guardrails over magic.
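
The persist-only-after-a-clean-mount rule is easy to sketch. The demo generates Vue components; the Python below just models the gate and repair loop described above, with hypothetical names (`MountResult`, `generate`, `mount`) and an assumed retry limit that are not from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MountResult:
    """What the sandbox reports after trying to render a component."""
    mounted: bool
    output: str
    errors: List[str] = field(default_factory=list)


def should_persist(result: MountResult) -> bool:
    """Persist only if it mounted, rendered something, and logged no errors."""
    return result.mounted and bool(result.output.strip()) and not result.errors


def generate_with_repair(generate: Callable[[List[str]], str],
                         mount: Callable[[str], MountResult],
                         max_attempts: int = 3) -> Optional[str]:
    """Regenerate with error feedback until the gate passes or attempts run out."""
    errors: List[str] = []
    for _ in range(max_attempts):
        source = generate(errors)   # hypothetical LLM call, given prior errors
        result = mount(source)      # hypothetical sandboxed render
        if should_persist(result):
            return source           # safe to store as durable state
        errors = result.errors or ["component rendered blank output"]
    return None                     # nothing persisted


# Toy stand-ins: the first attempt fails to compile, the repaired one mounts.
def toy_generate(errors: List[str]) -> str:
    return "good" if errors else "bad"


def toy_mount(source: str) -> MountResult:
    if source == "bad":
        return MountResult(False, "", ["compile error"])
    return MountResult(True, "<div>42</div>")


persisted = generate_with_repair(toy_generate, toy_mount)
```

Returning `None` on exhaustion keeps the failure explicit, so a half-working tile never ends up in stored state.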

ariso.ai

🎬 Latest Content

How to Ship Complex Features 10x Faster with AI Agents | Dex Horthy (HumanLayer)

One-Shot • Mar 04

Dex Horthy (HumanLayer) breaks down the “12 Factor Agents” approach to shipping multi-step agentic workflows faster: structured outputs, ...

How to Run Open-Source LLMs Locally on a Mac with MLX-LM

Deep Dive Series • Jun 12

Run open-source LLMs locally on Apple Silicon with Apple’s MLX-LM: `pip install mlx-lm`, then `load()` a Hugging Face model and call `gen...

💼 Top Job Matches

Matched based on your meetup activity and profile

Founding Applied AI Lead

Paxos Health • New York & Toronto • $110k - $175k (varies w/ location/level); generous equity

Stanford-founded Seed-stage healthcare AI startup with >$5M in VC funding and AI agents deployed in production with cu...

AI Engineer: Build Frontier Agents Reinventing Financial Modelling

Dex • London (5 days on-site) • £250,000

Frontier AI engineering role building the AI tooling layer for complex financial modelling.

Software Engineer (All Levels) — Applied AI

Jakib AI • Columbus, OH

Jakib is a profitable, growing applied AI firm embedded with operator-led companies in logistics, manufacturing, and c...

You are one of 95,000+ readers from OpenAI, Anthropic, Google DeepMind, Mistral AI, Cohere, ElevenLabs, Perplexity, Midjourney, Scale AI, Together AI, Groq, Weaviate, and others — spanning frontier labs, big tech, startups, and top universities.
