Top AI Demos #27: Build Self-Improving Agents, Musebot, and Thrum
Issue #27 · Week of May 18
Joe Heitzeberg • Founder at AI Tinkerers • ⏱️ 1 min read
Creating space for leading builders to share ideas, grow, and make an impact.
This week, builders are shipping concrete systems, from local model training to multi-modal consumer AI. Christopher Slee’s work on Self-Improving Agents via Local Training shows how to boost local model capability, while Zackary Lowery’s Musebot: Multi-Modal Consumer AI integrates ComfyUI and Ollama for generative tasks on consumer hardware.
We’re seeing a strong focus on agent orchestration and control. Leon Letto’s Thrum: AI Messaging and Control offers a messaging system for managing agents and autonomous engineering, and Szymon Chmal’s skillgym: Testing SKILL.md Agents provides a tool for verifying agent behavior against defined skills. Anna Zhdan’s Koog: IDE-Integrated Coding Agents also fits here, enabling seamless integration of AI agents within development environments.
Several demos are pushing the boundaries of developer tooling and code systems. Long Hui’s SpecFlow integrates ALM and compliance as code within Git repos, and Robb Winkle’s OpenProse: LLM Virtual Machines explores LLMs as executable systems.
Long Hui presented a “git-as-ALM” demo called SpecFlow, where compliance-grade specs live as Markdown in the same repo that CI validates. Instead of leaving your editor for a portal, the workflow uses spec-driven V-model traceability plus an AI assistant as the natural language UI over your repo state. Long’s background in sensor fusion and ASPICE-aligned process engineering shows in the rigor. People really seemed to like how this makes long-horizon agent work feel practical and reviewable, and it matched the community’s push toward toolchains that scale without portal friction.
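To make the traceability idea concrete, here is a minimal sketch of the kind of check a CI job could run over Markdown specs. It is not SpecFlow's implementation: the `REQ-<number>` ID format and the `untraced_requirements` helper are assumptions made for illustration.

```python
import re

# Hypothetical requirement ID format; the demo's real spec syntax is not described.
REQ_ID = re.compile(r"\bREQ-\d+\b")


def untraced_requirements(spec_markdown: str, test_sources: list[str]) -> list[str]:
    """Return requirement IDs declared in the spec but referenced by no test."""
    declared = set(REQ_ID.findall(spec_markdown))
    covered = set()
    for source in test_sources:
        covered.update(REQ_ID.findall(source))
    return sorted(declared - covered)


spec = "## REQ-1 Fuse sensor inputs\n## REQ-2 Log faults\n"
tests = ["# covers REQ-1\ndef test_fuse(): ..."]
gaps = untraced_requirements(spec, tests)
```

A CI step could fail the build whenever the returned list is non-empty, which keeps the spec and its verification in the same reviewable repo.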
Kevin Webb presented a VLM-powered “electronics design” workflow that treats vendor datasheet PDFs as a live component library. The demo built a document extraction ETL pipeline with open-weights VLMs deployed on Modal, then used a Zed IDE UI on top to browse and validate the extracted models. Instead of manually curating libraries, it turns technical docs into structured parts you can query. We liked it because it made multimodal extraction feel immediately usable, and people seemed genuinely excited about how quickly it could unlock faster component modeling. This kind of agentic data layer is exactly where builders are headed.
David Hague presented ICM - Interpreted Context Methodology, extending Jake Van Clief’s Interpretable Context Method by packaging multi-step agent workflows into plain folders full of markdown, with linked “skill files” to bootstrap new flows. The demo leans on his practical TypeScript and full-stack tooling mindset, turning messy orchestration into something you can version, review, and reuse like source code. It made the right kind of sense for repeatable agentic work, and people seemed to like how low-friction it felt. If this approach productizes into developer templates, it could make agent building more consistent without bespoke glue code.
Anna Zhdan from JetBrains spotlighted a demo on using Koog to build an ACP-compatible coding agent that could be integrated into IntelliJ, Zed, and 30+ other applications. The agent uses Agent Client Protocol to standardize client-agent communication and makes integration faster, with Koog handling the framework work and orchestration plus guidance for testing the connection end to end. (People seemed to love the “it actually plugs in” angle.) We liked it because it matched the current shift toward agentic tooling that’s practical across many environments.
Boyin Xu presented PrepPal, a voice-enabled interview and speaking coach that generates detailed, situation-specific feedback while you practice. The demo focuses on a low-latency voice-to-analysis loop, using an AI stack that turns your spoken answers into actionable coaching prompts, so iteration feels immediate. As a product leader with Google and Alibaba roots, Boyin brought a builder mindset to making the workflow accessible and repeatable. We liked how the loop felt practical (people seemed to gravitate to the hands-on practice angle), and it lines up with today’s push toward efficient agentic interactions for real-world communication.
Robb Winkle presented “LLM as a Virtual Machine,” showing how OpenProse .prose skills can make an LLM simulate a VM with enough fidelity to spawn subagents, maintain real state, and run real workflows. He walked through examples using recursive language models, the Captain’s Chair pattern, and automated PR review, all executed via prose.md with streaming from the DSL. We liked how it reframed long-horizon coding from prompt tweaks to reusable skill definitions (people seemed to click with that mindset). It also felt timely for builders aiming at agentic orchestration and practical autonomy.
Ryan McCrary presented an Evite alternative, rsvplease.to, where invitees could invite their own friends and the product grew via total strangers. The interesting part was how his solo-dev loop used an agent orchestrator to scan session replays for non-crash friction, then recorded CI runs of every PR so that, with no team to review, he could watch how the UI feels before merging. People responded that the workflow was immediately useful. We liked it because it made multimodal UX debugging practical, aligning with the current shift toward long-horizon, high-agency agents for everyday dev work.
Marek Piotr Mysior presented an agentic RAG pipeline that turns raw patent PDFs into structured engineering knowledge for inventive problem solving. The demo parses PDFs live, extracts technical contradictions, maps them to TRIZ inventive principles, and indexes everything in a vector database before an agent frames your mechanical issue as a TRIZ contradiction and retrieves relevant prior solutions. It matched the community’s appetite for practical, modular tooling (people really gravitated to the end-to-end flow) and felt especially relatable to engineers who want reuse beyond plain LLM generation. As a product, this could become a patent-aware design assistant for faster, more systematic ideation.
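The retrieval step can be sketched roughly as follows. This is a toy stand-in, not the demo's pipeline: a bag-of-words similarity replaces the vector database, and the `INDEX` records are illustrative placeholders, not taken from real patents or from the TRIZ contradiction matrix.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Illustrative records only (assumed shape: one contradiction, one principle label).
INDEX = [
    {"contradiction": "reduce weight without losing strength", "principle": "Composite materials"},
    {"contradiction": "increase speed without raising noise", "principle": "Segmentation"},
]


def retrieve(problem: str, k: int = 1) -> list[dict]:
    """Return the k indexed records whose contradiction best matches the problem."""
    query = embed(problem)
    ranked = sorted(INDEX, key=lambda r: cosine(query, embed(r["contradiction"])), reverse=True)
    return ranked[:k]


best = retrieve("bracket must lose weight but keep strength")[0]
```

In the demo as described, an agent first reframes the mechanical issue as a contradiction; here the problem string plays that role directly.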
Venkata Sai Srikar Devulapalli presented Vizey, a prompt-to-visualization system that turns natural-language ideas and datasets into theme-aware charts, going from prompt to query, data shaping, visualization spec, and real-time rendering. It uses a multi-agent pipeline with config/schema validation to avoid the flaky “single prompt” failure mode, so outputs stay consistent and debuggable. The end result feels like production orchestration, not just clever prompting, and people loved how reliably it produced styled visuals. We liked it because it mirrors the community shift toward agentic workflows with contracts, which also makes productization in BI and creative analytics feel within reach.
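The “contracts between stages” idea can be illustrated with a small sketch. It assumes a hypothetical `ChartSpec` shape and an invented set of allowed chart types; Vizey's actual schema was not shown, so treat every name here as a placeholder.

```python
from dataclasses import dataclass

# Hypothetical contract; the real system's chart types and fields are unknown.
ALLOWED_CHART_TYPES = {"bar", "line", "scatter"}


@dataclass
class ChartSpec:
    chart_type: str
    x: str
    y: str
    theme: str = "light"


def validate_spec(raw: dict, columns: set[str]) -> ChartSpec:
    """Reject an upstream stage's output unless it satisfies the contract."""
    missing = {"chart_type", "x", "y"} - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if raw["chart_type"] not in ALLOWED_CHART_TYPES:
        raise ValueError(f"unsupported chart type: {raw['chart_type']!r}")
    for axis in ("x", "y"):
        if raw[axis] not in columns:
            raise ValueError(f"{axis} refers to unknown column: {raw[axis]!r}")
    return ChartSpec(raw["chart_type"], raw["x"], raw["y"], raw.get("theme", "light"))


spec = validate_spec(
    {"chart_type": "bar", "x": "month", "y": "revenue"},
    columns={"month", "revenue"},
)
```

Failing fast at the boundary means a bad model output surfaces as a specific error at one stage, instead of as a broken chart at render time.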
Sebastian Estevez built p2claw, a peer-to-peer routing scheme that lets an AI agent serve web apps to the internet right from your computer, using unique URLs for instant access. The demo focused on practical ops automation and the “local to public” path, with the platform acting like a home base for your app’s availability. It stood out because it made deployment feel less manual, and people seemed genuinely excited about how learnable and shippable it was. We liked it because it turns agent prototypes into something you can actually hand to users.
You are one of 95,000+ readers from Apple, Amazon, Microsoft, Google, Nvidia, OpenAI, Anthropic, Cohere, ElevenLabs, Scale AI, Groq, Mistral AI, and others — spanning frontier labs, big tech, startups, and top universities.
✨
Stay in the Loop, Stay in the Lab
💥 Like what you saw?
✅ Find your local meetup — meet other builders
📣 Got thoughts? Reply to this email — we read it all
🧑‍💻 Forward this to someone who should be building with us