⚡ Image Downscaling Attacks, Open-source Benchmark for App Gen and more
Issue #6 · Week of October 6
We pulled the standouts from the last 2 weeks: agentic workflow orchestration and multimodal security are on people’s minds. Check TextEvolve (Nick Ryan, NYC) for automated LLM program discovery, Pipelex (Louis Choquel, Paris) for declarative PLX pipelines that turn natural requirements into tested production workflows, and Anamorpher (Kikimora Morozova, NYC) for image-downscaling prompt injections that expose preprocessing risks. These are real, vetted builds we’re still thinking about - read on.
CompileBench: Testing LLMs on Build Systems
Piotr Grabowski from Quesma presented CompileBench Eval: Do You Need AGI to Compile Google Chrome? He demonstrated an open-source benchmark that forces LLMs to build real projects from scratch in Docker via shell, from simple utilities to complex projects with many dependencies. The talk dug into results, model quirks, and the internals, including how long-running tasks expose toolchain issues and log clutter. For builders, it is a practical guide to model selection and tooling in production-like workflows.
Anamorpher Downscale Attack
Kikimora Morozova from Trail of Bits presented Image Downscaling Attacks on Production AI Systems. She showed an adversarial image that reads normal at high res but reveals hidden prompts after bicubic downsampling, enabling data exfiltration via Gemini CLI and Zapier MCP. The setup hinged on Anamorpher, built with Suha Hussain, using a least-squares embedding workflow and pixel-level perturbation visuals. For builders, it’s a reminder to harden image preprocessing and auditing in multimodal systems.
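To make the preprocessing risk concrete, here is a toy sketch of why a downscaled image can differ sharply from what a human reviewer sees at full resolution. It is not Anamorpher's method: the talk targeted bicubic downsampling with a least-squares embedding, while this example swaps in plain nearest-neighbor decimation because it fits in a few lines. The function names (`build_toy_image`, `downscale_nearest`, `mean_brightness`) are hypothetical and exist only for illustration.

```python
def build_toy_image(size, factor, hidden_value=0, cover_value=255):
    """Grayscale grid that is mostly `cover_value` (looks white at full res),
    with `hidden_value` placed only where the decimation below samples."""
    return [
        [hidden_value if (y % factor == 0 and x % factor == 0) else cover_value
         for x in range(size)]
        for y in range(size)
    ]


def downscale_nearest(pixels, factor):
    """Nearest-neighbor decimation: keep every `factor`-th pixel per axis.
    Assumption: a stand-in for the bicubic resampling discussed in the talk."""
    return [row[::factor] for row in pixels[::factor]]


def mean_brightness(pixels):
    """Average pixel value, used here as a crude 'what does it look like' proxy."""
    flat = [value for row in pixels for value in row]
    return sum(flat) / len(flat)


full = build_toy_image(size=8, factor=4)
small = downscale_nearest(full, factor=4)

# The full-resolution image is almost entirely white, but the downscaled
# version consists only of the hidden dark pixels.
print("full-res mean brightness:", mean_brightness(full))
print("downscaled mean brightness:", mean_brightness(small))
print("downscaled pixels:", small)
```

The practical point is the same one the talk makes: review and log the image the model actually receives after preprocessing, not only the upload a human sees.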
TextEvolve: Auto LLM Tuning
Nick Ryan, a NYC-based ML engineer, presented How to remove yourself from the LLM design loop. He demonstrated TextEvolve, an LLM-driven tool that automates the iteration loop by generating optimized Python scripts, tests, and edge cases. The code is open source, and the workflow blends prompt-driven exploration with automated validation. Takeaway: this approach offers faster prototyping and more reliable deployments for builders who want reproducible, testable AI workflows.
AlphaEarth Forest Loss
Nicolas Schuldt presented 'Predicting Forest Loss Using AlphaEarth Embeddings', a live demo pairing 64-dim AlphaEarth embeddings with WRI data via similarity-weighted kNN to forecast deforestation. The system analyzes 2017 patches, finds twins by cosine similarity, and outputs risk probabilities with confidence intervals. It relies on live ecosystem fingerprinting and a lightweight embedding workflow, with a public GitHub repo for reproducibility. Nicolas, a hands-on builder from Ecuador (Hexay), showed that this approach is scalable for conservation analytics, and attendees loved it.
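For readers who have not used similarity-weighted kNN, a minimal sketch of the idea follows. It assumes each labeled patch is a pair of an embedding and a 0/1 forest-loss flag, uses 3-dim toy vectors in place of 64-dim AlphaEarth embeddings, and omits the confidence intervals from the demo. The names `cosine_similarity` and `knn_risk` and all the data are hypothetical, not taken from the project's repo.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_risk(query, labeled, k=3):
    """Similarity-weighted vote over the k most similar labeled patches.

    `labeled` is a list of (embedding, lost) pairs, with lost in {0, 1}.
    Returns a value in [0, 1]; negative similarities get zero weight.
    """
    scored = sorted(
        ((cosine_similarity(query, emb), lost) for emb, lost in labeled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * lost for w, (_, lost) in zip(weights, scored)) / total


# Toy data: two patches that later lost forest, two that did not.
labeled_patches = [
    ([1.0, 0.0, 0.0], 1),
    ([0.9, 0.1, 0.0], 1),
    ([0.0, 1.0, 0.0], 0),
    ([0.0, 0.0, 1.0], 0),
]

print(knn_risk([1.0, 0.0, 0.0], labeled_patches))  # close to the "lost" twins
print(knn_risk([0.0, 1.0, 0.0], labeled_patches))  # close to an intact twin
```

Weighting by similarity means a near-identical twin counts for more than a distant neighbor that merely made the top k.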
AgentPay Framework
James Kanyiri from Pocket Watch presented Payment integration for AI Agents, a framework letting AI agents call payments as composable tools. The project unifies transactional flows and is backed by PayLink’s API plus MCP AI integration, with support for M-Pesa, Airtel, KCB, and Equity. A runnable demo at paylink-platform.vercel.app and the public repo (paylinkmcp/paylink) let builders inspect and reuse the code. Audiences appreciated the agent-first abstraction, which cuts repeated integration work and speeds deployment. Takeaway: a practical path to production-ready agent payments.
How to Ship Complex Features 10x Faster with AI Agents | Dex Horthy (HumanLayer)
How to Run Open-Source LLMs Locally on a Mac with MLX-LM
You are one of 95,000+ readers from Anthropic, OpenAI, Google, Microsoft, Meta, Apple, Amazon, Nvidia, Netflix, Stripe, Databricks, Snowflake, and others — spanning frontier labs, big tech, startups, and top universities.