A podcast about using AI in embedded systems -- either as part of your product, or during development.

Embedded AI Podcast
by Embedded AI Podcast
Podcast Overview
Language
🇺🇲
Publishing Since
10/17/2025
Recent Episodes

June 26, 2026
E18: Shawn Hymel on Edge AI - NPUs, Deployment Challenges, and the Future of Embedded ML
<p>Ryan and Luca sit down with Shawn Hymel, an educator and course creator focused on edge AI and embedded systems. We explore what's changed in the past few years—from basic keyword spotting to full object detection on microcontrollers, thanks to integrated NPUs. Shawn walks us through the messy reality of deploying ML models to embedded hardware: vendor-specific toolchains, dependency hell, and the ongoing challenge of making edge AI accessible. We discuss what students and experienced engineers need to learn (or unlearn) to work effectively in this space, and look ahead to exciting developments like reinforcement learning on tiny devices and neuromorphic computing. It's a candid, technical conversation about where edge AI stands today and where it's headed.</p><p><strong>Key Topics:</strong></p><ul><li>[03:30] Why run ML on microcontrollers? Power, size, and application-specific advantages</li><li>[06:00] Keyword spotting as the original killer app for edge AI</li><li>[08:45] The game-changer: NPUs enabling full object detection on microcontrollers</li><li>[13:20] Privacy benefits of on-device processing vs. cloud-based inference</li><li>[16:00] How NPUs work under the hood and the vendor-specific deployment reality</li><li>[24:30] The painful parts: dependency hell, graph compilers, and memory arena sizing</li><li>[29:00] Using AI tools (LLMs) to navigate vendor documentation and generate code</li><li>[33:45] What CS students and ECE students each need to learn for edge AI</li><li>[40:15] Shifts in university enrollment: ECE rising, CS declining</li><li>[44:00] When you don't need AI: PID loops and deterministic solutions still matter</li><li>[47:30] Looking ahead: reinforcement learning on microcontrollers and neuromorphic computing</li></ul><p><strong>Notable Quotes:</strong></p><p>"Five, six years ago, we didn't have full object detection on microcontrollers. 
Now with NPUs, we can do things like full YOLO on a 320x320 image—milliwatts of power, full object detection. That was not a thing five years ago." — Shawn Hymel</p><p>"Expect to spend a day or two getting inference to actually run. The docs are still new, the graph compilers are fairly new. You're going to end up in dependency hell—both on the Python side and when you bring it over to the embedded side." — Shawn Hymel</p><p>"If a PID loop solves your need, there is absolutely no reason to put AI on there. That is a solved problem. Don't do it—just use a PID loop." — Shawn Hymel</p><p><strong>Resources Mentioned:</strong></p><ul><li><a href="https://shawnhymel.com">Shawn Hymel's Website</a> - Shawn's main site with links to free and paid courses on edge AI and embedded systems</li><li><a href="https://openmv.io">OpenMV</a> - Computer vision platform for microcontrollers, including the new AE3 board with NPU support</li><li><a href="https://edgeimpulse.com">Edge Impulse</a> - Platform that simplifies ML model deployment to embedded devices, supporting various NPUs</li><li><a href="https://www.coursera.org/specializations/machine-learning-introduction">Andrew Ng's Coursera ML Course</a> - Foundational machine learning course recommended for understanding the math behind ML</li><li><a href="https://ai.google.dev/edge/litert/microcontrollers/overview">TensorFlow Lite for Microcontrollers (LiteRT)</a> - Framework for running ML models on microcontrollers across different platforms</li></ul>
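<p>As a rough illustration of the "just use a PID loop" point in the quote above, here is a minimal sketch of a discrete PID controller. The gains, the time step, and the toy first-order plant are assumptions chosen for illustration only; they do not come from the episode.</p>

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative, untuned)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        self.integral += error * dt                     # accumulate the I term
        derivative = (error - self.prev_error) / dt     # finite-difference D term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 200, dt: float = 0.1) -> float:
    """Drive a hypothetical first-order plant toward a setpoint of 1.0."""
    pid = PID(kp=2.0, ki=1.0, kd=0.0)  # assumed gains, not tuned for real hardware
    value = 0.0
    for _ in range(steps):
        control = pid.update(1.0, value, dt)
        value += (control - value) * dt  # toy plant: dx/dt = u - x
    return value


if __name__ == "__main__":
    print(f"plant output after simulation: {simulate():.4f}")
```

<p>The whole controller is a handful of multiply-adds per sample with no model file, no memory arena, and no vendor toolchain, which is why a deterministic loop is usually the first thing to try.</p>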

June 12, 2026
E17: Switching Providers - Insulating Yourself from AI Vendor Lock-in
<p>Ryan and Luca tackle a challenge many AI users are facing: what happens when your AI provider starts acting up? Drawing from recent experiences with Anthropic's capacity issues, secret billing practices, and model degradation, we explore practical strategies for avoiding vendor lock-in.</p><p>We discuss the three layers of complexity: the model itself, the harness (like Claude Code or GitHub Copilot), and your authored content (skills, MCP servers, prompts). Each layer presents different challenges when switching providers. Ryan shares his approach of stepping back to simpler, more granular prompting to stay provider-agnostic, while Luca experiments with maintaining escape hatches to other platforms. We also look at the realities of running local models and the tradeoffs between convenience and control. The bottom line? Pick one system, get proficient, but prepare your exit strategy - because in this volatile landscape, you'll likely need it sooner than you think.</p><p><strong>Key Topics:</strong></p><ul><li>[02:30] Anthropic's recent troubles: capacity issues, model degradation, and gaslighting users</li><li>[06:45] The Hermes.md billing scandal - secret charges for having a specific filename</li><li>[10:20] Ryan's approach: stepping back to simpler, granular prompting for provider independence</li><li>[15:00] The three layers of complexity: model, harness, and authored content</li><li>[18:30] Why models aren't interchangeable - different flavors, tokenizers, and caching strategies</li><li>[24:15] Luca's tone-of-voice challenge: getting consistent writing style across models</li><li>[30:00] Running local models and private inference as alternatives to frontier models</li><li>[35:45] Practical strategies: maintaining escape hatches without parallel systems</li><li>[40:20] Luca's solution: versioning authored content separately with symlinks</li></ul><p><strong>Notable Quotes:</strong></p><p>"The more you actually make use of AI in your work, the more you use it 
as a force multiplier, the more painful it becomes if that force multiplier goes away." — Luca</p><p>"It's about total clock time. If you get a one shot and then have to redo it again, how much of that clock time is being used effectively?" — Ryan</p><p>"Pick one, stick with it, be proficient in it. But prepare yourself to have to escape eventually, because the situation is so volatile." — Luca</p><p><strong>Resources Mentioned:</strong></p><ul><li><a href="https://claude.com/product/claude-code">Claude Code</a> - Anthropic's AI coding assistant with hooks and skills support</li><li><a href="https://opencode.ai">OpenCode</a> - Provider-agnostic AI coding harness that supports multiple models</li><li><a href="https://github.com/features/copilot/cli">GitHub Copilot CLI</a> - Multi-provider AI assistant with dropdown model selection</li><li><a href="https://modelcontextprotocol.io">MCP (Model Context Protocol)</a> - Protocol for extending AI capabilities across different harnesses</li><li><a href="https://ollama.com">Ollama</a> - Tool for running local AI models on your own hardware</li></ul>

May 29, 2026
E16: Running LLMs Locally: Privacy, Performance, and Practical Trade-offs
<p>We explore what it really means to run AI models locally instead of relying on cloud providers like OpenAI or Anthropic. From powerful desktop setups with dual NVIDIA RTX 3090s to tiny models running on embedded systems, we cover the full spectrum of local AI deployment.</p><p>Luca shares hands-on experience running local models for client work, explaining the hardware requirements (spoiler: you need fast VRAM, not just lots of RAM), performance trade-offs, and practical tools like Ollama and LM Studio. We discuss how modern open-weight models from Meta, Google, and Chinese companies compare to hosted solutions - typically about a year behind state-of-the-art but surprisingly capable. We also look at edge AI applications, from elderly fall detection to traffic accident monitoring, where compact models shine. The conversation covers context window limitations, quantization techniques, and why getting started is easier than you might think - though you'll need to manage expectations about what local models can deliver compared to their cloud-based cousins.</p><p><strong>Key Topics:</strong></p><ul><li>[00:00] Introduction: What are local models and why run them?</li><li>[02:30] Hardware requirements: VRAM vs system RAM, and why graphics cards matter</li><li>[05:45] Luca's setup: dual RTX 3090s and real-world client work with local models</li><li>[08:20] Performance metrics: time to first token, tokens per second, and output quality</li><li>[12:00] Tiny models for edge AI: Google's 270M parameter model and specific use cases</li><li>[15:30] Tools and workflows: Ollama, LM Studio, and OpenAI-compatible APIs</li><li>[18:45] Where models come from: Hugging Face, Meta's Llama, and the open-weight ecosystem</li><li>[22:10] Context window limitations and quantization techniques</li><li>[25:00] Getting started: realistic expectations and practical first steps</li></ul><p><strong>Notable Quotes:</strong></p><p>"What those models really need is tons of memory and as fast 
memory as you can get it. This is why people like Macs because they've got the unified memory architecture." — Luca Ingianni</p><p>"Modern models are actually pretty good. They are smaller, so they will have less knowledge. They are a tad slower. They struggle with much smaller context windows. But if you can work within those bounds, they work pretty well." — Luca Ingianni</p><p>"Using AI is a different mindset. It's a different way of thinking about solving a problem, but it's just another tool in your toolbox and it's just another way of getting work done." — Ryan Torvik</p><p><strong>Resources Mentioned:</strong></p><ul><li>Ollama - Docker-like tool for running local LLMs with simple pull/run commands and OpenAI-compatible API</li><li>LM Studio - Graphical user interface for downloading and running local models easily</li><li>Open WebUI - Web interface for local models that mimics ChatGPT's chat interface</li><li>Hugging Face - Repository with hundreds of thousands of models in various sizes and configurations</li><li>Meta Llama - Open-weight model family from Meta that helped start the local LLM movement</li><li>Google Gemma - Model family from Google including compact vision-capable models (270M parameters)</li></ul>
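<p>To make the VRAM and quantization discussion above concrete, here is a back-of-the-envelope sketch of how much memory a model's weights occupy at different precisions. It assumes memory is simply parameters times bits per weight and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name and the 7-billion-parameter example are hypothetical; the 270M figure is the model size mentioned in the episode notes.</p>

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 270M is from the episode notes; 7e9 is an assumed example size.
    for name, params in [("270M model", 270e6), ("7B model (assumed)", 7e9)]:
        for bits in (16, 8, 4):
            size = weight_memory_gib(params, bits)
            print(f"{name:20s} {bits:2d}-bit weights: {size:6.2f} GiB")
```

<p>Halving the bits per weight halves the weight footprint, which is the main reason quantized models fit on consumer graphics cards at all.</p>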
18 total episodes available
Frequently asked questions
- What is Embedded AI Podcast?
- How often does this podcast release new episodes?
The three most recent episodes were published two weeks apart (May 29, June 12, and June 26, 2026).
- Where can I listen to this podcast?
This podcast is available on 4 platforms including Apple Podcasts, Spotify, and more. You can also use the RSS feed directly.
- Does this podcast accept guests?
Yes, this podcast regularly features guests.