<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:media="http://search.yahoo.com/mrss/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:custom="https://www.oreilly.com/rss/custom"
>
<channel>
<title>Radar</title>
<atom:link href="https://www.oreilly.com/radar/feed/" rel="self" type="application/rss+xml" />
<link>https://www.oreilly.com/radar</link>
<description>Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology</description>
<lastBuildDate>Mon, 25 Aug 2025 16:49:13 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<generator>https://wordpress.org/?v=6.8.2</generator>
<image>
<url>https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/04/cropped-favicon_512x512-160x160.png</url>
<title>Radar</title>
<link>https://www.oreilly.com/radar</link>
<width>32</width>
<height>32</height>
</image>
<item>
<title>Firing Junior Developers Is Indeed The Dumbest Thing</title>
<link>https://www.oreilly.com/radar/firing-junior-developers-is-indeed-the-dumbest-thing/</link>
<comments>https://www.oreilly.com/radar/firing-junior-developers-is-indeed-the-dumbest-thing/#respond</comments>
<pubDate>Mon, 25 Aug 2025 16:49:03 +0000</pubDate>
<dc:creator><![CDATA[Mike Loukides]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Software Development]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17344</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Firing-Junior-Developers-Is-Indeed-The-Dumbest-Thing.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[Matt Garman’s statement that firing junior developers because AI can do their work is the “dumbest thing I’ve ever heard” has almost achieved meme status. I’ve seen it quoted everywhere. We agree. It’s a point we’ve made many times over the past few years. If we eliminate junior developers, where will the seniors come from? […]]]></description>
<content:encoded><![CDATA[
<p>Matt Garman’s <a href="https://www.theregister.com/2025/08/21/aws_ceo_entry_level_jobs_opinion/" target="_blank" rel="noreferrer noopener">statement</a> that firing junior developers because AI can do their work is the “dumbest thing I’ve ever heard” has almost achieved meme status. I’ve seen it quoted everywhere.</p>
<p>We agree. It’s a point we’ve made many times over the past few years. If we eliminate junior developers, where will the seniors come from? A few years down the road, when the current senior developers are retiring, who will take their place? The roles of juniors and seniors are no doubt changing—and, as roles change, we need to be thinking about <a href="https://www.oreilly.com/radar/seniors-and-juniors/" target="_blank" rel="noreferrer noopener">the kinds of training junior developers will need</a> to work effectively in their new roles, to prepare to step into roles as senior developers later in their career—possibly sooner than they (or their management) anticipated. Programming languages and algorithms are still table stakes. In addition, junior developers now need to become skilled debuggers, they need to learn design skills, and they need to start thinking on a higher level than the function they’re currently working on.</p>
<p>We also believe that using AI effectively is a learned skill. Andrew Stellman has written about <a href="https://www.oreilly.com/radar/bridging-the-ai-learning-gap/" target="_blank" rel="noreferrer noopener">bridging the AI learning gap</a>, and his <a href="https://www.oreilly.com/radar/the-sens-ai-framework/">Sens-AI framework</a> is designed for teaching how to use AI as part of learning to program in a new language.</p>
<p>As Tim O’Reilly has <a href="https://www.oreilly.com/radar/ai-and-programming-the-beginning-of-a-new-era/" target="_blank" rel="noreferrer noopener">written</a>,</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Here’s what history consistently shows us: Whenever the barrier to communicating with computers lowers, we don’t end up with fewer programmers—we discover entirely new territories for computation to transform.</p>
</blockquote>
<p>We will need more programmers, not fewer. And we will get them—at all levels of proficiency, from complete newbie to junior professional to senior. The question facing us is: how will we enable all of these programmers to make great software, software of a kind that <a href="https://www.oreilly.com/radar/we-are-only-beginning-to-understand-how-to-use-ai/" target="_blank" rel="noreferrer noopener">may not even exist today</a>? Not everyone needs to walk the path from beginner to seasoned professional. But that path has to exist. It will be developed through experience, what you can call “learning by doing.” That’s <a href="https://www.oreilly.com/radar/is-ai-a-normal-technology/" target="_blank" rel="noreferrer noopener">how technology breakthroughs turn into products, practices, and actual adoption</a>. And we’re building that path.</p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/firing-junior-developers-is-indeed-the-dumbest-thing/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Context Engineering: Bringing Engineering Discipline to Prompts—Part 3</title>
<link>https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-3/</link>
<comments>https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-3/#respond</comments>
<pubDate>Mon, 25 Aug 2025 10:34:01 +0000</pubDate>
<dc:creator><![CDATA[Addy Osmani]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17333</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/The-Big-Picture-of-Context-Engineering.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[Context Engineering in the Big Picture of LLM Applications]]></custom:subtitle>
<description><![CDATA[The following is Part 3 of 3 from Addy Osmani’s original post “Context Engineering: Bringing Engineering Discipline to Prompts.” Part 1 can be found here and Part 2 here. Context engineering is crucial, but it’s just one component of a larger stack needed to build full-fledged LLM applications—alongside things like control flow, model orchestration, tool integration, […]]]></description>
<content:encoded><![CDATA[
<p class="has-cyan-bluish-gray-background-color has-background"><em>The following is Part 3 of 3 from Addy Osmani’s original post “</em><a href="https://addyo.substack.com/p/context-engineering-bringing-engineering" target="_blank" rel="noreferrer noopener">Context Engineering: Bringing Engineering Discipline to Parts</a><em>.” Part 1 can be found </em><a href="https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-1/" target="_blank" rel="noreferrer noopener"><em>here</em></a> <em>and Part 2 <a href="https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-2/" target="_blank" rel="noreferrer noopener">here</a></em>.</p>
<p><strong>Context engineering is crucial, but it’s just one component of a larger stack needed to build full-fledged LLM applications—alongside things like control flow, model orchestration, tool integration, and guardrails.</strong></p>
<p>In Andrej Karpathy’s words, context engineering is “<em>one small piece of an emerging thick layer of non-trivial software</em>” that powers real LLM apps. So while we’ve focused on how to craft good context, it’s important to see where that fits in the overall architecture.</p>
<p>A production-grade LLM system typically has to handle many concerns beyond just prompting. For example:</p>
<ul class="wp-block-list">
<li><strong>Problem decomposition and control flow:</strong> Instead of treating a user query as one monolithic prompt, robust systems often break the problem down into subtasks or multistep workflows. For instance, an AI agent might first be prompted to outline a plan, then in subsequent steps be prompted to execute each step. Designing this flow (which prompts to call in what order; how to decide branching or looping) is a classic programming task—except the “functions” are LLM calls with context. Context engineering fits here by making sure each step’s prompt has the info it needs, but the decision <em>to have steps at all</em> is a higher-level design. This is why you see frameworks where you essentially write a script that coordinates multiple LLM calls and tool uses.</li>
<li><strong>Model selection and routing:</strong> You might use different AI models for different jobs. Perhaps a lightweight model for simple tasks or preliminary answers, and a heavyweight model for final solutions. Or a code-specialized model for coding tasks versus a general model for conversational tasks. The system needs logic to route requests to the appropriate model. Each model might have different context length limits or formatting requirements, which the context engineering must account for (e.g., truncating context more aggressively for a smaller model). This aspect is more engineering than prompting: think of it as matching the tool to the job.</li>
<li><strong>Tool integrations and external actions:</strong> If your AI can perform actions (like calling an API, querying a database, opening a web page, or running code), your software needs to manage those capabilities. That includes providing the AI with a list of available tools and instructions on usage, as well as actually executing those tool calls and capturing the results. As we discussed, the results then become new context for further model calls. Architecturally, this means your app often has a loop: prompt model → if model output indicates a tool to use → execute tool → incorporate result → prompt model again. Designing that loop reliably is a challenge. (A minimal sketch of this loop appears after this list.)</li>
<li><strong>User interaction and UX flows:</strong> Many LLM applications involve the user in the loop. For example, a coding assistant might propose changes and then ask the user to confirm applying them. Or a writing assistant might offer a few draft options for the user to pick from. These UX decisions affect context too. If the user says “Option 2 looks good but shorten it,” you need to carry that feedback into the next prompt (e.g., “The user chose draft 2 and asked to shorten it.”). Designing a smooth human-AI interaction flow is part of the app, though not directly about prompts. Still, context engineering supports it by ensuring each turn’s prompt accurately reflects the state of the interaction (like remembering which option was chosen or what the user edited manually).</li>
<li><strong>Guardrails and safety:</strong> In production, you have to consider misuse and errors. This might include content filters (to prevent toxic or sensitive outputs), authentication and permission checks for tools (so the AI doesn’t, say, delete a database because it was in the instructions), and validation of outputs. Some setups use a second model or rules to double-check the first model’s output. For example, after the main model generates an answer, you might run another check: “Does this answer contain any sensitive info? If so, redact it.” Those checks themselves can be implemented as prompts or as code. In either case, they often add additional instructions into the context (a system message like “If the user asks for disallowed content, refuse,” is part of many deployed prompts). So the context might always include some safety boilerplate. Balancing that (ensuring the model follows policy without compromising helpfulness) is yet another piece of the puzzle.</li>
<li><strong>Evaluation and monitoring:</strong> Suffice to say, you need to constantly monitor how the AI is performing. Logging every request and response (with user consent and privacy in mind) allows you to analyze failures and outliers. You might incorporate real-time evals—e.g., scoring the model’s answers on certain criteria, and if the score is low, automatically having the model try again or route to a human fallback. While evaluation isn’t part of generating a single prompt’s content, it feeds back into improving prompts and context strategies over time. Essentially, you treat the prompt and context assembly as something that can be <em>debugged</em> and optimized using data from production.</li>
</ul>
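<p>To make the tool-use loop described above concrete, here is a minimal, framework-agnostic sketch. The <code>call_model</code> function and the <code>tools</code> dictionary are placeholders you would wire up to your own model client and tool registry, and the JSON shape of a tool request is just an assumption for illustration:</p>
<pre class="wp-block-code"><code># Minimal sketch of a prompt-tool loop. call_model and tools are placeholders:
# call_model(messages) returns the model's text reply, and tools maps a tool
# name to a plain Python function. A tool request is assumed to arrive as JSON,
# e.g. {"tool": "add", "args": {"a": 1, "b": 2}}.
import json

def run(user_query, call_model, tools, max_steps=5):
    messages = [{"role": "user", "content": user_query}]
    reply = ""
    for _ in range(max_steps):
        reply = call_model(messages)
        try:
            request = json.loads(reply)
        except ValueError:
            return reply          # plain text, so treat it as the final answer
        if not isinstance(request, dict) or "tool" not in request:
            return reply
        result = tools[request["tool"]](**request["args"])
        # The tool result becomes new context for the next model call.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": "Tool result: " + str(result)})
    return reply</code></pre>
<p>Capping the number of steps is one small way to keep the loop from running away, which is exactly the kind of reliability concern the tool-integration item above points at.</p>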
<p>We’re really talking about <strong>a new kind of application architecture</strong>. It’s one where the core logic involves managing information (context) and adapting it through a series of AI interactions, rather than just running deterministic functions. Karpathy listed elements like control flows, model dispatch, memory management, tool use, verification steps, etc., on top of context filling. All together, they form what he jokingly calls “an emerging thick layer” for AI apps—thick because it’s doing a lot! When we build these systems, we’re essentially writing metaprograms: programs that choreograph another “program” (the AI’s output) to solve a task.</p>
<p>For us software engineers, this is both exciting and challenging. It’s exciting because it opens capabilities we didn’t have—e.g., building an assistant that can handle natural language, code, and external actions seamlessly. It’s challenging because many of the techniques are new and still in flux. We have to think about things like prompt versioning, AI reliability, and ethical output filtering, which weren’t standard parts of app development before. In this context, <strong>context engineering lies at the heart</strong> of the system: If you can’t get the right information into the model at the right time, nothing else will save your app. But as we see, even perfect context alone isn’t enough; you need all the supporting structure around it.</p>
<p>The takeaway is that <strong>we’re moving from prompt design to system design</strong>. Context engineering is a core part of that system design, but it lives alongside many other components.</p>
<h2 class="wp-block-heading">Conclusion</h2>
<p><strong>Key takeaway:</strong> <em>By mastering the assembly of complete context (and coupling it with solid testing), we can increase the chances of getting the best output from AI models.</em></p>
<p>For experienced engineers, much of this paradigm is familiar at its core—it’s about good software practices—but applied in a new domain. Think about it:</p>
<ul class="wp-block-list">
<li>We always knew <strong>garbage in, garbage out</strong>. Now that principle manifests as “bad context in, bad answer out.” So we put more work into ensuring quality input (context) rather than hoping the model will figure it out.</li>
<li>We value <strong>modularity and abstraction</strong> in code. Now we’re effectively abstracting tasks to a high level (describe the task, give examples, let AI implement) and building modular pipelines of AI + tools. We’re orchestrating components (some deterministic, some AI) rather than writing all logic ourselves.</li>
<li>We practice <strong>testing and iteration</strong> in traditional dev. Now we’re applying the same rigor to AI behaviors, writing evals and refining prompts as one would refine code after profiling.</li>
</ul>
<p>In embracing context engineering, you’re essentially saying, “I, the developer, am responsible for what the AI does.” It’s not a mysterious oracle; it’s a component I need to configure and drive with the right data and rules.</p>
<p>This mindset shift is empowering. It means we don’t have to treat the AI as unpredictable magic—we can tame it with solid engineering techniques (plus a bit of creative prompt artistry).</p>
<p>Practically, how can you adopt this context-centric approach in your work?</p>
<ul class="wp-block-list">
<li><strong>Invest in data and knowledge pipelines.</strong> A big part of context engineering is having the data to inject. So build that vector search index of your documentation, or set up that database query that your agent can use. Treat knowledge sources as core features in development. For example, if your AI assistant is for coding, make sure it can pull in code from the repo or reference the style guide. A lot of the value you’ll get from an AI comes from the <em>external knowledge</em> you supply to it.</li>
<li><strong>Develop prompt templates and libraries.</strong> Rather than ad hoc prompts, start creating structured templates for your needs. You might have a template for “answer with citation” or “generate code diff given error.” These become like functions you reuse. Keep them in version control. Document their expected behavior. This is how you build up a toolkit of proven context setups. Over time, your team can share and iterate on these, just as they would on shared code libraries. (A small example of such a template appears after this list.)</li>
<li><strong>Use tools and frameworks that give you control.</strong> Avoid “just give us a prompt, we do the rest” solutions if you need reliability. Opt for frameworks that let you peek under the hood and tweak things—whether that’s a lower-level library like LangChain or a custom orchestration you build. The more visibility and control you have over context assembly, the easier debugging will be when something goes wrong.</li>
<li><strong>Monitor and instrument everything.</strong> In production, log the inputs and outputs (within privacy limits) so you can later analyze them. Use observability tools (like LangSmith, etc.) to trace how context was built for each request. When an output is bad, trace back and see what the model saw—was something missing? Was something formatted poorly? This will guide your fixes. Essentially, treat your AI system as a somewhat unpredictable service that you need to monitor like any other—dashboards for prompt usage, success rates, etc.</li>
<li><strong>Keep the user in the loop.</strong> Context engineering isn’t just about machine-machine info; it’s ultimately about solving a user’s problem. Often, the user can provide context if asked the right way. Think about UX designs where the AI asks clarifying questions or where the user can provide extra details to refine the context (like attaching a file, or selecting which codebase section is relevant). The term “AI-assisted” goes both ways—AI assists the user, but the user can assist AI by supplying context. A well-designed system facilitates that. For example, if an AI answer is wrong, let the user correct it and feed that correction back into context for next time.</li>
<li><strong>Train your team (and yourself).</strong> Make context engineering a shared discipline. In code reviews, start reviewing prompts and context logic too. (“Is this retrieval grabbing the right docs? Is this prompt section clear and unambiguous?”) If you’re a tech lead, encourage team members to surface issues with AI outputs and brainstorm how tweaking context might fix it. Knowledge sharing is key because the field is new—a clever prompt trick or formatting insight one person discovers can likely benefit others. I’ve personally learned a ton just reading others’ prompt examples and postmortems of AI failures.</li>
</ul>
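<p>As a small illustration of the template idea above, here is one way to keep an “answer with citation” prompt as a versioned, reusable function. The template text and field names are invented for the example; the point is only that the prompt lives in code, under version control, with a documented contract.</p>
<pre class="wp-block-code"><code># Illustrative prompt template kept as code. The wording and fields are
# made up for this example; adapt them to your own needs.
from string import Template

ANSWER_WITH_CITATION = Template(
    "Answer the question using only the sources below.\n"
    "Cite the source id in square brackets after each claim.\n\n"
    "Sources:\n$sources\n\nQuestion: $question\n"
)

def answer_with_citation_prompt(question, sources):
    """Render the template; sources is a dict mapping a source id to its text."""
    source_block = "\n".join(f"[{sid}] {text}" for sid, text in sources.items())
    return ANSWER_WITH_CITATION.substitute(sources=source_block, question=question)

# Example:
# prompt = answer_with_citation_prompt(
#     "What does the style guide say about logging?",
#     {"doc-12": "All services log structured JSON to stdout."},
# )</code></pre>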
<p>As we move forward, I expect <strong>context engineering to become second nature</strong>—much like writing an API call or a SQL query is today. It will be part of the standard repertoire of software development. Already, many of us don’t think twice about doing a quick vector similarity search to grab context for a question; it’s just part of the flow. In a few years, “Have you set up the context properly?” will be as common a code review question as “Have you handled that API response properly?”</p>
<p>In embracing this new paradigm, we don’t abandon the old engineering principles—we reapply them in new ways. If you’ve spent years honing your software craft, that experience is incredibly valuable now: It’s what allows you to design sensible flows, spot edge cases, and ensure correctness. AI hasn’t made those skills obsolete; it’s amplified their importance in guiding AI. The role of the software engineer is not diminishing—it’s evolving. We’re becoming <strong>directors</strong> and <strong>editors</strong> of AI, not just writers of code. And context engineering is the technique by which we direct the AI effectively.</p>
<p><strong>Start thinking in terms of what information you provide to the model, not just what question you ask.</strong> Experiment with it, iterate on it, and share your findings. By doing so, you’ll not only get better results from today’s AI but also be preparing yourself for the even more powerful AI systems on the horizon. Those who understand how to feed the AI will always have the advantage.</p>
<p>Happy context-coding!</p>
<p class="has-cyan-bluish-gray-background-color has-background"><em>I’m excited to share that I’ve written a new </em><a href="https://www.oreilly.com/library/view/vibe-coding-the/9798341634749/" target="_blank" rel="noreferrer noopener"><em>AI-assisted engineering book</em></a><em> with O’Reilly. If you’ve enjoyed my writing here you may be interested in checking it out.</em></p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, </em><strong><em>Coding for the Agentic World</em></strong><em>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend. </em><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener"><em>Register now to save your seat</em></a><em>.</em></p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-3/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Generative AI in the Real World: Understanding A2A with Heiko Hotz and Sokratis Kartakis</title>
<link>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-understanding-a2a-with-heiko-hotz-and-sokratis-kartakis/</link>
<comments>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-understanding-a2a-with-heiko-hotz-and-sokratis-kartakis/#respond</comments>
<pubDate>Thu, 21 Aug 2025 13:12:41 +0000</pubDate>
<dc:creator><![CDATA[Ben Lorica, Heiko Hotz and Sokratis Kartakis]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Generative AI in the Real World]]></category>
<category><![CDATA[Podcast]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?post_type=podcast&p=17315</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2024/01/Podcast_Cover_GenAI_in_the_Real_World-scaled.png"
medium="image"
type="image/png"
/>
<description><![CDATA[Everyone is talking about agents: single agents and, increasingly, multi-agent systems. What kind of applications will we build with agents, and how will we build with them? How will agents communicate with each other effectively? Why do we need a protocol like A2A to specify how they communicate? Join Ben Lorica as he talks with […]]]></description>
<content:encoded><![CDATA[
<p>Everyone is talking about agents: single agents and, increasingly, multi-agent systems. What kind of applications will we build with agents, and how will we build with them? How will agents communicate with each other effectively? Why do we need a protocol like A2A to specify how they communicate? Join Ben Lorica as he talks with Heiko Hotz and Sokratis Kartakis about A2A and our agentic future.</p>
<p><strong>About the <em>Generative AI in the Real World</em> podcast:</strong> In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In <em>Generative AI in the Real World</em>, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.</p>
<p>Check out <a href="https://learning.oreilly.com/playlists/42123a72-1108-40f1-91c0-adbfb9f4983b/?_gl=1*16z5k2y*_ga*MTE1NDE4NjYxMi4xNzI5NTkwODkx*_ga_092EL089CH*MTcyOTYxNDAyNC4zLjEuMTcyOTYxNDAyNi41OC4wLjA." target="_blank" rel="noreferrer noopener">other episodes</a> of this podcast on the O’Reilly learning platform.</p>
<h2 class="wp-block-heading">Timestamps</h2>
<ul class="wp-block-list">
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=0" target="_blank" rel="noreferrer noopener">0:00</a>: Intro to Heiko and Sokratis.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=24" target="_blank" rel="noreferrer noopener">0:24</a>: It feels like we’re in a Cambrian explosion of frameworks. Why agent-to-agent communication? Some people might think we should focus on single-agent tooling first.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=53" target="_blank" rel="noreferrer noopener">0:53</a>: Many developers start developing agents with completely different frameworks. At some point they want to link the agents together. One way is to change the code of your application. But it would be easier if you could get the agents talking the same language. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=103" target="_blank" rel="noreferrer noopener">1:43</a>: Was A2A something developers approached you for?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=113" target="_blank" rel="noreferrer noopener">1:53</a>: It is fair to say that A2A is a forward-looking protocol. We see a future where one team develops an agent that does something and another team in the same organization or even outside would like to leverage that capability. An agent is very different from an API. In the past, this was done via API. With agents, I need a stateful protocol where I send a task and the agent can run asynchronously in the background and do what it needs to do. That’s the justification for the A2A protocol. No one has explicitly asked for this, but we will be there in a few months time. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=235" target="_blank" rel="noreferrer noopener">3:55</a>: For developers in this space, the most familiar is MCP, which is a single agent protocol focused on external tool integration. What is the relationship between MCP and A2A?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=266" target="_blank" rel="noreferrer noopener">4:26</a>: We believe that MCP and A2A will be complementary and not rivals. MCP is specific to tools, and A2A connects agents with each other. That brings us to the question of when to wrap a functionality in a tool versus an agent. If we look at the technical implementation, that gives us some hints when to use each. An MCP tool exposes its capability by a structured schema: I need input A and B and I give you the sum. I can’t deviate from the schema. It’s also a single interaction. If I wrap the same functionality into an agent, the way I expose the functionality is different. A2A expects a natural language description of the agent’s functionality: “The agent adds two numbers.” Also, A2A is stateful. I send a request and get a result. That gives developers a hint on when to use an agent and when to use a tool. I like to use the analogy of a vending machine versus a concierge. I put money into a vending machine and push a button and get something out. I talk to a concierge and say, “I’m thirsty; buy me something to drink.”</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=429" target="_blank" rel="noreferrer noopener">7:09</a>: Maybe we can help our listeners make the notion of A2A even more concrete. I tell nonexperts that you’re already using an agent to some extent. Deep research is an agent. I talk to people building AI tools in finance, and I have a notion that I want to research, but I have one agent looking at earnings, another looking at other data. Do you have a canonical example you use?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=493" target="_blank" rel="noreferrer noopener">8:13</a>: We can parallelize A2A with real business. Imagine separate agents that are different employees with different skills. They have their own business cards. They share the business cards with the clients. The client can understand what tasks they want to do: learn about stocks, learn about investments. So I call the right agent or server to get a specialized answer back. Each agent has a business card that describes its skills and capabilities. I can talk to the agent with live streaming or send it messages. You need to define how you communicate with the agent. And you need to define the security method you will use to exchange messages.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=585" target="_blank" rel="noreferrer noopener">9:45</a>: Late last year, people started talking about single agents. But people were already talking about what the agent stack would be: memory, storage, observability, and so on. Now that you are talking about multi-agents or A2A, are there important things that need to be introduced to the agentic stack?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=632" target="_blank" rel="noreferrer noopener">10:32</a>: You would still have the same. You’d arguably need more. Statefulness, memory, access to tools.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=648" target="_blank" rel="noreferrer noopener">10:48</a>: Is that going to be like a shared memory across agents?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=652" target="_blank" rel="noreferrer noopener">10:52</a>: It all depends on the architecture. The way I imagine a vanilla architecture, the user speaks to a router agent, which is the primary contact of the user with the system. That router agent does very simple things like saying “hello.” But once the user asks the system “Book me a holiday to Paris,” there are many steps involved. (No agent can do this yet). The capabilities are getting better and better. But the way I imagine it is that the router agent is the boss, and two or three remote agents do different things. One finds flights; one books hotels; one books cars—they all need information from each other. The router agent would hold the context for all of those. If you build it all within one agentic framework, it becomes even easier because those frameworks have the concepts of shared memory built in. But it’s not necessarily needed. If the hotel booking agent is built in LangChain and from a different team than the flight booking agent, the router agent would decide what information is needed.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=808" target="_blank" rel="noreferrer noopener">13:28</a>: What you just said is the argument for why you need these protocols. Your example is the canonical simple example. What if my trip involves four different countries? I might need a hotel agent for every country. Because hotels might need to be specialized for local knowledge.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=852" target="_blank" rel="noreferrer noopener">14:12</a>: Technically, you might not need to change agents. You need to change the data—what agent has access to what data. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=869" target="_blank" rel="noreferrer noopener">14:29</a>: We need to parallelize single agents with multi-agent systems; we move from a monolithic application to microservices that have small, dedicated agents to perform specific tasks. This has many benefits. It also makes the life of the developer easier because you can test, you can evaluate, you can perform checks before moving to production. Imagine that you gave a human 100 tools to perform a task. The human will get confused. It’s the same for agents. You need small agents with specific terms to perform the right task. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=931" target="_blank" rel="noreferrer noopener">15:31</a>: Heiko’s example drives home why something like MCP may not be enough. If you have a master agent and all it does is integrate with external sites, but the integration is not smart—if the other side has an agent, that agent could be thinking as well. While agent-to-agent is something of a science fiction at the moment, it does make sense moving forward.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=971" target="_blank" rel="noreferrer noopener">16:11</a>: Coming back to Sokratis’s thought, when you give an agent too many tools and make it try to do too many things, it just becomes more and more likely that by reasoning through these tools, it will pick the wrong tool. That gets us to evaluation and fault tolerance. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1012" target="_blank" rel="noreferrer noopener">16:52</a>: At some point we might see multi-agent systems communicate with other multi-agent systems—an agent mesh.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1025" target="_blank" rel="noreferrer noopener">17:05</a>: In the scenario of this hotel booking, each of the smaller agents would use their own local model. They wouldn’t all rely on a central model. Almost all frameworks allow you to choose the right model for the right task. If a task is simple but still requires an LLM, a small open source model could be sufficient. If the task requires heavy “brain” power, you might want to use Gemini 2.5 Pro.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1087" target="_blank" rel="noreferrer noopener">18:07</a>: Sokratis brought up the word security. One of the earlier attacks against MCP is a scenario when an attacker buries instructions in the system prompt of the MCP server or its metadata, which then gets sent into the model. In this case, you have smaller agents, but something may happen to the smaller agents. What attack scenarios worry you at this point?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1142" target="_blank" rel="noreferrer noopener">19:02</a>: There are many levels at which something might go wrong. With a single agent, you have to implement guardrails before and after each call to an LLM or agent.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1164" target="_blank" rel="noreferrer noopener">19:24</a>: In a single agent, there is one model. Now each agent is using its own model. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1135" target="_blank" rel="noreferrer noopener">19:35</a>: And this makes the evaluation and security guardrails even more problematic. From A2A’s side, it supports all the different security types to authenticate agents, like API keys, HTTP authentication, OAuth 2. Within the agent card, the agent can define what you need to use to use the agent. Then you need to think of this as a service possibility. It’s not just a responsibility of the protocol. It’s the responsibility of the developer.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1229" target="_blank" rel="noreferrer noopener">20:29</a>: It’s equivalent to right now with MCP. There are thousands of MCP servers. How do I know which to trust? But at the same time, there are thousands of Python packages. I have to figure out which to trust. At some level, some vetting needs to be done before you trust another agent. Is that right?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1260" target="_blank" rel="noreferrer noopener">21:00</a>: I would think so. There’s a great article: “<a href="https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands-for-security-91407b33ed6b" target="_blank" rel="noreferrer noopener">The S in MCP Stands for Security</a>.” We can’t speak as much to the MCP protocol, but I do believe there have been efforts to implement authentication methods and address security concerns, because this is the number one question enterprises will ask. Without proper authentication and security, you will not have adoption in enterprises, which means you will not have adoption at all. WIth A2A, these concerns were addressed head-on because the A2A team understood that to get any chance of traction, built in security was priority 0. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1325" target="_blank" rel="noreferrer noopener">22:25</a>: Are you familiar with the buzzword “large action models”? The notion that your model is now multimodal and can look at screens and environment states.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1371" target="_blank" rel="noreferrer noopener">22:51</a>: Within DeepMind, we have Project Mariner, which leverages Gemini’s capabilities to ask on your behalf about your computer screen.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1386" target="_blank" rel="noreferrer noopener">23:06</a>: It makes sense that it’s something you want to avoid if you can. If you can do things in a headless way, why do you want to pretend you’re human? If there’s an API or integration, you would go for that. But the reality is that many tools knowledge workers use may not have these features yet. How does that impact how we build agent security? Now that people might start building agents to act like knowledge workers using screens?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1425" target="_blank" rel="noreferrer noopener">23:45</a>: I spoke with a bank in the UK yesterday, and they were very clear that they need to have complete observability on agents, even if that means slowing down the process. Because of regulation, they need to be able to explain every request that went to the LLM, and every action that followed from that. I believe observability is the key in this setup, where you just cannot tolerate any errors. Because it is LLM-based, there will still be errors. But in a bank you must at least be in a position to explain exactly what happened.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1485" target="_blank" rel="noreferrer noopener">24:45</a>: With most customers, whenever there’s an agentic solution, they need to share that they are using an agentic solution and the way [they] are using it is X, Y, and Z. A legal agreement is required to use the agent. The customer needs to be clear about this. There are other scenarios like UI testing where, as a developer, I want an agent to start using my machine. Or an elder who is connected with customer support of a telco to fix a router. This is impossible for a nontechnical person to achieve. The fear is there, like nuclear energy, which can be used in two different ways. It’s the same with agents and GenAI. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1568" target="_blank" rel="noreferrer noopener">26:08</a>: A2A is a protocol. As a protocol, there’s only so much you can do on the security front. At some level, that’s the responsibility of the developers. I may want to signal that my agent is secure because I’ve hired a third party to do penetration testing. Is there a way for the protocol to embed knowledge about the extra step?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1620" target="_blank" rel="noreferrer noopener">27:00</a>: A protocol can’t handle all the different cases. That’s why A2A created the notion of extensions. You can extend the data structure and also the methods or the profile. Within this profile, you can say, “I want all the agents to use this encryption.” And with that, you can tell all your systems to use the same patterns. You create the extension once, you adopt that for all the A2A compatible agents, and it’s ready. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1671" target="_blank" rel="noreferrer noopener">27:51</a>: For our listeners who haven’t opened the protocol, how easy is it? Is it like REST or RPC?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1685" target="_blank" rel="noreferrer noopener">28:05</a>: I personally learned it within half a day. For someone who is familiar with RPC, with traditional internet protocols, A2A is very intuitive. You have a server; you have a client. All you need to learn is some specific concepts, like the agent card. (The agent card itself could be used to signal not only my capabilities but how I have been tested. You can even think of other metrics like uptime and success rate.) You need to understand the concept of a task. And then the remote agent will update on this task as defined—for example, every five minutes or [upon] completion of specific subtasks.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1792" target="_blank" rel="noreferrer noopener">29:52</a>: A2A already supports JavaScript, TypeScript, Python, Java, and .NET. In ADK, the agent development kit, with one line of code we can define a new A2A agent.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1827" target="_blank" rel="noreferrer noopener">30:27</a>: What is the current state of adoption?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1840" target="_blank" rel="noreferrer noopener">30:40</a>: I should have looked at the PyPI download numbers.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1849" target="_blank" rel="noreferrer noopener">30:49</a>: Are you aware of teams or companies starting to use A2A?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1855" target="_blank" rel="noreferrer noopener">30:55</a>: I’ve worked with a customer with an insurance platform. I don’t know anything about insurance, but there’s the broker and the underwriter, which are usually two different companies. They were thinking about building an agent for each and having the agents talk via A2A</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1892" target="_blank" rel="noreferrer noopener">31:32</a>: Sokratis, what about you?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1900" target="_blank" rel="noreferrer noopener">31:40</a>: The interest is there for sure. Three weeks ago, I presented [at] the Google Cloud London Summit with a big customer on the integration of A2A into their agentic platform, and we shared tens of customers, including the announcement from Microsoft. Many customers start implementing agents. At some point they lack integration across business units. Now they see the more agents they build, the more the need for A2A.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Heiko_and%20_Sokratis.mp3#t=1952" target="_blank" rel="noreferrer noopener">32:32</a>: A2A is now in the Linux Foundation, which makes it more attractive for companies to explore, adopt, and contribute to, because it’s no longer controlled by a single entity. So decision making will be shared across multiple entities.</li>
</ul>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-understanding-a2a-with-heiko-hotz-and-sokratis-kartakis/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>We Are Only Beginning to Understand How to Use AI</title>
<link>https://www.oreilly.com/radar/we-are-only-beginning-to-understand-how-to-use-ai/</link>
<comments>https://www.oreilly.com/radar/we-are-only-beginning-to-understand-how-to-use-ai/#respond</comments>
<pubDate>Thu, 21 Aug 2025 10:32:36 +0000</pubDate>
<dc:creator><![CDATA[Tim O’Reilly]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17320</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Neanderthal-with-a-laptop.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[Lessons from Google Docs and Other Internet Innovations]]></custom:subtitle>
<description><![CDATA[I remember once flying to a meeting in another country and working with a group of people to annotate a proposed standard. The convener projected a Word document on the screen and people called out proposed changes, which were then debated in the room before being adopted or adapted, added or subtracted. I kid you […]]]></description>
<content:encoded><![CDATA[
<p>I remember once flying to a meeting in another country and working with a group of people to annotate a proposed standard. The convener projected a Word document on the screen and people called out proposed changes, which were then debated in the room before being adopted or adapted, added or subtracted. I kid you not.</p>
<p>I don’t remember exactly when this was, but I know it was after the introduction of Google Docs in 2005, because I do remember being completely baffled and frustrated that this international standards organization was still stuck somewhere in the previous century.</p>
<p>You may not have experienced anything this extreme, but many people will remember the days of sending around Word files as attachments and then collating and comparing multiple divergent versions. And this behavior also persisted long after 2005. (Apparently, this is still the case in some contexts, such as in parts of the U.S. government.) If you aren’t old enough to have experienced that, consider yourself lucky.</p>
<figure class="wp-block-embed alignleft is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="500" data-dnt="true"><p lang="en" dir="ltr">I am become human google doc, incorporator of interagency feedback</p>— Dean W. Ball (@deanwball) <a href="https://twitter.com/deanwball/status/1938232721593012619?ref_src=twsrc%5Etfw">June 26, 2025</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div><figcaption class="wp-element-caption"><a href="https://x.com/deanwball/status/1945922786477662655" target="_blank" rel="noreferrer noopener">A note from the development of the White House AI Action Plan</a></figcaption></figure>
<p>This is, in many ways, the point of Arvind Narayanan and Sayash Kapoor’s essay “<a href="https://knightcolumbia.org/content/ai-as-normal-technology" target="_blank" rel="noreferrer noopener">AI as Normal Technology</a>.” There is a long gap between the invention of a technology and a true understanding of how to apply it. One of the canonical examples came at the end of the Second Industrial Revolution. When first electrified, factories duplicated the design of factories powered by coal and steam, where immense central boilers and steam engines distributed mechanical power to various machines by complex arrangements of gears and pulleys. The steam engines were replaced by large electric motors, but the layout of the factory remained unchanged.</p>
<figure class="wp-block-image size-full is-resized"><img fetchpriority="high" decoding="async" width="468" height="328" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Were-only-just-beginning-to-understand-AI.png" alt="A marine engine factory in Greenwich, England, 1865" class="wp-image-17325" style="width:808px;height:auto" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Were-only-just-beginning-to-understand-AI.png 468w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Were-only-just-beginning-to-understand-AI-300x210.png 300w" sizes="(max-width: 468px) 100vw, 468px" /><figcaption class="wp-element-caption"><a href="https://archive.org/details/illustratedlondov47lond/page/377/mode/1up" target="_blank" rel="noreferrer noopener">A marine engine factory in Greenwich, England, 1865</a></figcaption></figure>
<p>Only over time were factories reconfigured to take advantage of small electric motors that could be distributed throughout the factory and incorporated into individual specialized machines. As <a href="https://www.oreilly.com/radar/is-ai-a-normal-technology/" target="_blank" rel="noreferrer noopener">I discussed last week with Arvind Narayanan</a>, there are four stages to every technology revolution: the invention of new technology; the diffusion of knowledge about it; the development of products based on it; and adaptation by consumers, businesses, and society as a whole. All this takes time. I love James Bessen’s framing of this process as “<a href="https://yalebooks.yale.edu/book/9780300195668/learning-by-doing/" target="_blank" rel="noreferrer noopener">learning by doing</a>.” It takes time and shared learning to understand how best to apply a new technology, to <a href="https://www.billcollinsenglish.com/OrdinaryEveningHaven.html" target="_blank" rel="noreferrer noopener">search the possible for its possibleness</a>. People try new things, show them to others, and build on them in a marvelous kind of leapfrogging of the imagination.</p>
<p>So it is no surprise that in 2005 files were still being sent around by email, and that one day a small group of inventors came up with a way to realize the true possibilities of the internet and built an environment where a file could be shared in real time by a set of collaborators, with all the mechanisms of version control present but hidden from view.</p>
<p>On next Tuesday’s episode of <a href="https://www.oreilly.com/live/live-with-tim/" target="_blank" rel="noreferrer noopener"><em>Live with Tim O’Reilly</em></a>, I’ll be talking with that small group—Sam Schillace, Steve Newman, and Claudia Carpenter—whose company Writely was launched in beta 20 years ago this month. Writely was acquired by Google in March of 2006 and became the basis of Google Docs.</p>
<p>In that same year, Google also reinvented online maps, spreadsheets, and more. It was a year that some fundamental lessons of the internet—already widely available since the early 1990s—finally began to sink in.</p>
<p>Remembering this moment matters a lot, because we are at a similar point today, where we think we know what to do with AI but are still building the equivalent of factories with huge centralized engines rather than truly searching out the possibility of its deployed capabilities. Ethan Mollick recently wrote a wonderful essay about the opportunities (and failure modes) of this moment in “<a href="https://www.oneusefulthing.org/p/the-bitter-lesson-versus-the-garbage" target="_blank" rel="noreferrer noopener">The Bitter Lesson Versus the Garbage Can</a>.” Do we really begin to grasp what is possible with AI or just try to fit it into our old business processes? We have to wrestle with the angel of possibility and remake the familiar into something that at present we can only dimly imagine.</p>
<p>I’m really looking forward to talking with Sam, Steve, Claudia, and those of you who attend, to reflect not just on their achievement 20 years ago but also on what it can teach us about the current moment. <a href="https://www.oreilly.com/live/live-with-tim/" target="_blank" rel="noreferrer noopener">I hope you can join us</a>.</p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, </em><strong><em>Coding for the Agentic World</em></strong><em>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.</em> <a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener"><em>Register now to save your seat</em></a><em>.</em></p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/we-are-only-beginning-to-understand-how-to-use-ai/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>From Automation to Insight</title>
<link>https://www.oreilly.com/radar/from-automation-to-insight/</link>
<comments>https://www.oreilly.com/radar/from-automation-to-insight/#respond</comments>
<pubDate>Wed, 20 Aug 2025 10:20:16 +0000</pubDate>
<dc:creator><![CDATA[David Michelson]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17311</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/AI-data-wrangling.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[Using AI to Keep Up with Our Authors]]></custom:subtitle>
<description><![CDATA[As an acquisitions editor at O’Reilly, I spend considerable time tracking our authors’ digital footprints. Their social media posts, speaking engagements, and online thought leadership don’t just reflect expertise—they directly impact book sales and reveal promotional strategies worth replicating. Not surprisingly, some of our best-selling authors are social media mavens whose posting output is staggering. […]]]></description>
<content:encoded><![CDATA[
<p>As an acquisitions editor at O’Reilly, I spend considerable time tracking our authors’ digital footprints. Their social media posts, speaking engagements, and online thought leadership don’t just reflect expertise—they directly impact book sales and reveal promotional strategies worth replicating. Not surprisingly, some of our best-selling authors are social media mavens whose posting output is staggering. Keeping up with multiple superposters across platforms quickly becomes unsustainable.</p>
<p>I recently built an AI solution to manage this challenge. Using Relay.app, I created a simple workflow to scrape LinkedIn posts from one author (let’s call her Bridget), analyze them with ChatGPT, and send me weekly email summaries of her posts, noting which got the most attention. The main goal was to follow what she said about her book, and secondarily her thought leadership in her field. The setup took five minutes and worked immediately. No more periodically reviewing her profile or worrying about missing important posts.</p>
<p>But by the second summary, some limitations became apparent. Every LinkedIn post was getting the same treatment: sorted by likes and impressions and paired with a generic summary. I had solved the information overload problem but now needed a way to extract strategic insight.</p>
<p>To fix this, I worked with Claude to turn the prompt into something closer to an agent with basic decision-making authority. I gave it specific goals and decision criteria aimed at shedding light on promotional patterns that are not always easy to follow, let alone analyze, in a flurry of posts: autonomously select 10–15 priority posts per week, prioritizing direct book mentions; compare current performance against historical baselines; flag unusual engagement patterns (both positive and negative); and automatically adjust analysis depth based on how posts are performing.</p>
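<p>For readers who want a concrete picture, here is a rough sketch of the kind of goals-and-criteria prompt described above. The wording, thresholds, and helper function are illustrative assumptions, not the exact prompt running in my workflow.</p>
<pre class="wp-block-code"><code># Illustrative only: a goals-and-criteria prompt of the kind described above.
# The thresholds and wording are hypothetical, not the production prompt.
ANALYSIS_PROMPT = """
You are analyzing one week of LinkedIn posts for an author promoting a book.

Goals, in priority order:
1. Select 10-15 priority posts, favoring direct mentions of the book.
2. Compare each post's engagement against its historical baseline.
3. Flag unusual engagement patterns, both positive and negative.
4. Analyze outperforming and underperforming posts in more depth.

Output short-term and long-term promotion recommendations,
not just a list of the posts with the most likes.
"""

def build_request(posts_text: str) -> str:
    """Combine the standing instructions with this week's scraped posts."""
    return ANALYSIS_PROMPT + "\nThis week's posts:\n" + posts_text
</code></pre>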
<p>The new report now provides deeper analysis focused primarily on posts mentioning the book, not just any popular post, along with strategic recommendations to improve post performance instead of “this had the most likes.” Recommendations are sorted into short-term and long-term promotion ideas, and it has even proposed testing novel strategies such as posting short video clips related to book chapters or incentive-driven posts.</p>
<p>The report isn’t perfect. The historical analysis isn’t quite right yet, and I’m working on generating visualizations. At the very least, it’s saving me time by automating the delivery and analysis of information I would otherwise have to get manually (and possibly overlook), and it is beginning to provide a starting point for understanding what has worked in Bridget’s promotional program. Over time, with further work, these insights could be shared with the author to plan promotional campaigns for new books, or incorporated into larger comparisons of promotional strategies between authors.</p>
<p>While working on this, I’ve asked myself: Is this an AI-enhanced automated workflow? An agent? An agentic workflow? Does it matter?</p>
<p>For my purposes, I don’t think it does. Sometimes you need simple automation to capture information you might miss. Sometimes you need more goal-directed, flexible analysis that results in deeper insight and strategic recommendations. More of a helpful assistant working behind the scenes week after week on your behalf. But getting caught up in definitions and labels can be a distraction. As AI tools become more accessible to everyone in the workplace, a more valuable focus is found in building solutions that address your specific problems using these new tools—whatever you might call them.</p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/from-automation-to-insight/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Why AI-Driven Client Apps Don’t Understand Your API</title>
<link>https://www.oreilly.com/radar/why-ai-driven-client-apps-dont-understand-your-api/</link>
<comments>https://www.oreilly.com/radar/why-ai-driven-client-apps-dont-understand-your-api/#respond</comments>
<pubDate>Tue, 19 Aug 2025 12:22:02 +0000</pubDate>
<dc:creator><![CDATA[Mike Amundsen]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17303</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/A-humanoid-robot-scratches-its-head-in-confusion.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[Recent surveys point to a massive growth in AI-driven bots crawling the internet looking for APIs. While many of these have malicious intent, a growing number are well-meaning API consumers just trying to discover, consume, and benefit from existing APIs. And, increasingly, these API requests are coming from Model Context Protocol (MCP)-driven platforms designed to […]]]></description>
<content:encoded><![CDATA[
<p>Recent surveys point to a massive growth in <a href="https://www.darkreading.com/vulnerabilities-threats/ai-bad-bots-are-taking-over-web" target="_blank" rel="noreferrer noopener">AI-driven bots crawling the internet</a> looking for APIs. While many of these have malicious intent, a growing number are well-meaning API consumers just trying to discover, consume, and benefit from existing APIs. And, increasingly, these API requests are coming from <a href="https://github.com/modelcontextprotocol" target="_blank" rel="noreferrer noopener">Model Context Protocol</a> (MCP)-driven platforms designed to enable autonomous software to interact directly with web APIs.</p>
<p>And, if recent statistics are any guide, they’re <a href="https://arxiv.org/pdf/2503.13657" target="_blank" rel="noreferrer noopener">struggling</a>. The success rate for multistep AI-driven API workflows is <a href="https://arxiv.org/pdf/2412.14161" target="_blank" rel="noreferrer noopener">about 30%</a>. Worse, these clients often don’t give up. Instead, they keep trying—and failing—to interact with your APIs, driving up traffic while driving down the overall value proposition of target APIs.</p>
<p>So, what’s happening here? Why are AI-driven clients unable to take advantage of today’s APIs? And what will it take to turn this around?</p>
<p>It turns out the answer has been there all along. The things that AI-driven API consumers need are the same things that human developers need: clarity, context, and meaningful structure. Yet many companies <em>still</em> aren’t paying attention. And, as we learned back in 2017, “Attention is all you need.”</p>
<h2 class="wp-block-heading">Are You Paying Attention?</h2>
<p>The landmark 2017 paper “<a href="https://arxiv.org/abs/1706.03762" target="_blank" rel="noreferrer noopener">Attention Is All You Need</a>” introduced the world to the notion of <a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)" target="_blank" rel="noreferrer noopener">transformers</a>. In the world of AI, a transformer is a model where words are mathematically scored based on their relationships to other words in the surrounding content. This scoring, referred to as <em>attention</em>, makes it possible for programs that <em>use</em> transformers (like <a href="https://chatgpt.com/" target="_blank" rel="noreferrer noopener">ChatGPT</a>) to produce responses that feel remarkably coherent to human readers.</p>
<p>The ability to use transformers to drive generative AI tools makes it imperative that we all rethink the way we design, document, and implement our APIs. In a nutshell, transformers pay attention to all the content they have access to, but they don’t <em>understand</em> any of it. Even more to the point, GenAI platforms like ChatGPT, <a href="https://claude.ai/" target="_blank" rel="noreferrer noopener">Claude</a>, <a href="https://gemini.google.com/app" target="_blank" rel="noreferrer noopener">Gemini</a>, and <a href="https://copilot.microsoft.com/" target="_blank" rel="noreferrer noopener">Copilot</a> can easily <em>pay attention</em> to your API design. They can identify the URLs, the HTTP methods, the inputs, the schema, and the expected outputs. But they can’t perform any reasoning about which API to use and what the content in the returned body actually <em>means</em>.</p>
<p>Essentially, today’s AI-driven bots are fast and flexible API consumers that can’t find their way out of a wet paper bag. The good news is that we can take advantage of an AI-driven client’s skills at paying attention and add support within our API design to make up for its inability to make wise choices.</p>
<p>And that is a clear recipe for making your APIs AI-ready.</p>
<h2 class="wp-block-heading">Things You Can Do Now to Level the Playing Field</h2>
<p>Since AI-driven API clients are going to be good at pattern-matching, recognizing repeated content, and making associations based on context, we can use those skills to fill in the gaps LLM apps have regarding decision making, meaning, and understanding.</p>
<p>Below are four practices that we already know make it easier for human developers to understand and use our APIs. It turns out these are the same things that will help AI-driven API clients be more successful too.</p>
<ul class="wp-block-list">
<li><em>Be explicit</em>: Don’t assume clients <em>understand</em> what this API does.</li>
<li><em>Tell them why</em>: Provide clear descriptions of why and when clients might use the API.</li>
<li><em>Be consistent</em>: The more your API looks like the thousands of others in the LLM’s training data, the better.</li>
<li><em>Make error responses actionable</em>: Provide clear, consistent, detailed feedback that makes it easier to resolve runtime errors.</li>
</ul>
<p>Let’s look at each of these in turn.</p>
<h4 class="wp-block-heading">Be explicit</h4>
<p>Unlike humans, machines are not intuitive explorers. While they are great at parsing text and making associations, machines don’t make intuitive leaps. Instead, machines need explicit affordances: clues about what can be accomplished, how to do it, and why you might want to execute an action. The classic human-centric approach of designing and documenting an API is captured in this terse list:</p>
<ul class="wp-block-list">
<li class="has-black-color has-text-color has-link-color wp-elements-1bcaf7c8776bba0a622bcad66dbd5122"><code>GET /customers/</code></li>
<li class="has-black-color has-text-color has-link-color wp-elements-c408fcd964288ec93c81be0feff2b98f"><code>GET /customers/{id}</code></li>
<li class="has-black-color has-text-color has-link-color wp-elements-aea08b2051a92441ae146f5a1ff8eb68"><code>POST /customers/</code></li>
<li class="has-black-color has-text-color has-link-color wp-elements-049eb9062fa187d8e59eea310057a7c6"><code>PUT /customers/{id}</code></li>
<li class="has-black-color has-text-color has-link-color wp-elements-c1bfb5cdc21595201d40920bb06a8464"><code>DELETE /customers/{id}</code></li>
</ul>
<p>Most humans know exactly what this list is communicating: the full list of available operations for managing a collection of <code>customer</code> records. Humans would look in other places in the API design documentation to determine the required and optional data properties to pass for each action as well as the format in which to cast the interactions (JSON, XML, HTML, etc.).</p>
<p>But machines can’t be trusted to exhibit that level of understanding and curiosity. They’re more likely to just make some “statistical guesses” about what this list represents and how to use it. To increase the chances of success and reduce the likelihood of mistakes, it is better to be much more explicit in your API documentation for machines, as in the following documentation example, which is tuned for LLM consumption:</p>
<ul class="wp-block-list">
<li>To retrieve a list of customer records use <code>GET /customers/</code></li>
<li>To retrieve a single customer record use <code>GET /customers/{id}</code> while supplying the proper value of <code>{id}</code></li>
<li>To create a new customer record use <code>POST /customers/</code> with the <code>createCustomer</code> schema</li>
<li>To update an existing customer record use <code>PUT /customers/{id}</code> with the <code>updateCustomer</code> schema while supplying the proper value for <code>{id}</code></li>
<li>To remove a customer record from the collection use <code>DELETE /customers/{id}</code> while supplying the proper value for <code>{id}</code></li>
</ul>
<p>While these two lists essentially carry the same <em>meaning</em> for humans, the second list is much more helpful for machine-driven API clients.</p>
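<p>One way to make that explicitness visible to machines is to put it directly in the API definition, so it flows into the generated OpenAPI document that an MCP server or crawler will read. The sketch below uses FastAPI purely as an example framework; the route names, wording, and stub bodies are assumptions, not a prescribed implementation.</p>
<pre class="wp-block-code"><code># A minimal sketch: explicit per-operation summaries that end up in the
# generated OpenAPI document where AI-driven clients can read them.
# FastAPI is used only as an example; names and stub bodies are illustrative.
from fastapi import FastAPI

app = FastAPI()

@app.get("/customers/", summary="Retrieve a list of customer records")
def list_customers():
    return []  # stub

@app.get("/customers/{id}", summary="Retrieve a single customer record by its id")
def get_customer(id: str):
    return {"id": id}  # stub

@app.post("/customers/", summary="Create a new customer record using the createCustomer schema")
def create_customer(customer: dict):
    return customer  # stub
</code></pre>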
<h4 class="wp-block-heading">Tell them why</h4>
<p>Focusing on being explicit is a great way to improve the success rate of AI-driven client applications. Another way you can do this is to provide details on why an API client might want to use a particular API endpoint. It is important to keep in mind that AI-driven clients are pretty good at guessing <em>how</em> an API can be used, but these same LLMs are not very good at figuring out <em>why</em> it should be used. You can fix that by adding text that explains the common uses for each API endpoint.</p>
<p>For example, in your documentation, include phrases such as “Use the <code>PriorityAccounts</code> endpoint to identify the top ten customers based on market size.” Or “Use the <code>submitApplication</code> endpoint once all the other steps in the employee application process have been completed.” These descriptions provide additional hints to API consumers on why or even when the APIs will be most helpful.</p>
<p>Note that, in both cases, the text identifies the endpoint by name and explains the reason an API client might use that API. AI-powered clients—especially those backed by LLMs—are very good at recognizing text like this and associating it with other text in your documentation such as the list we reviewed in the previous section.</p>
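<p>If your API description is generated from code, these “why” and “when” hints can live in the same place as the operation itself. Here is a small, hypothetical sketch in the same FastAPI style as above; the <code>PriorityAccounts</code> path and wording are assumptions made for the example.</p>
<pre class="wp-block-code"><code># Illustrative only: carrying the "why/when" hint in the operation description.
from fastapi import FastAPI

app = FastAPI()

@app.get(
    "/priority-accounts/",
    summary="List priority accounts",
    description=(
        "Use the PriorityAccounts endpoint to identify the top ten customers "
        "based on market size, for example before planning an outreach campaign."
    ),
)
def priority_accounts():
    return []  # stub
</code></pre>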
<h4 class="wp-block-heading">Be predictable</h4>
<p>The real power behind LLM-based client applications is found in all the documents and code these language models have scooped up as training data. All the books, papers, and source code fed into LLM databases provide statistical context for any new text your API documentation provides. It is the accumulated historical effort of thousands of writers, programmers, and software architects that makes it possible for AI clients to interact with your API.</p>
<p>And those interactions will be much smoother if your API looks a lot like all those other APIs it was fed as training data. If your API design contains lots of unique elements, unexpected responses, or nontraditional use of common protocols, AI-driven applications will have a harder time interacting with it.</p>
<p>For example, while it is perfectly “correct” to use HTTP <code>PUT</code> to create new records and HTTP <code>PATCH</code> to update existing records, most HTTP APIs use <code>POST</code> to create records and <code>PUT</code> to update them. If your API relies solely on a unique way to use <code>PUT</code> and <code>PATCH</code> operations, you are probably making things harder on your AI-driven apps and reducing your chances of success. Or, if your API depends exclusively on a set of <a href="https://www.w3.org/TR/xmlschema11-1/" target="_blank" rel="noreferrer noopener">XML Schema Definition (XSD)</a> documents, AI-powered API clients that have been trained on thousands of lines of <a href="https://json-schema.org/specification" target="_blank" rel="noreferrer noopener">JSON Schema</a> might not recognize your API input and output objects and could make mistakes when attempting to add or update data for your API.</p>
<p>Whenever possible, take advantage of common patterns and implementation details when building your API. That will better ensure AI clients can recognize and successfully interact with your services.</p>
<h4 class="wp-block-heading">Make error responses actionable</h4>
<p>When humans encounter errors in user interfaces, they can usually scan the displayed error information, compare it to the data they already typed in, and come up with a solution to resolve the error and continue using the service. That is not very easy for machine-driven API clients to handle. They don’t have the ability to scan the unexpected response, derive meaning, and then formulate a creative solution. Instead, they either try again (maybe with some random changes) or just give up.</p>
<p>When designing your APIs to support machine-driven clients, it is important to apply the same three rules we’ve already mentioned (be explicit, tell them why, and be consistent) when API clients encounter errors.</p>
<p>First, make sure the client application recognizes the error situation. For API clients, this is more than just returning HTTP status <code>400</code>. You should also include a formatted document that identifies and explains the details of the error. A great way to accomplish this is to use the Problem Details for HTTP APIs format (<a href="https://datatracker.ietf.org/doc/html/rfc9457" target="_blank" rel="noreferrer noopener">RFC 9457</a>). This response gives you a structured way to identify the problem and suggest a possible change to resolve the error.</p>
<figure class="wp-block-image size-full is-resized"><img decoding="async" width="499" height="155" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/JSON-input.png" alt="JSON input" class="wp-image-17304" style="width:628px;height:auto" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/JSON-input.png 499w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/JSON-input-300x93.png 300w" sizes="(max-width: 499px) 100vw, 499px" /></figure>
<p>Note that this response also meets our criteria for the second rule (Tell them why). The update failed because a field was missing, and that field is <code>hatsize</code>. The error report even tells the machine what it can do to make another attempt at updating the record.</p>
<p>Another advantage of using the RFC 9457 format is that it helps us meet the third rule (Be consistent). This RFC is a common specification found in many API examples, and it is quite likely that the LLM’s training data contains lots of these responses. It is better to use this existing error format instead of relying on one you created yourself.</p>
<p>Finally, it is a good idea to design your APIs to treat errors as <em>partial attempts</em>. Most of the time, API errors are just simple mistakes caused by inconsistent or missing documentation and/or inexperienced developers. Providing explicit error information not only helps resolve the problem more easily but also offers an opportunity to “retrain” machine clients by populating the machine’s local context with examples of how to resolve errors in the future.</p>
<p>Remember, LLM-based clients are great at recognizing patterns. You can use that when you design your APIs too.</p>
<h2 class="wp-block-heading">Pay Attention to Your AI-Driven API Consumers</h2>
<p>As mentioned at the start of this article, the things identified here as a way to improve your interactions with AI-driven API clients are all practices that have been suggested in the past for improving the design of APIs for human interaction.</p>
<p>Being explicit cuts down on the cognitive load for developers and helps them focus on the creative problem-solving work needed to use your API to solve their immediate problem.</p>
<p>Telling them why makes it easier for developers to identify the APIs they need and to better understand the way they work and when they can be applied.</p>
<p>Being consistent is another way to reduce cognitive load for programmers and provide a more “intuitive” experience when using your API.</p>
<p>And making error responses actionable leads to better error feedback and more consistent error resolution both at runtime and design time.</p>
<p>Finally, all these practices work better when you keep a close eye on the way API clients (both human- and AI-driven) actually <em>use</em> your service. Make note of which endpoints are commonly used. Identify persistent error conditions and how they get resolved. And keep track of API client traffic as a way to gauge which APIs provide the most return for your effort and which are more trouble than they are worth. Quality monitoring of your APIs will help you better understand who is using them and what kinds of trouble they are having. That will give you clues on how you can redesign your APIs in the future to improve the experience for everyone.</p>
<p>Whether you’re supporting human-driven API consumption or machine-driven clients, paying attention can pay off handsomely.</p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/why-ai-driven-client-apps-dont-understand-your-api/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Is AI a “Normal Technology”?</title>
<link>https://www.oreilly.com/radar/is-ai-a-normal-technology/</link>
<comments>https://www.oreilly.com/radar/is-ai-a-normal-technology/#respond</comments>
<pubDate>Tue, 19 Aug 2025 10:48:00 +0000</pubDate>
<dc:creator><![CDATA[Tim O’Reilly]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17297</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Colorful-wiry-waves.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[We think we see the world as it is, but in fact we see it through a thick fog of received knowledge and ideas, some of which are right and some of which are wrong. Like maps, ideas and beliefs shape our experience of the world. The notion that AI is somehow unprecedented, that artificial […]]]></description>
<content:encoded><![CDATA[
<p>We think we see the world as it is, but in fact we see it through a thick fog of received knowledge and ideas, some of which are right and some of which are wrong. Like maps, ideas and beliefs shape our experience of the world. The notion that AI is somehow unprecedented, that artificial general intelligence is just around the corner and leads to a singularity beyond which everything is different, is one such map. It has shaped not just technology investment but government policy and economic expectations. But what if it’s wrong?</p>
<p>The best ideas help us see the world more clearly, cutting through the fog of hype. That’s why I was so excited to read Arvind Narayanan and Sayash Kapoor’s essay “<a href="https://knightcolumbia.org/content/ai-as-normal-technology" target="_blank" rel="noreferrer noopener">AI as Normal Technology</a>.” They make the case that while AI is indeed transformational, it is far from unprecedented. Instead, it is likely to follow much the same patterns as other profound technology revolutions, such as electrification, the automobile, and the internet. That is, the tempo of technological change isn’t set by the pace of innovation but rather by the pace of adoption, which is gated by economic, social, and infrastructure factors, and by the need of humans to adapt to the changes. (In some ways, this idea echoes Stewart Brand’s notion of “<a href="https://jods.mitpress.mit.edu/pub/issue3-brand/release/2" target="_blank" rel="noreferrer noopener">pace layers</a>.”)</p>
<h2 class="wp-block-heading">What Do We Mean by “Normal Technology”?</h2>
<p>Arvind Narayanan is a professor of computer science at Princeton who also thinks deeply about the impact of technology on society and the policy issues it raises. He joined me last week on <a href="https://www.oreilly.com/live/live-with-tim/" target="_blank" rel="noreferrer noopener"><em>Live with Tim O’Reilly</em></a> to talk about his ideas. I started out by asking him to explain what he means by “normal technology.” Here’s a shortened version of his reply. (You can watch a more complete video answer and my reply <a href="https://youtu.be/Q9OHYw7Lyko" target="_blank" rel="noreferrer noopener">here</a>.)</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>There is, it turns out, a well-established theory of the way in which technologies are adopted and diffused throughout society. The key thing to keep in mind is that the logic behind the pace of advances in technology capabilities is different from the logic behind the way and the speed in which technology gets adopted. That depends on the rate at which human behavior can change. And organizations can figure out new business models. And I don’t mean the AI companies. There’s too much of a focus on the AI companies in thinking about the future of AI. I’m talking about all the other companies who are going to be deploying AI.</em><br><br><em>So we present a four-stage framework. The first stage is invention. So this is improvements in model capabilities.…The model capabilities themselves have to be translated into products. That’s the second stage. That’s product development. And we’re still early in the second stage of figuring out what the right abstractions are, through which this very unreliable technology of large language models ([as] one prominent type of AI) can be fit into what we have come to expect from software, which is that it should work very deterministically, which is that users, once they’ve learned how to do something, their expectations will be fulfilled. And when those expectations are violated, we see that AI product launches have gone very horribly.…Stage three is diffusion. It starts with early users figuring out use cases, workflows, risks, how to route around that.…And the last and most time-consuming step is adaptation. So not only do individual users need to adapt; industries as a whole need to adapt. In some cases, laws need to adapt.</em></p>
</blockquote>
<p>We talked a bit about how that has happened in the past, using electrification as one well-known example. The first stage of the Industrial Revolution was powered by coal and steam, in factories with big, centralized power plants. Early attempts at factory electrification didn’t provide all that much advantage. It was only when manufacturers realized that electricity made it possible to distribute power easily to small, specialized machines dedicated to different factory functions that the second industrial revolution really took off.</p>
<p>Arvind made it real by talking about how AI might change software. It’s not about replacing programmers, he thinks, but about expanding the footprint of software customization.</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>So some people hope that in the future it becomes possible that just like we can vibe code small apps it becomes possible to build much more complex pieces of enterprise software just based on a prompt. Okay, suppose that’s possible.…I claim that in that world, it will make no sense for these enterprise software companies to build software once and then force thousands of different clients to use it to adjust their workflows to the abstractions defined in the software. That’s not going to be how we’ll use software in this future world.</em></p>
<p><em>What will happen is that developers are going to work with each downstream client, understand their requirements, and then perhaps generate software for them on the spot to meet a particular team’s needs or a particular company’s needs, or even perhaps a particular individual’s needs. So this is a complete, very conceptual revision of what enterprise software even means. And this is the kind of thing that we think is going to take decades. And it has little to do with the rate of AI capability improvement.</em></p>
</blockquote>
<p>This is a great example of what I mean by ideas as tools for seeing and responding to the world more effectively. The “normal technology” map will lead investors and entrepreneurs to make different choices than those who follow the “AI singularity” map. Over the long run, those who are guided by the more accurate map will end up building lasting businesses, while the others will end up as casualties of the bubble.</p>
<figure class="wp-block-pullquote"><blockquote><p><em>We’ll be talking more deeply about how AI is changing the software industry at our second AI Codecon, coming up on September 9: </em><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener"><em>Coding for the Agentic World</em></a><em>.</em></p></blockquote></figure>
<h2 class="wp-block-heading">Physical and Behavioral Constraints on AI Adoption</h2>
<p>We also talked a bit about physical constraints (though I have to confess that this was more my focus than his). For example, the flowering of the 20th-century automobile economy required the development of better roads, better tires, improvements to brakes, lights, and engines, refining and distribution networks for gasoline, the reshaping of cities, and far more. We see this today in the bottlenecks around GPUs, around data center construction, around power. All of these things take time to get built.</p>
<p>Arvind’s main focus was on behavioral issues retarding adoption. He gave a great example:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>So there’s these “reasoning models.” (Whether they’re actually reasoning is a different question.)…Models like o3, they’re actually very useful. They can do a lot of things that nonreasoning models can’t. And they started to be released around a year ago. And it turns out, based on Sam Altman’s own admission, that in the free tier of ChatGPT, less than 1% of users were using them per day. And in the pay tier, less than 7% of users were using them.…So this shows you how much diffusion lags behind capabilities. It’s exactly an illustration of the point that diffusion—changes to user workflows, learning new skills, those kinds of things—are the real bottleneck.</em></p>
</blockquote>
<p>And of course, the user backlash about the loss of the “personality” of GPT-4o drives this home even more, and raises a whole lot of new uncertainty. I thought Arvind nailed it when he called personality changes “a whole new switching cost.”</p>
<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe title="AI's Unexpected Switching Cost—Arvind Narayanan Live with Tim O'Reilly" width="500" height="281" src="https://www.youtube.com/embed/mphW-gDtwvU?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>
<p>It is because AI is a normal technology that Arvind also thinks fears of AI running amok are overblown:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>We don’t think the arrival of recursive self-improvement, for instance, if that were to happen, will be an exception to these patterns. We talk a lot about AI safety in the paper. We’re glad that many people are thinking carefully about AI safety. We don’t think it requires any extraordinary steps like pausing AI or banning open source AI or things like that. Safety is amenable to well-understood market and regulatory interventions.</em></p>
<p><em>When we say AI as normal technology,</em><strong><em> it’s not just a prediction about the future. One of the core points of the paper is that we have the agency to shape it as normal technology. We have the agency to ensure that the path through which it diffuses through society is not governed by the logic of the technology itself but rather by humans and institutions</em></strong><em>.</em></p>
</blockquote>
<p>I agree. Human agency in the face of AI is also one of the deep currents in my book <a href="https://www.oreilly.com/tim/wtf-book.html" target="_blank" rel="noreferrer noopener"><em>WTF? What’s the Future and Why It’s Up to Us</em></a><em>.</em></p>
<h2 class="wp-block-heading">AI KPIs and the “Golden Rule”</h2>
<p>One of my favorite moments was when one of the attendees asked if a good guide to the KPIs used by AI companies oughtn’t to be what they would want the AI to do for themselves, their children, and their loved ones. This, of course, is not only a version of the <a href="https://en.wikipedia.org/wiki/Golden_Rule" target="_blank" rel="noreferrer noopener">Golden Rule</a>, found in many religions and philosophies, but really good practical business advice. My own philosophical mentor Lao Tzu once wrote, “Fail to honor people, they fail to honor you.” And also this: “Losing the way of life, people rely on goodness. Losing goodness, they rely on laws.” (That’s my own loose retranslation of <a href="https://terebess.hu/english/tao/bynner.html" target="_blank" rel="noreferrer noopener">Witter Bynner’s version</a>.) I first thought of the relevance of this quote in the days of my early open source activism. While others were focused on free and open source licenses (laws) as the key to its success, I was interested in figuring out why open source would win just by being better for people—matching “the way of life,” so to speak. Science, not religion.</p>
<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="Build Your KPIs Around the Golden Rule—Arvind Narayanan Live with Tim O'Reilly" width="500" height="281" src="https://www.youtube.com/embed/1dVvcMXvMH0?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>
<h2 class="wp-block-heading">Why Labor Law, Not Copyright, May Be the Key to AI Justice</h2>
<p>In response to an attendee question about AI and copyright, Arvind once again demonstrated his ability to productively reframe the issue:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>While my moral sympathies are with the plaintiffs in this case, I don’t think copyright is the right way to bring justice to the authors and photographers and publishers and others who genuinely, I think, have been wronged by these companies using their data without consent or compensation. And the reason for that is that it’s a labor issue. It’s not something that copyright was invented to deal with, and even if a future ruling goes a different way, I think companies will be able to adapt their processes so that they stay clear of copyright law while nonetheless essentially leaving their business model unchanged. And unless you can change their business model, force them to negotiate with these creators—with the little guy, basically—and work out a just compensation agreement, I don’t think justice will be served.</em></p>
</blockquote>
<p>When the <a href="https://www.brookings.edu/articles/hollywood-writers-went-on-strike-to-protect-their-livelihoods-from-generative-ai-their-remarkable-victory-matters-for-all-workers/" target="_blank" rel="noreferrer noopener">screenwriters guild went on strike about AI and won</a>, they showed just how right he is in this reframing. That case has faded from the headlines, but it provides a way forward to a fairer AI economy.</p>
<h2 class="wp-block-heading">AI and Continuous Learning</h2>
<p>We ended with another attendee question, about what kids should learn now to be ready for the future.</p>
<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="The Skill Kids Need to Learn in the Age of AI—Arvind Narayanan Live with Tim O'Reilly" width="500" height="281" src="https://www.youtube.com/embed/Tz2xZnvjoGg?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div></figure>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>We have, in my view, a weird education system. And I’ve said this publicly for as long as I’ve been a professor, this concept that you stay in school for 20 years or whatever, right through the end of college, and then you’re fully trained, and then you go off into the workforce and just use those skills that you once learned.</em></p>
<p><em>Obviously, we know that the world doesn’t work like that. And that’s a big part of the reason why the college experience is so miserable for so many students. Because they’d actually rather be doing stuff instead of in this decontextualized environment where they’re supposed to just passively absorb information for using it some day in the future.</em></p>
<p><em>So I think AI is an opportunity to fix this deeply broken approach to education. I think kids can start making meaningful contributions to the world, much earlier than they’re expected to.</em></p>
<p><em>So that’s one half of the story. You can learn much better when you’re actually motivated to produce something useful. In the second half of the story it’s more true than ever that we should never stop learning.</em></p>
</blockquote>
<p>But it is time to stop my summary! If you are a subscriber, or signed up to watch the episode, you should have access to the full recording <a href="https://learning.oreilly.com/videos/live-with-tim/0642572022182/" target="_blank" rel="noreferrer noopener">here</a>.</p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, </em><strong><em>Coding for the Agentic World</em></strong><em>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.</em> <a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener"><em>Register now to save your seat</em></a><em>.</em></p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/is-ai-a-normal-technology/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Context Engineering: Bringing Engineering Discipline to Prompts—Part 2</title>
<link>https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-2/</link>
<comments>https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-2/#respond</comments>
<pubDate>Mon, 18 Aug 2025 10:45:01 +0000</pubDate>
<dc:creator><![CDATA[Addy Osmani]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17291</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Painting-AI.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[The Art and Science of Effective Context]]></custom:subtitle>
<description><![CDATA[The following is Part 2 of 3 from Addy Osmani’s original post “Context Engineering: Bringing Engineering Discipline to Prompts.” Part 1 can be found here. Great context engineering strikes a balance—include everything the model truly needs but avoid irrelevant or excessive detail that could distract it (and drive up cost). As Andrej Karpathy described, context […]]]></description>
<content:encoded><![CDATA[
<p class="has-cyan-bluish-gray-background-color has-background"><em>The following is Part 2 of 3 from Addy Osmani’s original post “</em><a href="https://addyo.substack.com/p/context-engineering-bringing-engineering" target="_blank" rel="noreferrer noopener">Context Engineering: Bringing Engineering Discipline to Parts</a><em>.” Part 1 can be found </em><a href="https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-1/" target="_blank" rel="noreferrer noopener"><em>here</em></a><em>.</em></p>
<p><strong>Great context engineering strikes a balance—include everything the model truly needs but avoid irrelevant or excessive detail that could distract it (and drive up cost).</strong></p>
<p>As Andrej Karpathy described, context engineering is a delicate mix of <em>science</em> and <em>art</em>.</p>
<p>The “science” part involves following certain principles and techniques to systematically improve performance. For example, if you’re doing code generation, it’s almost scientific that you should include relevant code and error messages; if you’re doing question-answering, it’s logical to retrieve supporting documents and provide them to the model. There are established methods like few-shot prompting, retrieval-augmented generation (RAG), and chain-of-thought prompting that we know (from research and trial) can boost results. There’s also a science to respecting the model’s constraints—every model has a context length limit, and overstuffing that window can not only increase latency/cost but potentially <em>degrade</em> the quality if the important pieces get lost in the noise.</p>
<p>Karpathy summed it up well: “Too little or of the wrong form and the LLM doesn’t have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down.”</p>
<p>So the science is in techniques for selecting, pruning, and formatting context optimally. For instance, using embeddings to find the most relevant docs to include (so you’re not inserting unrelated text) or compressing long histories into summaries. Researchers have even catalogued failure modes of <em>long</em> contexts—things like <strong>context poisoning</strong> (where an earlier hallucination in the context leads to further errors) or <strong>context distraction</strong> (where too much extraneous detail causes the model to lose focus). Knowing these pitfalls, a good engineer will curate the context carefully.</p>
<p>Then there’s the “art” side—the intuition and creativity born of experience.</p>
<p>This is about understanding LLMs’ quirks and subtle behaviors. Think of it like a seasoned programmer who “just knows” how to structure code for readability: An experienced context engineer develops a feel for how to structure a prompt for a given model. For example, you might sense that one model tends to do better if you first outline a solution approach before diving into specifics, so you include an initial step like “Let’s think step by step…” in the prompt. Or you notice that the model often misunderstands a particular term in your domain, so you preemptively clarify it in the context. These aren’t in a manual—you learn them by observing model outputs and iterating. <strong>This is where prompt-crafting (in the old sense) still matters</strong>, but now it’s in service of the larger context. It’s similar to software design patterns: There’s science in understanding common solutions but art in knowing when and how to apply them.</p>
<p>Let’s explore a few common strategies and patterns context engineers use to craft effective contexts:</p>
<p><strong>Retrieval of relevant knowledge:</strong> One of the most powerful techniques is retrieval-augmented generation. If the model needs facts or domain-specific data that isn’t guaranteed to be in its training memory, have your system fetch that info and include it. For example, if you’re building a documentation assistant, you might vector-search your documentation and insert the top matching passages into the prompt before asking the question. This way, the model’s answer will be grounded in real data you provided rather than in its sometimes outdated internal knowledge. Key skills here include designing good search queries or embedding spaces to get the right snippet and formatting the inserted text clearly (with citations or quotes) so the model knows to use it. When LLMs “hallucinate” facts, it’s often because we failed to provide the actual fact—retrieval is the antidote to that.</p>
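<p>Here is a minimal sketch of that retrieval step. A real system would use an embedding model and a vector store; to keep the example self-contained, a toy word-overlap score stands in for semantic search, and the documentation snippets are invented.</p>
<pre class="wp-block-code"><code># Sketch of retrieval-augmented prompt assembly. The overlap score is a
# stand-in for real embedding similarity; the docs are invented examples.
DOCS = [
    "To rotate API keys, open Settings, then Security, then Rotate Key.",
    "Billing runs on the first of each month; invoices are emailed as PDFs.",
    "Rate limits default to 100 requests per minute per API key.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: how many words the document shares with the query."""
    return len(set(query.lower().split()).intersection(doc.lower().split()))

def build_prompt(query: str, top_k: int = 2) -> str:
    """Fetch the most relevant snippets and ground the question in them."""
    ranked = sorted(DOCS, key=lambda d: score(query, d), reverse=True)
    context = "\n".join("- " + d for d in ranked[:top_k])
    return (
        "Answer using only the documentation below, and cite the snippet you used.\n"
        "Documentation:\n" + context + "\n\nQuestion: " + query
    )

print(build_prompt("What are the default rate limits?"))
</code></pre>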
<p><strong>Few-shot examples and role instructions:</strong> This hearkens back to classic prompt engineering. If you want the model to output something in a particular style or format, show it examples. For instance, to get structured JSON output, you might include a couple of example inputs and outputs in JSON in the prompt, then ask for a new one. Few-shot context effectively teaches the model by example. Likewise, setting a <strong>system role</strong> or persona can guide tone and behavior (“You are an expert Python developer helping a user…”). These techniques are staples because they work: They bias the model toward the patterns you want. In the context-engineering mindset, prompt wording and examples are just one part of the context, but they remain crucial. In fact, you could say prompt engineering (crafting instructions and examples) is now a <em>subset</em> of context engineering—it’s one tool in the toolkit. We still care a lot about phrasing and demonstrative examples, but we’re also doing all these other things around them.</p>
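<p>Assembled programmatically, a role instruction plus a couple of demonstrations might look like the sketch below. It uses the common system/user/assistant message shape; the example records are invented.</p>
<pre class="wp-block-code"><code># Illustrative few-shot setup: a role instruction plus two worked examples
# that demonstrate the JSON format we want back. The data is invented.
messages = [
    {"role": "system",
     "content": "You are an expert Python developer. Reply with JSON only."},
    {"role": "user", "content": "Summarize: 'Fix crash when config file is empty'"},
    {"role": "assistant",
     "content": '{"type": "bugfix", "area": "config", "urgent": true}'},
    {"role": "user", "content": "Summarize: 'Add dark mode to settings page'"},
    {"role": "assistant",
     "content": '{"type": "feature", "area": "ui", "urgent": false}'},
    # The real request goes last; the model follows the demonstrated pattern.
    {"role": "user", "content": "Summarize: 'Docs page returns 404 after rename'"},
]
</code></pre>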
<p><strong>Managing state and memory:</strong> Many applications involve multiple turns of interaction or long-running sessions. The context window isn’t infinite, so a major part of context engineering is deciding how to handle conversation history or intermediate results. A common technique is <strong>summary compression</strong>—every few interactions, summarize them and use the summary going forward instead of the full text. For example, Anthropic’s Claude assistant automatically does this when conversations get lengthy, to avoid context overflow. (You’ll see it produce a “[Summary of previous discussion]” that condenses earlier turns.) Another tactic is to explicitly write important facts to an external store (a file, database, etc.) and then later retrieve them when needed rather than carrying them in every prompt. This is like an external memory. Some advanced agent frameworks even let the LLM generate “notes to self” that get stored and can be recalled in future steps. The art here is figuring out <strong>what</strong> to keep, <strong>when</strong> to summarize, and <strong>how</strong> to resurface past info at the right moment. Done well, it lets an AI maintain coherence over very long tasks—something that pure prompting would struggle with.</p>
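<p>A sketch of the summary-compression idea follows: once the history outgrows a budget, older turns are collapsed into a single summary entry and only recent turns ride along verbatim. The <code>summarize()</code> function is a placeholder for another LLM call, and the turn budget is an arbitrary choice for illustration.</p>
<pre class="wp-block-code"><code># Sketch of summary compression for long conversations. summarize() stands in
# for an LLM call; the keep_recent budget is arbitrary for illustration.
def summarize(turns: list[str]) -> str:
    """Placeholder for an LLM call that condenses earlier turns."""
    return "[Summary of previous discussion covering " + str(len(turns)) + " earlier turns]"

def compress_history(history: list[str], keep_recent: int = 4) -> list[str]:
    """Collapse everything except the most recent turns into one summary entry."""
    older = history[:-keep_recent]
    if not older:
        return history  # still within budget, keep the full history
    recent = history[-keep_recent:]
    return [summarize(older)] + recent

turns = ["turn " + str(i) for i in range(10)]
print(compress_history(turns))
</code></pre>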
<p><strong>Tool use and environmental context:</strong> Modern AI agents can use tools (e.g., calling APIs, running code, web browsing) as part of their operations. When they do, each tool’s output becomes new context for the next model call. Context engineering in this scenario means instructing the model <em>when and how</em> to use tools and then feeding the results back in. For example, an agent might have a rule: “If the user asks a math question, call the calculator tool.” After using it, the result (say 42) is inserted into the prompt: “Tool output: 42.” This requires formatting the tool output clearly and maybe adding a follow-up instruction like “Given this result, now answer the user’s question.” A lot of work in agent frameworks (LangChain, etc.) is essentially context engineering around tool use—giving the model a list of available tools, along with syntactic guidelines for invoking them, and templating how to incorporate results. The key is that you, the engineer, <strong>orchestrate</strong> this dialogue between the model and the external world.</p>
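<p>A bare-bones version of that orchestration is sketched below, with the tool registry reduced to a dictionary. How a model actually signals a tool call varies by framework, so the request format here is an assumption made for the sketch.</p>
<pre class="wp-block-code"><code># Sketch of tool orchestration: the model asks for a tool, we run it, and the
# output is fed back in as new context. The request format is illustrative.
TOOLS = {
    # Demo only: never eval untrusted input in real code.
    "calculator": lambda expression: str(eval(expression)),
}

def handle_tool_request(request: dict, prompt: str) -> str:
    """Run the requested tool and append its output to the working context."""
    tool = TOOLS[request["tool"]]
    output = tool(request["input"])
    return (
        prompt
        + "\nTool output: " + output
        + "\nGiven this result, now answer the user's question."
    )

# Suppose the model (hypothetically) replied asking for the calculator.
model_request = {"tool": "calculator", "input": "6 * 7"}
print(handle_tool_request(model_request, "User asked: what is 6 times 7?"))
</code></pre>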
<p><strong>Information formatting and packaging:</strong> We’ve touched on this, but it deserves emphasis. Often you have more info than fits or is useful to include fully. So you compress or format it. If your model is writing code and you have a large codebase, you might include just function signatures or docstrings rather than entire files, to give it context. If the user query is verbose, you might highlight the main question at the end to focus the model. Use headings, code blocks, tables—whatever structure best communicates the data. For example, rather than “User data: [massive JSON]… Now answer question.” you might extract the few fields needed and present “User’s Name: X, Account Created: Y, Last Login: Z.” This is easier for the model to parse and also uses fewer tokens. In short, think like a UX designer, but your “user” is the LLM—design the prompt for <strong>its</strong> consumption.</p>
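<p>That last point is easy to show in code: pull out only the fields the task needs and hand the model a compact, labeled view instead of the raw record. A small sketch, with invented fields:</p>
<pre class="wp-block-code"><code># Sketch: reduce a large user record to the three fields the task needs,
# formatted as labeled lines that are cheap for the model to parse.
user_record = {
    "name": "Ada Lovelace",
    "account_created": "2021-03-14",
    "last_login": "2025-08-20",
    "preferences": {"theme": "dark", "newsletter": True},
    "activity_log": ["hundreds of entries the task does not need"],
}

def format_user_context(record: dict) -> str:
    return (
        "User's Name: " + record["name"] + "\n"
        "Account Created: " + record["account_created"] + "\n"
        "Last Login: " + record["last_login"]
    )

print(format_user_context(user_record))
</code></pre>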
<p>The impact of these techniques is huge. When you see an impressive LLM demo solving a complex task (say, debugging code or planning a multistep process), you can bet it wasn’t just a single clever prompt behind the scenes. There was a pipeline of context assembly enabling it.</p>
<p>For instance, an AI pair programmer might implement a workflow like:</p>
<ol class="wp-block-list">
<li>Search the codebase for relevant code.</li>
<li>Include those code snippets in the prompt with the user’s request.</li>
<li>If the model proposes a fix, run tests in the background.</li>
<li>If tests fail, feed the failure output back into the prompt for the model to refine its solution.</li>
<li>Loop until tests pass.</li>
</ol>
<p>Each step has carefully engineered context: The search results, the test outputs, etc., are each fed into the model in a controlled way. It’s a far cry from “just prompt an LLM to fix my bug” and hoping for the best.</p>
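<p>As a rough illustration, that loop might be wired together like the sketch below. Every helper is a stub standing in for a real component (a code search index, an LLM call, a test runner); only the control flow and the way each result is folded back into the context matter here.</p>
<pre class="wp-block-code"><code># Sketch of the context-assembly loop described above. All helpers are stubs;
# the point is how each step's output becomes context for the next call.
def search_codebase(request: str) -> str:
    return "def parse_config(path): ..."    # stand-in for code search

def propose_fix(request: str, context: str) -> str:
    return "patched parse_config"           # stand-in for the LLM call

def run_tests(candidate: str) -> tuple[bool, str]:
    return True, "all tests passed"         # stand-in for a test runner

def fix_bug(request: str, max_attempts: int = 3) -> str:
    context = "Relevant code:\n" + search_codebase(request)
    candidate = ""
    for attempt in range(max_attempts):
        candidate = propose_fix(request, context)
        passed, report = run_tests(candidate)
        if passed:
            return candidate
        # Feed the failure back in as fresh context for the next attempt.
        context += "\nTest failure on attempt " + str(attempt + 1) + ":\n" + report
    return candidate

print(fix_bug("Fix crash when config file is empty"))
</code></pre>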
<h2 class="wp-block-heading"><strong>The Challenge of Context Rot</strong></h2>
<p>As we get better at assembling rich context, we run into a new problem: Context can actually poison itself over time. This phenomenon, aptly termed “context rot” by developer <a href="https://news.ycombinator.com/item?id=44308711#44310054" target="_blank" rel="noreferrer noopener">Workaccount2</a> on Hacker News, describes how <strong>context quality degrades as conversations grow longer and accumulate distractions, dead-ends, and low-quality information.</strong></p>
<p>The pattern is frustratingly common: You start a session with a well-crafted context and clear instructions. The AI performs beautifully at first. But as the conversation continues—especially if there are false starts, debugging attempts, or exploratory rabbit holes—the context window fills with increasingly noisy information. The model’s responses gradually become less accurate and more confused, or it starts hallucinating.</p>
<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="468" height="468" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/The-Challenge-of-Context-Rot.png" alt="The challenge of context rot" class="wp-image-17292" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/The-Challenge-of-Context-Rot.png 468w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/The-Challenge-of-Context-Rot-300x300.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/The-Challenge-of-Context-Rot-160x160.png 160w" sizes="auto, (max-width: 468px) 100vw, 468px" /></figure>
<p>Why does this happen? Context windows aren’t just storage—they’re the model’s working memory. When that memory gets cluttered with failed attempts, contradictory information, or tangential discussions, it’s like trying to work at a desk covered in old drafts and unrelated papers. The model struggles to identify what’s currently relevant versus what’s historical noise. Earlier mistakes in the conversation can compound, creating a feedback loop where the model references its own poor outputs and spirals further off track.</p>
<p>This is especially problematic in iterative workflows—exactly the kind of complex tasks where context engineering shines. Debugging sessions, code refactoring, document editing, or research projects naturally involve false starts and course corrections. But each failed attempt leaves traces in the context that can interfere with subsequent reasoning.</p>
<p>Practical strategies for managing context rot include:</p>
<ul class="wp-block-list">
<li><strong>Context pruning and refresh:</strong> Workaccount2’s solution is “I work around it by regularly making summaries of instances, and then spinning up a new instance with fresh context and feed in the summary of the previous instance.” This approach preserves the essential state while discarding the noise. You’re essentially doing garbage collection for your context.</li>
<li><strong>Structured context boundaries:</strong> Use clear markers to separate different phases of work. For example, explicitly mark sections as “Previous attempts (for reference only)” versus “Current working context.” This helps the model understand what to prioritize.</li>
<li><strong>Progressive context refinement:</strong> After significant progress, consciously rebuild the context from scratch. Extract the key decisions, successful approaches, and current state, then start fresh. It’s like refactoring code—occasionally you need to clean up the accumulated cruft.</li>
<li><strong>Checkpoint summaries:</strong> At regular intervals, have the model summarize what’s been accomplished and what the current state is. Use these summaries as seeds for fresh context when starting new sessions.</li>
<li><strong>Context windowing:</strong> For very long tasks, break them into phases with natural boundaries where you can reset context. Each phase gets a clean start with only the essential carry-over from the previous phase.</li>
</ul>
<p>This challenge also highlights why “just dump everything into the context” isn’t a viable long-term strategy. Like good software architecture, <strong>good context engineering requires intentional information management</strong>—deciding not just what to include but also when to exclude, summarize, or refresh.</p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, </em><strong><em>Coding for the Agentic World</em></strong><em>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.</em> <a href="https://www.oreilly.com/AgenticWorld/"><em>Register now to save your seat</em></a><em>.</em></p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-2/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>People Work in Teams, AI Assistants in Silos</title>
<link>https://www.oreilly.com/radar/people-work-in-teams-ai-assistants-in-silos/</link>
<comments>https://www.oreilly.com/radar/people-work-in-teams-ai-assistants-in-silos/#respond</comments>
<pubDate>Fri, 15 Aug 2025 11:37:25 +0000</pubDate>
<dc:creator><![CDATA[Tim O’Reilly]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Deep Dive]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17280</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/People-work-in-teams-AI-in-silos.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[As I was waiting to start a recent episode of Live with Tim O’Reilly, I was talking with attendees in the live chat. Someone asked, “Where do you get your up-to-date information about what’s going on in AI?” I thought about the various newsletters and publications I follow but quickly realized that the right answer […]]]></description>
<content:encoded><![CDATA[
<p>As I was waiting to start a recent episode of <a href="https://www.oreilly.com/live/live-with-tim/" target="_blank" rel="noreferrer noopener"><em>Live with Tim O’Reilly</em></a>, I was talking with attendees in the live chat. Someone asked, “Where do you get your up-to-date information about what’s going on in AI?” I thought about the various newsletters and publications I follow but quickly realized that the right answer was “some chat groups that I am a part of.” Several are on WhatsApp, and another on Discord. For other topics, there are some Signal group chats. Yes, the chats include links to various media sources, but they are curated by the intelligence of the people in those groups, and the discussion often matters more than the links themselves.</p>
<p>Later that day, I asked my 16-year-old grandson how he kept in touch with his friends. “I used to use Discord a lot,” he said, “but my friend group has now mostly migrated to WhatsApp. I have two groups, one with about 8 good friends, and a second one with a bigger group of about 20.” The way “friend group” has become part of the language for younger people is a tell. Groups matter.</p>
<p>A WhatsApp group is also how I keep in touch with my extended family. (Actually, there are several overlapping family groups, each with a slightly different focus and set of active members.) And there’s a Facebook group that my wife and I use to keep in touch with neighbors in the remote town in the Sierra Nevada where we spend our summers.</p>
<p>I’m old enough to remember the proto-internet of the mid-1980s, when Usenet groups were how people shared information, formed remote friendships, and built communities of interest. Email, which grew up as a sibling of Usenet, also developed some group-forming capabilities. Listservs (mailing list managers) were and still are a thing, but they were a sideshow compared to the fecundity of Usenet. Google Groups remains as a 25-year-old relic of that era, underinvested in and underused.</p>
<p>Later on, I used Twitter to follow the people I cared about and those whose work and ideas I wanted to keep up with. After Twitter made it difficult to see the feed of people I wanted to follow, replacing it by default with a timeline of suggested posts, I pretty much stopped using it. I still used Instagram to follow my friends and family; it used to be the first thing I checked every morning when my grandchildren were little and far away. But now, the people I want to follow are hard to find there too, buried by algorithmic suggestions, and so I visit the site only intermittently. <a href="https://www.tractionsoftware.com/db/attachments/blog/2804/4/Weblogs-Grow-Up-Traction-CShirky-Release10May2003.pdf" target="_blank" rel="noreferrer noopener">Social software</a> (the original name that Clay Shirky gave to applications like FriendFeed and systems like RSS that allow a user to curate a list of “feeds” to follow) gave way to social media. A multiplexed feed of content from the people I have chosen is social software, group-forming and empowering to individuals; an algorithmically curated feed of content that someone else thinks I will like is social media, divisive and disempowering.</p>
<figure class="wp-block-pullquote"><blockquote><p>“What are some tips on dealing with the fact that we are currently working in teams, but in silos of individual AI assistants?”</p></blockquote></figure>
<p>For technology to do its best work for people, it <em>has to provide support for groups</em>. They are a fundamental part of the human social experience. But serving groups is hard. Consumer technology companies discover this opportunity, then abandon it with regularity, only for someone else to discover it again. We’ve all had this experience, I think. I am reminded of a marvelous passage from Wallace Stevens’s poem “<a href="https://web.english.upenn.edu/~cavitch/pdf-library/Stevens_Esthe%CC%81tique_du_mal.pdf" target="_blank" rel="noreferrer noopener">Esthétique du Mal</a>”:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>The tragedy, however, may have begun, </em><br><em>Again, in the imagination’s new beginning, </em><br><em>In the yes of the realist spoken because he must </em><br><em>Say yes, spoken because under every no </em><br><em>Lay a passion for yes that had never been broken.</em></p>
</blockquote>
<p>There is a passion for groups that has never been broken. We’re going to keep reinventing them until every platform owner realizes that they are an essential part of the landscape and sticks with them. They are not just a way to attract users before abandoning them as part of the cycle of <a href="https://en.wikipedia.org/wiki/Enshittification" target="_blank" rel="noreferrer noopener">enshittification</a>.</p>
<p>There is still a chance to get this right for AI. The imagination’s new beginning is cropping up at all levels, from LLMs themselves, where the advantages of hyperscaling seem to be slowing, reducing the likelihood of a winner-takes-all outcome, to protocols like MCP and A2A, to AI applications for teams.</p>
<h2 class="wp-block-heading">AI Tooling for Teams?</h2>
<p>In the enterprise world, there have long been products explicitly serving the needs of teams (i.e., groups), from Lotus Notes through SharePoint, Slack, and Microsoft Teams. Twenty years ago, Google Docs kicked off a revolution that turned document creation into a powerful kind of group collaboration tool. Git and GitHub are also a powerful form of groupware, one so fundamental that software development as we know it could not operate without it. But so far, AI model and application developers largely seem to have ignored the needs of groups, despite their obvious importance. As Claire Vo put it to me in one recent conversation, “AI coding is still largely a single-player game.”</p>
<p>It is possible to share the output of AI, but most AI applications are still woefully lacking in the ability to collaborate during the act of creation. As one attendee asked on <a href="https://www.oreilly.com/radar/the-future-of-product-management-is-ai-native/" target="_blank" rel="noreferrer noopener">my recent<em> Live with Tim O’Reilly</em> episode with Marily Nika</a>, “What are some tips on dealing with the fact that we are currently working in teams, but in silos of individual AI assistants?” We are mostly limited to sharing our chats or the outputs of our AI work with each other by email or link. Where is the shared context? The shared workflows? Claire’s <a href="https://chatprd.ai/" target="_blank" rel="noreferrer noopener">ChatPRD</a> (AI for product management) apparently has an interface designed to support teams, and I have been told that Devin has some useful collaborative features, but as of yet, there is no full-on reinvention of AI interfaces for multiplayer interactions. We are still leaning on external environments like GitHub or Google Docs to make up for the lack of native collaboration in AI workflows.</p>
<p>We need to reinvent sharing for AI in the same way that Sam Schillace, Steve Newman, and Claudia Carpenter turned the office productivity world on its head back in 2005 with the <a href="https://en.wikipedia.org/wiki/Google_Docs" target="_blank" rel="noreferrer noopener">development of Writely</a>, which became Google Docs. It’s easy to forget (or for younger people never to know) how painful collaborative editing of documents used to be, and just how much the original Google Docs team got right. Not only did they make user control of sharing central to the experience; they also made version control largely invisible. Multiple collaborators could work on a document simultaneously and magically see each other’s work reflected in real time. Document history and the ability to revert to earlier versions are likewise seamless.</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>On August 26, I’ll be </em><a href="https://www.oreilly.com/live/live-with-tim/" target="_blank" rel="noreferrer noopener"><em>chatting with Sam Schillace, Steve Newman, and Claudia Carpenter</em></a><em> on Live with Tim O’Reilly. We’ll be celebrating the 20th anniversary of Writely/Google Docs and talking about how they developed its seamless sharing, and what that might look like today for AI.</em></p>
</blockquote>
<p>What we really need is the ability to share context among a group. And that means not just a shared set of source documents but also a shared history of everyone’s interactions with the common project, and visibility into the channels by which the group communicates with each other about it. As Steve Newman wrote to me, “If I’m sharing that particular AI instance with a group, it should have access to the data that’s relevant to the group.”</p>
<p>In this article, I’m going to revisit some past attempts at designing for the needs of groups and make a few stabs at thinking out loud about them as provocations for AI developers.</p>
<h2 class="wp-block-heading">Lessons from the Unix Filesystem</h2>
<p>Maybe I’m showing my age, but so many ideas I keep going back to come from the design of the Unix operating system (later Linux). But I’m not the only one. Back in 2007, the ever-insightful Marc Hedlund <a href="https://kottke.org/07/03/public-and-permanent" target="_blank" rel="noreferrer noopener">wrote</a>:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>One of my favorite business model suggestions for entrepreneurs is, find an old UNIX command that hasn’t yet been implemented on the web, and fix that. talk and finger became ICQ, LISTSERV became Yahoo! Groups, ls became (the original) Yahoo!, find and grep became Google, rn became Bloglines, pine became Gmail, mount is becoming S3, and bash is becoming Yahoo! Pipes. I didn’t get until tonight that Twitter is wall for the web. I love that.</em></p>
</blockquote>
<p>I have a similar suggestion for AI entrepreneurs. Yes, rethink everything for AI, but figure out what to keep as well as what to let go. History can teach us a lot about what patterns are worth keeping. This is especially important as we explore <a href="https://www.oreilly.com/radar/an-architecture-of-participation-for-ai/" target="_blank" rel="noreferrer noopener">how to make AI more participatory and less monolithic</a>.</p>
<p>The Unix filesystem, which persists through Linux and is thus an integral part of the underlying architecture of the technological world as we know it, had a way of thinking about file permissions that is still relevant in the world of AI. (The following brief description is for those who are unfamiliar with the Unix/Linux filesystem. Feel free to skip ahead.)</p>
<p>Every file is created with <strong>a default set of permissions</strong> that control its access and use. There are separate permissions specified for <strong>user, group, and world</strong>: A file can be private, so that only the person who created it can read it, write to it, or, if it is an executable file such as a program, run it. A file can belong to a group, identified by a unique numeric ID in a system file that names the group, optionally assigns it an encrypted password, and lists its members; those members may be allowed to read, write, or execute files belonging to the group. Or a file can have “world” access, in which case anyone can read it and potentially write to it or run it. Every file thus has not only an associated owner (usually but not always the creator) but potentially also an owning group, whose membership determines who else shares access.</p>
<p>This explicit framing of three levels of access seems important, rather than leaving group access as something that is sometimes available and sometimes not. I also like that Unix had a “little language” (<a href="https://en.wikipedia.org/wiki/Umask" target="_blank" rel="noreferrer noopener">umask</a> and <a href="https://en.wikipedia.org/wiki/Chmod" target="_blank" rel="noreferrer noopener">chmod</a>) for compactly viewing or modifying the read/write/execute permissions for each level of access.</p>
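<p>If you’ve never met this “little language,” a short illustration may help. The following Python sketch (not from any of the systems discussed here; the filename is arbitrary) uses the standard <code>stat</code> module to decode a file’s user/group/world bits and then sets them with chmod’s octal notation.</p>
<pre class="wp-block-code"><code>import os
import stat

# Create an example file and decode its user/group/world permission bits.
open("shared_notes.txt", "a").close()
mode = os.stat("shared_notes.txt").st_mode

for label, suffix in (("user", "USR"), ("group", "GRP"), ("world", "OTH")):
    perms = "".join(
        flag if mode & getattr(stat, "S_I" + name + suffix) else "-"
        for flag, name in (("r", "R"), ("w", "W"), ("x", "X"))
    )
    print(label, perms)

# chmod's octal notation packs the same three levels into three digits:
# 0o640 = owner read/write, group read, world nothing.
os.chmod("shared_notes.txt", 0o640)</code></pre>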
<p>A file that is user readable and writable versus one that is, say, world readable but not writable is an easily understood distinction. But there’s this whole underexplored middle in what permissions can be given to members of associated groups. The chief function, as far as I remember it, was to allow for certain files to be editable or runnable only by members of a group with administrative access. But this is really only the tip of the iceberg of possibilities, as we shall see.</p>
<p>One of the drawbacks of the original Unix filesystem is that the members of groups must be explicitly defined, and a file can be assigned to only one primary group at a time. While a <em>user</em> can belong to multiple groups, a file itself is associated with a single owning group. More modern versions of the system, like Linux, work around this limitation by providing Access Control Lists (ACLs), which make it possible to define specific permissions for multiple users and multiple groups on a single file or directory. Groups in systems like WhatsApp, Signal, Discord, and Google Groups also use an ACL-type approach: access rights are usually controlled by an administrator. This draws hard boundaries around groups and makes ad hoc group-forming more difficult.</p>
<h2 class="wp-block-heading">Lessons from Open Source Software</h2>
<p>People think that free and open source software depends on a specific kind of license. I have always believed that while licenses are important, the essential foundation of open source software is the ability of groups to collaborate on shared projects. There are countless stories of software developed by collaborative communities—notably Unix itself—that came about despite proprietary licenses. Yes, the open source Linux took over from proprietary versions of Unix, but let’s not forget that the original development was done not just at Bell Labs but at the University of California, Berkeley and other universities and companies around the world. This happened despite AT&T’s proprietary license and long before Richard Stallman wrote the <em>GNU Manifesto</em> or Linus Torvalds wrote the Linux kernel.</p>
<p>There were two essential innovations that enabled distributed collaboration on shared software projects outside the boundaries of individual organizations.</p>
<p>The first is what I have called “the architecture of participation.” Software products that are made up of small cooperating units rather than monoliths are easier for teams to work on. When we were interviewing Linus Torvalds for our 1999 essay collection <a href="https://www.oreilly.com/library/view/open-sources/1565925823/" target="_blank" rel="noreferrer noopener"><em>Open Sources</em></a>, he said something like “I couldn’t have written a new kernel for Windows even if I had access to the source code. The architecture just wouldn’t support it.” That is, Windows was monolithic, while Unix was modular.</p>
<p><strong>We have to ask the question: </strong><a href="https://www.oreilly.com/radar/an-architecture-of-participation-for-ai/" target="_blank" rel="noreferrer noopener"><strong>What is the architecture of participation for AI?</strong></a></p>
<p>Years ago, I wrote the first version of the Wikipedia page about Kernighan and Pike’s book <a href="https://en.wikipedia.org/wiki/The_Unix_Programming_Environment" target="_blank" rel="noreferrer noopener"><em>The Unix Programming Environment</em></a> because that book so fundamentally shaped my view of the programming world and seemed like it had such profound lessons for all of us. Kernighan and Pike wrote:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>Even though the UNIX system introduces a number of innovative programs and techniques, no single program or idea makes it work well. Instead, what makes it effective is the approach to programming, a philosophy of using the computer. Although that philosophy can’t be written down in a single sentence, at its heart is the idea that the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation, but, combined with other programs, become general and useful tools.</em></p>
</blockquote>
<p>What allowed that combination was the convention that every program produced its output as ASCII text, which could then be consumed and transformed by other programs in a pipeline or, if necessary, redirected into a file for storage. The behavior of the programs in the pipeline could be modified by a series of command line flags, but the most powerful features came from the transformations made to the data by a connected sequence of small utility programs with distinct powers.</p>
<p>Unix was the first operating system designed by a company that was, at its heart, a networking company. Unix was all about the connections between things, the space between. The small pieces loosely joined, end-to-end model became the paradigm for the internet as well and shaped the modern world. It was easy to participate in the collaborative development of Unix. New tools could be added without permission because the rules for cooperating applications were already defined.</p>
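<p>As a reminder of how small that contract was, here’s a tiny filter written in Python purely for illustration. The log format and the surrounding pipeline are hypothetical; the point is simply that the tool reads text on stdin, writes text on stdout, and leaves composition to the shell.</p>
<pre class="wp-block-code"><code># count_hosts.py -- a small tool in the Unix spirit: text in, text out.
# It composes with other programs rather than doing everything itself, e.g.:
#   cat access.log | python3 count_hosts.py | sort -rn | head
# (the filename and the pipeline are only illustrative)
import sys
from collections import Counter

counts = Counter(line.split()[0] for line in sys.stdin if line.strip())
for host, n in counts.items():
    print(n, host)</code></pre>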
<p>MCP is a fresh start on creating an architecture of participation for AI at the macro level. The way I see it, pre-MCP the model for applications built with AI was hub-and-spoke. That is, we were in <a href="https://qz.com/1540608/the-problem-with-silicon-valleys-obsession-with-blitzscaling-growth" target="_blank" rel="noreferrer noopener">a capital-fueled race</a> for the leading AI model to become the centralized platform on which most AI applications would be built, much like Windows was the default platform in the PC era. The agentic vision of MCP is a networked vision, much like Unix, in which small, specialized tools can be combined in a variety of ways to accomplish complex tasks.</p>
<p>(Even pre-MCP, we saw this pattern at work in AI. What is <a href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation" target="_blank" rel="noreferrer noopener">RAG</a> but a pipeline of cooperating programs?)</p>
<p>Given the slowdown in progress in LLMs, with most leading models clustering around similar benchmarks, including many open source/open weight models that can be customized and run by corporations or even individual users, we are clearly moving toward a distributed AI future. MCP provides a first step toward the communications infrastructure of this <strong>multipolar world of cooperating AIs</strong>. But we haven’t thought deeply enough about a world without gatekeepers, where the <strong>permissions are fluid, and group-forming is easy and under user control</strong>.</p>
<figure class="wp-block-image size-full is-resized"><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel=" noreferrer noopener"><img loading="lazy" decoding="async" width="689" height="213" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/AI-Codecon-September-9-2025.png" alt="AI Codecon, September 9, 2025" class="wp-image-17281" style="width:840px;height:auto" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/AI-Codecon-September-9-2025.png 689w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/AI-Codecon-September-9-2025-300x93.png 300w" sizes="auto, (max-width: 689px) 100vw, 689px" /></a><figcaption class="wp-element-caption"><em>The future of cooperating agents is the subject of the second of our free AI Codecon conferences about the future of programming, </em><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener"><em>Coding for the Future Agentic World</em></a><em>, to be held September 9. Addy Osmani and I are cohosting, and we’ve got <a href="https://learning.oreilly.com/live-events/ai-codecon-coding-for-the-future-agentic-world/0642572207748/" target="_blank" rel="noreferrer noopener">an amazing lineup of speakers</a>. We’ll be exploring agentic interfaces beyond chat UX; how to chain agents across environments to complete complex tasks; asynchronous, autonomous code generation in production; and the infrastructure enabling the agentic web, including MCP and agent protocols.</em></figcaption></figure>
<p>There was a second essential foundation for the collaborative development of Unix and other open source software, and that was version control. Marc Rochkind’s 1972 SCCS (Source Code Control System), which he originally wrote for the IBM System/370 but quickly ported to Unix, was <a href="https://en.wikipedia.org/wiki/Source_Code_Control_System" target="_blank" rel="noreferrer noopener">arguably the first version control system</a>. It pioneered the innovation (for the time) of storing only the differences between two files, not a complete new copy. It wasn’t released publicly till 1977, and was succeeded by a number of improved source code control systems over the years. Git, developed by Linux creator Linus Torvalds in 2005, has been the de facto standard for the last 20 years.</p>
<p>The earliest source code repositories were local, and change files were sent around by email or Usenet. (Do you remember <a href="https://en.wikipedia.org/wiki/Patch_(Unix)" target="_blank" rel="noreferrer noopener">patch</a>?) Git was a creature of the internet era, where everything could be found online, and so it soon became the basis of one of the web’s great assemblages of collective intelligence. GitHub, created in 2008 by Tom Preston-Werner, Chris Wanstrath, P. J. Hyett, and Scott Chacon, turned the output of the entire software industry into a shared resource, segmented by an inbuilt architecture of user, group, and world. There are repositories that represent the work of one author, and there are others that are the work of a community of developers.</p>
<p>Explicit check-ins, forks, and branches are the stuff of everyday life for the learned priesthood of software developers. And increasingly, they are the stuff of everyday life for the agents that are part of modern AI-enabled developer tools. It’s easy to forget just how much GitHub is the substrate of the software development workflow, as important in many ways as the internet itself.</p>
<p>But clearly there is work to be done. <strong>How might version control come to a new flowering in AI?</strong> What features would make it easier for a group, not just an individual, to have a shared conversation with an AI? How might a group collaborate in developing a large software project or other complex intellectual work? This means figuring out a lot about memory, how versions of the past are not consistent, how some versions are more canonical than others, and what a gift it is for users to be able to roll back to an earlier state and go forward from there.</p>
<h2 class="wp-block-heading">Lessons from Google Docs</h2>
<p>Google Docs and similar applications are another great example of version control at work, and there’s a lot to learn from them. Given that the promise of AI is that everyone, not just the learned few, may soon be able to develop complex bespoke software, version control for AI will need to have the simplicity of Google Docs and other office productivity tools inspired by it as well as the more powerful mechanisms provided by formal version control systems like Git.</p>
<p>One important distinction between the version control and group forming enabled by GitHub and by Google Docs is that GitHub provides a kind of exoskeleton for collaboration, while Google Docs internalizes it. Each Google Docs file carries within it the knowledge of who can access it and what actions they can take. Group forming is natural and instantaneous. I apologize for subjecting you to yet another line from my favorite poet Wallace Stevens, but in Google Docs and its siblings, access permissions and version control are “<a href="http://derekdenton.com/2017/04/04/2017-4-3-an-ordinary-evening-in-new-haven-by-wallace-stevens/" target="_blank" rel="noreferrer noopener">a part of the [thing] itself and not about it</a>.”</p>
<p>Much like in the Unix filesystem, a Google doc may be private, open to a predefined group (e.g., all employees with <a href="http://oreilly.com" target="_blank" rel="noreferrer noopener">oreilly.com</a> addresses), or open to anyone. But it also provides a radical simplification of group formation. Inviting someone to collaborate on a Google doc—to edit, comment, or merely read it—creates <strong>an ad hoc group</strong> centered on that document.</p>
<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="924" height="396" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Google-Docs-Ad-Hoc-Group.png" alt="Google docs ad hoc group" class="wp-image-17282" style="width:790px;height:auto" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Google-Docs-Ad-Hoc-Group.png 924w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Google-Docs-Ad-Hoc-Group-300x129.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Google-Docs-Ad-Hoc-Group-768x329.png 768w" sizes="auto, (max-width: 924px) 100vw, 924px" /></figure>
<p>My aspiration for groups in AI is that they have the seamless ad hoc quality of the community of contributors to a Google doc. How might our interactions with AI be different if we were no longer sharing a fixed output but the opportunity for cocreation? How might an ad hoc group of collaborators include not only humans but their AI assistants? What is the best way for changes to be tracked when those changes include not just explicit human edits to AI output but revised instructions to recreate the AI contribution?</p>
<p>Maybe Google already has a start on a shared AI environment for groups. <a href="http://notebook.lm" target="_blank" rel="noreferrer noopener">NotebookLM</a> is built on the substrate of Google Drive, which inherited its simple but robust permissions architecture from Google Docs. I’d love to see the team there spend more time thinking through how to apply the lessons of Google Docs to <a href="https://notebooklm.google.com/" target="_blank" rel="noreferrer noopener">NotebookLM</a> and other AI interfaces. Unfortunately, the NotebookLM team seems to be focused on making it into an aggregator of Notebooks rather than providing it as an extension of the collaborative infrastructure of Google Workspace. This is a missed opportunity.</p>
<h2 class="wp-block-heading">Core Versus Boundary</h2>
<p>A group with enumerated members—say, the employees of a company—has a boundary. You are in or out. So do groups like citizens of a nation, the registered users of a site or service, members of a club or church, or professors at a university as distinct from students, who may themselves be divided into undergraduates and grad students and postdocs. But many social groups have no boundary. Instead, they have a kind of gravitational core, like a solar system whose gravity extends outward from its dense core, attenuating but never quite ending.</p>
<figure class="wp-block-image is-resized"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdER0SGt17WNiLYvwtEjTS0H8LI34bVk5YYGUxNc1MwxqyMDQWzFpAqs-YwzxY2vqAENUZgrOSJN_ad_GOf2jFcyVrc6LFF1RttgUmCD8H9wI-MFXr3e2tdyF0xVJaNp6BIWUw0?key=lvx07zVq8JDbV1uX4N-LPYIp" alt="Image of gravitational core
Image generated by Google Imagen via Gemini 2.5
" style="width:840px;height:auto"/></figure>
<p class="has-text-align-center"><em>Image generated by Google Imagen via Gemini 2.5</em></p>
<p>I know this is a fanciful metaphor, but it is useful.</p>
<p>The fact that ACLs work by drawing boundaries around groups is a serious limitation. It’s important to make space for groups organized around a gravitational core. A public Google group or a public Google doc open to anyone with the link, or a Signal group with shareable invite links (versus the targeted invitations to a WhatsApp group), draws in new users by the social equivalent of the way a dense body deforms the space around it, pulling them into its orbit.</p>
<p>I’m not entirely sure what I’m asking for here. But I am suggesting that any AI system focused on enabling collaboration take the Core versus Boundary pattern into account. <strong>Design systems that can have a gravitational core (i.e., public access with opt-in membership), not just mechanisms for creating group boundaries with defined membership.</strong></p>
<h2 class="wp-block-heading">The Tragedy Begins Again?</h2>
<p>The notion of the follow, which originally came from <a href="https://en.wikipedia.org/wiki/RSS" target="_blank" rel="noreferrer noopener">RSS</a> and was later widely adopted in the timelines of Twitter, Facebook, and other social media apps, provides an instructive take on the Core pattern.</p>
<p>“Following” inverts the membership in a group by taking output that is world-readable and <strong>curating it into a user-selected group</strong>. We take this for granted, but the idea that there can be billions of people posting to Facebook, and that each of them can have an individual algorithmically curated feed of content from a small subset of the other billions of users, <em>only those whom they chose</em>, is truly astonishing. This is a group that is user specified but with the actual content dynamically collected by the platform <em>on behalf of the user</em> trillions of times a day. “@mentions” even allow users to invite people into their orbit, turning any given post into the kind of ad hoc group that we see with Google Docs. Hashtags allow them to invite others in by specifying a core of shared interests.</p>
<p>And of course, in social media, you can also see the tragedy that Wallace Stevens spoke of. The users, each at the bottom of their personal gravity well, had postings from the friends they chose drawn to them by the algorithmic curvature of space, so to speak, when suddenly, a great black hole of suggested content came in and disrupted the dance of their chosen planets.</p>
<p>A group can be defined either by its creator (boundary) or collectively by its members (core). If those who control internet applications forget that groups don’t belong to them but to their creators, the users are forced to migrate elsewhere to recreate the community that they had built but have now lost.</p>
<p>I suspect that there is a real opportunity for AI to recreate the power of this kind of group forming, displacing those who have put their own commercial preferences ahead of those of their users. But that opportunity can’t be taken for granted. The rush to load all available content into massive models in the race for superintelligence started out with homogenization on a massive scale, dwarfing even the algorithmically shaped feeds of social media. Once advertising enters the mix, there will be strong incentives for AI platforms, too, to place their own preferences ahead of those of their users. Given the enormous capital required to win the AI race, the call to the dark side will be strong. So we should fear a centralized AI future.</p>
<p>Fortunately, the fevered dreams of the hyperscalers are beginning to abate as progress slows (though the hype still continues apace). Far from being a huge leap forward, GPT-5 appears to have made the case that progress is leveling off. It appears that AI may be a “<a href="https://learning.oreilly.com/live-events/live-with-tim-oreilly-a-conversation-with-princetons-arvind-narayanan/0642572218294/" target="_blank" rel="noreferrer noopener">normal technology</a>” after all, not a singularity. That means that we can expect continued competition.</p>
<p>The best defense against this bleak future is to build the infrastructure and capabilities for a distributed AI alternative. How can we bring that into the world? It can be informed by these past advances in group collaboration, but it will need to find new pathways as well. We are starting a long process by which (channeling Wallace Stevens again) we search “the possible for its possibleness.” I’d love to hear from developers who are at the forefront of that search, and I’m sure others would as well.</p>
<p><em>Thanks to Alex Komoroske, Claire Vo, Eran Sandler, Ilan Strauss, Mike Loukides, Rohit Krishnan, and Steve Newman for helpful comments during the development of this piece.</em></p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/people-work-in-teams-ai-assistants-in-silos/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Taming the Delightful Chaos</title>
<link>https://www.oreilly.com/radar/taming-the-delightful-chaos/</link>
<comments>https://www.oreilly.com/radar/taming-the-delightful-chaos/#respond</comments>
<pubDate>Thu, 14 Aug 2025 10:11:08 +0000</pubDate>
<dc:creator><![CDATA[Q McCallum]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17271</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/2048px-NY_stock_exchange_traders_floor_LC-U9-10548-6.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[What the Computerization of Wall Street Can Teach Us About AI]]></custom:subtitle>
<description><![CDATA[If you want to make the most of The Field We Now Call AI, look to trading. Specifically, the tech-driven sort. People who’ve read my other work, or who have had the misfortune of speaking with me one-on-one, have already heard this line. My long-running half-joke is that my AI consulting is based on best […]]]></description>
<content:encoded><![CDATA[
<p>If you want to make the most of The Field We Now Call AI, look to trading. Specifically, the tech-driven sort.</p>
<p>People who’ve read my other work, or who have had the misfortune of speaking with me one-on-one, have already heard this line. My long-running half-joke is that my AI consulting is based on best practices I picked up from trading way back when.</p>
<p>I say this with good reason. Modern trading—for brevity, I’ll lump algo(rithmic), electronic, quant(itative) finance, and any other form of Throwing Computers at the Stock Market under the umbrella of “algo trading”—applies data analysis and mathematical modeling to business pursuits. It’s full of hard-learned lessons that you can and should borrow for data work in other domains, even if your industry exists far afield of the financial markets. You can always ask, “How would algo trading handle this modeling issue/account for errors in this data pipeline/connect this analysis work to the business model?”</p>
<p>More recently I’ve been thinking about algo trading’s origin story. Which has led me to ask:</p>
<p><em>What can the computerization of Wall Street tell us about the rise of AI in other domains?</em></p>
<p>The short version is that the computers arrived and trading changed forever. But the truth is far more nuanced. Companies that internalize the deeper lessons from that story are poised to win out with AI—all of data science, ML/AI, and GenAI.</p>
<p>Let’s start with an abbreviated, slightly oversimplified history of technology in trading.</p>
<h2 class="wp-block-heading">An Abbreviated History of the Delightful Chaos</h2>
<p>At its core, trading is a simple matter of <em>buy low, sell high:</em> buy some shares of stock; wait for their price to go up; sell those shares; profit.</p>
<p>This is when you’ll point out that there are more complicated approaches which juggle shares from multiple companies…and that short-selling reverses the order to “sell high, buy low”…plus you have derivatives and all that… And I would agree with you. Those products and techniques certainly exist! But deep down, they are all expressions of “buy low, sell high.”</p>
<p>The mechanics of trading amount to strategy, matching, and execution:</p>
<p>Your <em>trading strategy</em> defines what shares you’ll buy, when to buy them, and when to sell. It can be as innumerate as “buy when the CEO wears black shoes, sell when they wear brown shoes.” It can involve deep industry research that tells you to move when the price exceeds some value X. Maybe you plot some charts to look for trends. Or you take that charting to the next level by building crazy mathematical models. However you devise your trading strategy, it’s all about the numbers: how many shares and at what price. You’re watching movements of share prices and you’re reacting to them, usually with great haste.</p>
<p>On the other side of strategy we have <em>order matching</em> and <em>trade execution.</em> Here’s where you pair up people who want to buy or sell, and then place those orders, respectively. In the olden days, matching and execution took place through “open outcry” or “pit” trading: people in a large, arena-like room (the pit) bought and sold shares through shouting (hence “outcry”) and hand signals (occasionally, the “catching hands” kind of signal). You watched prices on big screens and took orders by phone. Your location in the pit was key, as was your height in some cases, because you needed the right people to see you at the right time. Pit traders will tell you that it was loud and frenetic—like a sports match, except that every action involved money changing hands. Oh yes, and a lot of this was recorded on paper tickets. Messy handwriting and mishearing things led to corrections after-hours.</p>
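<p>If you’ve never seen matching up close, here is a toy Python sketch of the idea (emphatically not how any real exchange’s matching engine works): a buy and a sell are paired whenever a bid price meets or exceeds an ask, and any unfilled remainder stays on the book.</p>
<pre class="wp-block-code"><code>import heapq

bids = []   # max-heap on price (prices negated), then arrival order
asks = []   # min-heap on price, then arrival order

def submit(side, price, qty, order_id):
    if side == "buy":
        heapq.heappush(bids, (-price, order_id, qty))
    else:
        heapq.heappush(asks, (price, order_id, qty))
    match()

def match():
    # Pair the best bid with the best ask for as long as they cross.
    while bids and asks and -bids[0][0] >= asks[0][0]:
        neg_bid, buy_id, buy_qty = heapq.heappop(bids)
        ask, sell_id, sell_qty = heapq.heappop(asks)
        traded = min(buy_qty, sell_qty)
        print(f"trade: {traded} shares at {ask} (buy #{buy_id} / sell #{sell_id})")
        if buy_qty > traded:   # put any unfilled remainder back on the book
            heapq.heappush(bids, (neg_bid, buy_id, buy_qty - traded))
        if sell_qty > traded:
            heapq.heappush(asks, (ask, sell_id, sell_qty - traded))

submit("sell", 10.05, 100, 1)
submit("buy", 10.10, 60, 2)   # crosses the ask: 60 shares trade at 10.05</code></pre>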
<p>Computerization of these activities was a three-decade process—a slow start but a rousing finish. It began in the 1970s with early-day NASDAQ publishing prices electronically. (To drive the point home, note that the last two letters stand for “Automated Quotation.” You now have extra trivia for your next party conversation. You’re welcome.) Then came the UK’s 1986 “Big Bang” shift to electronic trading. Things really picked up in the 1990s through the early 2000s, which saw much wider-scale use of electronic quoting and orders. Then came <a href="https://www.tradersmagazine.com/news/a-tick-too-far-how-decimalization-changed-the-industry-and-why-its-a-hot-topic-again/" target="_blank" rel="noreferrer noopener">decimalization</a> and <a href="https://www.forbes.com/sites/realspin/2014/04/29/memo-to-michael-lewis-the-excesses-of-high-speed-trading-are-a-direct-result-of-sec-micromanagement/#619b98de454e" target="_blank" rel="noreferrer noopener">REG-NMS</a>, which further encouraged computerized order matching and execution.</p>
<p>Combined, this led to a world in which you could get up-to-the-minute share price data, find a counterparty with which to trade, and place orders—all without heading to (or calling someone in) the pit. Without hand signals. Without jumping up and down to be seen. Without the risk of fisticuffs.</p>
<p>From there, “pull in price data by computer” and “place orders by computer” logically progressed to “hire rocket scientists who’ll build models to determine trading strategy based on massive amounts of data.” And to top it off, remember that all of this electronic activity was taking place at, well, computer speeds.</p>
<p>Pit traders simply couldn’t keep up. And they were eventually pushed out. Open outcry trading is pretty much gone, and the role of “trader” has shifted to “person who builds or configures machines that operate in the financial markets.”</p>
<h2 class="wp-block-heading">Understanding the Why</h2>
<p>From a distance, it’s easy to write this off as “the computers showed up and the humans were gone. End of story.” Or even “the computers won simply because they were faster.” That’s the scenario AI-hopeful execs have in mind, but it’s far more complicated than that. It helps to understand <em>why</em> the bots took over.</p>
<p>I wrote a <a href="https://newsletter.complex-machinery.com/archive/017-stacking-the-deck/" target="_blank" rel="noreferrer noopener">short take on this</a> last year:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>Trading is a world awash in numbers, analyses, and pattern-finding. In the pre-technology era, humans did this work just fine. But then computers arrived, doing the math better, faster, at a larger scale, and without catching a case of nerves. Code could react to market data changes so quickly that network bandwidth, not processor speed, became the limiting factor. In every aspect of the game—from parsing price data to analyzing correlations to placing orders—humans found themselves outpaced.</em></p>
</blockquote>
<p>I’ll pause here to explain that trading happens in a marketplace. There are other participants, among whom there’s an element of competition (uncovering price shifts before anyone else and then moving the fastest on those discoveries) but also cooperation (as the person buying and the person selling both want to move quickly). That lent itself well to network effects, because once one group started using computers to parse market data and place orders, other groups wanted to join in and so they got their own. The traders who were still dealing in paper and hand signals weren’t so much competing with computers as with other traders who were using computers.</p>
<p>Continuing from that earlier write-up:</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>To understand what this meant for 1990s-era traders, imagine you’re a chess pro sitting down for a game. Except the board now extends to fifty dimensions and your opponent can make multiple moves without waiting for you to finish your turn. They react to your confused facial expression by explaining: </em><strong><em>the pieces could always do this; you just weren’t able to move them that way.</em></strong><em> That was the shift from open-outcry (“pit”) trading to the electronic variety. Human actors were displaced overnight. It just took them another few years to accept.</em></p>
</blockquote>
<p>That sentence in bold gets to the core of why computerization was a runaway success. The desire for speed was always there. The desire for consistency under pressure was always there. The desire to find meaningful patterns in the mountains of pricing data was always there. <em>We just couldn’t do that till computers came along.</em> People figured out that computers could consistently, dispassionately multitask on market matters while crunching massive amounts of data.</p>
<p>From that perspective, computers didn’t really take human jobs—<em>humans were doing jobs that were meant for computers, before computers were available.</em></p>
<p>Computers and trading made for a perfect marriage.</p>
<p>Well, almost.</p>
<h2 class="wp-block-heading">It’s Not All Roses</h2>
<p>All of these computers jockeying for position, operating at machine speeds, introduced new opportunities but also new risk exposures. New problems cropped up, notable for both their magnitude and ubiquity: high-speed cheating, like order spoofing; flash crashes; bots going out of control… Traders and exchanges alike implemented new testing and safety procedures—layers upon layers of risk management practices—as a matter of survival. It was the only way to reap the rewards of using bots while closing off sources of ruin.</p>
<p>Tech-related incidents still happen, like the <a href="https://qethanm.cc/2023/08/01/the-origins-of-an-incident-knight-capital/" target="_blank" rel="noreferrer noopener">2012 Knight Capital meltdown</a>. And bad actors still get away with things now and then. But when you consider the size and scale of the model-driven, electronically traded financial markets, the problems are relatively few. Especially since every incident is taken as a learning experience, leading traders and exchanges to institute new policies that discourage similar problems from cropping up down the road.</p>
<p>Frankly, the most notorious incidents in finance—like the 2008 mortgage crisis or the self-destruction of hedge fund <a href="https://en.wikipedia.org/wiki/Long-Term_Capital_Management" target="_blank" rel="noreferrer noopener">LTCM</a>—were rooted not in technology but in human nature: greed, hubris, and people choosing to oversimplify or misinterpret risk metrics like <a href="https://www.investopedia.com/terms/v/var.asp" target="_blank" rel="noreferrer noopener">VaR</a>. The computerization of trading has mostly been positive.</p>
<h2 class="wp-block-heading">Learning from the Lessons</h2>
<p>That trip through trading history brings us right back to where I started this piece:</p>
<p><em>If you want to make the most of The Field We Now Call AI, look to trading. Specifically, the tech-driven sort.</em></p>
<p>The move from the pits to computerized trading holds lessons for today’s world of AI. If you’re an executive who dreams of replacing human headcount with AI bots, you’d do well to consider the following:</p>
<p><strong>Give the machines machine jobs.</strong> Notice how traders and exchanges applied computers to the work that was amenable to automation—matching, execution, market data, all that. The same holds for AI. That manual task may annoy you, but if AI isn’t capable of handling it just yet, it must remain a manual task.</p>
<p><strong>Machines give you “faster”; you still need to figure out “better.”</strong> Does the AI solution provide an appreciable improvement over the manual approach? You’ll need to run tests—the kind where there is an objective, observable, independently verifiable definition of success—to figure this out. Importantly, you’ll need to run these tests <em>before</em> modifying your org chart.</p>
<p><strong>The machines’ speed will multiply the number and scale of any errors.</strong> This includes the error of using AI where it’s a poor fit. Avoid doing the wrong thing, just faster.</p>
<p>This is of special concern in light of the wider adoption of AI-on-AI interactions, such as agents. One bot going out of control is bad enough. Multiple bots going out of control, while interacting with each other, can lead to a meltdown.</p>
<p><strong>Technology still requires human experience.</strong> While bots have taken over the moment-to-moment stock market action, they’re built by teams of experts. The computers are useless unless backed up by your team’s collective domain knowledge, expertise, and safety practices.</p>
<p><strong>Tune your risk/reward trade-off.</strong> Yes, you’ll want to develop controls and safeguards to protect yourself from the machines going off the rails. And you’ll need to think about this at every stage of the project, from conception to R&D to deployment and beyond. Yes.</p>
<p>Yes, <em>and,</em> you’ll want to think beyond your downside exposures to consider your upside gain. Well-placed AI can bring about massive returns on investment for your company. But only if you choose the AI projects for which the risk/reward trade-off plays in your favor.</p>
<p><strong>You’re only in competition with yourself.</strong> Traders try to <em>get ahead of</em> each other, to detect price movements and place their orders before anyone else. And they <em>place trades with</em> one another, each taking a different side of the same bet (and hunting for counterparties who will make bad bets). But in the end, as a trader, you’re only <em>in competition with</em> yourself: “How did I do today, compared to yesterday? How do I avoid mishaps today, so I can do this again tomorrow?”</p>
<p>The same holds for your use of AI. Executives are under pressure—whether from their investors, their board, or simple FOMO as they read about what other companies are doing—to apply AI anywhere, everywhere. It’s best to look inside and figure out what AI can do for you, instead of trying to copycat the competition or using AI for AI’s sake.</p>
<h2 class="wp-block-heading">What if…?</h2>
<p>I opened with a question about algo trading, so it’s fitting that I close on one. To set the stage:</p>
<p>In the early days of data science—a good 15 years before GenAI came around—I hypothesized that traders and quants would do well in this field. It was a smaller and calmer version of what they were already doing, and they had internalized all kinds of best practices from their higher-stakes environment. “If Wall Street pay ever sinks low enough that those people leave,” I mused, “the data field will definitely change.”</p>
<p>Wall Street comp never sank far enough for that to happen. Which is good for the folks who still work in that field. But it also means I never got to thoroughly test my hypothesis. I still wonder, though:</p>
<p><em>What if more people with algo trading experience had entered the data science field early, and had spread their influence?</em></p>
<p>Imagine if, in the early to mid-2010s, a good portion of corporate data departments were built and staffed by former traders, quants, and similar finance professionals. Would we still see the meteoric rise of GenAI? Would companies be just as excited to throw AI at every possible problem? Or would we see a smaller, more focused, more effective use of data analysis in the pursuit of profit?</p>
<p>In the most likely alternate reality, the companies that genuinely need AI are doing well at it. Those that would have passed up on AI in our timeline come much closer to reaching their full AI potential here. In both cases the data team is deeply connected to, and focused on, the business mission. They adhere to metrics that allow them to track model performance. To that point, the use of those AI models is based on what those systems are capable of doing rather than what someone wishes they could do.</p>
<p>Importantly, these quant-run shops exhibit a stronger appreciation of risk-taking and risk management. I use those terms in the finance sense, which involves fine-tuning one’s risk/reward trade-off. You don’t just close off the downsides of using automated decision making; you aggressively pursue additional opportunities for upside gain. That involves rigorous testing during the R&D phase, plus plenty of human oversight once the models are running in production. It’s very much a matter of discipline. (Compare that to our timeline, in which the Move Fast and Break Things mindset has bolstered the Just Go Ahead and Do It approach.)</p>
<p>Interestingly enough, this alternate timeline still sports plenty of companies that use AI solely for the cool factor. There are just no quants or traders in those AI departments. Those people are finely attuned to using data in service of the business goal, so a frivolous use of AI sends them running for the exit. If they even join the company in the first place.</p>
<p>All in all, the companies in the alternate timeline that need AI are doing quite well. Those that don’t need AI, they’re still making the snake oil vendors very happy.</p>
<p>Today’s GenAI hype machine would certainly disagree with me. But I’ll point out that the GenAI hype doesn’t hold a candle to the tangible, widespread impact of the computerization of trading.</p>
<p>Food for thought.</p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/taming-the-delightful-chaos/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>The Abstractions, They Are A-Changing</title>
<link>https://www.oreilly.com/radar/the-abstractions-they-are-a-changing/</link>
<comments>https://www.oreilly.com/radar/the-abstractions-they-are-a-changing/#respond</comments>
<pubDate>Tue, 12 Aug 2025 11:00:19 +0000</pubDate>
<dc:creator><![CDATA[Mike Loukides]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17263</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Context-engineering-is-key.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[We’re Beginning to Understand What’s Next]]></custom:subtitle>
<description><![CDATA[Since ChatGPT appeared on the scene, we’ve known that big changes were coming to computing. But it’s taken a few years for us to understand what they were. Now, we’re starting to understand what the future will look like. It’s still hazy, but we’re starting to see some shapes—and the shapes don’t look like “we won’t […]]]></description>
<content:encoded><![CDATA[
<p>Since ChatGPT appeared on the scene, we’ve known that big changes were coming to computing. But it’s taken a few years for us to understand what they were. Now, we’re starting to understand what the future will look like. It’s still hazy, but we’re starting to see some shapes—and the shapes don’t look like “we won’t need to program any more.” But what <em>will</em> we need?</p>
<p>Martin Fowler recently <a href="https://martinfowler.com/articles/2025-nature-abstraction.html" target="_blank" rel="noreferrer noopener">described</a> the force driving this transformation as the biggest change in the level of abstraction since the invention of high-level languages, and that’s a good place to start. If you’ve ever programmed in assembly language, you know what that first change means. Rather than writing individual machine instructions, you could write in languages like Fortran or COBOL or BASIC or, a decade later, C. While we now have much better languages than early Fortran and COBOL—and both languages have evolved, gradually acquiring the features of modern programming languages—the conceptual difference between Rust and an early Fortran is much, much smaller than the difference between Fortran and assembler. There was a fundamental change in abstraction. Instead of using mnemonics to abstract away hex or octal opcodes (to say nothing of patch cables), we could write formulas. Instead of testing memory locations, we could control execution flow with for loops and if branches.</p>
<p>The change in abstraction that language models have brought about is every bit as big. We no longer need to use precisely specified programming languages with small vocabularies and syntax that limited their use to specialists (who we call “programmers”). We can use natural language—with a huge vocabulary, flexible syntax, and lots of ambiguity. The <em>Oxford English Dictionary</em> contains over 600,000 words; the last time I saw a complete English grammar reference, it was four very large volumes, not a page or two of BNF. And we all know about ambiguity. Human languages thrive on ambiguity; it’s a feature, not a bug. With LLMs, we can describe what we want a computer to do in this ambiguous language rather than writing out every detail, step-by-step, in a formal language. That change isn’t just about “vibe coding,” although it does allow experimentation and demos to be developed at breathtaking speed. And that change won’t be the disappearance of programmers because everyone knows English (at least in the US)—not in the near future, and probably not even in the long term. Yes, people who have never learned to program, and who won’t learn to program, will be able to use computers more fluently. But we will continue to need people who understand the transition between human language and what a machine actually does. We will still need people who understand how to break complex problems into simpler parts. And we will especially need people who understand how to manage the AI when it goes off course—when the AI starts generating nonsense, when it gets stuck on an error that it can’t fix. If you follow the hype, it’s easy to believe that those problems will vanish into the dustbin of history. But anyone who has used AI to generate nontrivial software knows that we’ll be stuck with those problems, and that it will take professional programmers to solve them.</p>
<p>The change in abstraction does mean that what software developers do will change. We have been writing about that for the past few years: more attention to testing, more attention to up-front design, more attention to reading and analyzing computer-generated code. The lines continue to change, as simple code completion turned to interactive AI assistance, which changed to agentic coding. But there’s a seismic change coming from the deep layers underneath the prompt and we’re only now beginning to see that.</p>
<p>A few years ago, everyone talked about “prompt engineering.” Prompt engineering was (and remains) a poorly defined term that sometimes meant using tricks as simple as “tell it to me with horses” or “tell it to me like I am five years old.” We don’t do that so much any more. The models have gotten better. We still need to write the prompts that software uses to interact with AI. That’s a different, and more serious, side of prompt engineering that won’t disappear as long as we’re embedding models in other applications.</p>
<p>More recently, we’ve realized that it’s not just the prompt that’s important. It’s not just telling the language model what you want it to do. Lying beneath the prompt is the context: the history of the current conversation, what the model knows about your project, what the model can look up online or discover through the use of tools, and even (in some cases) what the model knows about you, as expressed in all your interactions. The task of understanding and managing the context has recently become known as <a href="https://blog.langchain.com/the-rise-of-context-engineering/" target="_blank" rel="noreferrer noopener">context engineering</a>.</p>
<p>Context engineering must account for what can go wrong with context. That will certainly evolve over time as models change and improve. And we’ll also have to deal with the same dichotomy that prompt engineering faces: A programmer managing the context while generating code for a substantial software project isn’t doing the same thing as someone designing context management for a software project that incorporates an agent, where errors in a chain of calls to language models and other tools are likely to multiply. These tasks are related, certainly. But they differ as much as “explain it to me with horses” differs from reformatting a user’s initial request with dozens of documents pulled from a retrieval system (RAG).</p>
<p>Drew Breunig has written an excellent pair of articles on the topic: “<a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" target="_blank" rel="noreferrer noopener">How Long Contexts Fail</a>” and “<a href="https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html" target="_blank" rel="noreferrer noopener">How to Fix Your Context</a>.” I won’t enumerate (maybe I should) the context failures and fixes that Drew describes, but I will describe some things I’ve observed:</p>
<ul class="wp-block-list">
<li>What happens when you’re working on a program with an LLM and suddenly everything goes sour? You can tell it to fix what’s wrong, but the fixes don’t make things better and often make them worse. Something is wrong with the context, but it’s hard to say what and even harder to fix it.</li>
<li>It’s been noticed that, with long-context models, the beginning and the end of the context window get the most attention. Content in the middle of the window is likely to be ignored. How do you deal with that?</li>
<li>Web browsers have accustomed us to pretty good (if not perfect) interoperability. But different models use their context and respond to prompts differently. Can we have interoperability between language models?</li>
<li>What happens when hallucinated content becomes part of the context? How do you prevent that? How do you clear it?</li>
<li>At least when using chat frontends, some of the most popular models are implementing conversation history: They will remember what you said in the past. While this can be a good thing (you can say “always use 4-space indents” once), again, what happens if it remembers something that’s incorrect?</li>
</ul>
<p>“Quit and start again with another model” can solve many of these problems. If Claude isn’t getting something right, you can go to Gemini or GPT, which will probably do a good job of understanding the code Claude has already written. They are likely to make different errors—but you’ll be starting with a smaller, cleaner context. Many programmers describe bouncing back and forth between different models, and I’m not going to say that’s bad. It’s similar to asking different people for their perspectives on your problem.</p>
<p>But that can’t be the end of the story, can it? Despite the hype and the breathless pronouncements, we’re still experimenting and learning how to use generative coding. “Quit and start again” might be a good solution for proof-of-concept projects or even single-use software (“<a href="https://www.ohad.com/2025/07/10/voidware/" target="_blank" rel="noreferrer noopener">voidware</a>”) but hardly sounds like a good solution for enterprise software, which, as we know, has lifetimes measured in decades. We rarely program that way, and for the most part, we shouldn’t. It sounds too much like a recipe for repeatedly getting 75% of the way to a finished project only to start again, to find out that Gemini solves Claude’s problem but introduces its own. Drew has interesting suggestions for specific problems—such as using RAG to determine which MCP tools to use so the model won’t be confused by a large library of irrelevant tools. At a higher level, we need to think about what we really need to do to manage context. What tools do we need to understand what the model knows about any project? When we need to quit and start again, how do we save and restore the parts of the context that are important?</p>
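<p>To make that concrete, here is a minimal sketch of the kind of tool selection Drew describes: score the available tool descriptions against the user’s request and hand the model only the top few. Everything in it (the tool names, the bag-of-words scoring used as a stand-in for a real embedding model) is illustrative, not any particular MCP library’s API.</p>
<pre class="wp-block-code"><code># Sketch: keep the model's context small by selecting only the tools that look
# relevant to this request, instead of sending the whole tool library.
# The similarity function is a crude bag-of-words stand-in for a real embedding
# model; swap in your embedding service of choice.

import math
from collections import Counter

TOOLS = {  # hypothetical tool descriptions
    "create_invoice": "Create a new invoice for a customer account",
    "search_tickets": "Search the support ticket system by keyword or date",
    "run_sql_query": "Run a read-only SQL query against the analytics database",
    "send_email": "Send an email to a customer or internal mailing list",
    "get_weather": "Get the current weather forecast for a city",
}

def vectorize(text):
    """Very rough stand-in for an embedding: a lowercase bag of words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b or 1.0)

def select_tools(request, keep=2):
    """Rank tool descriptions against the request and keep the top few."""
    request_vec = vectorize(request)
    scored = [(cosine(request_vec, vectorize(desc)), name)
              for name, desc in TOOLS.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:keep]]

# Only the highest-scoring tools go into the context window.
print(select_tools("find the support ticket about the failed invoice email"))
</code></pre>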
<p>Several years ago, O’Reilly author Allen Downey suggested that in addition to a source code repo, we need a prompt repo to save and track prompts. We also need an output repo that saves and tracks the model’s output tokens—both its discussion of what it has done and any reasoning tokens that are available. And we need to track anything that is added to the context, whether explicitly by the programmer (“here’s the spec”) or by an agent that is querying everything from online documentation to in-house CI/CD tools and meeting transcripts. (We’re ignoring, for now, agents where context must be managed by the agent itself.)</p>
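<p>What might that look like in practice? Here is a minimal sketch, assuming nothing fancier than a local directory under git; the names are illustrative, and a real system would also record model versions, reasoning tokens where available, and links back to the code each prompt produced.</p>
<pre class="wp-block-code"><code># Sketch: a "prompt repo" that lives alongside the source repo. Each
# interaction is written out as a timestamped JSON record so that prompts,
# model output, and anything added to the context can be tracked and diffed
# like code. Assumes git is installed and configured on the machine.

import json
import pathlib
import subprocess
from datetime import datetime, timezone

REPO_DIR = pathlib.Path("prompt-repo")  # illustrative location

def log_interaction(prompt, output, context_additions=None):
    """Save one prompt/response pair, plus anything added to the context."""
    REPO_DIR.mkdir(exist_ok=True)
    if not (REPO_DIR / ".git").exists():
        subprocess.run(["git", "init"], cwd=REPO_DIR, check=True)

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    record = {
        "timestamp": stamp,
        "prompt": prompt,
        "output": output,
        "context_additions": context_additions or [],  # specs, docs, transcripts
    }
    path = REPO_DIR / f"{stamp}.json"
    path.write_text(json.dumps(record, indent=2))

    subprocess.run(["git", "add", path.name], cwd=REPO_DIR, check=True)
    subprocess.run(["git", "commit", "-m", f"interaction {stamp}"], cwd=REPO_DIR, check=True)

log_interaction(
    prompt="Refactor the auth middleware to use the new session store.",
    output="(the model's response and any reasoning tokens would be captured here)",
    context_additions=["docs/session-store-spec.md"],
)
</code></pre>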
<p>But that just describes <em>what</em> needs to be saved—it doesn’t tell you where the context should be saved or how to reason about it. Saving context in an AI provider’s cloud seems like a <a href="https://www.techdirt.com/2025/06/16/why-centralized-ai-is-not-our-inevitable-future/" target="_blank" rel="noreferrer noopener">problem waiting to happen</a>; what are the consequences of letting OpenAI, Anthropic, Microsoft, or Google keep a transcript of your thought processes or the contents of internal documents and specifications? (In a <a href="https://techcrunch.com/2025/07/31/your-public-chatgpt-queries-are-getting-indexed-by-google-and-other-search-engines/" target="_blank" rel="noreferrer noopener">short-lived experiment</a>, ChatGPT chats were indexed and findable by Google searches.) And we’re still learning how to reason about context, which may well require another AI. Meta-AI? Frankly, that feels like a cry for help. We know that context engineering is important. We don’t yet know how to engineer it, though we’re starting to get some hints. (Drew Breunig said that we’ve been doing context engineering for the past year, but we’ve only started to understand it.) It’s more than just cramming as much as possible into a large context window—that’s a recipe for failure. It will involve knowing how to locate parts of the context that aren’t working, and ways of retiring those ineffective parts. It will involve determining what information will be the most valuable and helpful to the AI. In turn, that may require better ways of observing a model’s internal logic, something Anthropic has been <a href="https://www.anthropic.com/research/tracing-thoughts-language-model" target="_blank" rel="noreferrer noopener">researching</a>.</p>
<p>Whatever is required, it’s clear that context engineering is the next step. We don’t think it’s the last step in understanding how to use AI to aid software development. There are still problems like discovering and using organizational context, sharing context among team members, developing architectures that work at scale, designing user experiences, and much more. Martin Fowler’s observation that there’s been a change in the level of abstraction is likely to have huge consequences: benefits, surely, but also new problems that we don’t yet know how to think about. We’re still negotiating a route through uncharted territory. But we need to take the next step if we plan to get to the end of the road.</p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, <strong>Coding for the Future Agentic World</strong>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.</em><br><br><em><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener">Register now to save your seat</a>.</em></p>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/the-abstractions-they-are-a-changing/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>AI’s Swiss Cheese</title>
<link>https://www.oreilly.com/radar/ais-swiss-cheese/</link>
<pubDate>Mon, 11 Aug 2025 13:04:01 +0000</pubDate>
<dc:creator><![CDATA[Tim O’Reilly]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17252</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/AIs-Swiss-Cheese.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[Takeaways from My Conversation with Matthew Prince of Cloudflare]]></custom:subtitle>
<description><![CDATA[Last month, I spoke with Matthew Prince, cofounder and CEO of Cloudflare, as part of my ongoing series of public conversations, Live with Tim O’Reilly. Cloudflare made waves with its July 1 announcement that it will block AI crawlers by default and give content owners the ability to decide who can use their material for […]]]></description>
<content:encoded><![CDATA[
<p>Last month, I spoke with Matthew Prince, cofounder and CEO of Cloudflare, as part of my ongoing series of public conversations, <a href="https://www.oreilly.com/live/live-with-tim/" target="_blank" rel="noreferrer noopener"><em>Live with Tim O’Reilly</em></a>. Cloudflare made waves with its <a href="https://www.cloudflare.com/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/" target="_blank" rel="noreferrer noopener">July 1 announcement</a> that it will block AI crawlers by default and give content owners the ability to decide who can use their material for training AI models. That’s a big deal for anyone who creates original work for the web.</p>
<p>Obviously, I really care about this issue. O’Reilly’s mission is to <a href="https://www.oreilly.com/about/" target="_blank" rel="noreferrer noopener">share the knowledge of innovators</a>, and we depend on the ability to reach customers so they can hear directly from those innovators. We’re in the midst of a fundamental shift from the old search-driven web, which rewarded content creators with traffic, to an AI-driven internet where the bots give the answers and the original sources may never be seen. As Matthew put it, “Bots don’t click on ads. They don’t subscribe. And they don’t give you the validation of knowing people are reading your content.” (We don’t advertise, but we do depend on subscribers and à la carte content purchasers.)</p>
<p>It is true that we’re still early in the evolution of the business model for AI, but better business models don’t happen in a vacuum. Cloudflare is giving content creators tools for expressing their preferences. It’s been too easy for AI companies to grab whatever content they can get their hands on without compensation or credit. Cloudflare is starting down the path of building market mechanisms for humans to express our preferences to roving bots. Now we have to get those who deploy those bots to respect not just the wishes of the bots’ creators but also the wishes of those whose content or services the bots consume.</p>
<h2 class="wp-block-heading"><strong>Filling Holes: From the Internet’s Missing Security Layer to AI’s Swiss Cheese</strong></h2>
<p>I kicked things off by asking Matthew to describe Cloudflare’s mission. His short answer for cocktail parties is “serve the internet faster and protect it from bad guys.” But when I asked him to go “more slowly” (<a href="https://www.oreilly.com/radar/more-slowly/" target="_blank" rel="noreferrer noopener">a Proust reference I’ve used before</a>), he said, “When you are sitting down and writing out the protocols for the internet back in the ’60s, ’70s, and ’80s, we had no idea what it was going to become. . . .Cloudflare is trying to go back and fill in those holes—security, privacy, reliability—that should have been there from the beginning.”</p>
<p>He then applied this notion of filling holes to AI. The metaphor he kept returning to is Swiss cheese. It’s full of holes. AI models are too, and they need high-quality content to fill them. The humans who create that content need to be compensated, but the AI companies have just skated right by this issue, much as the original developers of the internet ignored security, with later consequences for all of us.</p>
<p>Matthew walked us through the numbers. Ten years ago, the deal with Google was roughly two pages scraped for every unique visitor sent back. With the rise of “answer boxes” and now AI overviews, that ratio has gone from 2:1 to 18:1. For AI companies like OpenAI and Anthropic, the imbalance is even more extreme: OpenAI’s ratio is around 1,500 to 1. Anthropic’s is around 60,000 to 1. Matthew put it bluntly: “That breaks the business model of the web.”</p>
<p>From the user’s point of view, these direct answers are a convenience. From the creator’s point of view, they remove the link between value creation and value capture. This is another version of what <em>The New York Times</em> called “<a href="https://www.oreilly.com/radar/how-to-fix-ais-original-sin/" target="_blank" rel="noreferrer noopener">AI’s original sin</a>.”</p>
<h2 class="wp-block-heading"><strong>Avoiding the Medici Future</strong></h2>
<p>Another metaphor that I loved for its historical resonance was what Matthew called the “Medici future” of journalism: “You could imagine five big AI companies, each employing all the journalists and researchers they need. There’s a conservative one, a liberal one. . . .But the decentralizing power of the web could be replaced by massive centralization.”</p>
<p>The flowering of the arts under the Medicis in Renaissance Florence was wonderful, but do we really want to return to that world of artistic patronage dependent on immense wealth inequality? Cloudflare’s first step is requiring what I call Unix-style “user, group, world” permissions for AI crawlers so there can be a functioning market for content. Step two is making sure that the market is fair with a level playing field for big and small AI companies alike.</p>
<p>We found ourselves in fierce agreement here. For too long, platforms have taught us to chase clicks as a proxy for value. I liked Matthew’s take: “Traffic is a bad approximation for value. . . .A better world is one where we identify the holes in the Swiss cheese and reward the people who fill them, both monetarily <em>and</em> with recognition.”</p>
<h2 class="wp-block-heading"><strong>AI Security and the Arms Race</strong></h2>
<p>Cloudflare has long used machine learning to detect and block malicious traffic. AI will make attacks more sophisticated, but it can also help build better defenses. “Whoever has the most data tends to win in the world of AI,” he said. “On balance, I think the world becomes more secure because of AI rather than less secure.”</p>
<p>He also described a growing push for cryptographic identification of bots so we can build an “industry spam filter” for AI crawlers. That’s a direction I’d love to see become a standard, much as the anti-spam community developed shared blocklists and reputation systems in the early 2000s.</p>
<p>Matthew ended with a vision I share: This AI disruption could be the moment we build a better web, one that rewards quality and originality instead of sheer volume.</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>“If this is the end of the wild, spammy, engagement-driven internet, maybe that’s a good thing. . . .Let’s compensate the people who fill the holes in the Swiss cheese, reward them, celebrate them—and maybe we’ll build a better web.”</p>
</blockquote>
<p>That’s exactly the conversation we need to be having now, before the future gets locked in.</p>
<p><em>Sorry to have no video excerpts for this chat. Due to technical difficulties, the conversation was audio only.</em></p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, <strong>Coding for the Future Agentic World</strong>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.</em><br><br><em><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener">Register now to save your seat</a>.</em></p>
]]></content:encoded>
</item>
<item>
<title>Context Engineering: Bringing Engineering Discipline to Prompts—Part 1</title>
<link>https://www.oreilly.com/radar/context-engineering-bringing-engineering-discipline-to-prompts-part-1/</link>
<pubDate>Mon, 11 Aug 2025 11:08:42 +0000</pubDate>
<dc:creator><![CDATA[Addy Osmani]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Deep Dive]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17241</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Context-Engineering_Firefly-Image-e1754910447413.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[From “Prompt Engineering” to “Context Engineering”]]></custom:subtitle>
<description><![CDATA[The following is Part 1 of 3 from Addy Osmani’s original post “Context Engineering: Bringing Engineering Discipline to Prompts.” Context Engineering Tips: To get the best results from an AI, you need to provide clear and specific context. The quality of the AI’s output directly depends on the quality of your input. How to improve […]]]></description>
<content:encoded><![CDATA[
<p class="has-cyan-bluish-gray-background-color has-background"><em>The following is Part 1 of 3 from Addy Osmani’s original post “</em><a href="https://addyo.substack.com/p/context-engineering-bringing-engineering" target="_blank" rel="noreferrer noopener"><em>Context Engineering: Bringing Engineering Discipline to Prompts</em></a><em>.”</em></p>
<h3 class="wp-block-heading">Context Engineering Tips:</h3>
<p>To get the best results from an AI, you need to provide clear and specific context. The quality of the AI’s output directly depends on the quality of your input.</p>
<p>How to improve your AI prompts:</p>
<ul class="wp-block-list">
<li><strong>Be precise:</strong> Vague requests lead to vague answers. The more specific you are, the better your results will be.</li>
<li><strong>Provide relevant code:</strong> Share the specific files, folders, or code snippets that are central to your request.</li>
<li><strong>Include design documents:</strong> Paste or attach sections from relevant design docs to give the AI the bigger picture.</li>
<li><strong>Share full error logs:</strong> For debugging, always provide the complete error message and any relevant logs or stack traces.</li>
<li><strong>Show database schemas:</strong> When working with databases, a screenshot of the schema helps the AI generate accurate code for data interaction.</li>
<li><strong>Use PR feedback:</strong> Comments from a pull request make for context-rich prompts.</li>
<li><strong>Give examples:</strong> Show an example of what you want the final output to look like.</li>
<li><strong>State your constraints:</strong> Clearly list any requirements, such as libraries to use, patterns to follow, or things to avoid.</li>
</ul>
<p><strong>Prompt engineering was about cleverly phrasing a question; context engineering is about constructing an entire information environment so the AI can solve the problem reliably.</strong></p>
<p>“Prompt engineering” became a buzzword essentially meaning the skill of phrasing inputs to get better outputs. It taught us to “program in prose” with clever one-liners. But outside the AI community, many took prompt engineering to mean just typing fancy requests into a chatbot. The term never fully conveyed the real sophistication involved in using LLMs effectively.</p>
<p>As applications grew more complex, the limitations of focusing only on a single prompt became obvious. One analysis quipped: <em>Prompt engineering walked so context engineering could run.</em> In other words, a witty one-off prompt might have wowed us in demos, but building <strong>reliable, industrial-strength LLM systems</strong> demanded something more comprehensive.</p>
<p>This realization is why our field is coalescing around <strong>“context engineering”</strong> as a better descriptor for the craft of getting great results from AI. Context engineering means constructing the entire <strong>context window</strong> an LLM sees—not just a short instruction, but all the relevant background info, examples, and guidance needed for the task.</p>
<p>The phrase was popularized by developers like Shopify’s CEO Tobi Lütke and AI leader Andrej Karpathy in mid-2025.</p>
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><em>“I really like the term ‘context engineering’ over prompt engineering,”</em> wrote Tobi. <em>“It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.”</em> Karpathy emphatically agreed, noting that <em>“people associate prompts with short instructions, whereas in every serious LLM application, </em><strong><em>context engineering</em></strong><em> is the delicate art and science of filling the context window with just the right information for each step</em>.<em>”</em></p>
</blockquote>
<p>In other words, real-world LLM apps don’t succeed by luck or one-shot prompts—they succeed by carefully assembling context around the model’s queries.</p>
<p>The change in terminology reflects an evolution in approach. If prompt engineering was about coming up with a magical sentence, context engineering is <a href="https://analyticsindiamag.com/ai-features/context-engineering-is-the-new-vibe-coding/#:~:text=If%20prompt%20engineering%20was%20about,in%20favour%20of%20context%20engineering" target="_blank" rel="noreferrer noopener">about</a> <strong>writing the full screenplay</strong> for the AI. It’s a structural shift: Prompt engineering ends once you craft a good prompt, whereas context engineering begins with designing whole systems that bring in memory, knowledge, tools, and data in an organized way.</p>
<p>As Karpathy explained, doing this well involves everything from <strong>clear task instructions</strong> and explanations, to providing few-shot examples, retrieved facts (RAG), possibly multimodal data, relevant tools, state history, and careful compacting of all that into a limited window. <strong>Too little context (or the wrong kind) and the model will lack the information to perform optimally; too much irrelevant context and you waste tokens or even degrade performance.</strong> <strong>The sweet spot is non-trivial to find.</strong> No wonder Karpathy calls it both a science and an art.</p>
<p>The term <strong>context engineering</strong> is catching on because it intuitively captures what we actually do when building LLM solutions. “Prompt” sounds like a single short query; “context” implies a richer information state we prepare for the AI.</p>
<p>Semantics aside, why does this shift matter? Because it marks a maturing of our mindset for AI development. We’ve learned that <strong>generative AI in production is less like casting a single magic spell and more like engineering an entire environment</strong> for the AI. A one-off prompt might get a cool demo, but for robust solutions you need to control what the model “knows” and “sees” at each step. It often means retrieving relevant documents, summarizing history, injecting structured data, or providing tools—whatever it takes so the model isn’t guessing in the dark. The result is we no longer think of prompts as one-off instructions we hope the AI can interpret. We think in terms of <strong>context pipelines</strong>: all the pieces of information and interaction that set the AI up for success.</p>
<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="468" height="468" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Prompt-Engineering-vs-Context-Engineering.png" alt="Prompt engineering vs. context engineering" class="wp-image-17242" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Prompt-Engineering-vs-Context-Engineering.png 468w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Prompt-Engineering-vs-Context-Engineering-300x300.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Prompt-Engineering-vs-Context-Engineering-160x160.png 160w" sizes="auto, (max-width: 468px) 100vw, 468px" /></figure>
<p>To illustrate, consider the difference in perspective. Prompt engineering was often an exercise in clever wording (“Maybe if I phrase it this way, the LLM will do what I want”). Context engineering, by contrast, feels more like traditional engineering: <em>What inputs (data, examples, state) does this system need? How do I get those and feed them in? In what format? At what time?</em> We’ve essentially gone from squeezing performance out of a single prompt to designing <em>LLM-powered systems</em>.</p>
<h2 class="wp-block-heading"><strong>What Exactly <em>Is</em> Context Engineering?</strong></h2>
<p><strong>Context engineering means dynamically giving an AI everything it needs to succeed—the instructions, data, examples, tools, and history—all packaged into the model’s input context at runtime.</strong></p>
<p>A useful <a href="https://blog.langchain.com/context-engineering-for-agents/#:~:text=As%20Andrej%20Karpathy%20puts%20it%2C,Karpathy%20summarizes%20this%20well" target="_blank" rel="noreferrer noopener">mental model</a> (suggested by Andrej Karpathy and others) is to think of an LLM like a CPU, and its context window (the text input it sees at once) as the RAM or working memory. As an engineer, your job is akin to an operating system: <strong>load that working memory with just the right code and data for the task</strong>. In practice, this context can come from many sources: the user’s query, system instructions, retrieved knowledge from databases or documentation, outputs from other tools, and summaries of prior interactions. Context engineering is about orchestrating all these pieces into the prompt that the model ultimately sees. It’s not a static prompt but a dynamic assembly of information at runtime.</p>
<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="468" height="468" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Context-Engineering.png" alt="Illustration: multiple sources of information are composed into an LLM’s context window (its “working memory”). The context engineer’s goal is to fill that window with the right information, in the right format, so the model can accomplish the task effectively." class="wp-image-17244" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Context-Engineering.png 468w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Context-Engineering-300x300.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Context-Engineering-160x160.png 160w" sizes="auto, (max-width: 468px) 100vw, 468px" /><figcaption class="wp-element-caption"><em>Illustration: multiple sources of information are composed into an LLM’s context window (its “working memory”). The context engineer’s goal is to fill that window with the right information, in the right format, so that the model can accomplish the task effectively.</em></figcaption></figure>
<p>Let’s break down what this involves:</p>
<ul class="wp-block-list">
<li><strong>It’s a system, not a one-off prompt.</strong> In a well-engineered setup, the final prompt the LLM sees might include several components: e.g., a role instruction written by the developer, plus the latest user query, plus relevant data fetched on the fly, plus perhaps a few examples of desired output format. All of that is woven together programmatically. For example, imagine a coding assistant AI that gets the query “How do I fix this authentication bug?” The system behind it might automatically search your codebase for related code, retrieve the relevant file snippets, and then construct a prompt like: <em>“You are an expert coding assistant. The user is facing an authentication bug. Here are relevant code snippets: [code]. The user’s error message: [log]. Provide a fix.”</em> Notice how that final prompt is built from multiple pieces. <strong>Context engineering is the logic that decides which pieces to pull in and how to join them.</strong> It’s akin to writing a function that prepares arguments for another function call—except here, the “arguments” are bits of context and the function is the LLM invocation. (A minimal sketch of this kind of assembly appears just after this list.)</li>
<li><strong>It’s dynamic and situation-specific.</strong> Unlike a single hard-coded prompt, context assembly happens <em>per request</em>. The system might include different info depending on the query or the conversation state. If it’s a multi-turn conversation, you might include a summary of the conversation so far, rather than the full transcript, to save space (and sanity). If the user’s question references some document (“What does the design spec say about X?”), the system might fetch that spec from a wiki and include the relevant excerpt. In short, context engineering logic <em>responds</em> to the current state—much like how a program’s behavior depends on input. This dynamic nature is crucial. You wouldn’t feed a translation model the exact same prompt for every sentence you translate; you’d feed it the new sentence each time. Similarly, in an AI agent, you’re constantly updating what context you give as the state evolves.</li>
<li><strong>It blends multiple types of content.</strong> LangChain <a href="https://blog.langchain.com/context-engineering-for-agents/#:~:text=What%20are%20the%20types%20of,a%20few%20different%20context%20types" target="_blank" rel="noreferrer noopener">describes</a> context engineering as an umbrella that covers at least three facets of context: (1) <strong>Instructional context</strong>—the prompts or guidance we provide (including system role instructions and few-shot examples), (2) <strong>Knowledge context</strong>—domain information or facts we supply, often via retrieval from external sources, and (3) <strong>Tools context</strong>—information coming from the model’s environment via tools or API calls (e.g., results from a web search, database query, or code execution). A robust LLM application often needs all three: clear instructions about the task, relevant knowledge plugged in, and possibly the ability for the model to use tools and then incorporate the tool results back into its thinking. Context engineering is the discipline of managing all these streams of information and merging them coherently.</li>
<li><strong>Format and clarity matter.</strong> It’s not just <em>what</em> you include in the context, but <em>how</em> you present it. Communicating with an AI model has surprising parallels to communicating with a human: If you dump a huge blob of unstructured text, the model might get confused or miss the point, whereas a well-organized input will guide it. Part of context engineering is figuring out how to compress and structure information so the model grasps what’s important. This could mean summarizing long texts, using bullet points or headings to highlight key facts, or even formatting data as JSON or pseudo-code if that helps the model parse it. For instance, if you retrieved a document snippet, you might preface it with something like “Relevant documentation:” and put it in quotes, so the model knows it’s reference material. If you have an error log, you might show only the last 5 lines rather than 100 lines of stack trace. Effective context engineering often involves creative <strong>information design</strong>—making the input as digestible as possible for the LLM.</li>
</ul>
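<p>Picking up the coding-assistant example from the first bullet, here is a minimal sketch of that assembly step. The in-memory “codebase” and the keyword scoring are deliberately naive stand-ins for real retrieval, and every name in it is illustrative rather than any particular product’s implementation.</p>
<pre class="wp-block-code"><code># Sketch: assemble the context for one request from several sources: a fixed
# role instruction, code retrieved from the project, the tail of an error log,
# and the user's question. The keyword "retrieval" is deliberately naive;
# a real assistant would use a search index or embeddings.

CODEBASE = {  # illustrative files
    "auth/login.py": "def login(request):\n    session_id = request.cookies['sid']\n    ...",
    "auth/session.py": "def create_session(user):\n    ...",
    "auth/middleware.py": "def require_session(view):\n    ...",
}

SYSTEM_INSTRUCTION = "You are an expert coding assistant. Be concise and cite file paths."

def retrieve_snippets(query, limit=2):
    """Naive retrieval: score files by how many query words they contain."""
    words = set(query.lower().replace("?", " ").split())
    scored = []
    for path, code in CODEBASE.items():
        score = sum(1 for w in words if w in path.lower() or w in code.lower())
        scored.append((score, path, code))
    scored.sort(reverse=True)
    return scored[:limit]

def build_prompt(user_query, error_log):
    """Weave instruction, retrieved code, logs, and the query into one prompt."""
    sections = [SYSTEM_INSTRUCTION]
    for _, path, code in retrieve_snippets(user_query):
        sections.append(f"Relevant code ({path}):\n{code}")
    # Keep only the tail of the log; the last few lines usually carry the clue.
    tail = "\n".join(error_log.splitlines()[-5:])
    sections.append(f"Error log (last lines):\n{tail}")
    sections.append(f"User question:\n{user_query}")
    return "\n\n".join(sections)

print(build_prompt(
    "How do I fix this authentication bug in login?",
    "Traceback (most recent call last):\n  ...\nKeyError: 'session_id'",
))
</code></pre>
<p>The point is not the retrieval heuristic; it is that the final prompt is assembled from labeled pieces, each of which can be swapped out or improved independently.</p>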
<p>Above all, context engineering is about <strong>setting the AI up for success</strong>.</p>
<p>Remember, an LLM is powerful but not psychic—it can only base its answers on what’s in its input plus what it learned during training. If it fails or hallucinates, often the root cause is that we didn’t give it the right context, or we gave it poorly structured context. When an LLM “agent” misbehaves, usually <em>“the appropriate context, instructions and tools have not been communicated to the model.”</em> Garbage in, garbage out. Conversely, if you <em>do</em> supply all the relevant info and clear guidance, the model’s performance improves dramatically.</p>
<h4 class="wp-block-heading"><strong>Feeding high-quality context: Practical tips</strong></h4>
<p>Now, concretely, how do we ensure we’re giving the AI everything it needs? Here are some pragmatic tips that I’ve found useful when building AI coding assistants and other LLM apps:</p>
<ul class="wp-block-list">
<li><strong>Include relevant source code and data.</strong> If you’re asking an AI to work on code, provide the relevant code files or snippets. Don’t assume the model will recall a function from memory—show it the actual code. Similarly, for Q&A tasks include the pertinent facts or documents (via retrieval). <em>Low context guarantees low-quality output.</em> The model can’t answer what it hasn’t been given.</li>
<li><strong>Be precise in instructions.</strong> Clearly state what you want. If you need the answer in a certain format (JSON, specific style, etc.), mention that. If the AI is writing code, specify constraints like which libraries or patterns to use (or avoid). Ambiguity in your request can lead to meandering answers.</li>
<li><strong>Provide examples of the desired output.</strong> Few-shot examples are powerful. If you want a function documented in a certain style, show one or two examples of properly documented functions in the prompt. Modeling the output helps the LLM understand exactly what you’re looking for.</li>
<li><strong>Leverage external knowledge.</strong> If the task needs domain knowledge beyond the model’s training (e.g., company-specific details, API specs), retrieve that info and put it in the context. For instance, attach the relevant section of a design doc or a snippet of the API documentation. LLMs are far more accurate when they can cite facts from provided text rather than recalling from memory.</li>
<li><strong>Include error messages and logs when debugging.</strong> If asking the AI to fix a bug, show it the full error trace or log snippet. These often contain the critical clue needed. Similarly, include any test outputs if asking why a test failed.</li>
<li><strong>Maintain conversation history (smartly).</strong> In a chat scenario, feed back important bits of the conversation so far. Often you don’t need the entire history—a concise summary of key points or decisions can suffice and saves token space. This gives the model context of what’s already been discussed. (See the sketch after this list for one way to keep a rolling summary.)</li>
<li><strong>Don’t shy away from metadata and structure.</strong> Sometimes telling the model <em>why</em> you’re giving a piece of context can help. For example: <em>“Here is the user’s query.”</em> or <em>“Here are relevant database schemas:”</em> as prefacing labels. Simple section headers like “User Input: … / Assistant Response: …” help the model parse multi-part prompts. Use formatting (markdown, bullet lists, numbered steps) to make the prompt logically clear.</li>
</ul>
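<p>As a simplified illustration of the history point above, here is a sketch that keeps a rolling summary plus only the most recent turns. The summarize function below just truncates text; in a real system it would be a call to a model asking it to compress the older turns.</p>
<pre class="wp-block-code"><code># Sketch: compact conversation history instead of replaying every turn.
# Older turns are folded into a running summary; only the last few turns are
# kept verbatim. summarize() is a placeholder for a real LLM summarization call.

RECENT_TURNS_TO_KEEP = 4
SUMMARY_CHAR_BUDGET = 400

def summarize(old_summary, old_turns):
    """Placeholder: a real implementation would ask a model to compress this."""
    combined = old_summary + " " + " ".join(turn["text"] for turn in old_turns)
    return combined[:SUMMARY_CHAR_BUDGET].strip()

def compact_history(summary, turns):
    """Fold everything except the most recent turns into the summary."""
    older = turns[:-RECENT_TURNS_TO_KEEP]
    recent = turns[-RECENT_TURNS_TO_KEEP:]
    if not older:
        return summary, recent
    return summarize(summary, older), recent

def render_context(summary, turns, user_query):
    """Build the history portion of the prompt with explicit labels."""
    lines = [f"Conversation summary so far:\n{summary}", "Recent turns:"]
    for turn in turns:
        lines.append(f"{turn['role']}: {turn['text']}")
    lines.append(f"User: {user_query}")
    return "\n".join(lines)

history = [
    {"role": "User", "text": "Set up the project with 4-space indents."},
    {"role": "Assistant", "text": "Done. Using 4-space indents everywhere."},
    {"role": "User", "text": "Now add a login endpoint."},
    {"role": "Assistant", "text": "Added auth/login.py with a login() view."},
    {"role": "User", "text": "And a logout endpoint."},
    {"role": "Assistant", "text": "Added logout() next to login()."},
]
summary, recent = compact_history("", history)
print(render_context(summary, recent, "Why is logout returning a 500 error?"))
</code></pre>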
<p>Remember the golden rule: <strong>LLMs are powerful but they aren’t mind-readers.</strong> The quality of output is directly proportional to the quality and relevance of the context you provide. Too little context (or missing pieces) and the AI will fill gaps with guesses (often incorrect). Irrelevant or noisy context can be just as bad, leading the model down the wrong path. So our job as context engineers is to feed the model exactly what it needs and nothing it doesn’t.</p>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, <strong>Coding for the Future Agentic World</strong>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.</em><br><br><em><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener">Register now to save your seat</a>.</em></p>
]]></content:encoded>
</item>
<item>
<title>Generative AI in the Real World: Jay Alammar on Building AI for the Enterprise</title>
<link>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-jay-alammar-on-building-ai-for-the-enterprise/</link>
<comments>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-jay-alammar-on-building-ai-for-the-enterprise/#respond</comments>
<pubDate>Thu, 07 Aug 2025 19:00:10 +0000</pubDate>
<dc:creator><![CDATA[Ben Lorica and Jay Alammar]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Podcast]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?post_type=podcast&p=17223</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2024/01/Podcast_Cover_GenAI_in_the_Real_World-scaled.png"
medium="image"
type="image/png"
/>
<description><![CDATA[Jay Alammar, director and Engineering Fellow at Cohere, joins Ben Lorica to talk about building AI applications for the enterprise, using RAG effectively, and the evolution of RAG into agents. Listen in to find out what kinds of metadata you need when you’re onboarding a new model or agent; discover how an emphasis on evaluation […]]]></description>
<content:encoded><![CDATA[
<p>Jay Alammar, director and Engineering Fellow at Cohere, joins Ben Lorica to talk about building AI applications for the enterprise, using RAG effectively, and the evolution of RAG into agents. Listen in to find out what kinds of metadata you need when you’re onboarding a new model or agent; discover how an emphasis on evaluation helps an organization improve its processes; and learn how to take advantage of the latest code-generation tools.</p>
<p><strong>About the <em>Generative AI in the Real World</em> podcast:</strong> In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In <em>Generative AI in the Real World</em>, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.</p>
<p>Check out <a href="https://learning.oreilly.com/playlists/42123a72-1108-40f1-91c0-adbfb9f4983b/?_gl=1*16z5k2y*_ga*MTE1NDE4NjYxMi4xNzI5NTkwODkx*_ga_092EL089CH*MTcyOTYxNDAyNC4zLjEuMTcyOTYxNDAyNi41OC4wLjA." target="_blank" rel="noreferrer noopener">other episodes</a> of this podcast on the O’Reilly learning platform.</p>
<h3 class="wp-block-heading">Timestamps</h3>
<ul class="wp-block-list">
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=0" target="_blank" rel="noreferrer noopener">0:00</a>: Introduction to Jay Alammar, director at <a href="https://cohere.com/" target="_blank" rel="noreferrer noopener">Cohere</a>. He’s also the author of <em>Hands-On Large Language Models</em>.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=30" target="_blank" rel="noreferrer noopener">0:30</a>: What has changed in how you think about teaching and building with LLMs?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=45" target="_blank" rel="noreferrer noopener">0:45</a>: This is my fourth year with Cohere. I really love the opportunity because it was a chance to join the team early (around the time of GPT-3). Aidan Gomez, one of the cofounders, was one of the coauthors of the transformers paper. I’m a student of how this technology went out of the lab and into practice. Being able to work in a company that’s doing that has been very educational for me. That’s a little of what I use to teach. I use my writing to learn in public. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=140" target="_blank" rel="noreferrer noopener">2:20</a>: I assume there’s a big difference between learning in public and teaching teams within companies. What’s the big difference?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=156" target="_blank" rel="noreferrer noopener">2:36</a>: If you’re learning on your own, you have to run through so much content and news, and you have to mute a lot of it as well. This industry moves extremely fast. Everyone is overwhelmed by the pace. For adoption, the important thing is to filter a lot of that and see what actually works, what patterns work across use cases and industries, and write about those. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=205" target="_blank" rel="noreferrer noopener">3:25</a>: That’s why something like RAG proved itself as one application paradigm for how people should be able to use language models. A lot of it is helping people cut through the hype and get to what’s actually useful, and raise AI awareness. There’s a level of AI literacy that people need to come to grips with. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=250" target="_blank" rel="noreferrer noopener">4:10</a>: People in companies want to learn things that are contextually relevant. For example, if you’re in finance, you want material that will help deal with Bloomberg and those types of data sources, and material aware of the regulatory environment. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=278" target="_blank" rel="noreferrer noopener">4:38</a>: When people started being able to understand what this kind of technology was capable of doing, there were multiple lessons the industry needed to understand. Don’t think of chat as the first thing you should deploy. Think of simpler use cases, like summarization or extraction. Think about these as building blocks for an application. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=328" target="_blank" rel="noreferrer noopener">5:28</a>: It’s unfortunate that the name “generative AI” came to be used because the most important things AI can do aren’t generative: they’re the representation with embeddings that enable better categorization, better clustering, and enabling companies to make sense of large amounts of data. The next lesson was to not rely on a model’s information. In the beginning of 2023, there were so many news stories about the models being a search engine. People expected the model to be truthful, and they were surprised when it wasn’t. One of the first solutions was RAG. RAG tries to retrieve the context that will hopefully contain the answer. The next question was data security and data privacy: They didn’t want data to leave their network. That’s where private deployment of models becomes a priority, where the model comes to the data. With that, they started to deploy their initial use cases. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=484" target="_blank" rel="noreferrer noopener">8:04</a>: Then that system can answer questions to a specific level of difficulty—but with more difficulty, the system needs to be more advanced. Maybe it needs to search for multiple queries or do things over multiple steps. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=511" target="_blank" rel="noreferrer noopener">8:31</a>: One thing we learned about RAG was that just because something is in the context window doesn’t mean the machine won’t hallucinate. And people have developed more appreciation of applying even more context: GraphRAG, context engineering. Are there specific trends that people are doing more of? I got excited about GraphRAG, but this is hard for companies. What are some of the trends within the RAG world that you’re seeing?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=582" target="_blank" rel="noreferrer noopener">9:42</a>: Yes, if you provide the context, the model might still hallucinate. The answers are probabilistic in nature. The same model that can answer your questions 99% of the time correctly might…</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=610" target="_blank" rel="noreferrer noopener">10:10</a>: Or the models are black boxes and they’re opinionated. The model may have seen something in its pretraining data. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=625" target="_blank" rel="noreferrer noopener">10:25</a>: True. And if you’re training a model, there’s that trade-off; how much do you want to force the model to answer from the context versus general common sense?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=655" target="_blank" rel="noreferrer noopener">10:55</a>: That’s a good point. You might be feeding conspiracy theories in the context windows. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=664" target="_blank" rel="noreferrer noopener">11:04</a>: As a model creator, you always think about generalization and how the model can be the best model across the many use cases.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=675" target="_blank" rel="noreferrer noopener">11:15</a>: The evolution of RAG: There are multiple levels of difficulty that can be built into a RAG system. The first is to search one data source, get the top few documents, and add them to the context. Then RAG systems can be improved by saying, “Don’t search for the user query itself, but give the question to a language model to say ‘What query should I ask to answer this question?’” That became query rewriting. Then for the model to improve its information gathering, give it the ability to search for multiple things at the same time—for example, comparing NVIDIA’s results in 2023 and 2024. A more advanced system would search for two documents, asking multiple queries. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=795" target="_blank" rel="noreferrer noopener">13:15</a>: Then there are models that ask multiple queries in sequence. For example, what are the top car manufacturers in 2024, and do they each make EVs? The best process is to answer the first question, get that list, and then send a query for each one. Does Toyota make an EV? Then you see the agent building this behavior. Some of the top features are the ones we’ve described: query rewriting, using search engines, deciding when it has enough information, and doing things sequentially.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=878" target="_blank" rel="noreferrer noopener">14:38</a>: Earlier in the pipeline—as you take your PDF files, you study them and take advantage of them. Nirvana would be a knowledge graph. I’m hearing about teams taking advantage of the earlier part of the pipeline. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=933" target="_blank" rel="noreferrer noopener">15:33</a>: This is a design pattern we’re seeing more and more of. When you’re onboarding, give the model an onboarding phase where it can collect information, store it someplace that can help it interact. We see a lot of metadata for agents that deal with databases. When you onboard to a database system, it would make sense for you to give the model a sense of what the tables are, what columns they have. You see that also with a repository, with products like Cursor. When you onboard the model to a new codebase, it would make sense to give it a Markdown page that tells it the tech stack and the test frameworks. Maybe after implementing a large enough chunk, do a check-in after running the test. Regardless of having models that can fit a million tokens, managing that context is very important.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1043" target="_blank" rel="noreferrer noopener">17:23</a>: And if your retrieval gives you the right information, why would you stick a million tokens in the context? That’s expensive. And people are noticing that LLMs behave like us: They read the beginning of the context and the end. They miss things in the middle. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1072" target="_blank" rel="noreferrer noopener">17:52</a>: Are you hearing people doing GraphRAG, or is it a thing that people write about but few are going down this road?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1098" target="_blank" rel="noreferrer noopener">18:18</a>: I don’t have direct experience with it.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1104" target="_blank" rel="noreferrer noopener">18:24</a>: Are people asking for it?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1107" target="_blank" rel="noreferrer noopener">18:27</a>: I can’t cite much clamor. I’ve heard of lots of interesting developments, but there are lots of interesting developments in other areas. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1125" target="_blank" rel="noreferrer noopener">18:45</a>: The people talking about it are the graph people. One of the patterns I see is that you get excited, and a year in you realize that the only people talking about it are the vendors.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1156" target="_blank" rel="noreferrer noopener">19:16</a>: Evaluation: You’re talking to a lot of companies. I’m telling people “Your eval is IP.” So if I send you to a company, what are the first few things they should be doing?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1188" target="_blank" rel="noreferrer noopener">19:48</a>: That’s one of the areas where companies should really develop internal knowledge and capabilities. It’s how you’re able to tell which vendor is better for your use case. In the realm of software, it’s akin to unit tests. You need to differentiate and understand what use cases you’re after. If you haven’t defined those, you aren’t going to be successful. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1230" target="_blank" rel="noreferrer noopener">20:30</a>: You set yourself up for success if you define the use cases that you want. You gather internal examples with your exact internal data, and that can be a small dataset. But that will give you so much direction.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1250" target="_blank" rel="noreferrer noopener">20:50</a>: That might force you to develop your process too. When do you send something to a person? When do you send it to another model?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1264" target="_blank" rel="noreferrer noopener">21:04</a>: That grounds people’s experience and expectations. And you get all the benefits of unit tests. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1293" target="_blank" rel="noreferrer noopener">21:33</a>: What’s the level of sophistication of a regular enterprise in this area?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1300" target="_blank" rel="noreferrer noopener">21:40</a>: I see people developing quite quickly because the pickup in language models is tremendous. It’s an area where companies are catching up and investing. We’re seeing a lot of adoption of tool use and RAG and companies defining their own tools. But it’s always a good thing to continue to advocate.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1344" target="_blank" rel="noreferrer noopener">22:24</a>: What are some of the patterns or use cases that are common now that people are happy about, that are delivering on ROI?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1360" target="_blank" rel="noreferrer noopener">22:40</a>: RAG and grounding it on internal company data is one area where people can really see a type of product that was not possible a few years ago. Once a company deploys a RAG model, other things come to mind like multimodality: images, audio, video. Multimodality is the next horizon.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1401" target="_blank" rel="noreferrer noopener">23:21</a>: Where are we on multimodality in the enterprise?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1407" target="_blank" rel="noreferrer noopener">23:27</a>: It’s very important, specifically if you are looking at companies that rely on PDFs. There’s charts and images in there. In the medical field, there’s a lot of images. We’ve seen that embedding models can also support images.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1442" target="_blank" rel="noreferrer noopener">24:02</a>: Video and audio are always the orphans.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1447" target="_blank" rel="noreferrer noopener">24:07</a>: Video is difficult. Only specific media companies are leading the charge. Audio, I’m anticipating lots of developments this year. It hasn’t caught up to text, but I’m expecting a lot of audio products to come to market. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1481" target="_blank" rel="noreferrer noopener">24:41</a>: One of the earliest use cases was software development and coding. Is that an area that you folks are working in?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1491" target="_blank" rel="noreferrer noopener">24:51</a>: Yes, that is my focus area. I think a lot about code-generation agents.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1501" target="_blank" rel="noreferrer noopener">25:01</a>: At this point, I would say that most developers are open to using code-generation tools. What’s your sense of the level of acceptance or resistance?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1526" target="_blank" rel="noreferrer noopener">25:26</a>: I advocate for people to try out the tools and understand where they’re strong and where they’re lacking. I’ve found the tools very useful, but you need to assert ownership and understand how LLMs evolved from being writers of functions (which is how evaluation benchmarks were written a year ago) to more advanced software engineering, where the model needs to solve larger problems across multiple steps and stages. Models are now evaluated on SWE-bench, where the input is a GitHub issue. Go and solve the GitHub issue, and we’ll evaluate it when the unit tests pass.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1617" target="_blank" rel="noreferrer noopener">26:57</a>: Claude Code is quite good at this, but it will burn through a lot of tokens. If you’re working in a company and it solves a problem, that’s fine. But it can get expensive. That’s one of my pet peeves—but we’re getting to the point where I can only write software when I’m connected to the internet. I’m assuming that the smaller models are also improving and we’ll be able to work offline.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1665" target="_blank" rel="noreferrer noopener">27:45</a>: 100%. I’m really excited about smaller models. They’re catching up so quickly. What we could only do with the bigger models two years ago, now you can do with a model that’s 2B or 4B parameters.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1697" target="_blank" rel="noreferrer noopener">28:17</a>: One of the buzzwords is agents. I assume most people are in the early phases—they’re doing simple, task-specific agents, maybe multiple agents working in parallel. But I think multi-agents aren’t quite there yet. What are you seeing?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1731" target="_blank" rel="noreferrer noopener">28:51</a>: Maturity is still evolving. We’re still in the early days for LLMs as a whole. People are seeing that if you deploy them in the right contexts, under the right user expectations, they can solve many problems. When built in the right context with access to the right tools, they can be quite useful. But the end user remains the final expert. The model should show the user its work and its reasons for saying something and its sources for the information, so the end user becomes the final arbiter.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1809" target="_blank" rel="noreferrer noopener">30:09</a>: I tell nontech users that you’re already using agents if you’re using one of these deep research tools.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1820" target="_blank" rel="noreferrer noopener">30:20</a>: Advanced RAG systems have become agents, and deep research is maybe one of the more mature systems. It’s really advanced RAG that’s really deep.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1840" target="_blank" rel="noreferrer noopener">30:40</a>: There are finance startups that are building deep research tools for analysts in the finance industry. They’re essentially agents because they’re specialized. Maybe one agent is going for earnings. You can imagine an agent for knowledge work.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1875" target="_blank" rel="noreferrer noopener">31:15</a>: And that’s the pattern that is maybe the more organic growth out of the single agent.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1889" target="_blank" rel="noreferrer noopener">31:29</a>: And I know developers who have multiple instances of Claude Code doing something that they will bring together. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1901" target="_blank" rel="noreferrer noopener">31:41</a>: We’re at the beginning of discovering and exploring. We don’t really have the user interfaces and systems that have evolved enough to make the best out of this. For code, it started out in the IDE. Some of the earlier systems that I saw used the command line, like Aider, which I assumed was the inspiration for Claude Code. It’s definitely a good way to augment AI in the IDE.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1945" target="_blank" rel="noreferrer noopener">32:25</a>: There’s new generations of the terminal even: Warp and marimo, that are incorporating many of these developments.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=1959" target="_blank" rel="noreferrer noopener">32:39</a>: Code extends beyond what software engineers are using. The general user requires some level of code ability in the agent, even if they’re not reading the code. If you tell the model to give you a bar chart, the model is writing Matplotlib code. Those are agents that have access to a run environment where they can write the code to give to the user, who’s an analyst, not a software engineer. Code is the most interesting area of focus.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2013" target="_blank" rel="noreferrer noopener">33:33</a>: When it comes to agents or RAG, it’s a pipeline that starts from the source documents to the information extraction strategy—it becomes a system that you have to optimize end to end. When RAG came out, it was just a bunch of blog posts saying that we should focus on chunking. But now people realize this is an end-to-end system. Does this make it a much more formidable challenge for an enterprise team? Should they go with a RAG provider like Cohere or experiment themselves?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2080" target="_blank" rel="noreferrer noopener">34:40</a>: It depends on the company and the capacity they have to throw at this. In a company that needs a database, they can build one from scratch, but maybe that’s not the best approach. They can outsource or acquire it from a vendor. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2105" target="_blank" rel="noreferrer noopener">35:05</a>: Each of those steps has 20 choices, so there’s a combinatorial explosion.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2116" target="_blank" rel="noreferrer noopener">35:16</a>: Companies are under pressure to show ROI quickly and realize the value of their investment. That’s an area where using a vendor that specializes is helpful. There are a lot of options: the right search systems, the right connectors, the workflows and the pipelines and the prompts. Query rewriting and rewriting. In our education content, we describe all of those. But if you’re going to build a system like this, it will take a year or two. Most companies don’t have that kind of time. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2177" target="_blank" rel="noreferrer noopener">36:17</a>: Then you realize you need other enterprise features like security and access control. In closing: Most companies aren’t going to train their own foundation models. It’s all about MCP, RAG, and posttraining. Do you think companies should have a basic AI platform that will allow them to do some posttraining?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2222" target="_blank" rel="noreferrer noopener">37:02</a>: I don’t think it’s necessary for most companies. You can go far with a state-of-the-art model if you interact with it on the level of prompt engineering and context management. That can get you so far. And you benefit from the rising tide of the models improving. You don’t even need to change your API. That rising tide will continue to be helpful and beneficial. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2259" target="_blank" rel="noreferrer noopener">37:39</a>: Companies that have that capacity and capability, and maybe that’s closer to the core of what their product is, things like fine tuning are things where they can distinguish themselves a little bit, especially if they’re tried things like RAG and prompt engineering. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2292" target="_blank" rel="noreferrer noopener">38:12</a>: The superadvanced companies are even doing reinforcement fine-tuning.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2302" target="_blank" rel="noreferrer noopener">38:22</a>: The recent development in foundation models are multimodalities and reasoning. What are you looking forward to on the foundation model front that is still below the radar?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2328" target="_blank" rel="noreferrer noopener">38:48</a>: I’m really excited to see more of these text diffusion models. Diffusion is a different type of system where you’re not generating your output token by token. We’ve seen it in image and video generation. The output in the beginning is just static noise. But then the model generates another image, refining the output so it becomes more and more clear. For text, that takes another format. If you’re emitting output token by token, you’re already committed to the first two or three words. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2397" target="_blank" rel="noreferrer noopener">39:57</a>: With text diffusion models, you have a general idea you want to express. You have an attempt at expressing it. And another attempt where you change all the tokens, not one by one. Their output speed is absolutely incredible. It increases the speed, but also could pose new paradigms or behaviors.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2438" target="_blank" rel="noreferrer noopener">40:38</a>: Can they reason?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2440" target="_blank" rel="noreferrer noopener">40:40</a>: I haven’t seen demos of them doing reasoning. But that’s one area that could be promising.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2451" target="_blank" rel="noreferrer noopener">40:51</a>: What should companies think about the smaller models? Most people on the consumer side are interacting with the large models. What’s the general sense for the smaller models moving forward? My sense is that they will prove sufficient for most enterprise tasks.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2493" target="_blank" rel="noreferrer noopener">41:33</a>: True. If the companies have defined the use cases they want and have found a smaller model that can satisfy this, they can deploy or assign that task to a small model. It will be smaller, faster, lower latency, and cheaper to deploy.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Jay_Alammar.mp3#t=2522" target="_blank" rel="noreferrer noopener">42:02</a>: The more you identify the individual tasks, the more you’ll be able to say that a small model can do the tasks reliably enough. I’m very excited about small models. I’m more excited about small models that are capable than large models.</li>
</ul>
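<p>To make the “unit tests for your use cases” idea from this conversation concrete, here is a minimal sketch in Python. The <code>EvalCase</code> structure, the keyword check, and the <code>model_fn</code> stand-in are illustrative assumptions for this post, not anything Cohere or another vendor ships; a real eval set would use a team’s own internal examples and a richer scoring rule.</p>
<pre class="wp-block-code"><code># A minimal sketch: treating a handful of internal examples as
# "unit tests" for a generative AI use case. Everything here is
# illustrative; swap in your own examples and scoring rule.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str            # an internal example, e.g., a support request
    expected_keyword: str  # a crude pass/fail signal for this sketch

EVAL_SET = [
    EvalCase("Summarize the attached claims policy.", "deductible"),
    EvalCase("Draft a reply to a late-shipment complaint.", "apologize"),
]

def run_evals(model_fn):
    """Score a candidate model or vendor on the internal eval set."""
    passed = 0
    for case in EVAL_SET:
        output = model_fn(case.prompt)
        if case.expected_keyword in output.lower():
            passed += 1
    return passed / len(EVAL_SET)

# Usage: compare two hypothetical vendors on exactly the same cases.
# score_a = run_evals(vendor_a_generate)
# score_b = run_evals(vendor_b_generate)
</code></pre>
<p>Swapping <code>model_fn</code> lets a team compare vendors or model sizes on exactly the same internal cases, which is the comparison the episode argues most companies skip.</p>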
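<p>And here is a minimal sketch of the other pattern described above: an assistant with access to a run environment that writes Matplotlib code so an analyst only sees the finished chart. The <code>generate_code</code> function is a hypothetical stand-in for an LLM call, and executing model-written code in production would of course require sandboxing.</p>
<pre class="wp-block-code"><code># A minimal sketch of a code-writing agent with a run environment.
# generate_code stands in for an LLM call; executing model output
# would need sandboxing in any real deployment.

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

def generate_code(request, data):
    # Hypothetical: in a real system, an LLM would write this snippet
    # in response to the analyst's request.
    return (
        "fig, ax = plt.subplots()\n"
        "ax.bar(list(data.keys()), list(data.values()))\n"
        "ax.set_title('Revenue by region')\n"
        "fig.savefig('chart.png')\n"
    )

def run_agent(request, data):
    code = generate_code(request, data)
    # The "run environment": execute the generated code with the
    # analyst's data in scope, then hand back the rendered artifact.
    exec(code, {"plt": plt, "data": data})
    return "chart.png"

print(run_agent("Give me a bar chart of revenue by region",
                {"EMEA": 120, "AMER": 200, "APAC": 90}))
</code></pre>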
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-jay-alammar-on-building-ai-for-the-enterprise/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>The Future of Product Management Is AI-Native</title>
<link>https://www.oreilly.com/radar/the-future-of-product-management-is-ai-native/</link>
<pubDate>Thu, 07 Aug 2025 14:54:18 +0000</pubDate>
<dc:creator><![CDATA[Tim O’Reilly]]></dc:creator>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17225</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/08/Colorful-Waves.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[Takeaways from My Conversation with Marily Nika]]></custom:subtitle>
<description><![CDATA[In my recent Live with Tim O’Reilly interview, I spoke with Marily Nika, author of Building AI-Powered Products and one of the most thoughtful voices at the intersection of AI and product management. We talked about what it means to build products in the age of AI—and how the role of product manager is being […]]]></description>
<content:encoded><![CDATA[
<p>In my recent <em>Live with Tim O’Reilly</em> interview, I spoke with Marily Nika, author of <a href="https://www.oreilly.com/library/view/building-ai-powered-products/9781098152697/" target="_blank" rel="noreferrer noopener"><em>Building AI-Powered Products</em></a> and one of the most thoughtful voices at the intersection of AI and product management. We talked about what it means to build products in the age of AI—and how the role of product manager is being redefined in real time. This is a subject that’s near and dear to me as I work with the O’Reilly team to take <a href="https://www.oneusefulthing.org/p/the-bitter-lesson-versus-the-garbage" target="_blank" rel="noreferrer noopener">the bitter lesson</a> to heart and rethink all of our processes and products in light of the new capabilities of AI. (For additional perspective, see also <a href="https://www.dbreunig.com/2025/08/01/does-the-bitter-lesson-have-limits.html" target="_blank" rel="noreferrer noopener">Drew Breunig’s critique</a> of the bitter lesson as applied to corporate AI strategy.)</p>
<p>Marily started in AI product management at Google back in 2013, before most of us even called it that. Today, she argues, this is no longer a niche skill set. It’s becoming THE job. “All product managers will be AI product managers,” she said. But she also warned against what she called the “shiny object trap”—using AI just to keep up with the hype. Good PMs must stay grounded in user pain points and product strategy. AI should be used only when it’s the best possible solution. “Use cases haven’t changed,” Marily noted. “People still want the same things. What’s changed is how we can solve for them.”</p>
<h2 class="wp-block-heading"><strong>Marily’s Rapid Prototyping Workflow</strong></h2>
<p>One of the most exciting parts of our conversation was hearing about Marily’s rapid prototyping workflow using Perplexity for user research, custom GPTs for spec generation in her own voice, and v0 for UI mockups. With these tools, she can go from idea to functional prototype in hours, not weeks. “Every week I block time on my calendar just for AI experimentation. It’s made me a much better PM,” she said.</p>
<p><iframe loading="lazy" width="560" height="315" src="https://www.youtube.com/embed/ZuzCiQ2HeaY?si=xlODpLPhb8ej7QX0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></p>
<p>I hadn’t thought about limiting a search to Reddit to mine for user pain points. That’s brilliant.</p>
<p>One of our live attendees asked a thoughtful question: Is there such a thing as “vibe PMing”? Here’s Marily’s answer:</p>
<p><iframe loading="lazy" width="560" height="315" src="https://www.youtube.com/embed/DTmUqg8Lsdg?si=f8pRnEsXQcm7f6qU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></p>
<p>We also discussed when to prioritize polish over speed—and whether AI can help us do both. “AI is a slider, not a switch. You have to decide how much to use it at each stage,” she noted.</p>
<p>Marily also laid out three emerging product manager archetypes:</p>
<ul class="wp-block-list">
<li>AI builder PMs, who work on the models themselves</li>
<li>AI experience PMs, who craft novel UX with those models</li>
<li>AI-enhanced PMs, who use AI to amplify traditional product work</li>
</ul>
<p>That’s real food for thought, and something that we’ll have to dig deeper into as we continue to develop our O’Reilly live training curriculum for AI-centered product management.</p>
<h2 class="wp-block-heading"><strong>Strategy Meets Implementation</strong></h2>
<p>We talked about a theme close to my heart: the PM as translator between strategy and implementation.</p>
<p>I’m very influenced by my wife Jen Pahlka’s work on government transformation, as described in her book <a href="https://www.recodingamerica.us/" target="_blank" rel="noreferrer noopener"><em>Recoding America</em></a>. In her telling, product management is the skill of shaping not just what to do when developing a product but also what <em>not</em> to do. Government is in many ways an extreme case, with mandates developed by nontechnical members of Congress and their staff, or by administrative agencies, with little attention given to the details of how those mandates will be implemented, whether the implied implementation will work, or even if the specifications are implementable! But those lessons are also often surprisingly relevant for those of us in the corporate world.</p>
<p>Two stories stick in my mind. The first is about a PM at the Centers for Medicare & Medicaid Services who was faced with a spec that she thought was unimplementable. Conflicting mandates from Congress meant that doctors would be required to sign up for a program three months before they’d receive the information they needed to make that decision. Changing the spec would have been next to impossible. So she made the bold decision to override it, reasoning that Congress had specified quarterly reporting because they didn’t understand that it would be possible to create an API to provide real-time updates. The second is about a project leader who recognized that the project as specified wouldn’t work but said, “If they tell us to build a concrete boat, we’ll build a concrete boat.”</p>
<p>In her response to my extended tirade, Marily emphasized that while PMs don’t run day-to-day delivery, they must understand the trade-offs between latency, cost, UX, privacy, and feasibility—especially in AI development. You don’t need to build a concrete boat just because someone told you to.</p>
<h2 class="wp-block-heading"><strong>Shared Tools and Team AI Adoption</strong></h2>
<p>One of the best attendee questions of the hour—one that was so good that I am using it as part of the framing of a longer post I’m working on about AI for groups—was “What are some tips on dealing with the fact that we are currently working in teams, but in silos of individual AI assistants?” (This question was from someone identified only as DP. For some reason, many of our corporate customers don’t want their employees to be identified by name or affiliation in the chat for our live events, which is too bad. DP, if you happen to read this post, please reach out. I’d love to chat with you more about this idea. If you know my name, you know my email.)</p>
<p><iframe loading="lazy" width="560" height="315" src="https://www.youtube.com/embed/ffdkQat6hJE?si=ohslW7y8oPS-Ww7P" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></p>
<p>As you can see from the video excerpt, Marily completely agreed that this is a problem. AI use is still often siloed and secretive in teams, she noted—people are afraid they’re “cheating” by using it. She called for teams to be open and collaborative about their AI workflows: create shared prompt libraries, use group tools like <a href="https://notebooklm.google/" target="_blank" rel="noreferrer noopener">NotebookLM</a>, and normalize AI use with shared agents and systems.</p>
<p>It occurred to me based on her response that NotebookLM may have a good start as a platform for shared AI work by nondevelopers, because it inherits many of the collaboration features from Google Drive and the associated family of Google productivity apps. In a similar way, AI for developers relies on GitHub for most of its “groupware” capabilities.</p>
<p>But that highlights just how weak LLMs themselves are in this area. Leaning on external infrastructure is not a substitute for native features. For example, how might an LLM instance have a group memory, not just user memory? How might it include version control? How might we share an AI workflow itself, rather than just sending around links to outputs, much as we used to send around Word and Excel files before 2005, when Google Docs taught us there was a better way?</p>
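<p>As a small illustration of the shared-prompt-library idea, here’s a minimal sketch, with the caveat that the file name and JSON structure are assumptions for the example rather than a reference to any particular product: a team keeps its prompts in a single file under version control, so changes get the same review and history as code.</p>
<pre class="wp-block-code"><code># A minimal sketch of a shared prompt library kept in version control.
# The path and structure are illustrative assumptions.

import json
from pathlib import Path

LIBRARY_PATH = Path("prompts/team_prompt_library.json")

def load_prompts():
    """Every teammate reads the same shared file."""
    return json.loads(LIBRARY_PATH.read_text())

def add_prompt(name, template, author):
    """Add or update a prompt; the change is reviewed like any code change."""
    prompts = load_prompts() if LIBRARY_PATH.exists() else {}
    prompts[name] = {"template": template, "author": author}
    LIBRARY_PATH.parent.mkdir(parents=True, exist_ok=True)
    LIBRARY_PATH.write_text(json.dumps(prompts, indent=2))

# Usage:
# add_prompt("spec_draft", "Write a one-page spec for {feature}.", "pm_team")
</code></pre>
<p>A shared file in a repo is a crude stand-in for the group memory and version control that LLM products don’t yet provide natively, but it at least gets prompts out of individual silos.</p>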
<h2 class="wp-block-heading"><strong>The Rise of AI-Native PMs</strong></h2>
<p>In response to another audience question, we talked about Andreessen Horowitz’s claim that the world’s largest company might well be an AI healthtech company. How might someone in healthcare get into AI product management? Marily gave a powerful reminder: You don’t need to be an AI expert to get started. Now is the time. No matter what your job is today, you can learn, experiment, and build with AI. Lean into your healthcare expertise. She told a story from one of <a href="https://learning.oreilly.com/live-training/courses/~/0636920076705/" target="_blank" rel="noreferrer noopener">her product management live courses on the O’Reilly platform</a> that illustrated how one user had made the transition from a small hardware company into an AI healthtech opportunity at Apple.</p>
<p><iframe loading="lazy" width="560" height="315" src="https://www.youtube.com/embed/tiN4iYGuJB0?si=wJUiIUv57_Iwpra2" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></p>
<p>We both agreed: We’re still early. Despite all the hype about the current market leaders, today’s AI is barely scratching the surface. Some of today’s dominant players may not survive. So many killer AI-native applications haven’t been invented yet. The future of AI is still up for grabs, and it’s up to us to build it.</p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p><em>Thanks to Marily for sharing her expertise with us, and to all of the O’Reilly customers whose questions are such an important part of our live events, including this one.</em></p>
<hr class="wp-block-separator has-alpha-channel-opacity is-style-wide"/>
<p class="has-cyan-bluish-gray-background-color has-background"><em>AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, <strong>Coding for the Future Agentic World</strong>, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.</em><br><br><em><a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener">Register now to save your seat</a>.</em></p>
]]></content:encoded>
</item>
</channel>
</rss>