<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:georss="http://www.georss.org/georss"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
>
<channel>
<title>tecosystems</title>
<atom:link href="https://redmonk.com/sogrady/feed/" rel="self" type="application/rss+xml" />
<link>https://redmonk.com/sogrady</link>
<description>because technology is just another ecosystem</description>
<lastBuildDate>Tue, 22 Oct 2024 14:02:21 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<generator>https://wordpress.org/?v=6.6.1</generator>
<item>
<title>The Narrows Bridge: From Open Source to AI</title>
<link>https://redmonk.com/sogrady/2024/10/22/from-open-source-to-ai/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Tue, 22 Oct 2024 14:02:21 +0000</pubDate>
<category><![CDATA[AI]]></category>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6055</guid>
<description><![CDATA[In November of 1938, construction began on the Tacoma Narrows bridge in Washington state. Twenty months later in July of 1940, it opened to traffic. Connecting Tacoma and the Kitsap Peninsula west of Seattle, it was at the time the third longest suspension bridge in the world. From the first day of construction it]]></description>
<content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington.jpg"><img fetchpriority="high" decoding="async" src="http://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington.jpg" alt="" width="768" height="606" class="aligncenter size-full wp-image-6056" srcset="https://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington.jpg 768w, https://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington-300x237.jpg 300w, https://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington-480x379.jpg 480w" sizes="(max-width: 768px) 100vw, 768px" /></a></p>
<p>In November of 1938, construction began on the Tacoma Narrows bridge in Washington state. Twenty months later in July of 1940, it opened to traffic. Connecting Tacoma and the Kitsap Peninsula west of Seattle, it was at the time the third longest suspension bridge in the world. From the first day of construction it was buffeted by high winds, winds that introduced substantial vertical movement into a structure engineered to avoid exactly that. Multiple efforts were made to mitigate these forces and keep them in check. It was an inauspicious beginning for an expensive and complex endeavor.</p>
<hr />
<p>Some sixty years after construction on that bridge began, a fraught and contentious debate in a then obscure corner of the technology industry resulted in both the term open source and the ten point <a href="https://opensource.org/osd">definition</a> that encapsulates it. While it was accorded little importance at the time, with the benefit of hindsight this discussion amongst passionate but largely unrecognized technology advocates was of monumental historical importance. Over the nearly three decades since its inception, open source has grown from a seemingly utopian academic curiosity to an industry default underpinning large capital markets.</p>
<p>For all of its success, however, open source has been besieged in recent years on multiple fronts. Most notably, a group of vendors and investors seeking to commercialize open source have attempted to blur that definition to the point of <a href="https://redmonk.com/sogrady/2023/08/03/why-opensource-matters/">meaninglessness</a> and irrelevance – a conflict that continues today. More recently, however, open source has come under intense pressure as reasonable and not so reasonable actors alike try to understand how the term applies to Artificial Intelligence (AI) systems and projects.</p>
<p>The term open source, of course, was originally coined to describe – and its corresponding definition was designed to apply to – source code. AI projects, however, are vastly broader in their scope. Source code is a component of AI projects, to be sure, but one among many, and in most cases not the most important. The varied other components, from data to parameters and weights, are functionally and legally quite distinct from source code. It is not clear, as but one example, whether copyright – the standard legal mechanism underpinning open source licenses – can be applied to embeddings, which are very long strings of numbers representing multidimensional transformations of various types of data.</p>
<p>Source code, in other words, is a precisely and narrowly bounded subject area. AI projects are not. Their scope blends software, data, techniques, biases and more. AI is inarguably a fundamentally different asset than software alone.</p>
<p>And yet, largely because vendors have been cavalierly throwing around the term open source to describe obviously non-open source projects that contain use restrictions that open source explicitly forbids, the Open Source Initiative (OSI) – stewards and defenders of the Open Source Definition (OSD) since 1998 – has been compelled to respond. The most egregious of the offenders in terms of misuse has been Meta, who has repeatedly described its Llama model as open source while omitting the fact that the license imposes restrictions – on usage by competitors, on certain uses and so on – that open source does not permit. Their behavior stands in stark contrast with their counterparts at Google, who have to their credit attempted to <a href="https://redmonk.com/sogrady/2024/02/26/ai-open-source/">hold the line</a> by being explicit that their Gemma model is open but that it does not meet the OSI&#8217;s definition of open source and therefore should not be considered such. Generally speaking, however, Meta&#8217;s careless approach is far more common than Google&#8217;s, which has resulted in pressure building on the OSI to reconsider its definition of what is and what is not open source within the new, arcane and rapidly evolving world of AI.</p>
<p>The entire debate about an open source AI definition (OSAID), then, has been driven by misuse and misrepresentation. It was at the same time implicitly predicated on a single core assumption: that open source and AI are, or can be made to be, compatible. Simply stated, the current process assumes that it is possible to achieve a definition of open source that is both consistent with long held open source ideals and community norms while being equally applicable and relevant to fast emerging AI projects and their various interested parties.</p>
<p>After months of observation and consideration of nascent AI projects, vendor efforts in the space and conversations with experts in the field as well as interested third party organizations, I no longer share that core assumption.</p>
<p>I do not believe the term open source can or should be extended into the AI world.</p>
<p>There are several problems with the application of open source to AI. These are arguably the most pressing.</p>
<h2>One of These Things is Not Like The Other</h2>
<p>As discussed above, software and AI are the proverbial apples and oranges – or perhaps more accurately, apples versus an apple pie. This by itself is, or should be, cause for concern. At its heart, the current deliberation around an open source definition for AI is an attempt to drag a term defined over two decades ago to describe a narrowly defined asset into the present to instead cover a brand new and far more complicated set of artifacts. The risks of doing so are substantial. First, trying to bend the original open source definition and its principles to apply to AI has the potential to fall well short of fully circumscribing the new project assets in all of their complexity, which is bad. Worse, however, is the prospect of perceived shortcomings in the OSAID bleeding into the trust of and faith in the tried and true original OSD. The implications of that are far reaching and highly concerning.</p>
<p>To properly address the greater complexities of AI projects, the new OSAID would need to grapple with far more complicated and nuanced issues than are involved with mere source code, and in so doing it almost certainly would have to resort to compromise. Which the release candidates to date, in fact, have.</p>
<p>AI and source code are simply too different to be neatly managed side by side. The complexity of AI demands complexity of licensing, which brings us to the problem of nuance.</p>
<h2>If You’re Explaining, You’re Losing</h2>
<p>An open source AI definition will inevitably have key areas of contention, principally around data sharing and availability. Idealists seeking to preserve and protect the bedrock principles of open source, for example, argue that any definition that doesn&#8217;t require training data compromises the <a href="https://en.wikipedia.org/wiki/Free_and_open-source_software">four key freedoms</a> that the original open source definition satisfies. The OSI, for its part, contends that in discussions with various AI researchers, the consensus opinion is that the weights are more important than the original training data. That position may or may not be correct. What is definitely true is that even if the assertion is correct, it is a nuanced position that is counterintuitive and requires lengthy explanation.</p>
<p>Similarly, data brings with it a host of legal complications – complications that are without precedent in the world of pure source code. Source code is inarguably less conflicted from a legal standpoint than, as but one example, medical data used to train AI models intended to assist in the early detection of cancer. The OSI&#8217;s approach to this – similar to what the Linux Foundation is trying to do with the Open Data Product Specification – is categorizing the various types of data. In the OSI&#8217;s case, there are four buckets of data: open, public, obtainable and unshareable. These are superficially self-descriptive, but also subtle, slippery and legalistic distinctions. Which means, ultimately, that they require nuance to be understood.</p>
<p>In more simple terms, both with respect to the availability of data and the types of data being made available, or not, the OSI is trying to thread a needle of balancing open source’s legacy of making everything available with the messy reality that is data availability. The idealists want training data required for obvious reasons. The pragmatic path, unfortunately, involves substantial compromise and, more problematically, requires explanation to be understood. And as the old political adage advises: “If you’re explaining, you’re losing.”</p>
<p>This is particularly true in this case, because in contrast to the OSI&#8217;s complicated position, critics have a simple and easy case to make – one made for black and white headlines and stories on Hacker News: if you&#8217;re not fully satisfying the four freedoms, you&#8217;re not open source. The reality might be that, at least in the case of large foundational models, even if all of the training data were made available, the number of entities that could leverage it and build their own replacements could be counted on one hand – two or three at the most. But reality does not determine perception.</p>
<p>To illustrate this, consider these two potential headlines:</p>
<ul>
<li>“The OSI’s Open Source Definition (OSD) mandates the release of all source code. Their Open Source AI Definition (OSAID), on the other hand, does not require the release of all the training data.” </li>
</ul>
<p>Versus</p>
<ul>
<li>“The OSI says that they <em>want</em> all training data released, but that requiring it would be problematic because of legal complexity and difficult to actually leverage given the dataset size. To clarify, they’ve created four different categories of data which they go into in detail here…” </li>
</ul>
<p>Optically, then, the pragmatic path is a minefield. One of the things that most RedMonk clients have heard at some point is: &#8220;the market generally has no ability to appreciate nuance.&#8221; In a world in which professional technology industry reporters are unable to distinguish between genuine open source – is it on this list of approved licenses or not? – and objectively non-open source, as in the case of licenses which are open except when they are not, there is essentially no chance that licenses which depend on levels and shifting definitions will be correctly interpreted.</p>
<p>The need to rely on nuance, then, to explain the OSAID seems inherently problematic.</p>
<h2>An Open Source AI Definition is…Mandatory?</h2>
<p>As described above, implicit in this entire multi-year process is an assumption that an open source definition that will satisfy a required consensus of parties is possible. There is, however, a second assumption underlying that, which is that AI requires a revised and updated open source definition. The most straightforward articulation of this idea arrives courtesy of <a href="https://allthingsopen.org/articles/the-open-source-ai-definition-why-we-need-it">Mark Collier</a>, who said:</p>
<blockquote><p>
This brings me to the Open Source AI Definition (OSAID), an effort organized by OSI over the past two years, which I have participated in alongside others from both traditional open source backgrounds and AI experts. It is often said that naming things is the hardest problem in software engineering, but in this case we started with a name, “Open Source AI,” and set out to define it.
</p></blockquote>
<p>This position is certainly understandable. Rogue actors such as Meta have been abusing the term open source despite being well aware that the arbitrary use restrictions (competitors can&#8217;t use it, you can&#8217;t use it to do certain things, etc.) attached to the license make it clearly and unambiguously not open source. Meta&#8217;s actions are bad enough, but arguably worse are the columnists who have enabled Meta&#8217;s behavior by victim blaming. In the absence of an OSI AI definition – which as argued above may not actually be achievable – these writers bafflingly hold the OSI responsible for Meta&#8217;s behavior.</p>
<p>In the face of this continuous assault on the OSD, then, the obvious response is for the OSI to respond to this willful misuse by way of an updated definition with more clarity and industry buy in.</p>
<p>Or is it?</p>
<p>Given that Meta paid no attention whatsoever to the original definition, which had been industry consensus for years, it&#8217;s not clear that an updated definition – again assuming, potentially counterfactually, that one is achievable – would change their behavior. Betting that it would seems questionable, particularly given the downsides discussed above if the effort is unsuccessful.</p>
<p>The default path as described above was an updated open source definition, because it was implicitly assumed that that was the only option on the table. But what if it wasn’t?</p>
<h1>The Road Not Traveled</h1>
<p>What if, instead of trying to bend and reshape a decades old definition intended to describe one asset to encompass another, wildly different asset, we instead abandoned that approach entirely?</p>
<p>On the one hand, for parties that have been fighting for the better part of two years to thread the needle between idealism and capitalism to arrive at an ideologically sound and yet commercially acceptable definition of open source AI, the idea of abandoning the effort will presumably seem horrifying and a non-starter.</p>
<p>This is not the first time that outside parties have sought to reshape or redefine open source, however.</p>
<p>For several years, there has been a desire on the part of some to bring open source into greater alignment with commercial interests – even at the expense of core open source principles. The response from the wider open source community, however, was rejection. The belief was then and is now that trying to shoehorn specific commercial protections into the term open source would fatally compromise it. Instead, just as with prior efforts to bend open source to new and <a href="https://medium.com/@stephenrwalli/software-freedom-in-a-post-open-source-world-9f497f646af9">ultimately incompatible goals around ethical source</a>, those who sought to change open source were told to find <a href="https://x.com/adamhjk/status/1687113805237714944">a new home</a>, a new term and a new definition for what they were building. The end result was &#8220;<a href="https://fair.io/">Fair Source</a>,&#8221; a new, from-scratch term that borrowed some ideals from open source but is entirely its own new and unique brand.</p>
<p>What if AI followed that path? The industry’s massive cumulative efforts to date to define what and how open source principles might apply to AI need not be wasted. They could instead be repurposed behind a new, clean slate term of choice – one that accurately conveys the portions of a model that are open, while not falsely advertising its features by applying the open source brand to non-open source assets.</p>
<p>Naming is, as Collier mentions above, the hardest problem in software engineering, and so coming up with a new, alternative term for open source AI would not be a simple exercise. It seems likely, however, that it would be both simpler and more achievable than coming up with a definition of open source AI that might minimally satisfy both the idealists and the pragmatists.</p>
<p>The only way that this would work, of course, would be if sufficient momentum could be assembled behind it. This would require the support of multiple, conflicting parties. Here are a few arguments in favor of a new term for each:</p>
<p><strong>Idealists</strong>:</p>
<ul>
<li>Assuming some industry consensus could be achieved around a new brand – greater consensus, at least, than is behind the current OSAID release candidate – pressure on the term open source would immediately begin to decline. One &#8220;defense&#8221; of Meta&#8217;s behavior at present is that there is no accepted definition for open projects; a new definition, even without the open source branding, eliminates that argument. Perhaps more importantly for idealists, if open source and the four freedoms are no longer explicitly invoked by the license, there&#8217;s a lessened need to be so strict about what&#8217;s in and what&#8217;s out. Idealists instead could center around protecting the original, source code derived definition of open source while attempting to clearly differentiate it from the new, AI-centric term of choice. </li>
</ul>
<p><strong>Pragmatists</strong>:</p>
<ul>
<li>Those who have been most willing to compromise in this process to accommodate the vagaries of, as but one example, data licensing would no longer be fighting against the reputation and legacy of the OSD and the four freedoms. It would instead be an opportunity to start fresh, informed by open source but not beholden to it. Pragmatists would have more room to maneuver in their efforts to find a license that balances openness with a desire to maximize uptake of the license, avoiding the worst case scenario of a strict definition that few or no models satisfy. </li>
</ul>
<p><strong>Vendors</strong>:</p>
<ul>
<li>Assuming again that some level of industry consensus could be achieved, they could receive a similar if not exactly equivalent level of marketing benefit from the new definition, without the corresponding costs of constant criticism from open source communities for their willful misuse of that term. In a world in which the various large, AI players are able to agree on a) both a clear and understandable model for achieving a certain level of openness as determined by the OSI and b) a willingness to put their marketing weight behind the new brand, marketing becomes at once a simpler and less fraught exercise.</li>
</ul>
<p><strong>The OSI</strong>:</p>
<ul>
<li>The OSI, its members and various participants have been tearing each other apart for well over a year trying to achieve an outcome that is, in all likelihood, not achievable. While redirecting current efforts into the creation of a new AI-centric brand might seem like surrender, it would more accurately be described as being flexible and adaptive rather than rigid and hidebound. Given a landscape in which the forces arrayed against the new license seem to be stronger than those <a href="https://opensource.org/ai/endorsements">supporting</a> it, preemptively eliminating an entire line of attack while creating the space to introduce a new brand for a new technology area – one that would protect the existing brand from fallout – is nothing more or less than the most logical course of action moving forward. </li>
</ul>
<hr />
<p>As mentioned above, the Tacoma Narrows bridge opened to the public in July of 1940. A mere four months later, systemic forces – 40+ MPH winds in this case – stressed the structure to the point that its concrete and metal surface began to <a href="https://en.wikipedia.org/wiki/File:Tacoma_Narrows_Bridge_destruction.ogv">ripple and twist</a> like a ribbon. A little over an hour later, the bridge collapsed entirely. Its spectacular destruction, and the engineering failure to account for the forces that destroyed it, have made it an object lesson for engineers to this day.</p>
<p>If the OSI chooses to stay the course with the OSAID, I will personally do everything in my power to help it succeed. But I fear that, as with the Narrows bridge, the fault lines are already on display, and it will not survive the high winds that are sure to be in its future.</p>
<p>Better to choose a new bespoke name and brand, one specifically tailored to suit the unique and dynamic challenges of the new technology. As for what that name might be, I’m relatively indifferent. Public AI was one option floated on a recent call and that has promise, or some flavor of open model / weights might work – though as one participant in the current process pointed out, the potential for overlap and confusion might make those less than ideal. This industry is generally bad at naming, so this exercise might be no exception. The name, however, is less important than consensus. As Abraham Lincoln said, “Public sentiment is everything. With public sentiment, nothing can fail.”</p>
<p>However they proceed, I wish the OSI luck in their thankless task and hope they find a solid bridge with which to bring the spirit of open source forward into the world of AI.</p>
]]></content:encoded>
<enclosure url="https://en.wikipedia.org/wiki/File:Tacoma_Narrows_Bridge_destruction.ogv" length="0" type="video/ogg" />
</item>
<item>
<title>The RedMonk Programming Language Rankings: June 2024</title>
<link>https://redmonk.com/sogrady/2024/09/12/language-rankings-6-24/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Thu, 12 Sep 2024 19:22:17 +0000</pubDate>
<category><![CDATA[Programming Languages]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6052</guid>
<description><![CDATA[This iteration of the RedMonk Programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of developer communities where you can join and learn more about building modern applications in your preferred language. As has become typical in recent years, our Q3 programming language rankings are arriving a few months]]></description>
<content:encoded><![CDATA[<blockquote><p>
This iteration of the RedMonk Programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of <a href="https://aws.amazon.com/developer/community">developer communities</a> where you can join and learn more about building modern applications in your preferred language.
</p></blockquote>
<p>As has become typical in recent years, our Q3 programming language rankings are arriving a few months late. Unlike the last run, when that was primarily due to the anomalous results we observed, this quarter it&#8217;s more attributable to summer vacation schedules. We&#8217;re still waiting to see what the longer term implications of coding assistants will be on these rankings, but at least for the present we&#8217;re continuing with the exercise, as it still identifies trends for us in the market. Trends which we&#8217;ll get to shortly.</p>
<p>In the meantime, however, as a reminder, this work is a continuation of the work originally performed by Drew Conway and John Myles White late in 2010. While the specific means of collection has changed, the basic process remains the same: we extract language rankings from GitHub and Stack Overflow, and combine them for a ranking that attempts to reflect both code (GitHub) and discussion (Stack Overflow) traction. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion and usage in an effort to extract insights into potential future adoption trends.</p>
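<p>To make that combination step concrete, here is a minimal sketch of the kind of rank averaging involved. The numbers, names and the simple averaging rule are illustrative assumptions, not our actual pipeline:</p>
<pre><code># Minimal sketch: combine a GitHub rank and a Stack Overflow rank
# into one ordering. Illustrative assumptions only; not RedMonk's
# production code.
github_rank = {"JavaScript": 1, "Python": 2, "Java": 3}
stackoverflow_rank = {"Python": 1, "Java": 2, "JavaScript": 3}

def combined_ranking(gh, so):
    # Only languages observable in both sources are ranked at all.
    langs = set(gh).intersection(so)
    averages = {lang: (gh[lang] + so[lang]) / 2 for lang in langs}
    # Sort by average rank; tied averages share a numerical position
    # in the published list.
    return sorted(langs, key=lambda lang: (averages[lang], lang))

print(combined_ranking(github_rank, stackoverflow_rank))
# ['Python', 'JavaScript', 'Java']
</code></pre>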
<h1>Our Current Process</h1>
<p>The data source used for the GitHub portion of the analysis is the GitHub Archive. We query languages by pull request in a manner similar to the one GitHub used to assemble the State of the Octoverse. Our query is designed to be as comparable as possible to the previous process; a simplified sketch of this style of query follows the list below.</p>
<ul>
<li>Language is based on the base repository language. While this continues to have the caveats outlined below, it does have the benefit of cohesion with our previous methodology.</li>
<li>We exclude forked repos.</li>
<li>We use the aggregated history to determine ranking (though based on the table structure changes this can no longer be accomplished via a single query).</li>
<li>For Stack Overflow, we simply collect the required metrics using their useful data explorer tool.</li>
</ul>
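<p>For the curious, here is a minimal sketch of what a query in this style can look like against the public GitHub Archive dataset on BigQuery. The table name, JSON paths and fork filter are assumptions based on the public schema; this illustrates the approach rather than reproducing our production query:</p>
<pre><code># Minimal sketch: count pull requests per base repository language
# using the public GitHub Archive dataset on BigQuery. Assumes GCP
# credentials are configured; not RedMonk's production query.
from google.cloud import bigquery

QUERY = """
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.language') AS language,
  COUNT(*) AS pull_requests
FROM `githubarchive.year.2023`
WHERE type = 'PullRequestEvent'
  -- exclude forked repos, per the methodology above
  AND JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.fork') = 'false'
GROUP BY language
ORDER BY pull_requests DESC
"""

client = bigquery.Client()
for row in client.query(QUERY).result():
    print(row.language, row.pull_requests)
</code></pre>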
<p>With that description out of the way, please keep in mind the other usual caveats.</p>
<ul>
<li>To be included in this analysis, a language must be observable within both GitHub and Stack Overflow. If a given language is not present in this analysis, that’s why.</li>
<li>No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.</li>
<li>There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.</li>
<li>All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.</li>
<li>In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.</li>
<li>Languages that have communities based outside of Stack Overflow such as Mathematica will be under-represented on that axis. It is not possible to scale a process that measures one hundred different community sites, both because many do not have public metrics available and because measuring different community sites against one another is not statistically valid.</li>
</ul>
<p>With that, here is the plot for this run.</p>
<p><a href="http://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_.png"><img decoding="async" src="http://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-1024x845.png" alt="" width="1024" height="845" class="aligncenter size-large wp-image-6053" srcset="https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-1024x845.png 1024w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-300x247.png 300w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-768x633.png 768w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-1536x1267.png 1536w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-2048x1689.png 2048w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-480x396.png 480w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-760x627.png 760w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
<p>1 JavaScript<br />
2 Python<br />
3 Java<br />
4 PHP<br />
5 C#<br />
6 TypeScript<br />
7 CSS<br />
7 C++<br />
9 Ruby<br />
10 C<br />
11 Swift<br />
12 Go<br />
12 R<br />
14 Shell<br />
14 Kotlin<br />
14 Scala<br />
17 Objective-C<br />
18 PowerShell<br />
19 Rust<br />
19 Dart</p>
<p>The Top 20, in a fashion that has become typical in recent years, was not entirely devoid of movement, but nearly so. Outside of CSS moving down a spot and C++ moving up one, the Top 10 was unchanged. And even in the back half of the rankings, where languages tend to be less entrenched and movement is more common, only three languages moved at all.</p>
<p>While these rankings are, as explicitly acknowledged above, not intended to be an accurate representation of typical enterprise language usage – because the data is not available to measure that – they are clear evidence of a landscape resistant to change. There are a few signs of languages following in TypeScript&#8217;s footsteps and working their way up the path, both in the Top 20 and at the back end of the Top 100 as we&#8217;ll discuss shortly, but they&#8217;re the exception that proves the rule.</p>
<p>It’s possible that we’ll see more fluid usage of languages, and increased usage of code assistants would theoretically make that much more likely, but at this point it’s a fairly static status quo.</p>
<p>With that, some results of note:</p>
<ul>
<li><strong>TypeScript</strong> (6): technically TypeScript didn&#8217;t move, as it was ranked sixth in our last run, but this is the first quarter in which it has been the sole occupant of that spot. CSS, in this case, dropped one place to seven, leaving TypeScript just outside the Top 5. It will be interesting to see whether or not it has more momentum to expend or whether it&#8217;s topped out for the time being. </li>
<li><strong>Kotlin</strong> (14) / <strong>Scala</strong> (14): both of these JVM-based languages jumped up – two spots in Scala&#8217;s case and three for Kotlin. Scala&#8217;s rise is notable because it had been on something of a downward trajectory from a one time high of 12th, and Kotlin&#8217;s placement is a mild surprise because it had spent three consecutive runs not budging from 17, only to make the jump now. The tie here, meanwhile, is interesting because Scala&#8217;s long history gives it an accretive advantage over Kotlin&#8217;s more recent development, but in any case the combination is evidence of the continued staying power of the JVM. </li>
<li><strong>Objective-C</strong> (17): speaking of downward trajectories and the 17th placement on this list, Objective-C&#8217;s slide that began in mid-2018 continued and left the language with its lowest placement in these rankings to date at 17. That&#8217;s still an enormously impressive achievement, of course, and there are dozens of languages that would trade their usage for Objective-C&#8217;s, but the direction of travel seems clear. </li>
<li><strong>Dart</strong> (19) / <strong>Rust</strong> (19): while once grouped with Kotlin as up and coming languages driven by differing incentives and trends, Dart and Rust have not been able to match the ascent of their counterpart, with five straight quarters of no movement. That&#8217;s not necessarily a negative; as with Objective-C, these are still highly popular languages and communities, but it&#8217;s worth questioning whether new momentum will arrive and from where, particularly because the communities are experiencing <a href="https://arstechnica.com/gadgets/2024/09/rust-in-linux-lead-retires-rather-than-deal-with-more-nontechnical-nonsense/">some friction</a> in growing their usage. </li>
<li><strong>Ballerina</strong> (61) / <strong>Bicep</strong> (78) / <strong>Grain</strong> / <strong>Moonbit</strong> / <strong>Zig</strong> (87): as discussed during last quarter&#8217;s run, we&#8217;re keeping an eye on Bicep, Grain, Moonbit and Zig among others because of what they represent: an unusually visible cloud DSL, two languages optimized for WebAssembly and a language that follows in the footsteps of C++ and Rust. Grain and Moonbit still haven&#8217;t made it into the Top 100, but Bicep jumped eight spots to 78 and Zig ten to 87. That progress pales next to Ballerina, however, which jumped from 80 to 61 this quarter. The general purpose language from WSO2, thus, is added to the list of potential up and comers we&#8217;re keeping an eye on.</li>
</ul>
<p><strong>Disclosure</strong>: WSO2 is not currently a RedMonk client.</p>
<p><strong>Credit</strong>: My colleague <a href="https://redmonk.com/rstephens/">Rachel Stephens</a> wrote the queries that are responsible for the GitHub axis in these rankings. She is also responsible for the query design for the Stack Overflow data.</p>
]]></content:encoded>
</item>
<item>
<title>The Post-Valkey World</title>
<link>https://redmonk.com/sogrady/2024/07/16/post-valkey-world/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Tue, 16 Jul 2024 13:35:07 +0000</pubDate>
<category><![CDATA[Databases]]></category>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6048</guid>
<description><![CDATA[Six years ago in August, Redis – then known as Redis Labs – applied a new license called the Commons Clause to a set of modules, or extensions to the core Redis database. The practical import of this clause was to override the existing open source license to render the software effectively proprietary and source]]></description>
<content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x.png"><img decoding="async" src="http://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x-1024x502.png" alt="" width="1024" height="502" class="aligncenter size-large wp-image-6049" srcset="https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x-1024x502.png 1024w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x-300x147.png 300w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x-768x376.png 768w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x-1536x753.png 1536w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x-480x235.png 480w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x-1200x588.png 1200w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-16-at-09.12.49@2x.png 1702w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
<p>Six years ago in August, Redis – then known as Redis Labs – applied a new license called the <a href="https://redmonk.com/sogrady/2018/09/10/tragedy-of-the-commons-clause/">Commons Clause</a> to a set of modules, or extensions to the core Redis database. The practical import of this clause was to override the existing open source license to render the software effectively proprietary and source available while still leveraging the brand of the original open source license.</p>
<p>Redis was not the first commercial open source organization to relicense its assets, or even the first to use that particular license – Neo4j preceded them there (though Redis <a href="https://www.zdnet.com/article/redis-labs-drops-commons-clause-for-a-new-license/">retired</a> the Commons Clause license a year later). Nor was it the last time an existing project swapped an open source license for a more restrictive, proprietary alternative. Mongo followed suit two months later, exchanging the AGPL for the SSPL, and Confluent (Apache to Confluent Community License) and Timescale (Apache/Timescale License) followed two months after that, a pattern which has continued into the present.</p>
<p>While the initial costs from a public relations and community outrage perspective were high, each successive act of relicensing helped smooth the path for the next. Implicit in this pattern was a normalization of the once controversial act of using open source to reach ubiquity and then unilaterally changing the rules of access in an effort to achieve economic exclusivity. So regular had this practice become, in fact, that project founders began fielding questions from potential investors about when, rather than if, they would relicense their project.</p>
<p>Such is the state of affairs at present. In more and more cases, the accepted strategy for open source commercialization is to leverage open source to achieve breadth, at which point the license is abandoned in favor of something that imposes the kinds of restrictions open source expressly and explicitly forbids.</p>
<p>It may, however, be more accurate to say that that <em>was</em> the state of affairs. It&#8217;s too early to say definitively, but there are signs at least that the post-Valkey world is meaningfully different from the world before its release.</p>
<h3>Forks</h3>
<p>Before considering the Valkey fork specifically, it’s worth establishing some general context around forks.</p>
<p><strong>First</strong>, forks have been around as long as open source has. From Unix to Linux to PostgreSQL to MySQL, the history of open source is littered with forks. In recent decades, however, three things have typically been true of forks.</p>
<ul>
<li>First, they were broadly considered a nuclear option of last resort.</li>
<li>Second, with rare exceptions like the 2014 Node/io.js split, few forks attracted much attention.</li>
<li>Lastly, because of the first and second points, almost regardless of the context of why a project was forked in the first place, the safe bet was always in favor of the original project and against the fork. </li>
</ul>
<p>Valkey, however, appears to be a different animal altogether.</p>
<p>First, there’s the speed with which the project came together. For those unfamiliar with the history, six years after re-licensing external modules, Redis re-licensed the core database itself on <a href="https://redis.io/blog/redis-adopts-dual-source-available-licensing/">March 20</a>. Specifically, it retired the permissive three clause BSD license in favor of a dual proprietary source available license or the Server Side Public License (SSPL) originally created by Mongo. What this meant in practice was that the permissively licensed database used, contributed to and supported by a variety of players was now effectively a source available, proprietary database.</p>
<p>Eight days later, the Linux Foundation <a href="https://www.linuxfoundation.org/press/linux-foundation-launches-open-source-valkey-community">announced</a> the launch of a fork. An eight-day timeline for a project of this size and scope is shocking. An eight-day timeline to announce a project of this scope that includes names like AWS, Google and Oracle of all vendors is an event without any obvious precedent. Even setting the requisite legal approvals aside, naming and branding take time – as they did in this case, apparently.</p>
<p>Second, there are the aforementioned brands. One of the primary reasons most forks do not succeed is that they fail to gain participatory critical mass quickly enough.</p>
<p>Consider the example of OpenSearch. The same year that Redis relicensed their modules, Elastic decided against relicensing the Elasticsearch project (though they ultimately reversed course three years later) in favor of co-mingling open source and proprietary assets in their core repo and the resulting builds in an attempt to make it more difficult for third parties like AWS to leverage the open source code. In response, AWS went with the nuclear option and <a href="https://redmonk.com/sogrady/2019/03/15/cloud-open-source-powder-keg/">forked the Elasticsearch project</a> into OpenSearch, sending shockwaves through the industry.</p>
<p>While OpenSearch is still an active, supported project with substantial usage, however, it has to date failed to make much of a dent in Elasticsearch’s traction. There are many reasons for this, but perhaps the most important is the lack of perceived widespread industry support. AWS has, to date, not gotten any of its hyperscale competitors to commit to the project. Which means that enterprise CIOs evaluating Elasticsearch and OpenSearch have a simple and straightforward calculation to make: who is more likely to deliver them greater innovation over the longer term? AWS – where OpenSearch is one amongst hundreds of projects – and an assortment of smaller, mostly less visible startups? Or Elastic, a $12B company singularly focused on the project and synonymous with search?</p>
<p>Valkey, by comparison, offers a much more nuanced calculation. Redis, obviously, has a deep and unmatched knowledge of the product. But if you are a buying executive at a major enterprise choosing between a company yet to go public and an offering backed by multiple hyperscale options, smaller clouds like DigitalOcean and Heroku, not to mention database stalwarts like Oracle and vendors like Percona that will offer on-premises support, that is a more difficult calculation.</p>
<p>Which brings us to the last, and arguably most notable, distinction: the attention. The day that Valkey was announced, I posted a link to the press release, with an anodyne caption reading “ok, you have my attention.” That aggressively bland tweet saw almost 150 thousand views – or roughly 15X my follower count. That level of attention, particularly for something as basic as a press release, is extremely unusual.</p>
<p><a href="http://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x-1024x803.png" alt="" width="1024" height="803" class="aligncenter size-large wp-image-6050" srcset="https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x-1024x803.png 1024w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x-300x235.png 300w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x-768x603.png 768w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x-480x377.png 480w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x-799x627.png 799w, https://redmonk.com/sogrady/files/2024/07/CleanShot-2024-07-11-at-11.35.29@2x.png 1198w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
<p>As has been the level of inquiries RedMonk has received about Valkey. Unlike the forks that preceded it, which had to try to manufacture interest and attention, Valkey is a constant source of curiosity, and questions, from clients. What do we make of it? What do the metrics tell us? Are we getting asked about it? Why do people seem to care about it? Does it feel different to us as well? Do we believe the project has staying power? And perhaps the most important question: might Valkey change the calculus for other prominent forks like OpenBao or OpenTofu?</p>
<h3>What Does it Mean?</h3>
<p>Objectively speaking, we can&#8217;t answer that question yet. It&#8217;s too soon to say. On the one hand, as of yesterday the Valkey Docker image had been pulled ~193K times in just a few months. On the other, Redis has been pulled billions of times since the project&#8217;s inception. Neither number tells us much, since they&#8217;re accretive metrics not amenable to direct comparison.</p>
<p>Likewise, Valkey has shown greater activity than Redis over the last 30 days. For example, per GitHub, Valkey has seen:</p>
<ul>
<li>~2X the number of merged pull requests versus Redis</li>
<li>~3X the number of open pull requests</li>
<li>Roughly the same number of closed issues</li>
<li>~2X the number of project authors</li>
<li>~6.5X the number of code additions</li>
<li>~4X the number of code deletions</li>
</ul>
<p>Superficially, that seems to tell a clear story. But the reality is that forks necessarily involve a great deal of initial logistical blocking and tackling, and that will inevitably boost project metrics in the short term. Also, it’s merely a month’s worth of data.</p>
<p>The short version of the above is that it’s too early to draw any real conclusions, and that the next several quarters will tell us what Valkey actually is and can accomplish. All that we can say definitively at this point is that Valkey appears to be a viable project with anomalous visibility for a fork.</p>
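<p>For readers who want to track numbers like these themselves, here is a minimal sketch of one way to pull a comparable metric – merged pull requests over a fixed window – from the GitHub search API. The repository names and the date window are assumptions for illustration:</p>
<pre><code># Minimal sketch: compare merged pull request counts for two repos
# over a fixed window via the GitHub search API. Unauthenticated
# requests are heavily rate limited; pass a token for real use.
import requests

def merged_prs(repo, since, until):
    query = f"repo:{repo} is:pr is:merged merged:{since}..{until}"
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    # total_count is the number of matching merged pull requests
    return resp.json()["total_count"]

for repo in ("valkey-io/valkey", "redis/redis"):
    print(repo, merged_prs(repo, "2024-06-01", "2024-06-30"))
</code></pre>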
<p>There are suggestions, however, that Valkey&#8217;s impact could be broad. In the short term, it has already had a ripple effect on behaviors. In at least one instance, the interest Valkey has attracted has contributed to a reversal of a seemingly set decision regarding a licensing-adjacent question. Elsewhere, the tone and tenor of the licensing conversations RedMonk is having is materially different than it was five months ago.</p>
<p>In the longer term, Valkey represents – whatever the project’s ultimate fate might be – the first real, major pushback from a market standpoint against the prevailing relicensing trend.</p>
<p>To be clear, no one should get carried away and expect Valkeys to begin popping up everywhere. It’s important to note that there are many variables that impact the friction involved in forking a project and the viability of sustaining it long term. Some projects are easier to fork than others, unquestionably, and Redis – if only because it was a project with many external contributors – was lower friction than some.</p>
<p>Not every project that is re-licensed can or will be forked. But investors, boards and leadership that are pushing for re-licensing as a core strategy will, moving forward, have to seriously consider the possibility of a fork as a potentially more meaningful cost. Where would-be re-licensors previously expected no major project consequences from their actions, the prospect of a Valkey-like response is a new consideration.</p>
<p>In recent years the primary costs to the re-licensing of a given project have been developer unrest, buyer irritation (and in some cases additional legal guarantees) and a weather-able period of bad headlines. The uniqueness of Valkey and its potential to reinvigorate projects like OpenBao and OpenTofu, however, signal that that period may be over.</p>
<p><strong>Disclosure</strong>: Amazon, Google, Heroku (Salesforce), MongoDB, Neo4j, Oracle, Percona and Redis are RedMonk customers. Confluent, DigitalOcean and Timescale are not currently RedMonk customers.</p>
]]></content:encoded>
</item>
<item>
<title>AI Conundrums</title>
<link>https://redmonk.com/sogrady/2024/07/03/ai-conundrums/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 03 Jul 2024 18:05:11 +0000</pubDate>
<category><![CDATA[AI]]></category>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6044</guid>
<description><![CDATA[As the industry’s various AI markets continue their trundling paths towards maturity, not to mention the trough of disillusionment and the attendant and growing AI exhaustion, patterns have continued to unfold as has been discussed previously. The end game of many of these is and has been clear from the start. As but one example,]]></description>
<content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2024/07/alexander-cuts-the-gordian-knot-563f2d.jpg"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/07/alexander-cuts-the-gordian-knot-563f2d.jpg" alt="" width="640" height="500" class="aligncenter size-full wp-image-6045" srcset="https://redmonk.com/sogrady/files/2024/07/alexander-cuts-the-gordian-knot-563f2d.jpg 640w, https://redmonk.com/sogrady/files/2024/07/alexander-cuts-the-gordian-knot-563f2d-300x234.jpg 300w, https://redmonk.com/sogrady/files/2024/07/alexander-cuts-the-gordian-knot-563f2d-480x375.jpg 480w" sizes="(max-width: 640px) 100vw, 640px" /></a></p>
<p>As the industry’s various AI markets continue their trundling paths towards maturity, not to mention the trough of disillusionment and the attendant and growing AI exhaustion, patterns have continued to unfold as has been <a href="https://redmonk.com/sogrady/2024/05/29/ai-patterns/">discussed previously</a>. The end game of many of these is and has been clear from the start. As but one example, enterprises are proving predictably unwilling to trust newly minted AI startups characterized by a startling lack of process and legal documentation with their private corporate data.</p>
<p>In other areas, the end game is far from clear. The following are brief thoughts on three such cases in which the answer isn&#8217;t obvious, and in at least one case, may not exist at all.</p>
<h2>Which Model for Which User?</h2>
<p>As noted <a href="https://redmonk.com/sogrady/2024/05/29/ai-patterns/">last month</a>, for all of the understandable industry focus and attention paid to large, expansively capable models, there are important roles to play for more limited medium and even small models. As conversations proceed beyond just model capability and issues of data exfiltration and token costs come to the fore, things become more complicated. Couple those concerns with enterprises focused on more narrowly drawn use cases, and the state of the art large models aren&#8217;t the obvious and preordained winners that industry perception might suggest.</p>
<p>What this means, in practical terms, is that the large commercial models like ChatGPT, Claude, Gemini and others that their respective vendors would dearly like to sell to less price-sensitive large enterprises look in some cases less useful or cost-effective than smaller, cheaper models that can be trained more easily to solve very specific problems. Which presents an interesting economics problem, because as fast as the individual consumer business of players like ChatGPT has grown – which is very fast indeed, and $20 a month at scale isn’t nothing – the vendors are never going to get rich off the consumer market. Or even, in all likelihood, cover their exorbitant chip-driven hardware costs.</p>
<p>One early hypothesis of the market has been that individuals would end up using lower end, cheaper models while enterprises would require the largest, most state of the art and highly trained. Instead, the reality in many cases may very well be the opposite.</p>
<h2>AI Gateways: The New Default Interface?</h2>
<p>One of the least surprising developments of the AI space to date has been the emergence of AI gateways from players like Cloudflare, Fastly or Kong. Much like their API gateway predecessors, AI gateways are instantiated in between users and AI endpoints like OpenAI&#8217;s. The primary justification to date for these has been issues like improving query performance or preventing the wild escalation of token based costs.</p>
<p>One potential use case that has generally been under appreciated to date is that of the new AI interface abstraction.</p>
<p>AI models, at present, are proliferating wildly. Businesses are experimenting with multiple models, attempting to arrive at a workable strategic approach that delivers required capabilities, eliminates the possibility of data exfiltration and minimizes costs. Users, meanwhile, are actively and aggressively imprinting on particular models – even models, in many cases, that they have been forbidden to use.</p>
<p>What if an interface existed, however, which disintermediated the user interface from the model behind it? What if a single interface could deliver unified access to both large, public models and internal private ones – much as someone might use Ollama on a laptop – but in a scalable, enterprise-friendly way?</p>
<p>Enterprises would potentially tick boxes like public/private capability, scalability, centralized data on usage and therefore costs, as well as benefits in compliance and other areas. Users, for their part, would potentially get a single interface with the capability of every model.</p>
<p>AI gateways already have many of the requisite capabilities to execute on this functionality, if not yet the messaging and vision. A minimal sketch of what such a routing layer might look like follows.</p>
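<p>In the sketch below, a hypothetical hosted endpoint and a local Ollama install stand in for the public and private models respectively; every URL, model name and routing rule here is an illustrative assumption rather than any vendor&#8217;s actual API:</p>
<pre><code># Minimal sketch of the gateway abstraction described above: a single
# entry point that routes prompts to a public or private model.
import requests

MODELS = {
    "public-large": {
        # hypothetical hosted endpoint; stands in for a commercial API
        "url": "https://api.example.com/v1/generate",
        "private": False,
    },
    "local-small": {
        # a default local Ollama install exposes this endpoint
        "url": "http://localhost:11434/api/generate",
        "private": True,
    },
}

def generate(prompt, contains_corporate_data=False):
    # Policy: prompts touching corporate data never leave the building.
    name = "local-small" if contains_corporate_data else "public-large"
    target = MODELS[name]
    # A single chokepoint like this is the natural place to log usage,
    # attribute token costs and enforce compliance rules.
    resp = requests.post(
        target["url"],
        json={"model": name, "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
</code></pre>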
<p>Model vendors would presumably resist this sort of commoditization and disintermediation, preferring direct routes to their customers. But short of knee-capping its APIs, a given vendor would find it difficult to deny users their interface of choice, for fear of limiting its own addressable market.</p>
<h2>Open Source, AI and Data</h2>
<p>At present, the OSI is working aggressively to finalize a proposed draft definition of open source as it pertains to AI technologies. This is necessary because while it&#8217;s easy to understand how copyright-based licenses apply to software, it&#8217;s much more difficult to determine where, how and even whether copyright applies to the unique combination of software, data, inferences, embeddings and so on that makes up large AI models.</p>
<p>While there are a range of issues, one of the most pressing at present concerns training data. More specifically, whether or not training data needs to be included alongside other project components like software to be considered “open source.”</p>
<p>The current OSI draft of the definition does not mandate release of the accompanying training data. It instead requires:</p>
<blockquote><p>
“Sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data.”
</p></blockquote>
<p>Many smart and reasonable individuals with decades of open source experience regard this as insufficient. Jason Warner, CEO of Poolside, <a href="https://www.youtube.com/watch?v=fvS-7vkbhTY&ab_channel=RedMonk">discussed</a> the importance of training data with us recently. Julia Ferraioli, meanwhile, explicitly <a href="https://www.juliaferraioli.com/blog/2024/on-open-source-ai/">made the case</a> last week that the current definition falls short, arguing in essence that without the data – and a sure way to replicate it – the draft AI definition cannot fulfill the four key freedoms that the traditional OSI definition does.</p>
<p>This is correct. If data is key to AI, and data may or may not be replicable per these more lenient terms, then while users would be able to use or distribute the system, the draft definition clearly could not guarantee the right to study the full system or modify it deeply.</p>
<p>Clearly, then, an OSI definition that does not require the inclusion of training data is problematic. A definition that requires full training set availability, however, is arguably equally problematic.</p>
<h3>Practicality</h3>
<p>First, and most obvious, is practicality. For large models that are trained on datasets the size of the internet, dropping them in a repo the way that we would with source code is challenging individually and essentially impossible at scale. It’s also not clear that datasets of that scale would be practically navigable or could be reasonably evaluated.</p>
<h3>Legality</h3>
<p>The second issue is legality. There are obvious reasons for commercial parties not to want to release their training data. One of those reasons might be that they are training on data of questionable legality.</p>
<p>But setting simple cases like that aside, the questions here are anything but simple.</p>
<p>Many years ago, we were briefed on a particular data use case in which two separate datasets that could not legally be combined on disk were instead leveraged in the only fashion the lawyers authorized – in memory. According to the way the licenses to these datasets were written, “disk” meant spinning disks, which were regarded as legally distinct from RAM.</p>
<p>When law intersects with data, in other words, things get complicated quickly. It’s not clear, in fact, whether requiring the release of data in a license is itself legal. Questions like that are perhaps best deferred to actual lawyers like Luis Villa and Van Lindberg. But even with resources and input from such experts helping to illuminate and clarify some of these questions, there will presumably be a large number of corner cases with no simple answer.</p>
<p>Authors may rely on training data that cannot be legally released under an open source license. More commonly, they may rely on training data that they cannot say unequivocally whether they’re able to release, because sufficient case law has yet to decide the issue.</p>
<p>What this means in practical terms is that when in doubt – in the best case scenario, at least, as opposed to the worst case scenario we’ll get to momentarily – authors will simply default to non-open source licenses.</p>
<h3>Outcomes</h3>
<p>Which brings us to the last issue, which concerns outcomes – desired and undesired. On the one hand, strict adherence to the four freedoms clearly necessitates a full release of training data. Whether those training datasets are big and unwieldy – and whether that dramatically narrows the funnel of available open source models due to questions of data and law – is immaterial, if the desired outcome is a pure definition of open source and AI.</p>
<p>It seems at least possible, however, that such adherence could lead to undesirable outcomes. If we assume, for example, that the definition requires full release of datasets, one thing is certain: in Julia’s words, it would be “a definition for which few existing systems qualify.”</p>
<p>In and of itself, that would not necessarily be a negative if there were a plausible pathway for some reasonable number of AI projects to comply in some reasonable timeframe. It’s not clear, however, that that is the case here. It seems more likely, in fact, that a tight definition would have the reverse effect. If the goal seems fundamentally unachievable, why try? At which point, each project would have a choice to make: follow in Google’s footsteps with Gemma, or Meta’s with Llama.</p>
<p>Google was explicit that while Gemma was an open model, it did not and would not qualify for the term open source because the company respects the OSI definition. The majority of the press ignored this important bit of what they considered semantics and called it open source. Meta, on the other hand, as it has for years, willfully and counterfactually described and continues to describe Llama as open source – in spite of use restrictions which mean peers like Amazon, Google and Microsoft cannot leverage the project – and the press, with the exception of the odd tech reporter here and there and, of all publications, <a href="https://www.nature.com/articles/d41586-024-02012-5">Nature</a>, has not seen fit to question them.</p>
<p>In a world, then, where companies provably want the benefits of the open source brand, but it’s seen as difficult if not impossible to achieve – particularly a world in which the open source label is already <a href="https://redmonk.com/sogrady/2023/08/03/why-opensource-matters/">under siege</a> from other quarters – the most likely course of action is vendors totally abandoning any pretense of consideration for the OSI and the term it’s charged with guarding. In attempting to more closely protect the open source definition with respect to AI, then, it’s possible that the outcome would be the opposite of the intention.</p>
<p>Which is why the question is such a challenge, and the task of answering it not one to be envied.</p>
<p><strong>Disclosure</strong>: Amazon, Cloudflare, Fastly, Google and Microsoft are all RedMonk customers. Anthropic, Kong, Meta, OpenAI and Poolside are not currently RedMonk customers.</p>
]]></content:encoded>
</item>
<item>
<title>AI Patterns</title>
<link>https://redmonk.com/sogrady/2024/05/29/ai-patterns/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 29 May 2024 14:19:43 +0000</pubDate>
<category><![CDATA[AI]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6037</guid>
<description><![CDATA[Subhrajyoti07, CC BY-SA 4.0, via Wikimedia Commons Artificial General Intelligence (AGI) may yet be a ways off, but that hasn’t limited the current crop of AI technologies’ ability to impact industries. Much as software once ate the world, AI is ingesting everything from event agendas to internal development focus to available speculative capital. As might]]></description>
<content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2024/05/1024px-Crisscross_elliptical_patterns_created_using_light_painting.jpg"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/05/1024px-Crisscross_elliptical_patterns_created_using_light_painting.jpg" alt="" width="1024" height="681" class="aligncenter size-full wp-image-6038" srcset="https://redmonk.com/sogrady/files/2024/05/1024px-Crisscross_elliptical_patterns_created_using_light_painting.jpg 1024w, https://redmonk.com/sogrady/files/2024/05/1024px-Crisscross_elliptical_patterns_created_using_light_painting-300x200.jpg 300w, https://redmonk.com/sogrady/files/2024/05/1024px-Crisscross_elliptical_patterns_created_using_light_painting-768x511.jpg 768w, https://redmonk.com/sogrady/files/2024/05/1024px-Crisscross_elliptical_patterns_created_using_light_painting-480x319.jpg 480w, https://redmonk.com/sogrady/files/2024/05/1024px-Crisscross_elliptical_patterns_created_using_light_painting-943x627.jpg 943w" sizes="(max-width: 1024px) 100vw, 1024px" /></a><br />
<a href="https://commons.wikimedia.org/wiki/File:Crisscross_elliptical_patterns_created_using_light_painting.jpg">Subhrajyoti07</a>, <a href="https://creativecommons.org/licenses/by-sa/4.0">CC BY-SA 4.0</a>, via Wikimedia Commons</p>
<p>Artificial General Intelligence (AGI) may yet be a ways off, but that hasn’t limited the current crop of AI technologies’ ability to impact industries. Much as software once ate the world, AI is ingesting everything from event agendas to internal development focus to available speculative capital. As might be predicted given that backdrop, a growing proportion of technology industry discussions focus on not if but when AI will impact a given market, product or project. The following are a few common themes that have emerged from these conversations.</p>
<h2>Interface</h2>
<p>Decades ago, before the emergence of trends like cloud and open source, technology vendors typically designed for buyers and treated users as afterthoughts. Products were adopted top down, after all, so what did it matter that the resulting products were hard to use, aesthetically deficient or both?</p>
<p>Thanks to trends like cloud and open source, however, the industry has developed an appreciation for users and consequently thought more about things like developer experience. Gaps <a href="https://redmonk.com/sogrady/2020/10/06/developer-experience-gap/">persist</a>, but products are generally more thoughtful about their interfaces today than they were a decade or more ago.</p>
<p>Which is why it’s surprising that comparatively little attention is paid to AI’s interface question. Enterprises and in particular the vendors that serve them love to talk about models – and models are important, as we’ll get to momentarily. While those conversations about models are happening, however, legions of developers and other users are actively and aggressively imprinting on their particular tool of choice. And more problematically for the businesses that might want their users to leverage a different tool, not only are users imprinting on their tool’s specific UI, they’re accumulating significant query histories within it that are difficult if not impossible to export.</p>
<p>Anecdotally, there are many stories of developers declining to use objectively superior models because of their <a href="https://en.wikipedia.org/wiki/Imprinting_(psychology)#Baby_duck_syndrome">Baby Duck syndrome</a> with their tool of choice. And while it’s too early to say, it may well be true that enterprises will find the challenge of getting developers to switch tools comparable in difficulty to getting them to switch IDEs. Good luck with that, in other words.</p>
<p>Worse, unlike IDEs which have a limited organizational footprint, AI tools and LLMs in particular can be used by every employee from marketing to sales to public relations – not just developers and other technical staff.</p>
<h2>Model Size</h2>
<p>One thing the industry has at present is plenty of models. Hugging Face alone as of this moment lists 683,310 available. It seems safe, therefore, to conclude that the industry is in the midst of a model boom, though how viable or relevant the overwhelming majority are or will be is a separate question. While the list of models is nearly endless, however, the ways that adopters are thinking about them are not.</p>
<p>At least at present, large models like OpenAI’s GPT and Google’s Gemini get the most attention, because their size, sophistication and training scope give them a very wide range of capabilities and, consequently, they tend to demo well. But large models have their limitations, and they can be expensive at scale. For these reasons, many users are turning to medium or small sized models, whether it’s to cut costs, to run them locally on less capable hardware or varying other reasons.</p>
<p>What’s interesting, however, is seeing how vendors are sorting themselves on this basis. Google, Microsoft and OpenAI all have models in varying sizes, but the overwhelming majority of their messaging concerns their largest, state of the art models. The same is generally true for the likes of AI21, Anthropic, Cohere and Mistral. AWS, for its part, used its reInvent conference to <a href="https://redmonk.com/sogrady/2023/12/06/reinvent-2023/">heavily sell</a> the idea of model choice. And most recently, first at the Red Hat Summit and subsequently at IBM Think, the subsidiary and parent companies laid out a vision of choice, but put a particular emphasis on small models fine tuned with their recently launched <a href="https://redmonk.com/sogrady/2024/05/14/instructlab/">InstructLab project</a>. Red Hat’s CEO Matt Hicks was blunt, saying simply “I’m a small model guy.”</p>
<p>Given the varying tradeoffs and myriad of customer needs, models are obviously going to come in many sizes. But it will be interesting to see whether or not small and medium sized models, which have to date taken a back seat to their larger, more capable brethren, get more airtime moving forward.</p>
<h2>On or Off Prem</h2>
<p>Some months before the Oxide Computer company was launched in 2019, co-founder Bryan Cantrill was privately discussing the prospect of starting a hardware company. The response here was laughter and a pronouncement that the idea was “crazy.” That response was apparently not an uncommon one.</p>
<p>On the one hand, the pitch had obvious things going for it. First, it was inarguably true that large, at scale internet companies had learned an enormous amount about building hardware and that many of these lessons had been captured and shared publicly in forums like the Open Compute Project. It was also true that the ability of enterprises to purchase hardware built on these hard earned lessons was limited to non-existent.</p>
<p>But on the other, industry consensus was clear that all roads led to the cloud. There were existing on premises workloads that would be impractical or impossible to migrate to the cloud, and thus datacenters would be around indefinitely. But the projections were that most net new workloads would be cloud workloads. Further, cloud provider growth reflected this consensus. Starting a hardware company, therefore, seemed like a Sisyphean task.</p>
<p>While neither Oxide nor the rest of the industry knew it, however, AI was about to intervene. Three years after Oxide’s founding, ChatGPT was released and quickly became the fastest growing technology product in history.</p>
<p>It was not an on premises product, of course, nor were most of the large models like Gemini that followed it. But just as quickly as enterprises realized that the technology had immense potential benefits, they quickly had to consider the risks of granting these models access to their private, internal data – as illustrated by cases like <a href="https://techcrunch.com/2023/05/02/samsung-bans-use-of-generative-ai-tools-like-chatgpt-after-april-internal-data-leak/">Samsung</a>.</p>
<p>What has followed has been an observable resurgence of interest in on premises hardware. See, for example, this graph of Google searches for datacenter; note the timing in particular. Correlation is not causation, of course, but the sudden upward tick in these searches seems rather more than coincidental.</p>
<p><a href="http://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-1024x844.png" alt="" width="1024" height="844" class="aligncenter size-large wp-image-6039" srcset="https://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-1024x844.png 1024w, https://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-300x247.png 300w, https://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-768x633.png 768w, https://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-1536x1266.png 1536w, https://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-2048x1688.png 2048w, https://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-480x396.png 480w, https://redmonk.com/sogrady/files/2024/05/datacenter-google-trends-wm-761x627.png 761w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
<p>The rise of AI technologies is not going to usher in the great repatriation, to be clear. Cloud providers will remain the simplest and lowest friction approach for standing up AI workloads in most cases, because operating datacenters is both hard and expensive.</p>
<p>But even for vendors like Oxide that do not specialize in GPU-based hardware, the accelerating demand from enterprises to run local infrastructure to leverage data they do not trust to external large providers is likely to significantly expand on prem compute footprints.</p>
<h2>Gravity and Trust</h2>
<p>Another contributing factor to the expansion of on prem AI implementations is the so-called gravity of the large datasets that are the fuel on which AI depends. Large data is difficult to move quickly and safely, and as with physical objects subject to Newtonian physics, data at rest is likely to stay at rest. The idea of data gravity, of course, is not new. Dave McCrory was articulating the idea as far back as <a href="https://ionir.com/the-fundamentals-of-data-gravity-with-dave-mccrory%E2%80%AF">2010</a>, and it’s a well understood aspect of data management.</p>
<p>What is perhaps less well understood is the idea that gravity and trust are inherently intertwined. A large accumulation of data in a given location, of course, is an implicit assertion of trust. If an enterprise didn’t trust the platform, it would not choose to aggregate data there, but rather on a platform it trusted more. This in turn implies that the platform where a large dataset currently resides is likely to be more trusted than a new one, whatever assurances the new platform’s provider might share.</p>
<p>Practically speaking, then, large scale data management providers have a trust advantage over new market entrants. Even if Salesforce’s Einstein AI suite can’t match OpenAI’s feature for feature or capability for capability, then, it might well be worth leveraging because of the trust gap between the two vendors. And indeed, we’ve had a customer admit exactly that.</p>
<p>Trust won’t always trump features, of course, and some data is less sensitive and requires commensurately lower levels of protection. But trust is undeniably a factor in AI adoption, and is likely to favor incumbent providers at least in the short term.</p>
<h2>The Net</h2>
<p>AI adoption involves a multitude of considerations, clearly, and cannot in most cases be reduced to a single axis upon which to make a decision. As much as providers want to tout features and new capabilities, incredible as those may be, adoption will in most cases be a complicated conversation involving multiple parties, differing fears and risk estimates and decisions on how to weight capability versus size, speed and cost.</p>
<p><strong>Disclosure</strong>: AWS, Google, IBM, Microsoft, Oxide, Red Hat and Salesforce are RedMonk customers. AI21, Anthropic, Cohere, Mistral and OpenAI are not currently customers.</p>
]]></content:encoded>
</item>
<item>
<title>InstructLab: What if Contributing to Models Was Easy?</title>
<link>https://redmonk.com/sogrady/2024/05/14/instructlab/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Tue, 14 May 2024 15:41:22 +0000</pubDate>
<category><![CDATA[AI]]></category>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6034</guid>
<description><![CDATA[One of the unfortunate but unavoidable truths about tech conferences in 2024 is that they are, of necessity, AI conferences. While we can and should debate the opportunity costs of redirecting so much time and attention from events to AI-related subjects, the reality is that the sheer relevance, interest in and capabilities of these technologies]]></description>
<content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2024/05/160199024.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/05/160199024.png" alt="" width="175" height="175" class="aligncenter size-full wp-image-6035" srcset="https://redmonk.com/sogrady/files/2024/05/160199024.png 175w, https://redmonk.com/sogrady/files/2024/05/160199024-150x150.png 150w" sizes="(max-width: 175px) 100vw, 175px" /></a><br />
One of the unfortunate but unavoidable truths about tech conferences in 2024 is that they are, of necessity, AI conferences. While we can and should debate the <a href="https://twitter.com/editingemily/status/1786040914336489907">opportunity costs</a> of redirecting so much time and attention from events to AI-related subjects, the reality is that the sheer relevance, interest in and capabilities of these technologies – not to mention their still accelerating rates of innovation – makes them inevitable headline acts. The question facing events, then, is not whether to feature AI prominently, but whether they’re talking about genuinely novel and interesting problems within AI or whether they’re talking about AI for the sake of talking about AI.</p>
<p>The good news for Red Hat following its annual Summit last week is that it was the former. The company and its parent IBM announced two open source projects: the Granite family of models, and InstructLab, a project intended to lower the barriers to model contributions.</p>
<p>The Granite models are easy to understand and contextualize. We’re in the middle of a model boom – of questionable sustainability, but that’s a subject for another day. Suffice it to say that there is no shortage of models and no real impediments to understanding them.</p>
<p>InstructLab, on the other hand, requires some introduction and explanation. To understand InstructLab, it’s helpful to remember what open source was like in the early days. Source code was exploding in availability, and developers inhaled it, tinkered with it and compiled it on whatever hardware they had to hand. A subset of those developers had the ability, incentives and willingness to share their improvements back to the upstream project, and the software moved forward on the backs of these collective inputs.</p>
<p>Today, open source moves forward in much this same fashion – though not without its <a href="https://redmonk.com/sogrady/2023/08/03/why-opensource-matters/">existential challenges</a>. AI, for its part, has embraced this same ethos of community driven development within platforms like Hugging Face. But the barriers to actually contributing back to those models that are open enough to contribute to, by Red Hat’s estimation at least, are too high – in terms of the required hardware, technical ability and available training data. InstructLab, therefore, is explicitly intended to make it possible for experts in non-technical fields to contribute back to existing models.</p>
<p>The simplest way to contribute to a model – the model equivalent of a pull request – is a “skill,” which requires two things: a YAML file and a second text file detailing attribution of the content – who created it, where it came from and so on. YAML doesn’t have many fans, but from the perspective of contributors it’s really just a text file with structured formatting. InstructLab then uses a limited number of these skills to generate a larger corpus of related synthetic data, which is used to update the model.</p>
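<p>As a rough illustration of how low that barrier is meant to be, the sketch below assembles the two files such a skill contribution requires. The field names are hypothetical – consult the project for its actual schema – but the shape of the exercise is the point: structured text, not code.</p>
<pre><code># Sketch of an InstructLab-style "skill" contribution: a YAML file of
# seed examples plus a plain text attribution file. Field names here
# are hypothetical, not necessarily the project's actual schema.
from pathlib import Path
from textwrap import dedent

skill_yaml = dedent("""\
    task_description: Answer questions about New England lighthouses
    seed_examples:
      - question: When was Portland Head Light first lit?
        answer: It was first lit in 1791.
      - question: Where is Nobska Light located?
        answer: In Woods Hole, Massachusetts.
""")

attribution = dedent("""\
    Title: New England lighthouse questions and answers
    Author: Jane Contributor
    Source: original work
    License: Apache-2.0
""")

# Two small text files constitute the entire contribution.
Path("qna.yaml").write_text(skill_yaml)
Path("attribution.txt").write_text(attribution)
</code></pre>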
<p>The obvious question for those following this space is whether it’s intended as a replacement for traditional Retrieval-Augmented Generation (RAG) model updates, and the short answer is no. It’s best characterized as a complement, one that opens a pipeline for other forms of model input unsuitable for RAG processes.</p>
<p>While there are many more questions to be asked about InstructLab, however, from how it works to how it compares to RAG to its overall efficiency and efficacy, perhaps the most interesting question is: what if it does work?</p>
<p>If we assume that InstructLab accomplishes exactly what it set out to, which is to say dramatically lowering the barrier to model contributions and updates, what would that mean? On the one hand, decreased friction to contributions resulting in an explosion in same would suggest models that dramatically accelerate their abilities, coverage and breadth.</p>
<p>On the other hand, however, the question is what governance and provenance would and will be required to manage new influxes of content contributions. In the source code world, the industry has decades of experience in understanding the implications of intellectual property and the scaffolding needed to manage it properly; see, for example, the OpenTofu project’s <a href="https://opentofu.org/blog/our-response-to-hashicorps-cease-and-desist/">response</a> to allegations of copyright infringement. Code is also a relatively narrowly circumscribed domain with clear boundaries.</p>
<p>Relevant model content, however, may take myriad forms. It’s also not clear that the industry currently has the intellectual property governance mechanisms – both in terms of the licenses and the processes to manage them – in place to handle a world in which anyone, not just data scientists, is a potential source of model inputs. Developers and other individual practitioners who are typically less than obsessed with licensing and other compliance concerns are likely to appreciate the lowered barriers to entry. It remains to be seen, however, whether their employers will feel the same way.</p>
<p>It is clear that Red Hat, the long term standard bearer for open source, is trying to bring more of the community contribution and enthusiasm it knows well to the world of AI. What is less clear is whether enterprise AI customers are ready for it.</p>
<p>But the same was true of open source software once upon a time. It took years, but enterprises evolved the ability to consume open source at scale. It may take a similar period of learning and acclimation before enterprises are ready to embrace democratized model updates, but the potential benefits are obvious if they do.</p>
<p><strong>Disclosure</strong>: IBM and Red Hat are RedMonk customers.</p>
]]></content:encoded>
</item>
<item>
<title>The RedMonk Programming Language Rankings: January 2024</title>
<link>https://redmonk.com/sogrady/2024/03/08/language-rankings-1-24/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Fri, 08 Mar 2024 15:43:17 +0000</pubDate>
<category><![CDATA[Programming Languages]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6031</guid>
<description><![CDATA[This iteration of the RedMonk programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of developer communities where you can join and learn more about building modern applications in your preferred language. As many are aware, our latest analysis of the programming language rankings has been delayed. While there]]></description>
<content:encoded><![CDATA[<blockquote><p>
This iteration of the RedMonk programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of <a href="https://aws.amazon.com/developer/community">developer communities</a> where you can join and learn more about building modern applications in your preferred language.
</p></blockquote>
<p>As many are aware, our latest analysis of the programming language rankings has been delayed. While there are a number of factors that have contributed to this including schema changes and query issues, the biggest single factor in the latency between this ranking and its predecessor was our attempt to explain some serious anomalies discovered in the data. We can’t definitively identify the source of those as yet, but our <a href="https://redmonk.com/rstephens/2023/12/14/language-rankings-update/">current hypothesis</a> is that they are manifestations of the acceleration of AI code assistants. We’ll continue to monitor the impact of these tools on both the industry and the underlying data that these rankings are built on.</p>
<p>In the meantime, however, as a reminder, this work is a continuation of the work originally performed by Drew Conway and John Myles White late in 2010. While the specific means of collection has changed, the basic process remains the same: we extract language rankings from GitHub and Stack Overflow, and combine them for a ranking that attempts to reflect both code (GitHub) and discussion (Stack Overflow) traction. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion and usage in an effort to extract insights into potential future adoption trends.</p>
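<p>For the curious, the combination step reduces in spirit to something like the following sketch. The even weighting of the two sources and the toy numbers are assumptions for illustration, not a statement of the rankings’ exact methodology.</p>
<pre><code># Illustrative sketch of combining two popularity rankings. The even
# weighting and the toy data are assumptions, not the actual process.
github_rank = {"javascript": 1, "python": 2, "java": 3, "php": 4}
stackoverflow_rank = {"python": 1, "javascript": 2, "java": 4, "php": 3}

# A language must be observable in both sources to be ranked at all.
ranked_in_both = github_rank.keys() & stackoverflow_rank.keys()

# Average each language's rank across the two sources...
combined = {
    lang: (github_rank[lang] + stackoverflow_rank[lang]) / 2
    for lang in ranked_in_both
}

# ...then sort ascending: a lower combined rank means more popular.
for lang, score in sorted(combined.items(), key=lambda item: item[1]):
    print(lang, score)
</code></pre>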
<h2>Our Current Process</h2>
<p>The data source used for the GitHub portion of the analysis is the GitHub Archive. We query languages by pull request in a manner similar to the one GitHub used to assemble the State of the Octoverse. Our query is designed to be as comparable as possible to the previous process.</p>
<ul>
<li>Language is based on the base repository language. While this continues to have the caveats outlined below, it does have the benefit of cohesion with our previous methodology.</li>
<li>We exclude forked repos.</li>
<li>We use the aggregated history to determine ranking (though based on the table structure changes this can no longer be accomplished via a single query).</li>
<li>For Stack Overflow, we simply collect the required metrics using their useful data explorer tool.</li>
</ul>
<p>With that description out of the way, please keep in mind the other usual caveats.</p>
<ul>
<li>To be included in this analysis, a language must be observable within both GitHub and Stack Overflow. If a given language is not present in this analysis, that’s why.</li>
<li>No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.</li>
<li>There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.</li>
<li>All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.</li>
<li>In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.</li>
<li>Languages that have communities based outside of Stack Overflow such as Mathematica will be under-represented on that axis. It is not possible to scale a process that measures one hundred different community sites, both because many do not have public metrics available and because measuring different community sites against one another is not statistically valid.</li>
</ul>
<p>With that, here is the first quarter plot for 2024.</p>
<p><a href="http://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-1024x845.png" alt="" width="1024" height="845" class="aligncenter size-large wp-image-6032" srcset="https://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-1024x845.png 1024w, https://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-300x247.png 300w, https://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-768x633.png 768w, https://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-1536x1267.png 1536w, https://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-2048x1689.png 2048w, https://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-480x396.png 480w, https://redmonk.com/sogrady/files/2024/03/lang.rank_.q124.wm_-760x627.png 760w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
<p>1 JavaScript<br />
2 Python<br />
3 Java<br />
4 PHP<br />
5 C#<br />
6 TypeScript<br />
6 CSS<br />
8 C++<br />
9 Ruby<br />
10 C<br />
11 Swift<br />
12 Go<br />
12 R<br />
14 Shell<br />
14 Objective-C<br />
16 Scala<br />
17 Kotlin<br />
18 PowerShell<br />
19 Rust<br />
19 Dart</p>
<p>The top 20 languages cohort was not devoid of movement this run, but it was more static than not as we’ve come to expect in recent years. There was no movement in the top 5 languages, and less than a third of the top 20 languages moved at all. For better or for worse, these metrics reflect an environment resistant to change.</p>
<p>The last analysis of these numbers <a href="https://redmonk.com/sogrady/2023/05/16/language-rankings-1-23/">considered</a> the possibility that coding assistants had the potential to ease and speed the uptake of new languages by lowering barriers to entry and speeding ramp up time in adopting new, unfamiliar languages. If that’s occurring at present, however, it’s not observable in the data. We see some new language entrants, and we have a list of languages of interest from our qualitative research, but to date these rankings offer minimal evidence of any AI-fueled acceleration of new language adoption.</p>
<p>With that, some results of note:</p>
<ul>
<li><p><strong>Go</strong>: two years ago at this time, Go had lost its forward momentum and fallen back into 16th place, and it wasn’t clear what emerging use cases might reverse its fortunes. As of this run, it is up four spots over that span, placing it in a tie with the steady R language at 12. There’s no obvious reason for this resurgence, but anecdotal evidence suggests that Go’s blend of accessibility, design and performance has rendered it a reasonable language of compromise for back end development use cases.</p>
</li>
<li>
<p><strong>TypeScript</strong>: ever since it became the first new entrant to our top 10 since Swift – which managed the feat for one brief quarter – TypeScript has been the language to watch. Its upward trajectory has slowed over the course of its ascent, as these metrics are accretive in nature, but this quarter’s run continues a period of slow but steady growth as it moves up one spot from seven to six. With an ever increasing industry focus on security as well, it’s not implausible that the language has further growth yet in front of it. It’s arguable that some of that growth came at the expense of our next language, in fact.</p>
</li>
<li>
<p><strong>C++</strong>: in the initial incarnation of these rankings, C++ debuted at 7. It climbed as high as 5 at times, and recently had settled back into seventh place. For this run, C++ dropped to eighth place for the first time in the history of these rankings. It’s important to note at this point that top 10 languages are, relatively speaking, enormously popular and have achieved a ranking that dozens if not hundreds of other languages would envy. All of that said, it’s worth asking whether C++ can sustain its back end popularity in the face of the influx of more modern and accessible lower level languages like Go or Rust.</p>
</li>
<li>
<p><strong>Dart / Kotlin / Rust</strong>: speaking of Rust, the notable thing about it as well as Dart and Kotlin, as has been true in recent quarters, is that there is no news. All three of these languages have outperformed their various peers to achieve entry into our top 20, but none have been able to capitalize on their newfound popularity to resume their upward trajectory in the manner of TypeScript. The incumbents ahead of them have proven hard to displace, and there is also emerging competition for some of the languages from the likes of Zig as we’ll come back to.</p>
</li>
<li>
<p><strong>Swift</strong>: even in a long list of languages that have been static in their progress, Swift stands out, having not moved one spot in either direction in six years. To put that in context, the last time Swift wasn’t 11 in our rankings Google’s <a href="https://blog.research.google/2017/08/transformer-novel-neural-network.html">transformer paper</a> was merely one quarter into its journey towards upending the technology industry. On the one hand, as noted above, this means Swift is enormously popular as a language. On the other hand, its ranking and the lack of objective evidence to the contrary seems to prove that the efforts to make Swift more of a force for server side applications have failed.</p>
</li>
<li>
<p><strong>Bicep (86), Grain, Moonbit, Zig (97)</strong>: as with the Dart/Kotlin/Rust grouping above, these languages are grouped here not because they’re all technically similar but rather because they are on the languages of interest list mentioned above. They are included here for a variety of reasons: Zig is on here because it has attempted to learn from the languages that preceded it from C++ to Rust. Grain and Moonbit are on here, meanwhile, because they are optimized for <a href="https://redmonk.com/sogrady/2023/01/18/wasm/">WebAssembly</a>. Bicep is on here because it comes up with surprising frequency – and a mildly surprising ranking – for a cloud DSL. Only two of these languages are currently ranked, but we’re watching all of them to see if these or any other new languages begin to emerge.</p>
</li>
</ul>
<p><strong>Credit</strong>: My colleague <a href="https://redmonk.com/rstephens/">Rachel Stephens</a> wrote the queries that are responsible for the GitHub axis in these rankings. She is also responsible for the query design for the Stack Overflow data.</p>
]]></content:encoded>
</item>
<item>
<title>AI: The Difference Between Open and Open Source</title>
<link>https://redmonk.com/sogrady/2024/02/26/ai-open-source/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Mon, 26 Feb 2024 15:34:56 +0000</pubDate>
<category><![CDATA[AI]]></category>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6028</guid>
<description><![CDATA[There is a pattern in AI that has become clear. Every few weeks – or more accurately at this point, days – there is a release of a new AI model. The majority of these models are not released under an actual open source license, but instead under licensing that imposes varying restrictions based on]]></description>
<content:encoded><![CDATA[<p><a href="https://venturebeat.com/ai/meet-smaug-72b-the-new-king-of-open-source-ai/"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x-1024x941.png" alt="" width="1024" height="941" class="aligncenter size-large wp-image-6029" srcset="https://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x-1024x941.png 1024w, https://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x-300x276.png 300w, https://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x-768x705.png 768w, https://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x-1536x1411.png 1536w, https://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x-480x441.png 480w, https://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x-683x627.png 683w, https://redmonk.com/sogrady/files/2024/02/CleanShot-2024-02-26-at-10.23.06@2x.png 1548w" sizes="(max-width: 1024px) 100vw, 1024px" /></a><br />
There is a pattern in AI that has become clear. Every few weeks – or more accurately at this point, days – there is a release of a new AI model. The majority of these models are not released under an actual open source license, but instead under licensing that imposes varying restrictions based on intended use cases, revenue limits, user counts or other limiting factors. In most cases the project authors explicitly and counterfactually describe the released artifacts as open source and a credulous press accepts these statements at face value with no pushback. End users, for their part, having been assured by the authors and press covering the releases that the project is open source, assume that it is. Up to this point, after all, projects that were described as open source generally were, and therefore allowed no arbitrary restrictions, and thus users did not have to pay much attention.</p>
<p>Each of these incidents acts to dilute the term open source, and thus weaken it.</p>
<p>Some would excuse if not actively condone this behavior because when it comes to the question of what is open source AI, the answer is we don’t know yet. It is not clear, at present, precisely what the term open source means in the context of AI. There is no industry consensus, and the primary, underfunded defender of the term is still <a href="https://discuss.opensource.org/t/a-new-draft-of-the-open-source-ai-definition-v-0-0-5-is-available-for-comments/174">working on</a> a definition.</p>
<p>The implicit assertion of those who would defend describing assets as open source when they objectively are not is that the blame should go not to bad actor authors, but rather to the OSI. If only its definition had been available, the reasoning goes, these parties deliberately and willfully misusing the term open source would be more respectful.</p>
<p>This position ignores some obvious challenges. Most obviously, defining open source with respect to AI is an enormous industry challenge. It is not clear, for example, that copyright – the fundamental intellectual property mechanism open source licensing is based on – can be applied to embeddings and other abstruse, numerical portions of released projects. And while the open source <a href="https://opensource.org/osd">definition</a> was designed in an era where the source code was all that mattered, it is but one small piece of an AI model. What, then, should a definition in an AI era require of project authors to ensure the same rights to an end user? How encompassing should it be? And what are the downstream implications of that? A project trained on massive datasets stretching across the internet, as but one example, is clearly not going to be able to convey that as part of its release.</p>
<p>But it’s not just that defining open source is difficult. Those who would blame the OSI for the repeated misuse of the term open source with respect to AI models are ignoring a simple truth: that while we can’t yet say what open source is, precisely, with respect to AI, it’s easy to tell what <em>it is not</em>.</p>
<p>It is true that we do not yet understand what the scope of an open source AI license might be, and whether it touches on training data or whether weights, parameters and embeddings are sufficient. We can say with confidence, however, that licenses that impose artificial use restrictions based on the user counts and revenue limits mentioned above will not qualify for this definition.</p>
<p>It is possible, therefore, to be respectful of the term open source and its specific meaning even in the absence of a definition that applies to models. And it’s possible to do so in a manner in which full credit is still received for making assets open rather than keeping them private and proprietary. We know this is possible because this is precisely what Google has done with Gemma.</p>
<p>Released last week, Gemma is a pair of small but high performing models from Google intended to compete with the likes of Meta’s Llama. Like Llama, Gemma is an open model. Unlike Meta, however, which falsely claimed that Llama was open source, Google was careful to state that while Gemma is open, it is not open source.</p>
<p>Their reasoning is <a href="https://opensource.googleblog.com/2024/02/building-open-models-responsibly-gemini-era.html">as follows</a>:</p>
<blockquote><p>
We’re precise about the language we’re using to describe Gemma models because we’re proud to enable responsible AI access and innovation, and we’re equally proud supporters of open source. The definition of “Open Source” has been invaluable to computing and innovation because of requirements for redistribution and derived works, and against discrimination. These requirements enable cross-industry collaboration, individual innovation and entrepreneurship, and shared research to happen with exponential effects.</p>
<p> However, existing open-source concepts can’t always be directly applied to AI systems, which raises questions on how to use open-source licenses with AI. It’s important that we carry forward open principles that have made the sea-change we’re experiencing with AI possible while clarifying the concept of open-source AI and addressing concepts like derived work and author attribution.
</p></blockquote>
<p>The gist, in other words, is that while we don’t yet know what open source AI is, we do know what it isn’t.</p>
<p>This articulation and branding are important – vitally so – for the long term health of the term open source and, thereby, the industry. But note that it comes at no cost to Google. There is no ambiguity or uncertainty about whether the model is open and available: it has been described and received as such. “Open model” conveys precisely what it needs to, and makes no promises it cannot fulfill. Unfortunately, the press has not yet internalized the difference between open and open source that Google so clearly articulated, and took it upon themselves to apply to Gemma <a href="https://www.theverge.com/2024/2/21/24078610/google-gemma-gemini-small-ai-model-open-source">the term open source</a> that Google so assiduously declined to apply itself.</p>
<p>Unfortunate as that may be, Google should be commended for its behavior here, for doing the right thing by open source and for providing a clear path that with luck, others may follow.</p>
<p>Open is good. The industry succeeds and is driven forward when groundbreaking new models are released and made available. But for the health of open source and the industry as a whole, it’s important to choose our words carefully and to understand that while open is good, open is not open source.</p>
<p><strong>Disclosure</strong>: Google is a RedMonk customer. Meta and the OSI are not currently RedMonk customers.</p>
]]></content:encoded>
</item>
<item>
<title>The Return of the Pendulum: Consolidation is Here</title>
<link>https://redmonk.com/sogrady/2024/02/07/pendulum-return/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 07 Feb 2024 17:56:29 +0000</pubDate>
<category><![CDATA[AI]]></category>
<category><![CDATA[Cloud]]></category>
<category><![CDATA[Databases]]></category>
<category><![CDATA[Trends & Observations]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6024</guid>
<description><![CDATA[From its birth through the first few years following the turn of the century, technology was most properly considered as a centralized industry. Dominated by a smaller number of large players and their monolithic, relatively homogenous platforms, it was predictable in its progression and characterized by a steady, stately pace. There were disruptions and revolutions,]]></description>
<content:encoded><![CDATA[<p><a data-flickr-embed="true" href="https://www.flickr.com/photos/aidanmorgan/10712619536" title="Pendulum"><img loading="lazy" decoding="async" src="https://live.staticflickr.com/7341/10712619536_ae1853c9b3_k.jpg" width="2048" height="1365" alt="Pendulum"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script></p>
<p>From its birth through the first few years following the turn of the century, technology was most properly considered as a centralized industry. Dominated by a smaller number of large players and their monolithic, relatively homogenous platforms, it was predictable in its progression and characterized by a steady, stately pace. There were disruptions and revolutions, of course – the personal computer era being perhaps the most notable of these – but overall, the landscape was largely coalesced around a small number of vendors and their respective technologies.</p>
<p>Around the turn of the new millennium, however, a variety of macro market pressures were building. What would come to be called open source, for one. The internet, for another. Twelve years after its own birth, meanwhile, one of the original internet pioneers gave birth to yet one more market moving factor: the cloud. These and a myriad of other pressures, trends and developments combined to start history’s pendulum on a reverse trajectory.</p>
<p>Seemingly overnight, the wider technology market developed not only a tolerance of but appetite for specialized technologies – specialized technologies that could only come from a more decentralized, distributed technology industry. So it was that the small stable of big tech companies gave way to a new, far more crowded landscape of players. As but one example, what had been staid, heavily centralized markets like databases blew up into the fragmentation that was NoSQL. One software category populated by a small number of relevant players became a half dozen or more categories each of which had four or more relevant vendors of its own.</p>
<p>As Ashlee Vance <a href="https://www.nytimes.com/2010/06/21/technology/21chip.html?_r=1&src=busln">quoted</a> Andrew Feldman, then CEO of SeaMicro (acquired by AMD in 2012) in 2010:</p>
<blockquote><p>
“There is a foment happening. It’s a bubbling of ideas and technology.”
</p></blockquote>
<p>None of the above should be controversial, or particularly open to dispute. No one can credibly question the assertion that the market, on the whole, swung towards specialization. The only real question was when the pendulum would swing back in the other direction. When, that is, would the market again advantage general purpose offerings, with consolidation as its direct, inevitable consequence?</p>
<p>Based on the evidence at hand, the answer appears to be now.</p>
<p>Two and a half years ago, it was noted <a href="https://redmonk.com/sogrady/2021/10/26/general-purpose-database/">in this space</a> that the database market – the same market that was aggressively and forcefully decentralized beginning around 2009 – was beginning to consolidate. In principles consistent with <a href="https://redmonk.com/sogrady/2020/06/10/convergent-evolution-cdns-cloud/">convergent evolution</a>, specialized database players began to add general purpose, adjacent features to expand their addressable market and to satisfy customers seeking to reduce their vendor overhead. General purpose relational databases, meanwhile, increased their ability to compete with their more specialized counterparts by adding specific areas of capability – the ability to ingest and operate on JSON, for example.</p>
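<p>To make that convergence concrete, consider a minimal sketch using SQLite – chosen here purely for illustration – in which a conventional relational table ingests and queries JSON documents directly. The table and field names are invented for the example.</p>
<pre><code># Minimal sketch of a relational database operating on JSON documents,
# using SQLite's built-in JSON functions. Table and field names are
# invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO events (body) VALUES (?)",
    ('{"service": "checkout", "latency_ms": 412}',),
)

# Plain SQL reaching inside the JSON document: a specialized capability
# absorbed by a general purpose engine.
row = conn.execute(
    "SELECT json_extract(body, '$.service') FROM events"
).fetchone()
print(row[0])  # checkout
</code></pre>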
<p>That was an interesting development for the database market, certainly, but if it was an isolated development its significance would be limited. Instead, however, this type of functional consolidation is rippling outwards and impacting category after category. The APM, logging and observability markets, for example, have been aggressively merging for years. Developer toolchains, once made up of independent, best of breed components from version control to issue tracking to CI/CD to vulnerability scanning, are driving towards native, pre-integrated experiences offered by a single party. Developers individually, meanwhile, have been clamoring for Heroku-like platforms that abstract all of the underlying plumbing away, leaving them with only the simple task of pushing code to it. CDNs become more general purpose, clouds become better CDNs. One time areas of high visibility and focus like operating systems, virtual machines or container orchestration platforms are gradually fading from sight, obscured beneath multiple layers of abstraction and consolidation. Where RedMonk’s language rankings were once somewhat dynamic, characterized by growth, sometimes rapid, of late they have been more static and reflective of minimal movement.</p>
<p>From individuals to markets, the inevitable outcome of having too many choices is a desire to make fewer choices. Consolidation and centralization are real, and they are here, observable in most markets today with one notable exception which we’ll come back to.</p>
<p>Some will likely argue that, like every other aspect of life today, fragmentation was a zero interest rate phenomenon. And inarguably the availability of free money contributed to a world with more startups. But to point to ZIRP as the sole or even primary driver of fragmentation and decentralization is reductive, and ignores the reality that this trend was always inevitable independent of interest rates, because history demonstrates conclusively that pendulums that swing in one direction will inevitably, at some point, swing in the other.</p>
<p>The question facing us, then, is not whether the pendulum has started to reverse its course, but what that swing means. The implications are wide, but a few conclusions seem obvious:</p>
<ol>
<li>General purpose, more broadly capable platforms will increasingly have an advantage. Specialized players must not only be better than their general purpose competitors to survive, they must be significantly better. </li>
<li>For specialized players who are unable or unwilling to broaden their functional surface areas, in addition to being technically superior they need to significantly increase their ability to partner. Multiple complementary and specialized third party offerings are going to be facing customers preferring single vendor, general purpose systems and they must mitigate their fragmented nature through superior technical integration, yes, but also non-technical, operational excellence via partnered pricing, joint go to markets, packaging, co-marketing and so on. </li>
<li>One of general purpose platforms’ advantages, in theory if not always in practice, is in experience. No need for multiple logins, for example. No need for the cognitive overhead of multiple user interfaces, and so on. Combined with the wider market’s current preference for and emphasis on developer experience, this implies that specialized players must heavily emphasize experience in their own offerings – and again, potentially mitigate some of the weaknesses there via partnerships. </li>
<li>Specialized vendors should be prepared to answer customer questions that have been uncommon in these recent years of fragmentation: why should I choose a specialized offering? Why shouldn’t I just use this easy general purpose platform even if it’s not as capable? Why should my procurement team have to deal with multiple vendors when I can just manage one? </li>
<li>Merger and acquisition work is ahead – probably a great deal of it. While the wider US economy, at least, is performing above expectations and interest rates are beginning to come down, it’s a challenging time for financing for everyone except AI companies. The combination of larger players seeking to broaden their capability footprint and smaller players finding it challenging to compete with larger, more broadly capable platforms is likely to lead to some of the former acquiring some of the latter. </li>
</ol>
<p>Speaking of AI, however, that is the notable exception to this trend mentioned above. If anything, fragmentation in the AI world is expanding, not contracting. New models arrive by the day, as do new use cases and the vendors that would target them. This is likely to continue in the near term, but there are already signs that AI is poised to be impacted by precisely the same trends currently being experienced by other software categories. AI is new, so its period of frothy experimentation is tolerated, but there will come a time when customers, and even most developers, tire of having to continually make choices about models and otherwise, and default to whatever general purpose platform they’ve imprinted on.</p>
<p>Fourteen years ago, <a href="https://redmonk.com/sogrady/2010/07/09/specialization/">this</a> was the observation regarding the arrival and onset of specialization.</p>
<blockquote><p>
As to when the pendulum will shift back towards general purpose, all I can tell you is that it will at some point. The inevitable result of an explosion of choice is a reactionary market shift away from it. All this has happened before, and all of this will happen again.
</p></blockquote>
<p>Today, in 2024, that same question remains. When will the pendulum shift back to specialization? The answer is that we don’t know, but we can confidently state that it will. It always does.</p>
]]></content:encoded>
</item>
<item>
<title>Re-founding, reInventing and the Future of AWS</title>
<link>https://redmonk.com/sogrady/2023/12/06/reinvent-2023/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 06 Dec 2023 15:07:38 +0000</pubDate>
<category><![CDATA[AI]]></category>
<category><![CDATA[Cloud]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6014</guid>
<description><![CDATA[Four weeks ago at the company’s Universe event, in a move that proved controversial in some corners, GitHub CEO Thomas Dohmke announced that the company was now “re-founded on Copilot.” This was a bold statement that some apparently viewed as a potential abandonment of Git and its founding principles, which alarmed those with little to]]></description>
<content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur.png" alt="" width="1024" height="1024" class="aligncenter size-full wp-image-6015" srcset="https://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur.png 1024w, https://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur-300x300.png 300w, https://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur-150x150.png 150w, https://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur-768x768.png 768w, https://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur-480x480.png 480w, https://redmonk.com/sogrady/files/2023/12/DALL·E-2023-12-06-09.49.13-A-dark-gritty-android-in-a-less-futuristic-more-dystopian-setting.-The-android-has-a-rugged-metallic-body-showing-signs-of-wear-and-tear.-It-featur-627x627.png 627w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
<p>Four weeks ago at the company’s Universe event, in a move that proved controversial in some corners, GitHub CEO Thomas Dohmke announced that the company was now “re-founded on Copilot.” This was a bold statement that some apparently viewed as a potential abandonment of Git and its founding principles, which alarmed those with little to no interest in having a generative AI system assist their development. The reality was presumably less about deprecation, however, and more a reflection of a simple truth: one way or another, AI is going to play a role in most developers’ workflows moving forward, and the company’s products are going to reflect that.</p>
<p>Fast forward to last week. AWS’ reInvent, the name notwithstanding, made no similarly grandiose and explicit promises that the company would be rebuilt on AI. But it didn’t have to. The keynotes spoke volumes on that score.</p>
<p>Normally a sprawling event announcing hundreds of disparate services with no real larger narrative attempting to stitch them together other than messages like “primitives not frameworks,” reInvent this year was an AI event first and everything else second. With the notable exception of Werner Vogels’ Thursday closing session, the rest of the keynotes were dominated by AI related subjects.</p>
<p>This was understandable for two reasons. First, and most obviously, as GitHub’s event demonstrated only weeks prior, AI is currently reshaping our industry with a speed and breadth that is arguably without precedent this century. Virtually every event, then, is now an AI event in some way. Less obviously, however, AWS’ overwhelming focus on AI can be best understood as a response to an industry perception. For the past seventeen years, AWS has paced the industry, kicking off wave after wave of technical innovation from cloud compute and storage to managed databases to function as a service and serverless to, well, the list is long.</p>
<p>GitHub’s Copilot, with all due respect to startups like TabNine that paved the way, was the first offering from a well-resourced hyperscale subsidiary to catch mainstream attention following its launch in October of 2021. But it was OpenAI’s ChatGPT, launched a little less than twelve months ago, that really blew the doors off. There’s a reason that technology set records for adoption, hitting 100M users in two months, and that reason is that the technology was a revelation.</p>
<p>In contrast to prior technical waves RedMonk has observed such as containers, distributed version control, NoSQL or even cloud, conversational AI is relevant – or at least potentially relevant – to virtually every employee in an organization, regardless of role. Containers grew explosively while only really being technically relevant to developers and operators in the early days. ChatGPT had no such limitations; it could write marketing or sales copy, if imperfectly, as well as output code. The early versions were clunky – it answered the first question asked here incorrectly, but worse, very subtly so – and even the latest iterations still make basic mistakes. But the potential was clear then, and it’s clear now. And the systems are getting better quickly, as evidenced by the fact that they now know how many World Championships the Red Sox have won this century.</p>
<p>The problem, then, is that nowhere in the above two paragraphs briefly describing the arc of innovation we find ourselves in does the abbreviation AWS appear. For the first time in nearly two decades, AWS found itself on the outside looking in for what may prove to be the most transformative wave of technology adoption since the internet itself. Never mind that Google – long regarded as having some of the best talent and tools in the artificial intelligence space, not least because it published the <a href="https://blog.research.google/2017/08/transformer-novel-neural-network.html">paper</a> that made ChatGPT possible five years before it was released – found itself in the same position. AWS was used to being the 800 pound gorilla, and if that’s the role you’re accustomed to, being a bystander is unacceptable.</p>
<p>So AWS got to work and did what AWS does, which is spin up new services with a speed that is the envy of the industry. reInvent represented the company’s best opportunity to tell said industry about its vision for AI moving forward, and how it would impact the company and its customers.</p>
<p>To judge from keynote airtime, at least, that vision is centered around models. Specifically, the choice of models. There’s not much debate that different models will have different strengths and weaknesses. A model designed for vision, for example, is likely to be better at that task than one built to generate music. Training data, model design and refinement all end up producing both models that can handle a variety of tasks and those that are more tightly specialized. In addition to capabilities, users – or at least their employers – will have other concerns about models. What are they trained on? How will my data be shared and used, and how do I know it won’t be exfiltrated? How fast is it? How much does it cost? And so on – there will be questions about models.</p>
<p>reInvent, however, represented a rather large bet, at least from an airtime perspective, that models will be the primary concern, not a secondary consideration. On paper, this seems to make sense. AWS, dating back to last year’s Bedrock announcement, has attempted to differentiate itself from the market, which is to say primarily from Microsoft and OpenAI, by offering model choice versus their more closed approach.</p>
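<p>For the curious, the following is a minimal, hypothetical sketch of what that model choice looks like in practice, assuming Bedrock’s runtime API of the time. The model ID and request body are illustrative; each provider hosted on Bedrock defines its own request format.</p>
<pre><code># Hypothetical sketch: many models behind one Bedrock runtime API.
# Model IDs and body shapes below are examples, not a reference.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude v2's text-completion format; other providers differ.
response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({
        "prompt": "\n\nHuman: Summarize our cloud spend by service.\n\nAssistant:",
        "max_tokens_to_sample": 300,
    }),
)
print(json.loads(response["body"].read())["completion"])
</code></pre>
<p>The operative detail is the <code>modelId</code> parameter: at the API level, at least, substituting another hosted model is a one-line change, which is precisely the flexibility AWS is selling.</p>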
<p>It’s not clear, however, that that’s going to be the selling point that AWS seems to believe it will be. Much still depends on how these technologies are adopted, of course, and that is an open question. The sheer growth rates of tools like ChatGPT (<a href="https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/">0-100M</a> users in two months) and Copilot (<a href="https://www.theinformation.com/briefings/microsoft-github-copilot-revenue-100-million-ARR-ai">$0-$100M</a> run rate in two years) are evidence that these tools are popular with practitioners, as are the examples too numerous to count of teams explicitly forbidden from using them that are using them anyway.</p>
<p>The leadership of these organizations, meanwhile, is either evaluating these tools at present or being pushed to by boards that are reading breathless AI headlines in even mainstream papers on a daily basis and asking, “what’s our AI strategy?” The question, therefore, is what happens when most or all of a developer population is already using one or more tools and an organization’s leadership wants them to use a different one.</p>
<p>The history of AWS itself may perhaps be instructive here. At reInvent in 2018, then CEO Andy Jassy <a href="https://redmonk.com/sogrady/2018/12/12/three-takeaways-reinvent-2018/">said</a> the following:</p>
<blockquote><p>
When we got started, we noticed that developers were being ignored – I’m not sure why, maybe it’s because they didn’t have money, and they were largely constrained in terms of what they could use.
</p></blockquote>
<p>What Jassy was suggesting was that AWS’ success was due at least in part to fleets of developers imprinting on AWS cloud offerings, making their adoption within an organization something of a fait accompli. In an era of developer empowerment, it’s difficult and rarely wise to attempt to impose technologies on unwilling developer populations top down.</p>
<p>If we assume that developers will have some and potentially a major role in adoption of generative AI technology, then, the logical question to ask is how much they care about models. The answer is complicated. On the one hand, hubs like Hugging Face are vibrant communities with large, active developer populations. Populations that are there to engage around models, to tinker and to seek out with precision options that fit their particular, and often very niche, use cases.</p>
<p>On the other hand, choice can be a burden, and if faced with a decision between a tool they’ve become accustomed to such as ChatGPT or Copilot and the choice of a half dozen or, more problematically, thousands of models, most seem to opt for the devil they know. After testing Google’s Bard vs ChatGPT recently, for example, one developer determined that, in his opinion, Bard provably wrote better Python. He did not, however, switch to Bard because of this. He remained with ChatGPT, in his words, because it had his long chat history, because he was used to it, and because he didn’t just write Python. There’s a curiosity about models, clearly, but that doesn’t necessarily correlate with practical usage, and there’s a difference between tinkering with open source models on the side and leveraging generative systems in your day to day workflow.</p>
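<p>To make the tinkering half of that equation concrete, a hypothetical couple of lines with Hugging Face’s <code>transformers</code> library is all it takes to pull down and run one of the hub’s thousands of specialized models; the checkpoint named here is one example among many.</p>
<pre><code># Illustrative only: fetch a small, task-specific model from the hub
# and run it locally. Requires: pip install transformers torch
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
# Prints a list like [{'label': ..., 'score': ...}]
print(classifier("Choice is great until you have to make one."))
</code></pre>
<p>That ease of experimentation is real, but as the anecdote above suggests, it does not automatically translate into a willingness to abandon a general purpose tool for daily work.</p>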
<p>In some ways, AWS is following in Google’s footsteps. Just as Google chose to compete with a clear and closed market leader in Apple with openness via the Android platform, reInvent was AWS’ attempt to sway executives on the basis of its greater choice and flexibility with models versus the hermetically sealed Microsoft/OpenAI continuum. Mobile devices and generative AI infrastructure are very different markets, of course, but the dynamics of how iOS and Android competed are notable nevertheless.</p>
<p>To be clear, AWS’ open model play will have some success, as organizational leaders have real concerns and trust issues with Microsoft and OpenAI, and Google to date is not a more open alternative. But if developers are uninterested in and unimpressed by AWS’ extended discussions of the potential benefits of its myriad available models (particularly the more enterprise focused of these, like privacy or data sovereignty), might the company risk losing their hearts and minds to those providing a vision of AI that is focused on the actual experience of using the tools, not the background models that they can’t see?</p>
<p>Perhaps the most interesting aspect of this outsized focus on models at reInvent is that there was a ready alternative to hand, the questionably named new product Q. Q was of course announced at the show, to much fanfare and acclaim, but got far less time on stage than the plethora of other models and options the company had on display and far less time than one might have expected. Given the timeframes involved, it seems probable that Q was built very quickly – and potentially, given some of its <a href="https://futurism.com/the-byte/amazon-ai-severe-hallucinations">reported hallucinations</a>, too quickly. But the market has come to understand that hallucinations are par for the course for new AI products, and are the very definition of a solvable problem. While they make for problematic headlines, then, they don’t suggest much about the product’s future. And the product’s future could be interesting indeed.</p>
<p>As has been documented both <a href="https://redmonk.com/sogrady/2021/12/10/reinvention/">here</a> and <a href="https://vickiboykis.com/2022/12/05/the-cloudy-layers-of-modern-day-programming/">elsewhere</a> repeatedly, AWS’ velocity has inevitably resulted in a surplus of options; a sprawling catalog of services that is increasingly too much for developers to navigate without assistance. Q was built, in part, to provide that assistance. It may not be quite up to the job yet, but it offers a vision of how it could be.</p>
<p>In recent years the market has experienced something of a renaissance in what were once referred to as PaaS platforms. From Cosmonic to Fermyon to Fly.io to Netlify to Render to Vercel, the appetite for abstractions above base cloud primitives is growing, and growing quickly. What’s clear from the market is that there is unlikely to be a one-size-fits-all, general purpose PaaS platform capable of addressing an entire market. Instead, multiple specialized platforms with individual areas of focus and specialty have emerged. What’s less clear at this point is whether, and how, a conversational AI platform might emerge as another credible option in that space. Is there a future in which Q, for example, could shield users from a growing variety of development, implementation and operational tasks and serve as a de facto PaaS or something like it? Not for every workload, clearly, but then none of the would-be next generation platforms can offer full coverage. That seems at least possible, and if executed would represent a potential solution to one of the company’s greatest current challenges: the size, scope and breadth of its own product catalog.</p>
<p>But that was not the vision that AWS chose to emphasize at its annual event. Whether that was out of a realization that Q wasn’t fully baked yet or out of a genuine belief that choice and optionality of models will be a compelling, market moving differentiator wasn’t clear. As competitors like the GitHub, Microsoft and OpenAI combination continue to push the boundaries of imbuing every aspect of the developer experience with AI, however, it will be a strategically important question for AWS to answer.</p>
<p><strong>Disclosure</strong>: AWS, GitHub, Google, Microsoft, Render, and Vercel are RedMonk clients. Cosmonic, Fermyon, Fly.io, Hugging Face and Netlify are not currently RedMonk clients.</p>
]]></content:encoded>
</item>
</channel>
</rss>