Congratulations!

[Valid RSS] This is a valid RSS feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Source: http://redmonk.com/sogrady/feed/

  1. <?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
  2. xmlns:content="http://purl.org/rss/1.0/modules/content/"
  3. xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  4. xmlns:dc="http://purl.org/dc/elements/1.1/"
  5. xmlns:atom="http://www.w3.org/2005/Atom"
  6. xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  7. xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
  8. xmlns:georss="http://www.georss.org/georss"
  9. xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  10. >
  11.  
  12. <channel>
  13. <title>tecosystems</title>
  14. <atom:link href="https://redmonk.com/sogrady/feed/" rel="self" type="application/rss+xml" />
  15. <link>https://redmonk.com/sogrady</link>
  16. <description>because technology is just another ecosystem</description>
  17. <lastBuildDate>Wed, 18 Jun 2025 17:10:20 +0000</lastBuildDate>
  18. <language>en-US</language>
  19. <sy:updatePeriod>
  20. hourly </sy:updatePeriod>
  21. <sy:updateFrequency>
  22. 1 </sy:updateFrequency>
  23. <generator>https://wordpress.org/?v=6.7.1</generator>
  24. <item>
  25. <title>The RedMonk Programming Language Rankings: January 2025</title>
  26. <link>https://redmonk.com/sogrady/2025/06/18/language-rankings-1-25/</link>
  27. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  28. <pubDate>Wed, 18 Jun 2025 17:10:20 +0000</pubDate>
  29. <category><![CDATA[Programming Languages]]></category>
  30. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6076</guid>
  31.  
  32. <description><![CDATA[This iteration of the RedMonk Programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of developer communities where you can join and learn more about building modern applications in your preferred language. Even by our standards, dropping the Q1 programming language rankings the same month we run the Q3]]></description>
  33. <content:encoded><![CDATA[<blockquote><p>
  34.  This iteration of the RedMonk Programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of <a href="https://aws.amazon.com/developer/community">developer communities</a> where you can join and learn more about building modern applications in your preferred language.
  35. </p></blockquote>
  36. <p>Even by our standards, dropping the Q1 programming language rankings the same month we run the Q3 numbers is quite the delay. While the usual travel and school vacation delays have applied, the drawn-out process in this case is deliberate on our part. As has been discussed in recent iterations of these rankings, the arrival of AI has had a significant and accelerating impact on Stack Overflow, which accounts for one half of the data used to both plot and rank languages twice a year.</p>
  37. <p>My colleague Rachel has been studying this impact in detail and has more on it <a href="https://redmonk.com/rstephens/2025/06/18/stackoverflow">here</a>, but for our purposes it’s enough to know that Stack Overflow’s value from an observational standpoint is not what it once was, and that has a tangible impact as we’ll see. Still to be determined on our end is whether Stack Overflow should continue to be used, and if not what a reasonable alternative might be. Stay tuned for more details on that front when we get to the Q3 rankings, which will presumably be in Q1 of next year.</p>
  38. <p>In the meantime, however, as a reminder, this work is a continuation of the work originally performed by Drew Conway and John Myles White late in 2010. While the specific means of collection has changed, the basic process remains the same: we extract language rankings from GitHub and Stack Overflow, and combine them for a ranking that attempts to reflect both code (GitHub) and discussion (Stack Overflow) traction. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion and usage in an effort to extract insights into potential future adoption trends.</p>
  39. <h2>Our Current Process</h2>
  40. <p>The data source used for the GitHub portion of the analysis is the GitHub Archive. We query languages by pull request in a manner similar to the one GitHub used to assemble the State of the Octoverse. Our query is designed to be as comparable as possible to the previous process.</p>
  41. <ul>
  42. <li>Language is based on the base repository language. While this continues to have the caveats outlined below, it does have the benefit of cohesion with our previous methodology.</li>
  43. <li>We exclude forked repos.</li>
  44. <li>We use the aggregated history to determine ranking (though based on the table structure changes this can no longer be accomplished via a single query.)</li>
  45. <li>For Stack Overflow, we simply collect the required metrics using their useful data explorer tool.</li>
  46. </ul>
  47. <p>With that description out of the way, please keep in mind the other usual caveats.</p>
  48. <ul>
  49. <li>To be included in this analysis, a language must be observable within both GitHub and Stack Overflow. If a given language is not present in this analysis, that’s why.</li>
  50. <li>No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.</li>
  51. <li>There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.</li>
  52. <li>All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.</li>
  53. <li>In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.</li>
  54. <li>Languages that have communities based outside of Stack Overflow such as Mathematica will be under-represented on that axis. It is not possible to scale a process that measures one hundred different community sites, both because many do not have public metrics available and because measuring different community sites against one another is not statistically valid.</li>
  55. </ul>
  56. <p>With that, here is the first quarter plot for 2025.</p>
  57. <p><a href="http://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_.png"><img fetchpriority="high" decoding="async" src="http://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-1024x844.png" alt="" width="1024" height="844" class="aligncenter size-large wp-image-6077" srcset="https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-1024x844.png 1024w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-300x247.png 300w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-768x633.png 768w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-1536x1266.png 1536w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-2048x1688.png 2048w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-480x396.png 480w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-761x627.png 761w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
  58. <p>1 JavaScript<br />
  59. 2 Python<br />
  60. 3 Java<br />
  61. 4 PHP<br />
  62. 5 C#<br />
  63. 6 TypeScript<br />
  64. 7 CSS<br />
  65. 7 C++<br />
  66. 9 Ruby<br />
  67. 10 C<br />
  68. 11 Swift<br />
  69. 12 Go<br />
  70. 12 R<br />
  71. 14 Shell<br />
  72. 14 Kotlin<br />
  73. 14 Scala<br />
  74. 17 Objective-C<br />
  75. 18 PowerShell<br />
  76. 19 Rust<br />
  77. 20 Dart</p>
  78. <p>If you’re tracking from our last iteration of the rankings &#8211; and Rachel has the entire history of the Top 20 rankings charted <a href="https://redmonk.com/rstephens/2025/06/18/top20-jan2025">here</a> &#8211; the only change within our Top 20 languages is Dart dropping from a tie with Rust at 19 into sole possession of 20. In the decade and a half that we have been ranking these languages, this is by far the least movement within the Top 20 that we have seen. While this is to some degree attributable to a general stasis that has settled over the rankings in recent years, the extraordinary lack of movement is likely also in part a manifestation of Stack Overflow’s decline in query volume. As that longtime developer site sees fewer questions, it becomes less impactful in terms of driving volatility on its half of the rankings axis, and potentially less suggestive of trends moving forward. As mentioned above, we’re not yet at a point where Stack Overflow’s role in our rankings has been deprecated, but the conversations at least are happening behind the scenes.</p>
  79. <p>With that, some results of note:</p>
  80. <ul>
  81. <li><p><strong>TypeScript</strong> (6): even acknowledging the general lack of movement within the Top 20, it’s notable that TypeScript has effectively stalled just outside the Top 5. On the one hand, it can piggyback on the ubiquity of JavaScript while offering important safety provisions, but on the other, it has a reputation of not scaling particularly well. This reputation, in fact, has led Microsoft to reimplement the TypeScript compiler and tools in Go. The question now is whether this reimplementation will lead to greater performance, leading to greater adoption and more usage, or whether the fact that Microsoft felt it needed to be reimplemented in the first place could throw shade on the language. It will be interesting to watch, assuming we have data enough to observe any potential impact.</p>
  82. </li>
  83. <li>
  84. <p><strong>Kotlin</strong> (14) / <strong>Scala</strong> (14): both the JVM-based languages held their gains from our last ranking and it’s unclear what their prospects are for moving up more significantly. In 2015, when Go entered our rankings at 17, Scala was at 14 and jumped up briefly to 11 two years later. In 2023, however, Go passed Scala &#8211; having already been ranked above Kotlin &#8211; and has maintained that role ever since. And with Go finding new fans in companies like Microsoft and Rust making gains among other server side workloads, particularly those with security concerns, Kotlin and Scala’s growth paths are not assured.</p>
  85. </li>
  86. <li>
  87. <p><strong>Dart</strong> (20) / <strong>Rust</strong> (19): while Dart technically dropped one spot, that far down the rankings the actual differences are marginal at best. These two languages, which have little to nothing in common and are aimed at very different users and workloads, have tended to move in lockstep and this quarter’s run does not represent much of an exception in that regard.</p>
  88. </li>
  89. <li>
  90. <p><strong>Ballerina</strong> (64) / <strong>Bicep</strong> (79) / <strong>Grain</strong> / <strong>Moonbit</strong> / <strong>Zig</strong> (86): among the “languages we’re paying attention to” set, there was little more movement than within our Top 20, and for the most part movement among them was down. Grain and Moonbit remained unranked, while Ballerina dropped from 61 to 64 and Bicep dropped from 78 to 79. Zig, however, did manage to jump, if only one spot, from 87 to 86 &#8211; it probably does not hurt that Mitchell Hashimoto is a <a href="https://x.com/mitchellh/status/1841167210896900266?lang=en">major fan</a>. It is worth noting for these emerging languages, however, that they may be disproportionately impacted by Stack Overflow’s decline. In every case where the languages are ranked, they perform better within our GitHub rankings than they do within Stack Overflow’s: 62 vs 66 for Ballerina, 69 vs 73 for Bicep and 70 vs 83 for Zig. In Zig’s case in particular, then, it is possible that faster growth in code as measured by GitHub is being dragged down by the steep decline in query volume on Stack Overflow. Which is yet another reason why we’re carefully evaluating our options moving forward, but in the meantime we’ll keep all of these languages on our “to watch” list.</p>
  91. </li>
  92. </ul>
  93. <p><strong>Credit</strong>: My colleague Rachel Stephens wrote the queries that are responsible for the GitHub axis in these rankings. She is also responsible for the query design for the Stack Overflow data.</p>
  94. ]]></content:encoded>
  95. </item>
  96. <item>
  97. <title>Beyond Code: APIs as the Next OSS Battleground</title>
  98. <link>https://redmonk.com/sogrady/2025/06/09/open-source-apis/</link>
  99. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  100. <pubDate>Mon, 09 Jun 2025 19:55:36 +0000</pubDate>
  101. <category><![CDATA[Open Source]]></category>
  102. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6073</guid>
  103.  
  104. <description><![CDATA[On August 13th, 2010, Oracle sued Google over copyright and patent infringement claims relating to the reimplementation of the Java runtime within its Android platform. The suit took over a decade to resolve, and had several major twists and turns, but ultimately the Supreme Court decided in Google’s favor on April 5th, 2021. Among the]]></description>
  105. <content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279.jpg"><img decoding="async" src="http://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-683x1024.jpg" alt="" width="683" height="1024" class="aligncenter size-large wp-image-6074" srcset="https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-683x1024.jpg 683w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-200x300.jpg 200w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-768x1151.jpg 768w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-1025x1536.jpg 1025w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-480x720.jpg 480w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-418x627.jpg 418w" sizes="(max-width: 683px) 100vw, 683px" /></a></p>
  106. <p>On August 13th, 2010, Oracle sued Google over copyright and patent infringement claims relating to the reimplementation of the Java runtime within its Android platform. The suit took over a decade to resolve, and had several major twists and turns, but ultimately the Supreme Court decided in Google’s favor on April 5th, 2021. Among the items at stake in this trial was the question of whether APIs were copyrightable, which is another way of saying the immediate future of the technology industry hung in the balance.</p>
  107. <p>In its decision, the Supreme Court did not declare APIs immune from copyright, but rather held that Google’s use of the Java APIs constituted fair use. While it was not a total victory for those who would see APIs explicitly walled off from such concerns, it significantly raised the bar for legal challenges based on competitive usage of APIs. This was immediately relevant, as a loss would have almost certainly led to a widespread chilling effect across APIs industry-wide.</p>
  108. <p>But Google vs Oracle is also critical to what may be the next front in the ongoing conflict between open source and commercial open source: APIs.</p>
  109. <p>Those who have tracked popular open source projects such as PostgreSQL have likely heard a familiar observation amongst authors of the original project: that a database, for example, with Postgres API compatibility is not the same as a Postgres database. Databases that offer Postgres compatibility like AWS’ Aurora or Google’s AlloyDB, these fans argue, may not be fully compatible because of slight differences between the implementations, feature additions or omissions and more.</p>
  110. <p>What cannot be argued is that the API for a large, successful and widely adopted software project is an enormously valuable asset. What might be argued is that it is possible, in certain cases, that the API is more valuable than the underlying code it represents. The underlying code for an API can be and has been reimplemented in clean room settings, while the API must be a fixed point for developers.</p>
  111. <p>With large projects that are maintained by multiple third parties such as Postgres, the potential friction from API reimplementations is minimal. By virtue of being a project worked on by many commercial vendors, there is no real exclusivity offered or claimed by the API.</p>
  112. <p>The dynamics for single entity projects or open source projects developed primarily or solely by a single vendor, however, are quite another matter.</p>
  113. <p>For many years now, open source projects and database projects specifically have developed a pattern or lifecycle from a licensing standpoint. Initial development is conducted under a typically permissive open source license, in which control is traded for usage and distribution growth. Once certain usage thresholds are met, and attract commensurate funding &#8211; venture or otherwise &#8211; permissive licenses are discarded in favor of licenses offering much stronger protections, up to and past the edge of what the definition of open source permits. These licensing “rug pulls” may have eased somewhat, in that commercial vendors <a href="https://redmonk.com/sogrady/2025/05/06/oss-forward-back/">appear to be pulling back</a> from source available licenses and finding an equilibrium around the strongest copyleft license in the AGPL, but the justification is the same: exclusivity.</p>
  114. <p>In short, whether it’s the AGPL or non-open source, source available alternatives, the end goal for relicensing is to try and capture the vast majority or entirety of the revenue associated with a given open source project rather than share it with other vendors, particularly large hyperscalers. Many source available licenses explicitly forbid other companies from monetizing the licensed code. The AGPL, meanwhile, does not forbid third parties from monetizing a given codebase, but it does require them to share any changes or fixes they make &#8211; a practice that many avoid as a rule. Thus a project can be technically open, but practically speaking only monetized by the original author of a given project.</p>
  115. <p>But what about their APIs?</p>
  116. <p>In January of 2019, AWS released a long-suspected new database, DocumentDB. It was, as might be guessed, a document database, and one specifically that offered some MongoDB compatibility. MongoDB had, one quarter prior, relicensed its database from the AGPL to the much more expansive SSPL. This was ostensibly an effort to thwart competition from the likes of AWS, but the timing made it clear that AWS wanted no part of even the less protective AGPL and had instead done a clean room reimplementation of MongoDB’s API to offer a datastore theoretically compatible with Mongo, but built on their own stack not subject to the requirements of the AGPL &#8211; or the SSPL for that matter.</p>
  117. <p>This all having taken place almost two years before the landmark Google v Oracle decision, however, AWS was very careful to state that its API compatibility was only up to the version last licensed as the AGPL. No one at the time had any real legal certainty on whether APIs were copyrightable and thus proprietary.</p>
  118. <p>In the years since, as discussed, the industry does not have certainty, precisely, but it has made assumptions in the wake of the trial, one of them being that APIs are for all intents and purposes non-proprietary.</p>
  119. <p>Which brings us to <a href="https://www.mongodb.com/blog/post/building-for-developers-not-imitators">this news</a> from late May, in which MongoDB announced that they had asked FerretDB to “<em>stop engaging in unfair business practices</em>.” Their claims are based on assertions that Ferret:</p>
  120. <ul>
  121. <li><em>Misleads and deceives developers by falsely claiming that its product is a “replacement” for MongoDB “in every possible way”</em> and</li>
  122. <li><em>FerretDB has infringed upon MongoDB’s patents</em>.</li>
  123. </ul>
  124. <p>Two things stand out immediately. First, that Mongo’s claims ultimately reduce to trademark and patent infringement matters, and second that neither APIs nor copyright is mentioned once. Setting aside the relative merits or lack thereof of these claims, which are best left to those with legal backgrounds, courts or both, the important question is whether this case is a one off or the shape of things to come.</p>
  125. <p>Commercial open source projects have struggled to maximize their revenue exclusivity for years, primarily through the aforementioned series of relicensing efforts. Those efforts, however, are based on copyright as it applies to source code. If copyright doesn’t apply to APIs, or if the bar for fair use is low enough to be easily achievable from a legal standpoint, that may suggest a future in which competitive third parties “<a href="https://en.wikipedia.org/wiki/Leapfrogging_(strategy)">island hop</a>” the source code and go straight for the APIs. Given the size and usage base of some of the commercial open source projects, the economic incentives to do so are substantial indeed. APIs are ultimately a door for developers, and if that door can open to your products as easily as the original author’s, that will likely be of interest regardless of what the license on the original source code might be.</p>
  126. <p>The upside to the Google v Oracle ruling was clear, in that an industry in which every last programming interface was considered proprietary would be a tectonic, systemic shock. The downside, though, is that we now have to hope that we don’t see a resurgence in interest in “<a href="https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish">embrace, extend, extinguish</a>” efforts from large third parties trying to co-opt open source projects and user bases.</p>
  127. <p>Either way, it seems likely that the next wave of conflict won’t be over licenses pertaining to code, but the APIs they implement.</p>
  128. <p><strong>Disclosure</strong>: AWS, Google, MongoDB and Oracle are RedMonk customers. FerretDB is not currently a RedMonk customer.</p>
  129. ]]></content:encoded>
  130. </item>
  131. <item>
  132. <title>Everyone Gets a Database</title>
  133. <link>https://redmonk.com/sogrady/2025/06/06/data-consolidation/</link>
  134. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  135. <pubDate>Fri, 06 Jun 2025 16:50:39 +0000</pubDate>
  136. <category><![CDATA[Databases]]></category>
  137. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6072</guid>
  138.  
  139. <description><![CDATA[Once upon a time, there were software categories called Application Performance Monitoring (APM) and Logging. They each involved the collection of large volumes of telemetry data, used among other things for the purpose of understanding and attacking problems at varying layers of the enterprise application stack. As time passed and infrastructure grew more distributed and]]></description>
  140. <content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2021/10/pxfuel.com_-scaled.jpg"><img decoding="async" src="http://redmonk.com/sogrady/files/2021/10/pxfuel.com_-1024x683.jpg" alt="Pendulum" width="1024" height="683" class="aligncenter size-large wp-image-5924" srcset="https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-1024x683.jpg 1024w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-300x200.jpg 300w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-768x512.jpg 768w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-1536x1024.jpg 1536w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-2048x1365.jpg 2048w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-480x320.jpg 480w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-941x627.jpg 941w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></p>
  141. <p>Once upon a time, there were software categories called Application Performance Monitoring (APM) and Logging. They each involved the collection of large volumes of telemetry data, used among other things for the purpose of understanding and attacking problems at varying layers of the enterprise application stack.</p>
  142. <p>As time passed and infrastructure grew more distributed and applications more complex, a new software category gradually emerged: observability. Its aim was to provide those charged with running applications and infrastructure a more nuanced, granular and integrated view of software problems that might be known or unknown, ephemeral and/or involve multiple layers of a given stack. This approach proving effective, the category attracted more attention, and consequently money &#8211; both investment and revenue &#8211; began to flow more freely.</p>
  143. <p>Unsurprisingly, then, vendors in the APM and Logging categories concluded that this newly emerging adjacent market represented both a logical extension to their existing capabilities as well as a potentially lucrative growth opportunity. Rather than leave money on the table, many vendors in these spaces grew sideways into the observability market competing with native observability players.</p>
  144. <p>This is, in its own way, both a case study in market consolidation as well as just another in a long line of such cases. As is, increasingly, the consolidation we see within the database and data platforms market.</p>
  145. <hr />
  146. <p>Almost four years ago, it <a href="https://redmonk.com/sogrady/2021/10/26/general-purpose-database/">became apparent</a> that the pendulum that had swung away from general purpose databases and towards an array of specialized datastores had reversed and was well into its return trajectory. As captivating as the idea and abilities of bespoke databases built for a singular purpose were, the reality of both developing to and continually operating multiple databases (as well as significantly expanding the vendor procurement footprint) had set in. Enterprises and developers alike, though for reasons that had little in common, increasingly advantaged databases that could handle multiple workloads through a single engine and interface.</p>
  147. <p>In the years since, that directional shift has not slowed. If anything, it’s accelerated. Database consolidation continues apace, and single workload databases are increasingly the exception rather than the rule.</p>
  148. <p>There is a larger question facing the data sector, however: where and how will data lakes(houses) and databases collide? Recent events are suggestive, but the history is contradictory.</p>
  149. <p>Five years ago, MongoDB &#8211; born as a document database but having since added the ability to handle workloads well beyond that including search, stream and vector &#8211; announced a new set of capabilities including a data lake product. This was an early effort to begin to converge the database with large scale data stores underneath them. Two years after that, they followed up that announcement with refinements on both the object storage and analytical query fronts.</p>
  150. <p>Large scale data storage and databases, it seemed, would follow the macro trend and converge towards a single interface with the added bonus that procurement would only have to deal with one vendor. The trajectory seemed clear.</p>
  151. <p>Clear, except that last fall MongoDB announced that it was deprecating its data lake offering, and that it would be end-of-lifed within a year, or three months from today. The question then became whether MongoDB’s deprecated effort to merge database with data platform was the outlier, or the shape of things to come.</p>
  152. <p>That answer won’t be evident for some time, but two notable acquisitions suggest that convergence may yet be on the way.</p>
  153. <ul>
  154. <li>One month ago on May 14th, Databricks acquired the database company Neon &#8211; a serverless Postgres database vendor that was well thought of amongst friends of RedMonk. The acquisition cost was $1B. </li>
  155. <li>Earlier this week, meanwhile, its biggest rival Snowflake agreed to acquire another Postgres database vendor, Crunchy Data, for $250M. </li>
  156. </ul>
  157. <p>These two dueling is, of course, nothing new. See, for example, their competition over the DBRX (Databricks) and Arctic (Snowflake) models, or the tug of war over Tabular &#8211; ultimately acquired by Databricks for $2B.</p>
  158. <p>Both companies obviously see a future in which AI plays a, if not the, critical role with respect to data, which is logical given that AI is built on and from a high volume of data and that AI advantages existing data incumbents for both trust and data gravity reasons. But then again every software category today is making enormous bets on AI.</p>
  159. <p>It is notable, however, that both vendors just as clearly see traditional database capabilities &#8211; PostgreSQL capabilities in particular &#8211; as likely to become, if they are not already, table stakes. Convergence, put simply, is the goal.</p>
  160. <p>The challenge for these data platforms, as it was for MongoDB when it launched its data lake product, however, is market permission. While it makes all the sense in the world on paper for data platforms and databases to come together, markets do not always follow what makes sense on paper and do not always embrace a new product in a market new to the vendor. Enterprises are cautious about investing in products offered by vendors whose fundamental DNA lies in an entirely distinct market with different expectations. And from the seller’s side of the equation, vendors need to learn how to go to market and sell to different users and different buyers with differing sets of concerns.</p>
  161. <p>It is too early to say whether or not Databricks and Snowflake will be granted permission to compete directly in database markets, or whether they have or will acquire the ability to do so efficiently. But they’re collectively betting a billion and a quarter dollars that MongoDB had the right of it back in 2020, not in 2024, and that the market wants data lakes and databases offered by the same, single supplier.</p>
  162. <p>They’re making the same bet, in other words, that APM and Logging companies made when they implicitly argued that observability should be a feature of their existing product rather than a brand new market of its own.</p>
  163. <p><strong>Disclosure</strong>: Crunchy Data and MongoDB are RedMonk customers. Databricks and Snowflake are not currently RedMonk customers.</p>
  164. ]]></content:encoded>
  165. </item>
  166. <item>
  167. <title>OSS: Two Steps Forward, One Step Back</title>
  168. <link>https://redmonk.com/sogrady/2025/05/06/oss-forward-back/</link>
  169. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  170. <pubDate>Tue, 06 May 2025 15:20:21 +0000</pubDate>
  171. <category><![CDATA[Open Source]]></category>
  172. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6070</guid>
  173.  
  174. <description><![CDATA[In October of 2018, MongoDB relicensed its previously open source database to a new source available license of its own creation. Up until that point, the license governing the project had been the AGPL, an OSI approved open source license that took the copyleft provisions of the GPL and extended them into applications hosted on]]></description>
  175. <content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-1024x1022.png" alt="" width="1024" height="1022" class="aligncenter size-large wp-image-5955" srcset="https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-1024x1022.png 1024w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-300x300.png 300w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-150x150.png 150w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-768x767.png 768w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-480x479.png 480w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-628x627.png 628w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1.png 1026w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></p>
  176. <p>In October of 2018, MongoDB relicensed its previously open source database to a new source available license of its own creation. Up until that point, the license governing the project had been the AGPL, an OSI approved open source license that took the copyleft provisions of the GPL and extended them into applications hosted on networks.</p>
  177. <p>In simple terms, where the GPL’s reciprocal provisions included software packaged and distributed in binary fashion, they did not apply to modified GPL projects hosted and made available over networks. If you wanted to distribute a new version of the Linux kernel, for example, you were required to make those changes available under the same terms. If you hosted applications on the internet running on a modified Linux kernel, however, you did not.</p>
  178. <p>The AGPL was explicitly designed to close this so-called loophole. Historically, however, usage of the license has been rare relative to other open source licenses of both the copyleft and permissive varieties. This was in part because large internet companies strictly forbade its usage for fear of running afoul of the extended protections.</p>
  179. <p>For MongoDB, however, the protections afforded by the AGPL did not go far enough. As a result, they set out to craft a license that followed in the AGPL’s footsteps by not only applying reciprocal provisions to software hosted in a network fashion, but also dramatically expanding the scope of these protections beyond the boundaries of the protected software itself and into adjacent software.</p>
  180. <p>Specifically, the license says the following (emphasis added):</p>
  181. <blockquote><p>
  182.  “Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for <em>all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.</em>
  183. </p></blockquote>
  184. <p>The AGPL strictly governs a given project’s codebase, then, while the SSPL extends itself to any immediately adjacent software. If large internet companies &#8211; and cloud providers in particular &#8211; were averse to the AGPL, then, the SSPL was a non-starter. In practical terms it is nearly impossible to comply with the terms of the license, and the reach is clearly at odds with, if not totally irreconcilable with, the ninth requirement of the <a href="https://opensource.org/osd">open source definition</a>.</p>
  185. <blockquote><p>
  186.  The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open source software.
  187. </p></blockquote>
  188. <p>MongoDB initially attempted to work with the OSI to have the SSPL accepted and approved as an open source license, but the combination of a flawed review process and the license’s fundamental nature led to the eventual abandonment of those efforts. The license instead remains source available, not open source.</p>
  189. <p>Other commercial vendors seeking exclusivity, however, began to take up the source available license at the expense of open source alternatives. In January of 2021, Elastic moved away from the permissive Apache license to a dual license strategy, one license of which was the SSPL. In March of 2024, Redis did the same.</p>
  190. <p>Recently, however, the SSPL’s trajectory appears to have been altered. In September of last year, Elastic added the AGPL as another licensing option &#8211; effectively deprecating the more restrictive SSPL. Then this year on May 1st, Redis again followed in Elastic’s footsteps and added the AGPL as an option. In describing the thought process behind the move, Salvatore Sanfilippo &#8211; the original author of Redis &#8211; <a href="https://antirez.com/news/151">said</a>:</p>
  191. <blockquote><p>
  192.  My feeling was that the SSPL, in practical terms, failed to be accepted by the community. The OSI wouldn’t accept it, nor would the software community regard the SSPL as an open license. In little time, I saw the hypothesis getting more and more traction, at all levels within the company hierarchy.
  193. </p></blockquote>
  194. <p>No mention was made of the unique Redis fork <a href="https://redmonk.com/sogrady/2024/07/16/post-valkey-world/">Valkey</a> in that post, but it appears in the comments and the idea that it had no role in the internal discussions on the license choice is implausible. Regardless of the motivation, the decision to return to an open source license was a consequential one and further evidence of a changing trajectory for the license and for open source.</p>
  195. <p>Nor is it just SSPL projects moving to the AGPL. Grafana and MinIO previously moved from Apache licenses to the AGPL in April and May of 2021, respectively. The Zitadel project, meanwhile, did the same in March.</p>
  196. <p>What do these moves collectively suggest, then, about the health of open source? Two things, at least, are implied.</p>
  197. <ul>
  198. <li>First, that commercial open source vendors continue to seek stronger protections for code they have authored. In the last decade plus, permissive licenses saw <a href="https://redmonk.com/sogrady/2017/01/13/the-state-of-open-source-licensing/">substantial jumps</a> in their usage, and because licensing has historically been more fashion statement than rigorous analysis, each new permissively licensed project encouraged the next. In recent years and amongst commercially backed projects, however, there has been a backlash. Company after company leveraged permissive licenses initially in a bid to gain ubiquity, then ratcheted up protections with new licenses as they transitioned from a focus on growth to exclusivity of value capture.
  199. <p>That much has been apparent for years. The real question was where the equilibrium would be found: in OSI approved open source licenses, or in source available alternatives? The available evidence at this time suggests that the AGPL may be that equilibrium, combining OSI-approval with staunch protections and disincentives for large clouds, among other potential competitors.</p>
  200. </li>
  201. <li>
  202. <p>And speaking of protections and large clouds, the second implication is that the AGPL is now viewed as sufficiently protective. A large part of the original justification for the SSPL and other source available alternatives was the need for protections from large, hyperscale clouds picking up permissively licensed projects and using their greater resources and distribution to advantage themselves over the original authors. It was even claimed by individuals outside of MongoDB, in fact, that AWS’ DocumentDB &#8211; which replicated MongoDB’s API without leveraging the MongoDB codebase &#8211; was a demonstration of the power and necessity of the SSPL. The timeline, however, does not support this argument. The SSPL was first applied to MongoDB in October 2018. DocumentDB, meanwhile, was released in January of 2019. Even at the speed AWS operates, they did not write, stand up, test and release a new database service in the roughly three months between those dates. It’s clear that for AWS, at least, the AGPL was sufficient disincentive to avoid the codebase. Questions remain as to whether the AGPL is a deterrent strong enough for <a href="https://bsky.app/profile/msw.bsky.social/post/3lo6zd3awxs2v">Chinese cloud providers</a>, then, but rightly or wrongly the market seems to be settling on the AGPL as the commercial open source license of choice, not the SSPL.</p>
  203. <p>Which, even if some are disappointed that the AGPL is being used effectively as a proprietary software license, represents a win for open source. It also suggests that the next competitive frontier, thanks to <a href="https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_Inc.?wprov=sfti1">Google v Oracle</a>, will not be codebases but APIs, but that’s a subject for another time.</p>
  204. </li>
  205. </ul>
  206. <p>For now, it’s necessary to examine what represented a clear loss for open source, which was the drama surrounding the CNCF, NATS and Synadia. In late April, a conflict bubbled to the surface that would better have been kept private. In short, there were allegations that Synadia &#8211; the principal author of NATS &#8211; wanted to withdraw its donation from the CNCF, including the trademarks. All is now putatively <a href="https://www.cncf.io/announcements/2025/05/01/cncf-and-synadia-align-on-securing-the-future-of-the-nats-io-project/">well</a> between the two organizations, and NATS and its trademark will remain with the CNCF with Synadia’s stewardship.</p>
  207. <p>But the flareup starkly revealed traditional fault lines in the wider open source community around the role of foundations. For many, this situation provided an opportunity not to protest the alleged about face but rather to attack foundations generally and the CNCF specifically for their shortcomings, both perceived and real.</p>
  208. <p>Generally, critiques of foundations are appropriate. While RedMonk’s view is generally that foundations are <a href="https://redmonk.com/jgovernor/2024/09/13/open-source-foundations-considered-helpful/">useful</a> and indeed vital, they are imperfect institutions that often struggle to balance the competing needs of vendors, enterprises and individual developers. Attention on where and how these “worst solutions except everything else that’s been tried,” to paraphrase Churchill, can be improved and refined is an important exercise.</p>
  209. <p>The time for that exercise, here at least, is not when their existence is threatened. Whatever else one may think of them, objectively speaking foundations exist as an external home for software, one in which certain guarantees and commitments are held sacrosanct, and in which commercial entities can trust and, optionally, collaborate. Vendors that choose to donate projects to foundations do so understanding, or at least should, that donation is a one way door. If commercial organizations can commit a project to this neutral third party, and then unilaterally withdraw from the foundation whenever it suits them, the guarantees of neutrality that foundations provide are immediately rendered worthless. The idea, then, that the CNCF or any other foundation could blithely let a disaffected project depart was genuinely appalling to hear, akin to condoning the secession of states. However one feels about how a foundation has assisted or not assisted a member project, the idea that the irrevocable promise companies knowingly and willingly make when donating a project and its trademark to a foundation is actually, depending on the circumstances, revocable was genuinely shocking. It was also a sign that widespread bitterness towards and distaste for foundations remains a stronger force than may be commonly understood or appreciated.</p>
  210. <p>While open source appears to have made progress towards a consensus around open source licenses that are commercially acceptable, then, the NATS storm was a black eye for open source broadly. Between the simmering antagonism towards foundations at scale, the direct attacks on the CNCF specifically and the revelations about NATS’ performance and Synadia’s alleged behaviors, it was not a great week for open source.</p>
  211. <p>Two steps forward, one step back.</p>
  212. <p><strong>Disclosure</strong>: AWS and MongoDB are RedMonk customers. The CNCF, Elastic, Grafana, MinIO and Redis are not currently RedMonk customers.</p>
  213. ]]></content:encoded>
  214. </item>
  215. <item>
  216. <title>Nothing Permanent Except Change</title>
  217. <link>https://redmonk.com/sogrady/2025/04/16/kelly/</link>
  218. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  219. <pubDate>Wed, 16 Apr 2025 19:25:09 +0000</pubDate>
  220. <category><![CDATA[RedMonk Miscellaneous]]></category>
  221. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6068</guid>
  222.  
  223. <description><![CDATA[Thanks to some divided attention on our part from having Monki Gras, Kubecon and Google Next back to back to back &#8211; not to mention a child that suddenly and unexpectedly got sick at school this week &#8211; this post is coming out a few days later than we’d originally planned, but it is time]]></description>
  224. <content:encoded><![CDATA[<p>Thanks to some divided attention on our part from having Monki Gras, Kubecon and Google Next back to back to back &#8211; not to mention a child that suddenly and unexpectedly got sick at school this week &#8211; this post is coming out a few days later than we’d originally planned, but it is time to let our community know that Kelly Fitzpatrick is leaving RedMonk.</p>
  225. <p>When she joined us seven years ago, our hope was that Kelly would bring to the tech field, in equal parts, her experience as a modern Georgia Tech-honed tech comms professor and her training as a medieval historian, and that’s exactly what she did. From her passion for good documentation to her interviews and instructor-quality explanations of various tech trends across all forms of media, Kelly helped audiences &#8211; both developer and executive &#8211; understand technology better. And given her crazy travel schedule over the years and her enthusiasm for hosting our traditional RedMonk beers, many of you have probably hoisted a pint or two with her as well. Our clients have enjoyed working with her, and so have we.</p>
  226. <p>But now it’s time for her to continue to grow in a new role, and while her news is not ours to share, suffice it to say that she’s not going too far.</p>
  227. <p>We’ll undoubtedly be tackling the difficult job of finding someone to fill her shoes at some point in the near future, but we’re not opening the hiring floodgates immediately because we need some time to recover from a couple of very busy months and because, well, [<em>gestures broadly at everything</em>]. When we and the world around us have both had a chance to catch our breath, we’ll begin thinking about who the next monk might be in earnest, but until then please join us in wishing Kelly a fond farewell. Thanks for all the hard work, Kelly, and best of luck in your next stop.</p>
  228. ]]></content:encoded>
  229. </item>
  230. <item>
  231. <title>DeepSeek and the Enterprise</title>
  232. <link>https://redmonk.com/sogrady/2025/01/27/deepseek-and-the-enterprise/</link>
  233. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  234. <pubDate>Mon, 27 Jan 2025 22:16:55 +0000</pubDate>
  235. <category><![CDATA[AI]]></category>
  236. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6065</guid>
  237.  
  238. <description><![CDATA[A little over one month ago on December 25th, a Chinese AI lab little known in the US dropped some new weights, following with the model card and paper the next day. Four days ago, NVIDIA was worth around $3.6T. At one point today, it was down to $2.9T &#8211; still an astronomical sum, to]]></description>
  239. <content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2025/01/image-from-rawpixel-id-5950552-original-scaled.jpg"><img decoding="async" src="http://redmonk.com/sogrady/files/2025/01/image-from-rawpixel-id-5950552-original-1024x685.jpg" alt="" class="size-large wp-image-6066" /></a>
  240. <p>A little over one month ago on December 25th, a Chinese AI lab little known in the US dropped some <a href="https://huggingface.co/deepseek-ai/DeepSeek-V3-Base">new weights</a>, following with the <a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/main/README.md">model card</a> and paper the next day.</p>
  241. <p>Four days ago, NVIDIA was worth around $3.6T. At one point today, it was down to $2.9T &#8211; still an astronomical sum, to be sure, but a market capitalization representing an almost equally epic market correction.</p>
  242. <p>Just what happened in those thirty-three days?</p>
  243. <p>The Chinese lab, of course, was DeepSeek. Keen observers <a href="https://simonwillison.net/2024/Dec/26/deepseek-v3/">realized</a> immediately that its qualities aside, the real import was the economics it represented. Per Willison’s numbers, DeepSeek v3 was a model some 40% larger than Meta’s Llama 3.1, but trained on roughly 9% as many GPU hours.</p>
  244. <p>This matters because GPUs are both expensive and difficult to source. The best hardware, in fact, is theoretically unavailable in China because of the United States government’s chip ban. The DeepSeek team responded to this challenge by deeply seeking efficiencies, and apparently found them.</p>
  245. <p>Seven days ago, the DeepSeek team released a new model, R1, a reasoning model comparable to OpenAI’s o1. That’s when things began to move quickly, because not only were DeepSeek’s training costs transformative economically, it was now bumping up against the performance of the best models the US had to offer. And unlike those models, DeepSeek’s were open. All carried the MIT license, which would theoretically make them legitimate open source software, but certain versions were trained on Llama, which is not open source software, and which in turn means that models trained on it cannot be considered open source. Likewise, we don’t have the original training data.</p>
  246. <p>Regardless of whether they meet the technical definition, however, DeepSeek dropped models that were or claimed to be truly open, highly capable and game changers from an efficiency and thus cost standpoint. It took a couple of days for the market to evaluate some of these claims, and while there is much that we still don’t know about the models, engineers who’ve taken them apart in detail have come away impressed &#8211; to the point that there have been rumors on sites like Blind of near panic among the leaders of those engineers at large, public AI shops.</p>
  247. <p>Which is why, when the market opened today, the bottom fell out for anything tangentially related to AI, with the NASDAQ closing down 3.1%. NVIDIA was far from the only tech company hammered by the market’s <a href="https://sherwood.news/markets/quick-and-dirty-timeline-of-markets-deepseek-freak/">freak out</a> today, but they were the most prominent because they are the proverbial 800 pound gorilla in the market for AI chips. With great success comes great visibility, for better and for worse.</p>
  248. <p>While this decimation will likely prove to be a short term overreaction, and more temperate market corrections should be forthcoming in the coming days, if DeepSeek’s claims continue to be validated, this is an inflection point from an industry standpoint. Many have been examining the higher level industry and geopolitical implications from this news &#8211; if you’re a Stratechery subscriber, as one example, Ben Thompson has a good FAQ <a href="https://stratechery.com/2025/deepseek-faq/">here</a> &#8211; and we’ll all continue to sift through the fallout for weeks and months to come.</p>
  249. <p>One aspect that has not seemed to attract much attention, however, is the implications for enterprise buyers and their relationships with the large, existing model providers such as Anthropic, AWS, Google, Microsoft and OpenAI.</p>
  250. <p>While enterprise AI efforts’ biggest problem to date has not been the technology but understanding where and how best to apply it, they have had two critical concerns with respect to AI’s large, broadly capable foundational models.</p>
  251. <ul>
  252. <li>First, and most obviously, is trust. Enterprises recognize that to maximize the benefit from AI, they need to be able to grant access to their own, internal data. Generally speaking, however, they have been unwilling to do this at scale. Vendor promises notwithstanding, one common pattern of adoption has been a limited proof of concept executed on a public model, and then going to production with a privately hosted and managed equivalent that has the data access it requires.</p>
  253. </li>
  254. <li>
  255. <p>Second has been cost. Enterprises have been shocked, in many cases, at the unexpected costs &#8211; and unclear returns &#8211; from some scale investments in AI. While bringing AI back in house has offered some hope of cost reductions, internal capabilities are expensive and GPUs, as mentioned, have been difficult to acquire. This is presumably why AWS CEO Matt Garman said at re:Invent, “On prem data environments are not well suited for AI. They’re just not.”</p>
  256. </li>
  257. </ul>
  258. <p>Enterprises that want to embrace AI, in other words, have reasons to want to do so on their own infrastructure. But that has posed its own set of challenges, challenges which have led many enterprises to scale back their ambitions and turn their eyes from large, expensive foundational models to small, more cost efficient and easily trained alternatives. An approach which has been compelling to users who are employing AI tactically to solve a narrow, discrete problem rather than strategically.</p>
  259. <p>DeepSeek, however, challenges these core assumptions.</p>
  260. <ul>
  261. <li>What if enterprises didn’t have to rely on closed, private models for leading edge capabilities? </li>
  262. <li>What if training costs could be reduced by an order of magnitude or more? </li>
  263. <li>What if they did not require expensive, state of the art hardware to run their models?   </li>
  264. </ul>
  265. <p>DeepSeek’s most advanced model has been available for seven days. And as stated, there is a great deal of testing and experimentation ahead &#8211; and doubtless many enterprises will have concerns for geopolitical reasons about a model trained in China on unknown data sources. But if DeepSeek’s technical and efficiency promises hold, the challenge for AI vendors may not just be for ultimate model supremacy, but for the enterprise market they’ll need to justify their sky high valuations.</p>
  266. <p><strong>Disclosure</strong>: AWS, Google and Microsoft are RedMonk customers. Anthropic, DeepSeek, OpenAI and NVIDIA are not currently customers.</p>
  267. ]]></content:encoded>
  268. </item>
  269. <item>
  270. <title>The Dream of Hadoop is Alive in AI</title>
  271. <link>https://redmonk.com/sogrady/2025/01/15/dream-of-hadoop-ai/</link>
  272. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  273. <pubDate>Wed, 15 Jan 2025 15:51:34 +0000</pubDate>
  274. <category><![CDATA[AI]]></category>
  275. <category><![CDATA[Big Data]]></category>
  276. <category><![CDATA[Data]]></category>
  277. <category><![CDATA[Databases]]></category>
  278. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6062</guid>
  279.  
  280. <description><![CDATA[Nineteen years ago come April, Yahoo allowed two developers to release a project called Hadoop as open source software. Based on the Google File System and MapReduce papers from Google, it was designed to enable querying operations on large scale datasets using commodity hardware. Importantly, in contrast to the standard relational databases of the time,]]></description>
  281. <content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2025/01/IMG_2745.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2025/01/IMG_2745-1024x706.png" alt="" width="1024" height="706" class="aligncenter size-large wp-image-6063" srcset="https://redmonk.com/sogrady/files/2025/01/IMG_2745-1024x706.png 1024w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-300x207.png 300w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-768x529.png 768w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-1536x1059.png 1536w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-2048x1412.png 2048w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-480x331.png 480w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-910x627.png 910w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><br />
  282. Nineteen years ago come April, Yahoo allowed two developers to release a project called Hadoop as open source software. Based on the Google File System and MapReduce papers from Google, it was designed to enable querying operations on large scale datasets using commodity hardware. Importantly, in contrast to the standard relational databases of the time, it could handle structured, semi-structured and unstructured data. The dream of Hadoop for many enterprises was opening their vast stores of accumulated data, which varied widely in structure and normalization, to both routine and ad hoc querying. The ability to easily ask questions of data independent of its scale represented a nirvana for organizations always seeking to operate with better and more real time intelligence.</p>
  283. <p>There were several obstacles to achieving this, however, and today, while Hadoop is still around and in use within many enterprises, it has largely been leapfrogged by a variety of other on premise and cloud based alternatives.</p>
  284. <p>One of the first barriers many organizations encountered was the querying itself. Writing a query in Hadoop required an engineer to understand both Java the language and the principles of MapReduce. Many outside of Google, in fact, were surprised when the company &#8211; which tended to be secretive and protective of its technology at the time &#8211; chose to release the MapReduce paper publicly at all. As it turned out, part of the justification was to simplify the on ramp for external hires; with the paper public, Google could hire talent already familiar with its principles rather than having to spend internal time and money familiarizing new hires with the concept.</p>
  285. <p>So complex, in fact, was the task of writing MapReduce jobs that multiple organizations wrote their own alternative query interfaces; two of the most popular were Hive, created by Facebook, and Pig, a product of Yahoo. Both embraced a SQL-like interface, because it was simpler to hire engineers with SQL experience than with Java and MapReduce skills. IBM, for its part, tried to graft on a spreadsheet-like interface called BigSheets to Hadoop to enable even non-programmers to leverage Hadoop’s underlying scale to query very large scale datasets &#8211; what used to be called Big Data.</p>
  286. <p>None of these alternative interfaces took off, however, and for that and a variety of other reasons including its lack of suitability for streaming workloads, number of moving parts and the ready availability of alternative managed services like AWS’ EMR/Redshift, Google BigQuery, Microsoft’s HDInsight / Synapse Analytics or &#8211; eventually &#8211; Databricks and Snowflake, Hadoop’s traction slipped.</p>
  287. <p>The dream it offered, however, has never been closer.</p>
  288. <p>The problem in recent years has not been the scale of data to be queried. While certain classes of data workloads remain expensive and difficult to operate on, over the last two decades advances in both hardware and software have made operating on large scale data both easier and, relatively speaking at least, more cost effective.</p>
  289. <p>Instead the primary challenge has been the query interface itself. Whatever the language and frameworks selected, SQL-like or otherwise, the interface narrowed the funnel of potential users down to employees with the requisite set of technical skills. But as even modest users of today’s LLM systems are aware, querying datasets is now trivial if not a totally solved problem.</p>
  290. <p>Anyone who’s taken the time to upload a set of data &#8211; be it public corporate earnings, a climate science dataset or even personal utility consumption data &#8211; into a consumer grade LLM can test this out. Gone is the need to write complex queries or carefully refine charts and dashboards. Instead, the interface is simple, natural language questions:</p>
  291. <ol>
  292. <li>What does this balance sheet suggest about the overall health of the business?</li>
  293. <li>What are the year on year trends with respect to temperature, humidity and windspeed within this dataset?</li>
  294. <li>What are the seasonal fluctuations in my electricity consumption and how have they varied over the past three years?</li>
  295. </ol>
  296. <p>There are caveats, of course, most notably the models’ propensity to make basic errors and the delta between an individual’s dataset and an enterprise’s. But the absolute lack of any friction whatsoever from question to answer is absolutely transformative. While most of the industry’s attention at present has been on AI for <a href="https://redmonk.com/kholterhoff/2023/11/01/10-things-developers-want-from-ai-code-assistants/">code assistance</a>, <em>query</em> assistance is likely to be at least as useful for the average enterprise employee. The benefit to the enterprise from query assistants, in fact, may be substantially greater than code assistants if some of the <a href="https://redmonk.com/rstephens/2024/11/26/dora2024/">counterintuitive findings from the DORA report</a> prove accurate.</p>
  297. <p>Very few enterprises, of course, will be willing to feed the corporate data they once crawled with Hadoop to public models such as ChatGPT, Claude or Gemini. Regardless of promises made on the part of the public models, there is at least for the present a major gap in trust surrounding the potential for &#8211; and potential risks of &#8211; data exfiltration.</p>
  298. <p>Which explains several things. First, why Snowflake is currently valued at over $55B and Databricks closed a round one month ago valuing the company at $62B. Second, it explains why the two companies have competed fiercely around their respective in house models Arctic and DBRX. And lastly, it helps explain the massive importance of and standardization on Apache Iceberg, which one of my colleagues will be covering in a soon to be released piece.</p>
  299. <p>It’s about the dream of Hadoop, after all. It is well understood that AI <a href="https://redmonk.com/sogrady/2024/05/29/ai-patterns/">advantages incumbents</a>; all other points being equal, most enterprises would prefer to operate models on their data in place rather than have to trust new platforms and third parties, let alone migrate data. Databricks, Snowflake &#8211; along with the hyperscalers, obviously &#8211; are incumbents already trusted with large scale data from a large number of enterprises; that provides opportunity. Opportunity that they need to unlock with native, existing LLM interfaces &#8211; hence their respective investments in models. Iceberg, for its part, is fast becoming the Kubernetes of tables, which is to say the standard substrate on which everything is built across all of the above.</p>
  300. <p>Enterprises have been migrating away from specialized datastores and towards multi-modal, general purpose datastores <a href="https://redmonk.com/sogrady/2021/10/26/general-purpose-database/">for years now</a>, to be sure. AI is just the latest workload they’re expected to handle natively. AI models, in fact, may offer the cleanest path forward towards <a href="https://redmonk.com/sogrady/2022/03/21/vertical-integration/">vertically integrating</a> application-like functionality into the database. It’s more straightforward than acquiring and integrating an independent application platform, certainly. Data vendors may or may not have the market permission to absorb one of the various <a href="https://redmonk.com/sogrady/2023/02/01/ai-paas/">PaaS-like</a> players, but they are already trusted to run AI workloads &#8211; AI workloads that overlap, sometimes significantly, with traditional application workloads. There’s a reason vendors in the space refer to themselves as data platforms: that’s exactly what they are, and are becoming.</p>
  301. <p>The dream of Hadoop isn’t here today, to be clear. Even if the technology were fully ready, questions about security, compliance, access control and more remain. And as always, there are concerns about model hallucinations. But thanks to AI, the financial markets clearly believe it to be closer than it’s ever been. And after using models to query a variety of datasets of varying size and scope, it’s hard to argue the point.</p>
  302. <p><strong>Disclosure</strong>: AWS, Google, IBM and Microsoft are RedMonk customers. Databricks, OpenAI and Snowflake are not currently RedMonk customers.</p>
  303. ]]></content:encoded>
  304. </item>
  305. <item>
  306. <title>Open Source Discussion Archetypes</title>
  307. <link>https://redmonk.com/sogrady/2025/01/06/discussing-open-source/</link>
  308. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  309. <pubDate>Mon, 06 Jan 2025 20:34:40 +0000</pubDate>
  310. <category><![CDATA[Open Source]]></category>
  311. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6058</guid>
  312.  
  313. <description><![CDATA[“It is difficult to get a man to understand something, when his salary depends on his not understanding it.” &#8211; Upton Sinclair As the calendar turns over and the world takes its first stumbling steps into 2025, one prediction seems safe: there will be a controversy associated with open source licensing. This prediction is safe]]></description>
  314. <content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art.jpg"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art.jpg" alt="" width="1024" height="774" class="aligncenter size-full wp-image-6060" srcset="https://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art.jpg 1024w, https://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art-300x227.jpg 300w, https://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art-768x581.jpg 768w, https://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art-480x363.jpg 480w, https://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art-107x80.jpg 107w, https://redmonk.com/sogrady/files/2025/01/1024px-Joris_Hoefnagel_-_Archetypes_and_Studies-_Death_is_the_line_that_marks_the_end_of_all_Part_I_-_2006.121_-_Cleveland_Museum_of_Art-830x627.jpg 830w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></p>
  315. <p>“<em>It is difficult to get a man to understand something, when his salary depends on his not understanding it</em>.” &#8211; Upton Sinclair</p>
  316. <p>As the calendar turns over and the world takes its first stumbling steps into 2025, one prediction seems safe: there will be a controversy associated with open source licensing. This prediction is safe because there is always a controversy with open source licensing. Not only can the existence of controversy be predicted, its substance is likely to tread well worn paths as well.</p>
  317. <p>Before the industry sees its first open source licensing fracas of the year, then, it’s worth taking a step back to understand why the controversy and its patterns are eminently predictable. And to do that, it’s necessary to understand the individual archetypes participating in these discussions and their priors and incentives. Once these are internalized, open source licensing conversations become straightforward, even mundane.</p>
  318. <ul>
  319. <li><strong>Developers</strong><br />
  320. Developers, as has been discussed before at length, <a href="https://redmonk.com/sogrady/2023/08/03/why-opensource-matters/">do not care</a> about licensing as a rule. They favor the open access and availability of open source licensing over proprietary alternatives, to be sure, but by and large that’s as far as their concern goes. The specifics of what is and is not open source, and more particularly distinguishing true open source from alternatives like source available, are of about as much interest to the average developer as the standard warnings about forward looking statements in slides are to anyone looking at slides.</p>
  321. <p>In a cruel twist, this apathy is ironically a direct consequence of the success of open source and its principles.  Developers don&#8217;t care because open source has ensured they&#8217;ve never had to care (much).</p>
  322. </li>
  323. <li>
  324. <p><strong>Open Source Project Author</strong><br />
  325. In the early stages of a non-personal open source project with broader ambitions, before it’s seen wide adoption, the average maintainer(s) is most interested in widening distribution. Beyond the personal satisfaction of seeing their work embraced at scale, widespread usage is the most likely path towards broader commercial opportunities, improved compensation and greater industry recognition and praise.</p>
  326. <p>These incentives, coupled with the developer’s typical lack of interest in licenses, are what have led many projects to simply follow the lead of popular projects at the time. Two decades ago that led to the selection of the GPL with Linux and MySQL at their respective heights of influence. Today, it’s more typically permissive alternatives such as Apache and MIT as the CNCF’s influence peaks.</p>
  327. </li>
  328. <li>
  329. <p><strong>Commercial Open Source Vendor</strong><br />
  330. Once a given project has crossed a certain popularity threshold and becomes monetizable, the most common path forward is to attempt to accelerate its growth via additional capital investments towards business capabilities like community, devrel, marketing and sales &#8211; and naturally a new commercial entity to house all of the above. Occasionally projects will bootstrap this process, but typically these investments arrive via outside third parties taking interests in the company in return for capital seeking a return.</p>
  331. <p>The investors, being investors, have no wider interest or concern than the return on their investment. To the extent that they exert influence within the company they have invested in, this singular focus on the project’s growth &#8211; potentially at the expense of community, industry norms and so on &#8211; becomes the commercial open source organization’s focus as well, regardless of how it may or may not align with the project’s original goals, governance, etc. Unsurprisingly, those belonging to this category have a different attitude towards licenses, in many cases, than the original project authors.</p>
  332. </li>
  333. <li>
  334. <p><strong>Open Source Conservationist</strong><br />
  335. Open source is a big tent, incorporating millions of projects, developers and other actors. Among these are unaffiliated individuals who sit outside (or can distance themselves from) the individual needs and wants of a single project, and who take a wider view of the open source landscape: they view issues like open source licensing controversy not in isolation, but in terms of what they may represent and manifest in the aggregate.</p>
  336. <p>Much like conservationists might sit in opposition to various industrial concerns, then, this class of individual is typically seeking a balance between specific, narrow commercial interests and opportunities and the broader health of the open source ecosystem itself.</p>
  337. </li>
  338. </ul>
  339. <p>There are other participants, of course, but these are the most common groups you’ll hear from in discussions of the next inevitable, upcoming open source licensing controversies. Understanding the groups and their respective motivations will not lead to material or even any improvement in your ability to reason with the groups you may disagree with. But if you can root yourself in an understanding of where the person you’re talking to is coming from and why they might hold the opinions they do, you may at least save yourself the frustration of trying to persuade people that cannot and will not be persuaded.</p>
  340. <p>In some cases, because their salary depends on it.</p>
  341. ]]></content:encoded>
  342. </item>
  343. <item>
  344. <title>The Narrows Bridge: From Open Source to AI</title>
  345. <link>https://redmonk.com/sogrady/2024/10/22/from-open-source-to-ai/</link>
  346. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  347. <pubDate>Tue, 22 Oct 2024 14:02:21 +0000</pubDate>
  348. <category><![CDATA[AI]]></category>
  349. <category><![CDATA[Open Source]]></category>
  350. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6055</guid>
  351.  
  352. <description><![CDATA[In November of 1938, construction began on the Tacoma Narrows bridge in Washington state. Twenty months later in July of 1940, it opened to traffic. Connecting Tacoma and the Kitsap Peninsula west of Seattle, it was at the time the third longest suspension bridge in the world. From the first day of construction it]]></description>
  353. <content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington.jpg"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington.jpg" alt="" width="768" height="606" class="aligncenter size-full wp-image-6056" srcset="https://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington.jpg 768w, https://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington-300x237.jpg 300w, https://redmonk.com/sogrady/files/2024/10/Opening_day_of_the_Tacoma_Narrows_Bridge_Tacoma_Washington-480x379.jpg 480w" sizes="auto, (max-width: 768px) 100vw, 768px" /></a></p>
  354. <p>In November of 1938, construction began on the Tacoma Narrows bridge in Washington state. Twenty months later in July of 1940, it opened to traffic. Connecting Tacoma and the Kitsap Peninsula west of Seattle, it was at the time the third longest suspension bridge in the world. From the first day of construction it was buffeted by high winds, winds that introduced substantial vertical movement into an engineering structure that generally tries to avoid such. Multiple efforts were made to mitigate these forces and keep them in check. It was an inauspicious beginning for an expensive and complex endeavor.</p>
  355. <hr />
  356. <p>Some sixty years after construction on that bridge began, a fraught and contentious debate in a then obscure corner of the technology industry resulted in both the term open source and the ten point <a href="https://opensource.org/osd">definition</a> that encapsulates it. While it was accorded little importance at the time, with the benefit of hindsight, this discussion amongst passionate but largely unrecognized technology advocates was of monumental historical importance. Over the nearly three decades since its inception, open source has grown from a seemingly utopian academic curiosity to industry default for large capital markets.</p>
  357. <p>For all of its success, however, open source has been besieged in recent years by attackers from multiple fronts. Most notably, a group of vendors and investors seeking to commercialize open source have attempted to blur that definition to the point of <a href="https://redmonk.com/sogrady/2023/08/03/why-opensource-matters/">meaninglessness</a> and irrelevance &#8211; a conflict that continues today. More recently, however, open source has come under intense pressure as both reasonable and not so reasonable actors alike try to understand how the term applies to Artificial Intelligence (AI) systems and projects.</p>
  358. <p>The term open source, of course, was originally coined to describe &#8211; and its corresponding definition was designed to apply to &#8211; source code. AI projects, however, are vastly broader in their scope. Source code is a component of AI projects, to be sure, but one among many, and in most cases not the most important. The varied other components, from data to parameters and weights, are functionally and legally quite distinct from source code. It is not clear, as but one example, whether copyright &#8211; the standard legal mechanism underpinning open source licenses &#8211; can be applied to embeddings, which are very long strings of numbers representing multidimensional transformations of various types of data.</p>
  359. <p>Source code, in other words, is a precisely and narrowly bounded subject area. AI projects are not. Their scope blends software, data, techniques, biases and more. AI is inarguably a fundamentally different asset than software alone.</p>
  360. <p>And yet, largely because vendors have been cavalierly throwing around the term open source to describe obviously non-open source projects that contain use restrictions that open source explicitly forbids, the Open Source Initiative (OSI) &#8211; stewards and defenders of the Open Source Definition (OSD) since 1998 &#8211; has been compelled to respond. The most egregious of the offenders in terms of misuse has been Meta, who has repeatedly described its Llama model as open source while omitting the fact that it imposes restrictions on usage, usage by competitors and so on. Restrictions that open source does not permit. Their behavior stands in stark contrast with their counterparts at Google, who have to their credit attempted to <a href="https://redmonk.com/sogrady/2024/02/26/ai-open-source/">hold the line</a> by being explicit that their Gemma model is open but that it does not meet the OSI’s definition of open source and therefore should not be considered such. Generally speaking, however, Meta’s careless approach is far more common than Google’s, which has resulted in pressure building on the OSI to reconsider its definition of what is and what is not open source within the new, arcane and rapidly evolving world of AI.</p>
  361. <p>The entire debate about an open source AI definition (OSAID), then, has been driven by misuse and misrepresentation. It was at the same time implicitly predicated on a single core assumption: that open source and AI are, or can be made to be, compatible. Simply stated, the current process assumes that it is possible to achieve a definition of open source that is both consistent with long held open source ideals and community norms while being equally applicable and relevant to fast emerging AI projects and their various interested parties.</p>
  362. <p>After months of observation and consideration of nascent AI projects, vendor efforts in the space and conversations with experts in the field as well as interested third party organizations, I no longer share that core assumption.</p>
  363. <p>I do not believe the term open source can or should be extended into the AI world.</p>
  364. <p>There are several problems with the application of open source to AI. These are arguably the most pressing.</p>
  365. <h2>One of These Things is Not Like The Other</h2>
  366. <p>As discussed above, software and AI are the proverbial apples and oranges &#8211; or perhaps more accurately, apples versus an apple pie. This by itself is, or should be, cause for concern. At its heart, the current deliberation around an open source definition for AI is an attempt to drag a term defined over two decades ago to describe a narrowly defined asset into the present to instead cover a brand new, far more complicated future set of artifacts. The risks of which are substantial. First, trying to bend the original open source definition and its principles to apply to AI has the potential to fall well short of fully circumscribing the new project assets in all of their complexity, which is bad. Worse, however, is the prospect of perceived shortcomings in the OSAID bleeding into the trust of and faith in the tried and true original OSD. The implications of that are far reaching and highly concerning.</p>
  367. <p>To properly address the greater complexities of AI projects, the new OSAID would need to grapple with far more complicated and nuanced issues than are involved with mere source code, and in so doing it almost certainly would have to resort to compromise. Which the release candidates to date, in fact, have.</p>
  368. <p>AI and source code are simply too different to be neatly managed side by side. The complexity of AI demands complexity of licensing, which brings us to the problem of nuance.</p>
  369. <h2>If You’re Explaining, You’re Losing</h2>
  370. <p>An open source AI definition will inevitably have key areas of contention, principally around data sharing and availability. Idealists seeking to preserve and protect the bedrock principles of open source, for example, argue that any model that doesn’t require training data is compromising the <a href="https://en.wikipedia.org/wiki/Free_and_open-source_software">four key freedoms</a> that the original open source definition satisfies. The OSI, for its part, contends that in discussions with various AI researchers, their consensus opinion is that the weights are more important than the original training data. That position may or may not be correct. What is definitely true is that even if that assertion is correct, it is a nuanced position that is counterintuitive and requires lengthy explanation.</p>
  371. <p>Similarly, data brings with it a host of legal complications &#8211; complications that are without precedent in the world of pure source code. Source code, for example, is inarguably less conflicted from a legal standpoint than, as but one example, medical data used to train AI models intended to assist in the early detection of cancer. The OSI’s approach to this &#8211; similar to what the Linux Foundation is trying to do with the Open Data Product Specification &#8211; is categorizing the various types of data. In the OSI’s case, there are four buckets of data: open, public, obtainable and unshareable. These are superficially self-descriptive, but also subtle, slippery and legalistic distinctions. Which means, ultimately, that they require nuance to be understood.</p>
  372. <p>In more simple terms, both with respect to the availability of data and the types of data being made available, or not, the OSI is trying to thread a needle of balancing open source’s legacy of making everything available with the messy reality that is data availability. The idealists want training data required for obvious reasons. The pragmatic path, unfortunately, involves substantial compromise and, more problematically, requires explanation to be understood. And as the old political adage advises: “If you&#8217;re explaining, you&#8217;re losing.”</p>
  373. <p>This is particularly true in this case, because in contrast to the OSI’s complicated position, critics have a simple and easy case to make &#8211; one made for easy black and white headlines and stories on Hacker News: if you’re not fully satisfying the four freedoms, you’re not open source. The reality might be that, at least in the case of large foundational models, even if all of the training data was made available, the number of entities that could leverage it and build their own replacements could be counted on one hand &#8211; two or three at the most. But reality does not determine perception.</p>
  374. <p>To illustrate this, consider these two potential headlines:</p>
  375. <ul>
  376. <li>“The OSI’s Open Source Definition (OSD) mandates the release of all source code. Their Open Source AI Definition (OSAID), on the other hand, does not require the release of all the training data.” </li>
  377. </ul>
  378. <p>Versus</p>
  379. <ul>
  380. <li>“The OSI says that they <em>want</em> all training data released, but that requiring it would be problematic because of legal complexity and difficult to actually leverage given the dataset size. To clarify, they’ve created four different categories of data which they go into in detail here&#8230;” </li>
  381. </ul>
  382. <p>Optically, then, the pragmatic path is a minefield. One of the things that most RedMonk clients have heard at some point is: “the market generally has no ability to appreciate nuance.” In a world in which professional technology industry reporters are unable to distinguish between genuine open source &#8211; is it on this list of approved licenses or not? &#8211; and objectively non-open source, as in the cases of licenses which are open except when they are not, there is essentially no chance licenses which depend on levels and shifting definitions will be correctly interpreted.</p>
  383. <p>The need to rely on nuance, then, to explain the OSAID seems inherently problematic.</p>
  384. <h2>An Open Source AI Definition is&#8230;Mandatory?</h2>
  385. <p>As described above, implicit in this entire multi-year process is an assumption that an open source definition that will satisfy a required consensus of parties is possible. There is, however, a second assumption underlying that, which is that AI requires a revised and updated open source definition. The most straightforward articulation of this idea arrives courtesy of <a href="https://allthingsopen.org/articles/the-open-source-ai-definition-why-we-need-it">Mark Collier</a>, who said:</p>
  386. <blockquote><p>
  387.  This brings me to the Open Source AI Definition (OSAID), an effort organized by OSI over the past two years, which I have participated in alongside others from both traditional open source backgrounds and AI experts. It is often said that naming things is the hardest problem in software engineering, but in this case we started with a name, “Open Source AI,” and set out to define it.
  388. </p></blockquote>
  389. <p>This position is certainly understandable. Rogue actors such as Meta have been abusing the term open source, when they are well aware that the arbitrary use restrictions (competitors can’t use it, you can’t use it to do certain things, etc.) attached to the license make it clearly and unambiguously not open source. Meta’s actions are bad enough, but arguably worse are the columnists that have enabled Meta’s behavior by victim blaming. In the absence of an OSI AI definition &#8211; which as argued above may not be actually achievable &#8211; these writers bafflingly hold the OSI responsible for Meta’s behavior.</p>
  390. <p>In the face of this continuous assault on the OSD, then, the obvious response is for the OSI to respond to this willful misuse by way of an updated definition with more clarity and industry buy in.</p>
  391. <p>Or is it?</p>
  392. <p>Given the fact that Meta paid no attention whatsoever to the original definition which had been industry consensus for years, it’s not clear that an updated definition &#8211; again assuming, potentially counterfactually, that one is achievable &#8211; would change their behavior. That seems like a questionable assumption, particularly given the downsides discussed above if the effort is unsuccessful.</p>
  393. <p>The default path as described above was an updated open source definition, because it was implicitly assumed that that was the only option on the table. But what if it wasn’t?</p>
  394. <h2>The Road Not Traveled</h2>
  395. <p>What if, instead of trying to bend and reshape a decades old definition intended to describe one asset to encompass another, wildly different asset, we instead abandoned that approach entirely?</p>
  396. <p>On the one hand, for parties that have been fighting for the better part of two years to thread the needle between idealism and capitalism to arrive at an ideologically sound and yet commercially acceptable definition of open source AI, the idea of abandoning the effort will presumably seem horrifying and a non-starter.</p>
  397. <p>This is not the first time that outside parties have sought to reshape or redefine open source, however.</p>
  398. <p>For several years, there has been a desire on the part of some to bring open source into greater alignment with commercial interests &#8211; even at the expense of core open source principles. The response from the wider open source community, however, was rejection. The belief was then and is now that trying to shoehorn specific commercial protections into the term open source would fatally compromise it. Instead, just as with prior efforts to bend open source to other new and <a href="https://medium.com/@stephenrwalli/software-freedom-in-a-post-open-source-world-9f497f646af9">ultimately incompatible goals around ethical source</a>, those who sought to change open source were told to find <a href="https://x.com/adamhjk/status/1687113805237714944">a new home</a>, a new term and a new definition for what they were building. The end result of which was “<a href="https://fair.io/">Fair Source</a>,” a new, from scratch term that borrowed some ideals from open source but is entirely its own new and unique brand.</p>
  399. <p>What if AI followed that path? The industry’s massive cumulative efforts to date to define what and how open source principles might apply to AI need not be wasted. They could instead be repurposed behind a new, clean slate term of choice &#8211; one that accurately conveys the portions of a model that are open, while not falsely advertising its features by applying the open source brand to non-open source assets.</p>
  400. <p>Naming is, as Collier mentions above, the hardest problem in software engineering, and so coming up with a new, alternative term for open source AI would not be a simple exercise. It seems likely, however, that it would be both simpler and more achievable than coming up with a definition of open source AI that might minimally satisfy both the idealists and the pragmatists.</p>
  401. <p>The only way that this would work, of course, would be if sufficient momentum could be assembled behind it. This would require the support of multiple, conflicting parties. Here are a few arguments in favor of a new term for each:</p>
  402. <p><strong>Idealists</strong>:</p>
  403. <ul>
  404. <li>Assuming some industry consensus could be achieved around a new brand &#8211; greater consensus, at least, than is behind the current OSAID release candidate &#8211; pressure on the term open source would immediately begin to decline. One “defense” of Meta’s behavior at present is that there is no accepted definition for open projects. A new definition, even without the open source branding, eliminates that argument. Second, and perhaps more importantly for idealists, if open source and the four freedoms are no longer explicitly invoked by the license, there’s a lessened need to be so strict about what’s in and what’s out. Idealists instead could center around protecting the original, source code derived definition of open source while attempting to clearly differentiate it from the new, AI-centric term of choice.  </li>
  405. </ul>
  406. <p><strong>Pragmatists</strong>:</p>
  407. <ul>
  408. <li>Those who have been most willing to compromise in this process to accommodate the vagaries of, as but one example, data licensing would no longer be fighting against the reputation and legacy of the OSD and the four freedoms. It would instead be an opportunity to start fresh, informed by open source but not beholden to it. Pragmatists would have more room to maneuver in their efforts to find a license that balances openness with a desire to maximize uptake of the license, avoiding the worst case scenario of a strict definition that few or no models satisfy. </li>
  409. </ul>
  410. <p><strong>Vendors</strong>:</p>
  411. <ul>
  412. <li>Assuming again that some level of industry consensus could be achieved, they could receive a similar if not exactly equivalent level of marketing benefit from the new definition, without the corresponding costs of constant criticism from open source communities for their willful misuse of that term. In a world in which the various large AI players are able to agree on both a) a clear and understandable model for achieving a certain level of openness as determined by the OSI and b) a willingness to put their marketing weight behind the new brand, marketing becomes at once a simpler and less fraught exercise.</li>
  413. </ul>
  414. <p><strong>The OSI</strong>:</p>
  415. <ul>
  416. <li>The OSI, its members and various participants have been tearing each other apart for well over a year trying to achieve an outcome that is, in all likelihood, not achievable. While redirecting current efforts into the creation of a new AI-centric brand might seem like surrender, it would more accurately be described as being flexible and adaptive rather than rigid and hidebound. Given a landscape in which the forces arrayed against the new license seem to be stronger than those <a href="https://opensource.org/ai/endorsements">supporting</a> it, preemptively eliminating an entire line of attack while creating the space to introduce a new brand for a new technology area that would protect your existing brand from fallout is nothing more or less than the most logical course of action moving forward. </li>
  417. </ul>
  418. <hr />
  419. <p>As mentioned above, the Tacoma Narrows bridge opened to the public in July of 1940. A mere four months later, systemic forces &#8211; 40+ MPH winds in this case &#8211; stressed the structure to the point that its concrete and metal surface began to <a href="https://en.wikipedia.org/wiki/File:Tacoma_Narrows_Bridge_destruction.ogv">ripple and twist</a> like a ribbon. A little over an hour later, the bridge collapsed entirely. Its spectacular destruction and the engineering failure to account for the forces that destroyed it have made it an object lesson for engineers to this day.</p>
  420. <p>If the OSI chooses to stay the course with the OSAID, I will personally do everything in my power to help it succeed. But I fear that, as with the Narrows bridge, the fault lines are already on display, and it will not survive the high winds that are sure to be in its future.</p>
  421. <p>Better to choose a new bespoke name and brand, one specifically tailored to suit the unique and dynamic challenges of the new technology. As for what that name might be, I’m relatively indifferent. Public AI was one option floated on a recent call and that has promise, or some flavor of open model / weights might work &#8211; though as one participant in the current process pointed out, the potential for overlap and confusion might make those less than ideal. This industry is generally bad at naming, so this exercise might be no exception. The name, however, is less important than consensus. As Abraham Lincoln said, &#8220;Public sentiment is everything. With public sentiment, nothing can fail.&#8221;</p>
  422. <p>However they proceed, I wish the OSI luck in their thankless task and hope they find a solid bridge with which to bring the spirit of open source forward into the world of AI.</p>
  423. ]]></content:encoded>
  424. <enclosure url="https://en.wikipedia.org/wiki/File:Tacoma_Narrows_Bridge_destruction.ogv" length="0" type="video/ogg" />
  425.  
  426. </item>
  427. <item>
  428. <title>The RedMonk Programming Language Rankings: June 2024</title>
  429. <link>https://redmonk.com/sogrady/2024/09/12/language-rankings-6-24/</link>
  430. <dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
  431. <pubDate>Thu, 12 Sep 2024 19:22:17 +0000</pubDate>
  432. <category><![CDATA[Programming Languages]]></category>
  433. <guid isPermaLink="false">https://redmonk.com/sogrady/?p=6052</guid>
  434.  
  435. <description><![CDATA[This iteration of the RedMonk programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of developer communities where you can join and learn more about building modern applications in your preferred language. As has become typical in recent years, our Q3 programming language rankings are arriving a few months]]></description>
  436. <content:encoded><![CDATA[<blockquote><p>
  437.  This iteration of the RedMonk programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of <a href="https://aws.amazon.com/developer/community">developer communities</a> where you can join and learn more about building modern applications in your preferred language.
  438. </p></blockquote>
  439. <p>As has become typical in recent years, our Q3 programming language rankings are arriving a few months late. Unlike the last run when that was primarily due to the anomalous results we observed, this quarter it’s more attributable to summer vacation schedules. We’re still waiting to see what the longer term implications of coding assistants will be on these rankings, but at least for the present we’re continuing with the exercise as it continues to identify trends for us in the market. Trends which we’ll get to shortly.</p>
  440. <p>In the meantime, however, as a reminder, this work is a continuation of the work originally performed by Drew Conway and John Myles White late in 2010. While the specific means of collection has changed, the basic process remains the same: we extract language rankings from GitHub and Stack Overflow, and combine them for a ranking that attempts to reflect both code (GitHub) and discussion (Stack Overflow) traction. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion and usage in an effort to extract insights into potential future adoption trends.</p>
  441. <h1>Our Current Process</h1>
  442. <p>The data source used for the GitHub portion of the analysis is the GitHub Archive. We query languages by pull request in a manner similar to the one GitHub used to assemble the State of the Octoverse. Our query is designed to be as comparable as possible to the previous process.</p>
  443. <ul>
  444. <li>Language is based on the base repository language. While this continues to have the caveats outlined below, it does have the benefit of cohesion with our previous methodology.</li>
  445. <li>We exclude forked repos.</li>
  446. <li>We use the aggregated history to determine ranking (though, given the table structure changes, this can no longer be accomplished via a single query).</li>
  447. <li>For Stack Overflow, we simply collect the required metrics using their useful data explorer tool.</li>
  448. </ul>
  449. <p>With that description out of the way, please keep in mind the other usual caveats.</p>
  450. <ul>
  451. <li>To be included in this analysis, a language must be observable within both GitHub and Stack Overflow. If a given language is not present in this analysis, that’s why.</li>
  452. <li>No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.</li>
  453. <li>There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.</li>
  454. <li>All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.</li>
  455. <li>In addition, the further down the rankings one goes, the less data is available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.</li>
  456. <li>Languages that have communities based outside of Stack Overflow such as Mathematica will be under-represented on that axis. It is not possible to scale a process that measures one hundred different community sites, both because many do not have public metrics available and because measuring different community sites against one another is not statistically valid.</li>
  457. </ul>
  458. <p>With that, here is the third quarter plot for 2024.</p>
  459. <p><a href="http://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-1024x845.png" alt="" width="1024" height="845" class="aligncenter size-large wp-image-6053" srcset="https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-1024x845.png 1024w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-300x247.png 300w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-768x633.png 768w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-1536x1267.png 1536w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-2048x1689.png 2048w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-480x396.png 480w, https://redmonk.com/sogrady/files/2024/09/lang.rank_.0624.wm_-760x627.png 760w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></p>
  460. <p>1   JavaScript<br />
  461. 2   Python<br />
  462. 3   Java<br />
  463. 4   PHP<br />
  464. 5   C#<br />
  465. 6   TypeScript<br />
  466. 7   CSS<br />
  467. 7   C++<br />
  468. 9   Ruby<br />
  469. 10  C<br />
  470. 11  Swift<br />
  471. 12  Go<br />
  472. 12  R<br />
  473. 14  Shell<br />
  474. 14  Kotlin<br />
  475. 14  Scala<br />
  476. 17  Objective-C<br />
  477. 18  PowerShell<br />
  478. 19  Rust<br />
  479. 19  Dart</p>
  480. <p>The Top 20, in a fashion that has become typical in recent years, was not entirely devoid of movement, but nearly so. Outside of CSS moving down a spot and C++ moving up one, the Top 10 was unchanged. And even in the back half of the rankings, where languages tend to be less entrenched and movement is more common, only three languages moved at all.</p>
  481. <p>While these rankings are, as explicitly acknowledged above, not intended to be an accurate representation of typical enterprise language usage &#8211; because the data is not available to measure that &#8211; what they are clearly evidence of is a landscape resistant to change. There are a few signs of languages following in TypeScript’s footsteps and working their way up the path, both in the Top 20 and at the back end of the Top 100 as we’ll discuss shortly, but they’re the exception that proves the rule.</p>
  482. <p>It’s possible that we’ll see more fluid usage of languages, and increased usage of code assistants would theoretically make that much more likely, but at this point it’s a fairly static status quo.</p>
  483. <p>With that, some results of note:</p>
  484. <ul>
  485. <li><strong>TypeScript</strong> (6): technically TypeScript didn’t move, as it was ranked sixth in our last run, but this is the first quarter in which it has been the sole occupant of that spot. CSS, in this case, dropped one place to seven, leaving TypeScript just outside the Top 5. It will be interesting to see whether or not it has more momentum to expend or whether it’s topped out for the time being. </li>
  486. <li><strong>Kotlin</strong> (14) / <strong>Scala</strong> (14): both of these JVM-based languages jumped up a couple of spots &#8211; two spots in Scala’s case and three for Kotlin. Scala’s rise is notable because it had been on something of a downward trajectory from a one time high of 12th, and Kotlin’s placement is a mild surprise because it had spent three consecutive runs not budging from 17, only to make the jump now. The tie here, meanwhile, is interesting because Scala’s long history gives it an accretive advantage over Kotlin’s more recent development, but in any case the combination is evidence of the continued staying power of the JVM.  </li>
  487. <li><strong>Objective-C</strong> (17): speaking of downward trajectories and the 17th placement on this list, Objective-C’s slide that began in mid-2018 continued and left the language with its lowest placement in these rankings to date at 17. That’s still an enormously impressive achievement, of course, and there are dozens of languages that would trade their usage for Objective-C’s, but the direction of travel seems clear. </li>
  488. <li><strong>Dart</strong> (19) / <strong>Rust</strong> (19): while once grouped with Kotlin as up and coming languages driven by differing incentives and trends, Dart and Rust, with five straight quarters of no movement, have not been able to match the ascent of their counterpart. That’s not necessarily a negative; as with Objective-C, these are still highly popular languages and communities, but it’s worth questioning whether new momentum will arrive and from where, particularly because the communities are experiencing <a href="https://arstechnica.com/gadgets/2024/09/rust-in-linux-lead-retires-rather-than-deal-with-more-nontechnical-nonsense/">some friction</a> in growing their usage. </li>
  489. <li><strong>Ballerina</strong> (61) / <strong>Bicep</strong> (78) / <strong>Grain</strong> / <strong>Moonbit</strong> / <strong>Zig</strong> (87): as discussed during last quarter’s run, we’re keeping an eye on Bicep, Grain, Moonbit and Zig among others because of what they represent: an unusually visible cloud DSL, two languages optimized for WebAssembly and then a language that follows in the footsteps of C++ and Rust. Grain and Moonbit still haven’t made it into the Top 100, but Bicep jumped eight spots to 78 and Zig 10 to 87. That progress pales next to Ballerina, however, which jumped from 80 to 61 this quarter. The general purpose language from WSO2, thus, is added to the list of potential up and comers we’re keeping an eye on.</li>
  490. </ul>
  491. <p><strong>Disclosure</strong>: WSO2 is not currently a RedMonk client.</p>
  492. <p><strong>Credit</strong>: My colleague <a href="https://redmonk.com/rstephens/">Rachel Stephens</a> wrote the queries that are responsible for the GitHub axis in these rankings. She is also responsible for the query design for the Stack Overflow data.</p>
  493. ]]></content:encoded>
  494. </item>
  495. </channel>
  496. </rss>
  497.  

Copyright © 2002-9 Sam Ruby, Mark Pilgrim, Joseph Walton, and Phil Ringnalda