<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:bib="http://joachim-breitner.de/2004/Website/Bibliography"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:atom="http://www.w3.org/2005/Atom"
version="2.0">
<channel>
<title>nomeata’s mind shares</title>
<link>https://www.joachim-breitner.de/blog</link>
<atom:link rel="self" type="application/rss+xml"
href="https://www.joachim-breitner.de/blog_feed.rss"/>
<description>Joachim Breitners Denkblogade</description>
<image>
<url>https://joachim-breitner.de/avatars/avatar_128.png</url>
<title>nomeata’s mind shares</title>
<link>https://www.joachim-breitner.de/blog</link>
<width>128</width>
<height>128</height>
</image>
<item>
<title>Do surprises get larger?</title>
<link>https://www.joachim-breitner.de/blog/814-Do_surprises_get_larger_</link>
<guid>https://www.joachim-breitner.de/blog/814-Do_surprises_get_larger_</guid>
<comments>https://www.joachim-breitner.de/blog/814-Do_surprises_get_larger_#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><h3 id="the-setup">The setup</h3>
<p>Imagine you are living on a riverbank. Every now and then, the river swells and you have high water. The first few times this may come as a surprise, but soon you learn that such floods are a recurring occurrence at that river, and you make suitable preparations. Let’s say you feel well-prepared against any flood that is no higher than the highest one observed so far. The more floods you have seen, the higher that mark is, and the better prepared you are. But of course, eventually a higher flood will occur that surprises you.</p>
<p>Of course, such new record floods become rarer and rarer the more of them you have seen. I was wondering though: By how much do the new records exceed the previous high mark? Does this excess decrease or increase over time?</p>
<p>A priori, either could be the case. When the high mark is already rather high, maybe new record floods will just barely pass that mark? Or maybe, simply because new records are such rare events, when they do occur, they can be surprisingly bad?</p>
<p>This post is a leisurely mathematical investigation of this question, which of course isn’t restricted to high waters; it could be anything that produces a measurement repeatedly and (mostly) independently – weather events, sport results, dice rolls.</p>
<p>The answer of course depends on the <em>distribution</em> of results: how likely each possible result is.</p>
<h3 id="dice-are-simple">Dice are simple</h3>
<p>With dice rolls the answer is rather simple. Let our measurement be how often you can roll a die until it shows a 6. This simple game we can repeat many times, and keep track of our record. Let’s say the record happens to be 7 rolls. If in the next run we roll the die 7 times, and it still does not show a 6, then we know that we have broken the record, and every further roll increases by how much we beat the old record.</p>
<p>But note that how often we will now roll the die is completely independent of what happened before!</p>
<p>So for this game the answer is: The excess with which the record is broken is always the same.</p>
<p>Mathematically speaking this is because the distribution of “rolls until the die shows a 6” is <a href="https://en.wikipedia.org/wiki/Memorylessness"><em>memoryless</em></a>. Such distributions are rather special: it’s essentially just the example we gave (a <a href="https://en.wikipedia.org/wiki/Geometric_distribution">geometric distribution</a>), or its continuous analogue (the <a href="https://en.wikipedia.org/wiki/Exponential_distribution">exponential distribution</a>, for example the time until a radioactive particle decays).</p>
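<p>As a minimal sketch, the memorylessness claim can be checked numerically. The helper names and the cutoff of 2000 rolls below are illustrative assumptions, not something from the text above; the sum is simply truncated where the remaining probability mass is negligible.</p>

```python
def geometric_pmf(k, p=1/6):
    """P(X = k): the first 6 shows up on roll k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p


def geometric_mean_excess(a, p=1/6, cutoff=2000):
    """E(X - a | X > a), with the infinite sum truncated at `cutoff`.

    The cutoff is an assumption of this sketch; the neglected tail mass
    is far below floating-point precision for p = 1/6.
    """
    ks = range(a + 1, cutoff + 1)
    tail = sum(geometric_pmf(k, p) for k in ks)          # approximates P(X > a)
    return sum((k - a) * geometric_pmf(k, p) for k in ks) / tail


if __name__ == "__main__":
    for record in (0, 1, 7, 20):
        # The expected excess should come out the same for every record.
        print(record, round(geometric_mean_excess(record), 6))
```

<p>Whatever the old record is, the expected excess stays at about 6 rolls, which is just the expected number of rolls until the first 6.</p>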
<h3 id="mathematical-formulation">Mathematical formulation</h3>
<p>With this out of the way, let us look at some other distributions, and for that, introduce some mathematical notation. Let <span class="math inline"><em>X</em></span> be a random variable with probability density function <span class="math inline"><em>φ</em>(<em>x</em>)</span> and cumulative distribution function <span class="math inline"><em>Φ</em>(<em>x</em>)</span>, and <span class="math inline"><em>a</em></span> be the previous record. We are interested in the behavior of</p>
<p><span class="math display"><em>Y</em>(<em>a</em>) = <em>X</em> − <em>a</em> ∣ <em>X</em> > <em>a</em></span></p>
<p>i.e. by how much <span class="math inline"><em>X</em></span> exceeds <span class="math inline"><em>a</em></span> under the condition that it did exceed <span class="math inline"><em>a</em></span>. How does <span class="math inline"><em>Y</em></span> change as <span class="math inline"><em>a</em></span> increases? In particular, how does the expected value of the excess <span class="math inline"><em>e</em>(<em>a</em>) = <em>E</em>(<em>Y</em>(<em>a</em>))</span> change?</p>
<h3 id="uniform-distribution">Uniform distribution</h3>
<p>If <span class="math inline"><em>X</em></span> is uniformly distributed between, say, 0 and 1, then a new record will appear uniformly distributed between <span class="math inline"><em>a</em></span> and <span class="math inline">1</span>, and as that range gets smaller, the excess must get smaller as well. More precisely,</p>
<p><span class="math display"><em>e</em>(<em>a</em>) = <em>E</em>(<em>X</em> − <em>a</em> ∣ <em>X</em> > <em>a</em>) = <em>E</em>(<em>X</em> ∣ <em>X</em> > <em>a</em>) − <em>a</em> = (1 − <em>a</em>)/2</span></p>
<p>This not very interesting straight line is plotted in blue in this diagram:</p>
<figure>
<img src="//www.joachim-breitner.de/various/surprises-plot1.svg" alt="The expected record surpass for the uniform distribution"/>
<figcaption aria-hidden="true">The expected record surpass for the uniform distribution</figcaption>
</figure>
<p>The orange line with the logarithmic scale on the right tries to convey how unlikely it is to surpass the record value <span class="math inline"><em>a</em></span>: it shows how many attempts we expect before the record is broken. This can be calculated by <span class="math inline"><em>n</em>(<em>a</em>) = 1/(1 − <em>Φ</em>(<em>a</em>))</span>.</p>
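<p>Both formulas for the uniform case can be reproduced with a few lines of code. This is only a sketch: the function names and the midpoint-rule integration with 100,000 steps are choices made here for illustration.</p>

```python
def uniform_mean_excess(a, steps=100_000):
    """Midpoint-rule estimate of E(X - a | X > a) for X uniform on [0, 1]."""
    width = (1.0 - a) / steps
    integral = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        integral += (x - a) * 1.0 * width   # the density is 1 on [0, 1]
    return integral / (1.0 - a)             # divide by P(X > a) = 1 - a


def expected_attempts(a):
    """n(a) = 1 / (1 - Phi(a)), where Phi(a) = a for the uniform distribution."""
    return 1.0 / (1.0 - a)


if __name__ == "__main__":
    for record in (0.0, 0.5, 0.9):
        print(record, uniform_mean_excess(record), expected_attempts(record))
```

<p>The numerical value agrees with (1 − a)/2, and the expected number of attempts grows without bound as the record approaches 1.</p>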
<h3 id="normal-distribution">Normal distribution</h3>
<p>For the normal distribution (with mean <span class="math inline">0</span> and standard deviation <span class="math inline">1</span>, to keep things simple), we can look up the expected value of the <a href="https://en.wikipedia.org/wiki/Truncated_normal_distribution#One_sided_truncation_(of_lower_tail)%5B6%5D">one-sided truncated normal distribution</a> and obtain</p>
<p><span class="math display"><em>e</em>(<em>a</em>) = <em>E</em>(<em>X</em> ∣ <em>X</em> > <em>a</em>) − <em>a</em> = <em>φ</em>(<em>a</em>)/(1 − <em>Φ</em>(<em>a</em>)) − <em>a</em></span></p>
<p>Now is this growing or shrinking? We can plot this and have a quick look:</p>
<figure>
<img src="//www.joachim-breitner.de/various/surprises-plot2.svg" alt="The expected record surpass for the normal distribution"/>
<figcaption aria-hidden="true">The expected record surpass for the normal distribution</figcaption>
</figure>
<p>Indeed, this too is a decreasing function!</p>
<p>(As a sanity check we can see that <span class="math inline"><em>e</em>(0) = √(2/<em>π</em>)</span>, which is the expected value of the <a href="https://en.wikipedia.org/wiki/Half-normal_distribution">half-normal distribution</a>, as it should be.)</p>
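<p>For readers who prefer numbers to plots, here is a small sketch that evaluates the formula for the standard normal distribution using only the Python standard library. The function names are made up for this example; the tail probability 1 − Φ(a) is computed via the complementary error function.</p>

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def normal_tail(x):
    """1 - Phi(x), computed via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def normal_mean_excess(a):
    """e(a) = phi(a) / (1 - Phi(a)) - a."""
    return normal_pdf(a) / normal_tail(a) - a


if __name__ == "__main__":
    for record in (0.0, 1.0, 2.0, 3.0):
        attempts = 1.0 / normal_tail(record)   # n(a) from the uniform section
        print(record, normal_mean_excess(record), attempts)
```

<p>The values shrink as the record grows, and the value at 0 matches the half-normal sanity check.</p>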
<h3 id="could-it-be-any-different">Could it be any different?</h3>
<p>This settles my question: It seems that each new surprisingly high water will tend to be less surprising than the previous one – assuming high waters were uniformly or normally distributed, which is unlikely to be helpful.</p>
<p>This does raise the question, though, whether there are probability distributions for which <span class="math inline"><em>e</em>(<em>a</em>)</span> is increasing.</p>
<p>I can try to construct one, and because it’s a bit easier, I’ll consider a discrete distribution on the positive natural numbers, and look at <span class="math inline"><em>g</em>(0) = <em>E</em>(<em>X</em>)</span> and <span class="math inline"><em>g</em>(1) = <em>E</em>(<em>X</em> − 1 ∣ <em>X</em> > 1)</span>. What does it take for <span class="math inline"><em>g</em>(1) > <em>g</em>(0)</span>? Using <span class="math inline"><em>E</em>(<em>X</em>) = <em>p</em> + (1 − <em>p</em>)<em>E</em>(<em>X</em> ∣ <em>X</em> > 1)</span> for <span class="math inline"><em>p</em> = <em>P</em>(<em>X</em> = 1)</span> we find that in order to have <span class="math inline"><em>g</em>(1) > <em>g</em>(0)</span>, we need <span class="math inline"><em>E</em>(<em>X</em>) > 1/<em>p</em></span>.</p>
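<p>Spelled out, the steps behind this condition might look as follows, assuming that p lies strictly between 0 and 1 so that dividing by 1 − p is allowed:</p>

```latex
\begin{align*}
g(0) &= E(X) = p \cdot 1 + (1 - p)\, E(X \mid X > 1), \\
g(1) &= E(X \mid X > 1) - 1
      = \frac{E(X) - p}{1 - p} - 1
      = \frac{E(X) - 1}{1 - p}, \\
g(1) > g(0)
     &\iff E(X) - 1 > (1 - p)\, E(X)
      \iff p\, E(X) > 1
      \iff E(X) > \frac{1}{p}.
\end{align*}
```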
<p>This is plausible because we get equality when <span class="math inline"><em>E</em>(<em>X</em>) = 1/<em>p</em></span>, as is precisely the case for the geometric distribution. And it is also plausible that it helps if <span class="math inline"><em>p</em></span> is large (so that the first record is likely just 1) and if, nevertheless, <span class="math inline"><em>E</em>(<em>X</em>)</span> is large (so that if we do get an outcome other than 1, it’s much larger).</p>
<p>Starting with the geometric distribution, where <span class="math inline"><em>P</em>(<em>X</em> > <em>n</em> ∣ <em>X</em> ≥ <em>n</em>) = <em>p</em><sub><em>n</em></sub> = <em>p</em></span> (the probability of again not rolling a six) is constant, it seems that if these <span class="math inline"><em>p</em><sub><em>n</em></sub></span> are increasing, we get the desired behavior. So let <span class="math inline"><em>p</em><sub>1</sub> &lt; <em>p</em><sub>2</sub> &lt; ⋯ &lt; <em>p</em><sub><em>n</em></sub> &lt; …</span> be an increasing sequence of probabilities, and define <span class="math inline"><em>X</em></span> so that <span class="math inline"><em>P</em>(<em>X</em> = <em>n</em>) = <em>p</em><sub>1</sub> ⋅ ⋯ ⋅ <em>p</em><sub><em>n</em> − 1</sub> ⋅ (1 − <em>p</em><sub><em>n</em></sub>)</span> (imagine the die wears out and the more often you roll it, the less likely it shows a 6). Then for this variation of the game, every new record tends to exceed the previous one by more than earlier records did. As the <span class="math inline"><em>p</em><sub><em>n</em></sub></span> increase, we get a flatter long tail in the probability distribution.</p>
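<p>To see this construction in action, here is a sketch with one concrete, entirely hypothetical choice of increasing probabilities (rising from 0.7 towards 0.9). It uses the identity that the expected excess equals the sum over j ≥ 0 of P(X > a + j ∣ X > a), truncated after 500 terms; both the sequence and the truncation are assumptions of this example.</p>

```python
def worn_die(n):
    """Hypothetical p_n: chance of *not* rolling a 6 on roll n.

    Increases from 0.7 (n = 1) towards 0.9; chosen only for illustration.
    """
    return 0.9 - 0.4 * 0.5 ** n


def fair_die(n):
    """Constant p_n = 5/6, i.e. the ordinary geometric case."""
    return 5.0 / 6.0


def discrete_mean_excess(a, p=worn_die, terms=500):
    """E(X - a | X > a) = sum over j >= 0 of P(X > a + j | X > a).

    The sum is truncated after `terms` summands (an assumption; the
    remaining terms are negligible for the sequences used here).
    """
    total = 0.0
    survive = 1.0                  # P(X > a + j | X > a), starting at j = 0
    for j in range(terms):
        total += survive
        survive *= p(a + j + 1)    # survive one more roll without a 6
    return total


if __name__ == "__main__":
    for record in (0, 1, 2, 5, 10):
        print(record,
              round(discrete_mean_excess(record), 4),
              round(discrete_mean_excess(record, p=fair_die), 4))
```

<p>With the worn die the expected excess grows with the record, while the fair die stays flat at 6.</p>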
<h3 id="gamma-distribution">Gamma distribution</h3>
<p>To get a nice plot, I’ll take the intuition from this and turn to continuous distributions. The Wikipedia page for the exponential distribution says it is a special case of the <a href="https://en.m.wikipedia.org/wiki/Gamma_distribution">gamma distribution</a>, which has an additional shape parameter <span class="math inline"><em>α</em></span>, and it seems that it could influence the shape of the distribution and give it a longer tail. Let’s play around with <span class="math inline"><em>β</em> = 2</span> and <span class="math inline"><em>α</em> = 0.5</span>, <span class="math inline">1</span> and <span class="math inline">1.5</span>:</p>
<figure>
<img src="//www.joachim-breitner.de/various/surprises-plot3.svg" alt="The expected record surpass for the gamma distribution"/>
<figcaption aria-hidden="true">The expected record surpass for the gamma distribution</figcaption>
</figure>
<ul>
<li><p>For <span class="math inline"><em>α</em> = 1</span> (dotted) this should just be the exponential distribution, and we see that <span class="math inline"><em>e</em>(<em>a</em>)</span> is flat, as predicted earlier.</p></li>
<li><p>For larger <span class="math inline"><em>α</em></span> (dashed) the graph does not look much different from the one for the normal distribution – not a surprise, as for <span class="math inline"><em>α</em> → ∞</span>, the gamma distribution turns into the normal distribution.</p></li>
<li><p>For smaller <span class="math inline"><em>α</em></span> (solid) we get the desired effect: <span class="math inline"><em>e</em>(<em>a</em>)</span> is increasing. This means that new records tend to break records more impressively.</p></li>
</ul>
<p>The orange line shows that this comes at a cost: for a given old record <span class="math inline"><em>a</em></span>, new records are harder to come by with smaller <span class="math inline"><em>α</em></span>.</p>
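<p>The three curves can be approximated without any plotting library. The following is a rough sketch that assumes β is the rate parameter (as in the Wikipedia parametrisation) and integrates the unnormalised density with a plain midpoint rule; the function name, the integration span and the step count are choices made here, and records are kept away from 0 because the density is unbounded there for α below 1.</p>

```python
import math


def gamma_mean_excess(a, alpha, beta=2.0, span=40.0, steps=100_000):
    """Midpoint-rule estimate of E(X - a | X > a) for a gamma distribution.

    Only the unnormalised density x**(alpha - 1) * exp(-beta * x) is needed,
    because the normalising constant cancels in the ratio. The integral is
    cut off at a + span (an assumption; the tail beyond is negligible).
    """
    width = span / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        weight = x ** (alpha - 1.0) * math.exp(-beta * x)
        numerator += (x - a) * weight
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 1.5):
        row = [round(gamma_mean_excess(a, alpha), 4) for a in (0.5, 1.0, 2.0)]
        print(alpha, row)
```

<p>Under these assumptions the values rise with the record for α = 0.5, stay at 1/β = 0.5 for α = 1, and fall for α = 1.5.</p>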
<h3 id="conclusion">Conclusion</h3>
<p>As usual, it all depends on the distribution. Otherwise, not much, it’s late.</p></description>
<pubDate>Sun, 30 Jun 2024 15:28:31 +0200</pubDate>
</item>
<item>
<title>Blogging on Lean</title>
<link>https://www.joachim-breitner.de/blog/813-Blogging_on_Lean</link>
<guid>https://www.joachim-breitner.de/blog/813-Blogging_on_Lean</guid>
<comments>https://www.joachim-breitner.de/blog/813-Blogging_on_Lean#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>This blog has become a bit quiet <a href="https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO">since I joined the Lean FRO</a>. One reason is of course that I can now improve things about Lean, rather than blog about what I think should be done (which, by contraposition, means I shouldn’t blog about what can be improved…). A better reason is that some of the things I’d otherwise write here are now published on the <a href="https://lean-lang.org/blog">official Lean blog</a>, in particular two lengthy technical posts explaining aspects of Lean that I worked on:</p>
<ul>
<li><a href="https://lean-lang.org/blog/2024-1-11-recursive-definitions-in-lean">Recursive definitions in Lean</a></li>
<li><a href="https://lean-lang.org/blog/2024-5-17-functional-induction">Functional Induction theorems</a></li>
</ul>
<p>It would not be useful to re-publish them here because the technology <a href="https://github.com/leanprover/verso"><code>verso</code></a> behind the Lean blog, created by my colleague David Thrane Christiansen, enables fancy features such as type-checked code snippets, including output and lots of information on hover. So I’ll be content with just cross-linking my posts from here.</p></description>
<pubDate>Fri, 31 May 2024 13:47:06 +0100</pubDate>
</item>
<item>
<title>Convenient sandboxed development environment</title>
<link>https://www.joachim-breitner.de/blog/812-Convenient_sandboxed_development_environment</link>
<guid>https://www.joachim-breitner.de/blog/812-Convenient_sandboxed_development_environment</guid>
<comments>https://www.joachim-breitner.de/blog/812-Convenient_sandboxed_development_environment#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>I like using one machine and setup for everything, from serious development work to hobby projects to managing my finances. This is very convenient, as often the lines between these are blurred. But it is also scary if I think of the large number of people who I have to trust to not want to extract all my personal data. Whenever I run a <code>cabal install</code>, or a fun VSCode extension gets updated, or anything like that, I am running code that could be malicious or buggy.</p>
<p>In a way it is surprising and reassuring that, as far as I can tell, this commonly does not happen. Most open source developers out there seem to be nice and well-meaning, after all.</p>
<h3 id="convenient-or-it-wont-happen">Convenient or it won’t happen</h3>
<p>Nevertheless I thought I should do something about this. The safest option would probably be to use dedicated virtual machines for the development work, with very little interaction with my main system. But knowing me, that did not seem likely to happen, as it sounded like a fair amount of hassle. So I aimed for a viable compromise between security and convenience, and one that does not get too much in the way of my current habits.</p>
<p>For instance, it seems desirable to have the project files accessible from my unconstrained environment. This way, I could perform certain actions that need access to secret keys or tokens, but are unlikely to run code (e.g. <code>git push</code>, <code>git pull</code> from private repositories, <code>gh pr create</code>) from “the outside”, and the actual build environment can do without access to these secrets.</p>
<p>The user experience I thus want is a quick way to enter a “development environment” where I can do most of the things I need to do while programming (network access, running command line and GUI programs), with access to the current project, but without access to my actual <code>/home</code> directory.</p>
<p>I initially followed the blog post <a href="https://msucharski.eu/posts/application-isolation-nixos-containers/">“Application Isolation using NixOS Containers” by Marcin Sucharski</a> and got something working that mostly did what I wanted, but then a colleague pointed out that tools like <a href="https://github.com/netblue30/firejail"><code>firejail</code></a> can achieve roughly the same with a less “global” setup. I tried to use <code>firejail</code>, but found it to be a bit too inflexible for my particular whims, so I ended up writing a small wrapper around the lower level sandboxing tool <a href="https://github.com/containers/bubblewrap">Bubblewrap</a>.</p>
<h3 id="selective-bubblewrapping">Selective bubblewrapping</h3>
<p>This script, called <code>dev</code> and included below, builds a new filesystem namespace with minimal <code>/proc</code> and <code>/dev</code> directories and its own <code>/tmp</code> directory. It then bind-mounts some directories to make the host’s NixOS system available inside the container (<code>/bin</code>, <code>/usr</code>, the nix store including domain socket, stuff for OpenGL applications). My user’s home directory is taken from <code>~/.dev-home</code> and some configuration files are bind-mounted for convenient sharing. I intentionally don’t share most of the configuration – for example, a <code>direnv enable</code> in the dev environment should not affect the main environment. The X11 socket for graphical applications and the corresponding <code>.Xauthority</code> file are made available. And finally, if I run <code>dev</code> in a project directory, this project directory is bind-mounted writable, and the current working directory is preserved.</p>
<p>The effect is that I can type <code>dev</code> on the command line to enter “dev mode” rather conveniently. I can run development tools, including graphical ones like VSCode, and especially the latter with its extensions is part of the sandbox. To do a <code>git push</code> I either exit the development environment (Ctrl-D) or open a separate terminal. Overall, the inconvenience of switching back and forth seems worth the extra protection.</p>
<p>Clearly, this isn’t going to hold against a determined and maybe targeted attacker (e.g. access to the X11 and the nix daemon socket can probably be used to escape easily). But I hope it will help against a compromised dev dependency that just deletes or exfiltrates data, like keys or passwords, from the usual places in <code>$HOME</code>.</p>
<h3 id="rough-corners">Rough corners</h3>
<p>There is more polishing that could be done.</p>
<ul>
<li><p>In particular, clicking on a link inside VSCode in the container will currently open Firefox inside the container, without access to my settings and cookies etc. Ideally, links would be opened in the Firefox running outside. This is a problem that has a solution in the world of applications that are sandboxed with Flatpak, and involves a bunch of moving parts (a <a href="https://github.com/flatpak/xdg-desktop-portal">xdg-desktop-portal</a> user service, a <a href="https://github.com/flatpak/xdg-dbus-proxy">filtering dbus proxy</a>, exposing access to that proxy in the container). I experimented with that for a bit longer than I should have, but could not get it to work to satisfaction (even without a container involved, I could not get <code>xdg-desktop-portal</code> to heed my default browser settings…). For now I will live with manually copying and pasting URLs, we’ll see how long this lasts.</p></li>
<li><p>With this setup (and unlike the NixOS container setup I tried first), the same applications are installed inside and outside. It might be useful to separate the set of installed programs: There is simply no point in running <code>evolution</code> or <code>firefox</code> inside the container, and it would help if I did not even have VSCode or <code>cabal</code> available outside, so that it’s less likely that I forget to enter <code>dev</code> before using these tools.</p>
<p>It shouldn’t be too hard to cargo-cult some of the NixOS Containers infrastructure to be able to have a separate system configuration that I can manage as part of my normal system configuration and make available to <code>bubblewrap</code> here.</p></li>
</ul>
<p>So likely I will refine this some more over time. Or get tired of typing <code>dev</code> and go back to what I did before…</p>
<h3 id="the-script">The script</h3>
<details>
<summary>
The <code>dev</code> script (at the time of writing)
</summary>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"/><span class="co">#!/usr/bin/env bash</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"/></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"/><span class="va">extra</span><span class="op">=</span><span class="va">()</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"/><span class="cf">if</span> <span class="kw">[[</span> <span class="st">"</span><span class="va">$PWD</span><span class="st">"</span> <span class="ot">==</span> /home/jojo/build/<span class="pp">*</span> <span class="kw">]]</span> <span class="kw">||</span> <span class="kw">[[</span> <span class="st">"</span><span class="va">$PWD</span><span class="st">"</span> <span class="ot">==</span> /home/jojo/projekte/programming/<span class="pp">*</span> <span class="kw">]]</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"/><span class="cf">then</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"/><span class="va">extra</span><span class="op">+=</span><span class="va">(</span>--bind <span class="st">"</span><span class="va">$PWD</span><span class="st">"</span> <span class="st">"</span><span class="va">$PWD</span><span class="st">"</span> --chdir <span class="st">"</span><span class="va">$PWD</span><span class="st">"</span><span class="va">)</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"/><span class="cf">fi</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"/></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"/><span class="cf">if</span> <span class="bu">[</span> <span class="ot">-n</span> <span class="st">"</span><span class="va">$1</span><span class="st">"</span> <span class="bu">]</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"/><span class="cf">then</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"/> <span class="va">cmd</span><span class="op">=</span><span class="va">(</span> <span class="st">"</span><span class="va">$@</span><span class="st">"</span> <span class="va">)</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"/><span class="cf">else</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"/> <span class="va">cmd</span><span class="op">=</span><span class="va">(</span> bash <span class="va">)</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"/><span class="cf">fi</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"/></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"/><span class="co"># Caveats:</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"/><span class="co"># * access to all of `/etc`</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"/><span class="co"># * access to `/nix/var/nix/daemon-socket/socket`, and is trusted user (but needed to run nix)</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"/><span class="co"># * access to X11</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"/></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"/><span class="bu">exec</span> bwrap <span class="dt">\</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"/> <span class="at">--unshare-all</span> <span class="dt">\</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"/><span class="dt">\</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"/><span class="kw">`</span><span class="co"># blank slate</span><span class="kw">`</span> <span class="dt">\</span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"/> <span class="at">--share-net</span> <span class="dt">\</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"/> <span class="at">--proc</span> /proc <span class="dt">\</span></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"/> <span class="at">--dev</span> /dev <span class="dt">\</span></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"/> <span class="at">--tmpfs</span> /tmp <span class="dt">\</span></span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"/> <span class="at">--tmpfs</span> /run/user/1000 <span class="dt">\</span></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"/><span class="dt">\</span></span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"/><span class="kw">`</span><span class="co"># Needed for GLX applications, in particular alacritty</span><span class="kw">`</span> <span class="dt">\</span></span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"/> <span class="at">--dev-bind</span> /dev/dri /dev/dri <span class="dt">\</span></span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /sys/dev/char /sys/dev/char <span class="dt">\</span></span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /sys/devices/pci0000:00 /sys/devices/pci0000:00 <span class="dt">\</span></span>
<span id="cb1-35"><a href="#cb1-35" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /run/opengl-driver /run/opengl-driver <span class="dt">\</span></span>
<span id="cb1-36"><a href="#cb1-36" aria-hidden="true" tabindex="-1"/><span class="dt">\</span></span>
<span id="cb1-37"><a href="#cb1-37" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /bin /bin <span class="dt">\</span></span>
<span id="cb1-38"><a href="#cb1-38" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /usr /usr <span class="dt">\</span></span>
<span id="cb1-39"><a href="#cb1-39" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /run/current-system /run/current-system <span class="dt">\</span></span>
<span id="cb1-40"><a href="#cb1-40" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /nix /nix <span class="dt">\</span></span>
<span id="cb1-41"><a href="#cb1-41" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /etc /etc <span class="dt">\</span></span>
<span id="cb1-42"><a href="#cb1-42" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/stub-resolv.conf <span class="dt">\</span></span>
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true" tabindex="-1"/><span class="dt">\</span></span>
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true" tabindex="-1"/> <span class="at">--bind</span> ~/.dev-home /home/jojo <span class="dt">\</span></span>
<span id="cb1-45"><a href="#cb1-45" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> ~/.config/alacritty ~/.config/alacritty <span class="dt">\</span></span>
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> ~/.config/nvim ~/.config/nvim <span class="dt">\</span></span>
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> ~/.local/share/nvim ~/.local/share/nvim <span class="dt">\</span></span>
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"/> <span class="at">--ro-bind</span> ~/.bin ~/.bin <span class="dt">\</span></span>
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"/><span class="dt">\</span></span>
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"/> <span class="at">--bind</span> /tmp/.X11-unix/X0 /tmp/.X11-unix/X0 <span class="dt">\</span></span>
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"/> <span class="at">--bind</span> ~/.Xauthority ~/.Xauthority <span class="dt">\</span></span>
<span id="cb1-52"><a href="#cb1-52" aria-hidden="true" tabindex="-1"/> <span class="at">--setenv</span> DISPLAY :0 <span class="dt">\</span></span>
<span id="cb1-53"><a href="#cb1-53" aria-hidden="true" tabindex="-1"/><span class="dt">\</span></span>
<span id="cb1-54"><a href="#cb1-54" aria-hidden="true" tabindex="-1"/> <span class="at">--setenv</span> container dev <span class="dt">\</span></span>
<span id="cb1-55"><a href="#cb1-55" aria-hidden="true" tabindex="-1"/> <span class="st">"</span><span class="va">${extra</span><span class="op">[@]</span><span class="va">}</span><span class="st">"</span> <span class="dt">\</span></span>
<span id="cb1-56"><a href="#cb1-56" aria-hidden="true" tabindex="-1"/> <span class="at">--</span> <span class="dt">\</span></span>
<span id="cb1-57"><a href="#cb1-57" aria-hidden="true" tabindex="-1"/> <span class="st">"</span><span class="va">${cmd</span><span class="op">[@]</span><span class="va">}</span><span class="st">"</span></span></code></pre></div>
</details></description>
<pubDate>Mon, 11 Mar 2024 21:39:58 +0100</pubDate>
</item>
<item>
<title>GHC Steering Committee Retrospective</title>
<link>https://www.joachim-breitner.de/blog/811-GHC_Steering_Committee_Retrospective</link>
<guid>https://www.joachim-breitner.de/blog/811-GHC_Steering_Committee_Retrospective</guid>
<comments>https://www.joachim-breitner.de/blog/811-GHC_Steering_Committee_Retrospective#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>After seven years of service as member and secretary on the GHC Steering Committee, I have resigned from that role. So this is a good time to look back and retrace the formation of the GHC proposal process and committee.</p>
<p>In my memory, I helped define and shape the proposal process, optimizing it for effectiveness and throughput, but memory can be misleading, and judging from the paper trail in my email archives, this was indeed mostly Ben Gamari’s and Richard Eisenberg’s achievement: Already in Summer of 2016, Ben Gamari set up the <a href="https://github.com/ghc-proposals/ghc-proposals">ghc-proposals Github repository</a> with a sketch of a process and sent out a <a href="https://mail.haskell.org/pipermail/glasgow-haskell-users/2016-September/026396.html">call for nominations</a> on the GHC user’s mailing list, which I replied to. The Simons picked the first set of members, and in the fall of 2016 we discussed the committee’s by-laws and procedures. As so often, Richard was an influential shaping force here.</p>
<h3 id="three-ingredients">Three ingredients</h3>
<p>For example, it was he who suggested that for each proposal we have one committee member be the “Shepherd”, overseeing the discussion. I believe this was one ingredient for the process’s effectiveness: There is always one person in charge, and thus we avoid the delays incurred when any one of a non-singleton set of volunteers has to do the next step (and everyone hopes someone else does it).</p>
<p>The next ingredient was that we do not usually require a vote among all members (again, not easy with volunteers with limited bandwidth and occasional phases of absence). Instead, the shepherd makes a recommendation (accept/reject), and if the other committee members do not complain, this silence is taken as consent, and we come to a decision. It seems this idea can also be traced back to Richard, who suggested that “once a decision is requested, the shepherd [generates] consensus. If consensus is elusive, then we vote.”</p>
<p>At the end of the year we agreed and wrote down these rules, created the mailing list for our internal, but <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/">publicly archived committee discussions</a>, and began accepting proposals, starting with <a href="https://github.com/ghc-proposals/ghc-proposals/pull/6">Adam Gundry’s <code>OverloadedRecordFields</code></a>.</p>
<p>At that point, there was no “secretary” role yet, so how did I become one? It seems that in February 2017 I started to <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2017-February/000040.html">clean up and refine the process documentation</a>, fixing “bugs in the process” (like requiring authors to set Github labels when they don’t even have permissions to do that). This in particular meant that someone from the committee had to manually handle submissions and so on, and by the aforementioned principle that at every step there ought to be exactly one person in charge, the role of a secretary followed naturally. In the <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2017-February/000044.html">email in which I described that role</a> I wrote:</p>
<blockquote>
<p>Simon already shoved me towards picking up the “secretary” hat, to reduce load on Ben.</p>
</blockquote>
<p>So when I <a href="https://github.com/ghc-proposals/ghc-proposals/commit/63d050c380b2aa380581cc9b9aafb2a7c022556a">merged the updated process documentation</a>, I already listed myself “secretary”.</p>
<p>It wasn’t just Simon’s shoving that put me into the role, though. I dug out my original self-nomination email to Ben, and among other things I wrote:</p>
<blockquote>
<p>I also hope that there is going to be clear responsibilities and a clear workflow among the committee. E.g. someone (possibly rotating), maybe called the secretary, who is in charge of having an initial look at proposals and then assigning it to a member who shepherds the proposal.</p>
</blockquote>
<p>So it is hardly a surprise that I became secretary, when it was dear to my heart to have a smooth continuous process here.</p>
<p>I am rather content with the result: These three ingredients – single secretary, per-proposal shepherds, silence-is-consent – helped the committee to be effective throughout its existence, even as every once in a while individual members dropped out.</p>
<h3 id="ulterior-motivation">Ulterior motivation</h3>
<p>I must admit, however, there was an ulterior motivation behind me grabbing the secretary role: Yes, I <em>did</em> want the committee to succeed, and I <em>did</em> want authors to receive timely, good and decisive feedback on their proposals – but I <em>did not</em> really want to have to do that part.</p>
<p>I am, in fact, a lousy proposal reviewer. I am too generous when reading proposals, and more likely to mentally fill gaps in a specification than to spot them, always optimistically assuming that the authors surely know what they are doing, rather than critically assessing the impact, the implementation cost and the interaction with other language features.</p>
<p>And, maybe more importantly: why should I know which changes are good and which are not so good in the long run? Clearly, the authors cared enough about a proposal to put it forward, so there is some need… and I do believe that Haskell should stay an evolving and innovating language… but how does this help me decide about this or that particular feature?</p>
<p>I even, during the formation of the committee, explicitly asked that we write down some guidance on “Vision and Guideline”; do we want to foster change or innovation, or be selective gatekeepers? Should we accept features that are proven to be useful, or should we accept features so that they can prove to be useful? This discussion, however, did not lead to a concrete result, and the assessment of proposals relied on the sum of each member’s personal preference, expertise and gut feeling. I am not saying that this was a mistake: It is hard to come up with a general guideline here, and even harder to find one that does justice to each individual proposal.</p>
<p>So the secret motivation for me to grab the secretary post was that I could contribute without having to judge proposals. Being secretary allowed me to assign most proposals to others to shepherd, and only once in a while took care of a proposal myself, when it seemed to be very straight-forward. Sneaky, ain’t it?</p>
<h3 id="years-later">7 Years later</h3>
<p>For years to come I happily played secretary: When an author finished their proposal and public discussion ebbed down, they would ping me on GitHub, and I would <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2022-July/002837.html">pick a suitable shepherd</a> among the committee and ask them to judge the proposal. Eventually, the committee would come to a conclusion, usually by implicit consent, sometimes by voting, and I’d merge the pull request and update the metadata thereon. Every few months I’d <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2024-January/003683.html">summarize the current state of affairs</a> to the committee (what happened since the last update, which proposals are currently on our plate), and once per year gather the data for <a href="https://youtu.be/LFIL0myeOlo?si=P316QJs0EBSWe4Uy&amp;t=3955">Simon Peyton Jones’ annual GHC Status Report</a>. Sometimes some members needed a nudge or two to act. Some would eventually step down, and I’d send around a call for nominations and, when the nominations came in, distribute them off-list among the committee and tally the votes.</p>
<p>Initially, that was exciting. For a long while it was a pleasant and rewarding routine. Eventually, it became a mere chore. I noticed that I didn’t quite care so much anymore about some of the discussion, and there was a decent amount of navel-gazing, meta-discussions and some wrangling about claims of authority that was probably useful and necessary, but wasn’t particularly fun.</p>
<p>I also began to notice weaknesses in the processes that I helped shape: We could really use some more automation for showing proposal statuses, notifying people when they have to act, and nudging them when they don’t. The whole silence-is-assent approach is good for throughput, but not necessarily great for quality, and maybe the committee members need to be pushed more firmly to engage with each proposal. Like GHC itself, the committee processes deserve continuous refinement and refactoring, and since I could not muster the motivation to change my now well-trod secretarial ways, it was time for me to step down.</p>
<p>Luckily, Adam Gundry volunteered to take over, and that makes me feel much less bad for quitting. Thanks for that!</p>
<p>And although I am for my day job now <a href="https://lean-lang.org/">enjoying a language</a> that has many of the things out of the box that for Haskell are still only language extensions or even just future proposals (dependent types, <code>BlockArguments</code>, <code>do</code> notation with <code>(← foo)</code> expressions and 💜 Unicode), I’m still around, hosting the <a href="https://haskell.foundation/podcast/">Haskell Interlude Podcast</a>, writing on this blog and hanging out at ZuriHac etc.</p></description>
<pubDate>Thu, 25 Jan 2024 01:21:41 +0100</pubDate>
</item>
<item>
<title>The Haskell Interlude Podcast</title>
<link>https://www.joachim-breitner.de/blog/810-The_Haskell_Interlude_Podcast</link>
<guid>https://www.joachim-breitner.de/blog/810-The_Haskell_Interlude_Podcast</guid>
<comments>https://www.joachim-breitner.de/blog/810-The_Haskell_Interlude_Podcast#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>It was pointed out to me that I have not blogged about this, so better now than never:</p>
<p>Since 2021 I have been – together with four other hosts – producing a regular podcast about Haskell, the <a href="https://haskell.foundation/podcast/"><strong>Haskell Interlude</strong></a>. Roughly every two weeks two of us interview someone from the Haskell Community, and we chat for approximately an hour about how they came to Haskell, what they are doing with it, why they are doing it and what else is on their mind. Sometimes we talk to very famous people, like Simon Peyton Jones, and sometimes to people who maybe should be famous, but aren’t quite yet.</p>
<p>For most episodes we also have a transcript, so you can read the interviews instead, if you prefer, and you should find the podcast on most podcast apps as well. I do not know how reliable these statistics are, but supposedly we regularly have around 1300 listeners. We don’t get much feedback, however, so if you like the show, or dislike it, or have feedback, let us know (for example on the <a href="https://discourse.haskell.org/">Haskell Discourse</a>, which has a thread for each episode).</p>
<p>At the time of writing, we released 40 episodes. For the benefit of my (likely hypothetical) fans, or those who want to train an AI voice model for nefarious purposes, here is the list of episodes co-hosted by me:</p>
<ul>
<li><a href="https://haskell.foundation/podcast/3">Gabriella Gonzalez</a></li>
<li><a href="https://haskell.foundation/podcast/4">Jasper Van der Jeugt</a></li>
<li><a href="https://haskell.foundation/podcast/5">Chris Smith</a></li>
<li><a href="https://haskell.foundation/podcast/9">Sebastian Graf</a></li>
<li><a href="https://haskell.foundation/podcast/11">Simon Peyton Jones</a></li>
<li><a href="https://haskell.foundation/podcast/14">Ryan Trinkle</a></li>
<li><a href="https://haskell.foundation/podcast/15">Facundo Dominguez</a></li>
<li><a href="https://haskell.foundation/podcast/19">Marc Scholten</a></li>
<li><a href="https://haskell.foundation/podcast/23">Ben Gamari</a></li>
<li><a href="https://haskell.foundation/podcast/25">Andrew Lelechenko (Bodigrim)</a></li>
<li><a href="https://haskell.foundation/podcast/29">ZuriHac 2023 special</a></li>
<li><a href="https://haskell.foundation/podcast/31">Arnaud Spiwack</a></li>
<li><a href="https://haskell.foundation/podcast/37">John MacFarlane</a></li>
<li><a href="https://haskell.foundation/podcast/41">Moritz Angermann</a></li>
<li><a href="https://haskell.foundation/podcast/42">Jezen Thomas</a></li>
</ul>
<p>Can’t decide where to start? The one with Ryan Trinkle might be my favorite.</p>
<p>Thanks to the Haskell Foundation and its sponsors for supporting this podcast (hosting, editing, transcription).</p></description>
<pubDate>Fri, 22 Dec 2023 10:04:42 +0100</pubDate>
</item>
<item>
<title>Joining the Lean FRO</title>
<link>https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO</link>
<guid>https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO</guid>
<comments>https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>Tomorrow is going to be a new first day in a new job for me: I am joining the <a href="https://lean-fro.org/">Lean FRO</a>, and I’m excited.</p>
<h3 id="what-is-lean">What is Lean?</h3>
<p><a href="https://lean-lang.org/about/">Lean</a> is the new kid on the block of theorem provers.</p>
<p>It’s a pure functional programming language (like Haskell, with and on which I have worked a lot), but it’s dependently typed (which Haskell may be evolving to be as well, but rather slowly and carefully). It has a refreshing syntax, built on top of a rather good (I have been told, not an expert here) macro system.</p>
<p>As a dependently typed programming language, it is also a theorem prover, or proof assistant, and there exists already a lively community of mathematicians who started to formalize mathematics in a coherent library, creatively called <a href="https://github.com/leanprover-community/mathlib4">mathlib</a>.</p>
<h3 id="what-is-a-fro">What is a FRO?</h3>
<p>A <a href="https://www.convergentresearch.org/">Focused Research Organization</a> has the organizational form of a small start up (small team, little overhead, a few years of runway), but its goals and measure for success are not commercial, as funding is provided by donors (in the case of the Lean FRO, the Simons Foundation International, the Alfred P. Sloan Foundation, and Richard Merkin). This allows us to build something that we believe is a contribution for the greater good, even though it’s not (or not yet) commercially interesting enough and does not fit other forms of funding (such as research grants) well. This is a very comfortable situation to be in.</p>
<h3 id="why-am-i-excited">Why am I excited?</h3>
<p>To me, working on Lean seems to be the perfect mix: I have been working on language implementation for about a decade now, and always with a preference for functional languages. Add to that my interest in theorem proving, where I have used Isabelle and Coq so far, and played with Agda and others. So technically, clearly up my alley.</p>
<p>Furthermore, the language isn’t too old, and plenty of interesting things are simply still to do, rather than tried before. The ecosystem is still evolving, so there is a good chance to have some impact.</p>
<p>On the other hand, the language isn’t too young either. It is no longer an open question whether we will have users: we have them already, they hang out on <a href="https://leanprover.zulipchat.com/">zulip</a>, so if I improve something, someone is likely going to be happy about it, which is great. And the community seems to be welcoming and full of nice people.</p>
<p>Finally, <a href="https://github.com/leanprover-community/mathlib4">this library of mathematics</a> that these users are building is itself an amazing artifact: Lots of math in a consistent, machine-readable, maintained, documented, checked form! With a little bit of optimism I can imagine this changing how math research and education will be done in the future. It could be for math what Wikipedia is for encyclopedic knowledge and OpenStreetMap for maps – and the thought of facilitating that excites me.</p>
<p>With this new job I find that when I am telling friends and colleagues about it, I do not hesitate or hedge when asked why I am doing this. This is a good sign.</p>
<h3 id="what-will-i-be-doing">What will I be doing?</h3>
<p>We’ll see what main tasks I’ll get to tackle initially, but knowing myself, I expect I’ll get broadly involved.</p>
<p>To get up to speed I started playing around with a few things already, and for example created <a href="https://loogle.lean-lang.org/">Loogle</a>, a Mathlib search engine inspired by Haskell’s <a href="https://hoogle.haskell.org/">Hoogle</a>, including a Zulip bot integration. This seems to be useful and quite well received, so I’ll continue maintaining that.</p>
<p>Expect more about this and other contributions here in the future.</p></description>
<pubDate>Wed, 01 Nov 2023 21:47:06 +0100</pubDate>
</item>
<item>
<title>Squash your Github PRs with one click</title>
<link>https://www.joachim-breitner.de/blog/808-Squash_your_Github_PRs_with_one_click</link>
<guid>https://www.joachim-breitner.de/blog/808-Squash_your_Github_PRs_with_one_click</guid>
<comments>https://www.joachim-breitner.de/blog/808-Squash_your_Github_PRs_with_one_click#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>TL;DR: Squash your PRs with one click at <a href="https://squasher.nomeata.de/" class="uri">https://squasher.nomeata.de/</a>.</p>
<p>Very recently I got this response from the project maintainer at a pull request I contributed: “Thanks, approved, please squash so that I can merge.”</p>
<p>It’s nice that my contribution can go in, but why did the maintainer not just press the “Squash and merge” button, instead of adding this unnecessary round trip to the process? Anyways, maintainers make the rules, so I play by them. But unlike the maintainer, who can squash-and-merge with just one click, squashing the PR’s branch is surprisingly laborious: Github does not allow you to do that via the Web UI (and hence on mobile), and it seems you are expected to go to your computer and juggle with <code>git rebase --interactive</code>.</p>
<p>I found this rather annoying, so I created <a href="https://squasher.nomeata.de/"><strong>Squasher</strong></a>, a simple service that will squash your branch for you. There is no configuration, just paste the PR url. It will use the PR title and body as the commit message (which is obviously the right way™), and create the commit in your name:</p>
<figure>
<img src="/various/squasher.png" alt="Squasher in action"/>
<figcaption aria-hidden="true">Squasher in action</figcaption>
</figure>
<p>If you find this useful, or found it to be buggy, let me know. The code is at <a href="https://github.com/nomeata/squasher" class="uri">https://github.com/nomeata/squasher</a> if you are curious about it.</p></description>
<pubDate>Sun, 29 Oct 2023 22:46:56 +0100</pubDate>
</item>
<item>
<title>Left recursive parser combinators via sharing</title>
<link>https://www.joachim-breitner.de/blog/807-Left_recursive_parser_combinators_via_sharing</link>
<guid>https://www.joachim-breitner.de/blog/807-Left_recursive_parser_combinators_via_sharing</guid>
<comments>https://www.joachim-breitner.de/blog/807-Left_recursive_parser_combinators_via_sharing#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>At this year’s <a href="https://icfp23.sigplan.org/">ICFP in Seattle</a> I gave a talk about my <a href="https://hackage.haskell.org/package/rec-def">rec-def</a> Haskell library, which I have blogged about before here. While my <a href="https://doi.org/10.1145/3607853">functional pearl paper</a> focuses on a concrete use-case and the tricks of the implementation, in my talk I put the emphasis on the high-level idea: it behooves a declarative lazy functional language like Haskell that recursive equations just work whenever they describe a (unique) solution. Like in the paper, I used equations between sets as the running example, and only conjectured that it should also work for other domains, in particular parser combinators.</p>
<p>Naturally, someone called my bluff and asked if I actually tried it. I had not, but I should have, because it works nicely and is actually more straight-forward than with sets. I wrote up a prototype and showed it off a few days later as a lightning talk at <a href="https://icfp23.sigplan.org/home/haskellsymp-2023">Haskell Symposium</a>; here is the write up that goes along with that.</p>
<h3 id="parser-combinators">Parser combinators</h3>
<p>Parser combinators are libraries that provide little functions (combinators) that you compose to define your parser directly in your programming language, as opposed to using external tools that read some grammar description and generate parser code, and are quite popular in Haskell (e.g. <a href="https://hackage.haskell.org/package/parsec">parsec</a>, <a href="https://hackage.haskell.org/package/attoparsec">attoparsec</a>, <a href="https://hackage.haskell.org/package/megaparsec">megaparsec</a>).</p>
<p>Let us define a little parser that recognizes sequences of <code>a</code>s:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="kw">let</span> aaa <span class="ot">=</span> tok <span class="ch">'a'</span> <span class="op">*></span> aaa <span class="op">&lt;|></span> <span class="fu">pure</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse aaa <span class="st">"aaaa"</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"/><span class="dt">Just</span> ()</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse aaa <span class="st">"aabaa"</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"/><span class="dt">Nothing</span></span></code></pre></div>
<h3 id="left-recursion">Left-recursion</h3>
<p>This works nicely, but just because we were lucky: We wrote the parser to recurse on the right (of the <code>*></code>), and this happens to work. If we put the recursive call first, it doesn’t anymore:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="kw">let</span> aaa <span class="ot">=</span> aaa <span class="op">&lt;*</span> tok <span class="ch">'a'</span> <span class="op">&lt;|></span> <span class="fu">pure</span> ()</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse aaa <span class="st">"aaaa"</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"/><span class="op">^</span><span class="dt">CInterrupted</span><span class="op">.</span></span></code></pre></div>
<p>This is a well-known problem (see for example <a href="https://doi.org/10.1145/3471874.3472984">Nicolas Wu’s overview paper</a>): none of the common parser combinator libraries can handle it, and the usual advice is to refactor your grammar to avoid left recursion.</p>
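<p>To see where the loop comes from, here is a minimal sketch of a naive backtracking (“list of successes”) parser. It is only an illustration and not the implementation of any of the libraries mentioned here; the names <code>Naive</code>, <code>tokN</code>, <code>parseN</code>, <code>rightRec</code> and <code>leftRec</code> are made up for this example. The right-recursive grammar consumes a token before it recurses, while the left-recursive one calls itself on the very same input.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Applicative (Alternative (..))

-- Hypothetical toy parser: a function from the input to all possible
-- (result, remaining input) pairs.
newtype Naive tok a = Naive { runNaive :: [tok] -> [(a, [tok])] }

instance Functor (Naive tok) where
  fmap f (Naive p) = Naive $ \ts -> [ (f a, rest) | (a, rest) &lt;- p ts ]

instance Applicative (Naive tok) where
  pure x = Naive $ \ts -> [(x, ts)]
  Naive pf &lt;*> Naive pa = Naive $ \ts ->
    [ (f a, rest') | (f, rest) &lt;- pf ts, (a, rest') &lt;- pa rest ]

instance Alternative (Naive tok) where
  empty = Naive (const [])
  Naive p &lt;|> Naive q = Naive $ \ts -> p ts ++ q ts

-- Accept exactly the given token.
tokN :: Eq tok => tok -> Naive tok tok
tokN t = Naive $ \ts -> case ts of
  (x : rest) | x == t -> [(x, rest)]
  _                   -> []

-- Succeed only if some alternative consumes the whole input.
parseN :: Naive tok a -> [tok] -> Maybe a
parseN p ts = case [ a | (a, []) &lt;- runNaive p ts ] of
  (a : _) -> Just a
  []      -> Nothing

-- Recursion on the right: one 'a' is consumed before recursing.
rightRec :: Naive Char ()
rightRec = tokN 'a' *> rightRec &lt;|> pure ()

-- Recursion on the left: running it immediately runs itself again
-- on the same, unconsumed input.
leftRec :: Naive Char ()
leftRec = leftRec &lt;* tokN 'a' &lt;|> pure ()

main :: IO ()
main = do
  print (parseN rightRec "aaaa")   -- prints Just ()
  print (parseN rightRec "aabaa")  -- prints Nothing
  -- print (parseN leftRec "aaaa") -- would not terminate</code></pre></div>
<p>In this sketch the left-recursive call is reached before any input has been consumed, so there is nothing that makes the recursion bottom out.</p>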
<p>But there are some libraries that can handle left recursion, at least with a little help from the programmer. I found two variations:</p>
<ul>
<li><p>The library provides an explicit fix point combinator, and as long as that is used, left-recursion works. This is for example described by <a href="https://link.springer.com/chapter/10.1007/978-3-540-77442-6_12">Frost, Hafiz and Callaghan</a>, and (of course) <a href="https://okmij.org/ftp/Haskell/LeftRecursion.hs">Oleg Kiselyov</a> has an implementation of this too.</p></li>
<li><p>The library expects explicit labels on recursive productions, so that the library can recognize left-recursion. I found an implementation of this idea in the <a href="https://hackage.haskell.org/package/Agda-2.6.3/docs/Agda-Utils-Parser-MemoisedCPS.html"><code>Agda.Utils.Parser.MemoisedCPS</code></a> module in the Agda code, the <a href="https://hackage.haskell.org/package/gll-0.4.1.0/docs/GLL-Combinators-Interface.html"><code>gll</code> library</a> seems to follow this style and <a href="https://discourse.haskell.org/t/reusing-haskells-binding-when-defining-context-free-grammars/5960?u=nomeata">Jaro discusses it as well</a>.</p></li>
</ul>
<p>I took the module from the Agda source and simplified it a bit for the purposes of this demonstration (<a href="https://github.com/nomeata/left-rec-parse/blob/master/Parser.hs"><code>Parser.hs</code></a>). Indeed, I can make the left-recursive grammar work:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="kw">let</span> aaa <span class="ot">=</span> memoise <span class="st">":-)"</span> <span class="op">$</span> aaa <span class="op">&lt;*</span> tok <span class="ch">'a'</span> <span class="op">&lt;|></span> <span class="fu">pure</span> ()</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse aaa <span class="st">"aaaa"</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"/><span class="dt">Just</span> ()</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse aaa <span class="st">"aabaa"</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"/><span class="dt">Nothing</span></span></code></pre></div>
<p>It does not matter what I pass to <code>memoise</code>, as long as I do not pass the same key when memoising a different parser.</p>
<p>For reference, an excerpt of the API of <code>Parser</code>:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"/><span class="kw">data</span> <span class="dt">Parser</span> k tok a <span class="co">-- k is type of keys, tok type of tokens (e.g. Char)</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Functor</span> (<span class="dt">Parser</span> k tok)</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Applicative</span> (<span class="dt">Parser</span> k tok)</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Alternative</span> (<span class="dt">Parser</span> k tok)</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Monad</span> (<span class="dt">Parser</span> k tok)</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"/><span class="ot">parse ::</span> <span class="dt">Parser</span> k tok a <span class="ot">-></span> [tok] <span class="ot">-></span> <span class="dt">Maybe</span> a</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"/><span class="ot">sat ::</span> (tok <span class="ot">-></span> <span class="dt">Bool</span>) <span class="ot">-></span> <span class="dt">Parser</span> k tok tok</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"/><span class="ot">tok ::</span> <span class="dt">Eq</span> tok <span class="ot">=></span> tok <span class="ot">-></span> <span class="dt">Parser</span> k tok tok</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"/><span class="ot">memoise ::</span> <span class="dt">Ord</span> k <span class="ot">=></span> k <span class="ot">-></span> <span class="dt">Parser</span> k tok a <span class="ot">-></span> <span class="dt">Parser</span> k tok a</span></code></pre></div>
<h3 id="left-recursion-through-sharing">Left-recursion through sharing</h3>
<p>To follow the agenda set out in my talk, I now want to wrap that parser in a way that relieves me from having to insert the calls to <code>memoise</code>. To start, I import that parser qualified, define a newtype around it, and start lifting some of the functions:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"/><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Parser</span> <span class="kw">as</span> <span class="dt">P</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"/></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"/><span class="kw">newtype</span> <span class="dt">Parser</span> tok a <span class="ot">=</span> <span class="dt">MkP</span> {<span class="ot"> unP ::</span> <span class="dt">P.Parser</span> <span class="dt">Unique</span> tok a }</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"/></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"/><span class="ot">parse ::</span> <span class="dt">Parser</span> tok a <span class="ot">-></span> [tok] <span class="ot">-></span> <span class="dt">Maybe</span> a</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"/>parse (<span class="dt">MkP</span> p) <span class="ot">=</span> P.parse p</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"/></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"/><span class="ot">sat ::</span> <span class="dt">Typeable</span> tok <span class="ot">=></span> (tok <span class="ot">-></span> <span class="dt">Bool</span>) <span class="ot">-></span> <span class="dt">Parser</span> tok tok</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"/>sat p <span class="ot">=</span> <span class="dt">MkP</span> (P.sat p)</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"/></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"/><span class="ot">tok ::</span> <span class="dt">Eq</span> tok <span class="ot">=></span> tok <span class="ot">-></span> <span class="dt">Parser</span> tok tok</span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"/>tok t <span class="ot">=</span> <span class="dt">MkP</span> (P.tok t)</span></code></pre></div>
<p>So far, nothing interesting had to happen, because so far I cannot build recursive parsers. The first interesting combinator that allows me to do that is <code>&lt;*></code> from the <code>Applicative</code> class, so I should use <code>memoise</code> there. The question is: Where does the unique key come from?</p>
<h3 id="proprioception">Proprioception</h3>
<p>As with the rec-def library, pure code won’t do, and I have to get my hands dirty: I really want a fresh unique label out of thin air. To that end, I define the following combinator, with naming aided by Richard Eisenberg:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"/><span class="ot">propriocept ::</span> (<span class="dt">Unique</span> <span class="ot">-></span> a) <span class="ot">-></span> a</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"/>propriocept f <span class="ot">=</span> unsafePerformIO <span class="op">$</span> f <span class="op">&lt;$></span> newUnique</span></code></pre></div>
<p>A thunk defined with <code>propriocept</code> will know about its own identity, and will be able to tell itself apart from other such thunks. This gives us a form of observable sharing, precisely what we need. But before we return to our parser combinators, let us briefly explore this combinator.</p>
<p>Using <code>propriocept</code> I can define an operation <code>cons :: [Int] -> [Int]</code> that records (the hash of) this <code>Unique</code> in the list:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="kw">let</span> cons xs <span class="ot">=</span> propriocept (\x <span class="ot">-></span> hashUnique x <span class="op">:</span> xs)</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="op">:</span>t cons</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"/><span class="ot">cons ::</span> [<span class="dt">Int</span>] <span class="ot">-></span> [<span class="dt">Int</span>]</span></code></pre></div>
<p>This lets us see the identity of a list cell, that is, of the concrete object in memory.</p>
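<p>For anyone who wants to try this outside of GHCi, the definitions so far can be put into a small stand-alone module, roughly as sketched below. The imports and the <code>main</code> function are additions for this illustration. The concrete numbers are arbitrary, and since this relies on <code>unsafePerformIO</code>, an optimizing compiler may share or duplicate thunks differently, so it is best tried in GHCi or without optimizations.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.Unique (Unique, hashUnique, newUnique)
import System.IO.Unsafe (unsafePerformIO)

-- Hand the function a fresh Unique the first time the thunk is forced.
propriocept :: (Unique -> a) -> a
propriocept f = unsafePerformIO $ f &lt;$> newUnique

-- A list cell that records (the hash of) its own identity.
cons :: [Int] -> [Int]
cons xs = propriocept (\x -> hashUnique x : xs)

main :: IO ()
main = do
  -- One let-bound thunk, forced once: both components show the same cell.
  let shared = cons []
  print (shared, shared)
  -- Two separate calls: without optimizations these are separate cells.
  print (cons [], cons [])</code></pre></div>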
<p>Naturally, if we construct a finite list, each list cell is different:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> cons (cons (cons []))</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"/>[<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">3</span>]</span></code></pre></div>
<p>And if we do that again, we see that fresh list cells are allocated:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> cons (cons (cons []))</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"/>[<span class="dv">4</span>,<span class="dv">5</span>,<span class="dv">6</span>]</span></code></pre></div>
<p>We can create an infinite list; if we do it without sharing, every cell is separate:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="fu">take</span> <span class="dv">20</span> (acyclic <span class="dv">0</span>)</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"/>[<span class="dv">7</span>,<span class="dv">8</span>,<span class="dv">9</span>,<span class="dv">10</span>,<span class="dv">11</span>,<span class="dv">12</span>,<span class="dv">13</span>,<span class="dv">14</span>,<span class="dv">15</span>,<span class="dv">16</span>,<span class="dv">17</span>,<span class="dv">18</span>,<span class="dv">19</span>,<span class="dv">20</span>,<span class="dv">21</span>,<span class="dv">22</span>,<span class="dv">23</span>,<span class="dv">24</span>,<span class="dv">25</span>,<span class="dv">26</span>]</span></code></pre></div>
<p>but if we tie the knot using sharing, all the cells in the list are actually the same:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="kw">let</span> cyclic <span class="ot">=</span> cons cyclic</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="fu">take</span> <span class="dv">20</span> cyclic</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"/>[<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>,<span class="dv">27</span>]</span></code></pre></div>
<p>We can achieve the same using <a href="https://hackage.haskell.org/package/base/docs/Data-Function.html#v:fix"><code>fix</code> from <code>Data.Function</code></a>:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="kw">import</span> <span class="dt">Data.Function</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="fu">take</span> <span class="dv">20</span> (fix cons)</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"/>[<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>,<span class="dv">28</span>]</span></code></pre></div>
<p>I explore these heap structures more visually in a <a href="https://www.youtube.com/playlist?list=PL4FcLyLhO9jggmkqJyJ2i9pCSiDpwKiVu">series of screencasts</a>.</p>
<p>So with <code>propriocept</code> we can distinguish different heap objects, and also recognize when we come across the same heap object again.</p>
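<p>For readers who want to replay these experiments from a file rather than at the prompt, here is a minimal self-contained sketch. The imports, the <code>NOINLINE</code> pragma, the two helper predicates and in particular the definition of <code>acyclic</code> (which is not shown above) are assumptions made for this sketch, not necessarily the original code. The observed sharing is only dependable without optimizations (GHCi, or <code>ghc</code> without <code>-O</code>); an optimizing compiler is free to change what is shared.</p>

```haskell
import Data.Unique (Unique, hashUnique, newUnique)
import System.IO.Unsafe (unsafePerformIO)

-- Hand a fresh Unique to the thunk being defined. Same idea as the
-- definition above, written with fmap; the NOINLINE pragma is an added
-- precaution and an assumption of this sketch.
propriocept :: (Unique -> a) -> a
propriocept f = unsafePerformIO (fmap f newUnique)
{-# NOINLINE propriocept #-}

-- Prepend a cell that carries (the hash of) its own identity.
cons :: [Int] -> [Int]
cons xs = propriocept (\u -> hashUnique u : xs)

-- Assumed definition: every recursive call builds a new cell.
acyclic :: Int -> [Int]
acyclic n = cons (acyclic (n + 1))

-- The list refers to itself, so there is only one cell.
cyclic :: [Int]
cyclic = cons cyclic

-- Hypothetical helper: True if all labels in the list are equal.
allSame :: [Int] -> Bool
allSame []       = True
allSame (x : xs) = all (== x) xs

-- Hypothetical helper: True if no label occurs twice.
allDistinct :: [Int] -> Bool
allDistinct []       = True
allDistinct (x : xs) = and [notElem x xs, allDistinct xs]
```

<p>Loaded into GHCi, <code>allSame (take 20 cyclic)</code> and <code>allDistinct (take 20 (acyclic 0))</code> should both hold; the concrete numbers depend on how many <code>Unique</code>s were created before.</p>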
<h3 id="left-recursion-through-sharing-cont.">Left-recursion through sharing (cont.)</h3>
<p>With that we return to our parser. We define a smart constructor for the new <code>Parser</code> that passes the unique from <code>propriocept</code> to the underlying parser’s <code>memoise</code> function:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"/><span class="ot">withMemo ::</span> <span class="dt">P.Parser</span> <span class="dt">Unique</span> tok a <span class="ot">-></span> <span class="dt">Parser</span> tok a</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"/>withMemo p <span class="ot">=</span> propriocept <span class="op">$</span> \u <span class="ot">-></span> <span class="dt">MkP</span> <span class="op">$</span> P.memoise u p</span></code></pre></div>
<p>If we now use this in the definition of all possibly recursive parsers, then the necessary calls to <code>memoise</code> will be in place:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Functor</span> (<span class="dt">Parser</span> tok) <span class="kw">where</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"/> <span class="fu">fmap</span> f p <span class="ot">=</span> withMemo (<span class="fu">fmap</span> f (unP p))</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"/></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Applicative</span> (<span class="dt">Parser</span> tok) <span class="kw">where</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"/> <span class="fu">pure</span> x <span class="ot">=</span> <span class="dt">MkP</span> (<span class="fu">pure</span> x)</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"/> p1 <span class="op">&lt;*></span> p2 <span class="ot">=</span> withMemo (unP p1 <span class="op">&lt;*></span> unP p2)</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"/></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Alternative</span> (<span class="dt">Parser</span> tok) <span class="kw">where</span></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"/> empty <span class="ot">=</span> <span class="dt">MkP</span> empty</span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"/> p1 <span class="op">&lt;|></span> p2 <span class="ot">=</span> withMemo (unP p1 <span class="op">&lt;|></span> unP p2)</span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"/></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"/><span class="kw">instance</span> <span class="dt">Monad</span> (<span class="dt">Parser</span> tok) <span class="kw">where</span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"/> <span class="fu">return</span> <span class="ot">=</span> <span class="fu">pure</span></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"/> p1 <span class="op">>>=</span> f <span class="ot">=</span> withMemo <span class="op">$</span> unP p1 <span class="op">>>=</span> unP <span class="op">.</span> f</span></code></pre></div>
<p>And indeed, it works (see <a href="https://github.com/nomeata/left-rec-parse/blob/master/RParser.hs"><code>RParser.hs</code></a> for the full code):</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="kw">let</span> aaa <span class="ot">=</span> aaa <span class="op">&lt;*</span> tok <span class="ch">'a'</span> <span class="op">&lt;|></span> <span class="fu">pure</span> ()</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse aaa <span class="st">"aaaa"</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"/><span class="dt">Just</span> ()</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse aaa <span class="st">"aabaa"</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"/><span class="dt">Nothing</span></span></code></pre></div>
<h3 id="a-larger-example">A larger example</h3>
<p>Let us try this on a larger example, and parse (simple) BNF grammars. Here is a data type describing them:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"/><span class="kw">type</span> <span class="dt">Ident</span> <span class="ot">=</span> <span class="dt">String</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"/><span class="kw">type</span> <span class="dt">RuleRhs</span> <span class="ot">=</span> [<span class="dt">Seq</span>]</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"/><span class="kw">type</span> <span class="dt">Seq</span> <span class="ot">=</span> [<span class="dt">Atom</span>]</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"/><span class="kw">data</span> <span class="dt">Atom</span> <span class="ot">=</span> <span class="dt">Lit</span> <span class="dt">String</span> <span class="op">|</span> <span class="dt">NonTerm</span> <span class="dt">Ident</span> <span class="kw">deriving</span> <span class="dt">Show</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"/><span class="kw">type</span> <span class="dt">Rule</span> <span class="ot">=</span> (<span class="dt">Ident</span>, <span class="dt">RuleRhs</span>)</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"/><span class="kw">type</span> <span class="dt">BNF</span> <span class="ot">=</span> [<span class="dt">Rule</span>]</span></code></pre></div>
<p>For the concrete syntax, I’d like to be able to parse something like</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"/><span class="ot">numExp ::</span> <span class="dt">String</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"/>numExp <span class="ot">=</span> <span class="fu">unlines</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"/> [ <span class="st">"term := sum;"</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"/> , <span class="st">"pdigit := '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"/> , <span class="st">"digit := '0' | pdigit;"</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"/> , <span class="st">"pnum := pdigit | pnum digit;"</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"/> , <span class="st">"num := '0' | pnum;"</span></span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"/> , <span class="st">"prod := atom | atom '*' prod;"</span></span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"/> , <span class="st">"sum := prod | prod '+' sum;"</span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"/> , <span class="st">"atom := num | '(' term ')';"</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"/> ]</span></code></pre></div>
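<p>To make the correspondence between the data type and the concrete syntax tangible, here is a small sketch with a hand-written value for two of the rules and a pretty printer going the other way. The names <code>example</code>, <code>renderAtom</code>, <code>renderRule</code> and <code>render</code> are hypothetical and only serve this illustration; the type declarations are repeated so that the snippet stands on its own.</p>

```haskell
import Data.List (intercalate)

type Ident = String
type RuleRhs = [Seq]
type Seq = [Atom]
data Atom = Lit String | NonTerm Ident deriving Show
type Rule = (Ident, RuleRhs)
type BNF = [Rule]

-- Hand-written abstract syntax for the "digit" and "pnum" rules.
example :: BNF
example =
  [ ("digit", [[Lit "0"], [NonTerm "pdigit"]])
  , ("pnum",  [[NonTerm "pdigit"], [NonTerm "pnum", NonTerm "digit"]])
  ]

-- Literals are quoted, non-terminals are written as-is.
renderAtom :: Atom -> String
renderAtom (Lit s)     = "'" ++ s ++ "'"
renderAtom (NonTerm i) = i

-- One rule per line: alternatives separated by '|', atoms by spaces,
-- and the empty sequence written as ε.
renderRule :: Rule -> String
renderRule (i, rhs) =
  i ++ " := " ++ intercalate " | " (map renderSeq rhs) ++ ";"
  where
    renderSeq []    = "ε"
    renderSeq atoms = unwords (map renderAtom atoms)

render :: BNF -> String
render = unlines . map renderRule
```

<p>With these definitions, <code>renderRule (head example)</code> gives back the line <code>digit := '0' | pdigit;</code> from the grammar above.</p>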
<p>so here is a possible parser; it is a mostly straightforward use of parser combinators:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"/><span class="kw">type</span> <span class="dt">P</span> <span class="ot">=</span> <span class="dt">Parser</span> <span class="dt">Char</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"/></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"/><span class="ot">snoc ::</span> [a] <span class="ot">-></span> a <span class="ot">-></span> [a]</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"/>snoc xs x <span class="ot">=</span> xs <span class="op">++</span> [x]</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"/></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"/><span class="ot">l ::</span> <span class="dt">P</span> a <span class="ot">-></span> <span class="dt">P</span> a</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"/>l p <span class="ot">=</span> p <span class="op">&lt;|></span> l p <span class="op">&lt;*</span> sat <span class="fu">isSpace</span></span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"/><span class="ot">quote ::</span> <span class="dt">P</span> <span class="dt">Char</span></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"/>quote <span class="ot">=</span> tok <span class="ch">'\''</span></span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"/><span class="ot">quoted ::</span> <span class="dt">P</span> a <span class="ot">-></span> <span class="dt">P</span> a</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"/>quoted p <span class="ot">=</span> quote <span class="op">*></span> p <span class="op">&lt;*</span> quote</span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"/><span class="ot">str ::</span> <span class="dt">P</span> <span class="dt">String</span></span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"/>str <span class="ot">=</span> some (sat (<span class="fu">not</span> <span class="op">.</span> (<span class="op">==</span> <span class="ch">'\''</span>)))</span>
<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"/><span class="ot">ident ::</span> <span class="dt">P</span> <span class="dt">Ident</span></span>
<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"/>ident <span class="ot">=</span> some (sat (\c <span class="ot">-></span> <span class="fu">isAlphaNum</span> c <span class="op">&amp;&amp;</span> <span class="fu">isAscii</span> c))</span>
<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"/><span class="ot">atom ::</span> <span class="dt">P</span> <span class="dt">Atom</span></span>
<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"/>atom <span class="ot">=</span> <span class="dt">Lit</span> <span class="op">&lt;$></span> l (quoted str)</span>
<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"/> <span class="op">&lt;|></span> <span class="dt">NonTerm</span> <span class="op">&lt;$></span> l ident</span>
<span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"/><span class="ot">eps ::</span> <span class="dt">P</span> ()</span>
<span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"/>eps <span class="ot">=</span> void <span class="op">$</span> l (tok <span class="ch">'ε'</span>)</span>
<span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"/><span class="ot">sep ::</span> <span class="dt">P</span> ()</span>
<span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"/>sep <span class="ot">=</span> void <span class="op">$</span> some (sat <span class="fu">isSpace</span>)</span>
<span id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"/><span class="ot">sq ::</span> <span class="dt">P</span> <span class="dt">Seq</span></span>
<span id="cb18-24"><a href="#cb18-24" aria-hidden="true" tabindex="-1"/>sq <span class="ot">=</span> [] <span class="op">&lt;$</span> eps</span>
<span id="cb18-25"><a href="#cb18-25" aria-hidden="true" tabindex="-1"/> <span class="op">&lt;|></span> snoc <span class="op">&lt;$></span> sq <span class="op">&lt;*</span> sep <span class="op">&lt;*></span> atom</span>
<span id="cb18-26"><a href="#cb18-26" aria-hidden="true" tabindex="-1"/> <span class="op">&lt;|></span> <span class="fu">pure</span> <span class="op">&lt;$></span> atom</span>
<span id="cb18-27"><a href="#cb18-27" aria-hidden="true" tabindex="-1"/><span class="ot">ruleRhs ::</span> <span class="dt">P</span> <span class="dt">RuleRhs</span></span>
<span id="cb18-28"><a href="#cb18-28" aria-hidden="true" tabindex="-1"/>ruleRhs <span class="ot">=</span> <span class="fu">pure</span> <span class="op">&lt;$></span> sq</span>
<span id="cb18-29"><a href="#cb18-29" aria-hidden="true" tabindex="-1"/> <span class="op">&lt;|></span> snoc <span class="op">&lt;$></span> ruleRhs <span class="op">&lt;*</span> l (tok <span class="ch">'|'</span>) <span class="op">&lt;*></span> sq</span>
<span id="cb18-30"><a href="#cb18-30" aria-hidden="true" tabindex="-1"/><span class="ot">rule ::</span> <span class="dt">P</span> <span class="dt">Rule</span></span>
<span id="cb18-31"><a href="#cb18-31" aria-hidden="true" tabindex="-1"/>rule <span class="ot">=</span> (,) <span class="op">&lt;$></span> l ident <span class="op">&lt;*</span> l (tok <span class="ch">':'</span> <span class="op">*></span> tok <span class="ch">'='</span>) <span class="op">&lt;*></span> ruleRhs <span class="op">&lt;*</span> l (tok <span class="ch">';'</span>)</span>
<span id="cb18-32"><a href="#cb18-32" aria-hidden="true" tabindex="-1"/><span class="ot">bnf ::</span> <span class="dt">P</span> <span class="dt">BNF</span></span>
<span id="cb18-33"><a href="#cb18-33" aria-hidden="true" tabindex="-1"/>bnf <span class="ot">=</span> <span class="fu">pure</span> <span class="op">&lt;$></span> rule</span>
<span id="cb18-34"><a href="#cb18-34" aria-hidden="true" tabindex="-1"/> <span class="op">&lt;|></span> snoc <span class="op">&lt;$></span> bnf <span class="op">&lt;*></span> rule</span></code></pre></div>
<p>I somewhat sillily used <code>snoc</code> rather than <code>(:)</code> to build my lists, just so that I can show off all the left-recursion in this grammar.</p>
<h3 id="sharing-is-tricky">Sharing is tricky</h3>
<p>Let’s try it:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse bnf numExp</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"/><span class="op">^</span><span class="dt">CInterrupted</span><span class="op">.</span></span></code></pre></div>
<p>What a pity, it does not work! What went wrong?</p>
<p>The underlying library can handle left-recursion if it can recognize it by seeing a <code>memoise</code> label passed again. This works fine in all the places where we re-use a parser definition (e.g. in <code>bnf</code>), but it really requires that values are shared!</p>
<p>If we look carefully at our definition of <code>l</code> (which parses a lexeme, i.e. something possibly followed by whitespace), then it recurses via a fresh function call, and the program will keep expanding the definition – just like the <code>acyclic</code> above:</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"/><span class="ot">l ::</span> <span class="dt">P</span> a <span class="ot">-></span> <span class="dt">P</span> a</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"/>l p <span class="ot">=</span> p <span class="op">&lt;|></span> l p <span class="op">&lt;*</span> sat <span class="fu">isSpace</span></span></code></pre></div>
<p>The fix (sic!) is to make sure that the recursive call is using the parser we are currently defining, which we can easily do with a local definition:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"/><span class="ot">l ::</span> <span class="dt">P</span> a <span class="ot">-></span> <span class="dt">P</span> a</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"/>l p <span class="ot">=</span> p'</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"/> <span class="kw">where</span> p' <span class="ot">=</span> p <span class="op">&lt;|></span> p' <span class="op">&lt;*</span> sat <span class="fu">isSpace</span></span></code></pre></div>
<p>With this little fix, the parser can parse the example grammar:</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse bnf numExp</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"/><span class="dt">Just</span> [(<span class="st">"term"</span>,[[<span class="dt">NonTerm</span> <span class="st">"sum"</span>]]),(<span class="st">"pdigit"</span>,[[<span class="dt">Lit</span> <span class="st">"1"</span>],…</span></code></pre></div>
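<p>The difference between the two versions of <code>l</code> can be reproduced without any parser, using the labelled list cells from the <code>cons</code> experiment. The following is only a sketch: <code>consWith</code>, <code>viaCall</code> and <code>viaKnot</code> are hypothetical names, the <code>NOINLINE</code> pragma is an added precaution, and the behaviour described is the one I would expect in GHCi or without optimizations.</p>

```haskell
import Data.Unique (Unique, hashUnique, newUnique)
import System.IO.Unsafe (unsafePerformIO)

propriocept :: (Unique -> a) -> a
propriocept f = unsafePerformIO (fmap f newUnique)
{-# NOINLINE propriocept #-}

-- A list cell that carries a payload together with its own label.
consWith :: a -> [(Int, a)] -> [(Int, a)]
consWith x rest = propriocept (\u -> (hashUnique u, x) : rest)

-- Like the first version of l: the recursion goes through a fresh
-- function call, so every unfolding allocates a new, distinct cell.
viaCall :: a -> [(Int, a)]
viaCall x = consWith x (viaCall x)

-- Like the fixed version of l: the local name xs refers to the very
-- value being defined, so there is exactly one cell pointing to itself.
viaKnot :: a -> [(Int, a)]
viaKnot x = xs
  where xs = consWith x xs
```

<p>Here <code>map fst (take 5 (viaKnot 'x'))</code> repeats a single label, whereas <code>map fst (take 5 (viaCall 'x'))</code> produces five different ones, which is exactly why the <code>memoise</code> label is never seen a second time in the unfixed parser.</p>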
<h3 id="going-meta">Going meta</h3>
<p>The main demonstration is over, but since we already have a parser for grammar descriptions at hand, let’s go a bit further and <em>dynamically</em> construct a parser from such a description. The parser should only accept strings according to that grammar, and return a parse tree annotated with the non-terminals used:</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"/><span class="ot">interp ::</span> <span class="dt">BNF</span> <span class="ot">-></span> <span class="dt">P</span> <span class="dt">String</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"/>interp bnf <span class="ot">=</span> parsers <span class="op">M.!</span> start</span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"/> <span class="kw">where</span></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"/><span class="ot"> start ::</span> <span class="dt">Ident</span></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"/> start <span class="ot">=</span> <span class="fu">fst</span> (<span class="fu">head</span> bnf)</span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"/></span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"/><span class="ot"> parsers ::</span> <span class="dt">M.Map</span> <span class="dt">Ident</span> (<span class="dt">P</span> <span class="dt">String</span>)</span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"/> parsers <span class="ot">=</span> M.fromList [ (i, parseRule i rhs) <span class="op">|</span> (i, rhs) <span class="ot">&lt;-</span> bnf ]</span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"/></span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"/><span class="ot"> parseRule ::</span> <span class="dt">Ident</span> <span class="ot">-></span> <span class="dt">RuleRhs</span> <span class="ot">-></span> <span class="dt">P</span> <span class="dt">String</span></span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"/> parseRule ident rhs <span class="ot">=</span> trace <span class="op">&lt;$></span> asum (<span class="fu">map</span> parseSeq rhs)</span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"/> <span class="kw">where</span> trace s <span class="ot">=</span> ident <span class="op">++</span> <span class="st">"("</span> <span class="op">++</span> s <span class="op">++</span> <span class="st">")"</span></span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"/></span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"/><span class="ot"> parseSeq ::</span> <span class="dt">Seq</span> <span class="ot">-></span> <span class="dt">P</span> <span class="dt">String</span></span>
<span id="cb23-15"><a href="#cb23-15" aria-hidden="true" tabindex="-1"/> parseSeq <span class="ot">=</span> <span class="fu">fmap</span> <span class="fu">concat</span> <span class="op">.</span> <span class="fu">traverse</span> parseAtom</span>
<span id="cb23-16"><a href="#cb23-16" aria-hidden="true" tabindex="-1"/></span>
<span id="cb23-17"><a href="#cb23-17" aria-hidden="true" tabindex="-1"/><span class="ot"> parseAtom ::</span> <span class="dt">Atom</span> <span class="ot">-></span> <span class="dt">P</span> <span class="dt">String</span></span>
<span id="cb23-18"><a href="#cb23-18" aria-hidden="true" tabindex="-1"/> parseAtom (<span class="dt">Lit</span> s) <span class="ot">=</span> <span class="fu">traverse</span> tok s</span>
<span id="cb23-19"><a href="#cb23-19" aria-hidden="true" tabindex="-1"/> parseAtom (<span class="dt">NonTerm</span> i) <span class="ot">=</span> parsers <span class="op">M.!</span> i</span></code></pre></div>
<p>Let’s see it in action (full code in <a href="https://github.com/nomeata/left-rec-parse/blob/master/BNFEx.hs"><code>BNFEx.hs</code></a>):</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="dt">Just</span> bnfp <span class="ot">=</span> parse bnf numExp</span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> <span class="op">:</span>t bnfp</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"/><span class="ot">bnfp ::</span> <span class="dt">BNF</span></span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse (interp bnfp) <span class="st">"12+3*4"</span></span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"/><span class="dt">Just</span> <span class="st">"term(sum(prod(atom(num(pnum(pnum(pdigit(1))digit(pdigit(2))))))+sum(prod(atom(num(pnum(pdigit(3))))*prod(atom(num(pnum(pdigit(4)))))))))"</span></span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"/>ghci<span class="op">></span> parse (interp bnfp) <span class="st">"12+3*4+"</span></span>
<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"/><span class="dt">Nothing</span></span></code></pre></div>
<p>It is worth noting that the <code>numExp</code> grammar is also left-recursive, so implementing <code>interp</code> with a conventional parser combinator library would not have worked. But thanks to our <code>propriocept</code> trick, it does! Again, the sharing is important; in the code above it is the map <code>parsers</code> that is defined in terms of itself, and will ensure that the left-recursive productions will work.</p>
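<p>The pattern of a map that is defined in terms of itself can be looked at in isolation. The sketch below is not part of the parser code; <code>chainLengths</code> and <code>example</code> are hypothetical names. It relies on <code>Data.Map</code> being lazy in its values, so every entry may look up other entries of the same map. Unlike the parsers above, an <code>Int</code> has to be fully computed, so this toy only terminates if the references are acyclic.</p>

```haskell
import qualified Data.Map as M

type Name = String

-- For every name, the length of the longest chain of references that
-- starts there. The map `table` is used in its own definition; because
-- the values are lazy, lookups are resolved on demand.
chainLengths :: [(Name, [Name])] -> M.Map Name Int
chainLengths rules = table
  where
    table = M.fromList (map (\(n, refs) -> (n, chain refs)) rules)
    chain refs = 1 + maximum (0 : map (table M.!) refs)

-- A small acyclic reference structure, loosely modelled on the
-- non-terminals of the grammar above.
example :: [(Name, [Name])]
example =
  [ ("term", ["sum"])
  , ("sum",  ["prod"])
  , ("prod", ["atom"])
  , ("atom", [])
  ]
```

<p>For instance, <code>chainLengths example M.! "term"</code> evaluates to <code>4</code>. In <code>interp</code> the same construction ensures that every occurrence of a non-terminal refers to the one shared parser stored in the map.</p>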
<h3 id="closing-thoughts">Closing thoughts</h3>
<p>I am using <code>unsafePerformIO</code>, so I need to justify its use. Clearly, <code>propriocept</code> is <em>not</em> a pure function, and its type is a lie. In general, using it will break the nice equational properties of Haskell, as we have seen in our experiments with <code>cons</code>.</p>
<p>In the case of our parser library, however, we use it in specific ways, namely to feed a fresh name to <code>memoise</code>. Assuming the underlying parser library’s behavior does not observably depend on where and with which key <code>memoise</code> is used, this results in a properly pure interface, and all is well again. (NB: I did not investigate if this assumption actually holds for the parser library used here, or if it may for example affect the order of parse trees returned.)</p>
<p>I also expect that this implementation, which will memoise <em>every</em> parser involved, will be rather slow. It seems plausible to analyze the graph structure and figure out which <code>memoise</code> calls are actually needed to break left-recursion (at least if we drop the <code>Monad</code> instance or always memoise <code>>>=</code>).</p>
<p>If you liked this post, you might enjoy reading <a href="https://doi.org/10.1145/3607853">the paper about rec-def</a> or watching one of my talks about it (MuniHac, BOBKonf, ICFP23; the presentation evolved over time). Or, if you just want to see more about how things are laid out on Haskell’s heap, go to <a href="https://www.youtube.com/playlist?list=PL4FcLyLhO9jggmkqJyJ2i9pCSiDpwKiVu">my screencasts exploring the Haskell heap</a>.</p></description>
<pubDate>Sun, 10 Sep 2023 17:16:08 -0700</pubDate>
</item>
<item>
<title>Generating bibtex bibliographies from DOIs via DBLP</title>
<link>https://www.joachim-breitner.de/blog/806-Generating_bibtex_bibliographies_from_DOIs_via_DBLP</link>
<guid>https://www.joachim-breitner.de/blog/806-Generating_bibtex_bibliographies_from_DOIs_via_DBLP</guid>
<comments>https://www.joachim-breitner.de/blog/806-Generating_bibtex_bibliographies_from_DOIs_via_DBLP#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>I sometimes <a href="https://www.joachim-breitner.de/publications">write papers</a>, and part of paper writing is assembling the bibliography. In my case, this is done using BibTeX. So when I need to add another citation, I have to find suitable data in BibTeX format.</p>
<p>Often I copy snippets from the <code>.bib</code> files of earlier papers.</p>
<p>Or I search for the paper on <a href="https://dblp.org/">DBLP</a>, which in my experience has the highest-quality BibTeX entries and the best coverage of computer science publications, copy the entry to my <code>.bib</code> file, and change the key to whatever I want to refer to the paper by.</p>
<p>But in the days of pervasive use of DOIs (digital object identifiers) for almost all publications, manually keeping the data in BibTeX files seems outdated. Instead I’d rather just write down the two pieces of data I care about: the <em>key</em> that I want to use for citation, and the <em>doi</em>. The rest I do not want to be bothered with.</p>
<p>So I wrote a small script that takes a <code>.yaml</code> file like</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode yaml"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"/><span class="fu">entries</span><span class="kw">:</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">unsafePerformIO</span><span class="kw">:</span><span class="at"> 10.1007/10722298_3</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">dejafu</span><span class="kw">:</span><span class="at"> 10.1145/2804302.2804306</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">runST</span><span class="kw">:</span><span class="at"> 10.1145/3158152</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">quickcheck</span><span class="kw">:</span><span class="at"> 10.1145/351240.351266</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">optimiser</span><span class="kw">:</span><span class="at"> 10.1016/S0167-6423(97)00029-4</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">sabry</span><span class="kw">:</span><span class="at"> 10.1017/s0956796897002943</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">concurrent</span><span class="kw">:</span><span class="at"> 10.1145/237721.237794</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">launchbury</span><span class="kw">:</span><span class="at"> 10.1145/158511.158618</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">datafun</span><span class="kw">:</span><span class="at"> 10.1145/2951913.2951948</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">observable-sharing</span><span class="kw">:</span><span class="at"> 10.1007/3-540-46674-6_7</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">kildall-73</span><span class="kw">:</span><span class="at"> 10.1145/512927.512945</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">kam-ullman-76</span><span class="kw">:</span><span class="at"> 10.1145/321921.321938</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">spygame</span><span class="kw">:</span><span class="at"> 10.1145/3371101</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">cocaml</span><span class="kw">:</span><span class="at"> 10.3233/FI-2017-1473</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">secrets</span><span class="kw">:</span><span class="at"> 10.1017/S0956796802004331</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">modular</span><span class="kw">:</span><span class="at"> 10.1017/S0956796817000016</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">longley</span><span class="kw">:</span><span class="at"> 10.1145/317636.317775</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">nievergelt</span><span class="kw">:</span><span class="at"> 10.1145/800152.804906</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">runST2</span><span class="kw">:</span><span class="at"> 10.1145/3527326</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">polakow</span><span class="kw">:</span><span class="at"> 10.1145/2804302.2804309</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">lvars</span><span class="kw">:</span><span class="at"> 10.1145/2502323.2502326</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">typesafe-sharing</span><span class="kw">:</span><span class="at"> 10.1145/1596638.1596653</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">pure-functional</span><span class="kw">:</span><span class="at"> 10.1007/978-3-642-14162-1_17</span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="fu">clairvoyant</span><span class="kw">:</span><span class="at"> 10.1145/3341718</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"/><span class="fu">subs</span><span class="kw">:</span></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"/><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">replace</span><span class="kw">:</span><span class="at"> Peyton Jones</span></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"/><span class="at">   </span><span class="fu">with</span><span class="kw">:</span><span class="at"> </span><span class="st">'{Peyton Jones}'</span></span></code></pre></div>
<p>and turns it into a nice <code>.bib</code> file:</p>
<pre class="shell"><code>$ ./doi2bib.py &lt; doibib.yaml > dblp.bib
$ head dblp.bib
@inproceedings{unsafePerformIO,
author = {Simon L. {Peyton Jones} and
Simon Marlow and
Conal Elliott},
editor = {Pieter W. M. Koopman and
Chris Clack},
title = {Stretching the Storage Manager: Weak Pointers and Stable Names in
Haskell},
booktitle = {Implementation of Functional Languages, 11th International Workshop,
IFL'99, Lochem, The Netherlands, September 7-10, 1999, Selected Papers},</code></pre>
<p>The last bit of the YAML file (the <code>subs</code> section) allows me to do some fine-tuning of the output, because unfortunately, not even DBLP BibTeX files are perfect, for example in the presence of two family names.</p>
<p>Now I have fewer moving parts to worry about, and a more consistent bibliography.</p>
<p>The script is rather small, so I’ll just share it here:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"/><span class="co">#!/usr/bin/env python3</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"/></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"/><span class="im">import</span> sys</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"/><span class="im">import</span> yaml</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"/><span class="im">import</span> requests</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"/><span class="im">import</span> requests_cache</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"/><span class="im">import</span> re</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"/></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"/>requests_cache.install_cache(backend<span class="op">=</span><span class="st">'sqlite'</span>)</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"/></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"/>data <span class="op">=</span> yaml.safe_load(sys.stdin)</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"/></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"/><span class="cf">for</span> key, doi <span class="kw">in</span> data[<span class="st">'entries'</span>].items():</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"/>    bib <span class="op">=</span> requests.get(<span class="ss">f"https://dblp.org/doi/</span><span class="sc">{</span>doi<span class="sc">}</span><span class="ss">.bib"</span>).text</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"/>    bib <span class="op">=</span> re.sub(<span class="st">'{DBLP.*,'</span>, <span class="st">'{'</span> <span class="op">+</span> key <span class="op">+</span> <span class="st">','</span>, bib)</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"/>    <span class="cf">for</span> subs <span class="kw">in</span> data[<span class="st">'subs'</span>]:</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"/>        bib <span class="op">=</span> re.sub(subs[<span class="st">'replace'</span>], subs[<span class="st">'with'</span>], bib)</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"/>    <span class="bu">print</span>(bib)</span></code></pre></div>
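<p>The two rewriting steps inside the loop (replacing the DBLP citation key, then applying the <code>subs</code> entries) can be tried without network access. The following is a minimal sketch under a few assumptions, not the script itself: the function name <code>rewrite_entry</code> and the sample entry with its key <code>DBLP:conf/example/Key99</code> are made up for illustration, the key pattern stops at the first comma and is applied only once, and substitutions are treated as literal strings rather than regular expressions.</p>
<pre class="python"><code>import re


def rewrite_entry(bib, key, subs):
    """Return bib with the DBLP key replaced by key and all subs applied."""
    # Rewrite only the first "{DBLP...," occurrence, i.e. the citation key.
    bib = re.sub(r'\{DBLP[^,]*,', lambda m: '{' + key + ',', bib, count=1)
    for sub in subs:
        # Literal replacement (an assumption; the script above uses re.sub).
        bib = bib.replace(sub['replace'], sub['with'])
    return bib


# Hypothetical entry, shaped like a DBLP record; not real data.
sample = (
    '@inproceedings{DBLP:conf/example/Key99,\n'
    '  author = {Simon L. Peyton Jones and Simon Marlow},\n'
    '  title = {An Example Title}\n'
    '}\n'
)

result = rewrite_entry(
    sample,
    'unsafePerformIO',
    [{'replace': 'Peyton Jones', 'with': '{Peyton Jones}'}],
)
print(result)</code></pre>
<p>Treating the substitutions as literal text means that names containing characters such as <code>.</code> or <code>(</code> need no escaping, at the cost of not being able to use patterns in <code>replace</code>.</p>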
<p>There are similar projects out there, e.g. <a href="https://github.com/cr-marcstevens/dblpbibtex"><code>dblpbibtex</code></a> in C++ and <a href="https://github.com/PJK/dblpbib"><code>dblpbib</code></a> in Ruby. These allow direct use of <code>\cite{DBLP:rec/conf/isit/BreitnerS20}</code> in LaTeX, which is also nice, but for now I like to choose more descriptive citation keys myself.</p></description>
<pubDate>Wed, 12 Jul 2023 14:32:19 +0200</pubDate>
</item>
<item>
<title>ICFP Pearl preprint on rec-def</title>
<link>https://www.joachim-breitner.de/blog/805-ICFP_Pearl_preprint_on_rec-def</link>
<guid>https://www.joachim-breitner.de/blog/805-ICFP_Pearl_preprint_on_rec-def</guid>
<comments>https://www.joachim-breitner.de/blog/805-ICFP_Pearl_preprint_on_rec-def#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>I submitted a Functional Pearl to this <a href="https://icfp23.sigplan.org/">year’s ICFP</a> and it got accepted!</p>
<p>It is about the idea of using Haskell’s inherent ability to define <em>recursive equations</em>, and use them for more than just functions and lazy data structures. I blogged about this before (<a href="https://www.joachim-breitner.de/blog/792-More_recursive_definitions">introducing the idea</a>, <a href="https://www.joachim-breitner.de/blog/793-rec-def__Behind_the_scenes">behind the scenes</a>, applications to <a href="https://www.joachim-breitner.de/blog/794-rec-def__Program_analysis_case_study">program analysis</a>, <a href="https://www.joachim-breitner.de/blog/795-rec-def__Dominators_case_study">graph algorithms</a> and <a href="https://www.joachim-breitner.de/blog/796-rec-def__Minesweeper_case_study">minesweeper</a>), but hopefully the paper brings out the idea even more clearly. The constructive feedback from a few friendly readers (Claudio, Sebastian, and also the anonymous reviewers) certainly improved the paper.</p>
<blockquote>
<h3 id="abstract">Abstract</h3>
<p>Haskell’s laziness allows the programmer to solve some problems naturally and declaratively via recursive equations. Unfortunately, if the input is “too recursive”, these very elegant idioms can fall into the dreaded black hole, and the programmer has to resort to more pedestrian approaches.</p>
<p>It does not have to be that way: We built variants of common pure data structures (Booleans, sets) where recursive definitions are productive. Internally, the infamous unsafePerformIO is at work, but the user only sees a beautiful and pure API, and their pretty recursive idioms – magically – work again.</p>
</blockquote>
<p>If you are curious, please have a look at the <a href="//www.joachim-breitner.de/publications/rec-def-pearl.pdf">current version of the paper</a>. Any feedback is welcome; even more so if it comes before July 11, because then I can include it in the camera-ready version.</p>
<hr/>
<p>There are still some open questions around this work. What bothers me maybe most is the lack of a denotational semantics that unifies the partial order underlying the Haskell fragment, and the partial order underlying the domain of the embedded equations.</p>
<p>The crux of the problem is maybe best captured by this question:</p>
<blockquote>
<p>Imagine an untyped lambda calculus with constructors, lazy evaluation, and an operation <code>rseq</code> that recursively evaluates constructors, but terminates in the presence of cycles. So for example</p>
<pre><code>rseq (let x = 1 :: x in x ) ≡ ()
rseq (let x () = 1 :: x () in x ()) ≡ ⊥</code></pre>
<p>In this language, knot tying is observable. What is the “nicest” denotational semantics?</p>
</blockquote>
<p><strong>Update</strong>: I made some progress via a <a href="https://discourse.haskell.org/t/icfp-pearl-on-rec-def/6626/14?u=nomeata">discussion on the Haskell Discourse</a> and started <a href="https://www.joachim-breitner.de/publications/rec-def-denotational.pdf">some rough notes on a denotational semantics</a>.</p></description>
<pubDate>Thu, 22 Jun 2023 18:21:11 +0200</pubDate>
</item>
<item>
<title>The curious case of the half-half Bitcoin ECDSA nonces</title>
<link>https://www.joachim-breitner.de/blog/804-The_curious_case_of_the_half-half_Bitcoin_ECDSA_nonces</link>
<guid>https://www.joachim-breitner.de/blog/804-The_curious_case_of_the_half-half_Bitcoin_ECDSA_nonces</guid>
<comments>https://www.joachim-breitner.de/blog/804-The_curious_case_of_the_half-half_Bitcoin_ECDSA_nonces#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>This is the week of the Gulaschprogrammiernacht, a yearly Chaos Computer Club event in Karlsruhe, so it was exactly a year ago that I sat in my AirBnB room and went over the slides for my talk <a href="https://media.ccc.de/v/gpn20-66-lattice-attacks-on-ethereum-bitcoin-and-https">“Lattice Attacks on Ethereum, Bitcoin, and HTTPS”</a> that I would give there.</p>
<p>It reports on <a href="https://eprint.iacr.org/2019/023">research</a> done with <a href="https://cseweb.ucsd.edu/~nadiah/">Nadia Heninger</a> while I was in Philadelphia, and I really liked giving that talk: At some point we look at some rather odd signatures we found on the bitcoin blockchain, and part of the signature (the “nonce”) happens to share some bytes with the secret key. A clear case of some buffer overlap in a memory unsafe language, which I, as a fan of languages like Haskell, am very happy to sneer at!</p>
<figure>
<img src="/various/biased-nonces-slide-17.svg" alt="A sneery slide"/>
<figcaption aria-hidden="true">A sneery slide</figcaption>
</figure>
<p>But last year, as I was going over <a href="https://www.joachim-breitner.de/publications/BiasedNonces-GPN20-2022-05-20.pdf">the slides</a>, I looked at the raw data again for some reason, and I found that we had overlooked something: Not only was the upper half of the nonce equal to the lower half of the secret key, but the lower half of the nonce was also equal to the upper half of the message hash!</p>
<p>This now looks much less like an accident to me, and more like an (overly) simple form of deterministic nonce creation… so much for my nice anecdote. (I still used the anecdote in my talk, followed up with an “actually”.)</p>
<p>When I told Nadia about this, she got curious as well, and quickly saw that from a signature with such a nonce, one can rather easily extract the secret key. So together with her student Dylan Rowe, we implemented this analysis and searched the bitcoin blockchain for more instances of such signatures. We did find a few, and were even able to trace them back to a somewhat infamous bitcoin activist going under the pseudonym Amaclin.</p>
<p>This research and sleuthing turned into another paper, <a href="https://eprint.iacr.org/2023/841">“The curious case of the half-half Bitcoin ECDSA nonces”</a>, to be presented at AfricaCrypt 2023. Enjoy!</p></description>
<pubDate>Wed, 07 Jun 2023 08:42:28 +0200</pubDate>
</item>
<item>
<title>Giving back to OPLSS</title>
<link>https://www.joachim-breitner.de/blog/803-Giving_back_to_OPLSS</link>
<guid>https://www.joachim-breitner.de/blog/803-Giving_back_to_OPLSS</guid>
<comments>https://www.joachim-breitner.de/blog/803-Giving_back_to_OPLSS#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>Nine years ago, when I was a PhD student, I attended the <a href="https://www.cs.uoregon.edu/research/summerschool/">Oregon Programming Language Summer School</a> in Eugene. I had a great time and learned a lot.</p>
<figure>
<img src="/various/OPLSS14Photo-small.jpg" alt="The OPLSS’14 full image"/>
<figcaption aria-hidden="true">The OPLSS’14 <a href="/various/OPLSS14Photo.jpg">full image</a></figcaption>
</figure>
<p>Learning some of the things I learned there, and meeting some of the people I met there, also led to me graduating, which led to me becoming <a href="https://www.cis.upenn.edu/~plclub/">a PostDoc at UPenn</a>, which led to me later joining DFINITY to implement the <a href="https://github.com/dfinity/motoko/">Motoko programming language</a> and help design and specify the public interface of their “Internet Computer”, including the <a href="https://medium.com/dfinity/how-internet-computer-responses-are-certified-as-authentic-2ff1bb1ea659">response certification</a> (<a href="https://www.youtube.com/watch?v=mZbFhRIHIiY">video</a>).</p>
<p>So when the <a href="https://icdevs.org/">ICDevs</a> non-profit offered a <a href="https://icdevs.org/bounties/2023/01/09/36-Signing-Tree-and-DER-Encoding.html">development bounty for a Motoko library implementing the merkle trees involved in certification</a>, this sounded like a fun little coding task, so I completed it; likely with less effort than it would have taken someone who first had to get into these topics.</p>
<p>The bounty was quite generous, at US$ 10k, and I was too vain to “just” have it donated to some large charity, as I <a href="//www.joachim-breitner.de/blog/798-Pro-charity_consulting">recently did with a few coding and consulting gigs</a>, and looked for something more personal. So, the ICDevs guys and I agreed to donate the money to <a href="https://www.cs.uoregon.edu/research/summerschool/summer23/">this year’s OPLSS</a>, where I heard it can cover the cost of about 8 students, and hopefully helps the PL cause.</p>
<p>(You will not find us listed as sponsors because for some reason, a “donation” instead of “sponsorship” comes with fewer strings attached for the organizers.)</p></description>
<pubDate>Sun, 04 Jun 2023 14:18:33 +0200</pubDate>
</item>
<item>
<title>More thoughts on a bootstrappable GHC</title>
<link>https://www.joachim-breitner.de/blog/802-More_thoughts_on_a_bootstrappable_GHC</link>
<guid>https://www.joachim-breitner.de/blog/802-More_thoughts_on_a_bootstrappable_GHC</guid>
<comments>https://www.joachim-breitner.de/blog/802-More_thoughts_on_a_bootstrappable_GHC#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>The <a href="https://bootstrappable.org/">bootstrappable builds project</a> tries to find ways of building all our software from source, without relying on binary artifacts. A noble goal, and one that is often thwarted by languages with self-hosting compilers, like GHC: In order to build GHC, you need GHC. A <a href="https://github.com/NixOS/nixpkgs/pull/227914">Pull Request against nixpkgs</a>, adding first steps of the bootstrapping pipeline, reminded me of the issue with GHC, which I have noted down <a href="https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC">some thoughts about</a> before and I played around a bit more.</p>
<p>The most promising attempt to bootstrap GHC was done by <a href="https://elephly.net/posts/2017-01-09-bootstrapping-haskell-part-1.html">rekado in 2017</a>. He observed that Hugs is maybe the most recently maintained bootstrappable (since written in C) Haskell compiler, but noticed that “it cannot deal with mutually recursive module dependencies, which is a feature that even the earliest versions of GHC rely on. This means that running a variant of GHC inside of Hugs is not going to work without major changes.” He then tries to bootstrap another very old Haskell compiler (nhc) with Hugs, and makes good but incomplete progress.</p>
<p>This made me wonder: What <em>if</em> Hugs supported mutually recursive modules? Would that make a big difference? Anthony Clayden <a href="https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC#comment_3">keeps advocating Hugs as a viable Haskell implementation</a>, so maybe if that was the main blocker, then adding support to Hugs for that is probably not too hard (at least in a compile-the-strongly-connected-component-as-one-unit mode) and worthwhile?</p>
<h3 id="all-of-ghc-in-one-file">All of GHC in one file?</h3>
<p>That reminded me of a situation I was in before, where I had to combine multiple Haskell modules into one: For my talk <a href="https://www.youtube.com/watch?v=2kKvVe673MA">“Lock-step simulation is child’s play”</a> I wrote a multi-player game, a simulation environment for it, and a presentation tool around it, all in the <a href="https://code.world">CodeWorld</a> programming environment, which supports only a single module. So I hacked together a small tool <a href="https://github.com/nomeata/hs-all-in-one/">hs-all-in-one</a> that takes multiple Haskell modules and combines them into one, mangling the names to avoid name clashes.</p>
<p>This made me wonder: Can I turn <em>all</em> of GHC into one module, and compile that?</p>
<p>At this point I have probably left the direct path towards bootstrapping, but I kinda got hooked.</p>
<ol type="1">
<li><p>Using GHC’s <code>hadrian/ghci</code> tool, I got it to produce the necessary generated files (e.g. from <code>happy</code> grammars) and spit out the list of modules that make up GHC, which I could feed to <code>hs-all-in-one</code>.</p></li>
<li><p>It uses <a href="https://hackage.haskell.org/package/haskell-src-exts"><code>haskell-src-exts</code></a> for parsing, and it was almost able to parse all of that. It has a different opinion about how <a href="https://github.com/haskell-suite/haskell-src-exts/issues/468"><code>MultiWayIf</code> should be indented</a>, whether <a href="https://github.com/haskell-suite/haskell-src-exts/issues/469"><code>EmptyCase</code> needs <code>{}</code></a> and <a href="https://github.com/haskell-suite/haskell-src-exts/issues/470">issues</a> <a href="https://github.com/haskell-suite/haskell-src-exts/issues/471">pretty-printing</a> some promoted values, but otherwise the round-tripping worked fine, and I was able to generate a large file (680,000 loc, 41 MB) that passes GHC’s parser.</p></li>
<li><p>It also uses <a href="https://hackage.haskell.org/package/haskell-names"><code>haskell-names</code></a> to resolve names.</p>
<p>This library is less up-to-date with various Haskell features, so I added support for renaming in some pragmas (<code>ANN</code>, <code>SPECIALIZE</code>), pattern signatures etc.</p>
<p>For my previous use-case I could just combine all the imports, knowing that I would not introduce conflicts. For GHC, this is far from true: Some modules import <code>Data.Map.Strict</code>, others <code>Data.Map.Lazy</code>, and yet others introduce names that clash with stuff imported from the prelude… so I had to change the tool to fully qualify all imported values. This isn’t so bad, I can do that using <code>haskell-names</code>, if I somehow know what all the modules in <code>base</code>, <code>containers</code>, <code>transformers</code> and <code>array</code> export.</p>
<p>The <code>haskell-names</code> library itself comes with a hard-coded database of <code>base</code> exports, but it is incomplete and doesn’t help me with, say, <code>containers</code>.</p>
<p>I then wrote a little parser for the <code>.txt</code> files that <code>haddock</code> produces for the benefit of <code>hoogle</code>, and that are conveniently installed alongside the packages (at least on nix). This would have been great, if these files didn’t simply omit all <em>reexported</em> entities! I added some manual hacks (<code>Data.Map</code> has the same exports as <code>Data.IntMap</code>; <code>Prelude</code> exports all entities as known by <code>haskell-names</code>, but those that are also exported from <code>Data.List</code>, use the symbol from there…)</p>
<p>I played this game of whack-a-mole for a while, solving many of the problems that GHC’s renamer reports, but eventually stopped to write this blog post. I am fairly confident that this could be pulled through, though.</p></li>
</ol>
<h3 id="back-to-bootstrapping">Back to bootstrapping</h3>
<p>So what if we could pull this through? We’d have a very large code file that GHC may be able to compile to produce a <code>ghc</code> binary without exhausting my RAM. But that doesn’t help with bootstrapping yet.</p>
<p>If lack of support for recursive modules were all that Hugs is missing, we’d be done indeed. But quite the contrary, it is probably the least of our worries, given that contemporary GHC uses many, many other features not supported by Hugs.</p>
<p>Some of them are syntactic and can easily be rewritten to more normal Haskell in a preprocessing step (e.g. <code>MultiWayIf</code>).</p>
<p>Others are deep and hard (<code>GADTs</code>, Pattern synonyms, Type Families), and prohibit attempting to compile a current version of GHC (even if it’s all one module) with Hugs. So one would certainly have to go back in time and find a version of GHC that is not yet using all these features. For example, the <a href="https://gitlab.haskell.org/ghc/ghc/-/commit/889c084e943779e76d19f2ef5e970ff655f511eb">first use of GADTs</a> was introduced by Simon Marlow in 2011, so this suggests going back to at least GHC 7.0.4, maybe earlier.</p>
<p>Still, being able to mangle the source code before passing it to Hugs is probably a useful thing. This poses the question whether Hugs can compile such a tool; in particular, is it capable of compiling <code>haskell-src-exts</code>, which I am not too optimistic about either. Did someone check this already?</p>
<p>So one plan of attack could be</p>
<ol type="1">
<li><p>Identify an old version of GHC that</p>
<ul>
<li>One <em>can</em> use to bootstrap subsequent versions until today.</li>
<li>Is old enough to use as few features not supported by Hugs as possible.</li>
<li>Is still new enough so that one can obtain a compatible toolchain.</li>
</ul></li>
<li><p>Wrangle the build system to tell you which files to compile, with which preprocessor flags etc.</p></li>
<li><p>Bootstrap all pre-processing tools used by GHC (<code>cpphs</code> or use plain cpp, <code>happy</code>, <code>alex</code>).</p></li>
<li><p>For every language feature not supported by Hugs, either</p>
<ul>
<li>Implement it in Hugs,</li>
<li>Manually edit the source code to avoid compiling the problematic code, if it is optional (e.g. in an optimization pass)</li>
<li>Rewrite the problematic code</li>
<li>Write a pre-processing tool (like the one above) that compiles the feature away</li>
</ul></li>
<li><p>Similarly, since Hugs probably ships a <code>base</code> that is different from what GHC, or the libraries used by GHC, expect, either adjust Hugs’ <code>base</code>, or modify the GHC code that uses it.</p></li>
</ol>
<p>My actual plan, though, for now is to throw these thoughts out, maybe make some noise on <a href="https://discourse.haskell.org/t/what-s-needed-to-bootstrap-ghc-with-hugs/6205">Discourse</a>, <a href="https://mastodon.online/@nomeata/110263917613134533">Mastodon</a>, <a href="https://twitter.com/nomeata/status/1651125309700206593">Twitter</a> and <a href="https://lobste.rs/s/di37ga">lobste.rs</a>, and then let it sit and hope someone else will pick it up.</p></description>
<pubDate>Wed, 26 Apr 2023 07:41:50 +0200</pubDate>
</item>
<item>
<title>rclone, WebDav, Mailbox.org</title>
<link>https://www.joachim-breitner.de/blog/801-rclone%2C_WebDav%2C_Mailbox_org</link>
<guid>https://www.joachim-breitner.de/blog/801-rclone%2C_WebDav%2C_Mailbox_org</guid>
<comments>https://www.joachim-breitner.de/blog/801-rclone%2C_WebDav%2C_Mailbox_org#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>Just a small googleable note for those trying to set this up too:</p>
<p>If you try to access your “My Files” <a href="https://mailbox.org/">mailbox.org</a> WebDav storage using <a href="https://rclone.org/webdav/"><code>rclone</code></a>, then the URL is not just <code>https://dav.mailbox.org/servlet/webdav.infostore/</code>, as written <a href="https://kb.mailbox.org/de/privat/datei-cloud-mailbox-org-drive/webdav-unter-linux">on the mailbox.org documentation</a>, but actually <code>https://dav.mailbox.org/servlet/webdav.infostore/Userstore/Your Name</code>.</p>
<p>(You can start with <code>https://dav.mailbox.org/servlet/webdav.infostore/</code> and then use <code>rclone ls</code> to find out the full path, but then it may be more convenient to change it to point directly to the “My Files” space where you can actually create files.)</p></description>
<pubDate>Wed, 19 Apr 2023 09:39:06 +0200</pubDate>
</item>
<item>
<title>Farewell quimby and fry, welcome richard</title>
<link>https://www.joachim-breitner.de/blog/800-Farewell_quimby_and_fry%2C_welcome_richard</link>
<guid>https://www.joachim-breitner.de/blog/800-Farewell_quimby_and_fry%2C_welcome_richard</guid>
<comments>https://www.joachim-breitner.de/blog/800-Farewell_quimby_and_fry%2C_welcome_richard#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><p>For a little more than two decades, I have been running one or two dedicated servers for a fair number of services. At one time or another, they were hosting</p>
<ul>
<li>A web server for my personal website</li>
<li>A web server for various private and professional webpages for friends and family</li>
<li>An email server with IMAP, SMTP, Spam filtering, for me and family</li>
<li>A mailing list server for various free software projects</li>
<li>A DNS server</li>
<li>A Half-Life and Counter Strike server, with a statistics web page</li>
<li>The web page for <a href="https://www.netz-zwerge.de/">my Counter Strike clan</a>, running on a custom Perl-and-Mysql based CMS</li>
<li>The web page for my high school class, running on the same system (this was before everyone used, or had used, Facebook, and it even supported tagging people on images…)</li>
<li>A <a href="https://zpub.de/de/">Docbook-and-SVN-based documentation management system</a> that my brother and I built and even sold to a few customers.</li>
<li>A custom SVN-, Perl and Template-Toolkit based static site generating CMS, before that was cool, also with one or two actual customers.</li>
<li>A SVN- and later Git based <a href="https://mitschriebwiki.nomeata.de/">site for collaborative editing of math and computer science lecture notes</a>; back then there was no Overleaf</li>
<li>A Jabber server</li>
<li>The backend for an <a href="https://sumserum.nomeata.de/">online adaptation of the board game “Sim Serim”</a> which got <a href="https://www.joachim-breitner.de/blog/673-Sim_Serim_geschenkt_bekommen">the author to gift</a> me the real thing</li>
<li>An SVN server</li>
<li>A darcs server</li>
<li>A git server</li>
<li>A tool to track darcs patches that were submitted by mailing lists</li>
<li>A blog aggregator (a “planet”) for friends, and one for Ghana’s Free Software community</li>
<li>An Owncloud instance for family</li>
<li>Virtual machines maintained by friends, to share resources more cheaply</li>
<li>An OpenVPN and later tinc based VPN for my machines</li>
<li>Jobs that deliver RSS feeds to IMAP (using <a href="https://github.com/feed2imap/feed2imap">feed2imap</a> and later <a href="https://github.com/Necoro/feed2imap-go">feed2imap-go</a>)</li>
<li>Jobs that send a friend of mine email greetings that I collected from his wedding guests, and that are now delivered at a decreasing rate over the next decades.</li>
<li>Online questionnaires to gather data for a <a href="https://www.joachim-breitner.de/blog/492-401_page_family_book_published">genealogy project</a></li>
<li>Jobs that send an email with a daily summary of family events from that date.</li>
<li>A Django app to organize a larger family gathering</li>
<li>Multiple Mediawiki instances</li>
<li>A <a href="https://freenetproject.org/">freenet node</a> and a <a href="https://www.torproject.org/">tor node</a></li>
<li>Code that demonstrated the <a href="https://www.joachim-breitner.de/blog/56-Like_XSS,_just_simpler_and_harder_to_prevent__The_Cross_Site_Auth_(XSA)_Attack">Cross-site authentication attack</a></li>
<li>… and probably more stuff that I don’t remember anymore</li>
</ul>
<h3 id="its-not-you-its-me">It’s not you, it’s me</h3>
<p>Running this kind of service on my own was great fun and a huge learning experience, and in some cases at that time the only way to have a certain feature. But in recent years my interests shifted a bit, more into Programming Languages Theory (and practice) than Devops, and I was no longer paying attention as much as these services require. This gave me a bad conscience, especially in terms of security updates.</p>
<p>Running your own email server in particular, while still possible, isn’t fire-and-forget: one has to stay on top of spam protection measures, both on the receiving end (spamassassin etc.) and when sending email (DKIM, configuring mailing lists to rewrite the sender etc.).</p>
<p>Also some family members were commercially relying on these servers and services working, which was no longer tenable.</p>
<h3 id="weaning-off">Weaning off</h3>
<p>Therefore, more than a year ago, I decided to wind this down. Turns out that getting rid of responsibilities takes at least as long as taking them on, especially if your “customers” are content and a move to something else is but an annoyance. But last weekend I was finally able to turn the machines, called quimby and fry, off.</p>
<p>Many of the services above were already no longer active when I started the weaning off (Jabber, Freenet, Tor). So what happened to the rest?</p>
<ul>
<li>For emails, we all moved to <a href="https://mailbox.org" class="uri">https://mailbox.org</a>. Happy to pay for such a crucial service.</li>
<li>For the mailing lists, especially for the <a href="https://tttool.entropia.de/">Tip-Toi-Hacking project</a>, <a href="https://jpberlin.de/" class="uri">https://jpberlin.de/</a> has decent enough rates that I don’t feel bad for paying for it.</li>
<li>Managing DNS is made super easy by <a href="https://github.com/StackExchange/dnscontrol">dnscontrol</a>; I’m using Hetzner’s DNS servers, but thanks to that tool that does not matter a lot</li>
<li>For those websites that are static enough, GitHub Pages is nice. This includes the lecture notes site.</li>
<li>For those websites that need a little bit of server-side logic, e.g. for complex redirects and normalization procedures, or access control, but no state, I found that netlify introduced their <a href="https://docs.netlify.com/edge-functions/overview/">Edge functions</a> feature just in time. This was crucial for my main website.</li>
<li>I got rid of the backend of the two-player game <a href="https://sumserum.nomeata.de/">Sum Serum</a> completely, by using <a href="https://www.joachim-breitner.de/blog/797-Serverless_WebRTC_instead_of_websocket">WebRTC</a>. In terms of hosting, it is now just a boring static website, which is the best kind of website.</li>
<li>I converted all SVN and Darcs repositories to git, and pushed them to GitHub.</li>
<li>Although I was mildly proud to have kept the websites of my high school class and Counter Strike clan live and functional for many years after anyone stopped caring about them, I decided it is silly to keep doing that. I briefly thought about entombing that running system in a locked down virtual machine or so, but in the end simply used <code>wget</code> to create a static mirror of them, which is now hosted on netlify (using Edge Functions to restrict public access to a few pages).</li>
</ul>
<p>In the end, I was not able to get rid of all services, so there is still a server running for me (smaller now, and virtual):</p>
<ul>
<li>My photo album, <a href="https://bilder.joachim-breitner.de/" class="uri">https://bilder.joachim-breitner.de/</a>, which is a bit too big for something like netlify.</li>
<li>Some dumb static webspace used by a family member’s business for internal reasons, which likewise is a bit too large for something like netlify or GitHub Pages, and not as critical as other sites.</li>
<li>The feed2imap jobs</li>
<li>For one Mediawiki instance, used for a genealogy project of a relative of mine, I could not find a suitable hosted alternative. However, it is HTTP-AUTH-password-protected, so I am a bit less worried about staying on top of security updates for this PHP-driven site.</li>
<li>Also, I am still running the daily email job from the genealogy project, but now through <code>nullmailer</code> and <a href="https://aws.amazon.com/de/sns/">Amazon SNS</a>, to worry a bit less about the intricacies of modern email.</li>
</ul>
<h3 id="debian-nix">Debian → Nix</h3>
<p>I took this opportunity to set up a new server for the residual services. I have been using Debian since 2001 and was a Debian Developer from 2003 to 2022, and it is a great operating system and a great project.</p>
<p>But after learning about <a href="https://nixos.org/">Nix</a> at <a href="https://dfinity.org/">DFINITY</a>, and using NixOS on my laptop for almost two years, managing a system in a non-declarative way simply feels … wrong. Similar to programming in a non-functional programming language.</p>
<p>Hence I have shut down my Debian-based systems (two virtual machines called <code>quimby</code> and <code>fry</code> and the surrounding Dom0 <code>freddy</code> – my machines are named after Simpsons sidekicks and (sometimes) Futurama characters), and am using the NixOS-based aarch64 host <code>richard</code> (because of the “Nix” in the name) instead.</p></description>
<pubDate>Mon, 17 Apr 2023 08:33:24 +0200</pubDate>
</item>
</channel>
</rss>