FEED Validator

for Atom and RSS and KML

Congratulations!

This is a valid RSS feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 2, column 0: Use of unknown namespace: http://joachim-breitner.de/2004/Website/Bibliography [help]
```
<rss xmlns:bib="http://joachim-breitner.de/2004/Website/Bibliography"
```

line 10, column 71: Self reference doesn't match document location [help]

                 href="https://www.joachim-breitner.de/blog_feed.rss"/>
                                                                       ^

line 169, column 346: description should not contain relative URL references: /blog/813-Blogging_on_Lean (275 occurrences) [help]

... that’s called path dependence.&lt;/p&gt;</description>
                                             ^

line 182, column 0: description should not contain aria-hidden attribute (270 occurrences) [help]
```
&lt;figcaption aria-hidden="true"&gt;Passively lit coding&lt;/figcaption&gt;
```
line 356, column 0: Non-html tag: summary [help]
```
&lt;summary&gt;
```

Source: http://www.joachim-breitner.de/blog/feeds/categories/1-English.rss

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:bib="http://joachim-breitner.de/2004/Website/Bibliography"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:atom="http://www.w3.org/2005/Atom"
version="2.0">
<channel>
<title>nomeata’s mind shares</title>
<link>https://www.joachim-breitner.de/blog</link>
<atom:link rel="self" type="application/rss+xml"
href="https://www.joachim-breitner.de/blog_feed.rss"/>
<description>Joachim Breitners Denkblogade</description>
<image>
<url>https://joachim-breitner.de/avatars/avatar_128.png</url>
<title>nomeata’s mind shares</title>
<link>https://www.joachim-breitner.de/blog</link>
<width>128</width>
<height>128</height>
</image>
<item>
<title>Extrinsic termination proofs for well-founded recursion in Lean</title>
<link>https://www.joachim-breitner.de/blog/816-Extrinsic_termination_proofs_for_well-founded_recursion_in_Lean</link>
<guid>https://www.joachim-breitner.de/blog/816-Extrinsic_termination_proofs_for_well-founded_recursion_in_Lean</guid>
<comments>https://www.joachim-breitner.de/blog/816-Extrinsic_termination_proofs_for_well-founded_recursion_in_Lean#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>A few months ago <a href="/blog/813-Blogging_on_Lean">I explained</a> that one reason why this blog has become more quiet is that all my work on Lean is covered elsewhere.
This post is an exception, because it is an observation that is (arguably) interesting, but does not lead anywhere, so where else to put it than my own blog…
Want to share your thoughts about this? Please <a href="https://leanprover.zulipchat.com/#narrow/channel/270676-lean4/topic/Extrinsic.20termination.20proofs.20for.20well-founded.20recursion">join the discussion on the Lean community zulip</a>!
<h3 id="background">Background</h3>
When defining a function recursively in Lean that has nested recursion, e.g. a recursive call that is in the argument to a higher-order function like <code>List.map</code>, then extra attention used to be necessary so that Lean can see that <code>xs.map</code> applies its argument only to elements of the list <code>xs</code>. The usual idiom is to write <code>xs.attach.map</code> instead, where <a href="https://leanprover-community.github.io/mathlib4_docs/Init/Data/List/Attach.html#List.attach"><code>List.attach</code></a> attaches to the list elements a proof that they are in that list. You can read more about this in my <a href="https://lean-lang.org/blog/2024-1-11-recursive-definitions-in-lean/">Lean blog post on recursive definitions</a> and in our <a href="https://lean-lang.org/doc/reference/latest/Definitions/Recursive-Definitions/#recursive-definitions">new shiny reference manual</a>; look for the example “Nested Recursion in Higher-order Functions”.
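To make the idiom concrete, here is a minimal sketch of the manual <code>attach</code> pattern. The names <code>Tree</code> and <code>Tree.map</code> are chosen for illustration only, and the sketch has not been checked against a specific Lean version; it assumes that the default termination tactic can use the membership proof from the context.

```lean
structure Tree (α : Type u) where
  val : α
  cs : List (Tree α)

-- `cs.attach` has type `List {t // t ∈ cs}`, so the lambda receives each
-- child `t` together with a proof `_ht : t ∈ cs`. That proof is what lets
-- the termination argument show that `t` is smaller than the whole tree.
def Tree.map (f : α → β) : Tree α → Tree β
  | ⟨v, cs⟩ => ⟨f v, cs.attach.map (fun ⟨t, _ht⟩ => t.map f)⟩
termination_by t => t
```

With plain <code>cs.map</code> instead of <code>cs.attach.map</code>, no such membership proof would be in scope at the recursive call, which is the gap that the automatic rewriting described next is meant to close.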
To make this step less tedious I taught Lean to automatically rewrite <code>xs.map</code> to <code>xs.attach.map</code> (where suitable) within the construction of well-founded recursion, so that nested recursion just works (<a href="https://github.com/leanprover/lean4/issues/5471">issue #5471</a>). We already do such a rewriting to change <code>if c then … else …</code> to the dependent <code>if h : c then … else …</code>, but the attach-introduction is much more ambitious (the rewrites are not definitionally equal, there are higher-order arguments etc.) Rewriting the terms in a way that we can still prove the connection later when creating the equational lemmas is hairy at best. Also, we want the whole machinery to be extensible by the user, setting up their own higher order functions to add more facts to the context of the termination proof.
I implemented it like this (<a href="https://github.com/leanprover/lean4/pull/6744">PR #6744</a>) and it ships with 4.18.0, but in the course of this work I thought about a quite different and maybe better™ way to do this, and well-founded recursion in general:
<h3 id="a-simpler-fix">A simpler <code>fix</code></h3>
Recall that to use <a href="https://leanprover-community.github.io/mathlib4_docs/Init/WF.html#WellFounded.fix">WellFounded.fix</a>
<pre class="lean"><code>WellFounded.fix : (hwf : WellFounded r) (F : (x : α) → ((y : α) → r y x → C y) → C x) (x : α) : C x</code></pre>
we have to rewrite the functorial of the recursive function, which naturally has type
<pre class="lean"><code>F : ((y : α) → C y) → ((x : α) → C x)</code></pre>
to the one above, where all recursive calls take the termination proof <code>r y x</code>. This is a fairly hairy operation, mangling the type of matcher’s motives and whatnot.
Things are simpler for recursive definitions using <a href="https://lean-lang.org/doc/reference/latest/Recursive-Definitions/Partial-Fixpoint-Recursion/#partial-fixpoint">the new <code>partial_fixpoint</code> machinery</a>, where we use <a href="https://leanprover-community.github.io/mathlib4_docs/Init/Internal/Order/Basic.html#Lean.Order.fix">Lean.Order.fix</a>
<pre class="lean"><code>Lean.Order.fix : [CCPO β] (F : β → β) (hmono : monotone F) : β</code></pre>
so the functorial’s type is unmodified (here <code>β</code> will be <code>((x : α) → C x)</code>), and everything else is in the propositional side-condition <code>monotone F</code>. For this predicate we have a syntax-guided compositional tactic, and it’s easily extensible, e.g. by
<pre class="lean"><code>theorem monotone_mapM (f : γ → α → m β) (xs : List α) (hmono : monotone f) :
monotone (fun x => xs.mapM (f x)) </code></pre>
Once given, we don’t care about the content of that proof. In particular proving the unfolding theorem only deals with the unmodified <code>F</code> that closely matches the function definition as written by the user. Much simpler!
<h3 id="isabelle-has-it-easier">Isabelle has it easier</h3>
<a href="https://isabelle.in.tum.de/">Isabelle</a> also supports well-founded recursion, and has great support for nested recursion. And it’s much simpler!
There, all you have to do to make nested recursion work is to define a congruence lemma; for <code>List.map</code>, this would be something like our <a href="https://leanprover-community.github.io/mathlib4_docs/Init/Data/List/Lemmas.html#List.map_congr_left">List.map_congr_left</a>
<pre class="lean"><code>List.map_congr_left : (h : ∀ a ∈ l, f a = g a) :
List.map f l = List.map g l</code></pre>
This is because in Isabelle, too, the termination proof is a side-condition that essentially states “the functorial <code>F</code> calls its argument <code>f</code> only on smaller arguments”.
<h3 id="can-we-have-it-easy-too">Can we have it easy, too?</h3>
I had wished we could do the same in Lean for a while, but that form of congruence lemma just isn’t strong enough for us.
But maybe there is a way to do it, using an existential to give a witness that <code>F</code> can alternatively be implemented using the more restrictive argument. The following <code>callsOn P F</code> predicate can express that <code>F</code> calls its higher-order argument only on arguments that satisfy the predicate <code>P</code>:
<pre class="lean"><code>section setup
variable {α : Sort u}
variable {β : α → Sort v}
variable {γ : Sort w}
def callsOn (P : α → Prop) (F : (∀ y, β y) → γ) :=
∃ (F': (∀ y, P y → β y) → γ), ∀ f, F' (fun y _ => f y) = F f
variable (R : α → α → Prop)
variable (F : (∀ y, β y) → (∀ x, β x))
local infix:50 " ≺ " => R
def recursesVia : Prop := ∀ x, callsOn (· ≺ x) (fun f => F f x)
noncomputable def fix (wf : WellFounded R) (h : recursesVia R F) : (∀ x, β x) :=
wf.fix (fun x => (h x).choose)
def fix_eq (wf : WellFounded R) h x :
fix R F wf h x = F (fix R F wf h) x := by
unfold fix
rw [wf.fix_eq]
apply (h x).choose_spec</code></pre>
This allows nice compositional lemmas to discharge <code>callsOn</code> predicates:
<pre class="lean"><code>theorem callsOn_base (y : α) (hy : P y) :
callsOn P (fun (f : ∀ x, β x) => f y) := by
exists fun f => f y hy
intros; rfl
@[simp]
theorem callsOn_const (x : γ) :
callsOn P (fun (_ : ∀ x, β x) => x) :=
⟨fun _ => x, fun _ => rfl⟩
theorem callsOn_app
{γ₁ : Sort uu} {γ₂ : Sort ww}
(F₁ : (∀ y, β y) → γ₂ → γ₁) -- can this also support dependent types?
(F₂ : (∀ y, β y) → γ₂)
(h₁ : callsOn P F₁)
(h₂ : callsOn P F₂) :
callsOn P (fun f => F₁ f (F₂ f)) := by
obtain ⟨F₁', h₁⟩ := h₁
obtain ⟨F₂', h₂⟩ := h₂
exists (fun f => F₁' f (F₂' f))
intros; simp_all
theorem callsOn_lam
{γ₁ : Sort uu}
(F : γ₁ → (∀ y, β y) → γ) -- can this also support dependent types?
(h : ∀ x, callsOn P (F x)) :
callsOn P (fun f x => F x f) := by
exists (fun f x => (h x).choose f)
intro f
ext x
apply (h x).choose_spec
theorem callsOn_app2
{γ₁ : Sort uu} {γ₂ : Sort ww}
(g : γ₁ → γ₂ → γ)
(F₁ : (∀ y, β y) → γ₁) -- can this also support dependent types?
(F₂ : (∀ y, β y) → γ₂)
(h₁ : callsOn P F₁)
(h₂ : callsOn P F₂) :
callsOn P (fun f => g (F₁ f) (F₂ f)) := by
apply_rules [callsOn_app, callsOn_const]</code></pre>
With this setup, we can have the following, possibly user-defined, lemma expressing that <code>List.map</code> calls its arguments only on elements of the list:
<pre class="lean"><code>theorem callsOn_map (δ : Type uu) (γ : Type ww)
(P : α → Prop) (F : (∀ y, β y) → δ → γ) (xs : List δ)
(h : ∀ x, x ∈ xs → callsOn P (fun f => F f x)) :
callsOn P (fun f => xs.map (fun x => F f x)) := by
suffices callsOn P (fun f => xs.attach.map (fun ⟨x, h⟩ => F f x)) by
simpa
apply callsOn_app
· apply callsOn_app
· apply callsOn_const
· apply callsOn_lam
intro ⟨x', hx'⟩
dsimp
exact (h x' hx')
· apply callsOn_const
end setup</code></pre>
So here is the (manual) construction of a nested <code>map</code> for trees:
<pre class="lean"><code>section examples
structure Tree (α : Type u) where
val : α
cs : List (Tree α)
-- essentially
-- def Tree.map (f : α → β) : Tree α → Tree β :=
-- fun t => ⟨f t.val, t.cs.map (Tree.map f)⟩
noncomputable def Tree.map (f : α → β) : Tree α → Tree β :=
fix (sizeOf · &lt; sizeOf ·) (fun map t => ⟨f t.val, t.cs.map map⟩)
(InvImage.wf (sizeOf ·) WellFoundedRelation.wf) &lt;| by
intro ⟨v, cs⟩
dsimp only
apply callsOn_app2
· apply callsOn_const
· apply callsOn_map
intro t' ht'
apply callsOn_base
-- ht' : t' ∈ cs -- !
-- ⊢ sizeOf t' &lt; sizeOf { val := v, cs := cs }
decreasing_trivial
end examples</code></pre>
This makes me happy!
All details of the construction are now contained in a proof that can proceed by a syntax-driven tactic and that’s easily (and likely robustly) extensible by the user. It also means that we can share a lot of code paths (e.g. everything related to equational theorems) between well-founded recursion and <code>partial_fixpoint</code>.
I wonder if this construction is really as powerful as our current one, or if there are certain (likely dependently typed) functions where this doesn’t fit, but the <code>β</code> above is dependent, so it looks good.
With this construction, functions defined by well-founded recursion will reduce even worse in the kernel, I assume. <a href="https://github.com/leanprover/lean4/issues/5192">This may be a good thing</a>.
<h3 id="the-cake-is-a-lie">The cake is a lie</h3>
What unfortunately kills this idea, though, is the generation of the <a href="https://lean-lang.org/blog/2024-5-17-functional-induction/">functional induction principles</a>, which I believe is not (easily) possible with this construction: The functional induction principle is proved by massaging <code>F</code> to return a proof, but since the extra assumptions (e.g. for <code>ite</code> or <code>List.map</code>) only exist in the termination proof, they are not available in <code>F</code>.
Oh wey, how anticlimactic.
<h3 id="ps-path-dependencies">PS: Path dependencies</h3>
Curiously, if we didn’t have functional induction at this point yet, then very likely I’d change Lean to use this construction, and then we’d either not get functional induction, or it would be implemented very differently, maybe a more syntactic approach that would re-prove termination. I guess that’s called path dependence.</description>
<pubDate>Mon, 10 Mar 2025 18:47:59 +0100</pubDate>
</item>
<item>
<title>Coding on my eInk Tablet</title>
<link>https://www.joachim-breitner.de/blog/815-Coding_on_my_eInk_Tablet</link>
<guid>https://www.joachim-breitner.de/blog/815-Coding_on_my_eInk_Tablet</guid>
<comments>https://www.joachim-breitner.de/blog/815-Coding_on_my_eInk_Tablet#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>For many years I wished I had a setup that would allow me to work (that is, code) productively outside in the bright sun. It’s winter right now, but when it’s summer again it’s always a bit. This weekend I got closer to that goal.
TL;DR: Using <a href="https://github.com/coder/code-server"><code>code-server</code></a> on a beefy machine seems to be quite neat.
<figure>
<img src="//www.joachim-breitner.de/various/code-server-tablet.jpg" alt="Passively lit coding"/>
<figcaption aria-hidden="true">Passively lit coding</figcaption>
</figure>
<h3 id="personal-history">Personal history</h3>
Looking back at my own old blog entries I find one from 10 years ago describing how I bought a <a href="https://www.joachim-breitner.de/blog/660-Using_my_Kobo_eBook_reader_as_an_external_eInk_monitor">Kobo eBook reader</a> with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used that. I subsequently noticed that the eBook reader is rather useful to read eBooks, and it has been in heavy use for that ever since.
Four years ago I gave this old idea another shot and bought an <a href="https://onyxboox.com/boox_maxlumi">Onyx BOOX Max Lumi</a>. This is an A4-sized tablet running Android and had the very promising feature of an HDMI input. So hopefully I’d attach it to my laptop and it just works™. Turns out that this never worked as well as I hoped: Even if I set the resolution to exactly the tablet’s screen’s resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful to take notes, and it has been in sporadic use for that.
Going off on this tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don’t have to use Boox’s “monitor” app but could use other apps like <a href="https://f-droid.org/de/packages/troop.com.freedcam/">FreeDCam</a> as well. This somehow managed to fix the resolution issues, but the setup still wasn’t convenient enough to be used regularly.
I also played around with pure terminal approaches, e.g. SSH’ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like <a href="https://www.gnu.org/software/screen/"><code>screen</code></a> or <a href="https://github.com/tmux/tmux/wiki"><code>tmux</code></a>) that never led anywhere either.
<h3 id="vscode-working-remotely">VSCode, working remotely</h3>
Since these attempts I have started a <a href="https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO">new job working on the Lean theorem prover</a>, and working on or with Lean basically means using VSCode. (There is a <a href="https://github.com/Julian/lean.nvim/">very good neovim plugin as well</a>, but I’m using VSCode nevertheless, if only to make sure I am dogfooding our default user experience).
My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it’s not a complete game changer for me, it does make certain tasks (rebuilding everything after switching branches, running the test suite) very convenient. And it’s a bit spooky to run these workloads without the laptop’s fan spinning up.
In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficiently on my eInk tablet. Can I replicate this setup there?
VSCode itself doesn’t run on Android directly. There are projects that run it in a Linux chroot or in termux on the Android system, which you then connect to using VNC (e.g. on <a href="https://andronix-app.gitbook.io/andronix-app/software/ides/vs-code">Andronix</a>)… but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet’s system.
<h3 id="code-server-running-remotely">code-server, running remotely</h3>
A more promising option is <a href="https://github.com/coder/code-server"><code>code-server</code></a>. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine, and the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work reasonably.
<h4 id="access">Access</h4>
With <code>code-server</code> one has to decide <a href="https://coder.com/docs/code-server/guide">how to expose it safely enough</a>. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the android system, and I liked the idea of being able to use any device to work in my environment.
I also decided against the more involved “reverse proxy behind proper hostname with SSL” setups, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use.
That left me with the option of using <code>code-server</code>’s built-in support for self-signed certificates and a password:
<pre><code>$ cat .config/code-server/config.yaml
bind-addr: 1.2.3.4:8080
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: true</code></pre>
With trust-on-first-use this seems reasonably secure.
Update: I noticed that the browsers would forget that I trust this self-signed cert after restarting the browser, and also that I cannot “install” the page (as a <a href="https://de.wikipedia.org/wiki/Progressive_Web_App">Progressive Web App</a>) unless it has a valid certificate. But since I don’t have superuser access to that machine, I can’t just follow the official recommendation of using a reverse proxy on port 80 or 443 with automatic certificates. Instead, I pointed a hostname that I control to that machine, obtained a certificate manually on my laptop (using <code>acme.sh</code>) and copied the files over, so the configuration now reads as follows:
<pre><code>bind-addr: 1.2.3.4:3933
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.cer
cert-key: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.key</code></pre>
(This is getting very specific to my particular needs and constraints, so I’ll spare you the details.)
<h4 id="service">Service</h4>
To keep <code>code-server</code> running I created a systemd service that’s managed by my user’s systemd instance:
<pre><code>~ $ cat ~/.config/systemd/user/code-server.service
[Unit]
Description=code-server
After=network-online.target
[Service]
Environment=PATH=/home/joachim/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
ExecStart=/nix/var/nix/profiles/default/bin/nix run nixpkgs#code-server
[Install]
WantedBy=default.target</code></pre>
(I am using <code>nix</code> as a package manager on a Debian system there, hence the additional <code>PATH</code> and complex <code>ExecStart</code>. If you have a more conventional setup then you do not have to worry about <code>Environment</code> and can likely use <code>ExecStart=code-server</code>.)
For this to survive me logging out I had to ask the system administrator to run <code>loginctl enable-linger joachim</code>, so that <a href="https://unix.stackexchange.com/q/396522/20526">systemd allows my jobs to linger</a>.
<h4 id="git-credentials">Git credentials</h4>
The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: My local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with <code>code-server</code> there is no SSH key involved.
I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on GitHub always have full access. It wouldn’t be horrible, but I still wondered if I can do better.
I thought of creating <a href="https://github.blog/security/application-security/introducing-fine-grained-personal-access-tokens-for-github/">fine-grained personal access tokens</a> that only allow me to push code to specific repositories, and nothing else, and just store them permanently on the remote server. Still a neat and convenient option, but creating PATs for our org requires approval and I didn’t want to bother anyone on the weekend.
So I am experimenting with GitHub’s <a href="https://github.com/git-ecosystem/git-credential-manager">git-credential-manager</a> now. I have configured it to use git’s credential cache with an elevated timeout, so that once I log in, I don’t have to do so again for one workday.
<pre><code>$ nix-env -iA nixpkgs.git-credential-manager
$ git-credential-manager configure
$ git config --global credential.credentialStore cache
$ git config --global credential.cacheOptions "--timeout 36000"</code></pre>
To log in, I have to visit <code>https://github.com/login/device</code> on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character…
This still grants rather broad permissions to the <code>code-server</code>, but at least only temporarily.
<h4 id="android-setup">Android setup</h4>
On the client side I could now open <code>https://host.example.com:8080</code> in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working!
I switched to a theme that supposedly is eInk-optimized (<a href="https://open-vsx.org/extension/Mufanza/e-ink-theme">eInk by Mufanza</a>). It’s not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it’s a start. There are more <a href="https://marketplace.visualstudio.com/items?itemName=eddjrn.e-ink">eInk themes</a> on the official Visual Studio Marketplace, but because <code>code-server</code> is a fork it cannot use that marketplace, and for example this theme <a href="https://gitlab.com/eddjrn/vs-code-e-ink-theme/-/issues/3">isn’t on Open-VSX</a>.
For some reason the F11 key doesn’t work, but going fullscreen is crucial, because screen real estate is scarce in this setup. I can go fullscreen using VSCode’s command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of the fullscreen mode, which is annoying. I still have to pay attention to when that’s happening; maybe it’s the Esc key, which I am of course using a lot because I use vim bindings.
A bigger problem was that on my Boox tablet, sometimes the on-screen keyboard would pop up, which is seriously annoying! It took me a while to track this down: The Boox has two virtual keyboards installed: The usual Google AOSP keyboard, and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn’t. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which would allow me to see more Android settings, and in particular allow me to <a href="https://www.reddit.com/r/Onyx_Boox/comments/13rhcg4/comment/k95ngmj/">disable the Onyx keyboard</a>. So this is fixed.
<del>I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), <a href="https://coder.com/docs/code-server/FAQ#how-do-i-make-my-keyboard-shortcuts-work">as described in the <code>code-server</code> FAQ</a>. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a <a href="https://whatpwacando.today/">PWA test page</a>). And I couldn’t use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then <code>code-server</code> does not <a href="https://coder.com/docs/code-server/FAQ#why-do-web-views-not-work">work fully</a>. Maybe this is just some bug that gets fixed in later versions.</del>
Now that I use a proper certificate, I can use it as a Progressive Web App, and with Firefox on Android this starts the app in full-screen mode (no system bars, no location bar). The F11 key still doesn’t work, and using the command palette to enter fullscreen does nothing visible, but then Esc leaves that fullscreen mode and I suddenly have the system bars again. But maybe if I just don’t do that I get the full screen experience. We’ll see.
I did not work enough with this yet to assess how much the smaller screen estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean’s InfoView more often, and maybe use the <a href="https://marketplace.visualstudio.com/items?itemName=usernamehw.errorlens">Error Lens extension</a>, to avoid having to split my screen vertically.
I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I’d need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. There are <a href="https://onyxboox.com/accessory#g-onyx-boox-cases">cases for Onyx products</a> that include a keyboard, and maybe they work on the lap, but they don’t have the Trackpoint that I have on my <a href="https://support.lenovo.com/de/en/accessories/acc500164-thinkpad-trackpoint-keyboard-ii-overview-and-service-parts">ThinkPad TrackPoint Keyboard II</a>, and how can you live without that?
<h4 id="conclusion">Conclusion</h4>
After this initial setup chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer.
A few bits could be better. In particular logging in and authenticating GitHub access could be both more convenient and safer – I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that temporarily grants access to the <code>code-server</code> and to specific GitHub repositories only. Is that easily possible?</description>
<pubDate>Sun, 02 Feb 2025 16:07:35 +0100</pubDate>
</item>
<item>
<title>Do surprises get larger?</title>
<link>https://www.joachim-breitner.de/blog/814-Do_surprises_get_larger_</link>
<guid>https://www.joachim-breitner.de/blog/814-Do_surprises_get_larger_</guid>
<comments>https://www.joachim-breitner.de/blog/814-Do_surprises_get_larger_#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description><h3 id="the-setup">The setup</h3>
Imagine you are living on a riverbank. Every now and then, the river swells and you have high water. The first few times this may come as a surprise, but soon you learn that such floods are a recurring occurrence at that river, and you make suitable preparations. Let’s say you feel well-prepared against any flood that is no higher than the highest one observed so far. The more floods you have seen, the higher that mark is, and the better prepared you are. But of course, eventually a higher flood will occur that surprises you.
Of course such new record floods become rarer and rarer as you have seen more of them. I was wondering though: By how much do the new records exceed the previous high mark? Does this excess decrease or increase over time?
A priori both could be. When the high mark is already rather high, maybe new record floods will just barely pass that mark? Or maybe, simply because new records are such rare events, when they do occur, they can be surprisingly bad?
This post is a leisurely mathematical investigation of this question, which of course isn’t restricted to high waters; it could be anything that produces a measurement repeatedly and (mostly) independently – weather events, sport results, dice rolls.
The answer of course depends on the distribution of results: How likely is each possible result?
<h3 id="dice-are-simple">Dice are simple</h3>
With dice rolls the answer is rather simple. Let our measurement be how often you can roll a die until it shows a 6. This simple game we can repeat many times, and keep track of our record. Let’s say the record happens to be 7 rolls. If in the next run we roll the die 7 times, and it still does not show a 6, then we know that we have broken the record, and every further roll increases by how much we beat the old record.
But note that how often we will now roll the die is completely independent of what happened before!
So for this game the answer is: The excess with which the record is broken is always the same.
Mathematically speaking this is because the distribution of “rolls until the die shows a 6” is <a href="https://en.wikipedia.org/wiki/Memorylessness">memoryless</a>. Such distributions are rather special: it’s essentially just the example we gave (a <a href="https://en.wikipedia.org/wiki/Geometric_distribution">geometric distribution</a>), or its continuous analogue (the <a href="https://en.wikipedia.org/wiki/Exponential_distribution">exponential distribution</a>, for example the time until a radioactive particle decays).
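The memorylessness can be checked numerically. The following is only a sketch: the function name and the truncation of the infinite sums after <code>terms</code> summands are choices made for this illustration, not part of the argument above.

```python
def expected_excess_geometric(a, p=1/6, terms=2000):
    """E(X - a | X > a) for X = number of rolls until the die shows a 6.

    P(X = k) = (1 - p)**(k - 1) * p for k = 1, 2, ...
    The infinite sums are truncated after `terms` summands.
    """
    ks = range(a + 1, a + 1 + terms)
    weights = [(1 - p) ** (k - 1) * p for k in ks]
    tail = sum(weights)  # approximates P(X > a)
    return sum((k - a) * w for k, w in zip(ks, weights)) / tail


# The expected excess should come out the same for every previous record.
for record in (0, 7, 20):
    print(record, round(expected_excess_geometric(record), 6))
```

Whatever the previous record was, the expected excess stays at 1/p, which is 6 rolls for a fair die.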
<h3 id="mathematical-formulation">Mathematical formulation</h3>
With this out of the way, let us look at some other distributions, and for that, introduce some mathematical notation. Let X be a random variable with probability density function φ(x) and cumulative distribution function Φ(x), and a be the previous record. We are interested in the behavior of
Y(a) = X − a ∣ X > a
i.e. by how much X exceeds a under the condition that it did exceed a. How does Y change as a increases? In particular, how does the expected value of the excess e(a) = E(Y(a)) change?
<h3 id="uniform-distribution">Uniform distribution</h3>
If X is uniformly distributed between, say, 0 and 1, then a new record will appear uniformly distributed between a and 1, and as that range gets smaller, the excess must get smaller as well. More precisely,
e(a) = E(X − a ∣ X > a) = E(X ∣ X > a) − a = (1 − a)/2
This not very interesting straight line is plotted in blue in this diagram:
<figure>
<img src="//www.joachim-breitner.de/various/surprises-plot1.svg" alt="The expected record surpass for the uniform distribution"/>
<figcaption aria-hidden="true">The expected record surpass for the uniform distribution</figcaption>
</figure>
The orange line with the logarithmic scale on the right tries to convey how unlikely it is to surpass the record value a: it shows how many attempts we expect before the record is broken. This can be calculated by n(a) = 1/(1 − Φ(a)).
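The formula for n(a) can be justified as follows. This is a sketch, and the symbol N for the number of attempts is introduced only here: each attempt independently exceeds a with probability 1 − Φ(a), so N follows a geometric distribution.

```latex
P(N = k) = \Phi(a)^{k-1}\,\bigl(1 - \Phi(a)\bigr)
\quad\Longrightarrow\quad
n(a) = E(N) = \frac{1}{1 - \Phi(a)}
% uniform distribution on [0, 1]: \Phi(a) = a, hence n(a) = 1/(1 - a)
```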
<h3 id="normal-distribution">Normal distribution</h3>
For the normal distribution (with median 0 and standard deviation 1, to keep things simple), we can look up the expected value of the <a href="https://en.wikipedia.org/wiki/Truncated_normal_distribution#One_sided_truncation_(of_lower_tail)%5B6%5D">one-sided truncated normal distribution</a> and obtain
e(a) = E(X ∣ X > a) − a = φ(a)/(1 − Φ(a)) − a
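Instead of looking it up, the first term can also be obtained by a short calculation, sketched here for the standard normal distribution assumed above. It relies on the fact that the density satisfies φ′(x) = −x φ(x).

```latex
E(X \mid X > a)
  = \frac{1}{1 - \Phi(a)} \int_a^\infty x\,\varphi(x)\,dx
  = \frac{1}{1 - \Phi(a)} \Bigl[-\varphi(x)\Bigr]_a^\infty
  = \frac{\varphi(a)}{1 - \Phi(a)}
```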
Now is this growing or shrinking? We can plot this and have a quick look:
<figure>
<img src="//www.joachim-breitner.de/various/surprises-plot2.svg" alt="The expected record surpass for the normal distribution"/>
<figcaption aria-hidden="true">The expected record surpass for the normal distribution</figcaption>
</figure>
Indeed it is, too, a decreasing function!
(As a sanity check we can see that e(0) = √(2/π), which is the expected value of the <a href="https://en.wikipedia.org/wiki/Half-normal_distribution">half-normal distribution</a>, as it should.)
<h3 id="could-it-be-any-different">Could it be any different?</h3>
This settles my question: It seems that each new surprisingly high water will tend to be less surprising than the previous one – assuming high waters were uniformly or normally distributed, which is unlikely to be helpful.
This does raise the question, though, if there are probability distributions for which e(a) is increasing?
I can try to construct one, and because it’s a bit easier, I’ll consider a discrete distribution on the positive natural numbers, and look at g(0) = E(X) and g(1) = E(X − 1 ∣ X > 1). What does it take for g(1) > g(0)? Using E(X) = p + (1 − p)E(X ∣ X > 1) for p = P(X = 1) we find that in order to have g(1) > g(0), we need E(X) > 1/p.
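The step from the decomposition of E(X) to the condition on p can be written out as follows (a sketch, assuming p &lt; 1 so that the division is allowed):

```latex
\begin{align*}
g(1) &= E(X \mid X > 1) - 1 = \frac{E(X) - p}{1 - p} - 1 = \frac{E(X) - 1}{1 - p} \\
g(1) > g(0)
  &\iff E(X) - 1 > (1 - p)\,E(X) \\
  &\iff p\,E(X) > 1 \\
  &\iff E(X) > \frac{1}{p}
\end{align*}
```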
This is plausible because we get equality when E(X) = 1/p, as is precisely the case for the geometric distribution. And it is also plausible that it helps if p is large (so that the first record is likely just 1) and if, nevertheless, E(X) is large (so that if we do get an outcome other than 1, it’s much larger).
Starting with the geometric distribution, where P(X > n ∣ X ≥ n) = p<sub>n</sub> = p (the probability of again not rolling a six) is constant, it seems that if these p<sub>n</sub> are increasing, we get the desired behavior. So let p<sub>1</sub> &lt; p<sub>2</sub> &lt; … &lt; p<sub>n</sub> &lt; … be an increasing sequence of probabilities, and define X so that P(X = n) = p<sub>1</sub> ⋅ ⋯ ⋅ p<sub>n − 1</sub> ⋅ (1 − p<sub>n</sub>) (imagine the die wears off and the more often you roll it, the less likely it shows a 6). Then for this variation of the game, every new record tends to exceed the previous one by more than earlier records did. As the p<sub>n</sub> increase, we get a flatter long end in the probability distribution.
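As a concrete illustration, here is one worn-out die worked through by hand. The choice p_n = (n+2)/(n+5) is purely hypothetical and picked so that the sums telescope, and S(n) = P(X > n) is notation introduced for this sketch.

```latex
% Hypothetical choice: p_n = (n+2)/(n+5), i.e. roll n shows a 6 with probability 3/(n+5).
% S(n) = P(X > n) is the probability that the first n rolls all fail to show a 6.
\begin{align*}
S(n) &= \prod_{k=1}^{n} \frac{k+2}{k+5} = \frac{60}{(n+3)(n+4)(n+5)} \\
\sum_{k \ge a} S(k) &= \sum_{k \ge a} 30\left(\frac{1}{(k+3)(k+4)} - \frac{1}{(k+4)(k+5)}\right)
  = \frac{30}{(a+3)(a+4)} \\
E(X - a \mid X > a) &= \frac{1}{S(a)} \sum_{k \ge a} S(k) = \frac{a+5}{2}
\end{align*}
```

For this die the expected excess grows linearly with the record a, and E(X) = 5/2 exceeds 1/P(X = 1) = 2, in line with the condition E(X) > 1/p.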
<h3 id="gamma-distribution">Gamma distribution</h3>
To get a nice plot, I’ll take the intuition from this and turn to continuous distributions. The Wikipedia page for the exponential distribution says it is a special case of the <a href="https://en.m.wikipedia.org/wiki/Gamma_distribution">gamma distribution</a>, which has an additional shape parameter α, and it seems that it could influence the shape of the distribution and give the probability distribution a longer end. Let’s play around with β = 2 and α = 0.5, 1 and 1.5:
<figure>
<img src="//www.joachim-breitner.de/various/surprises-plot3.svg" alt="The expected record surpass for the gamma distribution"/>
<figcaption aria-hidden="true">The expected record surpass for the gamma distribution</figcaption>
</figure>
<ul>
<li>For α = 1 (dotted) this should just be the exponential distribution, and we see that e(a) is flat, as predicted earlier.</li>
<li>For larger α (dashed) the graph does not look much different from the one for the normal distribution – not a surprise, as for α → ∞, the gamma distribution turns into the normal distribution.</li>
<li>For smaller α (solid) we get the desired effect: e(a) is increasing. This means that new records tend to break records more impressively.</li>
</ul>
The orange line shows that this comes at a cost: for a given old record a, new records are harder to come by with smaller α.
<h3 id="conclusion">Conclusion</h3>
As usual, it all depends on the distribution. Otherwise, not much, it’s late.</description>
<pubDate>Sun, 30 Jun 2024 15:28:31 +0200</pubDate>
</item>
<item>
<title>Blogging on Lean</title>
<link>https://www.joachim-breitner.de/blog/813-Blogging_on_Lean</link>
<guid>https://www.joachim-breitner.de/blog/813-Blogging_on_Lean</guid>
<comments>https://www.joachim-breitner.de/blog/813-Blogging_on_Lean#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>This blog has become a bit quiet <a href="https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO">since I joined the Lean FRO</a>. One reason is of course that I can now improve things about Lean, rather than blog about what I think should be done (which, by contraposition, means I shouldn’t blog about what can be improved…). A better reason is that some of the things I’d otherwise write here are now published on the <a href="https://lean-lang.org/blog">official Lean blog</a>, in particular two lengthy technical posts explaining aspects of Lean that I worked on:
<ul>
<li><a href="https://lean-lang.org/blog/2024-1-11-recursive-definitions-in-lean">Recursive definitions in Lean</a></li>
<li><a href="https://lean-lang.org/blog/2024-5-17-functional-induction">Functional Induction theorems</a></li>
</ul>
It would not be useful to re-publish them here because the technology <a href="https://github.com/leanprover/verso"><code>verso</code></a> behind the Lean blog, created by my colleague David Thrane Christiansen, enables fancy features like type-checked code snippets, including output and lots of information on hover. So I’ll be content with just cross-linking my posts from here.</description>
<pubDate>Fri, 31 May 2024 13:47:06 +0100</pubDate>
</item>
<item>
<title>Convenient sandboxed development environment</title>
<link>https://www.joachim-breitner.de/blog/812-Convenient_sandboxed_development_environment</link>
<guid>https://www.joachim-breitner.de/blog/812-Convenient_sandboxed_development_environment</guid>
<comments>https://www.joachim-breitner.de/blog/812-Convenient_sandboxed_development_environment#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>I like using one machine and setup for everything, from serious development work to hobby projects to managing my finances. This is very convenient, as often the lines between these are blurred. But it is also scary if I think of the large number of people who I have to trust to not want to extract all my personal data. Whenever I run a <code>cabal install</code>, or a fun VSCode extension gets updated, or anything like that, I am running code that could be malicious or buggy.
In a way it is surprising and reassuring that, as far as I can tell, this commonly does not happen. Most open source developers out there seem to be nice and well-meaning, after all.
<h3 id="convenient-or-it-wont-happen">Convenient or it won’t happen</h3>
Nevertheless I thought I should do something about this. The safest option would probably be to use dedicated virtual machines for the development work, with very little interaction with my main system. But knowing me, that did not seem likely to happen, as it sounded like a fair amount of hassle. So I aimed for a viable compromise between security and convenience, and one that does not get too much in the way of my current habits.
For instance, it seems desirable to have the project files accessible from my unconstrained environment. This way, I could perform certain actions that need access to secret keys or tokens, but are unlikely to run code (e.g. <code>git push</code>, <code>git pull</code> from private repositories, <code>gh pr create</code>) from “the outside”, and the actual build environment can do without access to these secrets.
The user experience I thus want is a quick way to enter a “development environment” where I can do most of the things I need to do while programming (network access, running command line and GUI programs), with access to the current project, but without access to my actual <code>/home</code> directory.
I initially followed the blog post <a href="https://msucharski.eu/posts/application-isolation-nixos-containers/">“Application Isolation using NixOS Containers” by Marcin Sucharski</a> and got something working that mostly did what I wanted, but then a colleague pointed out that tools like <a href="https://github.com/netblue30/firejail"><code>firejail</code></a> can achieve roughly the same with a less “global” setup. I tried to use <code>firejail</code>, but found it to be a bit too inflexible for my particular whims, so I ended up writing a small wrapper around the lower level sandboxing tool <a href="https://github.com/containers/bubblewrap">Bubblewrap</a>.
<h3 id="selective-bubblewrapping">Selective bubblewrapping</h3>
This script, called <code>dev</code> and included below, builds a new filesystem namespace with minimal <code>/proc</code> and <code>/dev</code> directories and its own <code>/tmp</code> directory. It then bind-mounts some directories to make the host’s NixOS system available inside the container (<code>/bin</code>, <code>/usr</code>, the nix store including domain socket, stuff for OpenGL applications). My user’s home directory is taken from <code>~/.dev-home</code> and some configuration files are bind-mounted for convenient sharing. I intentionally don’t share most of the configuration – for example, a <code>direnv enable</code> in the dev environment should not affect the main environment. The X11 socket for graphical applications and the corresponding <code>.Xauthority</code> file are made available. And finally, if I run <code>dev</code> in a project directory, this project directory is bind-mounted writable, and the current working directory is preserved.
The effect is that I can type <code>dev</code> on the command line to enter “dev mode” rather conveniently. I can run development tools, including graphical ones like VSCode, and especially the latter with its extensions is part of the sandbox. To do a <code>git push</code> I either exit the development environment (Ctrl-D) or open a separate terminal. Overall, the inconvenience of switching back and forth seems worth the extra protection.
Clearly, this isn’t going to hold against a determined and maybe targeted attacker (e.g. access to the X11 and the nix daemon socket can probably be used to escape easily). But I hope it will help against a compromised dev dependency that just deletes or exfiltrates data, like keys or passwords, from the usual places in <code>$HOME</code>.
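The project-directory rule described above (bind the current directory writable only if it lies below an allowed prefix) can be sketched in isolation. The helper name <code>project_bind_args</code> is hypothetical and does not appear in the <code>dev</code> script; the two prefixes are the ones the script checks. This is a minimal sketch that only prints arguments and does not call <code>bwrap</code>.

```bash
#!/usr/bin/env bash
# Sketch only: project_bind_args is a hypothetical helper, not part of the dev script.
# It prints the extra bwrap arguments (one per line) when the given directory lies
# strictly below one of the allowed prefixes, and prints nothing otherwise.
project_bind_args() {
  dir=$1
  shift
  for prefix in "$@"; do
    case "$dir" in
      "$prefix"/*)
        # Bind the project directory writable and start the sandbox there.
        printf '%s\n' --bind "$dir" "$dir" --chdir "$dir"
        return 0
        ;;
    esac
  done
}

# Example call for the current directory, with the two prefixes used in the script.
project_bind_args "$PWD" /home/jojo/build /home/jojo/projekte/programming
```

Printing one argument per line keeps the sketch simple, but it would break for directory names containing newlines; collecting the arguments in an array avoids that.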
<h3 id="rough-corners">Rough corners</h3>
There is more polishing that could be done.
<ul>
<li>In particular, clicking on a link inside VSCode in the container will currently open Firefox inside the container, without access to my settings and cookies etc. Ideally, links would be opened in the Firefox running outside. This is a problem that has a solution in the world of applications that are sandboxed with Flatpak, and involves a bunch of moving parts (an <a href="https://github.com/flatpak/xdg-desktop-portal">xdg-desktop-portal</a> user service, a <a href="https://github.com/flatpak/xdg-dbus-proxy">filtering dbus proxy</a>, exposing access to that proxy in the container). I experimented with that for a bit longer than I should have, but could not get it to work to satisfaction (even without a container involved, I could not get <code>xdg-desktop-portal</code> to heed my default browser settings…). For now I will live with manually copying and pasting URLs; we’ll see how long this lasts.</li>
<li>With this setup (and unlike the NixOS container setup I tried first), the same applications are installed inside and outside. It might be useful to separate the set of installed programs: There is simply no point in running <code>evolution</code> or <code>firefox</code> inside the container, and if I do not even have VSCode or <code>cabal</code> available outside, it’s less likely that I forget to enter <code>dev</code> before using these tools.
It shouldn’t be too hard to cargo-cult some of the NixOS Containers infrastructure to be able to have a separate system configuration that I can manage as part of my normal system configuration and make available to <code>bubblewrap</code> here.</li>
</ul>
So likely I will refine this some more over time. Or get tired of typing <code>dev</code> and go back to what I did before…
<h3 id="the-script">The script</h3>
<details>
<summary>
The <code>dev</code> script (at the time of writing)
</summary>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash"><code class="sourceCode bash"><a href="#cb1-1" aria-hidden="true" tabindex="-1"/>#!/usr/bin/env bash
<a href="#cb1-2" aria-hidden="true" tabindex="-1"/>
<a href="#cb1-3" aria-hidden="true" tabindex="-1"/>extra=()
<a href="#cb1-4" aria-hidden="true" tabindex="-1"/>if [[ "$PWD" == /home/jojo/build/* ]] || [[ "$PWD" == /home/jojo/projekte/programming/* ]]
<a href="#cb1-5" aria-hidden="true" tabindex="-1"/>then
<a href="#cb1-6" aria-hidden="true" tabindex="-1"/>extra+=(--bind "$PWD" "$PWD" --chdir "$PWD")
<a href="#cb1-7" aria-hidden="true" tabindex="-1"/>fi
<a href="#cb1-8" aria-hidden="true" tabindex="-1"/>
<a href="#cb1-9" aria-hidden="true" tabindex="-1"/>if [ -n "$1" ]
<a href="#cb1-10" aria-hidden="true" tabindex="-1"/>then
<a href="#cb1-11" aria-hidden="true" tabindex="-1"/> cmd=( "$@" )
<a href="#cb1-12" aria-hidden="true" tabindex="-1"/>else
<a href="#cb1-13" aria-hidden="true" tabindex="-1"/> cmd=( bash )
<a href="#cb1-14" aria-hidden="true" tabindex="-1"/>fi
<a href="#cb1-15" aria-hidden="true" tabindex="-1"/>
<a href="#cb1-16" aria-hidden="true" tabindex="-1"/># Caveats:
<a href="#cb1-17" aria-hidden="true" tabindex="-1"/># * access to all of `/etc`
<a href="#cb1-18" aria-hidden="true" tabindex="-1"/># * access to `/nix/var/nix/daemon-socket/socket`, and is trusted user (but needed to run nix)
<a href="#cb1-19" aria-hidden="true" tabindex="-1"/># * access to X11
<a href="#cb1-20" aria-hidden="true" tabindex="-1"/>
<a href="#cb1-21" aria-hidden="true" tabindex="-1"/>exec bwrap \
<a href="#cb1-22" aria-hidden="true" tabindex="-1"/> --unshare-all \
<a href="#cb1-23" aria-hidden="true" tabindex="-1"/>\
<a href="#cb1-24" aria-hidden="true" tabindex="-1"/>`# blank slate` \
<a href="#cb1-25" aria-hidden="true" tabindex="-1"/> --share-net \
<a href="#cb1-26" aria-hidden="true" tabindex="-1"/> --proc /proc \
<a href="#cb1-27" aria-hidden="true" tabindex="-1"/> --dev /dev \
<a href="#cb1-28" aria-hidden="true" tabindex="-1"/> --tmpfs /tmp \
<a href="#cb1-29" aria-hidden="true" tabindex="-1"/> --tmpfs /run/user/1000 \
<a href="#cb1-30" aria-hidden="true" tabindex="-1"/>\
<a href="#cb1-31" aria-hidden="true" tabindex="-1"/>`# Needed for GLX applications, in particular alacritty` \
<a href="#cb1-32" aria-hidden="true" tabindex="-1"/> --dev-bind /dev/dri /dev/dri \
<a href="#cb1-33" aria-hidden="true" tabindex="-1"/> --ro-bind /sys/dev/char /sys/dev/char \
<a href="#cb1-34" aria-hidden="true" tabindex="-1"/> --ro-bind /sys/devices/pci0000:00 /sys/devices/pci0000:00 \
<a href="#cb1-35" aria-hidden="true" tabindex="-1"/> --ro-bind /run/opengl-driver /run/opengl-driver \
<a href="#cb1-36" aria-hidden="true" tabindex="-1"/>\
<a href="#cb1-37" aria-hidden="true" tabindex="-1"/> --ro-bind /bin /bin \
<a href="#cb1-38" aria-hidden="true" tabindex="-1"/> --ro-bind /usr /usr \
<a href="#cb1-39" aria-hidden="true" tabindex="-1"/> --ro-bind /run/current-system /run/current-system \
<a href="#cb1-40" aria-hidden="true" tabindex="-1"/> --ro-bind /nix /nix \
<a href="#cb1-41" aria-hidden="true" tabindex="-1"/> --ro-bind /etc /etc \
<a href="#cb1-42" aria-hidden="true" tabindex="-1"/> --ro-bind /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/stub-resolv.conf \
<a href="#cb1-43" aria-hidden="true" tabindex="-1"/>\
<a href="#cb1-44" aria-hidden="true" tabindex="-1"/> --bind ~/.dev-home /home/jojo \
<a href="#cb1-45" aria-hidden="true" tabindex="-1"/> --ro-bind ~/.config/alacritty ~/.config/alacritty \
<a href="#cb1-46" aria-hidden="true" tabindex="-1"/> --ro-bind ~/.config/nvim ~/.config/nvim \
<a href="#cb1-47" aria-hidden="true" tabindex="-1"/> --ro-bind ~/.local/share/nvim ~/.local/share/nvim \
<a href="#cb1-48" aria-hidden="true" tabindex="-1"/> --ro-bind ~/.bin ~/.bin \
<a href="#cb1-49" aria-hidden="true" tabindex="-1"/>\
<a href="#cb1-50" aria-hidden="true" tabindex="-1"/> --bind /tmp/.X11-unix/X0 /tmp/.X11-unix/X0 \
<a href="#cb1-51" aria-hidden="true" tabindex="-1"/> --bind ~/.Xauthority ~/.Xauthority \
<a href="#cb1-52" aria-hidden="true" tabindex="-1"/> --setenv DISPLAY :0 \
<a href="#cb1-53" aria-hidden="true" tabindex="-1"/>\
<a href="#cb1-54" aria-hidden="true" tabindex="-1"/> --setenv container dev \
<a href="#cb1-55" aria-hidden="true" tabindex="-1"/> "${extra[@]}" \
<a href="#cb1-56" aria-hidden="true" tabindex="-1"/> -- \
<a href="#cb1-57" aria-hidden="true" tabindex="-1"/> "${cmd[@]}"</code></pre></div>
</details></description>
<pubDate>Mon, 11 Mar 2024 21:39:58 +0100</pubDate>
</item>
<item>
<title>GHC Steering Committee Retrospective</title>
<link>https://www.joachim-breitner.de/blog/811-GHC_Steering_Committee_Retrospective</link>
<guid>https://www.joachim-breitner.de/blog/811-GHC_Steering_Committee_Retrospective</guid>
<comments>https://www.joachim-breitner.de/blog/811-GHC_Steering_Committee_Retrospective#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>After seven years of service as member and secretary on the GHC Steering Committee, I have resigned from that role. So this is a good time to look back and retrace the formation of the GHC proposal process and committee.
In my memory, I helped define and shape the proposal process, optimizing it for effectiveness and throughput, but memory can be misleading, and judging from the paper trail in my email archives, this was indeed mostly Ben Gamari’s and Richard Eisenberg’s achievement: Already in Summer of 2016, Ben Gamari set up the <a href="https://github.com/ghc-proposals/ghc-proposals">ghc-proposals Github repository</a> with a sketch of a process and sent out a <a href="https://mail.haskell.org/pipermail/glasgow-haskell-users/2016-September/026396.html">call for nominations</a> on the GHC user’s mailing list, which I replied to. The Simons picked the first set of members, and in the fall of 2016 we discussed the committee’s by-laws and procedures. As so often, Richard was an influential shaping force here.
<h3 id="three-ingredients">Three ingredients</h3>
For example, it was he who suggested that for each proposal we have one committee member be the “Shepherd”, overseeing the discussion. I believe this was one ingredient for the process’s effectiveness: There is always one person in charge, and thus we avoid the delays incurred when any one of a non-singleton set of volunteers has to do the next step (and everyone hopes someone else does it).
The next ingredient was that we do not usually require a vote among all members (again, not easy with volunteers with limited bandwidth and occasional phases of absence). Instead, the shepherd makes a recommendation (accept/reject), and if the other committee members do not complain, this silence is taken as consent, and we come to a decision. It seems this idea can also be traced back to Richard, who suggested that “once a decision is requested, the shepherd [generates] consensus. If consensus is elusive, then we vote.”
At the end of the year we agreed and wrote down these rules, created the mailing list for our internal, but <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/">publicly archived committee discussions</a>, and began accepting proposals, starting with <a href="https://github.com/ghc-proposals/ghc-proposals/pull/6">Adam Gundry’s <code>OverloadedRecordFields</code></a>.
At that point, there was no “secretary” role yet, so how did I become one? It seems that in February 2017 I started to <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2017-February/000040.html">clean up and refine the process documentation</a>, fixing “bugs in the process” (like requiring authors to set Github labels when they don’t even have permissions to do that). This in particular meant that someone from the committee had to manually handle submissions and so on, and by the aforementioned principle that at every step there ought to be exactly one person in charge, the role of a secretary followed naturally. In the <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2017-February/000044.html">email in which I described that role</a> I wrote:
<blockquote>
Simon already shoved me towards picking up the “secretary” hat, to reduce load on Ben.
</blockquote>
So when I <a href="https://github.com/ghc-proposals/ghc-proposals/commit/63d050c380b2aa380581cc9b9aafb2a7c022556a">merged the updated process documentation</a>, I already listed myself as “secretary”.
It wasn’t just Simon’s shoving that put me into the role, though. I dug out my original self-nomination email to Ben, and among other things I wrote:
<blockquote>
I also hope that there is going to be clear responsibilities and a clear workflow among the committee. E.g. someone (possibly rotating), maybe called the secretary, who is in charge of having an initial look at proposals and then assigning it to a member who shepherds the proposal.
</blockquote>
So it is hardly a surprise that I became secretary, when it was dear to my heart to have a smooth continuous process here.
I am rather content with the result: These three ingredients – single secretary, per-proposal shepherds, silence-is-consent – helped the committee to be effective throughout its existence, even as every once in a while individual members dropped out.
<h3 id="ulterior-motivation">Ulterior motivation</h3>
I must admit, however, there was an ulterior motivation behind me grabbing the secretary role: Yes, I did want the committee to succeed, and I did want authors to receive timely, good and decisive feedback on their proposals – but I did not really want to have to do that part.
I am, in fact, a lousy proposal reviewer. I am too generous when reading proposals, and more likely to mentally fill gaps in a specification than to spot them. I always optimistically assume that the authors surely know what they are doing, rather than critically assessing the impact, the implementation cost and the interaction with other language features.
And, maybe more importantly: why should I know which changes are good and which are not so good in the long run? Clearly, the authors cared enough about a proposal to put it forward, so there is some need… and I do believe that Haskell should stay an evolving and innovating language… but how does this help me decide about this or that particular feature?
I even, during the formation of the committee, explicitly asked that we write down some guidance on “Vision and Guideline”; do we want to foster change or innovation, or be selective gatekeepers? Should we accept features that are proven to be useful, or should we accept features so that they can prove to be useful? This discussion, however, did not lead to a concrete result, and the assessment of proposals relied on the sum of each member’s personal preference, expertise and gut feeling. I am not saying that this was a mistake: It is hard to come up with a general guideline here, and even harder to find one that does justice to each individual proposal.
So the secret motivation for me to grab the secretary post was that I could contribute without having to judge proposals. Being secretary allowed me to assign most proposals to others to shepherd, and only once in a while did I take care of a proposal myself, when it seemed to be very straightforward. Sneaky, ain’t it?
<h3 id="years-later">7 Years later</h3>
For years to come I happily played secretary: When an author finished their proposal and public discussion ebbed down, they would ping me on GitHub, I would <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2022-July/002837.html">pick a suitable shepherd</a> among the committee and ask them to judge the proposal. Eventually, the committee would come to a conclusion, usually by implicit consent, sometimes by voting, and I’d merge the pull request and update the metadata thereon. Every few months I’d <a href="https://mail.haskell.org/pipermail/ghc-steering-committee/2024-January/003683.html">summarize the current state of affairs</a> to the committee (what happened since the last update, which proposals are currently on our plate), and once per year gathered the data for <a href="https://youtu.be/LFIL0myeOlo?si=P316QJs0EBSWe4Uy&amp;t=3955">Simon Peyton Jones’ annual GHC Status Report</a>. Sometimes some members needed a nudge or two to act. Some would eventually step down, and I’d send around a call for nominations and, when the nominations came in, distribute them off-list among the committee and tally the votes.
Initially, that was exciting. For a long while it was a pleasant and rewarding routine. Eventually, it became a mere chore. I noticed that I didn’t quite care so much anymore about some of the discussion, and there was a decent amount of navel-gazing, meta-discussions and some wrangling about claims of authority that was probably useful and necessary, but wasn’t particularly fun.
I also began to notice weaknesses in the processes that I helped shape: We could really use some more automation for showing proposal statuses, notifying people when they have to act, and nudging them when they don’t. The whole silence-is-assent approach is good for throughput, but not necessarily great for quality, and maybe the committee members need to be pushed more firmly to engage with each proposal. Like GHC itself, the committee processes deserve continuous refinement and refactoring, and since I could not muster the motivation to change my now well-trod secretarial ways, it was time for me to step down.
Luckily, Adam Gundry volunteered to take over, and that makes me feel much less bad for quitting. Thanks for that!
And although I am for my day job now <a href="https://lean-lang.org/">enjoying a language</a> that has many of the things out of the box that for Haskell are still only language extensions or even just future proposals (dependent types, <code>BlockArguments</code>, <code>do</code> notation with <code>(← foo)</code> expressions and 💜 Unicode), I’m still around, hosting the <a href="https://haskell.foundation/podcast/">Haskell Interlude Podcast</a>, writing on this blog and hanging out at ZuriHac etc.</description>
<pubDate>Thu, 25 Jan 2024 01:21:41 +0100</pubDate>
</item>
<item>
<title>The Haskell Interlude Podcast</title>
<link>https://www.joachim-breitner.de/blog/810-The_Haskell_Interlude_Podcast</link>
<guid>https://www.joachim-breitner.de/blog/810-The_Haskell_Interlude_Podcast</guid>
<comments>https://www.joachim-breitner.de/blog/810-The_Haskell_Interlude_Podcast#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>It was pointed out to me that I have not blogged about this, so better now than never:
Since 2021 I am – together with four other hosts – producing a regular podcast about Haskell, the <a href="https://haskell.foundation/podcast/">Haskell Interlude</a>. Roughly every two weeks two of us interview someone from the Haskell Community, and we chat for approximately an hour about how they came to Haskell, what they are doing with it, why they are doing it and what else is on their mind. Sometimes we talk to very famous people, like Simon Peyton Jones, and sometimes to people who maybe should be famous, but aren’t quite yet.
For most episodes we also have a transcript, so you can read the interviews instead, if you prefer, and you should find the podcast on most podcast apps as well. I do not know how reliable these statistics are, but supposedly we regularly have around 1300 listeners. We don’t get much feedback, however, so if you like the show, or dislike it, or have feedback, let us know (for example on the <a href="https://discourse.haskell.org/">Haskell Discourse</a>, which has a thread for each episode).
At the time of writing, we released 40 episodes. For the benefit of my (likely hypothetical) fans, or those who want to train an AI voice model for nefarious purposes, here is the list of episodes co-hosted by me:
<ul>
<li><a href="https://haskell.foundation/podcast/3">Gabriella Gonzalez</a></li>
<li><a href="https://haskell.foundation/podcast/4">Jasper Van der Jeugt</a></li>
<li><a href="https://haskell.foundation/podcast/5">Chris Smith</a></li>
<li><a href="https://haskell.foundation/podcast/9">Sebastian Graf</a></li>
<li><a href="https://haskell.foundation/podcast/11">Simon Peyton Jones</a></li>
<li><a href="https://haskell.foundation/podcast/14">Ryan Trinkle</a></li>
<li><a href="https://haskell.foundation/podcast/15">Facundo Dominguez</a></li>
<li><a href="https://haskell.foundation/podcast/19">Marc Scholten</a></li>
<li><a href="https://haskell.foundation/podcast/23">Ben Gamari</a></li>
<li><a href="https://haskell.foundation/podcast/25">Andrew Lelechenko (Bodigrim)</a></li>
<li><a href="https://haskell.foundation/podcast/29">ZuriHac 2023 special</a></li>
<li><a href="https://haskell.foundation/podcast/31">Arnaud Spiwack</a></li>
<li><a href="https://haskell.foundation/podcast/37">John MacFarlane</a></li>
<li><a href="https://haskell.foundation/podcast/41">Moritz Angermann</a></li>
<li><a href="https://haskell.foundation/podcast/42">Jezen Thomas</a></li>
</ul>
Can’t decide where to start? The one with Ryan Trinkle might be my favorite.
Thanks to the Haskell Foundation and its sponsors for supporting this podcast (hosting, editing, transcription).</description>
<pubDate>Fri, 22 Dec 2023 10:04:42 +0100</pubDate>
</item>
<item>
<title>Joining the Lean FRO</title>
<link>https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO</link>
<guid>https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO</guid>
<comments>https://www.joachim-breitner.de/blog/809-Joining_the_Lean_FRO#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>Tomorrow is going to be a new first day in a new job for me: I am joining the <a href="https://lean-fro.org/">Lean FRO</a>, and I’m excited.
<h3 id="what-is-lean">What is Lean?</h3>
<a href="https://lean-lang.org/about/">Lean</a> is the new kid on the block of theorem provers.
It’s a pure functional programming language (like Haskell, with and on which I have worked a lot), but it’s dependently typed (which Haskell may be evolving to be as well, but rather slowly and carefully). It has a refreshing syntax, built on top of a rather good (I have been told, not an expert here) macro system.
As a dependently typed programming language, it is also a theorem prover, or proof assistant, and there exists already a lively community of mathematicians who started to formalize mathematics in a coherent library, creatively called <a href="https://github.com/leanprover-community/mathlib4">mathlib</a>.
<h3 id="what-is-a-fro">What is a FRO?</h3>
A <a href="https://www.convergentresearch.org/">Focused Research Organization</a> has the organizational form of a small start up (small team, little overhead, a few years of runway), but its goals and measure for success are not commercial, as funding is provided by donors (in the case of the Lean FRO, the Simons Foundation International, the Alfred P. Sloan Foundation, and Richard Merkin). This allows us to build something that we believe is a contribution for the greater good, even though it’s not (or not yet) commercially interesting enough and does not fit other forms of funding (such as research grants) well. This is a very comfortable situation to be in.
<h3 id="why-am-i-excited">Why am I excited?</h3>
To me, working on Lean seems to be the perfect mix: I have been working on language implementation for about a decade now, and always with a preference for functional languages. Add to that my interest in theorem proving, where I have used Isabelle and Coq so far, and played with Agda and others. So technically, clearly up my alley.
Furthermore, the language isn’t too old, and plenty of interesting things are simply still to do, rather than tried before. The ecosystem is still evolving, so there is a good chance to have some impact.
On the other hand, the language isn’t too young either. It is no longer an open question whether we will have users: we have them already, they hang out on <a href="https://leanprover.zulipchat.com/">Zulip</a>, so if I improve something, someone is likely going to be happy about it, which is great. And the community seems to be welcoming and full of nice people.
Finally, <a href="https://github.com/leanprover-community/mathlib4">this library of mathematics</a> that these users are building is itself an amazing artifact: Lots of math in a consistent, machine-readable, maintained, documented, checked form! With a little bit of optimism I can imagine this changing how math research and education will be done in the future. It could be for math what Wikipedia is for encyclopedic knowledge and OpenStreetMap for maps – and the thought of facilitating that excites me.
With this new job I find that when I am telling friends and colleagues about it, I do not hesitate or hedge when asked why I am doing this. This is a good sign.
<h3 id="what-will-i-be-doing">What will I be doing?</h3>
We’ll see what main tasks I’ll get to tackle initially, but knowing myself, I expect I’ll get broadly involved.
To get up to speed I started playing around with a few things already, and for example created <a href="https://loogle.lean-lang.org/">Loogle</a>, a Mathlib search engine inspired by Haskell’s <a href="https://hoogle.haskell.org/">Hoogle</a>, including a Zulip bot integration. This seems to be useful and quite well received, so I’ll continue maintaining that.
Expect more about this and other contributions here in the future.</description>
<pubDate>Wed, 01 Nov 2023 21:47:06 +0100</pubDate>
</item>
<item>
<title>Squash your Github PRs with one click</title>
<link>https://www.joachim-breitner.de/blog/808-Squash_your_Github_PRs_with_one_click</link>
<guid>https://www.joachim-breitner.de/blog/808-Squash_your_Github_PRs_with_one_click</guid>
<comments>https://www.joachim-breitner.de/blog/808-Squash_your_Github_PRs_with_one_click#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>TL;DR: Squash your PRs with one click at <a href="https://squasher.nomeata.de/" class="uri">https://squasher.nomeata.de/</a>.
Very recently I got this response from the project maintainer at a pull request I contributed: “Thanks, approved, please squash so that I can merge.”
It’s nice that my contribution can go in, but why did the maintainer not just press the “Squash and merge” button, and instead add this unnecessary roundtrip to the process? Anyways, maintainers make the rules, so I play by them. But unlike the maintainer, who can squash-and-merge with just one click, squashing the PR’s branch is surprisingly laborious: Github does not allow you to do that via the Web UI (and hence on mobile), and it seems you are expected to go to your computer and juggle with <code>git rebase --interactive</code>.
I found this rather annoying, so I created <a href="https://squasher.nomeata.de/">Squasher</a>, a simple service that will squash your branch for you. There is no configuration, just paste the PR url. It will use the PR title and body as the commit message (which is obviously the right way™), and create the commit in your name:
<figure>
<img src="/various/squasher.png" alt="Squasher in action"/>
<figcaption aria-hidden="true">Squasher in action</figcaption>
</figure>
If you find this useful, or found it to be buggy, let me know. The code is at <a href="https://github.com/nomeata/squasher" class="uri">https://github.com/nomeata/squasher</a> if you are curious about it.</description>
<pubDate>Sun, 29 Oct 2023 22:46:56 +0100</pubDate>
</item>
<item>
<title>Left recursive parser combinators via sharing</title>
<link>https://www.joachim-breitner.de/blog/807-Left_recursive_parser_combinators_via_sharing</link>
<guid>https://www.joachim-breitner.de/blog/807-Left_recursive_parser_combinators_via_sharing</guid>
<comments>https://www.joachim-breitner.de/blog/807-Left_recursive_parser_combinators_via_sharing#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>At this year’s <a href="https://icfp23.sigplan.org/">ICFP in Seattle</a> I gave a talk about my <a href="https://hackage.haskell.org/package/rec-def">rec-def</a> Haskell library, which I have blogged about before here. While my <a href="https://doi.org/10.1145/3607853">functional pearl paper</a> focuses on a concrete use-case and the tricks of the implementation, in my talk I put the emphasis on the high-level idea: it behooves a declarative lazy functional language like Haskell that recursive equations just work whenever they describe a (unique) solution. Like in the paper, I used equations between sets as the running example, and only conjectured that it should also work for other domains, in particular parser combinators.
Naturally, someone called my bluff and asked if I actually tried it. I had not, but I should have, because it works nicely and is actually more straightforward than with sets. I wrote up a prototype and showed it off a few days later as a lightning talk at <a href="https://icfp23.sigplan.org/home/haskellsymp-2023">Haskell Symposium</a>; here is the write-up that goes along with that.
<h3 id="parser-combinators">Parser combinators</h3>
Parser combinators are libraries that provide little functions (combinators) that you compose to define your parser directly in your programming language, as opposed to using external tools that read some grammar description and generate parser code, and are quite popular in Haskell (e.g. <a href="https://hackage.haskell.org/package/parsec">parsec</a>, <a href="https://hackage.haskell.org/package/attoparsec">attoparsec</a>, <a href="https://hackage.haskell.org/package/megaparsec">megaparsec</a>).
Let us define a little parser that recognizes sequences of <code>a</code>s:
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb1-1" aria-hidden="true" tabindex="-1"/>ghci> let aaa = tok 'a' *> aaa &lt;|> pure ()
<a href="#cb1-2" aria-hidden="true" tabindex="-1"/>ghci> parse aaa "aaaa"
<a href="#cb1-3" aria-hidden="true" tabindex="-1"/>Just ()
<a href="#cb1-4" aria-hidden="true" tabindex="-1"/>ghci> parse aaa "aabaa"
<a href="#cb1-5" aria-hidden="true" tabindex="-1"/>Nothing</code></pre></div>
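To make the names <code>tok</code> and <code>parse</code> concrete, here is a minimal sketch of a list-of-successes parser that is enough to run the right-recursive example. It is not the parser used in this post: the type <code>SimpleParser</code> and its field <code>runSimple</code> are hypothetical names, and this naive representation has no memoisation at all.

```haskell
import Control.Applicative (Alternative (..))
import Data.Maybe (listToMaybe)

-- Hypothetical minimal parser: returns every (result, remaining input) pair.
newtype SimpleParser tok a = SimpleParser { runSimple :: [tok] -> [(a, [tok])] }

instance Functor (SimpleParser tok) where
  fmap f p = SimpleParser $ \ts -> [ (f a, rest) | (a, rest) <- runSimple p ts ]

instance Applicative (SimpleParser tok) where
  pure x = SimpleParser $ \ts -> [(x, ts)]
  pf <*> pa = SimpleParser $ \ts ->
    [ (f a, rest') | (f, rest) <- runSimple pf ts, (a, rest') <- runSimple pa rest ]

instance Alternative (SimpleParser tok) where
  empty = SimpleParser (const [])
  p <|> q = SimpleParser $ \ts -> runSimple p ts ++ runSimple q ts

-- Accept one token satisfying the predicate.
sat :: (tok -> Bool) -> SimpleParser tok tok
sat ok = SimpleParser $ \ts -> case ts of
  (t : rest) | ok t -> [(t, rest)]
  _                 -> []

-- Accept exactly the given token.
tok :: Eq tok => tok -> SimpleParser tok tok
tok t = sat (== t)

-- Succeed only if some alternative consumes the whole input.
parse :: SimpleParser tok a -> [tok] -> Maybe a
parse p ts = listToMaybe [ a | (a, []) <- runSimple p ts ]

-- The right-recursive parser from the text.
aaa :: SimpleParser Char ()
aaa = tok 'a' *> aaa <|> pure ()
```

The recursion is harmless here because every recursive use of <code>aaa</code> sits behind a <code>tok 'a'</code> that must consume a token first.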
<h3 id="left-recursion">Left-recursion</h3>
This works nicely, but just because we were lucky: We wrote the parser to recurse on the right (of the <code>*></code>), and this happens to work. If we put the recursive call first, it doesn’t anymore:
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb2-1" aria-hidden="true" tabindex="-1"/>ghci> let aaa = aaa &lt;* tok 'a' &lt;|> pure ()
<a href="#cb2-2" aria-hidden="true" tabindex="-1"/>ghci> parse aaa "aaaa"
<a href="#cb2-3" aria-hidden="true" tabindex="-1"/>^CInterrupted.</code></pre></div>
This is a well-known problem (see for example <a href="https://doi.org/10.1145/3471874.3472984">Nicolas Wu’s overview paper</a>): none of the common parser combinator libraries can handle it, and the usual advice is to refactor your grammar to avoid left recursion.
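As an illustration of that usual refactoring (not of the approach taken in this post), here is a small sketch using <code>Text.ParserCombinators.ReadP</code> from <code>base</code>, which I picked only because it needs no extra packages. The grammar and the names <code>number</code>, <code>expr</code> and <code>evalExpr</code> are made up for this example. The left-recursive rule <code>expr := expr '-' num | num</code> is rewritten as "one <code>num</code>, followed by any number of <code>'-' num</code>", which is what <code>chainl1</code> implements while still associating to the left.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, chainl1, char, eof, munch1, readP_to_S)

-- A non-empty run of digits, read as an Int.
number :: ReadP Int
number = read <$> munch1 isDigit

-- Left-recursive in the grammar, but written without left recursion:
-- chainl1 parses  number ('-' number)*  and folds the results to the left.
expr :: ReadP Int
expr = chainl1 number ((-) <$ char '-')

-- Run the parser and insist that the whole input is consumed.
evalExpr :: String -> Maybe Int
evalExpr s = case readP_to_S (expr <* eof) s of
  (v, _) : _ -> Just v
  []         -> Nothing
```

With this, <code>evalExpr "10-3-2"</code> yields <code>Just 5</code>, i.e. <code>(10-3)-2</code>. The price is that the parser no longer mirrors the grammar as written.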
But there are some libraries that can handle left recursion, at least with a little help from the programmer. I found two variations:
<ul>
<li>The library provides an explicit fix point combinator, and as long as that is used, left-recursion works. This is for example described by <a href="https://link.springer.com/chapter/10.1007/978-3-540-77442-6_12">Frost, Hafiz and Callaghan</a>, and (of course) <a href="https://okmij.org/ftp/Haskell/LeftRecursion.hs">Oleg Kiselyov</a> has an implementation of this too.</li>
<li>The library expects explicit labels on recursive productions, so that the library can recognize left-recursion. I found an implementation of this idea in the <a href="https://hackage.haskell.org/package/Agda-2.6.3/docs/Agda-Utils-Parser-MemoisedCPS.html"><code>Agda.Utils.Parser.MemoisedCPS</code></a> module in the Agda code, the <a href="https://hackage.haskell.org/package/gll-0.4.1.0/docs/GLL-Combinators-Interface.html"><code>gll</code> library</a> seems to follow this style and <a href="https://discourse.haskell.org/t/reusing-haskells-binding-when-defining-context-free-grammars/5960?u=nomeata">Jaro discusses it as well</a>.</li>
</ul>
I took the module from the Agda source and simplified a bit for the purposes of this demonstration (<a href="https://github.com/nomeata/left-rec-parse/blob/master/Parser.hs"><code>Parser.hs</code></a>). Indeed, I can make the left-recursive grammar work:
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb3-1" aria-hidden="true" tabindex="-1"/>ghci> let aaa = memoise ":-)" $ aaa &lt;* tok 'a' &lt;|> pure ()
<a href="#cb3-2" aria-hidden="true" tabindex="-1"/>ghci> parse aaa "aaaa"
<a href="#cb3-3" aria-hidden="true" tabindex="-1"/>Just ()
<a href="#cb3-4" aria-hidden="true" tabindex="-1"/>ghci> parse aaa "aabaa"
<a href="#cb3-5" aria-hidden="true" tabindex="-1"/>Nothing</code></pre></div>
It does not matter what I pass to <code>memoise</code>, as long as I do not pass the same key when memoising a different parser.
For reference, an excerpt of the API of <code>Parser</code>:
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb4-1" aria-hidden="true" tabindex="-1"/>data Parser k tok a -- k is type of keys, tok type of tokens (e.g. Char)
<a href="#cb4-2" aria-hidden="true" tabindex="-1"/>instance Functor (Parser k tok)
<a href="#cb4-3" aria-hidden="true" tabindex="-1"/>instance Applicative (Parser k tok)
<a href="#cb4-4" aria-hidden="true" tabindex="-1"/>instance Alternative (Parser k tok)
<a href="#cb4-5" aria-hidden="true" tabindex="-1"/>instance Monad (Parser k tok)
<a href="#cb4-6" aria-hidden="true" tabindex="-1"/>parse :: Parser k tok a -> [tok] -> Maybe a
<a href="#cb4-7" aria-hidden="true" tabindex="-1"/>sat :: (tok -> Bool) -> Parser k tok tok
<a href="#cb4-8" aria-hidden="true" tabindex="-1"/>tok :: Eq tok => tok -> Parser k tok tok
<a href="#cb4-9" aria-hidden="true" tabindex="-1"/>memoise :: Ord k => k -> Parser k tok a -> Parser k tok a</code></pre></div>
<h3 id="left-recursion-through-sharing">Left-recursion through sharing</h3>
To follow the agenda set out in my talk, I now want to wrap that parser in a way that relieves me from having to insert the calls to <code>memoise</code>. To start, I import that parser qualified, define a newtype around it, and start lifting some of the functions:
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb5-1" aria-hidden="true" tabindex="-1"/>import qualified Parser as P
<a href="#cb5-2" aria-hidden="true" tabindex="-1"/>
<a href="#cb5-3" aria-hidden="true" tabindex="-1"/>newtype Parser tok a = MkP { unP :: P.Parser Unique tok a }
<a href="#cb5-4" aria-hidden="true" tabindex="-1"/>
<a href="#cb5-5" aria-hidden="true" tabindex="-1"/>parse :: Parser tok a -> [tok] -> Maybe a
<a href="#cb5-6" aria-hidden="true" tabindex="-1"/>parse (MkP p) = P.parse p
<a href="#cb5-7" aria-hidden="true" tabindex="-1"/>
<a href="#cb5-8" aria-hidden="true" tabindex="-1"/>sat :: Typeable tok => (tok -> Bool) -> Parser tok tok
<a href="#cb5-9" aria-hidden="true" tabindex="-1"/>sat p = MkP (P.sat p)
<a href="#cb5-10" aria-hidden="true" tabindex="-1"/>
<a href="#cb5-11" aria-hidden="true" tabindex="-1"/>tok :: Eq tok => tok -> Parser tok tok
<a href="#cb5-12" aria-hidden="true" tabindex="-1"/>tok t = MkP (P.tok t)</code></pre></div>
So far, nothing interesting had to happen, because so far I cannot build recursive parsers. The first interesting combinator that allows me to do that is <code>&lt;*></code> from the <code>Applicative</code> class, so I should use <code>memoise</code> there. The question is: Where does the unique key come from?
<h3 id="proprioception">Proprioception</h3>
As with the rec-def library, pure code won’t do, and I have to get my hands dirty: I really want a fresh unique label out of thin air. To that end, I define the following combinator, with naming aided by Richard Eisenberg:
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb6-1" aria-hidden="true" tabindex="-1"/>propriocept :: (Unique -> a) -> a
<a href="#cb6-2" aria-hidden="true" tabindex="-1"/>propriocept f = unsafePerformIO $ f &lt;$> newUnique</code></pre></div>
A thunk defined with <code>propriocept</code> will know about its own identity, and will be able to tell itself apart from other such thunks. This gives us a form of observable sharing, precisely what we need. But before we return to our parser combinators, let us briefly explore this combinator.
Using <code>propriocept</code> I can define an operation <code>cons :: [Int] -> [Int]</code> that records (the hash of) this <code>Unique</code> in the list:
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb7-1" aria-hidden="true" tabindex="-1"/>ghci> let cons xs = propriocept (\x -> hashUnique x : xs)
<a href="#cb7-2" aria-hidden="true" tabindex="-1"/>ghci> :t cons
<a href="#cb7-3" aria-hidden="true" tabindex="-1"/>cons :: [Int] -> [Int]</code></pre></div>
This lets us see the identity of a list cell, that is, of the concrete object in memory.
Naturally, if we construct a finite list, each list cell is different:
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb8-1" aria-hidden="true" tabindex="-1"/>ghci> cons (cons (cons []))
<a href="#cb8-2" aria-hidden="true" tabindex="-1"/>[1,2,3]</code></pre></div>
And if we do that again, we see that fresh list cells are allocated:
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb9-1" aria-hidden="true" tabindex="-1"/>ghci> cons (cons (cons []))
<a href="#cb9-2" aria-hidden="true" tabindex="-1"/>[4,5,6]</code></pre></div>
We can create an infinite list; if we do it without sharing, every cell is separate:
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb10-1" aria-hidden="true" tabindex="-1"/>ghci> take 20 (acyclic 0)
<a href="#cb10-2" aria-hidden="true" tabindex="-1"/>[7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]</code></pre></div>
but if we tie the knot using sharing, all the cells in the list are actually the same:
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb11-1" aria-hidden="true" tabindex="-1"/>ghci> let cyclic = cons cyclic
<a href="#cb11-2" aria-hidden="true" tabindex="-1"/>ghci> take 20 cyclic
<a href="#cb11-3" aria-hidden="true" tabindex="-1"/>[27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27,27]</code></pre></div>
We can achieve the same using <a href="https://hackage.haskell.org/package/base/docs/Data-Function.html#v:fix"><code>fix</code> from <code>Data.Function</code></a>:
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb12-1" aria-hidden="true" tabindex="-1"/>ghci> import Data.Function
<a href="#cb12-2" aria-hidden="true" tabindex="-1"/>ghci> take 20 (fix cons)
<a href="#cb12-3" aria-hidden="true" tabindex="-1"/>[28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28,28]</code></pre></div>
I explore these heap structures more visually in a <a href="https://www.youtube.com/playlist?list=PL4FcLyLhO9jggmkqJyJ2i9pCSiDpwKiVu">series of screencasts</a>.
So with <code>propriocept</code> we can distinguish different heap objects, and also recognize when we come across the same heap object again.
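For readers who want to replay these experiments outside of GHCi, here is a sketch of them as a module. Several things in it are assumptions of mine rather than part of the post: the definition of <code>acyclic</code> (it is used above but not shown), the <code>NOINLINE</code> pragmas and the <code>-O0</code> option (precautions, since optimisations may change what is shared), and the helpers <code>allDistinct</code> and <code>allSame</code>. The concrete numbers depend on how many uniques were handed out before, so the sketch only compares labels with each other.

```haskell
{-# OPTIONS_GHC -O0 #-}
import Data.Function (fix)
import Data.List (nub)
import Data.Unique (Unique, hashUnique, newUnique)
import System.IO.Unsafe (unsafePerformIO)

-- As in the text; NOINLINE is an added precaution.
propriocept :: (Unique -> a) -> a
propriocept f = unsafePerformIO (fmap f newUnique)
{-# NOINLINE propriocept #-}

-- Each list cell records the hash of its own Unique.
cons :: [Int] -> [Int]
cons xs = propriocept (\u -> hashUnique u : xs)
{-# NOINLINE cons #-}

-- Assumed definition: every step is a fresh function call, nothing is shared.
acyclic :: Int -> [Int]
acyclic n = cons (acyclic (n + 1))

-- Knot tied via sharing: the tail of the cell is the cell itself.
cyclic :: [Int]
cyclic = cons cyclic

-- Hypothetical helpers for comparing labels.
allDistinct :: [Int] -> Bool
allDistinct xs = length (nub xs) == length xs

allSame :: [Int] -> Bool
allSame xs = and (zipWith (==) xs (drop 1 xs))
```

Under these assumptions I would expect <code>allDistinct (take 20 (acyclic 0))</code>, <code>allSame (take 20 cyclic)</code> and <code>allSame (take 20 (fix cons))</code> to all hold.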
<h3 id="left-recursion-through-sharing-cont.">Left-recursion through sharing (cont.)</h3>
With that we return to our parser. We define a smart constructor for the new <code>Parser</code> that passes the unique from <code>propriocept</code> to the underlying parser’s <code>memoise</code> function:
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb13-1" aria-hidden="true" tabindex="-1"/>withMemo :: P.Parser Unique tok a -> Parser tok a
<a href="#cb13-2" aria-hidden="true" tabindex="-1"/>withMemo p = propriocept $ \u -> MkP $ P.memoise u p</code></pre></div>
If we now use this in the definition of all possibly recursive parsers, then the necessary calls to <code>memoise</code> will be in place:
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb14-1" aria-hidden="true" tabindex="-1"/>instance Functor (Parser tok) where
<a href="#cb14-2" aria-hidden="true" tabindex="-1"/> fmap f p = withMemo (fmap f (unP p))
<a href="#cb14-3" aria-hidden="true" tabindex="-1"/>
<a href="#cb14-4" aria-hidden="true" tabindex="-1"/>instance Applicative (Parser tok) where
<a href="#cb14-5" aria-hidden="true" tabindex="-1"/> pure x = MkP (pure x)
<a href="#cb14-6" aria-hidden="true" tabindex="-1"/> p1 &lt;*> p2 = withMemo (unP p1 &lt;*> unP p2)
<a href="#cb14-7" aria-hidden="true" tabindex="-1"/>
<a href="#cb14-8" aria-hidden="true" tabindex="-1"/>instance Alternative (Parser tok) where
<a href="#cb14-9" aria-hidden="true" tabindex="-1"/> empty = MkP empty
<a href="#cb14-10" aria-hidden="true" tabindex="-1"/> p1 &lt;|> p2 = withMemo (unP p1 &lt;|> unP p2)
<a href="#cb14-11" aria-hidden="true" tabindex="-1"/>
<a href="#cb14-12" aria-hidden="true" tabindex="-1"/>instance Monad (Parser tok) where
<a href="#cb14-13" aria-hidden="true" tabindex="-1"/> return = pure
<a href="#cb14-14" aria-hidden="true" tabindex="-1"/> p1 >>= f = withMemo $ unP p1 >>= unP . f</code></pre></div>
And indeed, it works (see <a href="https://github.com/nomeata/left-rec-parse/blob/master/RParser.hs"><code>RParser.hs</code></a> for the full code):
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb15-1" aria-hidden="true" tabindex="-1"/>ghci> let aaa = aaa &lt;* tok 'a' &lt;|> pure ()
<a href="#cb15-2" aria-hidden="true" tabindex="-1"/>ghci> parse aaa "aaaa"
<a href="#cb15-3" aria-hidden="true" tabindex="-1"/>Just ()
<a href="#cb15-4" aria-hidden="true" tabindex="-1"/>ghci> parse aaa "aabaa"
<a href="#cb15-5" aria-hidden="true" tabindex="-1"/>Nothing</code></pre></div>
<h3 id="a-larger-example">A larger example</h3>
Let us try this on a larger example, and parse (simple) BNF grammars. Here is a data type describing them:
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb16-1" aria-hidden="true" tabindex="-1"/>type Ident = String
<a href="#cb16-2" aria-hidden="true" tabindex="-1"/>type RuleRhs = [Seq]
<a href="#cb16-3" aria-hidden="true" tabindex="-1"/>type Seq = [Atom]
<a href="#cb16-4" aria-hidden="true" tabindex="-1"/>data Atom = Lit String | NonTerm Ident deriving Show
<a href="#cb16-5" aria-hidden="true" tabindex="-1"/>type Rule = (Ident, RuleRhs)
<a href="#cb16-6" aria-hidden="true" tabindex="-1"/>type BNF = [Rule]</code></pre></div>
For the concrete syntax, I’d like to be able to parse something like
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb17-1" aria-hidden="true" tabindex="-1"/>numExp :: String
<a href="#cb17-2" aria-hidden="true" tabindex="-1"/>numExp = unlines
<a href="#cb17-3" aria-hidden="true" tabindex="-1"/> [ "term := sum;"
<a href="#cb17-4" aria-hidden="true" tabindex="-1"/> , "pdigit := '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"
<a href="#cb17-5" aria-hidden="true" tabindex="-1"/> , "digit := '0' | pdigit;"
<a href="#cb17-6" aria-hidden="true" tabindex="-1"/> , "pnum := pdigit | pnum digit;"
<a href="#cb17-7" aria-hidden="true" tabindex="-1"/> , "num := '0' | pnum;"
<a href="#cb17-8" aria-hidden="true" tabindex="-1"/> , "prod := atom | atom '*' prod;"
<a href="#cb17-9" aria-hidden="true" tabindex="-1"/> , "sum := prod | prod '+' sum;"
<a href="#cb17-10" aria-hidden="true" tabindex="-1"/> , "atom := num | '(' term ')';"
<a href="#cb17-11" aria-hidden="true" tabindex="-1"/> ]</code></pre></div>
so here is a possible parser; a mostly straightforward use of parser combinators:
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb18-1" aria-hidden="true" tabindex="-1"/>type P = Parser Char
<a href="#cb18-2" aria-hidden="true" tabindex="-1"/>
<a href="#cb18-3" aria-hidden="true" tabindex="-1"/>snoc :: [a] -> a -> [a]
<a href="#cb18-4" aria-hidden="true" tabindex="-1"/>snoc xs x = xs ++ [x]
<a href="#cb18-5" aria-hidden="true" tabindex="-1"/>
<a href="#cb18-6" aria-hidden="true" tabindex="-1"/>l :: P a -> P a
<a href="#cb18-7" aria-hidden="true" tabindex="-1"/>l p = p &lt;|> l p &lt;* sat isSpace
<a href="#cb18-8" aria-hidden="true" tabindex="-1"/>quote :: P Char
<a href="#cb18-9" aria-hidden="true" tabindex="-1"/>quote = tok '\''
<a href="#cb18-10" aria-hidden="true" tabindex="-1"/>quoted :: P a -> P a
<a href="#cb18-11" aria-hidden="true" tabindex="-1"/>quoted p = quote *> p &lt;* quote
<a href="#cb18-12" aria-hidden="true" tabindex="-1"/>str :: P String
<a href="#cb18-13" aria-hidden="true" tabindex="-1"/>str = some (sat (not . (== '\'')))
<a href="#cb18-14" aria-hidden="true" tabindex="-1"/>ident :: P Ident
<a href="#cb18-15" aria-hidden="true" tabindex="-1"/>ident = some (sat (\c -> isAlphaNum c &amp;&amp; isAscii c))
<a href="#cb18-16" aria-hidden="true" tabindex="-1"/>atom :: P Atom
<a href="#cb18-17" aria-hidden="true" tabindex="-1"/>atom = Lit &lt;$> l (quoted str)
<a href="#cb18-18" aria-hidden="true" tabindex="-1"/> &lt;|> NonTerm &lt;$> l ident
<a href="#cb18-19" aria-hidden="true" tabindex="-1"/>eps :: P ()
<a href="#cb18-20" aria-hidden="true" tabindex="-1"/>eps = void $ l (tok 'ε')
<a href="#cb18-21" aria-hidden="true" tabindex="-1"/>sep :: P ()
<a href="#cb18-22" aria-hidden="true" tabindex="-1"/>sep = void $ some (sat isSpace)
<a href="#cb18-23" aria-hidden="true" tabindex="-1"/>sq :: P Seq
<a href="#cb18-24" aria-hidden="true" tabindex="-1"/>sq = [] &lt;$ eps
<a href="#cb18-25" aria-hidden="true" tabindex="-1"/> &lt;|> snoc &lt;$> sq &lt;* sep &lt;*> atom
<a href="#cb18-26" aria-hidden="true" tabindex="-1"/> &lt;|> pure &lt;$> atom
<a href="#cb18-27" aria-hidden="true" tabindex="-1"/>ruleRhs :: P RuleRhs
<a href="#cb18-28" aria-hidden="true" tabindex="-1"/>ruleRhs = pure &lt;$> sq
<a href="#cb18-29" aria-hidden="true" tabindex="-1"/> &lt;|> snoc &lt;$> ruleRhs &lt;* l (tok '|') &lt;*> sq
<a href="#cb18-30" aria-hidden="true" tabindex="-1"/>rule :: P Rule
<a href="#cb18-31" aria-hidden="true" tabindex="-1"/>rule = (,) &lt;$> l ident &lt;* l (tok ':' *> tok '=') &lt;*> ruleRhs &lt;* l (tok ';')
<a href="#cb18-32" aria-hidden="true" tabindex="-1"/>bnf :: P BNF
<a href="#cb18-33" aria-hidden="true" tabindex="-1"/>bnf = pure &lt;$> rule
<a href="#cb18-34" aria-hidden="true" tabindex="-1"/> &lt;|> snoc &lt;$> bnf &lt;*> rule</code></pre></div>
I somewhat sillily used <code>snoc</code> rather than <code>(:)</code> to build my lists, just so that I can show off all the left-recursion in this grammar.
<h3 id="sharing-is-tricky">Sharing is tricky</h3>
Let’s try it:
<div class="sourceCode" id="cb19"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb19-1" aria-hidden="true" tabindex="-1"/>ghci> parse bnf numExp
<a href="#cb19-2" aria-hidden="true" tabindex="-1"/>^CInterrupted.</code></pre></div>
What a pity, it does not work! What went wrong?
The underlying library can handle left-recursion if it can recognize it by seeing a <code>memoise</code> label passed again. This works fine in all the places where we re-use a parser definition (e.g. in <code>bnf</code>), but it really requires that values are shared!
If we look carefully at our definition of <code>l</code> (which parses a lexeme, i.e. something possibly followed by whitespace), then it recurses via a fresh function call, and the program will keep expanding the definition – just like the <code>acyclic</code> above:
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb20-1" aria-hidden="true" tabindex="-1"/>l :: P a -> P a
<a href="#cb20-2" aria-hidden="true" tabindex="-1"/>l p = p &lt;|> l p &lt;* sat isSpace</code></pre></div>
The fix (sic!) is to make sure that the recursive call is using the parser we are currently defining, which we can easily do with a local definition:
<div class="sourceCode" id="cb21"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb21-1" aria-hidden="true" tabindex="-1"/>l :: P a -> P a
<a href="#cb21-2" aria-hidden="true" tabindex="-1"/>l p = p'
<a href="#cb21-3" aria-hidden="true" tabindex="-1"/> where p' = p &lt;|> p' &lt;* sat isSpace</code></pre></div>
With this little fix, the parser can parse the example grammar:
<div class="sourceCode" id="cb22"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb22-1" aria-hidden="true" tabindex="-1"/>ghci> parse bnf numExp
<a href="#cb22-2" aria-hidden="true" tabindex="-1"/>Just [("term",[[NonTerm "sum"]]),("pdigit",[[Lit "1"],…</code></pre></div>
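To make the shape of such a result more tangible, here is a sketch with hand-built values for two of the rules, together with a small printer back to the concrete syntax. The values <code>digitRule</code> and <code>pnumRule</code> are what I would expect for the corresponding lines of <code>numExp</code>, not copied output, and the <code>render…</code> functions are hypothetical helpers that are not part of the linked code.

```haskell
import Data.List (intercalate)

-- The data types from above, repeated so that this sketch is self-contained.
type Ident = String
type RuleRhs = [Seq]
type Seq = [Atom]
data Atom = Lit String | NonTerm Ident deriving Show
type Rule = (Ident, RuleRhs)
type BNF = [Rule]

-- digit := '0' | pdigit;
digitRule :: Rule
digitRule = ("digit", [[Lit "0"], [NonTerm "pdigit"]])

-- pnum := pdigit | pnum digit;
pnumRule :: Rule
pnumRule = ("pnum", [[NonTerm "pdigit"], [NonTerm "pnum", NonTerm "digit"]])

renderAtom :: Atom -> String
renderAtom (Lit s)     = "'" ++ s ++ "'"
renderAtom (NonTerm i) = i

-- The empty sequence is written as ε in the concrete syntax.
renderSeq :: Seq -> String
renderSeq [] = "ε"
renderSeq as = unwords (map renderAtom as)

renderRule :: Rule -> String
renderRule (i, rhs) = i ++ " := " ++ intercalate " | " (map renderSeq rhs) ++ ";"

renderBNF :: BNF -> String
renderBNF = unlines . map renderRule
```

Printing a parsed grammar and comparing it with the input is a cheap sanity check for a parser like <code>bnf</code>, at least up to whitespace.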
<h3 id="going-meta">Going meta</h3>
The main demonstration is over, but since we now already have a parser for grammar descriptions at hand, let’s go a bit further and dynamically construct a parser from such a description. The parser should only accept strings according to that grammar, and return a parse tree annotated with the non-terminals used:
<div class="sourceCode" id="cb23"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb23-1" aria-hidden="true" tabindex="-1"/>interp :: BNF -> P String
<a href="#cb23-2" aria-hidden="true" tabindex="-1"/>interp bnf = parsers M.! start
<a href="#cb23-3" aria-hidden="true" tabindex="-1"/> where
<a href="#cb23-4" aria-hidden="true" tabindex="-1"/> start :: Ident
<a href="#cb23-5" aria-hidden="true" tabindex="-1"/> start = fst (head bnf)
<a href="#cb23-6" aria-hidden="true" tabindex="-1"/>
<a href="#cb23-7" aria-hidden="true" tabindex="-1"/> parsers :: M.Map Ident (P String)
<a href="#cb23-8" aria-hidden="true" tabindex="-1"/> parsers = M.fromList [ (i, parseRule i rhs) | (i, rhs) &lt;- bnf ]
<a href="#cb23-9" aria-hidden="true" tabindex="-1"/>
<a href="#cb23-10" aria-hidden="true" tabindex="-1"/> parseRule :: Ident -> RuleRhs -> P String
<a href="#cb23-11" aria-hidden="true" tabindex="-1"/> parseRule ident rhs = trace &lt;$> asum (map parseSeq rhs)
<a href="#cb23-12" aria-hidden="true" tabindex="-1"/> where trace s = ident ++ "(" ++ s ++ ")"
<a href="#cb23-13" aria-hidden="true" tabindex="-1"/>
<a href="#cb23-14" aria-hidden="true" tabindex="-1"/> parseSeq :: Seq -> P String
<a href="#cb23-15" aria-hidden="true" tabindex="-1"/> parseSeq = fmap concat . traverse parseAtom
<a href="#cb23-16" aria-hidden="true" tabindex="-1"/>
<a href="#cb23-17" aria-hidden="true" tabindex="-1"/> parseAtom :: Atom -> P String
<a href="#cb23-18" aria-hidden="true" tabindex="-1"/> parseAtom (Lit s) = traverse tok s
<a href="#cb23-19" aria-hidden="true" tabindex="-1"/> parseAtom (NonTerm i) = parsers M.! i</code></pre></div>
Let’s see it in action (full code in <a href="https://github.com/nomeata/left-rec-parse/blob/master/BNFEx.hs"><code>BNFEx.hs</code></a>):
<div class="sourceCode" id="cb24"><pre class="sourceCode haskell"><code class="sourceCode haskell"><a href="#cb24-1" aria-hidden="true" tabindex="-1"/>ghci> Just bnfp = parse bnf numExp
<a href="#cb24-2" aria-hidden="true" tabindex="-1"/>ghci> :t bnfp
<a href="#cb24-3" aria-hidden="true" tabindex="-1"/>bnfp :: BNF
<a href="#cb24-6" aria-hidden="true" tabindex="-1"/>ghci> parse (interp bnfp) "12+3*4"
<a href="#cb24-7" aria-hidden="true" tabindex="-1"/>Just "term(sum(prod(atom(num(pnum(pnum(pdigit(1))digit(pdigit(2))))))+sum(prod(atom(num(pnum(pdigit(3))))*prod(atom(num(pnum(pdigit(4)))))))))"
<a href="#cb24-8" aria-hidden="true" tabindex="-1"/>ghci> parse (interp bnfp) "12+3*4+"
<a href="#cb24-9" aria-hidden="true" tabindex="-1"/>Nothing</code></pre></div>
It is worth noting that the <code>numExp</code> grammar is also left-recursive, so implementing <code>interp</code> with a conventional parser combinator library would not have worked. But thanks to our <code>propriocept</code> trick, it does! Again, the sharing is important; in the code above it is the map <code>parsers</code> that is defined in terms of itself, and will ensure that the left-recursive productions will work.
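One caveat about <code>interp</code> as written: <code>head</code> and <code>M.!</code> are partial, so an empty grammar or a reference to a non-terminal without a rule ends in a runtime error. A possible pre-check, sketched here with hypothetical helper names that are not part of the linked code, could look like this:

```haskell
import Data.List (nub)

-- The data types from above, repeated so that this sketch is self-contained.
type Ident = String
type RuleRhs = [Seq]
type Seq = [Atom]
data Atom = Lit String | NonTerm Ident deriving Show
type Rule = (Ident, RuleRhs)
type BNF = [Rule]

-- The start symbol is the left-hand side of the first rule, if there is one.
startSymbol :: BNF -> Maybe Ident
startSymbol []           = Nothing
startSymbol ((i, _) : _) = Just i

-- Non-terminals that are referenced somewhere but have no rule of their own.
undefinedNonTerms :: BNF -> [Ident]
undefinedNonTerms bnf =
  nub [ i | (_, rhs) <- bnf, sq <- rhs, NonTerm i <- sq, i `notElem` defined ]
  where
    defined = map fst bnf
```

Calling <code>interp</code> only when <code>startSymbol</code> returns a <code>Just</code> and <code>undefinedNonTerms</code> returns the empty list would rule out both failure modes.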
<h3 id="closing-thoughts">Closing thoughts</h3>
I am using <code>unsafePerformIO</code>, so I need to justify its use. Clearly, <code>propriocept</code> is not a pure function, and its type is a lie. In general, using it will break the nice equational properties of Haskell, as we have seen in our experiments with <code>cons</code>.
In the case of our parser library, however, we use it in specific ways, namely to feed a fresh name to <code>memoise</code>. Assuming the underlying parser library’s behavior does not observably depend on where and with which key <code>memoise</code> is used, this results in a properly pure interface, and all is well again. (NB: I did not investigate if this assumption actually holds for the parser library used here, or if it may for example affect the order of parse trees returned.)
I also expect that this implementation, which will memoise every parser involved, will be rather slow. It seems plausible to analyze the graph structure and figure out which <code>memoise</code> calls are actually needed to break left-recursion (at least if we drop the <code>Monad</code> instance or always memoise <code>>>=</code>).
If you liked this post, you might enjoy reading <a href="https://doi.org/10.1145/3607853">the paper about rec-def</a> or watching one of my talks about it (MuniHac, BOBKonf, ICFP23; the presentation evolved over time). Or, if you just want to see more about how things are laid out on Haskell’s heap, go to <a href="https://www.youtube.com/playlist?list=PL4FcLyLhO9jggmkqJyJ2i9pCSiDpwKiVu">my screencasts exploring the Haskell heap</a>.</description>
<pubDate>Sun, 10 Sep 2023 17:16:08 -0700</pubDate>
</item>
<item>
<title>Generating bibtex bibliographies from DOIs via DBLP</title>
<link>https://www.joachim-breitner.de/blog/806-Generating_bibtex_bibliographies_from_DOIs_via_DBLP</link>
<guid>https://www.joachim-breitner.de/blog/806-Generating_bibtex_bibliographies_from_DOIs_via_DBLP</guid>
<comments>https://www.joachim-breitner.de/blog/806-Generating_bibtex_bibliographies_from_DOIs_via_DBLP#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>I sometimes <a href="https://www.joachim-breitner.de/publications">write papers</a> and part of paper writing is assembling the bibliography. In my case, this is done using BibTeX. So when I need to add another citation, I have to find suitable data in BibTeX format.
Often I copy snippets from <code>.bib</code> files from earlier papers.
Or I search for the paper on <a href="https://dblp.org/">DBLP</a>, which in my experience has the highest-quality BibTeX entries and the best coverage of computer science related publications, copy it to my <code>.bib</code> file, and change the key to whatever I want to refer to the paper by.
But in the days of pervasive use of DOIs (digital object identifiers) for almost all publications, manually keeping the data in BibTeX files seems outdated. Instead I’d rather just provide the two pieces of data I care about: the key that I want to use for citation, and the DOI. The rest I do not want to be bothered with.
So I wrote a small script that takes a <code>.yaml</code> file like
<div class="sourceCode" id="cb1"><pre class="sourceCode yaml"><code class="sourceCode yaml"><a href="#cb1-1" aria-hidden="true" tabindex="-1"/>entries:
<a href="#cb1-2" aria-hidden="true" tabindex="-1"/> unsafePerformIO: 10.1007/10722298_3
<a href="#cb1-3" aria-hidden="true" tabindex="-1"/> dejafu: 10.1145/2804302.2804306
<a href="#cb1-4" aria-hidden="true" tabindex="-1"/> runST: 10.1145/3158152
<a href="#cb1-5" aria-hidden="true" tabindex="-1"/> quickcheck: 10.1145/351240.351266
<a href="#cb1-6" aria-hidden="true" tabindex="-1"/> optimiser: 10.1016/S0167-6423(97)00029-4
<a href="#cb1-7" aria-hidden="true" tabindex="-1"/> sabry: 10.1017/s0956796897002943
<a href="#cb1-8" aria-hidden="true" tabindex="-1"/> concurrent: 10.1145/237721.237794
<a href="#cb1-9" aria-hidden="true" tabindex="-1"/> launchbury: 10.1145/158511.158618
<a href="#cb1-10" aria-hidden="true" tabindex="-1"/> datafun: 10.1145/2951913.2951948
<a href="#cb1-11" aria-hidden="true" tabindex="-1"/> observable-sharing: 10.1007/3-540-46674-6_7
<a href="#cb1-12" aria-hidden="true" tabindex="-1"/> kildall-73: 10.1145/512927.512945
<a href="#cb1-13" aria-hidden="true" tabindex="-1"/> kam-ullman-76: 10.1145/321921.321938
<a href="#cb1-14" aria-hidden="true" tabindex="-1"/> spygame: 10.1145/3371101
<a href="#cb1-15" aria-hidden="true" tabindex="-1"/> cocaml: 10.3233/FI-2017-1473
<a href="#cb1-16" aria-hidden="true" tabindex="-1"/> secrets: 10.1017/S0956796802004331
<a href="#cb1-17" aria-hidden="true" tabindex="-1"/> modular: 10.1017/S0956796817000016
<a href="#cb1-18" aria-hidden="true" tabindex="-1"/> longley: 10.1145/317636.317775
<a href="#cb1-19" aria-hidden="true" tabindex="-1"/> nievergelt: 10.1145/800152.804906
<a href="#cb1-20" aria-hidden="true" tabindex="-1"/> runST2: 10.1145/3527326
<a href="#cb1-21" aria-hidden="true" tabindex="-1"/> polakow: 10.1145/2804302.2804309
<a href="#cb1-22" aria-hidden="true" tabindex="-1"/> lvars: 10.1145/2502323.2502326
<a href="#cb1-23" aria-hidden="true" tabindex="-1"/> typesafe-sharing: 10.1145/1596638.1596653
<a href="#cb1-24" aria-hidden="true" tabindex="-1"/> pure-functional: 10.1007/978-3-642-14162-1_17
<a href="#cb1-25" aria-hidden="true" tabindex="-1"/> clairvoyant: 10.1145/3341718
<a href="#cb1-26" aria-hidden="true" tabindex="-1"/>subs:
<a href="#cb1-27" aria-hidden="true" tabindex="-1"/>  - replace: Peyton Jones
<a href="#cb1-28" aria-hidden="true" tabindex="-1"/>    with: '{Peyton Jones}'</code></pre></div>
and turns it into a nice <code>.bib</code> file:
<pre class="shell"><code>$ ./doi2bib.py &lt; doibib.yaml > dblp.bib
$ head dblp.bib
@inproceedings{unsafePerformIO,
author = {Simon L. {Peyton Jones} and
Simon Marlow and
Conal Elliott},
editor = {Pieter W. M. Koopman and
Chris Clack},
title = {Stretching the Storage Manager: Weak Pointers and Stable Names in
Haskell},
booktitle = {Implementation of Functional Languages, 11th International Workshop,
IFL'99, Lochem, The Netherlands, September 7-10, 1999, Selected Papers},</code></pre>
The last bit of the YAML file (the <code>subs</code> section) allows me to do some fine-tuning of the output, because unfortunately, not even DBLP’s BibTeX files are perfect, for example in the presence of two family names.
Now I have fewer moving parts to worry about, and a more consistent bibliography.
The script is rather small, so I’ll just share it here:
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><a href="#cb3-1" aria-hidden="true" tabindex="-1"/>#!/usr/bin/env python3
<a href="#cb3-2" aria-hidden="true" tabindex="-1"/>
<a href="#cb3-3" aria-hidden="true" tabindex="-1"/>import sys
<a href="#cb3-4" aria-hidden="true" tabindex="-1"/>import yaml
<a href="#cb3-5" aria-hidden="true" tabindex="-1"/>import requests
<a href="#cb3-6" aria-hidden="true" tabindex="-1"/>import requests_cache
<a href="#cb3-7" aria-hidden="true" tabindex="-1"/>import re
<a href="#cb3-8" aria-hidden="true" tabindex="-1"/>
<a href="#cb3-9" aria-hidden="true" tabindex="-1"/>requests_cache.install_cache(backend='sqlite')
<a href="#cb3-10" aria-hidden="true" tabindex="-1"/>
<a href="#cb3-11" aria-hidden="true" tabindex="-1"/>data = yaml.safe_load(sys.stdin)
<a href="#cb3-12" aria-hidden="true" tabindex="-1"/>
<a href="#cb3-13" aria-hidden="true" tabindex="-1"/>for key, doi in data['entries'].items():
<a href="#cb3-14" aria-hidden="true" tabindex="-1"/>    bib = requests.get(f"https://dblp.org/doi/{doi}.bib").text
<a href="#cb3-15" aria-hidden="true" tabindex="-1"/>    bib = re.sub('{DBLP.*,', '{' + key + ',', bib)
<a href="#cb3-16" aria-hidden="true" tabindex="-1"/>    for subs in data['subs']:
<a href="#cb3-17" aria-hidden="true" tabindex="-1"/>        bib = re.sub(subs['replace'], subs['with'], bib)
<a href="#cb3-18" aria-hidden="true" tabindex="-1"/>    print(bib)</code></pre></div>
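To see what the two rewriting steps do without querying DBLP, here is a small offline sketch. The <code>SAMPLE</code> entry is a hand-shortened, made-up stand-in for what DBLP might return, and <code>rewrite_entry</code> is a hypothetical helper name that merely wraps the same two <code>re.sub</code> calls as the script above.

```python
import re

# Hand-shortened stand-in for a DBLP response (an assumption, not real output).
SAMPLE = """@inproceedings{DBLP:conf/ifl/JonesME99,
  author = {Simon L. Peyton Jones and
            Simon Marlow},
  title  = {Stretching the Storage Manager}
}"""


def rewrite_entry(bib, key, subs):
    """Swap the DBLP key for our own, then apply the user's substitutions."""
    # '.' does not match newlines, so this only touches the line with the key.
    bib = re.sub('{DBLP.*,', '{' + key + ',', bib)
    for sub in subs:
        bib = re.sub(sub['replace'], sub['with'], bib)
    return bib


result = rewrite_entry(
    SAMPLE,
    'unsafePerformIO',
    [{'replace': 'Peyton Jones', 'with': '{Peyton Jones}'}],
)
print(result)
```

Note that <code>re.sub</code> interprets each <code>replace</code> value as a regular expression, so a literal string containing special characters would probably need <code>re.escape</code>.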
There are similar projects out there, e.g. <a href="https://github.com/cr-marcstevens/dblpbibtex"><code>dblpbibtex</code></a> in C++ and <a href="https://github.com/PJK/dblpbib"><code>dblpbib</code></a> in Ruby. These allow direct use of <code>\cite{DBLP:rec/conf/isit/BreitnerS20}</code> in LaTeX, which is also nice, but for now I like to choose more descriptive citation keys myself.</description>
<pubDate>Wed, 12 Jul 2023 14:32:19 +0200</pubDate>
</item>
<item>
<title>ICFP Pearl preprint on rec-def</title>
<link>https://www.joachim-breitner.de/blog/805-ICFP_Pearl_preprint_on_rec-def</link>
<guid>https://www.joachim-breitner.de/blog/805-ICFP_Pearl_preprint_on_rec-def</guid>
<comments>https://www.joachim-breitner.de/blog/805-ICFP_Pearl_preprint_on_rec-def#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>I submitted a Functional Pearl to this <a href="https://icfp23.sigplan.org/">year’s ICFP</a> and it got accepted!
It is about the idea of using Haskell’s inherent ability to define recursive equations, and of using them for more than just functions and lazy data structures. I blogged about this before (<a href="https://www.joachim-breitner.de/blog/792-More_recursive_definitions">introducing the idea</a>, <a href="https://www.joachim-breitner.de/blog/793-rec-def__Behind_the_scenes">behind the scenes</a>, applications to <a href="https://www.joachim-breitner.de/blog/794-rec-def__Program_analysis_case_study">program analysis</a>, <a href="https://www.joachim-breitner.de/blog/795-rec-def__Dominators_case_study">graph algorithms</a> and <a href="https://www.joachim-breitner.de/blog/796-rec-def__Minesweeper_case_study">minesweeper</a>), but hopefully the paper brings out the idea even more clearly. The constructive feedback from a few friendly readers (Claudio, Sebastian, and also the anonymous reviewers) certainly improved the paper.
<blockquote>
<h3 id="abstract">Abstract</h3>
Haskell’s laziness allows the programmer to solve some problems naturally and declaratively via recursive equations. Unfortunately, if the input is “too recursive”, these very elegant idioms can fall into the dreaded black hole, and the programmer has to resort to more pedestrian approaches.
It does not have to be that way: We built variants of common pure data structures (Booleans, sets) where recursive definitions are productive. Internally, the infamous unsafePerformIO is at work, but the user only sees a beautiful and pure API, and their pretty recursive idioms – magically – work again.
</blockquote>
If you are curious, please have a look at the <a href="//www.joachim-breitner.de/publications/rec-def-pearl.pdf">current version of the paper</a>. Any feedback is welcome, all the more so if it comes before July 11, because then I can include it in the camera-ready version.
<hr/>
There are still some open questions around this work. What bothers me maybe most is the lack of a denotational semantics that unifies the partial order underlying the Haskell fragment, and the partial order underlying the domain of the embedded equations.
The crux of the problem is maybe best captured by this question:
<blockquote>
Imagine an untyped lambda calculus with constructors, lazy evaluation, and an operation <code>rseq</code> that recursively evaluates constructors, but terminates in the presence of cycles. So for example
<pre><code>rseq (let x = 1 :: x in x ) ≡ ()
rseq (let x () = 1 :: x () in x ()) ≡ ⊥</code></pre>
In this language, knot tying is observable. What is the “nicest” denotational semantics?
</blockquote>
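For readers who want to play with this distinction, here is a rough Python model. It is only a sketch under several assumptions: the <code>Cons</code> class, the <code>fuel</code> bound and the <code>OutOfFuel</code> exception are all hypothetical, and running out of fuel merely stands in for ⊥. Lazy tails are memoised thunks, and <code>rseq</code> detects cycles by object identity, which is exactly what makes knot tying observable.

```python
class Cons:
    """A lazy cons cell whose tail thunk is forced at most once."""

    def __init__(self, head, tail_thunk):
        self.head = head
        self._tail_thunk = tail_thunk
        self._forced = False
        self._tail = None

    @property
    def tail(self):
        if not self._forced:
            self._tail = self._tail_thunk()
            self._forced = True
            self._tail_thunk = None
        return self._tail


class OutOfFuel(Exception):
    """Stands in for divergence (bottom) in this bounded model."""


def rseq(value, fuel=1000):
    # Keep visited cells alive in the dict so their ids cannot be reused.
    seen = {}
    while isinstance(value, Cons):
        if id(value) in seen:
            return ()  # a cycle on the heap: stop here
        if fuel == 0:
            raise OutOfFuel
        fuel -= 1
        seen[id(value)] = value
        value = value.tail
    return ()


def knot():
    # Models: let x = 1 :: x in x (the tail is the very same cell)
    cell = Cons(1, lambda: cell)
    return cell


def unfold():
    # Models: let x () = 1 :: x () in x () (every tail is a fresh cell)
    return Cons(1, unfold)


print(rseq(knot()))
try:
    rseq(unfold())
except OutOfFuel:
    print("diverges (ran out of fuel)")
```

Both lists denote the same infinite stream of ones, yet this model tells them apart, which is the difficulty the question above is about.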
Update: I made some progress via a <a href="https://discourse.haskell.org/t/icfp-pearl-on-rec-def/6626/14?u=nomeata">discussion on the Haskell Discourse</a> and started <a href="https://www.joachim-breitner.de/publications/rec-def-denotational.pdf">some rough notes on a denotational semantics</a>.</description>
<pubDate>Thu, 22 Jun 2023 18:21:11 +0200</pubDate>
</item>
<item>
<title>The curious case of the half-half Bitcoin ECDSA nonces</title>
<link>https://www.joachim-breitner.de/blog/804-The_curious_case_of_the_half-half_Bitcoin_ECDSA_nonces</link>
<guid>https://www.joachim-breitner.de/blog/804-The_curious_case_of_the_half-half_Bitcoin_ECDSA_nonces</guid>
<comments>https://www.joachim-breitner.de/blog/804-The_curious_case_of_the_half-half_Bitcoin_ECDSA_nonces#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>This is the week of the Gulaschprogrammiernacht, a yearly Chaos Computer Club event in Karlsruhe, so it was exactly a year ago that I sat in my AirBnB room and went over the slides for my talk <a href="https://media.ccc.de/v/gpn20-66-lattice-attacks-on-ethereum-bitcoin-and-https">“Lattice Attacks on Ethereum, Bitcoin, and HTTPS”</a> that I would give there.
It reports on <a href="https://eprint.iacr.org/2019/023">research</a> done with <a href="https://cseweb.ucsd.edu/~nadiah/">Nadia Heninger</a> while I was in Philadelphia, and I really liked giving that talk: At some point we look at some rather odd signatures we found on the bitcoin blockchain, and part of the signature (the “nonce”) happens to share some bytes with the secret key. A clear case of some buffer overlap in a memory unsafe language, which I, as a fan of languages like Haskell, am very happy to sneer at!
<figure>
<img src="/various/biased-nonces-slide-17.svg" alt="A sneery slide"/>
<figcaption aria-hidden="true">A sneery slide</figcaption>
</figure>
But last year, as I was going over <a href="https://www.joachim-breitner.de/publications/BiasedNonces-GPN20-2022-05-20.pdf">the slides</a>, I looked at the raw data again for some reason, and I found that we overlooked something: Not only was the upper half of the nonce equal to the lower half of the secret key, but the lower half of the nonce was also equal to the upper half of the message hash!
This now looks much less like an accident to me, and more like an (overly) simple form of deterministic nonce creation… so much for my nice anecdote. (I still used the anecdote in my talk, followed up with an “actually”.)
When I told Nadia about this, she got curious as well, and quickly saw that from a signature with such a nonce, one can rather easily extract the secret key. So together with her student Dylan Rowe, we implemented this analysis and searched the bitcoin blockchain for more instances of such signatures. We did find a few, and were even able to trace them back to a somewhat infamous bitcoin activist going under the pseudonym Amaclin.
This research and sleuthing turned into another paper, <a href="https://eprint.iacr.org/2023/841">“The curious case of the half-half Bitcoin ECDSA nonces”</a>, to be presented at AfricaCrypt 2023. Enjoy!</description>
<pubDate>Wed, 07 Jun 2023 08:42:28 +0200</pubDate>
</item>
<item>
<title>Giving back to OPLSS</title>
<link>https://www.joachim-breitner.de/blog/803-Giving_back_to_OPLSS</link>
<guid>https://www.joachim-breitner.de/blog/803-Giving_back_to_OPLSS</guid>
<comments>https://www.joachim-breitner.de/blog/803-Giving_back_to_OPLSS#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>Nine years ago, when I was a PhD student, I attended the <a href="https://www.cs.uoregon.edu/research/summerschool/">Oregon Programming Language Summer School</a> in Eugene. I had a great time and learned a lot.
<figure>
<img src="/various/OPLSS14Photo-small.jpg" alt="The OPLSS’14 full image"/>
<figcaption aria-hidden="true">The OPLSS’14 <a href="/various/OPLSS14Photo.jpg">full image</a></figcaption>
</figure>
Learning some of the things I learned there, and meeting some of the people I met there, also led to me graduating, which led to me becoming <a href="https://www.cis.upenn.edu/~plclub/">a PostDoc at UPenn</a>, which led to me later joining DFINITY to implement the <a href="https://github.com/dfinity/motoko/">Motoko programming language</a> and help design and specify the public interface of their “Internet Computer”, including the <a href="https://medium.com/dfinity/how-internet-computer-responses-are-certified-as-authentic-2ff1bb1ea659">response certification</a> (<a href="https://www.youtube.com/watch?v=mZbFhRIHIiY">video</a>).
So when the <a href="https://icdevs.org/">ICDevs</a> non-profit offered a <a href="https://icdevs.org/bounties/2023/01/09/36-Signing-Tree-and-DER-Encoding.html">development bounty for a Motoko library implementing the merkle trees involved in certification</a>, this sounded like a fun little coding task, so I completed it; likely with less effort than it would have taken someone who first had to get into these topics.
The bounty was quite generous, at US$ 10k, and I was too vain to “just” have it donated to some large charity, as I <a href="//www.joachim-breitner.de/blog/798-Pro-charity_consulting">recently did with a few coding and consulting gigs</a>, and looked for something more personal. So, the ICDevs guys and I agreed to donate the money to <a href="https://www.cs.uoregon.edu/research/summerschool/summer23/">this year’s OPLSS</a>, where I heard it can cover the cost of about 8 students, and hopefully helps the PL cause.
(You will not find us listed as sponsors because for some reason, a “donation” instead of a “sponsorship” comes with fewer strings attached for the organizers.)</description>
<pubDate>Sun, 04 Jun 2023 14:18:33 +0200</pubDate>
</item>
<item>
<title>More thoughts on a bootstrappable GHC</title>
<link>https://www.joachim-breitner.de/blog/802-More_thoughts_on_a_bootstrappable_GHC</link>
<guid>https://www.joachim-breitner.de/blog/802-More_thoughts_on_a_bootstrappable_GHC</guid>
<comments>https://www.joachim-breitner.de/blog/802-More_thoughts_on_a_bootstrappable_GHC#comments</comments>
<author>mail@joachim-breitner.de (Joachim Breitner)</author>
<description>The <a href="https://bootstrappable.org/">bootstrappable builds project</a> tries to find ways of building all our software from source, without relying on binary artifacts. A noble goal, and one that is often thwarted by languages with self-hosting compilers, like GHC: In order to build GHC, you need GHC. A <a href="https://github.com/NixOS/nixpkgs/pull/227914">Pull Request against nixpkgs</a>, adding first steps of the bootstrapping pipeline, reminded me of the issue with GHC, about which I have <a href="https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC">noted down some thoughts</a> before, and I played around a bit more.
The most promising attempt to bootstrap GHC was done by <a href="https://elephly.net/posts/2017-01-09-bootstrapping-haskell-part-1.html">rekado in 2017</a>. He observed that Hugs is maybe the most recently maintained bootstrappable (since written in C) Haskell compiler, but noticed that “it cannot deal with mutually recursive module dependencies, which is a feature that even the earliest versions of GHC rely on. This means that running a variant of GHC inside of Hugs is not going to work without major changes.” He then tries to bootstrap another very old Haskell compiler (nhc) with Hugs, and makes good but incomplete progress.
This made me wonder: What if Hugs supported mutually recursive modules? Would that make a big difference? Anthony Clayden <a href="https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC#comment_3">keeps advocating Hugs as a viable Haskell implementation</a>, so if that was the main blocker, maybe adding support for it to Hugs is not too hard (at least in a compile-the-strongly-connected-component-as-one-unit mode) and worthwhile?
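To make the “compile the strongly connected component as one unit” idea concrete, here is a small Python sketch that groups modules of an import graph into strongly connected components using Tarjan’s algorithm. The module names in <code>imports</code> are made up for illustration and say nothing about how Hugs or GHC are actually structured.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each module to the modules it imports.

    Components come out dependencies-first, i.e. in a valid build order.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    result = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return result


# Hypothetical import graph: Parser and Lexer import each other.
imports = {
    "Main": ["Parser", "Types"],
    "Parser": ["Lexer", "Types"],
    "Lexer": ["Parser"],
    "Types": [],
}
units = strongly_connected_components(imports)
print(units)
```

Each inner list would then be handed to the compiler as a single unit; only components with more than one module need the combined treatment.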
<h3 id="all-of-ghc-in-one-file">All of GHC in one file?</h3>
That reminded me of a situation I was in before, where I had to combine multiple Haskell modules into one: For my talk <a href="https://www.youtube.com/watch?v=2kKvVe673MA">“Lock-step simulation is child’s play”</a> I wrote a multi-player game, a simulation environment for it, and a presentation tool around it, all in the <a href="https://code.world">CodeWorld</a> programming environment, which supports only a single module. So I hacked together a small tool, <a href="https://github.com/nomeata/hs-all-in-one/">hs-all-in-one</a>, that takes multiple Haskell modules and combines them into one, mangling the names to avoid name clashes.
This made me wonder: Can I turn all of GHC into one module, and compile that?
At this point I have probably left the direct path towards bootstrapping, but I kinda got hooked.
<ol type="1">
<li>Using GHC’s <code>hadrian/ghci</code> tool, I got it to produce the necessary generated files (e.g. from <code>happy</code> grammars) and spit out the list of modules that make up GHC, which I could feed to <code>hs-all-in-one</code>.</li>
<li>It uses <a href="https://hackage.haskell.org/package/haskell-src-exts"><code>haskell-src-exts</code></a> for parsing, and it was almost able to parse all of that. It has a different opinion about how <a href="https://github.com/haskell-suite/haskell-src-exts/issues/468"><code>MultiWayIf</code> should be indented</a> and whether <a href="https://github.com/haskell-suite/haskell-src-exts/issues/469"><code>EmptyCase</code> needs <code>{}</code></a>, and it has <a href="https://github.com/haskell-suite/haskell-src-exts/issues/470">issues</a> <a href="https://github.com/haskell-suite/haskell-src-exts/issues/471">pretty-printing</a> some promoted values, but otherwise the round-tripping worked fine, and I was able to generate a large file (680,000 loc, 41 MB) that passes GHC’s parser.</li>
<li>It also uses <a href="https://hackage.haskell.org/package/haskell-names"><code>haskell-names</code></a> to resolve names.
This library is less up-to-date with various Haskell features, so I added support for renaming in some pragmas (<code>ANN</code>, <code>SPECIALIZE</code>), pattern signatures etc.
For my previous use-case I could just combine all the imports, knowing that I would not introduce conflicts. For GHC, this is far from true: Some modules import <code>Data.Map.Strict</code>, others <code>Data.Map.Lazy</code>, and yet others introduce names that clash with stuff imported from the prelude… so I had to change the tool to fully qualify all imported values. This isn’t so bad, I can do that using <code>haskell-names</code>, if I somehow know what all the modules in <code>base</code>, <code>containers</code>, <code>transformers</code> and <code>array</code> export.
The <code>haskell-names</code> library itself comes with a hard-coded database of <code>base</code> exports, but it is incomplete and doesn’t help me with, say, <code>containers</code>.
I then wrote a little parser for the <code>.txt</code> files that <code>haddock</code> produces for the benefit of <code>hoogle</code>, and that are conveniently installed along the packages (at least on nix). This would have been great, if these files didn’t simply omit all reexported entities! I added some manual hacks (<code>Data.Map</code> has the same exports as <code>Data.IntMap</code>; <code>Prelude</code> exports all entities as known by <code>haskell-names</code>, but those that are also exported from <code>Data.List</code>, use the symbol from there…)
I played this game of whack-a-mole for a while, solving many of the problems that GHC’s renamer reports, but eventually stopped to write this blog post. I am fairly confident that this could be pulled through, though.</li>
</ol>
<h3 id="back-to-bootstrapping">Back to bootstrapping</h3>
So what if we could pull this through? We’d have a very large code file that GHC may be able to compile to produce a <code>ghc</code> binary without exhausting my RAM. But that doesn’t help with bootstrapping yet.
If lack of support for recursive modules were all that Hugs is missing, we’d be done indeed. But quite to the contrary, it is probably the least of our worries, given that contemporary GHC uses many, many other features not supported by Hugs.
Some of them are syntactic and can easily be rewritten to more normal Haskell in a preprocessing step (e.g. <code>MultiWayIf</code>).
Others are deep and hard (<code>GADTs</code>, Pattern synonyms, Type Families), and prohibit attempting to compile a current version of GHC (even if it’s all one module) with Hugs. So one would certainly have to go back in time and find a version of GHC that is not yet using all these features. For example, the <a href="https://gitlab.haskell.org/ghc/ghc/-/commit/889c084e943779e76d19f2ef5e970ff655f511eb">first use of GADTs</a> was introduced by Simon Marlow in 2011, so this suggests going back to at least GHC 7.0.4, maybe earlier.
Still, being able to mangle the source code before passing it to Hugs is probably a useful thing. This raises the question of whether Hugs can compile such a tool; in particular, whether it is capable of compiling <code>haskell-src-exts</code>, which I am not too optimistic about either. Did someone check this already?
So one plan of attack could be
<ol type="1">
<li>Identify an old version of GHC that
<ul>
<li>One can use to bootstrap subsequent versions until today.</li>
<li>Is old enough to use as few features not supported by Hugs as possible.</li>
<li>Is still new enough so that one can obtain a compatible toolchain.</li>
</ul></li>
<li>Wrangle the build system to tell you which files to compile, with which preprocessor flags etc.</li>
<li>Bootstrap all pre-processing tools used by GHC (<code>cpphs</code> or use plain cpp, <code>happy</code>, <code>alex</code>).</li>
<li>For every language feature not supported by Hugs, either
<ul>
<li>Implement it in Hugs,</li>
<li>Manually edit the source code to avoid compiling the problematic code, if it is optional (e.g. in an optimization pass)</li>
<li>Rewrite the problematic code</li>
<li>Write a pre-processing tool (like the one above) that compiles the feature away</li>
</ul></li>
<li>Similarly, since Hugs probably ships a <code>base</code> that is different from what GHC, or the libraries used by GHC, expect, either adjust Hugs’ <code>base</code>, or modify the GHC code that uses it.</li>
</ol>
My actual plan, though, for now is to throw these thoughts out, maybe make some noise on <a href="https://discourse.haskell.org/t/what-s-needed-to-bootstrap-ghc-with-hugs/6205">Discourse</a>, <a href="https://mastodon.online/@nomeata/110263917613134533">Mastodon</a>, <a href="https://twitter.com/nomeata/status/1651125309700206593">Twitter</a> and <a href="https://lobste.rs/s/di37ga">lobste.rs</a>, and then let it sit and hope someone else will pick it up.</description>
<pubDate>Wed, 26 Apr 2023 07:41:50 +0200</pubDate>
</item>
</channel>
</rss>

