FEED Validator

for Atom and RSS and KML

Congratulations!

This is a valid RSS feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 33, column 0: Use of unknown namespace: com-wordpress:feed-additions:1 (11 occurrences) [help]
```
<site xmlns="com-wordpress:feed-additions:1">155530815</site>	<item>
```
line 98, column 0: content:encoded should not contain decoding attribute (2 occurrences) [help]
```
<img decoding="async" src="https://upload.wikimedia.org/wikipedia/commons ...
```
line 103, column 0: content:encoded should not contain data-recalc-dims attribute [help]
```
<img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/imgs.xk ...
```
line 299, column 3: content:encoded should not contain relative URL references: #fn:1 (2 occurrences) [help]
```
]]></content:encoded>
   ^
```
line 1062, column 0: style attribute contains potentially dangerous content: max-height (4 occurrences) [help]
```
And I haven&#8217;t even mentioned asynchronous ranges yet. But  ...
```

line 2049, column 0: Non-html tag: some-type (2 occurrences) [help]

<p>But as before, this isn&#8217;t enough; we would also need to overload <c ...

Source: http://ericniebler.com/feed/

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:georss="http://www.georss.org/georss"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
>
<channel>
<title>Eric Niebler</title>
<atom:link href="http://ericniebler.com/feed/" rel="self" type="application/rss+xml" />
<link>https://ericniebler.com</link>
<description>Judge me by my C++, not my WordPress</description>
<lastBuildDate>Thu, 15 Feb 2024 02:09:29 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<generator>https://wordpress.org/?v=6.8.1</generator>
<image>
<url>https://i0.wp.com/ericniebler.com/wp-content/uploads/2024/08/cropped-favicon-big-transparent.png?fit=32%2C32&amp;ssl=1</url>
<title>Eric Niebler</title>
<link>https://ericniebler.com</link>
<width>32</width>
<height>32</height>
</image>
<site xmlns="com-wordpress:feed-additions:1">155530815</site> <item>
<title>What are Senders Good For, Anyway?</title>
<link>https://ericniebler.com/2024/02/04/what-are-senders-good-for-anyway/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=what-are-senders-good-for-anyway</link>
<comments>https://ericniebler.com/2024/02/04/what-are-senders-good-for-anyway/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Mon, 05 Feb 2024 07:46:42 +0000</pubDate>
<category><![CDATA[concurrency]]></category>
<category><![CDATA[coroutines]]></category>
<category><![CDATA[generic-programming]]></category>
<guid isPermaLink="false">https://ericniebler.com/?p=3174</guid>
<description><![CDATA[Some colleagues of mine and I have been working for the past few years to give C++ a standard async programming model. That effort has resulted in a proposal, P2300, that has been design-approved for C++26. If you know me <a class="more-link" href="https://ericniebler.com/2024/02/04/what-are-senders-good-for-anyway/">Continue reading What are Senders Good For, Anyway?→</a>]]></description>
<content:encoded><![CDATA[Some colleagues of mine and I have been working for the past few years to give C++ a standard async programming model. That effort has resulted in a proposal, <a href="https://wg21.link/p2300">P2300</a>, that has been design-approved for C++26. If you know me or if you follow me on social media, you know I’m embarrassingly excited about this work and its potential impact. I am aware, however, that not everybody shares my enthusiasm. I hear these things a lot these days:
<blockquote>
Why would I want to use it?
Why do we need senders when C++ has coroutines?
It’s all just too complicated!
</blockquote>
The P2300 crew have collectively done a terrible job of making this work accessible. At the heart of P2300 is a simple, elegant (IMHO) core that brings many benefits, but it’s hard to see that forest for all the trees.
So let’s make this concrete. In this post, I’ll show how to bring a crusty old C-style async API into the world of senders, and why you might want to do that.
<h1>C(lassic) async APIs</h1>
In a past life I did a lot of Win32 programming. Win32 has several async models to choose from, but the simplest is the good ol’ callback. Async IO APIs like <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-readfileex"><code>ReadFileEx</code></a> are shaped like this:
<pre class="brush: cpp; notranslate">/// Old-style async C API with a callback
/// (like Win32's ReadFileEx)
struct overlapped {
// ...OS internals here...
};
using overlapped_callback =
void(int status, int bytes, overlapped* user);
int read_file(FILE*, char* buffer, int bytes,
overlapped* user, overlapped_callback* cb);
</pre>
The protocol is pretty simple: you call <code>read_file</code> passing in the usual arguments plus two extra: a pointer to an “<code>overlapped</code>” structure and a callback. The OS will use the <code>overlapped</code> struct for its own purposes, but the user can stuff data there too that the callback can later use. It looks like this:
<pre class="brush: cpp; notranslate">struct my_overlapped : overlapped {
// ... my extra data goes here ...
};
void my_callback(int status, int bytes, overlapped* data) {
auto* my_data = static_cast<my_overlapped*>(data);
// ...use the extra stuff we put in the `my_overlapped`
// object...
delete my_data; // clean up
}
void enqueue_read(FILE* pfile) {
// Allocate and initialize my_data...
auto* my_data =
new my_overlapped{{}, /* my data goes here */ };
int status =
read_file(pfile, buff, bytes, my_data, my_callback);
// ...
}
</pre>
What happens is this: <code>read_file</code> causes the OS to enqueue an IO operation, saving the <code>overlapped</code> and <code>overlapped_callback</code> pointers with it. When the IO completes, the OS invokes the callback, passing in the pointer to the <code>overlapped</code> struct. I’ve written code like this hundreds of times. You probably have too.
It’s simple. It works. Why make this more complicated, right?
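To watch the protocol run end to end, here is a minimal sketch with a fake OS. Everything in it is an assumption made for illustration: the "file" is a C string rather than a <code>FILE*</code>, and <code>io_queue</code>, <code>run_io</code>, and <code>demo_callback_read</code> are hypothetical helpers, not part of any real API.

```cpp
#include <cassert>
#include <cstring>
#include <deque>
#include <string>

// Hypothetical stand-ins for the C API above; this is not a real OS interface.
constexpr int OK = 0;
struct overlapped { /* OS internals would live here */ };
using overlapped_callback = void(int status, int bytes, overlapped* user);

// One queued request in the fake OS. The "file" is just a C string.
struct pending_read {
    const char* source;
    char* buffer;
    int bytes;
    overlapped* user;
    overlapped_callback* cb;
};

std::deque<pending_read>& io_queue() {
    static std::deque<pending_read> queue;
    return queue;
}

// Fake read_file: records the request and returns immediately.
int read_file(const char* source, char* buffer, int bytes,
              overlapped* user, overlapped_callback* cb) {
    io_queue().push_back({source, buffer, bytes, user, cb});
    return OK;
}

// Fake OS: completes every pending read, then invokes its callback.
void run_io() {
    while (!io_queue().empty()) {
        pending_read req = io_queue().front();
        io_queue().pop_front();
        int n = static_cast<int>(std::strlen(req.source));
        if (n > req.bytes) n = req.bytes;
        std::memcpy(req.buffer, req.source, static_cast<std::size_t>(n));
        req.cb(OK, n, req.user);
    }
}

struct my_overlapped : overlapped {
    char buffer[16];
    std::string* out;  // extra user data for the callback
};

void my_callback(int status, int bytes, overlapped* data) {
    auto* my_data = static_cast<my_overlapped*>(data);
    if (status == OK)
        my_data->out->assign(my_data->buffer, static_cast<std::size_t>(bytes));
    delete my_data;  // clean up
}

std::string demo_callback_read() {
    std::string result;
    auto* my_data = new my_overlapped{};
    my_data->out = &result;
    int status = read_file("hello", my_data->buffer, 16, my_data, my_callback);
    assert(status == OK);
    assert(result.empty());  // only enqueued so far; the callback has not run
    run_io();                // the fake OS completes the read and calls back
    return result;
}
```

Calling <code>demo_callback_read()</code> returns the text that the fake read delivered through the callback. Note that the heap allocation and the matching <code>delete</code> are the caller's problem here.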
<h1>C(onfusion of) async APIs</h1>
There’s nothing wrong with the callback API. What’s wrong is that every library that exposes asynchrony uses a slightly different callback API. If you want to chain two async operations from two different libraries, you’re going to need to write a bunch of glue code to map this async abstraction to that async abstraction. It’s the Tower of Babel problem.
<blockquote>
<img decoding="async" src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/50/Pieter_Bruegel_the_Elder_-_The_Tower_of_Babel_%28Vienna%29_-_Google_Art_Project.jpg/800px-Pieter_Bruegel_the_Elder_-_The_Tower_of_Babel_%28Vienna%29_-_Google_Art_Project.jpg" alt="Pieter Brueghel the Elder, Public domain, via Wikimedia Commons" /> Pieter Brueghel the Elder, Public domain, via Wikimedia Commons
“Look, they are one people, and they have all one language, and this is only the beginning of what they will do; nothing that they propose to do will now be impossible for them. Come, let us go down and confuse their language there, so that they will not understand one another’s speech.” — Genesis 11:6–7
</blockquote>
So, when there are too many incompatible ways to do a thing, what do we do? We make another of course.
<blockquote>
<img data-recalc-dims="1" decoding="async" src="https://i0.wp.com/imgs.xkcd.com/comics/standards.png?w=530&ssl=1" alt="xkcd comic about how standards proliferate" /> <a href="https://xkcd.com/927/">https://xkcd.com/927/</a>
</blockquote>
The way to escape this trap is for the C++ standard to endorse one async abstraction. Then all the libraries that expose asynchrony can map their abstractions to the standard one, and we’ll all be able to talk to each other. Babel solved.
So that is why the C++ Standardization Committee is interested in this problem. Practically speaking, it’s a problem that can only be solved by the standard.
<h1>C(omposable) async APIs</h1>
Which brings us to senders, the topic of P2300. There’s a lot I could say about them (they’re efficient! they’re structured! they compose!), but instead I’m going to show some code and let the code do the talking.
If we look at the <code>read_file</code> API above, we can identify some different parts:
<ol>
<li>The allocation of any resources the async operation needs,</li>
<li>The data that must live at a stable address for the duration of the operation (i.e., the <code>overlapped</code> structure),</li>
<li>The initiation of the async operation that enqueues the async IO, and the handling of any initiation failure,</li>
<li>The user-provided continuation that is executed after the async operation completes (i.e., the callback),</li>
<li>The reclamation of any resources allocated in step 1.</li>
</ol>
Senders have all these pieces too, but in a uniform shape that makes it possible to work with them generically. In fact, the only meaningful difference between senders and the C-style API is that instead of one callback, in senders there are three: one each for success, failure, and cancellation.
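As a rough picture of "three callbacks instead of one", here is a hand-rolled receiver. It is a sketch, not the <code>std::execution</code> receiver concept: the <code>CANCELED</code> status code, <code>logging_receiver</code>, and the <code>complete</code> helper are all assumptions invented for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical status codes; the post only names OK.
constexpr int OK = 0;
constexpr int CANCELED = -2;

// A hand-rolled receiver: one object bundling three callbacks.
struct logging_receiver {
    std::string* log;
    void set_value(int bytes, char* /*buffer*/) && {
        assert(bytes >= 0);
        *log += "value(" + std::to_string(bytes) + ")";
    }
    void set_error(int status) && {
        *log += "error(" + std::to_string(status) + ")";
    }
    void set_stopped() && { *log += "stopped"; }
};

// Routes one C-style status code to exactly one of the three channels.
template <class Receiver>
void complete(Receiver rcvr, int status, int bytes, char* buffer) {
    if (status == OK)
        std::move(rcvr).set_value(bytes, buffer);
    else if (status == CANCELED)
        std::move(rcvr).set_stopped();
    else
        std::move(rcvr).set_error(status);
}

std::string demo_three_channels() {
    std::string log;
    char buffer[4] = {};
    complete(logging_receiver{&log}, OK, 4, buffer);
    log += ' ';
    complete(logging_receiver{&log}, 5, 0, buffer);
    log += ' ';
    complete(logging_receiver{&log}, CANCELED, 0, buffer);
    return log;
}
```

Each call to <code>complete</code> consumes its receiver and invokes exactly one of the three callbacks, which is the contract the rest of the post relies on.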
<h2>Step 1: The Allocation</h2>
Our re-imagined <code>read_file</code> API will look like this:
<pre class="brush: cpp; notranslate">read_file_sender
async_read_file(FILE* file, char* buffer, int size)
{
return {file, buffer, size};
}
</pre>
The only job of this function is to put the arguments into a sender-shaped object, which looks as follows (explanation after the break)[*]:
[*]: Or rather, it will look like this after <a href="https://wg21.link/P2855">P2855</a> is accepted.
<pre class="brush: cpp; notranslate">namespace stdex = std::execution;
struct read_file_sender
{
using sender_concept = stdex::sender_t; // (1)
using completion_signatures = // (2)
stdex::completion_signatures<
stdex::set_value_t( int, char* ),
stdex::set_error_t( int ) >;
auto connect( stdex::receiver auto rcvr ) // (3)
{
return read_file_operation{{}, {}, pfile, buffer,
size, std::move(rcvr)};
}
FILE* pfile;
char* buffer;
int size;
};
</pre>
The job of a sender is to describe the asynchronous operation. (It is also a factory for the operation state, but that’s step 2.) On the line marked “(1)”, we declare this type to be a sender. On the line marked “(2)”, we declare the ways in which this asynchronous operation can complete. We do this using a list of function types. This:
<pre class="brush: cpp; notranslate">stdex::set_value_t( int, char* )
</pre>
… declares that this async operation may complete successfully by passing an <code>int</code> and a <code>char*</code> to the value callback. (Remember, there are three callbacks.) And this:
<pre class="brush: cpp; notranslate">stdex::set_error_t( int )
</pre>
… declares that this async operation may complete in error by passing an <code>int</code> to the error callback. (If this async operation were cancelable, it would declare that with <code>stdex::set_stopped_t()</code>.)
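The "list of function types" idea can be tried out without any library. The sketch below uses hand-rolled tag types and a hypothetical <code>can_complete_with</code> trait (neither is the real <code>std::execution</code> machinery) to show how generic code could ask a sender at compile time how it may complete.

```cpp
#include <type_traits>

// Hand-rolled stand-ins for the std::execution tag types and signature list.
struct set_value_t {};
struct set_error_t {};
struct set_stopped_t {};

template <class... Sigs>
struct completion_signatures {};

// Hypothetical trait: true if Sig is one of the function types in List.
template <class Sig, class List>
struct can_complete_with;

template <class Sig, class... Sigs>
struct can_complete_with<Sig, completion_signatures<Sigs...>>
    : std::disjunction<std::is_same<Sig, Sigs>...> {};

// The same two signatures that read_file_sender declares.
using read_file_signatures =
    completion_signatures<set_value_t(int, char*), set_error_t(int)>;

static_assert(
    can_complete_with<set_value_t(int, char*), read_file_signatures>::value,
    "success passes an int and a char*");
static_assert(
    can_complete_with<set_error_t(int), read_file_signatures>::value,
    "failure passes an int");
static_assert(
    !can_complete_with<set_stopped_t(), read_file_signatures>::value,
    "this operation does not declare cancellation");

constexpr bool read_file_is_cancelable() {
    return can_complete_with<set_stopped_t(), read_file_signatures>::value;
}
```

Each signature is an ordinary function type whose "return type" is the channel tag and whose parameters are the values delivered on that channel, so the whole check happens at compile time.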
<h2>Step 2: The Data</h2>
On the line marked “(3)” above, the <code>connect</code> member function accepts a “receiver” and returns an “operation state”. A receiver is an amalgamation of three callbacks: value, error, and stopped (canceled, more or less). The result of connecting a sender and a receiver is an operation state. The operation state, like the <code>overlapped</code> struct in the C API, is the data for the async operation. It must live at a stable address for the duration of the operation.
The <code>connect</code> function returns a <code>read_file_operation</code> object. The caller of <code>connect</code> assumes responsibility for ensuring that this object stays alive and doesn’t move until one of the callbacks is executed. The <code>read_file_operation</code> type looks like this (explanation after the break):
<pre class="brush: cpp; notranslate">struct immovable {
immovable() = default;
immovable(immovable&&) = delete;
};
template <class Receiver>
struct read_file_operation : overlapped, immovable // (1)
{
static void _callback(int status, int bytes, // (2)
overlapped* data)
{
auto* op =
static_cast<read_file_operation*>(data); // (3)
if (status == OK)
stdex::set_value(std::move(op->rcvr), // (4)
bytes, op->buffer);
else
stdex::set_error(std::move(op->rcvr),
status);
}
void start() noexcept // (5)
{
int status =
read_file(pfile, buffer, size, this, &_callback);
if (status != OK)
stdex::set_error(std::move(rcvr), status);
}
FILE* pfile;
char* buffer;
int size;
Receiver rcvr;
};
</pre>
The operation state stores the arguments needed to initiate the async operation as well as the receiver (the three callbacks). Let’s break this down by line.
<ul>
<li>“(1)”: The operation state inherits from <code>overlapped</code> so we can pass a pointer to it into <code>read_file</code>. It also inherits from an <code>immovable</code> struct. Although not strictly necessary, this ensures we don’t move the operation state by accident.</li>
<li>“(2)”: We define the <code>overlapped_callback</code> that we will pass to <code>read_file</code> as a class <code>static</code> function.</li>
<li>“(3)”: In the callback, we down-cast the <code>overlapped</code> pointer back into a pointer to the <code>read_file_operation</code> object.</li>
<li>“(4)”: In the callback, we check the <code>status</code> to see if the operation completed successfully or not, and we call <code>set_value</code> or <code>set_error</code> on the receiver as appropriate.</li>
<li>“(5)”: In the <code>start()</code> function — which all operation states must have — we actually initiate the read operation. If the initiation fails, we pass the error to the receiver immediately since the callback will never execute.</li>
</ul>
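The pieces above can be exercised without stdexec or a real OS. The following is a self-contained sketch under several assumptions: the "file" is a C string, <code>io_queue</code> and <code>run_io</code> fake the operating system, <code>BAD_FILE</code> is an invented error code, and the receiver uses plain member functions instead of the <code>stdex::set_value</code> and <code>stdex::set_error</code> calls shown in the post. It needs C++17 so that the immovable operation state can be returned by value.

```cpp
#include <cassert>
#include <cstring>
#include <deque>
#include <string>
#include <utility>

// Hypothetical fake of the C API: the "file" is a C string, the "OS" a queue.
constexpr int OK = 0;
constexpr int BAD_FILE = -1;
struct overlapped {};
using overlapped_callback = void(int status, int bytes, overlapped* user);

struct pending_read {
    const char* source; char* buffer; int size;
    overlapped* user; overlapped_callback* cb;
};
std::deque<pending_read>& io_queue() {
    static std::deque<pending_read> queue;
    return queue;
}
int read_file(const char* source, char* buffer, int size,
              overlapped* user, overlapped_callback* cb) {
    if (source == nullptr) return BAD_FILE;  // initiation failure
    io_queue().push_back({source, buffer, size, user, cb});
    return OK;
}
void run_io() {  // fake OS: complete each read, then call back
    while (!io_queue().empty()) {
        pending_read req = io_queue().front();
        io_queue().pop_front();
        int n = static_cast<int>(std::strlen(req.source));
        if (n > req.size) n = req.size;
        std::memcpy(req.buffer, req.source, static_cast<std::size_t>(n));
        req.cb(OK, n, req.user);
    }
}

struct immovable {
    immovable() = default;
    immovable(immovable&&) = delete;
};

// Same shape as the operation state above, with member-function receivers.
template <class Receiver>
struct read_file_operation : overlapped, immovable {
    read_file_operation(const char* s, char* b, int n, Receiver r)
        : source(s), buffer(b), size(n), rcvr(std::move(r)) {}

    static void _callback(int status, int bytes, overlapped* data) {
        auto* op = static_cast<read_file_operation*>(data);
        if (status == OK) std::move(op->rcvr).set_value(bytes, op->buffer);
        else std::move(op->rcvr).set_error(status);
    }
    void start() noexcept {
        int status = read_file(source, buffer, size, this, &_callback);
        if (status != OK) std::move(rcvr).set_error(status);
    }
    const char* source; char* buffer; int size;
    Receiver rcvr;
};

struct read_file_sender {
    template <class Receiver>
    read_file_operation<Receiver> connect(Receiver rcvr) {
        return {source, buffer, size, std::move(rcvr)};  // built in place
    }
    const char* source; char* buffer; int size;
};

// A test receiver that records whichever completion it gets.
struct recording_receiver {
    std::string* log;
    void set_value(int bytes, char* buffer) && {
        *log = "value:" + std::string(buffer, static_cast<std::size_t>(bytes));
    }
    void set_error(int status) && { *log = "error:" + std::to_string(status); }
    void set_stopped() && { *log = "stopped"; }
};

std::string demo_sender_read(const char* source) {
    char buffer[16];
    std::string log;
    read_file_sender sndr{source, buffer, 16};
    auto op = sndr.connect(recording_receiver{&log});
    assert(log.empty() && io_queue().empty());  // connect() starts no work
    op.start();  // only now is the read enqueued
    run_io();
    return log;
}
```

<code>demo_sender_read("hello")</code> takes the success path through <code>_callback</code>, while <code>demo_sender_read(nullptr)</code> fails at initiation and reports the error straight from <code>start()</code>. In both cases the operation state lives on the caller's stack and nothing is heap-allocated by the sender itself.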
<h2>Step 3: The Initiation</h2>
You’ll notice that when we call the sender-ified <code>async_read_file</code> function, we are just constructing a sender. No actual work is started. Then we call <code>connect</code> with a receiver and get back an operation state, but still no work has been started. We’ve just been lining up our ducks, making sure everything is in place at a stable address so we can initiate the work. Work isn’t initiated until <code>.start()</code> is called on the operation state. Only then do we make a call to the C-style <code>read_file</code> API, thereby enqueuing an IO operation.
All this hoop-jumping becomes important once we start building pipelines and task graphs of senders. Separating the launch of the work from the construction of the operation state lets us aggregate lots of operation states into one that contains all the data needed by the entire task graph, swiveling everything into place before any work gets started. That means we can launch lots of async work with complex dependencies with only a single dynamic allocation or, in some cases, no allocations at all.
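To make the aggregation claim concrete, here is a small sketch of a two-stage pipeline whose entire state ends up in one stack object. The names <code>just_sender</code>, <code>then_sender</code>, <code>then</code>, and <code>store_receiver</code> are hand-rolled stand-ins inspired by the P2300 algorithms, not the real ones, and the example assumes C++17.

```cpp
#include <cassert>
#include <utility>

struct immovable {
    immovable() = default;
    immovable(immovable&&) = delete;
};

// Hand-rolled stand-in for a sender that completes immediately with an int.
template <class Receiver>
struct just_operation : immovable {
    just_operation(int v, Receiver r) : value(v), rcvr(std::move(r)) {}
    void start() noexcept { std::move(rcvr).set_value(value); }
    int value;
    Receiver rcvr;
};
struct just_sender {
    int value;
    template <class Receiver>
    just_operation<Receiver> connect(Receiver rcvr) {
        return {value, std::move(rcvr)};
    }
};

// Receiver adaptor: transforms the value, forwards the other two channels.
template <class Fn, class Receiver>
struct then_receiver {
    Fn fn;
    Receiver rcvr;
    void set_value(int v) && { std::move(rcvr).set_value(fn(v)); }
    void set_error(int e) && { std::move(rcvr).set_error(e); }
    void set_stopped() && { std::move(rcvr).set_stopped(); }
};

// Sender adaptor: connecting it connects the child with a wrapped receiver,
// so the child's operation state carries the whole chain inline.
template <class Sender, class Fn>
struct then_sender {
    Sender sndr;
    Fn fn;
    template <class Receiver>
    auto connect(Receiver rcvr) {
        return sndr.connect(then_receiver<Fn, Receiver>{fn, std::move(rcvr)});
    }
};

template <class Sender, class Fn>
then_sender<Sender, Fn> then(Sender sndr, Fn fn) {
    return {std::move(sndr), std::move(fn)};
}

struct store_receiver {
    int* out;
    void set_value(int v) && { *out = v; }
    void set_error(int) && {}
    void set_stopped() && {}
};

int demo_pipeline() {
    int result = 0;
    auto pipeline = then(then(just_sender{20}, [](int v) { return v + 1; }),
                         [](int v) { return v * 2; });
    auto op = pipeline.connect(store_receiver{&result});  // one stack object
    assert(result == 0);  // building and connecting ran nothing
    op.start();           // runs (20 + 1) * 2
    return result;
}
```

The variable <code>op</code> holds the value, both lambdas, and the final receiver in a single object, which is the sense in which a task graph can be set up before any work starts and without dynamic allocation.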
Now I have to fess up. I fibbed a little when I said that the caller of <code>connect</code> needs to keep the operation state alive at a stable address until one of the callbacks is executed. That only becomes true once <code>.start()</code> has been called on it. It is perfectly acceptable to connect a sender to a receiver and then drop the operation state on the floor as long as <code>.start()</code> hasn’t been called yet. But with <code>.start()</code> you’re committed. <code>.start()</code> launches the rockets. There’s no calling them back.
OK, we’ve constructed the operation state, and we’ve called <code>.start()</code> on it. Now the proverbial ball is in the operating system’s court.
<h2>Step 4: The Continuation</h2>
The operating system does its IO magic. Time passes. When the IO operation is finished, the OS will invoke the <code>_callback</code> function with a status code, a pointer to the <code>overlapped</code> struct (our <code>read_file_operation</code>) and, if successful, the number of bytes read. The <code>_callback</code> passes the completion information to the receiver that was <code>connect</code>-ed to the sender, and the circle is complete.
But wait, what about “Step 5: Deallocation”? We never really allocated anything in the first place! The <code>connect</code> function returned the operation state by value. It’s up to the caller of <code>connect</code>, whoever that is, to keep it alive. They may do that by putting it on the heap, in which case they are responsible for cleaning it up. Or, if this async operation is part of a task graph, they may do that by aggregating the operation state into a larger one.
<h2>Step 6: Profit!</h2>
At this point you may be wondering what’s the point to all of this. Senders and receivers, operation states with fiddly lifetime requirements, <code>connect</code>, <code>start</code>, three different callbacks — who wants to manage all of this? The C API was way simpler. It’s true! So why am I so unreasonably excited about all of this?
<blockquote>
The caller of <code>async_read_file</code> doesn’t need to care about any of that.
</blockquote>
The end user, the caller of <code>async_read_file</code>, is not going to be mucking about with receivers and operation states. They are going to be awaiting senders in coroutines. Look! The following code uses a coroutine task type from the <a href="https://github.com/NVIDIA/stdexec">stdexec</a> library.
<pre class="brush: cpp; notranslate">exec::task< std::string > process_file( FILE* pfile )
{
std::string str;
str.resize( MAX_BUFFER );
auto [bytes, buff] =
co_await async_read_file(pfile, str.data(), str.size());
str.resize( bytes );
co_return str;
}
</pre>
What wizardry is this? We wrote a sender, not an awaitable, right? But this is working code! See for yourself: <a href="https://godbolt.org/z/1rjsxWxh7">https://godbolt.org/z/1rjsxWxh7</a>.
This is where we get to reap the benefits of programming to a standard async model. Generic code, whether from the standard library or from third party libraries, will work with our senders. In the case above, the <code>task<></code> type from the stdexec library knows how to await anything that looks like a sender. If you have a sender, you can <code>co_await</code> it without doing any extra work.
P2300 comes with a small collection of generic async algorithms for common async patterns — things like chaining (<code>then</code>), dynamically picking the next task (<code>let_value</code>), grouping senders into one (<code>when_all</code>), and blocking until a sender is complete (<code>sync_wait</code>). It’s a paltry set to be sure, but it will grow with future standards. And as third party libraries begin to adopt this model, more and more async code will work together. Some day very soon you’ll be able to initiate some file IO, read from the network, wait on a timer, listen for a cancellation from the user, wait for them all to complete, and then transfer execution to a thread pool to do more work — whew! — even if each sender and the thread pool came from different libraries.
<h2>Senders, FTW!</h2>
So, back to those pernicious questions folks keep asking me.
<blockquote>
Why would I want to use it?
</blockquote>
You want to use senders because then you can stitch your async operations together with other operations from other libraries using generic algorithms from still other libraries. And so you can <code>co_await</code> your async operations in coroutines without having to write an additional line of code.
<blockquote>
Why do we need senders when C++ has coroutines?
</blockquote>
I hope you realize by now that this isn’t an either/or. Senders are part of the coroutine story. If your library exposes asynchrony, then returning a sender is a great choice: your users can await the sender in a coroutine if they like, or they can avoid the coroutine frame allocation and use the sender with a generic algorithm like <code>then()</code> or <code>when_all()</code>. The lack of allocations makes senders an especially good choice for embedded developers.
<blockquote>
It’s all just too complicated!
</blockquote>
I’ll grant that implementing senders is more involved than using ordinary C-style callbacks. But consuming senders is as easy as typing <code>co_await</code>, or as simple as passing one to an algorithm like <code>sync_wait()</code>. Opting in to senders is opting into an ecosystem of reusable code that will grow over time.
THAT is why I’m excited about senders.
(And be honest, was wrapping <code>read_file</code> in a sender really all that hard, after all?)
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2024/02/04/what-are-senders-good-for-anyway/feed/</wfw:commentRss>
<slash:comments>24</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">3174</post-id> </item>
<item>
<title>Asynchronous Stacks and Scopes</title>
<link>https://ericniebler.com/2021/08/29/asynchronous-stacks-and-scopes/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=asynchronous-stacks-and-scopes</link>
<comments>https://ericniebler.com/2021/08/29/asynchronous-stacks-and-scopes/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Sun, 29 Aug 2021 17:07:08 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">https://ericniebler.com/?p=3161</guid>
<description><![CDATA[In Structured Concurrency, I talk about what structured concurrency is and why it’s a big deal for C++ especially. In this post I discuss some more interesting properties of asynchronous code that is structured: async stacks and async scopes. Structured <a class="more-link" href="https://ericniebler.com/2021/08/29/asynchronous-stacks-and-scopes/">Continue reading Asynchronous Stacks and Scopes→</a>]]></description>
<content:encoded><![CDATA[In <a href="https://ericniebler.com/2020/11/08/structured-concurrency/">Structured Concurrency</a>, I talk about what structured concurrency is and why it’s a big deal for C++ especially. In this post I discuss some more interesting properties of asynchronous code that is structured: async stacks and async scopes.
<h2>Structured concurrency</h2>
Concurrency is structured when “callee” async functions complete before their “caller” functions resume. This can be done without blocking a thread: the caller (parent) launches the callee (child) task and passes it a handle to itself, effectively telling the child, “When you have your result, call me back. Until then, I’m going to sleep.”
Immediately after the parent launches the child, the parent function does an ordinary return, often to something like an event loop that is churning through async tasks.
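A minimal sketch of that handshake, using plain callbacks and a toy single-threaded event loop. The names <code>event_loop</code>, <code>child_async</code>, and <code>parent_async</code> are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical single-threaded event loop churning through queued tasks.
struct event_loop {
    std::deque<std::function<void()>> tasks;
    void post(std::function<void()> task) { tasks.push_back(std::move(task)); }
    void run() {
        while (!tasks.empty()) {
            auto task = std::move(tasks.front());
            tasks.pop_front();
            task();
        }
    }
};

// The child finishes later on the loop, then calls the parent back.
void child_async(event_loop& loop, std::string& trace,
                 std::function<void(int)> resume_parent) {
    loop.post([&trace, resume_parent] {
        trace += "child-done ";
        resume_parent(21);
    });
}

// The parent launches the child, hands it a "call me back" handle,
// and does an ordinary return to the event loop.
void parent_async(event_loop& loop, std::string& trace,
                  std::function<void(int)> done) {
    trace += "parent-start ";
    child_async(loop, trace, [&trace, done](int value) {
        trace += "parent-resume ";
        done(value * 2);
    });
    trace += "parent-return ";
}

std::string demo_structured_callbacks() {
    event_loop loop;
    std::string trace;
    int result = 0;
    parent_async(loop, trace, [&result](int value) { result = value; });
    loop.run();
    assert(result == 42);
    return trace;
}
```

The trace shows the parent returning to the loop before the child finishes, and the parent's continuation running only after the child is done. That ordering is the structure; no thread ever blocks.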
<h2>Async stacks</h2>
When we talk about parent/child async tasks, we are talking about a notional caller/callee relationship: there is a sequence of async operations that has caused the current one to be executing. This chain of operations is exactly like a call stack, but asynchronous. The actual program stack will look nothing like it.
Anyone who has debugged a multithreaded application knows that the actual program stack doesn’t really tell you what you want to know: How did I get here? All it generally shows is that some event loop is currently processing a certain function. The notional async stack tells you why. From the PoV of the event loop, async work is getting scheduled onto it willy-nilly. The structure of the async computation is a higher-level property of your program’s execution.
Or it isn’t, as often is the case in multithreaded C++ applications written today. Until C++20, C++ provided no language support for writing structured async code, and so that code is typically unstructured: no parent/child relationships exist at all. Work is scheduled with fire-and-forget semantics, using ad hoc out-of-band mechanisms to synchronize work, propagate values and errors, and keep data alive. It’s like programming with <code>jmp</code> instructions instead of functions — no stack at all.
<h2>Async scopes</h2>
C++ programmers have simply accepted this state of affairs because they didn’t have anything better. Until C++20 introduced coroutines, that is. Coroutines are transformative, not because the syntax is nice, but because they cause async scopes to coincide with lexical scopes.
What’s an async scope? If an async stack is a chain of async function activations, then an async scope corresponds to the activation of a single async function. It encompasses all the state (variables and whatnot) that needs to live for the duration of an async operation and all of its nested child operations. With callbacks, the async scope spans disjoint lexical scopes: it starts when an async function is called and ends when the callback returns, assuming your code is structured.
If your async code is unstructured, there are no async scopes at all because there’s no notion of child operations that nest within parents. Or you could say there are overlapping scopes. Unsurprisingly, this makes resource management hard, which is why so much async C++ is littered with <code>std::shared_ptr</code>.
<h2>Coroutines</h2>
Which brings us back to coroutines. For coroutines, the async scope starts when the coroutine is first called and it ends when the coroutine returns (or <code>co_return</code>s I should say). Well, that’s just like ordinary functions with ordinary scopes! Which is exactly the point.
Forget that coroutines make async code read like synchronous code. Forget that the syntax is nice. The overwhelming benefit of coroutines in C++ is their ability to make your async scopes line up with lexical scopes, because now we get to leverage everything we already know about functions, scopes, and resource management. Do you need some piece of data to live as long as this async operation? No problem. Make it a local variable in a coroutine.
<h2>Beyond coroutines…</h2>
Coroutines make the idea of structured concurrency obvious by manifesting it in code. We don’t have to worry about notional stacks and scopes.<a href="#fn:1" rel="footnote">1</a> There’s the scope right there, between the curly braces! Here’s the mindbender though: Just as Dorothy could have gone home to Kansas any time she wanted, so too could we have been structuring our async code all along.
Here’s a dirty secret about coroutines: they’re just sugar over callbacks; everything after the <code>co_await</code> in a coroutine is a callback. The compiler makes it so. And damn, we’ve had callbacks forever, we’ve just been misusing them. Structured concurrency has been just three heel-clicks away all this time.
Language support makes it much easier to ensure that child operations nest within parents, but with the right library abstractions, structured concurrency in C++ is totally possible without coroutines — and damn efficient.
Next post, I’ll introduce these library abstractions, which are the subject of the C++ standard proposal <a href="http://wg21.link/P2300">P2300</a>, and what the library abstractions bring over and above C++20 coroutines.
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">
Well, actually we still do until debuggers grok coroutines and can let us view the async stack. <a href="#fnref:1" rev="footnote">↩</a>
</li>
</ol>
</div>
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2021/08/29/asynchronous-stacks-and-scopes/feed/</wfw:commentRss>
<slash:comments>5</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">3161</post-id> </item>
<item>
<title>Structured Concurrency</title>
<link>https://ericniebler.com/2020/11/08/structured-concurrency/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=structured-concurrency</link>
<comments>https://ericniebler.com/2020/11/08/structured-concurrency/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Mon, 09 Nov 2020 04:02:48 +0000</pubDate>
<category><![CDATA[concurrency]]></category>
<category><![CDATA[coroutines]]></category>
<guid isPermaLink="false">https://ericniebler.com/?p=3117</guid>
<description><![CDATA[TL;DR: “Structured concurrency” refers to a way to structure async computations so that child operations are guaranteed to complete before their parents, just the way a function is guaranteed to complete before its caller. This sounds simple and boring, but <a class="more-link" href="https://ericniebler.com/2020/11/08/structured-concurrency/">Continue reading Structured Concurrency→</a>]]></description>
<content:encoded><![CDATA[TL;DR: “Structured concurrency” refers to a way to structure async computations so that child operations are guaranteed to complete before their parents, just the way a function is guaranteed to complete before its caller. This sounds simple and boring, but in C++ it’s anything but. Structured concurrency — most notably, C++20 coroutines — has profound implications for the correctness and the simplicity of async architecture. It brings the <a href="https://docs.microsoft.com/en-us/cpp/cpp/welcome-back-to-cpp-modern-cpp?view=msvc-160">Modern C++ style</a> to our async programs by making async lifetimes correspond to ordinary C++ lexical scopes, eliminating the need for reference counting to manage object lifetime.
<h2>Structured Programming and C++</h2>
Back in the 1950s, the nascent computing industry discovered structured programming: that high-level programming languages with lexical scopes, control structures, and subroutines resulted in programs that were far easier to read, write, and maintain than programming at the assembly level with test-and-jump instructions and <code>goto</code>. The advance was such a quantum leap that nobody talks about structured programming anymore; it’s just “programming”.
C++, more so than any other language, leverages structured programming to the hilt. The semantics of object lifetime mirror — and are tied to — the strict nesting of scopes; i.e., the structure of your code. Function activations nest, scopes nest, and object lifetimes nest. Objects’ lifetimes end with a scope’s closing curly brace, and objects are destroyed in the reverse order of their construction to preserve the strict nesting.
The Modern C++ programming style is built on this structured foundation. Objects have value semantics — they behave like the ints — and resources are cleaned up in destructors deterministically, which guarantees structurally that resources aren’t used after their lifetimes have ended. This is very important.
When we abandon this strict nesting of scopes and lifetimes — say, when we reference count an object on the heap, or when we use the singleton pattern — we are fighting against the strengths of the language rather than working with them.
<h2>The Trouble With Threads</h2>
Writing correct programs in the presence of concurrency is far more difficult than in single-threaded code. There are lots of reasons for this. One reason is that threads, like singletons and dynamically allocated objects, scoff at your puny nested scopes. Although you can use the Modern C++ style within a thread, when logic and lifetimes are scattered across threads, the hierarchical structure of your program is lost. The tools we use to manage complexity in single-threaded code — in particular, nested lifetimes tied to nested scopes — simply don’t translate to async code.
To see what I mean, let’s look at what happens when we take a simple synchronous function and make it asynchronous.
<pre class="brush: cpp; notranslate">void computeResult(State & s);
int doThing() {
State s;
computeResult(s);
return s.result;
}
</pre>
<code>doThing()</code> is simple enough. It declares some local state, calls a helper, then returns some result. Now imagine that we want to make both functions async, maybe because they take too long. No problem, let’s use Boost futures, which support continuation chaining:
<pre class="brush: cpp; notranslate">boost::future<void> computeResult(State & s);
boost::future<int> doThing() {
State s;
auto fut = computeResult(s);
return fut.then(
[&](auto&&) { return s.result; }); // OOPS
}
</pre>
If you’ve programmed with futures before, you’re probably screaming, “Nooooo!” The <code>.then()</code> on the last line queues up some work to run after <code>computeResult()</code> completes. <code>doThing()</code> then returns the resulting future. The trouble is, when <code>doThing()</code> returns, the lifetime of the <code>State</code> object ends, and the continuation is still referencing it. That is now a dangling reference, and will likely cause a crash.
What has gone wrong? Futures let us compute with results that aren’t available yet, and the Boost flavor lets us chain continuations. But the continuation is a separate function with a separate scope. We often need to share data across those separate scopes. No more tidy nested scopes, no more nested lifetimes. We have to manage the lifetime of the state manually, something like this:
<pre class="brush: cpp; notranslate">boost::future<void>
computeResult(shared_ptr<State> s); // addref
// the state
boost::future<int> doThing() {
auto s = std::make_shared<State>();
auto fut = computeResult(s);
return fut.then(
[s](auto&&) { return s->result; }); // addref
// the state
}
</pre>
Since both async operations refer to the state, they both need to share responsibility to keep it alive.
Another way to think about this is: what is the lifetime of this asynchronous computation? It starts when <code>doThing()</code> is called, but it doesn’t end until the continuation — the lambda passed to <code>future.then()</code> — returns. There is no lexical scope that corresponds to that lifetime. And that is the source of our woes.
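The same ownership pattern can be reproduced without Boost. The sketch below is a stand-in under stated assumptions, not the Boost API: <code>WorkQueue</code>, <code>runDoThing()</code>, and the callback-taking signatures are hypothetical, and a plain vector of closures plays the role of the continuation machinery. It shows that the <code>State</code> outlives the scope of <code>doThing()</code> only because each queued closure holds its own reference.

```cpp
#include <functional>
#include <memory>
#include <vector>

struct State { int result = 0; };

// Hypothetical stand-in for "work that runs later": closures queued now,
// executed after the function that queued them has already returned.
using WorkQueue = std::vector<std::function<void()>>;

void computeResult(WorkQueue& queue, std::shared_ptr<State> s) {
    queue.push_back([s] { s->result = 42; });  // addref the state
}

void doThing(WorkQueue& queue, std::function<void(int)> onDone) {
    auto s = std::make_shared<State>();
    computeResult(queue, s);
    // The continuation also co-owns the state (second addref).
    queue.push_back([s, onDone] { onDone(s->result); });
}  // the local handle `s` dies here; the queued closures keep State alive

// Drives the queue to completion and reports what the continuation saw.
int runDoThing() {
    WorkQueue queue;
    int seen = -1;
    doThing(queue, [&seen](int r) { seen = r; });
    for (auto& work : queue) work();  // runs after doThing() has returned
    return seen;
}
```

Capturing <code>s</code> by reference in either closure would reintroduce the dangling reference from the earlier listing, because both closures run after <code>doThing()</code> has returned.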
<h2>Unstructured Concurrency</h2>
The story gets more complicated yet when we consider executors. Executors are handles to execution contexts that let you schedule work onto, say, a thread or thread pool. Many codebases have some notion of an executor, and some let you schedule things with a delay or with some other policy. This lets us do cool things, like move a computation from an IO thread pool to a CPU thread pool, or retry an async operation with a delay. Handy, but like <code>goto</code> it is a very low-level control structure that tends to obfuscate rather than clarify.
For instance, I recently came across an algorithm that uses executors and callbacks (called Listeners here) that retries the async allocation of some resource. Below is a greatly abridged version. It is described after the break.
<pre class="brush: cpp; notranslate">// This is a continuation that gets invoked when
// the async operation completes:
struct Manager::Listener : ListenerInterface {
shared_ptr<Manager> manager_;
executor executor_;
size_t retriesCount_;
void onSucceeded() override {
/* ...yay, allocation succeeded... */
}
void onFailed() override {
// When the allocation fails, post a retry
// to the executor with a delay
auto alloc = [manager = manager_]() {
manager->allocate();
};
// Run "alloc" at some point in the future:
executor_.execute_after(
alloc, 10ms * (1 << retriesCount_));
}
};
// Try asynchronously allocating some resource
// with the above class as a continuation
void Manager::allocate() {
// Have we already tried too many times?
if (retriesCount_ > kMaxRetries) {
/* ...notify any observers that we failed */
return;
}
// Try once more:
++retriesCount_;
allocator_.doAllocate(
make_shared<Listener>(
shared_from_this(),
executor_,
retriesCount_));
}
</pre>
The <code>allocate()</code> member function first checks to see if the operation has already been retried too many times. If not, it calls a helper <code>doAllocate()</code> function, passing in a callback to be notified on either success or failure. On failure, the handler posts deferred work to the executor, which will call <code>allocate()</code> back, thus retrying the allocation with a delay.
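The delay expression in the listing, <code>10ms * (1 << retriesCount_)</code>, is an exponential backoff. As a small sketch of what that schedule looks like, the helpers below (<code>retryDelay</code> and <code>delaySchedule</code> are hypothetical names, not part of the original code) evaluate the same formula with <code>std::chrono</code>.

```cpp
#include <chrono>
#include <vector>

// Delay before retry number `attempt`, mirroring 10ms * (1 << retriesCount_).
constexpr std::chrono::milliseconds retryDelay(unsigned attempt) {
    return std::chrono::milliseconds(10) * (1 << attempt);
}

// Delays for attempts 1..maxRetries (hypothetical helper, for inspection only).
std::vector<std::chrono::milliseconds> delaySchedule(unsigned maxRetries) {
    std::vector<std::chrono::milliseconds> delays;
    for (unsigned attempt = 1; attempt <= maxRetries; ++attempt) {
        delays.push_back(retryDelay(attempt));
    }
    return delays;
}
```

Because <code>allocate()</code> increments <code>retriesCount_</code> before it constructs the <code>Listener</code>, the first retry waits 20ms, the second 40ms, the third 80ms, and so on, doubling each time.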
This is a heavily stateful and rather circuitous async algorithm. The logic spans many functions and several objects, and the control and data flow is not obvious. Note the intricate ref-counting dance necessary to keep the objects alive. Posting the work to an executor makes it even harder. Executors in this code have no notion of continuations, so errors that happen during task execution have nowhere to go. The <code>allocate()</code> function can’t signal an error by throwing an exception if it wants any part of the program to be able to recover from the error. Error handling must be done manually and out-of-band. Ditto if we wanted to support cancellation.
This is unstructured concurrency: we queue up async operations in an ad hoc fashion; we chain dependent work, use continuations or “strand” executors to enforce sequential consistency; and we use strong and weak reference counts to keep data alive until we are certain it’s no longer needed. There is no formal notion of task A being a child of task B, no way to enforce that child tasks complete before their parents, and no one place in the code that we can point to and say, “Here is the algorithm.”
<blockquote>
If you don’t mind the analogy, the hops through the executor are a bit like <code>goto</code> statements that are non-local in both time and space: “Jump to this point in the program, X milliseconds from now, on this particular thread.”
</blockquote>
That non-local discontinuity makes it hard to reason about correctness and efficiency. Scale unstructured concurrency up to whole programs handling lots of concurrent real-time events, and the incidental complexity of manually handling out-of-band asynchronous control and data flow, controlling concurrent access to shared state, and managing object lifetime becomes overwhelming.
<h2>Structured Concurrency</h2>
Recall that in the early days of computing, unstructured programming styles rapidly gave way to structured styles. With the addition of coroutines to C++, we are seeing a similar phase shift happening today to our asynchronous code. If we were to rewrite the above retry algorithm in terms of coroutines (using Lewis Baker’s popular <a href="https://github.com/lewissbaker/cppcoro">cppcoro</a> library), it might look something like this:
<pre class="brush: cpp; notranslate">// Try asynchronously allocating some resource
// with retry:
cppcoro::task<> Manager::allocate() {
// Retry the allocation up to kMaxRetries
// times:
for (int retriesCount = 1;
retriesCount <= kMaxRetries;
++retriesCount) {
try {
co_await allocator_.doAllocate();
co_return; // success!
} catch (...) {}
// Oops, it failed. Yield the thread for a
// bit and then retry:
co_await scheduler_.schedule_after(
10ms * (1 << retriesCount));
}
// Error, too many retries
throw std::runtime_error(
"Resource allocation retry count exceeded.");
}
</pre>
<blockquote>
Aside: This replaces the <code>executor_</code> with a <code>scheduler_</code> that implements cppcoro’s <a href="https://github.com/lewissbaker/cppcoro#delayedscheduler-concept">DelayedScheduler</a> concept.
</blockquote>
Let’s list the ways in which this is an improvement:
<ol>
<li>It’s all in one function! Good locality.</li>
<li>The state (like <code>retriesCount</code>) can be maintained in local variables instead of as members of objects that need to be ref-counted.</li>
<li>We can use ordinary C++ error handling techniques.</li>
<li>We are guaranteed structurally that the async call to <code>allocator_.doAllocate()</code> completes before this function continues executing.</li>
</ol>
Point (4) has profound implications. Consider the trivial example from the beginning of the article. The following re-implementation in terms of coroutines is perfectly safe:
<pre class="brush: cpp; notranslate">cppcoro::task<> computeResult(State & s);
cppcoro::task<int> doThing() {
State s;
co_await computeResult(s);
co_return s.result;
}
</pre>
The above code is safe because we know that <code>computeResult</code> completes before <code>doThing</code> is resumed and thus before <code>s</code> is destructed.
<blockquote>
With structured concurrency, it is perfectly safe to pass local variables by reference to child tasks that are immediately awaited.
</blockquote>
<h2>Cancellation</h2>
Taking a structured approach to concurrency, where the lifetime of each concurrent operation is strictly nested within the lifetime of the resources it uses and is tied to program scopes, lets us avoid garbage-collection techniques like <code>shared_ptr</code> for managing lifetime. The resulting code can be more efficient, requiring fewer heap allocations and fewer atomic reference-counting operations, and it is easier to reason about and less bug-prone. However, this approach means that we must always join and wait for child operations before the parent operation can complete. We can no longer just detach from those child operations and let the resources get cleaned up automatically when their ref-counts drop to zero. To avoid waiting unnecessarily long for child operations whose results are no longer needed, we need a mechanism to cancel those child operations so that they complete quickly. Thus the structured concurrency model requires deep support for cancellation to avoid introducing unnecessary latency.
Note that we rely on structured lifetime and structured concurrency every time we pass a local variable to a child coroutine by reference. We must ensure that the child coroutine has completed and is no longer using that object before the parent coroutine exits the scope of that local variable and destroys it.
<h2>Structured Concurrency > Coroutines</h2>
When I talk about “structured concurrency,” I am not just talking about coroutines — although that is its most obvious manifestation. To see what I mean, let’s talk briefly about what coroutines are and what they are not. In particular, there is nothing inherently concurrent about C++ coroutines at all! They are really just a way to get the compiler to carve your function up into callbacks for you.
Consider the simple coroutine above:
<pre class="brush: cpp; notranslate">cppcoro::task<> computeResult(State & s);
cppcoro::task<int> doThing() {
State s;
co_await computeResult(s);
co_return s.result;
}
</pre>
What does <code>co_await</code> here mean? The trite answer is: it means whatever the author of <code>cppcoro::task<></code> wants it to mean (within certain bounds). The fuller answer is that <code>co_await</code> suspends the current coroutine, bundles up the rest of the coroutine (here, the statement <code>co_return s.result;</code>) as a continuation, and passes it to the awaitable object (here, the <code>task<></code> returned by <code>computeResult(s)</code>). That awaitable will typically store it somewhere so it can be invoked later, when the child task completes. That’s what <code>cppcoro::task<></code> does, for instance.
In other words, the <code>task<></code> type and the coroutines language feature conspire together to layer “structured concurrency” on top of boring ol’ callbacks. That’s it. That’s the magic. It’s all just callbacks, but callbacks in a very particular pattern, and it is that pattern that makes this “structured.” The pattern ensures that child operations complete before parents, and that property is what brings the benefits.
Once we recognize that structured concurrency is really just callbacks in a particular pattern, we realize that we can achieve structured concurrency without coroutines. Programming with callbacks is nothing new, of course, and the patterns can be codified into a library and made reusable. That’s what <a href="https://github.com/facebookexperimental/libunifex">libunifex</a> does. If you follow C++ standardization, it is also what the sender/receiver abstraction from <a href="http://wg21.link/P0443">the Executors proposal</a> does.
Using libunifex as a basis for structured concurrency, we can write the example above as follows:
<pre class="brush: cpp; notranslate">unifex::any_sender_of<> computeResult(State & s);
auto doThing() {
return unifex::let_with(
// Declare a "local variable" of type State:
[] { return State{}; },
// Use the local to construct an async task:
[](State & s) {
return unifex::transform(
computeResult(s),
[&] { return s.result; });
});
}
</pre>
Why would anybody write that when we have coroutines? You would certainly need a good reason, but I can think of a few. With coroutines, you have an allocation when a coroutine is first called, and an indirect function call each time it is resumed. The compiler can sometimes eliminate that overhead, but sometimes not. By using callbacks directly — but in a structured concurrency pattern — we can get many of the benefits of coroutines without the tradeoffs.
That style of programming makes a different tradeoff, however: it is far harder to write and read than the equivalent coroutine. I think that >90% of all async code in the future should be coroutines simply for maintainability. For hot code, selectively replace coroutines with the lower-level equivalent, and let the benchmarks be your guide.
<h2>Concurrency</h2>
I mentioned above that coroutines aren’t inherently concurrent; they’re just a way of writing callbacks. Coroutines are inherently sequential in nature, and the laziness of <code>task<></code> types (a coroutine starts suspended and doesn’t start executing until it is awaited) means that we can’t use them to introduce concurrency in the program. Existing <code>future</code>-based code often assumes that the operation has already started eagerly, introducing ad hoc concurrency that you need to be careful to prune back. That forces you to re-implement concurrency patterns over and over in an ad hoc fashion.
With structured concurrency, we codify concurrency patterns into reusable algorithms to introduce concurrency in a structured way. For instance, if we have a bunch of <code>task</code>s and would like to wait until they have all completed and return their results in a <code>tuple</code>, we pass them all to <code>cppcoro::when_all</code> and <code>co_await</code> the result. (Libunifex also has a <code>when_all</code> algorithm.)
At present, neither cppcoro nor libunifex has a <code>when_any</code> algorithm, so you can’t launch a bunch of concurrent operations and return when the first one completes. It’s a very important and interesting foundational algorithm, though. To maintain the guarantees of structured concurrency, when the first child task completes, <code>when_any</code> should request cancellation on all the other tasks and then wait for them all to finish. The utility of this algorithm depends upon all async operations in your program responding promptly to cancellation requests, which demonstrates just how important deep support for cancellation is in modern async programs.
<h2>Migration</h2>
So far, I’ve discussed what structured concurrency is and why it matters. I haven’t discussed how we get there. If you are already using coroutines to write async C++, then congrats. You may keep enjoying the benefits of structured concurrency, perhaps with a deeper understanding and appreciation for why coroutines are so transformative.
For codebases that lack structured concurrency, deep support for cancellation, or maybe even an abstraction for asynchrony, the job is hard. It may even start with introducing complexity in order to carve out an island in which the surrounding code provides the guarantees that structured concurrency patterns require. This includes, for instance, creating the impression of prompt cancellation of scheduled work, even when the underlying execution contexts don’t offer that directly. That added complexity can be isolated in a layer, and the islands of structured concurrency can be built on top. Then the simplifying work can begin, taking future- or callback-style code and converting it to coroutines, teasing out parent/child relationships, ownership, and lifetime.
<h2>Summary</h2>
Adding <code>co_await</code> makes a synchronous function asynchronous, without disturbing the structure of the computation. The async operation being awaited necessarily completes before the calling function does, just like ordinary function calls. The revolution is: nothing changes. Scopes and lifetimes still nest as they always have, except now the scopes are discontinuous in time. With raw callbacks and futures, that structure is lost.
Coroutines, and structured concurrency more generally, bring the advantages of the Modern C++ style (value semantics, algorithm-driven design, clear ownership semantics with deterministic finalization) into our async programming. They do that by tying async lifetimes back to ordinary C++ lexical scopes. Coroutines carve our async functions up into callbacks at suspension points, callbacks that get called in a very specific pattern to maintain that strict nesting of scopes, lifetimes, and function activations.
We sprinkle <code>co_await</code> in our code and we get to continue using all our familiar idioms: exceptions for error handling, state in local variables, destructors for releasing resources, arguments passed by value or by reference, and all the other hallmarks of good, safe, and idiomatic Modern C++.
Thanks for reading.
<hr />
If you want to hear more about structured concurrency in C++, be sure to check out <a href="https://www.youtube.com/watch?v=1Wy5sq3s2rg">Lewis Baker’s CppCon talk</a> from 2019 about it.
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2020/11/08/structured-concurrency/feed/</wfw:commentRss>
<slash:comments>28</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">3117</post-id> </item>
<item>
<title>Standard Ranges</title>
<link>https://ericniebler.com/2018/12/05/standard-ranges/?utm_source=rss&utm_medium=rss&utm_campaign=standard-ranges</link>
<comments>https://ericniebler.com/2018/12/05/standard-ranges/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Thu, 06 Dec 2018 01:58:12 +0000</pubDate>
<category><![CDATA[generic-programming]]></category>
<category><![CDATA[library-design]]></category>
<category><![CDATA[ranges]]></category>
<category><![CDATA[std]]></category>
<category><![CDATA[std2]]></category>
<guid isPermaLink="false">http://ericniebler.com/?p=1058</guid>
<description><![CDATA[As you may have heard by now, Ranges got merged and will be part of C++20. This is huge news and represents probably the biggest shift the Standard Library has seen since it was first standardized way back in 1998. <a class="more-link" href="https://ericniebler.com/2018/12/05/standard-ranges/">Continue reading Standard Ranges→</a>]]></description>
<content:encoded><![CDATA[As you may have heard by now, Ranges got merged and will be part of C++20. This is huge news and represents probably the biggest shift the Standard Library has seen since it was first standardized way back in 1998.
This has been a long time coming. Personally, I’ve been working toward this since at least November 2013, when I opined, “In my opinion, it’s time for a range library for the modern world,” in a <a href="http://ericniebler.com/2013/11/07/input-iterators-vs-input-ranges/">blog post on input ranges</a>. Since then, I’ve been busy building that <a href="https://github.com/ericniebler/range-v3">modern range library</a> and nailing down <a href="http://wg21.link/P0896">its specification</a> with the help of some <a href="https://twitter.com/CoderCasey">very talented people</a>.
Future blog posts will discuss how we got here and the gritty details of how the old stuff and the new stuff play together (we’re C++ programmers, we love gritty details), but this post is strictly about the what.
<h1>What is coming in C++20?</h1>
All of the Ranges TS — and then some — will ship as part of C++20. Here’s a handy table of all the major features that will be shipping as part of the next standard:
<table>
<thead>
<tr>
<th>Feature</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>Fundamental concepts</td>
<td><code>std::Copyable<T></code></td>
</tr>
<tr>
<td>Iterator and range concepts</td>
<td><code>std::InputIterator</code></td>
</tr>
<tr>
<td>New convenience iterator traits</td>
<td><code>std::iter_value_t</code></td>
</tr>
<tr>
<td>Safer range access functions</td>
<td><code>std::ranges::begin(rng)</code></td>
</tr>
<tr>
<td>Proxy iterator support</td>
<td><code>std::iter_value_t<I> tmp = std::ranges::iter_move(i);</code></td>
</tr>
<tr>
<td>Contiguous iterator support</td>
<td><code>std::ContiguousIterator</code></td>
</tr>
<tr>
<td>Constrained algorithms</td>
<td><code>std::ranges::sort(v.begin(), v.end());</code></td>
</tr>
<tr>
<td>Range algorithms</td>
<td><code>std::ranges::sort(v);</code></td>
</tr>
<tr>
<td>Constrained function objects</td>
<td><code>std::ranges::less</code></td>
</tr>
<tr>
<td>Generalized callables</td>
<td><code>std::ranges::for_each(v, &T::frobnicate);</code></td>
</tr>
<tr>
<td>Projections</td>
<td><code>std::ranges::sort(employees, less{}, &Employee::id);</code></td>
</tr>
<tr>
<td>Range utilities</td>
<td><code>struct my_view : std::view_interface<my_view> {</code></td>
</tr>
<tr>
<td>Range generators</td>
<td><code>auto indices = std::view::iota(0u, v.size());</code></td>
</tr>
<tr>
<td>Range adaptors</td>
<td><code>for (auto x : v | std::view::filter(pred)) {</code></td>
</tr>
</tbody>
</table>
Below, I say a few words about each. But first I wanted to revisit an old coding challenge and recast its solution in terms of standard C++20.
<h1>Pythagorean Triples, Revisited</h1>
Some years ago now, I wrote a <a href="http://ericniebler.com/2014/04/27/range-comprehensions/">blog post</a> about how to use ranges to generate an infinite list of Pythagorean triples: 3-tuples of integers where the sum of the squares of the first two equals the square of the third.
Below is the complete solution as it will look in standard C++20. I take the solution apart after the break.
<pre class="brush: cpp; notranslate">// A sample standard C++20 program that prints
// the first N Pythagorean triples.
#include <iostream>
#include <optional>
#include <ranges> // New header!
using namespace std;
// maybe_view defines a view over zero or one
// objects.
template<Semiregular T>
struct maybe_view : view_interface<maybe_view<T>> {
maybe_view() = default;
maybe_view(T t) : data_(std::move(t)) {
}
T const *begin() const noexcept {
return data_ ? &*data_ : nullptr;
}
T const *end() const noexcept {
return data_ ? &*data_ + 1 : nullptr;
}
private:
optional<T> data_{};
};
// "for_each" creates a new view by applying a
// transformation to each element in an input
// range, and flattening the resulting range of
// ranges.
// (This uses one syntax for constrained lambdas
// in C++20.)
inline constexpr auto for_each =
[]<Range R,
Iterator I = iterator_t<R>,
IndirectUnaryInvocable Fun>(R&& r, Fun fun)
requires Range<indirect_result_t<Fun, I>> {
return std::forward<R>(r)
| view::transform(std::move(fun))
| view::join;
};
// "yield_if" takes a bool and a value and
// returns a view of zero or one elements.
inline constexpr auto yield_if =
[]<Semiregular T>(bool b, T x) {
return b ? maybe_view{std::move(x)}
: maybe_view<T>{};
};
int main() {
// Define an infinite range of all the
// Pythagorean triples:
using view::iota;
auto triples =
for_each(iota(1), [](int z) {
return for_each(iota(1, z+1), [=](int x) {
return for_each(iota(x, z+1), [=](int y) {
return yield_if(x*x + y*y == z*z,
make_tuple(x, y, z));
});
});
});
// Display the first 10 triples
for(auto triple : triples | view::take(10)) {
cout << '('
<< get<0>(triple) << ','
<< get<1>(triple) << ','
<< get<2>(triple) << ')' << '\n';
}
}
</pre>
The above program prints the following:
<pre><code>(3,4,5)
(6,8,10)
(5,12,13)
(9,12,15)
(8,15,17)
(12,16,20)
(7,24,25)
(15,20,25)
(10,24,26)
(20,21,29)
</code></pre>
This program is (lazily) generating an infinite list of Pythagorean triples, taking the first 10, and printing them out. Below is a quick rundown on how it works. Along the way, I’ll point out the parts of that solution that will be standard starting in C++20.
<h2><code>main()</code></h2>
First, let’s look at <code>main</code>, which creates the infinite list of triples and prints out the first 10. It makes repeated use of <code>for_each</code> to define the infinite list. A use like this:
<pre class="brush: cpp; notranslate">auto x = for_each( some-range, [](auto elem) {
return some-view;
} );
</pre>
means: For every element in some-range, call the lambda. Lazily collect all the views thus generated and flatten them into a new view. If the lambda were to return <code>view::single(elem)</code>, for instance — which returns a view of exactly one element — then the above is a no-op: first carve some-range into N subranges of 1-element each, then flatten them all back into a single range.
Armed with that knowledge, we can make sense of the triply-nested invocations of <code>for_each</code>:
<pre class="brush: cpp; notranslate">for_each(iota(1), [](int z) {
return for_each(iota(1, z+1), [=](int x) {
return for_each(iota(x, z+1), [=](int y) {
</pre>
This code is generating every combination of integers <code>x</code>, <code>y</code>, and <code>z</code> in some order (selecting the bounds so that <code>x</code> and <code>y</code> are never larger than <code>z</code>, because those can’t be Pythagorean triples). At each level we create structure: we start with a single range (<code>iota(1)</code>, described below), and then get a range of ranges where each inner range corresponds to all the combinations that share a value for <code>z</code>. Those inner ranges are themselves further decomposed into subranges, each of which represents all the combinations that share a value of <code>x</code>. And so on.
The innermost lambda has <code>x</code>, <code>y</code>, and <code>z</code> and can decide whether to emit the triple or not:
<pre class="brush: cpp; notranslate">return yield_if(x*x + y*y == z*z,
make_tuple(x, y, z));
</pre>
<code>yield_if</code> takes a Boolean (have we found a Pythagorean triple?) and the triple, and either emits an empty range or a 1-element range containing the triple. That set of ranges then gets flattened, flattened, and flattened again into the infinite list of the Pythagorean triples.
We then pipe that infinite list to <code>view::take(10)</code>, which truncates the infinite list to the first 10 elements. Then we iterate over those elements with an ordinary range-based <code>for</code> loop and print out the results. Phew!
Now that we have a high-level understanding of what this program is doing, we can take a closer look at the individual components.
<h2><code>view::iota</code></h2>
This is a very simple view. It takes either one or two objects of <code>Incrementable</code> type. It builds a range out of them, using the second argument as the upper bound of a half-closed (i.e., exclusive) range, taking the upper bound to be an unreachable sentinel if none is specified (i.e., the range is infinite). Here we use it to build a range of integers, but any incrementable types will do, including iterators.
The name “<code>iota</code>” comes from the <code>std::iota</code> numeric algorithm, which itself has an <a href="https://stackoverflow.com/a/9244949/195873">interesting naming history</a>.
<h2><code>for_each</code></h2>
The range-v3 library comes with <code>view::for_each</code> and <code>yield_if</code>, but those haven’t been proposed yet. However, <code>view::for_each</code> is a trivial composition of <code>view::transform</code> and <code>view::join</code>, both of which will be part of C++20, so we can implement it as follows:
<pre class="brush: cpp; notranslate">inline constexpr auto for_each =
[]<Range R,
Iterator I = iterator_t<R>,
IndirectUnaryInvocable Fun>(R&& r, Fun fun)
requires Range<indirect_result_t<Fun, I>> {
return std::forward<R>(r)
| view::transform(std::move(fun))
| view::join;
};
</pre>
This declares an object <code>for_each</code> that is a C++20 constrained generic lambda with explicitly specified template parameters. “<code>Range</code>” and “<code>IndirectUnaryInvocable</code>” are standard concepts in C++20 that live in namespace <code>std</code>. They constrain the arguments <code>r</code> and <code>fun</code> of the lambda to be a range (duh) and a function that is callable with the values of the range. We then further constrain the lambda with a trailing <code>requires</code> clause, ensuring that the function’s return type must be a <code>Range</code> as well. <code>indirect_result_t</code> will also be standard in C++20. It answers the question: if I call this function with the result of dereferencing this iterator, what type do I get back?
The lambda first lazily transforms the range <code>r</code> by piping it to <code>view::transform</code>, moving <code>fun</code> in. <code>view::</code> is a namespace within <code>std::</code> in which all the new lazy range adaptors live. Since <code>fun</code> returns a <code>Range</code> (we required that!), the result of the transformation is a range of ranges. We then pipe that to <code>view::join</code> to flatten the ranges into one big range.
The actual code, lines 6-8, kind of gets lost in the sea of constraints, which are not strictly necessary to use the library; I’m being a bit pedantic for didactic purposes here, so please don’t let that trip you up.
I also could have very easily written <code>for_each</code> as a vanilla function template instead of making it an object initialized with a constrained generic lambda. I opted for an object in large part because I wanted to demonstrate how to use concepts with lambdas in C++20. Function objects have <a href="http://ericniebler.com/2014/10/21/customization-point-design-in-c11-and-beyond/">other nice properties</a>, besides.
<h2><code>yield_if</code></h2>
<code>yield_if</code> is simpler conceptually, but it requires a little legwork on our part. It is a function that takes a Boolean and an object, and it returns either an empty range (if the Boolean is false), or a range of length one containing the object. For that, we need to write our own view type, called <code>maybe_view</code>, since there isn’t one in C++20. (Not yet, at least. There is <a href="http://wg21.link/p1255">a proposal</a>.)
Writing views is made a little simpler with the help of <code>std::view_interface</code>, which generates some of the boilerplate from <code>begin()</code> and <code>end()</code> functions that you provide. <code>view_interface</code> provides some handy members like <code>.size()</code>, <code>.operator[]</code>, <code>.front()</code>, and <code>.back()</code>.
<code>maybe_view</code> is reproduced below. Notice how it is trivially implemented in terms of <code>std::optional</code> and <code>std::view_interface</code>.
<pre class="brush: cpp; notranslate">template<Semiregular T>
struct maybe_view : view_interface<maybe_view<T>> {
maybe_view() = default;
maybe_view(T t) : data_(std::move(t)) {
}
T const *begin() const noexcept {
return data_ ? &*data_ : nullptr;
}
T const *end() const noexcept {
return data_ ? &*data_ + 1 : nullptr;
}
private:
optional<T> data_{};
};
</pre>
Once we have <code>maybe_view</code>, the implementation of <code>yield_if</code> is also trivial. It returns either an empty <code>maybe_view</code>, or one containing a single element, depending on the Boolean argument.
<pre class="brush: cpp; notranslate">inline constexpr auto yield_if =
[]<Semiregular T>(bool b, T x) {
return b ? maybe_view{std::move(x)}
: maybe_view<T>{};
};
</pre>
<blockquote>
Note: <code>maybe_view</code> owns its elements. It is generally a violation of the <code>View</code> concept’s semantic requirements for a view to own its elements because it gives the type’s copy and move operations O(N) behavior. However, in this case — where N is either 0 or 1 — we just squeak by.
</blockquote>
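If you want to try this on a shipping compiler, here is a minimal sketch of the same two pieces spelled with the names C++20 finally adopted: <code>std::semiregular</code> and <code>std::ranges::view_interface</code> instead of the <code>Semiregular</code> and <code>std::view_interface</code> spellings used in this post. It assumes nothing beyond the standard headers shown.
<pre class="brush: cpp; notranslate">#include <concepts>
#include <optional>
#include <ranges>
#include <utility>

template <std::semiregular T>
struct maybe_view : std::ranges::view_interface<maybe_view<T>> {
    maybe_view() = default;
    maybe_view(T t) : data_(std::move(t)) {}
    // A pair of pointers is a perfectly good (contiguous) iterator pair.
    T const* begin() const noexcept { return data_ ? &*data_ : nullptr; }
    T const* end() const noexcept { return data_ ? &*data_ + 1 : nullptr; }
private:
    std::optional<T> data_{};
};

// Empty view when b is false, one-element view holding x otherwise.
inline constexpr auto yield_if = []<std::semiregular T>(bool b, T x) {
    return b ? maybe_view{std::move(x)} : maybe_view<T>{};
};
</pre>
Because the iterators are raw pointers, <code>view_interface</code> can synthesize <code>.empty()</code>, <code>.size()</code>, and <code>.front()</code> for free, so <code>yield_if(true, 7).front()</code> is <code>7</code> and <code>yield_if(false, 7).empty()</code> is <code>true</code>.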
And that’s it. This program demonstrates how to use <code>view::iota</code>, <code>view::transform</code>, <code>view::join</code>, <code>view_interface</code>, and some standard concepts to implement a very useful bit of library functionality, and then uses it to construct an infinite list with some interesting properties. If you have used list comprehensions in Python or Haskell, this should feel pretty natural.
But these features are just a tiny slice of the range support in C++20. Below, I go through each row of the table at the top of the post, and give an example of each.
<h1>Fundamental Concepts</h1>
The C++20 Standard Library is getting a host of generally useful concept definitions that users can use in their own code to constrain their templates and to define higher-level concepts that are meaningful for them. These all live in the new <code><concepts></code> header, and they include things like <code>Same<A, B></code>, <code>ConvertibleTo<From, To></code>, <code>Constructible<T, Args...></code>, and <code>Regular<T></code>.
Say for instance that you have a thread pool class with an <code>enqueue</code> member function that takes something that is callable with no arguments. Today, you would write it like this:
<pre class="brush: cpp; notranslate">struct ThreadPool {
    template <class Fun>
    void enqueue( Fun fun );
};
</pre>
Users reading this code might wonder: what are the requirements on the type <code>Fun</code>? We can enforce the requirement in code using C++20’s <code>std::Invocable</code> concept, along with the recently-added support for abbreviated function template syntax:
<pre class="brush: cpp; notranslate">#include <concepts>
struct ThreadPool {
    void enqueue( std::Invocable auto fun );
};
</pre>
This states that <code>fun</code> has to be invocable with no arguments. We didn’t even have to type <code>template <class ...></code>! (<code>std::Invocable<std::error_code &> auto fun</code> would declare a function that must be callable with a reference to a <code>std::error_code</code>, to take another example.)
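Here is a small compilable sketch of both declarations, assuming the final C++20 spelling <code>std::invocable</code> (the concepts were renamed to snake_case before the standard shipped). The "pool" just runs the task inline, and <code>report</code> is a made-up function that exists only to show the <code>std::error_code &</code> variant.
<pre class="brush: cpp; notranslate">#include <concepts>
#include <functional>
#include <system_error>
#include <utility>

struct ThreadPool {
    // Stand-in for real queueing: run the task right away on the calling thread.
    void enqueue(std::invocable auto fun) { std::invoke(std::move(fun)); }
};

// Accepts anything callable with an lvalue std::error_code.
inline void report(std::invocable<std::error_code&> auto handler) {
    std::error_code ec = std::make_error_code(std::errc::timed_out);
    std::invoke(std::move(handler), ec);
}
</pre>
Passing something that needs an argument, such as a <code>void (*)(int)</code>, to <code>enqueue</code> is rejected at the call site because the constraint is not satisfied.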
<h1>Iterator and Range Concepts</h1>
A large part of the Standard Library concerns itself with containers, iterators, and algorithms, so it makes sense that the conceptual vocabulary would be especially rich in this area. Look for useful concept definitions like <code>Sentinel<S, I></code>, <code>InputIterator</code>, and <code>RandomAccessIterator</code> in the <code><iterator></code> header, in addition to useful compositions like <code>IndirectRelation<R, I1, I2></code>, which tests that <code>R</code> imposes a relation on the result of dereferencing iterators <code>I1</code> and <code>I2</code>.
Say for example that you have a custom container type in your codebase called <code>SmallVector</code> that, like <code>std::vector</code>, can be initialized by passing it two iterators denoting a range. We can write this with concepts from <code><iterator></code> and <code><concepts></code> as follows:
<pre class="brush: cpp; notranslate">template <std::Semiregular T>
struct SmallVector {
    template <std::InputIterator I>
        requires std::Same<T, std::iter_value_t<I>>
    SmallVector( I i, std::Sentinel<I> auto s ) {
        // ...push back all elements in [i,s)
    }
    // ...
</pre>
Likewise, this type can get a constructor that takes a range directly using concepts defined in the new <code><ranges></code> header:
<pre class="brush: cpp; notranslate">    // ... as before
    template <std::InputRange R>
        requires std::Same<T, std::range_value_t<R>>
    explicit SmallVector( R && r )
      : SmallVector(std::ranges::begin(r),
                    std::ranges::end(r)) {
    }
};
</pre>
<blockquote>
Note: <code>range_value_t<R></code> hasn’t been formally accepted yet. It is an alias for <code>iter_value_t<iterator_t<R>></code>.
</blockquote>
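Putting the two constructors together, a compilable sketch might look like the following. It assumes the final C++20 names (<code>std::input_iterator</code>, <code>std::sentinel_for</code>, <code>std::same_as</code>, <code>std::ranges::input_range</code>, <code>std::ranges::range_value_t</code>), and it cheats by storing elements in a <code>std::vector</code>; the storage, <code>size()</code>, and <code>operator[]</code> are placeholders, not a real small-buffer implementation.
<pre class="brush: cpp; notranslate">#include <concepts>
#include <cstddef>
#include <iterator>
#include <ranges>
#include <vector>

template <std::semiregular T>
struct SmallVector {
    SmallVector() = default;

    // Iterator/sentinel constructor: push back all elements in [i,s).
    template <std::input_iterator I>
        requires std::same_as<T, std::iter_value_t<I>>
    SmallVector(I i, std::sentinel_for<I> auto s) {
        for (; i != s; ++i) data_.push_back(*i);
    }

    // Range constructor: delegates to the iterator/sentinel one.
    template <std::ranges::input_range R>
        requires std::same_as<T, std::ranges::range_value_t<R>>
    explicit SmallVector(R&& r)
      : SmallVector(std::ranges::begin(r), std::ranges::end(r)) {}

    std::size_t size() const { return data_.size(); }
    T const& operator[](std::size_t n) const { return data_[n]; }

private:
    std::vector<T> data_;  // placeholder storage for the sketch
};
</pre>
With this, <code>SmallVector<int>(v)</code> works for a <code>std::vector<int> v</code>, while trying to construct it from a <code>std::vector<long></code> fails the <code>same_as</code> constraint instead of silently converting.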
<h1>New Convenience Iterator Traits</h1>
In C++17, if you want to know the value type of an iterator <code>I</code>, you have to type <code>typename std::iterator_traits<I>::value_type</code>. That is a mouthful. In C++20, that is vastly shortened to <code>std::iter_value_t<I></code>. Here are the newer, shorter type aliases and what they mean:
<table>
<thead>
<tr>
<th>New iterator type alias</th>
<th>Old equivalent</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>iter_difference_t<I></code></td>
<td><code>typename iterator_traits<I>::difference_type</code></td>
</tr>
<tr>
<td><code>iter_value_t<I></code></td>
<td><code>typename iterator_traits<I>::value_type</code></td>
</tr>
<tr>
<td><code>iter_reference_t<I></code></td>
<td><code>typename iterator_traits<I>::reference</code></td>
</tr>
<tr>
<td><code>iter_rvalue_reference_t<I></code></td>
<td>no equivalent, see below</td>
</tr>
</tbody>
</table>
There is no <code>iter_category_t</code> to get an iterator’s tag type because tag dispatching is now passé. Now that you can dispatch on iterator concept using language support, there is no need for tags.
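To make the aliases and the "no more tags" remark concrete, here is a short sketch against the C++20 <code><iterator></code> header. The function name <code>has_constant_time_jump</code> is invented for illustration; the point is only that the more constrained overload is picked without any tag argument.
<pre class="brush: cpp; notranslate">#include <iterator>
#include <type_traits>
#include <vector>

using VecIt = std::vector<int>::iterator;

// The new aliases agree with the old iterator_traits spellings.
static_assert(std::is_same_v<std::iter_difference_t<VecIt>,
                             std::iterator_traits<VecIt>::difference_type>);
static_assert(std::is_same_v<std::iter_value_t<VecIt>,
                             std::iterator_traits<VecIt>::value_type>);
static_assert(std::is_same_v<std::iter_reference_t<VecIt>,
                             std::iterator_traits<VecIt>::reference>);
// This one has no iterator_traits counterpart.
static_assert(std::is_same_v<std::iter_rvalue_reference_t<VecIt>, int&&>);

// Dispatch on concept: the random-access overload is more constrained,
// so it wins whenever it is viable.
template <std::input_iterator I>
constexpr bool has_constant_time_jump(I) { return false; }

template <std::random_access_iterator I>
constexpr bool has_constant_time_jump(I) { return true; }
</pre>
A <code>std::vector<int>::iterator</code> selects the second overload; a <code>std::list<int>::iterator</code> falls back to the first.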
<h1>Safe Range Access Functions</h1>
What is wrong with <code>std::begin</code> and <code>std::end</code>? Surprise! They are not memory safe. Consider what this code does:
<pre class="brush: cpp; notranslate">extern std::vector<int> get_data();
auto it = std::begin(get_data());
int i = *it; // BOOM
</pre>
<code>std::begin</code> has two overloads for <code>const</code> and non-<code>const</code> lvalues. Trouble is, rvalues bind to <code>const</code> lvalue references, leading to the dangling iterator <code>it</code> above. If we had instead called <code>std::ranges::begin</code>, the code would not have compiled.
<code>ranges::begin</code> has other niceties besides. It does the <a href="http://ericniebler.com/2014/10/21/customization-point-design-in-c11-and-beyond/">ADL two-step</a> for you, saving you from remembering to type <code>using std::begin;</code> in generic code. In other words, it dispatches to a <code>begin()</code> free function found by ADL, but only if it returns an <code>Iterator</code>. That’s an extra bit of sanity-checking that you won’t get from <code>std::begin</code>.
Basically, prefer <code>ranges::begin</code> in all new code in C++20 and beyond. It’s more better.
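The difference can be checked at compile time. The sketch below defines two throwaway concepts (<code>std_begin_accepts</code> and <code>ranges_begin_accepts</code> are names made up here) that ask whether each function can be called on a given value category, using the C++20 <code><ranges></code> header.
<pre class="brush: cpp; notranslate">#include <iterator>
#include <ranges>
#include <string_view>
#include <vector>

template <class R>
concept std_begin_accepts =
    requires(R&& r) { std::begin(static_cast<R&&>(r)); };

template <class R>
concept ranges_begin_accepts =
    requires(R&& r) { std::ranges::begin(static_cast<R&&>(r)); };

// An rvalue vector: std::begin happily hands back a soon-to-dangle iterator...
static_assert(std_begin_accepts<std::vector<int>>);
// ...while std::ranges::begin refuses.
static_assert(!ranges_begin_accepts<std::vector<int>>);
// Lvalues are fine either way.
static_assert(ranges_begin_accepts<std::vector<int>&>);
// So are rvalues of types whose iterators can outlive them.
static_assert(ranges_begin_accepts<std::string_view>);
</pre>
The last assertion shows that the rule is not "no rvalues, ever": types like <code>std::string_view</code>, whose iterators stay valid after the object itself is gone, are still accepted.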
<h1>Prvalue and Proxy Iterator Support</h1>
The C++98 iterator categories are fairly restrictive. If your iterator returns a temporary (i.e., a prvalue) from its <code>operator*</code>, then the strongest iterator category it could model was <code>InputIterator</code>. <code>ForwardIterator</code> required <code>operator*</code> to return by reference. That meant that a trivial iterator that returns monotonically increasing integers by value, for example, cannot satisfy <code>ForwardIterator</code>. Shame, because that’s a useful iterator! More generally, any iterator that computes values on-demand could not model <code>ForwardIterator</code>. That’s :’-(.
It also means that iterators that return proxies — types that act like references — cannot be <code>ForwardIterator</code>s. Hence, whether it was a good idea or not, <code>std::vector<bool></code> is not a real container since its iterators return proxies.
The new C++20 iterator concepts solve both of these problems with the help of <code>std::ranges::iter_swap</code> (a constrained version of <code>std::iter_swap</code>), and the new <code>std::ranges::iter_move</code>. Use <code>ranges::iter_swap(i, j)</code> to swap the values referred to by <code>i</code> and <code>j</code>. And use the following:
<pre class="brush: cpp; notranslate">iter_value_t<I> tmp = ranges::iter_move(i);
</pre>
… to move an element at position <code>i</code> out of sequence and into the temporary object <code>tmp</code>.
Authors of proxy iterator types can hook these two customization points to make their iterators play nicely with the constrained algorithms in the <code>std::ranges</code> namespace (see below).
The new <code>iter_rvalue_reference_t</code> type alias mentioned above names the return type of <code>ranges::iter_move(i)</code>.
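As a quick sketch of the two customization points in generic code (C++20 <code><iterator></code>; the helper name <code>extract</code> is hypothetical):
<pre class="brush: cpp; notranslate">#include <concepts>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Moves the element at position i out of its sequence and returns it.
template <std::input_iterator I>
    requires std::constructible_from<std::iter_value_t<I>,
                                     std::iter_rvalue_reference_t<I>>
std::iter_value_t<I> extract(I i) {
    std::iter_value_t<I> tmp = std::ranges::iter_move(i);
    return tmp;
}

// For an ordinary iterator, iter_move(i) is just std::move(*i).
static_assert(std::is_same_v<
    std::iter_rvalue_reference_t<std::vector<std::string>::iterator>,
    std::string&&>);
</pre>
Generic code written this way never says <code>std::move(*i)</code> or <code>std::swap(*i, *j)</code> directly, which is what leaves room for a proxy iterator to supply its own behavior.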
<h1>Contiguous Iterator Support</h1>
In Stepanov’s STL, <code>RandomAccessIterator</code> is the strongest iterator category. But whether elements are contiguous in memory is a useful piece of information, and there exist algorithms that can take advantage of that information to become more efficient. Stepanov was aware of that but felt that raw pointers were the only interesting model of contiguous iterators, so he didn’t need to add a new category. He would have been appalled at the library vendors who ship <code>std::vector</code> implementations with wrapped debug iterators.
TL;DR, we are now defining an extra category that subsumes (refines) <code>RandomAccessIterator</code> called <code>ContiguousIterator</code>. A type must opt in to contiguity by defining a nested type named <code>iterator_concept</code> (note: not <code>iterator_category</code>) that is an alias for the new <code>std::contiguous_iterator_tag</code> tag type. Or you could specialize <code>std::iterator_traits</code> for your type and specify <code>iterator_concept</code> there.
<blockquote>
There is a whole blog post coming about <code>iterator_category</code>, <code>iterator_concept</code>, and how to write an iterator type that conforms both to the old iterator concepts and the new, with different strengths in each. It’s a brave new world of back-compat considerations.
</blockquote>
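Here is a sketch of what contiguity buys you, using the final C++20 name <code>std::contiguous_iterator</code> together with <code>std::to_address</code> from <code><memory></code>. The function <code>copy_n_bytes</code> is a made-up example, not a standard facility.
<pre class="brush: cpp; notranslate">#include <cstddef>
#include <cstring>
#include <deque>
#include <iterator>
#include <memory>
#include <type_traits>
#include <vector>

static_assert(std::contiguous_iterator<int*>);
static_assert(std::contiguous_iterator<std::vector<int>::iterator>);
// A deque is random access, but its elements are not contiguous.
static_assert(std::random_access_iterator<std::deque<int>::iterator>);
static_assert(!std::contiguous_iterator<std::deque<int>::iterator>);

// With contiguity known, trivially copyable elements can be block-copied.
template <std::contiguous_iterator I>
    requires std::is_trivially_copyable_v<std::iter_value_t<I>>
void copy_n_bytes(I first, std::size_t n, std::iter_value_t<I>* out) {
    if (n != 0)
        std::memcpy(out, std::to_address(first),
                    n * sizeof(std::iter_value_t<I>));
}
</pre>
Note that this works even when the iterator is a wrapped debug iterator rather than a raw pointer, as long as the wrapper opts in.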
<h1>Constrained Algorithms</h1>
Ever tried to pass a <code>std::list</code>’s iterator to <code>std::sort</code>? Or any other combination of nonsense? When you accidentally fail to meet an algorithm’s (unstated) type requirements today, your compiler will inform you in the most obscure and voluminous way possible, spewing errors that appear to come from within the guts of your STL implementation.
Concepts are designed to help with this. For instance, look at this code that is using the <a href="https://github.com/CaseyCarter/cmcstl2">cmcstl2</a> reference implementation (which puts <code>std::ranges</code> in <code>std::experimental::ranges</code> for now):
<pre class="brush: cpp; notranslate">#include <list>
#include <stl2/algorithm.hpp>
namespace ranges = std::experimental::ranges;
int main() {
    std::list<int> l {82,3,7,2,5,8,3,0,4,23,89};
    ranges::sort( l.begin(), l.end() );
}
</pre>
Rather than an error deep in the guts of <code>ranges::sort</code>, the error message points right to the line in <code>main</code> that failed to meet the constraints of the <code>sort</code> template. “error: no matching call for <code>ranges::sort(list<int>::iterator, list<int>::iterator)</code>”, followed by a message that shows the prototype that failed to match and an explanation that the constraints within <code>RandomAccessIterator</code> were not satisfied. You can see the full error <a href="https://godbolt.org/z/6FXw65">here</a>.
Much can be done to make the error more user-friendly, but it’s already a vast improvement over the status quo.
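Because the constraints are part of the interface, you can also ask the question up front instead of waiting for an error. A small sketch against the C++20 standard library (the concept name <code>sortable_in_place</code> is made up for this example):
<pre class="brush: cpp; notranslate">#include <algorithm>
#include <list>
#include <ranges>
#include <vector>

// True when std::ranges::sort(r) would compile for an lvalue of type R.
template <class R>
concept sortable_in_place = requires(R& r) { std::ranges::sort(r); };

static_assert(sortable_in_place<std::vector<int>>);
static_assert(!sortable_in_place<std::list<int>>);  // not random access
</pre>
The failed check is an ordinary unsatisfied constraint, so it can steer overload resolution or an <code>if constexpr</code> branch rather than ending the build.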
<h1>Range Algorithms</h1>
This one is fairly obvious. It’s been 20 years since the STL was standardized, and all I want to do is pass a <code>vector</code> to <code>sort</code>. Is that too much to ask? Nope. With C++20, you will finally be able to do this:
<pre class="brush: cpp; notranslate">std::vector< int > v = // ...
std::ranges::sort( v ); // Hurray!
</pre>
<h1>Constrained Function Objects</h1>
Have you ever used <code>std::less<></code>, the “diamond” specializations of the comparison function objects that were added in C++14? These let you compare things without having to say up front what type you’re comparing or forcing conversions. These exist in the <code>std::ranges</code> namespace too, but you don’t have to type <code><></code> because they are not templates. Also, they have constrained function call operators. So <code>less</code>, <code>greater</code>, <code>less_equal</code>, and <code>greater_equal</code> are all constrained with <code>StrictTotallyOrderedWith</code>, for instance.
These types are particularly handy when defining APIs that accept a user-specified relation, but default the relation to <code>operator<</code> or <code>operator==</code>. For instance:
<pre class="brush: cpp; notranslate">template <class T, Relation<T, T> R = ranges::less>
T max( T a, T b, R r = {} ) {
    return r( a, b ) ? b : a;
}
</pre>
This function has the nice property that if the user specifies a relation, it will be used and the constraints guarantee that <code>R</code> is a <code>Relation</code> over type <code>T</code>. If the user doesn’t specify a relation, then the constraints require that <code>T</code> satisfies <code>StrictTotallyOrderedWith</code> itself. That is implicit in the fact that <code>R</code> defaults to <code>ranges::less</code>, and <code>ranges::less::operator()</code> is constrained with <code>StrictTotallyOrderedWith</code>.
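A compilable sketch of the same idea with the final C++20 names (<code>std::relation</code>, <code>std::ranges::less</code>) follows. The function is called <code>max_of</code> here only to stay clear of <code>std::max</code>; <code>Opaque</code> and <code>has_default_max</code> are invented to show the rejection case.
<pre class="brush: cpp; notranslate">#include <concepts>
#include <functional>

template <class T, std::relation<T, T> R = std::ranges::less>
T max_of(T a, T b, R r = {}) {
    return std::invoke(r, a, b) ? b : a;
}

struct Opaque {};  // deliberately has no comparison operators

// Can max_of be called with two Ts and the default relation?
template <class T>
concept has_default_max = requires(T a, T b) { max_of(a, b); };

static_assert(has_default_max<int>);
static_assert(!has_default_max<Opaque>);
</pre>
<code>max_of(1, 2)</code> yields <code>2</code>, and passing <code>std::ranges::greater{}</code> as the third argument flips the result to <code>1</code>.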
<h1>Generalized Callables</h1>
In C++17, the Standard Library got a handy function: <code>std::invoke</code>. It lets you call any “Callable” thing with some arguments, where “Callable” includes ordinary function-like things in addition to pointers to members. However, the standard algorithms were not respecified to use <code>std::invoke</code>, which meant that code like the following failed to compile:
<pre class="brush: cpp; notranslate">struct Wizard {
    void frobnicate();
};
int main() {
    std::vector<Wizard> vw { /*...*/ };
    std::for_each( vw.begin(), vw.end(),
                   &Wizard::frobnicate ); // Nope!
}
</pre>
<code>std::for_each</code> is expecting something callable like <code>fun(t)</code>, not <code>std::invoke(fun, t)</code>.
The new algorithms in the <code>std::ranges</code> namespace are required to use <code>std::invoke</code>, so if the above code is changed to use <code>std::ranges::for_each</code>, it will work as written.
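For completeness, here is a sketch of the working version with <code>std::ranges::for_each</code> from C++20's <code><algorithm></code>. The <code>spells_cast</code> counter and the <code>frobnicate_all</code> helper are additions made up so the effect is observable.
<pre class="brush: cpp; notranslate">#include <algorithm>
#include <vector>

struct Wizard {
    int spells_cast = 0;
    void frobnicate() { ++spells_cast; }
};

// Calls frobnicate() on every wizard and returns the total spell count.
inline int frobnicate_all(std::vector<Wizard>& vw) {
    std::ranges::for_each(vw, &Wizard::frobnicate);  // Yep!
    int total = 0;
    for (Wizard const& w : vw) total += w.spells_cast;
    return total;
}
</pre>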
<h1>Projections</h1>
Ever wanted to sort a range of things by some property of those things? Maybe sort a vector of Employees by their ids? Or last name? Or maybe you want to search an array of points for one where the magnitude is equal to a certain value. For those things, projections are very handy. A projection is a unary transformation function passed to an algorithm that gets applied to each element before the algorithm operates on the element.
To take the example of sorting a vector of Employees by id, you can use a projection argument to <code>std::ranges::sort</code> as follows:
<pre class="brush: cpp; notranslate">struct Employee {
    int Id;
    std::string Name;
    Currency Salary;
};
int main() {
    using namespace std;
    vector<Employee> employees { /*...*/ };
    ranges::sort( employees, ranges::less{},
                  &Employee::Id );
}
</pre>
The third argument to <code>std::ranges::sort</code> is the projection. Notice that we used a generalized callable for it, from the previous section. This <code>sort</code> command sorts the Employees by the <code>Id</code> field.
Or for the example of searching an array of points for one where the magnitude is equal to a certain value, you would do the following:
<pre class="brush: cpp; notranslate">using namespace std;
array< Point, N > points { /*...*/ }; // N: the number of points
auto it = ranges::find( points, value, [](auto p) {
    return sqrt(p.x*p.x + p.y*p.y);
} );
</pre>
Here we are using a projection to compute a property of each element and operating on the computed property.
Once you get the hang of projections, you’ll find they have many uses.
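Both uses fit in a few lines of standard C++20. In this sketch the <code>Employee</code> type is trimmed to two fields, and <code>sort_by_id</code> and <code>find_by_name</code> are hypothetical helper names.
<pre class="brush: cpp; notranslate">#include <algorithm>
#include <functional>
#include <string>
#include <vector>

struct Employee {
    int Id;
    std::string Name;
};

// Sort by the Id member: comparator first, projection second.
inline void sort_by_id(std::vector<Employee>& employees) {
    std::ranges::sort(employees, std::ranges::less{}, &Employee::Id);
}

// Returns an iterator to the first employee with that name, or end().
inline auto find_by_name(std::vector<Employee>& employees,
                         std::string const& name) {
    return std::ranges::find(employees, name, &Employee::Name);
}
</pre>
In both calls the algorithm only ever compares the projected values, but the iterator it returns (or the elements it rearranges) are the original <code>Employee</code> objects.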
<h1>Range Utilities</h1>
The part of the standard library shipping in the <code><ranges></code> header has a lot of goodies. Besides an initial set of lazy range adaptors (described below), it has some handy, general-purpose utilities.
<h2>view_interface</h2>
As in the Pythagorean triples example above, your custom view types can inherit from <code>view_interface</code> to get a host of useful member functions, like <code>.front()</code>, <code>.back()</code>, <code>.empty()</code>, <code>.size()</code>, <code>.operator[]</code>, and even an explicit conversion to <code>bool</code> so that view types can be used in <code>if</code> statements:
<pre class="brush: cpp; notranslate">// Boolean conversion operator comes from view_interface:
if ( auto evens = vec | view::filter(is_even) ) {
    // yup, we have some evens. Do something.
}
</pre>
<h2>subrange</h2>
<code>std::ranges::subrange<I, S></code> is probably the most handy of the range utilities. It is an iterator/sentinel pair that models the <code>View</code> concept. You can use it to bundle together two iterators, or an iterator and a sentinel, for when you want to return a range or call an API that expects a range.
It also has deduction guides that make it quite painless to use. Consider the following code:
<pre class="brush: cpp; notranslate">auto [b,e] = subrange{vec};
</pre>
This code is equivalent in effect to:
<pre class="brush: cpp; notranslate">auto b = ranges::begin(vec);
auto e = ranges::end(vec);
</pre>
The expression <code>subrange{vec}</code> deduces the iterator and sentinel template parameters from the range <code>vec</code>, and since <code>subrange</code> is tuple-like, we can unpack the iterator/sentinel pair using structured bindings.
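<code>subrange</code> is just as useful on the way out of a function. The sketch below (C++20 <code><ranges></code>; <code>tail</code> is a made-up helper) unpacks a range, adjusts one end, and bundles the pair back up.
<pre class="brush: cpp; notranslate">#include <ranges>
#include <vector>

// A view of everything after the first element; empty input stays empty.
inline auto tail(std::vector<int>& vec) {
    auto [b, e] = std::ranges::subrange{vec};
    if (b != e) ++b;
    return std::ranges::subrange{b, e};
}
</pre>
The result is a view, so it refers to the elements of <code>vec</code> rather than copying them.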
<h2>ref_view</h2>
Although not officially merged yet, C++20 will have a <code>std::ranges::ref_view<R></code> which, like <code>std::reference_wrapper</code> is, well, a wrapper around a reference. In the case of <code>ref_view</code>, it is a reference to a range. It turns an lvalue container like <code>std::vector<int>&</code> into a <code>View</code> of the same elements that is cheap to copy: it simply wraps a pointer to the vector.
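A minimal sketch of <code>ref_view</code> as it ended up in C++20's <code><ranges></code> header (the function <code>double_all</code> is invented for the example):
<pre class="brush: cpp; notranslate">#include <ranges>
#include <vector>

// The wrapper is a View; the container it wraps is not.
static_assert(std::ranges::view<std::ranges::ref_view<std::vector<int>>>);
static_assert(!std::ranges::view<std::vector<int>>);

// Taking the view by value copies the wrapper, not the vector's elements,
// so changes made through it are visible in the original container.
inline void double_all(std::ranges::ref_view<std::vector<int>> rv) {
    for (int& x : rv) x *= 2;
}
</pre>
Calling <code>double_all(std::ranges::ref_view{v})</code> modifies <code>v</code> in place.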
<h1>Range Generators</h1>
Now we get to the really fun stuff. The <code><ranges></code> header has a couple of ways to generate new ranges of values, including <code>std::view::iota</code> that we saw above. Here is how to use them, and what they mean:
<table>
<thead>
<tr>
<th>Syntax</th>
<th>Semantics</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>view::iota(i)</code></td>
<td>Given the incrementable object <code>i</code>, generates an infinite range of values like <code>[i,i+1,i+2,i+3,...)</code>.</td>
</tr>
<tr>
<td><code>view::iota(i,j)</code></td>
<td>Given the incrementable object <code>i</code> and some other object <code>j</code> that is comparable to <code>i</code> (but not necessarily the same type), generates a range of values like <code>[i,i+1,i+2,i+3,...,j-1]</code>. Note that the upper bound (<code>j</code>) is excluded, which makes this form usable with iterator/sentinel pairs. It can also be used to generate the indices of a range with <code>view::iota(0u, ranges::size(rng))</code>.</td>
</tr>
<tr>
<td><code>view::single(x)</code></td>
<td>Construct a one-element view of the value <code>x</code>; that is, <code>[x]</code>.</td>
</tr>
<tr>
<td><code>view::empty<T></code></td>
<td>A zero-element view of elements of type <code>T</code>.</td>
</tr>
<tr>
<td><code>view::counted(it, n)</code></td>
<td>Given an iterator <code>it</code> and a count <code>n</code>, constructs a finite range of <code>n</code> elements starting at the element denoted by <code>it</code>.</td>
</tr>
</tbody>
</table>
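Here is a short sketch exercising the rows of this table, assuming the namespace spelling that shipped in C++20 (<code>std::views::</code> rather than <code>view::</code>). The two helper functions are made up for illustration.
<pre class="brush: cpp; notranslate">#include <ranges>
#include <vector>

static_assert(std::views::single(7).size() == 1);
static_assert(std::views::empty<int>.empty());

// The squares of 0, 1, ..., n-1, driven by a bounded iota.
inline std::vector<int> first_squares(int n) {
    std::vector<int> out;
    for (int i : std::views::iota(0, n)) out.push_back(i * i);
    return out;
}

// Sums n ints starting at p; counted turns (iterator, count) into a range.
inline int sum_counted(int const* p, int n) {
    int total = 0;
    for (int x : std::views::counted(p, n)) total += x;
    return total;
}
</pre>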
<h1>Range Adaptors</h1>
This is the really, really fun stuff. The true power of ranges lies in the ability to create pipelines that transform ranges on the fly. The range-v3 library has dozens of useful range adaptors. C++20 will only be getting a handful, but expect the set to grow over time.
<table>
<thead>
<tr>
<th>Syntax</th>
<th>Semantics</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>r | view::all</code></td>
<td>Create a <code>View</code> over all the elements in <code>Range</code> <code>r</code>. Perhaps <code>r</code> is already a <code>View</code>. If not, turn it into one with <code>ref_view</code> if possible, or <code>subrange</code> failing that. Rvalue containers are not “viewable,” and so code like <code>std::vector<int>{} | view::all</code> will fail to compile.</td>
</tr>
<tr>
<td><code>r | view::filter(pred)</code></td>
<td>Given a viewable range <code>r</code> and a predicate <code>pred</code>, return a <code>View</code> that consists of all the elements <code>e</code> for which <code>invoke(pred, e)</code> returns <code>true</code>.</td>
</tr>
<tr>
<td><code>r | view::transform(fn)</code></td>
<td>Given a viewable range <code>r</code> and a function <code>fn</code>, return a <code>View</code> that consists of all the elements of <code>r</code> transformed with <code>fn</code>.</td>
</tr>
<tr>
<td><code>r | view::reverse</code></td>
<td>Given a viewable range <code>r</code>, return a <code>View</code> that iterates <code>r</code>’s values in reverse order.</td>
</tr>
<tr>
<td><code>r | view::take(n)</code></td>
<td>Given a viewable range <code>r</code>, return a <code>View</code> containing the first <code>n</code> elements of <code>r</code>, or all the elements of <code>r</code> if <code>r</code> has fewer than <code>n</code> elements.</td>
</tr>
<tr>
<td><code>r | view::join</code></td>
<td>Given a viewable range of ranges, flatten all the ranges into a single range.</td>
</tr>
<tr>
<td><code>r | view::split(r2)</code></td>
<td>Given a viewable range <code>r</code> and a pattern range <code>r2</code>, return a <code>View</code> of <code>View</code>s where the inner ranges are delimited by <code>r2</code>. Alternatively, the delimiter can be a single value <code>v</code> which is treated as if it were <code>view::single(v)</code>.</td>
</tr>
<tr>
<td><code>r | view::common</code></td>
<td>Given a viewable range <code>r</code>, return a <code>View</code> for which the begin and end iterators of the range have the same type. (Some ranges use a sentinel for the end position.) This range adaptor is useful primarily as a means of interfacing with older code (like the <code>std::</code> algorithms) that expects begin and end to have the same type.</td>
</tr>
</tbody>
</table>
These adaptors can be chained, so for instance, you can do the following:
<pre class="brush: cpp; notranslate">using namespace std;
for ( auto && e : r | view::filter(pred)
                    | view::transform(fn) ) {
    // Iterate over filtered, transformed range
}
</pre>
Of course, you can also use range adaptor pipelines as arguments to the range-based algorithms in <code>std::ranges</code>:
<pre class="brush: cpp; notranslate">using namespace std;
// Insert a filtered, transformed range into
// the back of container `v`.
ranges::copy( r | view::filter(pred)
                | view::transform(fn),
              back_inserter(v) );
</pre>
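For something you can paste into a file, here is a sketch of two such pipelines with concrete predicates, assuming the <code>std::views::</code> spelling that shipped in C++20. Both function names are made up.
<pre class="brush: cpp; notranslate">#include <algorithm>
#include <iterator>
#include <ranges>
#include <vector>

// Copy the squares of the even elements of r into a new vector.
inline std::vector<int> squares_of_evens(std::vector<int> const& r) {
    auto is_even = [](int i) { return i % 2 == 0; };
    auto square = [](int i) { return i * i; };
    std::vector<int> v;
    std::ranges::copy(r | std::views::filter(is_even)
                        | std::views::transform(square),
                      std::back_inserter(v));
    return v;
}

// Laziness means an infinite source is fine as long as something bounds it.
inline std::vector<int> first_even_squares(int n) {
    std::vector<int> v;
    for (int x : std::views::iota(1)
                     | std::views::filter([](int i) { return i % 2 == 0; })
                     | std::views::transform([](int i) { return i * i; })
                     | std::views::take(n))
        v.push_back(x);
    return v;
}
</pre>
Nothing is computed until the loop or the algorithm pulls an element through the pipeline, which is why <code>iota(1)</code> with no upper bound is safe in the second function.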
Lazily adapting ranges is a powerful way to structure your programs. If you want a demonstration of how far this programming style can take you, see my <a href="https://www.youtube.com/watch?v=mFUXNMfaciE">CppCon keynote on ranges from 2015</a>, or just skim the code of the <a href="https://github.com/ericniebler/range-v3/blob/master/example/calendar.cpp">calendar application</a> I describe there, and note the lack of loops, branches, and overt state manipulation. ’Nuff said.
<h1>Future Directions</h1>
Clearly, C++20 is getting a lot of new functionality in support of ranges. Getting here has taken a long time, mostly because nobody had ever built a fully general, industrial strength, generic library using the C++20 language support for concepts before. But now we are over that hump. All the foundational pieces are in place, and we’ve accrued a lot of knowledge in the process. Expect the feature set to expand rapidly post-C++20. There are already papers in flight.
Things currently in the works include:
<ul>
<li>Constructors for the standard containers that accept ranges,</li>
<li>A <code>take_while</code> range adaptor that accepts a predicate and returns a view of the first N elements for which the predicate evaluates to <code>true</code>,</li>
<li>A <code>drop</code> range adaptor that returns a view after dropping the first N elements of the input range,</li>
<li>A <code>drop_while</code> view that drops elements from an input range that satisfy a predicate,</li>
<li>An <code>istream_view</code> that is parameterized on a type and that reads elements of that type from a standard <code>istream</code>,</li>
<li>A <code>zip</code> view that takes N ranges and produces a view where the elements are N-tuples of the elements of the input ranges, and</li>
<li>A <code>zip_with</code> view that takes N ranges and an N-ary function, and produces a view where the elements are the result of calling the function with the elements of the input ranges.</li>
</ul>
And there’s more, lots more in range-v3 that has proven useful and will eventually be proposed by myself or some other interested range-r. Things I would especially like to see:
<ul>
<li>An iterator façade class template like range-v3’s <code>basic_iterator</code>;</li>
<li>A view façade class template like range-v3’s <code>view_facade</code>;</li>
<li>Range-ified versions of the numeric algorithms (e.g., <code>accumulate</code>, <code>partial_sum</code>, <code>inner_product</code>);</li>
<li>More range generators and adaptors, like <code>view::chunk</code>, <code>view::concat</code>, <code>view::group_by</code>, <code>view::cycle</code>, <code>view::slice</code>, <code>view::stride</code>, <code>view::generate[_n]</code>, <code>view::repeat[_n]</code>, a <code>view::join</code> that takes a delimiter, <code>view::intersperse</code>, <code>view::unique</code>, and <code>view::cartesian_product</code>, to name the more important ones; and</li>
<li>A “complete” set of actions to go along with the views. Actions, like the adaptors in the <code>view::</code> namespace, operate on ranges and compose into pipelines, but actions act eagerly on whole containers, and they are potentially mutating. (The views are non-mutating.)</li>
</ul>
With actions, it should be possible to do:
<pre class="brush: cpp; notranslate">v = move(v) | action::sort | action::unique;
</pre>
…to sort a vector and remove all duplicate elements.
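Until actions exist in the standard, the same effect can be had with the eager algorithms in <code>std::ranges</code>. A sketch, where <code>sort_unique</code> is a made-up name and the container is taken by value to mirror the <code>move(v)</code> above:
<pre class="brush: cpp; notranslate">#include <algorithm>
#include <vector>

// Sorts v and removes duplicate elements, returning the result.
inline std::vector<int> sort_unique(std::vector<int> v) {
    std::ranges::sort(v);
    auto leftovers = std::ranges::unique(v);  // subrange of stale elements
    v.erase(leftovers.begin(), leftovers.end());
    return v;
}
</pre>
Usage would be <code>v = sort_unique(std::move(v));</code>. It gets the job done, but it does not compose into a pipeline the way an action would.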
And I haven’t even mentioned asynchronous ranges yet. But that’s a whole other <a href="http://ericniebler.com/2017/08/17/ranges-coroutines-and-react-early-musings-on-the-future-of-async-in-c/">blog post</a>. 🙂
<h1>Summary</h1>
C++20 is rapidly approaching, and now that the Ranges work has been officially merged into the working draft, I have been hearing from Standard Library vendors who are starting to think about implementing all of this. Only GCC is in a position to ship the ranges support any time soon, since it is the only compiler currently shipping with support for concepts. But clang has a <a href="https://github.com/saarraz/clang-concepts">concepts branch</a> which is already usable, so there is hope for concepts — and ranges — in clang trunk sometime in the not-too-distant future. And Microsoft has publicly committed to supporting all of C++20 including concepts and ranges, and the conformance of the Microsoft compiler has been rapidly improving, recently gaining the ability to <a href="https://blogs.msdn.microsoft.com/vcblog/2018/11/07/use-the-official-range-v3-with-msvc-2017-version-15-9/">compile range-v3</a>. So things are looking good there, too.
It’s a stRANGE new world. Thanks for reading.
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2018/12/05/standard-ranges/feed/</wfw:commentRss>
<slash:comments>51</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">1058</post-id> </item>
<item>
<title>Ranges, Coroutines, and React: Early Musings on the Future of Async in C++</title>
<link>https://ericniebler.com/2017/08/17/ranges-coroutines-and-react-early-musings-on-the-future-of-async-in-c/?utm_source=rss&utm_medium=rss&utm_campaign=ranges-coroutines-and-react-early-musings-on-the-future-of-async-in-c</link>
<comments>https://ericniebler.com/2017/08/17/ranges-coroutines-and-react-early-musings-on-the-future-of-async-in-c/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Thu, 17 Aug 2017 21:41:08 +0000</pubDate>
<category><![CDATA[coroutines]]></category>
<category><![CDATA[generic-programming]]></category>
<category><![CDATA[library-design]]></category>
<category><![CDATA[ranges]]></category>
<category><![CDATA[reactive]]></category>
<category><![CDATA[std]]></category>
<category><![CDATA[std2]]></category>
<guid isPermaLink="false">http://ericniebler.com/?p=1029</guid>
<description><![CDATA[Disclaimer: these are my early thoughts. None of this is battle ready. You’ve been warned. Hello, Coroutines! At the recent C++ Committee meeting in Toronto, the Coroutines TS was forwarded to ISO for publication. That roughly means that the coroutine <a class="more-link" href="https://ericniebler.com/2017/08/17/ranges-coroutines-and-react-early-musings-on-the-future-of-async-in-c/">Continue reading Ranges, Coroutines, and React: Early Musings on the Future of Async in C++→</a>]]></description>
<content:encoded><![CDATA[Disclaimer: these are my early thoughts. None of this is battle ready. You’ve been warned.
<h2>Hello, Coroutines!</h2>
At the recent C++ Committee meeting in Toronto, the Coroutines TS was forwarded to ISO for publication. That roughly means that the coroutine “feature branch” is finished, and is ready to be merged into trunk (standard C++) after a suitable vetting period (no less than a year). That puts it on target for C++20. What does that mean for idiomatic modern C++?
Lots, actually. With the resumable functions (aka, stackless coroutines) from the Coroutines TS, we can do away with callbacks, event loops, and future chaining (<code>future.then()</code>) in our asynchronous APIs. Instead, our APIs can return “awaitable” types. Programmers can then just use these APIs in a synchronous-looking style, spamming <code>co_await</code> in front of any async API call and returning an awaitable type.
This is a bit abstract, so <a href="https://blogs.msdn.microsoft.com/vcblog/2017/02/02/using-ibuv-with-c-resumable-functions/">this blog post</a> makes it more concrete. It describes how the author wrapped the interface of libuv (a C library that provides the asynchronous I/O in Node.js) in awaitables. In libuv, all async APIs take a callback and loop on an internal event loop, invoking the callback when the operation completes. Wrapping the interfaces in awaitables makes for a much better experience without the callbacks and the <a href="https://en.wikipedia.org/wiki/Inversion_of_control">inversion of control</a> they bring.
Below, for instance, is a function that (asynchronously) opens a file, reads from it, writes it to <code>stdout</code>, and closes it:
<pre class="brush: cpp; notranslate">auto start_dump_file( const std::string& str )
    -> future_t<void>
{
    // We can use the same request object for
    // all file operations as they don't overlap.
    static_buf_t<1024> buffer;
    fs_t openreq;
    uv_file file = co_await fs_open(uv_default_loop(),
                                    &openreq,
                                    str.c_str(),
                                    O_RDONLY,
                                    0);
    if (file > 0)
    {
        while (1)
        {
            fs_t readreq;
            int result = co_await fs_read(uv_default_loop(),
                                          &readreq,
                                          file,
                                          &buffer,
                                          1,
                                          -1);
            if (result <= 0)
                break;
            buffer.len = result;
            fs_t req;
            (void) co_await fs_write(uv_default_loop(),
                                     &req,
                                     1 /*stdout*/,
                                     &buffer,
                                     1,
                                     -1);
        }
        fs_t closereq;
        (void) co_await fs_close(uv_default_loop(),
                                 &closereq,
                                 file);
    }
}
</pre>
You can see that this looks almost exactly like ordinary synchronous code, with two exceptions:
<ol>
<li>Calls to asynchronous operations are preceded with <code>co_await</code>, and</li>
<li>The function returns an awaitable type (<code>future_t<void></code>).</li>
</ol>
Very nice. But this code snippet does too much in my opinion. Wouldn’t it be nice to have a reusable component for asynchronously reading a file, separate from the bit about writing it to <code>stdout</code>? What would that even look like?
<h2>Hello, Ranges!</h2>
Also at the recent C++ Committee meeting in Toronto, the Ranges TS was forwarded to ISO for publication. This is the first baby step toward a complete reimagining and reimplementation of the C++ standard library in which interfaces are specified in terms of ranges in addition to iterators.
Once we have “range” as an abstraction, we can build range adaptors and build pipelines that transform ranges of values in interesting ways. More than just a curiosity, this is a very functional style that lets you program without a lot of state manipulation. The fewer states your program can be in, the easier it is for you to reason about your code, and the fewer bugs you’ll have. (For more on that, you can see my <a href="https://www.youtube.com/watch?v=mFUXNMfaciE">CppCon 2015 talk about ranges</a>; or just look at <a href="https://github.com/ericniebler/range-v3/blob/master/example/calendar.cpp">the source for a simple app</a> that prints a formatted calendar to <code>stdout</code>, and note the lack of loops, conditionals, and overt state manipulation.)
For instance, if we have a range of characters, we might want to lazily convert each character to lowercase. Using the <a href="https://github.com/ericniebler/range-v3/">range-v3 library</a>, you can do the following:
<pre class="brush: cpp; notranslate">std::string hello("Hello, World!");
using namespace ranges;
auto lower = hello
| view::transform([](char c){
return (char)std::tolower(c);});
</pre>
Now <code>lower</code> presents a view of <code>hello</code> where each character is run through the <code>tolower</code> transform on the fly.
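To make “on the fly” concrete without pulling in range-v3, here is a minimal hand-rolled sketch of a lazy transform view. The names <code>lazy_transform_view</code> and <code>lazy_transform</code> are hypothetical stand-ins, not range-v3’s actual implementation, and the sketch assumes C++14 and a source range that outlives the view.

```cpp
#include <cctype>
#include <string>
#include <utility>

// Hypothetical stand-in for a lazy transform view (not range-v3's code).
// It stores a pointer to the source range and the function, nothing else.
template <class Range, class Fn>
class lazy_transform_view {
public:
    lazy_transform_view(const Range& rng, Fn fn)
      : rng_(&rng), fn_(std::move(fn)) {}

    class iterator {
    public:
        iterator(typename Range::const_iterator it, const Fn* fn)
          : it_(it), fn_(fn) {}
        // The function runs here, only when an element is requested.
        auto operator*() const { return (*fn_)(*it_); }
        iterator& operator++() { ++it_; return *this; }
        bool operator!=(const iterator& other) const { return it_ != other.it_; }
    private:
        typename Range::const_iterator it_;
        const Fn* fn_;
    };

    iterator begin() const { return iterator(rng_->begin(), &fn_); }
    iterator end() const { return iterator(rng_->end(), &fn_); }

private:
    const Range* rng_;  // the source must outlive the view
    Fn fn_;
};

// Helper that deduces the template arguments.
template <class Range, class Fn>
lazy_transform_view<Range, Fn> lazy_transform(const Range& rng, Fn fn) {
    return lazy_transform_view<Range, Fn>(rng, std::move(fn));
}
```

Constructing the view calls the function zero times. Each dereference during iteration calls it once, so a single range-based <code>for</code> loop over <code>lazy_transform(hello, f)</code> invokes <code>f</code> exactly <code>hello.size()</code> times.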
Although the range adaptors haven’t been standardized yet, the Committee has already put its stamp of approval on the overall direction, including adaptors and pipelines. (See <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4128.html">N4128</a> for the ranges position paper.) Someday, these components will all be standard, and the C++ community can encourage their use in idiomatic modern C++.
<h2>Ranges + Coroutines == ?</h2>
With coroutines, ranges become even more powerful. For one thing, the <code>co_yield</code> keyword makes it trivial to define your own (synchronous) ranges. Already with range-v3 you can use the following code to define a range of all the integers and apply a filter to them:
<pre class="brush: cpp; notranslate">#include <iostream>
#include <range/v3/all.hpp>
#include <range/v3/experimental/utility/generator.hpp>
using namespace ranges;
// Define a range of all the unsigned shorts:
experimental::generator<unsigned short> ushorts()
{
    unsigned short u = 0;
    do { co_yield u; } while (++u);
}
int main()
{
    // Filter all the even unsigned shorts:
    auto evens = ushorts()
        | view::filter([](auto i) {
              return (i % 2) == 0; });
    // Write the evens to cout:
    copy( evens, ostream_iterator<>(std::cout, "\n") );
}
</pre>
Put the above code in a .cpp file, compile with a recent clang and <code>-fcoroutines-ts -std=gnu++1z</code>, and away you go. Congrats, you’re using coroutines and ranges together. This is a trivial example, but you get the idea.
<h2>Asynchronous Ranges</h2>
That’s great and all, but it’s not asynchronous, so who cares? If it were asynchronous, what would that look like? Moving to the first element of the range would be an awaitable operation, and then moving to every subsequent element would also be awaitable.
In the ranges world, moving to the first element of a range <code>R</code> is spelled “<code>auto it = begin(R)</code>”, and moving to subsequent elements is spelled “<code>++it</code>”. So for an asynchronous range, those two operations should be awaitable. In other words, given an asynchronous range <code>R</code>, we should be able to do:
<pre class="brush: cpp; notranslate">// Consume a range asynchronously
for( auto it = co_await begin(R);
     it != end(R);
     co_await ++it )
{
    auto && e = *it;
    do_something( e );
}
</pre>
In fact, the Coroutines TS anticipates this and has an asynchronous range-based <code>for</code> loop for just this abstraction. The above code can be rewritten:
<pre class="brush: cpp; notranslate">// Same as above:
for co_await ( auto&& e : R )
{
do_something( e );
}
</pre>
Now we have two different but closely related abstractions: Range and AsynchronousRange. In the first, <code>begin</code> returns something that models an Iterator. In the second, <code>begin</code> returns an Awaitable of an AsynchronousIterator. What does that buy us?
<h2>Asynchronous Range Adaptors</h2>
Once we have an abstraction, we can program against that abstraction. Today we have a <code>view::transform</code> that knows how to operate on synchronous ranges. It can be extended to also work with asynchronous ranges. So can all the other range adaptors: <code>filter</code>, <code>join</code>, <code>chunk</code>, <code>group_by</code>, <code>interleave</code>, <code>transpose</code>, etc, etc. So it will be possible to build a pipeline of operations, and apply the pipeline to a synchronous range to get a (lazy) synchronous transformation, and apply the same exact pipeline to an asynchronous range to get a non-blocking asynchronous transformation. The benefits are:
<ul>
<li>The same functional style can be used for synchronous and asynchronous code, reusing the same components and the same idioms.</li>
<li>Asynchronous code, when expressed with ranges and transformations, can be made largely stateless, as can be done today with synchronous range-based code. This leads to programs with fewer states and hence fewer state-related bugs.</li>
<li>Range-based code composes very well and encourages a decomposition of problems into orthogonal pieces which are easily testable in isolation. (E.g., a <code>view::filter</code> component can be used with any input range, synchronous or asynchronous, and can be easily tested in isolation of any particular range.)</li>
</ul>
Another way to look at this is that synchronous ranges are an example of a pull-based interface: the user extracts elements from the range and processes them one at a time. Asynchronous ranges, on the other hand, represent more of a push-based model: things happen when data shows up, whenever that may be. This is akin to the reactive style of programming.
By using ranges and coroutines together, we unify push and pull based idioms into a consistent, functional style of programming. And that’s going to be important, I think.
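The pull/push distinction can be sketched in plain, synchronous C++, with no coroutines involved. Everything here is hypothetical scaffolding (<code>to_lower</code>, <code>pull_lowered</code>, and <code>PushSource</code> are invented for this illustration); the only point is that the same transformation step can be driven by the consumer or by the producer.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <utility>

// The shared, stateless transformation step.
char to_lower(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Pull: the consumer drives, extracting one element at a time.
std::string pull_lowered(const std::string& source) {
    std::string out;
    for (char c : source)
        out.push_back(to_lower(c));
    return out;
}

// Push: the producer drives, calling the subscriber whenever data shows up.
// (Hypothetical class; a real reactive source would also signal completion
// and errors.)
class PushSource {
public:
    void subscribe(std::function<void(char)> on_next) {
        on_next_ = std::move(on_next);
    }
    void emit(const std::string& data) {
        if (!on_next_) return;
        for (char c : data)
            on_next_(c);
    }
private:
    std::function<void(char)> on_next_;
};
```

A subscriber that appends <code>to_lower(c)</code> to a string sees the same characters as <code>pull_lowered</code> produces, but it receives them whenever <code>emit</code> is called rather than when it asks for them.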
<h2>Back to LibUV</h2>
Earlier, we wondered about a reusable libuv component that used its asynchronous operations to read a file. Now we know what such a component could look like: an asynchronous range. Let’s start with an asynchronous range of characters. (Here I’m glossing over the fact that libuv deals with UTF-8, not ASCII. I’m also ignoring errors, which is another can of worms.)
<pre class="brush: cpp; notranslate">auto async_file( const std::string& str )
  -> async_generator<char>
{
    // We can use the same request object for
    // all file operations as they don't overlap.
    static_buf_t<1024> buffer;
    fs_t openreq;
    uv_file file = co_await fs_open(uv_default_loop(),
                                    &openreq,
                                    str.c_str(),
                                    O_RDONLY,
                                    0);
    if (file > 0)
    {
        while (1)
        {
            fs_t readreq;
            int result = co_await fs_read(uv_default_loop(),
                                          &readreq,
                                          file,
                                          &buffer,
                                          1,
                                          -1);
            if (result <= 0)
                break;
            // Yield the characters one at a time.
            for ( int i = 0; i < result; ++i )
            {
                co_yield buffer.buffer[i];
            }
        }
        fs_t closereq;
        (void) co_await fs_close(uv_default_loop(),
                                 &closereq,
                                 file);
    }
}
</pre>
The <code>async_file</code> function above asynchronously reads a block of text from the file and then <code>co_yield</code>s the individual characters one at a time. The result is an asynchronous range of characters: <code>async_generator<char></code>. (For an implementation of <code>async_generator</code>, look in <a href="https://github.com/lewissbaker/cppcoro">Lewis Baker’s cppcoro library</a>.)
Now that we have an asynchronous range of characters representing the file, we can apply transformations to it. For instance, we could convert all the characters to lowercase:
<pre class="brush: cpp; notranslate">// Create an asynchronous range of characters read
// from a file and lower-cased:
auto async_lower = async_file("some_input.txt")
| view::transform([](char c){
return (char)std::tolower(c);});
</pre>
That’s the same transformation we applied above to a <code>std::string</code> synchronously, but here it’s used asynchronously. Such an asynchronous range can then be passed through further transforms, asynchronously written out, or passed to an asynchronous <code>std::</code> algorithm (because we’ll need those, too!)
<h2>One More Thing</h2>
I hear you saying, “Processing a file one character at a time like this would be too slow! I want to operate on chunks.” The above <code>async_file</code> function is still doing too much. It should be an asynchronous range of chunks. Let’s try again:
<pre class="brush: cpp; notranslate">auto async_file_chunk( const std::string& str )
  -> async_generator<static_buf_t<1024>&>
{
    // We can use the same request object for
    // all file operations as they don't overlap.
    static_buf_t<1024> buffer;
    fs_t openreq;
    uv_file file = co_await fs_open(uv_default_loop(),
                                    &openreq,
                                    str.c_str(),
                                    O_RDONLY,
                                    0);
    if (file > 0)
    {
        while (1)
        {
            fs_t readreq;
            int result = co_await fs_read(uv_default_loop(),
                                          &readreq,
                                          file,
                                          &buffer,
                                          1,
                                          -1);
            if (result <= 0)
                break;
            // Just yield the buffer.
            buffer.len = result;
            co_yield buffer;
        }
        fs_t closereq;
        (void) co_await fs_close(uv_default_loop(),
                                 &closereq,
                                 file);
    }
}
</pre>
Now if I want to, I can asynchronously read a block and asynchronously write the block, as the original code was doing, but while keeping those components separate, as they should be.
For some uses, a flattened view would be more convenient. No problem. That’s what the adaptors are for. If <code>static_buf_t</code> is a (synchronous) range of characters, we already have the tools we need:
<pre class="brush: cpp; notranslate">// Create an asynchronous range of characters read from a
// chunked file and lower-cased:
auto async_lower = async_file_chunk("some_input.txt")
| view::join
| view::transform([](char c){
return (char)std::tolower(c);});
</pre>
Notice the addition of <code>view::join</code>. Its job is to take a range of ranges and flatten it. Let’s see what joining an asynchronous range might look like:
<pre class="brush: cpp; notranslate">template <class AsyncRange>
auto async_join( AsyncRange&& rng )
  -> async_generator<range_value_t<
       async_range_value_t<AsyncRange>>>
{
    for co_await ( auto&& chunk : rng )
    {
        for ( auto&& e : chunk )
            co_yield e;
    }
}
</pre>
We (asynchronously) loop over the outer range, then (synchronously) loop over the inner ranges, and <code>co_yield</code> each value. Pretty easy. From there, it’s just a matter of rigging up <code>operator|</code> to <code>async_join</code> to make joining work in pipelines. (A fully generic <code>view::join</code> will be more complicated than that since both the inner and outer ranges can be either synchronous or asynchronous, but this suffices for now.)
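For comparison, here is what the same two-level loop looks like in purely synchronous code, as a rough sketch. <code>sync_join</code> and <code>lower_all</code> are hypothetical names; a callback stands in for <code>co_yield</code>, and a <code>std::vector<std::string></code> stands in for the range of chunks.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Flatten a range of ranges by handing every inner element to a sink.
// The sink plays the role that co_yield plays in async_join.
template <class RangeOfRanges, class Sink>
void sync_join(const RangeOfRanges& rng, Sink sink) {
    for (const auto& chunk : rng)    // outer loop: one chunk at a time
        for (const auto& e : chunk)  // inner loop: elements of that chunk
            sink(e);
}

// Join the chunks and lower-case each character along the way.
std::string lower_all(const std::vector<std::string>& chunks) {
    std::string out;
    sync_join(chunks, [&out](char c) {
        out.push_back(
            static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
    });
    return out;
}
```

The only structural difference from <code>async_join</code> is that the outer loop here is an ordinary <code>for</code> rather than an awaiting one.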
<h2>Summary</h2>
With ranges and coroutines together, we can unify the push and pull programming idioms, bringing ordinary C++ and reactive C++ closer together. The C++ Standard Library is already evolving in this direction, and I’m working to make that happen both on the Committee and internally at Facebook.
There are LOTS of open questions. How well does this perform at runtime? Does this scale? Is it flexible enough to handle lots of interesting use cases? How do we handle errors in the middle of an asynchronous pipeline? What about splits and joins in the async call graph? Can this handle streaming interfaces? And so on. I’ll be looking into all this, but at least for now I have a promising direction, and that’s fun.
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2017/08/17/ranges-coroutines-and-react-early-musings-on-the-future-of-async-in-c/feed/</wfw:commentRss>
<slash:comments>20</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">1029</post-id> </item>
<item>
<title>Post-Conditions on Self-Move</title>
<link>https://ericniebler.com/2017/03/31/post-conditions-on-self-move/?utm_source=rss&utm_medium=rss&utm_campaign=post-conditions-on-self-move</link>
<comments>https://ericniebler.com/2017/03/31/post-conditions-on-self-move/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Fri, 31 Mar 2017 15:50:21 +0000</pubDate>
<category><![CDATA[c++11]]></category>
<category><![CDATA[library-design]]></category>
<guid isPermaLink="false">http://ericniebler.com/?p=992</guid>
<description><![CDATA[UPDATE April 8, 2017 This post has been edited since publication to reflect my evolving understanding. As a result of the issues raised in this post, it’s possible that the committee decides to strengthen the post-conditions on move, so the <a class="more-link" href="https://ericniebler.com/2017/03/31/post-conditions-on-self-move/">Continue reading Post-Conditions on Self-Move→</a>]]></description>
<content:encoded><![CDATA[UPDATE April 8, 2017 This post has been edited since publication to reflect my evolving understanding. As a result of the issues raised in this post, it’s possible that the committee decides to strengthen the post-conditions on move, so the recommendations made here may evolve further. Stay tuned.
TL;DR: In addition to the usual rule about move operations leaving the source object in a valid but unspecified state, we can add an additional rule:
Self-move assignment should “work” and at the very least leave the object in a valid but unspecified state.
<h2>Discussion</h2>
What do you think the following code should do?
<pre class="brush: cpp; notranslate">X x = {/*something*/};
x = std::move(x);
</pre>
Yes, it’s dumb, but with our alias-happy language, it can happen. So what does the standard say about this? For that we turn to [res.on.arguments]/p1.3 taken from the library introduction (emphasis mine):
<blockquote>
If a function argument binds to an rvalue reference parameter, the implementation may assume that this parameter is a unique reference to this argument. […] If a program casts an lvalue to an xvalue while passing that lvalue to a library function (e.g. by calling the function with the argument <code>std::move(x)</code>), the program is effectively asking that function to treat that lvalue as a temporary. The implementation is free to optimize away aliasing checks which might be needed if the argument <strike>was</strike>were an lvalue.
</blockquote>
(I fixed the grammar mistake because I am a Huge Dork.) The above seems to say that <code>std::swap(x, x)</code> is playing with fire, because <code>std::swap</code> is implemented as follows:
<pre class="brush: cpp; notranslate">template <class T>
void swap(T& a, T& b) {
    auto x(std::move(a));
    a = std::move(b); // Here be dragons
    b = std::move(x);
}
</pre>
If <code>a</code> and <code>b</code> refer to the same object, the second line of <code>std::swap</code> does a self-move assign. Blamo! Undefined behavior, right?
Such was what I thought when I first wrote this post until Howard Hinnant drew my attention to the requirements table for the MoveAssignable concept, which says that for the expression <code>t = rv</code> (emphasis mine):
<blockquote>
If <code>t</code> and <code>rv</code> do not refer to the same object, <code>t</code> is equivalent to the value of <code>rv</code> before the assignment […] <code>rv</code>’s state is unspecified. [ Note: rv must still meet the requirements of the library component that is using it, whether or not <code>t</code> and <code>rv</code> refer to the same object. […] –end note]
</blockquote>
Ah, ha! So here we have it. After a self-move, the object is required to be in a valid-but-unspecified state.
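If you want to see the self-move inside <code>std::swap(x, x)</code> for yourself, a small probe type will do. This is only a sketch: <code>SelfMoveProbe</code> and the global counter <code>g_self_move_assigns</code> are made up for this illustration, and the exact number of self-move assignments is an implementation detail of your standard library.

```cpp
#include <utility>

// Counts how many times a move assignment saw this == &that.
int g_self_move_assigns = 0;

// Hypothetical probe type: movable, and it reports self-move assignment.
struct SelfMoveProbe {
    int value;

    explicit SelfMoveProbe(int v) : value(v) {}

    SelfMoveProbe(SelfMoveProbe&& that) noexcept : value(that.value) {}

    SelfMoveProbe& operator=(SelfMoveProbe&& that) noexcept {
        if (this == &that)
            ++g_self_move_assigns;  // record it, but carry on regardless
        value = that.value;         // harmless when this == &that
        return *this;
    }
};
```

After <code>SelfMoveProbe p(42); std::swap(p, p);</code>, the counter is non-zero with the usual three-move implementation of <code>std::swap</code>, and <code>p.value</code> is still 42.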
My attention was drawn to this issue during a code review of a change I wanted to make to <a href="https://github.com/facebook/folly">Folly</a>’s <a href="https://github.com/facebook/folly/blob/master/folly/docs/Function.md"><code>Function</code></a> class template. I wanted to change this:
<pre class="brush: cpp; notranslate">Function& operator=(Function&& that) noexcept {
    if (this != &that) {
        // do the move
    }
    return *this;
}
</pre>
to this:
<pre class="brush: cpp; notranslate">Function& operator=(Function&& that) noexcept {
    assert(this != &that);
    // do the move
    return *this;
}
</pre>
The reason: let’s make moves as fast as possible and take advantage of the fact that Self-Moves Shouldn’t Happen. We assert, fix up the places that get it wrong, and make our programs an iota faster. Right?
Not so fast, said one clued-in reviewer. Self-swaps can happen quite easily in generic algorithms, and they shouldn’t trash the state of the object or the state of the program. This rang true, and so began my investigation.
A few Google searches later, I turned up <a href="https://stackoverflow.com/questions/9322174/move-assignment-operator-and-if-this-rhs/9322542#9322542">this StackOverflow gem from Howard Hinnant</a>. C++ wonks know Howard Hinnant. He’s the author of libc++, and an old-time C++ library developer. (Remember <a href="https://en.wikipedia.org/wiki/CodeWarrior">Metrowerks CodeWarrior</a>? No? Get off my lawn.) He also happens to be the person who wrote the proposal to add rvalue references to the language, so you know, Howard’s given this some thought. First Howard says this:
<blockquote>
Some will argue that <code>swap(x, x)</code> is a good idea, or just a necessary evil. And this, if the swap goes to the default swap, can cause a self-move-assignment.
I disagree that <code>swap(x, x)</code> is ever a good idea. If found in my own code, I will consider it a performance bug and fix it.
</blockquote>
But then in an Update, he backtracks:
<blockquote>
I’ve given this issue some more thought, and changed my position somewhat. I now believe that assignment should be tolerant of self assignment, but that the post conditions on copy assignment and move assignment are different:
For copy assignment:
<pre class="brush: cpp; notranslate">x = y;
</pre>
one should have a post-condition that the value of <code>y</code> should not be altered. When <code>&x == &y</code> then this postcondition translates into: self copy assignment should have no impact on the value of <code>x</code>.
For move assignment:
<pre class="brush: cpp; notranslate">x = std::move(y);
</pre>
one should have a post-condition that <code>y</code> has a valid but unspecified state. When <code>&x == &y</code> then this postcondition translates into: <code>x</code> has a valid but unspecified state. I.e. self move assignment does not have to be a no-op. But it should not crash. This post-condition is consistent with allowing <code>swap(x, x)</code> to just work […]
</blockquote>
When Howard Hinnant changes his mind about something having to do with library design, I sit up and take note, because it means that something very deep and subtle is going on. In this case, it means I’ve been writing bad move assignment operators for years.
By Howard’s yardstick — and by the requirements for the MoveAssignable concept in the standard, thanks Howard! — this move assignment operator is wrong:
<pre class="brush: cpp; notranslate">Function& operator=(Function&& that) noexcept {
    assert(this != &that); // No! Bad C++ programmer!
    // do the move
    return *this;
}
</pre>
Move assignment operators should accept self-moves and do no evil; indeed for <code>std::swap(f, f)</code> to work they must.
That’s not the same as saying it needs to preserve the object’s value, though, and not preserving the object’s value can be a performance win. It can save a branch, for instance. Here is how I reformulated <code>folly::Function</code>’s move assignment operator[*]:
<pre class="brush: cpp; notranslate">Function& operator=(Function&& that) noexcept {
    clear_(); // Free all of the resources owned by *this
    moveFrom_(that); // Move that's guts into *this.
    return *this;
}
</pre>
[*] Well, not exactly, but that’s the gist.
Of note is that <code>clear_()</code> leaves <code>*this</code> in a state such that it is still OK to <code>moveFrom_(*this)</code>, which is what happens when <code>that</code> and <code>*this</code> are the same object. In the case of <code>Function</code>, it just so happens that the effect of this code is to put the <code>Function</code> object back into the default-constructed state, obliterating the previous value. The particular final state of the object isn’t important though, so long as it is still valid.
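Here is the same clear-then-move shape applied to a toy type, as a sketch only. <code>Handle</code> is a hypothetical class that owns one heap-allocated <code>int</code>; it is not <code>folly::Function</code>’s actual code, and its <code>clear_()</code> and <code>moveFrom_()</code> merely mirror the names used above.

```cpp
#include <utility>

// Hypothetical owner of a single heap-allocated int.
class Handle {
public:
    Handle() noexcept = default;
    explicit Handle(int v) : p_(new int(v)) {}

    Handle(Handle&& that) noexcept : p_(that.p_) { that.p_ = nullptr; }

    Handle& operator=(Handle&& that) noexcept {
        clear_();         // free what *this owns; *this is now empty but valid
        moveFrom_(that);  // steal that's pointer; a no-op if &that == this
        return *this;
    }

    ~Handle() { clear_(); }

    bool empty() const noexcept { return p_ == nullptr; }
    int value() const { return *p_; }  // precondition: !empty()

private:
    void clear_() noexcept { delete p_; p_ = nullptr; }
    void moveFrom_(Handle& that) noexcept {
        p_ = that.p_;
        that.p_ = nullptr;
    }

    int* p_ = nullptr;
};
```

There is no branch on <code>this != &that</code>. A self-move leaves the <code>Handle</code> empty, which is a valid state, and <code>std::swap(h, h)</code> still hands the original value back because the temporary inside <code>swap</code> holds it during the self-move.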
<h2>Summary</h2>
So, as always we have the rule about moves:
Move operations should leave the source object in a valid but unspecified state.
And to that we can add an additional rule:
Self-moves should do no evil and leave the object in a valid but unspecified state.
If you want to go further and leave the object unmodified, that’s not wrong per se, but it’s not required by the standard as it is today. Changing the value is perfectly OK (Howard and the standard say so!), and doing that might save you some cycles.
TIL
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2017/03/31/post-conditions-on-self-move/feed/</wfw:commentRss>
<slash:comments>22</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">992</post-id> </item>
<item>
<title>Iterators++, Part 3</title>
<link>https://ericniebler.com/2015/03/03/iterators-plus-plus-part-3/?utm_source=rss&utm_medium=rss&utm_campaign=iterators-plus-plus-part-3</link>
<comments>https://ericniebler.com/2015/03/03/iterators-plus-plus-part-3/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Wed, 04 Mar 2015 02:32:56 +0000</pubDate>
<category><![CDATA[generic-programming]]></category>
<category><![CDATA[library-design]]></category>
<category><![CDATA[ranges]]></category>
<category><![CDATA[std]]></category>
<guid isPermaLink="false">http://ericniebler.com/?p=918</guid>
<description><![CDATA[This is the fourth and final post in a series about proxy iterators, the limitations of the existing STL iterator concept hierarchy, and what could be done about it. The first three posts describe the problems of proxy iterators, the <a class="more-link" href="https://ericniebler.com/2015/03/03/iterators-plus-plus-part-3/">Continue reading Iterators++, Part 3→</a>]]></description>
<content:encoded><![CDATA[This is the fourth and final post in a series about proxy iterators, the limitations of the existing STL iterator concept hierarchy, and what could be done about it. The <a href="http://ericniebler.com/2015/01/28/to-be-or-not-to-be-an-iterator/">first</a> <a href="http://ericniebler.com/2015/02/03/iterators-plus-plus-part-1/">three</a> <a href="http://ericniebler.com/2015/02/13/iterators-plus-plus-part-2/">posts</a> describe the problems of proxy iterators, the way to swap and move their elements, and how to rigorously define what an Iterator is.
This time around I’ll be focusing on the final problem: how to properly constrain the higher-order algorithms so that they work with proxy iterators.
<h2>A Unique Algorithm</h2>
In this post, I’ll be looking at one algorithm in particular and how it interacts with proxy iterators: <code>unique_copy</code>. Here is its prototype:
<pre class="brush: cpp; notranslate">template <class InIter, class OutIter, class Fn>
OutIter unique_copy(InIter first, InIter last,
OutIter result, Fn binary_pred);
</pre>
This algorithm copies elements from one range to another, skipping adjacent elements that are equal, using a predicate for the comparison.
Consider the following invocation:
<pre class="brush: cpp; notranslate">std::stringstream sin{"1 1 2 3 3 3 4 5"};
unique_copy(
std::istream_iterator<int>{sin},
std::istream_iterator<int>{},
std::ostream_iterator<int>{std::cout, " "},
std::equal_to<int>{} );
</pre>
This reads a bunch of ints from <code>sin</code> and writes the unique ones to <code>cout</code>. Simple, right? This code prints:
<pre><code>1 2 3 4 5
</code></pre>
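Wrapped into a function with the headers it needs, the invocation above looks roughly like this. The helper name <code>unique_ints</code> is invented for the sketch, and it writes to a string stream instead of <code>cout</code> so that the result can be inspected.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <sstream>
#include <string>

// Reads whitespace-separated ints from `input` and returns them with
// adjacent duplicates removed, each followed by a single space.
std::string unique_ints(const std::string& input) {
    std::istringstream in{input};
    std::ostringstream out;
    std::unique_copy(
        std::istream_iterator<int>{in},
        std::istream_iterator<int>{},
        std::ostream_iterator<int>{out, " "},
        std::equal_to<int>{});
    return out.str();
}
```

Calling <code>unique_ints("1 1 2 3 3 3 4 5")</code> returns <code>"1 2 3 4 5 "</code>, trailing space included, since <code>ostream_iterator</code> writes the delimiter after every element.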
Think for a minute how you would implement <code>unique_copy</code>. First you read an int from the stream. Then you write it out to the other stream. Then you read another int. You want to compare it to the last one. Ah! You need to save the last element locally so that you can do the comparisons. Interesting.
When I really want to understand how some part of the STL works, I check out how the feature is implemented in ye olde <a href="https://www.sgi.com/tech/stl/">SGI STL</a>. This codebase is so old, it may have first been written on parchment and compiled by monks. But it’s the cleanest and most straightforward STL implementation I know, and I recommend reading it through. Here, modulo some edits for readability, is the relevant part of <code>unique_copy</code>:
<pre class="brush: cpp; notranslate">// Copyright (c) 1994
// Hewlett-Packard Company
// Copyright (c) 1996
// Silicon Graphics Computer Systems, Inc.
template <class InIter, class OutIter, class Fn,
          class _Tp>
OutIter
__unique_copy(InIter first, InIter last,
              OutIter result,
              Fn binary_pred, _Tp*) {
  _Tp value = *first;
  *result = value;
  while (++first != last)
    if (!binary_pred(value, *first)) {
      value = *first;
      *++result = value;
    }
  return ++result;
}
</pre>
(The calling code ensures that <code>first != last</code>, which explains why this code skips that check. And the strange <code>_Tp*</code> argument is so that the iterator’s value type can be deduced; the monks couldn’t compile traits classes.) Note the <code>value</code> local variable on line 11, and especially note line 14, where it passes a value and a reference to <code>binary_pred</code>. Keep that in mind because it’s important!
<h2>The Plot Thickens</h2>
You probably know more about <code>unique_copy</code> now than you ever cared to. Why do I bring it up? Because it’s super problematic when used with proxy iterators. Think about what happens when you try to pass <code>vector<bool>::iterator</code> to the above <code>__unique_copy</code> function:
<pre class="brush: cpp; notranslate">std::vector<bool> vb{true, true, false, false};
using R = std::vector<bool>::reference;
__unique_copy(
vb.begin(), vb.end(),
std::ostream_iterator<bool>{std::cout, " "},
[](R b1, R b2) { return b1 == b2; }, (bool*)0 );
</pre>
This should write a “true” and a “false” to <code>cout</code>, but it doesn’t compile. Why? The lambda is expecting to be passed two objects of <code>vector<bool></code>’s proxy reference type, but remember how <code>__unique_copy</code> calls the predicate:
<pre class="brush: cpp; notranslate">if (!binary_pred(value, *first)) { /*...*/
</pre>
That’s a <code>bool&</code> and a <code>vector<bool>::reference</code>. Ouch!
They’re just bools, and bools are cheap to copy, so take them by value. Problem solved. Well, sure, but what if they weren’t bools? What if we proxied a sequence of things that are expensive to copy? Now the problem is harder.
So for lack of anything better (and pretending that bools are expensive to copy, bear with me), you write the lambda like this:
<pre class="brush: cpp; notranslate">[](bool& b1, R b2) { return b1 == b2; }
</pre>
Yuk. Now you port this code to another STL that happens to call the predicate with reversed arguments and the code breaks again. 🙁
My point is this: once we introduce proxy iterators into the mix, it becomes non-obvious how to define predicates for use with the algorithms. Sometimes the algorithms call the predicates with references, sometimes with values, and sometimes — like <code>unique_copy</code> — with a mix of both. Algorithms like <code>sort</code> first call the predicate one way, and then later call it another way. Vive la différence!
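For the cheap-to-copy case mentioned earlier, the by-value fix really is enough, and it can be checked against an algorithm that mixes values and references. In the sketch below, <code>sgi_style_unique_copy</code> is a hypothetical re-spelling of the SGI routine (renamed because identifiers with double underscores are reserved, with the value type taken from <code>iterator_traits</code> and an empty-range check added), and <code>unique_bools_by_value</code> is an invented helper.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Same shape as the SGI code: the predicate receives a local value
// on the left and whatever *first returns on the right.
template <class InIter, class OutIter, class Fn>
OutIter sgi_style_unique_copy(InIter first, InIter last,
                              OutIter result, Fn binary_pred) {
    if (first == last)
        return result;
    typename std::iterator_traits<InIter>::value_type value = *first;
    *result = value;
    while (++first != last)
        if (!binary_pred(value, *first)) {  // (bool&, proxy reference)
            value = *first;
            *++result = value;
        }
    return ++result;
}

// Takes the vector by value so that begin() yields the proxy iterator.
std::string unique_bools_by_value(std::vector<bool> vb) {
    std::ostringstream out;
    sgi_style_unique_copy(
        vb.begin(), vb.end(),
        std::ostream_iterator<bool>{out, " "},
        [](bool b1, bool b2) { return b1 == b2; });  // both args convert to bool
    return out.str();
}
```

Both a <code>bool&</code> and a <code>vector<bool>::reference</code> convert to <code>bool</code>, so the by-value lambda compiles no matter which mix the algorithm passes. With the default stream formatting, <code>{true, true, false, false}</code> comes out as <code>"1 0 "</code>.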
<h2>A Common Fix</h2>
This problem has a very simple solution in C++14: a generic lambda. We can write the above code simply, portably, and optimally as follows:
<pre class="brush: cpp; notranslate">std::vector<bool> vb{true, true, false, false};
std::unique_copy(
vb.begin(), vb.end(),
std::ostream_iterator<bool>{std::cout, " "},
[](auto&& b1, auto&& b2) { return b1 == b2; } );
</pre>
No matter what <code>unique_copy</code> throws at this predicate, it will accommodate it with grace and style.
But still. Polymorphic function objects feel like a big hammer. Some designs require monomorphic functions, like <code>std::function</code> or virtuals, or maybe even a function pointer if you have to interface with C. My point is, it feels wrong for the STL to require the use of a polymorphic function for correctness.
To restate the problem, we don’t know how to write a monomorphic predicate for <code>unique_copy</code> when our sequence is proxied because <code>value_type&</code> may not convert to <code>reference</code>, and <code>reference</code> may not convert to <code>value_type&</code>. If only there were some other type, some other reference-like type, they could both convert to…
But there is! If you read my <a href="http://ericniebler.com/2015/02/13/iterators-plus-plus-part-2/">last post</a>, you know about <code>common_reference</code>, a trait that computes a reference-like type (possibly a proxy) to which two other references can bind (or convert). In order for a proxy iterator to model the Iterator concept, I required that an iterator’s <code>reference</code> type and its <code>value_type&</code> must share a common reference. At the time, I insinuated that the only use for such a type is to satisfy the concept-checking machinery. But there’s another use for it: the common reference is the type we could use to define our monomorphic predicate.
I can imagine a future STL providing the following trait:
<pre class="brush: cpp; notranslate">// An iterator's common reference type:
template <InputIterator I>
using iterator_common_reference_t =
  common_reference_t<
    typename iterator_traits<I>::value_type &,
    typename iterator_traits<I>::reference>;
</pre>
We could use that trait to write the predicate as follows:
<pre class="brush: cpp; notranslate">using I = vector<bool>::iterator;
using C = iterator_common_reference_t<I>;
auto binary_pred = [](C r1, C r2) {
    return r1 == r2;
};
</pre>
That’s certainly a fair bit of hoop-jumping just to define a predicate. But it’s not some new complexity that I’m introducing. <code>unique_copy</code> and <code>vector<bool></code> have been there since 1998. I’m just trying to make them play nice.
And these hoops almost never need to be jumped. You’ll only need to use the common reference type when all of the following are true: (a) you are dealing with a proxied sequence (or are writing generic code that could deal with proxied sequences), (b) taking the arguments by value is undesirable, and (c) using a polymorphic function is impossible or impractical for some reason. I wouldn’t think that’s very often.
<h2>Algorithm Constraints</h2>
So that’s how things look from the perspective of the end user. How do they look from the other side, from the perspective of the algorithm author? In particular, how should <code>unique_copy</code> look once we use Concepts Lite to constrain the algorithm?
The <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3351.pdf">Palo Alto TR</a> takes a stab at it. Here is how it constrains <code>unique_copy</code>:
<pre class="brush: cpp; notranslate">template <InputIterator I, WeaklyIncrementable Out,
          Semiregular R>
requires Relation<R, ValueType<I>, ValueType<I>> &&
         IndirectlyCopyable<I, Out>
Out unique_copy(I first, I last, Out result, R comp);
</pre>
There’s a lot going on there, but the relevant part is <code>Relation<R, ValueType<I>, ValueType<I>></code>. In other words, the type <code>R</code> must be an equivalence relation that accepts arguments of the range’s value type. For all the reasons we’ve discussed, that doesn’t work when dealing with a proxied range like <code>vector<bool></code>.
So what should the constraint be? Maybe it should be <code>Relation<R, ValueType<I>, ReferenceType<I>></code>? But no, <code>unique_copy</code> doesn’t always need to copy a value into a local. It does so only when neither the input nor the output iterator models ForwardIterator. So sometimes <code>unique_copy</code> calls the predicate like <code>pred(*i,*j)</code> and sometimes like <code>pred(value, *i)</code>. The constraint has to be general enough to accommodate that.
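To see where the <code>pred(*i,*j)</code> form comes from, here is a sketch of the variant an implementation can use when the input is at least a forward iterator. The names <code>forward_unique_copy</code> and <code>unique_bools_forward</code> are hypothetical, and real standard libraries differ in the details.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// With forward iterators, the last element written can be remembered
// by position instead of being copied into a local value.
template <class FwdIter, class OutIter, class Fn>
OutIter forward_unique_copy(FwdIter first, FwdIter last,
                            OutIter result, Fn binary_pred) {
    if (first == last)
        return result;
    FwdIter kept = first;  // position of the last element written
    *result = *kept;
    ++result;
    while (++first != last) {
        if (!binary_pred(*kept, *first)) {  // two references, no value copy
            kept = first;
            *result = *kept;
            ++result;
        }
    }
    return result;
}

// A predicate written purely against the proxy reference type works here,
// because this variant never passes a bool& to it.
std::string unique_bools_forward(std::vector<bool> vb) {
    using R = std::vector<bool>::reference;
    std::ostringstream out;
    forward_unique_copy(
        vb.begin(), vb.end(),
        std::ostream_iterator<bool>{out, " "},
        [](R b1, R b2) { return b1 == b2; });
    return out.str();
}
```

The same <code>(R, R)</code> lambda that failed to compile against the value-caching version is fine here, which is exactly why a single fixed pair of argument types cannot describe every legal implementation.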
Maybe it could also use the iterator’s common reference type? What if we constrained <code>unique_copy</code> like this:
<pre class="brush: cpp; notranslate">template <InputIterator I, WeaklyIncrementable Out,
          Semiregular R>
requires Relation<R, CommonReferenceType<I>,
                  CommonReferenceType<I>> &&
         IndirectlyCopyable<I, Out>
Out unique_copy(I first, I last, Out result, R comp);
</pre>
This constraint makes a promise to callers: “I will only pass objects of type <code>CommonReferenceType<I></code> to the predicate.” But that’s a lie. It’s not how <code>unique_copy</code> is actually implemented. We could change the implementation to fulfill this promise by casting the arguments before passing them to the predicate, but that’s ugly and potentially inefficient.
Really, I think we have to check that the predicate is callable with all the possible combinations of values and references. That sucks, but I don’t see a better option. With some pruning, these are the checks that I think matter enough to be required:
<pre class="brush: cpp; notranslate">Relation<R, ValueType, ValueType> &&
Relation<R, ValueType, ReferenceType> &&
Relation<R, ReferenceType, ValueType> &&
Relation<R, ReferenceType, ReferenceType> &&
Relation<R, CommonReferenceType, CommonReferenceType>
</pre>
As an implementer, I don’t want to write all that, and our users don’t want to read it, so we can bundle it up nice and neat:
<pre class="brush: cpp; notranslate">IndirectRelation<R, I, I>
</pre>
That’s easier on the eyes and on the brain.
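On a compiler without Concepts, the bundled check can be approximated with ordinary type traits. What follows is a minimal C++17 sketch, not the range-v3 definition: the names <code>value_t</code>, <code>reference_t</code>, <code>indirect_relation_sketch</code>, <code>needs_lvalues</code>, and <code>generic_equal</code> are made up for illustration, and the common-reference check is left out because it needs a <code>common_reference</code> trait.

```cpp
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper aliases; not part of the STL or range-v3.
template <class I>
using value_t = typename std::iterator_traits<I>::value_type;
template <class I>
using reference_t = decltype(*std::declval<I&>());

// True when R can be called with every value/reference combination and
// the result converts to bool. The common-reference check is omitted.
template <class R, class I1, class I2>
constexpr bool indirect_relation_sketch =
    std::is_invocable_r_v<bool, R&, value_t<I1>&, value_t<I2>&> &&
    std::is_invocable_r_v<bool, R&, value_t<I1>&, reference_t<I2>> &&
    std::is_invocable_r_v<bool, R&, reference_t<I1>, value_t<I2>&> &&
    std::is_invocable_r_v<bool, R&, reference_t<I1>, reference_t<I2>>;

// A relation that insists on real lvalue references.
struct needs_lvalues {
    bool operator()(bool& a, bool& b) const { return a == b; }
};

// A relation with a generic call operator, much like a generic lambda.
struct generic_equal {
    template <class A, class B>
    bool operator()(A&& a, B&& b) const {
        return static_cast<bool>(a) == static_cast<bool>(b);
    }
};

using bool_iter = std::vector<bool>::iterator;  // yields a proxy reference
using bool_ptr = bool*;                         // yields a real bool&
```

With these definitions, <code>needs_lvalues</code> passes the check for <code>bool*</code> but fails it for <code>vector<bool></code>’s iterator, while <code>generic_equal</code> passes for both.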
<h2>Interesting Indirect Invokable Implications</h2>
In short, I think that everywhere the algorithms take a function, predicate, or relation, we should add a constraint like <code>IndirectFunction</code>, <code>IndirectPredicate</code>, or <code>IndirectRelation</code>. These concepts will require that the function is callable with a cross-product of values and references, with an extra requirement that the function is also callable with arguments of the common reference type.
This might seem very strict, but for non-proxy iterators, it adds exactly zero new requirements. And even for proxy iterators, it’s only saying in code the things that necessarily had to be true anyway. Rather than making things harder, the common reference type makes them easier: if your predicate takes arguments by the common reference type, all the checks succeed, guaranteed.
It’s possible that the common reference type is inefficient to use. For instance, the common reference type between <code>bool&</code> and <code>vector<bool>::reference</code> is likely to be a variant type. In that case, you might not want your predicate to take arguments by the common reference. Instead, you’d want to use a generic lambda, or define a function object with the necessary overloads. The concept checking will tell you if you forgot any overloads, ensuring that your code is correct and portable.
<h2>Summary</h2>
That’s the theory. I implemented all this in my <a href="https://github.com/ericniebler/range-v3">Range-v3</a> library. Now I can <code>sort</code> a <code>zip</code> range of <code>unique_ptr</code>s. So cool.
Here, in short, are the changes we would need to make the STL fully support proxy iterators:
<ol>
<li>The algorithms need to use <code>iter_swap</code> consistently whenever elements need to be swapped. <code>iter_swap</code> should be a documented customization point.</li>
<li>We need an <code>iter_move</code> customization point so that elements can be moved out of and back into a sequence. This gives iterators a new <code>rvalue_reference</code> associated type.</li>
<li>We need a new <code>common_reference</code> trait that, like <code>common_type</code>, can be specialized on user-defined types.</li>
<li>All iterators need to guarantee that their <code>value_type</code> and <code>reference</code> associated types share a common reference. Likewise for <code>value_type</code>/<code>rvalue_reference</code>, and for <code>reference</code>/<code>rvalue_reference</code>.</li>
<li>We need <code>IndirectFunction</code>, <code>IndirectPredicate</code>, and <code>IndirectRelation</code> concepts as described above. The higher-order algorithms should be constrained with them.</li>
</ol>
From the end users’ perspective, not a lot changes. All existing code works as it did before, and all iterators that are valid today continue being valid in the future. Some proxy iterators, like <code>vector<bool></code>’s, would need some small changes to model the Iterator concept, but afterward those iterators are on equal footing with all the other iterators for the first time ever. Code that deals with proxy sequences might need to use <code>common_reference</code> when defining predicates, or it might need to use a generic lambda instead.
So that’s it. To the best of my knowledge, this is the first comprehensive solution to the proxy iterator problem, a problem we’ve lived with from day one, and which only promises to get worse with the introduction of range views. There’s some complexity for sure, but the complexity seems to be necessary and inherent. And honestly I don’t think it’s all that bad.
<h2>Future Directions</h2>
I’m unsure where this goes from here. I plan to sit on it for a bit to see if any better solutions come along. There’s been some murmuring about a possible language solution for proxy references, but there is inherent complexity to proxy iterators, and it’s not clear to me at this point how a language solution would help.
I’m currently working on what I believe will be the first draft of a Ranges TS. That paper will not address the proxy iterator problem. I could imagine writing a future paper that proposes the changes I suggest above. Before I do that, I would probably try to start a discussion on the committee mailing lists to feel people out. If any committee members are reading this, feel free to comment below.
Thanks for following along, and thanks for all your encouraging and thought-provoking comments. Things in the C++ world are moving fast these days. It’s tough to keep up with it all. I feel blessed that you all have invested so much time exploring these issues with me. <3
As always, you can find all code described here in my <a href="https://github.com/ericniebler/range-v3">range-v3</a> repo on github.
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2015/03/03/iterators-plus-plus-part-3/feed/</wfw:commentRss>
<slash:comments>30</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">918</post-id> </item>
<item>
<title>Iterators++, Part 2</title>
<link>https://ericniebler.com/2015/02/13/iterators-plus-plus-part-2/?utm_source=rss&utm_medium=rss&utm_campaign=iterators-plus-plus-part-2</link>
<comments>https://ericniebler.com/2015/02/13/iterators-plus-plus-part-2/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Sat, 14 Feb 2015 04:09:31 +0000</pubDate>
<category><![CDATA[generic-programming]]></category>
<category><![CDATA[library-design]]></category>
<category><![CDATA[ranges]]></category>
<category><![CDATA[std]]></category>
<guid isPermaLink="false">http://104.154.63.30/?p=874</guid>
<description><![CDATA[Disclaimer: This is a long, boring post about minutiae. For serious library wonks only. This is the third in a series about proxy iterators, the limitations of the existing STL iterator concept hierarchy, and what could be done about it. <a class="more-link" href="https://ericniebler.com/2015/02/13/iterators-plus-plus-part-2/">Continue reading Iterators++, Part 2→</a>]]></description>
<content:encoded><![CDATA[Disclaimer: This is a long, boring post about minutiae. For serious library wonks only.
This is the third in a series about proxy iterators, the limitations of the existing STL iterator concept hierarchy, and what could be done about it. In the <a href="http://104.154.63.30/2015/01/28/to-be-or-not-to-be-an-iterator/">first post</a> I explained what proxy iterators are (an iterator like <code>vector<bool></code>’s that, when dereferenced, returns a proxy object rather than a real reference) and three specific difficulties they cause in today’s STL:
<ol>
<li>What, if anything, can we say in general about the relationship between an iterator’s value type and its reference type?</li>
<li>How do we constrain higher-order algorithms like <code>for_each</code> and <code>find_if</code> that take functions that operate on a sequence’s elements?</li>
<li>How do we implement algorithms that must swap and move elements around, like <code>sort</code> and <code>reverse</code>?</li>
</ol>
In the <a href="http://104.154.63.30/2015/02/03/iterators-plus-plus-part-1/">second post</a>, I zoomed in on problem (3) and showed how the existing <code>std::iter_swap</code> API could be pressed into service, along with a new API that I propose: <code>std::iter_move</code>. Together, these APIs give an iterator a channel through which to communicate to the algorithms how its elements should be swapped and moved. With the addition of the <code>iter_move</code> API, iterators pick up a new associated type: <code>rvalue_reference</code>, which can live in <code>std::iterator_traits</code> alongside the existing <code>value_type</code> and <code>reference</code> associated types.
In this post, I’ll dig into the first problem: how we define in code what an iterator is.
<h2>Values and References</h2>
As in the first two articles, I’ll use the <code>zip</code> view to motivate the discussion, because it’s easy to grok and yet totally bedeviling for the STL algorithms. Recall that <code>zip</code> lazily adapts two sequences by making them look like one sequence of <code>pair</code>s, as demonstrated below:
<pre class="brush: cpp; notranslate">std::vector<int> x{1,2,3,4};
std::vector<int> y{9,8,7,6};
using namespace ranges;
auto zipped = view::zip(x, y);
assert(*zipped.begin() == std::make_pair(1,9));
assert(&(*zipped.begin()).first == &x[0]);
</pre>
As the two assertions above show, dereferencing a <code>zip</code> iterator yields a <code>pair</code>, and that pair is actually a pair of references pointing into the underlying sequences. The <code>zip</code> range above has the following associated types:
<table>
<thead>
<tr>
<th>Associated type…</th>
<th>… for the <code>zip</code> view</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>value_type</code></td>
<td><code>pair<int, int></code></td>
</tr>
<tr>
<td><code>reference</code></td>
<td><code>pair<int &, int &></code></td>
</tr>
<tr>
<td><code>rvalue_reference</code></td>
<td><code>pair<int &&, int &&></code></td>
</tr>
</tbody>
</table>
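The difference between the first two rows can be seen without range-v3 at all. The sketch below uses a hypothetical helper, <code>zip_at</code>, that hand-builds the pair of references a <code>zip</code> iterator would yield for one position; it is not how <code>view::zip</code> is implemented, only an illustration of value versus reference.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: the "reference" a zip iterator would yield for
// position i, i.e. a pair of real references into the two vectors.
inline std::pair<int&, int&> zip_at(std::vector<int>& x,
                                    std::vector<int>& y,
                                    std::size_t i) {
    return std::pair<int&, int&>(x[i], y[i]);
}

// Writes through the proxy and reports what the underlying vectors hold.
inline std::pair<int, int> write_through_proxy() {
    std::vector<int> x{1, 2, 3, 4};
    std::vector<int> y{9, 8, 7, 6};
    std::pair<int&, int&> ref = zip_at(x, y, 0);  // plays the role of `reference`
    std::pair<int, int> val = ref;                // plays the role of `value_type`: a copy
    ref.first = 10;                               // mutates x[0]
    ref.second = 90;                              // mutates y[0]
    (void)val;                                    // val still holds the old values
    return std::pair<int, int>(x[0], y[0]);
}
```

Assigning through <code>ref</code> changes <code>x</code> and <code>y</code>, whereas <code>val</code> is an independent snapshot taken before the writes.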
With Concepts coming to C++, we’re going to need to say in code what an iterator is. The <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3351.pdf">Palo Alto TR</a>, published in 2012, takes a stab at it: an <code>InputIterator</code> is <code>Readable</code> and <code>Incrementable</code>, where <code>Readable</code> is defined as follows:
<pre class="brush: cpp; notranslate">template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        { *i } -> const ValueType<I> &;
    };
</pre>
This says that a <code>Readable</code> type has an associated <code>ValueType</code>. It also says that <code>*i</code> is a valid expression, and that the result of <code>*i</code> must be convertible to <code>const ValueType &</code>. This is fine when <code>*i</code> returns something simple like a real reference. But when it returns a proxy reference, like the <code>zip</code> view does, it causes problems.
Substituting a <code>zip</code> iterator into the <code>requires</code> clause above results in something like this:
<pre class="brush: cpp; notranslate">const pair<int,int>& x = *i;
</pre>
This tries to initialize <code>x</code> with a <code>pair<int&, int&></code>. This actually works in a sense; the temporary <code>pair<int &, int &></code> object is implicitly converted into a temporary <code>pair<int, int></code> by copying the underlying integers, and that new pair is bound to the <code>const &</code> because temporaries can bind to const references.
But copying values is not what we want or expect. If instead of <code>int</code>s, we had pairs of some move-only type like <code>unique_ptr</code>, this wouldn’t have worked at all.
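Both halves of that argument can be checked with standard <code>std::pair</code>. This is a small sketch; the names <code>binding_makes_a_copy</code>, <code>int_proxy_converts</code>, and <code>ptr_proxy_converts</code> are invented for the example.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Returns true if binding a proxy pair to `const pair<int,int>&`
// produced an independent copy rather than an alias.
inline bool binding_makes_a_copy() {
    int a = 1, b = 9;
    std::pair<int&, int&> proxy(a, b);
    const std::pair<int, int>& x = proxy;  // binds to a temporary copy
    a = 42;                                // change the underlying element
    return x.first == 1;                   // x did not see the change
}

using ptr = std::unique_ptr<int>;

// The int case converts (by copying); the move-only case does not convert.
constexpr bool int_proxy_converts =
    std::is_convertible<std::pair<int&, int&>,
                        const std::pair<int, int>&>::value;
constexpr bool ptr_proxy_converts =
    std::is_convertible<std::pair<ptr&, ptr&>,
                        const std::pair<ptr, ptr>&>::value;
```

So the Palo Alto requirement is satisfiable for <code>int</code> elements only because a copy is silently made, and it is not satisfiable for <code>unique_ptr</code> elements.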
So the <code>Readable</code> concept needs to be tweaked to handle proxy references. What can we do?
One simple way to make the <code>zip</code> iterator model the <code>Readable</code> concept is to simply remove the requirement that <code>*i</code> be convertible to <code>const ValueType&</code>. This is unsatisfying. Surely there is something we can say about the relationship between an iterator’s reference type and its value type. I think there is, and there’s a hint in the way the Palo Alto TR defines the <code>EqualityComparable</code> constraint.
<h2>Common Type Constraints</h2>
What do you think about code like this?
<pre class="brush: cpp; notranslate">vector<string> strs{"three", "blind", "mice"};
auto it = find(strs.begin(), strs.end(), "mice");
</pre>
Seems reasonable, right? This searches a range of <code>string</code>s for a <code>char const*</code>. This should work, even though it’s looking for an orange in a bucket of apples. The orange is sufficiently apple-like, and we know how to compare apples and oranges; i.e., there is an <code>operator==</code> that compares <code>string</code>s with <code>char const*</code>. But what does “sufficiently apple-like” mean? If we are ever to constrain the <code>find</code> algorithm with Concepts, we need to be able to say in code what “apple-like” means for any apple and any orange.
The Palo Alto TR doesn’t think that the mere existence of an <code>operator==</code> is enough. Instead, it defines the cross-type <code>EqualityComparable</code> concept as follows:
<pre class="brush: cpp; notranslate">template< typename T1, typename T2 >
concept bool EqualityComparable =
EqualityComparable<T1> &&
EqualityComparable<T2> &&
Common<T1, T2> &&
EqualityComparable< std::common_type_t<T1, T2> > &&
requires(T1 a, T2 b) {
{ a == b } -> bool;
{ b == a } -> bool;
{ a != b } -> bool;
{ b != a } -> bool;
/* axioms:
using C = std::common_type_t<T1, T2>;
a == b <=> C{a} == C{b};
a != b <=> C{a} != C{b};
b == a <=> C{b} == C{a};
b != a <=> C{b} != C{a};
*/
};
</pre>
In words, what this says is for two different types to be EqualityComparable, they each individually must be EqualityComparable (i.e., with themselves), they must be comparable with each other, and (the key bit) they must share a common type which is also EqualityComparable, with identical semantics.
The question then becomes: do <code>std::string</code> and <code>char const *</code> share a common type, to which they can both be converted, and which compares with the same semantics? In this case, the answer is trivial: <code>std::string</code> is the common type.
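That claim is easy to verify with today’s <code>std::common_type_t</code>. The sketch below also spot-checks the axioms from the concept definition for one pair of values; <code>str_common_t</code> and <code>axiom_holds</code> are names chosen for this example, and a runtime check on sample values is of course only an illustration, not a proof of the axioms.

```cpp
#include <string>
#include <type_traits>

// std::string and char const* share std::string as their common type.
using str_common_t = std::common_type_t<std::string, char const*>;
static_assert(std::is_same<str_common_t, std::string>::value,
              "the common type of string and char const* is string");

// Checks the axioms for one pair of values: comparing a and b directly
// agrees with comparing them after conversion to the common type.
inline bool axiom_holds(std::string const& a, char const* b) {
    using C = str_common_t;
    return (a == b) == (C(a) == C(b)) &&
           (a != b) == (C(a) != C(b)) &&
           (b == a) == (C(b) == C(a)) &&
           (b != a) == (C(b) != C(a));
}
```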
Aside: why does the Palo Alto TR place this extra CommonType requirement on the argument to <code>find</code> when surely that will break some code that works and is “correct” today? It’s an interesting question. The justification is mathematical and somewhat philosophical: when you compare things for equality, you are asking if they have the same value. Just because someone provides an <code>operator==</code> to compare, say, an <code>Employee</code> with a <code>SocialSecurityNumber</code> doesn’t make an employee a social security number, or vice versa. If we want to be able to reason mathematically about our code (and we do), we have to be able to substitute like for like. Being able to apply equational reasoning to our programs is a boon, but we have to play by its rules.
<h2>Readable and Common</h2>
You may be wondering what any of this has to do with the <code>Readable</code> concept. Let’s look again at the concept as the Palo Alto TR defines it:
<pre class="brush: cpp; notranslate">template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        { *i } -> const ValueType<I> &;
    };
</pre>
To my mind, what this is trying to say is that there is some substitutability, some mathematical equivalence, between an iterator’s reference type and its value type. <code>EqualityComparable</code> uses <code>Common</code> to enforce that substitutability. What if we tried to fix <code>Readable</code> in a similar way?
<pre class="brush: cpp; notranslate">template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires Common< ValueType<I>, decltype(*i) >;
    };
</pre>
Here we’re saying that for <code>Readable</code> types, the reference type and the value type must share a common type. The common type is computed using something like <code>std::common_type_t</code>, which basically uses the ternary conditional operator (<code>?:</code>). (I say “something like” since <code>std::common_type_t</code> isn’t actually up to the task. See <a href="https://cplusplus.github.io/LWG/lwg-defects.html#2408">lwg2408</a> and <a href="https://cplusplus.github.io/LWG/lwg-active.html#2465">lwg2465</a>.)
Sadly, this doesn’t quite solve the problem. If you try to do <code>common_type_t<unique_ptr<int>, unique_ptr<int>&></code> you’ll see why. It doesn’t work, despite the fact that the answer seems obvious. The trouble is that <code>common_type</code> always strips top-level const and reference qualifiers before testing for the common type with the conditional operator. For move-only types, that causes the conditional operator to barf.
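The loss of qualifiers is easy to observe with plain <code>int</code>s. In this sketch (the alias names <code>cond_t</code> and <code>decayed_t</code> are made up), the conditional operator on its own keeps the reference, while <code>std::common_type_t</code> reports a decayed type:

```cpp
#include <type_traits>
#include <utility>

// What the conditional operator alone yields for two lvalue references.
using cond_t = decltype(false ? std::declval<int&>()
                              : std::declval<int const&>());

// What std::common_type reports for the same two types.
using decayed_t = std::common_type_t<int&, int const&>;

static_assert(std::is_same<cond_t, int const&>::value,
              "?: keeps the reference and adds const");
static_assert(std::is_same<decayed_t, int>::value,
              "common_type decays the result to a plain int");
```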
I’ve always found it a bit odd that <code>common_type</code> decays its arguments before testing them. Sometimes that’s what you want, but sometimes (like here) it’s not. Instead, what we need is a different type trait that tests for the common type, but preserves reference and cv qualifications. I call it <code>common_reference</code>. It’s a bit of a misnomer though, since it doesn’t always return a reference type, although it might.
The common reference of two types is the minimally qualified type to which objects of both types can bind. <code>common_reference</code> will try to return a reference type if it can, but fall back to a value type if it must. Here are some examples to give you a flavor:
<table>
<thead>
<tr>
<th>Common reference…</th>
<th>… result</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>common_reference_t<int &, int const &></code></td>
<td><code>int const &</code></td>
</tr>
<tr>
<td><code>common_reference_t<int &&, int &&></code></td>
<td><code>int &&</code></td>
</tr>
<tr>
<td><code>common_reference_t<int &&, int &></code></td>
<td><code>int const &</code></td>
</tr>
<tr>
<td><code>common_reference_t<int &, int></code></td>
<td><code>int</code></td>
</tr>
</tbody>
</table>
With a <code>common_reference</code> type trait, we could define a <code>CommonReference</code> concept and specify <code>Readable</code> in terms of it, as follows:
<pre class="brush: cpp; notranslate">template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires CommonReference<
            ValueType<I> &,
            decltype(*i) && >;
    };
</pre>
The above concept requires that there is some common reference type to which both <code>*i</code> and a mutable object of the iterator’s value type can bind.
This, I think, is sufficiently general to type check all the iterators that are valid today, as well as iterators that return proxy references (though it takes some work to see that). We can further generalize this to accommodate the <code>iter_move</code> API I described in my previous post:
<pre class="brush: cpp; notranslate">template< typename I >
concept bool Readable =
    Semiregular<I> &&
    requires(I i) {
        typename ValueType<I>;
        requires CommonReference<
            ValueType<I> &,
            decltype(*i) && >;          // (1)
        requires CommonReference<
            decltype(iter_move(i)) &&,
            decltype(*i) && >;          // (2)
        requires CommonReference<
            ValueType<I> const &,
            decltype(iter_move(i)) &&>; // (3)
    };
</pre>
OK, let’s see how this works in practice.
<h2>Iterators and CommonReference</h2>
First, let’s take the easy case of an iterator that returns a real reference like <code>int&</code>. The requirements are that its value type, reference type, and rvalue reference type satisfy the three <code>CommonReference</code> constraints above. (1) requires a common reference between <code>int&</code> and <code>int&</code>. (2), between <code>int&&</code> and <code>int&</code>, and (3) between <code>int const&</code> and <code>int&&</code>. These are all demonstrably true, so this iterator is <code>Readable</code>.
But what about the <code>zip</code> iterator? Things here are much trickier.
The three common reference constraints for the <code>zip</code> iterator amount to this:
<table>
<thead>
<tr>
<th>Common reference…</th>
<th>… result</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>common_reference_t<</code> <code>pair<int,int> &,</code> <code>pair<int&,int&> &&></code></td>
<td>???</td>
</tr>
<tr>
<td><code>common_reference_t<</code> <code>pair<int&&,int&&> &&,</code> <code>pair<int&,int&> &&></code></td>
<td>???</td>
</tr>
<tr>
<td><code>common_reference_t<</code> <code>pair<int,int> const &,</code> <code>pair<int&&,int&&> &&></code></td>
<td>???</td>
</tr>
</tbody>
</table>
Yikes. How is the <code>common_reference</code> trait supposed to evaluate this? The ternary conditional operator is just not up to the task.
OK, let’s first imagine what we would like the answers to be. Taking the last one first, consider the following code:
<pre class="brush: cpp; notranslate">void foo( pair< X, Y > p );
pair<int,int> const & a = /*...*/;
pair<int &&,int &&> b {/*...*/};
foo( a );
foo( move(b) );
</pre>
If there are types that we can pick for <code>X</code> and <code>Y</code> that make this compile, then we can make <code>pair<X,Y></code> the “common reference” for <code>pair<int&&,int&&>&&</code> and <code>pair<int,int> const &</code>. Indeed there are: <code>X</code> and <code>Y</code> should both be <code>int const &</code>.
In fact, for each of the <code>CommonReference</code> constraints, we could make the answer <code>pair<int const&,int const&></code> and be safe. So in principle, our <code>zip</code> iterator can model the <code>Readable</code> concept. W00t.
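Here is the <code>foo</code> experiment from above made concrete with <code>X</code> and <code>Y</code> both set to <code>int const &</code>. It is a sketch using only <code>std::pair</code>; the return value of <code>foo</code> and the wrapper <code>call_foo_both_ways</code> are additions for the sake of having something to observe.

```cpp
#include <utility>

// foo from the text, with X and Y both chosen as int const&.
inline int foo(std::pair<int const&, int const&> p) {
    return p.first + p.second;
}

// Calls foo once with a const lvalue pair of values and once with an
// rvalue pair of rvalue references, and returns both results.
inline std::pair<int, int> call_foo_both_ways() {
    std::pair<int, int> const a(1, 9);
    int m = 2, n = 8;
    std::pair<int&&, int&&> b(std::move(m), std::move(n));
    int from_value = foo(a);              // binds to a.first and a.second
    int from_rvalue = foo(std::move(b));  // binds to m and n
    return std::pair<int, int>(from_value, from_rvalue);
}
```

Both calls compile because each element of either argument can bind to an <code>int const &</code> without copying.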
But look again at this one:
<pre class="brush: cpp; notranslate">common_reference_t<pair<int,int> &, pair<int&,int&> &&>
</pre>
If this coughs up <code>pair<int const&,int const&></code> then we’ve lost something in the translation: the ability to mutate the elements of the pair. In an ideal world, the answer would be <code>pair<int&,int&></code> because a conversion from both <code>pair<int,int>&</code> and <code>pair<int&,int&>&&</code> would be safe and meets the “minimally qualified” spirit of the <code>common_reference</code> trait. But this code doesn’t compile:
<pre class="brush: cpp; notranslate">void foo( pair< int&,int& > p );
pair<int,int> a;
pair<int&,int&> b {/*...*/};
foo( a ); // ERROR here
foo( move(b) );
</pre>
Unfortunately, <code>pair</code> doesn’t provide this conversion, even though it would be safe in theory. Is that a defect? Perhaps. But it’s something we need to work with.
Long story short, the solution I went with for range-v3 is to define my own <code>pair</code>-like type with the needed conversions. I call it <code>common_pair</code> and it inherits from <code>std::pair</code> so that things behave as you would expect. With <code>common_pair</code> and a few crafty specializations of <code>common_reference</code>, the <code>Readable</code> constraints are satisfied for the <code>zip</code> iterator as follows:
<table>
<thead>
<tr>
<th>Common reference…</th>
<th>… result</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>common_reference_t<</code> <code>pair<int,int> &,</code> <code>common_pair<int&,int&> &&></code></td>
<td><code>common_pair<int&,int&></code></td>
</tr>
<tr>
<td><code>common_reference_t<</code> <code>common_pair<int&&,int&&> &&,</code> <code>common_pair<int&,int&> &&></code></td>
<td><code>common_pair<int const&,int const&></code></td>
</tr>
<tr>
<td><code>common_reference_t<</code> <code>pair<int,int> const &,</code> <code>common_pair<int&&,int&&> &&></code></td>
<td><code>common_pair<int const&,int const&></code></td>
</tr>
</tbody>
</table>
Computing these types is not as tricky as it may appear at first. For types like <code>pair<int,int>&</code> and <code>common_pair<int&,int&>&&</code>, it goes like this:
<ol>
<li>Distribute any top-level ref and cv qualifiers to the members of the pair. <code>pair<int,int>&</code> becomes <code>pair<int&,int&></code>, and <code>common_pair<int&,int&>&&</code> becomes <code>common_pair<int&,int&></code>.</li>
<li>Compute the element-wise common reference, and bundle the result into a new <code>common_pair</code>, resulting in <code>common_pair<int&,int&></code>.</li>
</ol>
<h2>Generalizing</h2>
Our <code>zip</code> iterator, with enough ugly hackery, can model our re-specified <code>Readable</code> concept. That’s good, but what about other proxy reference types, like <code>vector<bool></code>’s? If <code>vector<bool></code>’s reference type is <code>bool_ref</code>, then we would need to specialize <code>common_reference</code> such that the <code>Readable</code> constraints are satisfied. This will necessarily involve defining a type such that it can be initialized with either a <code>bool_ref</code> or with a <code>bool&</code>. That would be a decidedly weird type, but it’s not impossible. (Imagine a <code>variant<bool&,bool_ref></code> if you’re having trouble visualizing it.)
Getting <code>vector<bool></code>’s iterators to fit the mold is an ugly exercise in hackery, and actually using its common reference (the variant type) would incur a performance hit for every read and write. But the STL doesn’t actually need to use it. It just needs to exist.
What is the point of jumping through these hoops to implement an inefficient type that in all likelihood will never actually be used? This is going to be unsatisfying for many, but the answer is for the sake of mathematical rigour. There must be some substitutability relationship between an iterator’s reference type and its value type that is enforceable. Requiring that they share a common reference is the best I’ve come up with so far. And as it turns out, this “useless” type actually does have some uses, as we’ll see in the next installment.
<h2>Summary</h2>
So here we are. There is a way to define the <code>Readable</code> concept — and hence the <code>InputIterator</code> concept — in a way that is general enough to permit proxy iterators while also saying something meaningful and useful about an iterator’s associated types. Actually defining a proxy iterator such that it models this concept is no small feat and requires extensive amounts of hack work. BUT IT’S POSSIBLE.
One could even imagine defining a Universal Proxy Reference type that takes a getter and setter function and does all the hoop jumping to satisfy the Iterator concepts — one proxy reference to rule them all, if you will. That’s left as an exercise for the reader.
If you made it this far, congratulations. You could be forgiven for feeling a little let down; this solution is far from ideal. Perhaps it’s just awful enough to spur a real discussion about how we could change the language to improve the situation.
In the next installment, I’ll describe the final piece of the puzzle: how do we write the algorithm constraints such that they permit proxy iterators? Stay tuned.
As always, you can find all code described here in my <a href="https://github.com/ericniebler/range-v3">range-v3</a> repo on github.
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2015/02/13/iterators-plus-plus-part-2/feed/</wfw:commentRss>
<slash:comments>63</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">874</post-id> </item>
<item>
<title>Iterators++, Part 1</title>
<link>https://ericniebler.com/2015/02/03/iterators-plus-plus-part-1/?utm_source=rss&utm_medium=rss&utm_campaign=iterators-plus-plus-part-1</link>
<comments>https://ericniebler.com/2015/02/03/iterators-plus-plus-part-1/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Wed, 04 Feb 2015 03:31:43 +0000</pubDate>
<category><![CDATA[generic-programming]]></category>
<category><![CDATA[library-design]]></category>
<category><![CDATA[ranges]]></category>
<category><![CDATA[std]]></category>
<guid isPermaLink="false">http://104.154.63.30/?p=856</guid>
<description><![CDATA[In the last post, I described the so-called proxy iterator problem: the fact that iterators that return proxy references instead of real references don’t sit comfortably within the STL’s framework. Real, interesting, and useful iterators fall foul of this line, <a class="more-link" href="https://ericniebler.com/2015/02/03/iterators-plus-plus-part-1/">Continue reading Iterators++, Part 1→</a>]]></description>
<content:encoded><![CDATA[In the <a href="http://ericniebler.com/2015/01/28/to-be-or-not-to-be-an-iterator/">last post</a>, I described the so-called proxy iterator problem: the fact that iterators that return proxy references instead of real references don’t sit comfortably within the STL’s framework. Real, interesting, and useful iterators fall foul of this line, iterators like <code>vector<bool></code>‘s or like the iterator of the <code>zip</code> view I presented. In this post, I investigate what we could do to bring proxy iterators into the fold — what it means for both the iterator concepts and for the algorithms. Since I’m a library guy, I restrict myself to talking about pure library changes.
<h1>Recap</h1>
As in the last post, we’ll use the <code>zip</code> view to motivate the discussion. Given two sequences like:
<pre class="brush: cpp; notranslate">vector<int> x{1,2,3,4};
vector<int> y{9,8,7,6};
</pre>
…we can create a view by “zipping” the two into one, where each element of the view is a pair of corresponding elements from <code>x</code> and <code>y</code>:
<pre class="brush: cpp; notranslate">using namespace ranges;
auto rng = view::zip(x, y);
assert(*rng.begin() == make_pair(1,9));
</pre>
The type of the expression “<code>*rng.begin()</code>” — the range’s reference type — is <code>pair<int&,int&></code>, and the range’s value type is <code>pair<int,int></code>. The reference type is an example of a proxy: an object that stands in for another object, or in this case two other objects.
Since both <code>x</code> and <code>y</code> are random access, the resulting <code>zip</code> view should be random access, too. But here we run foul of STL’s “real reference” requirement: for iterators other than input iterators, the expression <code>*it</code> must return a real reference. Why? Good question! The requirement was added sometime while the STL was being standardized. I can only guess it was because the committee didn’t know what it meant to, say, sort or reverse elements that aren’t themselves persistent in memory, and they didn’t know how to communicate to the algorithms that a certain temporary object (the proxy) is a stand-in for a persistent object. (Maybe someone who was around then can confirm or deny.)
The real-reference requirement is quite restrictive. Not only does it mean the <code>zip</code> view can’t be a random access sequence, it also means that you can’t sort or reverse elements through a <code>zip</code> view. It’s also the reason why <a href="http://www.gotw.ca/publications/mill09.htm"><code>vector<bool></code> is not a real container</a>.
But simply dropping the real-reference requirement isn’t enough. We also need to say what it means to sort and reverse sequences that don’t yield real references. In the last post, I described three specific problems relating to constraining and implementing algorithms in the presence of proxy references.
<ol>
<li>What, if anything, can we say about the relationship between an iterator’s value type and its reference type?</li>
<li>How do we constrain higher-order algorithms like <code>for_each</code> and <code>find_if</code> that take functions that operate on a sequence’s elements?</li>
<li>How do we implement algorithms that must swap and move elements around, like <code>sort</code>?</li>
</ol>
Let’s take the last one first.
<h2>Swapping and Moving Elements</h2>
If somebody asked you in a job interview to implement <code>std::reverse</code>, you might write something like this:
<pre class="brush: cpp; notranslate">template< class BidiIter >
void reverse( BidiIter begin, BidiIter end )
{
    using std::swap;
    for(; begin != end && begin != --end; ++begin)
        swap(*begin, *end);
}
</pre>
Congratulations, you’re hired. Now, if the interviewer asked you whether this algorithm works on the <code>zip</code> view I just described, what would you say? The answer, as you may have guessed, is no. There is no overload of <code>swap</code> that accepts <code>pair</code> rvalues. Even if there were, we’re on thin ice here with the <code>zip</code> view’s proxy reference type. The default <code>swap</code> implementation looks like this:
<pre class="brush: cpp; notranslate">template< class T >
void swap( T & t, T & u )
{
    T tmp = move(u);
    u = move(t);
    t = move(tmp);
}
</pre>
Imagine what happens when <code>T</code> is <code>pair<int&,int&></code>. The first line doesn’t move any values; <code>tmp</code> just aliases the values referred to by <code>u</code>. The next line stomps the values in <code>u</code>, which mutates <code>tmp</code> because it’s an alias. Then we copy those stomped values back to <code>t</code>. Rather than swapping values, this makes them both equal to <code>t</code>. Oops.
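Here is a minimal sketch of that failure, if you want to watch it happen. I'm assuming a hand-rolled <code>naive_swap</code> (a made-up name, so it can't collide with <code>std::swap</code>) and using <code>std::pair<int&,int&></code> as a stand-in for the proxy reference type. The helper <code>naive_swap_loses_values</code> is also hypothetical.

```cpp
#include <cassert>
#include <utility>

// Same shape as the default swap above; the name naive_swap is made up for this sketch.
template <class T>
void naive_swap(T& t, T& u)
{
    T tmp = std::move(u);  // for a pair of references, tmp aliases u's referents
    u = std::move(t);      // writes through u, so tmp now sees t's values as well
    t = std::move(tmp);    // copies t's own values back into t
}

// Hypothetical driver: true if the "swap" left both pairs holding the first pair's values.
inline bool naive_swap_loses_values()
{
    int a = 1, b = 2, c = 3, d = 4;
    std::pair<int&, int&> p(a, b);
    std::pair<int&, int&> q(c, d);
    naive_swap(p, q);
    return a == 1 && b == 2 && c == 1 && d == 2;
}
```

The values 3 and 4 are simply gone after the call; nothing was swapped.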
If at this point you’re smugly saying to yourself that <code>pair</code> has its own <code>swap</code> overload that (almost) does the right thing, you’re very smart. Shut up. But if you’re saying that the above is not a standard-conforming <code>reverse</code> implementation because, unlike all the other algorithms, <code>reverse</code> is required to use <code>iter_swap</code>, then very good! That’s the clue to unraveling this whole mess.
<h2>iter_swap</h2>
<code>iter_swap</code> is a thin wrapper around <code>swap</code> that takes iterators instead of values and swaps the elements they refer to. It’s an exceedingly useless function, since <code>iter_swap(a,b)</code> is pretty much required to just call <code>swap(*a,*b)</code>. But what if we allowed it to be a bit smarter? What if <code>iter_swap</code> were a full-fledged <a href="http://ericniebler.com/2014/10/21/customization-point-design-in-c11-and-beyond/">customization point</a> that allowed proxied sequences to communicate to the algorithms how their elements should be swapped?
Imagine the <code>zip</code> view’s iterators provided an <code>iter_swap</code> that knew how to truly swap the elements in the underlying sequences. It might look like this:
<pre class="brush: cpp; notranslate">template< class It1, class It2 >
struct zip_iterator
{
    It1 it1;
    It2 it2;
    /* ... iterator interface here... */
    friend void iter_swap(zip_iterator a, zip_iterator b)
    {
        using std::iter_swap;
        iter_swap(a.it1, b.it1);
        iter_swap(a.it2, b.it2);
    }
};
</pre>
Now we would implement <code>reverse</code> like this:
<pre class="brush: cpp; notranslate">template< class BidiIter >
void reverse( BidiIter begin, BidiIter end )
{
    using std::iter_swap;
    for(; begin != end && begin != --end; ++begin)
        iter_swap(begin, end);
}
</pre>
Voilà! Now <code>reverse</code> works with <code>zip</code> views. That was easy. All that is required is (a) to advertise <code>iter_swap</code> as a customization point, and (b) to use <code>iter_swap</code> consistently throughout the standard library, not just in <code>reverse</code>.
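To make that concrete, here is a self-contained sketch that compiles with today's standard library. It is not the range-v3 implementation: the iterator interface is cut down to the three operations the loop needs, the algorithm is renamed <code>reverse_with_iter_swap</code> so that an unqualified call can't pick up <code>std::reverse</code>, and <code>reverse_zipped</code> is a hypothetical helper for two vectors of <code>int</code>.

```cpp
#include <algorithm>  // std::iter_swap
#include <cassert>
#include <vector>

// A deliberately tiny zip iterator: just enough interface for the loop below.
template <class It1, class It2>
struct zip_iterator
{
    It1 it1;
    It2 it2;

    zip_iterator& operator++() { ++it1; ++it2; return *this; }
    zip_iterator& operator--() { --it1; --it2; return *this; }
    // Both halves advance in lockstep, so comparing one half is enough here.
    friend bool operator!=(zip_iterator a, zip_iterator b) { return a.it1 != b.it1; }

    // Found by argument-dependent lookup; swaps in both underlying sequences.
    friend void iter_swap(zip_iterator a, zip_iterator b)
    {
        using std::iter_swap;
        iter_swap(a.it1, b.it1);
        iter_swap(a.it2, b.it2);
    }
};

// Same loop as reverse above, under a name that cannot clash with std::reverse.
template <class BidiIter>
void reverse_with_iter_swap(BidiIter begin, BidiIter end)
{
    using std::iter_swap;
    for (; begin != end && begin != --end; ++begin)
        iter_swap(begin, end);
}

// Hypothetical helper: zips two vectors of int and reverses them in lockstep.
inline void reverse_zipped(std::vector<int>& x, std::vector<int>& y)
{
    using Zip = zip_iterator<std::vector<int>::iterator, std::vector<int>::iterator>;
    reverse_with_iter_swap(Zip{x.begin(), y.begin()}, Zip{x.end(), y.end()});
}
```

Note that the sketch never dereferences the zip iterator, so the proxy reference type doesn't even come into play; all the work goes through <code>iter_swap</code>.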
<h2>iter_move</h2>
We haven’t fixed the problem yet. Some algorithms don’t just swap elements; they move them. For instance <code>stable_sort</code> might allocate a temporary buffer and move elements into it while it works. You can’t use <code>iter_swap</code> to move an element into raw storage. But we can use a play from the <code>iter_swap</code> playbook to solve this problem. Let’s make an <code>iter_move</code> customization point that gives iterators a way to communicate how to move values out of the sequence.
<code>iter_move</code>’s default implementation is almost trivial:
<pre class="brush: cpp; notranslate">template< class I,
          class R = typename iterator_traits< I >::reference >
conditional_t<
    is_reference< R >::value,
    remove_reference_t< R > &&,
    R >
iter_move( I it )
{
    return move(*it);
}
</pre>
The only tricky bit is the declaration of the return type. If <code>*it</code> returns a temporary, we just want to return it by value. Otherwise, we want to return it by rvalue reference. If you pass a <code>vector<string>::iterator</code> to <code>iter_move</code>, you get back a <code>string &&</code> as you might expect.
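If you want to check those return types yourself, the following sketch does it with <code>static_assert</code>. I've renamed the function <code>iter_move_sketch</code> so it can't clash with anything in the standard library, and <code>take_first</code> is a hypothetical helper; <code>vector<bool>::iterator</code> stands in for "an iterator whose <code>*it</code> returns a temporary".

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Same shape as the default above, under a name that is local to this sketch.
template <class I, class R = typename std::iterator_traits<I>::reference>
std::conditional_t<std::is_reference<R>::value, std::remove_reference_t<R>&&, R>
iter_move_sketch(I it)
{
    return std::move(*it);
}

using StrIter  = std::vector<std::string>::iterator;
using BoolIter = std::vector<bool>::iterator;

// A real reference (string&) becomes an rvalue reference (string&&).
static_assert(std::is_same<decltype(iter_move_sketch(std::declval<StrIter>())),
                           std::string&&>::value,
              "real references turn into rvalue references");

// A proxy reference is handed back by value, not by reference.
static_assert(!std::is_reference<decltype(iter_move_sketch(std::declval<BoolIter>()))>::value,
              "proxies are returned by value");

// Hypothetical helper: moves the first string out of the vector.
inline std::string take_first(std::vector<std::string>& v)
{
    return iter_move_sketch(v.begin());
}
```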
How does the <code>zip</code> view implement <code>iter_move</code>? It’s not hard at all:
<pre class="brush: cpp; notranslate">template< class It1, class It2 >
struct zip_iterator
{
    It1 it1;
    It2 it2;
    /* ... iterator interface here... */
    friend auto iter_move(zip_iterator a)
    {
        using std::iter_move;
        using RRef1 = decltype(iter_move(a.it1));
        using RRef2 = decltype(iter_move(a.it2));
        return pair<RRef1, RRef2>{
            iter_move(a.it1),
            iter_move(a.it2)
        };
    }
};
</pre>
The algorithms can use <code>iter_move</code> as follows:
<pre class="brush: cpp; notranslate">// I is the type of the iterator it
// Move an element out of the sequence and into a temporary
using V = typename iterator_traits< I >::value_type;
V tmp = iter_move( it );
// Move the value back into the sequence
*it = move( tmp );
</pre>
As an aside, this suggests a more general default implementation of <code>iter_swap</code>:
<pre class="brush: cpp; notranslate">template< class I >
void iter_swap( I a, I b )
{
    using V = typename iterator_traits< I >::value_type;
    V tmp = iter_move( a );
    *a = iter_move( b );
    *b = move( tmp );
}
</pre>
Now proxy sequences like <code>zip</code> only have to define <code>iter_move</code> and they get a semantically correct <code>iter_swap</code> for free. It’s analogous to how the default <code>std::swap</code> is defined in terms of <code>std::move</code>. (Doing it this way doesn’t pick up user-defined overloads of <code>swap</code>. That’s bad. There’s a work-around, but it’s beyond the scope of this post.)
For a <code>zip</code> view that has value type <code>pair<T,U></code> and reference type <code>pair<T&,U&></code>, the return type of <code>iter_move</code> is <code>pair<T&&,U&&></code>. Makes perfect sense. Take another look at the default implementation of <code>iter_swap</code> above and satisfy yourself that it correctly swaps zipped elements, even if the underlying sequences have move-only value types.
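If you'd rather let the compiler do the satisfying, here is a rough sketch under some loud assumptions: <code>zip_iter</code> is a stripped-down stand-in for the real thing, <code>iter_swap_via_iter_move</code> is a made-up name for the default <code>iter_swap</code> above, and the value type is read from a nested <code>value_type</code> typedef rather than <code>iterator_traits</code>, to keep the example short. One of the zipped sequences holds <code>unique_ptr</code>s, so nothing here can copy.

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Stripped-down zip iterator: only what the swap below needs.
template <class It1, class It2>
struct zip_iter
{
    It1 it1;
    It2 it2;

    using T = typename std::iterator_traits<It1>::value_type;
    using U = typename std::iterator_traits<It2>::value_type;
    using value_type = std::pair<T, U>;
    using reference  = std::pair<T&, U&>;

    // Dereferencing builds a temporary pair of real references.
    reference operator*() const { return reference(*it1, *it2); }

    // Found by ADL; yields a pair of rvalue references into the sequences.
    friend std::pair<T&&, U&&> iter_move(zip_iter a)
    {
        return std::pair<T&&, U&&>(std::move(*a.it1), std::move(*a.it2));
    }
};

// The default iter_swap from above, expressed with iter_move.
// Assumption: I exposes a nested value_type instead of going through iterator_traits.
template <class I>
void iter_swap_via_iter_move(I a, I b)
{
    using V = typename I::value_type;
    V tmp = iter_move(a);   // move both elements out into a pair of values
    *a = iter_move(b);      // move-assign through the proxy returned by *a
    *b = std::move(tmp);    // move the saved values into b's elements
}
```

The three statements lean on <code>std::pair</code>'s converting constructor and converting assignment operators, which move element by element when handed a pair of rvalue references.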
One final note about <code>iter_move</code>: the implication is that to support proxied sequences, iterators need an extra associated type: the return type of <code>iter_move</code>. We can call it <code>rvalue_reference</code> and put it in <code>iterator_traits</code> alongside <code>value_type</code> and <code>reference</code>.
<h2>Alternate Design</h2>
I find the above design clean and intuitive. But it raises an interesting question: is it OK that <code>iter_swap(a,b)</code> and <code>swap(*a,*b)</code> might mean different things? Personally I think that’s OK, but let’s imagine for a moment that it’s not. What else could we do?
An obvious alternate design is to overload <code>swap</code> for proxy references to swap the objects they refer to. Let’s imagine we add the following overload to namespace <code>std</code>:
<pre class="brush: cpp; notranslate">template< class T, class U >
void swap( pair< T&, U& > && a, pair< T&, U& > && b )
{
    swap(a.first, b.first);
    swap(a.second, b.second);
}
</pre>
With enough SFINAE magic we could further generalize this to support swapping pairs of proxy references, but let’s stick with this. I could live with it.
But as before, this isn’t enough; we would also need to overload <code>move</code> to take a <code>pair<T&,U&></code> and return a <code>pair<T&&,U&&></code>. And this is where I start getting uncomfortable, because <code>move</code> is used everywhere and it’s currently not a customization point. How much code is out there that assumes the type of a <code>move</code> expression is <tt><some-type>&&</tt>? What breaks when that’s no longer true?
Purely as a matter of library evolution, overloading <code>move</code> that way for pairs of references is a non-starter because it would be changing the meaning of existing code. We could avoid the problem by changing <code>zip</code>’s reference type from <code>pair<T&,U&></code> to <code>magic_proxy_pair< T&, U& ></code> and overloading <code>swap</code> and <code>move</code> on that. <code>magic_proxy_pair</code> would inherit from <code>pair</code>, so most code would be none the wiser. Totally valid design.
<h2>Summary, For Now</h2>
I’ve run long at the mouth, and I still have two more issues to deal with, so I’ll save them for another post. We’ve covered a lot of ground. With the design suggested above, algorithms can permute elements in proxied sequences with the help of <code>iter_swap</code> and <code>iter_move</code>, and iterators get a brand new associated type called <code>rvalue_reference</code>.
Whether you prefer this design or the other depends on which you find more distasteful:
<ol>
<li><code>iter_swap(a,b)</code> can be semantically different than <code>swap(*a,*b)</code>, or</li>
<li><code>move</code> is a customization point that is allowed to return some proxy rvalue reference type.</li>
</ol>
In the next installment, I’ll describe what we can say about the relationship between an iterator’s value type and its reference type (and now its rvalue reference type), and how we can constrain higher-order algorithms like <code>for_each</code> and <code>find_if</code>.
As always, you can find all code described here in my <a href="https://github.com/ericniebler/range-v3">range-v3</a> repo on github.
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2015/02/03/iterators-plus-plus-part-1/feed/</wfw:commentRss>
<slash:comments>27</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">856</post-id> </item>
<item>
<title>To Be or Not to Be (an Iterator)</title>
<link>https://ericniebler.com/2015/01/28/to-be-or-not-to-be-an-iterator/?utm_source=rss&utm_medium=rss&utm_campaign=to-be-or-not-to-be-an-iterator</link>
<comments>https://ericniebler.com/2015/01/28/to-be-or-not-to-be-an-iterator/#comments</comments>
<dc:creator><![CDATA[Eric Niebler]]></dc:creator>
<pubDate>Thu, 29 Jan 2015 06:31:13 +0000</pubDate>
<category><![CDATA[generic-programming]]></category>
<category><![CDATA[library-design]]></category>
<category><![CDATA[ranges]]></category>
<category><![CDATA[std]]></category>
<guid isPermaLink="false">http://104.154.63.30/?p=826</guid>
<description><![CDATA[Way back in 1999, when the ink on the first C++ standard was still damp, Herb Sutter posed a GoTW puzzler in the still extant C++ Report (RIP): When Is a Container Not a Container? In that article, Herb described <a class="more-link" href="https://ericniebler.com/2015/01/28/to-be-or-not-to-be-an-iterator/">Continue reading To Be or Not to Be (an Iterator)→</a>]]></description>
<content:encoded><![CDATA[Way back in 1999, when the ink on the first C++ standard was still damp, Herb Sutter posed a GoTW puzzler in the still extant C++ Report (RIP): <a href="http://www.gotw.ca/publications/mill09.htm">When Is a Container Not a Container?</a> In that article, Herb described the problems of the now-infamous <code>vector<bool></code>. According to the standard’s own container requirements, <code>vector<bool></code> is not a container.
In a nutshell, it’s because <code>vector<bool></code>’s iterators claim to be random-access, but they’re not. Random-access iterators, when you dereference them, must return a real reference. They can only do that if the thing they point to really exists somewhere. But the <code>bool</code> that a <code>vector<bool>::iterator</code> points to does not exist anywhere. It’s actually a bit in a packed integer, and dereferencing a <code>vector<bool></code>’s iterator returns an object of some type that merely acts like a <code>bool&</code> without actually being a <code>bool&</code>.
Herb goes so far as to say this:
<blockquote>
[…] although a proxied collection can be an important and useful tool, by definition it must violate the standard’s container requirements and therefore can never be a conforming container.
</blockquote>
At the end of his article, Herb suggests that people stop using <code>vector<bool></code> and use <code>std::bitset</code> if they want bit-packing. But that just pushes the problem around. Why shouldn’t <code>std::bitset</code> be a conforming container with random-access iterators? If proxied collections are so useful, why should we content ourselves with a standard library that treats them like second-class citizens?
<h2>A Brief History of Proxy Iterators</h2>
Herb wrote his article in 1999, so we’ve been living with this problem for a long time. Many have tried to fix it and ultimately failed for one reason or another. Mostly it’s because all the solutions have tried to be backwards compatible, shoehorning a <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1640.html">richer iterator hierarchy</a> into a standard that doesn’t easily allow it, or else <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1873.html">breaking iterators themselves apart</a> into separate objects that control traversal and element access. Each time the committee has balked, preferring instead the devil it knew.
An interesting historical note: the original STL design didn’t have the “true reference” requirement that causes the problem. Take a look at the SGI docs for the <a href="https://www.sgi.com/tech/stl/ForwardIterator.html">Forward Iterator</a> concept. Nowhere does it say that <code>*it</code> should be a real reference. The docs for <a href="https://www.sgi.com/tech/stl/trivial.html">Trivial Iterators</a> specifically mention proxy references and say they’re legit.
Recently, a who’s who of C++ luminaries put their names on <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3351.pdf">N3351</a>, the so-called Palo Alto TR, which proposes a concept-based redesign of the STL, using the syntax of Concepts Lite. Interestingly, the Palo Alto TR is a throw-back to the original SGI design: there is no “true-reference” requirement on the return type of <code>*it</code>; it merely must be convertible to <code>const ValueType &</code>:
<pre class="brush: cpp; notranslate">// This must work, according to the Palo Alto TR
const ValueType & val = *it;
</pre>
It’s not hard for a proxy reference type to provide such a conversion. For instance, the following compiles today:
<pre class="brush: cpp; notranslate">std::vector<bool> vb{true, false, true, false};
auto it = vb.begin();
const bool & val = *it;
</pre>
<code>*it</code> has an implicit conversion to <code>bool</code>, which binds to a <code>const bool&</code>. Awesome! So the problem is solved, right? Not quite.
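One quick way to see that this isn't the same as a real reference is the sketch below. The helper name <code>value_seen_after_write</code> is made up; the point is only that <code>val</code> is bound to a temporary <code>bool</code>, not to the bit inside the vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: binds a const bool& to *it, then writes through the iterator.
// Returns the value seen through the reference after the write.
inline bool value_seen_after_write()
{
    std::vector<bool> vb{true, false, true, false};
    auto it = vb.begin();
    const bool& val = *it;  // binds to a temporary bool made by the conversion
    *it = false;            // flips the bit stored inside vb
    return val;             // the temporary is unaffected by that write
}
```

With a <code>vector<int></code> and a <code>const int&</code>, the write would be visible through the reference. Here it isn't.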
<h2>A Panoply of Proxy Problems</h2>
To better see the problems with proxy iterators, let’s look at a more interesting example: a <code>zip</code> view. When you zip two sequences together, you get a single sequence where each element is a <code>std::pair</code> of elements from the two source sequences. This can be done lazily, creating pairs on demand as the zip view is iterated:
<pre class="brush: cpp; notranslate">std::vector<int> v1 { 1,2,3 };
std::vector<int> v2 { 9,8,7 };
auto z = view::zip( v1, v2 );
auto it = z.begin();
assert( *it == std::make_pair(1,9) );
assert( *++it == std::make_pair(2,8) );
assert( *++it == std::make_pair(3,7) );
</pre>
Since the zip view is generating the pairs on demand, they don’t exist anywhere in memory. But the elements they refer to do! See?
<pre class="brush: cpp; notranslate">std::pair<int&,int&> p = *z.begin();
assert( &p.first == &v1[0] );
assert( &p.second == &v2[0] );
</pre>
The zip view is a very interesting beastie. Its reference type is <code>pair<T&,U&></code> and its value type is <code>pair<T,U></code>. This poses some very interesting challenges for the iterator concepts.
<h2>1. Values and References</h2>
Recall that the Palo Alto TR requires <code>*it</code> to be convertible to <code>const ValueType&</code>. So we should be able to do this:
<pre class="brush: cpp; notranslate">auto z = view::zip( v1, v2 );
const pair<int,int>& val = *z.begin();
</pre>
That works! As it so happens, there is a conversion from <code>std::pair<T&,U&></code> to <code>std::pair<T,U></code>, but there’s a catch: it only works if <code>T</code> and <code>U</code> are copyable! And even when they are, it’s clear that copying is not the behavior one would expect when using <code>*it</code> to initialize a const reference. If <code>T</code> or <code>U</code> is expensive to copy, you’re not going to get the performance or the behavior you expect, and if it’s <code>unique_ptr</code> it’s not going to compile at all. <img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f641.png" alt="🙁" class="wp-smiley" style="height: 1em; max-height: 1em;" />
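Both halves of that catch can be checked with plain <code>std::pair</code> and a couple of type traits. This is only a sketch: the pairs stand in for the zip view's reference and value types, and <code>first_seen_after_write</code> is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Copyable element types: the conversion exists...
static_assert(std::is_convertible<std::pair<int&, int&>, std::pair<int, int>>::value,
              "pair<int&,int&> converts to pair<int,int>");

// ...but with a move-only element type there is no conversion at all.
static_assert(!std::is_convertible<std::pair<std::unique_ptr<int>&, int&>,
                                   std::pair<std::unique_ptr<int>, int>>::value,
              "no conversion when an element type is move-only");

// Hypothetical helper: shows that the const reference sees a copy, not the original.
inline int first_seen_after_write()
{
    int a = 1, b = 2;
    std::pair<int&, int&> ref(a, b);
    const std::pair<int, int>& val = ref;  // copies a and b into a temporary pair
    a = 10;                                // the copy does not track this write
    return val.first;
}
```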
Requiring that an iterator’s reference type be convertible to <code>const ValueType&</code> is over-constraining. But then what useful thing can we say about the relationship between these two types?
<h2>2. Algorithm Constraints</h2>
All the algorithm signatures in the Palo Alto TR use <code>ValueType</code> in the concept checks in order to constrain the templates. For example, here’s the constrained signature of <code>for_each</code>:
<pre class="brush: cpp; notranslate">template<InputIterator I, Semiregular F>
    requires Function<F, ValueType<I>>
F for_each(I first, I last, F f);
</pre>
If you’re not familiar with C++ concepts, what lines 1 and 2 say is: <code>first</code> and <code>last</code> must satisfy the requirements of the <code>InputIterator</code> concept, <code>F</code> must be <code>Semiregular</code> (I’ll gloss over this bit), and it must be callable with one argument of the iterator’s value type.
Now imagine code like this:
<pre class="brush: cpp; notranslate">// As before, v1 and v2 are vectors of ints:
auto z = view::zip( v1, v2 );
// Let Ref be the zip iterator's reference type:
using Ref = decltype(*z.begin());
// Use for_each to increment all the ints:
for_each( z.begin(), z.end(), [](Ref r) {
    ++r.first;
    ++r.second;
});
</pre>
This seems perfectly reasonable. The lambda accepts an object of the zip view’s reference type, which is a <code>pair<int&,int&></code>, and then it increments both first and second members. But this doesn’t type-check. Why?
Remember the concept check: <code>Function<F, ValueType<I>></code>. The function we pass to <code>for_each</code> must be callable with an object of the iterator’s value type. In this case, the value type is <code>pair<int,int></code>. There is no conversion from that to the type the function expects, which is <code>pair<int&,int&></code>. Bummer.
If we change the lambda to take a <code>pair<int,int>&</code>, then the concept check passes, but the template will fail to instantiate correctly. It’s easy to see why when you look at a typical <code>for_each</code> implementation:
<pre class="brush: cpp; notranslate">template<InputIterator I, Semiregular F>
    requires Function<F, ValueType<I>>
F for_each(I first, I last, F f) {
    for(; first != last; ++first)
        f(*first);
    return f;
}
</pre>
The lambda is called with <code>*first</code> which has type <code>pair<int&,int&></code>, but that doesn’t convert to <code>pair<int,int>&</code>. Gah!!!
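The two mismatches can be approximated without any concepts machinery, using <code>std::is_convertible</code>. Treat this as a rough sketch: <code>is_convertible</code> tests conversion from an rvalue of the source type, which is not exactly what the Palo Alto TR's <code>Function</code> concept checks, and the names <code>Value</code>, <code>Ref</code>, <code>value_converts_to_ref</code>, and <code>ref_converts_to_value_lref</code> are mine.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

using Value = std::pair<int, int>;    // the zip iterator's value type
using Ref   = std::pair<int&, int&>;  // the zip iterator's reference type

// A lambda taking Ref cannot be handed a Value rvalue: an int& cannot bind to its members.
constexpr bool value_converts_to_ref = std::is_convertible<Value, Ref>::value;

// A lambda taking Value& cannot be handed *first: a temporary Ref does not
// bind to a non-const lvalue reference to a different type.
constexpr bool ref_converts_to_value_lref = std::is_convertible<Ref, Value&>::value;

static_assert(!value_converts_to_ref, "the concept check side of the mismatch");
static_assert(!ref_converts_to_value_lref, "the instantiation side of the mismatch");
```

So whichever parameter type the lambda picks, one of the two sides rejects it.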
The most galling bit is that the code we wrote above (the code with the lambda that takes the reference type) works just fine if we simply delete the <code>requires Function<F, ValueType<I>></code> constraint. Clearly something is wrong with the constraints, the concepts, or our expectations.
I should add that the problem is not specific to the <code>zip</code> view. Any sequence with a proxy reference type has this problem, <code>vector<bool></code> included. If we just slap these constraints on the existing algorithms, some code that works today will break, and the only “fix” would be to stop using the standard algorithms. <img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f641.png" alt="🙁" class="wp-smiley" style="height: 1em; max-height: 1em;" />
<h2>3. Permutability of Move-Only Types</h2>
Unfortunately, the problems don’t end there. The <code>sort</code> algorithm requires a sequence to be permutable; that is, you should be able to shuffle its elements around. And since it should support move-only types, that means that the sequence’s iterators should be indirectly-movable. The Palo Alto TR has this to say about it:
<blockquote>
The <code>IndirectlyMovable</code> and <code>IndirectlyCopyable</code> concepts describe copy and move relationships between the values of an input iterator, <code>I</code>, and an output iterator <code>Out</code>. For an output iterator <code>out</code> and an input iterator <code>in</code>, their syntactic requirements expand to:
<ul>
<li><code>IndirectlyMovable</code> requires <code>*out = move(*in)</code></li>
</ul>
</blockquote>
But what if <code>*in</code> returns a proxy? Then <code>move(*in)</code> is moving the proxy, not the object to which the proxy refers. In the case of sorting a zip view, we’re trying to move a (temporary) <code>pair<T&,U&></code> into a <code>pair<T&,U&></code>. As with issue (1), that won’t work at all for move-only types. But you would probably fail before that, at the <code>sort</code> requires clause, because of issue (2). Sheesh!
<h2>Summary, For Now…</h2>
Even though the Palo Alto TR lifts the over-constraining requirement that <code>ForwardIterator</code>s return real references, the proxy iterator problem remains. On the one hand, it says that proxy iterators are OK. On the other hand, some interesting proxy iterators fail to model the <code>Iterator</code> concept or satisfy the algorithm constraints, and those that do don’t have the right semantics or performance characteristics. What are our options?
<ol>
<li>The <code>zip</code> view, <code>vector<bool></code>, and their ilk are useful, but are not legitimate containers and ranges, and the STL can’t support them, full stop; or</li>
<li>The iterator concepts (and probably the algorithm constraints) as specified in the Palo Alto TR need to be tweaked somehow to support proxy iterators, and some algorithm implementations probably need to change, too; or</li>
<li>The language needs to change to better support proxy references (an idea from Sean Parent); or</li>
<li>Something else.</li>
</ol>
I really don’t like option (1); there are too many interesting forward iterators that can’t return true references, and I’m tired of doing without. I have some rudimentary ideas about option (2) that I plan to describe in my next post. Option (3) can’t be ruled out, but IANALL (I Am Not A Language Lawyer) and have no idea what would be involved. It’s clear that with C++17 shaping up, and with the Concepts Lite TR finally reaching PDTS status <fanfare>, and a range-ified, concept-ified STL in the works, the time to start making decisions about this stuff is now.
]]></content:encoded>
<wfw:commentRss>https://ericniebler.com/2015/01/28/to-be-or-not-to-be-an-iterator/feed/</wfw:commentRss>
<slash:comments>31</slash:comments>
<post-id xmlns="com-wordpress:feed-additions:1">826</post-id> </item>
</channel>
</rss>
