<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<channel>
<title>Thinking Parallel</title>
<atom:link href="http://www.thinkingparallel.com/feed/" rel="self" type="application/rss+xml" />
<link>https://www.thinkingparallel.com/</link>
<description>A Blog on Parallel Programming and Concurrency by Michael Suess</description>
<lastBuildDate>Wed, 28 Nov 2007 22:27:08 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>
hourly </sy:updatePeriod>
<sy:updateFrequency>
1 </sy:updateFrequency>
<generator>https://wordpress.org/?v=6.7</generator>
<image>
<link>https://www.thinkingparallel.com/</link>
<url>http://www.thinkingparallel.com/favicon.ico</url>
<title>Thinking Parallel</title>
</image>
<item>
<title>Two Job Openings At The Best Place To Work In Germany!</title>
<link>http://www.thinkingparallel.com/2007/11/29/two-job-openings-at-the-best-place-to-work-in-germany/</link>
<comments>http://www.thinkingparallel.com/2007/11/29/two-job-openings-at-the-best-place-to-work-in-germany/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Wed, 28 Nov 2007 22:27:08 +0000</pubDate>
<category><![CDATA[News]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/11/29/two-job-openings-at-the-best-place-to-work-in-germany/</guid>
<description><![CDATA[Now this is just typical. Last week I finally got rid of my job board because nobody was using it. Just a week later, my advisor asks me where it went, because she would like to announce two job openings. Oh well. Murphy is everywhere. I won’t bring back the job board, because as […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/11/roof_grid.jpg' alt='The Grid' style="display: inline; float: left; margin-right: 10px; padding: 5px;"/>Now this is just typical. Last week I finally got rid of my job board because nobody was using it. Just a week later, my advisor asks me where it went, because she would like to announce two job openings. Oh well. Murphy is everywhere. I won’t bring back the job board, because as we say in Germany <em>One swallow does not make a summer</em> (gotta love those word-for-word translations <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f606.png" alt="😆" class="wp-smiley" style="height: 1em; max-height: 1em;" /> ), but I will not miss the chance to tell you about the openings.<span id="more-173"></span></p>
<p>The reason is simple: I believe those two openings really are among the best places to work in Germany. How do I know? Well, of course because I have worked in the research group for the last four years. My advisor, Prof. Claudia Leopold, is incredibly helpful. The rooming situation at the workplace is great. Colleagues are a joy to work with. And the research area in question is interesting as well: everybody is talking about grid computing, but the field is more than big enough to leave a lasting impression in. I could go on for hours, but I will not bore you with the details. Instead, here is a link to the <a href="http://www.plm.eecs.uni-kassel.de/plm/fileadmin/pm/epo/uniKasselBatIIaProgGrid.pdf">official job offer</a>. It’s in German, but since the positions also include teaching responsibilities (another very rewarding area, if you ask me), you need to know German to qualify for it anyways. So if you are interested, don’t hesitate to apply, but better be quick, as the deadline is tight!</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/11/29/two-job-openings-at-the-best-place-to-work-in-germany/feed/</wfw:commentRss>
<slash:comments>4</slash:comments>
</item>
<item>
<title>Parallel Programming News for Week 47/2007</title>
<link>http://www.thinkingparallel.com/2007/11/20/parallel-programming-news-for-week-472007/</link>
<comments>http://www.thinkingparallel.com/2007/11/20/parallel-programming-news-for-week-472007/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Tue, 20 Nov 2007 19:55:14 +0000</pubDate>
<category><![CDATA[News]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/11/20/parallel-programming-news-for-week-472007/</guid>
<description><![CDATA[Contrary to what the update-frequency of this blog suggests, I am not dead. Not even near that; I am very healthy. The only problem is that with my new job, a new house in the works and family, I am left with very little time for blogging. Life is about setting the right priorities and at […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/11/web_abstract_http.jpg' alt='Links' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />Contrary to what the update-frequency of this blog suggests, I am not dead. Not even near that; I am very healthy. The only problem is that with my new job, a new house in the works and family, I am left with very little time for blogging. Life is about setting the right priorities and at this point in time unfortunately this means I will not be able to keep up my usual posting frequency of one post per week. Instead I am aiming for one post per month now, while leaving the door open to write more as soon as my time permits. There is always enough time for a short newsflash on interesting articles on the net, though:<span id="more-163"></span></p>
<ul>
<li>MSDN introduces a <a href="http://msdn.microsoft.com/msdnmag/issues/07/10/Futures/default.aspx?loc=en">Task Parallel Library</a> by Microsoft. It looks very similar to OpenMP, Intel’s TBB or even my own effort called AthenaMP (that I have been meaning to write about for quite some time now). What caught my eye was the very careful attention to exceptions, which is something that no other system of this scope has, as far as I know.</li>
<li>Anwar Ghuloum points out <a href="http://blogs.intel.com/research/2007/10/the_problem_with_gpgpu.html">some problems with GPGPU</a>. I generally agree with his points, especially the programming model is still far away from anything that I would like to program with.</li>
<li>Always helpful, Herb Sutter warns about <a href="http://ddj.com/architect/202802983">calling unknown code from inside a critical region</a>. I know I am preaching to the choir here, as most of you are probably aware of this advice. But especially the techniques at the end of this article on how to avoid doing just that are worth reviewing. </li>
<li>Joe Duffy has put down his <a href="http://www.bluebytesoftware.com/blog/PermaLink,guid,33792f7a-718b-456b-a6c3-6f9933edfd54.aspx">experiences with lock-freedom</a>. A very nice and balanced read, if you ask me. What has always worried me about lock-free algorithms is that most people are not even able or willing to understand the memory-model of lock-based parallel programming systems. Those memory models are generally simpler than the ones of lock-free systems, so should we really recommend these techniques to anybody except maybe expert library programmers?</li>
<li>And Joe has not stopped there but has also written a <a href="http://www.bluebytesoftware.com/blog/PermaLink,guid,bae6ac13-2a95-4887-9ee3-3e64867c5650.aspx">praise for an immutable world</a>. This is also one of the main pieces of advice in my favorite book on <a href='http://www.thinkingparallel.com/localizer/localize_amazon.php?asin=0321349601' rel='nofollow'>Java Concurrency</a>. Good stuff.</li>
</ul>
<p>I would like to close this post by sharing a short movie-clip with you. If you are as fascinated by speed as I am (and after all, this blog is also about speed, or why else are you going through all the trouble with parallel programming <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f62f.png" alt="😯" class="wp-smiley" style="height: 1em; max-height: 1em;" /> ), I am sure you will enjoy it as well (via <a href="http://www.cyprich.com">cyprich.com</a>)! It shows a race between a Bugatti Veyron and a Eurofighter – but there is a twist to it…</p>
<p><object width="425" height="355"><param name="movie" value="http://www.youtube.com/v/cAhyQ9ubODM&rel=1&border=0"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/cAhyQ9ubODM&rel=1&border=0" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"></embed></object></p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/11/20/parallel-programming-news-for-week-472007/feed/</wfw:commentRss>
<slash:comments>2</slash:comments>
</item>
<item>
<title>OpenMP 3.0 – Public Draft Available</title>
<link>http://www.thinkingparallel.com/2007/10/23/openmp-30-public-draft-available/</link>
<comments>http://www.thinkingparallel.com/2007/10/23/openmp-30-public-draft-available/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Tue, 23 Oct 2007 19:35:39 +0000</pubDate>
<category><![CDATA[OpenMP]]></category>
<category><![CDATA[specification]]></category>
<category><![CDATA[tasks]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/10/23/openmp-30-public-draft-available/</guid>
<description><![CDATA[I usually don’t post press releases. But this one is different, since my favorite parallel programming system has almost reached its third major release. That’s right, OpenMP 3.0 is right around the corner and this is your chance to get your early fix. The language committee has worked very hard to make it a true […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/10/phone_booths.jpg' alt='Phone Booth' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />I usually don’t post press releases. But this one is different, since my favorite parallel programming system has almost reached its third major release. That’s right, OpenMP 3.0 is right around the corner and this is your chance to get your early fix. The language committee has worked very hard to make it a true revolution. I have told you about the major change before (<a href="http://www.thinkingparallel.com/2007/06/12/some-fresh-news-on-openmp/">tasking</a>) and I am absolutely sure this release will push OpenMP even further into the mainstream! So without any further ado, here is the official announcement:<span id="more-169"></span></p>
<blockquote><p>
The OpenMP ARB is pleased to announce the release of a draft of Version 3.0 of the OpenMP specification for public comment. This is the first update to the OpenMP specification since 2005.</p>
<p>This release adds several new features to the OpenMP specification, including:</p>
<ul>
<li>Tasking: move beyond loops with generalized tasks and support complex and dynamic control flows.</li>
<li>Loop collapse: combine nested loops automatically to expose more concurrency.</li>
<li>Enhanced loop schedules: Support aggressive compiler optimizations of loop schedules and give programmers better runtime control over the kind of schedule used.</li>
<li>Nested parallelism support: better definition of and control over nested parallel regions, and new API routines to determine nesting structure.</li>
</ul>
<p>Larry Meadows, CEO of the OpenMP organization, states: “The creation of OpenMP 3.0 has taken very hard work by a number of people over more than two years. The introduction of a unified tasking model, allowing creation and execution of unstructured work, is a great step forward for OpenMP. It should allow the use of OpenMP on whole new classes of computing problems.”</p>
<p>The draft specification is available in PDF format from the Specifications section of the OpenMP ARB website: <a href="http://www.openmp.org">http://www.openmp.org</a></p>
<p>(Direct link: <a href="http://www.openmp.org/drupal/mp-documents/spec30_draft.pdf">http://www.openmp.org/drupal/mp-documents/spec30_draft.pdf</a>)</p>
<p>Mark Bull has led the effort to expand the applicability of OpenMP while improving it for its current uses as the Chair of the OpenMP Language Committee. He states: “The OpenMP language committee has done a fine job in producing this latest version of OpenMP. It has been difficult to resolve some tricky details and understand how tasks should propagate across the language. But I think we have come up with solid solutions, and the team should be proud of their accomplishment.”</p>
<p>The ARB warmly welcomes any comments, corrections and suggestions you have for Version 3.0. For Version 3.0, we are soliciting comments through an on-line forum, located at <a href="http://www.openmp.org/forum">http://www.openmp.org/forum</a>. The forum is entitled Draft 3.0 Public Comment. You can also send email to feedback@openmp.org if you would rather not use the forum. It is most helpful if you can refer to the page number and line number where appropriate.</p>
<p>The public comment period will close on 31 January 2008.
</p></blockquote>
<p>Lots of changes in there, don’t you think? And my favorite feature, one I have been involved with heavily, isn’t even mentioned in the release notes: iterator loops are now supported, which is a very nice step toward improving C++ support in the specification.</p>
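<p>To give you an idea of what these two features look like in code, here is a minimal sketch based on the syntax in the public draft (the vector and the <em>process()</em> function are just placeholders, and the details may of course still change before the final specification):</p>
<pre><code>// Minimal sketch of two OpenMP 3.0 draft features (placeholder code).
#include &lt;vector&gt;

void process(int&amp; value);               // hypothetical worker function

void traverse(std::vector&lt;int&gt;&amp; v)
{
    #pragma omp parallel
    {
        // New in 3.0: work-sharing loops over random-access iterators.
        #pragma omp for
        for (std::vector&lt;int&gt;::iterator it = v.begin(); it &lt; v.end(); ++it)
            process(*it);

        // New in 3.0: explicit tasks for irregular, dynamic work
        // (shown here purely to illustrate the syntax).
        #pragma omp single
        {
            #pragma omp task
            process(v.front());
            #pragma omp task
            process(v.back());
            #pragma omp taskwait    // wait for both tasks to finish
        }
    }
}
</code></pre>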
<p>If you want to take part in shaping the future of OpenMP, be sure to let the language committee know what you think about this new version of the spec.</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/10/23/openmp-30-public-draft-available/feed/</wfw:commentRss>
<slash:comments>1</slash:comments>
</item>
<item>
<title>Choice Overload and Parallel Programming</title>
<link>http://www.thinkingparallel.com/2007/10/18/choice-overload-and-parallel-programming/</link>
<comments>http://www.thinkingparallel.com/2007/10/18/choice-overload-and-parallel-programming/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Thu, 18 Oct 2007 18:55:00 +0000</pubDate>
<category><![CDATA[Parallel Programming]]></category>
<category><![CDATA[choice]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/10/18/choice-overload-and-parallel-programming/</guid>
<description><![CDATA[I have been very quiet on this blog lately, mostly because my new job and the construction of our new house have kept me rather busy. Having to get up at 6 in the morning to be able to bring our son to kindergarten has not helped my productivity in the evenings either – and […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/10/choice.jpg' alt='Choices' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />I have been very quiet on this blog lately, mostly because my new job and the construction of our new house have kept me rather busy. Having to get up at 6 in the morning to be able to bring our son to kindergarten has not helped my productivity in the evenings either – and of course that’s the time when I am usually writing for this blog. But anyways, I will try to be more productive in the future. What I would like to write about today are two blog posts from Tim Mattson, the first one called <a href="http://blogs.intel.com/research/2007/10/parallel_programming_environme.html">Parallel programming environments: less is more</a> and a follow-up called <a href="http://blogs.intel.com/research/2007/10/is_anyone_dumb_enough_to_think.html">Is anyone dumb enough to think yet another parallel language will solve our problems? I MIGHT be!</a>. While I think Tim raises some very valid points, I still believe his conclusions are in need of some more discussion! <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> This post also includes a call to action at the bottom, so be sure to read it until the end!<span id="more-167"></span></p>
<p>But let’s start from the beginning. Tim cites a well-known study that shows that too much choice will lead to customers buying less stuff. The example cited involves jam, but obviously the study can be applied to other things as well. Tim applies it to parallel programming systems and concludes:</p>
<blockquote><p>But don’t waste my time with new languages. With hundreds of languages and API’s out there, is anyone really dumb enough to think “yet another one” will fix our parallel programming problems?</p></blockquote>
<p>I do. Or at least I hope for another language that’s better than what we have now. Let me compare the situation today with the search-engine market before Google. There were more search-engines at that time than any single user could try out in a reasonable time frame. New ones were continuously created and funded. Yet, Larry Page and Sergey Brin did not worry about their competition back then; they went ahead and created the next big thing. And choice overload certainly did not stop their success story.</p>
<p>Another example: Linux and the amount of free software available for it. Nobody can try out all open-source programs out there for even a limited field like <em>multimedia</em>. Even the choices for a simple program to burn CDs with can be overwhelming. This used to be a real problem for distributions. Fortunately, the solution to choice overload there is easy as well, and distributions have been employing it from the very beginning: they are doing the selection for me. When I install a clean Kubuntu-system for example, it comes with K3B preinstalled, which does a very fine job of burning CDs for me. And if I don’t have very special needs, I will never worry about burning CDs again, because the preselected solution works fine for most needs. If I ever need a really special feature that K3B does not provide, I can go look for another piece of software that satisfies my needs and because I have a choice, I will most likely find one that fits. </p>
<p>Choice is good, experiments are good and we need all the experiments and new programming systems for parallelism we can get, because after all this is a hard problem. And even if some of those systems are reinventing the wheel, I could not care less because there are times when a reinvention will really kick off where the original version failed. There is no stronger force in the universe than an idea whose time has come. Or something like that. <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> </p>
<p>To make your choice of a parallel programming system a little more difficult, I have decided to do a little experiment. I frequently get requests to review a new idea for a parallel programming system, or even the system itself. Unfortunately, I rarely have the time to do so at the moment. But since I know you (my dear readers) have a strong interest in new parallel programming systems (or else you would most likely not be reading this blog), let’s try this out: If you have invented an exciting new parallel programming system and would like the world to know about it (or at least the tiny part of it that reads my blog :smile:), I am inviting you to write a guest post about it here. If you are really excited about a new parallel programming language you have recently used, feel free to post about it here. A few rules:</p>
<ul>
<li>The post must be original. I will not permit press-releases or the like, because my readers are not here to read duplicate content.</li>
<li>The post must fit in here. Just read a couple of my posts in the past and you will see what I mean. No academic papers, for example; there are other places for those. A little humor is appreciated and don’t take yourself too seriously :lol:.</li>
<li>Of course, spelling and grammar must also be correct. I don’t claim I get those right all the time, but we should at least try hard.</li>
<li>And last but not least: I reserve the right to reject any post. That should be self-evident, but I thought I would mention it again.</li>
</ul>
<p>I am very curious how this turns out and looking forward to your contributions!</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/10/18/choice-overload-and-parallel-programming/feed/</wfw:commentRss>
<slash:comments>4</slash:comments>
</item>
<item>
<title>OpenMP Does Not Scale – Or Does It?</title>
<link>http://www.thinkingparallel.com/2007/09/25/openmp-does-not-scale-or-does-it/</link>
<comments>http://www.thinkingparallel.com/2007/09/25/openmp-does-not-scale-or-does-it/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Tue, 25 Sep 2007 18:59:01 +0000</pubDate>
<category><![CDATA[OpenMP]]></category>
<category><![CDATA[Optimization]]></category>
<category><![CDATA[books]]></category>
<category><![CDATA[humor]]></category>
<category><![CDATA[mistakes]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/09/25/openmp-does-not-scale-or-does-it/</guid>
<description><![CDATA[While at the Parco-Conference two weeks ago, I had the pleasure to meet Ruud van der Pas again. He is a Senior Staff Engineer at Sun Microsystems and gave a very enlightening talk called Getting OpenMP Up To Speed. What I would like to post about is not the talk itself (although it contains some […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/09/scale_ruler.jpg' alt='Scale' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />While at the <a href="http://www.fz-juelich.de/conference/parco2007/">Parco-Conference</a> two weeks ago, I had the pleasure to meet <a href="http://blogs.sun.com/ruud/">Ruud van der Pas</a> again. He is a Senior Staff Engineer at Sun Microsystems and gave a very enlightening talk called <strong>Getting OpenMP Up To Speed</strong>. What I would like to post about is not the talk itself (although it contains some material that I wanted to write about here for a long time), but about the introduction he used to get our attention. He used an imaginary conversation, which I am reprinting here with his permission. Only one part of the conversation is shown, but it’s pretty easy to fill in the other one:<span id="more-164"></span></p>
<blockquote><p>Do you mean you wrote a parallel program, using OpenMP and it doesn’t perform?</p></blockquote>
<blockquote><p>I see. Did you make sure the program was fairly well optimized in sequential mode?</p></blockquote>
<blockquote><p>Oh. You didn’t. By the way, why do you expect the program to scale?</p></blockquote>
<blockquote><p>Oh. You just think it should and used all the cores. Have you estimated the speed up using Amdahl’s Law?</p></blockquote>
<blockquote><p>No, this law is not a new European Union environmental regulation. It is something else.</p></blockquote>
<blockquote><p>I understand. You can’t know everything. Have you at least used a tool to identify the most time consuming parts in your program?</p></blockquote>
<blockquote><p>Oh. You didn’t. You just parallelized all loops in the program. Did you try to avoid parallelizing innermost loops in a loop nest?</p></blockquote>
<blockquote><p>Oh. You didn’t. Did you minimize the number of parallel regions then?</p></blockquote>
<blockquote><p>Oh. You didn’t. It just worked fine the way it was. Did you at least use the nowait clause to minimize the use of barriers?</p></blockquote>
<blockquote><p>Oh. You’ve never heard of a barrier. Might be worth to read up on. Do all processors roughly perform the same amount of work?</p></blockquote>
<blockquote><p>You don’t know, but think it is okay. I hope you’re right. Did you make optimal use of private data, or did you share most of it?</p></blockquote>
<blockquote><p>Oh. You didn’t. Sharing is just easier. I see. You seem to be using a cc-NUMA system. Did you take that into account?</p></blockquote>
<blockquote><p>You’ve never heard of that. That is unfortunate. Could there perhaps be any false sharing affecting performance?</p></blockquote>
<blockquote><p>Oh. Never heard of that either. May come in handy to learn a little more about both. So, what did you do next to address the performance?</p></blockquote>
<blockquote><p>Switched to MPI. Does that perform better then?</p></blockquote>
<blockquote><p>Oh. You don’t know. You’re still debugging the code.</p></blockquote>
<p>What a great way to start a talk on performance issues with OpenMP, don’t you think? And he manages to pack some of the most important problems encountered while optimizing not only OpenMP programs, but parallel programs in general, into a tiny introduction. At the end of his talk, he continued the imaginary conversation as follows:</p>
<blockquote><p>While we’re still waiting for your MPI debug run to finish, I want to ask you whether you found my information useful.</p></blockquote>
<blockquote><p>Yes, it is overwhelming. I know.</p></blockquote>
<blockquote><p>And OpenMP is somewhat obscure in certain areas. I know that as well.</p></blockquote>
<blockquote><p>I understand. You’re not a Computer Scientist and just need to get your scientific research done.</p></blockquote>
<blockquote><p>I agree this is not a good situation, but it is all about Darwin, you know. I’m sorry, it is a tough world out there.</p></blockquote>
<blockquote><p>Oh, your MPI job just finished! Great.</p></blockquote>
<blockquote><p>Your program does not write a file called ‘core’ and it wasn’t there when you started the program?</p></blockquote>
<blockquote><p>You wonder where such a file comes from? Let’s get a big and strong coffee first.</p></blockquote>
<p>I am sure the MPI-crowd doesn’t really approve of this ending, but I found the talk way more entertaining than the usual talks at conferences. Of course he is teasing, of course he is exaggerating, but that’s OK when you are a presenter and want to get your point across. Of course it also helps to put a smile on your face so your audience knows you are not a die-hard fanatic. <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> </p>
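<p>In case you have never come across the <em>nowait</em> clause mentioned in the conversation above, here is a minimal sketch of what it does (the arrays and their size are just placeholders): every work-sharing loop normally ends in an implicit barrier, and <em>nowait</em> removes it, which is only safe when the following work does not depend on the loop that came before it.</p>
<pre><code>// Minimal sketch of the nowait clause (placeholder arrays a and b).
void update(double *a, double *b, int n)
{
    #pragma omp parallel
    {
        #pragma omp for nowait          // threads skip the implicit barrier here ...
        for (int i = 0; i &lt; n; ++i)
            a[i] = 2.0 * a[i];

        #pragma omp for                 // ... because this loop only touches b[]
        for (int i = 0; i &lt; n; ++i)
            b[i] = b[i] + 1.0;
    }
}
</code></pre>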
<p><a href='http://www.thinkingparallel.com/localizer/localize_amazon.php?asin=0262533022' rel='nofollow'><img src='http://images.amazon.com/images/P/0262533022.01.TZZZZZZZ.jpg' alt='' style='display: inline; float: left; margin-right: 10px; padding: 5px;'/></a>By the way, this was not really the end of Ruud’s talk; he went on just a tiny bit further to pitch a new book on OpenMP, for which he is an author. Called <a href='http://www.thinkingparallel.com/localizer/localize_amazon.php?asin=0262533022' rel='nofollow'>Using OpenMP</a>, this is a book I have been looking forward to for a while (and not just because the main author, Prof. Barbara Chapman, is the second advisor for my thesis). Maybe I can finally add a real recommendation for a book on OpenMP to my list of <a href="http://www.thinkingparallel.com/recommended-books-on-parallel-programming/">recommended books on parallel programming</a>.</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/09/25/openmp-does-not-scale-or-does-it/feed/</wfw:commentRss>
<slash:comments>1</slash:comments>
</item>
<item>
<title>Parallel Programming News for Week 37/2007</title>
<link>http://www.thinkingparallel.com/2007/09/14/parallel-programming-news-for-week-372007/</link>
<comments>http://www.thinkingparallel.com/2007/09/14/parallel-programming-news-for-week-372007/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Fri, 14 Sep 2007 14:07:55 +0000</pubDate>
<category><![CDATA[News]]></category>
<category><![CDATA[intel]]></category>
<category><![CDATA[OpenMP]]></category>
<category><![CDATA[python]]></category>
<category><![CDATA[tasks]]></category>
<category><![CDATA[TBB]]></category>
<category><![CDATA[transactional-memory]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/09/14/parallel-programming-news-for-week-372007/</guid>
<description><![CDATA[It has been a while since I have done a news-roundup – therefore it is time for a new one. But before I start, let me pass on a few personal remarks about my present situation. I am in the process of finishing my PhD. right now and hope to submit it for review next […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/09/news_radio.jpg' alt='News on the Radio' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />It has been a while since I have done a news-roundup – therefore it is time for a new one. But before I start, let me pass on a few personal remarks about my present situation. I am in the process of finishing my PhD. right now and hope to submit it for review next week. Of course, this also means that I am rather busy at the moment, therefore the comments to each article I present here are not as verbose as you may be used to. I have also moved back to <a href="http://maps.google.com/maps?f=q&hl=de&geocode=&q=leipzig&ie=UTF8&ll=52.187405,12.392578&spn=12.895816,29.882813&z=5&iwloc=addr&om=1">Leipzig</a>, which is the beautiful city where I was born and raised. Starting in October, I will be working at a company called <a href="http://www.tomtomwork.com">TomTom WORK</a>, which is a division of <a href="http://www.tomtom.com">TomTom</a>. You may or may not know that company from the label on your navigation system in your car. I will be doing software development using C++. As far as I know, my job has nothing to do with parallel programming, but since I still have my pet project in the works (more on it really soon now) and one article per week is easily sustainable without working in the field directly, I intend to just continue this blog as is.<span id="more-149"></span></p>
<p>That’s it from me, here are the links I found interesting during the last few weeks:</p>
<ul>
<li>In <a href="http://www.bitwiese.de/2007/08/hitting-memory-wall.html">Hitting The Memory Wall</a>, an interesting experience using a parallel sorting implementation is described. I do not have the time to dive into this and find out if the claims there are correct for the examples cited, but I know saturating the memory bus is easily done on today’s architectures (especially the Intel ones that do not have a memory controller for each processor).</li>
<li>Dr. Dobb’s is having a whole issue dedicated to high performance computing with lots of interesting <a href="http://www.ddj.com/hpc-high-performance-computing/">parallel programming content</a>.</li>
<li>Uncle Bob over at the Object Mentor blog does not like threadprivate storage, but would rather <a href="http://blog.objectmentor.com/articles/2007/09/04/thread-local-a-convenient-abomination">have taskprivate storage</a>. His arguments do make sense, especially as people start thinking more in terms of tasks when doing parallel programming, with Intel’s TBB advocating them heavily and OpenMP 3.0 with task support right around the corner.</li>
<li>There is <a href="http://patricklogan.blogspot.com/2007/09/race-is-on-or-is-that-off.html">a conversation</a> <a href="http://louderthanblank.blogspot.com/2007/09/transactional-silver-bullet.html">going on</a> about whether or not transactional memory is useful. My take: transactional memory is not a silver bullet. But every small thing helps and that’s why I am still looking forward to it.</li>
<li>If you are into complexity-theory and parallelism, you may have fun reading <a href="http://blogs.msdn.com/devdev/archive/2007/09/07/p-complete-and-the-limits-of-parallelization.aspx">this</a>.</li>
<li>Bob Warfield over at his blog really starts to get into concurrency. He has a call to <a href="http://smoothspan.wordpress.com/2007/09/08/a-multicore-language-timetable-waste-more-hardware/">waste more hardware</a> and he is actually serious about it. And although many of the HPC people may not like it: I think he has some very valid points.</li>
<li>A guy called Juergen asks Guido van Rossum (the benevolent dictator of Python) to please <a href="http://blog.snaplogic.org/?p=94">get rid of the GIL</a>. And <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=214235">Guido responds</a>. Just in case you are wondering what the heck this infamous <strong>GIL</strong> is, it is the <strong>Global Interpreter Lock</strong> that prevents the Python runtime from properly utilizing multiple cores. Guido’s main argument is that right now removing the GIL would slow down Python dramatically, but he is open to experiments, as long as somebody else does them. When I started my PhD., I actually looked into scripting languages and parallelism and the GIL was what finally convinced me to look for easy-to-use parallelism elsewhere. But I am also sure that the GIL will not be there forever, especially as scalability is getting more important than performance these days.</li>
<li>And last but not least: <a href="http://www.insidehpc.com">insidehpc</a> is <a href="http://insidehpc.com/2007/09/03/free-to-good-home-1-web-site-slightly-used/">dead</a>. Or <a href="http://insidehpc.com/2007/09/07/insidehpccom-the-beat-goes-on/">maybe not</a>. It appears John has managed to raise enough support in the community to keep the site alive with an impressive list of <a href="http://www.insidehpc.com/about">new contributors</a>. The site has been alive and well during the last few days, let’s hope it stays that way once the initial excitement wears off. So if you want to help and make yourself a name in HPC, I am sure John appreciates any help and will happily add you to his list of contributors…</li>
</ul>
<p>This has taken longer than I wanted, but I hope you enjoyed it anyways!</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/09/14/parallel-programming-news-for-week-372007/feed/</wfw:commentRss>
<slash:comments>3</slash:comments>
</item>
<item>
<title>How-to Split a Problem into Tasks</title>
<link>http://www.thinkingparallel.com/2007/09/06/how-to-split-a-problem-into-tasks/</link>
<comments>http://www.thinkingparallel.com/2007/09/06/how-to-split-a-problem-into-tasks/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Thu, 06 Sep 2007 08:08:52 +0000</pubDate>
<category><![CDATA[Parallel Programming]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/09/06/how-to-split-a-problem-into-tasks/</guid>
<description><![CDATA[The very first step in every successful parallelization effort is always the same: you take a look at the problem that needs to be solved and start splitting it into tasks that can be computed in parallel. This sounds easy, but I can see from my students’ reactions that at least for some of them […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/08/axe.jpg' alt='Axe' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />The very first step in every successful parallelization effort is always the same: you take a look at the problem that needs to be solved and start splitting it into tasks that can be computed in parallel. This sounds easy, but I can see from my students’ reactions that at least for some of them it is not. This article shows five ways to do just that. It is a short version for a blog article; the long version can be found in the book <a href='http://www.thinkingparallel.com/localizer/localize_amazon.php?asin=0201648652' rel='nofollow'>Introduction to Parallel Computing</a> by Grama and others.<span id="more-151"></span></p>
<p>Let’s start with a short paragraph on terminology: what I am describing here is also called <em>problem decomposition</em>. The goal here is to divide the problem into several smaller subproblems, called <em>tasks</em>, that can be computed in parallel later on. The tasks can be of different sizes and need not necessarily be independent (although, of course, that would make your work easier later on <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> ).</p>
<p>If many tasks are created, we talk about <em>fine grained decomposition</em>. If few tasks are created it is called <em>coarse grained decomposition</em>. Which one is better heavily depends on the problem, as many tasks allow for more concurrency and better scalability, while fewer tasks usually need less communication / synchronization.</p>
<p>There are several ways to do problem decompositions, the most well-known probably being <em>recursive decomposition</em>, <em>data decomposition</em>, <em>functional decomposition</em>, <em>exploratory decomposition</em> and <em>speculative decomposition</em>. The next few paragraphs will briefly explain how they are carried out in practice.</p>
<h3>Recursive Decomposition</h3>
<p><em>Divide-and-Conquer</em> is a widely deployed strategy for writing sequential algorithms. In it, a problem is divided into subproblems, which are again divided into subproblems recursively until a trivial solution can be calculated. Afterwards, the results of the subproblems are merged together when needed. This strategy can be applied as is to achieve a recursive problem decomposition. As the smaller tasks are usually independent of one another, they can be calculated in parallel, often leading to well-scaling parallel algorithms. Parallel sorting algorithms often use recursive decompositions.</p>
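<p>Here is a minimal sketch of what a recursive decomposition can look like in practice, written with the tasking construct from the OpenMP 3.0 draft (the array, the cutoff and the calling code are just placeholders):</p>
<pre><code>// Minimal sketch of a recursive decomposition (placeholder code).
long parallel_sum(const int *a, int n)
{
    if (n &lt; 1024) {                          // trivial case: solve sequentially
        long s = 0;
        for (int i = 0; i &lt; n; ++i)
            s += a[i];
        return s;
    }
    long left = 0, right = 0;
    #pragma omp task shared(left)            // subproblem 1
    left = parallel_sum(a, n / 2);
    #pragma omp task shared(right)           // subproblem 2
    right = parallel_sum(a + n / 2, n - n / 2);
    #pragma omp taskwait                     // the merge step needs both results
    return left + right;
}

// Called from inside a parallel region, for example:
//   #pragma omp parallel
//   #pragma omp single
//   total = parallel_sum(a, n);
</code></pre>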
<h3>Data Decomposition</h3>
<p>When data structures with large amounts of similar data need to be processed, <em>data decomposition</em> is usually a well-performing decomposition technique. The tasks in this strategy consist of groups of data. These can be either input data, output data or even intermediate data; decompositions based on all three varieties are possible and may be useful. All processors perform the same operations on these data, which are often independent of one another. This is my favorite decomposition technique, because it is usually easy to do, often has no dependencies between tasks and scales really well.</p>
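<p>A minimal sketch of a data decomposition, again with placeholder names: the iteration space, and with it the data, is simply split among the threads.</p>
<pre><code>// Minimal sketch of a data decomposition (placeholder array a of length n).
void scale(double *a, int n, double factor)
{
    #pragma omp parallel for                // each thread gets a chunk of the data
    for (int i = 0; i &lt; n; ++i)
        a[i] = a[i] * factor;               // elements are independent of each other
}
</code></pre>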
<h3>Functional Decomposition</h3>
<p>For <em>functional decomposition</em>, the functions to be performed on data are split into multiple tasks. These tasks can then be performed concurrently by different processes on different data. This often leads to so-called <em>Pipelines</em>. Although this decomposition technique is usually easy to do as well, it often does not scale too well, because there is only a limited number of functions available in each program to split up.</p>
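<p>A minimal sketch of a functional decomposition, with two placeholder functions that both only read the data: the work is split by function, not by data.</p>
<pre><code>// Minimal sketch of a functional decomposition (placeholder functions).
void compute_statistics(const double *data, int n);   // hypothetical function 1
void write_checkpoint(const double *data, int n);     // hypothetical function 2

void analyse(const double *data, int n)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        compute_statistics(data, n);        // one thread runs this function ...

        #pragma omp section
        write_checkpoint(data, n);          // ... while another runs this one
    }
}
</code></pre>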
<h3>Exploratory Decomposition</h3>
<p><em>Exploratory decomposition</em> is a special case for algorithms that search through a predefined space for solutions. In this case, the search space can often be partitioned into tasks, which can be processed concurrently. Exploratory decomposition is not generally applicable; an example is breadth-first search in trees.</p>
<h3>Speculative Decomposition</h3>
<p>Another special-purpose decomposition technique is called <em>speculative decomposition</em>. In the case when only one of several functions is carried out depending on a condition (think: a <em>switch</em>-statement), these functions are turned into tasks and carried out before the condition is even evaluated. As soon as the condition has been evaluated, only the results of one task are used; all others are thrown away. This decomposition technique is quite wasteful of resources and seldom used (personally, I have never used it).</p>
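<p>And a minimal sketch of a speculative decomposition, again with placeholder functions: both variants are computed before the condition is known, and one of the two results is thrown away afterwards.</p>
<pre><code>// Minimal sketch of a speculative decomposition (placeholder functions).
double expensive_variant_a(double x);       // hypothetical branch A
double expensive_variant_b(double x);       // hypothetical branch B
int    evaluate_condition(double x);        // hypothetical condition

double speculate(double x)
{
    double result_a = 0.0, result_b = 0.0;
    int take_a = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        result_a = expensive_variant_a(x);  // speculative task 1
        #pragma omp section
        result_b = expensive_variant_b(x);  // speculative task 2
        #pragma omp section
        take_a = evaluate_condition(x);     // the condition itself
    }
    return take_a ? result_a : result_b;    // the other result is discarded
}
</code></pre>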
<p>The different decomposition methods described above can of course also be combined into <em>hybrid decompositions</em>. I think these are the most common techniques. Do you happen to know any other important ones? Then please don’t hesitate to share them in the comments!</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/09/06/how-to-split-a-problem-into-tasks/feed/</wfw:commentRss>
<slash:comments>11</slash:comments>
</item>
<item>
<title>Top 15 Mistakes in OpenMP</title>
<link>http://www.thinkingparallel.com/2007/08/31/top-15-mistakes-in-openmp/</link>
<comments>http://www.thinkingparallel.com/2007/08/31/top-15-mistakes-in-openmp/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Fri, 31 Aug 2007 08:16:38 +0000</pubDate>
<category><![CDATA[OpenMP]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/08/31/top-15-mistakes-in-openmp/</guid>
<description><![CDATA[It has been a while since I have done this little experiment, but I still find the results interesting. As some of you may know, I teach a class on parallel programming (this is an undergraduate class, by the way – may I have a million dollars in funding now as well, please? 😎 ). […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/08/bug_toy.jpg' alt='A Bug' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />It has been a while since I have done this little experiment, but I still find the results interesting. As some of you may know, I teach a class on parallel programming (this is an undergraduate class, by the way – may I have a <a href="http://news.uns.purdue.edu/x/2007b/070807PaiComputer.html">million dollars in funding</a> now as well, please? <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f60e.png" alt="😎" class="wp-smiley" style="height: 1em; max-height: 1em;" /> ). The first parallel programming system we teach to our students is OpenMP. There is no written test at the end of the class, but instead the students get to do assignments in teams of two people, which have to be defended before us. This is really educational for us (and I think for the students as well), because we get to see and find the mistakes our students make. I have gathered some statistics on which mistakes our students make, and in this post you will find the results. Why am I posting a list of mistakes? Because I think learning from other people’s mistakes is almost as good as learning from my own, and usually saves quite a lot of time compared to the first option. <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f600.png" alt="😀" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <span id="more-158"></span></p>
<p>I have chosen to divide the mistakes into two groups: correctness mistakes and performance mistakes. Correctness mistakes impact the correctness of the program (I can’t believe I am explaining this – I must be on a writing spree <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f644.png" alt="🙄" class="wp-smiley" style="height: 1em; max-height: 1em;" /> ), leading to wrong results when made. Performance mistakes merely lead to slower programs. And now I have kept you waiting long enough, here is my list of the top mistakes to make when programming in OpenMP:</p>
<h3>Correctness Mistakes</h3>
<ol>
<li>Access to shared variables not protected (see the sketch after this list)</li>
<li>Use of locks without <em>flush</em> (as of OpenMP 2.5, this is no longer a mistake)</li>
<li>Read of shared variable without obeying the memory model</li>
<li>Forget to mark private variables as such</li>
<li>Use of <em>ordered</em> clause without <em>ordered</em> construct</li>
<li>Declare loop variable in <em>for</em>-construct as shared</li>
<li>Forget to put down <em>for</em> in <em>#pragma omp parallel for</em></li>
<li>Try to change the number of threads in a parallel region, after it has been started already</li>
<li><em>omp_unset_lock()</em> called from non-owner thread</li>
<li>Attempt to change loop variable while in <em>#pragma omp for</em></li>
</ol>
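<p>To make the first of these mistakes a little more concrete, here is a minimal sketch with placeholder names (<em>a</em> and <em>n</em>): the unprotected update of the shared variable is a data race, and a <em>reduction</em> clause is one simple way to fix it.</p>
<pre><code>// Mistake no. 1 in a minimal sketch (placeholder array a of length n).
long sum = 0;
#pragma omp parallel for
for (int i = 0; i &lt; n; ++i)
    sum += a[i];                 // data race: unprotected update of a shared variable

// One possible fix: let OpenMP protect the update with a reduction.
long sum2 = 0;
#pragma omp parallel for reduction(+:sum2)
for (int i = 0; i &lt; n; ++i)
    sum2 += a[i];                // each thread works on its own private copy
</code></pre>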
<h3>Performance Mistakes</h3>
<ol>
<li>Use of <em>critical</em> when <em>atomic</em> would be sufficient (see the sketch after this list)</li>
<li>Put too much work inside <em>critical</em> region</li>
<li>Use of orphaned construct outside parallel region</li>
<li>Use of unnecessary <em>flush</em></li>
<li>Use of unnecessary <em>critical</em></li>
</ol>
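<p>And the first of the performance mistakes in a minimal sketch, again with placeholder names (<em>counter</em>, <em>n</em> and <em>weight()</em>): a <em>critical</em> region works here, but for a simple update of a single memory location <em>atomic</em> is usually the cheaper choice.</p>
<pre><code>// Performance mistake no. 1 in a minimal sketch (weight() and n are placeholders).
long counter = 0;
#pragma omp parallel for
for (int i = 0; i &lt; n; ++i) {
    #pragma omp critical         // a full critical region just for one simple update
    counter += weight(i);
}

// Cheaper alternative for a simple update of one memory location:
#pragma omp parallel for
for (int i = 0; i &lt; n; ++i) {
    #pragma omp atomic
    counter += weight(i);        // can map to a hardware atomic instruction
}
</code></pre>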
<h3>Disclaimer</h3>
<div class="emph"><strong>Warning:</strong> This is not a statistically sound survey, but merely an experiment I did out of curiosity! Of course the mistakes here are correlated to the way we taught the lecture, as well as to the assignments given; therefore, please do not attach any statistical significance to these findings!</div>
<h3>Famous last words</h3>
<p>With this out of the way, let me tell you that I have written a paper about the experiment and some best practices for avoiding these mistakes in the first place (which I may post here later). If you don’t understand some of the mistakes posted above, please look up the verbose explanations <a href="http://www.michaelsuess.net/publications/suess_leopold_common_mistakes_06.pdf">there</a>. There is a blog by Yuan Lin dedicated to the sole purpose of showing common mistakes while programming in parallel – called <a href="http://blogs.sun.com/yuanlin/">Touch Dreams</a>. Unfortunately, it appears to be no longer maintained. What a pity; I think we could use more resources on mistakes and how to avoid them…</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/08/31/top-15-mistakes-in-openmp/feed/</wfw:commentRss>
<slash:comments>4</slash:comments>
</item>
<item>
<title>Is the Multi-Core Revolution a Hype?</title>
<link>http://www.thinkingparallel.com/2007/08/21/is-the-multi-core-revolution-a-hype/</link>
<comments>http://www.thinkingparallel.com/2007/08/21/is-the-multi-core-revolution-a-hype/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Tue, 21 Aug 2007 20:20:24 +0000</pubDate>
<category><![CDATA[Uncategorized]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/08/21/is-the-multi-core-revolution-a-hype/</guid>
<description><![CDATA[Mark Nelson does not believe in the hype about multi-cores. And he is right with several of his arguments. The world is not going to end if we cannot write our applications to allow for concurrency, that’s for sure. Since I am working on parallel machines all day, it is easy to become a little […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/08/crazy_hype.jpg' alt='Hype' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />Mark Nelson does not believe in the <a href="http://marknelson.us/2007/07/30/multicore-panic/">hype about multi-cores</a>. And he is right with several of his arguments. The world is not going to end if we cannot write our applications to allow for concurrency, that’s for sure. Since I am working on parallel machines all day, it is easy to become a little disconnected from <em>the real world</em> and think everybody has gotten the message and welcomes our new parallel programming overlords. Some of Mark’s arguments are a little shaky, though, as I hope to show you in this article. Is Mark right? I suspect not, but only time will tell.<span id="more-160"></span></p>
<p>Let’s go through his arguments one by one (for this, it helps if you read the <a href="http://marknelson.us/2007/07/30/multicore-panic/">article</a> in full first, as my argument is harder to understand without the context).</p>
<blockquote><p>Linux, OS/X, and Windows have all had good support for Symmetrical Multiprocessing (SMP) for some time, and the new multicore chips are designed to work in this environment.</p></blockquote>
<p>I completely agree here, the problem is not on the operating system’s side of the equation.</p>
<blockquote><p>Just as an example, using the spiffy Sysinternals Process Explorer, I see that my Windows XP system has 48 processes with 446 threads. Windows O/S is happily farming those 446 threads out to both cores on my system as time becomes available. If I had four cores, we could still keep all of them busy. If I had eight cores, my threads would still be distributed among all of them.</p></blockquote>
<p>This argument I don’t understand. He claims that he has enough threads on his system to keep even a four core system busy. Yet, at the same time the CPU-monitor he depicts shows a CPU usage of merely 14.4% – which proves that those threads are not really doing anything useful most of the time. Most of them are sleeping and will therefore not be a burden on the CPU anyways. As I see it, Mark’s picture shows that there are nowhere near enough tasks to be done on his system to keep even his dual-core system from going into power-saving mode. It’s not how many threads there are; it’s how much they do that’s important.</p>
<blockquote><p>Modern languages like Java support threads and various concurrency issues right out of the box. C++ requires non-standard libraries, but all modern C++ environments worth their salt deal with multithreading in a fairly sane way.</p></blockquote>
<p>Right and wrong, if you ask me. The mainstream languages do have support for threads now. Whether or not that support is <em>sane</em> is another matter altogether. <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> I know one thing from looking at my students and my own work: parallel programming today is not easy and it’s very easy to make mistakes. I welcome any effort to change this situation with new languages, tools, libraries or whatever magic is available.</p>
<blockquote><p>The task doing heavy computation might be tying up one core, but the O/S can continue running UI and other tasks on other cores, and this really helps with overall responsiveness. At the same time, the computationally intensive thread is getting fewer context switches, and hopefully getting its job done faster.</p></blockquote>
<p>That’s true. Unfortunately, this does not scale, since we have already seen in the argument presented above that all the other threads present can run happily on one core without it even running hot. Nobody says that you need parallel programming when you have only two cores. But as soon as you have more, I believe you do.</p>
<blockquote><p>In this future view, by 2010 we should have the first eight-core systems. In 2014, we’re up to 32 cores. By 2017, we’ve reached an incredible 128 core CPU on a desktop machine.</p></blockquote>
<p>I can buy an eight-core system today, if I want to. Intel has a package consisting of two quad-core processors and a platform to run them on. I am sure as soon as AMD gets their act together with their quad-cores, they will follow. I am not so sure anymore when the first multi-cores were shipped, but this <a href="http://www.intel.com/pressroom/archive/releases/20050207corp.htm">press-release</a> suggests it was about two years ago in 2005. I can buy eight cores now from Intel. Or I can buy chips from Sun with eight cores supporting eight threads each. My reader David points me to an article describing a <a href="http://www.tgdaily.com/content/view/33451/135/">new chip with 64 cores</a>. Does this mean that the number of cores is going to double each year? When you follow this logic, we are at 64 cores in 2010. The truth is probably somewhere in the middle between Mark’s and my prediction, but I am fairly sure the multi-core revolution is coming upon us a lot faster than he is predicting…</p>
<blockquote><p>He also pointed out that even if we didn’t have the ability to parallelize linear algorithms, it may well be that advanced compilers could do the job for us.</p></blockquote>
<p>Obviously, this has <a href="http://www.thinkingparallel.com/2007/08/14/an-interview-with-dr-jay-hoeflinger-about-automatic-parallelization/">not quite worked out</a> as well as expected.</p>
<blockquote><p>Maybe 15 or 20 years from now we’ll be writing code in some new transaction based language that spreads a program effortlessly across hundreds of cores. Or, more likely, we’ll still be writing code in C++, Java, and .Net, and we’ll have clever tools that accomplish the same result.</p></blockquote>
<p>I sure hope he is right on this one. Or on second thought, maybe I prefer a funky new language with concurrency support built in, instead of being stuck with C++ for twenty more years. <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f61b.png" alt="😛" class="wp-smiley" style="height: 1em; max-height: 1em;" /> </p>
<p>You have heard my opinion and you have read Mark’s, so what’s yours? Is the Multi-Core Revolution a Hype? Looking forward to your comments!</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/08/21/is-the-multi-core-revolution-a-hype/feed/</wfw:commentRss>
<slash:comments>15</slash:comments>
</item>
<item>
<title>An Interview with Dr. Jay Hoeflinger about Automatic Parallelization</title>
<link>http://www.thinkingparallel.com/2007/08/14/an-interview-with-dr-jay-hoeflinger-about-automatic-parallelization/</link>
<comments>http://www.thinkingparallel.com/2007/08/14/an-interview-with-dr-jay-hoeflinger-about-automatic-parallelization/#comments</comments>
<dc:creator><![CDATA[Michael Suess]]></dc:creator>
<pubDate>Tue, 14 Aug 2007 04:00:50 +0000</pubDate>
<category><![CDATA[Interviews]]></category>
<guid isPermaLink="false">http://www.thinkingparallel.com/2007/08/14/an-interview-with-dr-jay-hoeflinger-about-automatic-parallelization/</guid>
<description><![CDATA[When I started my PhD.-thesis a couple of years ago, I took some time to look at auto-parallelizing compilers and research. After all, I wanted to work on making parallel programming easier, and the best way to do that would surely be to let compilers do all the work. Unfortunately, the field appeared to be […]]]></description>
<content:encoded><![CDATA[<p><img src='http://www.thinkingparallel.com/wp-content/uploads/2007/08/jay_hoeflinger.jpg' alt='Jay Hoeflinger' style="display: inline; float: left; margin-right: 10px; padding: 5px;" />When I started my PhD.-thesis a couple of years ago, I took some time to look at auto-parallelizing compilers and research. After all, I wanted to work on making parallel programming easier, and the best way to do that would surely be to let compilers do all the work. Unfortunately, the field appeared to be quite dead at that time. There has been a huge amount of research done in the eighties and nineties, yet it all appeared to have settled down. And the compilers I tried could not parallelize more than the simplest loops. I have always been asking myself why this was the case, and when I had the chance to talk to Dr. Jay Hoeflinger, he had some very interesting answers for me. He agreed to let me re-ask these questions in an email interview and this is the result. Thanks, Jay, for sharing your knowledge!<span id="more-154"></span></p>
<p><strong>Michael: </strong> First of all, please share a little bit about your biography and background on automatic parallelization with us!</p>
<div class="emph"><strong>Jay: </strong>I have a BS (1974), MS (1977) and PhD (1998) from the University of Illinois at Urbana-Champaign. I filled in the large gap between the MS and the PhD with various things. I did some scientific programming, some business programming, and then in 1985 I joined the <a href="http://www.csrd.uiuc.edu/">Center for Supercomputing Research and Development (CSRD)</a> at the University of Illinois. I worked on the Cedar Fortran parallelizing compiler, which was being built to drive the Cedar supercomputer being built at CSRD. We had high hopes that a parallelizing compiler would be the “silver bullet” that would allow an easy way to program parallel machines, which we viewed as being inevitable.</p>
<p>We made some big strides in understanding the challenges facing parallelizing compilers, but by the time CSRD lost its funding, we were far from solving the problems.</p>
<p>After CSRD, I stayed at the University of Illinois and worked on the Polaris parallelizing compiler. Many talented students contributed to Polaris, and it had good success over the years. I contributed to the compiler and was involved in many papers about the work. Finally I decided to work on a PhD degree, attacking the problem of helping the compiler represent the complex data access patterns of programs. I thought that the only way to really solve the automatic parallelization problem was to give the compiler a richer data structure for representing data accesses in a program.</p>
<p>I finally got the PhD degree in 1998, but it was obvious at that time that automatic parallelization by the compiler alone was nowhere near a solution. A richer data structure for the compiler helped, but still the compiler had too little information about the program. The only hope was to do something at runtime. Of course, giving the compiler a richer way to represent complex data accesses forms a basis for that, and that work is finding some success, but a general solution to the problem remains out of reach.</p>
<p>In 1998 I went to work for the Center for Simulation of Advanced Rockets as a Senior Research Scientist. I studied various aspects of parallelizing the rocket simulation codes being developed at the Center.</p>
<p>In 2000, I joined Intel and have worked on the OpenMP runtime library and the Cluster OpenMP runtime library. While working for Intel I have also been involved in the OpenMP 2.0, 2.5 and 3.0 language committees. I’m a Senior Staff Software Engineer and the team lead of the <a href="http://www.intel.com/cd/software/products/asmo-na/eng/329023.htm">Cluster OpenMP</a> project.</p></div>
<p><strong>Michael: </strong>Wow, now I know why you are <em>the man</em> (TM) to talk to about automatic parallelization <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> . Let us start the actual interview with the current state of automatically parallelizing compilers: What are they able to do today?</p>
<div class="emph"><strong>Jay: </strong>Vectorization technology and simple parallelization has moved into mainstream commercial compilers. This is because the students that worked on parallelization in the 1980s and 90s now form the core of various commercial compiler teams. Many of the techniques used to attack parallelization have moved into commercial compilers and are supporting both parallel and serial optimization.</p>
<p>Today, parallelizing compilers can be fairly successful with Fortran77 programs. The language is simple, making it easier to represent what the program is doing in the compiler’s internal data structures. Success with C has been more limited. In the 90s a lot of work was done on parallelizing Java programs, with some success. In the last 10 years work has been done on parallelizing domain-specific languages, like Matlab.</p>
<p>As people realized how hard the parallelization of programs at compile time was, they started turning to parallelizing at runtime. Many projects have studied this problem and there has been some success. Parallelization at runtime has the distinct advantage that all the information about variable values is available. Some of the complicated mechanisms that were developed to symbolically simulate execution in the compiler and propagate data and program attributes throughout the program aren’t needed at runtime.</p>
<p>Now, research compilers can work together with the runtime system to parallelize a code. For instance, if a compiler is trying to prove a certain relation in order to parallelize a given program loop, and is unable to prove it, the compiler could compile a runtime test into the program that can determine whether the relation is true. Parallelization decisions can then be based on the result of the test.</p></div>
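<p>To make Jay’s last point a bit more tangible, here is a small sketch of my own (not Jay’s code, and certainly not what a real compiler emits) of how such a compiled-in runtime test might look in C with OpenMP. The function name and the overlap check are purely illustrative:</p>
<pre><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Illustrative sketch: assume the compiler could not prove that a and b
 * never overlap, so it emits both a parallel and a serial version of the
 * loop and decides at runtime which one is safe to use. */
void scale_copy(double *a, const double *b, size_t n)
{
    /* runtime test: are the two arrays disjoint in memory? */
    int disjoint = ((uintptr_t)(a + n) &lt;= (uintptr_t)b) ||
                   ((uintptr_t)(b + n) &lt;= (uintptr_t)a);

    if (disjoint) {
        /* the relation holds: every iteration touches distinct memory */
        #pragma omp parallel for
        for (long i = 0; i &lt; (long)n; i++)
            a[i] = 2.0 * b[i];
    } else {
        /* the relation could not be established: fall back to serial code */
        for (size_t i = 0; i &lt; n; i++)
            a[i] = 2.0 * b[i];
    }
}
</code></pre>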
<p><strong>Michael: </strong>OK, now that we know what they can do, let’s talk about the more important part: what are they not able to do? And what are the reasons for that?</p>
<div class="emph"><strong>Jay: </strong>The key to parallelization is finding a way to divide up the work of the code such that a write to a memory location by one thread is always separated from another thread’s access to the memory location by a synchronization. So, uncertainty about which data is being accessed by which threads really hampers parallelization. </p>
<p>This uncertainty can result from data values not being available to the compiler, for instance, if they are read from an input file at runtime or simply calculated at runtime based on a complex formula. This means that compilers have difficulty with adaptive codes that adjust the way they access data, based on complicated functions or external information. Some very clever techniques have been devised to allow the compiler to parallelize even in these complex situations, but typically such techniques require long compile times.</p>
<p>In the commercial world, compile time is much more important than it is in the research world. So, complicated techniques that let a research compiler parallelize loops at the cost of long compile times may not be implemented in a commercial compiler – at least they would not be used by default. As a result, commercial compilers are usually limited to rather simple parallelization techniques.</p>
<p>Achieving the same success in C or C++ as we see in Fortran77 is much more difficult because of the presence of pointers. Pointers create a big problem for compilers because they obscure what data is actually being accessed, making the problem of understanding data access even harder than it would be otherwise.</p></div>
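<p>A tiny C example (mine, not from the interview) illustrates the pointer problem Jay describes. In the first function the compiler must assume the two pointers might refer to overlapping memory, which is exactly the uncertainty that blocks parallelization; the C99 <code>restrict</code> qualifier in the second one hands the compiler the missing guarantee:</p>
<pre><code>#include &lt;stddef.h&gt;

/* a and b are plain pointers: they might alias, so iterations could
 * depend on each other and the loop cannot safely be parallelized. */
void add_one(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i &lt; n; i++)
        a[i] = b[i] + 1.0;
}

/* With 'restrict' the programmer promises that a and b do not overlap,
 * so every iteration is independent, much like the typical Fortran 77
 * case, where array arguments may not alias. */
void add_one_noalias(double * restrict a, const double * restrict b, size_t n)
{
    for (size_t i = 0; i &lt; n; i++)
        a[i] = b[i] + 1.0;
}
</code></pre>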
<p><strong>Michael: </strong>Is there hope? Compiler writers are some of the smartest people in computer science and parallel programming is a huge problem to attack, what kind of help can we hope for from our compilers in the near future?</p>
<div class="emph"><strong>Jay: </strong>I think there is some hope in limited domains. There is some hope for parallelizing domain-specific languages, for instance. Matlab would be an example of a language that is great for linear algebra, but not so much for other things. A Matlab compiler doesnâ€<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />t have to consider all the things that a C++ compiler has to consider.</p>
<p>The thing you have to understand is that compilers are enormous programs, typically written by a large group of experts. Even without automatic parallelization, the modern commercial compiler is extremely complex, and has a very large amount of source code. The software engineering challenges of design and testing for such a large program are already daunting. Adding the large amount of source code that would be required to do a really good job of parallelization would only exacerbate this problem.</p>
<p>So, I think it is unlikely that compilers in the near future will do a significantly better job of parallelization. This is especially true due to the rising popularity of explicitly parallel languages and the increasing willingness of universities to teach people about parallel algorithms and how to design their codes for parallelism from the start.</p>
<p>I believe that compiler vendors are more interested in increasing support for the explicitly parallel languages than they are in trying to do automatic parallelization.</p></div>
<p><strong>Michael: </strong>Could a change in languages help? You mentioned earlier that Fortran 77 code is easier to optimize / parallelize for compilers than e.g. C/C++ code. Could a switch to a more restricted / more regular or just different language (maybe even a new one) change the situation and allow our compilers to help us more?</p>
<div class="emph"><strong>Jay: </strong>There is a tension between the expressibility of a language and the ability of compilers to properly optimize it. A language with a high degree of expressibility may be easier for humans to use, yet at the same time it almost certainly is more complex and therefore more difficult for a compiler to understand, making it harder to parallelize.</p>
<p>A program written in a language with limited expressibility (e.g. Fortran77) is therefore easier to parallelize, yet the language is harder for a person to write programs in. Defining a new language that is sufficiently rich in expressibility, yet simple enough to be routinely and successfully parallelized, is a difficult challenge. It seems possible, but no good solution exists yet.</p>
<p>This is why OpenMP has become popular. OpenMP has the serial program embedded in it, but allows the programmer to tell the compiler where the parallel parts are. The programmer’s knowledge of the code is put to its best use. The programmer typically knows that conceptually, a given region of the program consists of independent computation, and can just say so in a very simple way to the compiler.</p></div>
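<p>For readers who have not seen it yet, this is what “just saying so” looks like in practice. A minimal example of my own, nothing more than plain serial C plus one directive:</p>
<pre><code>#include &lt;stddef.h&gt;

/* The programmer knows the iterations are independent and simply tells
 * the compiler so; no automatic dependence analysis is required.
 * Compile with OpenMP support enabled (e.g. gcc -fopenmp). */
void scale(double *a, double factor, size_t n)
{
    #pragma omp parallel for
    for (long i = 0; i &lt; (long)n; i++)
        a[i] = factor * a[i];
}
</code></pre>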
<p><strong>Michael: </strong>Where do you see automatically parallelizing compilers in 5 years from now? Will some of the problems you have already sketched be solved by then? Any silver bullets on the horizon? Please take a look into your crystal ball <img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /> !</p>
<div class="emph"><strong>Jay: </strong>I think there’s hope for domain-specific compilers as I explained above. There may be some hope for runtime parallelization using virtual machines or maybe just runtime tests compiled into the code.</p>
<p>But, for the most part, parallel code will still need to be designed and written by experts. Along these lines, I think parallel libraries will be much more prominent in 5 years. These libraries allow people to write serial programs that call the library routines to solve various problems in parallel, in which the library routines themselves have been optimized by experts.</p>
<p>So, the emphasis has to shift to people writing their own parallel code, not relying on compilers. Universities will need to do a better job of producing parallel experts.</p>
<p>For the most part, I think automatic parallelization will remain an unfulfilled dream. </p>
<p>So, unfortunately, I don’t see any silver bullets. Automatic parallelization remains an extremely interesting research problem, though. Maybe one day, when someone much smarter than I has a great idea …</p></div>
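<p>Jay’s point about parallel libraries deserves a small illustration of my own: the calling code below is completely serial, and all the parallelism lives inside the library routine, written and tuned by experts. I am using the standard CBLAS interface here; whether the call actually runs in parallel depends on linking against a multi-threaded BLAS implementation:</p>
<pre><code>#include &lt;cblas.h&gt;  /* standard CBLAS interface; link against a threaded BLAS */

/* C = A * B for square n-by-n matrices in row-major storage.
 * The caller writes ordinary serial code; a multi-threaded BLAS
 * parallelizes the matrix multiplication internally. */
void matmul(const double *A, const double *B, double *C, int n)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A, n, B, n,
                0.0, C, n);
}
</code></pre>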
<p><strong>Michael: </strong>Thank you, Jay, that was very enlightening! And although the outlook at the end may sound pessimistic, I am looking forward to interviewing the smart guy you have mentioned. You never know – he may be right around the corner implementing his revolutionary idea in a compiler as we speak…</p>
]]></content:encoded>
<wfw:commentRss>http://www.thinkingparallel.com/2007/08/14/an-interview-with-dr-jay-hoeflinger-about-automatic-parallelization/feed/</wfw:commentRss>
<slash:comments>6</slash:comments>
</item>
</channel>
</rss>