<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Developing for GPUs, Cell, and Multi-core CPUs Using a Unified Programming Model</title>
	<atom:link href="http://www.linux-mag.com/id/6374/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.linux-mag.com/id/6374/</link>
	<description>Open Source, Open Standards</description>
	<lastBuildDate>Sat, 05 Oct 2013 13:48:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: csk317</title>
		<link>http://www.linux-mag.com/id/6374/#comment-5514</link>
		<dc:creator>csk317</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/6374/#comment-5514</guid>
		<description>The question I have is, are younger programmers using C++?</description>
		<content:encoded><![CDATA[<p>The question I have is, are younger programmers using C++?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pbr</title>
		<link>http://www.linux-mag.com/id/6374/#comment-5515</link>
		<dc:creator>pbr</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/6374/#comment-5515</guid>
	<description>&lt;strong&gt;csk317&lt;/strong&gt; - 5-year-olds are building virtual instruments and doing parallel programming using NXT-G, a simplified version of LabVIEW.  And, &lt;i&gt;yes&lt;/i&gt;, they&#039;ve no fear of C, C++, Java, Smalltalk (they get to play with that on their OLPCs) or numerous other &quot;power tools&quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;strong&gt;Challenge, yes... severe, no.&lt;/strong&gt;&lt;br /&gt;
&lt;br /&gt;
While I liked this article, I don&#039;t agree with the byline - the challenge is there but it&#039;s not severe.  &lt;br /&gt;
&lt;br /&gt;
Sure, there is now a call for toolkits and libraries which blend inter-core communications solutions with &quot;unlooping&quot; and other means of deconstructing monolithic problem spaces - as the author says, &quot;expressing parallelism&quot;.  &lt;br /&gt;
&lt;br /&gt;
RapidMind seems to be doing a respectable job of that; I&#039;m sure there are other solutions as well.  It would be nice to see a table comparing/contrasting them; maybe LinuxWorld&#039;s up to the challenge?&lt;br /&gt;
&lt;br /&gt;
&lt;i&gt;Thanks, Doc McCool - keep up the good work!&lt;/i&gt;&lt;br /&gt;
&lt;strong&gt;-pbr&lt;/strong&gt;</description>
		<content:encoded><![CDATA[<p><strong>csk317</strong> &#8211; 5-year-olds are building virtual instruments and doing parallel programming using NXT-G, a simplified version of LabVIEW.  And, <i>yes</i>, they&#8217;ve no fear of C, C++, Java, Smalltalk (they get to play with that on their OLPC&#8217;s) or numerous other &#8220;power tools&#8221;.</p>
<p><strong>Challenge, yes&#8230; severe, no.</strong></p>
<p>While I liked this article, I don&#8217;t agree with the byline &#8211; the challenge is there but it&#8217;s not severe.  </p>
<p>Sure, there is now a call for toolkits and libraries which blend inter-core communications solutions with &#8220;unlooping&#8221; and other means of deconstructing monolithic problem spaces &#8211; as the author says, &#8220;expressing parallelism&#8221;.  </p>
<p>RapidMind seems to be doing a respectable job of that; I&#8217;m sure there are other solutions as well.  It would be nice to see a table comparing/contrasting them; maybe LinuxWorld&#8217;s up to the challenge?</p>
<p><i>Thanks, Doc McCool &#8211; keep up the good work!</i><br />
<strong>-pbr</strong></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hhemken</title>
		<link>http://www.linux-mag.com/id/6374/#comment-5516</link>
		<dc:creator>hhemken</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/6374/#comment-5516</guid>
		<description>csk317:&lt;br /&gt;
&lt;br /&gt;
There is currently a meme going around that C/C++ are { archaic &#124; obsolete &#124; superseded &#124; pointless to learn &#124; old fart languages &#124; etc } as compared to Java, Ruby, Python, C#, and the other currently popular programming languages. This goes hand in hand with the meme that says that personal computers are obsolete and everything will be done &quot;in the cloud,&quot; that soon people will not need desktops or perhaps even laptops, that everything will be done on handheld devices, and so on. In the extreme case, this is interpreted to mean that people should only learn web backend scripting languages that give fast results. Damn the torpedoes, full speed ahead!&lt;br /&gt;
&lt;br /&gt;
No doubt there are kernels of fact to some aspects of these memes, but considering them prophetic predictions of the future is pretty idiotic. Contemporary culture is made up of sound bites and fast tempos, great expectations and tight budgets, quick thinking and quick execution. C++ programming requires thoughtful engineering, and is usually applied to critical software infrastructure like operating systems, device drivers, sophisticated software packages, etc., on top of which reside things like web apps and huge Java constructions, among others.&lt;br /&gt;
&lt;br /&gt;
The patience needed for that sort of thing has always been in short supply; that&#039;s nothing new. Today&#039;s new twist in the C++ world is having to deal with ubiquitous multi-core and multi-CPU systems, and write code in such a way as to exploit them. This, too, is not a new need. However, it is still not a part of general C++ culture, and not even a well-understood and well-provisioned aspect of programming in general.&lt;br /&gt;
&lt;br /&gt;
The question is, how will it play out?</description>
		<content:encoded><![CDATA[<p>csk317:</p>
<p>There is currently a meme going around that C/C++ are { archaic | obsolete | superseded | pointless to learn | old fart languages | etc } as compared to Java, Ruby, Python, C#, and the other currently popular programming languages. This goes hand in hand with the meme that says that personal computers are obsolete and everything will be done &#8220;in the cloud,&#8221; that soon people will not need desktops or perhaps even laptops, that everything will be done on handheld devices, and so on. In the extreme case, this is interpreted to mean that people should only learn web backend scripting languages that give fast results. Damn the torpedoes, full speed ahead!</p>
<p>No doubt there are kernels of fact to some aspects of these memes, but considering them prophetic predictions of the future is pretty idiotic. Contemporary culture is made up of sound bites and fast tempos, great expectations and tight budgets, quick thinking and quick execution. C++ programming requires thoughtful engineering, and is usually applied to critical software infrastructure like operating systems, device drivers, sophisticated software packages, etc., on top of which reside things like web apps and huge Java constructions, among others.</p>
<p>The patience needed for that sort of thing has always been in short supply; that&#8217;s nothing new. Today&#8217;s new twist in the C++ world is having to deal with ubiquitous multi-core and multi-CPU systems, and write code in such a way as to exploit them. This, too, is not a new need. However, it is still not a part of general C++ culture, and not even a well-understood and well-provisioned aspect of programming in general.</p>
<p>The question is, how will it play out?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: chitown76</title>
		<link>http://www.linux-mag.com/id/6374/#comment-5517</link>
		<dc:creator>chitown76</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/6374/#comment-5517</guid>
	<description>Straight &#039;C&#039; and native assembly will always outperform C++ in terms of speed and efficiency.  I do like C++, but I don&#039;t use it for highly performance-critical code.  As somebody who has a great deal of experience working with the Cell Broadband Engine, I find there is no replacement for directly using the SPU intrinsics and/or writing assembly language to get the best performance out of your application.  Optimizing your data access patterns for optimal DMA transactions is equally important, if not more so, for achieving the best performance and efficiency.  And of course, arranging your algorithm such that it can be made parallel with minimal synchronization/data dependencies across processors is important too.</description>
		<content:encoded><![CDATA[<p>Straight &#8216;C&#8217; and native assembly will always outperform C++ in terms of speed and efficiency.  I do like C++, but I don&#8217;t use it for highly performance-critical code.  As somebody who has a great deal of experience working with the Cell Broadband Engine, I find there is no replacement for directly using the SPU intrinsics and/or writing assembly language to get the best performance out of your application.  Optimizing your data access patterns for optimal DMA transactions is equally important, if not more so, for achieving the best performance and efficiency.  And of course, arranging your algorithm such that it can be made parallel with minimal synchronization/data dependencies across processors is important too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pdhackett</title>
		<link>http://www.linux-mag.com/id/6374/#comment-5518</link>
		<dc:creator>pdhackett</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/6374/#comment-5518</guid>
		<description>So far, no one has commented on the fact that the article doesn&#039;t say *anything*&lt;br /&gt;
negative about &quot;RapidMind&quot;. My guess is that this:&lt;br /&gt;
&lt;br /&gt;
Program p = RM_BEGIN {&lt;br /&gt;
  In a, b;&lt;br /&gt;
  Out c;&lt;br /&gt;
  Value3f d = f * exp(a) * sin(b);&lt;br /&gt;
  c = d + a * 2.0f;&lt;br /&gt;
} RM_END;&lt;br /&gt;
&lt;br /&gt;
is turned into &quot;stuff&quot; plus a string. If this is true, you&lt;br /&gt;
aren&#039;t going to get any warnings when you compile, even if you&lt;br /&gt;
type in complete garbage.&lt;br /&gt;
&lt;br /&gt;
Also, I surmise that &quot;RapidMind&quot; is a commercial product and&lt;br /&gt;
the history of commercial general purpose languages doesn&#039;t &lt;br /&gt;
bode well for it. It also seems a bit out of place&lt;br /&gt;
in the Linux world where there is generally an open source&lt;br /&gt;
aspect to &quot;stuff&quot;. (e.g., MySQL, Qt)</description>
		<content:encoded><![CDATA[<p>So far, no one has commented on the fact that the article doesn&#8217;t say *anything*<br />
negative about &#8220;RapidMind&#8221;. My guess is that this:</p>
<p>Program p = RM_BEGIN {<br />
  In a, b;<br />
  Out c;<br />
  Value3f d = f * exp(a) * sin(b);<br />
  c = d + a * 2.0f;<br />
} RM_END;</p>
<p>is turned into &#8220;stuff&#8221; plus a string. If this is true, you<br />
aren&#8217;t going to get any warnings when you compile, even if you<br />
type in complete garbage.</p>
<p>Also, I surmise that &#8220;RapidMind&#8221; is a commercial product and<br />
the history of commercial general purpose languages doesn&#8217;t <br />
bode well for it. It also seems a bit out of place<br />
in the Linux world where there is generally an open source<br />
aspect to &#8220;stuff&#8221;. (e.g., MySQL, Qt)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mmccool</title>
		<link>http://www.linux-mag.com/id/6374/#comment-5519</link>
		<dc:creator>mmccool</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/6374/#comment-5519</guid>
		<description>Hi, I&#039;d like to respond to a couple of the comments posted here.&lt;br /&gt;
&lt;br /&gt;
First, regarding the post by pdhackett: actually, the interface shown above is native C++ which is completely type-checked using normal mechanisms when you invoke your existing C++ compiler.  RapidMind just provides some types and macros in a header file, using normal C++ semantics.  There is no special preprocessor, as you were probably assuming, so what you see above is NOT ever turned into a string.  RapidMind code IS ordinary, portable, ISO-standard C++ code.  The C++ compiler *will* tell you if you have a malformed program.&lt;br /&gt;
&lt;br /&gt;
What actually happens is the RapidMind numerical types, like Value, are instrumented so the RapidMind platform (which is linked in like a library) can observe the sequence of operations that are applied to them.   The BEGIN (which is just a macro wrapping a function call into the API) starts a &quot;trace&quot; of these operations, and END (another function call) stops the trace.  In addition to this &quot;retained&quot; usage, Values also work in &quot;immediate&quot; mode as ordinary numerical types outside of BEGIN/END blocks.  Immediate mode is handy for modifying non-local variables, as described below.&lt;br /&gt;
&lt;br /&gt;
Once a trace has been captured, at runtime the platform uses a staged code generator (completely separate from the C++ code generator) to construct some optimized machine language that reimplements that trace so it can run in parallel. &lt;br /&gt;
&lt;br /&gt;
&quot;Program&quot; objects are basically brand-new functions, built at runtime, that you can use to (asynchronously, as it happens) kick off a parallel version of the sequence of operations captured in the trace.  In other words, the RapidMind platform interface described above adds the capability to C++ to dynamically construct parallelized functions in a safe way.&lt;br /&gt;
&lt;br /&gt;
What this means is:&lt;br /&gt;
&lt;br /&gt;
- C++ overhead is eliminated.  Operations that are not on RapidMind types just act as &quot;scaffolding&quot; that organizes the sequence of numerical operations on RapidMind types.  This scaffolding is completely ignored by the platform&#039;s code generator.   You can use all the C++ modularity constructs like classes, namespaces, virtual member functions etc. that you want freely, then &quot;compile them out&quot;.   &lt;br /&gt;
&lt;br /&gt;
- The modularity and scope constructs of C++ get automatically transformed into interprocessor communication patterns. Non-local variables declared outside of a BEGIN/END work for Programs as you would expect for functions.  This means that binding between code on the &quot;host&quot; and code on the &quot;co-processors&quot; follows the same scope rules as the rest of C++, although the implementation is significantly more involved internally: the co-processor code may be running on separate processors that may not even share the same memory space.  RapidMind hides this complexity completely. &lt;br /&gt;
&lt;br /&gt;
- The &quot;metaprogramming&quot; approach enables some interesting alternative programming models, which can significantly reduce the size of code, without reducing performance, and in some cases even enhancing it significantly.  For example, you can easily generate parameterized variants of functions programmatically, or variants that depend on data only known at runtime.  In particular, you can turn interpreters into compilers trivially, which is a rather extreme form of overhead elimination and run-time dependency.  It should be noted that the staged code generator, since it operates on small kernels, is very fast.&lt;br /&gt;
&lt;br /&gt;
This addresses an earlier comment: &quot;C++ is not an HPC language&quot;.  Our approach lets you have your cake (modularity and abstraction) and eat it too (performance).   We&#039;ve seen practical examples of portable code that&#039;s 1/10 the size and far easier to understand, but with nearly twice the performance compared to a non-portable C implementation on the same hardware.   Abstraction helps with some of the code size reduction, but the scope rules and the embedded approach get rid of all the annoying glue code as well.  The &quot;kernel&quot; language IS the API.   What you see above is IT.&lt;br /&gt;
&lt;br /&gt;
As for chitown76&#039;s comments: we have a lot of experience with the Cell too, and actually, code written this way can significantly outperform code written at a low level on the Cell, and with much less effort, because &lt;br /&gt;
(a) you can try out more high-level optimizations faster &lt;br /&gt;
(b) you can programmatically generate code with RapidMind that would be insane to try to build by hand &lt;br /&gt;
(c) you can write parameterized code and then automatically search for the sweet spot for block sizes, loop unroll factors, etc. &lt;br /&gt;
(d) the platform automates the more common optimizations, like double-buffering and prefetch, so they&#039;re always there, but are invisible and portable.   &lt;br /&gt;
&lt;br /&gt;
I should emphasize that we are not limited by the semantics of C++ in our code generator: the RapidMind platform has its own, which is designed around use of vector operations.   But if you really, really, really want to, and you are willing to break portability, you CAN use explicit asynchronous DMA transfers and assembly intrinsics through the interface.  Most Cell SPU assembly instructions can be specified with a simple function call, for instance. &lt;br /&gt;
&lt;br /&gt;
Such drill-down is occasionally useful, but should only be done after profiling a more generic implementation, and should be hidden whenever possible inside a suitable abstraction (which our approach makes zero-cost).  Also, often higher-level transformations to the algorithm and the data layout have a bigger impact on performance, and should not be neglected.    &lt;br /&gt;
&lt;br /&gt;
Simple things should be easy.  Difficult things should be possible.&lt;br /&gt;
&lt;br /&gt;
Michael McCool</description>
		<content:encoded><![CDATA[<p>Hi, I&#8217;d like to respond to a couple of the comments posted here.</p>
<p>First, regarding the post by pdhackett: actually, the interface shown above is native C++ which is completely type-checked using normal mechanisms when you invoke your existing C++ compiler.  RapidMind just provides some types and macros in a header file, using normal C++ semantics.  There is no special preprocessor, as you were probably assuming, so what you see above is NOT ever turned into a string.  RapidMind code IS ordinary, portable, ISO-standard C++ code.  The C++ compiler *will* tell you if you have a malformed program.</p>
<p>What actually happens is the RapidMind numerical types, like Value, are instrumented so the RapidMind platform (which is linked in like a library) can observe the sequence of operations that are applied to them.   The BEGIN (which is just a macro wrapping a function call into the API) starts a &#8220;trace&#8221; of these operations, and END (another function call) stops the trace.  In addition to this &#8220;retained&#8221; usage, Values also work in &#8220;immediate&#8221; mode as ordinary numerical types outside of BEGIN/END blocks.  Immediate mode is handy for modifying non-local variables, as described below.</p>
<p>Once a trace has been captured, at runtime the platform uses a staged code generator (completely separate from the C++ code generator) to construct some optimized machine language that reimplements that trace so it can run in parallel. </p>
<p>&#8220;Program&#8221; objects are basically brand-new functions, built at runtime, that you can use to (asynchronously, as it happens) kick off a parallel version of the sequence of operations captured in the trace.  In other words, the RapidMind platform interface described above adds the capability to C++ to dynamically construct parallelized functions in a safe way.</p>
<p>What this means is:</p>
<p>- C++ overhead is eliminated.  Operations that are not on RapidMind types just act as &#8220;scaffolding&#8221; that organizes the sequence of numerical operations on RapidMind types.  This scaffolding is completely ignored by the platform&#8217;s code generator.   You can use all the C++ modularity constructs like classes, namespaces, virtual member functions etc. that you want freely, then &#8220;compile them out&#8221;.   </p>
<p>- The modularity and scope constructs of C++ get automatically transformed into interprocessor communication patterns. Non-local variables declared outside of a BEGIN/END work for Programs as you would expect for functions.  This means that binding between code on the &#8220;host&#8221; and code on the &#8220;co-processors&#8221; follows the same scope rules as the rest of C++, although the implementation is significantly more involved internally: the co-processor code may be running on separate processors that may not even share the same memory space.  RapidMind hides this complexity completely. </p>
<p>- The &#8220;metaprogramming&#8221; approach enables some interesting alternative programming models, which can significantly reduce the size of code, without reducing performance, and in some cases even enhancing it significantly.  For example, you can easily generate parameterized variants of functions programmatically, or variants that depend on data only known at runtime.  In particular, you can turn interpreters into compilers trivially, which is a rather extreme form of overhead elimination and run-time dependency.  It should be noted that the staged code generator, since it operates on small kernels, is very fast.</p>
<p>This addresses an earlier comment: &#8220;C++ is not an HPC language&#8221;.  Our approach lets you have your cake (modularity and abstraction) and eat it too (performance).   We&#8217;ve seen practical examples of portable code that&#8217;s 1/10 the size and far easier to understand, but with nearly twice the performance compared to a non-portable C implementation on the same hardware.   Abstraction helps with some of the code size reduction, but the scope rules and the embedded approach get rid of all the annoying glue code as well.  The &#8220;kernel&#8221; language IS the API.   What you see above is IT.</p>
<p>As for chitown76&#8217;s comments: we have a lot of experience with the Cell too, and actually, code written this way can significantly outperform code written at a low level on the Cell, and with much less effort, because <br />
(a) you can try out more high-level optimizations faster <br />
(b) you can programmatically generate code with RapidMind that would be insane to try to build by hand <br />
(c) you can write parameterized code and then automatically search for the sweet spot for block sizes, loop unroll factors, etc. <br />
(d) the platform automates the more common optimizations, like double-buffering and prefetch, so they&#8217;re always there, but are invisible and portable.   </p>
<p>I should emphasize that we are not limited by the semantics of C++ in our code generator: the RapidMind platform has its own, which is designed around use of vector operations.   But if you really, really, really want to, and you are willing to break portability, you CAN use explicit asynchronous DMA transfers and assembly intrinsics through the interface.  Most Cell SPU assembly instructions can be specified with a simple function call, for instance. </p>
<p>Such drill-down is occasionally useful, but should only be done after profiling a more generic implementation, and should be hidden whenever possible inside a suitable abstraction (which our approach makes zero-cost).  Also, often higher-level transformations to the algorithm and the data layout have a bigger impact on performance, and should not be neglected.    </p>
<p>Simple things should be easy.  Difficult things should be possible.</p>
<p>Michael McCool</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: chitown76</title>
		<link>http://www.linux-mag.com/id/6374/#comment-5520</link>
		<dc:creator>chitown76</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/6374/#comment-5520</guid>
	<description>Thanks for the insightful information.  I do have a better perspective on your offering now.  In response, I have no doubt that the set of tools you mention reduces the amount of effort required to implement an application on a multi-core system such as the Cell, but I am not yet convinced that using these tools you could significantly outperform (or even outperform) a hand-coded optimized implementation.  We can continue to discuss this point over and over, but until an example is openly and readily available, I am still in the camp that holds that for the performance-intensive parts of your application, where every cycle matters, you&#039;re still not going to do any better than a hand-coded, assembly-optimized implementation.&lt;br /&gt;
Thanks again for your insightful article!  As a member of the multi-core development community, I look forward to hearing more from you in the near future.</description>
		<content:encoded><![CDATA[<p>Thanks for the insightful information.  I do have a better perspective on your offering now.  In response, I have no doubt that the set of tools you mention reduces the amount of effort required to implement an application on a multi-core system such as the Cell, but I am not yet convinced that using these tools you could significantly outperform (or even outperform) a hand-coded optimized implementation.  We can continue to discuss this point over and over, but until an example is openly and readily available, I am still in the camp that holds that for the performance-intensive parts of your application, where every cycle matters, you&#8217;re still not going to do any better than a hand-coded, assembly-optimized implementation.<br />
Thanks again for your insightful article!  As a member of the multi-core development community, I look forward to hearing more from you in the near future.</p>
]]></content:encoded>
	</item>
</channel>
</rss>