<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Small HPC</title>
	<atom:link href="http://www.linux-mag.com/id/7362/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.linux-mag.com/id/7362/</link>
	<description>Open Source, Open Standards</description>
	<lastBuildDate>Sat, 05 Oct 2013 13:48:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: kalloyd</title>
		<link>http://www.linux-mag.com/id/7362/#comment-6544</link>
		<dc:creator>kalloyd</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7362/#comment-6544</guid>
		<description>Doug,&lt;br /&gt;
&lt;br /&gt;
This is a great start on what&#039;s going on in HPC.  But there is much more than multi-core CPUs.  Consider the MPI implications between clusters of multi-core CPUs, each core spawning MPP kernels on hundreds of GPU process cores.  Some of those simple global or shared memory sections now have multiple routing options - such as xGigE or InfiniBand.&lt;br /&gt;
&lt;br /&gt;
While I see a great paradigm shift in learning parallel programming, the compute fabrics that result from hybrid, heterogeneous compute clusters mean that networked parallel/serial fabrics offer breathtaking power, but come with great complexity.&lt;br /&gt;
&lt;br /&gt;
This is the playing field for serious HPC software developers today.&lt;br /&gt;
&lt;br /&gt;
Ken Lloyd&lt;br /&gt;
Director of Systems Science&lt;br /&gt;
Watt Systems Technologies Inc.</description>
		<content:encoded><![CDATA[<p>Doug,</p>
<p>This is a great start on what&#8217;s going on in HPC.  But there is much more than multi-core CPUs.  Consider the MPI implications between clusters of multi-core CPUs, each core spawning MPP kernels on hundreds of GPU process cores.  Some of those simple global or shared memory sections now have multiple routing options &#8211; such as xGigE or InfiniBand.</p>
<p>While I see a great paradigm shift in learning parallel programming, the compute fabrics that result from hybrid, heterogeneous compute clusters mean that networked parallel/serial fabrics offer breathtaking power, but come with great complexity.</p>
<p>This is the playing field for serious HPC software developers today.</p>
<p>Ken Lloyd<br />
Director of Systems Science<br />
Watt Systems Technologies Inc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: endawobrien</title>
		<link>http://www.linux-mag.com/id/7362/#comment-6545</link>
		<dc:creator>endawobrien</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7362/#comment-6545</guid>
		<description>I expect that shared-memory, multi-threaded HPC will become quite democratic and ubiquitous, since multi-core will be everywhere and the programming tools for this (e.g., OpenMP) are relatively simple.  I think of this as &quot;supply-led&quot; HPC.&lt;br /&gt;
&lt;br /&gt;
However, the real heavyweight macho HPC world is more &quot;demand-led&quot;, and will continue to exploit the power of distributed-memory systems, while still press-ganging multi-core processors, GPGPUs, FPGAs and whatever other accelerator or configuration will let them run ever bigger models, ever faster.&lt;br /&gt;
&lt;br /&gt;
Difficulty with programming may limit the adoption (and business success) of some &quot;accelerators&quot;, but there will probably always be a hard core (pardon the pun...) of deviant weirdos for whom performance is everything, and who are prepared to go to heroic lengths to exploit parallelism on even the most unwieldy hardware.&lt;br /&gt;
&lt;br /&gt;
Certainly, a &quot;holy grail&quot; for HPC would be a high-level programming language that could compile an executable to run &quot;transparently&quot; over distributed memory.  Some such grails, of intermediate holiness, already exist.  Otherwise, a tool to automatically decompose a serial application for MPI would be nice; something analogous to what OpenMP does for multi-threading.&lt;br /&gt;
&lt;br /&gt;
-Enda</description>
		<content:encoded><![CDATA[<p>I expect that shared-memory, multi-threaded HPC will become quite democratic and ubiquitous, since multi-core will be everywhere and the programming tools for this (e.g., OpenMP) are relatively simple.  I think of this as &#8220;supply-led&#8221; HPC.</p>
<p>However, the real heavyweight macho HPC world is more &#8220;demand-led&#8221;, and will continue to exploit the power of distributed-memory systems, while still press-ganging multi-core processors, GPGPUs, FPGAs and whatever other accelerator or configuration will let them run ever bigger models, ever faster.</p>
<p>Difficulty with programming may limit the adoption (and business success) of some &#8220;accelerators&#8221;, but there will probably always be a hard core (pardon the pun&#8230;) of deviant weirdos for whom performance is everything, and who are prepared to go to heroic lengths to exploit parallelism on even the most unwieldy hardware.</p>
<p>Certainly, a &#8220;holy grail&#8221; for HPC would be a high-level programming language that could compile an executable to run &#8220;transparently&#8221; over distributed memory.  Some such grails, of intermediate holiness, already exist.  Otherwise, a tool to automatically decompose a serial application for MPI would be nice; something analogous to what OpenMP does for multi-threading.</p>
<p>-Enda</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: indivar</title>
		<link>http://www.linux-mag.com/id/7362/#comment-6546</link>
		<dc:creator>indivar</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7362/#comment-6546</guid>
		<description>Hi &lt;br /&gt;
&lt;br /&gt;
One of the biggest challenges for any Distributed Shared Memory (DSM) system is &#039;Thread Migration&#039;. While most DSM systems have managed to migrate complete processes from one node to another (and thereby spread the load), almost all of them are struggling with individual thread migration. Now since most of today&#039;s applications are multi-threaded as opposed to multi-process, the effectiveness of such a solution is limited. DSM developers are getting there, but it will take a while to perfect it.&lt;br /&gt;
&lt;br /&gt;
In any case, App Developers will now have to look at writing Hybrid parallel applications, i.e. applications that are multi-process as well as multi-threaded (e.g. Apache HTTP Server, worker module) to really exploit the power of today&#039;s systems, be it Hardware based SMP (multi-core) or Software based SMP (DSM).&lt;br /&gt;
&lt;br /&gt;
In short, both DSM and MPI developers face the same set of challenges. &lt;br /&gt;
&lt;br /&gt;
In the future, DSM will be used for day-to-day applications while MPI, because of its sheer ability to scale massively, will continue to be used for hardcore number crunching.&lt;br /&gt;
&lt;br /&gt;
Note: &lt;b&gt;HPC&lt;/b&gt; is a relative term. What you call &#039;Small HPC&#039;, or as Microsoft calls it, &#039;Personal SuperComputer&#039;, will simply be called the humble &lt;b&gt;&#039;Personal Computer&#039;&lt;/b&gt; or &lt;b&gt;&#039;PC&#039;&lt;/b&gt; in the future.&lt;br /&gt;
&lt;br /&gt;
- Indivar Nair</description>
		<content:encoded><![CDATA[<p>Hi </p>
<p>One of the biggest challenges for any Distributed Shared Memory (DSM) system is &#8216;Thread Migration&#8217;. While most DSM systems have managed to migrate complete processes from one node to another (and thereby spread the load), almost all of them are struggling with individual thread migration. Now since most of today&#8217;s applications are multi-threaded as opposed to multi-process, the effectiveness of such a solution is limited. DSM developers are getting there, but it will take a while to perfect it.</p>
<p>In any case, App Developers will now have to look at writing Hybrid parallel applications, i.e. applications that are multi-process as well as multi-threaded (e.g. Apache HTTP Server, worker module) to really exploit the power of today&#8217;s systems, be it Hardware based SMP (multi-core) or Software based SMP (DSM).</p>
<p>In short, both DSM and MPI developers face the same set of challenges. </p>
<p>In the future, DSM will be used for day-to-day applications while MPI, because of its sheer ability to scale massively, will continue to be used for hardcore number crunching.</p>
<p>Note: <b>HPC</b> is a relative term. What you call &#8216;Small HPC&#8217;, or as Microsoft calls it, &#8216;Personal SuperComputer&#8217;, will simply be called the humble <b>&#8216;Personal Computer&#8217;</b> or <b>&#8216;PC&#8217;</b> in the future.</p>
<p>- Indivar Nair</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: heymanj</title>
		<link>http://www.linux-mag.com/id/7362/#comment-6547</link>
		<dc:creator>heymanj</dc:creator>
		<pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.linux-mag.com/id/7362/#comment-6547</guid>
<description>Memory on a per-node basis will become the cost point.  With the 6-core AMD processor supporting upwards of 256GB of memory, one has to think of memory cost when building a large multi-core system.&lt;br /&gt;
This will come into play in application development as another possible divergence emerges - a small number of multi-core processors with large local memory, or a large number of multi-core processors with significantly less local memory.&lt;br /&gt;
&lt;br /&gt;
OpenMP benefits from the large shared memory with any number of multi-core processors.   MPI can do both, but will definitely succeed on large multi-core systems with a smaller per-node memory footprint.&lt;br /&gt;
&lt;br /&gt;
There is a definite need for something to assist MPI programming along the lines of OpenMP pragmas, or compiler smarts to hide all the details.  Otherwise, in the case of tens of thousands of MPI tasks (one task per core), you wind up with the old adage of &quot;not being able to see the forest because you&#039;re lost among the individual trees&quot;.&lt;br /&gt;
&lt;br /&gt;
Jerry Heyman</description>
		<content:encoded><![CDATA[<p>Memory on a per-node basis will become the cost point.  With the 6-core AMD processor supporting upwards of 256GB of memory, one has to think of memory cost when building a large multi-core system.<br />
This will come into play in application development as another possible divergence emerges &#8211; a small number of multi-core processors with large local memory, or a large number of multi-core processors with significantly less local memory.</p>
<p>OpenMP benefits from the large shared memory with any number of multi-core processors.   MPI can do both, but will definitely succeed on large multi-core systems with a smaller per-node memory footprint.</p>
<p>There is a definite need for something to assist MPI programming along the lines of OpenMP pragmas, or compiler smarts to hide all the details.  Otherwise, in the case of tens of thousands of MPI tasks (one task per core), you wind up with the old adage of &#8220;not being able to see the forest because you&#8217;re lost among the individual trees&#8221;.</p>
<p>Jerry Heyman</p>
]]></content:encoded>
	</item>
</channel>
</rss>