<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet href="client.xsl" type="text/xsl"?>
<article article-type="other">
<front>
<journal-meta>
<journal-id>International Journal of Aerospace and Lightweight Structures</journal-id>
<publication_date>2012</publication_date>
<volume>2</volume>
<issue>1</issue>
<banner>
<href>banner.jpg</href>
<size width="100%"/>
</banner>
</journal-meta>
<article-meta>
<title-group>
<doi>10.3850/S2010428612000232</doi>
<article-title>High Performance Computation by Multi-Node GPU Cluster-Tsubame2.0 on the Air Flow in an Urban City Using Lattice Boltzmann Method</article-title>
</title-group>

<author>Xian Wang<sup>1</sup> and Takayuki Aoki<sup>2</sup></author>
<author-citation>Wang, Xian; Aoki, Takayuki</author-citation>

<aff><sup>1</sup>State Key Laboratory for Strength and Vibration of Mechanical Structures, School of Aerospace, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China.</aff>
<email><a href="mailto:wangxian@mail.xjtu.edu.cn">wangxian@mail.xjtu.edu.cn</a></email>
<aff><sup>2</sup>Global Scientific Information and Computing Center, Tokyo Institute of Technology,
2-12-1, Meguro-ku, Tokyo, 152-8550, Japan.</aff>
<email><a href="mailto:taoki@gsic.titech.ac.jp">taoki@gsic.titech.ac.jp</a></email>

</article-meta></front>
<body>
<abstract>
<title>ABSTRACT</title>
<p>General-purpose computing on graphics processing units (GPGPU) has drawn much attention as a way to accelerate non-graphics applications. A simulation using the D3Q19 model of the lattice Boltzmann method was executed successfully on a multi-node GPU cluster using CUDA programming and the MPI library.
The code ran on the GPU cluster TSUBAME2.0 of the Tokyo Institute of Technology, which comprises 1408 computing nodes, each equipped with three NVIDIA Tesla M2050 GPU accelerators, for a total of 4224 GPUs.
In the present work, a large-scale computation of the air flow in an urban city was carried out using 120 GPUs of TSUBAME2.0. The computational grid had 3072 × 2000 × 256 points, and the computational domain was decomposed in all three dimensions for parallel computation. Large eddy simulation (LES) was adopted to model the turbulence.
For a 1 km × 1 km area and 273 s of flow time, the simulation took 585 s, of which 461 s was spent on computation and 124 s on data communication. The achieved performance is about 6 TFLOPS.</p><p><italic>Keywords: GPGPU, Parallel computation, Lattice Boltzmann method.</italic></p>
</abstract>
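<p>The figures in the abstract can be cross-checked with simple arithmetic. The Python sketch below is illustrative only. The 6 × 5 × 4 process grid is a hypothetical split of the 120 GPUs, not the one reported by the authors. The memory estimate also assumes single precision and two copies of the 19 distributions per cell, which the abstract does not state.</p>
<preformat>
```python
# Figures reported in the abstract.
NX, NY, NZ = 3072, 2000, 256      # lattice points in each direction
N_GPUS = 120
T_COMP, T_COMM = 461.0, 124.0     # seconds of computation and communication
PERF_TFLOPS = 6.0                 # "about 6 TFLOPS"

# Hypothetical 3-D process grid: the actual split is not given in the abstract.
PX, PY, PZ = 6, 5, 4

# Assumption: single precision (4 bytes) and two copies of the 19 distributions.
BYTES_PER_CELL = 19 * 4 * 2


def summarize():
    """Derive per-GPU quantities from the reported run (sketch only)."""
    assert PX * PY * PZ == N_GPUS
    assert NX % PX == 0 and NY % PY == 0 and NZ % PZ == 0
    local = (NX // PX, NY // PY, NZ // PZ)
    cells_per_gpu = local[0] * local[1] * local[2]
    elapsed = T_COMP + T_COMM
    return {
        "total_cells": NX * NY * NZ,
        "local_block": local,
        "cells_per_gpu": cells_per_gpu,
        "elapsed_s": elapsed,
        "comm_fraction": T_COMM / elapsed,
        "gflops_per_gpu": PERF_TFLOPS * 1000 / N_GPUS,
        "dist_bytes_per_gpu": cells_per_gpu * BYTES_PER_CELL,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```
</preformat>
<p>Under these assumptions, each GPU holds a 512 × 400 × 64 block of 13,107,200 cells. The distribution arrays alone then take about 2.0 × 10^9 bytes, which is within the 3 GB of a Tesla M2050; LES and boundary arrays would add to this. Communication accounts for roughly 21% of the 585 s. The reported 6 TFLOPS corresponds to about 50 GFLOPS per GPU.</p>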
<fpdf>
<href>pdflogo.jpg</href>
<hpdf>S2010428612000232</hpdf>
<nsref>../protected_docs/0201/S2010428612000232.pdf</nsref>
</fpdf>
</body>
</article>

