2D Diffusion Application

The problem

We wish to solve the IBVP:

on the domain  using an FTCS (forward-time, centered-space) discretization

The domain is discretized using

and we advance the initial conditions in time using the time step

The single process code

The first step in writing a parallel application is to have a working single processor version! Remember: never try to parallelize a code until the single processor version has been debugged and is working. The essential part for this application is

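DO WHILE (t<tfinal)
  DO i=1,nx-1
    DO j=1,ny-1
      ! FTCS update of the interior grid points
      qnew(i,j)=qold(i,j)+sigmax*(qold(i+1,j)-2*qold(i,j)+qold(i-1,j)) &
                         +sigmay*(qold(i,j+1)-2*qold(i,j)+qold(i,j-1))
    END DO
  END DO
  ! The new time level becomes the old one for the next step
  qold=qnew
  t=t+delt
END DO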
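In mathematical notation, the loop body above corresponds to the update in the first line below, with qold and qnew playing the roles of q^n and q^{n+1}. The definitions of sigmax and sigmay in the second line are an assumption: they are the usual ones for a constant diffusion coefficient D, a symbol that does not appear in the code. The inequality is the standard stability condition for FTCS applied to 2D diffusion.

```latex
\[
q^{n+1}_{i,j} = q^{n}_{i,j}
  + \sigma_x \left( q^{n}_{i+1,j} - 2 q^{n}_{i,j} + q^{n}_{i-1,j} \right)
  + \sigma_y \left( q^{n}_{i,j+1} - 2 q^{n}_{i,j} + q^{n}_{i,j-1} \right)
\]
\[
\sigma_x = \frac{D\,\Delta t}{\Delta x^{2}}, \qquad
\sigma_y = \frac{D\,\Delta t}{\Delta y^{2}}, \qquad
\sigma_x + \sigma_y \le \tfrac{1}{2}
\]
```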

The parallel code

We split the domain into vertical strips, one per process. Each strip has a local x-index running from i=0 to i=ni+1, where i=0 and i=ni+1 are ghost cells that lie outside the strip. The values for these ghost cells must be obtained from the adjoining strips or from the boundary conditions.
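The strip layout can be tabulated with a small serial program. This is only a sketch: the values of nx and np are hypothetical, and the strip width ni is assumed to be the ceiling of (nx+1)/np, which may differ from how the complete code chooses it. The number of filled columns reuses the bound from the i-loop of the parallel code, clamped at zero.

```fortran
PROGRAM strip_layout
  IMPLICIT NONE
  INTEGER, PARAMETER :: nx = 64   ! hypothetical grid size in x
  INTEGER, PARAMETER :: np = 3    ! hypothetical number of processes
  INTEGER :: ni, idproc, nfilled

  ! Assumed strip width: ceiling((nx+1)/np), computed with integer division
  ni = (nx + np) / np

  DO idproc = 0, np-1
    ! Same upper bound as the i-loop of the parallel code, clamped at zero
    nfilled = MAX(0, MIN(ni, nx+1-idproc*ni))
    PRINT '(A,I0,A,I0,A,I0)', 'strip ', idproc, ': local i = 1..', nfilled, &
          ', global offset = ', idproc*ni
  END DO
END PROGRAM strip_layout
```

Any Fortran 90 compiler should build this (for example gfortran). Only the last strip can report fewer than ni filled columns.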

The ghost cell values held by neighbouring strips are obtained by sending messages between the processes. The communication is structured in the following way:

· All strips except the last send their i=ni gridline values. These correspond to the i=0 ghost cell values for the right-adjoining strip.

· All strips except the first wait to receive their i=0 ghost cell values.

· All strips except the first send their i=1 gridline values. These correspond to the i=ni+1 ghost cell values for the left-adjoining strip.

· All strips except the last wait to receive their i=ni+1 ghost cell values.

You have to be careful about how full the last grid strip is (it may hold fewer than ni columns when the grid does not divide evenly among the processes), but other than that the parallel implementation is pretty straightforward.

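CALL MPI_COMM_SIZE(MPI_COMM_WORLD,np,ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,idproc,ierr)

DO WHILE (t<tfinal)

  ! FTCS update; the MIN handles a last strip that is only partly filled
  DO j=1,ny
    DO i=1,MIN(ni,nx+1-idproc*ni)
      qnew(i,j)=qold(i,j)+sigmax*(qold(i+1,j)-2*qold(i,j)+qold(i-1,j)) &
                         +sigmay*(qold(i,j+1)-2*qold(i,j)+qold(i,j-1))
    END DO
  END DO

  ! Send the i=ni gridline to the right-adjoining strip
  IF (idproc < np-1) THEN
    qright(:)=qnew(ni,:)
    CALL MPI_SEND(qright(0), ny+1, MPI_REAL, idproc+1, TAG_DATASYNCH, MPI_COMM_WORLD, ierr)
  END IF

  ! Receive the i=0 ghost cells from the left-adjoining strip
  ! (MPI_RECV takes a status argument; MPI_STATUS_IGNORE is used since it is not inspected)
  IF (idproc > 0) THEN
    CALL MPI_RECV(qleft(0), ny+1, MPI_REAL, idproc-1, TAG_DATASYNCH, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
    qnew(0,:)=qleft(:)
  END IF

  ! Send the i=1 gridline to the left-adjoining strip
  IF (idproc > 0) THEN
    qleft(:)=qnew(1,:)
    CALL MPI_SEND(qleft(0), ny+1, MPI_REAL, idproc-1, TAG_DATASYNCH, MPI_COMM_WORLD, ierr)
  END IF

  ! Receive the i=ni+1 ghost cells from the right-adjoining strip
  IF (idproc < np-1) THEN
    CALL MPI_RECV(qright(0), ny+1, MPI_REAL, idproc+1, TAG_DATASYNCH, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
    qnew(ni+1,:)=qright(:)
  END IF

  qold=qnew
  t=t+delt

END DO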
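The same ghost cell exchange can also be written with MPI_SENDRECV, which pairs each send with its matching receive in one call. The program below is a minimal sketch of that alternative, not the method used in the code above. The program name, the array sizes, the buffer names sendbuf and recvbuf, and the value given to TAG_DATASYNCH are hypothetical. Each strip is filled with its own rank number so the exchanged values are easy to recognise.

```fortran
PROGRAM halo_sendrecv
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: ni = 4, ny = 4        ! hypothetical strip size
  INTEGER, PARAMETER :: TAG_DATASYNCH = 1     ! hypothetical tag value
  REAL :: q(0:ni+1, 0:ny)
  REAL :: sendbuf(0:ny), recvbuf(0:ny)
  INTEGER :: np, idproc, left, right, ierr

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, np, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, idproc, ierr)

  ! Neighbours; MPI_PROC_NULL turns a transfer into a no-op at the two ends
  left  = idproc - 1
  right = idproc + 1
  IF (idproc == 0)    left  = MPI_PROC_NULL
  IF (idproc == np-1) right = MPI_PROC_NULL

  q = REAL(idproc)    ! fill the strip with the rank number

  ! Send gridline i=ni to the right, receive ghost cells i=0 from the left
  sendbuf = q(ni, :)
  recvbuf = q(0, :)   ! left unchanged if there is no left neighbour
  CALL MPI_SENDRECV(sendbuf, ny+1, MPI_REAL, right, TAG_DATASYNCH, &
                    recvbuf, ny+1, MPI_REAL, left,  TAG_DATASYNCH, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  q(0, :) = recvbuf

  ! Send gridline i=1 to the left, receive ghost cells i=ni+1 from the right
  sendbuf = q(1, :)
  recvbuf = q(ni+1, :)   ! left unchanged if there is no right neighbour
  CALL MPI_SENDRECV(sendbuf, ny+1, MPI_REAL, left,  TAG_DATASYNCH, &
                    recvbuf, ny+1, MPI_REAL, right, TAG_DATASYNCH, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  q(ni+1, :) = recvbuf

  PRINT *, 'rank', idproc, 'left ghost', q(0,1), 'right ghost', q(ni+1,1)

  CALL MPI_FINALIZE(ierr)
END PROGRAM halo_sendrecv
```

With this form the first and last strips need no special IF blocks, and the exchange does not depend on how the MPI library buffers blocking sends.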

Results

 

The complete code may be found by following this link: mpidif2D.f90.

 

The speed-ups in going from a single processor to two processors as a function of problem size are

Problem size    64x64    128x128    256x256
Speed-up        1.42     1.71       1.84

This particular algorithm is not a very good candidate for distributed parallel processing, since there is relatively little computation between successive communications of ghost cell values. Better performance would probably be achieved by a vectorized implementation.
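Dividing each quoted speed-up S by the number of processors p = 2 gives the parallel efficiency E, which makes the trend with problem size easier to see. The second line is a rough cost model and an assumption rather than a measurement: per time step, each strip updates about n_i n_y points but exchanges only about 2(n_y+1) values, so the ratio of computation to communication grows with the strip width.

```latex
\[
E = \frac{S}{p}: \qquad
E_{64\times 64} = \frac{1.42}{2} = 0.71, \qquad
E_{128\times 128} = \frac{1.71}{2} \approx 0.86, \qquad
E_{256\times 256} = \frac{1.84}{2} = 0.92
\]
\[
\frac{\text{computation}}{\text{communication}}
  \sim \frac{n_i\, n_y}{2\,(n_y + 1)} \approx \frac{n_i}{2}
\]
```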

An animation of the results of the computation is available: mpidif2d.mpeg