2D Diffusion Application
The problem
We wish to solve the IBVP:
on the domain
The domain is discretized using
and we advance the initial conditions in time using the time step
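Judging from the update loop used in the code that follows, the scheme appears to be the standard explicit (forward-time, centered-space) discretization of the diffusion equation. The sketch below writes that update out; the diffusion coefficient D and the spacings \Delta x, \Delta y, \Delta t are assumed names, chosen so that \sigma_x and \sigma_y play the role of sigmax and sigmay in the code.

```latex
\begin{align*}
q^{n+1}_{i,j} &= q^{n}_{i,j}
  + \sigma_x \left( q^{n}_{i+1,j} - 2\,q^{n}_{i,j} + q^{n}_{i-1,j} \right)
  + \sigma_y \left( q^{n}_{i,j+1} - 2\,q^{n}_{i,j} + q^{n}_{i,j-1} \right), \\
\sigma_x &= \frac{D\,\Delta t}{\Delta x^{2}}, \qquad
\sigma_y = \frac{D\,\Delta t}{\Delta y^{2}} \qquad \text{(assumed definitions)}
\end{align*}
```

Under these assumptions the explicit scheme is stable only if \sigma_x + \sigma_y \le 1/2, which limits the usable time step.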
The single process code
The first step in writing a parallel application is to have a working
single processor version! Remember: never try to parallelize a code until the
single processor version has been debugged and is working. The essential part
for this application is

    ! Advance the solution until the final time is reached
    DO WHILE (t<tfinal)
      ! Explicit update of the interior points
      DO i=1,nx-1
        DO j=1,ny-1
          qnew(i,j)=qold(i,j)+sigmax*(qold(i+1,j)-2*qold(i,j)+qold(i-1,j)) &
                             +sigmay*(qold(i,j+1)-2*qold(i,j)+qold(i,j-1))
        END DO
      END DO
      qold=qnew
      t=t+delt
    END DO

The parallel code
We split the domain into vertical strips. Each strip has a local
x-index running from i=0 to i=ni+1 with i=0, i=ni+1 corresponding to regions
outside of the strip. The values for these must be obtained from the
adjoining strips or the boundary conditions. We accomplish this by sending messages between the processes with the
values in these ghost cells. The communication is structured in the following
way:
· All strips except the last send their i=ni gridline values. These correspond to the i=0 ghost cell values for the right-adjoining strip.
· All strips except the first wait to receive their i=0 ghost cell values.
· All strips except the first send their i=1 gridline values. These correspond to the i=ni+1 ghost cell values for the left-adjoining strip.
· All strips except the last wait to receive their i=ni+1 ghost cell values.
You have to be careful about how full the last grid strip is, but other
than that the parallel implementation is pretty straightforward.

    ! status is an INTEGER array of size MPI_STATUS_SIZE (declaration not shown)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD,np,ierr)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD,idproc,ierr)
    DO WHILE (t<tfinal)
      ! Update the points owned by this strip; the last strip may be only partly full
      DO j=1,ny
        DO i=1,MIN(ni,nx+1-idproc*ni)
          qnew(i,j)=qold(i,j)+sigmax*(qold(i+1,j)-2*qold(i,j)+qold(i-1,j)) &
                             +sigmay*(qold(i,j+1)-2*qold(i,j)+qold(i,j-1))
        END DO
      END DO
      ! Send the i=ni gridline to the right-adjoining strip
      IF (idproc < np-1) THEN
        qright(:)=qnew(ni,:)
        CALL MPI_SEND(qright(0), ny+1, MPI_REAL, idproc+1, TAG_DATASYNCH, &
                      MPI_COMM_WORLD, ierr)
      END IF
      ! Receive the i=0 ghost cells from the left-adjoining strip
      IF (idproc > 0) THEN
        CALL MPI_RECV(qleft(0), ny+1, MPI_REAL, idproc-1, TAG_DATASYNCH, &
                      MPI_COMM_WORLD, status, ierr)
        qnew(0,:)=qleft(:)
      END IF
      ! Send the i=1 gridline to the left-adjoining strip
      IF (idproc > 0) THEN
        qleft(:)=qnew(1,:)
        CALL MPI_SEND(qleft(0), ny+1, MPI_REAL, idproc-1, TAG_DATASYNCH, &
                      MPI_COMM_WORLD, ierr)
      END IF
      ! Receive the i=ni+1 ghost cells from the right-adjoining strip
      IF (idproc < np-1) THEN
        CALL MPI_RECV(qright(0), ny+1, MPI_REAL, idproc+1, TAG_DATASYNCH, &
                      MPI_COMM_WORLD, status, ierr)
        qnew(ni+1,:)=qright(:)
      END IF
      qold=qnew
      t=t+delt
    END DO
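The four send/receive steps described above can also be written as two MPI_SENDRECV calls, with MPI_PROC_NULL standing in for the missing neighbour at either end of the domain. The following is a minimal sketch of that alternative and is not taken from mpidif2D.f90; the module name ghost_exchange, the routine name exchange_ghosts, the tag value, and the array shape q(0:ni+1,0:ny) are assumptions made for illustration.

```fortran
! Sketch: ghost-cell exchange for one vertical strip using MPI_SENDRECV.
! Assumes q is dimensioned q(0:ni+1, 0:ny), i.e. ny+1 values per gridline.
MODULE ghost_exchange
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: TAG_DATASYNCH = 1   ! assumed value; any tag shared by all ranks
CONTAINS
  SUBROUTINE exchange_ghosts(q, ni, ny, comm)
    INTEGER, INTENT(IN)    :: ni, ny, comm
    REAL,    INTENT(INOUT) :: q(0:ni+1, 0:ny)
    REAL    :: sendbuf(0:ny), recvbuf(0:ny)
    INTEGER :: np, idproc, left, right, ierr
    INTEGER :: status(MPI_STATUS_SIZE)

    CALL MPI_COMM_SIZE(comm, np, ierr)
    CALL MPI_COMM_RANK(comm, idproc, ierr)

    ! A transfer to or from MPI_PROC_NULL completes immediately and moves no data.
    left = idproc - 1
    IF (idproc == 0) left = MPI_PROC_NULL
    right = idproc + 1
    IF (idproc == np - 1) right = MPI_PROC_NULL

    ! Send gridline i=ni to the right; receive ghost line i=0 from the left.
    sendbuf = q(ni, :)
    recvbuf = q(0, :)        ! left unchanged when there is no left neighbour
    CALL MPI_SENDRECV(sendbuf, ny+1, MPI_REAL, right, TAG_DATASYNCH, &
                      recvbuf, ny+1, MPI_REAL, left,  TAG_DATASYNCH, &
                      comm, status, ierr)
    q(0, :) = recvbuf

    ! Send gridline i=1 to the left; receive ghost line i=ni+1 from the right.
    sendbuf = q(1, :)
    recvbuf = q(ni+1, :)     ! left unchanged when there is no right neighbour
    CALL MPI_SENDRECV(sendbuf, ny+1, MPI_REAL, left,  TAG_DATASYNCH, &
                      recvbuf, ny+1, MPI_REAL, right, TAG_DATASYNCH, &
                      comm, status, ierr)
    q(ni+1, :) = recvbuf
  END SUBROUTINE exchange_ghosts
END MODULE ghost_exchange
```

Inside the time loop this routine would be called once per step, for example CALL exchange_ghosts(qnew, ni, ny, MPI_COMM_WORLD) after the update and before qold=qnew. Pairing each send with its receive lets the MPI library order the transfers, so the correctness of the exchange no longer depends on the order in which the blocking calls are issued.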
Results
The complete code may be found by following this link: mpidif2D.f90.
The speed-ups in going from a single processor to two processors as a
function of problem size are
This particular algorithm is not a very good candidate for distributed
parallel processing, since there is relatively little computation between
successive communications of ghost cell values; better performance would
probably be achieved by a vectorized implementation.
An animation of the results of the computation is available: mpidif2d.mpeg
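To illustrate the remark about a vectorized implementation, the interior update can be written with Fortran array sections instead of explicit loops, which is one common way of exposing the operation to a vectorizing compiler. This is only a sketch under stated assumptions: the module and routine names are hypothetical, the arrays are assumed to be dimensioned (0:nx,0:ny), and boundary values are simply carried over unchanged.

```fortran
! Sketch: one explicit diffusion step written with loops and with array sections.
! Array bounds (0:nx, 0:ny) and fixed boundary values are assumptions.
MODULE diffusion_step
  IMPLICIT NONE
CONTAINS
  ! Reference version: explicit loops over the interior points.
  SUBROUTINE step_loops(qold, qnew, nx, ny, sigmax, sigmay)
    INTEGER, INTENT(IN)  :: nx, ny
    REAL,    INTENT(IN)  :: qold(0:nx, 0:ny), sigmax, sigmay
    REAL,    INTENT(OUT) :: qnew(0:nx, 0:ny)
    INTEGER :: i, j
    qnew = qold                      ! boundary values carried over unchanged
    DO j = 1, ny-1
      DO i = 1, nx-1
        qnew(i,j) = qold(i,j) + sigmax*(qold(i+1,j) - 2*qold(i,j) + qold(i-1,j)) &
                              + sigmay*(qold(i,j+1) - 2*qold(i,j) + qold(i,j-1))
      END DO
    END DO
  END SUBROUTINE step_loops

  ! Array-section version: the same update as a single whole-array expression.
  SUBROUTINE step_sections(qold, qnew, nx, ny, sigmax, sigmay)
    INTEGER, INTENT(IN)  :: nx, ny
    REAL,    INTENT(IN)  :: qold(0:nx, 0:ny), sigmax, sigmay
    REAL,    INTENT(OUT) :: qnew(0:nx, 0:ny)
    qnew = qold                      ! boundary values carried over unchanged
    qnew(1:nx-1, 1:ny-1) = qold(1:nx-1, 1:ny-1) &
      + sigmax*(qold(2:nx, 1:ny-1) - 2*qold(1:nx-1, 1:ny-1) + qold(0:nx-2, 1:ny-1)) &
      + sigmay*(qold(1:nx-1, 2:ny) - 2*qold(1:nx-1, 1:ny-1) + qold(1:nx-1, 0:ny-2))
  END SUBROUTINE step_sections
END MODULE diffusion_step
```

Both routines compute the same values, so either can replace the doubly nested loop in the single process code; whether the array-section form actually runs faster depends on the compiler and the hardware.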