Distributed Arrays and Coarrays in Fortran 2008

In modern computational systems, the demand for parallel processing has risen sharply, especially for high-performance computing (HPC) tasks such as simulations, scientific computations, and data-intensive calculations. One of the key advancements in parallel computing is the ability to distribute a computation across multiple processors or nodes. Fortran 2008, the 2008 revision of the Fortran standard (since followed by Fortran 2018 and Fortran 2023), introduced coarrays, a language-level tool for parallelism in which every running copy of the program holds its own instance of a variable and can read or write the instances held by the others. This post explores the concept of coarrays in Fortran, how they work, and their use in creating scalable, distributed applications.

What are Coarrays?

Coarrays are a language feature introduced in Fortran 2008 to facilitate parallel programming. Before coarrays, Fortran programmers relied on MPI (Message Passing Interface) or other external parallel programming models to achieve distributed computing. Coarrays provide an alternative built into the language: the program runs as a number of replicated instances called images, each with its own memory, and a coarray is a variable that every image owns a copy of and that other images can address directly. An image typically maps to a process on a core or node, but the standard leaves that mapping to the implementation.

Coarrays are declared with an extra pair of square brackets, the codimension, which makes them addressable from every image in a parallel execution. A reference without square brackets, such as a(1,1), always means the local copy; a reference with them, such as a(1,1)[2], reads or writes the copy on the named image. A coarray exists on all images, but accesses to it are not synchronized automatically: the programmer orders them with explicit synchronization statements.

Key Features of Coarrays

Distributed Memory Model: Every image owns its own copy of each coarray, and any image can read or write another image's copy, so data can be spread over nodes without hand-written send and receive calls.
Simplicity: Unlike message-passing models like MPI, coarrays express remote access as ordinary array syntax with an extra pair of square brackets, which reduces the complexity of writing parallel code in Fortran.
Synchronization: Fortran provides synchronization statements such as SYNC ALL, SYNC IMAGES, and SYNC MEMORY to control when one image's updates to a coarray may safely be read by another.
Scalability: The same source runs on any number of images, from a single core to many nodes, which makes coarrays suitable for large-scale parallel applications. How well a given program scales still depends on its communication pattern.
Implicit Sharing: The compiler and runtime carry out the data transfer behind a co-indexed reference, so the program does not need explicit inter-process communication calls.

Syntax of Coarrays

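In Fortran 2008, coarrays are declared by adding a codimension in square brackets after the variable name:

real, dimension(2,2) :: a[*], b[*]

Here, a and b are coarrays: ordinary 2x2 arrays, with their shape given in parentheses, plus the codimension [*]. The final codimension is always an asterisk because the number of images is set when the program is launched, not in the source. Each image holds its own complete 2x2 copy of a and b. Writing a(1,1) touches only the local copy, while a(1,1)[k] addresses the copy on image k, so an update becomes visible to another image only when that image accesses it through a co-indexed reference after appropriate synchronization.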
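To make the declaration forms concrete, here is a minimal sketch covering a scalar coarray, an array coarray, and an allocatable coarray, together with one co-indexed read. The program and variable names (coarray_declarations, counter, grid, work) are made up for illustration.

```fortran
program coarray_declarations
  implicit none

  ! Scalar coarray: one integer per image.
  integer :: counter[*]

  ! Array coarray: every image owns its own 2x2 array.
  real :: grid(2,2)[*]

  ! Allocatable coarray: deferred shape (:) and deferred coshape [:].
  real, allocatable :: work(:)[:]

  integer :: me

  me = this_image()

  counter = me             ! no square brackets: the local copy
  grid = real(me)
  allocate(work(4)[*])     ! collective: every image allocates with the same bounds
  work = 0.0

  sync all                 ! every image has finished writing its local copies

  ! Square brackets select the copy on another image (here image 1).
  if (me == num_images()) then
    print *, "image", me, "reads counter[1]   =", counter[1]
    print *, "image", me, "reads grid(1,1)[1] =", grid(1,1)[1]
  end if

  deallocate(work)         ! also collective
end program coarray_declarations
```

Whichever image prints, the values it reads are 1 and 1.0, because they come from image 1's copies rather than its own. As one way to try this, gfortran accepts -fcoarray=single for a single-image build, and multi-image runs are commonly done with -fcoarray=lib plus the OpenCoarrays caf and cafrun wrappers; other compilers use different options.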

Example Code: Coarrays in Action

Let’s take a closer look at an example that demonstrates the use of coarrays in Fortran 2008.

program coarray_example
  implicit none
  real, dimension(2,2) :: a[*], b[*]   ! each image owns a 2x2 copy of a and b

  ! Initialize the local copy on each image
  a(1,1) = 100.0
  b(1,1) = 200.0

  ! Synchronize all images
  sync all

  ! Print the values of the arrays
  print *, "On image ", this_image(), ":"
  print *, "a(1,1) = ", a(1,1)
  print *, "b(1,1) = ", b(1,1)

  ! Perform computations (e.g., update array values)
  a(1,1) = a(1,1) + 50.0
  b(1,1) = b(1,1) + 25.0

  ! Synchronize all images again after computation
  sync all

  ! Final output after synchronization
  print *, "After computation, on image ", this_image(), ":"
  print *, "a(1,1) = ", a(1,1)
  print *, "b(1,1) = ", b(1,1)

end program coarray_example

Breakdown of the Code:

Array Declaration: a and b are declared with the codimension [*], which makes them coarrays. Each image gets its own complete copy of both arrays; the arrays are not split into portions, and another image's copy is reached only through a co-indexed reference such as a(1,1)[2].
Initialization: a(1,1) = 100.0 b(1,1) = 200.0 These assignments have no square brackets, so each image sets the values in its own local copies. Every image executes them, so all copies end up with the same values.
Synchronization: sync all The sync all statement is a barrier: no image continues past it until every image has reached it, and everything an image did before the barrier is ordered before anything any image does after it. It does not copy data between images. In this example no image touches another image's copy, so the barriers are not strictly required for correctness, but they become essential as soon as one image reads or writes data owned by another.
Computation: a(1,1) = a(1,1) + 50.0 b(1,1) = b(1,1) + 25.0 Each image increases its own a(1,1) by 50.0 and its own b(1,1) by 25.0, giving 150.0 and 225.0 on every image. Nothing is transferred between images.
Final Output:
The final print statements display each image's updated local values. The order in which lines from different images appear is not deterministic.
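The example above never moves data between images. As a contrast, the following minimal sketch has image 1 set a value that every other image then pulls into its own copy with a co-indexed read. The program name coarray_broadcast is hypothetical, and this is only one simple way to spread a value.

```fortran
program coarray_broadcast
  implicit none
  real :: a(2,2)[*]
  integer :: me

  me = this_image()
  a = 0.0

  ! Only image 1 sets a value; every other copy stays at zero for now.
  if (me == 1) a(1,1) = 100.0

  ! Image 1 must finish writing before any other image reads its copy.
  sync all

  ! Every other image pulls the value from image 1 into its local copy.
  ! Image 1 is skipped so that nobody writes a(1,1)[1] while others read it.
  if (me /= 1) a(1,1) = a(1,1)[1]

  print *, "image", me, "of", num_images(), "has a(1,1) =", a(1,1)
end program coarray_broadcast
```

Every image ends with a(1,1) equal to 100.0. Removing the sync all would let an image read image 1's copy before it is written, which is exactly the kind of race the barrier is there to prevent.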

Synchronization in Coarrays

Synchronization is a crucial aspect of working with coarrays. Without synchronization, one image may read another image's data before it has been written, or overwrite it while it is still in use, leading to incorrect results. Fortran provides a family of SYNC statements to order the execution of different images.

Types of Synchronization:

SYNC ALL: A barrier across all images. No image proceeds until every image has executed the statement, which is useful when all images must reach the same state before continuing.
SYNC IMAGES: Synchronizes the executing image with the images listed in its argument, for example sync images (1) or sync images (*). Images that are not listed continue their work, and each pair of images involved must execute matching SYNC IMAGES statements.
SYNC MEMORY: Acts as a memory fence on the executing image: it ends the current segment without waiting for any other image. It is a low-level building block for user-defined synchronization.

Example:

! Synchronize all images (processors)
sync all

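After this statement, every image knows that all other images have reached the same point and that their earlier writes to coarrays are complete, so subsequent co-indexed reads see up-to-date values. The statement itself does not copy any data.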
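When only some images depend on each other, a full barrier is more than is needed. The sketch below, with the hypothetical names sync_images_sketch and config, uses SYNC IMAGES so that each image waits only for image 1, while image 1 pairs up with everyone.

```fortran
program sync_images_sketch
  implicit none
  integer :: config[*]
  integer :: me

  me = this_image()
  config = 0

  if (me == 1) then
    config = 42            ! image 1 prepares the data
    sync images (*)        ! pair up with every other image
  else
    sync images (1)        ! wait for image 1 only, not for the other images
    config = config[1]     ! safe: image 1 wrote before its sync
  end if

  ! Self-check: every image should now hold the value prepared by image 1.
  if (config /= 42) error stop "config was not received"
  print *, "image", me, "has config =", config
end program sync_images_sketch
```

With this pattern images 2, 3, and so on never wait for one another, only for image 1, which can matter when they reach the synchronization point at very different times.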

Using Coarrays for Distributed Computation

One of the main advantages of coarrays is their ability to facilitate distributed computing without the complexity of managing explicit inter-process communication. This is particularly useful in high-performance computing tasks, where operations must be performed concurrently across a large number of processors or nodes.

Consider a scenario where a large array of data is being processed across multiple nodes. Each image processes its own portion of the data and then, through co-indexed references, reads results from or writes results to other images. Synchronization statements placed by the programmer ensure that these accesses happen in a consistent order.
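The scenario above can be sketched as a distributed sum. Each image adds up its own block of the index range 1..n, and image 1 then collects the partial results with co-indexed reads. The problem size n = 1000, the block decomposition, and all names are assumptions chosen for illustration.

```fortran
program distributed_sum
  implicit none
  integer, parameter :: n = 1000   ! hypothetical global problem size
  real :: partial[*]               ! one partial sum per image
  real :: total
  integer :: me, np, chunk, lo, hi, i, img

  me = this_image()
  np = num_images()

  ! Block decomposition: image me handles indices lo..hi.
  chunk = (n + np - 1) / np
  lo = (me - 1) * chunk + 1
  hi = min(me * chunk, n)

  ! Local work: touches only this image's copy of partial.
  partial = 0.0
  do i = lo, hi                    ! runs zero times if this image has no indices
    partial = partial + real(i)
  end do

  ! All partial sums must be complete before image 1 reads them.
  sync all

  if (me == 1) then
    total = 0.0
    do img = 1, np
      total = total + partial[img] ! co-indexed read from image img
    end do
    ! Self-check against the closed form n(n+1)/2.
    if (total /= real(n * (n + 1) / 2)) error stop "unexpected total"
    print *, "sum of 1 ..", n, "=", total
  end if
end program distributed_sum
```

The total is 500500 for any number of images, since every intermediate value is an integer small enough to be represented exactly in default real. Fortran 2018 later added collective subroutines such as co_sum that can replace the gather loop on image 1; the explicit loop is used here because it needs only Fortran 2008 features.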