OpenMP: Introduction to OpenMP (Part 9)

Loop Worksharing Constructs in OpenMP

In parallel programming, loop worksharing constructs distribute the iterations of a loop among multiple threads so that they execute in parallel. OpenMP provides worksharing constructs that let developers parallelize loops easily and efficiently. This blog post walks through a sequential baseline, a parallel region with manual loop partitioning, and the worksharing "for" construct, and then discusses how the "schedule" clause controls the assignment of loop iterations to threads.

Sequential Code:

for (i = 0; i < N; i++) {
  a[i] = a[i] + b[i];
}

The above code is a simple loop that performs an element-wise addition over arrays a and b. Executed sequentially, the iterations run one after another on a single thread, even though each iteration is independent of the others and could run in parallel.

OpenMP Parallel Region with Manual Loop Partitioning:

#include <omp.h>

#define N 1000 /* example array size */

double a[N], b[N];

int main() {
  #pragma omp parallel
  {
    /* Declared inside the parallel region so each thread gets private copies. */
    int i, istart, iend;
    int id = omp_get_thread_num();      /* this thread's ID */
    int Nthrds = omp_get_num_threads(); /* size of the thread team */

    /* Compute this thread's contiguous block of iterations. */
    istart = id * N / Nthrds;
    iend = (id + 1) * N / Nthrds;

    /* Make sure the last thread covers any leftover iterations. */
    if (id == Nthrds - 1)
      iend = N;

    for (i = istart; i < iend; i++) {
      a[i] = a[i] + b[i];
    }
  }

  return 0;
}

In the provided code, a parallel region is created using the #pragma omp parallel directive. Each thread retrieves its unique thread ID with omp_get_thread_num() and the total number of threads with omp_get_num_threads(), then computes its own block of iterations. Note that id, istart, and iend are declared inside the parallel region so each thread has a private copy; declaring them as shared variables outside the region would cause a data race. The last thread takes any leftover iterations when N does not divide evenly among the threads. This manual partitioning achieves parallel execution, but the bookkeeping is exactly what worksharing constructs automate.

OpenMP Parallel Region with Worksharing for Construct:

#include <omp.h>

#define N 1000 /* example array size */

double a[N], b[N];

int main() {
  #pragma omp parallel
  {
    /* OpenMP divides the iterations among the threads of the team. */
    #pragma omp for
    for (int i = 0; i < N; i++) {
      a[i] = a[i] + b[i];
    }
  }

  return 0;
}

The above code uses the worksharing construct #pragma omp for to parallelize the loop: OpenMP automatically distributes the iterations among the threads of the team, and the loop index is made private to each thread. There is also an implicit barrier at the end of a worksharing loop, so every iteration has completed before any thread moves past it.
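Because a parallel region that contains nothing but a single loop is so common, OpenMP also provides the combined parallel for construct, which creates the thread team and shares the loop in one directive. A minimal sketch, using the same placeholder N, a, and b as above:

#include <omp.h>

#define N 1000 /* example array size */

double a[N], b[N];

int main() {
  /* Combined construct: fork the team and divide the loop in one step. */
  #pragma omp parallel for
  for (int i = 0; i < N; i++) {
    a[i] = a[i] + b[i];
  }

  return 0;
}

This behaves the same as the two-directive version and is the idiomatic form for simple loops.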

Importance of the Schedule Clause:

The "schedule" clause in OpenMP allows you to control how loop iterations are assigned to threads. It determines the granularity and order in which iterations are executed. Two commonly used schedule options are "static" and "dynamic".

Static Schedule:

#pragma omp for schedule(static)
for (i = 0; i < N; i++) {
  a[i] = a[i] + b[i];
}

With the "schedule(static)" clause, the loop iterations are divided into equal-sized chunks, and each thread is assigned a chunk of iterations to execute. The chunks are assigned statically, meaning they are assigned before the parallel region begins and remain constant throughout the execution.

Dynamic Schedule:

#pragma omp for schedule(dynamic)
for (i = 0; i < N; i++) {
  a[i] = a[i] + b[i];
}

Using the "schedule(dynamic)" clause, the loop iterations are dynamically divided into smaller chunks, and each thread is assigned a chunk as it becomes available. This allows for load balancing and can be beneficial when the workload of each iteration varies.

Selecting the appropriate schedule option is crucial to achieving load balance and maximizing performance. The choice depends on factors such as how evenly the work is spread across iterations, cache locality, and how much scheduling overhead the loop can afford.
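When the best choice is not obvious, OpenMP lets you defer the decision to run time: schedule(runtime) reads the kind and chunk size from the OMP_SCHEDULE environment variable (for example OMP_SCHEDULE="dynamic,16"), so different schedules can be compared without recompiling:

/* The schedule is taken from the OMP_SCHEDULE environment variable. */
#pragma omp for schedule(runtime)
for (i = 0; i < N; i++) {
  a[i] = a[i] + b[i];
}

In practice, measuring a few schedules on the actual workload is the quickest way to find the right one.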

