OpenMP: Introduction to OpenMP (Part 9)
Loop Worksharing Constructs in OpenMP
In parallel programming, loop worksharing constructs distribute loop iterations among multiple threads so that the iterations execute in parallel. OpenMP provides worksharing constructs that let developers parallelize loops easily and efficiently. This blog post walks through the sequential code, a basic parallel region with manual loop partitioning, and the worksharing for construct, and then discusses the importance of the "schedule" clause in controlling how loop iterations are assigned to threads.
Sequential Code:
for (i = 0; i < N; i++) {
a[i] = a[i] + b[i];
}
The above code is a simple loop that iterates over the elements of arrays a and b and performs an element-wise addition. The iterations execute one at a time, in order, on a single thread.
OpenMP Parallel Region with Manual Loop Partitioning:
#include <omp.h>

#define N 1000
double a[N], b[N];

int main() {
    #pragma omp parallel
    {
        /* Variables declared inside the parallel region are private to each thread. */
        int id = omp_get_thread_num();
        int Nthrds = omp_get_num_threads();
        int istart = id * N / Nthrds;
        int iend = (id + 1) * N / Nthrds;
        if (id == Nthrds - 1)
            iend = N; /* last thread picks up any leftover iterations */
        for (int i = istart; i < iend; i++) {
            a[i] = a[i] + b[i];
        }
    }
    return 0;
}
In the code above, a parallel region is created using the #pragma omp parallel directive. Each thread retrieves its unique thread ID with omp_get_thread_num() and the total number of threads with omp_get_num_threads(), then computes its own istart and iend so that it processes a distinct, contiguous block of iterations (the last thread also picks up any remainder when N is not divisible by the thread count). Note that id, Nthrds, istart, and iend are declared inside the parallel region so each thread gets its own private copy; sharing them across threads would cause a data race. This manual partitioning achieves parallel execution, but the bookkeeping is exactly what worksharing constructs automate.
OpenMP Parallel Region with the Worksharing for Construct:
#include <omp.h>

#define N 1000
double a[N], b[N];

int main() {
    #pragma omp parallel
    {
        /* The for construct splits the loop iterations among the threads of the team. */
        #pragma omp for
        for (int i = 0; i < N; i++) {
            a[i] = a[i] + b[i];
        }
    }
    return 0;
}
The above code uses the worksharing construct #pragma omp for to parallelize the loop. OpenMP automatically distributes the loop iterations among the threads of the team: each thread is assigned a distinct subset of iterations, the loop variable is private to each thread, and an implicit barrier at the end of the construct ensures all iterations are complete before any thread continues. The result is the same parallel execution as the manual version, with far less code.
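Because a parallel region whose only content is a single worksharing loop is so common, OpenMP also offers the combined parallel for construct, which fuses the two directives into one. A minimal sketch (N, a, and b declared as in the examples above, with N = 1000 as an arbitrary illustrative size):
#include <omp.h>

#define N 1000
double a[N], b[N];

int main() {
    /* Combined construct: creates the thread team and
       distributes the loop iterations in a single directive. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
    }
    return 0;
}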
Importance of the Schedule Clause:
The "schedule" clause in OpenMP allows you to control how loop iterations are assigned to threads. It determines the granularity and order in which iterations are executed. Two commonly used schedule options are "static" and "dynamic".
Static Schedule:
#pragma omp for schedule(static)
for (i = 0; i < N; i++) {
a[i] = a[i] + b[i];
}
With the "schedule(static)" clause, the loop iterations are divided into equal-sized chunks, and each thread is assigned a chunk of iterations to execute. The chunks are assigned statically, meaning they are assigned before the parallel region begins and remain constant throughout the execution.
Dynamic Schedule:
#pragma omp for schedule(dynamic)
for (i = 0; i < N; i++) {
a[i] = a[i] + b[i];
}
Using the "schedule(dynamic)" clause, the loop iterations are dynamically divided into smaller chunks, and each thread is assigned a chunk as it becomes available. This allows for load balancing and can be beneficial when the workload of each iteration varies.
Selecting the appropriate schedule option is crucial for achieving load balance and maximizing performance. The choice depends on factors such as the cost of each iteration and whether that cost is uniform or varies across the loop. When in doubt, you can even defer the choice to run time, as shown below.
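With schedule(runtime), the schedule kind and chunk size are read from the OMP_SCHEDULE environment variable (or set via omp_set_schedule()), so you can experiment with different schedules without recompiling. A minimal sketch:
#pragma omp for schedule(runtime)
for (int i = 0; i < N; i++) {
    a[i] = a[i] + b[i];
}
For example, running the program with OMP_SCHEDULE="dynamic,16" applies a dynamic schedule with chunks of 16 iterations.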