OpenMP: Introduction to OpenMP (Part 5)

Fork-Join Parallelism and Nested Threads

In parallel programming, fork-join parallelism refers to dividing a program's execution into multiple threads that run concurrently and then join back together so the program continues as a single unit. OpenMP, a popular shared-memory parallel programming API, follows the fork-join model: a master thread forks a team of threads at a parallel region, and the team joins back into the master thread at the end of the region. OpenMP also supports nested parallelism, in which a thread inside a parallel region can fork a team of its own; a sketch of nesting appears after the basic example below.

Example:

#include <omp.h>

double A[1000];

omp_set_num_threads(4);            /* request a team of 4 threads */
#pragma omp parallel
{
  int ID = omp_get_thread_num();   /* each thread obtains its own ID */
  pooh(ID, A);                     /* all threads execute pooh() concurrently */
}

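Since the section title also covers nested threads, here is a minimal sketch of nesting, assuming an OpenMP 3.0+ runtime (the thread counts and the num_threads(2) clauses are illustrative choices, not part of the original example):

#include <stdio.h>
#include <omp.h>

int main() {
  omp_set_max_active_levels(2);   /* allow two nested levels of parallelism */

  #pragma omp parallel num_threads(2)
  {
    int outer = omp_get_thread_num();
    #pragma omp parallel num_threads(2)
    {
      /* Each outer thread forks its own inner team of two threads. */
      printf("outer %d, inner %d\n", outer, omp_get_thread_num());
    }
  }
  return 0;
}

With nesting enabled, this prints four lines in some interleaved order, one per inner thread.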
An Approximate Model of Data Allocation:

In the example above, the array A is declared outside the parallel region, so a single copy is visible to all threads; as a rough model, think of such data as living in shared storage (the heap or static memory). Data declared inside the parallel region, such as the variable ID, lives on each thread's private stack and is therefore local to that thread.
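A minimal sketch of this distinction (shared_val and private_val are illustrative names, not from the original example):

#include <stdio.h>
#include <omp.h>

int main() {
  double shared_val = 0.0;          /* declared outside: one copy, shared */
  #pragma omp parallel
  {
    double private_val = 0.0;       /* declared inside: one copy per thread */
    private_val += omp_get_thread_num();
    printf("thread %d: private_val = %f\n",
           omp_get_thread_num(), private_val);
  }
  /* shared_val is still in scope here; each private_val is gone. */
  return 0;
}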

Master Thread and the Thunk:

In OpenMP, the master thread (thread 0) is the thread that encounters the parallel region. It performs the initial setup, executes the parallel region alongside the other threads of the team, and continues with the code after the region once all team members have finished.

The thunk, despite the name, is not a thread. In typical OpenMP implementations it is the compiler-generated function into which the body of the parallel region is packed. The runtime hands this thunk to every thread in the team, including the master, which is how the same block of code comes to be executed by all of them.
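As a rough, hypothetical sketch of how a compiler might lower the first example onto raw threads (real runtimes reuse thread pools and differ in many details; pooh is the work function assumed earlier):

#include <pthread.h>

#define N_THREADS 4

double A[1000];
void pooh(int ID, double A[]);   /* the per-thread work function from above */

/* Compiler-generated "thunk": the body of the parallel region packed
   into a function that any thread can call. */
static void *thunk(void *arg) {
  int ID = (int)(long)arg;       /* stands in for omp_get_thread_num() */
  pooh(ID, A);
  return NULL;
}

static void parallel_region(void) {
  pthread_t tid[N_THREADS];
  long i;

  /* Fork: the master creates the worker threads... */
  for (i = 1; i < N_THREADS; i++)
    pthread_create(&tid[i], NULL, thunk, (void *)i);

  thunk((void *)0);              /* ...and runs the thunk itself as thread 0. */

  /* Join: the master waits for every worker before moving on. */
  for (i = 1; i < N_THREADS; i++)
    pthread_join(tid[i], NULL);
}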

A Serial Pi Program:

#include <stdio.h>

static long num_steps = 100000;
double step;

int main() {
  int i;
  double x, pi, sum = 0.0;

  step = 1.0 / (double)num_steps;

  /* Midpoint rule: sample 4/(1+x^2) at the center of each sub-interval. */
  for (i = 0; i < num_steps; i++) {
    x = (i + 0.5) * step;
    sum = sum + 4.0 / (1.0 + x * x);
  }

  pi = step * sum;
  printf("pi = %f\n", pi);
  return 0;
}

The code above approximates pi by numerical integration using the midpoint rule (not the Monte Carlo method): it evaluates 4/(1 + x*x) at the midpoint of each of num_steps sub-intervals of [0, 1], sums the samples, and multiplies by the step size. This works because the integral of 4/(1 + x^2) from 0 to 1 equals pi. The code runs entirely sequentially and does not exploit any parallelism.
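For reference, the identity behind the program, with the Riemann sum that the loop accumulates (N stands for num_steps):

\[
\pi \;=\; \int_0^1 \frac{4}{1+x^2}\,dx
\;\approx\; \sum_{i=0}^{N-1} \frac{4}{1+x_i^2}\,\Delta x,
\qquad x_i = (i + 0.5)\,\Delta x,\quad \Delta x = \frac{1}{N}.
\]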

Enhanced Example with Parallelism:

#include <stdio.h>
#include <omp.h>

static long num_steps = 100000;
double step;
#define NUM_THREADS 2

int main() {
  int i, nthreads;
  double pi, sum[NUM_THREADS];

  step = 1.0 / (double)num_steps;
  omp_set_num_threads(NUM_THREADS);   /* a request, not a guarantee */

  #pragma omp parallel
  {
    int i, id, nthrds;
    double x;

    id = omp_get_thread_num();
    nthrds = omp_get_num_threads();   /* how many threads we actually got */

    if (id == 0)                      /* one thread records the team size */
      nthreads = nthrds;

    /* Round-robin (cyclic) distribution: thread id takes iterations
       id, id + nthrds, id + 2*nthrds, ... */
    for (i = id, sum[id] = 0.0; i < num_steps; i = i + nthrds) {
      x = (i + 0.5) * step;
      sum[id] += 4.0 / (1.0 + x * x);
    }
  }

  /* Combine the partial sums after the join. */
  for (i = 0, pi = 0.0; i < nthreads; i++)
    pi += sum[i] * step;

  printf("pi = %f\n", pi);
  return 0;
}

The enhanced version of the code incorporates parallelism using OpenMP. The loop iterations are dealt out to the threads in round-robin fashion, so each thread independently accumulates its share of the sum in its own slot of the sum array. Because omp_set_num_threads() is only a request, thread 0 records the team size actually granted into nthreads; after the join, the master thread uses that count to combine the partial sums into the final estimate of pi.
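As an aside, the same computation can be written more concisely with OpenMP's reduction clause, which has the runtime manage the per-thread partial sums; this sketch is an alternative formulation, not the original example:

#include <stdio.h>
#include <omp.h>

static long num_steps = 100000;

int main() {
  double step = 1.0 / (double)num_steps;
  double sum = 0.0;

  /* reduction(+:sum) gives each thread a private copy of sum and
     adds the copies together at the end of the loop. */
  #pragma omp parallel for reduction(+:sum)
  for (long i = 0; i < num_steps; i++) {
    double x = (i + 0.5) * step;
    sum += 4.0 / (1.0 + x * x);
  }

  printf("pi = %f\n", step * sum);
  return 0;
}

A known caveat of the hand-rolled sum array is false sharing: adjacent elements of sum share a cache line, so updates by different threads force that line to bounce between cores. The reduction form sidesteps this by giving each thread a genuinely private accumulator.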

