Open in Colab:

Parallel Processing

The parallelization approach adopted in CASA is the so-called embarrassingly parallelization. Embarassingly parallel workload or problem is one where little or no effort is needed to separate the problem into a number of parallel tasks. This is often the case where there is little or no dependency or need for communication between those parallel tasks, or for results between them.

In order to run one analysis on multiple processors, one can parallelize the work by dividing the data into several parts (“partitioning”) and then:

    1. run a CASA instance on each part, or

    1. have non-trivially parallelized algorithms, which make use of several processors within a single CASA instance. Non-trivial parallelization is presently only implemented in a few areas of the CASA codebase, and is based on OpenMP, which is a shared-memory parallel programming library. For example certain sections of the imaging code of CASA are parallelized using OpenMP.

All other parallelization is achieved by partitioning the MeasurementSet (MS) of interest using the task partition or at import time using importasdm. The resulting partitioned MS is called a “Multi-MS” or “MMS”. The parallel processing of a Multi-MS is possible using the Message Passing Interface (MPI). MPI is a standard which addresses primarily the message-passing parallel programming model in a practical, portable, efficient and flexible way.

WARNING: Parallel processing on multi-MSs in CASA is unverified - please use at own discretion.

Logically, an MMS has the same structure as an MS but internally it is a group of several MSs which are virtually concatenated. Virtual concatenation of several MSs or MMSs into an MMS can also be achieved via task virtualconcat.

Due to the virtual concatenation, the main table of an MMS appears like the union of the main tables of all the member MSs such that when the MMS is accessed like a normal MS, processing can proceed sequentially as usual. Each member MS or “Sub-MS” of an MMS, however, is at the same time a valid MS on its own and can be processed as such. This is what happens when the MMS is accessed by a parallelized task. The partitioning of the MMS is recognized and work is started in parallel on the separate Sub-MSs, provided that the user has started CASA with mpicasa.

The internal structure of an MMS can be inspected using task listpartition.

Configuration and Control

CASA can be run in parallel on a cluster of computer nodes or on a single multi-core computer. In the multi-node case, the following requirements are necessary for all nodes to be included in the cluster. Users with access to a cluster will not need to do these settings, but it is still useful to be aware of the configuration:

  • Password-less ssh access from the client (user) machine into all the nodes to be included in the cluster.

NOTE: This is not necessary when using only localhost, i.e. if the cluster is deployed only on the machine where CASA is running.
  • All the input files must be located in a shared file-system, accessible from all the nodes comprising the cluster, and mounted in the same path of the file-system.

  • Mirrored CASA installation with regards to the CASA installation in the client (user) machine, so that the following environmental variables are pointing to valid installations: PATH, LD_LIBRARY_PATH, IPYTHONDIR, CASAPATH, CASAARCH, PYTHONHOME, __CASAPY_PYTHONDIR, PGPLOT_DEV, PGPLOT_DIR, PGPLOT_FONT. This is usually achieved by having the CASA installation on a shared file-system.

Configuration and Start-Up

The main library used in CASA (4.4+) to achieve parallelization is the Message Passing Interface (MPI) and in particular the OpenMPI implementation. MPI is already included in the CASA distribution so that users do not need to install it. The CASA distribution comes with a wrapper of the MPI executor, which is called mpicasa. This wrapper does several settings behind the scenes in order to properly configure the environment to run CASA in parallel.

The collection of CASA processes which will run the jobs from parallelized tasks, is set up via mpicasa. The simplest example is to run CASA in parallel on the localhost using the available cores in the machine. A typical example would be to run CASA on a desktop with 16 cores such as the following example:

path_to_casa/mpicasa -n 16 path_to_casa/casa <casa_options>


  1. mpicasa: Wrapper around mpirun, which can be found in the casa installation directory. Example: /home/user/casa-release-4.5.0-el6/bin

  2. -n : MPI option to get the number of processes to run.

  3. 16: The number of cores to be used in the localhost machine.

  4. casa: Full path to the CASA executable, casa.

  5. casa_options: CASA options such as: -c, –nogui, –log2term, etc.

NOTE: MPI uses one process as the MPI Client, which is where the user will see messages printed in the terminal or in the logger. The other processes are used for the parallel work and are called MPI Servers. Because of this, usually we give number_of_processes + 1.

NOTE: when several versions of CASA are available in the PATH, there is the risk that the executable mpicasa and other executables used by CASA, such as casaplotms or asdm2MS, would be picked from one of those different versions instead of the “path_to_casa/casa” version that we want to run. This is typically the case in data reduction clusters where either the default environment or user setup scripts set the PATH to point to the latest release of CASA, for example. In such cases, it is safer to make sure in advance that the first version found in the PATH is the right one, with a command like this (bash), as explained in the CASA distribution README:

export PATH=path_to_casa/bin:$PATH

It is also possible to use other nodes, which can form a “cluster”. Following the requirements given above, replace the “-n” option of mpicasa with a “-hostfile host_file”, as shown below:

mpicasa -hostfile <host_file> path_to_casa/casa <casa_options>


  1. host_file: It is a text file containing the name of the nodes forming the cluster and the number of cores to use in each one of the nodes.


orion slots=5
antares slots=4
sirius slots=4

The above configuration file will set up a cluster comprised of three nodes (orion, antares and sirius), deploying the cores per node as follows: At host “orion” up to 5 cores will be deployed (including the MPI Client). If the processing requires more cores, it will take them from “antares” and once all the 4 engines in “antares” are used, it will use up to 4 cores in “sirius”.

To run CASA in interactive mode (without the “-c” option) the user needs to first login to the desired computer node with X11 forwarding. This is achieved with the command ssh -XY <node>, where <node> is the hostname of the computer where he/she wants to run CASA. * *

mpicasa -n <number_of_processes> path_to_casa/casa

This will open an xterm window for the interactive work. To get help do:

mpicasa --help

Parallel Imaging

The parallelization of imaging is achieved through task tclean. The parallelization itself is tied closely to the major-minor cycles of the imager and follows a different approach of that used by other tasks. The parallelization inside tclean does not need the MS to be partitionted into a Multi-MS. It will work in the same way if the input is an MS or MMS. But in order to run tclean in parallel it is necessary to launch CASA with mpicasa, in the same way as for other tasks. One extra step necessary to run tclean in parallel is to set the parameter parallel=True. Details of the parallelization are described in the section mentioned above, as well as in the synthesis-imaging chapter on “Imager Parallelization”.

Parallel imaging on an MS file (rather than MMS file) in tclean is an official mode of operations in the ALMA pipeline since Cycle-6, and officially endorsed by CASA as per CASA 5.4. We recommend users interested in parallel processing to use this mode of operation. For large data products, the imaging step often dominates the overall runtime, as well as the advantages that can be achieved with parallelization (see CASA Memo 5). Processing Multi-MS files, either for imaging or calibration, remains at the discretion of the user.

The Multi-MS

Parallel processing using Multi-MS (MMS) in CASA is unverified. Please use at own discretion.

Please consider parallel imaging using normal MS as alternative.

Multi-MS Structure

A Multi-MS (MMS) is structured to have a reference MS on the top directory and a sub-directory called SUBMSS, which contains each partitioned Sub-MS. A Multi-MS can be handled like a normal “monolithic” MS. It can be moved and renamed like any other directory. CASA tasks that are not MMS-aware can process it like a monolithic MS.

All sub-tables of Sub-MSs are identical, except for the SOURCE and HISTORY sub-tables. The reference MS contains links to the sub-tables of the first Sub-MS, which is identified by a “0000” index on its name. All subsequent Sub-MSs also contain links to the sub-tables of the first Sub-MS, except for the SOURCE and HISTORY sub-tables. The following is a typical view of the reference MS directory of a Multi-MS. Symbolic links have an **@** at the end.

> ls

The following is a view of the Sub-MSs directory. The Sub-MS names have the MMS name followed by a 4-digit index.

> ls

Next example shows the second Sub-MS, which has symbolic links to all sub-tables except the SOURCE and HISTORY tables. These two tables need write-access in several cases when running in parallel.

> ls --l
ANTENNA -> ../
FEED -> ../
FIELD -> ../
FLAG_CMD -> ../
STATE -> ../
SYSCAL -> ../
WEATHER -> ../

Multi-MS Creation


The partition task is the main task to create a “Multi-MS”. It takes an input MeasurementSet and creates an output “Multi-MS” based on the data selection parameters.

The inputs to partition are:

CASA <1>: inp partition
--------> inp(partition)
#partition :: Task to produce Multi-MSs using parallelism
vis                 =         ''        #Name of input MeasurementSet
outputvis           =         ''        #Name of output MeasurementSet
createmms           =       True        #Should this create a multi-MS output
     separationaxis =     'auto'        #Axis to do parallelization across(scan, spw, baseline, auto)
     numsubms       =     'auto'        #The number of SubMSs to create (auto or any number)
     flagbackup     =       True        #Create a backup of the FLAG column in the MMS.

datacolumn          =      'all'        #Which data column(s) to process.
field               =         ''        #Select field using ID(s) or name(s).
spw                 =         ''        #Select spectral window/channels.
scan                =         ''        #Select data by scan numbers.
antenna             =         ''        #Select data based on antenna/baseline.
correlation         =         ''        #Correlation: '' ==> all, correlation='XX,YY'.
timerange           =         ''        #Select data by time range.
intent              =         ''        #Select data by scan intent.
array               =         ''        #Select (sub)array(s) by array ID number.
uvrange             =         ''        #Select data by baseline length.
observation         =         ''        #Select by observation ID(s).
feed                =         ''        #Multi-feed numbers: Not yet implemented.

The keyword createmms is by default set to True to create an output MMS. It contains three sub-parameters, separationaxis, numsubms and flagbackup. Partition accepts four axes to do separation across: ‘auto’, ‘scan’ ‘spw’ or ‘baseline’. The default separationaxis=’auto’ will first separate the MS in spws, then in scans. It tries to balance the spw and scan content in each Sub-MS also taking into account the available fields.

The baseline axis is mostly useful for Single-Dish data. This axis will partition the MS based on the available baselines. If the user wants only auto-correlations, she/he should use the antenna selection syntax such as antenna=’*&&&’ together with the baseline separation axis. Note that if numsubms=’auto’, the task will try to create as many Sub-MSs as the number of available parallel cores used when starting CASA with mpicasa. If the user wants to have one Sub-MS for each baseline, he/she should set the numsubms parameter to a number higher than the number of baselines to achieve this.

The user may force the number of “Sub-MSs” in the output MMS by setting the sub-parameter numsubms. The default ‘auto’ is to create as many Sub-MSs as the number of engines used when starting CASA with mpicasa, in an optimized way.

The flagbackup sub-parameter will create a backup of the FLAG column and save it to the .flagversions file.


Task partition has been embedded in task importasdm so that at import time the user can already create a MMS. Set the parameter createmms to True and the output of importasdm will be a MMS created with default parameters. Sub-parameters separationaxis and numsubms are also available in importasdm. From this point on in the data reduction chain, tasks that have been parallelized will run automatically in parallel when they see an MMS and tasks that are not parallelized will work in the same way as they normally do on a MS.

Parallel Calibration

Parallel processing using Multi-MS (MMS) in CASA is unverified - please use at own discretion.

Please consider parallel imagingusing normal MS as alternative.

Some of the calibration tasks are internally parallelized and will run in parallel if the input MS is a Multi-MS. Other tasks are not and will work normally in the presence of an input MMS. A typical calibration cascade will work normally in parallel when it sees an input MMS. In order to do that, the first step is to set createmms=True inside importasdm to create a Multi-MS. Once that is done, the calibration steps will distribute the processing in parallel if CASA is started with mpicasa, or in serial otherwise.

Contrary to the MS, the calibration tables created by calibration tasks are not partitioned. For instance, when gaincal is run on a Multi-MS, it will create the same output gaincal table as if the input was a normal MS.

The following calibration tasks are internally parallelised and will work on each Sub-MS in parallel.

  • flagdata

  • setjy

  • applycal

  • hanningsmooth

  • cvel2

  • uvcontsub

  • mstransform

  • split

Special considerations when running some tasks in parallel


When the input is a Multi-MS and CASA is started in parallel using mpicasa, uvcontsub will try to process each Sub-MS in parallel. Depending on the parameters of uvcontsub and the separation axis of the partitioned Multi-MS, processing the input in parallel is not possible. This will happen for example when the input MMS is separated using the default axis ‘auto’. The ‘auto’ axis will partition the MMS by the scan and spw axes, in a way to balance the content on each Sub-MS.

If uvcontsub is called with combine=’spw’, the task will expect to find all selected spws in each Sub-MS, as each parallel engine will process a Sub-MS independently of the others. In such cases, task uvcontsub will issue some warnings that the process cannot be continued in parallel. The task will internally handle such cases and will continue to process the input in serial, as if the Multi-MS was a normal monolithic MS.

The following steps can be informed in order to find out what is the partition axis of the MMS and what is the content of each Sub-MS. First, use task listpartition to obtain information on the MMS.

CASA <2>: listpartition('combspw.mms')
INFO listpartition::::@almahpc05:MPIClient
INFO listpartition::::@almahpc05:MPIClient+ ##########################################
INFO listpartition::::@almahpc05:MPIClient+ ##### Begin Task: listpartition #####
INFO listpartition::::@almahpc05:MPIClient listpartition(vis="",createdict=False,listfile="")
INFO listpartition::::@almahpc05:MPIClient This is a Multi-MS with separation axis = scan,spw
INFO listpartition::::@almahpc05:MPIClient Sub-MS              Scan Spw                 Nchan                     Nrows Size
INFO  1    [ 1 5 6 9 12 16]    [128 128 128 128 128 128] 252   4.9M
INFO listpartition::::@almahpc05:MPIClient                     2    [ 0 3 13 17 18 21]  [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient  1    [ 0 4 8 13 17 21]   [128 128 128 128 128 128] 252   4.5M
INFO listpartition::::@almahpc05:MPIClient                     2    [ 2 6 7 10 14 22]   [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient  1    [ 3 7 10 14 20 22]  [128 128 128 128 128 128] 252   4.5M
INFO listpartition::::@almahpc05:MPIClient                     2    [ 5 11 12 15 19 23] [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient  1    [ 2 11 15 18 19 23] [128 128 128 128 128 128] 252   4.5M
INFO listpartition::::@almahpc05:MPIClient                     2    [ 1 4 8 9 16 20]    [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient ##### End Task: listpartition #####
INFO listpartition::::@almahpc05:MPIClient+ ##########################################

In the above example, the MMS was partitioned using the default axis ‘auto’ (scan,spw). One can see the Sub-MSs do not contain all spws, therefore depending on the selection used in the task, it will not be possible to proceed in parallel. See the following example for the warnings given by the task in this case.

CASA <8>: uvcontsub(vis="combspw.mms",fitspw="1~10:5~122,15~22:5~122",excludechans=False,combine="spw",fitorder=0,spw="6~14",want_cont=False)
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient+ ##########################################
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient+ ##### Begin Task: uvcontsub #####
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient uvcontsub(vis="combspw.mms",field="",fitspw="1~10:5~122,15~22:5~122",excludechans=False,combine="spw",
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient+ solint="int",fitorder=0,spw="6~14",want_cont=False)
2018-02-06 15:45:11 WARN uvcontsub::::@almahpc05:MPIClient Cannot run with combine='spw' in parallel because the Sub-MSs do not contain all the selected spws
2018-02-06 15:45:11 WARN uvcontsub::::@almahpc05:MPIClient The Multi-MS will be processed in serial and will create an output MS
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient split is being run internally, and the selected spws
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient will be renumbered to start from 0 in the output!
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient Preparing to add scratch columns.
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient splitting to /data/users/scastro/work/CAS-10697/combspw.mms.contsubId4wzP with spw="1~5,6~14,15~22"
2018-02-06 15:45:11 INFO SubMS::parseColumnNames() Using DATA column.

A few options are possible at this stage. User can let the process continue in serial, which depending on the size of the MS, can take long, and at the end the continuum subtracted output will be a normal MS. Depending on what the user wants to do next, there is the possibility to recreate the MMS using task partition. If user only wants to run tclean and create an image, having either MS or MMS will work in the same way because tclean can run in parallel regardless whether the input is MS or MMS.

If the users opts to recreate the MMS before running uvcontsub, best recommend axis to do combine=’spw’ is per scan. Partition will have to be called in the following way:

partition(vis='', outputvis='', createmms=True separationaxis='scan')

flagdata (with mode=’rflag’)

The Rflag action=’calculate’ can be used to produce the frequency and time thresholds in a first pass which can then be applied in a second pass, using action=’apply’ once or several times. When this is done with the Multi-MS structure the thresholds calculated in the first pass might differ from the thresholds that would be calculated using a single MS structure. This is due to the fact that in the Multi-MS structure the data are partitioned into Sub-MSs. The default is to produce a balanced partition with respect to the SPWs and scans, with the aim to get content from all SPWs and scans into each of the Sub-MSs. For this reason, the statistics calculated by RFlag may differ across Sub-MSs, as they would differ for different data selections. At the moment this issue has not been assessed thoroughly for real-world datasets. A related question that is not understood in detail at the moment, and that can affect both serial and parallel runs of RFlag, is how much the thresholds can differ between the single pass and dual pass modes of RFlag.

Examples parallelization

Parallel processing using Multi-MS (MMS) in CASA is unverified - please use at own discretion.

Please consider parallel imagingusing normal MS as alternative.

Examples of running CASA in parallel

The following is a list of typical examples on how to run CASA in parallel. Once CASA is started with mpicasa and the “Multi-MS” is created, there is basically no difference between running CASA in serial and in parallel. You can find an example of a parallelized analysis in the script located in a sub-directory of your CASA distribution. For example, if CASA is untarred in /home/user/casa-release-5.0.0-el6, the alma-m100 script can be found in /home/user/casa-release-5.0.0-el6/lib/python2.7/regressions/

Example 1. Run the above regression script in parallel, using 8 cores in parallel and 1 core as the MPI Client.

mpicasa -n 9 <path_to_casa>/casa --nogui --log2term -c

Example 2. Start CASA as described before for an interactive session, using 5 cores on the local machine.

mpicasa -n 5 <path_to_casa>/casa <casa-options>

An xterm will be open showing in the tile bar rank0. Rank 0 is where the MPIClient runs. The other 4 cores have been opened and are idle waiting for any activity to be sent to them.

Run importasdm to create a “Multi-MS” and save the online flags to a file. The output will be automatically named, which is an MMS partitioned across spw and scan. The online flags are saved in the file uid__A002_X888a_cmd.txt.

CASA <2>: importasdm('uid__A002_X888a', createmms=True, savecmds=True)

List the contents of the MMS using listobs. In order to see how the MMS is partitioned, use listpartition.

CASA <3>: listobs('', listfile='uid__A002_X888a.listobs')
CASA <4>: listpartition('')

Apply the online flags produced by importasdm, using flagdata in list mode. flagdata is parallelized therefore each engine will work on a separated “Sub-MS” to apply the flags from the uid__A002_X888a_cmd.txt file. You will see messages in the terminal (also saved in the casa-###.log file), containing the strings MPIServer-1, MPIServer-2, etc., for all the cores that process in parallel.

CASA <5>: flagdata('', mode='list' inpfile='uid__A002_X888a_cmd.txt')

Flag auto-correlations and the high Tsys antenna also using list mode for optimization.

CASA <6>: flagdata('', mode='list',

Create all calibration tables in the same way as for a normal MS. Task gaincal is not parallelized, therefore it will work on the MMS as if it was a normal MS.

CASA <7>: gaincal('', caltable='cal-delay_uid__A002_X888a.K',
                  field='*Phase*',spw='1,3,5,7', solint='inf',combine='scan',
                  refant=therefant, gaintable='cal-antpos_uid__A002_X888a',

Apply all the calibrations to the MMS. applycal will work in parallel on each “Sub-MS” using the available cores.

CASA <8>: applycal(vis='', field='0', spw='9,11,13,15',
                   gainfield=['0', '', ''], interp='linear,linear',
                   spwmap=[tsysmap,[],[]], calwt=True, flagbackup=False)

Split out science spectral windows. Task split is also parallelized, therefore it will recognize that the input is an MMS and will process it in parallel, creating also an output MMS.

CASA <9>: split(vis='', outputvis='',
                 datacolumn='corrected', spw='9,11,13,15', keepflags=True)

Run tclean normally to create your images.

Advanced: Interface Framework

The mpi4casa parallelization framework and advanced CASA parallel processing

The CASA parallelization framework, mpi4casa was developed as a layer on top of MPI using a client-server model. The Client is the master process, driving user interaction, and dispatching user commands to the servers. Servers are all the other processes, running in the background, waiting for commands sent from the client side.

One use-case of mpi4casa is to run CASA in parallel on a Multi-MS, as explained in previous chapters. There are other ways to process the data in parallel using mpi4casa without the need to create a Multi-MS. For instance, advanced users can benefit from the mpi4casa implementation to run multiple task commands in different cores or nodes.


Start CASA in parallel as explained in previous chapters, using mpicasa.

Import MPICommandClient from mpi4casa module

from mpi4casa.MPICommandClient import MPICommandClient

Create an instance of MPICommandClient

client = MPICommandClient()

Set logging policy


Initialize command handling services


Syntax to send a command request

ret = client.push_command_request(command,block,target_server,parameters)

command: String containing the Python/CASA command to be executed. The command parameters can be included within the command in itself also as strings.

block: Boolean to control whether command request is executed in blocking mode (True) or in non-blocking mode (False). Default is False (non-blocking).

target_server: List of integers corresponding to the server IDs to handle the command

target_server=None: The command will be executed by the first available server

target_server=2: The command will be executed by the server n #2 as soon as it is available

target_server=[0,1]: The command will be executed by the servers n #2 and #3

parameters (Optional): Alternatively the command parameters can be specified in a separated dictionary using their native types instead of strings.

ret (Return Variable):

In non-blocking mode: It will not block and will return an Integer (command ID) to retrieve the command response at a later stage.

In blocking mode: It will block until the list of dictionaries, containing the command response is received.

Syntax to receive a command result

ret = client.get_command_response(command_request_id_list,block)
  • command_request_id_list: List of Ids (integers) corresponding to the commands whose result is to be retrieved.

  • block: Boolean to control whether command request is executed in blocking mode (True) or in non-blocking mode (False).

  • ret (Return Variable): List of dictionaries, containing the response parameters. The dictionary elements are as follows:

  • successful (Boolean): indicates whether command execution was successful or failed

  • traceback (String): In case of failure contains the traceback of the exception thrown

  • ret: Contains the result of the command in case of successful execution

Example 1:

Run wvrgcal in 2 different MeasurementSets (for instance each one corresponding to an Execution Block):

#Example of full command including parameters
cmd1 = "wvrgcal(vis='',caltable='cal-wvr_X54',spw=[1,3,5,7])"
cmdId1 = client.push_command_request(cmd1,block=False)

#Example of command with separated parameter list
cmd2 = "wvrgcal()"
cmdId2 = client.push_command_request(cmd2,block=False,parameters=params2)

#Retrieve results
resultList = client.get_command_response([cmdId1, cmdId2],block=True)

Note: target_server is not specified because these are monolithic state-less commands, therefore any server can process them.

Example 2:Use the CASA ms tool to get the data from 2 EBs and apply a custom median filter:

#Open MSs

#Apply median filter
client.push_command_request("from scipy import signal",target_server=[1,2])

#Put filter data back in the MSs

#Close MSs

NOTE: target_server is specified as each command depends on the state generated by previous ones; block will block only on the last commands as all the others will be executed using a FIFO queue, meaning the commands will be received in the same order they were sent.

Link to first version of the CASA framework development document

Advanced: Multiprocessing and Multithreading

This section explains technical details aimed at advanced users and system administrators who are interested in knowing more about different forms of parallelization in CASA, customizing processes and threads in CASA sessions, and/or optimizing performance. Most users would not normally need to be aware of or modify these settings.

The parallelization approach described in this chapter, embarrassingly parallel, is based on multiprocessing. Certain areas of CASA also use a different form of parallelization, multithreading (via OpenMP). By default, casampi disables multithreading. This is to prevent competition for CPU cores between OpenMP threads and MPI processes, given that OpenMP would spawn as many threads as cores are available. In other words, if CASA is started with casampi -n 8 ..., 8 processes are spawned but no additional threads are spawned.

The mechanism used to disable multithreading is the OMP_NUM_THREADS environment variable. When CASA is run under mpicasa, the mpicasa wrapper sets the environment variable OMP_NUM_THREADS to 1 before starting CASA, unless it has previously been set by the user in which case it is not modified. This effectively disables OpenMP multithreading unless the user has set explicitly a number of threads in OMP_NUM_THREADS.

To use hybrid parallel approaches, with X processes and Y threads per process, the user can set OMP_NUM_THREADS to Y and then start CASA using mpicasa -n X.... Hybrid setups are not validated and not particularly recommended. For more details on this and other related environment variables that control OpenMP multithreading please refer to the OpenMP documentation. Imager Parallelization explains how multiprocessing and multithreading are used in imaging.