mstransform – Split the MS, combine/separate/regrid spws and do channel and time averaging – manipulation task

Description

The mstransform task provides the functionality of cvel, partition, hanningsmooth and split without the need to read and write the output to disk multiple times. Its main features are:

  • take an input MS or Multi-MS (MMS)

  • ability to create an output MS or MMS

  • spw combination and separation

  • channel averaging taking flags and weights into account

  • time averaging taking flags and weights into account

  • reference frame transformation

  • Hanning smoothing

All these transformations are applied on the fly, without intermediate writes to disk, to optimize I/O. The user can ask for a Multi-MS to be created in parallel, using CASA's cluster infrastructure, by setting the parameter createmms. See MPIInterface for more information on the cluster infrastructure.

This task is implemented in a modular way to preserve the functionality of the tasks it replaces. One can choose which transformations to apply, or apply all of them, by setting the corresponding parameters to True. Note that the transformations are applied to the data in a fixed order that makes logical sense from the point of view of the data analysis.
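As an illustration of combining several transformations in one pass, the following is a minimal sketch that enables channel and time averaging together. The file names and bin sizes are placeholders, and the import assumes a CASA 6 style installation in which the task is available from the casatasks package. The call is only made if that package and the input data are present.

```python
import os

# Placeholder names and values; only the parameter names come from this page.
kwargs = dict(
    vis="input.ms",          # hypothetical input MS
    outputvis="output.ms",   # hypothetical output MS
    datacolumn="data",
    chanaverage=True,        # average in channel ...
    chanbin=4,               # ... 4 input channels per output channel
    timeaverage=True,        # average in time ...
    timebin="30s",           # ... in 30 second bins
)

try:
    # Assumption: CASA 6 modular install exposing the task in casatasks.
    from casatasks import mstransform
except ImportError:
    mstransform = None

# Run only when the task and the input data are actually available.
if mstransform is not None and os.path.exists(kwargs["vis"]):
    mstransform(**kwargs)
else:
    print("mstransform not run; parameters would be:", kwargs)
```

Both averaging steps are requested in a single call, so the data are read and written only once.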

This task can create a Multi-MS as the output. General selection parameters are included, and one or all of the various data columns (DATA, LAG_DATA and/or FLOAT_DATA, and possibly MODEL_DATA and/or CORRECTED_DATA) can be selected. It can also be used to create a normal MS, split based on the given data selection parameters.

The mstransform task creates a Multi-MS in parallel, using the CASA MPI framework. The user should start CASA as follows in order to run it in parallel.

  1. Start CASA on a single node with 8 engines. The first engine will be used as the MPIClient, where the user will see the CASA prompt. All other engines will be used as MPIServers and will process the data in parallel.

    mpicasa -n 8 casa --nogui --log2term mstransform(…..)

  2. Running on a group of nodes in a cluster.

    mpicasa -hostfile user_hostfile casa …. mstransform(…..)

    where user_hostfile contains the names of the nodes and the number of engines to use in each one of them. Example:

    pc001234a, slots=5
    pc001234b, slots=4

If CASA is started without mpicasa, it is still possible to create an MMS, but the processing will be done sequentially.

The WEIGHT_SPECTRUM produced by mstransform is statistically correct for the simple cases of channel averaging and time averaging, but not for the general regridding case, for which the error propagation formulas applicable to WEIGHT_SPECTRUM are yet to be defined. Currently, as in cvel and in the imager, WEIGHT_SPECTRUM is transformed in the same way as the other data columns. This is not formally correct from a statistical point of view, but it is a good approximation at this stage.
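To make the averaging case concrete, here is a small sketch of a weighted channel average that skips flagged samples. It assumes the weights are inverse variances, so the averaged value is the weighted mean and the output weight is the sum of the unflagged input weights. This is the textbook formula, shown for illustration only; it is not taken from the mstransform source code, and the function name is hypothetical.

```python
def weighted_channel_average(values, weights, flags):
    """Average one bin of channels, ignoring flagged samples.

    Returns (average, output_weight, output_flag).
    """
    weighted_sum = 0j
    total_weight = 0.0
    for value, weight, flagged in zip(values, weights, flags):
        if flagged:
            continue  # flagged samples contribute nothing
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0.0:
        # Every sample was flagged (or had zero weight): flag the output.
        return 0j, 0.0, True
    return weighted_sum / total_weight, total_weight, False


# One bin of four channels; the third channel is flagged.
avg, wt, flagged = weighted_channel_average(
    values=[1 + 0j, 3 + 0j, 100 + 0j, 5 + 0j],
    weights=[1.0, 1.0, 1.0, 2.0],
    flags=[False, False, True, False],
)
print(avg, wt, flagged)
```

With these inputs the flagged value of 100 is ignored, the weighted sum is 1 + 3 + 2×5 = 14, and the total weight is 4, so the average is 3.5 with an output weight of 4.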

NOTE: the input/output in mstransform have a one-to-one relation.

input MS – output MS
input MMS – output MMS

unless the user sets the parameter createmms to True to create the following:

input MS – output MMS
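A sketch of the parameters for this MS-to-MMS case is shown below. The file names, the choice of separation axis and the number of Sub-MSs are placeholders, and the import assumes a CASA 6 style installation that provides the task in casatasks. The task is called only when that package and the input MS are available.

```python
import os

# Placeholder values; parameter names are those documented on this page.
mms_kwargs = dict(
    vis="input.ms",           # hypothetical input MS
    outputvis="output.mms",   # hypothetical output Multi-MS
    createmms=True,           # request an MMS from an input MS
    separationaxis="scan",    # one of scan, spw, auto, baseline
    numsubms=4,               # 'auto' or any number
    datacolumn="data",
)

try:
    # Assumption: CASA 6 modular install exposing the task in casatasks.
    from casatasks import mstransform
except ImportError:
    mstransform = None

if mstransform is not None and os.path.exists(mms_kwargs["vis"]):
    mstransform(**mms_kwargs)
else:
    print("mstransform not run; parameters would be:", mms_kwargs)
```

When a script like this is run in a CASA session started with mpicasa, the Sub-MSs are processed in parallel; otherwise they are processed sequentially.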

Parameters

Parameter                Default

vis                      ''
outputvis                ''
createmms                False
separationaxis           'auto'
numsubms                 'auto'
tileshape                numpy.array( [  ] )
field                    ''
spw                      ''
scan                     ''
antenna                  ''
correlation              ''
timerange                ''
intent                   ''
array                    ''
uvrange                  ''
observation              ''
feed                     ''
datacolumn               'corrected'
realmodelcol             False
keepflags                True
usewtspectrum            False
combinespws              False
chanaverage              False
chanbin                  int(1)
hanning                  False
regridms                 False
mode                     'channel'
nchan                    int(-1)
start                    int(0)
width                    int(1)
nspw                     int(1)
interpolation            'linear'
phasecenter              ''
restfreq                 ''
outframe                 ''
veltype                  'radio'
preaverage               False
timeaverage              False
timebin                  '0s'
timespan                 ''
maxuvwdistance           float(0.0)
docallib                 False
callib                   ''
douvcontsub              False
fitspw                   ''
fitorder                 int(0)
want_cont                False
denoising_lib            True
nthreads                 int(1)
niter                    int(1)
disableparallel          False
ddistart                 int(-1)
taql                     ''
monolithic_processing    False
reindex                  True

Parameter Explanations

vis

''

Name of input Measurement Set or Multi-MS.

outputvis

''

Name of output Measurement Set or Multi-MS.

createmms

False

Create a multi-MS output from an input MS.

separationaxis

'auto'

Axis to parallelize across (scan, spw, auto, baseline).

numsubms

'auto'

The number of Sub-MSs to create (auto or any number)

tileshape

numpy.array( [  ] )

List with 1 or 3 elements giving the tile shape of the disk data columns.

field

''

Select field using ID(s) or name(s).

spw

''

Select spectral window/channels.

scan

''

Select data by scan numbers.

antenna

''

Select data based on antenna/baseline.

correlation

''

Select correlations: '' selects all; for example, correlation='XX,YY'.

timerange

''

Select data by time range.

intent

''

Select data by scan intent.

array

''

Select (sub)array(s) by array ID number.

uvrange

''

Select data by baseline length.

observation

''

Select by observation ID(s).

feed

''

Multi-feed numbers: Not yet implemented.

datacolumn

'corrected'

Which data column(s) to process.

realmodelcol

False

Turn a virtual MODEL column into a real one.

keepflags

True

Keep completely flagged rows or drop them from the output.

usewtspectrum

False

Create a WEIGHT_SPECTRUM column in the output MS.

combinespws

False

Combine the input spws into a new output spw. Only supported when the number of channels is the same for all the spws.

chanaverage

False

Average data in channels.

chanbin

int(1)

Width (bin) of input channels to average to form an output channel.

hanning

False

Hanning smooth data to remove Gibbs ringing.

regridms

False

Transform channel labels and visibilities to a different spectral reference frame. Notice that u,v,w data is not transformed.

mode

'channel'

Regridding mode (channel/velocity/frequency/channel_b).

nchan

int(-1)

Number of channels in the output spw (-1=all).

start

int(0)

First channel to use in the output spw (mode-dependent).

width

int(1)

Number of input channels that are used to create an output channel.

nspw

int(1)

Number of output spws to create in output MS.

interpolation

'linear'

Spectral interpolation method.

phasecenter

''

Phase center direction to be used for the spectral coordinate transformation: position or field index

restfreq

''

Rest frequency to use for output.

outframe

''

Output reference frame (‘’=keep input frame).

veltype

'radio'

Velocity definition.

preaverage

False

Pre-average channels before regridding, when the ratio between the output and input widths is greater than 2.

timeaverage

False

Average data in time.

timebin

'0s'

Bin width for time averaging.

timespan

''

Span the timebin across scan, state or both.

maxuvwdistance

float(0.0)

Maximum separation, in meters, of start-to-end baselines that can be included in an average.

docallib

False

Enable on-the-fly (OTF) calibration as in task applycal

callib

''

Path to calibration library file

douvcontsub

False

Enable continuum subtraction as in task uvcontsub

fitspw

''

Spectral window:channel selection for fitting the continuum

fitorder

int(0)

Polynomial order for the fits

want_cont

False

Produce continuum estimate instead of continuum subtracted data

denoising_lib

True

Use new denoising library (based on GSL) instead of casacore fitting routines

nthreads

int(1)

Number of OMP threads to use (currently maximum limited by number of polarizations)

niter

int(1)

Number of iterations for re-weighted linear fit

disableparallel

False

Hidden parameter for internal use only. Do not change it!

ddistart

int(-1)

Hidden parameter for internal use only. Do not change it!

taql

''

Table query for nested selections

monolithic_processing

False

Hidden parameter for internal use only. Do not change it!

reindex

True

Hidden parameter for use in the pipeline context only
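As an illustration of how the regridding parameters described above (regridms, mode, nchan, start, width, interpolation, restfreq, outframe, veltype) might be combined, here is a minimal sketch of a velocity-mode regrid. All values are placeholders chosen for the example, including the rest frequency and the output frame, and the import assumes a CASA 6 style installation that provides the task in casatasks. The call is skipped when that package or the input data are missing.

```python
import os

# Placeholder values for illustration; adapt them to the actual data set.
regrid_kwargs = dict(
    vis="input.ms",              # hypothetical input MS
    outputvis="regridded.ms",    # hypothetical output MS
    datacolumn="data",
    regridms=True,               # enable the spectral frame transformation
    mode="velocity",             # channel, velocity, frequency or channel_b
    nchan=100,                   # number of output channels
    start="-50km/s",             # first output channel (assumed value)
    width="1km/s",               # output channel width (assumed value)
    interpolation="linear",
    restfreq="115.2712GHz",      # assumed rest frequency for the example
    outframe="LSRK",             # assumed output reference frame
    veltype="radio",
)

try:
    # Assumption: CASA 6 modular install exposing the task in casatasks.
    from casatasks import mstransform
except ImportError:
    mstransform = None

if mstransform is not None and os.path.exists(regrid_kwargs["vis"]):
    mstransform(**regrid_kwargs)
else:
    print("mstransform not run; parameters would be:", regrid_kwargs)
```

Keeping the parameters in a dictionary makes it easy to inspect or log the exact settings before the task is run.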