Common Astronomy Software Applications¶
CASA, the Common Astronomy Software Applications, is the primary data processing software for the Atacama Large Millimeter/submillimeter Array (ALMA) and the Karl G. Jansky Very Large Array (VLA), and is also often used for other radio telescopes.
6.5.2 Release
CASA 6.5.2 is now available for general use, either as a downloadable tar file or through pip-wheel installation, which gives the flexibility to integrate CASA into a customized Python environment.
Highlights:
deconvolve: new task for image-domain deconvolution.
uvcontsub: new implementation, old uvcontsub task deprecated.
fringefit: support added for ‘uvrange’ parameter.
tclean: new iteration control parameter ‘nmajor’.
sdimaging: new parameter ‘enablecache’ for improved performance.
mstransform: parameter ‘douvcontsub’ deprecated.
flagdata: mode=’shadow’ now uses the uvw values from the UVW column.
tclean/tsdimaging: improved runtime performance of ephemeris imaging.
simulator tool: new parameter ‘simint’ in sm.settrop() to control time granularity, down to 0.1s.
ImageAnalysis tool: new string ‘mbret’ parameter added to ‘image.restoringbeam()’.
casalog tool: new method ‘getOrigin()’ implemented to retrieve origin of messages.
For more details on these and other new features, see the CASA 6.5.2 Release Notes.
CASA is developed by an international consortium of scientists based at the National Radio Astronomy Observatory (NRAO), the European Southern Observatory (ESO), the National Astronomical Observatory of Japan (NAOJ), the Academia Sinica Institute of Astronomy and Astrophysics (ASIAA), CSIRO Astronomy and Space Science (CSIRO/CASS), and the Netherlands Institute for Radio Astronomy (ASTRON), under the guidance of NRAO.
Release Information¶
These are the release notes for CASA 6.5. Changes compared to the CASA 6.4 release are listed below.
Highlights¶
deconvolve: a new task deconvolve was added for stand-alone access to the image-domain deconvolver algorithms in tclean. (CASA 6.5.2)
uvcontsub: a new task uvcontsub was implemented, while the old uvcontsub task moved to uvcontsub_old and was deprecated. Task uvcontsub3 was removed. (CASA 6.5.2)
fringefit: now supports the uvrange parameter. (CASA 6.5.2)
tclean: a new iteration control parameter ‘nmajor’ was added. (CASA 6.5.2)
sdimaging: a new parameter enablecache allows improved performance by caching spectra coordinates. (CASA 6.5.2)
mstransform: the douvcontsub parameter was deprecated. (CASA 6.5.2)
flagdata: where possible, mode=’shadow’ now uses the uvw values from the UVW column to calculate uv distances. (CASA 6.5.2)
CASA testing: Information on automated testing and performance testing of CASA was made available in CASA Docs. (CASA 6.5.2)
tclean/tsdimaging: Runtime performance of ephemeris imaging was improved. (CASA 6.5.2)
simulator tool: a new parameter simint controls the time granularity in sm.settrop(), which can now be as short as 0.1 s. (CASA 6.5.2)
ImageAnalysis tool: a new string mbret parameter was added to image.restoringbeam(). (CASA 6.5.2)
casalog tool: a new method, getOrigin(), has been implemented to retrieve the origin of messages. (CASA 6.5.2)
tclean: a bug was fixed that prevented an outer UV taper to work in combination with weighting=’natural’. (CASA 6.5.1)
table tool: a new row() method and tablerow class were added to facilitate reading and writing of table rows. (CASA 6.5.1)
quanta tool: a new parameter keepshape was added to qa.quantity and qa.unit to preserve the shape of N-dimensional arrays. (CASA 6.5.1)
casashell and setup.py: now print out additional information about exceptions during startup, to aid developers on local builds. (CASA 6.5.1)
sdbaseline: several improvements were implemented, including a bug fix for blmode=‘fit’ and more accurate output of fitting information. (CASA 6.5.1)
Casacore submodule reference was updated to the latest version. (CASA 6.5.1)
imbaseline: a new task imbaseline was added for image-based baseline subtraction for single-dish data. (CASA 6.5.0)
corrbit(): a new method corrbit() was implemented in the msmetadata tool for returning the value in the SPECTRAL_WINDOW::SDM_CORR_BIT column. (CASA 6.5.0)
setjy: the parameter modimage was removed and parameter model should be used instead. (CASA 6.5.0)
plotms: support was added for additional axes of calibration tables. (CASA 6.5.0)
tclean: the return dictionary now includes additional information about the minor cycles. (CASA 6.5.0)
Mac OS 12: a bug was fixed which prevented the Mac OS 11 / Python 3.6 package from opening on Mac OS 12. (CASA 6.5.0)
Release Notes¶
Installation and operation
OS Support: CASA is expected to be compatible with a range of operating systems, including RedHat, Ubuntu, and Mac OS. For details, please see CASA’s compatibility matrix. (CASA 6.5.0)
Casacore submodule reference was updated to the latest version (0d871e1fca1f96abd8bf1326fb45c253192d01c2). The update includes the change to prohibit implicit conversion from std::vector to casacore::Vector.
casashell now prints out a summary of any python exceptions that happen during startup (previously it would silently exit). Users of the monolithic CASA should not notice any changes. This change is primarily useful to developers building the components locally. (CASA 6.5.1)
setup.py build scripts now print out information about exceptions that happen during that process (previously they were largely silent). This change is only relevant to developers. (CASA 6.5.1)
Import/export and MS structure
msmetadata tool: a new method corrbit() was implemented to return the value in the SPECTRAL_WINDOW::SDM_CORR_BIT column for the specified spw(s), or a list representing these values for all spectral windows if spw < 0. (CASA 6.5.0)
table tool: a new row() method and tablerow class were added to facilitate reading and writing of table rows. (CASA 6.5.1)
table tool: Implemented more thorough input checking and better error handling in the getcellslice() method. (CASA 6.5.1)
quanta tool: a new parameter keepshape was added to qa.quantity and qa.unit to preserve the shape of N-dimensional arrays. These N-dimensional quantities are compatible with qa.convert but not with other quanta tool methods. (CASA 6.5.1)
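A minimal sketch of how these two tool additions might be used together (my.ms is a placeholder MeasurementSet; the exact access pattern of the tablerow object should be checked against the tool documentation):

```python
# Sketch only: exercises the new table-row access and quanta keepshape option.
import numpy as np
from casatools import table, quanta

tb = table()
tb.open('my.ms')
tr = tb.row()       # tablerow object spanning all columns
first = tr[0]       # assumption: indexing yields a dict of column values for row 0
print(sorted(first.keys()))
tb.close()

qa = quanta()
# keepshape=True preserves the 2x3 array shape inside the quantity; such
# quantities work with qa.convert but not with other quanta tool methods.
q = qa.quantity(np.ones((2, 3)), 'Jy', keepshape=True)
print(qa.convert(q, 'mJy'))
```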
Information
casalog tool: A new method, getOrigin(), has been implemented to retrieve the origin of messages to be displayed. See the example below this list. (CASA 6.5.2)
imhead: Updated imhead exception message in the case of mode=’get’ to alert the user that the expected values for hdkey in this case are often different from the keys in the dictionary returned by mode=’summary’. (CASA 6.5.2)
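A short sketch of the new casalog method, assuming getOrigin() returns the origin string most recently set via casalog.origin():

```python
# Set a message origin, post a message, then read the origin back.
from casatasks import casalog

casalog.origin('mytask')               # tag subsequent log messages
casalog.post('starting work', 'INFO')  # message appears with origin 'mytask'
print(casalog.getOrigin())             # new in CASA 6.5.2; expected: 'mytask'
```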
Flagging
flagdata: The mode=’shadow’ now uses the uvw values found in the UVW column of the MS to calculate the uv distances of baselines, for all baselines for which such values are available. For baselines not present in the MS, shadow flags are derived by calculating UVW values from antenna positions. (CASA 6.5.2)
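Shadow flagging itself is invoked as before (my.ms is a placeholder name); the change only affects how the uv distances are computed internally:

```python
# Flag shadowed antennas; uvw values now come from the UVW column where available.
from casatasks import flagdata

flagdata(vis='my.ms', mode='shadow', tolerance=0.0, flagbackup=True)
```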
Calibration
fringefit: now supports the uvrange parameter provided through the selectdata facility. The documentation for this parameter can be found in the Synthesis Calibration section of the manual. Users inexperienced with this parameter are warned that the uvrange selection is made before calibration; which data are calibrated and which are flagged may not match expectations unless the consequences of the selection are carefully thought through. See the example below this list. (CASA 6.5.2)
setjy: the deprecated parameter ‘modimage’ has been removed from the setjy task; the parameter ‘model’ should be used instead. (CASA 6.5.0)
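A hedged sketch of the new fringefit selection (file, antenna, and range values are placeholders); uvrange is passed through the standard selectdata parameters:

```python
# Restrict fringe fitting to baselines longer than 50 klambda (placeholder values).
# Note: the uvrange selection is applied *before* calibration (see the warning above).
from casatasks import fringefit

fringefit(vis='evn.ms', caltable='evn.fringe',
          selectdata=True, uvrange='>50klambda',
          refant='EF', solint='60s')
```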
Manipulation
uvcontsub: A new task uvcontsub was added for continuum subtraction in the uv-domain. The old task uvcontsub has been deprecated and renamed uvcontsub_old, and task uvcontsub3 has been removed. Combining spectral windows through combine=’spw’ does not yet work as in uvcontsub_old; this is planned for future development of the new task. See the example below this list. (CASA 6.5.2)
mstransform: the option douvcontsub of mstransform has been deprecated. (CASA 6.5.2)
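As an illustration of the new uvcontsub interface (all names and channel ranges are placeholders; fitspec is assumed here to take the line-free channel selection):

```python
# Subtract a first-order continuum fit, using channels 5-19 and 45-60 of spw 0
# as the line-free ranges.
from casatasks import uvcontsub

uvcontsub(vis='my.ms', outputvis='my_contsub.ms',
          fitspec='0:5~19;45~60', fitorder=1)
```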
Imaging
tclean: Added a new iteration control parameter ‘nmajor’ to directly limit the number of minor-major cycle sets. See the sketch after this list. (CASA 6.5.2)
tclean/tsdimaging: Runtime performance of ephemeris imaging with tclean and tsdimaging was improved. (CASA 6.5.2)
deconvolve: A new task ‘deconvolve’ has been added to provide stand-alone access to the image-domain deconvolver algorithms available within tclean. Options supported in this initial version are ‘hogbom’, ‘clark’, ‘clarkstokes’, ‘multiscale’, with support for single-plane images and spectral cubes. The ‘mtmfs’ and ‘asp’ algorithms will be enabled in a later release. (CASA 6.5.2)
sdimaging: Performance of the sdimaging task has been improved by caching the coordinates of spectra computed while creating the normal image and re-using them, instead of re-computing them, when creating the weight image. A new parameter enablecache has been added to the task, making it possible to turn this feature on or off. The performance improvement is most noticeable for fast-scan datasets with a few channels. For typical fast-scan solar datasets, re-using cached spectra coordinates is roughly 20 times faster than re-computing them, resulting in a ~25-30% speed-up for the whole sdimaging task. An example call is shown after this list. (CASA 6.5.2)
tclean: the return dictionary now includes additional information about the minor cycles. (CASA 6.5.0)
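Two of the additions above in sketch form, with placeholder file names and imaging parameters:

```python
from casatasks import tclean, sdimaging

# tclean: stop after at most 5 major cycles, regardless of niter/threshold.
tclean(vis='my.ms', imagename='my_image', imsize=512, cell='0.1arcsec',
       deconvolver='hogbom', niter=10000, threshold='0.1mJy', nmajor=5)

# sdimaging: enable caching of spectra coordinates for the weight image.
sdimaging(infiles=['sd_data.ms'], outfile='sd_image.im',
          mode='channel', gridfunction='SF',
          imsize=256, cell='10arcsec', enablecache=True)
```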
Analysis
ImageAnalysis tool: (1) A new string mbret parameter was added to image.restoringbeam(). mbret=”list” is the default and produces backward-compatible behavior. mbret=”matrix” indicates that the return dictionary should be structured such that ‘bmaj’, ‘bmin’, and ‘bpa’ have numpy arrays as values, making it simple to use these values as numpy arrays rather than having to write python code to construct such structures. See the sketch after this list. (2) Complete docs for image.beamarea() were also added, with no interface changes. (CASA 6.5.2)
imbaseline: a new task imbaseline was added to do image-based baseline subtraction for single-dish data. Task imbaseline does, if necessary, smoothing of the spatial/spectral plane of an input image cube to improve S/N, then subtracts spectral baselines from it. (CASA 6.5.0)
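A sketch of the new mbret option (placeholder image name):

```python
# Retrieve per-plane restoring beams as numpy arrays instead of nested dicts.
from casatools import image

ia = image()
ia.open('my_cube.image')
beams = ia.restoringbeam(mbret='matrix')   # 'list' (default) keeps the old behavior
print(beams['bmaj'])                       # numpy array of major axes, one per plane
ia.close()
```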
Visualization
plotms:
Support was added for additional calibration table axes (where applicable): azimuth, elevation, hourang, parang, u, v, w, uwave, vwave, wwave, uvwave, and uvdist. (CASA 6.5.0)
Tick labels will now switch to scientific notation when displaying small values near zero or very large values. (CASA 6.5.1)
The Help -> About window of the GUI now shows more complete version information. (CASA 6.5.1)
Single Dish
Performed refactoring of single dish Python code to be compliant with PEP-8 coding style. (CASA 6.5.0)
Simulations
simulator tool: The Simulator code underlying sm.settrop() was improved by enabling the tool to support visibility integration times as short as 0.1 s. The new parameter simint (seconds) controls the time granularity of the simulation. The default value of simint is -1, which uses a granularity of 10 s (the same as in previous CASA versions). (CASA 6.5.2)
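A sketch of the new parameter in a toolkit simulation (the MS name and other settrop arguments are illustrative):

```python
# Apply a tropospheric phase screen with 0.1 s time granularity; assumes a
# simulated MS already exists.
from casatools import simulator

sm = simulator()
sm.openfromms('sim.ms')
sm.settrop(mode='screen', pwv=1.0, simint=0.1)
sm.corrupt()
sm.done()
```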
Testing:
Automated testing: CASA Docs now includes an overview of the automated testing of the CASA code that is performed for various operating systems.
Performance testing: Performance benchmarks for CASA tasks have been made available to the community. Detailed information can be found on this new Performance page in CASA Docs.
Bug fixes
plotms: Fixed an int overflow bug that sometimes prevented interactive flagging/locating from working correctly. (CASA 6.5.2)
tclean: a data selection issue was fixed that could lead to NaNs in some image planes. (CASA 6.5.2)
calanalysis module: Warnings appeared in the pipeline when a baseline was flagged completely; this message was downgraded from a WARNING to an INFO post in the logs. (CASA 6.5.2)
plotbandpass: Messages that earlier appeared only in the console now appear in the logs of CASA and the pipeline. (CASA 6.5.2)
plotbandpass: The getWeather function of plotbandpass now uses np.median when combining values across multiple weather stations to prevent potential issues with faulty values. (CASA 6.5.2)
plotms: A bug was fixed in plotms that caused a crash when combining antenna iteration, averaging and negation. (CASA 6.5.2)
CASA Docs: Fixed a bug in casadocs infrastructure, where for task and tool parameters with multiple possible types, only the first type was being shown in the parameters list. (CASA 6.5.2)
tclean: a bug was fixed that prevented an outer UV taper to work in combination with weighting=’natural’. (CASA 6.5.1)
tclean: the warning for non-zero edge pixels in the PB image will now only be shown for gridders ‘mosaic’ and ‘awproject’. (CASA 6.5.1)
sdatmcor: A bug that overrode OpenMP configuration was fixed. (CASA 6.5.1)
ms.getdata tool: a bug was fixed where averaging multiple columns could yield different results than averaging a single column. (CASA 6.5.1)
sdbaseline: blmode=‘fit’ will now properly account for mask information and calculate the weight of the baseline-subtracted spectral data. In addition, the description for the ‘maskmode’ parameter was updated. (CASA 6.5.1)
sdbaseline: fixed incorrect parameter names output in ascii-text format in case of per-spectrum baseline fitting. (CASA 6.5.1)
sdbaseline: fixed the output in ascii-text and csv formats in case of per-spectrum baseline fitting, so that unnecessary info (non-existent parameter values) is not printed. (CASA 6.5.1)
sdimaging: fixed a caching issue that could lead to slightly different images from the same data selection. (CASA 6.5.1)
immoments: a rare bug was fixed where writing history to the output files could cause a crash. (CASA 6.5.1)
plotms: fixed a crash when turning calibration on without giving a cal library string. (CASA 6.5.1)
sm.setnoise tool: correction to the inline docs of the sm.setnoise tool method was made. (CASA 6.5.1)
Mac OS 12: a bug was fixed which prevented the Mac OS 11 / Python 3.6 version of CASA 6.4.4 from opening on Mac OS 12. (CASA 6.5.0)
Known Issues¶
Summary of the Most Important Issues
The Adaptive Scale Pixel (asp) deconvolution algorithm in tclean is experimental, and we welcome user feedback.
The task clean is no longer being actively maintained; instead, tclean is now the recommended task for imaging.
CASA 6 startup may fail on some Mac OS where users have set up a file system that is case-sensitive.
There are generic problems when passing multiple MSs with mismatched shapes to tclean.
Wideband and widefield imaging in tclean are only partially validated - please use at your own risk and read the wideband and widefield documentation.
In tclean, uvtaper does not work with natural weighting. (fixed in CASA 6.5.1)
When imaging large mosaics with mosweight in tclean, a “too many open files” error may occur, which may require increasing the limit on open files.
statwt may fail when the correlator integration time changes within an MS and statwt is run with timebin set to an integer value.
CASA no longer uses LD_LIBRARY_PATH; to avoid confusion, use CASALD_LIBRARY_PATH instead.
cvel calculates the velocity incorrectly for ephemeris objects. We recommend using mstransform or its offspring cvel2, although the latter should be used with care, as it is not yet fully commissioned.
fixvis uses the small angle approximation and may be incorrect for large phase shifts. Use the new task phaseshift instead, or use tclean for phase center shifts during imaging when applicable.
With parallel calibration on MMS files, fixvis does not write out the new MMS specified in outputvis correctly, hence fixvis solutions are not applied when writing to a new MMS.
In fringefit, calibration tables created with CASA 5.5 and before cannot be used with CASA 5.6 and later.
In tclean, defining image cubes in optical velocity in some cases is known not to work.
In tclean, using the mosaic gridder with the default nchan=-1 is in some cases known to produce errors.
Ionospheric TEC corrections are currently validated in CASA only for VLA data.
Ephemeris objects are not correctly supported by virtual model columns.
In tclean, the combination of specmode=’cube’ and gridder=”awproject” has not been commissioned for use and may result in errors.
sdimaging will crash or create incorrect images if there exist some spectra taken at a time t that fall outside all pointing intervals of a specific antenna.
General¶
Several issues have been encountered for CASA 6:
inp/go syntax does not reset hidden parameters to default between consecutive calls. The most notable issue occurs with the selectdata parameters; once set at selectdata=True, any associated subparameters that are changed from default (e.g., spw, or field) will remain even after re-setting to selectdata=False. Generally this should not matter as hidden parameters are not used. However, known exceptions exist in tclean, sdintimaging, apparentsens, setjy, listvis, sdcal, and sdfit where the value of hidden selection parameters does in fact matter. We are currently investigating the extent of this problem. As a workaround, users can call default to manually reset all their parameters.
inp/go does not work for the following tasks: msuvbin, browsetable, imview, msview, deconvolve, testconcat. Please invoke the arguments directly when running these tasks in CASA 6.
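For example, to make sure no hidden selection parameters linger between consecutive tclean runs in the casashell:

CASA <1>: default(tclean)   # reset all tclean parameters, including hidden subparameters
CASA <2>: inp(tclean)       # inspect the freshly reset inputs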
Installation¶
CASA 6 startup may fail on some Mac OS systems where users have set up a case-sensitive file system. As a temporary work-around, please manually update the casapy script in /Applications/CASA.app/Contents/MacOS/casapy, replacing the string “Macos” with “MacOS”, which occurs in lines 36 and 39 of the casapy script.
For Mac OS, the default behavior when downloading multiple versions of CASA is to call it “CASA X.app” (i.e., including a space). However, CASA is unable to find the viewer when a space exists in the application name. The workaround is to rename the application excluding the space.
If you use a version of RHEL6 with a kernel version that is older than 2.6 you may encounter an error like:
E0324 17:24:18.576966686 27444 tcp_server_posix.cc:65] check for SO_REUSEPORT:
{"created":"@1585038258.576951288","description":"OS Error","errno":92,"file":
"src/core/lib/iomg/socket_utils_common_posix.cc","file_line":168,"os_error":"Protocol
not available","syscall":"setsockopt(SO_REUSEPORT)"}
NFS mounted disks
It is not recommended to run CASA (e.g., keep your data) on disks that are NFS-mounted. It can be done, but in some cases the files will be NFS-locked and this can crash CASA or its tasks. In this case, you have to restart CASA.
If you receive messages like xvfb timeout you may try to clean out your /tmp folder, then restart CASA.
Python:
Environment variables set for personal use may be incompatible with CASA 6, given that CASA comes with a Python version that may differ from the one installed for regular use. It is still unclear which specific errors can occur, but one workaround for these types of errors is to unset PYTHONSTARTUP before starting CASA. We are looking into possible solutions.
Files in the current directory with the same name as standard ipython/python modules will cause errors. For example, the following error occurs when a new.py file exists in the current directory:
```python
AttributeError Traceback (most recent call last)
/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in enable_matplotlib(self, gui)
2945 gui, backend = pt.find_gui_and_backend(self.pylab_gui_select)
2946
-> 2947 pt.activate_matplotlib(backend)
2948 pt.configure_inline_support(self, backend)
2949
/lib/python2.7/site-packages/IPython/core/pylabtools.pyc in activate_matplotlib(backend)
292 matplotlib.rcParams['backend'] = backend
293
--> 294 import matplotlib.pyplot
295 matplotlib.pyplot.switch_backend(backend)
296
/lib/python2.7/site-packages/matplotlib/pyplot.py in <module>()
21 from matplotlib.cbook import dedent, silent_list, is_string_like, is_numlike
22 from matplotlib import docstring
---> 23 from matplotlib.figure import Figure, figaspect
24 from matplotlib.backend_bases import FigureCanvasBase
25 from matplotlib.image import imread as _imread
/lib/python2.7/site-packages/matplotlib/figure.py in <module>()
16 import artist
17 from artist import Artist, allow_rasterization
---> 18 from axes import Axes, SubplotBase, subplot_class_factory
19 from cbook import flatten, allequal, Stack, iterable, is_string_like
20 import _image
/lib/python2.7/site-packages/matplotlib/axes.py in <module>()
8452
8453 #This is provided for backward compatibility
-> 8454 Subplot = subplot_class_factory()
8455
8456 docstring.interpd.update(Axes=martist.kwdoc(Axes))
```
Scripting¶
Starting CASA 6: For execfile calls within a script which itself is run via execfile, it is necessary to add globals() as the second argument to those execfile calls in order for the nested script to know about the global variables of the calling script. For example, within a script ‘mainscript.py’, calls to another script ‘myscript.py’ should be written as follows: execfile('myscript.py', globals()).
There are cases where two scripts call on each other (i.e., where script “a” uses script “b” and vice versa). In casa 6, the only way to execute these scripts is by running the first script twice:
execfile("a.py",globals())
execfile("b.py",globals())
execfile("a.py",globals())
statwt¶
In some circumstances when an MS data selection is specified, chanbin is not equal to the default value of spw, and the WEIGHT_SPECTRUM or SIGMA_SPECTRUM columns don’t exist, the statwt task may need to be run twice in order to complete successfully due to a known issue with initializing the WEIGHT_SPECTRUM and/or SIGMA_SPECTRUM columns in the code. In these circumstances, an exception will be raised with instructions to restart the task. If you are using the tool method, first close the ms tool, then reopen it using the same data set, apply the same selection, and then run ms.statwt().
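For tool users, the restart sequence described above might look like the following sketch (MS name, selection, and chanbin value are placeholders):

```python
# Close, reopen, reselect, and rerun statwt after the initialization exception.
from casatools import ms

myms = ms()
myms.open('my.ms', nomodify=False)
myms.msselect({'spw': '0'})
try:
    myms.statwt(chanbin=4)     # may raise on the first pass (see above)
except RuntimeError:
    myms.close()
    myms.open('my.ms', nomodify=False)
    myms.msselect({'spw': '0'})
    myms.statwt(chanbin=4)     # second pass completes once the columns exist
myms.done()
```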
mstransform¶
SPW combination (combinespws=True) requires that all the SPWs selected have the same number of channels.
Some inconsistencies are present in CASA in the SIGMA/WEIGHT columns (and their _SPECTRUM variants) when splitting on datacolumn=’data’, such as:
For an MS with WEIGHT_SPECTRUM but no SIGMA_SPECTRUM (as obtained from initweights), SIGMA_SPECTRUM is created and initialized to SIGMA. While split/mstransform correctly initializes the output WEIGHT to 1/SIGMA^2, it does not initialize the output WEIGHT_SPECTRUM to 1/SIGMA_SPECTRUM^2 (instead it copies the original WEIGHT_SPECTRUM).
For an MS with both WEIGHT_SPECTRUM and SIGMA_SPECTRUM, the output WEIGHT_SPECTRUM is again a copy of the input instead of being initialized to 1/SIGMA_SPECTRUM^2.
Future work in CASA is planned to address such inconsistencies.
cvel¶
cvel calculates the velocity incorrectly for ephemeris objects. We recommend using mstransform or its offspring cvel2, although the latter should be used with care, as it is not yet fully commissioned.
cvel fails on MMS files used for parallel processing. The same recommendation applies: use mstransform or cvel2 (the latter with care).
bandpass¶
Currently, bandpass will not find good solutions if any correlation (including cross-correlations) in the data is completely flagged. As an interim solution, one may split the unflagged data into a separate file and then perform bandpass.
polcal¶
Polarization position angle calibration poltype=’X’ or ‘Xf’ will be sensitive to any unmodelled position shift in the specified calibrator linear polarization model relative to the centroid location of total intensity (typically the phase center). Excess phase due to the position shift will introduce a bias in the cross-hand phase calibration (which is the same as position angle calibration in the circular feed basis). For this reason, it is best to use truly point-like (in all polarizations) calibrators, if possible, or accurate resolved models.
setjy¶
Sometimes setjy does not properly overwrite a current model in the header of the MS (virtual scratch column). It is recommended to use delmod if a model exists and should be overwritten.
Virtual model columns in the MS do not correctly support ephemeris objects, although they will run without generating errors or warnings. If any of your calibrators exhibit significant celestial motion on the timescale of your observation (e.g., any Solar System object), you must set usescratch=True in calls to setjy().
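For example, for a Solar System flux calibrator (placeholder names):

```python
# Force the model to be written to a real MODEL_DATA column, since the virtual
# model does not track ephemeris motion.
from casatasks import setjy

setjy(vis='my.ms', field='Titan',
      standard='Butler-JPL-Horizons 2012', usescratch=True)
```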
uvcontsub¶
fitorder should be kept low (<= 1) unless the line band is relatively narrow compared to the fit bands. If the image rms markedly rises in the middle line channels after uvcontsub, fitorder should probably be lowered.
fitorder > 0 does not work with solint > ‘int’
cal library¶
The CASA cal library (docallib=True in applycal, gaincal, bandpass, etc.) may exhibit problems when calibration is unavailable for a subset of MS spectral windows. Use of spwmap to supply nominal calibration for these spectral windows (transparently and harmlessly) may help avoid this problem. For antenna position corrections, try spwmap=[0] to relieve one variety of this problem.
VLA Switched Power¶
In CASA v4.2.2 and higher, the weight calibration for EVLA switched power/Tsys corrections is still being investigated. Visibility corrections are ok. Since switched power calibration is not used by the EVLA pipeline (except for requantizer gain corrections, for which this problem is irrelevant), and since calwt=F remains the general recommendation, users should rely on statwt to generate appropriate data weights.
fringefit¶
For task fringefit, calibration tables created with CASA 5.5 and before cannot be applied with CASA 5.6 and later. Attempting to do so will fail with an error about non-conforming array sizes.
fixvis¶
fixvis uses the small angle approximation and may be incorrect for large phase shifts. This may result in sources shifting position if large phase shifts are being applied (shifts up to a few beam sizes have been reported). Please use the new task phaseshift instead, or use tclean for phase center shifts during imaging when applicable.
With parallel calibration on multi-MS (MMS) files, fixvis does not write out the outputvis correctly, hence fixvis solutions are not applied when writing to a new MMS. The recommended work-around solution is to over-write the input MMS by leaving the outputvis parameter empty. This will change the input MMS, so if you are concerned about that, we recommend to make a copy before running fixvis in parallel mode. Writing output MS files in serial mode is not affected by this bug.
tec_map¶
Ionospheric TEC corrections in CASA are currently validated only for the VLA. TEC corrections for other observatories are experimental and should be done at your own discretion.
Do not use CASA 6.1.0 for tec_map corrections.
fixplanets¶
To supply JPL-Horizons ephemeris data, a query function that constructs an email query request can be used, but it is only available up to CASA 5.8. Constructing a query request manually via email, as described in the fixplanets task section, should still work for CASA 6.3 as long as the data file is saved in MIME format. An alternative function to query JPL-Horizons for ephemeris data directly via its web interface, in a CASA-readable format, is planned for future releases.
tclean¶
For the experimental ‘Adaptive Scale Pixel (asp)’ deconvolution algorithm in tclean:
This algorithm implementation is experimental and has had limited testing. We are currently investigating some intermittent failures on some flavors of Mac OSX. Follow-up development and test efforts are underway, and we encourage users to provide feedback on what they find.
Third-party code used for the core optimization (lbfgs) is known to produce different numerical results depending on the compiler/build options. We have tested this on all of CASA’s supported platforms for the use cases included in our characterization demo, and for these tests the results are scientifically equivalent.
Generic problems putting multiple MSs into tclean that have mismatches in their shape: generic problems have recently been found with putting multiple MSs into tclean when there are mismatches in shape across the data set. For example, certain data columns may cause a segmentation fault if they are present in only some of the input data sets. For mosaics, please specify the phasecenter explicitly; otherwise tclean will select the first pointing from the first MS. Other mismatches in shape across multiple input MSs may cause similar problems in tclean. The CASA team is in the process of coherently addressing these issues for CASA 5.8/6.2. Please contact the Helpdesk if you experience related issues that you cannot otherwise solve.
The gridder=’awproject’ has not been fully commissioned for use with specmode=’cube’ in tclean. The following message appears: The gridder=’awproject’ has not been fully tested for ‘cube’ imaging (parallel=True or False). Formal commissioning of this mode is expected in a subsequent release, where ‘awproject’ will be aligned with recent framework changes. Until then, please report errors/crashes if seen.
In tclean, if gridder=’awproject’ is run with psterm=True, the output Primary Beam currently still includes the Prolate Spheroidal function. In order to do a primary beam correction, a separate PB needs to be made with psterm=False. See the CASA pages on AWproject for more information.
For widefield imaging in tclean, the following features still need to be implemented and commissioned (for usepointing=True, with full heterogeneous pointing support):
gridder=’mosaic’ : Enable accurate pointing corrections for baselines with antennas pointing in different directions
In tclean, the gridders ‘mosaic’ and ‘awproject’ include aperture illumination functions in the gridding convolution functions used for the PSF generation. Although this strictly follows the math, it has several undesirable features, especially in the situation where data are not uniform across all axes across which they are being combined (i.e., if the mosaic pattern is not relatively flat, if the center of the image has no mosaic pointing, or if different pointings have drastically different uv-coverages or frequency coverages). All such variations cause the PSFs to be position-dependent and could relate to potential instabilities during deconvolution, either requiring many major cycles to converge or diverging. For spectral-cube imaging, the effects are smaller because PSFs are normalized to peak 1 no matter what their raw peak values are. For multi-term imaging, the ratios between the spectral PSF central values matter and the effect/error is enhanced. When all these uv-coverage variations are avoided (in careful simulations), both algorithms perform as expected for joint wideband mosaics (both with conjbeams=True or False). For CASA 6.3, the guidelines are:
For the standard gridder, full-field, single pointing imaging (spectral cube as well as multi-term) will be accurate as long as the image phase center matches the PB pointing center.
For multi-term wideband joint mosaics, we recommend the use of gridder=’awproject’ with conjbeams=True as that is the only combination that has demonstrated accurate wideband pb-correction (at niter=0) especially in the presence of position-dependent uv-coverage. All other options will need monitoring and several major cycles to ensure convergence. The image should ideally be centered on a PB center.
In tclean, the mosweight parameter for multi-field imaging has a new default value of mosweight=True as of CASA 5.4. The new default setting of mosweight=True in tclean optimizes noise characteristics for Briggs/uniform weighting, but one should be aware that it has the following disadvantages:
it may potentially cause memory issues for large VLA mosaics
the major and minor axis of the synthesized beam may be up to ~10% larger than with mosweight=False
Please change to mosweight=False to get around these issues.
When imaging mosaics with a large number of fields and many MSs in tclean, an error can occur that specifies too many open files. This can happen for both manual and pipeline imaging when using the mosweight=True parameter. The reason is that in CASA 5.5, a trade-off was made to reduce memory demands in tclean when using mosweight, by placing the weights on disk using multiple files. Unfortunately, this memory fix may cause open file problems for data sets consisting of many MSs and fields. The problem has been characterized based on the number of MSs and fields: with respect to earlier CASA releases, the imager code now uses #MSs x #fields x 2 additional files. In CASA 6.2 and 6.3, the number of open files required for the default perchanweightdensity=True case has been dramatically reduced, though perchanweightdensity=False imaging runs are more limited. While the CASA team is working on a permanent solution for future CASA versions, the recommended work-around solution is to manually increase the limit for the number of open files, e.g.: ulimit -Sn 8000 or ulimit -Sn 16000. In some cases, increasing the hard-limit on number of open files may be necessary, which requires admin/root permissions. As a rule of thumb, each MS requires ~54 simultaneously open files. With a 16k limit and a maximum of 150 fields per MS, for a use case with perchanweightdensity=False, mosweight=True tclean will encounter too many open files with ~46 MSs or more, while with perchanweightdensity=True, mosweight=True tclean will encounter too many open files with ~300 MSs or more.
There are small, systematic offsets known to occur when using tclean. Our initial tests show that the offset in Dec is of the order of ~50 milli-arcsec, while the offset in RA is a function of declination, but also amounts to ~50 mas. This issue is currently being investigated.
Currently the niter parameter is defined as an integer, so values larger than 2147483647 will not be set properly, as they cause an overflow.
Using deconvolver=’mtmfs’, nterms=1 and specmode=cube does not yet work in parallel imaging mode. Use specmode=’mfs’ instead.
In tclean, defining image cubes in optical velocity in some cases is known not to work. This problem is under investigation.
The awproject gridder in tclean does not support the virtual model scheme.
Interactive tclean only works when a region or mask is selected in the CASA Viewer. There is a known bug that when a region is first selected, and then de-selected to produce an empty mask (filled with zeros), the CASA Viewer that runs interactive tclean will still allow you to proceed, and tclean will detect an empty mask and stop. Please always mark a region/mask to continue interactive tclean (if the entire image should be cleaned, draw a box around the entire image), and do not forget to double-click inside the green contours to select the region.
When using interactive tclean, hand-edited cyclethresholds do not change back to the auto-calculated values in the GUI until two major cycles later. However, the logger contains the most accurate information about what was used, and the expected behaviour (of hand-edited cyclethresholds applying to only the current minor cycles) is seen and has been tested. Therefore, iteration control and imaging will proceed as expected. This known issue affects CASA versions 5.6 and 5.7/6.1
In the makemask task, region files using the minus sign ( - ) to create cutouts are known not to work.
The combined imaging of different large MSs where differences in the topocentric frequencies are large compared to the channel width can fail in tclean as a result of memory issues, in particular when imaging in parallel mode. A potential workaround is to use the task concat to first combine the MSs before running tclean, although further testing is needed to ensure that this solution is valid for all cases. This issue will be fixed in a future CASA release.
imregrid¶
Position-velocity (PV) images are not supported by imregrid, because their coordinate systems are nonstandard, lacking a direction coordinate and having a linear coordinate.
When converting between coordinate systems that require rotation (e.g., from celestial to galactic coordinates), CASA is known to introduce deviations in position from other software packages that can amount to several tenths of an arcsec. This could be because the rotation of the rectangular grid in a non-cartesian coordinate system is imperfect, possibly due to internal inconsistencies in the conversion matrices. The conversion between one frame and another in general becomes less accurate as distance from the output image’s reference pixel increases. The imregrid task and Measures tool suffer from this known issue (see the imregrid task page).
viewer¶
The CASA viewer does not yet support all region shapes and parameters at this stage.
For equatorial cubes, i.e., data cubes that include dec=0 (exactly), the viewer only gives spectra for sources at dec > 0. No spectra are produced for any points with dec < 0.
The viewer may not properly open saved region files.
With the new region panel now in use, it may be advisable to rename the $HOME/.casa/viewer/rc file that stores previous configurations of the viewer.
Viewer labels are not shown: this can be caused by a conflict between an installed version of PGPLOT and the version of PGPLOT that comes with the non-root version of CASA. If you have PGPLOT installed in a standard location (e.g., /usr/lib), you may try moving it aside to see if that resolves the problem. If you do encounter this problem, please report it to the CASA team.
Some X11 settings can make the viewer unstable. We identified that the line Load “glx” in /etc/X11/xorg.conf is such a setting. If you don’t need this line for other applications, it is better to remove it.
The viewer can only load MeasurementSets (MS) for which all spectral windows have the same channel width. If this is not the case, an ArrayColumn error will appear. To get around this, use the task split to place the spectral windows of interest in a separate MS, or try the table browser tool.
When exiting CASA after using the viewer, a message similar to the following may appear: proc vtool_1EziEss1P2tH0PxJbGHzzQ is being killed. This is a cosmetic issue and can be ignored.
For some OSs and window managers, parts of the display may eclipse interactive elements. We recommend changing the window manager styles in these cases.
When multiple animators are open, it can happen that they cannot be made active while the ‘Images’ animator is inactive. Activate the ‘Images’ animator first to enable the other animators.
MeasurementSets with sizes of tens of GB may not visualize the full data set properly on all machines, which can give the appearance that part of the data is flagged.
The line tool in the Mac viewer plots unreadable hex numbers.
plotms¶
In plotms, PDF and PS exports have been reported not to work with some older sub-versions of RH7 (e.g., RH 7.6). While the problem has not been completely characterized, upgrading to a newer RH version (e.g., RH 7.9) has been shown to solve this issue.
In RedHat 7, we found that in some circumstances the vertical tab of the viewer appears on the right-hand side instead of the left-hand side. This eclipses the scrollbar and makes it difficult to use. To fix this, add the following to the top of ~/.config/Trolltech.conf:
[Qt]
style=GTK+
When plotting pointing axes in plotms on RHEL6, the tick values of minutes and seconds on the axes are not multiples of 5.
For concatenated data sets, plotms can produce an output error if certain data columns were present in some of the concat input MSs but missing in others (making concat insert zero values). A practical workaround is to either handle the MSs separately, or delete those columns using the tb.removecols tool (but in the latter case one has to take care that the columns are not crucial).
uvmodelfit¶
When running uvmodelfit, the output componentlist does not contain the uncertainty in flux that the task calculates (and displays at the end of the fitting process).
imstat¶
The use of the “centerbox” parameter when specifying a region in imstat has a known issue: under very specific circumstances, fewer pixels are taken into account for the statistics than expected. This only occurs when all of the following are true: (1) values are specified in pixels; (2) the width of the box is an even number of pixels (e.g., 4pix, 16pix, or 100pix); and (3) the box is located away from the image center in Right Ascension (progressively more pixels are dropped when moving away from the image center, but only in RA). The issue is a combination of machine rounding errors (when the boundary of the centerbox is exactly at the center of a pixel) and the fact that centerbox has to convert pixel coordinates to sky coordinates to allow all possible combinations of regions. Note that the “box” parameter is not affected by this, because it can be more strict in only using flat pixel coordinates. As a simple workaround, we recommend always giving the width of the centerbox as an “odd” number of pixels. Please note that because centerbox places the center of a box in the middle of a pixel and CASA only includes full pixels, the width of a centerbox always has an odd number of pixels anyway. For example,
centerbox=[[1000pix,4000pix],[4pix,4pix]] for an 8000x8000 pixel image should give npts=25, but due to the above issue will result in npts < 25. Instead, centerbox=[[1000pix,4000pix],[5pix,5pix]] will always give npts=25.
tb.statistics¶
The table tool’s statistics function tb.statistics currently ignores the useflags parameter, so statistics are calculated for all values in the specified column, and flagged values cannot be avoided.
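Until useflags is honored, flagged values can be excluded by hand, e.g. with numpy (MS and column names are placeholders):

```python
# Compute statistics of unflagged visibility amplitudes directly from the table.
import numpy as np
from casatools import table

tb = table()
tb.open('my.ms')
data = tb.getcol('DATA')   # complex, shape (ncorr, nchan, nrow)
flag = tb.getcol('FLAG')   # boolean, same shape
tb.close()

amp = np.abs(data)
print('mean of unflagged amplitudes:', amp[~flag].mean())
```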
simobserve / simanalyze¶
When using simobserve to simulate a spectral cube, using the inwidth parameter with units of velocity (e.g., km/s) is known to produce wrong results. Use inwidth with frequency units instead (e.g., inwidth=‘1MHz’); see the example at the end of this list.
CASA simulations do not yet fully support all spectral types of components (i.e., ability to include spectral lines or spectral indices)
When cleaning with a simulated MS, it should be considered best practice to declare the phasecenter parameter using the ‘J2000 xx:xx:xx.xxx +xxx.xx.xx.xxx’ notation to account for possible rounding errors that can create an offset in the image.
Corruption of a simulated MS by an atmospheric phase screen is only available from the toolkit. simobserve and sm: under some circumstances, running sm.setnoise and sm.corrupt (or simobserve with thermal noise) twice using the same project name can apply the noise a second time, doubling the noise level. Be sure to use different project names when creating different simulations with noise. See casaguides.nrao.edu for the latest simulation information.
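As an example of the inwidth recommendation above, a cube simulation using frequency units for the channel width (model name and values are placeholders):

```python
# Use frequency units for inwidth; velocity units (e.g. 'km/s') are known to
# produce wrong results.
from casatasks import simobserve

simobserve(project='sim_cube', skymodel='model.image',
           incenter='230GHz', inwidth='1MHz', totaltime='3600s')
```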
single dish¶
sdimaging will crash or create incorrect images if, among the spectra to be imaged, there exist spectra taken at some time t with some antenna, where t falls outside all pointing intervals of that antenna.
Difficulty allocating memory can occur when importing/processing Band 9 (fast-mapped, double-circle) data. Use high-performance machines as a workaround.
Please avoid using spectral window and channel selection by frequency range. It may occasionally fail. So far, this has only been reported on Mac OS but it may happen on Linux, too.
CASA 6.4.0: sdcal crashes if ‘en_US’ is not available in the OS locale. This is known to happen on Ubuntu 20.04 with the so-called “minimal” install option. The underlying cause is that task sdcal expects the presence of ‘en_US’ in the OS locales, but the Ubuntu OS is shipped only with ‘en_US.utf8’ by default, not ‘en_US’. This issue was fixed in the CASA 6.4.3 release.
sdimaging¶
Frequencies and frequency increments in the weight image generated by sdimaging could be slightly different from those in the science image. The difference in frequency resolution (CDELT3 in .fits) could be 0.1% at maximum, which is negligible in most cases. If the difference matters, please use tsdimaging instead although it takes more time to process than sdimaging.
The sdimaging task may fail when more than several MSs are given as inputs (infiles) to create a single output image. This is because the number of file descriptors opened by the task exceeds the limit defined by the OS. You can relax the limit on the number of open file descriptors with a command such as ulimit -n 4096. Note that the typical number of file descriptors opened by the task is 35 per MS.
sdintimaging¶
For task sdintimaging, gridder=’awproject’ is not yet available. It will be enabled in a subsequent release.
plotprofilemap¶
The task intermittently seg faults on Mac OS.
Compatibility¶
The CASA software is available for multiple Linux and Mac computer operating systems, as well as multiple versions of Python. The compatibility matrix below shows the different Operating Systems and Python versions on which current and future CASA versions are expected to work, and for which the CASA team accepts bug reports.
However, note that CASA is only validated against the operational configuration of NRAO instruments (currently RHEL7/Python 3.6).
Further differences exist between the monolithic CASA distribution (which includes all packages and a relocatable copy of Python) and modular CASA (in which the user supplies their own Python environment).
Refer to the following matrix for current and future planned compatibility.
Full Monolithic Distribution

|                | Python 2.7 | Python 3.6 | Python 3.7 | Python 3.8 |
|----------------|------------|------------|------------|------------|
| RHEL 6         | 5.8        | <=6.3      |            |            |
| RHEL 7         | 5.8        | >=6.1      |            | >=6.4      |
| RHEL 8         |            |            |            | >=6.4      |
| Ubuntu 18.04   |            | >=6.2      |            | >=6.4      |
| Ubuntu 20.04   |            | >=6.2      |            | >=6.4      |
| Mac OS 10.14   | 5.8        | >=6.1      |            | <=6.3      |
| Mac OS 10.15   | 5.8        | >=6.1      |            | >=6.3      |
| Mac OS 11 x86  |            | >=6.3      |            | >=6.3      |
| Mac OS 12 ARM* |            |            |            | >=6.4      |
Note
For plotms to work on Mac OS 12, XQuartz needs to be installed.
Modular CASA

|               | Python 3.6 | Python 3.7 | Python 3.8 |
|---------------|------------|------------|------------|
| RHEL 6        | <=6.3      | 6.2        | 6.2        |
| RHEL 7        | >=6.0      | >=6.2      | >=6.2      |
| RHEL 8        | >=6.0      | >=6.4      | >=6.4      |
| Ubuntu 18.04  | >=6.0      | >=6.2      | >=6.2      |
| Ubuntu 20.04  | >=6.0      | >=6.2      | >=6.2      |
| Mac OS 10.14  | >=6.1      |            | <=6.3      |
| Mac OS 10.15  | >=6.1      |            | >=6.3      |
| Mac OS 11 x86 | >=6.3      |            | >=6.3      |
| Mac OS 12 ARM |            |            | >=6.4      |
WARNING: The 6.2.1 module of casatools is not available for Python 3.7.
Notes
Older versions of CASA may not be compatible with the latest Operating Systems (for example, an appropriate usage mode on RedHat8 for CASA versions older than 6.3 has not yet been defined/tested). A listing of previous CASA versions and their supported OSs can be found on the CASA website.
The AppImage software is used to package the GUIs that are part of CASA. Users need to have the FUSE software installed so that these applications can be mounted and CASA can be run in its entirety.
casaplotms became available for all Linux/Python 3.x combinations beginning in 6.1
casaviewer became initially available in 6.2 and fully supported in 6.3
Automated testing¶
Automated tests, used for verification during the development cycle and for new release versions, are run on a number of platforms. These include automated tests for CASA tasks, tools, and auxiliary repos (e.g., Viewer, PlotMS, almatasks), as well as stakeholder tests, regression tests, and mpi tests. Entries marked with a “T” in the following table indicate the configurations on which these automated tests are run for CASA 6.5:
| OS                    | tasks | tools | aux repos | stakeholders | regressions | mpi |
|-----------------------|-------|-------|-----------|--------------|-------------|-----|
| RHEL 8 + Py 3.8       | T     | T     | T         | T            | T           | T   |
| RHEL 7 + Py 3.8       | T     | T     | T         |              |             |     |
| RHEL 7 + Py 3.6       | T     | T     | T         | T            | T           | T   |
| Mac OS 12 M1 + Py 3.8 | T     | T     | N/A       | N/A          |             |     |
| Mac OS 11 + Py 3.8    | T     | T     | N/A       | N/A          |             |     |
| Mac OS 10.15 + Py 3.8 | T     | T     | N/A       | N/A          |             |     |
*Note: mpicasa, which is required for both mpi and stakeholder tests, is not supported for macOS.
Installation¶
A full installation of CASA, including a custom Python environment, is available as a Linux (.tar) or Mac (.dmg) file from our Downloads page (http://casa.nrao.edu/casa_obtaining.shtml).
The CASA 6.x series is also available as modular packages, giving users the flexibility to build CASA tools and tasks in their own Python environment. This includes the casatools, casatasks, and casampi modules, allowing for core data processing capabilities in parallel.
Prerequisite OS Libraries¶
CASA requires certain libraries to be installed in the user’s operating system. Some may already be present by default. In case they are not, the following list should be checked before using CASA or if errors are encountered at runtime. Commands and package names are for Red Hat Linux, but equivalents can be found for other Linux distributions.
A system administrator may be required to install OS libraries. For example, for RHEL8 the following OS libraries should be installed:
$: sudo yum install ImageMagick*
$: sudo yum install xorg-x11-server-Xvfb
$: sudo yum install compat-libgfortran-48
$: sudo yum install libnsl
$: sudo yum install libcanberra-gtk2
For modular CASA, one must supply one’s own Python environment. There are many options, including ipython and Jupyter; here is a basic example:
$: sudo yum install python36-devel    (or python38-devel)
When using the casampi package from modular CASA, additional MPI libraries are needed:
$: sudo yum install openmpi-devel
$: sudo yum install mpich-devel
Install mpi4py with:
$: env MPICC=/usr/lib64/openmpi/bin/mpicc python -m pip install mpi4py --no-cache-dir
Ensure mpirun is found:
which mpirun
If not, add the full path:
export PATH=/usr/lib64/openmpi/bin:$PATH
Alternative method for NRAO systems only
contact the helpdesk to install casa-toolset-3 (which contains the previous libraries)
then run
export PATH=/opt/casa/03/bin:$PATH
Note
Using the modular CASA Viewer with Mac OS requires a special setup step. Download and expand the pgplot installation files on your machine. Then set the following environment variables from a terminal:
export PGPLOT_DIR=<download location>/pgplot
export PGPLOT_FONT=<download location>/pgplot/grfont
Monolithic Distribution¶
On Linux:
Download the .tar file and place it in a work directory (e.g. ~/casa)
From a Linux terminal window:
$: tar -xvf casa-xyz.tar.xz
$: ./casa-xyz/bin/casa
The one caveat is that CASA on Linux currently will not run if the Security-Enhanced Linux option of the Linux operating system is set to enforcing. For the non-root install to work, SELinux must be set to disabled or permissive (in /etc/selinux/config), or you must run (as root): $: setsebool -P allow_execheap=1. Otherwise, you will encounter errors like:
error while loading shared libraries: /opt/casa/casa-20.0.5653-001/lib/liblapack.so.3.1.1: cannot restore segment prot after reloc: Permission denied
WARNING: By default, python 3.6 (and earlier versions of python 3) include the current working directory in the python path at startup. Any script in that directory with the same name as a standard python module or a CASA module will be detected and used by python instead of the code that is delivered with CASA. Protections have been included for files called “new.py” and “pickle.py”, but other scripts may cause problems with the CASA startup. For example, do not include a file named runpy.py in the working directory.
On Macintosh:
Download the .dmg disk image file
Double click on the disk image file (if your browser does not automatically open it).
Drag the CASA application to the Applications folder of your hard disk.
Eject the CASA disk image.
Double click the CASA application to run it for the first time. If the OS does not allow you to install apps from non-Apple sources, please change the settings in “System Preferences -> Security & Privacy -> General” to “Allow applications downloaded from: Mac App Store and identified developers”.
Optional: Create symbolic links to the CASA version and its executables (administrator privileges are required), which will allow you to run casa, casaviewer, casaplotms, etc. from any terminal command line. To do so, run !create-symlinks
Modular Packages¶
Pip wheels for casatools and casatasks are available as Python 3 modules. This allows simple installation and import into standard Python environments. The casatools wheel is necessarily a binary wheel so there may be some compatibility issues for some time as we work toward making wheels available for important Python configurations.
Make sure you have set up your machine with the necessary prerequisite libraries first. Then install the desired modules à la carte (from a Linux terminal window) as follows:
$: python3 -m venv myvenv
$: source myvenv/bin/activate
(myvenv) $: pip install --upgrade pip wheel
Now pick whichever subset of the available CASA packages you are interested in. Package dependencies are handled automatically by pip, with the exception of casadata which must be explicitly installed and updated by the user (see External Data). The following packages are available:
(myvenv) $: pip install casatools==6.5.2.26
(myvenv) $: pip install casatasks==6.5.2.26
(myvenv) $: pip install casaplotms==1.8.7
(myvenv) $: pip install casaviewer==1.6.6
(myvenv) $: pip install casampi==0.5.01
(myvenv) $: pip install casashell==6.5.2.26
(myvenv) $: pip install casadata==2022.9.5
(myvenv) $: pip install casaplotserver==1.4.6
(myvenv) $: pip install almatasks==1.5.2
(myvenv) $: pip install casatestutils==6.5.2.26
Note for Mac M1 users: For macOS 12 on an ARM-based M1 chip, users will need to install the Mac OS 11 wheels of CASA built for the x86 architecture. For that, we recommend using the following command to pip install the CASA wheels:
(myvenv) $: arch -x86_64 python3 -m pip install ...
Users are advised to use a Python virtual environment (venv) and specific module version numbers as shown above. Giving an invalid number (like 999) to the pip install command is an effective way to list all available version numbers.
List all available versions of a module (a hack):
(myvenv) $: pip install casatasks==999
ERROR: Could not find a version that satisfies the requirement casatasks==999 (from versions: 6.0.0.27, 6.2.0.124, 6.2.1.7, 6.3.0.48)
Start Python in your venv and sanity check:
(myvenv) $ python
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import casatasks
>>> help(casatasks)
To exit the python venv, type deactivate from the terminal. However, the rest of this documentation assumes the venv is active (to reactivate, type source myvenv/bin/activate).
The use of python3 venv is a simple built-in method of containerizing the pip install such that multiple versions of CASA 6.x can be kept on a single machine in different environments. In addition, CASA is built and tested using standard (python 3.6) libraries which can be replicated with a fresh venv, keeping the libraries needed for CASA isolated from other libraries which may already be installed on your machine.
With the pip installation, CASA may be used in a standard Pythonic manner. Examples can be found in this Jupyter Notebook.
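As a minimal example of this Pythonic usage (placeholder MS name):

```python
# Run a CASA task from a plain Python session inside the venv.
from casatasks import listobs

listobs(vis='my.ms', listfile='my_listobs.txt', overwrite=True)
```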
WARNING: The pip-wheel version is intended for manual data processing, and is not yet officially endorsed by ALMA or VLA. Currently, pipelines are included in -and tested only for- all-inclusive monolithic CASA distributions.
Parallel Processing Setup
The casampi package provides the task-level MPI parallelization infrastructure of CASA. The casatasks module detects when casampi is available and enables the parallel processing capabilities of CASA. Advanced users may also access the casampi package directly to build new or custom parallelization schemes.
Make sure you have installed the prerequisite OS libraries for parallel processing. To test for correct modular MPI installation, run the following commands (from Linux terminal):
(myvenv) $ echo "from casampi.MPIEnvironment import MPIEnvironment; print('working?', MPIEnvironment.is_mpi_enabled)" > test.py
(myvenv) $ mpirun -q -n 2 python test.py
You should observe two instances of “working? True”.
Performance¶
CASA is now running performance benchmarks against a subset of the casatasks API to track various runtime metrics over the development history of the project, starting in CASA 6. See the Performance Benchmark webpages for an interactive view of the latest test results.
casabench¶
The tests are implemented using airspeed-velocity with configuration and results tracked by a separate repository. Automated deployment is coordinated with Bamboo and confined to a single computer node.
The dedicated testing machine has eight Intel(R) Xeon(R) CPUs (E5-2670 @ 2.60GHz), 264GB of DDR3 SDRAM (36KSF2G72PZ-1G6P1 @ 1600 MT/s), 500GB Crucial(R) internal SSD (MX500), and runs Red Hat Enterprise Linux Workstation release 7.9 against the Linux kernel version 3.10.0-1160.71.1.el7.x86_64.
Tests currently run for each CASA6 pre-release package once other verification steps are complete. New tests and test cases will be added as development continues in CASA.
Index¶
API¶
External Interface definition of CASA. This section is verified prior to each release.
almatasks¶
ALMA specific routines with separate build dependencies.
Generate a gain table based on Water Vapour Radiometer data
casadata¶
Routines for handling external data dependencies in CASA. These functions are needed to manipulate the runtime data necessary for proper CASA operation.
update-data¶
update-data
Command line application bundled in monolithic CASA. Takes no inputs. Updates the default casadata package installation to the latest version. Callable from within the CASA shell via:
CASA <1>: !update-data
update-user-data¶
update-user-data
Runtime argument passed to Python when calling the casatools module directly. Updates the casadata contents at the specified location to match the current contents of the casadata repository. If no location is specified, defaults to the location pointed to by the rundata variable in config.py.
bash$ python -m casatools --update-user-data=/tmp/mydata
bash$ python -m casatools --update-user-data
casalith¶
CASA monolithic environment bundling Python and library dependencies into a single download package.
tasks¶
A few remaining tasks are found only in the monolithic environment:
Browse a table (MS, calibration table, image)
Grid the visibility data onto a defined uniform grid (in the form of an MS); multiple MSs can be gridded onto the same grid
executables¶
The following executable applications are located in the <casa release>/bin directory of the expanded monolithic CASA tarball:
python3 (v3.6.7)
pip3 (v9.0.1)
2to3
casa
mpicasa
casaviewer
buildmytasks
python libraries¶
The following third-party libraries are included in the Python distribution of monolithic CASA and available as imports:
libraries: attrs==19.3.0, backcall==0.1.0, certifi==2020.12.5, cycler==0.10.0, decorator==4.4.2, grpcio==1.29.0, importlib-metadata==1.6.0, ipython==7.15.0, ipython-genutils==0.2.0, jedi==0.17.0, kiwisolver==1.2.0, matplotlib==3.2.1, more-itertools==8.3.0, mpi4py==3.0.3, numpy==1.18.4, packaging==20.4, parso==0.7.0, pexpect==4.8.0, pickleshare==0.7.5, pluggy==0.13.1, prompt-toolkit==3.0.5, protobuf==3.12.2, ptyprocess==0.6.0, py==1.8.1, pyfits==3.5, Pygments==2.6.1, pyparsing==2.4.7, pytest==5.4.2, python-dateutil==2.8.1, pytz==2020.1, scipy==1.4.1, six==1.15.0, traitlets==4.3.3, wcwidth==0.2.2, zipp==3.1.0
Note that each component in the modular CASA distribution uses a subset of these same dependencies.
The definition is provided here in pip-compatible format, so that one could save the preceding list (one requirement per line) to a list.txt file and recreate the set using:
pip install -r list.txt
casaplotms¶
Routines for accessing the PlotMS application through Python.
A plotter/interactive flagger for visibility data.
casashell¶
CASA shell environment for interactive Python-based analysis using CASA tasks.
buildmytasks¶
buildmytasks()
An executable application included within the casashell module for creating custom CASA tasks.
- Description
How to create your own CASA tasks using the buildmytasks executable included alongside the casashell environment.
Warning
This prescription for writing and incorporating tasks in CASA is for the power-user. This procedure may also change in future releases.
The Basics
It is possible to write your own task and have it appear in CASA. For example, if you want to create a task named yourtask, you must create two files, yourtask.xml and task_yourtask.py. The .xml file is used to describe the interface to the task, and task_yourtask.py does the actual work. The argument names must be the same in both the yourtask.xml and task_yourtask.py files. The yourtask.xml file is used to generate all the interface files so that yourtask will appear in the CASA system. It is easiest to start from one of the existing tasks when constructing these. In this example, the function defined in task_yourtask.py must be named yourtask.
We have provided the buildmytasks command in order to assemble your Python and XML into a loadable Python file. The steps you need to execute (again, for an example task named "yourtask") are:
Create python code for task as task_yourtask.py
Create XML for task as yourtask.xml
Execute buildmytasks
Import your new task into CASA
We will work through these steps now with the assumption that you already have your XML file and task implementation. If you need to create these from scratch, the documentation below will provide help.
The first thing that you need to do is add the bin directory for CASA to your path:
#Setup your environment for Linux
-bash$ cd casa-6.2.0-94/bin
-bash$ PATH=`pwd`:$PATH
If your XML file is from CASA 5, then it needs to be updated for CASA 6:
#Upgrading the XML in <your-development-path>
-bash$ cd <your-development-path>
-bash$ buildmytasks --upgrade yourtask.xml
upgrading yourtask.xml
-bash$
This step only needs to be done once. The old version is stored in yourtask.xml.bak. The update is done with an XML processor which modifies the XML without changing the content. However, if you had large sections of comments, you should copy these from yourtask.xml.bak back into the updated yourtask.xml, since comments are not retained in the conversion. A discussion of XML changes between CASA 5 and CASA 6 can be found on the casatools Bitbucket page.
In CASA 6, buildmytasks generates tasks that are designed to be inside of a Python package. You should decide what you want your package to be called, create it, and copy your XML file into it:
#Create a package
-bash$ mkdir -p yourpkg/private
-bash$ cp yourtask.xml yourpkg
-bash$ cp task_yourtask.py yourpkg/private
-bash$ cd yourpkg
Now buildmytasks can be used to create yourtask along with the code needed to support inp/go/etc:
#Generate task
-bash$ buildmytasks --module yourpkg yourtask.xml
generating task for yourtask.xml
generating 'go task' for yourtask.xml
-bash$
This adds yourtask to the yourpkg package, but you still have to export yourtask to make it accessible to users:
#Export task
-bash$ echo '__name__ = "yourpkg"' > __init__.py
-bash$ echo '__all__ = [ "yourtask" ]' >> __init__.py
-bash$ echo 'from .yourtask import yourtask' >> __init__.py
At this point, you should find a yourtask.py in the current directory and a gotasks subdirectory with the inp/go implementation inside it. The commands we just executed created a minimal initialization file for yourpkg, and we can now test our new task:
#Test new task
-bash$ cd ..
-bash$ ls -p yourpkg
gotasks/ yourtask.py yourtask.xml __init__.py private/
-bash$ casa
CASA <1>: sys.path.insert(0,'.')
CASA <2>: from yourpkg.gotasks.yourtask import yourtask
CASA <3>: inp(yourtask)
This should display the help and inputs for yourtask inside CASA. Now you can set the parameters with inp, reset the defaults with default, save and restore parameters with tput and tget and run the task with go. The location of yourpkg must be in your PYTHONPATH; the first CASA <1> command added the current directory to the path used for imports.
If you have other tasks, you can put their XML files in yourpkg. Generate the bindings with buildmytasks, and then edit __init__.py to export any that you want the user to have available.
The XML file
The key to getting your task into CASA is constructing a task interface description XML file.
Some XML basics: an XML element begins with <element> and ends with </element>. If an XML element contains no other XML elements, you may specify it as <element/>. An XML element may have zero or more attributes, which are specified as attribute="attribute value". You must put the attribute value in quotes, i.e. <element myattribute="attribute value">.
All task XML files must start with this header information:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" ?>
<casaxml xmlns="http://casa.nrao.edu/schema/psetTypes.html" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://casa.nrao.edu/schema/casa.xsd file:///opt/casa/code/xmlcasa/xml/casa.xsd">
and the file must end with the tag
</casaxml>
Inside the <task> tag you will need to specify the following elements:
<task>
Attributes: type - required, allowed value is "function"; name - required
Subelements: <shortdescription> - required; <description> - required; <input> - optional; <output> - optional; <returns> - optional; <constraints> - optional
<shortdescription> - required by <task>. A short one-line description describing your task.
Attributes: none
Subelements: none
<description> - required by <task>; also used by <param>. A longer description describing your task, possibly over multiple lines.
Attributes: none
Subelements: none
<input> - optional element used by <task>. An input block specifies which parameters are used for input.
Attributes: none
Subelements: <param> - optional
<output> - optional. An output element that contains a list of parameters that are "returned" by the task.
Attributes: none
Subelements: <param> - optional
<returns> - optional. Value returned by the task.
Attributes: type - optional; as specified in <param>
Subelements: <description> - optional
<constraints> - optional. A constraints element that lets you constrain params based on the values of other params.
Attributes: none
Subelements: <when> - required
<param> - optional. The input and output elements consist of param elements.
Attributes: type - required, allowed values are record, variant, string, int, double, bool, intArray, doubleArray, boolArray, stringArray; name - required; subparam - optional, allowed values True, False, Yes or No; kind - optional; mustexist - optional, allowed values True, False, Yes or No. All param elements require name and type attributes.
Subelements: <description> - required; <value> - optional; <allowed> - optional
<value> - optional. Value returned by the task.
Attributes: type - required; as specified in the <param> attributes
Subelements: <value> - optional
<allowed> - optional. Block of allowed values.
Attributes: kind - required; may be enum or range. If specified as enum, only specific values are allowed. If specified as range, then the value tags may have min and max attributes.
Subelements: <value> - optional
<when> - optional. When blocks allow value-specific handling for parameters.
Attributes: param - required; specifies special handling for a <param>
Subelements: <equals> - optional; <notequals> - optional
<equals> - optional. Reset parameters if equal to the specified value.
Attributes: value - required; the value of the parameter
Subelements: <default> - required
<notequals> - optional. Reset specified parameters if not equal to the specified value.
Attributes: value - required; the value of the parameter
Subelements: <default> - optional
<default> - optional. Resets default values for specified parameters.
Attributes: param - required; name of the <param> to be reset
Subelements: <value> - required; the revised value of the <param>
<example> - optional. An example block, typically in Python.
Attributes: lang - optional; specifies the language of the example, defaults to python
Subelements: none
The task yourtask.py file
You must write the Python code that does the actual work. The task_*.py file function call sequence must be the same as specified in the XML file. We may relax the requirement that the function call sequence exactly match the sequence in the XML file in a future release.
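As an illustration, a minimal task_yourtask.py might look like the following sketch. The parameter names (vis, factor) are hypothetical and must match those declared in yourtask.xml:
# task_yourtask.py: minimal illustrative task implementation (hypothetical parameters)
from casatools import casalog

def yourtask(vis='', factor=1.0):
    # a real task would operate on the MS named by vis
    casalog.post('yourtask called with vis=%s, factor=%s' % (vis, factor), 'INFO')
    return True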
default¶
default(taskname)
Reset task parameter values to the task's default parameter values. If given a taskname, sets taskname as the current active (default) task.
- Parameters
taskname (obj, string, or None) - task object or task name. None will use current active (default) task.
- Description
Each task has a special set of default values defined for its parameters. You can use the default() command to reset the parameters for a specified task (or the current active task) to their default.
The default() command resets the values of the task parameters to a set of "defaults" as specified in the task code. Some defaults are blank strings '' or empty lists [], others are specific numerical values, strings, or lists. It is important to understand that just setting a string parameter to an empty string '' is not setting it to its default! Some parameters do not have a blank as an allowed value. See the help for a particular task to find out its default. If '' is the default or an allowed value, it will say so explicitly.
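For example, to reset all tclean parameters and make tclean the current active task:
CASA <1>: default('tclean')
CASA <2>: inp()    # all tclean parameters are now at their default values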
doc¶
doc(topic, version)
Open a web browser pointing to the API documentation for the given named task or tool.
- Parameters
topic (string) - name of task or tool or “start” or “toc”
version (string) - a casadocs version string, defaults to the casa version being used
- Description
Each task has built in inline help that can be seen with the standard Python help command. However, given the complexity of many tasks, with lengthy descriptions, images, tables, and large numbers of parameters and subparameters (not a standard Python feature), the web-based casadocs API documentation provides more functionality.
The doc command will open the OS default browser and direct it to the casadocs page for the given task name.
When the topic is a tool with API documentation the doc command will direct the browser to the casadocs page for the given tool name. Not all CASA tools have API documentation.
Using “start” for the topic will direct the browser to the top of the CASA documentation.
An empty argument, “toc”, or an unrecognized topic directs the browser to the casatasks API page.
If the documentation for the current version cannot be found, doc will try to use the CASA documentation for the "latest" version. If no documentation can be found, doc will direct the browser to the CASA home page. Note: "latest" is the most recent version of the documentation and typically corresponds to the version under development (not yet released).
A warning message is printed when doc cannot use the documentation for the current release.
The version argument can be used to direct doc to use something other than the documentation for the version of CASA being used. Version strings typically look like "v6.4.1", with the numbers corresponding to the first 3 elements of the ctsys.version() value. A value of "latest" is used to find the documentation of the most recent version of casadocs, and "stable" is used to find the documentation of the most recent release.
If the documentation for a specific version is requested and cannot be found, the browser will be directed to the top of the CASA site.
A warning message is printed when doc cannot use the documentation for the requested release.
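For example:
CASA <1>: doc('tclean')              # open the tclean task documentation
CASA <2>: doc('start')               # open the top of the CASA documentation
CASA <3>: doc('tclean', 'v6.4.1')    # open the tclean documentation for a specific version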
execfile¶
execfile(filename, globals=globals())
Execute a file.
- Parameters
filename (string) - name of file to execute
globals (dictionary) - the environment for evaluation
- Description
Python 3 removed the execfile builtin function. CASA provides a convenience function that attempts to reproduce the behavior of the Python 2.7 builtin execfile function. execfile evaluates the contents of filename in the environment specified by globals.
When execfile is used within the filename being evaluated, it is necessary to add globals() as the second argument to those execfile calls in order for the secondary script to know about the global variables of the calling script. For example, within a script 'mainscript.py', calls to another script 'myscript.py' should be written as execfile('myscript.py', globals()).
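For example, a minimal sketch (the script names and variable are illustrative):
# mainscript.py
x = 42
execfile('myscript.py', globals())   # myscript.py can now see the global x

# myscript.py
print(x)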
go¶
go(taskname=None)
Execute the given task using parameters from the workspace. If given a taskname, sets taskname as the current active (default) task.
If the task is successfully executed, then a <taskname>.last file is created in the working directory containing the parameter values.
- Parameters
taskname (obj, string, or None) - task object or task name. None will use current active (default) task.
- Description
You can execute a task using the go() command, either explicitly
CASA <44>: go listobs
---------> go(listobs)
Executing: listobs() ...
or implicitly if the active (default) task has already been set (e.g. by previous use of default() or inp())
CASA <45>: inp tclean
CASA <46>: go()
---------> go()
Executing: tclean() ...
You can also execute a task simply by typing the taskname.
CASA <46>: tclean
---------> tclean()
Executing: tclean() ...
inp¶
inp(taskname=None)
Inspect the parameter values of the given task. If given a taskname, sets taskname as the current active (default) task.
- Parameters
taskname (obj, string, or None) - task object or task name. None will use current active (default) task.
- Description
You can set the values for the parameters for tasks (but currently not for tools) by performing the assignment within the CASA shell and then inspecting them using the inp() command. This command can be invoked in any of three ways: via function call inp('<taskname>') or inp(<taskname>); without parentheses, inp '<taskname>' or inp <taskname>; or using the current active (default) task with inp(). For example,
CASA <1>: inp('tclean')
...
CASA <2>: inp 'tclean'
----------> inp('tclean')
...
CASA <3>: inp(tclean)
...
CASA <4>: inp tclean
----------> inp(tclean)
...
CASA <5>: inp()
----------> inp()
all do the same thing (the final example shows the parameters for the current active task, which is tclean here since the previous line set the current active task to tclean).
When you invoke the task inputs via inp(), you see a list of the parameters, their current values, and a short description of what each parameter does. For example, starting from the default values,
CASA <18>: inp('tclean')
vis = '' # Name of input visibility file(s)
selectdata = True # Enable data selection parameters
field = '' # field(s) to select
spw = '' # spw(s)/channels to select
timerange = '' # Range of time to select from data
uvrange = '' # Select data within uvrange
antenna = '' # Select data based on antenna/baseline
scan = '' # Scan number range
observation = '' # Observation ID range
intent = '' # Scan Intent(s)
datacolumn = 'corrected' # Data column to image(data,corrected)
imagename = '' # Pre-name of output images
imsize = [] # Number of pixels
cell = [] # Cell size
phasecenter = '' # Phase center of the image
stokes = 'I' # Stokes Planes to make
projection = 'SIN' # Coordinate projection
startmodel = '' # Name of starting model image
specmode = 'mfs' # Spectral definition mode (mfs,cube,cubedata,cubesource)
reffreq = '' # Reference frequency
gridder = 'standard' # Gridding options (standard, wproject, widefield, mosaic, awproject)
vptable = '' # Name of Voltage Pattern table
pblimit = 0.2 # PB gain level at which to cut off normalizations
deconvolver = 'hogbom' # Minor cycle algorithm (hogbom,clark,multiscale,mtmfs,mem,clarkstokes)
restoration = True # Do restoration steps (or not)
restoringbeam = [] # Restoring beam shape to use. Default is the PSF main lobe
pbcor = False # Apply PB correction on the output restored image
outlierfile = '' # Name of outlier-field image definitions
weighting = 'natural' # Weighting scheme (natural,uniform,briggs,briggsabs[experimental])
uvtaper = [] # uv-taper on outer baselines in uv-plane
niter = 0 # Maximum number of iterations
usemask = 'user' # Type of mask(s) for deconvolution: user, pb, or auto-multithresh
mask = '' # Mask (a list of image name(s) or region file(s) or region string(s) )
pbmask = 0.0 # primary beam mask
fastnoise = True # True: use the faster (old) noise calculation. False: use the new improved noise calculations
restart = True # True : Re-use existing images. False : Increment imagename
savemodel = 'none' # Options to save model visibilities (none, virtual, modelcolumn)
calcres = True # Calculate initial residual image
calcpsf = True # Calculate PSF
parallel = False # Run major cycles in parallel
The Figure below shows how this will look to you on your terminal. Note that some parameters are in boldface with a gray background. This means that some values for this parameter will cause it to expand, revealing new sub-parameters to be set. Some default values cause the related sub-parameters to be revealed.
CASA uses color and font to indicate different properties of parameters and their values:
Parameters:
Text Font | Text Color | Highlight | Indentation | Meaning
plain | black | none | none | standard parameter
bold | black | grey | none | expandable parameter
plain | green | none | yes | sub-parameter
Values:
Text Font | Text Color | Highlight | Indentation | Meaning
plain | black | none | none | default value
plain | blue | none | none | non-default value
plain | red | none | none | invalid value
The Figure below shows what happens when you set some of the tclean parameters to non-default values. Some have opened up sub-parameters, which can now be seen and set. Some have closed sub-parameters because that non-default value has no related sub-parameters. The Figure thereafter shows what happens when you set a parameter to an invalid value. Its value now appears in red. Reasons for invalidation include an incorrect type, an invalid menu choice, or a filename that does not exist. For example, since vis expects a filename, it will be invalidated (red) if it is set to a non-string value, or a string that is not the name of a file that can be found. The deconvolver value is invalid because it's not a supported choice ('hogbom', 'clark', 'multiscale', 'mtmfs', 'mem', 'clarkstokes').
The tclean inputs after setting values away from their defaults (blue text). Note that some of the boldface ones have opened up new dependent sub-parameters (indented and green).
The tclean inputs where one parameter has been set to an invalid value. This is drawn in red to draw attention to the problem. This hapless user probably confused the ‘hogbom’ clean algorithm with Harry Potter.
saveinputs¶
saveinputs(taskname=None, outfile=None)
Save current task parameters to a file. If given a taskname, sets taskname as the current active (default) task.
- Parameters
taskname (obj, string, or None) - task object or task name. None will use current active (default) task.
outfile (string or None) - output file name, None will use current active (default) taskname.
- Description
saveinputs is a synonym for tput. See tput for the full documentation.
tget¶
tget(taskname=None, savefile='')
Recover saved values of the inputs to a task. If given a taskname, sets taskname as the current active (default) task.
- Parameters
taskname (obj , string, or None) - task object or task name. None will use current active (default) task.
savefile (str) - input file for the task inputs. default: <taskname>.last, then <taskname>.saved. example: savefile='tclean.orion'
- Description
This is a convenient way to retrieve the parameters used in a previous task invocation. Typing tget without a taskname will recover the saved parameter values for the task that is currently active (default). If a task (or task name) is provided for the taskname parameter, e.g. tget <task>, that task will become the active task and the parameter values will be restored for it.
The previous task parameter values are stored in files. By default, they are retrieved based upon the name of the task. This is done by searching for
a <taskname>.last file
a <taskname>.saved file
and then executing the Python in these files. For example,
default('gaincal')   #set current active task to gaincal and default
tget                 #read saved inputs from gaincal.last (or gaincal.saved)
inp()                #see these inputs!
tget bandpass        #now get from bandpass.last (or bandpass.saved)
inp()                #task is now bandpass, with recovered inputs
The savefile parameter can be used to cause tget to retrieve parameter values from a file with a different name. Supplying both the taskname and savefile parameters makes the specified task the active task and loads the values saved in the specified savefile, for example,
tget(gaincal,"ngc-calib.last")
If the taskname parameter is omitted, the active task is used. For example
default(tclean)
tget(savefile='good-clean.last')
Here, the active task is set with default(<task>) before loading the parameter values with tget.
Note: tget does not check whether the parameters in a named savefile came from the taskname or active task.
tput¶
tput(taskname=None, outfile='')
Save the current parameter values of a task to a file. If given a taskname, sets taskname as the current active (default) task.
- Parameters
taskname (obj, string, or None) - task object or task name. None will use current active (default) task.
outfile (string) - output file name. default: <taskname>.last. example: outfile='tclean.orion'
- Description
The tput command will save the current parameter values of a task to a Python (plain ASCII) file. It can take up to two arguments, e.g.
tput(taskname, outfile)
The first is the usual taskname parameter. The second is the name for the output Python file. If there is no second argument, for example,
tput('tclean')
a file with the name <taskname>.last (in this case 'tclean.last') will be created, or overwritten if it already exists.
If invoked with no arguments, e.g.
tput
it will use the current active taskname (for example as set using inp <taskname> or default <taskname>).
saveinputs is a synonym for tput.
For example, starting from default values
CASA <1>: default('listobs')
CASA <2>: tput
CASA <3>: !more 'listobs.last'
vis = ''
selectdata = True
spw = ''
field = ''
antenna = ''
uvrange = ''
timerange = ''
correlation = ''
scan = ''
intent = ''
feed = ''
array = ''
observation = ''
verbose = True
listfile = ""
listunfl = False
cachesize = 50.0
overwrite = False
#listobs(vis='',selectdata=True,spw='',field='',antenna='',uvrange='',timerange='',correlation='',scan='',intent='',feed='',array='',observation='',verbose=True,listfile='',listunfl=False,cachesize=50.0,overwrite=False)
An example save to a custom named file:
tput('listobs','ngc5921_listobs.par')
This is a counterpart to tget. Typing tput without a taskname will save the values of the inputs for the current active (default) task. Adding a task name, e.g. tput <taskname>, will save the values for the specified task. For example,
default('gaincal')   #set current task to gaincal and default
tget                 #read saved inputs from gaincal.last (or gaincal.saved)
inp()                #see these inputs!
vis = 'new.ms'       #change the vis parameter
tput                 #save back to the gaincal.last file for later use
Running Tasks and Tools
Tools are functions linked to the Python interface which must be called by name with arguments. Tasks have higher-level capabilities than tools. Tasks require input parameters which may be specified when you call the task as a function, or set as parameters in the interface. A task, like a tool, is a function under Python and may be written in Python, C, or C++ (the CASA toolkit is made up of C++ functions).
There are two distinct ways to run tasks. You can either call the task as a function with one or more arguments specified, or set the global CASA parameters relevant to the task and tell the task to go(). These two invocation methods differ in whether the global parameter values are used or not.
One may call tasks and tools by name with parameters set on the same line. Parameters may be set either as explicit <parameter>=<value> arguments, or as a series of comma-delimited <value>s in the correct order for that task or tool.
Note that missing parameters will use the default values for that task. For example, the following are equivalent:
#Specify parameter names for each keyword input:
plotms(vis='ngc5921.ms',xaxis='channel',yaxis='amp',datacolumn='data')
#when specifying the parameter name, order doesn't matter, e.g.:
plotms(xaxis='channel',vis='ngc5921.ms',datacolumn='data',yaxis='amp')
#use parameter order for invoking tasks
plotms('ngc5921.ms',1,1,0,0,0,'channel','data','amp','data')
This non-use of globals when calling as a function is so that robust scripts can be written. One need only cut-and-paste the calls and need not worry about the state of the global variables or what has been run previously. It is also more like the standard behavior of function calls in Python and other languages.
One may also invoke the tasks as follows:
default('plotms')
vis='ngc5921.ms'
xaxis='channel'
yaxis='amp'
ydatacolumn='data'
plotms()
Although plotms() is still invoked as a function here, it is called with no arguments and will thus use the global parameter values.
Alternatively, one can use inp/go to manually execute a task using an interface format. For example,
inp('plotms')
vis='ngc5921.ms'
xaxis='channel'
yaxis='amp'
datacolumn='data'
go()
will execute plotms with the set values for the parameters, which will appear in the terminal when re-typing ‘inp’.
Aborting Tasks
If you are running CASA tasks, you can usually use CTRL-C to abort execution of the task. If this does not work, try CTRL-Z followed by a kill.
You may have to quit and restart CASA after an abort, as the internal state can get mixed up.
Getting Return Values
Some tasks and tools return a record (usually a Python dictionary) to the interface. For example, the imstat task returns a dictionary with the image statistics in it. To catch these return values into a Python variable, you MUST assign the task call to that variable, e.g.
xstat = imstat('ngc5921.clean.image')
or
default('imstat')
imagename = 'ngc5921.clean.image'
xstat = imstat()
You can print or use the return value in Python for controlling scripts. For example,
CASA <1>: xstat = imstat('ngc5921.clean.image')
CASA <2>: xstat
Out[2]:
{'blc': array([0, 0, 0, 0]),
'blcf': '15:24:08.404, +04.31.59.181, I, 1.41281e+09Hz',
'flux': array([ 4.15292207]),
'max': array([ 0.05240594]),
'maxpos': array([134, 134, 0, 38]),
'maxposf': '15:21:53.976, +05.05.29.998, I, 1.41374e+09Hz',
'mean': array([ 1.62978083e-05]),
'medabsdevmed': array([ 0.00127287]),
'median': array([ -1.10467618e-05]),
'min': array([-0.0105249]),
'minpos': array([160, 1, 0, 30]),
'minposf': '15:21:27.899, +04.32.14.923, I, 1.41354e+09Hz',
'npts': array([ 3014656.]),
'quartile': array([ 0.00254587]),
'rms': array([ 0.00201818]),
'sigma': array([ 0.00201811]),
'sum': array([ 49.1322855]),
'sumsq': array([ 12.27880404]),
'trc': array([255, 255, 0, 45]),
'trcf': '15:19:52.390, +05.35.44.246, I, 1.41391e+09Hz'}
CASA <3>: myrms = xstat['rms'][0]
CASA <4>: print(10.0*myrms)
0.0201817648485
If you do not catch the return variable, it will be lost
imstat('ngc5921.clean.image')
or
default('imstat')
imagename = 'ngc5921.clean.image'
imstat()
and spewed to the terminal. Note that go() will trap and lose the return value, e.g.
default('imstat')
imagename = 'ngc5921.clean.image'
go()
will not dump the return to the terminal either.
Setting Parameters and Invoking Tasks
One can set parameters for tasks (but not for tools) by performing the assignment within the CASA shell and then inspecting them using the inp() command:
CASA <30>: default(bandpass)
CASA <31>: vis = 'ngc5921.demo.ms'
CASA <32>: caltable = 'ngc5921.demo.bcal'
CASA <33>: field = '0'
CASA <34>: refant = '15'
CASA <35>: inp('bandpass')
#bandpass :: Calculates a bandpass calibration solution
vis = 'ngc5921.demo.ms' #Name of input visibility file
caltable = 'ngc5921.demo.bcal' #Name of output gain calibration table
field = '0' #Select field using field id(s) or field
#name(s)
spw = '' #Select spectral window/channels
intent = '' #Select observing intent
selectdata = True #Other data selection parameters
timerange = '' #Select data based on time range
uvrange = '' #Select data within uvrange (default units meters)
antenna = '' #Select data based on antenna/baseline
scan = '' #Scan number range
observation = '' #Select by observation ID(s)
msselect = '' #Optional complex data selection (ignore for now)
solint = 'inf' #Solution interval in time[,freq]
combine = 'scan' #Data axes which to combine for solve (obs, scan, spw, and/or field)
refant = '15' #Reference antenna name(s)
minblperant = 4 #Minimum baselines _per antenna_ required for solve
minsnr = 3.0 #Reject solutions below this SNR (only applies for bandtype = B)
solnorm = False #Normalize average solution amplitudes to 1.0
bandtype = 'B' #Type of bandpass solution (B or BPOLY)
fillgaps = 0 #Fill flagged solution channels by interpolation
smodel = [] #Point source Stokes parameters for source model.
append = False #Append solutions to the (existing) table
docallib = False #Use callib or traditional cal apply parameters
gaintable = [] #Gain calibration table(s) to apply on the fly
gainfield = [] #Select a subset of calibrators from gaintable(s)
interp = [] #Interpolation mode (in time) to use for each gaintable
spwmap = [] #Spectral windows combinations to form for gaintables(s)
parang = False #Apply parallactic angle correction
All task parameters have global scope within CASA: the parameter values are common to all tasks and also at the CASA command line. This allows the convenience of not changing parameters that are shared between tasks, but does require care when chaining together sequences of task invocations (to ensure proper values are provided). Some task parameters accept a specific string from a collection of possible strings. These parameters are generally case sensitive, but sometimes both fully upper and fully lower case versions of the same string may be accepted (e.g. datacolumn='data' or 'DATA'). When in doubt, use the case specified in the default setting of that parameter.
If you want to reset the input keywords for a single task, use the default() command. For example, to set the defaults for the bandpass task, type:
CASA <30>: default('bandpass')
To inspect a single parameter value just type it at the command line. Continuing the above example:
CASA <36>: combine
Out[14]: 'scan'
The scope of parameters in CASA
By default the scope of CASA parameters is global. However, if you call a task as a function with one or more arguments specified, e.g. task(arg1=val1,...), then non-specified parameters will be defaulted and no globals used. This makes scripting more robust. Tasks DO NOT change the value of globals.
This does mean that unless you do an explicit default of the task, previously set values may be unexpectedly used if you do not inspect the inp() carefully. For example, good practice is:
default('imhead')
imagename = 'ngc5921.demo.cleanimg.image'
mode = 'list'
imhead()
If you supply the task call with arguments, then these will be used for the values of those parameters. However, if some but not all arguments are supplied, then those parameters not given as arguments will default and NOT use the current global values. Thus,
imhead('ngc5921.demo.cleanimg.image',mode='list')
will reproduce the above.
For example, suppose we have been running CASA on a particular dataset, e.g.
CASA <26>: inp tclean
---------> inp(tclean)
# tclean -- Radio Interferometric Image Reconstruction
vis = 'ngc5921.demo.src.split.ms.contsub' # Name of input visibility file(s)
selectdata = True # Enable data selection parameters
field = '0' # field(s) to select
spw = '' # spw(s)/channels to select
timerange = '' # Range of time to select from data
uvrange = '' # Select data within uvrange
antenna = '' # Select data based on antenna/baseline
scan = '' # Scan number range
observation = '' # Observation ID range
intent = '' # Scan Intent(s)
datacolumn = 'corrected' # Data column to image(data,corrected)
imagename = 'ngc5921.demo.cleaning' # Pre-name of output images
imsize = [] # Number of pixels
cell = [15.0, 15.0] # Cell size
...
and now we wish to switch to a different one. We can reset the parameter values using default():
CASA <27>: default()
CASA <28>: inp()
# tclean -- Radio Interferometric Image Reconstruction
vis = '' # Name of input visibility file(s)
selectdata = True # Enable data selection parameters
field = '' # field(s) to select
spw = '' # spw(s)/channels to select
timerange = '' # Range of time to select from data
uvrange = '' # Select data within uvrange
antenna = '' # Select data based on antenna/baseline
scan = '' # Scan number range
observation = '' # Observation ID range
intent = '' # Scan Intent(s)
datacolumn = 'corrected' # Data column to image(data,corrected)
imagename = '' # Pre-name of output images
imsize = [] # Number of pixels
cell = [] # Cell size
...
It is good practice to use default() before running a task if you are unsure what state the CASA global variables are in.
Warning
You can only reset ALL of the parameters for a given task to their defaults.
CASA 6 internally promotes integers to doubles, and for tasks CASA 6 ensures that the parameter values are converted to the internally acceptable type.
The .last file
Whenever you successfully execute a CASA task, a Python script file called <taskname>.last will be written (or over-written) into the current working directory. For example, if you ran the listobs task as detailed above, then
CASA <14>: vis = 'ngc5921.ms'
CASA <15>: verbose = True
CASA <16>: listobs()
CASA <17>: !more 'listobs.last'
IPython system call: more listobs.last
taskname = "listobs"
vis = "ngc5921.ms"
verbose = True
listfile = ""
#listobs(vis="ngc5921.ms",verbose=True,listfile="")
You can restore the parameter values from the save file using
CASA <18>: tget("listobs")
or (since the current active task is listobs)
CASA <19>: tget
or
CASA <20>: run listobs.last
Note that the .last file is generally not created until the task actually finishes (successfully), so it is often best to manually create a save file beforehand using the saveinputs or tput command if you are running a critical task whose inputs you strongly desire to have saved.
casatasks¶
Tasks in CASA are Python interfaces to the more basic toolkit. Tasks are executed to perform a single job, such as loading, plotting, flagging, calibrating, or imaging the data.
The parameters used and their defaults can be obtained by typing help(<taskname>) at the Python prompt, where <taskname> is the name of a given task. This command lists all parameters, a brief description of each parameter, the parameter default, and any options if there are limited allowed values for the parameter.
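For example, in modular CASA:
>>> from casatasks import listobs
>>> help(listobs)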
Experimental tasks and algorithms
Some tasks and algorithms in CASA are labelled as Experimental or Unverified. These tasks have not been fully commissioned and/or verified. Such tasks are provided to enhance user capabilities, or because they are required for specific pipeline use.
The label Experimental or Unverified means that the task/algorithm falls under the following disclaimers:
Only a subset of modes have been incorporated into CASA unit/regression tests. These are documented in CASA Docs. Other options/modes may be run, and might work just fine, but they are not part of what has been tested carefully.
Some parameters have been tested for specific use cases (as part of the algorithm development, publication, and CASA test programs), but we have not yet established best practices for all different situations. This information will build over time and will be incorporated into our documentation as appropriate.
Experimental tasks and algorithms may have Known Issues, representing CASA's current understanding of the state of the code. These Known Issues are clearly defined as part of CASA Docs.
Parameter names and task structure can change, based on feedback and improved understanding of usability.
It is expected that ALMA and VLA pipelines will begin using experimental tasks only after they have stabilized for stand-alone use.
The complete listing of tasks available in CASA is as follows:
Input / Output¶
Convert a CASA visibility file (MS) into an ALMA or EVLA Science Data Model
Convert a CASA image to a FITS file
Convert a CASA visibility data set to a UVFITS file
Convert an ALMA Science Data Model observation into a CASA visibility file (MS)
Import ATCA RPFITS file(s) to a measurement set
Convert an image FITS file into a CASA image
Convert a FITS-IDI file to a CASA visibility data set
Convert a UVFITS file to a CASA visibility data set
Convert a Miriad visibility file into a CASA MeasurementSet
Convert a UVFITS file to a CASA visibility data set
Import VLA archive file(s) to a measurement set
Convert a downloaded Splatalogue spectral line list to a CASA table
Information¶
Summarized description of an ASDM dataset
Displays statistical information on a calibration table
List, get and put image header parameters
Retrieve and modify image history
Displays statistical information from an image or image region
List antenna gain solutions
List the HDU and typical data rows of a FITS file
List the processing history of a dataset
List the summary of a data set in the logger or in a file
List the summary of a multi-MS data set in the logger or in a file
Lists observation information present in an SDM directory
List measurement set visibilities
Search a spectral line table
List, summary, get, and put metadata in a measurement set
Displays statistical information from a MeasurementSet, or from a Multi-MS
Flagging¶
Flagging task based on batches of flag-commands
All-purpose flagging task based on data-selections and flagging modes/algorithms
Enable list, save, restore, delete and rename flag version files
Calibration¶
Normalize visibilities based on auto-correlations
Apply calibration solution(s) to data
Calculates a bandpass calibration solution
Calculate a baseline-based calibration solution (gain or bandpass)
Re-initializes the calibration for a visibility data set
Bootstrap the flux density scale from standard calibrators
Fringe fit delay and rates
Determine temporal gains from calibrator observations
Specify Calibration Values of Various Types
Initializes weight information in the MS
Determine instrumental polarization calibrations
Derive linear polarization from gain ratio
Re-apply refant to a caltable
Smooth calibration solution(s) derived from one or more sources
Imaging¶
Imaging sensitivity estimation
Image-domain deconvolution
Deletes model representations in the MS
Combine two images using their Fourier transforms
Insert a source model as a visibility set
Construct a primary beam corrected image from an image and a primary beam pattern
Makes and manipulates image masks
Make a component list for a known calibrator
Form images from interferometric visibilities and single dish image to reconstruct a sky model by joint deconvolution
Fills the model column with the visibilities of a calibrator
Radio Interferometric Image Reconstruction
Wideband PB-correction on the output of the MS-MFS algorithm
Single Dish¶
Convert ASAP Scantable data into a CASA visibility file (MS)
Convert NOSTAR data into a CASA visibility file (MS)
Average SD data over beams and do time averaging
Offline correction of residual atmospheric features
Fit/subtract a spectral baseline
MS SD calibration task
Fit a spectral line
Task for single-dish image processing
MS SD gain calibration task
SD task: imaging for total power and spectral data
Average SD spectra over polarisation
[EXPERIMENTAL] invoke sideband separation using FFT
Smooth spectral data
Average SD data, perform time averaging
SD task: imaging for total power and spectral data
Manipulation¶
Clear all autolock locks
Concatenate several visibility data sets
Change the sign of the phases in all visibility columns
Regrid an MS to a new spectral window / channel structure or frame
Regrid an MS or MMS to a new spectral window, channel structure or frame
Changes FIELD and SOURCE table entries based on user-provided direction or POINTING table, optionally fixes the UVW coordinates
Recalculates (u, v, w) and/or changes Phase Center
Hanning smooth frequency channel data to remove Gibbs ringing
Split the MS, combine/separate/regrid spws and do channel and time averaging
Task to produce Multi-MSs using parallelism
Rotate a Measurement Set to a new phase-center
Remove tables cleanly, use this instead of rm -rf
Create a visibility subset from an existing visibility set
Compute and set weights based on variance of data
Continuum subtraction in the uv domain
Continuum fitting and subtraction in the uv plane
Fit a single component source model to the uv data
Subtract/add model from/to the corrected visibility data
Concatenate several visibility data sets into a multi-MS
Analysis¶
Image-based baseline subtraction for single-dish data
Collapse image along one axis, aggregating pixel values along that axis
Estimates and subtracts continuum emission from an image cube
Create an image that can represent the statistical deviations of the input image
Fit one or more elliptical Gaussian components on an image region(s)
Perform math operations on images
Compute moments from an image
Construct a position-velocity image by choosing two points in the direction plane
Rebin an image by the specified integer factors
Change the frame in which the image reports its spectral values
Regrid an image onto a template image
Smooth an image or portion of an image
Create a (sub)image from a region of the image
Reorder image axes
Get the data value(s) and/or mask value in an image
Calculate rotation measure
Fit 1-dimensional gaussians and/or polynomial models to an image or image region
Report spectral profile and calculate spectral flux over a user specified region
Smooth an image region in one dimension
Fit a 1-dimensional model(s) to an image(s) or region for determination of spectral index
Visualization¶
Plot the antenna distribution in the local reference frame
Makes detailed plots of Tsys and bandpass solutions
Makes profile map
Plot elements of the weather table; estimate opacity
Simulation¶
Simulation task for ALMA
Image and analyze measurement sets created with simobserve
Visibility simulation task
casatools¶
The CASA toolkit is the foundation of the functionality in the package and consists of a suite of C++ classes that are wrapped and imported in Python. The tools are typically used inside casatasks, but they can also be used directly by advanced users to perform operations that are not available through the tasks. Tools are typically instantiated as stateful objects in Python.
Tool Listing
Tool for manual and automated flagging
Filler for ATNF/ATCA RPFITS data
Atmosphere model
Get and fit data from a calibration table (CASA 3.4 and later)
Synthesis calibration (self- and cross-)
A tool for the manipulation of groups of components
Operations on CoordinateSystems
Functionals handling
Operations on images
Tool for synthesis imaging
Combining images in a weighted fashion
Tool for logsink
Measures tool
Operations on measurement sets
Operations to retrieve metadata from a measurement set
Quanta tool handles units and quantities
Create and manipulate regions of interest
New single dish tool interface using sakura
Manipulate or examine SDM datasets
Tool for simulation
New single dish tool interface to process an MS
Tool for mask handling in synthesis imaging
Tool for synthesis imaging
Access tables from casapy
Utility component, verified, xml translator
Special Cases
In some cases, the state within a tool must be maintained singularly for an entire CASA session. In these cases, a singleton object is instantiated and provided directly to the user.
ctsys
Singleton object from the utils tool. See the utils tool documentation for methods.
- Examples
ctsys is already instantiated and provides access to the methods of the utils tool class. For example:
>>> from casatools import ctsys   # modular casa only, already imported in monolithic
>>> ctsys.hostinfo()
casalog
Singleton object from the logsink tool. See the logsink tool documentation for methods.
- Examples
casalog is already instantiated and provides access to the methods of the logsink tool class. For example:
>>> from casatools import casalog   # modular casa only, already imported in monolithic
>>> casalog.post('my example log message', 'INFO')
casaviewer¶
Routines for accessing the Viewer application through Python.
View an image
View a visibility data set
configuration¶
CASA accepts a variety of options through two mechanisms: configuration files and command line arguments. Configuration files are typically stored in a ~/.casa folder, while command line options (only applicable to the full installation) are specified after the casa command at startup.
config.py¶
config.py (datapath, rundata, logfile, nologfile, log2term, nologger, nogui, colors, agg, pipeline, iplog, telemetry_enabled, crashreporter_enabled)
Each modular CASA 6 package, as well as the full installation, reads a single config.py configuration file. This file should be placed in the user root .casa folder (~/.casa) prior to starting the casa installation or importing the packages into a standard Python environment for the first time.
The following parameters can be set in the configuration file. Finer control over telemetry can also be set in the configuration file, as described here.
datapath : list of paths where CASA should search for data subdirectories
rundata : location of required runtime measures data, takes precedence over location(s) in datapath list
logfile : log file path/name
nologfile : do not create a log file when True, default False. If nologfile is true, then any logfile value is ignored and there is no log file.
log2term : print log output to terminal when True (in addition to any logfile and CASA logger), default False
nologger : do not start the CASA logger when True, default False
nogui : avoid starting GUI tools when True, default False. If nogui is True then the CASA logger is not started even if nologger is False.
colors : the IPython prompt color scheme. Must be one of “Neutral”, “NoColor”, “Linux” or “LightBG”, default “Neutral”. If an invalid color is given a warning message is printed and logged but CASA continues using the default color.
agg : startup without a graphical backend if True, default False
pipeline : attempt to load the pipeline modules and set other options appropriate for pipeline use if True, default False. When pipeline is True then agg will be assumed to be true even if agg is set to False here or on the command line.
iplog : create and use an IPython log in the current directory if True, default False.
telemetry_enabled : allow anonymous usage reporting, default True
crashreporter_enabled : allow anonymous crash reporting, default True
user_site : include the user’s local site-packages in the python path if True. Normally these are excluded to avoid any conflicts with CASA modules (when False, the default).
The configuration file is a standard python script, so any valid python syntax and libraries can be used. A typical config.py file might look something like this:
datapath=["/home/casa/data/casa-data", "~/.casa/my_additional_data"]
rundata="/home/casa/data/rsync"
log2term=True
nologger=True
An example config.py file showing all recognized configurable parameters is shown below; it also illustrates that config.py can contain other Python commands, in this case setting logfile using the time module. Note that some of the parameters shown here are set to their default values.
import time
datapath=["/home/casa/data/casa-data", "~/.casa/mydata"]
rundata="/home/casa/data/rsync"
logfile='casalog-%s.log' % time.strftime("%Y%m%d-%H",time.localtime())
telemetry_enabled = True
crashreporter_enabled = True
nologfile = False
log2term = True
nologger = True
nogui = False
colors = "LightBG"
agg = False
pipeline = False
iplog = True
user_site = False
telemetry_log_directory = "/tmp"
telemetry_log_limit = 20000
telemetry_log_size_interval = 60
telemetry_submit_interval = 604800
At runtime the datapath(s) are expanded through a resolve(...) function to find the needed data tables. For example
>>> casatools.ctsys.resolve('geodetic/IERSpredict')
'/home/casa/data/rsync/geodetic/IERSpredict'
The command line arguments take precedence over the equivalent config.py values.
The variables rundata and datapath are related but different. rundata is a single path, does not change after CASA has started, and is meant to point to essential data that is required for CASA to run, such as the casacore Measures data (see External Data). In contrast, datapath is a list of paths and can be changed at runtime to include multiple data locations. The function resolve will search for files and directories through the datapath in list order. The idea is to allow users to add directories that contain the data they want to use during their session. After adding the directories they want to load data from, fully qualified paths are no longer needed, for example in imaging tasks. Since there is no longer a single data path, users can add shared image directories, etc.
When using a monolithic/tar-file CASA distribution, if rundata is left as default, it points to the data included in the distribution. rundata can be used to set up and update custom data locations, see Updating a custom location. The value of rundata in a CASA session can be checked via the function rundata:
>>> casatools.ctsys.rundata()
'/home/casa/data/rsync'
Note
rcdir is used to change the location of the root .casa folder to something other than ~/.casa. In addition to the startup files (config.py and startup.py) the root .casa folder contains working files and directories used by CASA components (e.g. ipython, telemetry). It is expected to be writable by the user for use by those components.
startup.py¶
This section only applies to the monolithic/tar-file CASA distribution, and it only applies to CASA 6.
For CASA 5, please see an earlier version of CASA Docs.
The 'startup.py' file found in $HOME/.casa (i.e. ~/.casa/startup.py) is evaluated by the CASA shell just before the CASA prompt is presented to the user. This allows users to customize their CASA shell environment beyond the standard settings in 'config.py', by importing packages, setting variables or modifying the python system path.
One case where this is useful is for configuring CASA for ALMA data reduction. A package called 'analysisUtils' is often used as part of ALMA analysis. It is typically imported and instantiated in startup.py:
$ cat ~/.casa/startup.py
import sys, os
sys.path.append("/home/casa/contrib/AIV/science/analysis_scripts/")
import analysisUtils as aU
es = aU.stuffForScienceDataReduction()
In this example, the standard python modules os and sys are made available in the CASA shell. The path where the analysisUtils module can be found is added to the Python system path, and finally the package is imported and an object is created. These modules and objects will then be available for the user within the CASA shell environment.
terminal¶
terminal (-h, --help, --logfile, --log2term, --nologger, --nologfile, --nogui, --rcdir, --norc, --colors, --pipeline, --agg, --iplog, --notelemetry, --nocrashreport, --datapath, --user-site, -v, --version, -c)
With the full installation of CASA from a tar file, the python environment itself is included and started through ./bin/casa. This ./bin/casa executable can be provided the following options to change configuration values at run time:
-h, --help show this help message and exit
--logfile LOGFILE path to log file
--log2term direct output to terminal
--nologger do not start CASA logger
--nologfile do not create a log file
--nogui avoid starting GUI tools
--rcdir RCDIR location for startup files, internal working files
--norc do not load user config.py (startup.py is unaffected)
--colors {Neutral,NoColor,Linux,LightBG} prompt color
--pipeline load CASA pipeline modules on startup
--agg startup without graphical backend
--iplog create ipython log
--notelemetry disable telemetry collection
--nocrashreport do not submit an online report when CASA crashes
--datapath DATAPATH data path(s) [colon separated]
--user-site include user's local site-packages lib in path
(toggling this option turns it on; use startup.py to append to the path)
-v, --version show CASA version
-c ... python eval string or python script to execute
These options take precedence over the configuration files. See the discussion of equivalent config.py parameters for more details on these command line options.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/usingcasa.ipynb
Using CASA¶
See the CASA API for information on configuration options prior to startup.
Starting CASA¶
CASA packages installed through pip may be imported into the standard Python environment on the host machine. For example:
(casa6) $ python
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import casatasks
>>> help(casatasks)
The ~/.casa/config.py file will be read and processed when the casatasks package is imported.
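For reference, a minimal ~/.casa/config.py might look like the following sketch; the values are illustrative, and the variable names mirror the equivalent command line options discussed in the CASA API section:
# ~/.casa/config.py -- illustrative values only
datapath = ['/home/user/casadata']   # list of directories searched for data
logfile = 'casa-session.log'         # name of the log file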
The full installation of CASA includes an IPython environment which is executed like an application. Any desired command line arguments may be included. For example:
$ ./casa6/bin/casa --logfile MyTestRun.txt --nogui
The ~/.casa/config.py file will be read and processed as the casa application executes, with the supplied command line arguments (logfile and nogui) added on top.
This environment is based upon IPython. CASA uses IPython because it provides convenient command completion, command-line editing, and function invocation without parentheses. Because the CASA application environment is IPython based, users can also use IPython magic commands like %run. Users are encouraged to explore the options available with IPython, but this is outside the scope of this document. CASA only supports configuration through config.py; some of the configuration variables in config.py are used to configure IPython at startup time. CASA configures IPython to supply parentheses when they are omitted: for example, when sin 3 is executed, IPython supplies the missing parentheses and invokes sin(3). CASA also suppresses any output that the paren expansion would normally generate.
Users may wish to set shortcuts, links, aliases, or add bin/casa to their environment PATH. See the documentation for your operating system.
Running User Scripts¶
CASA 6: modular version
The modular version of CASA behaves like a standard Python package, and user scripts should import the relevant modules as they would any other Python module (e.g., numpy). Executing external user scripts with modular CASA is just like running any other Python application. Note that we recommend running in a Python venv; see the installation instructions for more information.
(casa6) $ python myscript.py param1 param2
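For illustration, such a user script is plain Python; the following minimal sketch (the script, MS, and output names are hypothetical) reads its parameters from the command line:
# myscript.py -- minimal sketch of a modular-CASA user script
import sys
from casatasks import listobs

vis = sys.argv[1]        # e.g. 'mydata.ms'
listfile = sys.argv[2]   # e.g. 'mydata_listobs.txt'
listobs(vis=vis, listfile=listfile, overwrite=True)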
CASA 6: all-inclusive version
Since the full CASA installation from a tar file includes its own python environment that is (typically) not called directly, alternative methods of feeding in user scripts are necessary. There are three main standard Python ways of executing external user scripts in the full installation of CASA:
-c startup parameter (see configuration instructions)
exec(open("./filename").read()) within the CASA Python environment
add your script to startup.py in the ~/.casa directory
In addition, an “execfile” python shortcut has been added to the full installation of CASA 6 for backwards compatibility with ALMA scriptForPI.py restore scripts. This allows running scripts with the following command:
execfile('filename.py') within the CASA Python environment
The execfile command in CASA 6 has been tested and found to work in the same way as in (Python 2 based) CASA 5, with the exceptions that (1) the treatment of global variables has changed in Python 3, (2) only direct arguments should be used (i.e., listobs(vis='name.ms')), and (3) the command default('taskname') no longer works with execfile (but note that direct arguments always invoke the defaults anyway). While casashell tasks (tasks run in the CASA shell environment) could be treated as scriptable in the Python 2.7/CASA 5 version of execfile, in Python 3/CASA 6 this behavior is not supported. Python replacements for execfile exist [e.g., "%run -i" or "exec(open(…).read())"] which provide some scripting capabilities for casashell commands, though these Python alternatives are not tested in any internal CASA regression tests.
Regarding the treatment of global variables: for execfile calls within a script which is itself run via execfile, it is necessary to add globals() as the second argument to those execfile calls in order for the nested script to know about the global variables of the calling script. For example, within a script 'mainscript.py', calls to another script 'myscript.py' should be written as follows: execfile('myscript.py', globals()).
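A minimal sketch of this pattern (both script names and the variable are hypothetical, and this applies only in the full installation, where the execfile shortcut exists):
# mainscript.py
myvis = 'mydata.ms'                  # a global the nested script relies on
execfile('myscript.py', globals())   # pass globals() so myscript.py sees myvis

# myscript.py
from casatasks import listobs
listobs(vis=myvis)                   # myvis comes from the calling script's globals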
Logging¶
Detailed description of the CASA logger
Logging your session¶
The output from CASA commands is sent to the file casa-YYYYMMDD-HHMMSS.log in your local directory, where YYYYMMDD-HHMMSS are the UT date and time when CASA was started up. New starts of CASA create new log files.
The CASA Logger GUI window under Linux. Note that under MacOSX a stripped down logger will instead appear as a Console.
The output contained in casa-YYYYMMDD-HHMMSS.log is also displayed in a separate window using the casalogger. Generally, the logger window will be brought up when CASA is started. If you do not want the logger GUI to appear, then start casa using the --nologger option,
casa --nologger
which will run CASA in the terminal window. See Starting CASA for more startup options.
ALERT: Due to problems with Qt, the GUI qtcasalogger is a different version on MacOSX and uses the Mac Console. This still has the important capabilities, such as showing the messages and cut/paste. The following description is for the Linux version and thus should mostly be disregarded on OSX. On the Mac, you treat this as just another console window and use the usual mouse and hot-key actions to do what is needed.
The CASA logger window for Linux is shown in the figure above. The main feature is the display area for the log text, which is divided into columns. The columns are:
Time — the time that the message was generated. Note that this will be in local computer time (usually UT) for casa generated messages, and may be different for user generated messages;
Priority — the Priority Level (see below) of the message;
Origin — where within CASA the message came from. This is in the format Task::Tool::Method (one or more of the fields may be missing depending upon the message);
Message — the actual text.
Using the casalogger Filter facility. The log output can be sorted by Priority, Time, Origin, and Message. In this example we are filtering by Origin using ‘clean’, and it now shows all the log output from the clean task.
The casalogger GUI has a range of features, which include:
Search — search messages by entering text in the Search window and clicking the search icon. The search currently just matches the exact text you type anywhere in the message.
Filter — a filter to sort by message priority, time, task/tool of origin, and message contents. Enter text in the Filter window and click the filter icon to the right of the window. Use the pull-down at the left of the Filter window to choose what to filter. The matching is for the exact text currently (no regular expressions).
View — show and hide columns (Time, Priority, Origin, Message) by checking boxes under the View menu pull-down. You can also change the font here.
Insert Message — insert additional comments as “notes” in the log. Enter the text into the “Insert Message” box at the bottom of the logger, and click on the Add (+) button, or choose to enter a longer message. The entered message will appear with a priority of “NOTE” with the Origin as your username.
Copy — left-click on a row, or click-drag a range of rows, or click at the start and shift click at the end to select. Use the Copy button or Edit menu Copy to put the selected rows into the clipboard. You can then (usually) paste this where you wish.
Open — There is an Open function in the File menu, and an Open button, that will allow you to load old casalogger files.
Alert: Messages added through Insert Message are currently not inserted at the correct (or a user-controllable) position in the log. Copy does not work reliably in the current version. It is recommended to open the casa-YYYYMMDD-HHMMSS.log file in a text editor to grab text.
CASA Logger - Insert facility: The log output can be augmented by adding notes or comments during the reduction. The file should then be saved to disk to retain these changes.
Other operations are also possible from the menu or buttons. Mouse “flyover” displays a tooltip describing the operation of buttons.
It is possible to change the name of the logging file. By default it is ‘casa-YYYYMMDD-HHMMSS.log’, but starting CASA with the option --logfile will redirect the output of the logger to the specified file, e.g. ‘otherfile.log’ (see also the page on “Starting CASA”).
casa --logfile otherfile.log
The log file can also be changed during a CASA session. Typing:
casalog.setlogfile('otherfile.log')
will redirect the output to the ‘otherfile.log’ file. However, the logger GUI will still be monitoring the previous ‘casa-YYYYMMDD-HHMMSS.log’ file. To change it to the new file, go to File - Open and select the new log file, in our case ‘otherfile.log’.
Startup options for the logger¶
One can specify logger options at the startup of CASA on the command line:
casa <logger options>
The options are described in “Starting CASA”. For example, to inhibit the GUI and send the logging messages to your terminal, use
casa --nologger --log2term
while
casa --logfile mynewlogfile.log
will start CASA with logger messages going to the file mynewlogfile.log. For no log file at all, use:
casa --nologfile
Setting priority levels in the logger¶
Logger messages are assigned a Priority Level when generated within CASA. The current levels of Priority are:
SEVERE — errors;
WARN — warnings;
INFO — basic information every user should be aware of or has requested;
INFO1 — information possibly helpful to the user;
INFO2 — details for advanced users;
INFO3 — continued details;
INFO4 — lowest level of non-debugging information;
DEBUGGING — most “important” debugging messages;
DEBUG1 — more details;
DEBUG2 — lowest level of debugging messages.
The “debugging” levels are intended for developers’ use.
Inside the Toolkit:
The casalog tool can be used to control the logging. In particular, the casalog.filter method sets the priority threshold. This tool can also be used to change the output log file, and to post messages into the logger.
There is a threshold for which these messages are written to the casa-YYYYMMDD-HHMMSS.log file and are thus visible in the logger. By default, only messages at level INFO and above are logged. The user can change the threshold using the casalog.filter method. This takes a single string argument of the level for the threshold. The level sets the lowest priority that will be generated, and all messages of this level or higher will go into the casa-YYYYMMDD-HHMMSS.log file.
Some examples:
casalog.filter('INFO') #the default
casalog.filter('INFO2') #should satisfy even advanced users
casalog.filter('INFO4') #all INFOx messages
casalog.filter('DEBUG2') #all messages including debugging
WARNING: Setting the threshold to DEBUG2 will put lots of messages in the log!
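The other casalog methods mentioned in the toolkit note above are used the same way; a short sketch (the log file name and message text are illustrative):
casalog.setlogfile('session2.log')     # redirect subsequent messages to a new file
casalog.post('Starting imaging run')   # post a message into the log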
Error Handling with CASA tasks¶
Irrecoverable errors produce exceptions in all CASA tasks. Different standard Python exception types are thrown depending on the type of error, including RuntimeError, OSError, ValueError, or AssertionError (the latter in particular when there is an error validating the input parameters). This behavior applies to all CASA tasks and has been made consistent across all tasks beginning with CASA 6.2/5.8. For a list of CASA tasks, see the API section.
When using CASA tasks in their modular version (from the casatasks module), exceptions are thrown as in normal Python functions and can be used to handle errors in the data reduction workflow in user scripts, pipelines, etc. In earlier versions of CASA this was not consistent; see the changes section below for more details on earlier versions of CASA.
Let us see an example script that in CASA 6 produces an exception in the task split:
from casatasks import split
try:
    split(vis='uid___A002_X30a93d_X43e_small.ms', outputvis='foo.ms', datacolumn='corrected', spw='0')
except RuntimeError as exc:
    print(' * Got exception: {}'.format(exc))
The task fails and produces an exception because the requested data column is not present in the MeasurementSet. The print statement (shown here as a trivial way of handling the error, just for illustration) will print the following message:
* Got exception: Desired column (CORRECTED_DATA) not found in the input MS (/casadata/uid___A002_X30a93d_X43e_small.ms).
The following messages are also produced in the CASA log:
INFO split::::casa ##########################################
INFO split::::casa ##### Begin Task: split #####
INFO split::::casa split( vis='uid___A002_X30a93d_X43e_small.ms', outputvis='foo.ms', keepmms=True, field='', spw='0', scan='', antenna='', correlation='', timerange='', intent='', array='', uvrange='', observation='', feed='', datacolumn='corrected', keepflags=True, width=1, timebin='0s', combine='' )
INFO MSTransformManager::parseMsSpecParams Input file name is uid___A002_X30a93d_X43e_small.ms
INFO MSTransformManager::parseMsSpecParams Data column is CORRECTED
INFO MSTransformManager::parseMsSpecParams Output file name is foo.ms
INFO MSTransformManager::parseDataSelParams spw selection is 0
WARN MSTransformManager::checkDataColumnsToFill CORRECTED_DATA column requested but not available in input MS
INFO MSTransformManager::initDataSelectionParams Selected SPWs Ids are Axis Lengths: [1, 4] (NB: Matrix in Row/Column order)
INFO MSTransformManager::initDataSelectionParams+ [0, 0, 3, 1]
INFO MSTransformManager::open Select data
INFO MSTransformManager::createOutputMSStructure Create output MS structure
SEVERE split::::casa Task split raised an exception of class RuntimeError with the following message: Desired column (CORRECTED_DATA) not found in the input MS (/casadata/uid___A002_X30a93d_X43e_small.ms).
INFO split::::casa Task split complete. Start time: 2020-11-02 14:33:58.083124 End time: 2020-11-02 14:33:58.262353
INFO split::::casa ##### End Task: split #####
INFO split::::casa ##########################################
CASA 6 - casashell (monolithic) version of tasks¶
When starting CASA in its full installation variant, using the bin/casa command, a version of the CASA tasks meant for interactive use at the CASA IPython prompt is automatically imported. These interactive or casashell versions of the CASA tasks (see casashell) support, for example, the inp/go commands. The behavior of these casashell tasks in terms of error handling is intentionally different from that of the modular version of the tasks, as explained below. Note that this doesn't imply that the modular task behavior cannot be obtained when using the full installation: the modular version of a task can be imported by simply importing the task explicitly from casatasks.
The casashell versions of the tasks do not throw exceptions. Instead, they return False if an exception occurs within the task. The exception traceback is printed to the CASA log, but the execution of the task finishes without throwing the exception. The casashell infrastructure captures any exception that occurs inside a task and turns it into a return value of False. For example, if we use the same split command from the previous section at the CASA prompt, which fails because a non-existent data column is requested, the split call will return False and the log will show the following messages:
CASA <1>: split(vis='uid___A002_X30a93d_X43e_small.ms', outputvis='foo.ms', datacolumn='corrected', spw='0')
INFO split::::casa ##########################################
INFO split::::casa ##### Begin Task: split #####
INFO split::::casa split( vis='uid___A002_X30a93d_X43e_small.ms', outputvis='foo.ms', keepmms=True, field='', spw='0', scan='', antenna='', correlation='', timerange='', intent='', array='', uvrange='', observation='', feed='', datacolumn='corrected', keepflags=True, width=1, timebin='0s', combine='' )
INFO MSTransformManager::parseMsSpecParams Input file name is uid___A002_X30a93d_X43e_small.ms
INFO MSTransformManager::parseMsSpecParams Data column is CORRECTED
INFO MSTransformManager::parseMsSpecParams Output file name is foo.ms
INFO MSTransformManager::parseDataSelParams spw selection is 0
WARN MSTransformManager::checkDataColumnsToFill CORRECTED_DATA column requested but not available in input MS
INFO MSTransformManager::initDataSelectionParams Selected SPWs Ids are Axis Lengths: [1, 4] (NB: Matrix in Row/Column order)
INFO MSTransformManager::initDataSelectionParams+ [0, 0, 3, 1]
INFO MSTransformManager::open Select data
INFO MSTransformManager::createOutputMSStructure Create output MS structure
SEVERE split::::casa Task split raised an exception of class RuntimeError with the following message: Desired column (CORRECTED_DATA) not found in the input MS (/casadata/uid___A002_X30a93d_X43e_small.ms).
INFO split::::casa Task split complete. Start time: 2020-11-02 18:27:59.105823 End time: 2020-11-02 18:27:59.316335
INFO split::::casa ##### End Task: split #####
INFO split::::casa ##########################################
SEVERE split::::casa Exception Reported: Error in split: Desired column (CORRECTED_DATA) not found in the input MS (/casadata/uid___A002_X30a93d_X43e_small.ms).
INFO split::::casa Traceback (most recent call last):
INFO split::::casa+ File "/scratch/casa-6.2.0-36/lib/py/lib/python3.6/site-packages/casashell/private/split.py", line 723, in __call__
INFO split::::casa+ _return_result_ = _split_t( _invocation_parameters['vis'],_invocation_parameters['outputvis'],_invocation_parameters['keepmms'],_invocation_parameters['field'],_invocation_parameters['spw'],_invocation_parameters['scan'],_invocation_parameters['antenna'],_invocation_parameters['correlation'],_invocation_parameters['timerange'],_invocation_parameters['intent'],_invocation_parameters['array'],_invocation_parameters['uvrange'],_invocation_parameters['observation'],_invocation_parameters['feed'],_invocation_parameters['datacolumn'],_invocation_parameters['keepflags'],_invocation_parameters['width'],_invocation_parameters['timebin'],_invocation_parameters['combine'] )
INFO split::::casa+ File "/scratch/casa-6.2.0-36/lib/py/lib/python3.6/site-packages/casatasks/split.py", line 258, in __call__
INFO split::::casa+ task_result = _split_t( _pc.document['vis'], _pc.document['outputvis'], _pc.document['keepmms'], _pc.document['field'], _pc.document['spw'], _pc.document['scan'], _pc.document['antenna'], _pc.document['correlation'], _pc.document['timerange'], _pc.document['intent'], _pc.document['array'], _pc.document['uvrange'], _pc.document['observation'], _pc.document['feed'], _pc.document['datacolumn'], _pc.document['keepflags'], _pc.document['width'], _pc.document['timebin'], _pc.document['combine'] )
INFO split::::casa+ File "/scratch/casa-6.2.0-36/lib/py/lib/python3.6/site-packages/casatasks/private/task_split.py", line 163, in split
INFO split::::casa+ mtlocal.open()
INFO split::::casa+ File "/scratch/casa-6.2.0-36/lib/py/lib/python3.6/site-packages/casatools/mstransformer.py", line 43, in open
INFO split::::casa+ return self._swigobj.open()
INFO split::::casa+ File "/scratch/casa-6.2.0-36/lib/py/lib/python3.6/site-packages/casatools/__casac__/mstransformer.py", line 176, in open
INFO split::::casa+ return _mstransformer.mstransformer_open(self)
INFO split::::casa+ RuntimeError: Desired column (CORRECTED_DATA) not found in the input MS (/casadata/uid___A002_X30a93d_X43e_small.ms).
Out[1]: False
(This example listing shows what would appear in the terminal when the CASA log messages are directed to the terminal using the --log2term command line option.)
In CASA 5, by default tasks never threw exceptions. See an earlier version of CASA Docs for details.
Standardized behavior since CASA 6.2¶
The behavior of tasks with respect to errors, exceptions, and return values in case of error has been standardized since CASA 6.2. In CASA 6.x, this standardization applies only to CASA tasks when imported from casatasks. The versions that come with casashell (the CASA IPython prompt) keep the same behavior: they never throw exceptions and instead return False in case of a serious error (an exception in the task). This is ensured by the casashell infrastructure (casashell task wrappers), which never lets an exception out.
The changes introduced are:
All tasks throw exceptions as normal Python functions do. (The one or more top-level try/except blocks that used to trap all exceptions in roughly a third of the tasks have been removed, or replaced with finally blocks that clean up tools and other resources; see the Python documentation on clean-up actions and the finally clause.)
Exception types are more specific, using different Python built-in exceptions and avoiding the overly generic ‘Exception’ type as much as possible.
The message when there is an exception in a task is the same for all tasks, is printed to the CASA log and reads, for example, like this: “Task <mstransform> raised an exception of class <RuntimeError> with the following message: Desired column (CORRECTED_DATA) not found in the input MS (/casadata/uid___X02_X3d737_X1_01_small.ms)”
The same behavior when there are errors in HISTORY updates is ensured across all CASA tasks: a warning message is produced, but the task’s return value/exceptions are not changed.
The situation before/after these changes can be summarized as follows:
In CASA releases earlier than 6.2/5.8, some tasks would raise exceptions, but other tasks would never raise exceptions. Instead, they would trap any exceptions raised in the task and return different values representing unsuccessful execution, such as False, None, or {}. To know whether exceptions were raised or something else returned (and what particular something else was returned), one would need to look into the task code, as this was undocumented behavior and there was no clear pattern to explain the different behavior of different tasks.
Beginning in CASA 6.2, all CASA tasks raise exceptions in the event of an unrecoverable error, as they would be raised by normal Python functions. Client code using CASA tasks can assume all tasks raise exceptions in case of error and devise a consistent error handling approach when using CASA tasks.
The following list summarizes the tasks which will raise exceptions after CASA 6.2/5.8 but did not raise exceptions in earlier versions of CASA:
from casatasks: clearcal, clearstat, cvel2, delmod, fixplanets, fixvis, flagdata, hanningsmooth, imhead, importasdm, importatca, importgmrt, importmiriad, importuvfits, imregrid, imsmooth, initweights, listcal, listfits, listhistory, listobs, listpartition, listvis, mstransform, nrobeamaverage, partition, plotants, sdpolaverage, sdtimeaverage, setjy, specfit, split, spxfit, uvsub, vishead
from casaplotms: plotms
The impact on scripts that use these tasks is that errors will now be more visible, with an exception raised rather than a possibly silent and ambiguous False, None, or {} return. Note that scripts that do not import tasks explicitly from casatasks are implicitly using the casashell version of the tasks. These scripts will not see any behavior change, as the casashell tasks never throw exceptions. In scripts that use tasks from casatasks and were not checking the return value of tasks (for example for a False or {} value), potential failures could remain hidden in the code flow (and perhaps go unnoticed) before CASA 6.2/5.8. From CASA 6.2/5.8 onward, a failure in a task will raise an exception, and scripts can implement error handling to react to such exceptions using Python's try clause, as in the sketch below.
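A minimal sketch of such error handling (the MS and output names are hypothetical; the caught exception types follow the list given earlier in this section):
from casatasks import split

try:
    split(vis='mydata.ms', outputvis='target.ms', datacolumn='corrected')
except (RuntimeError, ValueError, OSError) as exc:
    print('split failed: {}'.format(exc))   # report the failure
    raise                                   # or take some recovery action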
Information Collection¶
To better understand real-world usage patterns, quality, and reliability, CASA collects runtime telemetry and crash reports from users and sends periodic reports back to NRAO. This information is anonymous, with no personally identifiable information (PII) or science data included.
Telemetry¶
Telemetry records task usage activity (task name, start time, end time) during CASA runs. Periodically, these reports will be batched together and sent to NRAO.
You can disable telemetry by adding the following line to ~/.casa/config.py:
telemetry_enabled = False
Telemetry adds log files in the “rcdir” directory (e.g., ~/.casa) and submits the data at CASA startup after a predefined interval. This interval can be configured in the ~/.casa/config.py file by setting telemetry_submit_interval to a desired value in seconds. The default value is 1 week.
The log file cache directory can be changed by setting “telemetry_log_directory” in ~/.casa/config.py. “telemetry_log_directory” must be an absolute path.
Maximum telemetry log usage can be set with “telemetry_log_limit” (in kilobytes). CASA will check for the logfile size periodically and disable Telemetry when the limit is reached. The check interval can be set with “telemetry_log_size_interval” (seconds).
Summary of all available options in ~/.casa/config.py (default values shown):
telemetry_log_directory: /tmp
telemetry_log_limit: 20000
telemetry_log_size_interval: 60
telemetry_submit_interval: 604800
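Put together, these settings appear in ~/.casa/config.py as one Python assignment per option; the values below are the defaults listed above:
telemetry_log_directory = '/tmp'     # must be an absolute path
telemetry_log_limit = 20000          # kilobytes
telemetry_log_size_interval = 60     # seconds
telemetry_submit_interval = 604800   # seconds (1 week)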
Crash Reporter¶
Crash reports are triggered whenever a CASA task terminates abnormally (e.g., unhandled C++ exception, segfault, etc.). The crash reports include:
program call stack
filesystem mount information
CASA log
memory information
operating system version
CPU information
You can disable crash reports by adding the following line to ~/.casa/config.py:
crashreporter_enabled = False
Hardware Requirements¶
Recommended CASA computing environments
The recommended hardware requirements are provided here as part of the CASA webpages.
Amazon Web Services¶
Overview of how to use CASA on Amazon Web Services
An introduction to Amazon Web Services
In this chapter you will learn how to create an account within AWS, select appropriate resources for a problem, launch those resources, perform any desired processing, return or store resulting products, and finally to release the reserved resources back to Amazon.
Amazon Web Services Introduction¶
Amazon Web Services (AWS) is a collection of physical assets and software tools for using ad hoc computing resources (aka Cloud Computing) within Amazon. The combination of a wide range of processing hardware, network speeds, storage media and tools allows users to create virtual computing platforms tailored to specific problem sizes with discrete durations.
In simplest terms, AWS allows users to create workstations or medium-sized clusters of computers (ranging from tens to a few thousand nodes) that are essentially identical to the kind of physical workstation or small cluster they might have at their home institution, without the overhead of upfront capital expense, space, power, or cooling. The full range of offerings from Amazon goes well beyond that simple conceptual model, but many, if not most, are not directly applicable to radio astronomy data processing.
The target audience for this document is the astronomer who wishes to run their computations more quickly, would like to know whether AWS can help accomplish that goal, and wants to understand the possibilities and limitations AWS brings.
Applicability to NRAO Data Processing
NRAO data products, particularly those from the Atacama Large Millimeter Array (ALMA) and the Jansky Very Large Array (JVLA), are of sufficient volume (100s to 1000s of GBytes) and computational complexity to require processing capabilities ranging from high-end workstations to small clusters of servers. Additionally, advanced imaging algorithms typically benefit from more system memory than is likely available on standard desktops.
AWS can facilitate the transfer of data among researchers through high speed networks or shared storage. Large scale projects which can be decomposed along some axis (e.g. by observation, field, frequency, etc) can be processed concurrently across 10s, 100s or even 1000s of compute instances.
Document Outline
This document set attempts to walk users through the necessary steps to create an account within AWS, select appropriate resources for their problem, launch those resources, perform any desired processing, return or store resulting products to the user, and finally to release the reserved resources back to Amazon. The last step is a critical aspect to the financial viability of computing within AWS. Later sections will cover the potential financial benefit and possible pitfalls of utilizing AWS resources.
Requesting Assistance
Given the unique nature of AWS resources, please direct any questions or comments to nrao-aws@nrao.edu rather than to CASA or Helpdesk personnel.
User Account Setup¶
Creating and setting up a user account on AWS
Overview
To facilitate fine-grained control of AWS resources, Amazon supplies two distinct account types: a Root user account (Account Root User) and Identity and Access Management Users (IAM Users). Learn more.
These accounts are distinct from regular Linux accounts that may exist on a compute instance.
Account Root User
Click here and follow the steps to set up an Amazon Web Services account.
Signing up for an AWS account automatically creates a Root user account with total control over the account. The credit card used during sign-up will be billed for all usage by the Account Root User and the Account’s IAM Users.
In general, the Root user account should only be used for changing account-wide settings, e.g., creating or removing IAM users, changing the AWS support plan, or closing the account. An IAM User account should be used when requesting resources. Following this model allows finer-grained control over the type and scale of resources a specific user can request, and can limit the risk from unexpected expenses or accidental global changes to the account.
IAM Users
Getting Started with IAM Users
View Amazon’s AWS Documentation
The Root Account User can create IAM Users. These IAM Users may have more limited permissions or they may have Administrators permissions. It is recommended the Root Account User first create an IAM User in the Administrators group. That IAM User can then login and perform essentially all administrative operations, including adding more IAM Users.
IAM users can be given different levels of permissions for requesting AWS resources. Permissions can be mapped to a User via membership in an IAM group or by having an IAM Role mapped to the User. More information on creating and utilizing IAM groups and IAM Roles can be found here.
Other Resources - How to Sign into AWS as an IAM User : View Amazon’s AWS Documentation - Best practices for using IAM Users : View Amazon’s AWS Documentation
Linux Users
View Amazon’s AWS Documentation
IAM Users typically have the ability to start Instances, a virtual machine running Linux on AWS hardware. While starting the instance, an ssh key is specified. That key can be used to ssh into the running instance.
Adding Additional Linux Users : View Amazon’s AWS Documentation
Amazon Machine Images¶
An Amazon Machine Image (AMI) provides the information required to launch an instance, which is a virtual server in the cloud
Overview
An AMI is an object that encapsulates a base OS image (e.g., Red Hat Enterprise Linux, CentOS, Debian), any 3rd party software packages (e.g., CASA, SciPy) and any OS level run time modifications (e.g., accounts, data, working directories). Since an AMI is a discrete object, it can be mapped onto differing hardware instances to provide a consistent work environment independent of the instance’s number of processors, available memory, or total storage space.
The NRAO provides a set of pre-made images based on the standard Amazon image, which include specific release versions of CASA and Python and AWS’s command line interface (CLI) and application programming interface (API). The appropriate AMI can be used to start an image with operating system and software ready to run CASA.
Finding an NRAO AMI
You can search for NRAO AMIs using the console.
From the navigation bar in the upper right, change your region to be “US West (Oregon)”
Open the AWS Console and click on “EC2 Dashboard”.
Click on “AMIs” to bring up the interface for selecting AMIs.
The AMI selection interface by default displays only AMIs “Owned by me”. Change it to “Public Images”.
Narrow this long list to only NRAO images.
Click in the box to the right of the magnifying glass.
A menu listing “Resource Attributes” will pop up below the box you clicked on. Ignore it and type “NRAO” in the box and press the Enter key.
The list of AMIs has been narrowed to NRAO AMIs only.
New NRAO AMIs will be released periodically.
Using an AMI
Click the box next to the AMI you want. Click Launch. The Instances section of this document covers starting instances.
Geographic Locales
AWS has the concept of Regions and Zones in which Instances, EBS volumes, S3 buckets, and other resources run. So an S3 bucket may be in region us-west-2 and not directly accessible if another IAM User is currently working in us-east-1. However, a user may select the region they run in, and users may also duplicate some resources across regions if access latency is a concern. To find the latency from your computer to all the AWS regions, try the cloudping tool: http://www.cloudping.info.
AMIs are Region-specific
An AWS AMI User can only use AMIs stored in its region. However, copying an AMI to another region is straightforward. The copy gets a new AMI-ID, so it effectively becomes a new AMI; the original AMI-ID is appended to the new AMI’s “Description”. The new AMI always starts out private and may need to be made public. (It takes about 15 minutes after making an AMI public for it to show up in a search.) In every other way, it is a duplicate of the original.
To make the image public, select the AMI and from the “Actions” menu and pick “Modify Image Permissions”. As in this image select “Public” and click Save.
Storage¶
AWS provides four basic forms of storage that vary by speed, proximity to their associated instance (which impacts latency/performance), and price.
The sections below describe storage roughly in order of proximity to the instance. To first order, each subsequent type decreases in both performance and cost, with Glacier being the slowest and cheapest. EBS may be the most commonly used.
Instance Store
Instance stores are solid state drives physically attached to the hardware the instance is running on that are available only on certain instance types. (Use of instance stores is beyond the scope of this document, although more information is available at this AWS page.) It is indirectly the most expensive form of storage since instance types with instance store capacity also include extra processor cores and memory. Cost effectiveness is a function of whether the extra cores and memory are utilized. Instance stores do not use redundant hardware. Instance stores cannot preserve their data after their instance stops.
Elastic Block Storage (EBS)
EBS is connected to instances via high speed networks and is stored on redundant hardware. EBS persists independently after a compute instance terminates (although it may be set to terminate instead). EBS storage can be allocated during the creation of an instance, or it may be created separately and attached to an existing instance. It may be detached from one instance and re-attached to another, which is potentially useful where the processing requirements for one stage of processing (e.g., calibration and flagging) are substantially different from a later stage (e.g., imaging).
Simple Storage Service (S3)
S3 storage is an object level store (rather than a block level store) designed for medium to long-term storage. Most software applications like CASA do not interact directly with S3 storage. Instead, one of the AWS Interfaces to S3 is used. Typically, S3 is used to temporarily store data before moving it to an EBS or Instance store volume for processing. Or it is used as long term storage for final products. As of this writing, S3 storage costs range from $150 - $360 TByte/year, depending on whether data is flagged as infrequent access. Longer term storage utilizes Glacier storage.
Glacier
Glacier is the lowest cost AWS storage. Data within S3 can be flagged for migration to Glacier, where it is copied to tape. As of this writing, Glacier storage costs roughly $86 TByte/year. Retrieval from Glacier takes ~4 hours.
Instances¶
An instance is effectively a single computer composed of an OS, processors, memory, base storage and optional additional data storage
Instance Types
Amazon has predefined over 40 instance types. These fall into classes defined roughly by processing power, which are further subdivided by total memory and base storage.
Click here to see a list of all Linux instance types with their number of virtual CPUs (vCPU), total memory in GBytes of RAM, and the type of storage utilized by the instance type.
Note that a “vCPU” is actually a hyperthread, so 2 “vCPUs” equal one core; e.g., m4.xlarge has 2 cores. (These prices are for on-demand instances.)
Starting Instances
CASA requires >=4 GB of RAM per core. Storage can be EBS. Because AWS has hyperthreading turned on, each “vCPU” is one hyperthread, and therefore 2 vCPUs essentially equal 1 physical core. Some experimentation may be required to optimize the instance type if many instances are to be used; this can have a very significant impact on total run time and cost. The 4 GB per physical core rule should be sufficient to get started.
Choosing a Payment Option: On-demand vs. Spot
Choosing an on-demand instance (the default) guarantees you the use of that instance (barring hardware failure).
There is also a Spot Price market; click here to read about it. The price of an instance fluctuates over time as a function of demand. AWS fills spot requests starting at the highest bid and working down to the lowest; the price paid by all spot users is the bid price at which all resources were exhausted. When you request a spot instance, you submit a bid for those resources. If that bid exceeds the current spot price, the spot instance will launch and continue to run as long as the spot price remains below the bid. Typically the spot price is much less than the on-demand price, so bidding the on-demand price usually permits an instance to run its job to completion. While it runs, the instance is billed at the running spot price, not the bid, so the savings can be considerable. If the spot price rises above your bid, your instance will be terminated. Be warned: if demand is excessive for that particular instance type and you bid 2x or more of the on-demand price, you run the risk of the spot price rising to that level. Bidding above the on-demand price is most useful for very long running jobs, where paying more for brief periods is considered acceptable in order to minimize the risk that the instance is terminated due to a low bid.
For example, the on-demand price for a m4.xlarge instance is $0.239 per hour. For the past 3 months, the mean spot price has been $0.0341 (maximum $0.0515 per hour). A 10-hour run of 100 instances would have cost $239 for on-demand and $34 for spot instances. That assumes adding 100 instances to the spot market will not affect the spot price much, which is a reasonable assumption. However, adding 500 instances will certainly raise the spot price.
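The comparison is simple arithmetic; as a sketch, using the example prices quoted above:
# on-demand vs. mean spot cost for 100 m4.xlarge instances running 10 hours
n_instances, hours = 100, 10
on_demand = n_instances * hours * 0.239    # $0.239/hour on-demand
spot = n_instances * hours * 0.0341        # $0.0341/hour mean spot price
print(on_demand, spot)                     # 239.0 vs. ~34.1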
It’s possible to bid up to 10 times the on-demand price.
There are other ways to purchase AWS instances, but only on-demand and spot instances appear of interest to running CASA. See purchasing options.
Monitoring¶
A critical aspect to the financial viability of computing within AWS
Instance Monitoring
After a job finishes on an instance, that instance and its storage are still running and generating charges to the AWS Account Root User. Instances and storage known to be no longer needed can of course be shut down. To check whether an instance is in use, a quick check can be made using the console: click Instances, check the box next to your instance, and select the “Monitoring” tab; CPU utilization is shown. But to be really sure, log in to the instance and check whether your job is finished. If so, you can transfer data off the root volume as needed and terminate the instance.
For more information see: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_SingleMetricPerInstance.html#d0e6752
Storage Monitoring (EBS)
After the instance terminates, the instance ID and data volume names will continue to show up for several minutes. This can be used to find EBS volumes that were attached to the instance. If the user does not want the data to remain on EBS, i.e., does not intend to transfer the EBS volume to another instance or save it for later, then terminating the volume at that time makes sense. Alternatively, the user might want to preserve the data by copying it to S3 or downloading it to a local storage device, and then terminate the EBS volume.
Storage Monitoring (S3)
If you have data in S3 you may wish to leave it there. If you wish to move it to your local storage device, click the cube in the upper left of the console, then choose S3. Unfortunately, the console is clumsy for transferring data, so reading through the Interfaces section of this chapter (and the relevant links), specifically on the use of the AWS CLI, is recommended. Once done with the previously allocated AWS resources, the user can release them back to Amazon.
Interfaces¶
Amazon has multiple interface methods for interacting with resources.
The AWS console is a web interface used to control AWS resources. For users launching isolated instances and querying status, this is likely the simplest and most commonly used interface.
The CLI is the command line interface to AWS. This software, installed on your local computer, can be used to launch and control AWS resources. This interface is most useful for repeated tasks and simple automation.
The python SDK interface is beyond the scope of this document, but more information can be found here. The python interface is most useful for complex automation frameworks and includes a REST-ful interface to all AWS resources.
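As an illustration of the SDK route, the following sketch uses the boto3 Python package (installed separately, e.g. via pip; the AMI ID and key name are placeholders, not NRAO-provided values):
# launch a single instance from an AMI using boto3
import boto3

ec2 = boto3.resource('ec2', region_name='us-west-2')
instances = ec2.create_instances(
    ImageId='ami-00000000',      # placeholder: an AMI ID valid in your region
    InstanceType='m4.xlarge',
    KeyName='mykeyname',         # placeholder: your ssh key pair name
    MinCount=1, MaxCount=1)
print(instances[0].id)           # the new instance's ID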
Example Using CASA¶
Tutorial of CASA on AWS
Overview of Using CASA on AWS
Amazon Web Services (AWS) allows researchers to use instances for NRAO data processing. This section describes readying an instance to process a data set with CASA. The CASA tutorial will be used as a demonstration of running CASA on AWS.
Choose an Instance
The tutorial does not require an instance with a lot of CPU and RAM. Per the hardware requirements page section on memory, 500 MB per core is adequate. An m4.xlarge is more than adequate and costs $0.24/hour; it has 2 cores and 16 GB RAM. A smaller instance such as m4.large would probably work as well, except that it has 1 core and could not run things in parallel.
Get Ready to Start an Instance
Follow the directions on the AMI page to locate an NRAO AMI. Select the AMI, and from the Actions menu, choose Launch. Select the m4.xlarge instance type, which, as mentioned above, should be adequate for the tutorial at 2 cores and 8 GB/core. At this point one would often press the “Review and Launch” button to skip to the end of the process; however, we need to add some storage first.
Start an Instance with Some Extra Storage Space
Select “4. Add Storage” at the top. You can see the root volume is by default 8 GB. For the tutorial we might get by with 8 GB, but enlarging the root volume will remove any doubt: change 8 to 1024 (the upper limit for the root volume). You might notice a checkbox called “Delete on Termination”. For root volumes this is checked by default; unchecking it causes the root volume to persist after shutdown of the instance. The charges for storing a terminated instance’s volume are minimal compared to the charge for a running instance, and the user (or a coworker with adequate privileges) can mount the EBS volume on another instance. After making this selection, click “Review and Launch”, then click “Launch”. You are asked for the ssh key pair that will allow ssh access to the instance you are about to start. If possible, use an existing key pair. Click Launch to start the instance.
Logging into Your Instance
Once the instance has had a couple of minutes to start, you can see the running (and recently terminated) instances in the Instances screen. Your instance is identifiable by the Key Name and the fact that under Status Checks it says “Initializing”. Copy the external IP address and log in to the instance: ssh -i ~/.ssh/mykeyname.pem centos@my-IP-address. If your login is not immediate, try again in a minute or so.
Using an NRAO AMI to start an instance brings up an instance with CASA already installed. Everything you need to run CASA should be there except for the data, which will be downloaded directly to the instance in the next step.
Downloading the Data
In this example we’ll be using the VLA high frequency Spectral line tutorial.
You can bring up that page in a browser on your host computer; there is no need to launch a browser on the AWS instance.
Section 2 “Obtaining the Data” of the tutorial lists the URL where the data can be found; it is repeated below. Once you’ve logged into your instance you can retrieve and unpack the data with the commands below. If you’ve attached a separate storage device to the instance, you should cd to where it was mounted to ensure the data is written to that device. See the storage section for more details.
wget http://casa.nrao.edu/Data/EVLA/IRC10216/day2_TDEM0003_10s_norx.tar.gz
tar xf day2_TDEM0003_10s_norx.tar.gz
Launching and Running CASA
Typing ‘casa’ in the terminal will start the pre-installed version of CASA. The first time it is run, it will take a few minutes to initialize. An IPython interpreter for CASA will eventually open, ready for commands. (The CASA log window should display as well.)
Display the antenna map
#In CASA
plotants(vis='day2_TDEM0003_10s_norx',figfile='ant_locations.png')
Plot the MeasurementSet, amplitude vs. uv-distance
plotms(vis='day2_TDEM0003_10s_norx', field='3', xaxis='uvdist', yaxis='amp',
       correlation='RR,LL', avgchannel='64', spw='0~1:4~60', coloraxis='spw')
Flag data
flagdata(vis='day2_TDEM0003_10s_norx', mode='list',
         inpfile=["field='2,3' antenna='ea12' timerange='03:41:00~04:10:00'",
                  "field='2,3' antenna='ea07,ea08' timerange='03:21:40~04:10:00' spw='1'"])
Transfer data
When you are done with your instance and want to move the data on its root volume to your local storage, you can use scp -r or rsync -a.
Costs¶
Overview of Costs Associated with AWS
Amazon Web Services (AWS) allows researchers to use AWS resources for NRAO data processing. This section presents costs associated with using these resources. Resource types include: Instances, EBS Volumes, EBS Snapshots, S3, and Glacier.
The primary resource utilized is Instances.
Other resources are methods of storing input and output data: EBS, EFS, Snapshots, S3, and Glacier.
The way to contain costs is to first determine the needs of the code that is to be run. Then AWS resources can be matched to those needs as efficiently as possible given the circumstances.
CASA Hardware Requirements
Running CASA
Selecting a suitable instance type and size requires some knowledge about the CASA tasks to be run. The Hardware Requirements page includes sizing guidelines for typical computing hardware that can be translated to AWS instance types.
Choosing an Instance
An instance is the best place to start (allocating resources). A list of on-demand instance costs and capabilities is here: https://aws.amazon.com/ec2/pricing/, though be aware it takes a minute to load and nothing will display under “on-demand” while the data loads. Note that spot instances can be utilized to run jobs at a much reduced cost; this is covered in the Instances section of this document. The goal is to select an instance type that is large enough to do the job but leaves as few resources idle as possible.
File I/O
Default EBS is generally used. However, options exist to specify different types of EBS, e.g., storage with higher iops, etc., that cost more. EBS storage pricing details can be found here: https://aws.amazon.com/ebs/pricing/. For reference, there is a detailed discussion of EBS volume types here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html.
Instance Store
Some instance types are pre-configured with attached SSD storage called “instance store”. If you start such an instance, part of its intrinsic cost is this storage. More details about instance store are here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/add-instance-store-volumes.html#adding-instance-storage-instance.
Selecting an Instance Type for CASA
What to try First
There are over 40 instance types with varying amounts of RAM and CPUs. There are, when you look closely, many storage options, but a few recurrent themes emerge. The simple storage system (S3) is primarily for storing large volumes of data from a few hours to a few months; S3 can be used to stage input and output data, and you can share these data with other users. EBS storage is the most often used “attached” storage, with performance comparable to a very good hard drive attached to your desktop system; notably, its bandwidth goes up with the core count, contrary to ordinary storage. The number of cores and GB of RAM depend entirely on the instance type. If you want 4 GB RAM per core (as recommended in CASA Hardware Requirements) and 10 cores, you can find the closest instance on the list of instance types, https://aws.amazon.com/ec2/instance-types/. m4.4xlarge is close with 16 “vCPUs” and 64 GB. Despite appearances, it does not have enough cores: AWS lists “vCPUs” (a.k.a. hyperthreads) instead of cores. A vCPU is an internal CPU scheduling trick that is useful for certain classes of programs but is nearly always detrimental in scientific computing, e.g., CASA. To summarize, 2 Amazon vCPUs equal 1 core. From here on cores are used, where 1 core = 2 vCPUs.
Continuing with the example of looking for an instance that meets the constraints of 10 cores and 4 GB RAM per core, it makes sense to look at core count first. The listing of instances with their core count, memory, and cost is here: https://aws.amazon.com/ec2/pricing/. There are no instances with 10 cores; the closest have 8 and 16 cores, so we’ll look at >=16-core instances. Also, we’ll discard any instances that do not have RAM >= (#cores * 4 GB). What is left (without looking at immensely large and therefore expensive instances) are these, with a programmatic version of the screening sketched after the list:
m4.10xlarge with 20 cores and 160 GB of RAM. Cost: $2.394/hour
x1.32xlarge with 64 cores and 1962 GB of RAM. Cost: $13.338/hour
r3.8xlarge with 16 cores and 244 GB of RAM. Cost: $2.66/hour
d2.8xlarge with 18 cores and 244 GB of RAM. Cost: $5.52/hour
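The same screening can be expressed programmatically; a sketch using the figures above (2 vCPUs = 1 physical core, as discussed earlier):
# screen instance types: >=16 cores and >=4 GB RAM per core
instance_table = {                  # name -> (vCPUs, GB RAM), values from above
    'm4.10xlarge': (40, 160),
    'x1.32xlarge': (128, 1962),
    'r3.8xlarge': (32, 244),
    'd2.8xlarge': (36, 244),
}
for name, (vcpus, ram_gb) in instance_table.items():
    cores = vcpus // 2              # 2 vCPUs = 1 physical core
    if cores >= 16 and ram_gb >= 4 * cores:
        print(name, cores, 'cores,', ram_gb, 'GB')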
Selecting 10 cores produced a results list that contains the most expensive instances. If it is feasible to use a number of cores that is a power of 2, a more efficient arrangement may result. Looking at instances with 2^3 = 8 cores that also meet the criterion of 4 GB RAM per core, for example:
m4.4xlarge 8 cores, 64 GB RAM. Cost: $0.958/hour
c3.8xlarge 8 cores, 60 GB RAM. Cost: $1.68/hour (instance store)
The c3.8xlarge, although very similar to the m4.4xlarge, costs 75% more per hour. That is because the c3.8xlarge comes pre-configured with local (instance store) storage, which is charged for even when it is not used; this is something to watch out for. Instance store can be useful, but it is tricky to make use of, and its use is outside the scope of this document. When considering 8-core instances, m4.4xlarge appears to be the most attractive option in this case:
m4.4xlarge 8 cores 4 GB/core $0.958/hour
r3.8xlarge 16 cores ~15 GB/core $2.66/hour
r3.4xlarge 8 cores ~7.6 GB/core $1.33/hour
r3.4xlarge is not far behind in price, and it has more RAM as well as 320 GB of instance store storage. Zeroing in on the best instance thus takes some time; however, that time is not well spent unless many instances are to be run or an instance will run for a long period.
What Instance(s) to Consider for Big or Long Jobs
So, to begin, it is probably best to choose EBS as your primary storage, S3 for cold storage, and an instance with >=4 GB RAM per core. A more detailed discussion of these (and other) hardware considerations is given on the Hardware Requirements page; what is covered here is sufficient to get started. Keep in mind that, since AWS has hyperthreading turned on, their “2 cores” means “1 physical core” (2 hyperthreads). For example, an AWS “8 core” instance type is actually only 4 physical cores. CASA does not make good use of virtual cores, so if you want a system with 4 actual cores, select an AWS “8 core” system with >= 16 GB of RAM. That should be sufficient to get started. As you use AWS more, you will want to invest more time in optimizing the instance type based on the details of your processing case. If you are running only a few instances, such optimizations are not worth much effort, but if you plan to run hundreds of jobs, this can have a very significant impact on total run time and cost. The 4 GB per physical core rule should be sufficient to get started, but more demanding imaging tasks will likely require 8 or 16 GB per core.
AWS Storage for CASA
Root Volume
Starting an instance with an NRAO AMI and accepting the storage defaults creates a suitable root volume for CASA. If desired, exhaustive detail on root volumes is available at the AWS website: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/RootDeviceStorage.html.
Additional EBS Volumes
Additional EBS volumes can be added to an instance at any time during its life cycle. See the following link for more information: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-volume.html.
CASA Citation¶
Citation for use in publications
Please cite the following reference when using CASA for publications:
McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007, Astronomical Data Analysis Software and Systems XVI (ASP Conf. Ser. 376), ed. R. A. Shaw, F. Hill, & D. J. Bell (San Francisco, CA: ASP), 127 (ADS link)
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/casa-fundamentals.ipynb
CASA Fundamentals¶
Fundamentals of CASA: Measurement Equation, Science Data Model, and MeasurementSet
Measurement Equation¶
The visibilities measured by an interferometer must be calibrated before formation of an image. This is because the wavefronts received and processed by the observational hardware have been corrupted by a variety of effects. These include (but are not limited to): the effects of transmission through the atmosphere, the imperfect details of the amplified electronic (digital) signal and its transmission through the signal processing system, and the effects of formation of the cross-power spectra by a correlator. Calibration is the process of reversing these effects to arrive at corrected visibilities which resemble as closely as possible the visibilities that would have been measured in vacuum by a perfect system. The subject of this chapter is the determination of these effects by using the visibility data itself.
The HBS Measurement Equation
The relationship between the observed and ideal (desired) visibilities on the baseline between antennas i and j may be expressed by the Hamaker-Bregman-Sault Measurement Equation (Hamaker, Bregman, & Sault 1996 [1]; Sault, Hamaker, & Bregman 1996 [2]):

\(\vec{V}_{ij}~=~J_{ij}~\vec{V}_{ij}^{\mathrm{~IDEAL}}\)
where \(\vec{V}_{ij}\) represents the observed visibility, a complex number representing the amplitude and phase of the correlated data from a pair of antennas in each sample time, per spectral channel. \(\vec{V}_{ij}^{\mathrm{~IDEAL}}\) represents the corresponding ideal visibilities, and \(J_{ij}\) represents the accumulation of all corruptions affecting baseline \(ij\). The visibilities are indicated as vectors spanning the four correlation combinations which can be formed from dual-polarization signals. These four correlations are related directly to the Stokes parameters which fully describe the radiation. The \(J_{ij}\) term is therefore a \(4\times4\) matrix. Most of the effects contained in \(J_{ij}\) (indeed, the most important of them) are antenna-based, i.e., they arise from measurable physical properties of (or above) individual antenna elements in a synthesis array. Thus, adequate calibration of an array of \(N_{ant}\) antennas forming \(N_{ant} (N_{ant}-1)/2\) baseline visibilities is usually achieved through the determination of only \(N_{ant}\) factors, such that \(J_{ij} = J_i \otimes J_j^{*}\). For the rest of this chapter, we will usually assume that \(J_{ij}\) is factorable in this way, unless otherwise noted.
As implied above, \(J_{ij}\) may also be factored into a sequence of specific corrupting effects, each having its own particular (relative) importance and physical origin, which determines its unique algebra. Including the most commonly considered effects, the Measurement Equation can be written:

\(\vec{V}_{ij} = M_{ij} \; B_{ij} \; G_{ij} \; D_{ij} \; E_{ij} \; P_{ij} \; T_{ij} \; \vec{V}_{ij}^{\mathrm{~IDEAL}}\)
where:
\(T_{ij}~=~\) Polarization-independent multiplicative effects introduced by the troposphere, such as opacity and path-length variation.
\(P_{ij}~=~\) Parallactic angle, which describes the orientation of the polarization coordinates on the plane of the sky. This term varies according to the type of the antenna mount.
\(E_{ij}~=~\) Effects introduced by properties of the optical components of the telescopes, such as the collecting area’s dependence on elevation.
\(D_{ij}~=~\) Instrumental polarization response. “D-terms” describe the polarization leakage between feeds (e.g. how much the R-polarized feed picked up L-polarized emission, and vice versa).
\(G_{ij}~=~\) Electronic gain response due to components in the signal path between the feed and the correlator. This complex gain term \(G_{ij}\) includes the scale factor for absolute flux density calibration, and may include phase and amplitude corrections due to changes in the atmosphere (in lieu of \(T_{ij}\)). These gains are polarization-dependent.
\(B_{ij}~=~\) Bandpass (frequency-dependent) response, such as that introduced by spectral filters in the electronic transmission system
\(M_{ij}~=~\) Baseline-based correlator (non-closing) errors. By definition, these are not factorable into antenna-based parts.

Note that the terms are listed in the order in which they affect the incoming wavefront (\(G\) and \(B\) represent an arbitrary sequence of such terms depending upon the details of the particular electronic system), and that \(M\) differs from all of the rest in that it is not antenna-based, and thus not factorable into terms for each antenna.

As written above, the Measurement Equation is very general; not all observations will require treatment of all effects, depending upon the desired dynamic range. For example, instrumental polarization calibration can usually be omitted when observing (only) total intensity using circular feeds. Ultimately, however, each of these effects occurs at some level, and a complete treatment will yield the most accurate calibration. Modern high-sensitivity instruments such as ALMA and the JVLA will likely require a more general calibration treatment than similar observations with older arrays in order to reach the advertised dynamic ranges on strong sources.

In practice, it is usually far too difficult to adequately measure most calibration effects absolutely (as if in the laboratory) for use in calibration; the effects are usually far too changeable. Instead, the calibration is achieved by making observations of calibrator sources on the appropriate timescales for the relevant effects, and solving the Measurement Equation for them using the fact that we have \(N_{ant}(N_{ant}-1)/2\) measurements and only \(N_{ant}\) factors to determine (except for \(M\), which is only sparingly used).

Note: by partitioning the calibration factors into a series of consecutive effects, it might appear that the number of free parameters is some multiple of \(N_{ant}\), but the relative algebra and timescales of the different effects, as well as the multiplicity of observed polarizations and channels, compensate, and it can be shown that the problem remains well-determined until, perhaps, the effects are direction-dependent within the field of view. Limited solvers for such effects are under study; the calibrater tool currently only handles effects which may be assumed constant within the field of view. Corrections for the primary beam are handled in the imager tool.

Once determined, these terms are used to correct the visibilities measured for the scientific target. This procedure is known as cross-calibration (when only phase is considered, it is called phase-referencing).
The best calibrators are point sources at the phase center (constant visibility amplitude, zero phase), with sufficient flux density to determine the calibration factors with adequate SNR on the relevant timescale. The primary gain calibrator must be sufficiently close to the target on the sky that its observations sample the same atmospheric effects. A bandpass calibrator usually must be sufficiently strong (or observed with sufficient duration) to provide adequate per-channel sensitivity for a useful calibration. In practice, several calibrators are usually observed, each with properties suitable for one or more of the required calibrations.

Synthesis calibration is inherently a bootstrapping process. First, the dominant calibration term is determined, and then, using this result, more subtle effects are solved for, until the full set of required calibration terms is available for application to the target field. The solutions for each successive term are relative to the previous terms. Occasionally, when the several calibration terms are not sufficiently orthogonal, it is useful to re-solve for earlier types using the results for later types, in effect reducing the effect of the later terms on the solution for earlier ones, and thus better isolating them. This idea is a generalization of the traditional concept of self-calibration, where initial imaging of the target source supplies the visibility model for a re-solve of the gain calibration (\(G\) or \(T\)). Iteration tends toward convergence to a statistically optimal image. In general, the quality of each calibration and of the source model are mutually dependent. In principle, as long as the solution for any calibration component (or the source model itself) is likely to improve substantially through the use of new information (provided by other improved solutions), it is worthwhile to continue this process.

In practice, these concepts motivate certain patterns of calibration for different types of observation, and the calibrater tool in CASA is designed to accommodate these patterns in a general and flexible manner. For a spectral line total intensity observation, the pattern is usually as follows (a task-level sketch appears after the list):
Solve for \(G\) on the bandpass calibrator
Solve for \(B\) on the bandpass calibrator, using \(G\)
Solve for \(G\) on the primary gain (near-target) and flux density calibrators, using \(B\) solutions just obtained
Scale \(G\) solutions for the primary gain calibrator according to the flux density calibrator solutions
Apply \(G\) and \(B\) solutions to the target data
Image the calibrated target data
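Expressed in CASA task calls, this pattern might look like the following minimal sketch; the MS name, field names, reference antenna, and calibration table names are all placeholders to be adapted to a real dataset:

# Hedged sketch of the spectral-line pattern above; all names are placeholders.
gaincal(vis='my.ms', caltable='bpphase.gcal', field='bpcal',
        solint='int', refant='ea05', calmode='p')                   # G on the bandpass calibrator
bandpass(vis='my.ms', caltable='bandpass.bcal', field='bpcal',
         gaintable=['bpphase.gcal'], solint='inf')                  # B, using G
gaincal(vis='my.ms', caltable='amp.gcal', field='fluxcal,gaincal',
        solint='inf', refant='ea05', gaintable=['bandpass.bcal'])   # G on gain and flux calibrators
fluxscale(vis='my.ms', caltable='amp.gcal', fluxtable='flux.gcal',
          reference='fluxcal')                                      # scale G by the flux calibrator
applycal(vis='my.ms', field='target',
         gaintable=['bandpass.bcal', 'flux.gcal'])                  # apply G and B to the target
tclean(vis='my.ms', field='target', imagename='target_cube',
       specmode='cube', niter=1000)                                 # image the calibrated data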
If opacity and gain curve information are relevant and available, these types are incorporated in each of the steps (in future, an actual solve for opacity from appropriate data may be folded into this process):
Solve for \(G\) on the bandpass calibrator, using \(T\) (opacity) and \(E\) (gain curve) solutions already derived.
Solve for \(B\) on the bandpass calibrator, using \(G\), \(T\) (opacity), and \(E\) (gain curve) solutions.
Solve for \(G\) on primary gain (near-target) and flux density calibrators, using \(B\), \(T\) (opacity), and \(E\) (gain curve) solutions.
Scale \(G\) solutions for the primary gain calibrator according to the flux density calibrator solutions
Apply \(T\) (opacity), \(E\) (gain curve), \(G\), and \(B\) solutions to the target data
Image the calibrated target data
For continuum polarimetry, the typical pattern is as follows (a task-level sketch appears after the list):
Solve for \(G\) on the polarization calibrator, using (analytical) \(P\) solutions.
Solve for \(D\) on the polarization calibrator, using \(P\) and \(G\) solutions.
Solve for \(G\) on primary gain and flux density calibrators, using \(P\) and \(D\) solutions.
Scale \(G\) solutions for the primary gain calibrator according to the flux density calibrator solutions.
Apply \(P\), \(D\), and \(G\) solutions to target data.
Image the calibrated target data.
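A corresponding hedged sketch in CASA task calls, with placeholder names throughout; the parallactic angle term \(P\) is applied analytically via parang=True:

gaincal(vis='my.ms', caltable='pol.gcal', field='polcal',
        solint='int', refant='ea05', parang=True)         # G, with analytic P applied
polcal(vis='my.ms', caltable='dterms.dcal', field='polcal',
       poltype='Df', gaintable=['pol.gcal'])              # D-terms, using P and G
applycal(vis='my.ms', field='target', parang=True,
         gaintable=['pol.gcal', 'dterms.dcal'])           # apply P, D, and G to the target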
For a spectro-polarimetry observation, these two examples would be folded together.

In all cases the calibrator model must be adequate at each solve step. At high dynamic range and/or high resolution, many calibrators which are nominally assumed to be point sources become slightly resolved. If this has biased the calibration solutions, the offending calibrator may be imaged at any point in the process and the resulting model used to improve the calibration. Finally, if sufficiently strong, the target may be self-calibrated as well.
General Calibrater Mechanics
The calibrater tasks/tool are designed to solve and apply solutions for all of the solution types listed above (and more are in the works). This leads to a single basic sequence of execution for all solves, regardless of type:
1. Set the calibrator model visibilities
2. Select the visibility data which will be used to solve for a calibration type
3. Arrange to apply any already-known calibration types (the first time through, none may yet be available)
4. Arrange to solve for a specific calibration type, including specification of the solution timescale and other specifics
5. Execute the solve process
Repeat steps 1-5 for all required types, using each result, as it becomes available, in step 3, and perhaps repeating for some types to improve the solutions.
By itself, this sequence doesn’t guarantee success; the data provided for the solve must have sufficient SNR on the appropriate timescale, and must provide sufficient leverage for the solution (e.g., D solutions require data taken over a sufficient range of parallactic angle in order to separate the source polarization contribution from the instrumental polarization).
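In the toolkit, the same sequence might be sketched as follows with the calibrater (cb) tool; the MS, field, and table names are placeholders, and step 1 (setting the calibrator model) is typically done beforehand with, e.g., the setjy task:

cb.open('my.ms')                                  # attach the calibrater to the MS
cb.selectvis(field='bpcal')                       # 2. select data for this solve
cb.setapply(type='G', table='amp.gcal')           # 3. arrange to apply known calibration
cb.setsolve(type='B', t='inf', table='bp.bcal')   # 4. arrange the solve for B
cb.solve()                                        # 5. execute the solve
cb.close()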
Science Data Model¶
It was decided relatively early in the preparatory phase of ALMA and the EVLA that the two projects would:
use the same data analysis software (CASA) and
use essentially the same archive data format, the Astronomy Science Data Model (ASDM), also referred to as the ALMA Science Data Model for ALMA data or Science Data Model (SDM) for VLA data.
The ASDM was developed to a first prototype by Francois Viallefond (Observatoire de Paris) as an extension and full generalisation of the MeasurementSet. The ASDM is superior to the MS with respect to the storage of observatory raw data in that it can capture the metadata of an interferometric or total-power dataset completely, without any compromise, including all data relevant for calibration and observatory administration.
Just like the MS, the ASDM can be thought of as a relational database, and the two have in principle a very similar layout. However, while the MS has only 12 required subtables, the ASDM typically uses 40 subtables, and there are more optional ones.
ASDMs, however, are meant for data storage; data reduction should be done on the MeasurementSet (although when importing data through importasdm with option lazy=True, the ASDM is restructured to resemble an MS).
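A minimal hedged example of such an import (the ASDM name is a placeholder):

importasdm(asdm='uid___A002_Xabc_X123', vis='my.ms', lazy=True)   # DATA column stays virtual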
For the implementation of the ASDM, (then) novel source-code generation techniques were applied which permitted simultaneous implementation in Java and C++. As the actual representation of the data on disk, a hybrid format was chosen: all low-volume metadata is stored as XML files (one per table) while the bulk data is stored in a binary format (MIME) in so-called Binary Large Objects (BLOBs). In particular, the Main table is stored as a series of BLOBs of a few GB each with lossless compression. This makes the ASDM more efficient as a bulk data format than the MS, which stores the DATA column of the Main table as one single monolithic file.
An up-to-date description of the tables of the ASDM is given in this pdf.
This tar file contains the XML Schema Definition (xsd) files for all of the tables described in the associated ASDM Short Table Description. Use "tar xvfz 0asdmSchematas_v8Dec2020.tgz" to extract its contents.
This tar file contains the XML Schema Definition (xsd) files for all of the enumerations used by the ASDM tables. Use "tar xvfz 0enumerationsSchematas.tgz" to extract its contents.
The binary data format is given in this pdf.
MeasurementSet Basics¶
Data is handled in CASA via the table system. In particular, visibility data are stored in a CASA table known as a MeasurementSet (MS). Details of the physical and logical MS structure are given below, but for our purposes here an MS is just a construct that contains the data. An MS can also store single dish data (as an auto-correlation-only data set), see “Single-dish data calibration and reduction”.
A full description of the MeasurementSet can be found here, and a description of the MS model column can be found in the Synthesis Calibration section.
Inside the Toolkit: MeasurementSets are handled in the ms tool. Import and export methods include ms.fromfits and ms.tofits.
NOTE: Images are handled through special image tables, although standard FITS I/O is also supported. Images and image data are described in “Dealing with Images”.
The headers of any FITS files can be displayed in the logger with the listfits task:
#listfits :: List the HDU and typical data rows of a fits file:
fitsfile = '' # Name of input fits file
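For example, assuming a placeholder file name:

listfits(fitsfile='input.fits')   # header summary appears in the CASA logger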
More Information on how to access Visibility Data is provided in the “Data Examination and Editing” chapter.
Unless your data was previously processed by CASA, you will need to import it into CASA as an MS. Supported formats include some “standard” flavors of UVFITS, the VLA “Export” archive format, and most recently, the Astronomy Science Data Model (ASDM) format. These are described in “UV Data Import”.
Once in MeasurementSet form, your data can be accessed through various tools and tasks with a common interface. The most important of these is the data selection interface, which allows you to specify the subset of the data on which the tasks and tools will operate.
Under the Hood: Structure of the MeasurementSet
Inside the Toolkit: Generic CASA tables are handled in the tb tool. You have direct access to keywords, rows and columns of the tables with the methods of this tool.
It is not necessary that a casual CASA user know the specific details of how the data in the MS are stored or the contents of all the sub-tables. However, CASA Docs occasionally refers to specific "columns" of the MS when describing the actions of various tasks, and thus we provide the following synopsis to familiarize the user with the necessary nomenclature.
All CASA data files, including MeasurementSets, are written into the current working directory by default, with each CASA table represented as a separate sub-directory. MS names therefore need only comply with UNIX file or directory naming conventions, and can be referred to from within CASA directly, or via full path names.
An MS consists of a MAIN table containing the visibility data and associated sub-tables containing auxiliary or secondary information. The tables are logical constructs, with contents located in the physical table.* files on disk. The MAIN table consists of the table.* files in the main directory of the MS-file itself, and the other tables are in the respective subdirectories. The various MS tables and sub-tables can be seen by listing the contents of the MS directory itself (e.g. using Unix ls), or via the browsetable task.
See figure 1 for an example of the contents of an MS directory. Or, from the casa prompt,
CASA <1>: ls ngc5921.ms #IPython system call: ls -F ngc5921.ms
ANTENNA POLARIZATION table.f1 table.f3_TSM1 table.f8
DATA_DESCRIPTION PROCESSOR table.f10 table.f4 table.f8_TSM1
FEED SORTED_TABLE table.f10_TSM1 table.f5 table.f9
FIELD SOURCE table.f11 table.f5_TSM1 table.f9_TSM1
FLAG_CMD SPECTRAL_WINDOW table.f11_TSM1 table.f6 table.info
HISTORY STATE table.f2 table.f6_TSM0 table.lock
OBSERVATION table.dat table.f2_TSM1 table.f7
POINTING table.f0 table.f3 table.f7_TSM1
NOTE: The MAIN table information is contained in the table.* files in this directory.
Each of the sub-table sub-directories contain their own table.dat and other files, e.g.
CASA <2>: ls ngc5921.ms/SOURCE #IPython system call: ls -F ngc5921.ms/SOURCE
table.dat table.f0 table.f0i table.info table.lock
Figure 1: The contents of a MeasurementSet. These tables compose a MeasurementSet named ngc5921.demo.ms on disk. This display is obtained by using the File:Open menu in browsetable and left double-clicking on the ngc5921.demo.ms directory.
Each “row” in a table contains entries for a number of specified “columns”. For example, in the MAIN table of the MS, the original visibility data is contained in the DATA column — each “cell” contains a matrix of observed complex visibilities for that row at a single time stamp, for a single baseline in a single spectral window. The shape of the data matrix is given by the number of channels and the number of correlations (voltage-products) formed by the correlator for an array.
Table 1 lists the non-data columns of the MAIN table that are most important during a typical data reduction session. Table 2 at the bottom lists the key data columns of the MAIN table of an interferometer MS. The MS produced by fillers for specific instruments may insert special columns, such as ALMA_PHASE_CORR, ALMA_NO_PHAS_CORR and ALMA_PHAS_CORR_FLAG_ROW for ALMA data filled using the importasdm filler. These columns are visible in browsetable and are accessible from the toolkit in the ms tool (e.g. the ms.getdata method) and from the tb “table” tool (e.g. using tb.getcol).
NOTE: When you examine table entries for IDs such as FIELD_ID or DATA_DESC_ID, you will see 0-based numbers.
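As a brief hedged illustration with a placeholder MS name, the tb tool exposes these columns directly:

tb.open('my.ms')
field_ids = tb.getcol('FIELD_ID')   # 0-based indices into the FIELD sub-table
uvw = tb.getcol('UVW')              # (3, Nrow) array of uvw coordinates
tb.close()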
| Parameter | Contents |
|---|---|
| ANTENNA1 | First antenna in baseline |
| ANTENNA2 | Second antenna in baseline |
| FIELD_ID | Field (source no.) identification |
| DATA_DESC_ID | Spectral window number, polarization identifier pair (IF no.) |
| ARRAY_ID | Subarray number |
| OBSERVATION_ID | Observation identification |
| POLARIZATION_ID | Polarization identification |
| SCAN_NUMBER | Scan number |
| TIME | Integration midpoint time |
| UVW | UVW coordinates |
Table 1: Common columns in the MAIN table of the MS.
The MS can contain a number of “scratch” columns, which are used to hold useful versions of other columns such as the data or weights for further processing. The most common scratch columns are:
CORRECTED_DATA — used to hold calibrated data for imaging or display;
MODEL_DATA — holds the Fourier inversion of a particular model image for calibration or imaging. This column is optional.
The creation and use of the scratch columns is generally done behind the scenes, but you should be aware that they are there (and when they are used).
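One way to see whether the scratch columns exist yet is to list the MAIN-table columns with the tb tool (the MS name is a placeholder):

tb.open('my.ms')
cols = tb.colnames()
print('CORRECTED_DATA' in cols, 'MODEL_DATA' in cols)   # False until a task creates them
tb.close()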
| Column | Format | Contents |
|---|---|---|
| DATA | Complex(Nc, Nf) | complex visibility data matrix (= ALMA_PHASE_CORR by default) |
| FLAG | Bool(Nc, Nf) | cumulative data flags |
| WEIGHT | Float(Nc) | weight for a row |
| SIGMA | Float(Nc) | sigma for a row |
| WEIGHT_SPECTRUM | Float(Nc, Nf) | individual weights for a data matrix |
| SIGMA_SPECTRUM | Float(Nc, Nf) | individual sigmas for a data matrix |
| ALMA_PHASE_CORR | Complex(Nc, Nf) | on-line phase corrected data (not in VLA data) |
| ALMA_NO_PHAS_CORR | Bool(Nc, Nf) | data that has not been phase corrected (not in VLA data) |
| ALMA_PHAS_CORR_FLAG_ROW | Bool(Nc, Nf) | flag to use phase-corrected data or not (not in VLA data) |
| MODEL_DATA | Complex(Nc, Nf) | Scratch: created by calibrater or imager tools |
| CORRECTED_DATA | Complex(Nc, Nf) | Scratch: created by calibrater or imager tools |
Table 2: Commonly accessed MAIN Table data-related columns. NOTE: The columns ALMA_PHASE_CORR, ALMA_NO_PHAS_CORR and ALMA_PHAS_CORR_FLAG_ROW are specific to ALMA data filled using the importasdm filler.
Data flags can be set in the MS, too. Whenever a flag is set, the data are ignored in all processing steps but are not physically deleted from the MS. The flags are channel-based and stored in the FLAG column of the MAIN table. Backups can be stored in the MS.flagversions file, which can be accessed via the flagmanager task.
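For example, hedged with a placeholder MS and version name:

flagmanager(vis='my.ms', mode='save', versionname='before_rfi')     # snapshot current flags
flagmanager(vis='my.ms', mode='list')                               # list saved flag versions
flagmanager(vis='my.ms', mode='restore', versionname='before_rfi')  # roll back if needed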
The most recent specification for the MS is MeasurementSet definition version 2.0.
MeasurementSet v2¶
The MeasurementSet version 2 [3] is a database designed to hold radio astronomical data to be calibrated following the Measurement Equation approach of Hamaker, Bregman, and Sault (1996).
Since its publication, the MeasurementSet (MS) design has been implemented by several software development groups, among them the CASA team and, e.g., the European VLBI Network team. CASA has also adopted the Measurement Equation as its fundamental calibration scheme and has thus embraced the MS as its native way to store radio observations. With CASA becoming the designated analysis package for ALMA and the VLA, this means that the MS is now the default way of storing ALMA and VLA data during the actual analysis.
The ALMA and VLA raw data format, however, is not the MS but the so-called Astronomy Science Data Model (ASDM), also referred to as the ALMA Science Data Model for ALMA, and the Science Data Model (SDM) for the VLA. The ALMA and VLA archives hence do not store data in MS format but in ASDM format, and when a CASA user starts to work with this data, the first step has to be the import of the ASDM into the CASA MS format.
The MS is effectively a relational database which on the one hand tries to permit the storage of all imaginable radio (interferometric, single-dish) data with corresponding metadata, and on the other hand ventures to be storage-space and data-maintenance efficient by avoiding data redundancy.
The universality is achieved by offering many optional parts in the format which cover most imaginable use cases in radio astronomy. So a simple, few-antenna interferometer observing a simple object with time-independent position at just a single frequency can store its data using a small sub-set of the format while a large interferometer with antennas on time-dependent locations, observing many objects in rapid succession with time-dependent source positions using a complex, time-dependent spectral setup etc., can equally use the MS to store its data albeit using a larger subset of the possibilities of the MS.
The non-redundancy of the format is achieved by following the standard approach of relational databases: repeating pieces of information are put into separate database tables, the Subtables, and are replaced in the main body of the database, the Main table, by references to the Subtables. In the case of the MS this happens in two layers of Subtables, with the first layer being referenced by the Main table and the second layer being referenced by the first layer; i.e., there are some Subtables which reference other Subtables.

The Subtable referencing mechanism is defined in the original design. It works either via the row numbers of the individual Subtable (this implies that the reference is a zero-based integer, and that the removal of a row in such a Subtable requires reindexing in the referencing table(s)), or via explicit references to an index column in the Subtable (the latter is much less common).

These design principles lead to a format which puts the bulk of the data (the interferometric visibilities and/or the single-dish total-power measurements with their timestamps) into a Main table, and most of the metadata into the two layers of Subtables.
In the CASA MS implementation, the individual Tables are all stored in the CASA Table format, i.e. they are actually not single files on disk but directories containing several files, essentially one for each column of the table. So the entire MS is also not a single file (like, e.g., in the FITS IDI format) but a whole directory tree. For transport, the MS typically has to be turned into a single file by using the command “tar”.
The Main Table contains the radio data initially in a column called DATA (interferometric data) or FLOAT_DATA (pure single-dish data). One of these two columns always has to be present.
When a calibration is applied to the DATA column, a CORRECTED_DATA column is created to contain the calibrated data leaving the original data untouched. Furthermore, a MODEL_DATA column can be required to store expectation values for the emission of calibration sources.
For large datasets these bulk data columns can require large amounts of disk space and access to them may be slow. To mitigate these problems, the CASA team is working on making the columns “virtual” as much as possible, i.e. replacing the CORRECTED_DATA and MODEL_DATA columns by parameterised versions calculated on-the-fly.
In the case of the virtual MODEL_DATA column, this is essentially a model image which is stored with the MS and converted on-the-fly to visibilities.
In the case of the virtual CORRECTED_DATA column, this is a so-called "Cal Library" which permits calibrating the data in the DATA column on the fly and makes the results available as if they were stored in a standard table column.
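As a hedged sketch, a Cal Library is a plain-text file listing calibration tables and how they should be interpolated, and tasks that accept one can use it in place of a filled CORRECTED_DATA column; all names below are placeholders:

# --- contents of a hypothetical mycal.txt ---
# caltable='bandpass.bcal' calwt=True tinterp='nearest'
# caltable='flux.gcal' calwt=True tinterp='linear'
applycal(vis='my.ms', docallib=True, callib='mycal.txt')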
Finally, a major case of data redundancy for ALMA and VLA data is of course the fact that the raw data arrive at the user in ASDM format but then have to be translated into MS format, which creates a completely redundant copy of all raw data without any gain for the user. This problem was addressed by introducing the so-called "lazy" import of ASDM data. The development is not yet completely finished but is already available for ALMA interferometric data. The idea is to also make the DATA column virtual and perform the translation from the ASDM format on the fly. This typically shrinks the MS by a factor of ~30 in data volume. Of course, the ASDM raw data has to be kept on disk for access. Access speeds to a virtual DATA column are essentially the same as to a non-virtual one; they may even be a little faster since the ASDM data is better compressed.
MS v2.0 Layout¶
CASA uses the MeasurementSet Version 2 (A. J. Kemball and M. H. Wieringa, eds., 2000) as its internal working data format. The MeasurementSet was originally defined in AIPS++ Note 191 (Wieringa and Cornwell 1996). Reproduced below is the table structure for the MeasurementSet as used by CASA.
There is a MAIN table containing a number of data columns and keys into various subtables. There is at most one of each subtable. The subtables are stored as keywords of the MS, and all defined sub-tables are tabulated below. Optional sub-tables are shown in italics and in parentheses.
Subtables
| Table | Contents | Keys |
|---|---|---|
| ANTENNA | Antenna characteristics | ANTENNA_ID |
| DATA_DESCRIPTION | Data description | DATA_DESC_ID |
| *(DOPPLER)* | Doppler tracking | DOPPLER_ID, SOURCE_ID |
| FEED | Feed characteristics | FEED_ID, ANTENNA_ID, TIME, SPECTRAL_WINDOW_ID |
| FIELD | Field position | FIELD_ID |
| FLAG_CMD | Flag commands | TIME |
| *(FREQ_OFFSET)* | Frequency offset information | ANTENNAn, FEED_ID, TIME, SPECTRAL_WINDOW_ID |
| HISTORY | History information | OBSERVATION_ID, TIME |
| OBSERVATION | Observer, schedule, etc. | OBSERVATION_ID |
| POINTING | Pointing information | ANTENNA_ID, TIME |
| POLARIZATION | Polarization setup | POLARIZATION_ID |
| PROCESSOR | Processor information | PROCESSOR_ID |
| *(SOURCE)* | Source information | SOURCE_ID, SPECTRAL_WINDOW_ID, TIME |
| SPECTRAL_WINDOW | Spectral window setups | SPECTRAL_WINDOW_ID |
| STATE | State information | STATE_ID |
| *(SYSCAL)* | System calibration characteristics | FEED_ID, ANTENNA_ID, TIME, SPECTRAL_WINDOW_ID |
| *(WEATHER)* | Weather info for each antenna | ANTENNA_ID, TIME |
Note that there are two types of subtables. For the first, simpler type, the key (ID) is the row number in the subtable. Examples are FIELD, SPECTRAL_WINDOW, OBSERVATION and PROCESSOR. For the second, the key is a collection of parameters, usually including TIME. Examples are FEED, (SOURCE), (SYSCAL), and (WEATHER).
Note that all optional columns are indicated in italics and in parentheses.
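For the simpler type, the direct row-number indexing can be exercised with the tb tool (the MS name is a placeholder):

tb.open('my.ms/FIELD')
field_names = tb.getcol('NAME')   # row i is the field with FIELD_ID == i
tb.close()
tb.open('my.ms')
first_field = field_names[tb.getcol('FIELD_ID')[0]]   # field name of the first MAIN row
tb.close()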
MAIN table: Data, Coordinates and Flags¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Keywords** | | | | |
| MS_VERSION | Float | | | MS format version |
| *(SORT_COLUMNS)* | String | | | Sort columns |
| *(SORT_ORDER)* | String | | | Sort order |
| **Columns: key** | | | | |
| TIME | Double | s | EPOCH | Integration midpoint |
| *(TIME_EXTRA_PREC)* | Double | s | | Extra TIME precision |
| ANTENNA1 | Int | | | First antenna |
| ANTENNA2 | Int | | | Second antenna |
| *(ANTENNA3)* | Int | | | Third antenna |
| FEED1 | Int | | | Feed on ANTENNA1 |
| FEED2 | Int | | | Feed on ANTENNA2 |
| *(FEED3)* | Int | | | Feed on ANTENNA3 |
| DATA_DESC_ID | Int | | | Data description id. |
| PROCESSOR_ID | Int | | | Processor id. |
| *(PHASE_ID)* | Int | | | Phase id. |
| FIELD_ID | Int | | | Field id. |
| **Columns: non-key attributes** | | | | |
| INTERVAL | Double | s | | Sampling interval |
| EXPOSURE | Double | s | | The effective integration time |
| TIME_CENTROID | Double | s | EPOCH | Time centroid |
| *(PULSAR_BIN)* | Int | | | Pulsar bin number |
| *(PULSAR_GATE_ID)* | Int | | | Pulsar gate id. |
| SCAN_NUMBER | Int | | | Scan number |
| ARRAY_ID | Int | | | Subarray number |
| OBSERVATION_ID | Int | | | Observation id. |
| STATE_ID | Int | | | State id. |
| *(BASELINE_REF)* | Bool | | | Reference antenna |
| UVW | Double(3) | m | UVW | UVW coordinates |
| *(UVW2)* | Double(3) | m | UVW | UVW (baseline 2) |
| **Columns: data** | | | | |
| *(DATA)* | Complex(Nc, Nf) | | | Complex visibility matrix (synthesis arrays) |
| *(FLOAT_DATA)* | Float(Nc, Nf) | | | Float data matrix (single dish) |
| *(VIDEO_POINT)* | Complex(Nc) | | | Video point |
| *(LAG_DATA)* | Complex(Nc, Nl) | | | Correlation function |
| SIGMA | Float(Nc) | | | Estimated rms noise for single channel |
| *(SIGMA_SPECTRUM)* | Float(Nc, Nf*) | | | Estimated rms noise per channel |
| WEIGHT | Float(Nc) | | | Weight for whole data matrix |
| *(WEIGHT_SPECTRUM)* | Float(Nc, Nf*) | | | Weight for each channel |
| **Columns: flag information** | | | | |
| FLAG | Bool(Nc, Nf*) | | | Cumulative data flags |
| FLAG_CATEGORY | Bool(Nc, Nf*, Ncat) | | | Flag categories |
| FLAG_ROW | Bool | | | The row flag |
MS_VERSION - The MeasurementSet format revision number, expressed as major_revision.minor_revision. This version is 2.0.
SORT_COLUMNS - Sort indices, in the form \(index_1\ index_2\ \cdots\), for the underlying MS. A string containing "NONE" reflects no sort order. An example might be SORT_COLUMNS="TIME ANTENNA1 ANTENNA2", to indicate sorting in time-baseline order.
SORT_ORDER - Sort order as either "ASCENDING" or "DESCENDING".
TIME - Mid-point (not centroid) of the data interval. Time is provided as Modified Julian Date in seconds; the CASA/casacore reference epoch (zero time) for timestamps in MeasurementSets is the MJD epoch 1858/11/17 (a small conversion sketch follows this list of column descriptions).
TIME_EXTRA_PREC - Extra time precision.
ANTENNAn - Antenna number (≥ 0), and a direct index into the ANTENNA sub-table row number. For n > 2, triple-product data are implied.
FEEDn - Feed number (≥ 0). For n > 2, triple-product data are implied.
DATA_DESC_ID - Data description identifier (≥0), and a direct index into the DATA_DESCRIPTION sub-table rownr.
PROCESSOR_ID - Processor identifier (≥ 0), and a direct index into the PROCESSOR sub-table row number.
PHASE_ID - Switching phase identifier (≥ 0).
FIELD_ID - Field identifier (≥0).
INTERVAL - Data sampling interval. This is the nominal data interval and does not include the effects of bad data or partial integration.
EXPOSURE - Effective data interval, including bad data and partial averaging.
PULSAR_BIN - Pulsar bin number for the data record. Pulsar data may be measured for a limited number of pulse phase bins. The pulse phase bins are described in the PULSAR sub-table and indexed by this bin number.
PULSAR_GATE_ID - Pulsar gate identifier (≥0), and a direct index into the PULSAR_GATE sub-table rownr.
SCAN_NUMBER - Arbitrary scan number to identify data taken in the same logical scan. Not required to be unique.
ARRAY_ID - Subarray identifier (≥0), which identifies data in separate subarrays.
OBSERVATION_ID - Observation identifier (≥0), which identifies data from separate observations.
STATE_ID - State identifier (≥ 0), which identifies information relating to active reference signals or loads.
BASELINE_REF - Flag to indicate the original correlator reference antenna for baseline-based correlators (True for ANTENNA1; False for ANTENNA2).
UVW - uvw coordinates for the baseline from ANTENNA2 to ANTENNA1, i.e. the baseline is equal to the difference POSITION2 - POSITION1. The UVW given are for the TIME_CENTROID, and correspond in general to the reference type for the PHASE_DIR of the relevant field, i.e. J2000 if the phase reference direction is given in J2000 coordinates. However, any known reference is valid. Note that the choice of baseline direction and UVW definition (W towards source direction; V in plane through source and system's pole; U in direction of increasing longitude coordinate) also determines the sign of the phase of the recorded data.
UVW2 - uvw coordinates for the baseline from ANTENNA3 to ANTENNA1 (triple-product data only), i.e. the baseline is equal to the difference POSITION3 - POSITION1. The remarks for UVW above apply equally here.
DATA, FLOAT_DATA, LAG_DATA - At least one of these columns should be present in a given MeasurementSet. In special cases one or more could be present (e.g., single dish data used in synthesis imaging, or a mix of auto and cross correlations on a multi-feed single dish). If only correlation functions are stored in the MS, then Nf* is the maximum number of lags (Nl) specified in the LAG table for this LAG_ID. If both correlation functions and frequency spectra are stored in the same MS, then Nf* is the number of frequency channels, and the weight information refers to the frequency spectra only. The units for these columns (e.g. 'Jy') specify whether the data are in flux density units or correlation coefficients.
VIDEO_POINT - The video point for the spectrum, to allow the full reverse transform.
SIGMA - The estimated rms noise for a single channel, for each correlator.
SIGMA_SPECTRUM - The estimated rms noise for each channel.
WEIGHT - The weight for the whole data matrix for each correlator, as assigned by the correlator or processor.
WEIGHT_SPECTRUM - The weight for each channel in the data matrix, as assigned by the correlator or processor. The weight spectrum should be used in preference to the WEIGHT, when available.
FLAG - An array of Boolean values with the same shape as DATA (see the DATA item above) representing the cumulative flags applying to this data matrix, as specified in FLAG_CATEGORY. Data are flagged bad if the FLAG array element is True.
FLAG_CATEGORY - An array of flag matrices with the same shape as DATA, but indexed by category. The category identifiers are specified by a keyword CATEGORY, containing an array of string identifiers, attached to the FLAG_CATEGORY column and thus shared by all rows in the MeasurementSet. The cumulative effect of these flags is reflected in column FLAG. Data are flagged bad if the FLAG array element is True. See Section 3.1.8 for further details.
FLAG_ROW - True if the entire row is flagged.
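Since TIME values are seconds since the MJD epoch, they convert to calendar dates with ordinary Python datetime arithmetic; the sketch below uses a made-up TIME value:

from datetime import datetime, timedelta

def ms_time_to_utc(t_seconds):
    # MS TIME is seconds since the MJD epoch, 1858/11/17
    return datetime(1858, 11, 17) + timedelta(seconds=t_seconds)

print(ms_time_to_utc(4.8e9))   # placeholder TIME value from an MS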
ANTENNA: Antenna Characteristics¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Data** | | | | |
| NAME | String | | | Antenna name |
| STATION | String | | | Station name |
| TYPE | String | | | Antenna type |
| MOUNT | String | | | Mount type: alt-az, equatorial, X-Y, orbiting, bizarre |
| POSITION | Double(3) | m | POSITION | Antenna X, Y, Z phase reference positions |
| OFFSET | Double(3) | m | POSITION | Axes offset of mount to FEED REFERENCE point |
| DISH_DIAMETER | Double | m | | Diameter of dish |
| *(ORBIT_ID)* | Int | | | Orbit id. |
| *(MEAN_ORBIT)* | Double(6) | | | Mean Keplerian elements |
| *(PHASED_ARRAY_ID)* | Int | | | Phased array id. |
| **Flag information** | | | | |
| FLAG_ROW | Bool | | | Row flag |
NAME - Antenna name (e.g. “NRAO_140”)
STATION - Station name (e.g. “GREENBANK”)
TYPE - Antenna type. Reserved keywords include: (“GROUND-BASED” - conventional antennas; “SPACE-BASED” - orbiting antennas; “TRACKING-STN” - tracking stations).
MOUNT - Mount type of the antenna. Reserved keywords include: (“EQUATORIAL” - equatorial mount; “ALT-AZ” - azimuth-elevation mount; “X-Y” - x-y mount; “SPACE-HALCA” - specific orientation model.)
POSITION - In a right-handed frame, X towards the intersection of the equator and the Greenwich meridian, Z towards the pole. The exact frame should be specified in the MEASURE_REFERENCE keyword (ITRF or WGS84). The reference point is the point on the az or ha axis closest to the el or dec axis.
OFFSET - Axes offset of mount to feed reference point.
DISH_DIAMETER - Nominal diameter of dish, as opposed to the effective diameter.
ORBIT_ID - Orbit identifier. Index used in ORBIT sub-table if ANTENNA_TYPE is “SPACE_BASED”.
MEAN_ORBIT - Mean Keplerian orbital elements, using the standard convention (Flatters 1998):
0: Semi-major axis of orbit (a) in m.
1: Ellipticity of orbit (e).
2: Inclination of orbit to the celestial equator (i) in deg.
3: Right ascension of the ascending node (Ω) in deg.
4: Argument of perigee (ω ) in deg.
5: Mean anomaly (M) in deg.
PHASED_ARRAY_ID - Phased array identifier. Points to a PHASED_ARRAY sub-table which points back to multiple entries in the ANTENNA sub-table and contains information on how they are combined.
FLAG_ROW - Boolean flag to indicate the validity of this entry. Set to True for an invalid row. This does not imply any flagging of the data in MAIN, but is necessary as the ANTENNA index in MAIN points directly into the ANTENNA sub-table. Thus FLAG_ROW can be used to delete an antenna entry without re-ordering the ANTENNA indices throughout the MS.
DATA_DESCRIPTION: Data Description Table¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Data** | | | | |
| SPECTRAL_WINDOW_ID | Int | | | Spectral window id. |
| POLARIZATION_ID | Int | | | Polarization id. |
| *(LAG_ID)* | Int | | | Lag function id. |
| **Flags** | | | | |
| FLAG_ROW | Bool | | | Row flag |
SPECTRAL_WINDOW_ID - Spectral window identifier.
POLARIZATION_ID - Polarization identifier (≥0); direct index into the POLARIZATION sub-table.
LAG_ID - Lag function identifier (≥0), and a direct index into the LAG sub-table rownr.
FLAG_ROW - True if the row does not contain valid data; does not imply flagging in MAIN.
DOPPLER: Doppler Tracking Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Key** | | | | |
| DOPPLER_ID | Int | | | Doppler tracking id. |
| SOURCE_ID | Int | | | Source id. |
| **Data** | | | | |
| TRANSITION_ID | Int | | | Transition id. |
| VELDEF | Double | m/s | Doppler | Velocity definition of Doppler shift |
DOPPLER_ID - Doppler identifier, as used in the SPECTRAL_WINDOW sub-table.
SOURCE_ID - Source identifier (as used in the SOURCE sub-table).
TRANSITION_ID - This index selects the appropriate line from the list of transitions stored for each SOURCE_ID in the SOURCE table.
VELDEF - Velocity definition of the Doppler shift, e.g., RADIO or OPTICAL velocity in m/s.
FEED: Feed Characteristics¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Key** | | | | |
| ANTENNA_ID | Int | | | Antenna id. |
| FEED_ID | Int | | | Feed id. |
| SPECTRAL_WINDOW_ID | Int | | | Spectral window id. |
| TIME | Double | s | EPOCH | Interval midpoint |
| INTERVAL | Double | s | | Time interval |
| **Data description** | | | | |
| NUM_RECEPTORS | Int | | | Number of receptors on this feed |
| **Data** | | | | |
| BEAM_ID | Int | | | Beam model |
| BEAM_OFFSET | Double(2, NUM_RECEPTORS) | rad | DIRECTION | Beam position offset (on sky but in antenna reference frame) |
| *(FOCUS_LENGTH)* | Double | m | | Focus length |
| *(PHASED_FEED_ID)* | Int | | | Phased feed |
| POLARIZATION_TYPE | String(NUM_RECEPTORS) | | | Type of polarization to which a given RECEPTOR responds |
| POL_RESPONSE | Complex(NUM_RECEPTORS, NUM_RECEPTORS) | | | Feed polarization response |
| POSITION | Double(3) | m | POSITION | Position of feed relative to feed reference position for this antenna |
| RECEPTOR_ANGLE | Double(NUM_RECEPTORS) | rad | | The reference angle for polarization |
Notes: A feed is a collecting element on an antenna, such as a single horn, that shares joint physical properties and makes sense to calibrate as a single entity. It is an abstraction of a generic antenna feed and is considered to have one or more RECEPTORs that respond to different polarization states. A FEED may have a time-variable beam and polarization response. Feeds are numbered from 0 on each separate antenna for each SPECTRAL_WINDOW_ID. Consequently, FEED_ID should be non-zero only in the case of feed arrays, i.e. multiple, simultaneous beams on the sky at the same frequency and polarization.
ANTENNA_ID - Antenna number, as indexed from ANTENNAn in MAIN.
FEED_ID - Feed identifier, as indexed from FEEDn in MAIN.
SPECTRAL_WINDOW_ID - Spectral window identifier. A value of -1 indicates the row is valid for all spectral windows.
TIME - Mid-point of time interval for which the feed parameters in this row are valid. The same Measure reference used for the TIME column in MAIN must be used.
INTERVAL - Time interval.
NUM_RECEPTORS - Number of receptors on this feed. See POLARIZATION_TYPE for further information.
BEAM_ID - Beam identifier. Points to an optional BEAM sub-table defining the primary beam and polarization response for this FEED. A value of -1 indicates that no associated beam response is defined.
BEAM_OFFSET - Beam position offset, as defined on the sky but in the antenna reference frame.
FOCUS_LENGTH - Focus length. As defined along the optical axis of the antenna.
PHASED_FEED_ID - Phased feed identifier. Points to a PHASED_FEED sub-table which in turn points back to multiple entries in the FEED table, and specifies the manner in which they are combined.
POLARIZATION_TYPE - Polarization type to which each receptor responds (e.g. “R”,”L”,”X” or “Y”). This is the receptor polarization type as recorded in the final correlated data (e.g. “RR”); i.e. as measured after all polarization combiners.
POL_RESPONSE - Polarization response at the center of the beam for this feed. Expressed in a linearly polarized basis (\(\vec{e}_x\), \(\vec{e}_y\)) using the IEEE convention.
POSITION - Offset of feed relative to the feed reference position for this antenna (see ANTENNA sub-table).
RECEPTOR_ANGLE - Polarization reference angle. Converts into parallactic angle in the sky domain.
FIELD: Field Positions for Each Source¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Data** | | | | |
| NAME | String | | | Name of field |
| CODE | String | | | Special characteristics of field |
| TIME | Double | s | EPOCH | Time origin for the directions and rates |
| NUM_POLY | Int | | | Series order |
| DELAY_DIR | Double(2, NUM_POLY+1) | rad | DIRECTION | Direction of delay center |
| PHASE_DIR | Double(2, NUM_POLY+1) | rad | DIRECTION | Phase center |
| REFERENCE_DIR | Double(2, NUM_POLY+1) | rad | DIRECTION | Reference center |
| SOURCE_ID | Int | | | Index in SOURCE table |
| *(EPHEMERIS_ID)* | Int | | | Ephemeris id. |
| **Flags** | | | | |
| FLAG_ROW | Bool | | | Row flag |
NAME - Field name; user specified.
CODE - Field code indicating special characteristics of the field; user specified.
TIME - Time reference for the directions and rates. Required to use the same TIME Measure reference as in MAIN.
NUM_POLY - Series order for the *_DIR columns.
DELAY_DIR - Direction of delay center; can be expressed as a polynomial in time. Final result converted to the defined Direction Measure type.
PHASE_DIR - Direction of phase center; can be expressed as a polynomial in time. Final result converted to the defined Direction Measure type.
REFERENCE_DIR - Reference center; can be expressed as a polynomial in time. Final result converted to the defined Direction Measure type. Used in single-dish to record the associated reference direction if position-switching has already been applied. For interferometric data, this is the original correlated field center, and may equal DELAY_DIR or PHASE_DIR.
SOURCE_ID - Points to an entry in the optional SOURCE subtable; a value of -1 indicates there is no corresponding source defined.
EPHEMERIS_ID - Points to an entry in the EPHEMERIS sub-table, which defines the ephemeris used to compute the field position. Useful for moving, near-field objects, where the ephemeris may be revised over time.
FLAG_ROW - True if data in this row are invalid, else False. Does not imply flagging in MAIN.
FLAG_CMD: Flag Commands¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Key** | | | | |
| TIME | Double | s | EPOCH | Mid-point of interval |
| INTERVAL | Double | s | | Time interval |
| **Data** | | | | |
| TYPE | String | | | FLAG or UNFLAG |
| REASON | String | | | Flag reason |
| LEVEL | Int | | | Flag level |
| SEVERITY | Int | | | Severity code |
| APPLIED | Bool | | | True if applied in MAIN |
| COMMAND | String | | | Flag command |
TIME - Mid-point of the time interval to which this flagging command applies. Required to use the same TIME Measure reference as used in MAIN.
INTERVAL - Time interval.
TYPE - Type of flag command, representing either a flagging (“FLAG”) or un-flagging (“UNFLAG”) operation.
REASON - Flag reason; user specified.
LEVEL - Flag level (≥0); reflects different revisions of flags which have the same REASON.
SEVERITY - Severity code for the flag, on a scale of 0-10 in order of increasing severity; user specified.
APPLIED - True if this flag has been applied to MAIN and updated in FLAG_CATEGORY and FLAG; False if this flag has not been applied to MAIN.
COMMAND - Global flag command, expressed in the standard syntax for data selection, as adopted within the project as a whole.
FREQ_OFFSET: Frequency Offset Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Key** | | | | |
| ANTENNA1 | Int | | | Antenna 1 |
| ANTENNA2 | Int | | | Antenna 2 |
| FEED_ID | Int | | | Feed id. |
| SPECTRAL_WINDOW_ID | Int | | | Spectral window id. |
| TIME | Double | s | EPOCH | Interval midpoint |
| INTERVAL | Double | s | | Time interval |
| **Data** | | | | |
| OFFSET | Double | Hz | | Frequency offset |
ANTENNAn - Antenna identifier, as indexed from ANTENNAn in MAIN.
FEED_ID - Feed identifier, as indexed from FEEDn in MAIN.
SPECTRAL_WINDOW_ID - Spectral window identifier.
TIME - Mid-point of the time interval for which this offset is valid. Required to use the same TIME Measure reference as used in MAIN.
INTERVAL - Time interval.
OFFSET - Frequency offset to be added to the frequency axis for this spectral window, as defined in the SPECTRAL_WINDOW sub-table. Required to have the same Frequency Measure reference as CHAN_FREQ in that table.
HISTORY: History Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Key** | | | | |
| TIME | Double | s | EPOCH | Time-stamp for message |
| OBSERVATION_ID | Int | | | Points to OBSERVATION table |
| **Data** | | | | |
| MESSAGE | String | | | Log message |
| PRIORITY | String | | | Message priority |
| ORIGIN | String | | | Code origin |
| OBJECT_ID | String | | | Originating ObjectID |
| APPLICATION | String | | | Application name |
| CLI_COMMAND | String(*) | | | CLI command sequence |
| APP_PARAMS | String(*) | | | Application parameters |
TIME - Time-stamp for the history record. Required to have the same TIME Measure reference as used in MAIN.
OBSERVATION_ID - Observation identifier (see the OBSERVATION table)
MESSAGE - Log message.
PRIORITY - Message priority, with allowed types: (“DEBUGGING”, “WARN”, “NORMAL”, or “SEVERE”).
ORIGIN - Source code origin from which message originated.
OBJECT_ID - Originating ObjectID, if available, else blank.
APPLICATION - Application name.
CLI_COMMAND - CLI command sequence invoking the application.
APP_PARAMS - Application parameter values, in the adopted project-wide format.
OBSERVATION: Observation Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Data** | | | | |
| TELESCOPE_NAME | String | | | Telescope name |
| TIME_RANGE | Double(2) | s | EPOCH | Start, end times |
| OBSERVER | String | | | Name of observer(s) |
| LOG | String(*) | | | Observing log |
| SCHEDULE_TYPE | String | | | Schedule type |
| SCHEDULE | String(*) | | | Project schedule |
| PROJECT | String | | | Project identification string |
| RELEASE_DATE | Double | s | EPOCH | Target release date |
| **Flags** | | | | |
| FLAG_ROW | Bool | | | Row flag |
TELESCOPE_NAME - Telescope name (e.g. “WSRT” or “VLBA”).
TIME_RANGE - The start and end times of the overall observing period spanned by the actual recorded data in MAIN. Required to use the same TIME Measure reference as in MAIN. Time is provided in Modified Julian Date. The CASA/casacore reference epoch (0 time) for timestamps in MeasurementSets is the MJD epoch: 1858/11/17.
OBSERVER - The name(s) of the observer(s).
LOG - The observing log, as supplied by the telescope or instrument.
SCHEDULE_TYPE - The schedule type, with current reserved types (“VLBA-CRD”, “VEX”, “WSRT”, “ATNF”).
SCHEDULE - Unmodified schedule file, of the type specified, and as used by the instrument.
PROJECT - Project code (e.g. “BD46”)
RELEASE_DATE - Project release date. This is the date on which the data may become public.
FLAG_ROW - Row flag. True if data in this row is invalid, but does not imply any flagging in MAIN.
POINTING: Antenna Pointing Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Key** | | | | |
| ANTENNA_ID | Int | | | Antenna id. |
| TIME | Double | s | EPOCH | Interval midpoint |
| INTERVAL | Double | s | | Time interval |
| **Data** | | | | |
| NAME | String | | | Pointing position description |
| NUM_POLY | Int | | | Series order |
| TIME_ORIGIN | Double | s | EPOCH | Origin for the polynomial |
| DIRECTION | Double(2, NUM_POLY+1) | rad | DIRECTION | Antenna pointing direction |
| TARGET | Double(2, NUM_POLY+1) | rad | DIRECTION | Target direction |
| *(POINTING_OFFSET)* | Double(2, NUM_POLY+1) | rad | DIRECTION | A priori pointing correction |
| *(SOURCE_OFFSET)* | Double(2, NUM_POLY+1) | rad | DIRECTION | Offset from source |
| *(ENCODER)* | Double(2) | rad | DIRECTION | Encoder values |
| *(POINTING_MODEL_ID)* | Int | | | Pointing model id. |
| TRACKING | Bool | | | True if on-position |
| *(ON_SOURCE)* | Bool | | | True if on-source |
| *(OVER_THE_TOP)* | Bool | | | True if over the top |
ANTENNA_ID - Antenna identifier, as specified by ANTENNAn in MAIN.
TIME - Mid-point of the time interval for which the information in this row is valid. Required to use the same TIME Measure reference as in MAIN.
INTERVAL - Time interval.
NAME - Pointing direction name; user specified.
NUM_POLY - Series order for the polynomial expressions in DIRECTION and POINTING_OFFSET.
TIME_ORIGIN - Time origin for the polynomial expansions.
DIRECTION - Antenna pointing direction, optionally expressed as polynomial coefficients. The final result is interpreted as a Direction Measure using the specified Measure reference.
TARGET - Target pointing direction, optionally expressed as polynomial coefficients. The final result is interpreted as a Direction Measure using the specified Measure reference. This is the true expected position of the source, including all coordinate corrections such as precession, nutation etc.
POINTING_OFFSET - The a priori pointing corrections applied by the telescope in pointing to the DIRECTION position, optionally expressed as polynomial coefficients. The final result is interpreted as a Direction Measure using the specified Measure reference.
SOURCE_OFFSET - The commanded offset from the source position, if offset pointing is being used.
ENCODER - The current encoder values on the primary axes of the mount type for the antenna, expressed as a Direction Measure.
TRACKING - True if tracking the nominal pointing position.
ON_SOURCE - True if the nominal pointing direction coincides with the source, i.e. offset-pointing is not being used.
OVER_THE_TOP - True if the antenna was driven to this position "over the top" (az-el mount).
POLARIZATION: Polarization Setup Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Data description** | | | | |
| NUM_CORR | Int | | | Number of correlations |
| **Data** | | | | |
| CORR_TYPE | Int(NUM_CORR) | | | Polarization of correlation |
| CORR_PRODUCT | Int(2, NUM_CORR) | | | Receptor cross-products |
| **Flags** | | | | |
| FLAG_ROW | Bool | | | Row flag |
NUM_CORR - The number of correlation polarization products. For example, for (RR) this value would be 1, for (RR, LL) it would be 2, and for (XX, YY, XY, YX) it would be 4.
CORR_TYPE - An integer for each correlation product indicating the Stokes type as defined in the Stokes class enumeration.
CORR_PRODUCT - Pair of integers for each correlation product, specifying the receptors from which the signal originated. The receptor polarization is defined in the POLARIZATION_TYPE column in the FEED table. An example would be (0,0), (0,1), (1,0), (1,1) to specify all correlations between two receptors.
FLAG_ROW - Row flag. True if the data in this row are not valid, but does not imply the flagging of any DATA in MAIN.
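A hedged look at a correlation setup with the tb tool (the MS name is a placeholder):

tb.open('my.ms/POLARIZATION')
corr_type = tb.getcol('CORR_TYPE')        # Stokes enumeration codes (e.g. 5-8 for RR, RL, LR, LL)
corr_product = tb.getcol('CORR_PRODUCT')  # receptor index pairs, e.g. (0,0), (0,1), (1,0), (1,1)
tb.close()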
PROCESSOR: Processor Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Data** | | | | |
| TYPE | String | | | Processor type |
| SUB_TYPE | String | | | Processor sub-type |
| TYPE_ID | Int | | | Processor type id. |
| MODE_ID | Int | | | Processor mode id. |
| *(PASS_ID)* | Int | | | Processor pass number |
| **Flags** | | | | |
| FLAG_ROW | Bool | | | Row flag |
TYPE - Processor type; reserved keywords include (“CORRELATOR” - interferometric correlator; “SPECTROMETER” - single-dish correlator; “RADIOMETER” - generic detector/integrator; “PULSAR-TIMER” - pulsar timing device).
SUB_TYPE - Processor sub-type, e.g. “GBT” or “JIVE”.
TYPE_ID - Index used in a specialized sub-table named as subtype_type, which contains time-independent processor information applicable to the current data record (e.g. a JIVE_CORRELATOR sub-table). Time-dependent information for each device family is contained in other tables, dependent on the device type.
MODE_ID - Index used in a specialized sub-table named as subtype_type_mode, containing information on the processor mode applicable to the current data record. (e.g. a GBT_SPECTROMETER_MODE sub-table).
PASS_ID - Pass identifier; this is used to distinguish data records produced by multiple passes through the same device, where this is possible (e.g. VLBI correlators). Used as an index into the associated table containing pass information.
FLAG_ROW - Row flag. True if data in the row is not valid, but does not imply flagging in MAIN.
SOURCE: Source Information¶
| Name | Format | Units | Measure | Comments |
|---|---|---|---|---|
| **Key** | | | | |
| SOURCE_ID | Int | | | Source id. |
| TIME | Double | s | EPOCH | Midpoint of time for which this set of parameters is accurate |
| INTERVAL | Double | s | | Interval |
| SPECTRAL_WINDOW_ID | Int | | | Spectral window id. |
| **Data description** | | | | |
| NUM_LINES | Int | | | Number of spectral lines |
| **Data** | | | | |
| NAME | String | | | Name of source as given during observations |
| CALIBRATION_GROUP | Int | | | Grouping for calibration purposes |
| CODE | String | | | Special characteristics of source, e.g. bandpass calibrator |
| DIRECTION | Double(2) | rad | DIRECTION | Direction (e.g. RA, DEC) |
| *(POSITION)* | Double(3) | m | POSITION | Position (e.g. for solar system objects) |
| PROPER_MOTION | Double(2) | rad/s | | Proper motion |
| *(TRANSITION)* | String(NUM_LINES) | | | Transition name |
| *(REST_FREQUENCY)* | Double(NUM_LINES) | Hz | FREQUENCY | Line rest frequency |
| *(SYSVEL)* | Double(NUM_LINES) | m/s | RADIAL_VELOCITY | Systemic velocity at reference |
| *(SOURCE_MODEL)* | TableRecord | | | Default component source model |
| *(PULSAR_ID)* | Int | | | Pulsar id. |
SOURCE_ID - Source identifier (≥ 0), as specified in the FIELD sub-table.
TIME - Mid-point of the time interval for which the data in this row is valid. Required to use the same TIME Measure reference as in MAIN.
INTERVAL - Time interval.
SPECTRAL_WINDOW_ID - Spectral window identifier. A -1 indicates that the row is valid for all spectral windows.
NUM_LINES - Number of spectral line transitions associated with this source and spectral window id. combination.
NAME - Source name; user specified.
CALIBRATION_GROUP - Calibration group number to which this source belongs; user specified.
CODE - Source code, used to describe any special characteristics of the source, such as the nature of a calibrator. Reserved keywords include ("BANDPASS CAL").
DIRECTION - Source direction at this TIME.
POSITION - Source position (x, y, z) at this TIME (for near-field objects).
PROPER_MOTION - Source proper motion at this TIME.
TRANSITION - Transition names applicable for this spectral window (e.g. “v=1, J=1-0, SiO”).
REST_FREQUENCY - Rest frequencies for the transitions.
SYSVEL - Systemic velocity for each transition.
SOURCE_MODEL - Reference to an assigned component source model table.
PULSAR_ID - An index used in the PULSAR sub-table to define further pulsar-specific properties if the source is a pulsar.
SPECTRAL_WINDOW: Spectral Window Description¶
Name | Format | Units | Measure | Comments
---|---|---|---|---
Columns | | | |
Data description columns | | | |
NUM_CHAN | Int | | | Spectral channels
Data | | | |
NAME | String | | | Spectral window name
REF_FREQUENCY | Double | Hz | FREQUENCY | The reference frequency
CHAN_FREQ | Double(NUM_CHAN) | Hz | FREQUENCY | Center frequencies for each channel in the data matrix
CHAN_WIDTH | Double(NUM_CHAN) | Hz | | Channel width for each channel in the data matrix
MEAS_FREQ_REF | Int | | | FREQUENCY Measure ref.
EFFECTIVE_BW | Double(NUM_CHAN) | Hz | | The effective noise bandwidth of each spectral channel
RESOLUTION | Double(NUM_CHAN) | Hz | | The effective spectral resolution of each channel
TOTAL_BANDWIDTH | Double | Hz | | Total bandwidth for this window
NET_SIDEBAND | Int | | | Net sideband
(BBC_NO) | Int | | | Baseband converter no.
(BBC_SIDEBAND) | Int | | | BBC sideband
IF_CONV_CHAIN | Int | | | The IF conversion chain
(RECEIVER_ID) | Int | | | Receiver id.
FREQ_GROUP | Int | | | Frequency group
FREQ_GROUP_NAME | String | | | Freq. group name
(DOPPLER_ID) | Int | | | Doppler id.
(ASSOC_SPW_ID) | Int(*) | | | Associated spw_id.
(ASSOC_NATURE) | String(*) | | | Nature of association
Flags | | | |
FLAG_ROW | Bool | | | Row flag
NUM_CHAN - Number of spectral channels.
NAME - Spectral window name; user specified.
REF_FREQUENCY - The reference frequency. A frequency representative of this spectral window, usually the sky frequency corresponding to the DC edge of the baseband. Used by the calibration system if a fixed scaling frequency is required or in algorithms to identify the observing band.
CHAN_FREQ - Center frequencies for each channel in the data matrix. These can be frequency-dependent, to accommodate instruments such as acousto-optical spectrometers. Note that the channel frequencies may be in ascending or descending frequency order.
CHAN_WIDTH - Nominal channel width of each spectral channel. Although these can be derived from CHAN_FREQ by differencing, it is more efficient to keep a separate reference to this information.
MEAS_FREQ_REF - Frequency Measure reference for CHAN_FREQ. This allows a row-based reference for this column in order to optimize the choice of Measure reference when Doppler tracking is used. Modified only by the MS access code.
EFFECTIVE_BW - The effective noise bandwidth of each spectral channel.
RESOLUTION - The effective spectral resolution of each channel.
TOTAL_BANDWIDTH - The total bandwidth for this spectral window.
NET_SIDEBAND - The net sideband for this spectral window.
BBC_NO - The baseband converter number, if applicable.
BBC_SIDEBAND - The baseband converter sideband, if applicable.
IF_CONV_CHAIN - Identification of the electronic signal path for the case of multiple (simultaneous) IFs. (e.g. VLA: AC=0, BD=1, ATCA: Freq1=0, Freq2=1)
RECEIVER_ID - Index used to identify the receiver associated with the spectral window. Further state information is planned to be stored in a RECEIVER sub-table.
FREQ_GROUP - The frequency group to which the spectral window belongs. This is used to associate spectral windows for joint calibration purposes.
FREQ_GROUP_NAME - The frequency group name; user specified.
DOPPLER_ID - The Doppler identifier defining frame information for this spectral window.
ASSOC_SPW_ID - Associated spectral windows, which are related in some fashion (e.g. “channel-zero”).
ASSOC_NATURE - Nature of the association for ASSOC_SPW_ID; reserved keywords are (“CHANNEL-ZERO” - channel zero; “EQUAL-FREQUENCY” - same frequency labels; “SUBSET” - narrow-band subset).
FLAG_ROW - True if the row does not contain valid data.
STATE: State Information¶
Name | Format | Units | Measure | Comments
---|---|---|---|---
Columns | | | |
Data | | | |
SIG | Bool | | | Signal
REF | Bool | | | Reference
CAL | Double | K | | Noise calibration
LOAD | Double | K | | Load temperature
SUB_SCAN | Int | | | Sub-scan number
OBS_MODE | String | | | Observing mode
Flags | | | |
FLAG_ROW | Bool | | | Row flag
SIG - True if the source signal is being observed.
REF - True for a reference phase.
CAL - Noise calibration temperature (zero if not added).
LOAD - Load temperature (zero if no load).
SUB_SCAN - Sub-scan number (≥ 0), relative to the SCAN_NUMBER in MAIN. Used to identify observing sequences.
OBS_MODE - Observing mode; defined by a set of reserved keywords characterizing the current observing mode (e.g. “OFF-SPECTRUM”). Used to define the schedule strategy.
FLAG_ROW - True if the row does not contain valid data. Does not imply flagging in MAIN.
SYSCAL: System Calibration¶
Name | Format | Units | Measure | Comments
---|---|---|---|---
Columns | | | |
Key | | | |
ANTENNA_ID | Int | | | Antenna id
FEED_ID | Int | | | Feed id
SPECTRAL_WINDOW_ID | Int | | | Spectral window id
TIME | Double | s | EPOCH | Midpoint of time for which this set of parameters is accurate
INTERVAL | Double | s | | Interval
Data | | | |
(PHASE_DIFF) | Float | rad | | Phase difference between receptor 0 and receptor 1
(TCAL) | Float(Nr) | K | | Calibration temp
(TRX) | Float(Nr) | K | | Receiver temperature
(TSKY) | Float(Nr) | K | | Sky temperature
(TSYS) | Float(Nr) | K | | System temp
(TANT) | Float(Nr) | K | | Antenna temperature
(TANT_TSYS) | Float(Nr) | | | \({{T_{ant}}\over{T_{sys}}}\)
(TCAL_SPECTRUM) | Float(Nr, Nf) | K | | Calibration temp
(TRX_SPECTRUM) | Float(Nr, Nf) | K | | Receiver temperature
(TSKY_SPECTRUM) | Float(Nr, Nf) | K | | Sky temperature spectrum
(TSYS_SPECTRUM) | Float(Nr, Nf) | K | | System temp
(TANT_SPECTRUM) | Float(Nr, Nf) | K | | Antenna temperature spectrum
(TANT_TSYS_SPECTRUM) | Float(Nr, Nf) | | | \({{T_{ant}}\over{T_{sys}}}\) spectrum
Flags | | | |
(PHASE_DIFF_FLAG) | Bool | | | Flag for PHASE_DIFF
(TCAL_FLAG) | Bool | | | Flag for TCAL
(TRX_FLAG) | Bool | | | Flag for TRX
(TSKY_FLAG) | Bool | | | Flag for TSKY
(TSYS_FLAG) | Bool | | | Flag for TSYS
(TANT_FLAG) | Bool | | | Flag for TANT
(TANT_TSYS_FLAG) | Bool | | | Flag for \({{T_{ant}}\over{T_{sys}}}\)
ANTENNA_ID - Antenna identifier, as indexed by ANTENNAn in MAIN.
FEED_ID - Feed identifier, as indexed by FEEDn in MAIN.
SPECTRAL_WINDOW_ID - Spectral window identifier.
TIME - Mid-point of the time interval for which the data in this row are valid. Required to use the same TIME Measure reference as that in MAIN.
INTERVAL - Time interval.
PHASE_DIFF - Phase difference between receptor 0 and receptor 1.
TCAL - Calibration temperature.
TRX - Receiver temperature.
TSKY - Sky temperature.
TSYS - System temperature.
TANT - Antenna temperature.
TANT_TSYS - Antenna temperature over system temperature.
TCAL_SPECTRUM - Calibration temperature spectrum.
TRX_SPECTRUM - Receiver temperature spectrum.
TSKY_SPECTRUM - Sky temperature spectrum.
TSYS_SPECTRUM - System temperature spectrum.
TANT_SPECTRUM - Antenna temperature spectrum.
TANT_TSYS_SPECTRUM - Antenna temperature over system temperature spectrum.
PHASE_DIFF_FLAG - True if PHASE_DIFF flagged.
TCAL_FLAG - True if TCAL flagged.
TRX_FLAG - True if TRX flagged.
TSKY_FLAG - True if TSKY flagged.
TSYS_FLAG - True if TSYS flagged.
TANT_FLAG - True if TANT flagged.
TANT_TSYS_FLAG - True if TANT_TSYS flagged.
WEATHER: Weather Station Information¶
Name | Format | Units | Measure | Comments
---|---|---|---|---
Columns | | | |
Key | | | |
ANTENNA_ID | Int | | | Antenna number
TIME | Double | s | EPOCH | Mid-point of interval
INTERVAL | Double | s | | Interval over which data is relevant
Data | | | |
(H2O) | Float | m^-2 | | Average column density of water
(IONOS_ELECTRON) | Float | m^-2 | | Average column density of electrons
(PRESSURE) | Float | hPa | | Ambient atmospheric pressure
(REL_HUMIDITY) | Float | | | Ambient relative humidity
(TEMPERATURE) | Float | K | | Ambient air temperature for an antenna
(DEW_POINT) | Float | K | | Dew point
(WIND_DIRECTION) | Float | rad | | Average wind direction
(WIND_SPEED) | Float | m/s | | Average wind speed
Flags | | | |
(H2O_FLAG) | Bool | | | Flag for H2O
(IONOS_ELECTRON_FLAG) | Bool | | | Flag for IONOS_ELECTRON
(PRESSURE_FLAG) | Bool | | | Flag for PRESSURE
(REL_HUMIDITY_FLAG) | Bool | | | Flag for REL_HUMIDITY
(TEMPERATURE_FLAG) | Bool | | | Flag for TEMPERATURE
(DEW_POINT_FLAG) | Bool | | | Flag for DEW_POINT
(WIND_DIRECTION_FLAG) | Bool | | | Flag for WIND_DIRECTION
(WIND_SPEED_FLAG) | Bool | | | Flag for WIND_SPEED
ANTENNA_ID - Antenna identifier, as indexed by ANTENNAn from MAIN.
TIME - Mid-point of the time interval over which the data in the row are valid. Required to use the same TIME Measure reference as in MAIN.
INTERVAL - Time interval.
H2O - Average column density of water.
IONOS_ELECTRON - Average column density of electrons.
PRESSURE - Ambient atmospheric pressure.
REL_HUMIDITY - Ambient relative humidity.
TEMPERATURE - Ambient air temperature.
DEW_POINT - Dew point temperature.
WIND_DIRECTION - Average wind direction.
WIND_SPEED - Average wind speed.
H2O_FLAG - Flag for H2O.
IONOS_ELECTRON_FLAG - Flag for IONOS_ELECTRON.
PRESSURE_FLAG - Flag for PRESSURE.
REL_HUMIDITY_FLAG - Flag for REL_HUMIDITY.
TEMPERATURE_FLAG - Flag for TEMPERATURE.
DEW_POINT_FLAG - Flag for DEW_POINT.
WIND_DIRECTION_FLAG - Flag for WIND_DIRECTION.
WIND_SPEED_FLAG - Flag for WIND_SPEED.
Definition of the Synthesized Beam¶
CASA uses the following zero-centered two dimensional elliptical Gaussian function or Gaussian beam:
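In terms of the parameters defined below, a form consistent with those definitions (a reconstruction of the standard rotated two dimensional Gaussian; the \(4\ln 2\) factor is chosen so that \(d_1\) matches the FWHM convention of Figure 3) is

\[f(x,y) = A\,\exp\left\{-4\ln 2\left[\frac{(x\cos\theta + y\sin\theta)^2}{d_1^{\,2}} + \frac{(y\cos\theta - x\sin\theta)^2}{d_2^{\,2}}\right]\right\}\]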
where \(A\) is the amplitude (usually set to unity) and \(\theta\) is the anti-clockwise angle from the x axis to the line that lies along the greatest width of \(f(x,y)\) (the line and the x axis must be coplanar). The factors \(d_1\) and \(d_2\) are respectively the semi-major and semi-minor axes of the ellipse formed by the cross-section that lies parallel to the \(x, y\) plane, at a height chosen so that \(d_1\) is equal to the FWHM (full width at half maximum) of the one dimensional Gaussian that lies in the plane formed by the \(z\) axis and \(d_1\). Note that \(d_1 \geqslant d_2 > 0\), since \(d_1\) is the semi-major axis.
Figure 1: Surface plot of a two dimensional elliptical Gaussian with \(A = 1\), \(d_1 = 3\), \(d_2=1\) and \(\theta = 30^\circ\).
Figure 2: Cross-section parallel to the \(x, y\) plane of the two dimensional elliptical Gaussian from Fig. 1, where the resulting ellipse has a semi-major and semi-minor axis equal to \(d_1\) and \(d_2\), respectively.
Figure 3: One dimensional Gaussian plot for \(A = 1\), \(y = 0\), \(\theta = 0\) and \(d_1 = 1 = FWHM\).
For calculating the Fourier transform of the two dimensional elliptical Gaussian, the above equation can be re-written by grouping the \(x\) and \(y\) terms into three coefficients \(\alpha\), \(\beta\) and \(\gamma\); the conversion from \(\alpha\), \(\beta\), \(\gamma\) back to \(d_1\), \(d_2\), \(\theta\) can then be done in closed form, as sketched below.
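The following is a sketch of that grouping and its inversion (a reconstruction obtained by expanding the rotated form above, not copied from the CASA source):

\[f(x,y) = A\,e^{-(\alpha x^2 + 2\beta xy + \gamma y^2)}\]

where

\[\alpha = 4\ln 2\left(\frac{\cos^2\theta}{d_1^{\,2}} + \frac{\sin^2\theta}{d_2^{\,2}}\right),\qquad \beta = 4\ln 2\,\sin\theta\cos\theta\left(\frac{1}{d_1^{\,2}} - \frac{1}{d_2^{\,2}}\right),\qquad \gamma = 4\ln 2\left(\frac{\sin^2\theta}{d_1^{\,2}} + \frac{\cos^2\theta}{d_2^{\,2}}\right)\]

Inverting,

\[\theta = \frac{1}{2}\arctan\!\left(\frac{2\beta}{\alpha - \gamma}\right),\qquad d_{1,2}^{\,2} = \frac{8\ln 2}{(\alpha + \gamma) \mp \sqrt{(\alpha - \gamma)^2 + 4\beta^2}}\]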
Working with MS Data¶
The ALMA and VLA raw data are stored in their respective archives in the Astronomy Science Data Model (ASDM) format. The definition of the format can be found here.
To bring them into CASA, the ASDMs are filled into a so-called MeasurementSet (or MS; a format description can be found here). In its logical structure, the MS looks like a generalized description of data from any interferometric or single dish telescope. Physically, the MS consists of several tables in a directory on disk.
Tables in CASA are actually directories containing files that are the sub-tables. For example, when you create a MS called AM675.ms, all of its tables are stored in a directory named AM675.ms/. See the chapter "Visibility Data Import Export" for more information on the MeasurementSet and data handling in CASA.
The data that you originally get from a telescope can be put in any directory that is convenient to you. Once you “fill” the data into a MeasurementSet that can be accessed by CASA, it is generally best to keep that MS in the same directory where you started CASA so you can get access to it easily (rather than constantly having to specify a full path name).
When you generate calibration solutions or images (again these are in table format), these will also be written to disk. It is a good idea to keep them in the directory in which you started CASA, too.
How do I get rid of my data in CASA?
Note that when you delete a MeasurementSet, calibration table, or image, which are in fact directories, you must delete this and all underlying directories and files. If you are not running CASA, this is most simply done by using the file delete method of the operating system from which you started CASA. For example, when running CASA on a Linux system, in order to delete the MeasurementSet named AM675.ms type
CASA <5>: !rm -r AM675.ms
from within CASA. The ! tells CASA that a system command follows, and the -r makes sure that all subdirectories are deleted recursively.
It is convenient to prefix all MS, calibration tables, and output files produced in a run with a common string. For example, one might prefix all files from VLA project AM675 with AM675, e.g. AM675.ms, AM675.cal, AM675.clean. Then,
CASA <6>: !rm -r AM675*
will clean up all of these.
In scripts, the ! escape to the OS will not work. Instead, use the os.system() function to do the same thing:
os.system('rm -r AM675*')
If you are within CASA, then the CASA system is keeping a cache of tables that you have been using, and using the OS to delete them will confuse things. For example, running a script that contains rm commands multiple times will often fail or crash the second time through, as the cache gets confused. The clean way of removing CASA tables (MS, caltables, images) inside CASA is to use the rmtables task:
rmtables('AM675.ms')
and this can also be wildcarded (though you may get warnings if the wildcard matches files or directories that are not CASA tables).
ALERT: rmtables is the preferred way to remove data. clean is a good example where frequently data are left in the cache after deleting the output files via “!rm -r”. Restarting clean then sometimes claims that the files still exist, even though they are not present on disk anymore. rmtables will completely remove the files on disks and all cached versions and restarting clean will work as intended.
ALERT: Some CASA processes lock the file and forget to give it up when they are done. You will get WARNING messages from rmtables, and your script will probably crash the second time around as the file isn't removed. The safest thing is still to exit CASA and start a new session for multiple runs.
What’s in my data?
The actual data is in a large MAIN table that is organized in such a way that you can access different parts of the data easily. This table contains a number of “rows”, which are effectively a single timestamp for a single spectral window (like an IF from the VLA) and a single baseline (for an interferometer).
There are a number of “columns” in the MS, the most important of which for our purposes is the DATA column — this contains the original visibility data from when the MS was created or filled. There are other helpful “scratch” columns which hold useful versions of the data or weights for further processing: the CORRECTED_DATA column, which is used to hold calibrated data and an optional MODEL_DATA column, which may hold the Fourier inversion of a particular model image. The creation and use of the scratch columns is generally done behind the scenes, but you should be aware that they are there (and when they are used). We will occasionally refer to the rows and columns in the MS.
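As an illustration, the rows and columns of the MAIN table can be inspected directly with the table tool (a minimal sketch; AM675.ms is a placeholder MS name):

#In CASA6 (illustrative only; AM675.ms is a placeholder)
from casatools import table
tb = table()
tb.open('AM675.ms')
print(tb.colnames())  # MAIN-table columns, e.g. DATA, FLAG, UVW, ...
print(tb.nrows())     # number of rows (one per timestamp/baseline/spectral window)
tb.close()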
Loading Data to Images¶
The subsections below provide a brief overview of the steps you will need to load data into CASA and obtain a final, calibrated image. Each subject is covered in more detail in other chapters.
An end-to-end workflow diagram for CASA data reduction for interferometry data is shown in the Figure below. This might help you chart your course through the package. In the following sub-sections, we will chart a rough course through this process, with the later chapters filling in the individual boxes.
Flow chart of the data processing operations that a general user will carry out in an end-to-end CASA reduction session.
Note that single-dish data reduction (for example with the ALMA single-dish system) follows a similar course. This is detailed in the corresponding chapters.
Loading Data into CASA
The key data and image import tasks are (see “Visibility Data Import Export”):
importuvfits — import visibility data in UVFITS format
importvla — import data from VLA that is in export format
importasdm — import data in ASDM format
importfits — import a FITS image into a CASA image format table
These are used to bring in your interferometer data, to be stored as a CASA MeasurementSet (MS), and any previously made images or models (to be stored as CASA image tables).
The data import tasks will create a MS with a path and name specified by the vis parameter. The MeasurementSet is the internal data format used by CASA, and conversion from any other native format is necessary for most of the data reduction tasks.
Once data is imported, there are other operations you can use to manipulate the datasets:
concat — concatenate multiple MSs into a given or a new MS
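For example, two MSs from different observing days might be combined as follows (a sketch; the file names are placeholders):

#In CASA6 (illustrative; file names are placeholders)
concat(vis=['day1.ms', 'day2.ms'], concatvis='combined.ms')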
Data Examination, Editing, and Flagging¶
The main data examination and flagging tasks are:
listobs — summarize the contents of a MS
flagmanager — save and manage versions of the flagging entries in the MeasurementSet
plotms — interactive X-Y plotting and flagging of visibility data
flagdata — flagging (and unflagging) of specified data
msview — the CASA msview task can display (as a raster image) MS data, with some editing capabilities
These tasks allow you to list, plot, and/or flag data in a CASA MS.
Interactive flagging (i.e., “see it – flag it”) is possible on the plotms X-Y displays of the data. Since flags are inserted into the MeasurementSet, it is useful to backup (or make a copy) of the current flags before further flagging is done, using flagmanager. Copies of the flag table can also be restored to the MS in this way.
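One possible sequence, sketched below with placeholder names, is to list the contents of the MS, back up the current flags, and then flag a misbehaving antenna:

#In CASA6 (illustrative; MS and antenna names are placeholders)
listobs(vis='AM675.ms', listfile='AM675.listobs.txt')                # summarize the MS
flagmanager(vis='AM675.ms', mode='save', versionname='before_edit')  # back up current flags
flagdata(vis='AM675.ms', mode='manual', antenna='ea12')              # flag one antenna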
plotms can also be invoked without starting CASA. Launch it from the terminal with:
$ casaplotms &
Calibration¶
The major calibration tasks are:
setjy — Computes the model visibilities for a specified source given a flux density or model image, knows about standard calibrator sources
initweights — if necessary, supports (re-)initialization of the data weights, including an option for enabling spectral weight accounting
gencal — Creates a calibration table for known delay and antenna position offsets, opacities, and requantizer gains
bandpass — Solves for frequency-dependent (bandpass) complex gains
gaincal — Solves for time-dependent (frequency-independent) complex gains
fluxscale — Bootstraps the flux density scale from standard calibrators
polcal — polarization calibration
applycal — Applies calculated calibration solutions
clearcal — Re-initializes calibrated visibility data in a given MeasurementSet
listcal — Lists calibration solutions
plotms — Plots (and optionally flags) calibration solutions
uvcontsub — carry out uv-plane continuum subtraction for spectral-line data
split — write out a new (calibrated) MS for specified sources
cvel — Regrid a spectral MS onto a new frequency channel system
During the course of calibration, the user will specify a set of calibrations to pre-apply before solving for a particular type of effect, for example gain or bandpass or polarization. The solutions are stored in a calibration table (subdirectory) which is specified by the user, not by the task: care must be taken in naming the table for future use. The user then has the option, as the calibration process proceeds, to accumulate the current state of calibration in a new cumulative table. Finally, the calibration can be applied to the dataset.
See “Synthesis Calibration” for more information.
Prior Calibration
The setjy task calculates absolute flux densities for a MeasurementSet based on known calibrator sources. These can then be used in later calibration tasks. Currently, setjy knows the flux density as a function of frequency for several standard VLA flux calibrators and solar system objects, and the value of the flux density can be manually inserted for any other source. If the source is not well-modeled as a point source, then a model image of that source structure can be used (with the total flux density scaled by the values given or calculated above for the flux density). Models are provided for the standard VLA calibrators and calculated for solar system objects.
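For example (a sketch; the MS name is a placeholder, and the flux standard is one reasonable choice among those available):

#In CASA6 (illustrative; names are placeholders)
setjy(vis='AM675.ms', field='1331+305=3C286', standard='Perley-Butler 2017')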
Antenna gain-elevation curves (e.g. for the VLA antennas), gain curves, requantizer gains, and atmospheric optical depth corrections (applied as an elevation-dependent function) may be pre-applied before solving for the bandpass and gains. The task gencal will generate those to be applied for further calibration.
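For example, antenna position corrections and requantizer gains might be generated as follows (a sketch; the caltable names are placeholders):

#In CASA6 (illustrative; caltable names are placeholders)
gencal(vis='AM675.ms', caltable='AM675.antpos', caltype='antpos')  # antenna position offsets
gencal(vis='AM675.ms', caltable='AM675.rq', caltype='rq')          # requantizer gains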
See “Synthesis Calibration” for more information.
Delay Calibration
A delay for each antenna can be calculated using gaincal with gaintype 'K'. The delay calibration will remove delay errors that cause systematic slopes in the phases as a function of frequency; in particular, phase wraps across the band will be removed.
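A sketch (the reference antenna and table names are placeholders):

#In CASA6 (illustrative; names are placeholders)
gaincal(vis='AM675.ms', caltable='AM675.K0', field='1331+305=3C286',
        gaintype='K', solint='inf', refant='ea10')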
Bandpass Calibration
The bandpass task calculates a bandpass calibration solution: that is, it solves for gain variations in frequency as well as in time. Since the bandpass (relative gain as a function of frequency) generally varies much more slowly than the changes in overall (mean) gain solved for by gaincal, one generally uses a long time scale when solving for the bandpass. The default ‘B’ solution mode solves for the gains in frequency slots consisting of channels or averages of channels.
A polynomial fit for the solution (solution type ‘BPOLY’) may be carried out instead of the default frequency-slot based ‘B’ solutions. This single solution will span (combine) multiple spectral windows.
Bandpass calibration is discussed in detail in “Synthesis Calibration”.
If the gains of the system are changing over the time that the bandpass calibrator is observed, then you may need to do an initial gain calibration (see next step).
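A sketch, assuming a prior delay table and an initial short-solint gain table as discussed above (all names are placeholders):

#In CASA6 (illustrative; names are placeholders)
bandpass(vis='AM675.ms', caltable='AM675.B0', field='1331+305=3C286',
         solint='inf', bandtype='B', refant='ea10',
         gaintable=['AM675.K0', 'AM675.G0'])  # pre-apply delay and initial gains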
Gain Calibration
The gaincal task determines solutions for the time-based complex antenna gains, for each spectral window, from the specified calibration sources. A solution interval may be specified. The default ‘G’ solution mode solves for antenna-based gains in each polarization in specified time solution intervals. The ‘T’ solution mode is the same as ‘G’ except that it solves for a single solution shared by both polarizations.
A spline fit for the solution (solution type ‘GSPLINE’) may be carried out instead of the default time-slot based ‘G’ solutions.
Gain calibration is discussed in detail in “Synthesis Calibration”.
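A sketch (calibrator fields, reference antenna, and table names are placeholders):

#In CASA6 (illustrative; names are placeholders)
gaincal(vis='AM675.ms', caltable='AM675.G1', field='1331+305=3C286,J1234+567',
        gaintype='G', calmode='ap', solint='int', refant='ea10',
        gaintable=['AM675.K0', 'AM675.B0'])  # pre-apply delay and bandpass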
Polarization Calibration
The polcal task will solve for any unknown polarization leakage and cross-hand phase terms ('D' and 'X' solutions). The 'D' leakage solutions will work on sources with no polarization and on sources with known (and supplied, e.g., using smodel) polarization. For sources with unknown polarization tracked through a range in parallactic angle on the sky, use poltype 'D+QU', which will first estimate the calibrator polarization for you.
The solution for the unknown cross-hand polarization phase difference ‘X’ term requires a polarized source with known linear polarization (Q,U).
Frequency-dependent (i.e., per channel) versions of all of these modes are also supported (poltypes 'Df', 'Df+QU', and 'Xf').
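A sketch of a leakage solve with poltype 'D+QU' (field and table names are placeholders):

#In CASA6 (illustrative; names are placeholders)
polcal(vis='AM675.ms', caltable='AM675.D1', field='J1234+567',
       poltype='D+QU', solint='inf', combine='scan',
       gaintable=['AM675.K0', 'AM675.B0', 'AM675.G1'])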
Examining Calibration Solutions
The plotms task can plot the solutions in a calibration table. The xaxis choices include time (for gaincal solutions) and channel (e.g. for bandpass calibration).
The listcal task will print out the calibration solutions in a specified table.
Bootstrapping Flux Calibration
The fluxscale task bootstraps the flux density scale from “primary” standard calibrators to the “secondary” calibration sources. Note that the flux density scale must have been previously established on the “primary” calibrator(s) using setjy, and of course a calibration table containing valid solutions for all calibrators must be available.
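A sketch (the calibrator names and tables are placeholders):

#In CASA6 (illustrative; names are placeholders)
fluxscale(vis='AM675.ms', caltable='AM675.G1', fluxtable='AM675.F1',
          reference='1331+305=3C286', transfer='J1234+567')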
Correcting the Data
In the final step of the calibration process, applycal is used to apply several calibration tables (e.g., from gaincal or bandpass, along with prior calibration tables). The corrections are applied to the DATA column of the visibilities, writing the CORRECTED_DATA column, which can then be plotted in plotms, split out as the DATA column of a new MS, or imaged (e.g. using clean). Any existing corrected data are overwritten.
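A sketch (the target field and table names are placeholders):

#In CASA6 (illustrative; names are placeholders)
applycal(vis='AM675.ms', field='MyTarget',
         gaintable=['AM675.K0', 'AM675.B0', 'AM675.F1'])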
Splitting the Data
After a suitable calibration is achieved, it may be desirable to create one or more new MeasurementSets containing the data for selected sources. This can be done using the split task (see “UV Manipulation”).
Further imaging and calibration (e.g. self-calibration) can be carried out on these split MeasurementSets.
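A sketch (names are placeholders; datacolumn='corrected' writes the calibrated data as the DATA column of the new MS):

#In CASA6 (illustrative; names are placeholders)
split(vis='AM675.ms', outputvis='AM675_target.ms', field='MyTarget',
      datacolumn='corrected')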
UV Continuum subtraction
For spectral line data, continuum subtraction can be performed in the image domain (imcontsub) or in the uv domain. For the latter, uvcontsub subtracts a polynomial of the desired order, fit to line-free channels, from each baseline.
Transforming the Data to a new frame
If you want to transform your dataset to a different frequency and velocity frame than the one it was observed in, then you can use the cvel task (see "UV Manipulation"). Alternatively, you can do the regridding during the imaging process in clean without running cvel beforehand.
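A sketch regridding a dataset to the LSRK frame around the HI line (the MS names and the choice of rest frequency are placeholders):

#In CASA6 (illustrative; names are placeholders)
cvel(vis='AM675_target.ms', outputvis='AM675_target_lsrk.ms',
     mode='velocity', outframe='LSRK', restfreq='1420.405752MHz')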
Synthesis Imaging¶
The key synthesis imaging tasks are:
tclean - Calculates a deconvolved image based on the visibility data, using one of several clean algorithms
feather - Combines a single dish and synthesis image in the Fourier plane
Most of these tasks are used to take calibrated interferometer data, with the possible addition of a single-dish image, and reconstruct a model image of the sky.
See Chapter “Synthesis Imaging” and “Image Combination” for more information.
Cleaning a single-field image or a mosaic
The CLEAN algorithm is the most popular and widely-studied method for reconstructing a model image based on interferometer data. It iteratively removes at each step a fraction of the flux in the brightest pixel in a defined region of the current “dirty” image, and places this in the model image. The clean task implements the CLEAN algorithm for single-field data. The user can choose from a number of options for the particular flavor of CLEAN to use.
Often, the first step in imaging is to make a simple gridded Fourier inversion of the calibrated data to make a “dirty” image. This can then be examined to look for the presence of noticeable emission above the noise, and to assess the quality of the calibration by searching for artifacts in the image. This is done using clean with niter=0.
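A sketch of a dirty image made with tclean (names, image size, and cell size are placeholders):

#In CASA6 (illustrative; names and cell size are placeholders)
tclean(vis='AM675_target.ms', imagename='AM675_dirty', imsize=512,
       cell='2.0arcsec', specmode='mfs', niter=0)  # niter=0: dirty image only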
The clean task can jointly deconvolve mosaics as well as single fields, and also has options to do wide-field and wide-band multi-frequency synthesis imaging.
See “Synthesis Imaging” for an in-depth discussion of the clean task.
Feathering in a Single-Dish image
If you have a single-dish image of the large-scale emission in the field, this can be “feathered” in to the image obtained from the interferometer data. This is carried out using the feather task as the weighted sum in the uv-plane of the gridded transforms of these two images. While not as accurate as a true joint reconstruction of an image from the synthesis and single-dish data together, it is sufficient for most purposes. A graphical version of feather is provided by casafeather.
See “Image Combination” for an in-depth discussion of the feather task.
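A sketch (all image names are placeholders):

#In CASA6 (illustrative; image names are placeholders)
feather(imagename='combined.image', highres='interferometer.image',
        lowres='singledish.image')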
Self Calibration¶
Once a calibrated dataset is obtained, and a first deconvolved model image is computed, a “self-calibration” loop can be performed. Effectively, the model (not restored) image is passed back to another calibration process (on the target data). This refines the calibration of the target source, which up to this point has had (usually) only external calibration applied. This process follows the regular calibration procedure outlined above.
Any number of self-calibration loops can be performed. As long as the images are improving, it is usually prudent to continue the self-calibration iterations.
This process is described in “Synthesis Calibration”.
Data and Image Analysis¶
The key data and image analysis tasks are:
imhead — summarize and manipulate the “header” information in a CASA image
imcontsub — perform continuum subtraction on a spectral-line image cube
immath — perform mathematical operations on or between images
immoments — compute the moments of an image cube
imstat — calculate statistics on an image or part of an image
imval — extract values of one or more pixels, as a spectrum for cubes, from an image
imfit — simple 2D Gaussian fitting of single components to a region of an image
imregrid — regrid an image onto the coordinate system of another image
imview — there are useful region statistics and image cube plotting capabilities in imview.
What’s in an image?
The imhead task will print out a summary of image “header” keywords and values. This task can also be used to retrieve and change the header values.
See “Image Analysis” for more.
Image statistics
The imstat task will print image statistics. There are options to restrict this to a box region, and to specified channels and Stokes of the cube. This task will return the statistics in a Python dictionary return variable.
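A sketch (the image name and box are placeholders):

#In CASA6 (illustrative; names are placeholders)
stats = imstat(imagename='AM675.image', box='100,100,400,400')
print(stats['rms'], stats['max'])  # statistics are returned in a Python dictionary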
Image values
The imval task will return values from an image. There are options to restrict this to a box region, and to return specified channels and Stokes of the cube as a spectrum. This task will return these values in a Python dictionary return variable which can then be operated on in the CASA environment.
Moments of an image cube
The immoments task will compute a “moments” image of an input image cube. A number of options are available, from the traditional true moments (zero, first, second) and variations thereof, to other images such as median, minimum, or maximum along the moment axis.
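A sketch computing moment 0 and moment 1 over a channel range (names and channels are placeholders):

#In CASA6 (illustrative; names and channel range are placeholders)
immoments(imagename='AM675_cube.image', moments=[0, 1], chans='10~40',
          outfile='AM675_cube.mom')  # writes e.g. AM675_cube.mom.integrated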
Image math
The immath task will allow you to form a new image by mathematical combinations of other images (or parts of images). This is a powerful task to use.
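For example, a difference image might be formed as follows (a sketch; image names are placeholders):

#In CASA6 (illustrative; image names are placeholders)
immath(imagename=['epoch1.image', 'epoch2.image'], expr='IM0-IM1',
       outfile='difference.image')  # IM0/IM1 refer to the input images in order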
Regridding an Image
It is occasionally necessary to regrid an image onto a new coordinate system. The imregrid task can be used to regrid an input image onto the coordinate system of an existing template image, creating a new output image.
Displaying Images¶
To display an image use the task imview. The imview task will display images in raster, contour, or vector form. Blinking and movies are available for spectral-line image cubes. To start imview, type:
imview
within CASA.
Executing the imview task will bring up two windows: an imview screen showing the data or image, and a file catalog list. Click on an image or MS in the file catalog list, choose the proper display, and the image should pop up on the screen. Clicking on the wrench tool (second from left on the upper left) will open the data display options. Most functions are self-documenting.
See “Image / Cube Visualization” for more details.
Getting data and images out of CASA¶
The key data and image export tasks are:
exportuvfits — export a CASA MS in UVFITS format
exportfits — export a CASA image table as FITS
These tasks can be used to export a CASA MS or image to UVFITS or FITS respectively. See the individual sections referred to above for more on each.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/external-data.ipynb
External Data¶
Each CASA distribution comes with a minimal repository of binary data that is required for CASA to function properly. This is contained in the casadata repository and bundled into a casadata package for modular CASA. The repository includes Measures Tables that deal with the Earth Orientation Parameters (EOPs) and reference frames, as well as ephemeris data. In particular, the EOPs include predictions for the near future, which drift until they are well determined.
Casacore Measures¶
The casacore Measures tables are needed to perform accurate conversions of reference frames. Casacore infrastructure includes classes to handle physical quantities with a reference frame, so-called Measures. Each type of Measure has its own distinct class in casacore, which is derived from the Measure base class. One of the main functionalities provided by casacore w.r.t. Measures is the conversion of Measures from one reference frame to another using the MeasConvert classes.
Many of the spectral, spatial, and time reference frames are time-dependent and require knowledge of the outcome of ongoing monitoring measurements of properties of the Earth and astronomical objects by certain service observatories. This data is tabulated in a number of tables (Measures Tables), which are stored in the casadata repository in the subdirectory geodetic. A snapshot of this repository is included in each tarball distribution of CASA and in the casadata module for CASA6+.
Measures tables are updated daily based on the refinement of the geodetic information from the relevant services like the International Earth Rotation and Reference Systems Service (IERS). Strictly speaking, the Measures tables are part of the casacore infrastructure, which is developed by NRAO, ESO, NAOJ, CSIRO, and ASTRON. In order to keep the repository consistent between the partners, the Measures tables are initially created at a single institution (ASTRON) and then copied into the NRAO casadata repository, from which all CASA users can retrieve them. As of March 2020, the update of the NRAO CASA copy of the Measures tables in geodetic and the planetary ephemerides in the directory ephemerides takes place every day between 18 h UTC and 19 h UTC via two redundant servers at ESO (Garching).
The following list describes the individual tables in the subdirectory geodetic:
IERSeop2000: The IERS EOP2000C04_05 Earth Orientation Parameters using the precession/nutation model “IAU2000” (files eopc04_IAU2000.xx)
IERSeop97: The IERS EOPC04_05 Earth Orientation Parameters using the precession/nutation model “IAU 1980” (files eopc04.xx)
IERSpredict: IERS Earth Orientation Data predicted from NEOS (from file ftp://ftp.iers.org/products/eop/rapid/daily/finals.daily)
IGRF: International Geomagnetic Reference Field Schmidt semi-normalised spherical harmonic coefficients. (Note that this still uses IGRF12. An update to IGRF13 is underway.)
IMF (not a Measures Table proper, access not integrated in the Measures framework): Historical interplanetary magnetic field data until MJD 52618 (December 2002).
KpApF107 (not a Measures Table proper, access not integrated in the Measures framework): Historical geomagnetic and solar activity indices until MJD 54921 (April 2009)
Observatories: Table of the official locations of radio observatories. Maintained by the CASA consortium.
SCHED_locations (not a Measures Table proper, access not integrated in the Measures framework): VLBI station locations
TAI_UTC: TAI_UTC difference (i.e. leap second information) obtained from USNO
Measures Tables in the directory ephemerides:
DE200: Historical JPL Planetary ephemeris DE200 used for Astronomical Almanach from 1984 to 2003 (from ftp://ssd.jpl.nasa.gov/pub/eph/planets/ascii/de200)
DE405: JPL Planetary ephemeris DE405; includes nutations and librations; referred to the ICRS (from ftp://ssd.jpl.nasa.gov/pub/eph/planets/ascii/de405)
Ephemeris Data¶
The ephemeris tables hold a selection of the solar system objects from the JPL-Horizons database. The data tables are generated from the JPL Horizons system's on-line solar system data and ephemeris computation service (https://ssd.jpl.nasa.gov/?horizons). These are primarily used to determine flux models for the solar system objects used in the setjy task. These tables are stored as CASA tables in the casadata repository under ephemerides/JPL-Horizons. The current ephemeris tables cover ephemerides until December 31, 2030 for those objects officially supported in setjy.
Available objects, which include major planets, satellites, and asteroids, are: Mercury, Venus, Mars, Jupiter, Saturn, Uranus, Neptune, Pluto, Io, Europa, Ganymede, Callisto, Titan, Ceres, Vesta, Pallas, Juno, Lutetia, Sun and Moon (the objects in bold are those supported by the 'Butler-JPL-Horizons 2012' standard in setjy).
The table names follow the format objectname_startMJD_endMJD_J2000.tab. The tables required by the setjy task are included in the data directory of the CASA distribution. The available tables can be listed with the following commands:
#In CASA6
CASA <1>: import glob
CASA <2>: jpldatapath=os.getenv('CASAPATH').split(' ')[0]+'/data/ephemerides/JPL-Horizons/*J2000.tab'
CASA <3>: glob.glob(jpldatapath)
The following data are retrieved from the JPL-Horizons system (the number in parentheses indicates the column number listed in the JPL-Horizons system). Refer to https://ssd.jpl.nasa.gov/?horizons_doc for a detailed description of each of these quantities.
Quantities | column no. | Unit/format | Description | column label
---|---|---|---|---
Date | n.a. | YYYY-MM-DD HH:MM | | Date__(UT)__HR:MN
Astrometric RA & DEC | 1 | degrees | Astrometric RA and Dec with respect to the observer's location (GEOCENTRIC) | R.A._(ICRF)_DEC
Observer sub-long & sub-lat | 14 | degrees | Apparent planetodetic ("geodetic") longitude and latitude of the center of the target seen by the OBSERVER at print-time | ob-lon, ob-lat
Solar sub-long & sub-lat | 15 | degrees | Apparent planetodetic ("geodetic") longitude and latitude of the Sun seen by the OBSERVER at print-time | Sl-lon, Sl-lat
North Pole Pos. ang. & dist. | 17 | degrees and arcseconds | Target's North Pole position angle and angular distance from the "sub-observer" point | NP.ang, NP.ds
Helio range & range rate | 19 | AU, km/s | Heliocentric range (r) and range-rate (rdot) | r, rdot
Observer range & range rate | 20 | AU, km/s | Range (delta) and range-rate (deldot) of the target center with respect to the observer | delta, deldot
S-T-O angle | 24 | degrees | Sun-Target-Observer angle | S-T-O
The script request.py (located in casatasks.private for CASA6) can be used to retrieve the ephemeris data from the JPL-Horizons system via e-mail (see also the Manipulate Ephemeris Objects page). The saved e-mail text file is then converted to CASA table format using JPLephem_reader2.
#In CASA6
CASA <5>: from casatasks.private import JPLephem_reader2 as jplreader
CASA <6>: outdict = jplreader.readJPLephem('titan-jpl-horizons-ephem.eml')
opened the file=titan-jpl-horizons-ephem.eml
CASA <7>: jplreader.ephem_dict_to_table(outdict,'Titan_test_ephem.tab')
Got nrows = 3653 from MJD
The converted table contains the following columns.
Column name | unit/format | description
---|---|---
MJD | day | modified Julian date
RA | degree | astrometric right ascension in ICRF/J2000 frame
DEC | degree | astrometric declination in ICRF/J2000 frame
Rho | AU | geocentric distance
RadVal | AU/d | geocentric distance rate
NP_ang | degree | North pole position angle
NP_dist | degree | North pole angular distance
DiskLong | degree | sub-observer longitude
DiskLat | degree | sub-observer latitude
Sl_lon | degree | sub-Solar longitude
Sl_lat | degree | sub-Solar latitude
r | AU | heliocentric distance
rdot | km/s | heliocentric distance rate
phang | degree | phase angle
Array Configuration¶
Array configuration files for various telescopes are distributed with each CASA release. These configuration files can be used to define the telescope for simulator tools and tasks. Currently, configuration files for the following telescopes are available in CASA:
ALMA / 12m Array
ALMA / 7m ACA
VLA
VLBA
Next-Generation VLA (reference design)
ATCA
MeerKAT
PdBI (pre-NOEMA)
WSRT
SMA
Carma
The full list of antenna configurations can be found in the CASA Guides on Simulations.
One can also locate the directory with the configurations in the CASA distribution and then list the configuration files, using the following commands in CASA:
CASA <1>: print(os.getenv('CASAPATH').split(' ')[0] + '/data/alma/simmos/')
/home/casa/packages/RHEL7/release/casa-release-5.4.0-68/data/alma/simmos/
CASA <2>: ls /home/casa/packages/RHEL7/release/casa-release-5.4.0-68/data/alma/simmos/
If a configuration file is not distributed with CASA but retrieved elsewhere, then the configuration file can be called by explicitly writing the full path to the location of the configuration file in the antennalist parameter of the simulator tasks.
Locating the Data Tables¶
The casadata package with all necessary runtime data tables is included in each tarball distribution of monolithic CASA. For modular CASA6, the casadata package is pip installed as one of the modules and located alongside other python packages in the python environment. Therefore the default location of the casadata tables depends on the type of CASA installation as follows:
Modular CASA 6 : located inside the venv created during installation
venv/lib/python3.6/site-packages/casadata/__data__
Monolithic CASA 6 : located inside the tar file bundle on Linux RedHat or CASA.app on Mac OSX
casa-6.x.y-z/data (Linux)
or
CASA.app/Contents/data (Mac)
A user-specified location for the casadata tables may be set in config.py using the --datapath option. See the CASA API for more information.
Note that when building from source code, the repository snapshot is taken from ~/.casa/config.py using datapath=['/some-directory/'].
Updating the Data Tables¶
Some data tables (such as Measures) are regularly updated from the originating sources in the casadata repository at NRAO. Other data updated less frequently is also stored in the repository, such as beam models, antenna and Jy/K correction tables, and the antenna configuration files for the CASA simulator. Many tasks such as tclean, simobserve, and calibration tasks, need an up-to-date data repository to work properly. Because of this, the casadata package within the CASA distribution must be updated occasionally as well.
For observatory use, the update period should not be longer than weekly in order to have the EOPs up-to-date for upcoming observations. The shortest reasonable update interval is daily and the recommended method is via rsync. For offline data analysis use, the update period should not be longer than monthly. Weekly update is recommended. Typically, the administrator of a CASA installation sets up a cron job to perform the update automatically.
Legacy installations processing old data do not have to be updated, because the relevant contents of the Measures Tables are no longer changing for the more distant past.
Note that currently the casadata package is only updated weekly. Users needing daily updates should use the custom location, repository copy, or rsync methods.
When using a shared site-wide installation of CASA, a system administrator may be needed to perform the update.
Updating the default location¶
The default casadata location can be updated to the latest version of the data tables using the following methods. An additional command to view the current version of the data tables is included for reference. This method is limited to weekly updates.
Modular CASA 6 : from the terminal with active venv
(venv) bash$ pip install --upgrade --extra-index-url https://go.nrao.edu/pypi casadata
(venv) bash$ pip list | grep casadata
Monolithic CASA 6 : from within a running CASA shell
CASA <1>: !update-data
CASA <2>: import importlib_metadata
CASA <3>: importlib_metadata.version('casadata')
or from the terminal:
bash$ cd <release top directory>
bash$ bin/pip3 install --upgrade --extra-index-url https://go.nrao.edu/pypi casadata
bash$ bin/pip3 list | grep casadata (for Linux)
or
bash$ Contents/MacOS/pip3 install --upgrade --extra-index-url https://go.nrao.edu/pypi casadata
bash$ Contents/MacOS/pip3 list | grep casadata (for Mac)
where <release top directory> is the top directory of the tar file (i.e., casa-6.x.y-z for Linux RedHat and CASA.app for Mac OSX).
Warning
Temporary scratch space is used to hold the intermediate download artifacts. If you have limited storage or a disk quota, you must ensure at least 2GB of free space. Otherwise you must specify a TMP path to another location by executing the following:
bash$ mkdir /bigdisk/tmp
bash$ export TMPDIR=/bigdisk/tmp
Updating a custom location¶
If users cannot update casadata within a shared site-wide installation of CASA, or wish to have different versions of casadata installed concurrently, a custom location(s) outside of the CASA distribution can be specified and updated.
Custom locations are specified in the config.py file in the user's home ~/.casa directory using the rundata option. See the CASA API for more information. Alternatively, a specific location can be supplied to the update command.
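For example (a minimal sketch of a ~/.casa/config.py entry following the rundata option mentioned above; the path is a placeholder):

# ~/.casa/config.py (illustrative; the path is a placeholder)
rundata = '/home/user/casadata'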
Modular CASA 6 : from the terminal with active venv
(venv) bash$ python -m casatools --update-user-data
(venv) bash$ python -m casatools --update-user-data=/tmp/mydata
Monolithic CASA 6 : from the terminal
bash$ cd <release top directory>
bash$ bin/python3 -m casatools --update-user-data (Linux)
or
bash$ Contents/MacOS/python3 -m casatools --update-user-data (Mac)
bash$ bin/python3 -m casatools --update-user-data=/tmp/mydata (Linux)
or
bash$ Contents/MacOS/python3 -m casatools --update-user-data=/tmp/mydata (Mac)
This method pulls directly from the underlying casadata repository, not the package, and as such provides daily updates.
Updating via repository copy¶
The standard method of updating the default location of the casadata package is currently limited to weekly changes. The underlying casadata repository is updated daily, but the overhead of creating and publishing packages built from the repository currently limits the cadence to weekly.
Updating to a custom location triggers a direct copy from the casadata repository, and thus allows for more frequent daily updates. In situations where the default location needs daily updates, the custom location method may be used with a specified destination of the default location.
Note that if this is done, the version number of the casadata package will no longer be useful for understanding which state of the runtime data is being used.
Modular CASA 6 : from the terminal with active venv
(venv) bash$ cd <venv top directory>
(venv) bash$ python -m casatools --update-user-data=`find lib -name __data__`
Monolithic CASA 6 : from the terminal
bash$ cd <release top directory>
bash$ bin/python3 -m casatools --update-user-data=`find lib -name __data__` (Linux)
or
bash$ Contents/MacOS/python3 -m casatools --update-user-data=`find lib -name __data__` (Mac)
Updating via rsync¶
The previous methods are intended to update the minimum set of runtime data necessary for CASA execution. The runtime data is a subset of a larger casadata repository. Users wishing to access the entire repository (or who prefer this method to the previous ones) may use rsync to pull updated casadata from an NRAO rsync server.
This method also allows for daily updates.
Note that if this is done, the version number of the casadata package will no longer be useful for understanding which state of the runtime data is being used.
Modular and Monolithic CASA : from the terminal
rsync -avz rsync://casa-rsync.nrao.edu/casa-data <location of casadata tables>
This data repository contains runtime data, reference data and test data. However, the CASA project is currently (Fall 2020) working to better organize and partition this data so these instructions will probably stop working in the future.
Calibration & Visibilities¶
This section provides an overview of how to handle, interpret, and calibrate visibility data held in the MeasurementSet structure.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/visibilities_import_export.ipynb
Visibilities Import Export¶
To use CASA to process your data, you first will need to get it into a form that is understood by the package. These are “MeasurementSets” for synthesis and single dish data, which is the purpose of this chapter. Importing images, or “image tables” as understood by CASA, is explained here.
There are a number of tasks used to fill telescope-specific data and to import/export standard formats. These are:
importasdm — import of ALMA data in ASDM format
importuvfits — import visibility data in UVFITS format
importfitsidi — import visibility data in the FITS-IDI format
importvla — import data from VLA that is in export format
importmiriad — import data from MIRIAD visibilities
importatca — import ATCA data that is in the RPFITS (archive) format
importgmrt — import GMRT data
importasap — convert ASAP (ATNF Spectral Analysis Package) into a CASA visibility data format
importnro — import NRO 45m data
exportasdm — convert a CASA MS into an ASDM
exportuvfits — export a CASA MS in UVFITS format
UV Data Import¶
Converting Telescope UV Data to a MeasurementSet
There are a number of tasks available to bring data in various forms into CASA as a MeasurementSet:
ALMA and VLA Science Data Model format via importasdm
historic VLA Archive format data via importvla
ATCA Data via importatca
MIRIAD Data from various telescopes via importmiriad
GMRT Data via importgmrt
UVFITS format can be imported into and exported from CASA (importuvfits, importfitsidi, and exportuvfits)
ALMA and VLA Filling of Science Data Model (ASDM) data
The ALMA and JVLA projects have agreed upon a common archival science data model (ASDM; sometimes also called SDM) format, and have jointly developed the software to fill this data into CASA. In the ASDM format, the bulk of the data is contained in large binary data format (BDF) tables, with the meta-data and ancillary information in XML tables. This is structured as a directory, like the MS, and was designed to be similar to the MS to facilitate conversion.
The content of an ASDM can be listed with the task asdmsummary:
#asdmsummary :: Summarized description of an ASDM dataset.
asdm = '' #Name of input ASDM directory
with an output that contains the list and positions of the antennas, followed by the metadata of each scan like observation time, source name, frequency and polarization setup:
Input ASDM dataset : TDEM0008.sb3373760.eb3580330.55661.22790537037
========================================================================================
ASDM dataset :TDEM0008.sb3373760.eb3580330.55661.22790537037
========================================================================================
Exec Block : ExecBlock_0
Telescope : JVLA
Configuration name : B
Observer name : Dr. Juergen Ott
The exec block started on 2011-04-10T05:28:13.200000000 and ended on 2011-04-10T10:27:12.300000256
27 antennas have been used in this exec block.
Id Name Make Station Diameter X Y Z
Antenna_0 ea01 UNDEFINED W36 25 -1606841.96 -5042279.689 3551913.017
Antenna_1 ea02 UNDEFINED E20 25 -1599340.8 -5043150.965 3554065.219
Antenna_2 ea03 UNDEFINED E36 25 -1596127.728 -5045193.751 3552652.421
Antenna_3 ea04 UNDEFINED W28 25 -1604865.649 -5042190.04 3552962.365
Antenna_4 ea05 UNDEFINED W08 25 -1601614.091 -5042001.653 3554652.509
Antenna_5 ea06 UNDEFINED N24 25 -1600930.06 -5040316.397 3557330.397
Antenna_6 ea07 UNDEFINED E32 25 -1597053.116 -5044604.687 3553058.987
Antenna_7 ea08 UNDEFINED N28 25 -1600863.684 -5039885.318 3557965.319
Antenna_8 ea09 UNDEFINED E24 25 -1598663.09 -5043581.392 3553767.029
Antenna_9 ea10 UNDEFINED N32 25 -1600781.039 -5039347.456 3558761.542
Antenna_10 ea11 UNDEFINED E04 25 -1601068.79 -5042051.91 3554824.835
Antenna_11 ea12 UNDEFINED E08 25 -1600801.926 -5042219.366 3554706.448
Antenna_12 ea14 UNDEFINED W12 25 -1602044.903 -5042025.824 3554427.832
Antenna_13 ea15 UNDEFINED W24 25 -1604008.742 -5042135.828 3553403.707
Antenna_14 ea16 UNDEFINED N12 25 -1601110.052 -5041488.079 3555597.439
Antenna_15 ea17 UNDEFINED W32 25 -1605808.656 -5042230.082 3552459.202
Antenna_16 ea18 UNDEFINED N16 25 -1601061.961 -5041175.88 3556058.022
Antenna_17 ea19 UNDEFINED W04 25 -1601315.893 -5041985.32 3554808.305
Antenna_18 ea20 UNDEFINED N36 25 -1600690.606 -5038758.734 3559632.061
Antenna_19 ea21 UNDEFINED E12 25 -1600416.51 -5042462.45 3554536.041
Antenna_20 ea22 UNDEFINED N04 25 -1601173.979 -5041902.658 3554987.518
Antenna_21 ea23 UNDEFINED E16 25 -1599926.104 -5042772.967 3554319.789
Antenna_22 ea24 UNDEFINED W16 25 -1602592.854 -5042054.997 3554140.7
Antenna_23 ea25 UNDEFINED N20 25 -1601004.709 -5040802.809 3556610.133
Antenna_24 ea26 UNDEFINED W20 25 -1603249.685 -5042091.404 3553797.803
Antenna_25 ea27 UNDEFINED E28 25 -1597899.903 -5044068.676 3553432.445
Antenna_26 ea28 UNDEFINED N08 25 -1601147.94 -5041733.837 3555235.956
Number of scans in this exec Block : 234
scan #1 from 2011-04-10T05:28:13.200000000 to 2011-04-10T05:33:35.500000256
Intents : OBSERVE_TARGET
Sources : 1331+305=3C286
Subscan #1 from 2011-04-10T05:28:13.200000000 to 2011-04-10T05:33:35.500000256
Intent : UNSPECIFIED
Number of integrations : 322
Binary data in uid:///evla/bdf/1302413292901
Number of integrations : 322
Time sampling : INTEGRATION
Correlation Mode : CROSS_AND_AUTO
Spectral resolution type : FULL_RESOLUTION
Atmospheric phase correction : AP_UNCORRECTED
SpectralWindow_0 : numChan = 256, frame = TOPO,
firstChan = 8484000000, chandWidth = 125000 x Polarization_0 : corr = RR,LL
scan #2 from 2011-04-10T05:33:35.500000256 to 2011-04-10T05:35:35.200000000
Intents : OBSERVE_TARGET
Sources : 1331+305=3C286
Subscan #1 from 2011-04-10T05:33:35.500000256 to 2011-04-10T05:35:35.200000000
Intent : UNSPECIFIED
Number of integrations : 119
Binary data in uid:///evla/bdf/1302413293280
Number of integrations : 119
Time sampling : INTEGRATION
Correlation Mode : CROSS_AND_AUTO
Spectral resolution type : FULL_RESOLUTION
Atmospheric phase correction : AP_UNCORRECTED
SpectralWindow_0 : numChan = 256, frame = TOPO,
firstChan = 8484000000, chandWidth = 125000 x Polarization_0 : corr = RR,LL
scan #3 from 2011-04-10T05:35:35.200000000 to 2011-04-10T05:36:34.999999488
Intents : OBSERVE_TARGET
Sources : 1331+305=3C286
Subscan #1 from 2011-04-10T05:35:35.200000000 to 2011-04-10T05:36:34.999999488
...
The importasdm task will fill SDM format data from ALMA and the JVLA into a CASA visibility data set (MS). All SDM formats from version 2 to the current version can be filled by importasdm.
The default inputs of importasdm are:
# importasdm -- Convert an ALMA Science Data Model observation into a
# CASA visibility file (MS)
asdm = '' # Name of input asdm directory (on
# disk)
vis = '' # Root name of the MS to be created.
# Note the .ms is NOT added
createmms = False # Create a multi-MS output
corr_mode = 'all' # specifies the correlation mode to be
# considered on input. A quoted string
# containing a sequence of ao, co,
# ac,or all separated by whitespaces
# is expected
srt = 'all' # specifies the spectral resolution
# type to be considered on input. A
# quoted string containing a sequence
# of fr, ca, bw, or all separated by
# whitespaces is expected
time_sampling = 'all' # specifies the time sampling
# (INTEGRATION and/or SUBINTEGRATION)
# to be considered on input. A quoted
# string containing a sequence of i,
# si, or all separated by whitespaces
# is expected
ocorr_mode = 'ca' # output data for correlation mode
# AUTO_ONLY (ao) or CROSS_ONLY (co) or
# CROSS_AND_AUTO (ca)
compression = False # Flag for turning on data compression
lazy = False # Make the MS DATA column read the ASDM
# Binary data directly (faster import,
# smaller MS)
asis = '' # Creates verbatim copies of the
# ASDM tables in the output MeasurementSet.
# Value given must be a string
# of table names separated by spaces;
# A * wildcard is allowed.
wvr_corrected_data = 'no' # Specifies which values are considered
# in the SDM binary data to fill the
# DATA column in the MAIN table of the
# MS; yes for corrected, no for
# uncorrected, both for corrected and
# uncorrected (resulting in two MSs)
scans = '' # Processes only the specified scans.
# A scan specification consists of an
# exec block index followed by the :
# character, followed by a comma
# separated list of scan indexes or
# scan index ranges.
# (e.g. 0:1;1:2~6,8;2:,3:24~30),
ignore_time = False # All the rows of the tables Feed,
# History, Pointing, Source, SysCal,
# CalDevice, SysPower, and Weather are
# processed independently of the time
# range of the selected exec
# block / scan.
process_syspower = True # Process the SysPower table?
process_caldevice = True # Process the CalDevice table?
process_pointing = True # Process the Pointing table?
process_flags = True # Create online flags in the FLAG_CMD
# sub-table?
tbuff = 0.0 # Time padding buffer (seconds)
applyflags = False # Apply the flags to the MS.
savecmds = False # Save flag commands to an ASCII file
outfile = '' # Name of ASCII file to save flag
# commands
flagbackup = True # Back up flag column before applying
# flags.
verbose = False # Output lots of information while the
# filler is working
overwrite = False # Overwrite an existing MS(s)
bdfflags = False # Set the MS FLAG column according to
# the ASDM _binary_ flags
with_pointing_correction = False # Add (ASDM::Pointing::encoder -
# ASDM::Pointing::pointingDirection)
# to the value to be written in
# MS::Pointing::direction
convert_ephem2geo = True # if True, convert any attached
# ephemerides to the GEO reference
# frame (time-spacing not changed)
polyephem_tabtimestep = 0.0 # Timestep (days) for the tabulation
# of polynomial ephemerides.
# A value <= 0 disables tabulation.
If scans is set, then importasdm processes only the scans specified in the option’s value. This value is a semicolon separated list of scan specifications. A scan specification consists of an exec block index followed by the character ‘:’ followed by a comma separated list of scan indexes or scan index ranges. A scan index is relative to the exec block it belongs to. Scan indexes are 1-based while exec blocks are 0-based. The expressions
"0:1"
"2:2~6"
"0:1,1:2~6,8;2:,3:24~30"
"1,2"
"3:"
are all valid values for the selection. The “3:” selector will be interpreted as ‘all the scans of the exec block 3’. A scan index or a scan index range not preceded by an exec block index will be interpreted as ‘all the scans with such indexes in all the exec blocks’. By default all the scans are considered.
When process_flags=True the task will create online flags based on the Flag.xml, Antenna.xml and SpectralWindow.xml files and copy them to the FLAG_CMD sub-table of the MS. The flags will NOT be applied unless the parameter applyflags is set to True. Optionally, the flags can also be saved to an external ASCII file if savecmds is set to True. The flags can later be applied to the MS using task flagdata in list mode.
When bdfflags=True the task will apply online flags contained in the ASDM BDF data by calling the executable bdflags2MS, which the user can also run from the OS prompt. This is recommended for ALMA data.
The option createmms prepares the output file for parallel processing and creates a multi-MS.
Specifics on importing Jansky VLA data with importasdm
As of CASA 5.4, the task importevla is no longer available to import JVLA data, but most of its functionality is covered by importasdm. However, several additional steps are required to duplicate the behaviour of importevla when using importasdm, owing to differences in default parameters and the fact that some of the on-the-fly flagging cannot be performed by importasdm.
To mimic the behaviour of importevla, change the following parameters in importasdm from their default settings:
ocorr_mode = ‘co’ to import cross-correlations only (discarding auto-correlations)
with_pointing_correction = True to add pointing corrections
process_flags = True (default) to read in the online flags, then applyflags = True to apply the online flags and/or savecmds = True to save flag commands to an ASCII table.
For ephemeris objects: convert_ephem2geo = False
While online flags can thus be created by leaving the parameter process_flags = True by default, additional flagging steps need to be performed after importasdm to flag zero values and shadowing of antennas:
Shadow flags: use task flagdata, with mode = ‘shadow’ (and optionally reason = ‘shadow’). The parameters tolerance and addantenna can be specified in flagdata in the same way they were used in importevla.
Zero clipping flags: use task flagdata, with mode = ‘clip’, correlation = ‘ABS_ALL’, and clipzeros = True (and optionally reason = ‘clip’). Note that the non-default case in importevla where flagpol = False can be replicated by setting correlation = ‘ABS_RR, ABS_LL’.
Like importasdm, the task flagdata can also save the flagging commands to an ASCII table by setting savepars = True. To NOT apply the flags (applyflags = False in importevla), add action = ‘calculate’ to flagdata. You may also choose to add a reason using the cmdreason argument, e.g. cmdreason = ‘CLIP_ZERO_ALL’.
WARNING: The task flagdata can only write out the flag commands for that invocation of flagdata. The default overwrite=True must be used to overwrite an existing file. In order to save the commands from all 3 possible flagging steps (importasdm, zero, and shadow) each step must be saved to a separate file, which must then be concatenated into a single file to be used to flag the data.
Import of ASDM data with option lazy=True
With lazy=False, importasdm will fill the visibilities into a newly created DATA column of the MS converting them from their binary format in the ASDM to the CASA Table format.
If, however, lazy is set to True, the task will create the DATA column with an ALMA data-specific storage manager, the asdmstman, which enables CASA to directly read the binary data from the ASDM with on-the-fly conversion. No redundant copy of the raw data is created.
This procedure has the advantage that it saves more than 60% disk space and, at least in some cases, makes access to the DATA column ≥ 10% faster because the data I/O volume is decreased. For the same reason, it also accelerates the import itself by about a factor of 2. The acceleration is particularly large in the applycal task, especially on standard SATA disks.
E.g., if your ASDM has a size of 36 GB, the import with default parameters will turn this into an MS of 73 GB (total disk space consumption = 36 GB + 73 GB = 109 GB). With lazy=True, the imported MS has a size of only 2 GB (total disk space consumption = 36 GB + 2 GB = 38 GB), i.e. your total disk space savings are ca. 65%. Even compared to the case where you delete the ASDM after a normal import, the lazy import with the ASDM kept will save you ca. 48% disk space (in the example above, 38 GB compared to 73 GB).
The only caveats are the following:
You must not delete your ASDM. You can, however, move it but you have to update the reference stored in the MS. Symbolic links will work. See below on how to use the tool method ms.asdmref() to manipulate the ASDM reference.
The lazily imported DATA column is read-only. But in any normal data reduction, the DATA column (as opposed to CORRECTED_DATA) is treated as read-only anyway.
The lazily imported MS is numerically identical with the traditionally imported MS and so are all results derived from the MSs.
An important additional tool to manipulate lazily imported MSs is the new method ms.asdmref() in the ms tool. If the MS is imported from an ASDM with option lazy=True, the DATA column of the MS is virtual and directly reads the visibilities from the ASDM. A reference to the original ASDM is stored with the MS. If the ASDM needs to be moved to a different path, the reference to it in the MS needs to be updated. This can be achieved with ms.asdmref().
The method takes one argument: abspath. When called with abspath equal to an empty string (default), the method just reports the currently set ASDM path or an empty string if the ASDM path was not set, i.e. the MS was not lazily imported.
If you want to move the referenced ASDM to a different path, you can set the new absolute path by providing it as the value of abspath to the method.
ms.open('uid___A12345_X678_X910.ms',False)
ms.asdmref('/home/alma/myanalysis/uid___A12345_X678_X910')
ms.close()
will set the new location of the referenced ASDM to /home/alma/myanalysis/uid___A12345_X678_X910. Contrary to what one would expect from the parameter name, you can also provide a relative path as abspath. This path will be interpreted as relative to the location of the MS.
Info: the lazily imported MS itself can be moved without any restrictions independently from the referenced ASDM as long as the path to the ASDM remains accessible, even across file systems.
VLA: Filling data from archive format (importvla)
VLA data in archive format (i.e., as downloaded from the historic VLA data archive) are read into CASA from disk using the importvla task. The inputs are:
#importvla :: import VLA archive file(s) to a MeasurementSet:
archivefiles = '' #Name of input VLA archive file(s)
vis = '' #Name of output visibility file
bandname = '' #VLA frequency band name:''=>obtain all bands in archive files
frequencytol = 150000.0 #Frequency shift to define a unique spectral window (Hz)
project = '' #Project name: '' => all projects in file
starttime = '' #start time to search for data
stoptime = '' #end time to search for data
applytsys = True #apply nominal sensitivity scaling to data & weights
autocorr = False #import autocorrelations to ms, if set to True
antnamescheme = 'new' #'old' or 'new'; 'VA04' or '4' for ant 4
keepblanks = False #Fill scans with empty source names (e.g. tipping scans)?
evlabands = False #Use updated eVLA frequencies and bandwidths
The main parameters are archivefiles to specify the input VLA Archive format file names, and vis to specify the output MS name.
Info: The scaling of VLA data both before and after the June 2007 Modcomp-turnoff is fully supported, based on the value of applytsys.
Note that archivefiles takes a string or list of strings, as there are often multiple files for a project in the archive.
For example:
archivefiles = ['AP314_A950519.xp1','AP314_A950519.xp2']
vis = 'NGC7538.ms'
The importvla task allows selection on the frequency band. Suppose that you have 1.3 cm line observations in K-band and you have copied the archive data files AP314_A950519.xp* to your working directory and started casa. Then,
default('importvla')
archivefiles = ['AP314_A950519.xp1','AP314_A950519.xp2','AP314_A950519.xp3']
vis = 'ngc7538.ms'
bandname = 'K'
frequencytol = 10e6
importvla()
If the data is located in a different directory on disk, then use the full path name to specify each archive file, e.g.:
archivefiles=['/home/rohir2/jmcmulli/ALMATST1/Data/N7538/AP314_A950519.xp1',
'/home/rohir2/jmcmulli/ALMATST1/Data/N7538/AP314_A950519.xp2',
'/home/rohir2/jmcmulli/ALMATST1/Data/N7538/AP314_A950519.xp3']
Info: importvla will import the on-line flags (from the VLA system) along with the data. Shadowed antennas will also be flagged. The flags will be put in the MAIN table and thus available to subsequent tasks and tools. If you wish to revert to unflagged data, use flagmanager to save the flags (if you wish), and then use flagdata with mode=’unflag’ to toggle off the flags.
The other parameters are:
Parameter applytsys
The applytsys parameter controls whether the nominal sensitivity scaling (based on the measured TSYS, with the weights scaled accordingly using the integration time) is applied to the visibility amplitudes or not. If True, they will be scaled so as to be the same as AIPS FILLM (i.e. approximately in deciJanskys). Note that post-Modcomp data is in raw correlation coefficient and will be scaled using the TSYS values, while Modcomp-era data had this applied online. In all cases importvla will do the correct thing to data and weights based on an internal flag in the VLA Archive file, either scaling or unscaling according to your choice for applytsys.
If applytsys=True and you see strange behavior in data amplitudes, it may be due to erroneous TSYS values from the online system. You might want to then fill with applytsys=False and look at the correlation coefficients to see if the behavior is as expected.
Parameter bandname
The bandname indicates the VLA Frequency band(s) to load, using the traditional bandname codes. These are:
‘4’ = 48-96 MHz
‘P’ = 298-345 MHz
‘L’ = 1.15-1.75 GHz
‘C’ = 4.2-5.1 GHz
‘X’ = 6.8-9.6 GHz
‘U’ = 13.5-16.3 GHz
‘K’ = 20.8-25.8 GHz
‘Q’ = 38-51 GHz
‘’ = all bands (default)
Note that as the transition from the VLA to JVLA progressed, the actual frequency ranges covered by the bands expanded, and additional bands were added (namely ‘S’ from 2-4 GHz and ‘A’ from 26.4-40 GHz).
Parameter frequencytol
The frequencytol parameter specifies the frequency separation tolerated when assigning data to spectral windows. The default is frequencytol=150000 (Hz). For Doppler tracked data, where the sky frequency changes with time, a frequencytol < 10000 Hz may produce too many unnecessary spectral windows.
Parameter project
You can specify a specific project name to import from archive files. The default ‘’ will import data from all projects in file(s) archivefiles.
For example for VLA Project AL519:
project = 'AL519' #this will work
project = 'al519' #this will also work
while project=’AL0519’ will NOT work (even though that is what queries to the VLA Archive will print it as - sorry!).
Parameters starttime and stoptime
You can specify start and stop times for the data, e.g.:
starttime = '1970/1/31/00:00:00'
stoptime = '2199/1/31/23:59:59'
Note that the blank defaults will load all data fitting other criteria.
Parameter autocorr
Note that autocorrelations are filled into the data set if autocorr=True. Generally for the VLA, autocorrelation data is not useful, and furthermore the imaging routine will try to image the autocorrelation data (it assumes it is single dish data) which will swamp any real signal. Thus, if you do fill the autocorrelations, you will have to flag them before imaging.
Parameter antnamescheme
The antnamescheme parameter controls whether importvla will try to use a naming scheme where JVLA antennas are prefixed with EA (e.g. ‘EA16’) and old VLA antennas have names prefixed with VA (e.g. ‘VA11’). Our method to detect whether an antenna is JVLA is not yet perfected, and thus unless you require this feature, simply use antnamescheme=’old’.
Parameter evlabands
The evlabands=True option is provided to allow users to access JVLA frequencies outside the standard VLA tunings (e.g. the extended C-band above 6 GHz).
ALERT: use of this option for standard VLA data will cause unexpected associations, such as X-band data below 8 GHz being extracted to C-band (as the JVLA C-band is 4–8 GHz). Use with care.
Import ATCA and CARMA data
There are several ways to import data from ATCA and CARMA into CASA. The data from these arrays has historically been processed in MIRIAD. For simple cases (single source and frequency), exporting from MIRIAD to UVFITS format and importing with importuvfits often works well, although some fixes to the resulting MeasurementSet may be needed.
The importmiriad task reads MIRIAD visibility data and can handle multiple frequencies and sources in the input. Since it does not apply any calibration, make sure calibration is applied beforehand in MIRIAD.
The importatca task reads the ATCA archive format (RPFITS) directly, avoiding the need to go through MIRIAD to load the data. It can handle ATCA data from both the old and new (CABB) correlator.
Import MIRIAD visibilities (importmiriad)
The task importmiriad converts visibilities in the MIRIAD data format to an MS. It has mainly been tested on data from the ATCA and CARMA telescopes. The inputs are:
#importmiriad :: Convert a Miriad visibility file into a CASA MeasurementSet
mirfile = '' #Name of input Miriad visibility file
vis = '' #Name of output MeasurementSet
tsys = False #Use the Tsys to set the visibility weights
spw = 'all' #Select spectral windows
vel = '' #Select velocity reference (TOPO,LSRK,LSRD)
linecal = False #(CARMA) Apply line calibration
wide = 'all' #(CARMA) Select wide window averages
debug = 0 #Display increasingly verbose debug messages
The mirfile parameter specifies a single MIRIAD visibility file which should have any calibration done in MIRIAD already applied to it.
Set the tsys parameter to True to change the visibility weights from the MIRIAD default (usually the integration time) to the inverse of the noise variance using the recorded system temperature.
The spw parameter can be used to select all or some of the simultaneous spectral windows from the input file. Use the default of ‘all’ for all the data or use e.g., spw=’0,2’ to select the first and third window.
The vel parameter can be used to set the output velocity frame reference. For ATCA this defaults to ‘TOPO’ and for CARMA it defaults to ‘LSRK’. Only change this if your data comes out with the incorrect velocity.
The linecal parameter is only useful for CARMA data and can apply the line calibration if it is stored with the MIRIAD data.
The wide parameter is only useful for CARMA data and can select which of the wide-band channels should be loaded.
Import ATCA RPFITS data (importatca)
The data from the ATCA is available from the archive in files in the RPFITS format. These files can be imported into CASA with the importatca task.
#importatca :: Import ATCA RPFITS file(s) to a MeasurementSet
files =['*.C1234'] #Name of input ATCA RPFits file(s)
vis = 'c1234.ms' #Name of output visibility file
#(MeasurementSet)
options = '' #Processing options: birdie, reweight,
#noxycorr, fastmosaic, hires, noac
#(comma separated list)
spw = [-1] #Specify the spectral windows to use,
#default=all
nscans = [0, 0] #Number of scans to skip followed by
#number of scans to read
lowfreq = '0.1GHz' #Lowest reference frequency to select
highfreq = '999GHz' #Highest reference frequency to select
fields = [''] #List of field names to select
edge = 8 #Percentage of edge channels to flag.
#For combined zooms, this specifies
#the percentage for a single zoom
#window
The files parameter can take a string or a list of strings as input and also allows the use of wildcards as shown in the example above.
For older ATCA continuum data (before the CABB correlator, April 2009) use options=’birdie,reweight’ to suppress internally generated RFI.
The options parameter:
birdie - (pre-CABB data only) Discard edge channels and channels affected by internal RFI.
reweight - (pre-CABB data only) Suppress ringing of RFI spikes by reweighting of the lag spectrum
noxycorr - do not apply the xy phase correction as derived from the switched noise calibration; by default this is applied during loading of the data.
fastmosaic - use this option if you are loading mosaic data with many pointings and only one or two integrations per pointing. This option changes the tiling of the data to avoid excessive I/O.
hires - use this option if you have data in time binning mode (as used for pulsars) but you want to make it look like data with very short integration time (no bins).
noac - discard the auto-correlation data
The spw parameter takes a list of integers and can be used to select one or more of the simultaneous frequencies. With CABB there can be up to 34 spectra. The order of the frequency bands in the RPFITS file is: the two continuum bands (0 and 1), followed by the zoom bands for the first frequency and then the zoom bands for the second frequency. Note that this spw parameter does not take a string with wildcards. Use spw=-1 to get all the data.
The nscans parameter can be used to select part of a file, e.g., to retrieve a few test scans for a quick look.
The lowfreq and highfreq parameters select data based on the reference frequency.
The fields parameter selects data based on the field/source name.
The edge parameter specifies how many edge channels to discard as a percentage of the number of channels in each band. E.g., the default value of 8 will discard 8 channels from the top and bottom of a 2048 channel spectrum.
UVFITS Import
The UVFITS format is not exactly a standard, but is a popular archive and transport format nonetheless. CASA supports UVFITS files written by the AIPS FITTP task, and others.
UVFITS is supported for both import and export.
Import using importuvfits
To import UVFITS format data into CASA, use the importuvfits task:
#In CASA: inp(importuvfits)
fitsfile = '' #Name of input UVFITS file
vis = '' #Name of output visibility file (MS)
antnamescheme = 'old' #For VLA only; 'new' or 'old'; 'VA04' or '04' for VLA ant 4
This is straightforward, since all it does is read in a UVFITS file and convert it as best it can into a MS.
For example:
importuvfits(fitsfile='NGC5921.fits',vis='ngc5921.ms')
Here is a hint for handling CARMA data loaded into CASA using importuvfits:
tb.open("c0104I/ANTENNA", nomodify=False)
namelist = tb.getcol("NAME").tolist()
for i in range(len(namelist)):
    name = 'CA' + namelist[i]
    print(' Changing ' + namelist[i] + ' to ' + name)
    namelist[i] = name
tb.putcol("NAME", namelist)
tb.close()
Import using importfitsidi
Some UV data in FITS format are written in the FITS-IDI standard. Such files can be imported into CASA with the importfitsidi task:
#importfitsidi :: Convert a FITS-IDI file to a CASA visibility data set
fitsidifile = [''] #Name(s) of input FITS-IDI file(s)
vis = '' #Name of output visibility file (MS)
constobsid = False #If True, give constant obs ID==0 to
#the data from all input fitsidi
#files (False = separate obs id for
#each file)
scanreindexgap_s = 0.0 #min time gap (seconds) between
#integrations to start a new scan
The constobsid parameter can be used to give all visibilities the same observation ID of 0. scanreindexgap_s controls the minimum time gap between integrations that starts a new scan.
Example:
importfitsidi(fitsidifile='NGC1300.fits',vis='NGC1300.ms')
MeasurementSet Export¶
Convert a MeasurementSet to UVFITS
Export using exportuvfits
The exportuvfits task will take a MS and write it out in UVFITS format. The defaults are:
#exportuvfits :: Convert a CASA visibility data set to a UVFITS file:
vis = '' #Name of input visibility file
fitsfile = '' #Name of output UV FITS file
datacolumn = 'corrected' #Visibility file data column
field = '' #Select field using field id(s) or field name(s)
spw = '' #Select spectral window/channels
antenna = '' #Select data based on antenna/baseline
timerange = '' #Select data based on time range
avgchan = 1 #Channel averaging width (value > 1 indicates averaging)
writesyscal = False #Write GC and TY tables, (Not yet available)
multisource = True #Write in multi-source format
combinespw = True #Export the spectral windows as IFs
padwithflags = True #Fill in missing data with flags to fit IFs
writestation = True #Write station name instead of antenna name
overwrite = False #Overwrite output file if it exists?
For example:
exportuvfits(vis='ngc5921.split.ms',
fitsfile='NGC5921.split.fits',
multisource=False)
The MS selection parameters field, spw, antenna, and timerange follow the standard selection syntax.
The datacolumn parameter chooses which data-containing column of the MS is to be written out to the UV FITS file. Choices are: ‘data’, ‘corrected’, and ‘model’.
There are a number of special parameters that control what is written out. These are mostly here for compatibility with AIPS.
The writesyscal parameter toggles whether GC and TY extension tables are written. These are important for VLBA data, and for JVLA data.
ALERT: The writesyscal option is not yet available.
The multisource parameter determines whether the UV FITS file is a multi-source file or a single-source file, if you have a single-source MS or choose only a single source. Note: the difference between a single-source and multi-source UVFITS file here is whether it has a source (SU) table and the source ID in the random parameters. Some programs (e.g. difmap) only accept single-source files. If you select more than one source in fields, then the multisource parameter will be overridden to be True regardless.
The combinespw parameter allows, if some conditions are met, exporting of all spectral windows (SpW) as a set of “IF”s in a single “FREQID” setup instead of giving each SpW its own FREQID in the FITS file. In this context an IF (Intermediate Frequency) is a specialization of an SpW, where each IF in a UV FITS file must have the same number of channels and polarizations, each channel must have the same width, and each IF must be present (even if flagged) throughout the entire observation. If these conditions are not met the data must be exported using multiple FREQIDs, the UV FITS equivalent of a general SpW. This matters since many (sub)programs will work with multiple IFs, but not multiple FREQIDs. For example, a UV FITS file with multiple FREQIDs can be read by AIPS, but you may find that you have to separate the FREQIDs with SPLIT before you can do very much with them. Therefore combinespw should be True if possible. Typically MSes where each band was observed simultaneously can be exported with combinespw=True. MSes where the tuning changed with time, e.g. 10 minutes at 4.8 GHz followed by 15 minutes at 8.4 GHz, should be exported to multiple UV FITS files using spw to select one tuning (set of simultaneous SpWs) per file.
The writestation parameter toggles the writing of the station name instead of antenna name.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/visibility_data_selection.ipynb
Visibility Data Selection¶
Once in MS form, subsets of the data can be operated on using the tasks and tools. In CASA, there are three common data selection parameters used in the various tasks: field, spw, and selectdata. The selectdata parameter, if set to True, opens up a number of other sub-parameters for selection. The selection operation is unified across all the tasks: the available selectdata parameters may not be the same in all tasks, but where present, the same parameters mean the same thing and behave in the same manner in any task.
For example:
field = '' #field names or index of calibrators ''==>all
spw = '' #spectral window:channels: ''==>all
selectdata = False #Other data selection parameters
versus
field = '' #field names or index of calibrators ''==>all
spw = '' #spectral window:channels: ''==>all
selectdata = True #Other data selection parameters
timerange = '' #time range: ''==>all
uvrange = '' #uv range: ''==>all
antenna = '' #antenna/baselines: ''==>all
scan = '' #scan numbers
msselect = '' #Optional data selection (specialized; see help)
The following are the general syntax rules and descriptions of the individual selection parameters of particular interest for the tasks.
The full documentation of the MeasurementSet data selection syntax can be found on the CASAcore documentation page, document 263, “Measurement Selection Syntax”; see also CASA Memo 3. Links to relevant subsections are provided in the subsections below.
General selection syntax¶
Most of the selections are effected through the use of selection strings. This sub-section describes the general rules used in constructing and parsing these strings. Note that some selections are done through the use of numbers or lists. There are also parameter-specific rules that are described under each parameter.
All lists of basic selection specification-units are comma separated lists and can be of any length. White-spaces before and after the commas (e.g. ‘3C286, 3C48, 3C84’) are ignored, while white-space within sub-strings is treated as part of the sub-string (e.g. ‘3C286, VIRGO A, 3C84’). In some cases, spaces need to be quoted, e.g. “‘spw 1’” (note the double quote around the single quotes).
All integers can be of any length (in terms of characters) composed of the characters 0–9. Floating point numbers can be in the standard format (DIGIT.DIGIT, DIGIT., or .DIGIT) or in the mantissa-exponent format (e.g. 1.4e9). In places where only integers make sense (e.g. IDs), a given floating point number is truncated to its integer part. CASA 6 internally promotes integers to doubles, and for tasks CASA 6 ensures that the parameter values are converted to the internally acceptable type.
Ranges of numbers (integers or real numbers) can be given in the format ‘N0~N1’. For integer ranges, this is expanded into a list of integers from N0 (inclusive) to N1 (inclusive). For real numbers, it is used to select all values present for the appropriate parameter in the MeasurementSet between N0 and N1 (including the boundaries). Note that the ‘~’ (tilde) character is used rather than the more obvious ‘-’ in order to accommodate hyphens in strings and minus signs in numbers.
Wherever appropriate, units can be specified. The units are used to convert the values given to the units used in the MeasurementSet. For ranges, the unit is specified only once (at the end) and applies to both the range boundaries.
String Matching¶
String matching can be done in three ways. Any component of a comma separated list that cannot be parsed as a number, a number range, or a physical quantity is treated as a regular expression or a literal string. If the string does not contain the characters ‘*’, ‘{‘, ‘}’ or ‘?’, it is treated as a literal string and used for exact matching. If any of the above mentioned characters are part of the string, they are used as a regular expression. As a result, for most cases, the user does not need to supply any special delimiters for literal strings and/or regular expressions. For example:
field = '3' #match field ID 3 and not select field named "3C286".
field = '3*' #used as a pattern and matched against field names. If
#names like "3C84", "3C286", "3020+2207" are found,
#all will match. Field ID 3 will not be selected
#(unless of course one of the above mentioned field
#names also correspond to field ID 3!).
field = '30*' #will match only with "3020+2207" in above set.
However, if it is required that the string be matched exclusively as a regular expression, it can be supplied within a pair of ‘/’ delimiters (e.g. ‘/.+BAND.+/’). A string enclosed within double quotes (‘"’) is used exclusively for pattern matching (patterns are a simplified form of regular expressions, used in most UNIX commands for string matching). Patterns are internally converted to equivalent regular expressions before matching. See the Unix command “info regex”, or visit http://www.regular-expressions.info, for details of regular expressions and patterns.
Strings can include any character except the following:
',' ';' '"' '/' NEWLINE
(since these are part of the selection syntax). Strings that do not contain any of the characters used to construct regular expressions or patterns are used for exact matches. Although it is highly discouraged to have names in the MS containing the above mentioned reserved characters, if one does choose to include them as parts of names, those names can only be matched against quoted strings (since regular expressions and patterns are a super-set of literal strings, i.e. a literal string is also a valid regular expression).
This leaves ‘"’, ‘*’, ‘{’, ‘}’ and ‘?’ as the list of printable characters that cannot be part of a name (i.e., a name containing one of these characters can never be matched in a MSSelection expression). These are treated as pattern-matching characters even inside double quotes. There is currently no escape mechanism (e.g. via a backslash).
Some examples of strings, regular expressions, and patterns:
The string ‘LBAND’ will be used as a literal string for exact match. It will match only the exact string LBAND.
The wildcarded string ‘*BAND*’ will be used as a string pattern for matching. This will match any string which has the sub-string BAND in it.
The string ‘”*BAND*”’ will also be used as a string pattern, matching any string which has the sub-string BAND in it.
The string ‘/.+BAND.+/’ will be used as a regular expression. This will also match any string which has the sub-string BAND in it (the .+ regex operator has the same meaning as the * wildcard operator of patterns).
The field Parameter¶
The field parameter is a string that specifies which field names or IDs will be processed in the task or tool. The field selection expression consists of a comma-separated list of field specifications inside the string.
Field specifications can be literal field names, regular expressions or patterns (see above). Those fields for which the entry in the NAME column of the FIELD MS sub-table matches the literal field name/regular expression/pattern are selected. If a field name/regular expression/pattern fails to match any field name, it is matched against the field code. If still no field is selected, an exception is thrown.
Field specifications can also be given by their integer IDs, either as single IDs or ranges of IDs. Field ID selection can also be done as a boolean expression: for a field specification of the form ‘>ID’, all field IDs greater than ID are selected; similarly, for ‘<ID’ all field IDs less than ID are selected.
For example, if the MS has the following observations:
MS summary:
==========
FIELDID SPWID NChan Pol NRows Source Name
---------------------------------------------------------------
0 0 127 RR 10260 0530+135
1 0 127 RR 779139 05582+16320
2 0 127 RR 296190 05309+13319
3 0 127 RR 58266 0319+415
4 0 127 RR 32994 1331+305
5 1 1 RR,RL,LL,RR 23166 KTIP
one might select
field = '0~2,KTIP' #FIELDID 0,1,2 and field name KTIP
field = '0530+135' #field 0530+135
field = '05*' #fields 0530+135,05582+16320,05309+13319
Field Selection - Expression for selecting data based on field (CASAcore docs)
Note: If field names are strings containing only integers, it is advised that double quotes be placed around each field name. For example, field=’“2”’ will select the field whose name is “2” (if it exists), whereas field=’2’ will select the field whose numeric ID is 2 (if it exists). This is especially important when field names are entirely numeric and have leading zeros (for example field=’“00234”’). A list of multiple fields specified as field=’23, “00234”’ will select two fields: one with name “00234”, and one with ID 23 (if it exists) or name “23” (if it exists), checked in this order of precedence.
The spw Parameter¶
The spw parameter is a string that indicates the specific spectral windows and the channels within them to be used in subsequent processing. Spectral window selection (‘SPWSEL’) can be given as a spectral window integer ID, a list of integer IDs, a spectral window name specified as a literal string (for exact match) or a regular expression or pattern.
The specification can be via frequency ranges or by indexes. A range of frequencies is used to select all spectral windows which contain channels within the given range. Frequencies can be specified with an optional unit, the default unit being Hz; other common choices for radio and mm/sub-mm data are kHz, MHz, and GHz. Note that a frequency range selects the entire matching spectral windows, not just the channels within the specified range; channel selection (see below) is needed for that.
The spw can also be selected via comparison for integer IDs. For example, ‘>ID’ will select all spectral windows with IDs greater than the specified value, while ‘<ID’ will select those with IDs less than the specified value.
Spectral window selection using strings follows the standard rules:
spw = '1' #SPWID 1
spw = '1,3,5' #SPWID 1,3,5
spw = '0~3' #SPWID 0,1,2,3
spw = '0~3,5' #SPWID 0,1,2,3 and 5
spw = '<3,5' #SPWID 0,1,2 and 5
spw = '*' #All spectral windows
spw = '1412~1415MHz' #Spectral windows containing 1412-1415MHz.
In some cases, the spectral windows may allow specification by name. For example,
spw = '3mmUSB, 3mmLSB' #choose by names (if available)
might be meaningful for the dataset in question.
Note that the order in which multiple spws are given may be important for other parameters. For example, the mode = ‘channel’ in clean uses the first spw as the origin for the channelization of the resulting image cube.
Channel selection in the spw parameter
Channel selection can be included in the spw string in the form ‘SPWSEL:CHANSEL’ where CHANSEL is the channel selector. In the end, the spectral selection within a given spectral window comes down to the selection of specific channels. We provide a number of shorthand selection options for this. These CHANSEL options include:
Channel ranges: ‘START~STOP’
Frequency ranges: ‘FSTART~FSTOP’
Channel striding/stepping: ‘START~STOP^STEP’ or ‘FSTART~FSTOP^FSTEP’
The most common selection is via channel ranges ‘START~STOP’ or frequency ranges ‘FSTART~FSTOP’:
spw = '0:13~53' #spw 0, channels 13-53, inclusive
spw = '0:1413~1414MHz' #spw 0, 1413-1414MHz section only
All ranges are inclusive: the channels given by (or containing the frequencies given by) START and STOP, plus all channels in between, are included in the selection. As described above, you can also select the spectral window itself via frequency ranges ‘FSTART~FSTOP’:
spw = '1413~1414MHz:1413~1414MHz' #channels falling within 1413~1414MHz
spw = '*:1413~1414MHz' #does the same thing
You can also specify multiple spectral window or channel ranges, e.g.
spw = '2:16, 3:32~34' #spw 2, channel 16 plus spw 3 channels 32-34
spw = '2:1~3;57~63' #spw 2, channels 1-3 and 57-63
spw = '1~3:10~20' #spw 1-3, channels 10-20
spw = '*:4~56' #all spw, channels 4-56
Note the use of the wildcard in the last example.
A step can be also be included using ‘^STEP’ as a postfix:
spw = '0:10~100^2' #chans 10,12,14,...,100 of spw 0
spw = ':^4' #chans 0,4,8,... of all spw
spw = ':100~150GHz^10GHz' #closest chans to 100,110,...,150GHz
A step in frequency will pick the channel in which that frequency falls, or the nearest channel.
Frequency Selection - Expression for selection along the frequency axis (CASAcore docs)
The selectdata Parameters¶
The selectdata parameter, if set to True (default), will expand the inputs to include a number of sub-parameters, given below and in the individual task descriptions (if different). If selectdata = False, then the sub-parameters are treated as blank for selection by the task.
The common selectdata expanded sub-parameters are:
The antenna Parameter¶
The antenna selection string is a semi-colon (‘;’) separated list of baseline specifications. A baseline specification is of the form:
‘ANT1’ — Select all baselines including the antenna(s) specified by the selector ANT1.
‘ANT1&’ — Select only baselines between the antennas specified by the selector ANT1.
‘ANT1&ANT2’ — Select only the cross-correlation baselines between the antennas specified by selector ANT1 and antennas specified by selector ANT2. Thus ‘ANT1&’ is an abbreviation for ‘ANT1&ANT1’.
‘ANT1&&ANT2’ — Select only auto-correlation and cross-correlation baselines between antennas specified by the selectors ANT1 and ANT2. Note that this is what the default antenna=’’ gives you.
‘ANT1&&&’ — Select only autocorrelations specified by the selector ANT1.
The selectors ANT1 and ANT2 are comma-separated lists of antenna integer-IDs or literal antenna names, patterns, or regular expressions. The ANT strings are parsed and converted to a list of antenna integer-IDs or IDs of antennas whose name match the given names/pattern/regular expression. Baselines corresponding to all combinations of the elements in lists on either side of ampersand are selected.
Integer IDs can be specified as single values or ranges of integers. Items of the list that cannot be parsed as integers are treated as literal strings, regular expressions, or patterns, and all antenna names that match the given string (exact match)/regular expression/pattern are selected.
ALERT: Just for antenna selection, a user supplied integer (or integer list) is converted to a string and matched against the antenna name. If that fails, the normal logic of using an integer as an integer and matching it with antenna index is done. Note that currently there is no method for specifying a pure index (e.g. a number that will not first be checked against the name).
The comma is used only as a separator for the list of antenna specifications. The list of baselines specifications is a semi-colon separated list, e.g.
antenna = '1~3 & 4~6 ; 10&11'
will select baselines between antennas 1,2,3 and 4,5,6 (‘1&4’, ‘1&5’, …, ‘3&6’) plus baseline ‘10&11’.
The wildcard operator (‘*’) will be the most often used pattern. To make it easy to use, the wildcard (and only this operator) can be used without enclosing it in quotes. For example, the selection
antenna = 'VA*'
will match all antenna names which have ‘VA’ as the first 2 characters in the name (irrespective of what follows after these characters).
There is also a negation operator ‘!’ that can be used to de-select antennas or baselines.
Some examples:
antenna='' #shows blank autocorr pages
antenna='*&*' #does not show the autocorrs
antenna='*&&*' #show both auto and cross-cor (default)
antenna='*&&&' #shows only autocorrs
antenna='5&*' #shows non-auto baselines with AN 5
antenna='5,6&&&' #AN 5 and 6 autocor
antenna='5&&&;6&*' #AN 5 autocor plus cross-cors to AN 6
antenna='5,6,7&' #all baselines in common between antennas 5, 6, and 7
antenna='!5' #baselines not involving AN 5
Antenna numbers as names: Needless to say, naming antennas such that the names can also be parsed as a valid token of the syntax is a bad idea. Nevertheless, antenna names that contain any of the reserved characters and/or can be parsed as integers or integer ranges can still be used by enclosing the antenna names in double quotes (‘“ANT”’). E.g. the string
antenna = '10~15,21,VA22'
will expand into an antenna ID list 10,11,12,13,14,15,21,22 (assuming the index of the antenna named ‘VA22’ is 22). If, however, the antenna with ID index 50 is named ‘21’, then the string
antenna = '10~15,21,VA22'
will expand into an antenna ID list of 10,11,12,13,14,15,50,22. Keep in mind that numbers are FIRST matched against names, and only matched against indices if that fails. There is currently no way to force a selection to use the index; if there is an antenna with that name, it will be selected.
Read elsewhere (e.g. info regex under Unix) for details of regular expressions and patterns.
Antenna stations
Instead of antenna names, the antenna station names are also accepted by the selection syntax, e.g. ‘N15’ for the JVLA.
ANT@STATION selection syntax
Sometimes, data from multiple array configurations are stored in a single MS. Some antennas may have been moved during reconfiguration, and the ‘ANT@STATION’ syntax can distinguish between them. ‘ANT’ is the antenna name or index and ‘STATION’ is the antenna station name, e.g. ‘EA12@W03’ selects antenna EA12 but only at times when it is positioned on station W03. Wildcards are accepted, e.g. ‘EA12@*’ selects all visibilities from antenna EA12, and ‘*@W03’ selects all antennas that are located on station W03 during any observations included in the MS.
Antenna/Baseline Selection - Expression for selection along the baseline/antenna axis (CASAcore docs)
The scan Parameter¶
The scan parameter selects the scan ID numbers of the data. There is currently no naming convention for scans. The scan ID is filled into the MS depending on how the data was obtained, so use this with care.
Examples:
scan = '3' #scan number 3.
scan = '1~8' #scan numbers 1 through 8, inclusive
scan = '1,2,4,6' #scans 1,2,4,6
scan = '<9' #scans <9 (1-8)
NOTE: ALMA and VLA/JVLA number scans starting with 1 and not 0. You can see what the numbering is in your MS using the listobs task with verbose=True.
Scan/Sub-array Selection - Expression for selection based on scan or sub-array indices (CASAcore docs)
The timerange Parameter¶
The time strings in the following (T0, T1 and dT) can be specified as YYYY/MM/DD/HH:MM:SS.FF. The time fields (i.e., YYYY, MM, DD, HH, MM, SS and FF), starting from left to right, may be omitted and they will be replaced by context sensitive defaults as explained below.
Some examples:
timerange=’T0~T1’: Select all time stamps from T0 to T1. For example:
timerange = '2007/10/09/00:40:00 ~ 2007/10/09/03:30:00'
Note that fields missing in T0 are replaced by the fields in the time stamp of the first valid row in the MS. For example,
timerange = '09/00:40:00 ~ 09/03:30:00'
where the YYYY/MM/ part of the selection has been defaulted to the start of the MS.
Fields missing in T1, such as the date part of the string, are replaced by the corresponding fields of T0 (after its defaults are set). For example:
timerange = '2007/10/09/00:40:00 ~ 03:30:00'
does the same thing as above.
timerange=’T0’: Select all time stamps that are within an integration time of T0. For example:
timerange = '2007/10/09/23:41:00'
Integration time is determined from the first valid row (more rigorously, an average integration time should be computed). Default settings for the missing fields of T0 are as in the first example.
timerange=’T0+dT’: Select all time stamps starting from T0 and ending with time stamp T0+dT. For example:
timerange = '23:41:00+01:00:00'
picks an hour-long chunk of time.
Defaults of T0 are set as usual. Defaults for dT are set from the time corresponding to MJD=0. Thus, dT is a specification of length of time from the assumed nominal “start of time”.
timerange=’>T0’: Select all times greater than T0. For example:
timerange = '>2007/10/09/23:41:00'
timerange = '>23:41:00' #Same thing without day specification
Default settings for T0 are as above.
timerange=’<T1’: Select all times less than T1. For example:
timerange = '<2007/10/09/23:41:00'
Default settings for T1 are as above.
An ultra-conservative selection might be:
timerange = '1960/01/01/00:00:00~2020/12/31/23:59:59'
which would choose all possible data!
Time Selection - Expression for selecting data along the time axis (CASAcore docs)
The uvrange Parameter¶
Rows in the MS can also be selected based on the uv-distance or physical baseline length that the visibilities in each row correspond to. This uvrange can be specified in various formats.
The basic building block of a uv-distance specification is a valid number with optional units in the format N[UNIT] (the unit in square brackets is optional). We refer to this basic building block as UVDIST. The default unit is meter. Units of length (such as ‘m’ and ‘km’) select physical baseline distances (independent of wavelength). The other allowed units are in wavelengths (such as ‘lambda’, ‘klambda’ and ‘Mlambda’) and select true uv-plane radii.
If only a single UVDIST is specified, all rows whose uv-distance exactly matches the given UVDIST are selected.
UVDIST can be specified as a range in the format ‘N0~N1[UNIT]’ (where N0 and N1 are valid numbers). All rows corresponding to uv-distances between N0 and N1 (inclusive), when converted to the specified units, are selected.
UVDIST can also be selected via comparison operators. When specified in the format ‘>UVDIST’, all visibilities with uv-distances greater than the given UVDIST are selected. Likewise, when specified in the format ‘<UVDIST’, all rows with uv-distances less than the given UVDIST are selected.
Any number of above mentioned uv-distance specifications can be given as a comma-separated list.
Examples:
uvrange = '100~200km' #an annulus in physical baseline length
uvrange = '24~35Mlambda, 40~45Mlambda' #two annuli in units of mega-wavelengths
uvrange = '< 45klambda' #less than 45 kilolambda
uvrange = '> 0lambda' #greater than zero length (no auto-corrs)
uvrange = '100km' #baselines of length 100km
uvrange = '100klambda' #uv-radius 100 kilolambda
UV-distance Selection - Expression for selection based on uv-distance (CASAcore docs)
The correlation Parameter¶
The correlation parameter will select between different correlation products. They can be either the correlation ID or values such as ‘XX’, ‘YY’, ‘XY’, ‘YX’, ‘RR’, ‘LL’, ‘RL’, ‘LR’.
Polarization Selection - Expression for selection along the polarization axis (CASAcore docs)
The intent Parameter¶
intent is the scan intent that was specified when the observations were set up. Intents typically describe what a specific scan was intended for, i.e. a flux or phase calibration, a bandpass, a pointing, an observation of your target, or some combination of these. The scan intents of your observations are listed in the logger when you run listobs. Minimum matching with wildcards will work, like ‘*BANDPASS*’; this is especially useful when multiple intents are attached to scans.
‘Scan Intent’ based selection - Selection by intent (CASAcore docs)
The observation Parameter¶
The observation parameter can select between different observation IDs. These are assigned to the parts of a combined data set during a run of concat; each input MS receives its own observation ID in the process.
The feed Parameter¶
The feed parameter can select between different feeds, e.g. for different feeds in a single dish multibeam array.
The msselect Parameter¶
More complicated selections within the MS structure are possible using the Table Query Language (TaQL). This is accessed through the msselect parameter.
Note that the TaQL syntax does not follow the rules given above for our other selection strings. TaQL is explained in more detail in CASAcore NOTE 199 — Table Query Language. The specific columns of the MS are given in the most recent MS specification document.
Most selection can be carried out using the other selection parameters. However, these are merely shortcuts to the underlying TaQL selection. For example, field and spectral window selection can be done using msselect rather than through field or spw:
msselect='FIELD_ID == 0' #Field id 0 only
msselect='FIELD_ID <= 1' #Field id 0 and 1
msselect='FIELD_ID IN [1,2]' #Field id 1 and 2
msselect='FIELD_ID==0 && DATA_DESC_ID==3' #Field id 0 in spw id 3 only
ALERT: The msselect style parameters will be phased out of the tasks. TaQL selection will still be available in the Toolkit.
This page describes the syntax of the various expressions for selecting data from the MeasurementSet, implemented in the MSSelection module of CASAcore. All expressions consist of a comma or semi-colon separated list of specifications. The MSSelection module can also be used for other tables that follow the general database design of the MeasurementSet; the CASA CalTables are an example that also uses the MSSelection module for selection. The most up to date document will always be on the CASACore GitHub Repository; the content in this page is derived from that GitHub snapshot.
Using *MSSelection* expressions in TaQL - MSSelection expressions can also be used in pure-TaQL expressions (CASAcore docs)
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/data_examination.ipynb
Data Examination/Editing¶
Plotting and flagging visibility data in CASA
Visibility Information¶
There are tasks provided for basic listing and manipulation of MeasurementSet data and metadata. These include the following tasks, which are described in more detail in the subsequent pages.
listsdm — summarize the contents of an SDM
listobs — summarize the contents of a MS
listpartition — list the partition structure of a Multi-MS
vishead — list and change the metadata contents of a MS
visstat — statistics on data in a MS
plotants — plotting antenna locations
plotms — plotting uv-coverages
plotweather — VLA weather statistics, calculation of opacities
browsetable — examining an MS
MeasurementSet Summary¶
The MeasurementSet is the way CASA stores visibility data (the MS definition can be found in the Reference Material section). This page describes three tasks used to access information stored in the MS: listobs displays observational details such as the spatial (field), spectral (spectral window), temporal (scan), and polarization setup of an MS; listpartition provides information on how an MS was subdivided by the partition task (used for parallelized processing); listvis prints out the visibility values themselves.
Summarizing your MS (listobs)
An observational summary of the MS contents can be displayed with the listobs task. The inputs are:
vis = 'day2_TDEM0003_10s_norx' #Name of input visibility file (MS)
selectdata = True #Data selection parameters
field = '' #Field names or field index
#numbers: '' ==>all, field='0~2,3C286'
spw = '' #spectral-window/frequency/channel
antenna = '' #antenna/baselines: ''==>all, antenna ='3,VA04'
timerange = '' #time range: ''==>all,timerange='09:14:0~09:54:0'
correlation = '' #Select data based on correlation
scan = '' #scan numbers: ''==>all
intent = '' #Select data based on observation intent: ''==>all
feed = '' #multi-feed numbers: Not yet implemented
array = '' #(sub)array numbers: ''==>all
uvrange = '' #uv range: ''==>all; uvrange
#='0~100klambda', default units=meters
observation = '' #Select data based on observation ID: ''==>all
verbose = True
listfile = '' #Name of disk file to write output: ''==>to terminal
listunfl = False #List unflagged row counts?
#If true, it can have significant negative performance
#impact
The summary (of the selected data) will be written to the logger, to the casapy-YYYYMMDD-HHMMSS.log file, and optionally to a file specified in the listfile parameter. For example,
listobs('day2_TDEM0003_10s_norx')
results in a logger message like the following (also the format if a ‘listfile’ text file is requested):
listobs(vis="day2_TDEM0003_10s_norx",selectdata=True,spw="",field="",
antenna="",uvrange="",timerange="",correlation="",scan="",
intent="",feed="",array="",observation="",verbose=True,
listfile="",listunfl=False)
================================================================================
MeasurementSet Name: /Users/jott/casa/casatest/casa4.0/irc/day2_TDEM0003_10s_norx MS Version 2
================================================================================
Observer: Mark J. Mark Claussen Project: T.B.D.
Observation: EVLA
Data records: 290218 Total integration time = 10016 seconds
Observed from 26-Apr-2010/03:21:56.0 to 26-Apr-2010/06:08:52.0 (UTC)
ObservationID = 0 ArrayID = 0
Date Timerange (UTC) Scan FldId FieldName nRows SpwIds Average Interval(s) ScanIntent
26-Apr-2010/03:21:51.0 - 03:23:21.0 5 2 J0954+1743 2720 [0, 1] [10, 10]
03:23:39.0 - 03:28:25.0 6 3 IRC+10216 9918 [0, 1] [10, 10]
03:28:38.0 - 03:29:54.0 7 2 J0954+1743 2700 [0, 1] [10, 10]
03:30:08.0 - 03:34:53.5 8 3 IRC+10216 9918 [0, 1] [10, 10]
...
(nRows = Total number of rows per scan)
Fields: 4
ID Code Name RA Decl Epoch SrcId nRows
2 D J0954+1743 09:54:56.823626 +17.43.31.22243 J2000 2 65326
3 NONE IRC+10216 09:47:57.382000 +13.16.40.65999 J2000 3 208242
5 F J1229+0203 12:29:06.699729 +02.03.08.59820 J2000 5 10836
7 E J1331+3030 13:31:08.287984 +30.30.32.95886 J2000 7 5814
Spectral Windows: (2 unique spectral windows and 1 unique polarization setups)
SpwID Name #Chans Frame Ch1(MHz) ChanWid(kHz) TotBW(kHz) Corrs
0 Subband:0 64 TOPO 36387.229 125.000 8000.0 RR RL LR LL
1 Subband:0 64 TOPO 36304.542 125.000 8000.0 RR RL LR LL
Sources: 10
ID Name SpwId RestFreq(MHz) SysVel(km/s)
0 J1008+0730 0 0.03639232 -0.026
0 J1008+0730 1 0.03639232 -0.026
2 J0954+1743 0 0.03639232 -0.026
2 J0954+1743 1 0.03639232 -0.026
3 IRC+10216 0 0.03639232 -0.026
3 IRC+10216 1 0.03639232 -0.026
5 J1229+0203 0 0.03639232 -0.026
5 J1229+0203 1 0.03639232 -0.026
7 J1331+3030 0 0.03639232 -0.026
7 J1331+3030 1 0.03639232 -0.026
Antennas: 19:
ID Name Station Diam. Long. Lat. Offset from array center (m) ITRF Geocentric coordinates (m)
East North Elevation x y z
0 ea01 W09 25.0 m -107.37.25.2 +33.53.51.0 -521.9407 -332.7782 -1.1977 -1601710.017000 -5042006.928200 3554602.355600
1 ea02 E02 25.0 m -107.37.04.4 +33.54.01.1 9.8247 -20.4292 -2.7808 -1601150.059500 -5042000.619800 3554860.729400
2 ea03 E09 25.0 m -107.36.45.1 +33.53.53.6 506.0591 -251.8666 -3.5832 -1600715.948000 -5042273.187000 3554668.184500
3 ea04 W01 25.0 m -107.37.05.9 +33.54.00.5 -27.3562 -41.3030 -2.7418 -1601189.030140 -5042000.493300 3554843.425700
4 ea05 W08 25.0 m -107.37.21.6 +33.53.53.0 -432.1158 -272.1493 -1.5032 -1601614.091000 -5042001.655700 3554652.509300
5 ea07 N06 25.0 m -107.37.06.9 +33.54.10.3 -54.0667 263.8720 -4.2292 -1601162.593200 -5041829.000000 3555095.890500
6 ea08 N01 25.0 m -107.37.06.0 +33.54.01.8 -30.8810 -1.4664 -2.8597 -1601185.634945 -5041978.156586 3554876.424700
7 ea09 E06 25.0 m -107.36.55.6 +33.53.57.7 236.9058 -126.3369 -2.4443 -1600951.588000 -5042125.911000 3554773.012300
8 ea12 E08 25.0 m -107.36.48.9 +33.53.55.1 407.8394 -206.0057 -3.2252 -1600801.916000 -5042219.371000 3554706.449900
9 ea15 W06 25.0 m -107.37.15.6 +33.53.56.4 -275.8288 -166.7451 -2.0590 -1601447.198000 -5041992.502500 3554739.687600
10 ea19 W04 25.0 m -107.37.10.8 +33.53.59.1 -152.8599 -83.8054 -2.4614 -1601315.893000 -5041985.320170 3554808.304600
11 ea20 N05 25.0 m -107.37.06.7 +33.54.08.0 -47.8454 192.6015 -3.8723 -1601168.786100 -5041869.054000 3555036.936000
12 ea21 E01 25.0 m -107.37.05.7 +33.53.59.2 -23.8638 -81.1510 -2.5851 -1601192.467800 -5042022.856800 3554810.438800
13 ea22 N04 25.0 m -107.37.06.5 +33.54.06.1 -42.5986 132.8623 -3.5431 -1601173.953700 -5041902.660400 3554987.536500
14 ea23 E07 25.0 m -107.36.52.4 +33.53.56.5 318.0523 -164.1848 -2.6960 -1600880.570000 -5042170.388000 3554741.457400
15 ea24 W05 25.0 m -107.37.13.0 +33.53.57.8 -210.0944 -122.3885 -2.2581 -1601377.008000 -5041988.665500 3554776.393400
16 ea25 N02 25.0 m -107.37.06.2 +33.54.03.5 -35.6245 53.1806 -3.1345 -1601180.861480 -5041947.453400 3554921.628700
17 ea27 E03 25.0 m -107.37.02.8 +33.54.00.5 50.6647 -39.4832 -2.7249 -1601114.365500 -5042023.153700 3554844.945600
18 ea28 N08 25.0 m -107.37.07.5 +33.54.15.8 -68.9057 433.1889 -5.0602 -1601147.940400 -5041733.837000 3555235.956000
listobs shows information on the project itself (project code, observer, and telescope), followed by the sequence of scans with start/stop times, integration times, and scan intents; a list of all fields with names and coordinates; the available spectral windows and their shapes; a list of sources (field/spw combinations); and finally the locations of all antennas used in the observation. A row is an MS entry for a given time stamp and baseline (rows can be accessed e.g. via browsetable).
Setting verbose=False truncates the listing; in particular, the per-scan information is omitted.
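For scripting, the listing can also be written to disk via the listfile parameter; a minimal call for the dataset above (output file name illustrative) would be:
listobs(vis='day2_TDEM0003_10s_norx', verbose=True, listfile='day2_listobs.txt')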
MMS summary (listpartition)
listobs can also be used for Multi-MeasurementSets (MMSs). In addition, the task listpartition provides information on how the data are structured in preparation for parallelized processing (e.g. using the partition task). The inputs are:
#listpartition :: List the summary of a Multi-MS data set in the logger or in a file
vis = '' #Name of Multi-MS or normal MS.
createdict = False #Create and return a dictionary with
#Sub-MS information
listfile = '' #Name of ASCII file to save output:
#''==>to terminal
For example,
listpartition('ngc5921.mms')
results in the logger messages:
This is a multi-MS with separation axis = scan,spw
Sub-MS Scan Spw Nchan Nrows Size
ngc5921.mms.0000.ms 2 [0] [63] 1890 27M
4 [0] [63] 756
5 [0] [63] 1134
6 [0] [63] 6804
ngc5921.mms.0001.ms 1 [0] [63] 4509 28M
3 [0] [63] 6048
7 [0] [63] 1512
The output can also be returned as a Python dictionary through the createdict parameter.
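For example, a sketch of capturing that dictionary in a variable (the structure of the returned dictionary is described on the listpartition task pages):
partdict = listpartition('ngc5921.mms', createdict=True)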
Listing MS data (listvis)
The listvis task prints a list of the visibility data in an MS to the terminal or a text file. The inputs are:
#listvis :: List MeasurementSet visibilities.
vis = '' #Name of input visibility file
options = 'ap' #List options: ap only
datacolumn = 'data' #Column to list: data, float_data, corrected, model,
#residual
field = '' #Field names or index to be listed: ''==>all
spw = '*' #Spectral window:channels: '*'==>all, spw='1:5~57'
selectdata = False #Other data selection parameters
observation = '' #Select by observation ID(s)
average = '' #Averaging mode: ==>none (Not yet implemented)
showflags = False #Show flagged data (Not yet implemented)
pagerows = 50 #Rows per page
listfile = '' #Output file
For example, a call along these lines (illustrative, with the selection chosen to match the output below) produces:
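listvis(vis='day2_TDEM0003_10s_norx', field='2', spw='0', datacolumn='data')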
Units of columns are: Date/Time(YYMMDD/HH:MM:SS UT), UVDist(wavelength), Phase(deg), UVW(m)
WEIGHT: 7
FIELD: 2
SPW: 0
Date/Time: RR: RL: LR: LL:
2010/04/26/ Intrf UVDist Chn Amp Phs Wt F Amp Phs Wt F Amp Phs Wt F Amp Phs Wt F U V W
------------|---------|------|----|--------------------|-------------------|-------------------|-------------------|---------|---------|---------|
03:21:56.0 ea01-ea02 72363 0: 0.005 -124.5 7 0.005 25.7 7 0.001 104.6 7 0.000 23.4 7 -501.93 -321.75 157.78
03:21:56.0 ea01-ea02 72363 1: 0.001 -4.7 7 0.001 -135.1 7 0.004 -14.6 7 0.001 19.9 7 -501.93 -321.75 157.78
03:21:56.0 ea01-ea02 72363 2: 0.002 17.8 7 0.002 34.3 7 0.005 -114.3 7 0.005 -149.7 7 -501.93 -321.75 157.78
03:21:56.0 ea01-ea02 72363 3: 0.004 -19.4 7 0.003 -79.2 7 0.002 -89.0 7 0.004 31.3 7 -501.93 -321.75 157.78
03:21:56.0 ea01-ea02 72363 4: 0.001 -16.8 7 0.004 -141.5 7 0.005 114.9 7 0.006 105.2 7 -501.93 -321.75 157.78
03:21:56.0 ea01-ea02 72363 5: 0.001 -29.8 7 0.009 -96.4 7 0.002 -125.0 7 0.002 -64.5 7 -501.93 -321.75 157.78
...
Type Q to quit, A to toggle long/short list, or RETURN to continue [continue]:
The columns are:
COLUMN NAME DESCRIPTION
----------- -----------
Date/Time Time stamp of data sample (YYMMDD/HH:MM:SS UT)
Intrf Interferometer baseline (antenna names)
UVDist uv-distance (units of wavelength)
Fld Field ID (if more than 1)
SpW Spectral Window ID (if more than 1)
Chn Channel number (if more than 1)
(Correlated polarization) Correlated polarizations (e.g. RR, LL, XY); sub-columns are: Amp, Phs, Wt, F
Amp Visibility amplitude
Phs Visibility phase (deg)
Wt Weight of visibility measurement
F Flag: 'F' = flagged datum; ' ' = unflagged
UVW UVW coordinates (meters)
Note that MS listings can be very large. Use selectdata=True and subselect the data as much as possible to limit the output to the desired information.
SDM Summary¶
The task listsdm summarizes the content of archival data in the format used by ALMA and the VLA for visibility data transport and archiving (the Science Data Model, known as ASDM for ALMA). The output in the casalog is very similar to that of listobs, which reads the metadata after conversion to a MeasurementSet (see next section); listsdm therefore does not require the SDM to be filled into an MS first. The output of listsdm is also returned as a dictionary.
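For example, a minimal call might look as follows (the SDM name is hypothetical), with the summary captured as a dictionary:
sdm_info = listsdm(sdm='uid___A002_X1234_X56')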
MS Metadata List/Change¶
The vishead task is provided to access keyword information in the MeasurementSet. The default inputs are:
#vishead :: List, get, and put metadata in a MeasurementSet
vis = '' #Name of input visibility file
mode = 'list' #options: list, summary, get, put
listitems = [] #items to list ([] for all)
The mode = ‘summary’ option just gives the same output as listobs.
For mode = ‘list’, the items listed by default are: ‘telescope’, ‘observer’, ‘project’, ‘field’, ‘freq_group_name’, ‘spw_name’, ‘schedule’, ‘schedule_type’, and ‘release_date’. To list all available items, set mode = ‘list’ and listitems = []; see the vishead task pages for a description of the additional options.
CASA <29>: vishead('ngc5921.demo.ms',mode='list',listitems=[])
Out[29]:
{'cal_grp': (array([-1, -1, -1], dtype=int32), {}),
'field': (array(['1331+30500002_0', '1445+09900002_0', 'N5921_2'],
dtype='|S16'),
{}),
'fld_code': (array(['C', 'A', ''],
dtype='|S2'), {}),
'freq_group_name': (array(['none'],
dtype='|S5'), {}),
'log': ({'r1': False}, {}),
'observer': (array(['TEST'],
dtype='|S5'), {}),
'project': (array([''],
dtype='|S1'), {}),
'ptcs': ({'r1': array([[[-2.74392758]],
[[ 0.53248521]]]),
'r2': array([[[-2.42044692]],
[[ 0.17412604]]]),
'r3': array([[[-2.26020138]],
[[ 0.08843002]]])},
{'MEASINFO': {'Ref': 'J2000', 'type': 'direction'},
'QuantumUnits': array(['rad', 'rad'],
dtype='|S4')}),
'release_date': (array([ 4.30444800e+09]),
{'MEASINFO': {'Ref': 'TAI', 'type': 'epoch'},
'QuantumUnits': array(['s'],
dtype='|S2')}),
'schedule': ({'r1': False}, {}),
'schedule_type': (array([''],
dtype='|S1'), {}),
'source_name': (array(['1331+30500002_0', '1445+09900002_0', 'N5921_2'],
dtype='|S16'),
{}),
'spw_name': (array(['none'],
dtype='|S5'), {}),
'telescope': (array(['VLA'],
dtype='|S4'), {})}
You can use mode=’get’ to retrieve the values of specific keywords, and likewise mode=’put’ to change them. The inputs are:
mode = 'get' #options: list, summary, get, put
hdkey = '' #keyword to get/put
hdindex = '' #keyword index to get/put, counting from zero. ==>all
and
#vishead :: List, summary, get, and put metadata in a MeasurementSet
mode = 'put' #options: list, summary, get, put
hdkey = '' #keyword to get/put
hdindex = '' #keyword index to get/put, counting from zero. ==>all
hdvalue = '' #value of hdkey
For example, a common operation is to change the telescope name (e.g. if it is unrecognized):
CASA <36>: vishead('ngc5921.demo.ms',mode='get',hdkey='telescope')
Out[36]:
(array(['VLA'],
dtype='|S4'), {})
CASA <37>: vishead('ngc5921.demo.ms',mode='put',hdkey='telescope',hdvalue='JVLA')
CASA <38>: vishead('ngc5921.demo.ms',mode='get',hdkey='telescope')
Out[38]:
(array(['JVLA'],
dtype='|S5'), {})
Visibility Statistics¶
The visstat task is provided to obtain simple statistics for a MeasurementSet, useful in regression tests.
The inputs are:
#visstat :: Displays statistical information from a MeasurementSet, or from a Multi-MS
vis = '' #Name of MeasurementSet or Multi-MS
axis = 'real' #Which values to use
datacolumn = 'data' #Which data column to use (data, corrected, model, float_data)
useflags = False #Take flagging into account?
spw = '' #spectral-window/frequency/channel
field = '1' #Field names or field index numbers: ''==>all, field='0~2,3C286'
selectdata = True #More data selection parameters (antenna, timerange etc)
antenna = '' #antenna/baselines: ''==>all, antenna = '3,VA04'
timerange = '' #time range: ''==>all, timerange='09:14:0~09:54:0'
correlation = 'RR' #Select data based on correlation
scan = '' #scan numbers: ''==>all
array = '' #(sub)array numbers: ''==>all
observation = '' #observation ID number(s): '' = all
uvrange = '' #uv range: ''==>all; uvrange = '0~100klambda', default units=meters
timeaverage = False #Average data in time.
intent = '' #Select data by scan intent.
reportingaxes = 'ddid' #Which reporting axis to use (ddid, field, integration)
Running this task returns a record (Python dictionary) with the statistics, which can be captured in a Python variable. For example,
CASA <54>: mystat=visstat(vis='data/regression/unittest/setjy/ngc5921.ms', axis='amp', datacolumn='data', useflags=False, spw='', field='', selectdata=True, correlation='RR', timeaverage=False, intent='', reportingaxes='ddid')
CASA <55>: mystat
Out[55]:
{'DATA_DESC_ID=0': {'firstquartile': 0.023732144385576248,
'isMasked': False,
'isWeighted': False,
'max': 73.75,
'maxDatasetIndex': 12,
'maxIndex': 1204,
'mean': 4.511831488357214,
'medabsdevmed': 0.0432449858635664,
'median': 0.051963627338409424,
'min': 2.2130521756480448e-05,
'minDatasetIndex': 54,
'minIndex': 4346,
'npts': 1427139.0,
'rms': 16.42971891790897,
'stddev': 15.798076313999745,
'sum': 6439010.678462409,
'sumOfWeights': 1427139.0,
'sumsq': 385235713.187832,
'thirdquartile': 0.3004012107849121,
'variance': 249.57921522295976}}
CASA <56>: mystat['DATA_DESC_ID=0']['stddev']
Out[56]: 15.798076313999745
The options for axis are:
axis='amplitude'   #(or 'amp')
axis='phase'
axis='imag'        #(or 'imaginary')
axis='real'
The phase of a complex number is in radians with range (−π, π).
Plot Antenna Positions¶
The plotants task is a simple plotting interface for producing plots of the antenna positions (taken from the ANTENNA sub-table of the MS). The antenna locations in the MS are plotted with X toward local east and Y toward local north.
The inputs to plotants are:
#plotants :: Plot the antenna distribution in the local reference frame:
vis = '' #Name of input visibility file (MS)
figfile = '' #Save the plotted figure to this file
antindex = False #Label antennas with name and antenna ID
logpos = False #Whether to plot logarithmic positions
exclude = '' #Antenna name/id selection to exclude from plot
checkbaselines = False #Whether to check baselines in the main table.
title = '' #Title for the plot.
showgui = True #Show plot on gui.
For most telescopes, the default X/Y plot is in meters. For VLBA antenna plots, latitude vs. longitude (degrees) is plotted instead.
Supported format extensions for the figfile include emf, eps, pdf, png, ps, raw, rgba, svg, and svgz, depending on which python modules are installed on your system. Formats currently available in a downloaded CASA package include all but emf (enhanced metafile).
Each antenna position is labeled with the antenna name. VLBA antenna plots label the positions with “name @ station” format, e.g. “2@FD” for the Fort Davis, Texas, antenna. To add the antenna ID to the name, set antindex=True as shown in Figure 1.
ALMA antenna positions with antindex=True.
By default, plotants plots the positions of all antennas in the ANTENNA subtable. However, the user has the option to exclude certain antennas with the exclude parameter. Its value is a string that selects the antennas to exclude, using the same syntax as the antenna parameter in MeasurementSet selection. For example, exclude='5~6' would exclude the PM antennas from the plot in Figure 1.
To plot only those antennas which appear in the MAIN table (e.g. after a split, which retains the entire ANTENNA subtable in the dataset), set checkbaselines=True. This parameter would have automatically removed antenna 7 (DV10) from the plot in Figure 1, as it does not appear in the main table of this dataset.
To plot logarithmic positions instead of X/Y positions, set logpos=True as shown in Figure 2:
Antenna positions with logpos=True
The default title for the plot is “Antenna Positions for ” followed by the MS name (the vis argument), as shown in all figures on this page. To set a custom title, set the title parameter to the desired string.
The plotants GUI
By default, the plotants GUI will be shown when the task is used. If the GUI is not needed, as in scripting mode to produce a figfile, set showgui=False. When CASA flags are set to avoid starting GUI tools or to run without the matplotlib ‘tkagg’ backend (--nogui, --pipeline, or --agg), the plotants GUI will not be shown regardless of the value of the showgui parameter.
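For example, a minimal non-interactive call for the dataset above (output file name illustrative):
plotants(vis='day2_TDEM0003_10s_norx', figfile='antpos.png', antindex=True, showgui=False)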
The antennas will be plotted in a plotter window as shown below. Several tool buttons are available to manipulate and save the plot:
The ‘Home’ button (leftmost house icon) is used to return to the first, default view after panning or zooming.
The ‘Forward’ and ‘Back’ buttons (left- and right-arrow icons) are used to navigate between previous plot views after pan/zoom actions.
The ‘Pan/Zoom’ button (crossed blue arrows, fourth icon) is used to drag the plot to a new position by pressing and holding the mouse button.
The ‘Zoom-to-rectangle’ button (magnifier icon, fifth from left) is used to mark a rectangular region with the mouse in order to zoom in on the plot.
The ‘Subplot-configuration’ button (sixth icon) can be used to stretch or compress the left, right, top, or bottom of the plot, as well as to reset the plot to its original shape before exiting the configuration dialog.
The ‘Save’ button (rightmost icon) is used to export the plot. A file save dialog is launched to select a location, name, and format (default png) for the file.
plotants GUI for a VLA dataset with antindex=True. Note the tool buttons at the bottom of the window.
VLA Weather Information¶
Weather data for the VLA can be displayed with the task plotweather. This task will also calculate opacities based on the weather data taken at the time of the observation, or from a seasonal model.
Inputs are:
#plotweather :: Plot elements of the weather table; estimate opacity.
vis = '' #MS name
seasonal_weight = 0.5 #weight of the seasonal model
doPlot = True #set this to True to create a plot
plotName = '' #(Optional) the name of the plot file
The amount of seasonal data can be set by the parameter seasonal_weight, where a value of 1 will only use the seasonal model and a value of 0 will only use the actual weather data to calculate opacities.
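For example, a sketch of a call that weights the seasonal model and the measured weather data equally (plot file name illustrative; the return value captures the opacity estimates):
myTau = plotweather(vis='day2_TDEM0003_10s_norx', seasonal_weight=0.5, doPlot=True, plotName='weather.png')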
Typical output of plotweather looks like the following:
Typical output from plotweather. The panel at the top displays the following properties as a function of time across the observation: elevation of the sun, wind speed and direction, temperature and dew point, and precipitable water vapor (pwv). The bottom panel shows the calculated zenith opacity as a function of frequency. The opacities calculated from the actual weather data, from a seasonal model, and from the specified mix of both are shown in the PWV and Tau plots.
The methods used in this task are described in EVLA Memo 143, VLA Test Memo 232, and VLA Scientific Memo 176. The wind direction follows the meteorological definition, i.e., north is up (0 deg) with the angle increasing clockwise through east, south, and west (e.g., a vector pointing to the right indicates westerly winds with an angle of 270 deg).
Allowed output plot formats are those supported by matplotlib, currently emf, eps, pdf, png, ps, raw, rgba, svg, and svgz.
Alert: plotweather accesses the WEATHER table in the MS. The task may therefore also work for non-VLA data as long as such a table is present. The plots and calculations, however, have been tailored for the VLA, so non-VLA data may or may not be interpreted correctly.
Browse MS/Calibration Tables¶
The browsetable task is available for viewing data directly. It handles all CASA tables, including MeasurementSets, calibration tables, and images. This task brings up a CASA Qt table browser.
browsetable is not required for normal data reduction but is useful for troubleshooting or for identifying table column names and formats. If you want to edit a long column or extract data for manipulation outside CASA (e.g. the uv data), see flagdata and the table tools in the Global Tool List. The MeasurementSet columns and subtables are described here.
For CASA 6, inp/go no longer works with browsetable, and the application can no longer be launched from outside CASA using casabrowser. browsetable should be invoked with the table name as its argument:
browsetable('ngc5921_ut.ms')
Available sub-parameters are given on the browsetable task pages.
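For example, a sketch combining some of the sub-parameters described below, assuming taql accepts a bare selection expression (values illustrative):
browsetable('ngc5921_ut.ms', taql='ANTENNA1==1', sortlist='TIME')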
For an MS, as in this example, the table browser will display the MAIN table (Figure 1). To look at subtables, use the table keywords tab along the left side to bring up a panel with the subtables listed (Figure 2), then choose (double-click) a table name (Keyword) to display the subtable in a new tab (Figures 3 and 4). You can double-click on a cell in a table to view the contents (fourth figure below) then use the “Close” or “Close All” buttons at the bottom of the contents display to close one or all displayed values.
The browser displays the MAIN table within a frame. You can scroll through the data with the sliders at right and bottom, and step through the pages with the “<<” and “>>” buttons or using “First” and “Last” to quickly advance to the beginning or end. To go to a specific page, input the page number in the text box then click “Go”. By default, 1000 rows of the table are loaded at a time, but you can specify this setting in the “Loading … rows” text box.
Use the “table keywords” tab to look at other tables within an MS. Double-click on a table name to view its contents in a new tab, as shown in the following figures.
Viewing the ANTENNA table of the MS.
The POLARIZATION table shows the number and types of correlations. The CORR_TYPE integer array indicates the Stokes type as defined in the Stokes class enumeration. Common types include RR (5), RL (6), LR (7), and LL (8) for circular polarization, and XX (9), XY (10), YX (11), and YY (12) for linear polarization.
Double-click a cell in the table or sub-table to see its value displayed to the right. Here, the DATA column cell (top right) contains a [2,63] array of complex numbers. The WEIGHT_SPECTRUM for this data is shown below it as a [2,63] array of float values. Use the sliders to see other values in the arrays, and click “Close” to close the cell contents display or “Close All” to close all contents displays.
Many options are available on the browsetable toolbar and menus:
To open a table, click the “Open Table” button or use the File > Open Table menu to open a file browser dialog box.
To close a table, click the “Close Table” button to close the table in the active tab, or use the options on the “File” menu to close the table in the active tab (“Close Table”), to select an open table to close (“Close…”), or to close all tables (“Close All”). You can also “Close All and Exit” the table browser.
To edit the table and its contents, click the “Edit Table” button or use the Edit > Edit Table menu. You can also use the “Edit” menu to add a row to the table. Be careful with this, and make a backup copy of the table before editing!
To view table information, click the “Table Information” button (blue “i” icon) or use the View > Table Information menu. You can also hover the mouse pointer over the table name tab to get a popup with information.
To set a TaQL filter, click the “Filter on Fields” button or use the View > Filter on Fields menu. This will open a “Filter Rules” dialog box within the table browser in which to set the filter. Another option is to use the taql parameter in the browsetable() call.
To choose which columns to display, use View > Columns to select the columns from a list, which you can select individually or toggle with “Show All Columns” or “Hide All Columns”. Another option is to use the skipcols parameter in the browsetable() call.
To format the contents of the column cells, use View > Format Display to select a column then choose its formatting (depending on its type). For example, for numerical values you can set the precision and choose to use scientific format, or set the font and color for negative and nonnegative values.
To find data using filter rules, click the “Find” button or use Tools > Find to open a Search Rules dialog box.
To sort the table, click the “Sort” button, use the Tools > Sort menu to open a Table Sorter dialog box in which you can select the sort columns, or just click on the column name. Another option is to use the sortlist parameter in the browsetable() call.
To plot table data, click the “Plot 2D” button or use the Tools > Plot 2D menu to open a Plot Options dialog box where you can select the rows and axes to plot, along with plot display options. Click “Overplot” or “Clear and Plot” to make the plot in the Table Browser Plotter window. There is also an option to export the plot; select PNG or JPG format and click Go.
Currently, Export > VOTable results in a Fatal IO Error and kills the table browser.
The default display is 1000 rows, but this can be set in the input box at the lower right. To page through the table, use the PAGE NAVIGATION buttons to advance forward or backward one page, or go directly to the First or Last page. You can also enter a page number and click Go.
To exit the table browser, use File > Exit or click the Close “X” button at the upper right of the window.
Alert: You are likely to find that browsetable needs to get a table lock before proceeding. Use the clearstat command to clear the lock status in this case. You may also be unable to use other tasks on the table while it is open in the table browser.
Plot/Flag Visibilities¶
A number of CASA tasks handle the plotting and flagging of visibilities. The following subsections describe the usage of the relevant tasks:
plotms — create X-Y plots of data in MS and calibration tables, flag data
flagdata — data flagging
flagcmd — manipulate and apply flags using FLAG_CMD table
flagmanager — manage versions of data flags
msview — two-dimensional viewer used for manipulating visibilities
Plot/Edit using plotms¶
plotms is a GUI-style plotter, based on Qt, for creating X-Y plots of visibility data and calibration tables. It should be started as a task within CASA. This task also provides editing capability.
plotms was originally intended to plot MeasurementSets (the “ms” in “plotms”) but has been extended to include calibration tables. Supported cal table types include B Jones, B TSYS, BPOLY, D Jones, Df Jones, DfLLS Jones, EGainCurve, F Jones, Fringe Jones, G Jones, GlinXphf Jones, G EVLASWPOW, GSPLINE, K Jones, KAntPos Jones, Kcross Jones, T Jones, TOpac, Xf Jones, A Mueller, M Mueller, and SDSKY_PS (single-dish sky calibration). Some axis choices do not apply to calibration tables, and the calibration axes do not apply to MeasurementSets. Selection can be applied to calibration tables; where relevant, channel selection has been implemented and tested for the cal tables listed. Averaging cannot be used for BPOLY and GSPLINE tables, which use an older table format. Some options, such as certain axes and transformations, cannot be used for calibration tables.
For simplicity, this document primarily addresses plotting MeasurementSets.
The current inputs and default values for plotms include:
plotms :: A plotter/interactive flagger for visibility data.
vis = '' # input MS or CalTable (blank for none)
gridrows = 1 # number of subplot rows (default 1).
gridcols = 1 # number of subplot columns (default 1).
rowindex = 0 # row location of the plot (0-based, default 0)
colindex = 0 # column location of the plot (0-based, default 0)
plotindex = 0 # index to address a subplot (0-based, default 0)
xaxis = '' # plot x-axis (blank for default/current)
yaxis = '' # plot y-axis (blank for default/current)
selectdata = True # data selection parameters
field = '' # field names or field index numbers (blank for all)
spw = '' # spectral windows:channels (blank for all)
timerange = '' # time range (blank for all)
uvrange = '' # uv range (blank for all)
antenna = '' # antenna/baselines (blank for all)
scan = '' # scan numbers (blank for all)
correlation = '' # correlations/polarizations (blank for all)
array = '' # (sub)array numbers (blank for all)
observation = '' # observation ID(s) (blank for all)
intent = '' # observing intent (blank for all)
feed = '' # feed (blank for all)
msselect = '' # MS selection (blank for all)
averagedata = True # data averaging parameters
avgchannel = '' # average over channel (blank = False, otherwise value in channels)
avgtime = '' # average over time (blank = False, otherwise value in seconds)
avgscan = False # average over scans if time averaging is enabled
avgfield = False # average over fields if time averaging is enabled
avgbaseline = False # average over all baselines (mutually exclusive with avgantenna)
avgantenna = False # average per antenna (mutually exclusive with avgbaseline)
avgspw = False # average over all spectral windows
scalar = False # do scalar averaging
transform = False # transform data in various ways
extendflag = False # extend flagging to other data points
iteraxis = '' # axis over which to iterate
customsymbol = False # set a custom symbol for unflagged points
coloraxis = '' # set data axis to use for colorizing
customflaggedsymbol = False # set a custom plot symbol for flagged points
xconnector = '' # set connector for data points (blank="none"; "line","step")
plotrange = [] # plot axes ranges: [xmin,xmax,ymin,ymax]
title = '' # title written along top of plot
titlefont = 0 # font for plot title
xlabel = '' # text for horizontal axis. Blank for default.
xaxisfont = 0 # font for plot x-axis
ylabel = '' # text for vertical axis. Blank for default.
yaxisfont = 0 # font for plot y-axis
showmajorgrid = False # show major grid lines (horiz and vert.)
showminorgrid = False # show minor grid lines (horiz and vert.)
showlegend = False # show a legend on the plot
plotfile = '' # name of plot file to save automatically
showgui = True # show GUI
clearplots = True # remove any existing plots (do not overplot)
callib = [''] # calibration library string or filename for on-the-fly calibration.
headeritems = '' # comma-separated list of pre-defined page header items
showatm = False # compute and overlay the atmospheric transmission curve
showtsky = False # compute and overlay the sky temperature curve
showimage = False # compute and overlay the image sideband curve.
Note that when some parameters are set or are True, their subparameters are displayed by inp(). By default, selectdata, averagedata, and showgui are True and their subparameters are shown above. Other parameters with subparameters include:
xaxis = 'real' # plot x-axis (blank for default/current)
xdatacolumn = '' # data column for x-axis (blank for default/current)
yaxis = 'imag' # plot y-axis (blank for default/current)
ydatacolumn = '' # data column for y-axis (blank for default/current)
yaxislocation = 'left' # set yaxis to the left of the plot
transform = True # transform data in various ways?
freqframe = '' # frame in which to render frequency and velocity axes
restfreq = '' # rest frequency to use for velocity conversions
veldef = 'RADIO' # definition in which to render velocity
phasecenter = '' # direction coordinates of new phase center
extendflag = True # extend flagging to other data points
extcorr = False # extend flags based on correlation
extchannel = False # extend flags based on channel
iteraxis = 'baseline' # axis over which to iterate
xselfscale = False # use common x-axis range (scale) for iterated plots
yselfscale = False # use common y-axis range (scale) for iterated plots
xsharedaxis = False # enable iterated plots on a grid to share a common external x-axis per column
ysharedaxis = False # enable iterated plots on a grid to share a common external y-axis per row
customsymbol = True # set a custom symbol for unflagged points
symbolshape = 'autoscaling' # shape of plotted unflagged symbols
symbolsize = 2 # size of plotted unflagged symbols
symbolcolor = '0000ff' # color of plotted unflagged symbols
symbolfill = 'fill' # fill type of plotted unflagged symbols
symboloutline = False # select outlining plotted unflagged points
customflaggedsymbol = True # set a custom plot symbol for flagged points
flaggedsymbolshape = 'nosymbol' # shape of plotted flagged symbols
flaggedsymbolsize = 2 # size of plotted flagged symbols
flaggedsymbolcolor = 'ff0000' # color of plotted flagged symbols
flaggedsymbolfill = 'fill' # fill type of plotted flagged symbols
flaggedsymboloutline = False # select outlining plotted flagged points
showmajorgrid = True # show major grid lines (horiz and vert.)
majorwidth = 0 # line width in pixels of major grid lines
majorstyle = '' # major grid line style: solid dash dot none
majorcolor = '' # color of major grid lines as name or hex code
showminorgrid = True # show minor grid lines (horiz and vert.)
minorwidth = 0 # line width in pixels of minor grid lines
minorstyle = '' # minor grid line style: solid dash dot none
minorcolor = '' # color of minor grid lines as name or hex code
plotfile = 'plot.jpg' # name of plot file to save automatically
expformat = '' # export format type (jpg, png, ps, pdf, txt), else use plotfile extension
verbose = True # include metadata in text export
exprange = '' # export all iteration plots or only the current one
highres = False # use high resolution
dpi = -1 # DPI of exported plot
width = -1 # width of exported plot
height = -1 # height of exported plot
overwrite = False # overwrite plot file if it already exists
Note that if the vis parameter is set to the name of a MeasurementSet here, the entire MeasurementSet will be plotted when plotms starts, which can be time consuming. You may want to set selection or averaging parameters first.
To start a “blank” plotms window then enter your selections interactively in the GUI, use these commands:
default plotms
plotms
Alternatively, they can be specified as task parameters in a plotms call, for scripting:
plotms(vis=vis1, yaxis='phase', ydatacolumn='corrected', xaxis='frequency', coloraxis='spw', antenna='1', spw='0:3~10', correlation='RR', avgtime='1e8', plotfile='vis1.jpg')
Note that a subsequent plotms call will reset any parameters not specified in that call to their default values. See also the Examples tab in the plotms task for plotms calls using many of the parameters.
The plotms GUI will be described in the following sections, along with the corresponding parameters for the task interface or scripting. For non-interactive scripting, set showgui=False and export the plot into an image specified by plotfile.
The Plot Tab¶
Loading, Selecting, and Averaging Data: the Plot Data Tab
The plotms window starts on the Plot > Data tab. No parameters have been set.
File Selection
When plotms is first started, by default it will display the Plot tab (as chosen from the tabs at the top of the plotms window) and its Data subtab (as chosen from the tabs on the left side) as shown in Figure 1. First, a MeasurementSet or calibration table should be loaded by clicking on Browse in the File section and selecting a MeasurementSet directory (just select the directory itself; do not descend into it).
A plot can now be made of the MeasurementSet by clicking on the Plot button, but you may want to set selection or averaging parameters first rather than plot the entire dataset. By default, plotms will plot Amplitude versus Time for a MeasurementSet; see the Axes Tab section for axis options. The default axes change for calibration tables depending on the table type. plotms self-scales axes and the symbol size. For a very large range, this can hide points close to zero; see the Axes Tab section for setting axis ranges and the Display Tab section for setting symbol size.
The plotms task parameter for file selection is vis.
Data Selection
The options for data selection are:
field
spw
timerange
uvrange
antenna
scan
corr (correlated polarizations)
array
observation
intent
feed
msselect
Note that, unlike when setting data selection parameters from the CASA command line, no quotation marks are needed around strings in the GUI. For more information on data selection strings, see the documentation here. To view information about your data in order to make your selection, use the Summary menu or the listobs task.
Calibration table selection may differ from MeasurementSet selection:
antenna selection for a calibration table depends on its type.
For antenna-based cal tables without a reference antenna, only ANTENNA1 is matched and the “&” operators in selection expressions are ignored. Single-dish sky calibration tables use ANTENNA1 selection.
For antenna-based cal tables with a reference antenna, ANTENNA2 is interpreted as a reference antenna and matched against the ANT2 in “ANT1&ANT2” type expressions. “ANT” selections continue to match ANTENNA1 only.
For baseline-based cal tables, antenna selection uses both ANTENNA1 and ANTENNA2 as described in the MSSelection documentation.
corr selection is used to select calibration table polarizations, including “/” for a ratio plot.
The plotms task parameter for data selection is selectdata (default is True, but no selection occurs unless one or more subparameters are set). Its subparameters include field, spw, timerange, uvrange, antenna, scan, correlation, array, observation, intent, feed, and msselect. These should be set to string values.
Averaging Data
plotms enables averaging of the data in order to increase signal-to-noise of the plotted points or to increase plotting speed.
Averaging is currently not supported for the Ant-Ra and Ant-Dec axes; a warning is written to the log and the unaveraged data are plotted.
Averaging is supported for calibration tables with the exception of BPOLY and GSPLINE tables, which use an older table format.
The options for averaging in the Plot > Data tab include:
channel
time (optionally over scans or fields)
all baselines or per antenna
all spectral windows
vector (default) or scalar
The box next to a given averaging mode needs to be checked for that averaging to take effect. The Weight and Sigma axes are not supported in some averaging modes. Note that the “average weight” is actually the weight sum accumulated when performing the average; i.e., the net weight of a weighted-averaged datum is the sum of the weights going into the average.
When averaging, plotms will prefer unflagged data. If an averaging bin contains any unflagged data at all, only the average of the unflagged data will be shown. For averaging bins that contain only flagged data, the average of that flagged data will be shown. When flagging on a plot of averaged data, the flags will be applied to the unaveraged data in the MS.
The plotms task parameter for averaging is averagedata (default is True, but no averaging occurs unless one or more subparameters are set). Its subparameters include avgchannel and avgtime (set to a string value in channels or seconds, default ''), and the boolean parameters avgscan, avgfield, avgbaseline, avgantenna, avgspw, and scalar (True/False, default False). Invalid combinations of averaging will result in an error message (e.g. avgbaseline=True, avgantenna=True) or will be ignored (e.g. avgscan=True but avgtime has not been set).
Channel Averaging: to average n channels together, the user would click on the box next to Channel so that an “X” appears in it, and then type the number n in the empty box. When the user next clicks on Plot, every n channels will then be averaged together and plotted against the average channel numbers. The total number of channels plotted will be decreased by a factor of n.
Channel selection may be combined with channel averaging. For MeasurementSets, each selected channel range is binned and averaged individually, but for calibration tables, the selected channels are treated as contiguous. See examples in the plotms task channel averaging documentation.
Warning: If a complex channel selection is made, e.g. of continuum in the presence of multiple lines, channel averaging is unlikely to produce a meaningful plot.
The averaged channel ids are reindexed starting at 0 to reflect the bin number, not the averaged channel number.
The averaged frequency and velocity values are the average of the frequencies/velocities in the bin.
Time Averaging: Time averaging is controlled by three fields. If the checkbox next to Time is checked, a blank box with units of seconds will become active, along with two additional checkboxes: Scan and Field. If averaging is desired over a relatively short interval (say, 30 seconds, shorter than the scan length), a number can simply be entered into the blank box and, when the data are replotted, the data will be time averaged. Clicking on the Scan or Field checkbox in this case will have no impact on the time averaging. These checkboxes become relevant if averaging over a relatively long time—say the entire observation, which consists of multiple scans—is desired. Regardless of how large a number is set in the Time averaging box, only data within individual scans will be averaged together. In order to average data across scan boundaries, the Scan checkbox must be checked and the data replotted. Finally, clicking on the Field checkbox enables the averaging of multiple fields together in time.
When averaging over scan, the scan number is the first scan number in the bin, independent of unflagged/flagged data.
When averaging over field, the field number is the first field number in the bin, independent of unflagged/flagged data.
Averaging All Baselines/Per Antenna: Clicking on the All Baselines checkbox will average all baselines in the array together. Alternatively, the Per Antenna box may be checked, which will average all baselines for a given antenna together. In this case, all baselines are represented twice; baseline 3-24 will contribute to the averages for both antenna 3 and antenna 24. This can produce some rather strange-looking plots if the user also selects on antenna. Say the user requests to plot only antenna 0 and then averages Per Antenna: an average of all baselines including antenna 0 will be plotted, but each individual baseline including antenna 0 will also be plotted, because the presence of baselines 0-1, 0-2, 0-3, etc. triggers Per Antenna averaging to compute averages for antennae 1, 2, 3, etc. Therefore, baseline 0-1 will contribute to the average for antenna 0, but it will also singlehandedly be the average for antenna 1. These averaging modes currently do not support the Weight and Sigma axes.
Averaging All Spectral Windows: Spectral windows can be averaged together by checking the box next to All Spectral Windows. This will result in, for a given channel n, all channels n from the individual spectral windows being averaged together. This averaging mode currently does not support the Weight and Sigma axes.
Vector/Scalar Averaging: The default mode is vector averaging, where the complex average is formed by averaging the real and imaginary parts of the relevant visibilities. If Scalar is chosen, then the amplitude of the average is formed by a scalar average of the individual visibility amplitudes.
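As a sketch of setting averaging from the task interface (values illustrative), the following averages every 10 channels and 60 seconds of data within each scan:
plotms(vis='ngc5921.ms', xaxis='time', yaxis='amp', avgchannel='10', avgtime='60')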
Brief Note Regarding plotms Memory Usage
In order to provide a wide range of flexible interactive plotting options while minimizing the I/O burden and speeding up the plotting, plotms caches the data values for the plot (along with a subset of relevant meta-info) in as efficient a manner as possible. Sometimes, however, the data changes on disk, for example when other data processing tasks are applied. To force plotms to reload the data, check the Reload box next to the Plot button or press the SHIFT key while clicking the Plot button.
For plots of large numbers of points, the total memory requirement can be quite large. plotms attempts to predict the memory it will require (typically 5 or 6 bytes per plotted point when only one axis is a data axis, depending upon the data shapes involved), and will complain if it believes there is insufficient memory to support the requested plot. For most practical interactive purposes (plots that load and draw in less than a few or a few 10s of minutes), there is usually not a problem on typical modern workstations. Attempts to plot large datasets on small laptops might be more likely to encounter problems here.
The absolute upper limit on the number of simultaneously plotted points is currently set by the ability to index the points in the cache. For modern 64-bit machines, this is about 4.29 billion points (requiring around 25 GB of memory). Such plots are not especially useful interactively, since the I/O and draw become prohibitive. In general, it is usually most efficient to plot data in modest chunks of no more than a few hundred million points, either using selection or averaging. Note that all iterations are (currently) cached simultaneously for iterated plots, so iteration is not a way to manage memory use. A few hundred million points tends to be the practical limit of interactive plotms use with respect to information content and utility in the resulting plots, especially when you consider the number of available pixels on your screen.
On-The-Fly Calibration: the Plot Calibration Tab
The plotms Calibration tab. This MeasurementSet has no CORRECTED_DATA column. A calibration library file was selected with the file browser and applied on the fly.
One can apply calibration tables to the uncalibrated data on the fly, i.e. without a run of applycal beforehand, by specifying a calibration library and selecting the corrected Data Column for the plotted axes. See the Cal Library Syntax documentation for more information on specifying calibration in a string or file.
The Calibration tab on the left hand side contains a field to specify a calibration library file, or use Browse to open a file selection dialog. You can also specify the calibration library commands directly in a string. There is a switch to apply the calibration library to produce the corrected data (Calibration On) or to show an existing CORRECTED_DATA column (Calibration Off). If the corrected Data Column is requested but the column is not present in the MS and the calibration library is not set or enabled, plotms issues a warning and plots the DATA column instead.
The plotms task parameter callib can be used to provide a calibration library file or a string containing the cal library commands. On-the-fly calibration is enabled when this parameter is set.
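For example, a minimal sketch, assuming a cal library file mycal.txt exists for the dataset (file names hypothetical):
plotms(vis='uncalibrated.ms', xaxis='frequency', yaxis='amp', ydatacolumn='corrected', callib='mycal.txt')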
Selecting Plot Axes: The Plot Axes Tab
The plotms Plot > Axes tab, used here to make a plot of Amp vs. Channel.
Selecting Axes
The X and Y axes of a plot are selected by clicking on the Plot > Axes tab and choosing an entry from the drop-down menus below X Axis and Y Axis. The axes are grouped by type and listed in this order:
Metadata:
- Scan — The scan number, as listed by listobs or the plotms summary. When averaging over scan, the first scan number in the bin is used.
- Field — The field number, as listed by listobs or the plotms summary. When averaging over field, the first field id in the bin is used.
- Time — The time at which the visibility was observed, given in terms of calendar year (yyyy/mm/dd/hh:mm:ss.s).
- Interval — The integration time in seconds.
- Spw — The spectral window number. The characteristics of each spectral window are listed in listobs or the plotms summary.
- Channel — The spectral channel number. When channel averaging, the channel ids are reindexed starting at 0 to reflect the bin number, not the averaged channel number.
- Frequency — Frequency in units of GHz. The frame for the frequency (e.g., topocentric, barycentric, LSRK) can be set in the Plots > Transform tab.
- Velocity — Velocity in units of km/s, as defined by the Frame, Velocity Defn, and Rest Freq parameters in the Plots > Transform tab.
- Corr — Correlations which have been assigned integer IDs, including RR (5), RL (6), LR (7), LL (8), XX (9), XY (10), YX (11), and YY (12). The axis values are these IDs, as listed by listobs or the plotms summary. This axis may also be used to plot polarizations for calibration tables.
- Antenna1 — The first antenna in a baseline pair; for example, for baseline 2-4, Antenna1 = 2. Antennae are numbered according to the antenna IDs listed in listobs or the plotms summary.
- Antenna2 — The second antenna in a baseline pair; for baseline 2-4, Antenna2 = 4. Antennae are numbered according to the antenna IDs listed in listobs or the plotms summary.
- Antenna — Antenna ID for plotting antenna-based quantities. Antennae are numbered according to the antenna IDs listed in listobs or the plotms summary.
- Baseline — The baseline number.
- Row — The MS data row number. A row number corresponds to a unique timestamp, baseline, and spectral window in the MeasurementSet.
- Observation — The observation ID (index).
- Intent — The intent ID (index).
- Feed1 — The first feed number, most useful for single-dish data.
- Feed2 — The second feed number, most useful for single-dish data.
Visibility values and flags:
- Amp — Data amplitudes in units which are proportional to Jansky (for data which are fully calibrated, the units should be Jy).
- Phase — Data phases in units of degrees.
- Real and Imag — The real and imaginary parts of the visibility in units which are proportional to Jansky (for data which are fully calibrated, the units should be Jy).
- Wt and Wt*Amp — The weight of the visibility and the product of the weight and the amplitude.
- WtSp — The WEIGHT_SPECTRUM column, i.e. a weight per channel.
- Sigma — The SIGMA column of the visibilities.
- SigmaSp — The SIGMA_SPECTRUM column, i.e. a sigma per channel.
- Flag — Data which are flagged have Flag = 1, whereas unflagged data are set to Flag = 0. Note that, to display flagged data, you will have to click on the Plots > Display tab and choose a Flagged Points Symbol.
- FlagRow — In some tasks, if a whole data row is flagged, then FlagRow will be set to 1 for that row. Unflagged rows have FlagRow = 0. However, note that some tasks (like plotms) may flag a row, but not set FlagRow = 1. It is probably better to plot Flag than FlagRow for most applications.
Observational geometry:
- UVdist — Projected baseline separations in units of meters. Note that UVdist is not a function of frequency.
- UVwave — Projected baseline separations in units of the observing wavelength (lambda, not kilolambda). UVwave is a function of frequency, and therefore there will be a different data point for each frequency channel.
- U, V, and W — u, v, and w in units of meters.
- Uwave, Vwave, and Wwave — u, v, and w in units of wavelengths lambda.
- Azimuth and Ant-Azimuth — Azimuth in units of degrees. Azimuth plots a fiducial value for the entire array, while Ant-Azimuth plots the azimuth for each individual antenna (their azimuths will differ depending on each antenna’s longitude, latitude, and elevation).
- Elevation and Ant-Elevation — Elevation in units of degrees. Elevation is a representative value for the entire array, while Ant-Elevation is the elevation for each individual antenna (their elevations will differ depending on each antenna’s longitude, latitude, and elevation).
- Ant-Ra and Ant-Dec — Longitude and latitude of the direction to which the first antenna of a baseline points at data-taking timestamps.
- HourAngle — Hour angle in units of hours. This is a fiducial value for the entire array.
- ParAngle and Ant-ParAng — Parallactic angle in units of degrees. ParAngle is the fiducial parallactic angle for all antennae in the array, while Ant-ParAng plots the parallactic angle for each individual antenna (their parallactic angles will differ depending on each antenna’s longitude, latitude, and elevation).
Calibration:
- GainAmp, GainPhase, GainReal, GainImag — The amplitude, phase, real, and imaginary part of regular complex gain calibration tables.
- Delay — The delay of a delay or fringefit (Fringe Jones) calibration table.
- Delay Rate — The delay rate of a fringefit (Fringe Jones) calibration table.
- Disp Delay — The dispersive delay of a fringefit (Fringe Jones) calibration table.
- SwPower — Switched Power values for a VLA switched power calibration table.
- Tsys — Tsys for Tsys calibration tables.
- Opac — Opacity values of an Opacity calibration table.
- SNR — Signal-to-Noise Ratio of a calibration table.
- TEC — Total Electron Content of an ionosphere correction calibration table.
- Antenna Positions — Antenna position offsets for a KAntPos Jones calibration table.
Ephemeris:
- Radial Velocity — For an ephemeris source, in km/s.
- Distance (rho) — For an ephemeris source, in km.
If the data axis selected from the drop-down menu is already stored in the cache (therefore implying that plotting will proceed relatively quickly), an “X” will appear in the checkbox next to Cached. To reload the data from disk, the Reload checkmark should be set at the bottom of this display.
The plotms task parameters used to select the axes are xaxis and yaxis. Valid options include ‘scan’, ‘field’, ‘time’, ‘interval’, ‘spw’, ‘chan’ (or ‘channel’), ‘freq’ (or ‘frequency’), ‘vel’ (or ‘velocity’), ‘corr’ (or ‘correlation’), ‘ant1’ (or ‘antenna1’), ‘ant2’ (or ‘antenna2’), ‘baseline’, ‘row’, ‘observation’, ‘intent’, ‘feed1’, ‘feed2’, ‘amp’ (or ‘amplitude’), ‘phase’, ‘real’, ‘imag’, ‘wt’ (or ‘weight’), ‘wtsp’ (or ‘weightspectrum’), ‘flag’, ‘flagrow’, ‘uvdist’, ‘uvwave’ (or ‘uvdistl’), ‘u’, ‘v’, ‘w’, ‘uwave’, ‘vwave’, ‘wwave’, ‘azimuth’, ‘elevation’, ‘hourang’ (or ‘hourangle’), ‘parang’ (or ‘parangle’), ‘ant’ (or ‘antenna’), ‘ant-azimuth’, ‘ant-elevation’, ‘ant-ra’, ‘ant-dec’, ‘ant-parang’ (or ‘ant-parangle’), ‘gainamp’ (or ‘gamp’), ‘gainphase’ (or ‘gphase’), ‘gainreal’ (or ‘greal’), ‘gainimag’ (or ‘gimag’), ‘delay’ (or ‘del’), ‘delayrate’ (or ‘rate’), ‘dispdelay’ (or ‘disp’), ‘swpower’ (or ‘swp’ or ‘spgain’), ‘tsys’, ‘opacity’ (or ‘opac’), ‘snr’, ‘tec’, ‘radialvelocity’, ‘distance’ (or ‘rho’).
When left as the default empty strings (“”), the axes for a MeasurementSet will be Amp vs. Time. The default axes for a calibration table depend on the type.
Setting Axes Parameters
Data Columns:
- For relevant data axes like Amp and Phase, the user will be presented with the option to plot raw data or calibrated data. This can be selected via a Data Column drop-down menu, located directly under the drop-down menu for X Axis or Y Axis selection. To plot raw data, select “data”; to plot calibrated data, select “corrected”. Note that this choice will only have an impact on a plot if a calibration table has been applied to the MeasurementSet or a calibration library is set and enabled.
- If a data model is present in the MeasurementSet (e.g., created by setjy, clean, or ft), it can be plotted by selecting “model” from the Data Column menu. For MeasurementSets with float data instead of complex data, common in single-dish datasets, select the “float” data column.
- Residuals can be plotted via “corrected-model_vector”, “corrected-model_scalar”, “data-model_vector”, “data-model_scalar”, “corrected/model_vector”, “corrected/model_scalar”, “data/model_vector”, and “data/model_scalar”. The vector and scalar options distinguish between versions where values like amp, phase, etc. are calculated before (scalar) or after (vector) the subtraction or division.
- The plotms task parameters used to select the data columns are xdatacolumn and ydatacolumn. Valid options include ‘data’, ‘corrected’, ‘model’, ‘float’, ‘corrected-model’ (vector implied), ‘corrected-model_vector’, ‘corrected-model_scalar’, ‘data-model’ (vector implied), ‘data-model_vector’, ‘data-model_scalar’, ‘corrected/model’ (vector implied), ‘corrected/model_vector’, ‘corrected/model_scalar’, ‘data/model’ (vector implied), ‘data/model_vector’, and ‘data/model_scalar’. The implied vector residual data columns were kept for backwards compatibility. Default data columns for x and y are both ‘data’.
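For example, a sketch plotting vector residuals against uv distance (dataset name illustrative):
plotms(vis='ngc5921.ms', xaxis='uvdist', yaxis='amp', ydatacolumn='corrected-model_vector')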
Antenna Pointing Direction Parameters
- The Ant-Ra and Ant-Dec axes are the longitude and latitude of the direction to which the first antenna of a baseline points at data-taking timestamps. Their value is computed by:
  - interpolating, with a user-supplied method, the direction of that antenna at that data-taking timestamp from the known directions pointed to by that antenna at pointing-direction-recording timestamps, recorded in the MeasurementSet’s POINTING table, and
  - converting the result to a user-specified output reference frame.
- The plotms task parameters that control the Ant-Ra and Ant-Dec axes are:
  1. xinterp: interpolation method to use when xaxis='ant-ra' or xaxis='ant-dec'
  2. xframe: output reference frame to use when xaxis='ant-ra' or xaxis='ant-dec'
  3. yinterp: interpolation method to use when yaxis='ant-ra' or yaxis='ant-dec'
  4. yframe: output reference frame to use when yaxis='ant-ra' or yaxis='ant-dec'
- Valid values for xframe and yframe are: ‘icrs’, ‘j2000’, ‘b1950’, ‘galactic’, ‘azelgeo’. The default value is ‘icrs’.
- Valid values for xinterp and yinterp are: ‘nearest’, ‘cubic spline’, ‘spline’. The default value is ‘cubic spline’.
Note:
‘spline’ is a synonym for ‘cubic spline’
When the interpolation method is set to ‘nearest’, reference frame conversion is performed at the nearest pointing-recording timestamp, not at the data-taking timestamp.
WARNING: plotting antenna pointing directions with the Ant-Ra / Ant-Dec axes has only been implemented for ALMA, ASTE, and NRO data.
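A sketch of a pointing-direction plot for an ALMA dataset (name hypothetical), using nearest-neighbor interpolation and an azimuth/elevation output frame:
plotms(vis='myalma.ms', xaxis='ant-ra', yaxis='ant-dec', xinterp='nearest', xframe='azelgeo', yinterp='nearest', yframe='azelgeo')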
Axis Locations
The location of the x-axis and y-axis can be set using the radio buttons in the GUI, where the x-axis can be located at the Bottom (default) or Top, and the y-axis can be located at the Left (default) or Right.
The plotms task parameter to set the y-axis location is yaxislocation. Valid values for this parameter include ‘left’ (default) and ‘right’. There is no parameter to set the x-axis location.
Axes Ranges
The X and Y ranges of the plot can be set manually or automatically. By default, the circle next to Automatic will be checked, and the ranges will be auto-scaled in ascending order based on the values plotted.
To define the range, click on the circle below Automatic and enter a minimum and maximum value in the blank boxes. Note that if identical values are placed in the blank boxes (xmin=xmax and/or ymin=ymax), then the values will be ignored and automatically set. When xmin > xmax or ymin > ymax, the tick values will be descending (reversed).
The plotms task parameter used to set the axes ranges is plotrange, and its value is a list of numbers in the format [xmin, xmax, ymin, ymax] (default [ ], automatic range).
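For example, a sketch setting both ranges explicitly (values illustrative):
plotms(vis='ngc5921.ms', xaxis='uvdist', yaxis='amp', plotrange=[0, 1000, 0, 2.5])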
Plotting Multiple Y-Axes
Different values of the same dataset can be shown at the same time. To add a second y-axis, press the Add Y Axis Data button at the bottom of the Axes tab, then select the parameters for the newly created axis from the new “Y Axis Data” drop-down menu. If the two y-axes have the same units, they can both be displayed on the same axis. If they are different (or their ranges are dissimilar), e.g. Amplitude and Elevation (both versus Time; see Figure 4 below), one axis should be attached to the left and the other to the right hand side of the plot. Using more than one y-axis is also reflected in the Display tab, where a drop-down menu appears in order to select between the y-axes; here you may colorize each axis differently. See the Plot Display Tab section below to learn more about symbol properties. To remove the additional y-axis, click Delete Y Axis Data at the bottom of the Axes tab.
Overplotting in plotms: Two different y-axes for the same dataset have been chosen for this plot, amplitude and elevation.
The plotms task parameters used to plot multiple y-axes are the same as for a single y-axis: yaxis and yaxislocation; multiple y-axes can be specified as a list of strings if you are specifying the plotms command in the terminal. The values for yaxis and yaxislocation should be set to lists of the same length:
plotms(vis='ngc5921.ms', yaxis=['amp','elevation'], yaxislocation=['left','right'])
Atmospheric Curve Overlays
The ability to compute and overlay an atmospheric transmission curve or a sky temperature curve, available in plotbandpass, has been added to plotms. For this feature, the x-axis must be Channel or Frequency; if another axis is chosen, a warning is issued and the plot continues without the overlay.
plotms uses the dataset’s subtables to compute the mean weather values: pressure, humidity, temperature, and precipitable water vapor (pwv). If these subtables are not found, reasonable defaults are used instead and reported in a log message. The atmosphere tool is then used by plotms to calculate dry and wet opacities to produce the requested overlay curve, corrected by the airmass based on elevation.
Amp vs. Frequency plot with a Tsky overlay. The Tsky y-axis is automatically added on the right, and the curve is plotted in magenta. The Plot > Axes tab shows the radio buttons to select the Overlay: None, Atm, or Tsky.
The plotms task parameters used to plot the overlays are showatm and showtsky. These take boolean values and their defaults are False. Only one overlay can be selected; if both are set to True, only the atmospheric curve (showatm) will be displayed.
plotms(vis=myvis, yaxis='amp', xaxis='freq', showatm=True)
The image sideband curve may also be shown in plotms when the atmospheric transmission or sky temperature curves are plotted. In order to do this, the MS (or associated MS for a calibration table) cannot have reindexed spectral window IDs as a result of a split, and must have an ASDM_RECEIVER table in order to read the LO frequencies. If these conditions are not met, a warning is issued and only the atm/tsky curves are calculated and plotted.
Gain Amp vs. Frequency plot for a bandpass calibration table with the Atm Transmission (magenta) and Image Sideband (black) overlays, colorized by spw and one antenna selected. The Plot > Axes tab shows the checkbox to select the image sideband curve, enabled only when the Overlay is Atm or Tsky.
The plotms task parameter used to plot the image sideband curve overlay is showimage. This takes a boolean value and its default is False. If showatm=False and showtsky=False, a warning is issued and the curve will not be calculated or plotted.
plotms(vis=mycaltable, yaxis='amp', xaxis='freq', antenna='0',
coloraxis='spw', showatm=True, showimage=True)
Iteration and Page Header: The Plot Page Tab
The plotms Plot Page Tab, used to iterate by scan with a page header added. The scan number is appended to the plot title.
Iteration
In many cases, it is desirable to iterate through the data that were selected in the Data tab. A typical example is to display a single baseline in an amplitude vs. time plot and then proceed to the next baseline step by step. This can be done via the Plot > Page tab. A drop-down menu allows you to select the iteration axis, with options None, Scan, Field, Spw, Baseline, Antenna, Time, and Corr. Press the Plot button after changing your selection. Each plot will be autoscaled according to its iteration value range unless a Range is specified in the Axes tab.
The current iteration is indicated in the plot title of the displayed plot. To proceed to the next plot use the green arrow buttons below the main panel. Use the icons to proceed panel by panel (single arrow symbols) or to jump to the first or last panel directly (double arrow symbols).
The number of plots per page can be selected under Options > Grid, the last of the top row of tabs, as described in the Options Tab section. There are two scaling options for the iterated axes in a grid, set in this tab: Global and Shared. Global will use a common axis range based on data loaded with the selection criteria specified in the Data tab. Shared displays one set of x-axes and y-axes for the page rather than per-plot. When Global and Shared are left unchecked, the axes of each individual panel of the iteration scale to that panel’s data. An example of global shared x-axes and y-axes is in the Options Tab section.
The plotms task parameter used to select an iteration axis is iteraxis. The options include ‘scan’, ‘field’, ‘spw’, ‘baseline’, ‘antenna’, ‘time’, and ‘corr’.
To use a global axis range for iterated plots, set parameters xselfscale=True and/or yselfscale=True. To use a shared external x-axis per column on a grid, set xsharedaxis=True (must also set xselfscale=True and gridrows greater than 1). To use a shared external y-axis per row on a grid, set ysharedaxis=True (must also set yselfscale=True and gridcols greater than 1).
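For example, to iterate an amplitude vs. time plot over baselines (a minimal sketch):
plotms(vis='ngc5921.ms', xaxis='time', yaxis='amp', iteraxis='baseline')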
Page Header
It is sometimes useful to display above the plots a page header showing some metadata information. To do so, select in the lower list the header items you want to display, and press the arrow button pointing up. This will move the selected items to the upper list showing the header contents, without updating the page header. Multiple items can be selected at once by holding the Shift or Control key; Control+A selects all items. To remove items from the Contents list, select in that list the items to remove and press the arrow button pointing down. The arrows blink red when clicked while their corresponding selection is empty, green otherwise.
Press the Plot button to update the page header. Items included in the Contents list are laid out in two columns in the page header, in “Z” order. The contents of the header are common to all pages.
Header items from multiple plots can be displayed in the page header. In that case items from the first plot are laid out first, items from the second plot are then laid out starting from the first empty row, and so on.
The plotms task parameter used to specify header items is headeritems. The value is a single string whose value can be any comma-separated combination of the following pre-defined keywords:
‘filename’, ‘projid’, ‘telescope’, ‘observer’, ‘obsdate’, ‘obstime’, ‘targname’, ‘targdir’, ‘ycolumn’
When the selected data leaves room for multiple candidates (e.g. when it spans multiple observations or includes multiple fields or sources), the first selected row in the MeasurementSet’s Main table is used as a starting point for looking up a single “first” candidate in the MeasurementSet’s auxiliary tables.
Observation Start Date and Observation Start Time are looked up in the MeasurementSet’s Observation table, and therefore differ from the output of the listobs task.
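For example, to display a page header with the dataset name, observation date, and target name (a minimal sketch; the item selection is illustrative):
plotms(vis='ngc5921.ms', headeritems='filename,obsdate,targname')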
Transforming the Velocity Frame or Phase Center: The Plot Transform Tab
Frequency Frame
If the user plans to plot frequency, the reference frame must be defined. By default, plotms selects the frame keyword (if any) present in the data, usually the frame observed at the telescope unless modified during previous processing. However, transformations can be made by choosing a Frame from the drop-down menu in the Plot > Transform tab. Frequency reference frames can be chosen to be:
LSRK — local standard of rest (kinematic)
LSRD — local standard of rest (dynamic)
BARY — barycentric
GEO — geocentric
TOPO — topocentric
GALACTO — galactocentric
LGROUP — local group
CMB — cosmic microwave background dipole
The plotms task parameter used to select the frequency frame is freqframe. Valid options include those listed above (as all-caps strings). The default empty string “” results in no frame transformation.
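For example, to plot amplitude against barycentric frequency (a minimal sketch):
plotms(vis='ngc5921.ms', xaxis='freq', yaxis='amp', freqframe='BARY')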
Velocity
If Velocity is selected as an axis, by default the transformation from frequency uses the parameters in the MS metadata or, if absent, the central frequency and the TOPO frame. The user can change this using the Frame, Velocity Defn, and Rest Freq options in the Transform tab. The velocity definition is chosen from the Velocity Defn drop-down menu, offering selections of Radio, True (Relativistic), or Optical.
For more information on frequency frames and spectral coordinate systems, see the paper by Greisen et al. (A&A, 446, 747, 2006) (Also at http://www.aoc.nrao.edu/~egreisen/scs.ps)
Finally, the spectral line’s rest frequency in units of MHz should be typed into the Rest Freq input box. You can use the slsearch task to search a spectral line table, or the Measures tool me.spectralline method to turn transition names into frequencies:
CASA <16>: me.spectralline('HI')
Out[16]:
{'m0': {'unit': 'Hz', 'value': 1420405751.786},
 'refer': 'REST',
 'type': 'frequency'}
For a list of known lines in the CASA measures system, use the toolkit command me.linelist(). For example:
CASA <21>: me.linelist()
Out[21]: 'HI H186A H185A H184A H183A H182A H181A H180A H179A H178A H177A H176A H175A
H174A H173A H172A H171A H170A H169A H168A H167A H166A H165A H164A H163A H162A H161A H160A...
He182A He181A He180A He179A He178A He177A He176A He175A He174A He173A He172A He171A He170A
He169A He168A He167A He166A He165A He164A He163A He162A He161A He160A He159A He158A He157A...
C186A C185A C184A C183A C182A C181A C180A C179A C178A C177A C176A C175A C174A C173A C172A
C171A C170A C169A C168A C167A C166A C165A C164A C163A C162A C161A C160A C159A C158A C157A...
NH3_11 NH3_22 NH3_33 NH3_44 NH3_55 NH3_66 NH3_77 NH3_88 NH3_99 NH3_1010 NH3_1111 NH3_1212
OH1612 OH1665 OH1667 OH1720 OH4660 OH4750 OH4765 OH5523 OH6016 OH6030 OH6035 OH6049 OH13433
OH13434 OH13441 OH13442 OH23817 OH23826 CH3OH6.7 CH3OH44 H2O22 H2CO4.8 CO_1_0 CO_2_1 CO_3_2
CO_4_3 CO_5_4 CO_6_5 CO_7_6 CO_8_7 13CO_1_0 13CO_2_1 13CO_3_2 13CO_4_3 13CO_5_4 13CO_6_5
13CO_7_6 13CO_8_7 13CO_9_8 C18O_1_0 C18O_2_1 C18O_3_2 C18O_4_3 C18O_5_4 C18O_6_5 C18O_7_6
C18O_8_7 C18O_9_8 CS_1_0 CS_2_1 CS_3_2 CS_4_3 CS_5_4 CS_6_5 CS_7_6 CS_8_7 CS_9_8 CS_10_9
CS_11_10 CS_12_11 CS_13_12 CS_14_13 CS_15_14 CS_16_15 CS_17_16 CS_18_17 CS_19_18 CS_12_19
SiO_1_0 SiO_2_1 SiO_3_2 SiO_4_3 SiO_5_4 SiO_6_5 SiO_7_6 SiO_8_7 SiO_9_8 SiO_10_9 SiO_11_10
SiO_12_11 SiO_13_12 SiO_14_13 SiO_15_14 SiO_16_15 SiO_17_16 SiO_18_17 SiO_19_18 SiO_20_19
SiO_21_20 SiO_22_21 SiO_23_22'
The plotms task parameters used to set the velocity definition and rest frequency are veldef and restfreq. Valid options for veldef are ‘RADIO’, ‘TRUE’, or ‘OPTICAL’ (default is ‘RADIO’). restfreq should be a string in MHz, for example ‘22235.08MHz’.
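Putting these together, a velocity plot of the HI line in the LSRK frame might look like the following sketch (the rest frequency is taken from the me.spectralline output above):
plotms(vis='ngc5921.ms', xaxis='velocity', yaxis='amp', freqframe='LSRK',
       veldef='RADIO', restfreq='1420.405752MHz')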
Setting the Phase Center
The plot’s phase center can be changed in the Plot > Transform tab. This will allow coherent vector averaging of visibility amplitudes far from the phase tracking center. Enter the direction coordinates of the new phase center, specified as absolute world coordinates including frame, e.g. ‘J2000 19h53m50 40d06m00’. Time-dependent systems such as AZEL are not supported, nor are ephemeris objects. This will change the plot but not the MeasurementSet data. To change the phase center in the MS, use task phaseshift.
The plotms task parameter used to set the phase center is phasecenter. Its value should be a string as described above.
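For example, using the coordinates quoted above (a minimal sketch):
plotms(vis='ngc5921.ms', xaxis='uvdist', yaxis='amp', phasecenter='J2000 19h53m50 40d06m00')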
Display Options for Plots: The Plot Display Tab
Colorizing Your Data
Data points can be given informative symbol colors using the Colorize option in the Plot > Display tab. By checking the box next to Colorize and selecting a data axis from the drop-down menu, the data will be plotted with colors that vary along that axis. For example, if “corr” is chosen from the Colorize menu, “RR”, “LL”, “RL”, and “LR” data will each be plotted with a different color. Note that using Colorize while plotting flagged data will override the default red symbol color for flagged points.
The plotms task parameter used to colorize data is coloraxis. Options include ‘scan’, ‘field’, ‘spw’, ‘antenna1’, ‘antenna2’, ‘baseline’, ‘channel’, ‘corr’, ‘time’, ‘observation’, and ‘intent’.
Customizing Your Symbols
Unflagged and flagged plot symbols can be customized in the Plot > Display tab. Most fundamentally, the user can choose to plot unflagged data and/or flagged data. By default, unflagged data is plotted (the circle next to Default is selected under Unflagged Points Symbol), and flagged data is not plotted (the circle next to None is selected under Flagged Points Symbol). We note here that plotting flagged data on an averaged plot is undertaken at the user’s own risk, as the distinction between flagged points and unflagged points becomes blurred if data are averaged over a dimension that is partially flagged. Take, for example, a plot of Amplitude vs. Time where all channels are averaged together, but some channels have been flagged due to RFI spikes. In creating the average, plotms will skip over the flagged channels and only use the unflagged ones. The averaged points will be considered unflagged, and the flagged data will not appear on the plot at all.
Symbol options include:
None — no data points
Default — data points which are small circles (blue for unflagged data and red for flagged data)
Custom — allows the user to define a plot symbol
If Custom plot symbols are chosen, the user can determine:
Size, by typing a number in the blank box next to px or by clicking on the adjacent up or down arrows.
Shape, chosen from the drop-down menu; options include circle, square, diamond, pixel, or autoscaling. Note that pixel only has one possible size. autoscaling attempts to adjust the size of the points from dots to circles of different sizes, depending on how many points are plotted.
Color, chosen by typing a hex color code in the Fill input box or by clicking on the … button and selecting a color from the pop-up GUI.
Fill, using the adjacent drop-down menu for how heavily the plot symbol is shaded with this color, from heaviest to lightest; options include fill, mesh1, mesh2, mesh3, and no fill.
Outline, by selecting None (no outline) or Default (outlined in black)
Note that if “no fill” and Outline: None are selected, the plot symbols will be invisible.
The plotms task parameter and subparameters used to customize unflagged symbols include:
customsymbol (True/False, default False) - must be True for subparameters to take effect
symbolshape (‘autoscaling’, ‘circle’, ‘square’, ‘diamond’, ‘pixel’, ‘nosymbol’, default ‘autoscaling’)
symbolsize (in number of pixels, default 2)
symbolcolor (RGB hex code e.g. ‘aa55ff’ or string color name e.g. ‘purple’, default ‘0000ff’ blue)
symbolfill (‘fill’, ‘mesh1’, ‘mesh2’, ‘mesh3’, ‘no fill’, default ‘fill’)
symboloutline (True/False, default False)
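For example, to plot unflagged points as small purple diamonds with a black outline (a minimal sketch; the values are illustrative):
plotms(vis='ngc5921.ms', customsymbol=True, symbolshape='diamond', symbolsize=3,
       symbolcolor='purple', symboloutline=True)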
The plotms task parameters used to customize flagged symbols include customflaggedsymbol (default False) with subparameters flaggedsymbolshape (default ‘nosymbol’), flaggedsymbolsize (default 2), flaggedsymbolcolor (default ‘ff0000’ red), flaggedsymbolfill (default ‘fill’), and flaggedsymboloutline (default False). Supported values are the same as for unflagged symbols.
Symbols for Multiple Y-Axes
If you have added an additional y-axis in the Plot > Axes tab, you may customize each y-axis individually by selecting the axis in the Y Axis Data pull-down menu at the top of the Plot > Display tab and then customizing the symbols for that axis.
To set multiple symbols in the plotms task, set the symbol parameters as a list:
plotms(vis='ngc5921.ms', yaxis=['amp','elevation'], yaxislocation=['left','right'],
customsymbol=[True,True], symbolcolor=['purple','green'])
In this plot, the ‘amp’ axis will be purple, and the ‘elevation’ axis will be green.
Connecting the Points
Plotms has the capability to connect points for calibration tables; support for MeasurementSets will be added later. The points are colorized and connected along the x-axis or time axis by line or step. Points with the same metadata but varying values of the x-axis or time are connected. Unflagged points are not connected to flagged points, even when they are not displayed. The Colorize axis will override the connection colorization.
Plot Display tab showing the Connect Points options for a gain table. Here, points with the same spw, channel, polarization, and antenna1 are connected along the time axis.
The plotms task parameters used to connect points in a calibration table plot are xconnector (default “none”, options “line” or “step”) and timeconnector (default False, or True to connect along the time axis instead of x-axis).
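For example, to connect the points of a gain table along the time axis (a minimal sketch; mycaltable stands for your calibration table):
plotms(vis=mycaltable, xaxis='time', yaxis='amp', xconnector='line', timeconnector=True)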
For an antenna position (KAntPos Jones) calibration table, the x, y, z antenna positions are located in the first axis of the data, normally the correlation axis. To distinguish these offsets, set coloraxis=’corr’ or xconnector=’line’ as shown in the figure below. To determine which points are x, y, and z, use the Locate tool.
Plot Display tab showing the Connect Points options for an antenna position table.
Plot Labels: The Plot Canvas Tab
Plot Title
Options to change the plot title include None (no title), Default, and a user-input string. To set the plot title, under Title, click on the circle next to the input box and enter the desired text. This text box shows the grayed-out default string, “%%yaxis%% vs. %%xaxis%%” (to substitute the axis names for “yaxis” and “xaxis”). The user can also choose the size of the title font by checking the Title Font checkbox and entering the font size or using the arrows to increase or decrease the value. The default is to scale the title font depending on the plot size.
The plotms task parameters used to set the title and its font are title (default empty string “” for yaxis vs. xaxis) and titlefont (default 0 to autoscale). Set title to a single space ” ” for no title.
Legend
A plot symbol legend can be added to the plot by clicking on the checkbox next to Legend. For a simple plot, a symbol legend simply echoes the plot axes (e.g. “Amp vs Time”) but is useful when overplotting data with custom colors so that you can identify the data (e.g. “Amp vs Time” in blue and “Phase vs Time” in green on the same plot).
When enabled, a drop-down menu next to Legend allows the user to select the legend location either within the plot (Upper Right, Lower Right, Upper Left, Lower Left) or outside the plot (Out Right, Out Left, Out Top, Out Bottom).
The plotms task parameter used to enable the legend is showlegend (default is False). To select the legend location, use showlegend=True and set legendposition to ‘upperRight’, ‘upperLeft’, ‘lowerRight’, ‘lowerLeft’, ‘exteriorRight’, ‘exteriorLeft’, ‘exteriorTop’, or ‘exteriorBottom’ (default empty string “” == upperRight).
Axis Labels
To enable the X- and Y-axis labels, check the Show Label checkboxes under X Axis and Y Axis (default is checked). As with the plot title, the user may set the label to None (no label), Default (axis name with units), or type the desired text in the blank box. The font size of the labels can also be customized by checking the font checkbox and then setting the font size for each axis.
The plotms task parameters used to set the label text and font are xlabel and ylabel (default empty string “” is axis name with units); set to ‘ ‘ space to disable label. Set font size with xaxisfont and yaxisfont (default 0 == autoscale).
Grid Lines
A grid of lines can be superimposed on the plot using Grid Lines in the Plot > Canvas tab. “Major” grid lines are drawn at the locations of major tick marks, while “minor” grid lines are drawn at minor tick marks.
Grid line colors, thicknesses, and styles are selected independently for the “major” and “minor” grid lines. Desired line thickness should be typed into the blank boxes just to the right of the Major and Minor labels. Colors are set by clicking on the … buttons. The blank boxes to the left of the … buttons will then contain the hex codes for the selected colors (e.g., “808080”). Line styles can also be selected from the drop-down menus to the right of … buttons; style options include solid, dash, dot, and none.
The plotms task parameters used to add and customize major grid lines include showmajorgrid (default is False) with subparameters majorwidth (default is 1), majorstyle (‘solid’, ‘dash’, ‘dot’, ‘none’; default is ‘solid’), and majorcolor (RGB hex code or color name; default is ‘b0b0b0’ dark gray).
Parameters for minor grid lines include showminorgrid (default is False) with subparameters minorwidth (default is 0), minorstyle (default is ‘solid’), and minorcolor (default is ‘d0d0d0’ light gray).
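These canvas parameters can be combined in a single call; for example (a minimal sketch with illustrative values):
plotms(vis='ngc5921.ms', title='NGC 5921 test plot', titlefont=12,
       showlegend=True, legendposition='lowerRight',
       showmajorgrid=True, majorstyle='dash', majorcolor='808080')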
Flag Extensions: The Flag Tab¶
The plotms Flag tab. Here the Extend flags box has been checked, enabling the Correlation (selected) and Channel options. The plot shows unflagged data in blue and flagged data in red.
See the section below on interactive flagging in plotms. The options in this tab allow the user to extend flagging to other data points besides those marked on the plot.
When enabled with the Extend flags checkbox, the user may choose to extend flags based on correlation or channel by checking the corresponding checkboxes. Future options for flag extensions are planned.
By checking the boxes next to Extend Flags and Correlation, flags will be extended beyond the correlations displayed. Currently the only option is to extend to All correlations as noted by the radio button, implying that all correlations will be flagged. For example, with RR displayed, the correlations RR, RL, LR, and LL will all be flagged when this option is enabled.
By checking the boxes next to Extend Flags and Channel, flagging will be extended to other channels in the same spw as the displayed point. For example, if spw=’0:0’ and channel 0 is displayed, then flagging will extend to all channels in spw 0.
The plotms task parameter used to extend flags is extendflag (True/False, default is False) with subparameters extcorr (True/False), and extchannel (True/False). These parameters will enable flag extensions when interactively flagging the plot.
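For example, to extend interactive flags to all correlations and all channels of the displayed spw (a minimal sketch):
plotms(vis='ngc5921.ms', extendflag=True, extcorr=True, extchannel=True)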
Interactive Tools: The Tools Tab, Annotate Tab, and Tool Icons¶
The plotms Tools tab. Here the Tracker Display tool is showing the (X,Y) coordinates of the cursor position. A previous position was saved to the text box by pressing the SPACE bar.
Various interactive GUI tools are selectable with the radio buttons in the Hand Tools section of the Tools tab at the top of the plotms window. They are also available as icon buttons at the bottom of the plotms window. These tools can be used to zoom, pan, annotate, flag/unflag, and locate data. Described below are the bottom icon buttons in order.
Zoom — The “magnifying glass” button (1st on left) lets you draw a box around a region of the plot (left-click on one corner of the box, and drag the mouse to the opposite corner of the desired box), and then zooms in on this box.
Pan — The “four-arrow” button (2nd from left) lets you pan around a zoomed plot.
Annotate — The 3rd button from the left is a drop-down menu offering either Annotate Text (“T with a green diamond” button) or Annotate Rectangle (“pencil” button). With Annotate Text activated, click on a location in the plot where text is desired; a window will pop up, allowing you to type text into it. When you click the OK button, this text will appear on the plot. Annotate Rectangle simply lets you draw a box on the plot by left-clicking and dragging the mouse. By clicking on the Annotate tab near the top of the plotms window, different fonts, colors, line styles, etc. can be selected for annotations.
Stack Base — The “house” button (5th from left) returns to the original zoom level.
Stack Back and Stack Forward — The left and right arrow buttons (4th and 6th from left) step through the zoom settings you’ve visited.
Mark Regions — The “box with a green diamond” button (7th from left) lets you mark a region for flagging, unflagging, or locating. Left-click on one corner of the desired region, and then drag the mouse to set the opposite corner of the region. You can mark multiple boxes before performing an operation on them. The selected regions will appear on the plot as shaded rectangles.
Subtract Regions — The “box with a minus sign” button (8th from left) lets you de-select marked regions (draw around a marked region and the shaded area will disappear). To de-select all marked regions, use the next button.
Clear Regions — Clicking on the “box with a red circle” button (9th from left) will clear all regions which have been marked using Mark Regions.
Locate — The “magnifying glass on a sheet of paper” button (10th from left) will print out information about points in the marked regions. This information is printed to the shell terminal when plotms was started with casaplotms, or to the casa logger/logfile when plotms was started in a casa python session. The header of the output indicates the plotted X and Y axes and the range of values in the selected region. The output for each point includes scan, field, time, baseline, spw, channel, frequency, correlation, X, Y, and observation ID. By copying this list to a text file, or setting a new logfile with casalog.setlogfile as described in the CASA logger documentation, the Locate information can be edited to provide input for flagdata. To list an entire column, e.g. all visibilities for a source, use the listvis task or the table tools.
Flag — Click on the “flag” button (11th from left) to flag all points in the marked regions. See the section below on Interactive Flagging.
Unflag — Click on the “crossed-out flag” button (12th from left) to unflag any flagged points in the marked regions (even if not displayed).
Flag All — Click on the “per-grid flag/unflag” button (13th from left) to enter/leave the “Flag All” mode. See the section below on Interactive Flagging.
Iteration — The next four green arrow buttons (14th through 17th from left) control iteration, with the first and last “double arrow” buttons used to display the first and last iteration, and the center two “single arrow” buttons used to display the previous or next iteration. If the plots are on a grid, these arrows navigate through the pages of plots which contain multiple iterations.
Hold Drawing — If the Hold Drawing button (rightmost, or 18th from left) is clicked to activate it, when new plot axes are selected from the Plot > Axes tab, the new data will be cached but not plotted. When the button is clicked again (de-activated), it will automatically plot the data that was last requested. This can be particularly useful when changing the size of the plotms window.
The Tools tab also contains Tracker tools including Hover and Display. When Hover is selected and the mouse is moved over the plot, the pointer’s position is displayed on the plot in (X, Y) format. When Display is selected, the (X, Y) position is displayed in the text box under the Display checkbox.
To record various tracked positions, enable Display then click on the plot to activate it. As usual, moving the pointer displays the position in the small display text box. Pressing the SPACE bar will copy the displayed line into the larger white box below it. This can be repeated many times and a log of positions and values will be created. The content in the box can then be easily copied and pasted into any other application that is used for data analysis. The Clear button wipes out the content of the box for a fresh start into new scientific adventures.
Miscellaneous Options: The Options Tab¶
A few miscellaneous per-page plot options and GUI options are available in the Options tab, the last tab at the top of the plotms window.
Plotting on a Grid
The layout of the page is set on the plotms Options tab. For multiple plots per page, set the grid layout: the number of rows and columns determines the number of sub-plots. When set, click “Update” to activate the grid changes.
Plotting Iterations on a Grid
The plotms Options tab. Here a 2x2 grid has been created with iteration on the ‘antenna’ axis.
If iteration is enabled in the Plot > Page tab, the grid will be filled automatically with each iterated plot. The Plot > Page tab is also where common axis scales and shared axes will be set; they are enabled for the plot in Figure 9. These axis options are only available for iterated plots in a grid.
The plotms task parameters used to create a grid with iteration include gridrows and gridcols (default is 1). To create the plot shown in Figure 9, the plotms command would be:
plotms(vis='ngc5921_ut.ms', xaxis='freq', iteraxis='antenna', gridrows=2, gridcols=2,
xsharedaxis=True, xselfscale=True, ysharedaxis=True, yselfscale=True)
Plotting Multiple Data on a Grid
We note here that plotting multiple datasets or axes on a grid is possible in plotms but covered separately in the Plotting Multiple Data section below, as this involves many settings in the GUI or multiple plotms task commands. Since the grid affects all of the plots, its settings are in the Options tab rather than the Plot tab.
Tool Button Style
The Tool Button Style drop-down menu determines the format of the tool buttons at the bottom of the plotms window. The options include Icon Only, Text Only, Text Beside Icon, and Text Under Icon. In Icon Only mode (default), hovering the cursor over each icon will give a text description of the icon.
To hide the bottom icons, see the description of the View menu. The tools can also be accessed in the Tools tab.
Log Events
This drop-down menu shows a checklist of events and plotms functions so that you can customize how verbose plotms is in documenting its actions.
Clear Regions and Annotations
The When changing plot axes, clear any existing regions or annotations checkbox determines whether regions and annotations are deleted from the plot when the plot axes are changed. By default, this is enabled.
Cached Images
A useful option is the Use fixed size for cached image checkbox. It determines how large the dots in the panel are with respect to the screen resolution. The values influence how the data is redrawn on the panel. When Screen resolution is selected, the plotms window can be resized without redrawing on the canvas – a considerable speedup for large data sets. The penalty is that the dots of the data points are the size of a pixel on the screen, which may be very small for high resolution monitors. By default, this feature is not enabled.
File Chooser History Limit
This setting allows the user to limit how many remembered filepaths are displayed in file chooser dialogs produced by clicking Browse in the Plot > Data tab to select a MeasurementSet or calibration table and in the Plot > Calibration tab to select a calibration library.
The Plotms Menus¶
File Menu: Quit
The File menu in the top menu bar allows you to Quit plotms, or you can click the X in the upper right corner of the window.
Export Menu: Saving Your Plot
You can save a copy of a plot to file with the Export menu, which produces an Export Plots dialog box with many settings.
Filename: Click the Browse button for a GUI-based selection of the directory and filename to which the plot will be saved, or click the Insert MS Name button to minimize typing. You may also just type in a file name. The file format can be determined in this GUI by the suffix given to the filename: .png, .jpg, .ps, .pdf, and .txt.
If a file already exists with the given filename, it will be overwritten without warning!
Format: Alternatively, the file format can be selected from the Format drop-down menu, with these options: [by file extension], PNG, JPG, PS, PDF, and TEXT. For the first option, if your filename is “test.jpg” the plot will be exported in JPG format. For the other formats, plotms will use the filename as given and not add a suffix to indicate its format. See below for an example of TEXT format; the header will include the name of the visibility and other specified parameters including selection, averaging, transformations, etc.
Verbose: When a text export is selected, the output can be verbose (include metadata). When this checkbox is unchecked, the text export will include x and y values only. This parameter is ignored for other formats.
Range: When iteration is chosen, producing multiple plots, you may select to export only the Current Page or All Pages. Each saved plot will have the name of the iteration appended to the given filename before the extension. For example, with filename “ngc5921_ut.jpg” and iteration on antenna, the first plot will be named ngc5921_ut_Antenna1@VLA:N7.jpg. This is so the exported plots can be identified without viewing them. Be warned that if you are plotting iterations on a grid, the filenames will have all of the iterations on the page appended, which can lead to a very long filename. Filenames exceeding 255 characters in length will be automatically shortened upon export. One plotfile per page is produced; multipage pdf exports are not currently supported.
High Resolution: Exporting to images in screen resolution is currently not working, so plot exports are always high resolution. A notice is issued in the console/log.
DPI, Size: Use the text boxes or up/down arrows to set the output DPI or size (in pixels) of the exported plot.
Export: When settings are complete, click Export to create your plotfile.
Note: The plot files produced by the PS and PDF options can be large and time-consuming to export. The JPG is the smallest.
The TEXT format will not save an image but all of the data points themselves. This allows one to dump the current plot into a file that is used in other programs for further processing.
ALERT: The exported TEXT file can be quite large and take some time to create. Using averaging, selection, etc. is recommended to keep the file size manageable. If a region is marked as described in the Interactive Tools section, only those points are exported to the text file.
The reported data is the same as when using the Locate button in plotms, with the following format (when verbose=True):
#vis: ngc5921_ut.ms
#scan: 4
#channel average: 63
#time average: None
#From plot 0
#x y chan scan field ant1 ant2 ant1name ant2name time freq spw corr obs
#Time Amp None None None None None None None MJD(seconds) GHz None None None
4304483429.999 62.7219 0 4 1 1 1 2@VLA:W1 2@VLA:W1 4304483429.999 1.413332233 0 RR 0
4304483429.999 59.0717 0 4 1 1 1 2@VLA:W1 2@VLA:W1 4304483429.999 1.413332233 0 LL 0
4304483429.999 59.0252 0 4 1 27 27 28@VLA:W7 28@VLA:W7 4304483429.999 1.413332233 0 RR 0
4304483429.999 60.8603 0 4 1 27 27 28@VLA:W7 28@VLA:W7 4304483429.999 1.413332233 0 LL 0
4304483429.999 57.4048 0 4 1 7 7 8@VLA:W8 8@VLA:W8 4304483429.999 1.413332233 0 RR 0
etc.
where x and y are the two plotted axes and the other columns contain additional information such as the baselines and frequencies, with an additional line for units. The header for the file includes the name of the visibility and other parameters such as selection, averaging, etc.
The plotms task parameter used to export plots is plotfile. Unlike the Export Plots dialog box in the GUI, plotms will issue an error if this file exists unless overwrite=True. Its subparameters include:
expformat (‘jpg’, ‘png’, ‘pdf’, ‘ps’, ‘txt’) - select the format if no extension is included in the plotfile. If there is no plotfile extension and no expformat set, the plot will be exported as a PNG.
verbose (True/False, default is True) - include metadata in text export; ignored for other formats.
exprange (‘current’, ‘all’; defaults to ‘current’) for iterated plots
highres (True/False, default is False)
dpi - dots per inch
width (in pixels)
height (in pixels)
overwrite (True/False, default is False)
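For example, to export one PNG per iteration page (a minimal sketch; the filename is illustrative):
plotms(vis='ngc5921_ut.ms', xaxis='time', yaxis='amp', iteraxis='antenna',
       plotfile='ngc5921_amp.png', exprange='all', overwrite=True)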
Summary Menu: Information About Your Dataset
Information about the MeasurementSet can be obtained from within plotms by clicking on the Summary menu in the top menu bar. If All is chosen from the pull-down menu next to Type, listobs-style output about scans, correlator configurations, and antennae will be written to the console or log. Other options for a subset of the data include Where, What, How, Main, Tables, Antenna, Feed, Field, Observation, History, Polarization, Source, Spectral Window, Spectral Window and Polarization, SysCal, and Weather.
For calibration tables, options in the Summary menu include All, Where, What, How, Main, Tables, Antenna, Field, Observation, History, and Spectral Window.
For more detail, click on the Verbose checkbox.
View Menu: Hide or Display Tool Icons
This menu controls the display of tool icons. Use the View > Toolbars menu to de-select and hide the Tools, Iteration (green arrows), or Display (Hold Drawing) icons. By default these icons are all selected and displayed at the bottom of the plotms window.
Help Menu: About plotms
This menu’s About option describes plotms and the versions of CASA, Qt, and Qwt it uses, along with links. Qt is the software framework that plotms uses for its GUI, and Qwt is a library that provides plotting functionality on top of the Qt framework.
Click About Qt for more detail about this software package and its licensing.
Plotting Multiple Data¶
Overplotting Multiple Datasets or Axes on the Same Plot
It is possible to overplot two datasets on the same plot, or the same dataset with different y-axes in each plot. To do this, set up the first plot as usual. Then press the Add Plot button at the bottom left of the plotms window. This will bring up an additional data input panel in the Plot > Data tab, where you can specify the plot parameters as you did for the first plot, whose panel is automatically minimized. Use the slider to scroll vertically through the panels. Use the right-click options or the Minimize, Maximize, or Close buttons to keep a better overview of the individual datasets.
When overplotting, you may want to set different custom colors for each dataset in its Display tab. If you are plotting different axes, or the axis ranges cover significantly different values, you may want to set different axis locations for each plot in the Axes tab. When you are done, click the Plot button to see the overplot.
Use the Close button in each data panel to close the panel and remove that plot.
In the plotms task interface, you can overplot by invoking plotms more than once with clearplots=False. Each plotms command corresponds to a plot to go on top of previous ones, and each must have its own plotindex (0-based, default is 0). Otherwise, with the same plotindex, the second plot will overwrite the first. In the following example, we are plotting Scan vs. Time for MeasurementSet test1.ms with plotindex 0, and Field vs. Time for MeasurementSet test2.ms on the same plot with plotindex 1. The test2 data is a different color and its yaxis is on the right.
Note that since the plotindex is an index into subplots, this parameter must be assigned in consecutive order. If a plotindex is skipped, plotms will adjust the index number and inform the user of the corrected plotindex value.
plotms(vis='test1.ms', yaxis='scan')
plotms(vis='test2.ms', yaxis='field', plotindex=1, clearplots=False, customsymbol=True, symbolcolor='00FF00', yaxislocation='right')
Plotting Multiple Datasets or Axes on a Grid
plotms allows you to plot more than one dataset or axis on the same page by specifying a grid size and then a grid location for each plot, as described below. Here is an example of two plots with different datasets:
Plotting multiple data sets on a 2x1 grid. Here, the MS is plotted in grid location (1,1). Then the Add Plot button was used to select its bandpass calibration table and plot it in grid location (2,1).
The process is similar to the one above, except that you specify the grid and each plot’s location:
Set up your first plot as described above.
Use the Options tab to set up a grid by incrementing the number of rows and/or columns. By default the plot you set up in step 1 will be in row 0, column 0.
Use the Add Plot button to set up the second plot’s parameters. Pay particular attention to the new dataset’s Page tab, where you can set the Grid Location (row and column number) of the new plot. This section appears only when a grid is set up.
Unlike iteration, you cannot share axes among the plots.
Add as many plots as you desire to fill your grid, then click Plot.
Several plotms task parameters are used to create a grid and specify a plot location.
gridcols and gridrows define the number of plots on the screen.
colindex and rowindex (0-based) set the location of an individual plot
plotindex (0-based) must be incremented by 1 for each plotms call
clearplots is set to False to keep previous plots
Here is an example of multiple plotms calls to set up two plots on a grid and export the plot page; note the defaults on the first call are rowindex=0, colindex=0, plotindex=0 so just set up the grid. On each subsequent plotms call set clearplots=False and increment the plotindex by 1. To save the gridded plot, set a plotfile on the final plot.
plotms(vis='test1.ms', yaxis='field', gridrows=2, gridcols=1)
plotms(vis='test2.ms', yaxis='field', gridrows=2, gridcols=1, rowindex=1, colindex=0, plotindex=1, clearplots=False, plotfile='fields.jpg')
Interactive Flagging¶
Interactive flagging, on the principle of “see it — flag it”, is possible on the X-Y display of the data plotted by plotms. Mark one or more regions as described below, then flag or unflag the data that falls in these regions of the display.
Do not attempt to flag data while another task is accessing the same data set.
A note about Table locks
For plotting, plotms opens the MeasurementSet read-only, so there should be no problem if another task accesses the same dataset, unless the other task locks the file. When this happens, you can wait for the lock to be released, cancel cache-loading in the plotms dialog box, type go clearstat at the prompt, or exit plotms. Do not attempt to flag data in plotms while another task is accessing the same data set, as in this case plotms must open the MeasurementSet with a file lock for writing.
Mark data for flagging
Using the row of icons at the bottom of the plotms window, click on the Mark Regions button, then mark a region by left-clicking and dragging the mouse. Each click and drag will mark an additional region. You can remove all of your marked regions by clicking on the Clear Regions button. Once regions are marked, you can then click on one of the other buttons to take action:
Flag — flag all of the points in the region(s),
Unflag — unflag flagged points in the region(s),
Locate — list information (X and Y value, scan, field, baseline, frequency, etc.) about the points in the region(s) to the command line or log (Warning: this could be a long list!).
WARNING: If you Flag a region, using Unflag will not return the data to its previous state but will unflag all of the data in the marked region.
The following figure shows an example of marking regions and then clicking the Flag button. Whenever you click on a button, that action occurs without requiring an explicit disk-write. If you quit plotms and re-enter, you will see your previous edits because your flag changes were written to the MeasurementSet on disk.
Plot of amplitude versus time, before (top) and after (bottom) flagging two marked regions. Note that flagged data is not displayed so these regions are not plotted after flagging. To unflag these regions, mark the two same regions and click the Unflag button.
A new interactive flagging mode is available in CASA 5.5 and later. The Flag All button, located next to the Unflag button, toggles the “Flag all/Unflag all” mode. Instead of operating on a selected region, this mode allows you to flag or unflag all of the data associated with a grid panel. The usage of this mode is as follows:
- Press the Flag All button – enter the “Flag all/Unflag all” mode. The background color of completely flagged grids will become yellow.
- Select grids to flag/unflag – while the mode is active, click each grid to select it for flagging or unflagging. Unflag is selected for grids where all data are already flagged; otherwise flag is selected. The background color of grids selected for flagging will change to yellow, while grids selected for unflagging will change to the default color.
- Press the Flag All button again – leave the “Flag all/Unflag all” mode. At this moment, the flag/unflag operations are applied to the data of the currently displayed grids selected in the previous step, and each grid is updated accordingly.
WARNING: On macOS, the “Flag all/Unflag all” mode does not work as expected!
On macOS, the background color of the grids does not yet change properly, although the flag/unflag operations work fine. Using this mode on macOS is not recommended.
WARNING: You cannot “undo” flagging to a previous state!
plotms does not automatically create flag backups in the <msname>.flagversions file. It is thus recommended to save the initial flags with the flagmanager task before starting plotms interactive flagging. Important intermediate flagging stages may also be saved during plotms flagging in the same fashion. Flagging can also be performed using the interactive msview task or scripted with the flagdata or flagcmd tasks.
WARNING: Use of flag extensions may lead to deletion of much more data than desired. Be careful!
Flags can also be extended with options in the Flag tab; see that section for a more detailed description of these options. Flag extension enables the user to plot a subset of the data and extend the flagging to a wider set. In this release, the only functional extensions are correlation and channel.
WARNING: Interactive flagging does not work in combination with the Iteration buttons!
The flag/unflag operations are applied to the currently displayed grids only, even though you can move to other iterations while in the “Flag all/Unflag all” mode.
Scripting With No GUI¶
When scripting to produce exported plotfiles, set the plotms parameter showgui=False to suppress the GUI and pop-up dialog boxes which require a user response. The default is showgui=True.
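For example, a script-friendly call that writes a plotfile without opening the GUI (a minimal sketch):
plotms(vis='ngc5921_ut.ms', xaxis='time', yaxis='amp',
       plotfile='amp_vs_time.png', overwrite=True, showgui=False)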
Exiting plotms¶
To exit the plotms GUI, select Quit from the File menu at the top of the plotms window, or click the “X” on the frame.
Alternatively, plotms will keep running in the background and update with each subsequent plotms call. If the data file changes in the background while you are using other tasks, you can force reloading the data via the Reload checkbox next to the Plot button, or press SHIFT while clicking on Plot for the same purpose.
If started in a casa session, plotms will automatically quit when the session is ended.
Flag using flagdata¶
Flagging MeasurementSets and calibration tables
flagdata is the main flagging task in CASA. flagdata can flag MeasurementSets and calibration tables with an elaborate selection syntax. It also contains auto-flagging routines.
Other tasks related to flagging are flagmanager to save the existing flags before adding more, and plotms and msview for interactive flagging. In addition, optionally, data for which calibration solutions have failed will be flagged when these solutions are applied.
The inputs to flagdata are:
#In CASA
CASA<1>: inp flagdata
#flagdata :: All-purpose flagging task based on data-selections and flagging modes/algorithms.
vis = '' #Name of MS file or calibration table to flag
mode = 'manual' #Flagging mode
field = '' #Field names or field index numbers: '' ==> all, field='0~2,3C286'
spw = '' #Spectral-window/frequency/channel: '' ==> all, spw='0:17~19'
antenna = '' #Antenna/baselines: '' ==> all, antenna ='3,VA04'
timerange = '' #Time range: '' ==> all,timerange='09:14:0~09:54:0'
correlation = '' #Correlation: '' ==> all, correlation='XX,YY'
scan = '' #Scan numbers: '' ==> all
intent = '' #Observation intent: '' ==> all, intent='CAL*POINT*'
array = '' #(Sub)array numbers: '' ==> all
uvrange = '' #UV range: '' ==> all; uvrange ='0~100klambda', default units=meters
observation = '' #Observation ID: '' ==> all
feed = '' #Multi-feed numbers: Not yet implemented
autocorr = False #Flag auto-correlations
action = 'apply' #Action to perform in MS and/or in inpfile (none/apply/calculate)
display = '' #Display data and/or end-of-MS reports at runtime (data/report/both).
flagbackup = True #Back up the state of flags before the run
savepars = False #Save the current parameters to the FLAG_CMD table or to a file
vis can take a MeasurementSet or calibration table. Data selection for calibration tables is limited to field, scan, time, antenna, spw, and observation. See section at end about which parameters are applicable for calibration tables. Since calibration tables do not have a FLAG_CMD table, parameter settings, if requested, can only be saved in external files.
The mode parameter
The mode parameter selects the flagging algorithm and the following are available:
list = list of flagging commands to apply to MS
manual = flagging based on specific selection parameters
clip = clip data according to values
quack = remove/keep specific time range at scan beginning/end
shadow = remove antenna-shadowed data
elevation = remove data below/above given elevations
tfcrop = automatic identification of outliers on the time-freq plane
rflag = automatic detection of outliers based on sliding-window RMS filters
antint = flag integrations if all baselines to a specified antenna are flagged
extend = extend and/or grow flags beyond what the basic algorithms detect
summary = report the amount of flagged data
unflag = unflag the specified data
These are described in detail below.
Flagging will only be applied to the data selection that is made with the usual selection parameters. The dataset is iterated through in chunks (small pieces of data) consisting of one field, one spw, and a user-defined timerange (default is one scan). In addition to the typical antenna, spw, timerange, etc. selections, we would like to point out an addition to the correlation syntax for modes clip, tfcrop, and rflag. One can combine correlation products with simple mathematical expressions
'ABS', 'ARG', 'RE', 'IM', 'NORM'
where ABS is the absolute value, RE the real part, and IM the imaginary part; NORM and ARG refer to the amplitude and phase. This operator is followed by the polarization products, separated by an underscore (“_”):
'ALL', 'I', 'XX', 'YY', 'RR', 'LL', 'WVR'
‘WVR’ refers to the water vapour radiometer of ALMA data. Note that the operators ABS, ARG, RE, etc. are written only once, as the first value; if more than one correlation is given, the operator will be applied to all of them. An example would be
correlation = 'RE_XX, XY'
which would select the real part of the XX and XY polarization products for flagging.
The action parameter
The parameter action controls whether the actual flagging commands will be applied or not, and the options are the empty string ‘’, ‘apply’, and ‘calculate’.
apply is likely the most popular one as it applies the flags to the MS:
#In CASA:
action = 'apply' #Action to perform in MS and/or in inpfile
#(none/apply/calculate)
display = '' #Display data and/or end-of-MS reports at runtime
#(data/report/both).
flagbackup = True #Back up the state of flags before the run
flagbackup specifies whether a backup of the current flags should be saved in the “*.flagversions” file. display can be ‘’, ‘data’, ‘report’, or ‘both’. The empty string ‘’ reports no individual flagging statistics, whereas ‘data’ launches an interactive GUI to display data and flags for each chunk to browse through. The plots are time-frequency planes, and both old and new flags are overlaid for all correlations per baseline. In the GUI, one can step through all chunks for inspection, and if the flagging is unsatisfactory, one can exit without applying the flags. If the flagging is acceptable, it is also possible to continue flagging without viewing all chunks (the number of chunks can be very large for typical JVLA and ALMA data sets). display=’report’ lists the flagging statistics at the end of the procedure on the screen, and ‘both’ launches the GUI and reports all statistics at the end.
action=’calculate’ calculates the flags but does not write them to the MS or calibration table. This is useful if one would like to inspect the computed flags in the GUI without applying them right away:
action = 'calculate' #Action to perform in MS and/or in inpfile (none/apply/calculate)
display = '' #Display data and/or end-of-MS reports at runtime (data/report/both).
The empty string action=’’ will do nothing and is useful when the flagging commands are only to be written to the FLAG_CMD sub-table or to an external file, using the savepars parameter to specify the filename.
savepars will save the flagging commands to a file that can be later used for input in flagdata via mode=’list’. It also shares the flagcmd syntax and can be used there. The file name is specified by outfile and, if empty, the FLAG_CMD table in the MS will be populated. A REASON can be given by the reason parameter which may be useful for bookkeeping as well as for unflagging data that are marked by specific REASON parameters. The overwrite parameter will control overwriting an existing file when saving the flag commands.
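For example, to write a manual flagging command to a file without applying it (a minimal sketch; the filename and REASON string are illustrative):
flagdata(vis='ngc5921_ut.ms', mode='manual', antenna='ea01', action='',
         savepars=True, outfile='flagcmds.txt', reason='BAD_ANTENNA')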
Flagging Modes
Manual Flag/Unflag
mode = 'manual' #Flagging mode (list/manual/clip/shadow/quack/el
#evation/tfcrop/rflag/extend/unflag/summary)
field = '' #Field names or field index numbers: '' ==> all,
#field='0~2,3C286'
spw = '' #Spectral-window/frequency/channel: '' ==> all,
#spw='0:17~19'
antenna = '' #Antenna/baselines: '' ==> all, antenna
#='3,VA04'
timerange = '' #Time range: '' ==>
#all,timerange='09:14:0~09:54:0'
correlation = '' #Correlation: '' ==> all, correlation='XX,YY'
scan = '' #Scan numbers: '' ==> all
intent = '' #Observation intent: '' ==> all,
#intent='CAL*POINT*'
array = '' #(Sub)array numbers: '' ==> all
uvrange = '' #UV range: '' ==> all; uvrange ='0~100klambda',
#default units=meters
observation = '' #Observation ID: '' ==> all
feed = '' #Multi-feed numbers: Not yet implemented
autocorr = False #Flag auto-correlations
The ‘manual’ mode is the most straightforward of all modes. All visibilities selected by the various data selection parameters will be flagged or unflagged, depending on the action parameter. autocorr is a shorthand for antenna=’*&&&’ to flag all auto-correlations in the data.
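For example, to flag one antenna over a given time range (a minimal sketch; the selections are illustrative):
flagdata(vis='ngc5921_ut.ms', mode='manual', antenna='ea01',
         timerange='09:14:00~09:54:00', flagbackup=True)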
List
mode = 'list' #Flagging mode (list/manual/clip/shadow/quack/el
#evation/tfcrop/rflag/extend/unflag/summary)
inpfile = '' #Input ASCII file, list of
#files or Python list of strings with
#flag commands.
reason = 'any' #Select by REASON types
A list of flag commands can be provided through a file or a list of files, specified by the inpfile parameter. Each input line may contain a flagging mode with data selection parameters as well as parameters that are specific to that mode. All parameters that are not set will be reset to their default values (default mode is ‘manual’). Each line of this file or list of strings will be taken as a command to the flagdata task. This mode=’list’ is similar to the task flagcmd with the inpmode=’list’ option.
An example for such a file would be:
mode='shadow'
mode='clip' clipminmax=[0,5] correlation='ABS_ALL'
mode='quack' quackmode='end' quackinterval=1.0
antenna='ea01' timerange='00:00:00~01:00:00'
antenna='ea11' timerange='00:00:00~03:00:00' spw='0~4'
Alternatively, this can be issued in the task directly like:
#In CASA:
CASA<1>: flagdata(vis='vis',mode='list',
inpfile=["mode='shadow'",
"mode='clip' clipminmax=[0,5] correlation='ABS_ALL'",
"mode='quack' quackmode='end' quackinterval=1.0"'
"antenna='ea01' timerange='00:00:00~01:00:00'",
"antenna='ea11' timerange='00:00:00~03:00:00' spw='0~4'"])
or via a variable
#In CASA:
CASA<1>:cmds=["mode='shadow',
"mode='clip' clipminmax=[0,5] correlation='ABS_ALL'",
"mode='quack' quackmode='end' quackinterval=1.0",
"antenna='ea01' timerange='00:00:00~01:00:00'",
"antenna='ea11' timerange='00:00:00~03:00:00' spw='0~4'"]
CASA<2>: flagdata(vis='vis',mode='list', inpfile=cmds)
The syntax needs to be written with quotes, e.g. mode=’manual’ antenna=’ea10’. There should be no space within a key=value pair, and pairs of parameters are separated by spaces, not commas.
Clip
mode = 'clip' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary)
...
datacolumn = 'DATA' #Data column on which to operate
#(data,corrected,model,residual)
clipminmax = [] #Range to use for clipping
clipoutside = True #Clip outside the range, or within it
channelavg = False #Average over channels (scalar average)
timeavg = False #Average over time ranges
timebin = '' #Bin width for time averaging.
clipzeros = False #Clip zero-value data
In addition to the regular selection parameters, mode=’clip’ also has an option to select between a number of scratch columns in datacolumn. This includes the usual DATA, CORRECTED, etc., and also clipping based on data weights WEIGHT, WEIGHT_SPECTRUM as well as other MS columns. clipminmax selects the range of values to be clipped – usually this is combined with clipoutside=True to clip everything but the values covered in clipminmax. The data can also be averaged over the selected spw channel ranges by setting channelavg=True, or over time by setting timeavg=True and timebin. clip will also flag ‘NaN’, ‘inf’, and ‘-inf’ values by default, and can flag exact zero values (these are sometimes produced by the JVLA correlator) using the clipzeros parameter.
Note: For modes clip, tfcrop, and rflag, channel ranges can be excluded from flagging by selecting ranges such as spw=’0:0~5;10~63’. This is a way to protect known spectral lines from being flagged by the autoflag algorithms.
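For example, to clip amplitudes outside the range [0, 5] and also flag exact zeros (a minimal sketch mirroring the list-mode example above):
flagdata(vis='ngc5921_ut.ms', mode='clip', clipminmax=[0, 5], clipoutside=True,
         correlation='ABS_ALL', clipzeros=True)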
Shadow
mode = 'shadow' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary)
...
tolerance = 0.0 #Amount of shadow allowed (in meters)
addantenna = '' #File name or dictionary with additional antenna names,
#positions and diameters
This option flags shadowed antennas, i.e. when one antenna blocks part of the aperture of a second antenna that is behind the first one. Shadowing can be gradual and the criterion for a shadow flag is when a baseline is shorter than radius1 + radius2 − tolerance (where the radii of the antennae are taken from the MS antenna subtable); see the figure below. addantenna may be used to account for shadowing when antennas are not listed in the MS but are physically present. Please read the flagdata inline help for the syntax of this option.
This figure shows the geometry used to compute shadowed antennas.
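For example, a minimal shadow run might look as follows (the MS name is a placeholder; tolerance=0.0 applies the strict geometric criterion):
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='shadow', tolerance=0.0, flagbackup=True)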
Quack
mode = 'quack' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary)
...
quackinterval = 0.0 #Quack n seconds from scan beginning or end
quackmode = 'beg' #flag an interval at the beginning of scan
'endb' #flag an interval at the end of scan
'tail' #flag all but an interval at the beginning of scan
'end' #flag all but an interval at the end of scan
quackincrement = False #Flag incrementally in time?
quack is used to remove data at scan boundaries. quackinterval specifies the time in seconds to be flagged, and quackmode can be 'beg' to flag the quackinterval at the beginning of each selected scan, or 'endb' to flag it at the end of each scan. 'tail' flags all but the beginning of each scan, and 'end' all but the end. quackincrement is either True or False, depending on whether one wishes to flag the quackinterval starting from the first unflagged datum in the scan, or from the scan boundaries regardless of whether data are already flagged.
Because quackincrement=True needs to know the state of the previous flags in order to know which is the first unflagged data, it cannot be used in ‘list’ mode, unless it is the first command of the list.
Visual representation of quack mode when flagging a scan
with 1s duration. The following diagram shows what is flagged
for each quack mode when quackinterval is set to 0.25s.
The flagged part is represented by crosses (+++++++++)
scan with 1s duration
--------------------------------------------
beg
+++++++++++---------------------------------
endb
---------------------------------+++++++++++
tail
-----------+++++++++++++++++++++++++++++++++
end
+++++++++++++++++++++++++++++++++-----------
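For instance, to flag the first 3 seconds of every selected scan (a minimal sketch; the MS name and interval are placeholders to be matched to your setup time):
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='quack', quackmode='beg',
                  quackinterval=3.0, quackincrement=False)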
Elevation
mode = 'elevation' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary)
...
lowerlimit = 0.0 #Lower limiting elevation (in degrees)
upperlimit = 90.0 #Upper limiting elevation (in degrees)
Flagging based on the elevation of the antennae. This may be useful to avoid data taken at very low elevations or close to transit; the lowerlimit and upperlimit parameters specify the range of good elevations.
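For example, to flag everything outside the 15 to 85 degree elevation range (the MS name and limits are illustrative placeholders):
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='elevation', lowerlimit=15.0, upperlimit=85.0)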
Tfcrop
mode = 'tfcrop' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary
#)
...
ntime = 'scan' #Time-range to use for each chunk (in seconds
#or minutes)
combinescans = False #Accumulate data across scans.
datacolumn = 'DATA' #Data column on which to operate
#(data,corrected,model,residual)
timecutoff = 4.0 #Flagging thresholds in units of deviation
#from the fit
freqcutoff = 3.0 #Flagging thresholds in units of deviation
#from the fit
timefit = 'line' #Fitting function for the time direction
#(poly/line)
freqfit = 'poly' #Fitting function for the frequency direction
#(poly/line)
maxnpieces = 7 #Number of pieces in the polynomial-fits (for
#'freqfit' or 'timefit' = 'poly')
flagdimension = 'freqtime' #Dimensions along which to calculate fits
#(freq/time/freqtime/timefreq)
usewindowstats = 'none' #Calculate additional flags using sliding
#window statistics (none,sum,std,both)
halfwin = 1 #Half-width of sliding window to use with
#'usewindowstats' (1,2,3).
extendflags = True #Extend flags along time,
#frequency and correlation.
channelavg = False #Pre-average data across channels before
#analyzing visibilities for flagging.
chanbin = False #Bin width for channel average in
#number of input channels.
timeavg = False #Pre-average data across time before
#analyzing visibilities for flagging
timebin = False #Bin width for time average in seconds
TFCrop is an autoflag algorithm that detects outliers on the 2D time-frequency plane, and can operate on uncalibrated data (not bandpass-corrected). The original implementation of this algorithm is described in NCRA Technical Report 202 (Oct 2003).
The algorithm iterates through the data in chunks of time. For each chunk, the result of user-specified visibility-expressions are organized as 2D time-frequency planes, one for each baseline and correlation-expression result, and the following steps are performed.
As of CASA 4.6 the data can also be pre-averaged over the selected spw channel ranges by setting channelavg=True and chanbin to the desired bin (in number of channels), or time-averaged over the selected time ranges by setting timeavg=True and timebin to the desired time range (in seconds). This averaging is independent of the tfcrop time/channel average, and allows one to specify custom time/channel averaging bins instead of averaging all data across the time and/or channel direction.
1. Calculate a bandshape template: average the data across time to construct an average bandpass, and construct an estimate of a clean bandpass (without RFI) via a robust piece-wise polynomial fit to the average bandpass shape.
Note: A robust fit is computed in up to 5 iterations. It begins with a straight-line fit across the full range, and gradually increases to 'maxnpieces' pieces with third-order polynomials in each piece. At each iteration, the stddev between the data and the fit is computed, values beyond N stddev are flagged, and the fit and stddev are re-calculated with the remaining points. This stddev calculation is adaptive, and converges to a value that reflects only the data and no RFI. At each iteration, the same relative threshold is applied to detect flags, and this results in a varying set of flagging thresholds that allows deep flagging only when the fit represents the true data best. Iterations stop when the stddev changes by less than 10%, or when 5 iterations are completed. The resulting clean bandpass is a fit across the base of RFI spikes.
2. Divide out this clean bandpass function from all timesteps in the current chunk. Now, any data points that deviate from a mean of 1 can be considered RFI. This step helps to separate narrow-band RFI spikes from a smooth but varying bandpass, in situations where a simple range-based clipping would flag good sections of the bandpass.
3. Perform iterative (robust) flagging of points deviating from a value of 1. Flagging is done in up to 5 iterations. In each iteration, for every timestep, calculate the stddev of the bandpass-flattened data, flag all points further than N times the stddev from the fit, and recalculate the stddev. At each iteration, the same relative threshold is applied to detect flags. Optionally, use sliding-window based statistics to calculate additional flags.
4. Repeat steps 1 and 3, but in the other direction (i.e. average the data across frequency, calculate a piece-wise polynomial fit to the average time series, and find flags based on deviations w.r.t. this fit).
The default parameters of the tfcrop implementation are optimized for strong narrow-band RFI (see the example figure below). With broad-band RFI, the piece-wise polynomial can sometimes model it as part of the band-shape, and therefore not detect it as RFI. In this case, reducing the maximum number of pieces in the polynomial can help. This algorithm usually has trouble with noisy RFI that is also extended in time or frequency, and additional statistics-based flagging is recommended (via the 'usewindowstats' parameter). It is often necessary to set up parameters separately for each spectral window.
If the frequency ranges of astronomical spectral lines are known a priori, they can be protected from automatic flagging by de-selecting those frequency ranges via the 'spw' data-selection parameter.
The extendflags parameter will clean up small portions of data between flagged data points along time and/or frequency when more than 50% of all timeranges or 80% of all channels are already flagged. It will also extend the flags to the other polarizations. Alternatively, mode='extend' can be used (see the section below).
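A minimal tfcrop sketch (the MS name, spw selection and thresholds are placeholders; the defaults are usually a reasonable starting point):
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='tfcrop', spw='9', datacolumn='DATA',
                  timecutoff=4.0, freqcutoff=3.0, maxnpieces=5, extendflags=True)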
LEFT: a run where 'tfcrop' was applied to spw='9' with mainly narrow-band RFI. RIGHT: an example of protecting a spectral line (here demonstrated on an RFI spike) by setting the spw selection to spw='0:0~45;53~63'. In both figures, the top row shows the data before flagging, and the bottom row after flagging.
Rflag
mode = 'rflag' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary
#)
...
ntime = 'scan' #Time-range to use for each chunk (in seconds
#or minutes)
combinescans = False #Accumulate data across scans.
datacolumn = 'DATA' #Data column on which to operate
#(data,corrected,model,residual)
winsize = 3 #Number of timesteps in the sliding time
#window [aips:fparm(1)]
timedev = '' #Time-series noise estimate [aips:noise]
freqdev = '' #Spectral noise estimate [aips:scutoff]
timedevscale = 5.0 #Threshold scaling for timedev [aips:fparm(9)]
freqdevscale = 5.0 #Threshold scaling for freqdev
#[aips:fparm(10)]
spectralmax = 1000000.0 #Flag whole spectrum if freqdev is greater
#than spectralmax [aips:fparm(6)]
spectralmin = 0.0 #Flag whole spectrum if freqdev is less than
#spectralmin [aips:fparm(5)]
extendflags = True #Extend flags along time, frequency and correlation.
channelavg = False #Pre-average data across channels before
#analyzing visibilities for flagging.
chanbin = False #Bin width for channel average in
#number of input channels.
timeavg = False #Pre-average data across time before
#analyzing visibilities for flagging
timebin = False #Bin width for time average in seconds
RFlag is an autoflag algorithm based on a sliding-window statistical filter. The RFlag algorithm was originally developed by Eric Greisen in AIPS (31DEC11). AIPS documentation: Subsection E.5 of the AIPS Cookbook (Appendix E: Special Considerations for JVLA data calibration and imaging in AIPS, http://www.aips.nrao.edu/cook.html#CEE).
In RFlag, the data are iterated through in chunks of time; statistics are accumulated across time chunks, thresholds are calculated at the end, and the flags are applied during a second pass through the dataset.
The CASA implementation also optionally allows a single-pass operation where statistics and thresholds are computed and also used for flagging, within each time-chunk (defined by ‘ntime’ and ‘combinescans’).
For each chunk, calculate local statistics, and apply flags based on user supplied (or auto-calculated) thresholds.
As of CASA 4.6 the data can also be pre-averaged over the selected spw channel ranges by setting channelavg=True and chanbin to the desired bin (in number of channels), or time-averaged over the selected time ranges by setting timeavg=True and timebin to the desired time range (in seconds). This averaging is independent of the rflag time/channel sliding window, as it performs not only an average but also a binning operation (so there is no data overlap between adjacent bins), and it allows one to specify independent time/channel bins.
Time analysis (for each channel)
Calculate local rms of real and imag visibilities, within a sliding time window
Calculate the median rms across time windows, deviations of local rms from this median, and the median deviation
Flag if local rms is larger than timedevscale x (medianRMS + medianDev)
Spectral analysis (for each time)
Calculate avg of real and imag visibilities and their rms across channels
Calculate the deviation of each channel from this avg, and the median-deviation
Flag if deviation is larger than freqdevscale x medianDev
The extendflags parameter will clean up small portions of data between flagged data points along time and/or frequency when more than 50% of all timeranges or 80% of all channels are already flagged. It will also extend the flags to the other polarizations. Alternatively, mode=’extend’ can be used.
Some examples (see figure below):
Calculate thresholds automatically per scan, and use them to find flags. Specify the scale factor for the time-analysis thresholds; use the default for frequency.
#In CASA:
CASA<1>: flagdata('my.ms', mode='rflag',spw='9',timedevscale=4.0)
Supply noise-estimates to be used with default scale-factors.
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='rflag', spw='9', timedev=0.1, freqdev=0.5);
Two passes, replicating the usage pattern in AIPS.
The first pass saves the commands in output text files, with auto-calculated thresholds. Thresholds are returned from rflag only when action='calculate' (calc-only mode). The user can edit these files before doing the second pass, but the Python-dictionary structure must be preserved.
The second pass applies these commands (action=’apply’).
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='rflag', spw='9,10', timedev='tdevfile.txt', freqdev='fdevfile.txt', action='calculate');
CASA<2>: flagdata(vis='my.ms', mode='rflag', spw='9,10', timedev='tdevfile.txt', freqdev='fdevfile.txt', action='apply');
Example of rflag on narrow-band RFI
Extend
mode = 'extend' #Flagging mode (list/manual/clip/shadow/quack/el
#evation/tfcrop/rflag/extend/unflag/summary)
field = '' #Field names or field index numbers: '' ==> all,
#field='0~2,3C286'
spw = '' #Spectral-window/frequency/channel: '' ==> all,
#spw='0:17~19'
antenna = '' #Antenna/baselines: '' ==> all, antenna
#='3,VA04'
timerange = '' #Time range: '' ==>
#all,timerange='09:14:0~09:54:0'
correlation = '' #Correlation: '' ==> all, correlation='XX,YY'
scan = '' #Scan numbers: '' ==> all
intent = '' #Observation intent: '' ==> all,
#intent='CAL*POINT*'
array = '' #(Sub)array numbers: '' ==> all
uvrange = '' #UV range: '' ==> all; uvrange ='0~100klambda',
#default units=meters
observation = '' #Observation ID: '' ==> all
feed = '' #Multi-feed numbers: Not yet implemented
ntime = 'scan' #Time-range to use for each chunk (in seconds or
#minutes)
combinescans = False #Accumulate data across scans.
extendpols = True #If any correlation is flagged, flag all
#correlations
growtime = 50.0 #Flag all 'ntime' integrations if more than X%
#of the timerange is flagged (0-100)
growfreq = 50.0 #Flag all selected channels if more than X% of
#the frequency range is flagged(0-100)
growaround = False #Flag data based on surrounding flags
flagneartime = False #Flag one timestep before and after a flagged
#one (True/False)
flagnearfreq = False #Flag one channel before and after a flagged one
#(True/False)
Although the modes tfcrop and rflag already have extendflags parameters, the autoflagging algorithms may still leave behind small islands of unflagged data that are surrounded by flagged visibilities in time-frequency space. Even if the algorithm deems these visibilities good, they are frequently affected by low-level RFI that spills over from the adjacent flagged points, and one may wish to clean them up.
ntime specifies the time ranges over which to clean up, e.g. ‘1.5min’ or ‘scan’ which checks on all data within a scan. To span time ranges larger than scans, one can set combinescans to True.
extendpols=True would extend all flags to all polarization products when at least one of them is flagged.
growtime flags the entire time range for a flagged channel, when a certain fraction of flagged time intervals is exceeded.
growfreq is similar but extends the flags in frequency when a given fraction of channels is already flagged.
growaround checks whether a datum in the time-frequency domain is surrounded by flagged data points; the threshold is four data points, and if more surrounding points than this are flagged, the central datum is flagged, too.
flagneartime flags the data points adjacent in time to a flagged datum (one timestep before and after).
flagnearfreq flags the neighboring channels.
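As a hedged sketch of an extend run (the MS name and the growtime/growfreq fractions are placeholders to be tuned per dataset):
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='extend', extendpols=True,
                  growtime=80.0, growfreq=60.0, growaround=True)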
This screenshot represents a run where 'tfcrop' was run only on 'ABS_RR' (top row), followed by an extension along time and correlations (bottom row).
Unflag
mode = 'unflag' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary
#)
field = '' #Field names or field index numbers: ''==>all,
#field='0~2,3C286'
spw = '' #spectral-window/frequency/channel
antenna = 'ea01' #antenna/baselines: ''==>all, antenna
#='3,VA04'
timerange = '' #time range:
#''==>all,timerange='09:14:0~09:54:0'
correlation = '' #Select data based on correlation
scan = '' #scan numbers: ''==>all
intent = '' #Select data based on observation intent:
#''==>all
feed = '' #multi-feed numbers: Not yet implemented
array = '' #(sub)array numbers: ''==>all
uvrange = '' #uv range: ''==>all; uvrange ='0~100klambda',
#default units=meters
observation = '' #Select data based on observation ID: ''==>all
The selected data will be unflagged.
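For example, to remove all flags for antenna ea01 within a given time range (the selection values are placeholders):
#In CASA:
CASA<1>: flagdata(vis='my.ms', mode='unflag', antenna='ea01',
                  timerange='00:00:00~01:00:00')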
Summary
mode = 'summary' #Flagging mode (list/manual/clip/shadow/quack/
#elevation/tfcrop/rflag/extend/unflag/summary
#)
...
minrel = 0.0 #minimum number of flags (relative)
maxrel = 1.0 #maximum number of flags (relative)
minabs = 0 #minimum number of flags (absolute)
maxabs = -1 #maximum number of flags (absolute). Use a
#negative value to indicate infinity.
spwchan = False #Print summary of channels per spw
spwcorr = False #Print summary of correlation per spw
basecnt = False #Print summary counts per baseline
This mode reports the number of rows and data points that are flagged. The selection of reported points can be restricted (see inline help for details).
mode=’summary’ can also report back a dictionary if the task is run as
#In CASA:
CASA<1>: s = flagdata(..., mode='summary')
with a variable assigned, here ‘s’.
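For example, a minimal sketch of inspecting the returned counts (the 'flagged' and 'total' keys follow the typical summary output; the MS name is a placeholder):
#In CASA:
CASA<1>: s = flagdata(vis='my.ms', mode='summary')
CASA<2>: print('Fraction flagged: %.2f%%' % (100.0 * s['flagged'] / s['total']))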
For advanced information about some parts of flagdata, please visit this historic memo on Flagging in CASA:
http://www.aoc.nrao.edu/~rurvashi/FlaggerDocs/FlaggerDocs.html
Flag using flagcmd¶
Flag the visibility data set or calibration table based on a specified list of flagging commands
Info: We recommend task flagdata as the preferred and safer way to flag data based on visibility inspection, as well as for many other capabilities. The option to import XML files with online flags in flagcmd has largely become obsolete with the deprecation of importevla, because the recommended importasdm task cannot copy the actual XML tables from the original SDM to the newly created MS (it can only apply the online flags directly, or write them into ASCII tables).
The task flagcmd will flag the visibility data set or calibration table based on a specified set of flagging commands using a flagging syntax. These commands can be input from the FLAG_CMD table within the MS, from an ascii file, or from input python strings. Input can also be from an XML table within a VLA SDM, but given that importasdm does not copy XML files (and importevla is deprecated), the Flag.xml, Antenna.xml and SpectralWindow.xml tables must first be copied manually into the top-level MS directory for use by flagcmd (not the recommended approach). Facilities for manipulation, listing, or plotting of these flags are also provided.
When doing any flagging with flagcmd it is wise to use the flagbackup=True parameter to save the current flags into a .flagversions file. See flagmanager for more details about this.
Alert: The FLAG_CMD sub-table is meant only for meta-data selections such as online flags. Using it to save other parameters (from modes such as clip, quack, shadow, etc.) is possible, but carries the risk that in future releases these parameters may be renamed or have their default values changed. There will be no automatic way to rename any parameter that changes in the future.
Alert: There is no way to guarantee that a command from the COMMAND column has or has not been applied to the MS, even if the APPLIED column is set to True. If you flag in other ways, such as interactively in plotms, the FLAG_CMD table will not be updated! Use at your own risk!
The inputs to flagcmd are:
#flagcmd :: Flagging task based on batches of flag-commands
vis = '' #Name of MS file or calibration table to flag
inpmode = 'table' #Input mode for flag commands(table/list/xml)
inpfile = '' #Source of flag commands
[tablerows = [] #Rows of inpfile to read]
reason = 'any' #Select by REASON types
useapplied = False #Select commands whose rows
#have APPLIED column set to True
action = 'apply' #Action to perform in MS and/or in inpfile
#(apply/unapply/list/plot/clear/extract)
flagbackup = True #Automatically backup the
#FLAG column before execution
savepars = False #Save flag commands to the MS or to a file
The default input mode is inpmode=’table’ which directs the task to input flag commands from the FLAG_CMD internal MS table. Other options include list and xml, explained below.
The default operation mode is action=’apply’ directing the task to apply relevant flagging commands to the MS data main table. Other options include ‘unapply’, ‘list’, ‘plot’, ‘clear’, and ‘extract’, explained below.
See the Flagging Command Syntax section below for more detail.
Alert: It is possible to flag calibration tables using flagcmd, although we recommend using the flagdata task for this in most cases.
When using flagcmd to flag calibration tables, only the apply and list actions are supported. Because calibration tables do not have a FLAG_CMD sub-table, the default inpmode=’table’ can only be used if an MS is given in the inpfile parameter so that flags from the MS are applied to the calibration table directly. Otherwise, the flag commands must be given using inpmode=’list’, either from a file or from a list of strings.
Input modes inpmode:
The inpmode parameter selects options for the input mode for the flagging commands.
Available inpmode options are:
‘table’ — input from MS table
‘list’ — input from ASCII file or from a list of strings
‘xml’ — input from XML table (largely obsolete with the deprecation of importevla in CASA 5.4)
The default input mode is inpmode=’table’ which directs the task to input flag commands from a FLAG_CMD MS table. This has the sub-parameters:
inpmode = 'table' #Input mode for flag commands(table/list/xml)
inpfile = '' #Source of flag commands
[tablerows = [] #Rows of inpfile to read]
reason = 'any' #Select by REASON types
useapplied = False #Select commands whose rows
#have APPLIED column set to
#True
If inpfile = '' then the task will look for the FLAG_CMD table in the MS given by vis. You can use this sub-parameter to tell the task to read the FLAG_CMD table from another MS and apply the commands to the MS given in vis.
The tablerows sub-parameter is a simple Python list of the row numbers of the table to consider in processing flags. The default is all rows. Currently it only takes fully-enumerated integer lists. Use the Python range function to generate ranges, e.g.
tablerows = list(range(0,30)) + list(range(50,55))
#Do not use strings such as '0~29,50~54'
The useapplied sub-parameter toggles whether only flag commands marked as not having been applied are considered (the default), or to allow (re)processing using all commands.
The reason sub-parameter selects the reason type to process. The default ‘any’ means all commands.
Info: the string is matched literally, e.g. reason='' matches only blank reasons, and reason='FOCUS_ERROR,SUBREFLECTOR_ERROR' matches this compound reason string only. To flag by either of these reasons alone, run flagcmd twice, once with reason='FOCUS_ERROR' and then with reason='SUBREFLECTOR_ERROR'.
One use case is to read the flag commands from the FLAG_CMD of an MS and apply them to a calibration table given in the parameter vis. Example:
flagcmd(vis='cal-X54.B1', inpmode='table',inpfile='uid___A002_X2a5c2f_X54.ms', action='apply')
Input mode inpmode=’list’
See flagdata help for syntax.
This mode allows one to give a list of strings with flagging commands, or the name of a file (or a list of file names) containing these commands, equivalent to mode='list' in flagdata. E.g., a file flags.txt that contains
scan='1~3' mode='manual'
mode='clip' clipminmax=[0,2] correlation='ABS_XX' clipoutside=False
spw='9' mode='tfcrop' correlation='ABS_YY' ntime=51.0
mode='extend' extendpols=True
can be used via
flagcmd(vis='some.ms',inpmode='list',inpfile='flags.txt')
A list of input files can also be given:
flagcmd(vis='some.ms',inpmode='list',inpfile=['flags.txt','userflags.txt'])
Alternatively, the individual flagging commands can be directly provided in the call itself such as:
inpfile=["scan='1~3' mode='manual'",
"mode='clip' clipminmax=[0,2] correlation='ABS_XX' clipoutside=False",
"spw='9' mode='tfcrop' correlation='ABS_YY' ntime=51.0",
"mode='extend' extendpols=True"]
Input mode inpmode=’xml’
Alert: With the deprecation of importevla, XML files can no longer be imported directly from the original SDM into the newly created MS, but only by manually copying the Flag.xml, Antenna.xml and SpectralWindow.xml tables into the top-level MS directory (not the recommended approach). Also, the XML mode is not available for cal tables, and therefore it will not work for ALMA MSs. However, task importasdm with process_flags=True will copy the flags from the XML files directly to the FLAG_CMD sub-table; see the importasdm help for options. This is the recommended way of dealing with the online flags.
The input mode inpmode=’xml’ tells the task to input flag commands from XML SDM files, which contain the online flags. When set this opens the sub-parameters:
inpmode = 'xml' #Input mode for flag commands(table/list/xml)
tbuff = 0.0 #Time buffer (sec) to pad flags
ants = '' #Allowed flag antenna names to select by
reason = 'any' #Select by REASON types
This mode will look for files called Flag.xml, Antenna.xml and optionally SpectralWindow.xml inside the MS directory specified under vis.
Info: You can only apply the flags from an XML file. It is not possible to unapply them. For that, transfer the flags to the FLAG_CMD table using action=’list’, then unapply them.
The tbuff sub-parameter adds a padding buffer (in seconds) to the beginning and end times of the online flags in the XML file. The online flag time buffer tbuff is specified in seconds, but in fact should be keyed to the intrinsic online integration time, to allow for events (like slewing) that occur within an integration period. This is particularly true for JVLA data, where a tbuff value of 0.5× to 1.5× the integration time is needed. For example, if data were taken with 1-second integrations, then at least tbuff=0.5 should be used; likewise tbuff=5 for 10-second integrations.
Info: For JVLA data you should use 1.5× (e.g. tbuff=15 for 10-second integrations) for data taken in early 2011 or before due to a timing error. We do not yet know what ALMA data will need for padding (if any).
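For example, applying online flags from the XML tables with padding (a sketch; the MS name is a placeholder and the tbuff value must be matched to your integration time):
#In CASA:
flagcmd(vis='my.ms', inpmode='xml', tbuff=1.5, action='apply', flagbackup=True)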
The ants sub-parameter selects the antennas from which online flags will be selected (default is all antennas). For example, ants=’ea01’ is a valid choice for JVLA data.
The reason sub-parameter selects by the REASON field in the Flag.xml file. The default 'any' means all commands. Note that reason='' would only select flags that have a blank REASON field entry.
Operation types action
The action parameter selects options for operating on the selected flags and possibly the data.
Available action options are:
‘apply’ — apply flag commands to data
‘unapply’ — unapply flags in data
‘list’ — list and/or save flag commands
‘plot’ — plot flag commands
‘clear’ — clear rows from FLAG_CMD table
‘extract’ — extract internal flag dictionary
Apply flags action=’apply’
The default operation mode is action=’apply’ directing the task to apply relevant flagging commands to the vis data main table.
action = 'apply' #Action to perform in MS and/or in inpfile
#(apply/unapply/list/plot/clear/extract)
flagbackup = True #Automatically backup the
#FLAG column before execution
The flagbackup parameter toggles whether a new copy of the MS FLAG column is written to the .flagversions backup directory for that MS before the requested flagging operation.
Unapply flags action=’unapply’
The unapply option allows unflagging of data based on the selected flag commands. This choice opens the sub-parameters:
action = 'unapply' #Action to perform in MS and/or in inpfile
#(apply/unapply/list/plot/clear/extract)
flagbackup = True #Automatically backup the
#FLAG column before execution
As in action=’apply’, it is possible to make a backup to the *.flagversions file by using flagbackup=True.
List and save flags action=’list’
The ‘list’ option will give a listing of the flagging commands. This choice opens the sub-parameters:
action = 'list' #Action to perform in MS and/or in inpfile
#(apply/unapply/list/plot/clear/extract)
savepars = True #Save flag commands to the MS or to a file
outfile = '' #Name of output file to save commands
overwrite = True #Overwrite an existing file to save the flag commands
This action lists the commands on the screen without applying them. One can save the flagging script to a file specified in the outfile parameter when savepars=True. If outfile is empty, it will save the commands to new rows of the FLAG_CMD table of the MS given in vis.
The format of the listing output depends on the source of the flagging commands. A set of flagging commands specified through inpmode='list' will be listed directly. Flagging commands extracted through inpmode='table' will reflect the columns in the table: 'Row', 'Timerange', 'Reason', 'Type', 'Applied', 'Lev', 'Sev', 'Command', while commands from inpmode='xml' will be shown with the SDM XML table fields: 'Key', 'FlagID', 'Antenna', 'Reason', 'Timerange'.
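A minimal example that lists the commands in FLAG_CMD and also writes them to a text file (the MS and file names are placeholders):
#In CASA:
flagcmd(vis='my.ms', inpmode='table', action='list', savepars=True,
        outfile='flagcommands.txt')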
Plot flags action=’plot’
The ‘plot’ option will produce a graphical plot of flags of time versus antenna. This choice opens the sub-parameters:
action = 'plot' #Action to perform in MS and/or in inpfile
#(apply/unapply/list/plot/clear/extract)
plotfile = '' #Name of output file to save plot
This is only useful for an MS. flagcmd is most often used to plot the VLA or ALMA flags generated online, using inpmode='table' or 'xml' and provided in a FLAG_CMD or Flag.xml table. Using these tables, only the standard on-line REASONs are recognized. These include 'FOCUS', 'SUBREFLECTOR', 'OFF SOURCE', 'NOT IN SUBARRAY' for the VLA, and 'Calibration_device_(ACD)_is_not_in_the_correct_position', 'Mount_is_off_source', 'FrontEnd_LO_Amps_not_optimized', 'Power_levels_are_being_optimized.', 'The_WCA_is_not_locked.' for ALMA.
If the plotfile sub-parameter is non-blank, then a plot file will be made with that name instead of the plot appearing in a matplotlib plotter window on the user's workstation. There are additional parameters that control the shape of the output file, such as its dimensions and resolution.
Alert: The plotted enumerations are currently only those known to be allowed for JVLA online flags as of 15 April 2011, and include:
‘FOCUS’, ‘SUBREFLECTOR’, ‘OFF SOURCE’, ‘NOT IN SUBARRAY’ with all others being plotted as ‘Other’.
Clear flags action=’clear’
This is only useful for MS, using inpmode=”table”. The ‘clear’ action will delete selected rows from the FLAG_CMD MS table. This choice opens the sub-parameters:
action = 'clear' #Action to perform in MS and/or in inpfile
#(apply/unapply/list/plot/clear/extract)
clearall = False #Delete all rows from FLAG_CMD
[rowlist = [] #FLAG_CMD rows to clear]
The rowlist sub-parameter is a simple Python list of the row numbers of the table to consider in processing flags. The default is a blank list, which indicates the desire to clear all rows. Currently it only takes fully-enumerated integer lists. Use the Python range function to generate ranges, e.g.
rowlist = list(range(0,30)) + list(range(50,55))
#Do not use strings such as '0~29,50~54'
In either case, if clearall=False then nothing will happen, as a safeguard. If clearall=True, a blank rowlist will direct the deletion of all rows from the table.
Alert: Use this option with care. You can easily mess up the FLAG_CMD table.
Extract Flag Commands action=’extract’
This will return the requested flags (depending on inpmode) as a Python dictionary.
action = 'extract' #Action to perform in MS and/or in inpfile
#(apply/unapply/list/plot/clear/extract)
The dictionary can be saved to a variable, as shown below. If no variable is assigned, only the first 1000 flags will be printed to the terminal:
myflagd = flagcmd(vis=msfile,useapplied=True,action='extract')
An example of the dictionary returned by action=’extract’ is given below:
{0: {'antenna': 'PM04&&*',
'applied': False,
'command': 'antenna=PM04&&* timerange=2013/04/28/04:35:58.217~2013/04/28/04:35:58.468 ',
'id': '662',
'interval': 0.0,
'level': 0,
'mode': '',
'reason': 'ACD_motors_are_not_in_position.',
'severity': 0,
'time': 0.0,
'timerange': '2013/04/28/04:35:58.217~2013/04/28/04:35:58.468',
'type': ''},
1: {'antenna': 'CM03&&*',
'applied': False,
'command': 'antenna=CM03&&* timerange=2013/04/28/04:35:58.196~2013/04/28/04:35:58.503 ',
'id': '663',
'interval': 0.0,
'level': 0,
'mode': '',
'reason': 'ACD_motors_are_not_in_position.',
'severity': 0,
'time': 0.0,
'timerange': '2013/04/28/04:35:58.196~2013/04/28/04:35:58.503',
'type': ''}}
Flagging command syntax
A flagging command syntax has been devised to populate the COMMAND column of the FLAG_CMD table and to direct the operation of the flagcmd task.
The syntax is the same as used in flagdata, so please check "help flagdata" for up-to-date information.
Commands are strings (which may contain internal quoted strings) consisting of KEY=VALUE pairs separated by whitespace (see the examples below).
Alert: There should be no whitespace between KEY=VALUE or within each KEY or VALUE, since the simple parser first breaks command lines on whitespace, then on “=”.
The key is the name of a parameter and the value is the value of that parameter. The parameter data types enforced in flagdata are the same used in these flag commands. As an example, the parameter clipminmax accepts only a list of double values in flagdata and should have the same type when given in a flag command list, e.g. mode=’clip’ clipminmax=[0.1,10.2]
Each key should appear only once on a given command line/string.
There is an implicit “mode” for each command, with the default being ‘manual’ if not given.
Comment lines can start with ‘#’ and will be ignored.
Data selection parameters (used by all flagging modes)
antenna=''
spw=''
correlation=''
field=''
scan=''
feed=''
array=''
timerange=''
uvrange=''
intent=''
observation=''
Info: a command consisting only of meta-data selection key-value pairs is a basic “manual” operation, i.e. flag the data meeting the selection
Modes with default values for relevant parameters (for further details, refer to the task flagdata)
Mode manual
autocorr=False
Mode clip
datacolumn='DATA' clipminmax=[] clipoutside=True channelavg=False clipzeros=False timeavg=False timebin=''
Mode shadow
tolerance=0.0 addantenna=''
Mode quack
quackinterval=0.0 quackmode='beg' quackincrement=False
Mode elevation
lowerlimit=0.0 upperlimit=90.0
Mode tfcrop
ntime='scan' combinescans=False datacolumn='DATA' timecutoff=4.0 freqcutoff=3.0 timefit='line' freqfit='poly' maxnpieces=7 flagdimension='freqtime' usewindowstats='none' halfwin=1 extendflags=True channelavg=False chanbin=1 timeavg=False timebin='0s'
Mode extend
ntime='scan' combinescans=False extendpols=True growtime=50.0 growfreq=50.0 growaround=False flagneartime=False flagnearfreq=False
Mode rflag
ntime='scan' combinescans=False datacolumn='DATA' winsize=3 timedev='' freqdev='' timedevscale=5.0 freqdevscale=5.0 spectralmax=1000000.0 spectralmin=0.0 extendflags=True channelavg=False chanbin=1 timeavg=False timebin='0s'
Mode antint
minchanfrac=0.6 verbose=False
Mode unflag
This mode does not have any sub-parameters.
Typical example of a list with flag commands
spw='0,1' antenna='ea1,ea10' autocorr=True
mode='clip' clipzeros=True datacolumn='DATA'
mode='summary' name='Initial_flags'
Basic elaboration options for online and interface use
id='' #flag ID tag (not necessary)
reason='' #reason string for flag
flagtime='' #a timestamp for when this flag was generated (for
user history use)
Info: there is no flagtime column in FLAG_CMD at this time, but we will propose to add this as an optional column. These options are currently ignored and not used.
Extended elaboration options for online and interface use. Note: these are FLAG_CMD columns whose use is not yet clear; they are included here for compatibility and future expansion.
level=N #flagging "level" for flags with same reason
severity=N #Severity code for the flag, on a scale of 0-10 in order
of increasing severity; user specific
Manage flag versions¶
Managing flag versions
The flagmanager task will allow you to manage different versions of flags in your data. These are stored inside a CASA flag versions table, under the name <msname>.flagversions. For example, for the MS jupiter6cm.usecase.ms, there will be a jupiter6cm.usecase.ms.flagversions directory on disk. This is created on import (by importasdm, importvla or importuvfits), or when flagging is first done on an MS without a .flagversions.
By default, when the .flagversions is created, this directory will contain a flags.Original table containing a copy of the original flags in the MAIN table of the MS so that you have a backup. It will also contain a file called FLAG_VERSION_LIST that has the information on the various flag versions there. The flag versions are cumulative, i.e. a specific version number contains all the flags from the lower version numbers, too.
The flags are stored in an MS column of the same spectral and correlator polarization shape as the visibility data values, with Boolean True to indicate that a particular time, channel, polarization sample has been flagged, and False for unflagged.
The inputs for flagmanager are:
vis = '' #Name of input visibility file (MS)
mode = 'list' #Flag management operation (list,save,restore,delete)
The mode=’list’ option will list the available flag versions from the <msname>.flagversions file in the logger. For example:
CASA <102>: default('flagmanager')
CASA <103>: vis = 'jupiter6cm.usecase.ms'
CASA <104>: mode = 'list'
CASA <105>: flagmanager()
MS : /home/imager-b/smyers/Oct07/jupiter6cm.usecase.ms
main : working copy in main table
Original : Original flags at import into CASA
flagautocorr : flagged autocorr
flagdata_1 : Flags autosave on 2018-07-16 08:57:20
mode=’list’ will also return a Python dictionary with the available flag versions. For example:
myflags = flagmanager('jupiter6cm.usecase.ms', mode='list')
myflags
{0: {'comment': 'Original flags at import into CASA', 'name': 'Original'},
 1: {'comment': 'flagged autocorr', 'name': 'flagautocorr'},
 2: {'comment': 'Flags autosave on 2018-07-16 08:57:20', 'name': 'flagdata_1'},
 'MS': '/home/imager-b/smyers/Oct07/jupiter6cm.usecase.ms'}
Setting the mode parameter expands the relevant options. For example, if you wish to save the current flagging state of vis=<msname>:
mode = 'save' #Flag management operation (list, save, restore, delete)
versionname = '' #Name of flag version (no spaces)
comment = '' #Short description of flag version
merge = 'replace' #Merge option (replace, and, or)
with the output version name specified by versionname. For example, the xyflags version that appears below was written using:
default('flagmanager')
vis = 'jupiter6cm.usecase.ms'
mode = 'save'
versionname = 'xyflags'
comment = 'Plotxy flags'
flagmanager()
and you can see that there is now a sub-table in the flag versions directory:
CASA <106>: ls jupiter6cm.usecase.ms.flagversions/
IPython system call: ls -F jupiter6cm.usecase.ms.flagversions/
flags.flagautocorr flags.Original flags.xyflags FLAG_VERSION_LIST
It is recommended that you use this task regularly to save versions during flagging.
Note that if a flag version already exists under a name, the task will give a warning and add a suffix ‘.old.timestamp’ to the previous version.
You can restore a previously saved set of flags using the mode=’restore’ option:
mode = 'restore' #Flag management operation (list,save,restore,delete)
versionname = '' #Name of flag version (no spaces)
merge = 'replace' #Merge option (replace, and, or)
The merge sub-parameter will control how the flags are restored. For merge=’replace’, the flags in versionname will replace those in the MAIN table of the MS. For merge=’and’, only data that is flagged in BOTH the current MAIN table and in versionname will be flagged. For merge=’or’, data flagged in EITHER the MAIN or in versionname will be flagged.
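For example, to restore the xyflags version saved above (a minimal sketch):
flagmanager(vis='jupiter6cm.usecase.ms', mode='restore', versionname='xyflags',
            merge='replace')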
The mode=’delete’ option can be used to remove versionname from the flag versions:
mode = 'delete' #Flag management operation (list,save,restore,delete)
versionname = '' #Name of flag version (no spaces)
2-D Plot/Flag: msview¶
2-dimensional visualization and editing of visibilities with the msview GUI.
Viewing MeasurementSets
Visibility data can also be displayed and flagged directly from the task msview. Task msview allows the user to select data before it is loaded into the GUI and displayed. A screenshot is shown in Data Selection in msview. Selection parameters are field, spectral window, time range, uv range, antenna, corr, scan, array, and an ms selection expression, in the usual CASA selection syntax (see Data Selection).
For MeasurementSet files the only option for display is ‘Raster’ (similar to AIPS task TVFLG). An example of MS display is shown in Data Display Options; loading of an MS is shown in Loading a MeasurementSet.
Warning: Only one MS should be registered at a time on a Display Panel. Only one MS can be shown in any case. You do not have to close other images/MSs, but you should at least ‘unregister’ them from the Display Panel used for viewing the MS. If you wish to see other images or MSs at the same time, create multiple Display Panel windows.
Data Display Options Panel for MeasurementSets
The Data Display Options panel provides adjustments for MSs similar to those for images, and also includes flagging options. As with images, this window appears when you choose the Data: Adjust menu or use the wrench icon from the Main Toolbar. It is also shown by default when an MS is loaded. The right panel of Data Display Options shows a Data Options window. It has a tab for each open MS, containing a set of categories. The options within each category can be either 'rolled up' or expanded by clicking the category label.
For a MeasurementSet, the categories are:
Advanced
MS and Visibility Selection
Display Axes
Flagging Options
Basic Settings
Axis Drawing and Labels
Color Wedge
MS Options — Basic Settings
The Basic Settings roll-up is expanded by default. It contains entries similar to those for a raster image (see Data Display Options - Basic Settings). Together with the brightness/contrast and colormap adjustment icons on the Mouse Toolbar of the Display Panel, they are especially important for adjusting the color display of your MS.
The available Basic options are:
Data minimum/maximum
This has the same usage as for raster images. Lowering the data maximum will help brighten weaker data values.
Scaling power cycles
This has exactly the same usage as for raster images (see Data Display Options - Basic Settings). Again, lowering this value often helps make weaker data visible. If you want to view several fields with very different amplitudes simultaneously, this is typically one of the best adjustments to make early, together with the Colormap fiddling mouse tool, which is on the middle mouse button by default.
Colormap
Greyscale or Hot Metal colormaps are generally good choices for MS data.
MS Options — MS and Visibility Selections
Visibility Type
Visibility Component
Moving Average Size
This roll-up provides choice boxes for Visibility Type (Observed, Corrected, Model, Residual) and Component (Amplitude, Phase, Real, or Imaginary).
Changes to Visibility Type or Component (changing from Phase to Amplitude, for example) require the data to be retrieved again from the disk into memory, which can be a lengthy process. When a large MS is first selected for viewing, the user must trigger this retrieval manually by pressing the Apply button (located below all the options), after selecting the data to be viewed (see Field IDs and Spectral Windows, below).
Tip: Changing visibility type between ‘Observed’ and ‘Corrected’ can also be used to assure that data and flags are reloaded from disk. You should do this if you’re using another flagging tool such as autoflag simultaneously, so that msview sees the other tool’s new edits and doesn’t overwrite them with obsolete flags. The Apply button alone won’t reload unless something within msview itself requires it; in the future, a button will be provided to reload flags from the disk unconditionally.
You can also choose to view the difference from a running mean or the local RMS deviation of either Phase or Amplitude. There is a slider for choosing the nominal number of time slots in the ‘local neighborhood’ for these displays.
(Note: Insufficient Data is shown in the tracking area during these displays when there is no other unflagged data in the local neighborhood to compare to the point in question. The moving time windows will not extend across changes in either field ID or scan number boundaries, so you may see this message if your scan numbers change with every time stamp. An option will be added later to ignore scan boundaries).
Field IDs
Spectral Windows
You can retrieve and edit a selected portion of the MS data by entering the desired Spectral Window and Field ID numbers into these boxes.
Important: Especially with large MSs, often the first thing you’ll want to do is to select spectral windows which all have the same number of channels and the same polarization setup. It also makes sense to edit only a few fields at a time. Doing this will also greatly reduce data retrieval times and memory requirements.
You can separate the ID numbers with spaces or commas; you do not need to enter enclosing brackets. Changes to either entry box will cause the selected MS data to be reloaded from disk.
If you select, say, spectral windows 7, 8, 23, and 24, the animator, slice position sliders, and axis labeling will show these as 0, 1, 2, and 3 (the ‘slice positions’ or ‘pixel coordinates’ of the chosen spectral windows). Looking at the position tracking display is the best way to avoid confusion in such cases. It will show something like: Sp Win 23 (s 2) when you are viewing spectral window 23 (plane 2 of the selected spectral windows).
Changes to MS selections will not be allowed until you have saved (or discarded) any previous edits you have made (see Flagging Options – Save Edits, below). A warning is printed on the console (not the logger).
Initially, all fields and spectral windows are selected. To revert to this ‘unselected’ state, choose ‘Original’ under the wrench icons next to the entry boxes.
See MeasurementSet Visibility Selections for an example showing the use of the MS and Visibility Selections controls when viewing an MS.
MS Options — Display Axes
This roll-up is very similar to that for images: it allows the user to choose which axes (from Time, Baseline, Polarization, Channel, and Spectral Window) are on the display and the animator. There are also sliders here for choosing positions on the remaining axes. (It’s useful to note that the data is actually stored internally in memory as an array with these five axes).
Within the Display Axes rollup you may also select whether to order the baseline axis by antenna1-antenna2 (the default) or by (unprojected) baseline length.
See MeasurementSet Display Axes–Changing the Axis of a MeasurementSet showing the use of the Display Axes controls to change the axes on the animation and sliders.
MS Options — Flagging Options
These options allow you to edit (flag or unflag) MS data. The Point Tool and Rectangle Region Mouse Tools are used on the display to select the area to edit. When using the Rectangle Region tool, double-click inside the selected rectangle to confirm the edit.
The options below determine how edits will be applied.
Show Flagged Regions…
You have the option to display flagged regions in the background color (as in TVFLG) or to highlight them with color. In the former case, flagged regions look just like regions of no data. With the (default) color option, flags are shown in shades of blue: darker blue for flags already saved to disk, lighter blue for new flags not yet saved; regions with no data will be shown in black.
Flag or Unflag
This setting determines whether selected regions will be flagged or unflagged. This does not affect previous edits; it only determines the effect which later edits will have. Both flagging and unflagging edits can be accumulated and then saved in one pass through the MS.
Flag/Unflag All…
These flagging extent checkboxes allow you to extend your edit over any of the five data axes. For example, to flag all the data in a given time range, you would check all the axes except Time, and then select the desired time range with the Rectangle Region mouse tool. Such edits will extend along the corresponding axes over the entire selected MS (whether loaded into memory or not) and optionally over unselected portions of the MS as well (Use Entire MS, below). Use care in selecting edit extents to assure that you’re editing all the data you wish to edit.
Flag/Unflag Entire Antenna?
This control can be used to extend subsequent edits to all baselines which include the desired antenna[s]. For example, if you set this item to 'Yes' and then click the point tool on a visibility position with baseline 3-19, the edit would extend over baselines 0-3, 1-3, 2-3, 3-3, 3-4, …, 3-(nAntennas-1). Note that the second antenna of the selection (19) is irrelevant here; you can click anywhere within the 'Antenna 3 block', i.e., where the first antenna number is 3, to select all baselines which include antenna 3.
This item controls the edit extent only along the baseline axis. If you wish to flag all the data for a given antenna, you must still check the boxes to flag all Times, Channels, Polarizations and Spectral Windows. There would be no point, however, in activating both this item and the ‘Flag All Baselines’ checkbox. You can flag an antenna in a limited range of times, etc., by using the appropriate checkboxes and selecting a rectangular region of visibilities with the mouse.
Note: You do not need to include the entire ‘antenna block’ in your rectangle (and you may stray into the next antenna if you try). Anywhere within the block will work. To flag higher-numbered antennas, it often helps to zoom in.
Undo Last Edit
Undo All Edits
The 'Undo' buttons do the expected thing: they completely undo the effect of the last edit (or of all unsaved edits). Please note, however, that only unsaved edits can be undone here; there is no ability to revert to the flagging state at the start of the session once flags have been saved to disk (unless you have previously saved a 'flag version'; the flag version tool is not available through msview directly).
Use Entire MS When Saving Edits?
“Yes” means that saving the edits will flag/unflag over the entire MS, including fields (and possibly spectral windows) which are not currently selected for viewing. Specifically, data within time range(s) you swept out with the mouse (even for unselected fields) will be edited.
In addition, if “Flag/Unflag All…” boxes were checked, such edits will extend throughout the MS. Note that only unselected times (fields) can be edited without checking extent boxes for the edits as well. Unselected spectral windows, e.g., will not be edited unless the edit also has “Flag/Unflag All Spectral Windows” checked.
Warning: Beware of checking “All Spectral Windows” unless you have also checked “All Channels” or turned “Entire MS” off; channel edits appropriate to the selected spectral windows may not be appropriate to unselected ones. Set “Use Entire MS” to “No” if your edits need to apply only to the portion of the MS you have selected for viewing. Edits can often be saved significantly faster this way as well. Also note that checkboxes apply to individual edits, and must be checked before making the edit with the mouse. “Use Entire MS”, on the other hand, applies to all the edits saved at one time, and must be set as desired before pressing “Save Edits”.
Save Edits
MS editing works like a text editor in that you see all of your edits immediately, but nothing is committed to disk until you press “Save Edits”. Feel free to experiment with all the other controls; nothing but ‘Save Edits’ will alter your MS on disk. As mentioned previously, however, there is no way to undo your edits once they are saved, except by manually entering the reverse edits (or restoring a previously-saved ‘flag version’).
Also, you must save (or discard) your edits before changing the MS selections. If edits are pending, the selection change will not be allowed, and a warning will appear on the console.
If you close the MS in msview, unsaved edits are simply discarded, without prior warning. It’s important, therefore, to remember to save them yourself. You can distinguish unsaved flags (when using the ‘Flags In Color’ option), because they are in a lighter shade of blue.
The program must make a pass through the MS on disk to save the edits. This can take a little time; progress is shown in the console window.
MS Options — Advanced
These settings can help optimize your memory usage, especially for large MSs. A rule of thumb is that they can be increased until response becomes sluggish, when they should be backed down again.
You can run the unix ‘top’ program and hit ‘M’ in it (to sort by memory usage) in order to examine the effects of these settings. Look at the amount of RSS (main memory) and SWAP used by the X server and ‘casaviewer’ processes. If that sounds familiar and easy, then fiddling with these settings is for you. Otherwise, the default settings should provide reasonable performance in most cases.
Cache size
The value of this option specifies the maximum number of different views of the data to save so that they can be redrawn quickly. If you run an animation or scroll around zoomed data, you will notice that the data displays noticeably faster the second time through because of this feature. Often, setting this value to the number of animation frames is ideal. Note, however, that on multi-panel displays, each panel counts as one cached image.
Large images naturally take more room than small ones. The memory used for these images will show up in the X server process. If you need more Visibility Memory (below) for a really large MS, it is usually better to forgo caching a large number of views.
Max. Visibility Memory
This option specifies how many megabytes of memory may be used to store visibility data from the MeasurementSet internally. Even if you do not adjust this entry, it is useful to look at it to see how many megabytes are required to store your entire (selected) MS in memory. If the slider setting is above this, the whole selected MS will fit into the memory buffer. Otherwise, some data planes will be ‘grayed out’ (see MS Options - Apply Button), and the selected data will have to be viewed one buffer at a time, which is somewhat less convenient. In most cases, this means you should select fewer fields or spectral windows – see MS Options - MS and Visibility Selections. The ‘casaviewer’ process contains this buffer memory, which can take most of the space.
MS Options — Apply Button
When viewing large MSs the display may be partially or completely grey in areas where the required data is not currently in memory, either because no data has been loaded yet, or because not all the selected data will fit into the allowed memory (see Max. Visibility Memory above). When the cursor is over such an area, the following message shows in the position tracking area:
press 'Apply' on Adjust panel to load data
Pressing the Apply button (which lies below all the options) will reload the memory buffer so that it includes the slice you are trying to view.
The message No Data has a different meaning; in that case, there simply is no data in the selected MS at the indicated position.
For large MeasurementSets, loading visibility data into memory is the most time-consuming step. Progress feedback is provided in the console window. Again, careful selection of the data to be viewed can greatly speed up retrieval.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/uv_manipulation.ipynb
UV Manipulation¶
How to modify visibility data in CASA
Modify UV-data: mstransform¶
mstransform is a multipurpose task that provides all the functionality of split, partition, cvel, hanningsmooth, uvcontsub and applycal, with the possibility of applying each of these transformations separately or together in an in-memory pipeline, thus avoiding unnecessary I/O steps. The list of transformations that mstransform can apply is as follows:
Data selection and re-indexing
Data partitioning (create output Multi-MS)
On-the-fly calibration (via “Cal Library”)
Time averaging (weighted and baseline dependent)
UV continuum subtraction
Combination of spectral windows
Channel averaging (weighted)
Hanning smoothing
Spectral regridding and reference frame transformation
Separation of spectral windows
Notice that the order of the above list is not arbitrary: when several transformations are applied to the data with mstransform, they are piped one after the other in the order shown above. These operations are described in the sections that follow, and a sketch of a stacked call is shown below.
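As an illustration of this pipelining, the following is a minimal sketch that stacks data selection, spw combination, regridding and time averaging in a single mstransform call. The file names and parameter values are hypothetical placeholders, not a prescription:

#A sketch of stacking several transformations in one mstransform call
#(hypothetical file names and values)
mstransform(vis='input.ms',
            outputvis='transformed.ms',
            datacolumn='corrected',  #take the calibrated data
            spw='0~3',               #data selection is applied first
            combinespws=True,        #combine the selected spws
            regridms=True,           #regrid the combined spw...
            outframe='LSRK',         #...into the LSRK reference frame
            timeaverage=True,        #finally, average in time
            timebin='30s')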
Besides mstransform itself, there is a series of tasks that perform only the individual operations: split, hanningsmooth and cvel2. Under the hood these are all based on the mstransform framework; they are provided for backward compatibility, and for simplicity in cases where only the simpler operations are needed. Notice that as of CASA 4.6, the mstransform-based versions of split and hanningsmooth are the default ones, whereas cvel is still based on the old implementation by default, and the cvel2 interface points to the mstransform implementation.
Manipulate spectral windows¶
How to combine, separate, and Hanning smooth spectral windows in mstransform
Combination of spectral windows
Both mstransform and cvel2 are able to combine SPWs (in mstransform by specifying combinespws = True, and in cvel2 by default). The algorithm is in general the same as in the old cvel; however, there are two significant differences in the new framework:
mstransform can combine SPWs only, regrid each SPW separately, or combine all SPWs and regrid them together. The details of the independent regridding operations are explained in the following sections. For instance, if you wish to do channel averaging while regridding in mstransform, please see the section on channel averaging.
mstransform and cvel2 automatically detect combination of SPWs with different exposure, and use the WEIGHT column (or WEIGHT_SPECTRUM if available) in addition to the geometrical factors to calculate the combined visibility in the overlapping regions.
Alert: For the time being, mstransform is not able to combine SPWs with different numbers of channels. This is a limitation in the Visibility Iterator (VI/VB2) layer of CASA.
The output MS created by mstransform and cvel2 has a uniform binning in spectral axis. If the width parameter is not specified then the size of the input bin corresponding to the lowest absolute frequency will be used as the size of the output bin, where the lowest absolute frequency is retrieved from the full list of spectral bins for all the input SPWs (after selection). Remember that in some cases, especially after certain data selection, the lowest frequency might not be part of the first input SPW ID.
Note that when combining SPWs which have different polarization settings, only the polarizations common to all SPWs will be part of the output MS.
Separation of spectral windows
A completely new feature in mstransform is the ability to separate an input SPW into several output SPWs, or to combine various input SPWs into a single one with a uniform grid (resolving overlaps or gaps) and then separate it into several output SPWs. This option is activated in the regridding section (i.e. by specifying regridms = True), together with the nspw parameter, which, when greater than 1, implies that the input SPW or combination of input SPWs should be separated:
#In CASA
regridms = True #Regrid the MS to a new spw
nspw = 1 #Number of output spws to create in output MS
Internally, the framework will combine the selected spws before separating them so that channel gaps and overlaps are taken into account. This sub-parameter creates a regular grid of spws in the output MS. If nchan is set, it refers to the number of output channels in each of the separated spws. A minimal sketch of the separation is shown below.
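For illustration, below is a minimal sketch of separating the input into four output spws; the file names and channel numbers are hypothetical:

#A sketch of spw separation (hypothetical file names and values)
mstransform(vis='input.ms',
            outputvis='separated.ms',
            datacolumn='data',
            regridms=True,  #activates the regridding section
            nspw=4,         #separate into 4 output spws
            nchan=256)      #output channels in each separated spw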
Hanning smoothing
Both mstransform and hanningsmooth are able to perform Hanning smoothing (in mstransform by specifying hanning = True, and in hanningsmooth by default). For details on Hanning smoothing please refer to the section on Hanning Smoothing and the hanningsmooth task documentation linked here. Please note that both mstransform and hanningsmooth will create a new MS (i.e. there is no option to do in-place Hanning smoothing). This is the only difference with respect to the old hanningsmooth task that existed in previous versions of CASA.
Note that straightforward channel averaging can be done as described in the section on channel averaging.
Select/Reindex UV-data¶
mstransform / split are able to create a new MS with a specific data selection, for instance splitting out a science target. The new MS contains only the selected data, and the subtables are re-generated to contain only the metadata matching the data selection. The details of a pure split operation are described in the split task documentation.
Keywords relevant to data selection are as follows:
CASA <1>: inp mstransform
--------> inp(mstransform)
vis = '' #Name of input MeasurementSet or Multi-MS.
outputvis = '' #Name of output MeasurementSet or Multi-MS.
tileshape = [0] #List with 1 or 3 elements giving the
tile shape of the disk data columns.
field = '' #Select field using ID(s) or name(s).
spw = '' #Select spectral window/channels.
scan = '' #Select data by scan numbers.
antenna = '' #Select data based on antenna/baseline.
correlation = '' #Correlation: '' ==> all, correlation='XX,YY'.
timerange = '' #Select data by time range.
intent = '' #Select data by scan intent.
array = '' #Select (sub)array(s) by array ID number.
uvrange = '' #Select data by baseline length.
observation = '' #Select by observation ID(s).
feed = '' #Multi-feed numbers: Not yet implemented.
datacolumn = 'corrected' #Which data column(s) to process.
keepflags = True #Keep *completely flagged rows* or
drop them from the output.
usewtspectrum = False #Create a WEIGHT_SPECTRUM column in the output MS.
New features related to data selection and re-indexing provided by mstransform / split are the following:
Spectral weight initialization: mstransform can initialize the output WEIGHT and SIGMA_SPECTRUM columns by specifying usewtspectrum = True. The details of spectral weights initialization are described in the section on data weights.
Tile shape specification for the data columns: mstransform also allows you to specify a custom default tile shape for the output data columns, namely a list of 3 elements giving the number of correlations, channels and rows to be contained in each tile. For instance, tileshape = [4,128,351] specifies a tile with shape (4 correlations) x (128 channels) x (351 rows). This can be used to optimize the access pattern of subsequent tasks, for instance imaging tasks.
Support for SPWs with different sets of correlation products: mstransform / split are both able to work when a given SPW is associated with several correlation products (as in some EVLA correlation setups). This is transparent to the user and simply works by using the spw data selection parameter normally. It also works in conjunction with the polarization parameter, so for instance if a given MS has separate RR and LL data associated with spw 0, the following data selection would work flawlessly: spw = '0', correlation = 'LL'
Support for multiple channel selection: both mstransform / split are capable of working with multiple channel selections. This is also transparent to the user, simply following the SPW syntax described in the Visibility Data Selection chapter. For example, spw = ‘4,7:4~59,8:4~13;18~594’ means “select all channels in spectral window 4; channels 4 through 59 inclusive in spectral window 7; and channels 4 through 13 and 18 through 594, also inclusive, in spectral window 8”. See more examples of spectral window selections in the split examples chapter. A combined sketch is shown below.
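As a minimal sketch (hypothetical file names and values) combining several of the features above, i.e. multiple channel selection, spectral weight initialization, and a custom tile shape:

#A sketch of selection and re-indexing (hypothetical file names and values)
mstransform(vis='input.ms',
            outputvis='selected.ms',
            datacolumn='corrected',
            spw='4,7:4~59,8:4~13;18~594',  #multiple channel selection
            usewtspectrum=True,            #initialize WEIGHT_SPECTRUM/SIGMA_SPECTRUM
            tileshape=[4,128,351])         #4 corrs x 128 chans x 351 rows per tile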
On-the-fly calibration¶
As of CASA 4.5, mstransform incorporates the possibility of applying on-the-fly (OTF) calibration by specifying docallib = True, which in turn allows specifying the “Cal Library” filename (callib parameter). This transformation is the first one applied to the data, effectively producing a corrected data column on the fly, which can be further transformed. callib is the filename pointing to the calibration specification file. Details on how to specify the Cal Library file can be found on this CASA Docs page, where conventions and current limitations are also described. The combination of OTF calibration and cal libraries enables complex calibrations to be applied, and allows the calculation to proceed more quickly than it otherwise might.
docallib = True #Enable OTF calibration
callib = '' #Cal Library filename
An example of a Cal Library file is given below.
caltable='ngc5921_regression/ngc5921.bcal' calwt=True tinterp='nearest'
caltable='ngc5921_regression/ngc5921.fluxscale' calwt=True tinterp='nearest' fldmap='nearest'
caltable='ngc5921_regression/ngc5921.gcal' calwt=True field='0' tinterp='nearest' fldmap=[0]
caltable='ngc5921_regression/ngc5921.gcal' calwt=True field='1,2' tinterp='linear' fldmap='1'
See the full description of a Cal Library here.
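A minimal sketch of applying OTF calibration while producing a corrected output MS; the file names are hypothetical, and the Cal Library file is assumed to contain entries like those above:

#A sketch of on-the-fly calibration in mstransform (hypothetical file names)
mstransform(vis='input.ms',
            outputvis='calibrated.ms',
            datacolumn='data',       #calibration is applied to the raw data on the fly
            docallib=True,
            callib='my_callib.txt')  #Cal Library specification file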
Time average¶
mstransform and split are both able to perform a weighted time average of data. In mstransform time averaging is accessed by setting timeaverage = True and controlled by the resulting sub-parameters of timeaverage. Additionally, mstransform is able to perform a baseline dependent time average as described in the paper Effects of Baseline Dependent Time Averaging of UV Data by W.D. Cotton [1] .
#mstransform time average parameters
timeaverage = True # Average data in time.
timebin = '0s' # Bin width for time averaging.
timespan = '' # Span the timebin across scan, state or both.
maxuvwdistance = 0.0 # Maximum separation of start-to-end baselines that
can be included in an average (meters)
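For instance, a minimal sketch of a 30-second weighted time average in mstransform that is allowed to span scan boundaries (hypothetical file names and values):

#A sketch of time averaging in mstransform (hypothetical file names and values)
mstransform(vis='input.ms',
            outputvis='timeavg.ms',
            datacolumn='corrected',
            timeaverage=True,
            timebin='30s',
            timespan='scan')  #allow the average to cross scan boundaries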
In split time averaging is available via top level parameters timebin and combine. The functionality is very similar to that in the old version of split. However, there are some differences as explained below.
# split time average parameters
timebin = '0s' # Bin width for time averaging
combine = '' # Span the timebin across scan, state or both
Warning: the timebin parameter works on wall-clock time, not integration time. For example, if each scan is 5 seconds long and scans are separated by 15 seconds, then timebin=’25s’ is the minimum value needed to combine data from 2 consecutive scans. One might assume that timebin=’10s’ should suffice to combine two 5-second scans, but it does not. This is mostly relevant to single-dish data, and when using tasks like sdtimeaverage or mstransform.
Whereas the old version of split used exclusively the WEIGHT column to perform the weighted average, mstransform and split use both FLAG and the spectral weights (when present). Specifically, WEIGHT_SPECTRUM is used when averaging CORRECTED_DATA, and SIGMA_SPECTRUM is used when averaging the DATA column.
Also, mstransform and split are able to transform the input WEIGHT/SIGMA_SPECTRUM according to the rules of error propagation that apply to a weighted average, which results in an output weight equal to the sum of the input weights. For a detailed reference, see Data Reduction and Error Analysis by Bevington & Robinson [2].
When mstransform and split process an ALMA MS and timebin is greater than 30s, timespan is automatically set to state, to overcome a limitation of the ALMA ASDM binary dumps.
As of version 4.5, mstransform and split both allow the time average to span field boundaries (‘field’), in addition to ‘scan’ and ‘state’.
maxuvwdistance: In the case of mstransform, when maxuvwdistance is greater than 0, this parameter controls the maximum uv distance allowed when averaging data from the same baseline. It works in conjunction with the timebin parameter, in the sense that the averaging process is finalized when either timebin is completed or maxuvwdistance is exceeded. The details of the baseline-dependent averaging algorithm are given in the reference below:
Bibliography
Effects of Baseline Dependent Time Averaging of UV Data by W.D. Cotton (OBIT No. 13, 2008).
Data Reduction and Error Analysis by Bevington & Robinson (3rd Ed., McGraw Hill, 2003).
Channel average¶
Both mstransform & split support averaging data by frequency channel. In split the amount of channel averaging (if any) is set by top-level parameter width.
width = 1 #Number of channels to average to form one output channel
In mstransform this capability is accessed by specifying chanaverage=True and setting the resulting sub-parameter chanbin, as shown here:
chanaverage = True #Average data in channels.
chanbin = 1 #Width (bin) of input channels to average to form an output channel.
Some new features of split / mstransform relative to the old implementation of split are as follows:
Whereas the old version of split performed a flat average taking into account only the FLAG column, mstransform / split use both FLAG and the spectral weights (when present), resulting in a weighted average. Specifically, WEIGHT_SPECTRUM is used when averaging CORRECTED_DATA, and SIGMA_SPECTRUM is used when averaging the DATA column.
Also, mstransform / split are able to transform the input WEIGHT/SIGMA_SPECTRUM according to the rules of error propagation that apply to a weighted average, which results in an output weight equal to the sum of the input weights. For a detailed reference, see Data Reduction and Error Analysis [1].
Both mstransform / split drop the last output channel bin when there are not enough contributors to fully populate it. For instance, if the input SPW has 128 channels and chanbin is 10, the resulting averaged SPW would have 12 channels and not 13 channels.
The chanbin parameter can be either a scalar or a vector. In the former case, the same chanbin is applied to all spectral windows. In the latter case, each element of the chanbin vector applies to the corresponding selected spectral window; the size of the chanbin vector must therefore match the number of selected spectral windows (see the sketch after this list).
If spw combination and channel average are used together (combinespws=True, chanaverage = True), the chanbin parameter can only be a scalar. This is due to the fact that channel average applies to the already spw combined MS, which contains one single spw.
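A minimal sketch of channel averaging with a per-spw chanbin vector (hypothetical file names and values; the chanbin vector has one element per selected spw):

#A sketch of channel averaging in mstransform (hypothetical file names and values)
mstransform(vis='input.ms',
            outputvis='chanavg.ms',
            datacolumn='data',
            spw='0,1',
            chanaverage=True,
            chanbin=[4,8])  #average 4 channels in spw 0 and 8 in spw 1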
Bibliography
Data Reduction and Error Analysis by Bevington & Robinson (3rd Ed., McGraw Hill, 2003).
Flags and data-averaging¶
How flags are treated and propagated when using data average (time or channel average).
CASA uses common infrastructure to implement data-averaging transformations across different tasks. This infrastructure follows a common set of rules for producing the flags of averaged data; this page explains how those rules propagate the flags from the original data to the averaged data.
These rules apply to channel average and time average, as implemented by
tasks such as mstransform and split, to output averaged MeasurementSets, or
tasks such as flagdata and plotms, to transform data on-the-fly and then modify the flags in the input MeasurementSets.
In short, the rule that CASA follows for the propagation of flags from the original input data to the averaged data is a logical AND. This is detailed in the next section. The second subsection explains how tasks such as flagdata and plotms propagate back the averaged flags to the input MeasurementSet.
In what follows we explain how CASA treats flags and data when averaging. By the term data, we refer to the data columns present in a MeasurementSet (DATA, CORRECTED_DATA, etc.). The data always have a companion FLAG column with matching dimensions, as well as other companion columns related to weights: WEIGHT, SIGMA, WEIGHT_SPECTRUM, SIGMA_SPECTRUM. The focus of this page is on data flags, but the data weights used in the average also play a role in the explanations below. The treatment of data weights in CASA is explained in more detail in Data weights.
Propagation of flags from the original data to the averaged data
For a given bin of input data and flags, the averaged flags are calculated as the logical “AND” of all the flags in the input bin (timebin or chanbin). The averaged flag will be set only if all the flags in the input bin are set. Let us first illustrate this with examples; for simplicity we consider channel averaging. The flags for every data channel are represented by an ‘X’ (flag set) or an ‘O’ (flag unset). There are 12 channels and we try different channel bins:
Flags in the input bin ------------> averaged flags
----------------------------------------------------------------------------------------
chanbin=2
X X X O O X X X X X O X ------------> X O O X X O (flagging summary: 50% flagged)
chanbin=3
X X X O O X X X X X O X ------------> X O X O (flagging summary: 50% flagged)
chanbin=4
X X X O O X X X X X O X ------------> O O O (flagging summary: 0% flagged)
The implication of this AND rule is that after applying the averaging transformation the percentage of data flagged can only stay the same or decrease. It tends to decrease more as the bin size increases, and more for sparser patterns of original flags. In an averaged dataset one should therefore expect a lower percentage of flags, in proportion to the bin size used and the sparseness of the flags in the original data.
For time average the same principle applies, on a per-baseline basis, with the only difference that the bins are defined across time instead of channels.
The AND rule of propagation of flags to the averaged data
The logical “AND” rule can be formulated as follows. Let us consider a bin size \(n\), and original data \(d_i, i=0,\ldots,n-1\), with associated flags \(f_i\). Every subindex \(_i\) corresponds to a value of the data column for a given baseline, time and channel. As a convention, \(f_i = 1\) when the flag is set (i.e., the visibility is “flagged”), and \(f_i = 0\) when the flag is not set (i.e., the visibility is “unflagged”). For every data point produced in the averaged data, \(d_{avg}\), its flag, \(f_{avg}\), is calculated as:

\(f_{avg} = \prod_{i=0}^{n-1} f_i\)

That is, the value of the averaged flag is the product of the values of the flags in the input bin, so that including any unflagged visibility in an input bin results in an unflagged visibility in the averaged output bin.
How flags and data are averaged
Does the “AND” rule mean that flagged data becomes unflagged via averaging? No: CASA does not use flagged data, nor does it unset the flags of initially flagged data. If any data point in the input bin is not flagged, the averaged data point will be unflagged, but this does not imply that flagged data is propagated to the averaged data. When one or more of the data points in the input bin are not flagged, only the unflagged data are used in the average; the flagged data in the original bin are excluded. These are the two possible scenarios:
Scenario A: if one or more unflagged data points can be found in the input bin, the averaged data will be produced as follows:
averaged data: calculated as the average of the input data points that are not flagged.
associated averaged flag: not set (averaged data is unflagged)
Scenario B: only if all the data points in the input bin are flagged will the averaged data be produced as follows:
averaged data: calculated as the average of all the input data in the bin (all flagged).
associated averaged flag: set (averaged data is flagged)
To define an equation for data averaging with flags, let us now consider the data weights. A bin of \(n\) data points \(d_i, i=0,\ldots,n-1\), with flags \(f_i\) and weights \(w_i\), is averaged into a single data point \(d_{avg}\) with flag \(f_{avg}\). The \(d_i\) are the visibility data and the \(w_i\) are their respective weights. CASA calculates the averaged data, \(d_{avg}\), as:

\(d_{avg} = f_{avg}\,\frac{\sum_{i} w_i\, d_i}{\sum_{i} w_i} + \left(1 - f_{avg}\right)\frac{\sum_{i} \left(1-f_i\right) w_i\, d_i}{\sum_{i} \left(1-f_i\right) w_i}, \qquad f_{avg} = \prod_{i=0}^{n-1} f_i\)

There are two terms, and they are mutually exclusive:
The first one represents the case where all input data are flagged (scenario B above). The output averaged data is flagged and the averaged data is calculated from all the input data in the bin.
The second term represents the case where some input data are not flagged (scenario A). The output averaged data is not flagged and the data is calculated as the average of all the unflagged input data in the bin.
In any case, data that are flagged in the input are either: a) never propagated or used after the data average (when there are other unflagged data in the bin), or b) propagated but kept flagged (when all the data in the bin are flagged).
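The two scenarios can be condensed into a few lines. The following NumPy snippet is an illustrative sketch of how one input bin is averaged; it mirrors the rules above but is not the actual CASA implementation:

import numpy as np

def average_bin(d, f, w):
    """Average one input bin of visibilities d, flags f (1=flagged), weights w.
    Returns (d_avg, f_avg) following the rules described above; an
    illustrative sketch, not the actual CASA implementation."""
    f_avg = int(np.all(f))  # AND rule: flagged only if all input flags are set
    if f_avg:               # scenario B: all flagged -> average everything
        d_avg = np.sum(w * d) / np.sum(w)
    else:                   # scenario A: average only the unflagged data
        keep = (f == 0)
        d_avg = np.sum(w[keep] * d[keep]) / np.sum(w[keep])
    return d_avg, f_avg

# Example: a chanbin of 3 with one unflagged channel -> unflagged average
d = np.array([1.0+0j, 2.0+0j, 4.0+0j])
f = np.array([1, 1, 0])
w = np.array([1.0, 1.0, 1.0])
print(average_bin(d, f, w))  # ((4+0j), 0): only the unflagged channel contributes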
Writing and (back)propagation of flags from the averaged data to the original data (input MeasurementSet)
This section concerns tasks such as flagdata or plotms, which can apply on-the-fly averaging (in time or frequency), flag and/or unflag data, and write the averaged flags back to the original MeasurementSet. These tasks have the additional complexity that they need to propagate flags derived from the on-the-fly averaged data+flags back to the original MeasurementSet. A reverse or backward propagation is required to map the averaged flags onto the original MeasurementSet.
These tasks can perform the following sequence of data manipulation steps, all in one go:
a) Take an input MeasurementSet and apply averaging on the data+flags.
b) Edit or modify the averaged flags.
c) Write the edited averaged flags back to the original input MeasurementSet.
Since version 6.4, CASA implements two alternative approaches to step c:
flagdata alternative: preserve pre-existing flags, flags can be added (set) but never removed (unset).
plotms alternative: flags can be added (set) but also removed (unset).
flagdata will only add new flags (true or 1) to the original data. It will never unset a previously set flag. This is implemented as follows:
If an averaged flag is set, the flag is propagated back to all the original flags in the corresponding input bin.
If an averaged flag is not set, nothing is done, and the flags that were already set in the corresponding input bin remain set.
As a consequence, a flagdata command that uses data averaging will only increase the amount of flags in the input MeasurementSet (or simply keep the same amount, if the flagging methods applied do not add any new flags). This way, all original flags are preserved in the input MeasurementSet.
In contrast, plotms will write back to the input MeasurementSet both true (1) and false (0) flag values. That is, plotms can set and unset flags, and the initially set flags in the input MeasurementSet are not necessarily preserved.
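As an illustrative sketch, the call below runs an rflag pass on data averaged on the fly; the file name is hypothetical, and the channelavg/chanbin and timeavg/timebin parameters are assumed to be available as in recent flagdata versions:

#A sketch of flagging on-the-fly averaged data (hypothetical file name and values)
flagdata(vis='input.ms',
         mode='rflag',
         channelavg=True, chanbin=4,   #average 4 channels on the fly
         timeavg=True, timebin='10s')  #and 10 seconds in time before flagging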
Split UV-data¶
The split task selects a subset of data from a MeasurementSet and creates a new MS with only those selected data. All of the usual selection criteria can be applied (field, spectral window, antenna, time range, etc.). Additionally, the user can choose to select only a single column of data (typically CORRECTED), or elect to export all data columns (DATA, MODEL, CORRECTED, FLOAT_DATA). If only a single column is selected, it will appear as the DATA column in the output MS, unless the input datacolumn is set to FLOAT_DATA, in which case the output column will also be FLOAT_DATA. This suite of capabilities is of great utility for creating smaller MeasurementSets for imaging and further analysis. It is also helpful in many circumstances to facilitate self-calibration, since the gaincal task operates from the DATA column.
The top-level inputs of split are:
CASA <1>: inp split
split :: Create a visibility subset from an existing visibility set
vis = '' #Name of input MeasurementSet or Multi-MS
outputvis = '' #Name of output MeasurementSet or Multi-MS
keepmms = True #If the input is a Multi-MS the output will also be a Multi-MS
field = '' #Select field using ID(s) or name(s)
spw = '' #Select spectral window/channels
scan = '' #Select data by scan numbers
antenna = '' #Select data based on antenna/baseline
correlation = '' #Correlation: '' ==> all, correlation='XX,YY'
timerange = '' #Select data by time range
intent = '' #Select data by scan intent.
array = '' #Select (sub)array by array ID number(s)
uvrange = '' #Select data by baseline length
observation = '' #Select data by observation ID(s).
feed = '' #Multi-feed numbers: Not yet implemented.
datacolumn = 'corrected' #Which data column(s) to process
keepflags = True #Keep *completely flagged rows* instead of dropping them
width = 1 #Number of channels to average to form one output channel
timebin = '0s' #Bin width for time averaging
Usually you will run split with datacolumn=’corrected’ as previous operations (e.g. applycal) will have placed the calibrated data in the CORRECTED_DATA column of the MS. This will produce a new MS with this CORRECTED_DATA in its DATA column. The modes available in datacolumn are:
corrected
data
model
data,model,corrected
float_data
lag_data
float_data,data
lag_data,data
all
#float_data is used for single dish processing
For example, to split out 46 channels (5-50) from spw 0 of our NGC5921 calibrated dataset:
split(vis='ngc5921.usecase.ms',
outputvis='ngc5921.split.ms',
field='2', #Output NGC5921 data (field 2)
spw='0:5~50', #Select 46 chans from spw 0
datacolumn='corrected') #Take the calibrated data column
Starting in CASA 4.6.0, split2 has been renamed to split and became the default in CASA. The interface of both implementations is the same, but the new split uses the MSTransform framework underneath. See the “Manipulating Visibilities with MSTransform” chapter for detailed information on mstransform.
Average in split¶
Time and channel averaging are available within split using the timebin and width parameters. The way that time averaging operates can be further controlled by combine, which is a sub-parameter of timebin.
The timebin parameter gives the averaging interval. It takes a quantity, e.g. timebin=’30s’ for 30-second time averaging. By default the time-span of time averaging will be limited by scan boundaries, i.e. the time average will not be extended to average data from multiple scans together. Similarly, time averaging will by default not extend across state boundaries (states are telescope-specific entities; for ALMA a state corresponds to a sub-scan). These default limits on time averaging can be modified by setting combine to ‘scan’, ‘state’, or ‘state,scan’ to ignore one or both types of boundaries and average across them. combine is a sub-parameter of timebin which is enabled by selecting a non-zero time averaging interval.
The width parameter defines the number of channels to average to form a given output channel. This can be specified globally for all spw, e.g.
width = 5
or specified per spw, e.g.
width = [2,3]
to average 2 channels in the first selected spectral window and 3 in the second. A minimal sketch combining these parameters follows.
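Putting it together, a minimal sketch of per-spw channel averaging in split (hypothetical file names):

#A sketch of channel averaging in split (hypothetical file names)
split(vis='input.ms',
      outputvis='chanavg.ms',
      datacolumn='corrected',
      spw='0,1',
      width=[2,3])  #average 2 channels in the first selected spw, 3 in the second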
ALERT: When averaging channels, split will produce negative channel widths (as reported by listobs) if frequency decreases with increasing channel number, whether or not the input channel widths are negative. The bandwidths and channel resolutions will still be positive.
Recalculate UVW Values¶
Sometimes the u,v,w coordinates of a MeasurementSet are not recorded correctly by the correlator. For instance, if the catalog position of a phase calibrator is in error, the control system will place the phase center on this erroneous position. Although the actual wavefront from the quasar will not be coming from the phase center, the normal calibration process will adjust all the phases to place the quasar there (because the default model is a point source at the phase center), which will yield science target images with wrong absolute positions.
To fix this problem, you can use fixvis to shift the raw data on the phase calibrator to have the correct phase center so that your science target will then come out with the correct position. This technique was used in the Band 9 ALMA SV data on IRAS16293.
One useful feature of fixvis is that it can change the phase center of a MeasurementSet. This can be done with absolute coordinates or using offsets. An example is:
fixvis(vis='ngc3256.ms',outputvis='ngc3256-fixed.ms',field='NGC3256',
phasecenter='J2000 10h27m51.6s -43d54m18s')
which will recalculate the u,v,w coordinates relative to the new phase center for the field ‘NGC3256’. Invoking fixvis as follows will instead re-calculate the (u,v,w) values using the existing phase center in the FIELD table of the MS –
fixvis(vis='ngc3256.ms',outputvis='ngc3256-fixed.ms',field='NGC3256')
Other parameters fixvis accepts are as follows –
fixvis :: Recalculates (u, v, w) and/or changes Phase Center
vis = '' #Name of the input visibility set.
outputvis = '' #Name of the output visibility set. (Can be the same as vis.)
field = '' #Fields to operate on. = all.
refcode = '' #reference frame to convert UVW coordinates to
reuse = True #base UVW calculation on the old values?
phasecenter = '' #use this direction as phase center
Hanning Smooth UV-data¶
For strong spectral features (such as strong spectral lines or RFI), ringing may appear across the frequency channels of an observation; this is the Gibbs phenomenon, and a proven remedy is the Hanning smoothing algorithm. Hanning smoothing is a running mean across the spectral axis with a triangular smoothing kernel: the central channel is weighted by 0.5 and the two adjacent channels by 0.25 to preserve the flux; mathematically this is an N=5 sample Hann window kernel (including the outermost zero-weight samples in the window). Hanning smoothing significantly reduces Gibbs ringing at the price of a factor of two in spectral resolution. Users should also be aware that it introduces noise correlations between channels, which can affect the interpretation of cross-channel chi-squared or uv-model fitting analyses.
The new hanningsmooth task (based on mstransform) does not write to the input MS, but it always creates an output MS. It can also handle a Multi-MS and process it in parallel (see more information here).
In CASA, the hanningsmooth task will apply Hanning smoothing to a spectral line uv data MeasurementSet. The inputs are:
#hanningsmooth :: Hanning smooth frequency channel data to remove Gibbs ringing
vis = '' #Name of input MeasurementSet or Multi-MS.
outputvis = '' #Name of output MeasurementSet or Multi-MS.
keepmms = True #If the input is a Multi-MS the output will also
be a Multi-MS.
field = '' #Select field using ID(s) or name(s).
spw = '' #Select spectral window/channels.
scan = '' #Select data by scan numbers.
antenna = '' #Select data based on antenna/baseline.
correlation = '' #Correlation: '' ==> all, correlation='XX,YY'.
timerange = '' #Select data by time range.
intent = '' #Select data by scan intent.
array = '' #Select (sub)array(s) by array ID number.
uvrange = '' #Select data by baseline length.
observation = '' #Select by observation ID(s).
feed = '' #Multi-feed numbers: Not yet implemented.
datacolumn = 'all' #Input data column(s) to process.
The datacolumn parameter determines which of the data columns is to be Hanning smoothed: ‘all’, ‘model’, ‘corrected’, ‘data’, ‘float_data’ or ‘lag_data’. ‘all’ will use whichever of the visibility data columns that are present in the input MS. If ‘corrected’ is specified, the task will smooth the input CORRECTED_DATA column and save the smoothed data in DATA of the output MS.
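A minimal example call (hypothetical file names):

#A sketch of Hanning smoothing the DATA column (hypothetical file names)
hanningsmooth(vis='input.ms',
              outputvis='smoothed.ms',
              datacolumn='data')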
The Hanning smoothing transformation in mstransform is available via a single parameter, as shown below:
#Hanning smooth in mstransform
hanning = True #Hanning smooth data to remove Gibbs ringing
Regrid Frequency/Velocity¶
Although not strictly a calibration operation, spectral regridding of a MS is available to aid in calibration operations (e.g. continuum subtraction) and preparation for imaging. For this purpose, the cvel task has been developed.
The inputs are:
#cvel :: regrid an MS to a new spectral window / channel structure or frame
vis = '' #Name of input MeasurementSet
outputvis = '' #Name of output MeasurementSet
passall = False #Pass through (write to output MS) non-selected data with
#no change
field = '' #Select field using field id(s) or field name(s)
spw = '' #Select spectral window/channels
selectdata = True #Other data selection parameters
timerange = '' #Range of time to select from data
array = '' #(sub)array indices
antenna = '' #Select data based on antenna/baseline
scan = '' #scan number range
mode = 'channel' #Regridding mode
nchan = -1 #Number of channels in output spw (-1=all)
start = 0 #first input channel to use
width = 1 #Number of input channels to average
interpolation = 'linear' #Spectral interpolation method
phasecenter = '' #Image phase center: position or field index
restfreq = '' #rest frequency (see help)
outframe = '' #Output frame (not case-sensitive, ''=keep input frame)
veltype = 'radio' #velocity definition
hanning = False #If true, Hanning smooth data before regridding to remove
#Gibbs ringing.
The key parameters for the operation of cvel are the regridding mode, the output reference outframe, veltype, restfreq and the standard selection parameters (in particular spw and field).
The syntax for mode options (‘channel’,’velocity’,’frequency’,’channel_b’) has been made compatible with the respective modes of clean. The combination of selected spw and mode will determine the output channels and spw(s):
spw = '0,1'; mode = 'channel'
#will produce a single spw containing all channels in spw 0 and 1
spw='0:5~28^2'; mode = 'channel'
#will produce a single spw made with channels (5,7,9,...,25,27)
spw = '0'; mode = 'channel'; nchan=3; start=5; width=4
#will produce an spw with 3 output channels
#new channel 1 contains data from channels (5+6+7+8)
#new channel 2 contains data from channels (9+10+11+12)
#new channel 3 contains data from channels (13+14+15+16)
spw = '0:0~63^3'; mode='channel'; nchan=21; start = 0; width = 1
#will produce an spw with 21 channels
#new channel 1 contains data from channel 0
#new channel 2 contains data from channel 3
#new channel 21 contains data from channel 60
spw = '0:0~40^2'; mode = 'channel'; nchan = 3; start = 5; width = 4
#will produce an spw with three output channels
#new channel 1 contains channels (5,7)
#new channel 2 contains channels (13,15)
#new channel 3 contains channels (21,23)
The simplest use of cvel is to shift a single spectral window into an output frame. This is done with mode=’channel’. For example:
cvel(vis='test_w3oh_nohann.ms',
outputvis ='test_w3oh_nohann_chanbary.ms',
mode='channel',nchan=-1,start=0,width=1,
interpolation='linear',
phasecenter='',
spw='',
outframe='BARY')
will transform all SPWs into the BARY reference frame. This amounts to a “software Doppler tracking”: the SPW definitions are transformed into the BARY frame and, in the different integrations of the dataset, the visibilities are regridded into the transformed SPW, accounting for the time dependencies. E.g., if the original SPWs were in reference frame TOPO, where the motion of the Earth w.r.t. the source will have smeared out line emission, the new SPWs will be in reference frame BARY and the effects of the motion of the Earth will have been removed, while the number of channels remains the same and the frequency resolution stays maximal.
Mode channel is intended not to change the frequency resolution beyond the reference frame transformation, or at least to change the resolution only in units of whole channels. For most scientific applications we recommend using the mode=’velocity’ and mode=’frequency’ options, as it is easiest to determine what the resulting channel width will be in terms of velocity or frequency bandwidth. For example:
cvel(vis='test_w3oh_nohann.ms',
outputvis ='test_w3oh_nohann_cvellsrk.ms',
mode='velocity',nchan=45,start='-35.0km/s',width='-0.55km/s',
interpolation='linear',
phasecenter='',
spw='',
restfreq='1665.4018MHz',
outframe='LSRK')
cvel(vis='test_w3oh_nohann.ms',
outputvis ='test_w3oh_nohann_cvelbary.ms',
mode='velocity',nchan=45,start='-35.0km/s',width='-0.55km/s',
interpolation='linear',
phasecenter='',
spw='',
restfreq='1665.4018MHz',
outframe='BARY')
will transform the MS into the LSRK and BARY reference frames, respectively.
The sign of the width parameter determines whether the channels run along increasing or decreasing values of frequency or velocity (i.e. if the cube is reversed or not).
Info: in order to permit the calculation of velocities from the internally stored frequencies, you need to provide a rest frequency in the parameter restfreq when you operate in mode ‘velocity’. This rest frequency will not be stored with the MS (as opposed to the rest frequency provided to the clean task, which is subsequently stored with the image).
The intent of cvel regridding is to transform the channel labels and the visibilities to a spectral reference frame which is appropriate for the science analysis, e.g. from TOPO to LSRK, in particular to correct for Doppler shifts over the course of the observation. Naturally, this will change the shape of the spectral features to some extent. According to the Nyquist theorem you should oversample a spectrum with twice the number of channels to retain the shape. Based on some tests, however, we recommend observing with at least 3-4 times the number of channels per significant spectral feature (i.e. 3-4 channels per linewidth). This will minimize regridding artifacts in cvel.
If cvel has already established the grid that is desired for the imaging, clean should be run with the default channel mode (width=1); this avoids additional regridding in clean. Hanning smoothing is optionally offered in cvel, but tests have shown that the regridding process itself, if it involves a transformation from TOPO to a non-terrestrial reference frame, implies some smoothing (due to channel interpolation), such that Hanning smoothing may not be necessary.
The interpolation method fftshift calculates the transformed visibilities by applying an FFT, then a phase ramp, and then an inverse FFT. Note that if you want to use this interpolation method, your frequency grid needs to be equidistant, i.e. it only works in mode velocity with veltype=radio, in mode frequency, and in mode channel (in the latter only if the input grid is itself equidistant in frequency). Note also that, as opposed to all other interpolation methods, this method applies a constant (frequency-independent) shift in frequency, which is not fully correct in the case of a large fractional bandwidth of the given spectral window.
The task cvel can also be used to transform spectral windows into the rest frame of the ephemeris object by setting the parameter outframe to “SOURCE” as in the following example:
cvel(vis='europa.ms', outputvis='cvel_europa.ms', outframe='SOURCE')
This will make cvel perform a transformation to the GEO reference frame followed by an additional Doppler correction for the radial velocity given by the ephemeris for each field. (Typically, this should happen after calibration and after splitting out the spectral windows and the target of interest.) The result is an MS with a single combined spectral window in reference frame REST. From this frame, further transformations to other reference frames are not possible.
Combine MeasurementSets¶
Once you have your data in the form of CASA MeasurementSets, you can go ahead and process your data using the editing, calibration, and imaging tasks. In some cases, you will most efficiently operate on a single MS for a particular session (such as calibration). Some tasks can take multiple MeasurementSets as input. For others, it is easiest to combine your multiple data files into one.
If you need to combine multiple datasets, you can use the concat task. The default inputs are:
#concat :: Concatenate several visibility data sets.
vis = [''] #Name of input visibility files to be concatenated
concatvis = '' #Name of output visibility file
freqtol = '' #Frequency shift tolerance for considering data as the same spwid
dirtol = '' #Direction shift tolerance for considering data as the same field
respectname = False #If true, fields with a different name are not merged even if their direction agrees
timesort = False #If true, sort by TIME in ascending order
copypointing = True #Copy all rows of the POINTING table.
visweightscale = [] #List of the weight scaling factors to be applied to the individual MSs
forcesingleephemfield = '' #make sure that there is only one joint ephemeris for every field in this list
The vis parameter will take the list of MSs to combine. concat will presort them in time.
With visweightscale, a list of weights can be manually specified for the respective input data sets. They will be applied at the time of the combination. To determine the appropriate weights for this procedure, one can inspect the weights (‘Wt’ and ‘WtSp’ axis inputs) of the input datasets in plotms. The weights of the individual MSs will be scaled in the concatenated output MS by the factors in this list. SIGMA will be multiplied by 1/sqrt(factor) (i.e. the weights will be multiplied by factor). This capability can be useful for handling heterogeneous arrays, depending on how the weights were initially estimated and set.
The concatvis parameter contains the name of the output MS. If this points to an existing file on disk, then the MS in vis will be appended to it; otherwise a new MS file is created to contain the concatenated data. Be careful here!
The timesort parameter can be used to make sure the output MS is in time order (e.g. if your input MS have concurrent times). This can possibly speed up some subsequent calibration operations.
Furthermore, the parameter copypointing can be used to control whether the POINTING table will be carried along in the concatenation process or if the output MS should not contain a POINTING table. This table is quite large for some data (e.g. ALMA) and is mainly needed for mosaic imaging. If you are certain that you will not need it, you can save time and diskspace by setting copypointing=False.
For ALMA and VLA data, importasdm will fill the correct coordinates from ephemerides data into the SOURCE table. And, as stated in the ephemeris section, concat will correctly merge fields which use the same ephemeris.
Using the parameter forcesingleephemfield, you can control whether the attached tabulated ephemerides are merged into one. By default, concat will only merge two ephemeris fields if the first ephemeris covers the time range of the second. Otherwise, two separate fields with separate ephemerides are placed in the output MS. In order to override this behaviour and make concat merge non-overlapping or only partially overlapping input ephemerides, the name or id of the field in question needs to be placed into the list in parameter forcesingleephemfield. Example: forcesingleephemfield=’Neptune’ will make sure that there is only one joint ephemeris for field Neptune in the output MS.
The parameters freqtol and dirtol control how close together in frequency and angle on the sky spectral windows or field locations need to be before calling them the same.
ALERT: If multiple frequencies or pointings are combined using freqtol or dirtol, then the data are not changed (i.e. not rephased to the single phase center). Use of these parameters is intended to be tolerant of small offsets (e.g. planets tracked which move slightly in J2000 over the course of observations, or combining epochs observed with slightly different positions).
For example:
default('concat')
vis = ['n4826_16apr.split.ms','n4826_22apr.split.ms']
concatvis = 'n4826_tboth.ms'
freqtol = '50MHz'
visweightscale=[1.,2.]
concat()
combines the two days in n4826_16apr.split.ms and n4826_22apr.split.ms into a new output MS called n4826_tboth.ms, weighting the second MS twice as much as the first.
ALERT: If you are concatenating MSs which use antennas that were moved between observations, be aware that the MS definition only foresees a unique antenna ID, not a unique name(!). The moved antenna will appear twice in the antenna list under the same name, but on different stations and with two different IDs. The pair (‘NAME@STATION’) will be the unique identifier.
If you would like to only concatenate the subtables of several MSs, not the bulk visibility data, you can use the task testconcat instead of concat to save time and diskspace. testconcat has the same parameters as concat. It produces an output MS with the concatenated subtables and an empty MAIN table.
Furthermore, the task virtualconcat permits concatenating MSs into a Multi-MS (MMS), which is much faster because the data are moved into the MMS rather than copied and only some reindexing is done; the bulk data are not rewritten. If you want to keep a copy of the original MSs, set the parameter keepcopy of virtualconcat to True. The creation of that copy will of course consume some of the time you saved by doing a virtual concatenation. Otherwise virtualconcat offers the same functionality as concat. A minimal sketch is shown below.
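A minimal sketch (hypothetical file names):

#A sketch of virtual concatenation into a Multi-MS (hypothetical file names)
virtualconcat(vis=['day1.ms', 'day2.ms'],
              concatvis='combined.mms',
              keepcopy=True)  #retain copies of the original input MSs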
UV Continuum Subtraction¶
After general calibration is done and if there is significant continuum emission present in what is intended as a spectral line observation, continuum subtraction may be desirable. You can estimate and subtract continuum emission in the \(uv\)-domain prior to imaging, or wait and subtract an estimate of it in the image-plane. Note that neither method is ideal, and the choice depends primarily upon the distribution and strength of the continuum emission. Subtraction in the \(uv\)-domain is desirable if continuum emission dominates the source, since deconvolution of the line emission will be more robust if it is not subject to the deconvolution errors of the brighter continuum. There is also a performance benefit, since the continuum is nearly the same in each channel of the observation, and it is desirable to avoid repeating its deconvolution in each channel. However, doing the continuum estimation in the \(uv\)-domain has the serious drawback that interpolating visibilities between channels is only a good approximation for emission from near the phase center. Thus, \(uv\)-domain based continuum subtraction will do an increasingly poor job for emission distributed further from the phase center. If the continuum emission is relatively weak, it is usually adequate to subtract it in the image plane; this is described in the Image Analysis section of this document. Here, we describe how to do continuum subtraction in the \(uv\)-domain.
Basic Concept¶
A good review of different approaches to the subtraction of the continuum emission is found in Cornwell, Uson & Haddad (1992) [1].
Sault (1994) [2] gives a detailed analysis of the \(uv\)-domain based algorithms. We assume here that the sky brightness \(I_\nu\) at a sky position \((l,m)\) is composed of continuum (C) and line (L) emission such that \(I_\nu(l,m)=C_\nu(l,m)+L_\nu(l,m)\). The continuum is estimated by fitting a polynomial to only the “line-free” channels. The fitting of the visibility spectrum is generally done for each sampling, and separately for the real and imaginary parts, to make it a linear process. The polynomial continuum model is then subtracted from all channels. Note that because the real and imaginary parts are fitted separately, the fitted model amplitude has the functional form sqrt(polynomial of order 2 x fitorder), which, in general, is not a polynomial.
This technique is known to work well when the dominant continuum source is at or near the phase center. Its effectiveness declines as the distance of the continuum source from the phase center increases, and the residual continuum left in the subtracted data grows accordingly. The effectiveness, which has the same expression as bandwidth smearing, can be parameterized as \(\eta=\frac{\Delta\nu}{\nu}\frac{l_{0}}{\theta_{synth}}\) in terms of the distance from the phase center in units of the synthesized beam (\(l_{0}/\theta_{synth}\)), where \(\nu\) and \(2\Delta\nu\) are the observing frequency and the bandwidth, respectively. For the method to work well, \(\eta \ll 1\) must be met. If the brightest continuum emission lies beyond \(\frac{\nu}{\Delta\nu}\) beams from the phase center, the source needs to be shifted before fitting. A rough numeric illustration follows.
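As a rough numeric illustration of this criterion (all numbers hypothetical):

#Back-of-the-envelope check of the eta << 1 criterion (hypothetical numbers)
nu = 100e9          #observing frequency [Hz]
delta_nu = 2e9      #half of the bandwidth [Hz] (total bandwidth = 2*delta_nu)
offset_beams = 10   #continuum source offset from phase center, in synthesized beams

eta = (delta_nu / nu) * offset_beams
print(f"eta = {eta:.2f}")  #0.20: not << 1, so noticeable residual continuum may remain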
The recommended procedures¶
The general recommended procedures are described below. For detailed examples on how to use the \(uv\)-domain continuum subtraction tasks available in CASA, please refer to the next section and the documentation for each individual task.
Finish general calibration.
Use the clean/tclean task on the calibrated MS to form an exploratory cube that is useful for determining the line-free channels.
Use one of the uv-domain continuum subtraction tasks, with as low a fit order as possible, to estimate and subtract the continuum from the input MS and write the continuum-subtracted MS.
Use clean/tclean with the continuum subtracted data to make an image cube of the line emission; inspect for residual continuum, tweak and repeat if needed. Computing image moments in nominally line-free channels may be a useful diagnostic if your imaging requirements are stringent.
CASA implementations¶
CASA 6.5 has introduced a new uvcontsub task. Its aim is to consolidate \(uv\)-domain continuum subtraction in one task and to replace the other implementations available in different tasks. Older tasks that do \(uv\)-domain continuum subtraction are uvcontsub_old and mstransform; other variants have been removed. The current plan is for uvcontsub to replace uvcontsub_old and the continuum subtraction functionality of mstransform. All of these tasks are based on the same basic concept described above and achieve essentially the same output, but they offer different interfaces and are based on different underlying implementations.
uvcontsub is based on the same code infrastructure as mstransform and has been designed to better support the workflow of pipelines in operations. uvcontsub_old has been the standard task for \(uv\) continuum subtraction; it is based on the calibrater tool (‘cb’), which solves for the continuum by fitting a polynomial. It creates an ‘A’-type (‘additive noise’) caltable, which is deleted before the task exits; this solution is applied to correct (i.e. subtract the continuum from) the rest of the data. A parameter to perform continuum subtraction in mstransform (douvcontsub) was introduced in CASA 4.7 and is now deprecated.
In terms of parallelisation, only uvcontsub_old and the mstransform implementation (parameter douvcontsub) use the mpi4casa framework. See some notes here on how to handle special cases in parallel using uvcontsub.
Using uvcontsub¶
The inputs to uvcontsub are:
# uvcontsub -- continuum subtraction in the uv domain
vis = '' # Name of input visibility file (MeasurementSet)
outputvis = '' # Name of output MeasurementSet (visibility file)
field = '' # Select field using field id(s) or field name(s)
spw = '' # Select spectral window/channels
scan = '' # Select scans by scan numbers
intent = '' # Select observing intent
array = '' # Select (sub)array(s) by array ID number
observation = '' # Select by observation ID(s)
datacolumn = 'data' # Which data column(s) to process
fitspec = '' # Specification of polynomial order and spectral window:channel for fitting
fitmethod = 'gsl' # Choose fitting method
fitorder = 0 # Polynomial order for the fits
writemodel = False # Write fitted model into the MODEL column of the output MS
The parameters field, spw, scan, intent, array, and observation can be used (following the MSSelection syntax) to select the data that will be included in the output MeasurementSet. The continuum-subtracted data are produced as follows: for each baseline and integration, a polynomial is fitted to the real and imaginary parts of the visibilities. The input visibilities can be taken from the data column of the input MeasurementSet, or from other columns such as corrected data, via the datacolumn parameter.
A subset of channels can be specified in the parameter fitspec, which can be interpreted as a specification of the continuum-only channels. fitspec follows the MSSelection syntax for SPW and channel specification (only in the native reference frame of the MeasurementSet). fitspec can also be set as a dictionary that specifies different sets of channels and different polynomial orders for different fields and SPWs (see the task page for details), which makes it possible to produce output MeasurementSets with multiple fields in one uvcontsub call. The fitted polynomial models are subtracted from all channels and the result is written to the data column of the output MS. Optionally, writemodel can be enabled to also write the fitted models into the model column of the output MS. The order of the polynomial can be set via the parameter fitorder. Typically, low polynomial orders work best, like 0th (a constant) or 1st order (a linear fit). Use higher orders with caution and check your results carefully. A minimal example call follows.
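A minimal example call; the file names and the line-free channel specification are hypothetical:

#A sketch of continuum subtraction with uvcontsub (hypothetical file names and values)
uvcontsub(vis='input.ms',
          outputvis='line_only.ms',
          datacolumn='data',
          fitspec='0:5~20;85~120',  #assumed line-free channels of spw 0
          fitorder=1,               #linear fit
          writemodel=True)          #also store the fitted continuum in the MODEL column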
uvcontsub implements polynomial fitting using the linear least-squares method, with an implementation based on the GSL library and its gsl_multifit_linear routines. This is the implementation used for the default fitting method, ‘gsl’.
Linear least-squares is based on minimizing \(\chi^2\) (chi-squared), which is defined as the weighted sum of the squared residuals. To define the residuals in uvcontsub, let us consider a baseline, integration, and polarization for which the visibility data are \(d_c, c=0,\ldots,n-1\), with \(n\) the number of channels, and with associated flags \(f_c\) and associated weights \(w_c\). Every subindex \(_c\) corresponds to a value of the data column for a given channel for that baseline, integration and polarization.
The real part, \(\mathrm{Re}(d_c)\), and imaginary part, \(\mathrm{Im}(d_c)\), are fitted separately following the same process. Let us consider the real part. The aim is to fit a polynomial model \(Y(a,\nu_c)\) to the visibility values \((\nu_c, \mathrm{Re}(d_c))\), where the parameters of the polynomial are \(a = \{a_0, a_1, \cdots, a_g\}\), i.e. the coefficients of a polynomial of order \(g\). Least-squares fits are found by minimizing the chi-squared cost function:

\(\chi^2 = \sum_{c} w_c \left(\mathrm{Re}(d_c) - Y(a,\nu_c)\right)^2\)

That is, the weighted sum of the squared residuals \(\left(\mathrm{Re}(d_c) - Y(a,\nu_c)\right)\), i.e. the differences between the input visibility values and the model. The weights are taken directly from the visibility weights. Further details can be found in the GSL multifit linear documentation.
In addition to the fitspec parameter, the channelized data flags and data weights also influence how the channels are used when fitting the continuum of both the real and imaginary parts of the visibilities. From the definition of the chi-squared cost function, the weights have a direct multiplicative effect on the relevance of every channel. The channel weights, adjusted for example using the statwt task, influence how relevant different channels are for the fitting, in a more gradual way than the fitspec parameter.
Regarding the flags, as a convention \(f_c = 1\) when the flag is set (i.e., the visibility is “flagged”), and \(f_c = 0\) when the flag is not set. Channels whose flags are set are excluded from the set of data points fitted, \((\nu_c, \mathrm{Re}(d_c))\); in this sense, a set flag is equivalent to excluding a channel from fitspec.
The \(\chi^2\) values defined above are the metric used to estimate the goodness of fit of the polynomials fitted. uvcontsub returns a dictionary with the \(\chi^2\) values grouped and aggregated by field, scan, spw, polarization and real and imaginary part.
Using uvcontsub_old¶
The inputs to uvcontsub_old are:
# uvcontsub_old -- Continuum fitting and subtraction in the uv plane
vis = '' # Name of input MS. Output goes to vis + ".contsub" (will be overwritten if already exists)
field = '' # Select field(s) using id(s) or name(s)
fitspw = '' # Spectral window:channel selection for fitting the continuum
combine = '' # Data axes to combine for the continuum estimation (none, or spw and/or scan)
solint = 'int' # Continuum fit timescale (int recommended!)
fitorder = 0 # Polynomial order for the fits
spw = '' # Spectral window selection for output
want_cont = False # Create vis + ".cont" to hold the continuum estimate.
For each baseline, and over the timescale specified in solint, uvcontsub_old will fit a polynomial to the real and imaginary parts of the (continuum-only) channels specified in fitspw (using the standard spw selection syntax), and then subtract this model from all channels specified in spw, or from all channels in the spectral windows of fitspw if spw=’’. By setting the subparameter excludechans=True, the channel selection in fitspw will be inverted; in that case one can select the line channels themselves and/or corrupted channels that should not be used in the continuum fit. Note that for MeasurementSets with multiple fields, if different fitspw specifications are needed for different fields, only one field can be processed at a time in a single uvcontsub_old call. fitspw can also take frequency ranges, e.g.
fitspw='*:113.767~114.528GHz;114.744~115.447GHz'
where ‘*’ indicates all spws. Usually, one should set solint=’int’, which does no averaging and fits each integration. However, if the continuum emission comes from a small region around the phase center and fitorder = 0, then you can set solint larger (as long as it is shorter than the timescale for changes in the visibility function of the continuum). If your scans are short enough, you can also use scan averaging with combine=’scan’ and solint=’inf’. Be warned: setting solint too large will introduce “time smearing” in the estimated continuum and thus not properly subtract emission away from the phase center. Increasing solint speeds up the calculation but does not improve the overall result quality of uvcontsub_old: although the continuum estimates of each baseline may be noisy (just like each visibility in a continuum MS may be noisy), it is better to use the ensemble of individual fits than to average the ensemble before fitting. Note that plotms can do time and baseline averaging on the fly to help you examine noisy data.
uvcontsub_old will append “.contsub” for the continuum-subtracted MS and “.cont” if want_cont=True. Although the continuum model is available with the latter parameter, we recommend using line-free channels for creating continuum images: interpolation across the line channels will not gain signal-to-noise but may introduce noise or model residuals. Because the continuum model is necessarily a smoothed fit, images made with the .cont MS are liable to have their field of view reduced in some strange way. Images of the continuum should be made by simply excluding the line channels (and probably averaging the remaining ones) in tclean.
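For reference, a typical uvcontsub_old call corresponding to the recommendations above might look as follows (a minimal sketch; the field and channel ranges are hypothetical):
from casatasks import uvcontsub_old
# Fit and subtract a 0th-order continuum per integration, using the
# line-free channel ranges of spw 0; output goes to 'mydata.ms.contsub'
uvcontsub_old(vis='mydata.ms', field='2', fitspw='0:4~50;120~164',
              solint='int', fitorder=0, want_cont=False)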
Using mstransform¶
mstransform supports subtracting the continuum in the UV domain using a polynomial fit along the spectral channels. This option is deprecated and will be removed in an upcoming release. The transformation can be stacked with the other transformations supported by mstransform. To activate continuum subtraction in mstransform, the option douvcontsub must be set:
douvcontsub = True #Enable continuum subtraction as in task uvcontsub
The most relevant parameter for fitting the continuum is fitspw, which allows one to select the channels that are assumed free of lines and therefore best represent the continuum. The syntax of this parameter is similar to the usual spw selection syntax. For instance
fitspw='19:5~50;100~220,20:1~100'
will use channels 5 to 50 and 100 to 220 when computing the continuum of spw 19. For spw 20 it will use channels 1 to 100.
There is currently no support for fitting the continuum over several spws at the same time. Use the uvcontsub_old task if you need that functionality.
The output MS will contain the continuum-subtracted signal. If, on the other hand, one is interested in the fitted continuum itself, then the parameter want_cont should be set to True. Note that in this case, if other transformations are enabled in mstransform, the subsequent transformations will operate on the fitted continuum data.
The algorithm implemented by mstransform can reject outliers in the fit by iterating. After the first fit has been obtained, the absolute residuals of each point with respect to the fit are computed and used to re-weight the data for the next iteration, so that outliers are given less and less weight in each iteration. To enable this feature, set the parameter niter to a value larger than 1.
niter = 1 #Number of iterations for re-weighted linear fit
Additionally, one can control the order of the polynomial fit using the parameter fitorder:
fitorder = 0 #Polynomial order for the fits
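Putting these parameters together, a (deprecated) continuum subtraction within mstransform might look like the following sketch, with hypothetical MS names and channel ranges:
from casatasks import mstransform
# Continuum subtraction stacked into mstransform (deprecated)
mstransform(vis='mydata.ms', outputvis='mydata_contsub.ms',
            datacolumn='data',
            douvcontsub=True,
            fitspw='19:5~50;100~220,20:1~100',  # line-free channels
            fitorder=1,                         # polynomial order
            niter=2)                            # re-weighted iterations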
Subtract/Add Model Visibilities¶
uvsub is a simple task that allows one to subtract or add the MODEL_DATA column to the CORRECTED_DATA column of a given MeasurementSet. It has only 2 parameters: vis and reverse.
If the CORRECTED_DATA column does not exist then it will be created first and the DATA column will be copied into it before the addition/subtraction of the MODEL_DATA is performed.
The MODEL_DATA column can either be the scratch column or a virtual one; either one will work with uvsub. The model visibilities are usually populated by the tasks clean/tclean, ft, and setjy.
Note that uvsub does the subtraction over the whole MS. If only a subsection of the MS (say, a field or spw selection while using clean or ft) was used in the tasks that populate the model visibilities, then the uvsub operation will give the expected results for only those parts. The remainder of the MS will have whatever existed originally in MODEL_DATA added to/subtracted from its CORRECTED_DATA. On initialization, the model visibilities are 1 for parallel hands and 0 for cross hands.
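A minimal usage sketch (the MS name is hypothetical):
from casatasks import uvsub
# Subtract MODEL_DATA from CORRECTED_DATA
uvsub(vis='mydata.ms')
# Add it back (undo the subtraction)
uvsub(vis='mydata.ms', reverse=True)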
Fit Gaussians to Visibilities¶
UV-Domain Model Fitting (uvmodelfit)
It is often desirable to fit simple analytic source component models directly to visibility data. Such fitting has its origins in early interferometry, especially VLBI, where arrays consisted of only a few antennas and the calibration and deconvolution problems were poorly constrained. These methods overcame the calibration uncertainties by fitting the models to calibration-independent closure quantities and the deconvolution problem by drastically limiting the number of free parameters required to describe the visibilities. Today, even with larger and better calibrated arrays, it is still desirable to use visibility model fitting in order to extract geometric properties such as the positions and sizes of discrete components in radio sources. Fits for physically meaningful component shapes such as disks, rings, and optically-thin spheres, though idealized, enable connecting source geometry directly to the physics of the emission regions.
Visibility model fitting is carried out by the uvmodelfit task. The inputs are:
#uvmodelfit :: Fit a single component source model to the uv data:
vis = '' #Name of input visibility file
field = '' #field name or index
spw = '' #spectral window
selectdata = False #Activate data selection details
niter = 5 #Number of fitting iterations to execute
comptype = 'P' #Component type (P=pt source,G=ell. gauss,D=ell. disk)
sourcepar = [1, 0, 0] #Starting guess (flux,xoff,yoff,bmajaxrat,bpa)
varypar = [] #Which parameters can vary in fit
outfile = '' #Optional output component list table
ALERT: This task currently only fits a single component. For multiple, arbitrary shaped component fitting, we refer to the uvmultifit [1] software that was developed by the Nordic ALMA Regional Center Node.
The user specifies the number of non-linear solution iterations (niter), the component type (comptype), an initial guess for the component parameters (sourcepar), and optionally, a vector of Booleans selecting which component parameters should be allowed to vary (varypar), and a filename in which to store a CASA component list for use in other applications (outfile). Allowed comptypes are currently point ‘P’ or Gaussian ‘G’.
The function returns a vector containing the resulting parameter list. This vector can be edited at the command line, and specified as input (sourcepar) for another round of fitting.
The sourcepar parameter is currently the only way to specify the starting inputs for the fit. For points, there are three inputs: I (total flux density) and the relative direction (RA, Dec) offsets (in arcsec) from the observation’s phase center. For Gaussians, there are three additional inputs: the Gaussian’s semi-major axis width (arcsec), the aspect ratio, and the position angle (degrees). It should be understood that the quality of the result is very sensitive to the starting inputs provided by the user. If this first guess is not sufficiently close to the global \(\chi^2\) minimum, the algorithm will happily converge to an incorrect local minimum. In fact, the \(\chi^2\) surface, as a function of the component’s relative direction inputs, has a shape very much like the inverse of the absolute value of the dirty image of the field. Any peak in this image (positive or negative) corresponds to a local \(\chi^2\) minimum that could conceivably capture the fit. It is the user’s responsibility to ensure that the correct minimum does the capturing.
Currently, uvmodelfit relies on the likelihood that the source is very near the phase center (within a beamwidth) and/or the user’s savvy in specifying the starting parameters. This fairly serious constraint will soon be relieved somewhat by enabling a rudimentary form of uv-domain weighting to increase the likelihood that the starting guess is on a slope in the correct \(\chi^2\) valley.
Improvements in the works for visibility model fitting include:
User-specifiable uv-domain weighting
Additional component shapes, including elliptical disks, rings, and optically thin spheroids.
Optional calibration pre-application
Multiple components. The handling of more than one component depends mostly on efficient means of managing the list itself (not easy in command line options), which are currently under development.
Combined component and calibration fitting.
Example (see Figure 1):
# Note: it is best to channel-average the data (if there are many channels) before running a model fit
split('ngc5921.ms','1445+099_avg.ms', datacolumn='corrected',field='1445*',width='63')
#Initial guess is that it's close to the phase center
#and has a flux of 2.0 (a priori we know it's 2.47)
uvmodelfit('1445+099_avg.ms', #use averaged data
niter=5, #Do 5 iterations
comptype='P', #P=Point source, G=Gaussian, D=Disk
sourcepar=[2.0,.1,.1], #Source parameters for a point source
spw='0',
outfile='gcal.cl') #Output component list file
#Output looks like:
There are 19656 - 3 = 19653 degrees of freedom.
iter=0: reduced chi2=0.0418509: I=2, dir=[0.1, 0.1] arcsec
iter=1: reduced chi2=0.003382: I=2.48562, dir=[-0.020069, -0.0268826] arcsec
iter=2: reduced chi2=0.00338012: I=2.48614, dir=[0.00323428, -0.00232235] arcsec
iter=3: reduced chi2=0.00338012: I=2.48614, dir=[0.00325324, -0.00228963] arcsec
iter=4: reduced chi2=0.00338012: I=2.48614, dir=[0.00325324, -0.00228963] arcsec
iter=5: reduced chi2=0.00338012: I=2.48614, dir=[0.00325324, -0.00228963] arcsec
If data weights are arbitrarily scaled, the following formal errors
will be underestimated by at least a factor sqrt(reduced chi2). If
the fit is systematically poor, the errors are much worse.
I = 2.48614 +/- 0.0176859
x = 0.00325324 +/- 0.163019 arcsec
y = -0.00228963 +/- 0.174458 arcsec
Writing componentlist to file: /home/sandrock/smyers/Testing/Patch2/N5921/gcal.cl
#Fourier transform the component list to a model of the MS
ft('1445+099_avg.ms', complist='gcal.cl')
#Plot data versus uv-distance
plotms(vis='1445+099_avg.ms', xaxis='uvdist', ydatacolumn='corrected')
#Plot model data versus uv-distance
plotms(vis='1445+099_avg.ms', xaxis='uvdist', ydatacolumn='model')
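For a Gaussian component, sourcepar takes six values, and varypar can hold selected parameters fixed. A hedged sketch continuing the example above, with hypothetical starting values:
# sourcepar = [flux, xoff, yoff, majax (arcsec), axratio, posang (deg)]
uvmodelfit('1445+099_avg.ms', niter=10, comptype='G',
           sourcepar=[2.5, 0.0, 0.0, 0.5, 0.8, 0.0],
           varypar=[True, True, True, True, True, False],  # hold posang fixed
           outfile='gauss.cl')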
The Nordic ALMA ARC node maintains the UVMULTIFIT package, which is based on CASA and provides additional, powerful tools for visibility modelling. See the Nordic ARC software page and Marti-Vidal et al. (2014) [1] for details.
Figure 1: Plot visualizing the corrected data (red and blue points) and the uv model fit (green circles). This particular plot was made using plotxy, which was deprecated in CASA 5.1; use plotms instead.
Bibliography
Marti-Vidal, I., Vlemmings, W. H. T., Muller, S., & Casey, S. 2014, A&A, 563, A136 (ADS, arXiv:1401.4984)
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/synthesis_calibration.ipynb
Synthesis Calibration¶
This chapter explains how to calibrate interferometer data within the CASA task system. Calibration is the process of determining the net complex correction factors that must be applied to each visibility in order to make them as close as possible to what an idealized interferometer would measure, such that when the data are imaged an accurate picture of the sky is obtained. This is not an arbitrary process, and there is a philosophy behind the CASA calibration methodology. For the most part, calibration in CASA using the tasks is not too different from calibration in other packages such as AIPS or Miriad.
Calibration tasks¶
Alert: The calibration table format changed in CASA 3.4. CASA 4.2 is the last version that will support the caltabconvert function that provides conversions from the pre-3.4 caltable format to the modern format; it will be removed for CASA 4.3. In general, it is best to recalculate calibration using CASA 3.4 or later.
Alert: In CASA 4.2 the gaincurve and opacity parameters have been removed from all calibration tasks (as advertised in 4.1). These calibration types are supported via the gencal task.
Alert: As part of continuing development of a more flexible and improved interface for specifying calibration for apply, a new parameter has been introduced in applycal and the solving tasks: docallib. This parameter toggles between use of the traditional calibration apply parameters (gaintable, gainfield, interp, spwmap, and calwt) and a new callib parameter, which currently provides access to the experimental Cal Library mechanism, wherein calibration instructions are stored in a file. The default remains docallib=False in CASA 4.5, which reveals the traditional apply parameters; these continue to work as always, and the remainder of this chapter is written using docallib=False. Users interested in the Cal Library mechanism’s flexibility are encouraged to try it and report any problems; see here for information on how to use it, including how to convert traditional applycal to Cal Library format. Note also that plotms and mstransform now support use of the Cal Library to enable on-the-fly calibration when plotting and generating new MSs.
The standard set of calibration solving tasks (to produce calibration tables) are:
bandpass — complex bandpass (B) calibration solving, including options for channel-binned or polynomial solutions
gaincal — complex gain (G,T) and delay (K) calibration solving, including options for time-binned or spline solutions
fringefit — fringe-fitting solutions (usually for VLBI) that parameterize phase calibration with phase, delay, rate, and a dispersive term
polcal — polarization calibration including leakage, cross-hand phase, and position angle
blcal — baseline-based complex gain or bandpass calibration
There are helper tasks to create, manipulate, and explore calibration tables:
applycal — Apply calculated calibration solutions
clearcal — Re-initialize the calibration for a visibility dataset
fluxscale — Bootstrap the flux density scale from standard calibration sources
listcal — List calibration solutions
plotms — Plot calibration solutions
plotbandpass — Plot bandpass solutions
setjy — Compute model visibilities with the correct flux density for a specified source
smoothcal — Smooth calibration solutions derived from one or more sources
calstat — Statistics of calibration solutions
gencal — Create a calibration table from metadata such as antenna position offsets, gain curves, and opacities
wvrgcal — Generate a gain table based on Water Vapor Radiometer data (for ALMA)
uvcontsub — Carry out uv-plane continuum fitting and subtraction
The Calibration Process¶
A work-flow diagram for CASA calibration of interferometry data is shown in the following figure. This should help you chart your course through the complex set of calibration steps. In the following sections, we will detail the steps themselves and explain how to run the necessary tasks and tools.
Flow chart of synthesis calibration operations. Not shown is the use of the table manipulation and plotting tasks plotms and smoothcal.
The process can be broken down into a number of discrete phases:
Calibrator Model Visibility Specification — set model visibilities for calibrators, either unit point source visibilities for calibrators with unknown flux density or structure (generally, sources used for calibrators are approximately point-like), or visibilities derived from a priori images and/or known or standard flux density values. Use the setjy task for calibrator flux densities and models.
Prior Calibration — set up previously known calibration quantities that need to be pre-applied, such as antenna gain-elevation curves, atmospheric models, delays, and antenna position offsets. Use the gencal task for antenna position offsets, gaincurves, antenna efficiencies, opacity, and other prior calibrations.
Bandpass Calibration — solve for the relative gain of the system over the frequency channels in the dataset (if needed), having pre-applied the prior calibration. Use the bandpass task
Gain Calibration — solve for the gain variations of the system as a function of time, having pre-applied the bandpass (if needed) and prior calibration. Use the gaincal task
Polarization Calibration — solve for polarization leakage terms and linear polarization position angle. Use the polcal task.
Establish Flux Density Scale — if only some of the calibrators have known flux densities, then rescale gain solutions and derive flux densities of secondary calibrators. Use the fluxscale task
Smooth — if necessary, smooth the calibration using the smoothcal task.
Examine Calibration — at any point, you can (and should) use plotms and/or listcal to look at the calibration tables that you have created
Apply Calibration to the Data — Corrected data is formed using the applycal task, and can be undone using clearcal
Post-Calibration Activities — this includes the determination and subtraction of continuum signal from line data (uvcontsub), the splitting of data-sets into subsets (split, mstransform), and other operations (such as simple model-fitting: uvmodelfit).
The flow chart and the above list are in a suggested order. However, the actual order in which you will carry out these operations is somewhat fluid, and will be determined by the specific data-reduction use cases you are following. For example, you may need to obtain an initial gain calibration on your bandpass calibrator before moving to the bandpass calibration stage. Or perhaps the polarization leakage calibration will be known from prior service observations, and can be applied as a constituent of prior calibration.
Calibration Philosophy¶
Calibration is not an arbitrary process, and there is a methodology that has been developed to carry out synthesis calibration and an algebra to describe the various corruptions that data might be subject to: the Hamaker-Bregman-Sault Measurement Equation (ME), described here. The user need not worry about the details of this mathematics as the CASA software does that for you. Anyway, it’s just matrix algebra, and your familiar scalar methods of calibration (such as in AIPS) are encompassed in this more general approach.
There are a number of “physical” components to calibration in CASA:
data — in the form of the MeasurementSet (MS). The MS includes a number of columns that can hold calibrated data, model information, and weights
calibration tables — these are in the form of standard CASA tables, and hold the calibration solutions (or parameterizations thereof)
task parameters — sometimes the calibration information is in the form of CASA task parameters that tell the calibration tasks to turn on or off various features, contain important values (such as flux densities), or list what should be done to the data.
At its most basic level, calibration in CASA is the process of taking “uncalibrated” data, setting up the operation of calibration tasks using task parameters, solving for new calibration tables, and then applying the calibration tables to form “calibrated” data. Iteration can occur as necessary, e.g., to re-solve for an earlier calibration table using a better set of prior calibration, often with the aid of other non-calibration steps (e.g. imaging to generate improved source models for “self-calibration”).
The calibration tables are the currency that is exchanged between the calibration tasks. The “solver” tasks (gaincal, bandpass, blcal, polcal) take in the MS (which may have a calibration model attached) and previous calibration tables, and will output an “incremental” calibration table (it is incremental to the previous calibration, if any). This table can then be smoothed using smoothcal if desired.
The final set of calibration tables represents the cumulative calibration and is what is applied to correct the data using applycal. It is important to keep track of each calibration table and its role relative to others. E.g., a provisional gain calibration solution will usually be obtained to optimize a bandpass calibration solve, but then be discarded in favor of a new gain calibration solution that will itself be optimized by use of the bandpass solution as a prior; the original gain calibration table should be discarded in this case. On the other hand, it is also permitted to generate a sequence of gain calibration tables, each relative to the last (and any other prior calibration used); in this case all relative tables should be carried forward through the process and included in the final applycal. It is the user’s responsibility to keep track of the role of and relationships between all calibration tables. Depending on the complexity of the observation, this can be a confusing business, and it will help if you adopt a consistent table naming scheme. In general, it is desirable to minimize the number of different calibration tables of a specific type, to keep the overall process as simple as possible and minimize the computational cost of applying them, but relative calibration tables may sometimes be useful as an aid to understanding the origin and properties of the calibration effects. For example, it may be instructive to obtain a short time-scale gain calibration relative to a long time-scale one (e.g., obtained from a single scan) to approximately separate electronic and atmospheric effects. Of course, calibration tables of different types are necessarily relative to each other (in the order in which they are solved).
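To make this exchange of tables concrete, the following is a minimal sketch of the provisional-then-final pattern described above (the field ids, reference antenna, and table names are hypothetical):
from casatasks import gaincal, bandpass, applycal
# Provisional phase-only gains on the bandpass calibrator
gaincal(vis='mydata.ms', caltable='prov.gcal', field='0',
        refant='ea05', solint='int', calmode='p')
# Bandpass solve, pre-applying the provisional gains
bandpass(vis='mydata.ms', caltable='final.bcal', field='0',
         refant='ea05', solint='inf', gaintable=['prov.gcal'])
# New gain solve, pre-applying the bandpass; prov.gcal is now discarded
gaincal(vis='mydata.ms', caltable='final.gcal', field='0,1',
        refant='ea05', solint='inf', gaintable=['final.bcal'])
# Apply the cumulative calibration to the target
applycal(vis='mydata.ms', field='2',
         gaintable=['final.bcal', 'final.gcal'])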
Preparing for Calibration¶
A description of the range of prior information necessary to solve for calibration
There is a range of a priori information that may need to be initialized or estimated before calibration solving is carried out. This includes establishing prior information about the data within the MS:
weight initialization — if desired, initialization of spectral weights, using initweight (by default, unchannelized weight accounting is used, and no special action is required)
flux density models — establish the flux density scale using “standard” calibrator sources, with models for resolved calibrators, using setjy, as well as deriving various prior calibration quantities using various modes of gencal
gain curves — the antenna gain-elevation dependence
atmospheric optical depth — attenuation of the signal by the atmosphere, including correcting for its elevation dependence
antenna position errors — offsets in the positions of antennas assumed during correlation
ionosphere — dispersive delay and Faraday effects arising from signal transmission through the magnetized plasma of the ionosphere
switched power (EVLA) — electronic gains monitored by the EVLA online system
system temperature (ALMA) — turn correlation coefficient into correlated flux density (necessary for some telescopes)
generic cal factors — antenna-based amp, phase, delay
These are all pre-determined effects and should be applied (if known) as priors when solving for other calibration terms, and included in the final application of all calibration. If unknown, then they will be solved for or subsumed in other calibration such as bandpass or gains.
Each of these will now be described in turn.
Weight Initialization¶
See the section on data weights for a more complete description of weight accounting in CASA.
CASA 4.3 introduced initial experimental support for spectral weights. At this time, this is mainly relevant to ALMA processing for which spectral \(T_{sys}\) corrections, which faithfully reflect spectral sensitivity, are available. In most other cases, sensitivity is, to a very good approximation, channel-independent after bandpass calibration (and often also before), except perhaps at the very edges of spectral windows (and for which analytic expressions of the sensitivity loss are generally unavailable). Averaging of data with channel-dependent flagging which varies on sufficiently short timescales will also generate channel-dependent net weights (see split or mstransform for more details).
By default, CASA’s weight accounting scheme maintains unchannelized weight information that is appropriately updated when calibration is applied. In the case of spectral calibrations (\(T_{sys}\) and bandpass), an appropriate spectral average is used for the weight update. This spectral average is formally correct for weight update by bandpass. For \(T_{sys}\), traditional treatments used a single measurement per spectral window; ALMA has implemented spectral \(T_{sys}\) to better track sensitivity as a function of channel, and so should benefit from spectral weight accounting as described here, especially where atmospheric emission lines occur. If spectral weight accounting is desired, users must re-initialize the spectral weights using the initweights task:
initweights(vis='mydata.ms', wtmode='nyq', dowtsp=True)
In this task, the wtmode parameter controls the weight initialization convention. Usually, when initializing the weight information for a raw dataset, one should choose wtmode=’nyq’ so that the channel bandwidth and integration time information are used to initialize the weight information (as described here). The dowtsp parameter controls whether or not (True or False) the spectral weights (the WEIGHT_SPECTRUM column) are initialized. The default is dowtsp=False, wherein only the non-spectral weights (the WEIGHT column) will be initialized. If the spectral weights have been initialized, then downstream processing that supports spectral weights will use and update them.
Note that importasdm currently initializes the non-spectral weights using channel bandwidth and integration time information (equivalent to using dowtsp=False in the above example). In general, it only makes sense to run initweights on a raw dataset which has not yet been calibrated, and it should only be necessary if the filled weights are inappropriate, or if spectral weight accounting is desired in subsequent processing. It is usually not necessary to re-initialize the weight information when redoing calibration from scratch (the raw weight information is preserved in the SIGMA/SIGMA_SPECTRUM columns). (Re-)initializing the weight information for data that has already been calibrated (with calwt=True, presumably) is formally incorrect and is not recommended.
When combining datasets from different epochs, it is generally preferable to have used the same version of CASA (most recent is best), and with the same weight information conventions and calwt settings in calibration tasks. Doing so will minimize the likelihood of arbitrary weight imbalances that might lead to net loss of sensitivity, and maximize the likelihood that real differences in per-epoch sensitivity (e.g., due to different weather conditions and instrumental setups) will be properly accounted for. Modern instruments support more variety in bandwidth and integration time settings, and so use of these parameters in weight initialization is preferred (c.f. use of simple unit weight initialization, which has often been the traditional practice).
Alert: Full and proper weight accounting for the EVLA formally depends on the veracity of the switched power calibration scheme. As of mid-2015, use of the EVLA switched power is not yet recommended for general use, and otherwise uniform weights are carried through the calibration process. As such, spectral weight accounting is not yet meaningful. Facilities for post-calibration estimation of spectral weights are rudimentarily supported in statwt.
Flux Density Models¶
It is necessary to be sure calibrators have appropriate models set for them before solving for calibration. Please see the task documentation for setjy and ft for more information on setting non-trivial model information in the MS. Also, information about setting models for flux density calibrators can be found here. Fields in the MS for which no model has been explicitly set will be rendered as unpolarized unit flux density (1 Jy) point sources in calibration solving.
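For example, a simple setjy call for a standard flux density calibrator might look like this sketch (the MS and field names are hypothetical; choose the flux standard appropriate to your data):
from casatasks import setjy
# Set the model of the amplitude calibrator from a standard flux scale
setjy(vis='mydata.ms', field='1331+305', standard='Perley-Butler 2017')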
Antenna Gain-Elevation Curve Calibration¶
Large antennas (such as the 25-meter antennas used in the VLA and VLBA) have a forward gain and efficiency that changes with elevation. Gain curve calibration involves compensating for the effects of elevation on the amplitude of the received signals at each antenna. Antennas are not absolutely rigid, and so their effective collecting area and net surface accuracy vary with elevation as gravity deforms the surface. This calibration is especially important at higher frequencies where the deformations represent a greater fraction of the observing wavelength. By design, this effect is usually minimized (i.e., gain maximized) for elevations between 45 and 60 degrees, with the gain decreasing at higher and lower elevations. Gain curves are most often described as 2nd- or 3rd-order polynomials in zenith angle.
Gain curve calibration has been implemented in CASA for the modern VLA and old VLA (only), with gain curve polynomial coefficients available directly from the CASA data repository. To make gain curve and antenna efficiency corrections for VLA data, use gencal:
gencal(vis='mydata.ms', caltable='gaincurve.cal', caltype='gceff')
Use of caltype=’gceff’ generates a caltable that corrects for both the elevation dependence and an antenna-based efficiency unit conversion that will render the data in units of approximate Jy (NB: this is generally not a good substitute for proper flux density calibration, using fluxscale!). Use of caltype=’gc’ or caltype=’eff’ can be used to introduce these corrections separately.
The resulting calibration table should then be used in all subsequent processing that requires the specification of prior calibration.
Alert: If you are not using VLA data, do not use gaincurve corrections. A general mechanism for incorporating gaincurve information for other arrays will be made available in future releases. The gain-curve information available for the VLA is time-dependent (on timescales of months to years, at least for the higher frequencies), and CASA will automatically select the date-appropriate gain curve information. Note, however, that the time-dependence was poorly sampled prior to 2001, and so gain curve corrections prior to this time should be considered with caution.
Atmospheric Optical Depth Correction¶
The troposphere is not completely transparent. At high radio frequencies (>15 GHz), water vapor and molecular oxygen begin to have a substantial effect on radio observations. According to the physics of radiative transmission, the effect is threefold. First, radio waves from astronomical sources are absorbed (and therefore attenuated) before reaching the antenna. Second, since a good absorber is also a good emitter, significant noise-like power is added to the overall system noise, further decreasing the fraction of correlated signal from astrophysical sources. Finally, the optical path length through the troposphere introduces a time-dependent phase error. In all cases, the effects become worse at lower elevations due to the increased air mass through which the antenna is looking. In CASA, the opacity correction described here compensates only for the first of these effects, tropospheric attenuation, using a plane-parallel approximation for the troposphere to estimate the elevation dependence. (Gain solutions solved for later will account for the other two effects.)
To make opacity corrections in CASA, an estimate of the zenith opacity is required (see observatory-specific chapters for how to measure zenith opacity). This estimate is then supplied to gencal via caltype=’opac’ and the parameter argument, which creates a calibration table that will introduce the elevation-dependent correction when applied in later operations. E.g., for data with two spectral windows:
gencal(vis='mydata.ms',
caltable='opacity.cal',
caltype='opac',
spw='0,1',
parameter=[0.0399,0.037])
If you do not have an externally supplied value for opacity, for example from a VLA tip procedure, then you should either use an average value for the telescope, or omit this cal table and let your gain calibration compensate as best it can (e.g., if your calibrator is at about the same elevation as your target at approximately the same time). As noted above, there are no facilities yet to estimate this from the data (e.g. by plotting \(T_{sys}\) vs. elevation).
The resulting calibration table should then be used in all subsequent processing that requires the specification of prior calibration.
Below, we give instructions for determining opacity values for Jansky VLA data from weather statistics and VLA observations where tip-curve data is available. It is beyond the scope of this description to provide information for other telescopes.
Determining opacity corrections for modern VLA data
For the VLA site, weather statistics and/or seasonal models that average over many years of weather statistics prove to be reasonably good ways to estimate the opacity at the time of the observations. The task plotweather calculates the opacity as a mix of both actual weather data and a seasonal model. It can be run as follows:
myTau=plotweather(vis='mydata.ms',doPlot=True)
The task plots the weather statistics if doPlot=True, generating a plot shown in the figure below. The bottom panel displays the calculated opacities for the run as well as a seasonal model. An additional parameter, seasonal_weight, can be adjusted to calculate the opacities as a function of the weather data alone (seasonal_weight=0), only the seasonal model (seasonal_weight=1), or a mix of the two (values between 0 and 1). Calculated opacities are shown in the logger output, one for each spectral window. Note that plotweather returns a python list of opacity values with length equal to the number of spectral windows in the MS, appropriate for use in gencal:
gencal(vis='mydata.ms', caltype='opac', spw='0,1', parameter=myTau)
Note that the spw parameter is used non-trivially and explicitly here to indicate that the list of opacity values corresponds to the specified spectral windows.
The resulting calibration table should then be used in all subsequent processing that requires the specification of prior calibration.
The weather information for an MS as plotted by the task plotweather.
Determining opacity corrections for historical VLA data
For VLA data, zenith opacity can be measured at the frequency and during the time observations are made using a VLA tipping scan in the observe file. Historical tipping data are available here. Choose a year, and click Go to get a list of all tipping scans that have been made for that year.
If a tipping scan was made for your observation, then select the appropriate file. Go to the bottom of the page and click on the button that says Press here to continue. The results of the tipping scan will be displayed. Go to the section called ‘Overall Fit Summary’ to find the fit quality and the fitted zenith opacity in percent. If the zenith opacity is reported as 6%, then the actual zenith optical depth value is 0.060. Use this value in gencal as described above.
If there were no tipping scans made for your observation, then look for others made in the same band around the same time and weather conditions. If nothing is available here, then at K and Q bands you might consider using an average value (e.g. 6% in reasonable weather). See the VLA memo here for more on the atmospheric optical depth correction at the VLA, including plots of the seasonal variations.
Antenna-position corrections¶
When antennas are moved, residual errors in the geographical coordinates of the antenna will cause time-dependent delay errors in the correlated data. Normally, the observatory will solve for these offsets soon after the move and correct the correlator model, but sometimes science data is taken before the offsets are available, and thus the correction must be handled in post-processing. If the 3D position offsets for affected antennas are known, use gencal as follows:
gencal(vis='mydata.ms', caltable='antpos.cal', caltype='antpos', antenna='ea01',
parameter=[0.01,0.02,0.005])
In this execution, the position offset for antenna ea01 is [1cm,2cm,0.5cm] in an Earth-centered right-handed coordinate system with the first axis on the prime meridian and third axis coincident with the Earth’s axis. Corrections for multiple antennas can be specified by listing all affected antennas and extending the parameter list with as many offset triples as needed.
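For instance, offsets for two antennas might be specified as in this sketch (the antenna names and offsets are hypothetical):
# ea01 offset by [1, 2, 0.5] cm and ea05 by [-0.3, 0, 0.8] cm
gencal(vis='mydata.ms', caltable='antpos.cal', caltype='antpos',
       antenna='ea01,ea05',
       parameter=[0.01, 0.02, 0.005, -0.003, 0.0, 0.008])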
In general, it is difficult to know what position offsets to use, of course. For the VLA, gencal will look up the required offsets automatically, simply by omitting the antenna and parameter arguments:
gencal(vis='mydata.ms', caltable='antpos.cal', caltype='antpos')
For the historical VLA, the antenna position coordinate system was a local one translated from the Earth’s center and rotated to the VLA’s longitude. Use caltype=’antposvla’ to force this coordinate system when processing old VLA data.
The resulting calibration table should then be used in all subsequent processing that requires the specification of prior calibration.
Ionospheric corrections¶
CASA 4.3 introduced initial support for on-axis ionospheric corrections, using time- and direction-dependent total electron content (TEC) information obtained from the internet. The correction includes the dispersive delay (\(\propto \nu^{-1}\)) and Faraday rotation (\(\propto \nu^{-2}\)) terms. These corrections are most relevant at observing frequencies less than \(\sim\) 5 GHz. When relevant, the ionosphere correction table should be generated at the beginning of a reduction along with other calibration priors (antenna position errors, gain curve, opacity, etc.), and carried through all subsequent calibration steps. Formally, the idea is that the ionospheric effects (as a function of time and on-axis direction) will be nominally accounted for by this calibration table, and thus not spuriously leak into gain and bandpass solves, etc. In practice, the quality of the ionospheric correction is limited by the relatively sparse sampling (in time and direction) of the available TEC information. Especially active ionospheric conditions may not be corrected very well. Also, direction-dependent (within the instantaneous field-of-view) ionosphere corrections are not yet supported. Various improvements are under study for future releases.
To generate the ionosphere correction table, first import a helper function from the casapy recipes repository:
from casatasks.private import tec_maps
Note that this only works for CASA 6.1.2 or later. For CASA 5, see here.
Then, generate a TEC surface image:
tec_maps.create(vis='mydata.ms',doplot=True,imname='iono')
This function obtains TEC information for the observing date and location from NASA’s CDDIS Archive of Space Geodesy Data, and generates a time-dependent CASA image containing this information. The string specified for imname is used as a prefix for two output images, with suffixes .IGS_TEC.im (the actual TEC image) and .IGS_RMS_TEC.im (a TEC error image). If imname is unspecified, the MS name (from vis) will be used as the prefix.
The quality of the retrieved TEC information for a specific date improves with time after the observing date as CDDIS’s ionospheric modelling improves, becoming optimal 1-2 weeks later. Both images can be viewed as a movie in the CASA task imview. If doplot=True, the above function will also produce a plot of the TEC as a function of time in a vertical direction over the observatory.
Finally, to generate the ionosphere correction caltable, pass the .IGS_TEC.im image into gencal, using caltype=’tecim’:
gencal(vis='mydata.ms',caltable='tec.cal',caltype='tecim',infile='iono.IGS_TEC.im')
This iterates through the dataset and samples the zenith angle-dependent projected line-of-sight TEC for all times in the observation, storing the result in a standard CASA caltable. Plotting this caltable will show how the TEC varies between observing directions for different fields and times, in particular how it changes as zenith angle changes, and including the nominal difference between science targets and calibrators.
This caltable should then be used as a prior in all subsequent calibration solves, and included in the final applycal.
A few warnings:
The TEC information obtained from the web is relatively poorly sampled in time and direction, and so will not always describe the details of the ionospheric corruption, especially during active periods.
For instrumental polarization calibration, it is recommended that an unpolarized calibrator be used; polarized calibrators may not yield as accurate a solution since the ionospheric corrections are not yet used properly in the source polarization portion of the polcal solve.
TEC corrections are only validated for use with VLA data. For data from other (low-frequency) telescopes, TEC corrections are experimental - please use at your own discretion.
Special thanks are due to Jason Kooi (UIowa) for his contributions to ionospheric corrections in CASA.
Switched-power (EVLA)¶
The EVLA is equipped with noise diodes that synchronously inject a nominally constant and known power contribution, appropriate for tracking electronic gain changes with time resolution as short as 1 second. The total power in both the ON and OFF states of the noise diodes is continuously recorded, enabling a gain calibration derived from their difference (as a fraction of the mean total power), scaled by the approximately known contributed power (nominally in K). Including this calibration will render the data in units of (nominal) K, and also calibrate the data weights to units of inverse K\(^2\). To generate a switched-power calibration table for use in subsequent processing, run gencal as follows:
gencal(vis='myVLAdata.ms',caltable='VLAswitchedpower.cal',caltype='evlagain')
The resulting calibration table should then be used in all subsequent processing that requires the specification of prior calibration.
To ensure that the weight calibration by this table works correctly, it is important that the raw data weights are proportional to integration time and channel bandwidth. This can be guaranteed by use of initweights as described above.
System Temperature (ALMA)¶
ALMA routinely measures \(T_{sys}\) while observing, and these measurements are used to reverse the online normalization of the correlation coefficients and render the data in units of nominal K. To generate a \(T_{sys}\) calibration table, run gencal as follows:
gencal(vis='myALMAdata.ms',caltable='ALMAtsys.cal',caltype='tsys')
The resulting calibration table should then be used in all subsequent processing that requires the specification of prior calibration.
Miscellaneous ad hoc corrections¶
The gencal task supports generating ad hoc amp, phase, and delay corrections via appropriate settings of the caltype parameter. Currently, such factors must be constant in time (gencal has no mechanism for specifying multiple timestamps for parameters), but sometimes such corrections can be useful. See the general gencal task documentation for more information on this type of correction.
Virtual Model Visibilities¶
The tasks that generate model visibilities (clean, tclean, ft, and setjy) can either (in most cases) save them in a MODEL_DATA column inside the MeasurementSet (MS), or save a virtual model. In the latter case, the model visibilities are generated on demand when requested, and only the data necessary to generate them (usually the Fourier transform of the model images, or a component list) is stored. More detailed descriptions of the structure of an MS can be found on the CASA Fundamentals pages.
The tasks that can read and make use of the virtual model column include the calibration tasks, the mstransform tasks (including uv subtraction), and plotms.
Advantages of the virtual model column over the real one:
Speed of serving visibilities (in most cases, because calculating model visibilities is faster than disk IO)
Disk space savings (a column the full size of the original data is avoided)
When not to use the virtual model:
When working with time-dependent models (e.g. ephemeris sources) within setjy; use ephemeris source models only with usescratch=True
When the model image size is a significant fraction of the visibility data size (e.g. a large cube from a small data set); virtual model column serving might be slower than the real one
When the user wants to edit the model directly, e.g. via the table tool
When using an FTMachine that does not support virtual model saving when imaging (e.g. AWProjectFT)
Additional Information
When both a physical model column and a virtual model exist, the virtual model is the one served by tasks that use the visbuffer (e.g. calibration tasks)
Use the delmod task to manage your MODEL_DATA column and virtual model (see the sketch after this list)
If model data are written for only a subset of the MS (say the user used field, spw, and/or intent selection in tclean), then the model visibilities will be served properly for that subset; the rest of the MS will have 1 served for parallel-hand visibilities and 0 for cross-hand visibilities. So be careful when doing calibration or uvsub after writing model visibilities for only a subset of the MS (this applies to using the physical scratch column MODEL_DATA too)
The virtual model info is usually written in the SOURCE table of the MS (and in the main table if the SOURCE table does not exist)
FTMachines (or imaging gridding mode) supporting virtual model data are:
GridFT: the standard gridder (including multiterm, multi-field, and cube),
WProjectFT: wide-field w-term imaging (including multiterm, multi-field, and cube),
MosaicFT: mosaic imaging (including multiterm or cube),
ComponentLists
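As referenced in the list above, a minimal delmod sketch for managing the model columns (the MS name is hypothetical):
from casatasks import delmod
# Remove the virtual (on-the-fly) model record but keep any MODEL_DATA
delmod(vis='mydata.ms', otf=True, scr=False)
# Or remove the MODEL_DATA scratch column instead
delmod(vis='mydata.ms', otf=False, scr=True)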
Solve for Calibration¶
The gaincal, bandpass, fringefit, polcal, and blcal tasks actually solve for the unknown calibration parameters from the visibility data obtained on calibrator sources, placing the results in a calibration table. They take as input an MS, and a number of parameters that specify any prior calibration tables to pre-apply before computing the solution, as well as parameters controlling the exact properties of the solving process.
We first discuss the parameters that are in common between many of the calibration tasks. Subsequent sub-sections will discuss the use of each of these solving tasks in more detail.
Common Calibration Solver Parameters
There are a number of parameters that are in common between the calibration solver tasks.
Input/output
The input MeasurementSet and output calibration table are controlled by the following parameters:
vis = '' #Name of input visibility file
caltable = '' #Name of output calibration table
The MS name is specified in vis. If it is highlighted red in the inputs then it does not exist, and the task will not execute. Check the name and path in this case.
The output table name is specified in caltable. Be sure to give a unique name to the output table. If the table exists, then what happens next will depend on the task and the values of other parameters: the task may refuse to execute with a warning that the table already exists, or it may overwrite the solutions in that table, or append to them. Be careful.
Data selection
Data selection is controlled by the following parameters:
field = '' #field names or index of calibrators: ''==>all
spw = '' #spectral window:channels: ''==>all
intent = '' #Select observing intent
selectdata = False #Other data selection parameters
Field and spectral window selection are so often used that we have made these standard parameters: field and spw, respectively. Additionally, intent is included as a standard parameter to enable selection by the scan intents that were specified when the observations were set up and executed. They typically describe what was intended with a specific scan, i.e. a flux or phase calibration, a bandpass, a pointing, an observation of your target, or something else or a combination. The scan intents of your observations are listed in the logger when you run listobs. Minimum matching with wildcards will work, like *BANDPASS*. This is especially useful when multiple intents are attached to scans. Finally, observation is an identifier to distinguish between different observing runs, mainly used for ALMA.
The selectdata parameter expands, revealing a range of other selection sub-parameters:
selectdata = True #data selection parameters
timerange = '' #time range (blank for all)
uvrange = '' #uv range (blank for all)
antenna = '' #antenna/baselines (blank for all)
scan = '' #scan numbers (blank for all)
correlation = '' #correlations (blank for all)
array = '' #(sub)array numbers (blank for all)
observation = '' #Select by observation ID(s)
msselect = '' #MS selection (blank for all)
Note that if selectdata=False these parameters are not used when the task is executed, even if set non-trivially.
Among the most common selectdata=True parameters to use is uvrange, which can be used to exclude longer baselines if the calibrator is resolved, or short baselines if the calibrator contains extended flux not accounted for in the model. The rest of these parameters may be set according to information and values available in the listobs output. Note that all parameters are specified as strings, even if the values to be specified are numbers. See the section on MS Selection for more details on the powerful syntax available for selecting data.
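For example, a gaincal solve restricted to the shorter baselines of a resolved calibrator might look like this sketch (the field, refant, and uvrange values are hypothetical):
from casatasks import gaincal
# Phase-only solve on the calibrator, excluding the longest baselines
gaincal(vis='mydata.ms', caltable='cal.gcal', field='1331+305',
        spw='0~2', selectdata=True, uvrange='<150klambda',
        refant='ea05', solint='int', calmode='p')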
Prior calibration
Calibration tables that have already been determined can be arranged for apply before solving for the new table using the following parameters:
docallib = False #Use traditional cal apply parameters
gaintable = [] #Gain calibration table(s) to apply on the fly
gainfield = [] #Select a subset of calibrators from gaintable(s)
interp = [] #Interpolation mode (in time) to use for each gaintable
spwmap = [] #Spectral windows combinations to form for gaintable(s)
The docallib parameter is a toggle that can be used to select specification of prior calibration using the new “cal library” mechanism (docallib=True) which is described in greater detail here.
When docalib=False, the traditional CASA calibration apply sub-parameters will be used, as listed above.
gaintable
The gaintable parameter takes a string or list of strings giving the names of one or more calibration tables to arrange for application. For example:
gaintable = ['ngc5921.bcal','ngc5921.gcal']
specifies two tables, in this case bandpass and gain calibration tables respectively.
The gainfield, interp, and spwmap parameters key off gaintable, taking single values or lists, with an entry for each corresponding table specified in gaintable. The caltables can be listed in gaintable in any order, without affecting the order in which they are applied to the data (for consistency, this is controlled internally according to the Measurement Equation framework). If non-trivial settings are required for only a subset of the tables listed in gaintable, it can be convenient to specify these tables first in gaintable, include their qualifying settings first in the other parameters, and omit specifications for those tables not needing qualification (sensible defaults will be used for these).
gainfield
The gainfield parameter specifies which field(s) from each respective gaintable to select for apply. This is a list, with each entry a string. The default for an entry means to use all in that table. For example, use
gaintable = ['ngc5921.bcal', 'ngc5921.gcal']
gainfield = [ '1331+305', '1331+305,1445+099']
to specify selection of 1331+305 from ngc5921.bcal and fields 1331+305 and 1445+099 from ngc5921.gcal. Selection of this sort is only needed if avoiding other fields in these caltables is necessary. The field selection used here is the general MS Selection syntax.
In addition, gainfield supports a special value:
gainfield = [ 'nearest' ]
which selects the calibrator that is the spatially closest (in sky coordinates) to each of the selected MS fields specified in the field data selection parameter. Note that the nearest calibrator field is evaluated once per execution and is never dependent on time, spw or any other data meta-axis. This can be useful for running tasks with a number of different sources to be calibrated in a single run, and when this simple proximity notion is applicable. Note that the cal library mechanism provides increased flexibility in this area.
interp
The interp parameter chooses the interpolation scheme to be used when pre-applying the solution in the tables. Interpolation in both time and frequency (for channel-dependent calibrations) are supported. The choices are currently ‘nearest’ and ‘linear’ for time-dependent interpolation, and ‘nearest’, ‘linear’, ‘cubic’, and ‘spline’ for frequency-dependent interpolation. Frequency-dependent interpolation is only relevant for channel-dependent calibration tables (like bandpass) that are undersampled in frequency relative to the data.
‘nearest’ just picks the entry nearest in time or freq to the visibility in question
‘linear’ calibrates each datum with calibration phases and amplitudes linearly interpolated from neighboring values in time or frequency. In the case of phase, this mode will assume that phase never jumps more than 180 degrees between neighboring points, and so phase changes exceeding this between calibration solutions cannot be corrected for. Also, solutions will not be extrapolated arbitrarily in time or frequency for data before the first solution or after the last solution; such data will be calibrated using nearest to avoid unreasonable extrapolations.
‘cubic’ (frequency axis only) forms a 3rd-order polynomial that passes through the nearest 4 calibration samples (separately in phase and amplitude)
‘spline’ (frequency axis only) forms a cubic spline that passes through the nearest 4 calibration samples (separately in phase and amplitude)
The time-dependent interp options can be appended with ‘PD’ to enable a “phase delay” correction per spw for a non-channel-dependent calibration type. For example: ‘linearPD’. This will adjust the time-dependent phase by the ratio of the data frequency and solution frequency, effecting a time-dependent delay-like calibration over spws. It is most useful when distributing a single spw’s solution (e.g., as might be generated by combine=’spw’ in gaincal) over many data spws, and when the residual being calibrated is non-dispersively delay-like.
The time-dependent interp options can also be appended with ‘perobs’ to enforce observation Id boundaries in the interpolation.
The frequency-dependent interp options can be appended with ‘flag’ to enforce channel-dependent flagging by flagged bandpass channels (i.e., ‘nearestflag’, ‘linearflag’, ‘cubicflag’, and ‘splineflag’), rather than automatically filling such channels in with interpolation (the default).
For each gaintable, specify the interpolation style in quotes, with the frequency-dependent interpolation style specified after a comma, if relevant. For example:
gaintable = ['ngc5921.bcal', 'ngc5921.gcal']
gainfield = ['1331+305', ['1331+305','1445+099'] ]
interp = ['linear,spline', 'linear']
uses linear interpolation on the time axis for both cal tables, and a cubic spline for interpolation of the frequency axis in the bandpass table.
spwmap
The spwmap parameter is used to redistribute the calibration available in a caltable flexibly among spectral windows, thereby permitting correction of some spectral windows using calibration derived from others. The spwmap parameter takes a list or a list of lists of integers, with one list of integers for every caltable specified in gaintable. Each list is indexed by the MS spectral window ids, and the values indicate the calibration spectral windows to use for each MS spectral window. I.e., for each MS spw, i, the calibration spw j will be j=spwmap[i].
The default for spwmap (an empty list per gaintable) means that MS spectral windows will be calibrated by solutions identified with the same index in the calibration table (i.e., by themselves, typically). Explicit specification of the default would be spwmap=[0,1,2,3], for an MS with four spectral windows. Less trivially, for a caltable containing solutions derived from and labelled as spectral windows 0 and 1, these two cal spectral windows can be mapped to any of the MS spectral windows. E.g., (for a single gaintable):
spwmap=[0,1,1,0] #apply from cal spw=0 to MS spws 0,3 and from cal spw 1 to MS spws 1,2
For multiple gaintables, use a list of lists (one spwmap list per gaintable), e.g.,
gaintable = ['ngc5921.bcal', 'ngc5921.gcal']
gainfield = ['1331+305', ['1331+305','1445+099'] ]
interp = ['linear,spline', 'linear']
spwmap = [ [0,1,1,0], [2,3,2,3] ]
which will use bandpass spws 0 and 1 for MS spws (0,3), and (1,2), respectively, and gain spws 2 and 3 for MS spws (0,2) and (1,3), respectively.
Any spectral window mapping is mechanically valid, including using specific calibration spectral windows for more than one different MS spectral window (as above) and using alternate calibration even for spectral windows for which calibration is nominally available, as long as the mapped calibration spectral windows have calibration solutions available in the caltable. If a mapped calibration spectral window is absent from the caltable (and not merely flagged), an exception will occur.
The scientific meaningfulness of a non-trivial spwmap specification is the responsibility of the user; no internal checks are performed to assess the scientific validity of the mapping. Usually, spwmap is used to distribute calibration such as Tsys, which may be measured in a wide low-resolution spectral window, to narrow high-resolution spectral windows that fall within the wide one. It is also used to distribute calibration derived from a gaincal solve performed with combine='spw' (e.g., for increased SNR) to each of the spectral windows (and perhaps others) aggregated in the solve; in this case, it may be useful to consider the 'PD' ("phase delay") interpolation option described above, to account for the frequency ratios between each of the individual MS spectral windows and the aggregated calibration spectral window.
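As an illustrative sketch of this combined-spw case (caltable name hypothetical), a gain solution solved with combine='spw' and labelled as spectral window 0 could be distributed to four MS spectral windows with phase-delay interpolation:

gaintable = ['cal_combined.gcal']  # hypothetical table solved with combine='spw', labelled as spw 0
interp = ['linearPD']              # linear in time, with per-spw phase-delay scaling
spwmap = [[0,0,0,0]]               # apply cal spw 0 to MS spws 0,1,2,3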
Absolute vs. Relative frequency in frequency-dependent interpolation
By default, frequency-dependent solutions are interpolated for application in absolute sky frequency units. Thus, it is usually necessary to obtain bandpass solutions that cover the frequencies of all spectral windows that must be corrected. In this context, it is mechanically valid to use spwmap to transfer a bandpass solution from a wide, low-resolution spectral window to a narrow, higher-resolution spectral window that falls within the wide one in sky frequency space. On the other hand, if adequate data for a bandpass solution is unavailable for a specific spectral window, e.g., due to contamination by line emission or absorption (such as HI), or because of flagging, bandpass solutions from other spectral windows (i.e., at different sky frequencies) can be applied using spwmap. In this case, it is also necessary to add 'rel' to the frequency interpolation string in the interp parameter, as this will force the interpolation to be calculated in relative frequency units. Specifically, the center frequency of the bandpass solution will be registered with the absolute center frequency of each of the MS spectral windows to which it is applied, thereby enabling relative frequency registration. The quality of such calibration transfer will depend, of course, on the uniformity of the hardware parameters and properties determining the bandpass shapes in the observing system; this is often appropriate over relatively narrow bandwidths in digital observing systems, as long as the setups are sufficiently similar (same sideband, same total spectral window bandwidth, etc., though note that the channelization need not be the same). Traditionally (e.g., at the VLA, for HI observations), bandpass solutions for this kind of calibration transfer have been solved by combining spectral windows on either side of the target spectral window (see the task documentation for bandpass for more information on solving with combine='spw').
For example, to apply a bandpass solution from spectral window 0 (in a bandpass table called ngc5921.bcal) to MS spectral windows 0,1,2,3 with linear interpolation calculated in relative frequency units (and with frequency-dependent flagging respected):
gaintable = ['ngc5921.bcal']
interp = ['nearest,linearflagrel']
spwmap = [ [0,0,0,0] ]
When selecting channels for a bandpass solution that will be applied using ‘rel’, it is important to recognize that the selected channels will be centered on each of the _absolute_ centers of the MS spectral windows to which it will be applied. An asymmetric channel selection for the bandpass solve will cause an undesirable shift in the relative registration on apply. Avoid this by using symmetrical channel selection (or none) for the bandpass solve.
Also note that if relative frequency interpolation is required but ‘rel’ is not used in interp, the interpolation mechanism currently assumes you want absolute frequency interpolation. If there is no overlap in absolute frequency, the result will be nearest (in channel) interpolation such that the calibration edge channel closest to the visibility data will be used to calibrate that data.
Finally, please note that relative frequency interpolation is not yet available via the cal library.
Parallactic angle
The parang parameter turns on the application of the antenna-based parallactic angle correction (P) in the Measurement Equation. This is necessary for polarization calibration and imaging, or for cases where the parallactic angles are different for geographically spaced antennas and it is desired that the ordinary calibration solves not absorb the inter-antenna parallactic angle phase. When dealing with only the parallel-hand data (e.g. RR, LL, XX, YY), and an unpolarized calibrator model for a co-located array (e.g. the VLA or ALMA), you can set parang=False and save some computational effort. Otherwise, set parang=True to apply this correction, especially if you are doing polarimetry.
Solving parameters
The parameters controlling common aspects of the solving process itself are:
solint = 'inf' #Solution interval: e.g. 'inf', '60s' (see help)
combine = 'scan' #Data axes which to combine for solve (obs, scan,
#spw, and/or field)
preavg = -1.0 #Pre-averaging interval (sec) (rarely needed)
refant = '' #Reference antenna name(s)
minblperant = 4 #Minimum baselines _per antenna_ required for solve
minsnr = 3.0 #Reject solutions below this SNR
solnorm = False #Normalize solution amplitudes post-solve.
corrdepflags = False #If True, respect correlation-dependent flags
The time and frequency (if relevant) solution interval is specified in solint. Optionally, a frequency interval for each solution can be added after a comma, e.g. solint='60s,300Hz'. Time units are in seconds unless specified differently. Frequency units can be either 'ch' or 'Hz' and only make sense for bandpass or frequency-dependent polarization calibration. On the time axis, the special value 'inf' specifies an infinite solution interval encompassing the entire dataset, while 'int' specifies a solution every integration. Omitting the frequency-dependent solution interval will yield per-sample solutions on this axis. You can use time quanta in the string, e.g. solint='1min' and solint='60s' both specify solution intervals of one minute. Note that 'm' is a unit of distance (meters); 'min' must be used to specify minutes. The solint parameter interacts with combine to determine whether the solutions cross scan, field, or other meta-data boundaries.
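For example, a joint time and frequency solution interval might be specified as follows (interval values illustrative):

solint = 'inf,20ch'  # one solution per scan (with default combine=''), averaging 20 channels per frequency solution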
The parameter controlling the scope of each solution is combine. For the default, combine=’’, solutions will break at obs, scan, field, and spw boundaries. Specification of any of these in combine will extend the solutions over the specified boundaries (up to the solint). For example, combine=’spw’ will combine spectral windows together for solving, while combine=’scan’ will cross scans, and combine=’obs,scan’ will use data across different observation IDs and scans (usually, obs Ids consist of many scans, so it is not meaningful to combine obs Ids without also combining scans). Thus, to do scan-based solutions (single solution for each scan, per spw, field, etc.), set
solint = 'inf'
combine = ''
To obtain a single solution (per spw, per field) for an entire observation id (or the whole MS, if there is only one obsid), use:
solint = 'inf'
combine = 'scan'
You can specify multiple choices for combination by separating the axes with commas, e.g.:
combine = 'scan,spw'
Care should be exercised when using combine=’spw’ in cases where multiple groups of concurrent spectral windows are observed as a function of time. Currently, only one aggregate spectral window can be generated in a single calibration solve execution, and the meta-information for this spectral window is calculated from all selected MS spectral windows. To avoid incorrect calibration meta-information, each spectral window group should be calibrated independently (also without using append=True). Additional flexibility in this area will be supported in a future version.
The reference antenna is specified by the refant parameter. Ordinary MS Selection antenna selection syntax is used. Use of refant locks the solutions in time, effectively rotating (after solving) the phase of the gain solutions for all antennas such that the reference antenna's phase remains constant at zero. In gaincal it is also possible to select a refantmode, either 'flex' or 'strict'. A list of antennas can be provided to this parameter and, for refantmode='flex', if the first antenna is not present in the solutions (e.g., if it is flagged), the next antenna in the list will be used, etc. See the documentation for the rerefant task for more information. If the selected antenna drops out, the next antenna specified (or the next nearest antenna) will be substituted for ongoing continuity in time (at its current value) until the refant returns, usually at a new value (not zero), which will be kept fixed thenceforth. You can also run without a reference antenna, but in this case the solutions will formally float with time; in practice, the first antenna will be approximately constant near zero phase. It is usually prudent to select an antenna near the center of the array that is known to be particularly stable, as any gain jumps or wanders in the refant will be transferred to the other antenna solutions. Also, it is best to choose a reference antenna that never drops out, if possible.

Setting a preavg time will average the data over periods shorter than the solution interval before solving. This is necessary only if the visibility data vary systematically within the solution interval in a manner independent of the solve-for factors (which are, by construction, considered constant within the solution interval), e.g., source linear polarization in polcal. Non-trivial use of preavg in such cases will avoid loss of SNR in the averaging within the solution interval.
The minimum signal-to-noise ratio allowed for an acceptable solution is specified in the minsnr parameter. Default is minsnr=3.
The minblperant parameter sets the minimum number of baselines to other antennas that must be present for each antenna to be included in a solution. This enables control of the constraints that a solution will require for each antenna.
The solnorm parameter toggles the option to normalize the solution amplitudes after the solutions are obtained. The exact effect of this depends upon the type of solution (see gaincal, bandpass, and blcal). Not all tasks use this parameter. Be aware that if solnorm is used in the last stage of a chain of calibration, the part of the calibration that is normalized away will be lost. It is best to use it in early stages (for example in a first bandpass calibration) so that later stages (such as final gain calibration) can absorb the lost normalization scaling. It is generally not strictly necessary to use solnorm=True at all, but it is sometimes helpful, for example if you want a normalized bandpass.
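As a minimal sketch (dataset and table names hypothetical), a normalized bandpass solve early in the calibration chain might look like:

bandpass(vis='my.ms',
         caltable='my.bcal',
         field='0',
         solint='inf',
         solnorm=True)  #normalize the bandpass; later gain calibration absorbs the overall scaling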
The corrdepflags parameter controls how visibility vector flags are interpreted. If corrdepflags=False (the default), then when any one or more of the correlations in a single visibility vector is flagged (per spw, per baseline, per channel), all available correlations in that visibility vector are treated as flagged, and the vector is therefore excluded from the calibration solve. This has been CASA's traditional behavior (prior to CASA 5.7), in order to be conservative with respect to flags. If instead corrdepflags=True (for CASA 5.7+), correlation-dependent flags will be respected exactly as set, such that any available unflagged correlations will be used in the solve for calibration factors. For the tasks currently supporting the corrdepflags parameter (gaincal, bandpass, fringefit, accor), this means any unflagged parallel-hand correlations will be used in solving, even if the other parallel-hand (or either of the cross-hands) is flagged. Note that the polcal task does not support corrdepflags, since polarization calibration is generally more sensitive to correlation-dependence in the flagging in ways that may be ill-defined for partial flagging; this stricture may be relaxed in the future for non-leakage solving modes. Most notably, this feature permits recovery and calibration of visibilities on baselines to antennas for which one polarization is entirely flagged, either because the antenna did not have that polarization at all (e.g., heterogeneous VLBI, where flagged visibilities are filled for missing correlations on single-polarization antennas), or because one polarization was not working properly during the observation.
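For example (dataset, table, and antenna names hypothetical), to retain the working polarization on partially flagged baselines in a gain solve:

gaincal(vis='my.ms',
        caltable='my.gcal',
        solint='int',
        refant='ea10',
        corrdepflags=True)  #use unflagged parallel-hand correlations even where the other hand is flagged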
Appending calibration solutions to existing tables
The append parameter, if set to True, will append the solutions from this run to existing solutions in caltable. Of course, this only matters if the table already exists. If append=False and the specified caltable exists, the task will overwrite it (provided the caltable is not open in another process).
The append parameter should be used with care, especially when also using combine in non-trivial ways. E.g., calibration solves will currently refuse to append incongruent aggregate spectral windows (e.g., observations with more than one group of concurrent spectral windows) when using combine=’spw’. This limitation arises from difficulty determining the appropriate spectral window fan-out on apply, and will be relaxed in a future version.
Frequency and time labels for calibration solutions
Since calibration may be solved from aggregate ranges of frequency and time, it is interesting to consider how calibration solutions are labeled in frequency and time in caltables, based on the dataset from which they are solved, and also the relevance of this when applying the calibration to visibility data.
On the time axis, solutions are labeled with the unflagged centroid of the timestamps and baselines supplied to the solver, thereby permitting reasonably accurate time-dependent interpolation of solutions onto nearby data.
On the frequency axis, solutions are labeled with the channel-selected centroid frequencies, without regard to channelized flagging, at the frequency-axis granularity of the solutions. I.e., for per-spectral window and unchannelized gaincal solutions, this means the centroid frequency of the selected channels within each spectral window, and in the same frequency measures frame. For channelized bandpass solutions, this will be the centroid frequency of each (possibly partially aggregated) solution channel; if there is no frequency-axis solution interval, then the solution channel frequencies will be the same as the data channel frequencies, and in the same frequency measures frame. Note that the calibration table format demands that frequency labels are spectral-window specific and not time- or antenna-dependent; hence the possible time- and baseline-dependence of channel-dependent flagging cannot be accounted for in the frequency label information.
When combining spectral windows for calibration solves using combine=’spw’, the (selected) bandwidth-weighted centroid of all effectively-selected spectral windows will be used, and solutions will be identified with the lowest spectral window id in the effective spectral window selection. Note that it is the effective spectral window selection that matters, i.e., if user-specified scan or field or other non-spectral window selection effectively selects a subset of the spectral windows in a dataset (along with any explicit spw selection), only the net-selected spectral windows are used to calculate the net frequency label for the aggregate solution. Also, this net centroid frequency is calculated once per output spectral window (effectively for all solution intervals), even if the dataset is such that the constituent spectral windows are not available for all solutions. Thus, strictly speaking, the frequency labels will be less accurate for solution intervals containing only a subset of the global (selected) aggregate, and this may adversely affect the effective accuracy in the phase-delay (PD) calibration interpolation (described below). (Most raw datasets are not heterogeneous in this way.)
Note that combining multiple different groups of spectral windows with combine=’spw’ requires running the solving task separately for each group, and appending to the same caltable with append=True.
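A hypothetical sketch of this per-group pattern (MS name and selections illustrative), where spws 0,1 and 2,3 form two concurrent groups:

gaincal(vis='my.ms', caltable='combined.gcal', spw='0,1', combine='spw', solint='inf')
gaincal(vis='my.ms', caltable='combined.gcal', spw='2,3', combine='spw', solint='inf', append=True)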
When appending solutions using multiple solving executions, the frequency label information for the net output solution spectral windows in the new execution must match what is already in the caltable, where there is overlap. If it does not, execution will be interrupted.
The solution frequency labels are relevant to the application of these solutions only if the specified interpolation mode or solution parameterization depends on them. E.g., for per-spectral window unchannelized gaincal solutions applied to the corresponding identical spectral windows in the visibility dataset (even if differently channel-selected), the solution frequency labels do not matter (the solutions are just directly applied), unless the time-axis interpolation includes the 'PD' option, which will apply a scaling to the phase correction equal to the ratio of data frequency (whole spectral window centroid) and solution frequency. This 'phase-delay' interpolation mode enforces a delay-like adjustment to per-spectral window (not channelized) phases of the sort that is expected for non-dispersive tropospheric phase variations, and is much more interesting in the context of transferring phase calibration among spectral windows at different centroid frequencies (even to different observing bands) using spwmap, including the case of distributing a combined spectral window solution to the constituent spectral windows from which it was solved. (This interpolation mode is not likely to be very effective at observing frequencies and conditions where dispersive ionospheric phase effects are important.) Also, solutions that have a frequency-dependent parameterization, such as those from fringefit or gaincal with gaintype='K' (simple delays), require the centroid frequency label values recorded in the caltable to perform the net channelized solution phase calculation.
Bandpass solutions (from bandpass) will be non-trivially interpolated from the solution frequencies to the data frequencies in a conventional manner if the solutions were decimated in frequency when solved; if the solution frequencies exactly match the data frequencies, the solutions will just be directly applied.
In general, the centroid solution time and frequency labeling conventions described here are consistent with the assumption that calibration solutions are essentially constant during the time and frequency intervals within which they are solved, even if practically, there is some variation within these ranges. If this variation is significant, then the granularity at which the calibration is solved should be reconsidered, if possible, consistent with available SNR. Also, the available interpolation modes can be used to better track and compensate for gradients between solution samples. In all cases, solution granularity is constrained by a balance between available SNR and the systematic variation being sampled.
Gain Calibration¶
In general, gain calibration includes solving for time- and frequency-dependent multiplicative calibration factors, usually in an antenna-based manner. CASA supports a range of options.
Note that polarization calibration is described in detail in a different section.
Frequency-dependent calibration: bandpass
Frequency-dependent calibration is discussed in the general task documentation for bandpass.
Gain calibration: gaincal
Gain calibration is discussed in the general task documentation for gaincal.
Flux density scale calibration: fluxscale
Flux density scale calibration is discussed in the general task documentation for fluxscale.
Baseline-based (non-closing) calibration: blcal
Non-closing baseline-based calibration is discussed in the general task documentation for blcal.
Polarization Calibration¶
Instrumental polarization calibration is necessary because the polarizing hardware in the receiving system will, in general, be impure and non-orthogonal at a level of at least a few percent. These instrumental polarization errors are antenna-based and generally assumed constant with time, but the algebra of their effects is more complicated than that of the simple scalar multiplicative gain calibration. Also, the net gain calibration renders the data in an arbitrary cross-hand phase frame that must also be calibrated. The polcal task provides support for solving for instrumental polarization (poltype='Df' and similar) and cross-hand phase ('Xf'). Here we separately describe the heuristics of solving for instrumental polarization in the circular and linear feed bases.
Polarization Calibration in the Circular Basis¶
Fundamentally, with good ordinary gain and bandpass calibration already in hand, good polarization calibration must deliver both the instrumental polarization and the position angle calibration. An unpolarized source can deliver only the first of these, but does not require parallactic angle coverage. A polarized source can also deliver the position angle calibration, but only if its polarization position angle is known a priori. Sources that are polarized, but with unknown polarization degree and angle, must always be observed with sufficient parallactic angle coverage (which enables solving for the source polarization), where "sufficient" is determined by SNR and the details of the solving mode.
These principles are stated assuming the instrumental polarization solution is solved using the “linear approximation” where cross-terms in more than a single product of the instrumental or source polarizations are ignored in the Measurement Equation. A more general non-linearized solution, with sufficient SNR, may enable some relaxation of the requirements indicated here, and modes supporting such an approach are currently under development.
For instrumental polarization calibration, there are 3 types of calibrator choice, listed in the following table:
| Cal Polarization | PA Coverage | Poln Model? | poltype | Result |
|---|---|---|---|---|
| Zero | any | Q=U=0 | 'Df' | D-terms only |
| Unknown | 2+ scans | ignored | 'Df+QU' | D-terms and Q,U |
| Known, non-zero | 2+ scans | Set Q,U | 'Df+X' | D-terms and Pos Angle |
Note that the parallactic angle ranges spanned by the scans in the modes that require this should be large enough to give good separation between the components of the solution. In practice, 60 degrees is a good target.
Each of these solutions should be followed with an 'Xf' solution on a source with known polarization position angle (and correct fractional Q+iU in the model).
The polcal task will solve for the 'Df' or 'Xf' terms using the model visibilities attached to the MS. Calibration of the parallel hands must already have been obtained using gaincal and bandpass in order to align the amplitude and phase over time and frequency. This calibration must be supplied through the gaintable parameter, and any caltables used in polcal must agree with (i.e., have been derived from) the data in the DATA column and the FT of the model. Thus, for example, one would not use the caltable produced by fluxscale, as the rescaled amplitudes would no longer agree with the contents of the model.
Be careful when using resolved calibrators for polarization calibration. A particular problem arises if the structure in Q and U is offset from that in I. Use of a point model, or a resolved model for I but point models for Q and U, can lead to errors in the 'Xf' calibration. Use of a uvrange will help here. The use of a full-Stokes model with the correct polarization is the only way to ensure a correct calibration if these offsets are large.
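For instance (uv cut hypothetical; it should be chosen from the calibrator's actual structure), a uvrange selection can restrict an 'Xf' solve to the short baselines on which the calibrator is effectively point-like:

polcal(vis='polcal_20080224.cband.all.ms',
       caltable='polcal.polx',
       field='0137+331',
       uvrange='<100klambda',  #hypothetical cut excluding baselines sensitive to resolved structure
       poltype='Xf',
       gaintable=['polcal.gcal','polcal.bcal'])  #plus any prior polarization caltables, as appropriate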
A note on channelized polarization calibration
When your data has more than one channel per spectral window, it is important to note that the calibrator polarization estimate currently assumes the source polarization signal is coherent across each spectral window. In this case, it is important to be sure there is no large cross-hand delay still present in your data. Unless the online system has accounted for cross-hand delays (typically intended, but not always achieved), the gain and bandpass calibration will only correct for parallel-hand delay residuals since the two polarizations are referenced independently. Good gain and bandpass calibration will typically leave a single cross-hand delay (and phase) residual from the reference antenna. Plots of cross-hand phases as a function of frequency for a strongly polarized source (i.e., that dominates the instrumental polarization) will show the cross-hand delay as a phase slope with frequency. This slope will be the same magnitude on all baselines, but with different sign in the two cross-hand correlations. This cross-hand delay can be estimated using the gaintype=’KCROSS’ mode of gaincal (in this case, using the strongly polarized source 3C286):
gaincal(vis='polcal_20080224.cband.all.ms',
caltable='polcal.xdelcal',
field='3C286',
solint='inf',
combine='scan',
refant='VA15',
smodel=[1.0,0.11,0.0,0.0],
gaintype='KCROSS',
gaintable=['polcal.gcal','polcal.bcal'])
Note that smodel is used to specify that 3C286 is polarized; it is not important that the polarization Stokes parameters be specified with the correct absolute scale, as only the delay will be solved for (not any absolute position angle or amplitude scaling). The resulting solution should be carried forward and applied along with the gain (.gcal) and bandpass (.bcal) solutions in subsequent polarization calibration steps.
Circular Basis Example
In the following example, we have a MS called polcal_20080224.cband.all.ms for which we already have bandpass, gain and cross-hand delay solutions. An instrumental polarization calibrator with unknown linear polarization has been observed. We solve for the instrumental polarization and source linear polarization with polcal using poltype=’Df+QU’ as follows:
polcal(vis= 'polcal_20080224.cband.all.ms',
caltable='polcal.pcal',
field='2202+422',
solint='inf',
combine='scan',
preavg=300.0,
refant='VA15',
poltype='Df+QU',
       gaintable=['polcal.gcal','polcal.bcal','polcal.xdelcal'])
This run of polcal assumes that the model stored in the MS for 2202+422 is the one that was used to obtain the net gain calibration stored in polcal.gcal (i.e., we have not substituted a fluxscale result, which would create an inconsistent scale).
Alternatively, if we have an instrumental polarization calibrator that we know is unpolarized, we run polcal with poltype=’Df’:
polcal(vis='polcal_20080224.cband.all.ms',
caltable='polcal.pcal',
field='0319+415',
refant='VA15',
poltype='Df',
       gaintable=['polcal.gcal','polcal.bcal','polcal.xdelcal'])
In general, if there is more than one calibrator suitable for instrumental polarization calibration, it is useful to obtain a solution from each of them, and compare results. The instrumental polarization should not vary with field, of course. Note that it is not yet possible to effectively use combine=’field’ for instrumental polarization calibration solves with polcal, unless the prior models for all fields are set to the correct apparent linear polarization for each.
Having obtained the instrumental polarization calibration, we solve for the cross-hand phase using the flux density calibrator (for which the intrinsic linear polarization is known):
polcal(vis='polcal_20080224.cband.all.ms',
caltable= 'polcal.polx',
field='0137+331',
refant='VA15',
poltype='Xf',
smodel=[1.0,-0.0348,-0.0217,0.0], #the fractional Stokes for 0137+331 (3C48)
gaintable=['polcal.gcal','polcal.bcal','polcal.xdelcal','polcal.pcal'])
Note that the correct fractional polarization has been specified for 0137+331. It is not necessary to use the correct absolute total and linearly polarized flux densities here, since the Xf calibration is entirely phase-like.
Polarization Calibration in the Linear Feed Basis¶
CASA now supports instrumental polarization calibration for the linear feed basis at a level that is practical for the general user. Some details remain to be implemented with full flexibility, and much of what follows will be streamlined in future releases.
Calibrating the instrumental polarization for the linear feed basis is somewhat more complicated than for the circular feed basis because the polarization effects (source and instrument) appear in all four correlations at first or zeroth order (whereas for circular feeds, the polarization information only enters the parallel-hand correlations at second order). As a result, e.g., the time-dependent gain calibration will be distorted by any non-zero source polarization, and some degree of iteration will be required to isolate the gain calibration if the source polarization is not initially known. These complications can actually be used to advantage in solving for the instrumental calibration; it can be shown, for example, that a significantly linearly polarized calibrator enables a better instrumental polarization solution than an unpolarized calibrator.
In the following example, we show the processing steps for calibrating the instrumental polarization using a strongly (>5%) polarized point-source calibrator (which is also the time-dependent gain calibrator) that has been observed over a range of parallactic angle (a single scan is not sufficient). We assume that we have calibrated the gain, bandpass, and cross-hand delay as described elsewhere, and that the gain calibration was obtained assuming the calibrator was unpolarized.
Linear Basis Example
First, we import some utility functions from the CASA recipes area:
from recipes.almapolhelpers import *
Our MS in this example is called polcal_linfeed.ms. We begin by assuming we already have a bandpass calibration result (obtained by conventional means) stored in polcal.bcal. We first solve for a time-dependent gain solution on the instrumental polarization calibrator, which we expect to be significantly polarized, but for which we do not yet have a polarization model:
gaincal(vis='polcal_linfeed.ms',
caltable='polcal.gcal',
field='1', #the instrumental polarization calibrator
solint='int',
smodel=[1,0,0,0], #assume zero polarization
gaintype='G',
gaintable=['polcal.bcal'],
        parang=True)  #so source poln properly rotated
Since the gain calibrator was assumed unpolarized, the time-dependent gain solutions contain information about the source polarization. This can be seen by plotting the amp vs. time for this cal table using poln=’/’. The antenna-based polarization amplitude ratios will reveal the sinusoidal (in parallactic angle) function of the source polarization. Run the utility method qufromgain to extract the apparent source polarization estimates for each spw:
qu=qufromgain('polcal.gcal')
The source polarization reported for all spws should be reasonably consistent. This estimate is not as good as can be obtained from the cross-hands (see below) since it relies on the gain amplitude polarization ratio being stable which may not be precisely true. However, this estimate will be useful in resolving an ambiguity that occurs in the cross-hand estimates.
Next we estimate both the XY-phase offset and source polarization from the cross-hands. The XY-phase offset is a spectral phase-only bandpass relating the X and Y systems of the reference antenna. If the XY-phase is solved for in a channel-dependent manner (as below), it is not strictly necessary to have solved for the cross-hand delay as described above, but it does not hurt, as it allows reasonably coherent channel averages for data examination (we assume below that we have obtained the cross-hand delay solution at this stage). The source polarization occurs in the cross-hands as a sinusoidal function of parallactic angle that is common to both cross-hands on all baselines (for a point source). If the XY-phase bandpass is uniformly zero, then the source linear polarization function will occur entirely in the real part of the cross-hand visibilities. Non-zero XY-phase has the effect of rotating the source linear polarization signature partially into the imaginary part, where circular (and instrumental) polarization occur (cf. the circular feed basis, where the cross-hand phase merely rotates the position angle of linear polarization). The following gaincal solve averages all baselines together and first solves for a channelized XY-phase (the slope of the source polarization function in the complex plane in each channel), then corrects the slope and solves for a channel-averaged source polarization. This calibration is obtained using gaintype='XYf+QU' in gaincal:
gaincal(vis='polcal_linfeed.ms',
caltable='polcal.xy0amb', #possibly with 180deg ambiguity
field='1', #the calibrator
solint='inf',
combine='scan',
preavg=200.0, #minimal parang change
smodel=[1,0,1,0], #non-zero U assumed
gaintype='XYf+QU',
        gaintable=['polcal.gcal','polcal.bcal','polcal.xdelcal'])  #all prior calibration
Note that we imply non-zero Stokes U in smodel; this is to enforce the assumption of non-zero source polarization signature in the cross-hands in the ratio of data and model. This solve will report the center-channel XY-phase and apparent Q,U for each spw. The Q,U results should be recognizable in comparison to that reported by qufromgain above. However, since the XY-phase has a 180 degree ambiguity (you can rotate the source polarization signature to lie entirely in the visibility real part by rotating clockwise or counter-clockwise), some or all spw Q,U estimates may have the wrong sign. We correct this using the xyamb utility method, using the qu obtained from qufromgain above (which is not ambiguous):
S=xyamb(xy='polcal.xy0amb',qu=qu,xyout='polcal.xy0')
The python variable S now contains the mean source model (Stokes I =1; fractional Q,U; V=0) that can be used in a revision of the gain calibration and instrumental polarization calibration.
Next we revise the gain calibration using the full polarization source model:
gaincal(vis='polcal_linfeed.ms',
caltable='polcal.gcal1',
field='1',
solint='int',
smodel=S, #obtained from xyamb
gaintype='G',
gaintable=['polcal.bcal'],
        parang=True)  #so source poln properly rotated
Note that parang=True ensures the supplied source linear polarization is properly rotated in the parallel-hand visibility model. This new gain solution can be plotted with poln='/' as above to show that the source polarization is no longer distorting it. Also, if qufromgain is run on this new gain table, the reported source polarization should be statistically indistinguishable from zero.
Finally, we can now solve for the instrumental polarization:
polcal(vis= 'polcal_linfeed.ms',
caltable='polcal.dcal',
field='1',
solint='inf',
combine='scan',
preavg=200,
poltype='Dflls', #freq-dep LLS solver
refant='', #no reference antenna
smodel=S,
gaintable=['polcal.gcal1','polcal.bcal','polcal.xdelcal','polcal.xy0'])
Note that no reference antenna is used since this solve will produce an absolute instrumental polarization solution that is registered to the assumed source polarization (S) and prior calibrations. Applying a refant (referring all instrumental polarization terms to a reference antenna's X feed, which would then be assumed perfect) would, in fact, discard valid information about the imperfections in the reference antenna's X feed. (Had we used an unpolarized calibrator, we would not have a valid xy-phase solution, nor would we have had access to the absolute instrumental polarization solution demonstrated here.)
A few points:
Since the gain, bandpass, and XY-phase calibrations were obtained prior to the instrumental polarization solution and may be distorted by it, it is generally desirable to re-solve for them using this instrumental polarization solution as a prior calibration. In effect, this means iterating the sequence of calibration steps using the best of the available information at each stage, including the source polarization (and parang=True). This is a generalization of traditional self-calibration.
If the source linear polarization fraction and position angle are known a priori, the processing steps outlined above can be amended to use that source polarization assertion in the gain and instrumental calibration solves from the start. The qufromgain method is then not needed (but can be used to verify assumptions), the gaincal(…,gaintype='XYf+QU',…) step should not be altered (parallactic angle coverage is still required!), and the xyamb run should use the a priori polarization for qu. If there is likely to be a large systematic offset in the mean feed position angle, iteration of the gain, bandpass, and instrumental polarization terms is required to properly isolate the calibration effects.
Note that the above process does not explicitly include a position angle calibration. In effect, the estimated source polarization sets the mean feed position angle as the reference position angle, and this is usually within a degree or so of optimal for linear feeds. If your mean X feed position angle is not 0 degrees, and your MS does not account for the offset in its FEED subtable, be careful in your interpretation of the final position angle. Currently, the circular feed-specific position angle calibration modes of polcal(…,poltype=’Xf’,…) will not properly handle the linear feed basis; this will be fixed in a future release.
Water Vapor Radiometers¶
The task wvrgcal generates a gain table based on Water Vapor Radiometer (WVR) data and is used for ALMA data reduction. Briefly, the task enables a Bayesian approach to calculating the coefficients that convert the outputs of the ALMA 183 GHz water-vapor radiometers (mounted on each antenna) into estimates of path fluctuations which can then be used to correct the observed interferometric visibilities.
The CASA task is an interface to the executable wvrgcal, which is part of the CASA distribution and can also be called from outside CASA. The wvrgcal software is based on the libair and libbnmin libraries, which were developed by Bojan Nikolic at the University of Cambridge as part of the EU FP6 ALMA Enhancement program. CASA 5 contains version 2.1 of wvrgcal. The algorithmic core of wvrgcal is described in three ALMA memos (587 [1], 588 [2], and 593 [3]).
The CASA task interface to wvrgcal closely follows the interface of the shell executable while staying within the CASA task parameter conventions. In ALMA data, the WVR measurements belonging to a given observation are contained in the ASDM for that observation. After conversion to an MS using importasdm, the WVR information can be found in separate spectral windows. As of April 2016, it is still a single spectral window for all WVRs; however, the ID of that spectral window may vary between datasets. The wvrgcal task identifies the spectral window autonomously, but it can also be specified via the parameter wvrspw (see below). The specified spectral window(s) must be present in the MS for wvrgcal to work. This is not to be confused with the list of spectral windows for which solutions should be calculated, which can be specified with the parameter spw. Note that wvrgcal will calculate a correction only for scans with the words ON_SOURCE, SIGNAL, or REFERENCE in the scan intent. The various features of wvrgcal are controlled by a number of task parameters (see the list above). They have default values which will work for ALMA data. An example of a typical wvrgcal call can be found in the ALMA CASA guide for the NGC 3256 analysis:
wvrgcal(vis='uid___A002_X1d54a1_X5.ms',
caltable='cal-wvr-uid___A002_X1d54a1_X5.W',
toffset=-1,
segsource=True, tie=["Titan,1037-295,NGC3256"], statsource="1037-295",
wvrspw=[4],
spw=[17,19,21,23])
Here, vis is the name of the input visibility file which, as mentioned above, also contains the WVR data, and caltable is the name of the output gain calibration table. WVR data is typically in spectral window 0, but in the example above, the data are contained in spectral window 4. Although wvrgcal should automatically identify this SPW, it is explicitly specified with the wvrspw parameter in the above example. The toffset parameter is the known time offset in seconds between the WVR measurements and the visibility integrations for which they are valid. For ALMA, this offset is presently -1 s (which is also the default value).
The parameter segsource (segregate source) controls whether separate coefficients are calculated for each source. The default value True is the recommended one for ALMA. When segsource is True, the subparameter tie is available. It permits the formation of groups of sources for which common coefficients are calculated as consistently as possible. The tie parameter ensures the best possible phase transfer between a group of sources. In general it is recommended to tie together all of the sources in a single Science Goal (in ALMA speak) and their phase calibrator(s). The recommended maximum angular distance up to which two sources can be tied is 15 degrees. The parameter statsource controls for which sources statistics are calculated and displayed in the logger. This has no influence on the generated calibration table.
Via the parameter spw, one can control for which of the input spectral windows wvrgcal will calculate phase corrections and store them in the output calibration table. By default, solutions for all spectral windows are written except for the ones containing WVR data. The wvrgcal task respects the flags in the Main and ANTENNA tables of the MS. The parameter mingoodfrac lets the user set a requirement on the minimum fraction of good measurements for accepting the WVR data from an antenna. If antennas are flagged, their WVR solution is interpolated from the three nearest neighboring antennas. This process can be controlled with the parameters maxdistm and minnumants. The former sets the maximum distance an antenna used for interpolation may have from the flagged one, and minnumants sets how many nearby antennas there have to be for interpolation to take place. For more details on the WVR phase correction, see the ALMA Memo "Quality Control of WVR Phase Correction Based on Differences Between WVR Channels" by B. Nikolic, R. C. Bolton & J. S. Richer [4], and also ALMA memo 593 [3].
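For example (threshold values illustrative; the defaults are usually adequate for ALMA data), the handling of flagged antennas can be tightened as follows:

wvrgcal(vis='uid___A002_X1d54a1_X5.ms',
        caltable='cal-wvr-uid___A002_X1d54a1_X5.W',
        toffset=-1,
        mingoodfrac=0.8,  #require 80% good WVR measurements per antenna
        maxdistm=300.0,   #only interpolate from antennas within 300 m of a flagged one
        minnumants=2)     #require at least 2 such antennas for interpolation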
Statistical parameters shown in the logger output of wvrgcal
The wvrgcal task writes out a variety of information to the logger, including various statistical measures of the performance. This allows the user to judge whether WVR correction is appropriate for the MS, to check whether any antennas have problematic WVR values, and to examine the predicted performance of the WVR correction when applied. For each set of correction coefficients that is calculated (the number of coefficient sets is controlled by the parameters nsol, segsource and tie), the wvrgcal output to the logger first shows the time sample, the individual temperatures of each of the four WVR channels, and the elevation of the source in question at that time. For each of these coefficient sets, it then gives the evidence of the Bayesian parameter estimation, the calculated precipitable water vapor (PWV) and its error in mm, and the correction coefficients found for each WVR channel (dTdL).
The output then shows statistical information about the observation. First it gives the start and end times for the parts of the observation used to calculate these statistics (controlled by segsource). It then shows a breakdown for each of the antennas in the data set. This gives the antenna name and number; whether or not it has a WVR (column WVR); whether or not it has been flagged (column FLAG); the RMS of the path length variation with time towards that antenna (column RMS); and the discrepancy between the RMS path lengths calculated separately for different WVR channels (column Disc.). These values allow the user to see if an individual WVR appears to have been suffering from problems during the observation, and to flag that antenna using wvrflag if necessary. The discrepancy value, Disc., can in addition be used as a simple diagnostic tool to evaluate whether or not the WVR correction caltable created by wvrgcal should be applied. If the WVR observations are contaminated by strong cloud emission in the atmosphere, the attempt by wvrgcal to fit the water vapor line may not be successful, and applying the produced calibration table can in extreme cases reduce the quality of the data. However, such weather conditions should be identified by a high value in the discrepancy column produced when running wvrgcal.
Discrepancy values of greater than 1000 microns usually indicate strong cloud contamination of the WVR data, and the output calibration table should probably not be applied. If the values are between 100 and 1000 microns, the user should manually examine the phases before and after applying the caltable to decide whether WVR correction is appropriate. Work is underway at ALMA to provide additional routines to at least partially remove the cloud component from the WVR data before calculating phase corrections. CASA 4.7 will contain a first tested version of such a tool.

After the antenna-by-antenna statistics, the output displays some estimates of the performance of the wvrgcal correction. These are the thermal contribution from the water vapor to the path fluctuations per antenna (in microns), the largest path fluctuation found on a baseline (in microns), and the expected error on the path length calculated for each baseline due to the error in the coefficients (in microns).
Antenna position calculation
The information about antenna pointing direction is by default taken from the POINTING table. Should this table not be present for some reason, the user can instead switch to determining the antenna positions from the phase directions in the FIELD table (under the assumption that all antennas were pointing ideally). The switch is performed by setting the parameter usefieldtab to True.
Spectral window selection
By default, wvrgcal puts solutions for all spectral windows of the MS into the output calibration table. Since usually only the spectral windows are of interest in which the science target and the calibrators were observed, it is not necessary to store solutions for other spectral windows. The spectral windows for which solutions are stored can be selected with the parameter spw, e.g. spw = [17,19,21,23] will make wvrgcal write only solutions for spectral windows 17, 19, 21, and 23. With respect to the input WVR spectral windows, wvrgcal will by default regard all windows with 4 channels as WVR data. In typical ALMA data there is only one such spectral window in each ASDM. This may change in the future. In any case, the input WVR spectral window(s) can be selected with the optional parameter wvrspw. The syntax is the same as for the parameter spw above.
Examine/Edit Cal Tables¶
How to plot, list, and adjust calibration tables
Information on examination and manipulation of calibration tables can be found in the task documentation for plotms, listcal, calstat, smoothcal, and browsetable.
Apply Calibration¶
How to apply calibration to generate data for imaging
Please see the task documentation for applycal for details on application of calibration.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/single_dish_calibration.ipynb
Single Dish Calibration¶
This section describes the motivation, background, and process for calibrating ALMA single-dish observations.
Below are brief descriptions of the calibration process, followed by a description of the various observing modes, and then a step-by-step description of the calibration process in CASA.
It is recommended that newcomers review the description of calibration in Single-dish calibration: background and the descriptions of various observing modes in Single-dish observing modes before examining the reduction processes described in Single-dish data calibration and reduction. Future directions are broadly outlined in Future development goals for CASA single-dish.
Background SD Calibration¶
Brief description of calibrating ALMA single-dish observations
Like any single-dish telescope, ALMA’s single dish antennas (nominally, four 12m antennas) detect and quantify brightness temperature (\(T_B\), in Kelvin). In the Rayleigh-Jeans approximation, the Planck blackbody law reduces to \(T_B=\frac{B\lambda^2}{2k}\).
An ALMA single-dish observation includes contributions from sky targets in the beam, the telescope surface and receiver equipment, the ground (through reflections), the atmosphere and cosmic background, and any other electronics (necessarily noisy) following the receiver front end. Observations made with a single dish towards a target (\(T_{ON}\)) are calibrated using an additional observation towards blank sky, i.e. sky at a similar elevation that is free of target emission at the frequencies of interest to the observer (\(T_{OFF}\)).
To determine the signal from the target, we can compute:
\(\frac{T_{targ}}{T_{sys}}=\frac{T_{ON}-T_{OFF}}{T_{OFF}}\). (In CASA, this is accomplished during the “sky calibration” step).
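As a minimal numerical sketch of this formula (all values invented for illustration), the per-channel calibration could be computed as:

import numpy as np

t_on = np.array([82.4, 83.1])   # hypothetical ON-source total power (two channels)
t_off = np.array([80.0, 80.2])  # matching blank-sky OFF measurement
t_sys = np.array([75.0, 75.5])  # hypothetical Tsys spectrum in K

# T_targ = T_sys * (T_ON - T_OFF) / T_OFF, applied channel by channel
t_targ = t_sys * (t_on - t_off) / t_off
print(t_targ)  # calibrated target brightness temperature in K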
The position of the OFF is made as close as possible (in az/el) to the ON position. As a practical matter, there may be some differences between the two measurements aside from the target being in the ON position. In most cases, the differences arise chiefly from the atmospheric contribution, though any target coincidentally within the beam of the OFF measurement will contaminate the OFF and affect the accuracy of the calibrated brightness temperature of the target.
To calibrate single-dish data, we require a measurement of \(T_{sys}\), which is obtained through \(T_{atm}\) (i.e. "atmosphere") measurements at the start of each scheduling block. (In CASA, this is applied through the \(T_{sys}\) calibration step.) The \(T_{sys}\) determination includes separate observations of the sky and two "loads" at different, known temperatures.
Note that \(T_{sys}(\nu)\) measurements are spectral; that is, they determine \(T_{sys}\) as a function of frequency. Since they incorporate an observation of the sky, they may include atmospheric features such as the water absorption line in Band 5 at \(\sim\) 183 GHz. So the calibration of the entire band must be done in the frequency domain.
It is ALMA policy that single-dish data are only observed to supplement, and be combined with, interferometer observations. Therefore, the single-dish data need to be converted from their native units of brightness temperature (\(T_A^*\)) to flux density units (Jy/beam) before combination with the interferometric data. The conversion from \(T_A^*\) to Jy/beam is done empirically, and incorporates a factor for the forward beam efficiency. The empirical conversion (Jy to K) is computed through mapping observations (made close in time) of a standard target, either a planet or a quasar, and the scaling from \(T_A^*\) to Jy/beam is then made simply and directly from the calibrator map and applied to the science target map.
Observing Modes¶
At present, ALMA single dish observing has three modes: PS (position switched), OTF (on the fly), and OTF-raster (Rasterized on the fly). These have slightly different observation/scanning patterns, and also have differences in the cadence and position of the OFF measurements.
In the case of the PS mode, the OFF position is specified by the PI before the observations. The OFF positions have a specific and periodic cadence and should, where possible, be at approximately the same azimuth as the observations of the science target. Note that PS mode observations can include mapping observations, and this is actually the general mode for ALMA single-dish observations.
For OTF-raster, the telescope scans the observing target, including some additional length on either side of the target. The reference data are interpolated from the OFF positions at the edge of the scans. It is assumed that the bandpass profile varies slowly as a function of scan position. This mode yields a slightly better calibration than position switching, since the variation in the air mass is typically more consistent between the OFF data and the target. At present, ALMA uses OTF-raster mode to calibrate observations of the amplitude (i.e. Jy-to-K) calibrators.
For OTF mode, the observations are not rasterized. The scanning pattern for OTF observations can take several forms. Solar observations, for example, use a double-circle scan. In this case the entire periphery of the observed region is identified as an OFF measurement. The OTF-raster mode, on the other hand, uses the first and last points of the raster rows as the OFF. A mean bandpass is formed from the OFF measurements, interpolated in time, and applied to any measurements not identified as OFF. For OTF and OTF-raster, specific OFF positions are not explicitly nominated prior to the observations.
Calibration & Reduction¶
Single-dish data calibration and reduction
Generally, the calibration of ALMA single-dish data requires essentially only two steps - the application of the \(T_{sys}\) calibration, and the application of the “sky” calibration (i.e. OFF). With just these steps, the product is \(T_A^*\) in units of Kelvin. While the steps are described in detail here, an example of the full single-dish calibration process can be found in the M100 Band 3 Single Dish CASAguide.
In the following description, we refer to data in the ALMA-native format of ASDM (ALMA Science Data Model) with a variable name asdm, while we refer to the CASA-native format of MeasurementSet with a variable name sd_ms. In the case of single-dish data, the MeasurementSet data are not formally visibilities, since they are simply auto-correlations.
importasdm: Import the ASDM
listobs - List observation parameters
flagdata and flagcmd - A priori flagging
gencal, applycal, and sdcal - Generate and apply the \(T_{sys}\) and sky calibration tables
flagdata - Do initial flagging
applycal - Calibrate the data into Kelvins
sdbaseline - Subtract the baseline
1. Import of the ASDM: importasdm
Import of ASDM data and conversion into the MeasurementSet format is achieved via the importasdm task. The task settings differ slightly from those used when importing interferometer ASDMs. For instance, bdfflags=False de-selects application of the BDF format flags, which are of no consequence in the normal single-dish reduction process, and with_pointing_correction=True directly applies the measured pointing direction (in azimuth and elevation) to the data, rather than the commanded pointing positions. For example:
importasdm(asdm, asis='Antenna Station Receiver Source CalAtmosphere CalWVR CorrelatorMode SBSummary', bdfflags=False, process_caldevice=False, with_pointing_correction=True)
2. List observation parameters: listobs
listobs works for single-dish observations in the same way as for interferometry, and will detail the calibration scans. It identifies pointing, \(T_{sys}\), and target scans; the science target and OFF scans and their cadence; the correlator frequencies and configuration; and the antennas used in the observations. Unique to single-dish data is the #OFF_SOURCE intent. This task is used to identify which spectral windows contain the \(T_{sys}\) observations, information that is critical for those who wish to build their own calibrations.
listobs(sd_ms)
produces feedback in the logger (or optionally, to a file) from which we can determine which spectral windows are \(T_{sys}\) observations and which are science observations.
Spectral Windows: (25 unique spectral windows and 2 unique polarization setups)
SpwID Name #Chans Frame Ch0(MHz) ChanWid(kHz) TotBW(kHz) CtrFreq(MHz) BBC Num Corrs
0 BB_1#SQLD 1 TOPO 195994.575 2000000.000 2000000.0 195994.5750 1 XX YY
1 BB_2#SQLD 1 TOPO 197932.075 2000000.000 2000000.0 197932.0750 2 XX YY
2 BB_3#SQLD 1 TOPO 207994.575 2000000.000 2000000.0 207994.5750 3 XX YY
3 BB_4#SQLD 1 TOPO 209994.575 2000000.000 2000000.0 209994.5750 4 XX YY
4 WVR#NOMINAL 4 TOPO 184550.000 1500000.000 7500000.0 187550.0000 0 XX
5 X436890472#ALMA_RB_05#BB_1#SW-01#FULL_RES 128 TOPO 196986.763 -15625.000 2000000.0 195994.5750 1 XX YY
6 X436890472#ALMA_RB_05#BB_1#SW-01#CH_AVG 1 TOPO 195978.950 1796875.000 1796875.0 195978.9500 1 XX YY
7 X436890472#ALMA_RB_05#BB_2#SW-01#FULL_RES 128 TOPO 198924.263 -15625.000 2000000.0 197932.0750 2 XX YY
8 X436890472#ALMA_RB_05#BB_2#SW-01#CH_AVG 1 TOPO 197916.450 1796875.000 1796875.0 197916.4500 2 XX YY
9 X436890472#ALMA_RB_05#BB_3#SW-01#FULL_RES 128 TOPO 207002.388 15625.000 2000000.0 207994.5750 3 XX YY
10 X436890472#ALMA_RB_05#BB_3#SW-01#CH_AVG 1 TOPO 207978.950 1796875.000 1796875.0 207978.9500 3 XX YY
11 X436890472#ALMA_RB_05#BB_4#SW-01#FULL_RES 128 TOPO 209002.388 15625.000 2000000.0 209994.5750 4 XX YY
12 X436890472#ALMA_RB_05#BB_4#SW-01#CH_AVG 1 TOPO 209978.950 1796875.000 1796875.0 209978.9500 4 XX YY
13 BB_1#SQLD 1 TOPO 183375.638 2000000.000 2000000.0 183375.6378 1 XX YY
14 BB_2#SQLD 1 TOPO 181427.463 2000000.000 2000000.0 181427.4627 2 XX YY
15 BB_3#SQLD 1 TOPO 169374.840 2000000.000 2000000.0 169374.8404 3 XX YY
16 BB_4#SQLD 1 TOPO 170917.638 2000000.000 2000000.0 170917.6378 4 XX YY
17 X1857092512#ALMA_RB_05#BB_1#SW-01#FULL_RES 4096 TOPO 183162.808 122.070 500000.0 183412.7471 1 XX YY
18 X1857092512#ALMA_RB_05#BB_1#SW-01#CH_AVG 1 TOPO 183412.686 500000.000 500000.0 183412.6861 1 XX YY
19 X1857092512#ALMA_RB_05#BB_2#SW-01#FULL_RES 4096 TOPO 181177.524 122.070 500000.0 181427.4627 2 XX YY
20 X1857092512#ALMA_RB_05#BB_2#SW-01#CH_AVG 1 TOPO 181427.402 500000.000 500000.0 181427.4017 2 XX YY
21 X1857092512#ALMA_RB_05#BB_3#SW-01#FULL_RES 4096 TOPO 169587.670 -122.070 500000.0 169337.7310 3 XX YY
22 X1857092512#ALMA_RB_05#BB_3#SW-01#CH_AVG 1 TOPO 169337.670 500000.000 500000.0 169337.6700 3 XX YY
23 X1857092512#ALMA_RB_05#BB_4#SW-01#FULL_RES 4096 TOPO 171158.788 -122.070 500000.0 170908.8487 4 XX YY
24 X1857092512#ALMA_RB_05#BB_4#SW-01#CH_AVG 1 TOPO 170908.788 500000.000 500000.0 170908.7877 4 XX YY
From this output, we see the science spectral windows are 17, 19, 21 and 23, and have 4096 channels, while the \(T_{sys}\) spectral windows at 5, 7, 9 and 11 have 128 channels.
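The same identification can be made programmatically. Below is a minimal sketch using the msmetadata tool (modular CASA 6 import shown; the intent strings are the standard ALMA ones, and the returned lists will also include WVR and channel-average windows, which may need to be filtered):

from casatools import msmetadata

msmd = msmetadata()
msmd.open(sd_ms)
tsys_spws = msmd.spwsforintent('CALIBRATE_ATMOSPHERE*')  # Tsys measurement spws
science_spws = msmd.spwsforintent('OBSERVE_TARGET*')     # science target spws
msmd.close()
print('Tsys spws:', tsys_spws)
print('Science spws:', science_spws)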
3. A priori flagging: flagcmd/flagdata
flagcmd works the same way on single-dish data as for interferometry. Invoking it here applies flagging, by default, from the FLAG_CMD table within the MeasurementSet.
flagcmd(vis = 'uid___A002_Xb978c3_X5c4b.ms', inpmode = 'table', useapplied = True, action = 'apply')
flagdata is used at this point to remove problematic data. Conventionally, 5% of the edges of the bands are removed, as these parts of the band are significantly and detrimentally affected by the low-sensitivity edges of the filter passband. In principle, they can be retained in the cases where spectral lines of interest fall in that area, though the sensitivity losses are significant.
Users should examine their spectra using plotms, and ensure any atmospheric lines are properly accounted for. This is particularly true for Band 5, which has a strong atmospheric absorption line at \(\sim\) 183 GHz. There is no real way to remove the signature of the atmospheric lines in position-switched data, since the elevations of the ON (science target) and OFF (sky-calibration position) are almost always different, and therefore have different air masses. The most effective approach in this case is to complete the normal calibrations as described here, then apply a judiciously selected bandpass-correction polynomial and spectral window channel range, as described in the sdbaseline step below.
flagdata(vis=sd_ms, mode='manual', spw='17:0~119;3960~4079,19:0~119;3960~4079,21:0~119;3960~4079,23:0~119;3960~4079', action='apply', flagbackup=True)
Both the flagcmd and flagdata steps are generally useful, but care should be taken that emission lines of interest are not being inadvertently flagged out.
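If desired, the edge-channel selection string can be generated rather than typed by hand. The helper below is illustrative only (the function name and the 5% fraction are our own, not a CASA task), and yields a slightly different channel range than the worked example above:

def edge_channel_selection(spw_ids, nchan, frac=0.05):
    # Build a flagdata 'spw' string that flags 'frac' of the channels at each band edge
    nedge = int(nchan * frac)
    return ','.join('%d:0~%d;%d~%d' % (s, nedge - 1, nchan - nedge, nchan - 1)
                    for s in spw_ids)

flagdata(vis=sd_ms, mode='manual', spw=edge_channel_selection([17, 19, 21, 23], 4096),
         action='apply', flagbackup=True)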
4. Generation of the \(T_{sys}\) and \(T_{sky}\) calibration tables: gencal, sdcal and applycal
There are two ways to proceed in CASA when computing and applying calibration tables for single dish observations.
Build the \(T_{sys}\) calibration tables with gencal, build the sky calibration tables with sdcal, and apply them with applycal
Build and apply both the \(T_{sys}\) and sky calibration tables with sdcal
The second option is faster, but users familiar with the gencal and applycal tasks may prefer the first option.
In either case, the mapping between the \(T_{sys}\) scans and science scans must be determined, either by examination of the output of listobs, or by running sdcal and specifying the method to be used to obtain the OFF position. Usually ALMA takes position-switched observations via mode='ps', though other alternatives exist which do not need any OFF positions to be explicitly observed; the OFF can be obtained from the source data itself via mode='otfraster' or mode='off'.
In the first of the two cases mentioned above (having identified the science spectral windows as 17, 19, 21, and 23, and with the target name stored in the variable source):
gencal(vis = sd_ms, caltable = sd_ms+'.tsys', caltype = 'tsys')
sdcal(infile = sd_ms, outfile = sd_ms+'.sky', calmode = 'ps')
from recipes.almahelpers import tsysspwmap
tsysmap = tsysspwmap(vis = sd_ms, tsystable = sd_ms+'.tsys', trim = False)
applycal(vis = sd_ms, applymode = 'calflagstrict', spw = '17,19,21,23', field = source, gaintable = [sd_ms+'.tsys', sd_ms+'.sky'], gainfield = ['nearest', source], spwmap = tsysmap)
In the second case:
sdcal(infile=sd_ms, calmode='ps,tsys,apply')
Note that we didn’t specify the \(T_{sys}\) spectral windows in the call to sdcal. For ALMA single-dish data from Cycle 3 onward, this is okay since the \(T_{sys}\) and science data share the same spectral window. Alternatively, the mapping between the \(T_{sys}\) and science spectral windows can be explicitly set with spwmap and spw. In this case, we would use:
sdcal(infile=sd_ms, calmode='ps,tsys,apply', spwmap={17: [17], 19: [19], 21: [21], 23: [23]}, spw='17,19,21,23')
The general structure of spwmap is {Tsys spw 0: [science spw 0], ..., Tsys spw n: [science spw n]} for 0 to n spectral windows.
gencal applied at this stage builds the \(T_{sys}\) calibration tables. These calibrations are an intrinsic part of the ASDM; no re-computations are applied to the \(T_{sys}\) data by CASA. Ultimately, the \(T_{sys}\) calibration tables are applied in the applycal step, consistent with the descriptions of calibration given in the sections above. We point out that the \(T_{sys}\) calibrations are a multiplicative factor, so the order of the application of the \(T_{sys}\) cal tables relative to the application of the \(T_{sky}\) calibrations is immaterial.
8. Subtracting the baseline: sdbaseline
It’s important at this point to define exactly what is meant by baseline in the context of single-dish data. In interferometry, baseline refers to the spatial separation of antenna pairs. For a single-dish observation, baseline refers to the spectral pattern produced by the atmosphere and instrument. Since single-dish antennas measure total power, not an interference pattern, they are responsive to emission wherever it exists within the single-dish beam or signal path. This signal is dominated by the receiver/correlator/backend sampling function, but has a significant time-varying component usually dominated by atmospheric fluctuations. The power contributed by atmospheric fluctuations is invisible to interferometer observations, as the fluctuations are in the near field and are therefore generally resolved out of the data. Note, though, that atmospheric variability can contaminate interferometric measurements by introducing a decoherence in phase; such phase losses are not relevant for single-dish observations.
sdbaseline removes a spectral baseline from the data on a per-integration basis. The options here are extensive, and baseline subtraction can be complex when emission is strongly variable throughout the map, or when there are nearby atmospheric absorption features. But CASA is effective at choosing intelligent defaults with maskmode='auto'. With maskmode='auto', CASA will examine the brightness variability per integration and determine the most appropriate channel ranges for computing the spectral baseline, based on the mean absolute deviation of the channels. This approach is successful even when applied to spectra heavily crowded with emission lines; as long as the emission-free parts of the spectrum have statistically significant representation in the data, maskmode='auto' will succeed. Baseline corrections employed by CASA are subtracted, and can therefore be applied iteratively, as needed.
sdbaseline supports polynomial, Chebyshev, and sinusoid baseline removal. Sinusoidal baselines are determined with a Fourier transform of the spectral data; again, an automatic mode is available, in which CASA will determine the most significant Fourier components and remove them, though specific wavenumbers can be explicitly added or removed on top of the automatic operation. Sinusoidal components occur in many single-dish telescopes, and are a typical manifestation of a standing-wave resonance of the main-reflector/subreflector cavity. ALMA has employed scattering cones in the single-dish subreflectors to effectively mitigate the strength of this standing wave. It’s worth noting that removal of Fourier components should be applied with utmost caution; the result is effectively a convolution of the spectra with a spectral filter, and will necessarily affect the resulting emission spectra. Users of this baseline mode should explore and characterize the consequences and subsequent error propagation in the context of their own data.

In this example, we remove a 1st-order polynomial from spectral windows 17, 19, 21, and 23, automatically finding and masking out any lines brighter than 5 \(\sigma\), and referencing the “corrected” (i.e. calibrated) data column.
sdbaseline(infile = sd_ms, datacolumn = 'corrected', spw = '17,19,21,23', maskmode = 'auto', thresh = 5.0, avg_limit = 4, blfunc = 'poly', order = 1, outfile = sd_ms+'.cal')
Note that at this point, the product dataset (sd_ms+'.cal') has only four spectral windows. These are (if all has gone well) the science observations, now \(T_{sys}\)- and sky-calibrated as well as bandpass-corrected.
9. Convert the Science Target Units from Kelvin to Jansky: scaleAutocorr
To convert the units of the single-dish observations from \(T_A^*\) (K) into Janskys and to prepare for combination with interferometer data, we need to obtain the empirically-determined Jy-to-K conversion data. These data already take into account any correlator non-linearities and also factor in the various subsystem efficiencies.
The easiest way to obtain these factors is with a call to a specialized helper function that accesses polynomial fits to the ongoing calibration campaign data:
jyperk = es.getJyPerK(sd_ms+'.cal')
The contents of the variable jyperk form a Python dictionary (its structure is shown below), which can be applied antenna-by-antenna and spectral-window-by-spectral-window with scaleAutocorr:

for ant in jyperk.keys():
    for spw in jyperk[ant].keys():
        scaleAutocorr(vis=sd_ms+'.cal', scale=jyperk[ant][spw]['mean'], antenna=ant, spw=spw)
scaleAutocorr simply applies the scaling from \(T_A^*\) to Jy/beam. The scaling factors are determined empirically, as part of the QA2-level calibrations provided by ALMA. The scaling factors are provided to scaleAutocorr as floats, but are most conveniently applied in calls that iterate through antenna and spectral window, with the Jy-per-K factors retained in a dictionary of the format:
jyperk =
{ antenna01_name: { spw0: { 'mean': 44.345, 'n': '', 'std': ''},
                    spw1: { 'mean': 44.374, 'n': '', 'std': ''},
                    spw2: { 'mean': 44.227, 'n': '', 'std': ''},
                    spw3: { 'mean': 44.203, 'n': '', 'std': ''}},
  antenna02_name: { spw0: { 'mean': 44.345, 'n': '', 'std': ''},
                    spw1: { 'mean': 44.374, 'n': '', 'std': ''},
                    spw2: { 'mean': 44.227, 'n': '', 'std': ''},
                    spw3: { 'mean': 44.203, 'n': '', 'std': ''}}}
which can be iterated over and applied to the actual data with the following loop (note that gencal expects amplitude factors, hence the \(1/\sqrt{x}\) conversion):

from math import sqrt

to_amp_factor = lambda x: 1. / sqrt(x)

for ant in jyperk.keys():
    factors = []
    for spw in jyperk[ant].keys():
        factors.append(jyperk[ant][spw]['mean'])
    # select the current antenna so each iteration appends its own factors
    gencal(vis=sd_ms, caltable=sd_ms+'.jy2ktbl', caltype='amp', antenna=ant,
           spw=",".join(str(x) for x in jyperk[ant].keys()),
           parameter=list(map(to_amp_factor, factors)))

applycal(vis=sd_ms+'.cal', gaintable=sd_ms+'.jy2ktbl')
Future Development Goals¶
The top development priority for CASA single-dish data reduction is the conversion to the MeasurementSet format for the full reduction process. Dispensing with the ASAP (scan table) format will help unify data processing by providing a more homogeneous basis for processing, reduction, and analysis, and will streamline development: current and future developers need only be familiar with one overall data format. As such, ASAP will be removed in CASA 5.1. The importasap task is the only exception: it will remain even after ASAP is removed, to support importing existing scantables into CASA for backward compatibility.

There is also significant active development to add header information to plots made by plotms. While preserving the multi-plot capability and using a layout that will not cramp or obfuscate the plot output, header information (e.g. target name, frequency, integration time) can be optionally added to the plots produced by plotms.

There are a number of additional improvements that we do not detail here. ALMA-related development requests and bug notices can be sent to the CASA Single Dish Team via the ALMA Helpdesk.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/cal_library_syntax.ipynb
Cal Library Syntax¶
The “Cal Library” is a new means of expressing calibration application instructions. It has nominally been available in applycal and the calibration solve tasks since CASA 4.1, via the docallib=True parameter, as an alternative to the traditional parameters (e.g., gaintable, etc.) that most users continue to use. As of CASA 4.5, we have deployed use of the Cal Library for on-the-fly calibration in plotms and mstransform. In CASA 4.5, our intent is to demonstrate the Cal Library and begin familiarizing users with it. The capabilities remain limited in some ways; new features, additional flexibility, and broader deployment in more tasks will be offered in later releases.

This page describes basic use of the Cal Library.
Please note the section on current (CASA 4.5, 4.6, 4.7, 5.*) limitations.
Basic Cal Library usage¶
The Cal Library is a means of specifying calibration instructions in an ascii file, rather than via the traditional gaintable/gainfield/interp/spwmap/calwt parameters, which often become clumsy when many caltables are involved and which have rather limited flexibility. Instead of specifying the traditional parameters, the file name is given in the callib parameter in applycal or plotms (in applycal one must also specify docallib=True). For example, to correct an MS called my.ms with a Cal Library file called mycal.txt:
applycal(vis='my.ms',docallib=True,callib='mycal.txt')
In a Cal Library file, each row expresses the calibration apply instructions for a particular caltable and (optionally) a specific selection of data in the MS to which it is to be applied. For example, if mycal.txt contains:
#mycal.txt cal library file
caltable='cal.G' tinterp='linear' calwt=True
this will arrange for a caltable called cal.G to be applied (with no detailed selection) to all MS data, with linear interpolation in time, and with the weights also calibrated. It corresponds to these settings for the traditional parameters in applycal:
applycal(vis='my.ms',gaintable='cal.G',gainfield='',interp='linear',
spwmap=[],calwt=True)
If a bandpass table, cal.B, is also available for application, one might use the following Cal Library file:
#mycal.txt cal library file
caltable='cal.G' tinterp='linear' calwt=True
caltable='cal.B' finterp='linear' calwt=False
This example arranges the same instructions for cal.G, and adds a bandpass table that will be interpolated linearly in frequency (the default for time-dependent interpolation is linear, if the bandpass table contains more than one time sample), without weight calibration. The corresponding form with the traditional parameters is:
applycal(vis='my.ms',gaintable=['cal.G','cal.B'], gainfield=['',''],
interp=['linear','linear,linear'],
spwmap=[],calwt=[True,False])
In general, the Cal Library file should be easier to read and manage than the traditional parameters as the number of specified caltables grows. A more complicated example, involving a non-trivial spwmap as well as field selection (fldmap) in the caltable:
#mycal.txt cal library file
caltable='cal.G' tinterp='linear' fldmap='nearest' spwmap=[0,1,1,3] calwt=True
caltable='cal.B' finterp='linear' fldmap='3' spwmap=[0,0,0,0] calwt=False
In this case, solutions from cal.G will be selected based on directional proximity (‘nearest’) for each MS field via the fldmap parameter, and spw 2 will be calibrated by spw 1 solutions. For cal.B, solutions from field id 3 will be selected exclusively from the caltable, with spw 0 calibrating all MS spws (of which there are apparently 4). The corresponding settings for the traditional parameters are as follows:
applycal(vis='my.ms',gaintable=['cal.G','cal.B'], gainfield=['nearest','3'],
interp=['linear','linear,linear'],
spwmap=[[0,1,1,3],[0,0,0,0]],calwt=[True,False])
Comment lines may be included in the Cal Library file by starting a line with the # character. (Partial-line comments are not supported, as yet.) Existing Cal Library lines can be turned off (for experimentation purposes) by making those lines comments with #.
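Since a Cal Library is just an ascii file, it can also be written from within Python. A minimal sketch, reproducing the two-caltable example above:

# Write the example Cal Library to disk, then apply it
callib_lines = ["#mycal.txt cal library file",
                "caltable='cal.G' tinterp='linear' calwt=True",
                "caltable='cal.B' finterp='linear' calwt=False"]
with open('mycal.txt', 'w') as f:
    f.write('\n'.join(callib_lines) + '\n')

applycal(vis='my.ms', docallib=True, callib='mycal.txt')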
More advanced Cal Library Usage¶
The real power of the Cal Library arises from the ability to specify calibration instructions for a caltable per MS selection. This enables consolidating what would be multiple applycal executions using the traditional parameters into a single execution. Extending the example from above, if the MS field ‘cal’ should be calibrated by cal.G with ‘nearest’ interpolation in time, and the field ‘sci’ with ‘linear’ interpolation in time, the following Cal Library file will achieve this:
#mycal.txt cal library file
caltable='cal.G' field='cal' tinterp='nearest' fldmap='nearest' spwmap=[0,1,1,3] calwt=True
caltable='cal.G' field='sci' tinterp='linear' fldmap='nearest' spwmap=[0,1,1,3] calwt=True
caltable='cal.B' finterp='linear' fldmap='3' spwmap=[0,0,0,0] calwt=False
Note that the algorithm for selecting solutions from the caltable (fldmap='nearest', which may resolve differently for the two MS fields) hasn’t been changed, but it could be. In fact, for each caltable, any of the calibration parameters can be adjusted per MS selection, except calwt, which, if set to True (or False) for the first-specified MS selection, will be forced to True (or False) for all entries for that caltable, to maintain weight consistency within the MS. In general, it is best to specify calwt (or rely on the default, True) uniformly for all entries, per caltable, to avoid confusion. MS selection by spw, intent, and obs id can also be used (see the glossary below). The pair of applycal executions corresponding to this Cal Library would be:
applycal(vis='my.ms',field='cal',gaintable=['cal.G','cal.B'], gainfield=['nearest','3'],
interp=['nearest','linear,linear'], spwmap=[[0,1,1,3],[0,0,0,0]],calwt=[True,False])
applycal(vis='my.ms',field='sci',gaintable=['cal.G','cal.B'], gainfield=['nearest','3'],
interp=['linear','linear,linear'], spwmap=[[0,1,1,3],[0,0,0,0]],calwt=[True,False])
When there are many fields to which to apply carefully selected calibration, fldmap='nearest' may not properly select the correct calibrator fields for each target field. In this case, the index-list style of fldmap (like spwmap) can be used (here field ids 1, 4, and 6 are calibrators, and 2, 5, and 7 are the corresponding science fields):
#mycal.txt cal library file
caltable='cal.G' field='1,2,3,4,5,6,7' tinterp='nearest' fldmap=[0,1,1,3,4,4,6,6] spwmap=[0,1,1,3] calwt=True
caltable='cal.B' finterp='linear' fldmap='3' spwmap=[0,0,0,0] calwt=False
In this example, field 1 will calibrate itself and field 2. Similarly, 4 will calibrate itself and 5, and 6 will calibrate itself and 7. The bandpass calibrator (3) has been included, too, calibrating itself. Field indices are specified in the field and fldmap parameters here, for clarity. While field names can be used in field, the fldmap parameter, which in this form is an indexing list, can only interpret indices (note that field 0 is also explicitly included in the fldmap to preserve the integrity of the indexing).
If multiple calibrators are required for each individual science field, use the string selection form of fldmap, and specify separate entries for each science field:
#mycal.txt cal library file
caltable='cal.G' field='1,3,4,6,8,9,10' tinterp='nearest' fldmap='nearest' spwmap=[0,1,1,3] calwt=True
caltable='cal.G' field='2' tinterp='linear' fldmap='1,8' spwmap=[0,1,1,3] calwt=True
caltable='cal.G' field='5' tinterp='linear' fldmap='4,9' spwmap=[0,1,1,3] calwt=True
caltable='cal.G' field='7' tinterp='linear' fldmap='6,10' spwmap=[0,1,1,3] calwt=True
caltable='cal.B' finterp='linear' fldmap='3' spwmap=[0,0,0,0] calwt=False
The additional calibrators for science fields 2, 5, and 7 are 8, 9, and 10, respectively. The first entry for cal.G accounts for all calibrators (including field 3, the bandpass calibrator), using fldmap='nearest' to ensure they are each calibrated solely by themselves. Then, in separate entries, fields 1 and 8 are selected for field 2, fields 4 and 9 are selected for field 5, and fields 6 and 10 are selected for field 7. When using the string selection style in fldmap, field names can be used, if desired.
Exclusivity
Since the Cal Library permits MS-selection-specific calibration specifications, it is even possible to specify different caltables for different MS selections, and take advantage of an implicit exclusivity property of the Cal Library. In the above example, the G calibration for the ‘cal’ and ‘sci’ fields may come from different caltables, ‘cal.Gcal’ and ‘cal.Gsci’, respectively (these caltables may have been solved with different solution intervals, for example). We would specify the Cal Library as follows:
#mycal.txt cal library file
caltable='cal.Gcal' field='cal' tinterp='nearest' fldmap='nearest' spwmap=[0,1,1,3] calwt=True
caltable='cal.Gsci' field='sci' tinterp='linear' fldmap='nearest' spwmap=[0,1,1,3] calwt=True
caltable='cal.B' finterp='linear' fldmap='3' spwmap=[0,0,0,0] calwt=False
In this case, the cal.B table would be applied to both fields as before, but cal.Gcal would only be applied to field ‘cal’ and cal.Gsci would only be applied to field ‘sci’. Both tables would ignore data from the field they weren’t intended for. The corresponding pair of applycal calls would be executed as follows:
applycal(vis='my.ms',field='cal',gaintable=['cal.Gcal','cal.B'], gainfield=['nearest','3'],
interp=['nearest','linear,linear'], spwmap=[[0,1,1,3],[0,0,0,0]],calwt=[True,False])
applycal(vis='my.ms',field='sci',gaintable=['cal.Gsci','cal.B'], gainfield=['nearest','3'],
interp=['linear','linear,linear'], spwmap=[[0,1,1,3],[0,0,0,0]],calwt=[True,False])
NB: The Cal Library exclusivity property described here only works in CASA version 5.3 and later. In prior versions, the Cal Library system implicitly assumed that all caltables specified in the Cal Library were nominally intended for all data selections, and that as much MS-selection specificity as needed was explicitly included. In that case, missing explicit specifications would result in an error message indicating that the Cal Library was missing an explicit MS-selection-specific entry.
General Rules (current, as of CASA 4.5, 4.6, 4.7, 5.*)¶
Each non-comment line in the Cal Library file must contain a valid (existing) caltable name
Blank lines (i.e., containing whitespace only) will be ignored
All parameters (see glossary below) are name/value pairs using an equals sign, delimited with spaces (no commas!)
Only those parameters (see glossary) for which non-default values are required need be specified
Each set of coordinated instructions must occur on a single line (there is no line continuation operator, as yet)
If detailed MS selection is used, care must be exercised to ensure it is mutually exclusive over all MS rows for the same caltable; there is currently no internal checking for redundancy, and only the last calibration instructions for a particular MS selection will be invoked
Full-line comments are supported by inserting the # character as the first non-whitespace character in the line. This mechanism can be used to turn off ordinary Cal Library lines.
When quoted items within a selection string are used, e.g. field='"B0007+106; J0010+109",GRB021004', the string must have double quotation marks enclosing single quotation marks, or single quotation marks enclosing double quotation marks; parsing will fail with a syntax error if the enclosed marks match the outer marks. Note that the enclosed quotation marks are not needed here: field='B0007+106; J0010+109,GRB021004' would work, with the field names separated by commas.
Limitations
Application of parallactic angle corrections is not yet supported within the Cal Library file (this only affects use in plotms, where there is no parang parameter)
Some parametrized calibration tables (BPOLY, GSPLINE) are not yet supported
Conversion from Existing applycal Scripts¶
To convert existing applycal commands, a simple experimental function, applycaltocallib, is available. To access it, type (within CASA):
from callibrary import applycaltocallib
Then, choose a filename for the Cal Library file, and supply existing settings for the applycal parameters (field, spw, intent, gaintable, gainfield, interp, spwmap, calwt) to the applycaltocallib function:
callibfile='mycallib.txt'
applycaltocallib(filename=callibfile, append=False,
                 field=field, spw=spw, intent=intent,
                 gaintable=gaintable, gainfield=gainfield,
                 interp=interp, spwmap=spwmap, calwt=calwt)
If append=False, the specified filename will be overwritten if it already exists. If append=True, new entries will be appended to the existing file. Only parameters with non-trivial applycal settings need be included. In general, if gaintable is a Python list, it is best if gainfield, interp, spwmap, and calwt (where non-trivially set) are also lists. For example, if your conventional script contains the following applycal executions (duplicated from above):
applycal(vis='my.ms',field='cal',
gaintable=['cal.G','cal.B'], gainfield=['nearest','3'],
interp=['nearest','linear,linear'],
spwmap=[[0,1,1,3],[0,0,0,0]],
calwt=[True,False])
applycal(vis='my.ms',field='sci',
gaintable=['cal.G','cal.B'], gainfield=['nearest','3'],
interp=['linear','linear,linear'],
spwmap=[[0,1,1,3],[0,0,0,0]],
calwt=[True,False])
…these can be edited to applycaltocallib executions as:
callibfile='mycallib.txt'
applycaltocallib(filename='mycallib.txt',append=False,
field='cal',
gaintable=['cal.G','cal.B'], gainfield=['nearest','3'],
interp=['nearest','linear,linear'],
spwmap=[[0,1,1,3],[0,0,0,0]],
calwt=[True,False])
applycaltocallib(filename='mycallib.txt',append=True,
field='sci',
gaintable=['cal.G','cal.B'],
gainfield=['nearest','3'],
interp=['linear','linear,linear'],
spwmap=[[0,1,1,3],[0,0,0,0]],
calwt=[True,False])
After running them, mycallib.txt will contain:
caltable='cal.B' calwt=False field='cal' tinterp='linear' finterp='linear' fldmap='3' spwmap=[0, 0, 0, 0]
caltable='cal.G' calwt=True field='cal' tinterp='nearest' fldmap='nearest' spwmap=[0, 1, 1, 3]
caltable='cal.B' calwt=False field='sci' tinterp='linear' finterp='linear' fldmap='3' spwmap=[0, 0, 0, 0]
caltable='cal.G' calwt=True field='sci' tinterp='linear' fldmap='nearest' spwmap=[0, 1, 1, 3]
Note that the cal.B table is specified separately for the ‘cal’ and ‘sci’ fields with otherwise the same parameters; thus, those two lines could be manually consolidated to a single line with unified field selection, yielding:
caltable='cal.B' calwt=False field='cal,sci' tinterp='linear' finterp='linear' fldmap='3' spwmap=[0, 0, 0, 0]
caltable='cal.G' calwt=True field='cal' tinterp='nearest' fldmap='nearest' spwmap=[0, 1, 1, 3]
caltable='cal.G' calwt=True field='sci' tinterp='linear' fldmap='nearest' spwmap=[0, 1, 1, 3]
The field selection for the first row could be removed entirely if cal.B will be used uniformly for all fields in the MS (equivalently, field=''). This sort of row consolidation is optional, but it may have useful memory efficiency benefits when running applycal, and so is recommended.

The applycaltocallib function should be considered experimental and used with care, and the resulting file examined thoroughly for correctness, since this function does not do any internal duplication checking or other sanity checks. All other current constraints and limitations on cal libraries (as noted above) apply.
Glossary¶
This is a list of recognized Cal Library parameters. For each, the default is indicated. Additional parameters enhancing flexibility will be added in CASA 4.5 and later.
General
caltable — the name of the caltable for which the instructions on the current line apply; no default; required
MS Selection
Use these parameters to implement calibration instructions specific to particular MS selections (using standard MS Selection syntax, except where noted):
field — the MS field selection for which the calibration instructions on the current line apply; default=‘’ (all fields)
spw — the MS spw selection for which the calibration instructions on the current line apply; default=‘’ (all spws). Note that channel selection will be ignored, since the Cal Library does not support variety in calibration application at channel granularity.
intent — the MS intent selection for which the calibration instructions on the current line apply; default=‘’ (all intents)
obs — the MS observation id selection for which the calibration instructions on the current line apply; default=‘’ (all obs ids)
Interpolation/application
tinterp — the time-dependent interpolation mode; default=‘linear’ options: ‘linear’, ‘nearest’
finterp — the chan-dependent interpolation mode (only relevant for channelized caltables); default=’linear’ options: ‘nearest’, ‘linear’, ‘cubic’, ‘spline’
calwt — weight calibration; default=True options: True, False
Calibration mapping
The following *map parameters enable selection on the caltable. For each *map parameter, the basic specification is an ordered list indicating the caltable selection indices intended for each MS index on that axis. E.g., spwmap=[0,1,1,3] means MS spws 0, 1, and 3 will each be calibrated by the same spw index from the caltable, and MS spw 2 will be calibrated by cal spw 1. The *map parameters support other short-hand options as well, as indicated below. For defaults, “index identity” means that each MS index will be calibrated by the corresponding caltable index, and “no explicit mapping” means that no filter will be applied to that axis, and all available solutions on the axis will be included.
spwmap — spectral window mapping; default=index identity
fldmap — field mapping; default=[] (no explicit mapping); additional options: ‘nearest’ or a string indicating field selection on the caltable (same as traditional gainfield options)
antmap — antenna id mapping; default=index identity
obsmap — obs id mapping; default=[] (no explicit mapping)
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/data_weights.ipynb
Data Weights¶
Visibility weight initialization and calibration have undergone several improvements in CASA 4.2.2 and CASA 4.3. This appendix briefly describes the formal weight definitions, and the changes occurring in these CASA versions. If datasets reduced with different CASA versions are to be combined, the weights may need to be adjusted accordingly. This can be achieved, e.g., by running the same version of statwt on all datasets before combination. The best option, however, is to use a single CASA version for all reductions, preferably 4.2.2 or later.
NOTE: Post-calibration weights, e.g. imaging weights or tapers, are not covered here.
The SIGMA and WEIGHT columns
Formally, in CASA 4.2.2 and later, the SIGMA column in the measurement set will reflect the per-channel noise of the DATA column as it depends on the channel bandwidth \(\Delta \nu\) and the length of an integration \(\Delta t\):
\(SIGMA = \frac{1}{\sqrt{2\Delta \nu \Delta t}}\)
The factor of \(\sqrt{2}\) is for cross-correlations only and auto-correlation data follows:
\(SIGMA = 1/\sqrt{\Delta \nu \Delta t}\).
SIGMA will only be updated if the time and channel widths are modified along with any DATA column manipulation, e.g. through averaging, binning, smoothing, etc. (tasks like mstransform, cvel, split, exportuvfits, etc).
The WEIGHT column reflects how much weight each CORRECTED_DATA sample should receive when data are combined (e.g., in averaging). To start with, WEIGHT is initialized from the SIGMA column via:
\(WEIGHT = \frac{1}{{SIGMA}^2} = 2 \Delta \nu \Delta t\)
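As a quick numeric check of this initialization (the channel width and integration time below are illustrative values, not from a particular dataset):

from math import sqrt

dnu = 15625.0  # channel width in Hz (a 15.625 kHz channel)
dt = 6.048     # integration time in seconds

sigma = 1.0 / sqrt(2.0 * dnu * dt)  # per-channel noise, cross-correlations
weight = 1.0 / sigma**2             # equals 2 * dnu * dt
print(weight)                       # 189000.0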
Data calibration by applycal with calwt=True will calculate and modify the WEIGHT values but not SIGMA. Calibration applies multiplicative factors and the WEIGHT of a visibility on a baseline between antennas \(i\) and \(j\) is calculated via:
\(WEIGHT_{ij}=\frac{c_i c_j}{{SIGMA}_{ij}^2}\)
where \(c_i\) and \(c_j\) are the net antenna-based power calibration factors derived by applycal (\(c_i=c_j\) for auto-correlation data). In the table below, we list the definitions of antenna-based \(c\) for different calibration procedures and CASA versions. When more than one calibration is applied, the product of the relevant weight factors is used.
| | \(\leq\) CASA 4.2.1 | CASA 4.2.2/4.3 | \(\geq\) CASA 4.4 (WEIGHT_SPECTRUM) |
|---|---|---|---|
| Initialization | \(1.0\) | \(2 \Delta \nu \Delta t\) | \(2 \Delta \nu \Delta t\) |
| System Temperature | \(\frac{1}{<\sqrt{T_{\rm sys, k}}>_{k}^{2}}\) | \(\frac{1}{<T_{\rm sys, k}>_k}\) | \(\frac{1}{T_{\rm sys, k}}\) |
| Gain | \(||G||^2\) | \(||G||^2\) | \(||G||^2\) |
| Bandpass | \(\frac{1}{<||B||^{-1}>_{k}^{2}}\) | \(<||B||^{2}>_{k}\) | \(<||B||^{2}>_{k}\) |
Weights in CASA 4.2.1 and Earlier
The SIGMA and WEIGHT columns are initialized with values of \(1.0\). Traditionally, this convention was adequate for datasets with uniform sampling in time and frequency; a global weight scale factor would not affect calibration and imaging fidelity. In data manipulation operations (e.g., split), SIGMA was treated as a per-channel value and WEIGHT as a per-spw (all channels) weight. Combined with unit initialization, this difference in definition could lead to incongruent weight scales for different spectral windows, in particular if bandwidth and channel count varied. CASA 4.2.1 is therefore not recommended for datasets which have variety in spectral window bandwidth and channelization and for which spectral windows are to be combined in imaging.
Weights in CASA 4.2.2
In CASA 4.2.2 the SIGMA and WEIGHT columns are properly initialized via the definition in the above equations. Both are defined as per-channel values. Also, the weight calibration factors have been subtly updated to improve robustness, as indicated in the Table.
Weights in CASA 4.3
In CASA 4.3 frequency variations of the WEIGHT and SIGMA values are (optionally) captured in additional WEIGHT_SPECTRUM and SIGMA_SPECTRUM columns. This allows accommodation of variations of effective sensitivity on a channel by channel basis (e.g. band edges, atmospheric lines, spectral \(T_{\rm sys}\) variation etc.). WEIGHT_SPECTRUM will be recognized in the applycal task as well as in mstransform and clean. Calibration solvers, however, will not yet calculate and modify WEIGHT_SPECTRUM.
Weights in CASA 4.4 and later
Full support of WEIGHT_SPECTRUM.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/ephemeris_data.ipynb
Ephemeris Data¶
Overview of handling ephemeris data in CASA
Ephemeris Objects¶
Since ALMA Cycle 3, the ALMA observatory includes in each raw dataset (SDM) all necessary ephemerides in the so-called Ephemeris table, an XML file which corresponds to the ephemeris used during the observation. Upon import to MS format, the task importasdm will translate the XML ephemerides into separate CASA ephemeris tables for each field, which have the same format as those used by the task setjy. Examples can be found in the subdirectory “data/ephemerides/JPL-Horizons” in each CASA distribution.
The ephemeris tables are automatically attached to the corresponding field(s) and can be used whenever needed. They have two main use cases:
An ephemeris table is necessary for the spectral frame transformation of the visibilities to the source rest frame, either permanently, when creating a new MS using the tasks cvel, cvel2, or mstransform, or on-the-fly during imaging, using the task tclean with the parameter specmode set to “cubesource”. The ephemeris provides the location of the phase center and the radial velocity of the target.
An ephemeris table is also necessary if your observation has tracked some other phase center but you would like to track the location described by the ephemeris. This can be achieved by setting the parameter phasecenter of task tclean to the string “TRACKFIELD”. See the tclean documentation for more details.
The ephemerides used by ALMA are originally for the observer location of ALMA (topocentric). They use the ICRS coordinate reference frame and typically have a time spacing of ten or twenty minutes. For the later transformation of the spectral reference frame, however, a geocentric ephemeris is necessary. The importasdm task will therefore by default also perform a conversion from the ALMA observer location to the geocenter. This is controlled by the importasdm parameter convert_ephem2geo which is True by default.
The spectral reference frame of the visibilities in the SDM is always topocentric (TOPO), the observer reference frame. To analyze spectral lines in an ephemeris source, it can be helpful to transform the visibilities to the source rest spectral frame (REST). This is in particular necessary when the velocity of the source varies significantly throughout the observation (which can, e.g., happen if the observation is spread over several days). As mentioned above, this offline software Doppler-shift correction can be achieved in two ways, either permanently or on-the-fly. This is described further below.
When an ephemeris is attached to a field of the MS FIELD table, the object position does not correspond to the direction columns of the FIELD table but is retrieved by linearly interpolating from the ephemeris table for the given time. The nominal entry of the direction column then changes its meaning to an angular offset with respect to the ephemeris. Thus, if the field center is exactly at the position described by the ephemeris, the nominal entries of the direction column are zero for right ascension and declination. The offset feature of the FIELD table direction column comes into play in the case of a mosaic: if there are a number of fields with nearby positions, the fields can share an ephemeris; they all reference the same ephemeris table via the optional column EPHEMERIS_ID, but have different offsets in their direction column entries.
Because the nominal FIELD table direction column entries do not correspond to the actual object position, one should obtain the object position with the following special tool method:
msmd.phasecenter()
or the more general:
ms.getfielddirmeas()
(see the inline CASA help for details on these tools). The default time of the position is taken from the TIME column of the FIELD table.
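For example, a minimal sketch retrieving the interpolated position of field 0 (the MS name is a placeholder; modular CASA 6 import shown):

from casatools import msmetadata

msmd = msmetadata()
msmd.open('my_ephem.ms')
dir0 = msmd.phasecenter(0)  # direction measure for field 0 at the nominal FIELD time
msmd.close()
print(dir0['refer'], dir0['m0'], dir0['m1'])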
Permanent spectral frame transformation creating a new MS
With either the task cvel, its faster implementation cvel2 (which internally uses the same code as the task mstransform), or mstransform itself, a new MS can be created which contains visibilities in the spectral reference frame of the source. All three tasks should produce the same result. Note that an online Doppler-shift tracking corresponding to the velocity of the source at the beginning of the observation is applied during the observations themselves. This online correction allows one to tune the spectral windows adequately to match the requested rest frequencies, but the labelling of the spectral frame of the raw data remains TOPO.
The user must set the outframe parameter of cvel, cvel2, or mstransform to “SOURCE”. This will lead to a transformation from the TOPO to the GEO reference frame followed by an additional time-dependent Doppler correction according to the radial velocity of the object as given by its ephemeris.
Such MSes should then be imaged using the setting “cubedata” for the parameter specmode in tclean. The resulting spectral reference frame in the MS is named “REST”.
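A sketch of this sequence with mstransform (the MS names, rest frequency, and image geometry are placeholder values):

# Permanently transform the visibilities to the source rest frame...
mstransform(vis='calibrated_target.ms', outputvis='target_rest.ms',
            datacolumn='data', regridms=True, outframe='SOURCE',
            mode='velocity', width='0.3km/s', restfreq='354.50547GHz')

# ...and image the result without any further frame transformation
tclean(vis='target_rest.ms', imagename='target_rest_cube',
       specmode='cubedata', imsize=[256, 256], cell='0.5arcsec',
       niter=1000, threshold='5mJy')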
On-the-fly spectral frame transformation within tclean
If a permanent storage of spectrally transformed visibilities in a new MS is not needed, cubes with the spectral frame of the ephemeris object can also be obtained by letting the task tclean perform the frame transformation on-the-fly. This is simply achieved by setting parameter specmode to the string “cubesource”. The resulting cube will also have the reference frame named “REST”.
In summary, with ephemerides included in the ALMA raw data and the added support for this in the importasdm task, the user rarely needs to worry about how to obtain the right ephemeris in the right format and how to attach it properly to the MS. This process is transparent and only a few logger messages indicate that it is happening (see CASA Docs pages on ‘Manipulate Ephemeris Objects’ if a new ephemeris needs to be added to an existing MS). The correct time-dependent positions, radial velocities, and object distances are used in all relevant tasks such as listobs, plotms, and, as described above, cvel, cvel2, mstransform, and tclean. For Solar-System object flux calibrators, the task setjy will, however, only extract the nominal position from the SDM ephemeris and otherwise use its internal set of ephemerides since these contain additionally needed object parameters. Care has to be taken when trying to extract the field positions from the FIELD table as the nominal direction column entries will only be offsets (w.r.t. the ephemeris position) when an ephemeris is attached.
WARNING: Virtual model columns in the MS do not correctly support ephemeris objects, although they will run without generating errors or warnings. If any of your calibrators exhibit significant celestial motion on the timescale of your observation (e.g., any solar system object), you must set usescratch=True in calls to setjy.
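For example (the field name is a placeholder; ‘Butler-JPL-Horizons 2012’ is the standard set of solar system models):

# Write the model into the MODEL_DATA scratch column, as required
# for moving (ephemeris) calibrators
setjy(vis='my.ms', field='Titan', standard='Butler-JPL-Horizons 2012',
      usescratch=True)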
As opposed to ALMA data, which use a tabulated representation of the ephemerides, VLA data use a polynomial representation of the positions and radial velocities. This representation is also supported: the polynomial ephemeris is internally tabulated with a default time step of 0.001 days and then processed as in the ALMA case. The importasdm parameter polyephem_tabtimestep can be used to control the step size of the tabulation.
Manipulate Ephemeris Objects¶
When an astronomical object has a proper motion, in particular objects in our solar system, a static (RA,Dec) position in the FIELD table of the MeasurementSet will not accurately describe the time-dependent position. Prior to CASA 4.2, there was no support for ephemeris objects other than the built-in reference frames for the Sun, the Moon, and the planets out to PLUTO. With CASA 4.2, several new features were introduced which help the user to attach an arbitrary ephemeris to a given field and work with the object from calibration to imaging. These can be used when no ephemeris table was provided by the observatory, or in cases where the use of an improved ephemeris table is necessary.
Ephemeris tables
The CASA format for ephemeris tables was introduced in the early development stages of CASA in connection with the Measures module. The me tool (see CASA Tools on using the toolkit, as well as the inline help on the me tool inside CASA for specific usage) permits position calculations based on ephemerides in this format. Two examples of such tables can be found in the distribution directory in subdirectory data/ephemerides: VGEO is an ephemeris of Venus in the geocentric reference frame, while VTOP is an ephemeris for the same object in the TOPO reference frame for the observatory location of the VLA. With the introduction of solar system source models (Butler) in the setjy task, a nearly complete set of ephemerides for the larger bodies in our solar system had to be made available. These are stored in nearly the same format as the above examples VGEO and VTOP (but with a few enhancements) in directory data/ephemerides/JPL-Horizons. If your object’s ephemeris is among those stored in data/ephemerides/JPL-Horizons, you can simply copy the ephemeris from there. Otherwise, you can request the ephemeris from JPL-Horizons using the CASA commands (for example):
import casatasks.private.request as jplreq
(For CASA 5: import recipes.ephemerides.request as jplreq)
jplreq.request_from_JPL(objnam='Mars', startdate='2012-01-01', enddate='2013-12-31',
                        date_incr='0.1 d', get_axis_orientation=False,
                        get_axis_ang_orientation=True,
                        get_sub_long=True, use_apparent=False, get_sep=False,
                        return_address='YOUR_EMAIL_ADDRESS',
                        mailserver='YOUR_MAIL_SERVER_ADDRESS')
where you need to fill in the parameters objnam, startdate, enddate, date_incr (the time interval between individual ephemeris table entries), return_address (the email address where you want to receive the ephemeris), and mailserver (the SMTP server through which you want to send the request email). The other parameters should be set as shown. Within a short time, you should receive the requested ephemeris as an email from NASA’s JPL-Horizons system. Save the email into a file with the “save as” function of your mail client. See the next section on how to attach it to your dataset.
Using fixplanets to attach ephemerides to a field of a MeasurementSet
Task importasdm fills the SOURCE coordinates with the correct positions based on the ephemeris table. Alternatively, one can use the task fixplanets to set the ephemeris of a given field in a MeasurementSet. Here is an example:
fixplanets(vis='uid___A002_X1c6e54_X223.ms',
field='Titan', fixuvw=True, direction='mytitanephemeris')
where you need to set the parameters vis to the name of your MS and the parameter field to the name of the field to which you want to attach the ephemeris. The parameter direction must be set to the name of your ephemeris table. Accepted formats are (a) the CASA format (as in VGEO or the ephemerides in data/ephemerides/JPL-Horizons as described above) and (b) the JPL-Horizons mail format which you obtain by saving an ephemeris email you received from JPL-Horizons. The parameter fixuvw should be set to True in order to trigger a recalculation of the UVW coordinates in your MS based on the new ephemeris. The task fixplanets can also be used for other field direction modifications. Please refer to the help text of the task.
Note that among the ephemerides in the directory data/ephemerides/JPL-Horizons/ you should only use those ending in ‘_J2000.tab’; they are the ones in J2000 coordinates.
Use of the ephemeris after attachment
Once you have attached the ephemeris to a field of an MS, it will automatically be handled in tasks like split and concat, which need to pass the ephemeris on to their output MSs. In particular, concat recognizes when fields of the MSs to be concatenated use the same ephemeris, and merges these fields if the time coverage of the provided ephemeris in the first MS also covers the observation time of the second MS. The ephemeris of the field in the first MS will then be used for the merged field.

In order to inspect the ephemeris attached to a field, you can find it inside the FIELD subdirectory of your MS. The optional column EPHEMERIS_ID in the FIELD table points to the running number of the ephemeris table; a value of −1 indicates that no ephemeris is attached. Note that when an ephemeris is attached to a field, the direction column entries for that field in the FIELD table are interpreted as an offset to the ephemeris direction, and are therefore set to (0., 0.) by default. This offset feature is used in mosaic observations where several fields share the same ephemeris with different offsets. The time column in the FIELD table should be set to the beginning of the observation for that field, and serves as the nominal time for ephemeris queries.
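A minimal sketch of such an inspection with the table tool (the MS name is a placeholder):

from casatools import table

tb = table()
tb.open('my_ephem.ms/FIELD')
if 'EPHEMERIS_ID' in tb.colnames():
    print(tb.getcol('EPHEMERIS_ID'))  # -1 means no ephemeris attached
else:
    print('No EPHEMERIS_ID column; no ephemerides attached')
tb.close()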
Spectral frame transformation to the rest frame of the ephemeris object in task cvel
The ephemerides contain radial velocity information. The task cvel can be used to transform spectral windows into the rest frame of the ephemeris object by setting the parameter outframe to “SOURCE” as in the following example:
cvel(vis='europa.ms',
outputvis='cvel_europa.ms', outframe='SOURCE', mode = 'velocity',
width = '0.3km/s', restfreq = '354.50547GHz')
This will make cvel perform a transformation to the GEO reference frame, followed by an additional Doppler correction for the radial velocity given by the ephemeris for each field. (Typically, this should happen after calibration and after splitting out the spectral windows and the target of interest.) The result is an MS with a single combined spectral window in reference frame REST. From this frame, further transformations to other reference frames are not possible.
Ephemerides in ALMA datasets
The ALMA Science Data Model (the raw data format for ALMA data) now foresees an Ephemeris table. This feature has been in use since the beginning of ALMA Cycle 3 both for science targets and calibrator objects. With ALMA Cycle 3 (or later) datasets, the task importasdm will automatically translate the contents of the ASDM Ephemeris table into separate ephemeris tables in CASA format and attach them to the respective fields.
In the case of mosaics, all fields belonging to a mosaic on an ephemeris object will share the same ephemeris. The individual mosaic pointings will use the offset mechanism described above to define the position of each field.
Imaging ALMA observations with tclean
The tclean task can automatically handle the imaging of ALMA observations (both single-execution and multi-execution datasets, and both single fields and mosaics) by using the new phasecenter='TRACKFIELD' option. This option will use the ephemeris tables attached to each MeasurementSet by the ALMA control system. These tables will have ultimately been provided by the observatory in the case of large bodies (those selectable in the ALMA Observing Tool), or by the PI as attachments in the Observing Tool in the case of smaller bodies.
WARNING: if you want to use the old method of concatenating calibrated MeasurementSets by using the forcesingleephemfield parameter to create a common joint ephemeris table, then you must still set phasecenter='TRACKFIELD' in tclean in order to get a sensible image, even though you are passing it only one (concatenated) MeasurementSet. If not, you may get a corrupt image, even if you select a subset of data only from the first execution block in the concatenated MS.
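An illustrative tclean call for an ALMA ephemeris target (field name, image geometry, and thresholds are placeholders):

tclean(vis='calibrated.ms', imagename='ephem_target_cube',
       field='my_comet', specmode='cubesource', phasecenter='TRACKFIELD',
       imsize=[256, 256], cell='0.2arcsec',
       niter=1000, threshold='5mJy')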
Imaging observations from other telescopes with tclean
For non-ALMA data, or to use a newer ephemeris than was available at the time of the ALMA observations, the user may set the phasecenter parameter to the name of an ephemeris file, constructed as described in the earlier section above. Alternatively, the user may set the phasecenter to the common name of the following bodies: ‘MERCURY’, ‘VENUS’, ‘MARS’, ‘JUPITER’, ‘SATURN’, ‘URANUS’, ‘NEPTUNE’, ‘PLUTO’, ‘SUN’, ‘MOON’, in which case the standard DE200 ephemeris table distributed with CASA will be used.
Imaging & Analysis¶
This section provides an overview of synthesis imaging, single-dish imaging and subsequent image manipulation and analysis routines.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/synthesis_imaging.ipynb
Synthesis Imaging¶
This chapter documents CASA’s refactored imager. These features are accessible to the user via the tclean task. Image products can be visualized with the CASA Viewer, which in CASA 6 is launched with the task imview.
The first five sections give an algorithm-centric view of the imager framework and are meant to convey the overall iterative reconstruction framework and how various algorithms and usage options fit into it. The other sections are more user-centric and focus on what one would need to know about specific imaging goals such as wideband imaging, mosaicking or details about spectral-cube definitions, etc. There is some overlap in content between sections, but this is meant to address the needs of readers who want to understand how the system works as well as those who want to learn how to approach their specific use case.
Introduction¶
Image reconstruction in radio interferometry is the process of solving the linear system of equations \(\vec{V} = [A] \vec{I}\), where \(\vec{V}\) represents visibilities calibrated for direction-independent effects, \(\vec{I}\) is a list of parameters that model the sky brightness distribution (for example, an image of pixels), and \([A]\) is the measurement operator that encodes the process of how visibilities are generated when a telescope observes a sky brightness \(\vec{I}\). \([A]\) is generally given by \([S_{dd}][F]\), where \([F]\) represents a 2D Fourier transform and \([S_{dd}]\) represents a 2D spatial frequency sampling function that can include direction-dependent instrumental effects. For a practical interferometer with a finite number of array elements, \([A]\) is non-invertible because of unsampled regions of the \(uv\) plane. Therefore, this system of equations must be solved iteratively, applying constraints via various choices of image parameterizations and instrumental models.
Implementation ( major and minor cycles ):
Image reconstruction in CASA comprises an outer loop of major cycles and an inner loop of minor cycles. The major cycle implements transforms between the data and image spaces and the minor cycle operates purely in the image domain. Together, they implement an iterative weighted \(\chi^2\) minimization process that solves the measurement equation.
Iterative Image Reconstruction - Major and Minor Cycles
The data to image transform is called the imaging step in which a pseudo inverse of \([S_{dd}][F]\) is computed and applied to the visibilities. Operationally, weighted visibilities are convolutionally resampled onto a grid of spatial-frequency cells, inverse Fourier transformed, and normalized. This step is equivalent to calculating the normal equations as part of a least squares solution. The image to data transform is called the prediction step and it evaluates the measurement equation to convert a model of the sky brightness into a list of model visibilities that are later subtracted from the data to form residual visibilities. For both transforms, direction dependent instrumental effects can be accounted for via carefully constructed convolution functions.
Iterations begin with an initial guess for the image model. Each major cycle consists of the prediction of model visibilities, the calculation of residual visibilities, and the construction of a residual image. This residual image contains the effect of incomplete sampling of the spatial-frequency plane but is otherwise normalized to the correct sky flux units. In its simplest form, it can be written as a convolution of the true sky image with a point spread function. The job of the minor cycle is to iteratively build up a model of the true sky by separating it from the point spread function. This step is also called deconvolution and is equivalent to the process of solving the normal equations as part of a least squares solution. Different reconstruction algorithms can operate as minor cycle iterations, allowing for flexibility in (for example) how the sky brightness is parameterized. The imaging step can be approximate in that several direction-dependent effects, especially baseline-, frequency- or time-dependent ones, can sometimes be ignored; minor cycles can be approximate in that they use only PSF patches and do not try to be accurate over the entire image; but the prediction step of the major cycle must be as accurate as possible, such that model components are converted to visibilities with all relevant instrumental effects included.
Basic Sequence of Imaging Logic:
Data : Calibrated visibilities, data weights, UV sampling function
Input : Algorithm and iteration controls (stopping threshold, loop gain,...)
Output : Model Image, Restored Image, Residual Image,...
Initialize the model image
Compute the point spread function
Compute the initial residual image
While ( not reached global stopping criterion ) /* Major Cycle */
{
While ( not reached minor-cycle stopping criterion ) /* Minor Cycle */
{
Find the parameters of a new flux component
Update the model and residual images
}
Use current model image to predict model visibilities
Calculate residual visibilities (data - model)
Compute a new residual image from residual visibilities
}
Convolve the final model image with the fitted beam and add to the residual image
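The following is a minimal, runnable numpy sketch of this control flow on a toy 1-D problem; the boolean uv mask, the Hogbom-style minor cycle, and all names here are illustrative stand-ins, not CASA internals:

import numpy as np

n = 256
rng = np.random.default_rng(1)

# True sky: a few point sources
sky = np.zeros(n)
sky[[40, 100, 101, 180]] = [1.0, 0.6, 0.4, 0.8]

# Incomplete uv sampling: keep ~40% of the spatial frequencies
sampling = rng.random(n) < 0.4
data_vis = np.fft.fft(sky) * sampling         # "observed" visibilities

wtsum = sampling.sum() / n                    # normalization (sum of weights)
psf = np.real(np.fft.ifft(sampling)) / wtsum  # PSF with unit peak

model = np.zeros(n)                           # initial model image
gain, threshold = 0.1, 1e-3

for major in range(10):                                       # major cycle
    model_vis = np.fft.fft(model) * sampling                  # prediction step
    residual = np.real(np.fft.ifft(data_vis - model_vis)) / wtsum
    if np.abs(residual).max() < threshold:                    # global stop
        break
    for _ in range(50):                                       # minor cycle
        p = int(np.argmax(np.abs(residual)))
        flux = gain * residual[p]
        model[p] += flux
        residual -= flux * np.roll(psf, p)    # subtract shifted, scaled PSF
        if np.abs(residual).max() < threshold:
            break

# Restore: smooth the model with a unit-peak "clean beam", add residuals
beam = np.exp(-np.linspace(-3, 3, 7) ** 2)
restored = np.convolve(model, beam, mode='same') + residual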
Algorithmic Options:
Within the CASA implementation, numerous choices are provided to enable the user to fine-tune the details of their image reconstruction. Images can be constructed as spectral cubes with multiple frequency channels or single-plane wideband continuum images. One or more sub-images may be defined to cover a wide field of view without incurring the computational expense of very large images. The iterative framework described above is based on the Cotton-Schwab Clean algorithm [3], but variants like Hogbom Clean [1] and Clark Clean [2] are available as subsets of this framework. The major cycle allows control over different data weighting schemes [10] and convolution functions that account for wide-field direction-dependent effects during imaging and prediction [[6], [7], [8]]. Deconvolution options include the use of point source vs multi-scale image models [4], narrow-band or wide-band models [5], controls on iteration step size and stopping criteria, and external constraints such as interactive and non-interactive image masks. Mosaics may be made with data from multiple pointings, either with each pointing imaged and deconvolved separately before being combined in a final step, or via a joint imaging and deconvolution [9]. Options to combine single dish and interferometer data during imaging also exist. More details about these algorithms can be obtained from [[10], [11], [12], [13]].
Types of images¶
Ways to set up images (Cube vs MFS, single field, outliers, facets, Stokes planes) and select data
The visibility data can be selected in many ways and imaged separately (e.g. one spectral window, one field, one channel). Data selection can also be done in the image-domain where the same data are used to create multiple image planes or multiple images (e.g. Stokes I,Q,U,V, or Taylor-polynomial coefficients or multiple-facets or outlier fields).
Parameters for data selection and image definition together define the following options.
| Data Selection | Imaging Definition |
| --- | --- |
| Spectral Axis | Cube (multiple channels), MFS (single wideband channel), or MT-MFS (multi-term wideband images) |
| Polarization axis | Stokes planes (I, IV, IQUV, pseudoI, RR, LL, XX, YY, etc.) |
| Sky Coordinates | Image shape, cell size, phasecenter, with or without outlier fields |
| Data Selection | One pointing vs multiple pointings for a mosaic, data from multiple MeasurementSets, etc. |
For the most part, the above axes are independent of each other and logical (and useful) combinations of them are allowed. For example, spectral cubes or wideband multi-term images can have outlier fields and/or mosaics. An example of a prohibited combination is the use of facets along with mosaics or a-projection as their algorithmic requirements contradict each other.
Spectral Cubes:
During gridding, N data channels are binned onto M image channels, using one of several optional interpolation schemes, with Doppler corrections to transform the data into the LSRK reference frame. When data from multiple channels are mapped to a single image channel, multi-frequency-synthesis gridding is performed within each image channel. More details are explained on the Spectral Line Imaging page. Parallelization for cube imaging can be done naturally by partitioning the data and image planes by frequency, for both major and minor cycles.
Continuum Images
Wideband imaging involves mapping data from a wide range of frequency channels onto a single image channel.
Multi Frequency Synthesis (MFS) - Single Wideband Image
Data from all selected data channels are mapped to a single broadband uv-grid using the appropriate uvw coordinates, and then imaged. This is accessed via specmode=’mfs’ in the tclean task. Since there is only one uv-grid and one image, parallelization for continuum imaging is done only for the major cycle via data partitioning.
Multi-Term Multi Frequency Synthesis (MTMFS) - Taylor Coefficient Images
An improvement to standard MFS that accounts for changes in spectral index as a function of sky position is available; it uses Taylor-weighted averages of data from all frequencies, accumulated onto nterms uv-grids before imaging. These Taylor-weighted residual images form the input for the minor cycle of the Multi-Term MFS deconvolution algorithm, which performs a linear least squares fit during deconvolution (see the Deconvolution Algorithms section for more information) to obtain Taylor coefficients per component (representing sky spectra as polynomials in \(I\) vs \(\nu\)). This option is accessed via specmode=’mfs’, deconvolver=’mtmfs’, nterms=2. For the same data size as standard MFS (nterms=1), Multi-Term MFS will have \(N_t\) times the gridding cost and number of images stored in memory. Parallelization is again done only for the major cycle via data partitioning.
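As hedged examples (the MeasurementSet name, image names, shapes and channelization below are placeholders), the three spectral modes map onto tclean as follows:

from casatasks import tclean

# Spectral cube: data channels binned onto 100 image channels
tclean(vis='mydata.ms', imagename='line_cube', specmode='cube',
       nchan=100, start='45.0GHz', width='2MHz', outframe='LSRK',
       imsize=512, cell='0.1arcsec', niter=1000)

# Single-plane wideband continuum (MFS)
tclean(vis='mydata.ms', imagename='continuum', specmode='mfs',
       imsize=512, cell='0.1arcsec', niter=1000)

# Multi-term MFS: Taylor-coefficient images of the wideband sky
tclean(vis='mydata.ms', imagename='continuum_mt', specmode='mfs',
       deconvolver='mtmfs', nterms=2, reffreq='33GHz',
       imsize=512, cell='0.1arcsec', niter=1000)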
Polarization Planes
Data in the correlation basis are gridded onto separate planes per correlation, imaged, and then transformed into the Stokes basis. A special case for single plane Stokes I is implemented where data from both parallel hands are gridded onto a single uv-grid (to save memory). The point spread function is always taken from the Stokes I gridded weights. Images can be made for all Stokes parameters and correlation pairs (or all combinations possible with the selected data). This is an image-partitioning, where the same data are used to construct the different imaging products. Currently, if any correlation is flagged, all correlations for that datum are considered as flagged. An exception is the ‘pseudoI’ option which allows Stokes I images to include data for which either of the parallel hand data are unflagged.
Multiple Fields
A very large field of view can sometimes be imaged as a main field plus a series of (typically smaller) outlier fields. Imaging of fields with relatively few bright outlier sources can benefit from the overall reduction in image size that this option provides. Instead of gridding the visibility data onto one giant uv-grid, they are gridded onto multiple smaller images. Each sub-image is then deconvolved via separate minor cycles, and the model images are combined to predict model visibilities to subtract from the data in the next major cycle. The user must specify a different phase reference center for each image field.
Different image shapes and gridding and deconvolution algorithms can be chosen for the different outlier fields. For example, one could apply single-plane wideband imaging on the main field, but employ multi-term MFS for an outlier field to account for artificial spectral index due to the wideband primary beam at its location. One can also combine MFS and Cube shapes for different outlier fields, or choose to run Multi-Scale CLEAN on the main field and Hogbom CLEAN on a bright compact outlier.
Overlapping fields are supported when possible (i.e. when the image types are similar enough across outliers) by always picking the “last” instance of that source in the list of outlier images in the order specified by the user. This convention implies that sources in the overlapping area are blanked in the “earlier” model images, such that those sources are not subtracted during the major cycles that clean those images.
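The outlier fields described above are specified to tclean through a text file; a sketch of the mechanism follows (file contents, names and coordinates are placeholders, following the outlier-file format in the tclean documentation):

# Contents of a hypothetical 'outliers.txt', one block per outlier field;
# per-field overrides (e.g. deconvolver=mtmfs) may also be specified
#   imagename=outlier1
#   imsize=[128,128]
#   cell=[8.0arcsec]
#   phasecenter=J2000 19:58:40.906 +40.55.58.81

from casatasks import tclean
tclean(vis='mydata.ms', imagename='main', imsize=1024, cell='8.0arcsec',
       outlierfile='outliers.txt', niter=1000)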
Multiple Facets
Faceted imaging is one way of handling the w-term effect. A list of facet centers is used to grid the data separately onto multiple adjacent sub-images. The sub-images are typically simply subsets of a single large image, so that the deconvolution can be performed jointly and a single model image is formed. The PSF used for deconvolution is taken from the first facet. The list of phase reference centers for all facets is automatically generated from the user's input of the number of facets (per side) that the image is to be divided into.
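A minimal hedged example (names and sizes are placeholders):

from casatasks import tclean

# Divide the image into 4x4 facets to reduce w-term errors
tclean(vis='mydata.ms', imagename='faceted', gridder='widefield',
       facets=4, imsize=4096, cell='2.0arcsec', niter=1000)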
Mosaics
Data from multiple pointings can be combined to form a single large image. The combination can be done either before/during imaging or after deconvolution and reconstruction.
Stitched Mosaic
Data from multiple pointings are imaged and deconvolved separately, with the final output images being combined using a primary beam model as a weight. This is achieved by running the imaging task (tclean) separately per pointing, and combining them later on using the tool im.linearmosaic().
Joint Mosaic
Data taken with multiple pointings (and/or phase-reference centres) can be combined during imaging by selecting data from all fields together (multiple field-ids), and specifying only one output image name and one phase-reference center. If mosaic mode is enabled (gridder=’mosaic’ or ‘awproject’) attention is paid to the pointing centers of each data-fieldID during gridding. Primary-beam models are internally used during gridding (to effectively weight the images that each pointing would produce during a combination) and one single image is passed on to the deconvolution modules.
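A hedged example of a joint mosaic call (field selection, names and coordinates are placeholders):

from casatasks import tclean

# Select all pointings together; one output image, one phase center
tclean(vis='mydata.ms', field='3C391*', imagename='joint_mosaic',
       gridder='mosaic', mosweight=True,
       phasecenter='J2000 18h49m24.2 -00d55m41',
       imsize=800, cell='2.5arcsec', niter=1000)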
Imaging Algorithms¶
Imaging is the process of converting a list of calibrated visibilities into a raw or ‘dirty’ image. There are three stages to interferometric image formation: weighting, convolutional resampling, and a Fourier transform.
Data Weighting¶
During imaging, the visibilities can be weighted in different ways to alter the instrument’s natural response function in ways that make the image-reconstruction tractable.
Data weighting during imaging allows improvement of the dynamic range and adjustment of the synthesized beam associated with the produced image. The weight given to each visibility sample can be adjusted to fit the desired output. There are several reasons to adjust the weighting, including improving sensitivity to extended sources or accounting for noise variation between samples. The weighting is adjusted via the weighting parameter in tclean, which has seven options: ‘natural’, ‘uniform’, ‘briggs’, ‘superuniform’, ‘briggsabs’, ‘briggsbwtaper’, and ‘radial’. Optionally, a UV taper can be applied, and various sub-parameters further adjust the weight calculations.
Natural weighting
The natural weighting scheme gives equal weight to all samples. Since lower spatial frequencies are usually sampled more often than higher ones, the inner uv-plane will have a significantly higher density of samples, and hence signal-to-noise, than the outer uv-plane. The resulting “density-upweighting” of the inner uv-plane produces the coarsest angular resolution and can sometimes result in undesirable structure in the PSF, which reduces the accuracy of the minor cycle. However, at the location of a source, this method preserves the natural point-source sensitivity of the instrument.
For weighting=’natural’, visibilities are weighted only by the data weights, which are calculated during filling and calibration and should be equal to the inverse noise variance on that visibility. The imaging weight \(w_i\) of sample \(i\) is given by:
\(w_i = \omega_i = \frac{1}{{\sigma_i}^2}\)
where the data weight \(\omega_i\) is determined from \(\sigma_i\), the rms noise on visibility \(i\). When data are gridded into the same uv-cell for imaging, the weights are summed, and thus a higher uv density results in higher imaging weights. No sub-parameters are linked to this mode choice. It is the default imaging weight mode, and it should produce an image with the lowest noise (highest signal-to-noise ratio).
NOTE: This generally produces images with the poorest angular resolution, since the density of visibilities falls radially in the uv-plane.
Uniform weighting
Uniform weighting gives equal weight to each measured spatial frequency irrespective of sample density. The resulting PSF has the narrowest possible main lobe (i.e. the smallest possible angular resolution) and suppressed sidelobes across the entire image, and is best suited for sources with high signal-to-noise ratios, to minimize sidelobe contamination between sources. However, the peak sensitivity is significantly worse than optimal (typically ~20% worse for interferometers with a reasonably large number of antennas), since data points in densely sampled regions have been weighted down to make the weights uniform. Also, isolated measurements can get artificially high relative weights, and this may introduce further artifacts into the PSF.
For weighting=’uniform’, the data weights are calculated as in ‘natural’ weighting. The data is then gridded to a number of cells in the uv-plane, and after all data is gridded the uv-cells are re-weighted to have “uniform” imaging weights. This pumps up the influence on the image of data with low weights (they are multiplied up to be the same as for the highest weighted data), which sharpens resolution and reduces the sidelobe level in the field-of-view, but increases the rms image noise. No sub-parameters are linked to this mode choice.
For uniform weighting, we first grid the inverse variance \(\omega_i\) for all selected data onto a grid with uv cell-size given by 2/FOV, where FOV is the specified field of view (defaults to the image field of view). This forms the gridded weights \(W_k\). The weight of the \(i\)-th sample is then:
\(w_i = \frac{\omega_i}{W_k}\)
Briggs weighting
Briggs or Robust weighting [14] creates a PSF that smoothly varies between natural and uniform weighting based on the signal-to-noise ratio of the measurements and a tunable parameter that defines a noise threshold. High signal-to-noise samples are weighted by sample density to optimize for PSF shape, and low signal-to-noise data are naturally weighted to optimize for sensitivity.
The weighting=’briggs’ mode is an implementation of the flexible weighting scheme developed by Dan Briggs in his PhD thesis [14].
This choice brings up the sub-parameters:
weighting = 'briggs' #Weighting to apply to visibilities
robust = 0.0 #Briggs robustness parameter
npixels = 0 #number of pixels to determine uv-cell size 0=> field of view
The actual weighting scheme used is:
\(w_i = \frac{\omega_i}{1 + W_k f^2}\)
where
\(w_i\) is the image weight for a given visibility point \(i\);
\(\omega_i\) is the visibility weight for a given visibility point \(i\);
\(W_k = \Sigma_{cell=k}\,\omega_{k}\) is the weight density of a given cell \(k\) (with \(\omega_{k}\) the weight of a uv point that falls in cell \(k\)). When npixels > 0, the sum is taken over all weights that fall in cells within the range \(k \pm\) npixels;
\(f^2 = \frac{(5 \times 10^{-\text{R}})^2}{\frac{\Sigma_k W_k^2}{\Sigma_i \omega_i}}\);
R is the robust sub-parameter.
The key parameter is the robust sub-parameter, which sets R in the Briggs equations. The scaling of R is such that robust=0 gives a good trade-off between resolution and sensitivity. R takes values between -2.0 (close to uniform weighting) and 2.0 (close to natural weighting).
Superuniform weighting can be combined with Briggs weighting using the npixels sub-parameter. This works as in ‘superuniform’ weighting.
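For illustration (the MeasurementSet and image names are placeholders), a typical Briggs-weighted call is:

from casatasks import tclean

# robust=0.0 is a good starting trade-off; npixels=0 uses the full
# field of view to define the uv-cell size
tclean(vis='mydata.ms', imagename='briggs0', weighting='briggs',
       robust=0.0, npixels=0, imsize=512, cell='0.1arcsec', niter=1000)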
Briggsabs weighting
Briggsabs is an experimental weighting scheme that is an adapted version of the Briggs weighting scheme, and is much more aggressive with respect to changes in npixels, the uv-cell size.
For weighting=’briggsabs’, a slightly different Briggs weighting is used, with:
\(w_i = \frac{\omega_i}{W_k \text{R}^2 + 2\sigma_\text{i}^2}\)
where R is the robust parameter and \(\sigma_\text{i}\) is the noise parameter. In this case, R makes sense for −2.0 ≤ R ≤ 0.0 (R = 1.0 will give the same result as R = −1.0)
This choice brings up the sub-parameters:
weighting = 'briggsabs' #Weighting to apply to visibilities
robust = 0.0 #Briggs robustness parameter
noise = '0.0Jy' #noise parameter for briggs weighting when rmode='abs'
npixels = 0 #number of pixels to determine uv-cell size 0=> field of view
WARNING: Briggsabs weighting is experimental - use at own risk!
Briggsbwtaper weighting
Briggsbwtaper is an experimental weighting scheme for specmode=’cube’ and perchanweightdensity=True. This scheme adds an inverse uv taper to Briggs weighting. The uv taper is proportional to the fractional bandwidth of the entire cube, and is applied per channel. This modifies the cube (perchanweightdensity = True) imaging weights to have a similar density to that of the continuum (specmode=’mfs’) imaging weights.
The weighting is given by:
\(w_i = \frac{\omega_i}{(1+\frac{W_kf^2}{m})}\)
The uv taper \(m\) is a piecewise function of the fractional-bandwidth uv distance:
\(r_{\nu}= \frac{\Delta\nu \sqrt{u^2_{\rm pix} + v^2_{\rm pix}}}{\nu_c}\),
where \(\nu_c\) and \(\Delta\nu\) are, respectively, the central frequency and total bandwidth of the spectral window, and (\(u_{\rm pix}, v_{\rm pix}\)) are the pixel coordinates associated with imaging weight \(w_i\). For \(r_{\nu} \ge 1\),
\(m = r_{\nu} + 0.5\),
and for \(r_{\nu} < 1\),
\(m = \frac{4-r_{\nu}}{4-2r_{\nu}}\)
More details are given in the associated CASA memo.
Superuniform weighting
The weighting=’superuniform’ mode is similar to the ‘uniform’ weighting mode but there is now an additional npixels sub-parameter that specifies a change to the number of cells on a side (with respect to uniform weighting) to define a uv-plane patch for the weighting renormalization. If npixels=0, you get uniform weighting.
Radial weighting
The weighting=’radial’ mode is a seldom-used option that increases the weight by the radius in the uv-plane, i.e.:
\(w_i = \omega_i \times \sqrt{u_i^2 + v_i^2}\)
Technically, this would be called an inverse uv-taper, since it depends on the uv-coordinates and not on the data per se. Its effect is to reduce the rms sidelobes for an east-west synthesis array. This option has limited utility.
Perchanweightdensity
When calculating the weight density for Briggs-style weighting in a cube, the perchanweightdensity parameter determines whether to calculate the weight density for each channel independently (the default, True) or to use a common weight density for all of the selected data. This parameter has no meaning for continuum (specmode=’mfs’) imaging.

When using Briggs-style weighting with perchanweightdensity=True, the imaging weight density calculation uses only the weights of data that contribute specifically to that channel. With perchanweightdensity=False (the behavior prior to CASA 5.5), the calculation sums the weights of all selected data channels whose (u,v) coordinates fall in a given uv cell on the weight density grid. Since the aggregated weight in any given uv cell then depends on the number of channels included when imaging, the PSF calculated for a given frequency channel will also change with the channel selection when perchanweightdensity=False.

In general, perchanweightdensity=False results in smaller PSFs for the same value of robustness, but the rms noise varies as a function of channel and increases toward the edge channels, which can make it harder to estimate the continuum. perchanweightdensity=True provides more uniform sensitivity per channel for cubes, at the cost of generally larger PSFs. If you intend to image a large cube in many smaller sub-cubes and subsequently concatenate them, it is advisable to use perchanweightdensity=True to avoid unexpected variations in sensitivity and PSF across the concatenated cube.
NOTE: Setting perchanweightdensity = True only has effect when using Briggs (robust) or uniform weighting to make an image cube. It has no meaning for natural and radial weighting in data cubes, nor does it have any meaning for continuum (specmode=’mfs’) imaging.
Mosweight
When doing Briggs-style weighting (including uniform) in tclean, the mosweight sub-parameter of the mosaic gridder determines whether to weight each field in a mosaic independently (mosweight=True), or to calculate the weight density from the average uv distribution of all the fields combined (mosweight=False). The underlying issue is how the weight density maps onto the uv-grid, which can give high weight to areas of the uv-plane that are not actually more sensitive. The setting mosweight=True has long been known to be potentially useful in cases where a mosaic has non-uniform sensitivity, but it was found to be very important also for robust values closer to uniform weighting in the presence of relatively poor uv-coverage. For example, snapshot ALMA mosaics with mosweight=False typically show an increase in noise in the corners or in the areas furthest from the phase center. Therefore, as of CASA 5.4, the mosweight sub-parameter in tclean has the default value mosweight=True.
WARNING: the default setting of mosweight=True under the mosaic gridder in tclean has the following disadvantages: (1) it may potentially cause memory issues for large VLA mosaics; (2) the major and minor axis of the synthesized beam may be ~10% larger than with mosweight=False. Please change to mosweight=False to get around these issues.
uvtaper
uv-tapering applies a multiplicative Gaussian taper to the spatial-frequency grid, in addition to the weighting scheme specified via the ‘weighting’ parameter. Higher spatial frequencies are weighted down relative to lower ones, suppressing artifacts arising from poorly sampled regions near and beyond the maximum spatial frequency in the uv-plane. The effect is that the clean beam becomes larger and the surface brightness sensitivity for extended emission increases. Tapering is equivalent to smoothing the PSF obtained with other weighting schemes, and the taper can be specified either as a Gaussian in uv-space (e.g. in units of lambda or klambda) or as a Gaussian in the image domain (e.g. in angular units like arcsec). Because the natural PSF is smoothed out, this tunes the sensitivity of the instrument toward scale sizes larger than its angular resolution by increasing the width of the main lobe. There are limits to how much uv-tapering is desirable, however, because sensitivity decreases as more and more data are down-weighted.
NOTE: The on-sky FWHM in arcsec is roughly 200 / uvtaper (in klambda).
Examples:
uvtaper=['5klambda'] : circular taper with FWHM = 5 kilo-lambda
uvtaper=['5klambda','3klambda','45.0deg'] : elliptical taper (major axis, minor axis, position angle)
uvtaper=['10arcsec'] : on-sky FWHM of 10 arcsec
uvtaper=['300.0'] : default units are lambda in the aperture plane
uvtaper=[] : no outer taper applied (default)
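As a hedged example (the MeasurementSet and image names are placeholders), a run combining Briggs weighting with an outer taper might look like:

from casatasks import tclean

# ~2 arcsec on-sky FWHM from a 100 klambda outer taper (see NOTE above)
tclean(vis='mydata.ms', imagename='tapered', weighting='briggs', robust=0.5,
       uvtaper=['100klambda'], imsize=512, cell='0.5arcsec', niter=1000)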
Gridding + FFT¶
Imaging weights and weighted visibilities are first resampled onto a regular uv-grid (convolutional resampling) using a prolate-spheroidal function as the gridding convolution function (GCF). The result is then Fourier-inverted and grid-corrected to remove the image-domain effect of the GCF. The PSF and residual image are then normalized by the sum-of-weights.
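To make the operation concrete, here is a toy numpy sketch of convolutional resampling and grid correction, with a small Gaussian standing in for the prolate-spheroidal kernel (purely illustrative, not CASA internals):

import numpy as np

n, support = 128, 3
rng = np.random.default_rng(0)

# Gaussian stand-in for the prolate-spheroidal gridding kernel
k1 = np.exp(-0.5 * (np.arange(-support, support + 1) / 1.2) ** 2)
kernel = np.outer(k1, k1)

# Unit point source at the phase center: every visibility is 1 + 0j,
# so the dirty image formed below is also the PSF
u = rng.uniform(support, n - support - 1, 500)   # uv coords in cell units
v = rng.uniform(support, n - support - 1, 500)
vis = np.ones(500, dtype=complex)
wt = np.ones(500)                                # imaging weights

grid = np.zeros((n, n), dtype=complex)
for ui, vi, vs, wi in zip(u, v, vis, wt):        # convolutional resampling
    iu, iv = int(round(ui)), int(round(vi))      # nearest cell (fractional offsets ignored)
    grid[iv - support:iv + support + 1,
         iu - support:iu + support + 1] += wi * vs * kernel

dirty = np.real(np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid))))

# Grid correction: divide out the kernel's image-domain taper, then
# normalize by the total gridded weight so the peak is 1.0 at the center
kimg = np.zeros((n, n))
kimg[n//2 - support:n//2 + support + 1,
     n//2 - support:n//2 + support + 1] = kernel
corr = np.real(np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kimg))))
dirty = dirty / np.maximum(corr / corr.max(), 1e-3)   # clip to avoid edge blow-up
dirty *= n * n / (wt.sum() * kernel.sum())
print(dirty.max())   # ~1.0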
Direction-dependent corrections
Basic gridding methods use prolate spheroidals for gridding (gridder=’standard’) along with image-domain operations to correct for direction-dependent effects. More sophisticated, and computationally-intensive methods (gridder=’wproject’,’widefield’,’mosaic’,’awproject’) apply direction-dependent, time-variable and baseline-dependent corrections during gridding in the visibility-domain, by choosing/computing the appropriate gridding convolution kernel to use along with the imaging-weights.
The figure below shows examples of kernels used for the following gridding methods: Standard, W-Projection, and A-Projection. Combinations of wide-field corrections are done by convolving these kernels together. For example, AW-Projection will convolve W-kernels with baseline aperture functions and possibly include a prolate spheroidal as well for its anti-aliasing properties. Mosaicing is implemented as a phase gradient across the gridding convolution kernel calculated at the uv-cell resolution dictated by the full mosaic image size.
In tclean, gridder=’mosaic’ uses Airy disk or polynomial models to construct azimuthally symmetric beams per antenna that are transformed into baseline convolution functions and used for gridding. gridder=’awproject’ uses ray-traced models of antenna aperture illumination functions to construct GCFs per baseline and time (including azimuthal asymmetry, beam squint, and rotation with time). More details are given in the Wide Field Imaging section.
Computing costs during gridding scale directly with the number of pixels needed to accurately describe each convolution kernel. The standard gridding kernel (prolate spheroid) typically has 3x3 pixels. W-Projection kernels can range from 5x5 to a few hundred pixels on a side. A-Projection kernels typically range from 8x8 to 50x50 pixels. When effects are combined by convolving together different kernels (for example A and W Projection), the kernel sizes increase accordingly.
Memory (and one-time computing costs) also scale with the number of distinct kernels that one must operate with: for example, a large number of different W-Projection kernels, or an array whose antenna illumination patterns differ enough between antennas that they need to be treated separately. In the case of a heterogeneous array, each baseline illumination function can be different. Additionally, if any of these aperture-illumination-based kernels are rotationally asymmetric, they will need to be rotated (or recomputed at different parallactic angles) as a function of time.
Normalization¶
After gridding and the FFT, images must be normalized (by the sum of weights, and optionally by some form of the primary beam weights) to ensure that the flux in the images represents sky-domain flux.
Sum-Of-Weights and Weight Images
The tclean task produces a number of output images used for normalization. The primary reason these are explicit images on disk (and not just internal temporary files in memory) is that for parallel continuum imaging, numerators and denominators need to be accumulated separately before the normalization step. For the most part, end users can safely ignore the output .weight, .sumwt and .gridwt images. However, their contents are documented here.
.sumwt
A single-pixel image containing the sum-of-weights (i.e. the peak of the PSF). For natural weighting, this is just the sum of the data weights. For other weighting schemes it contains the effect of the weighting algorithm; for instance, uniform weighting will typically produce a smaller sum-of-weights than natural weighting. An approximate theoretical sensitivity can be computed as sqrt(1/sumwt); a more accurate estimate requires additional steps (see this CASA Knowledgebase article). In tclean, faceted imaging produces one value of sumwt per facet, as the normalizations are done separately per facet. For cube imaging, .sumwt contains one value per image channel and can be used to visualize the relative weights across the spectrum (and therefore the expected image rms). This theoretical sensitivity information is printed to the logger after the PSF generation stage.
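As a minimal sketch (the image name is a placeholder), the approximate sensitivity can be read from the .sumwt image with the imstat task:

import numpy as np
from casatasks import imstat

sumwt = imstat('myimage.sumwt')['max'][0]        # peak = sum of weights
print('approximate theoretical rms: %.3g' % np.sqrt(1.0 / sumwt))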
.weight
Projection gridders such as ‘mosaic’ and ‘awproject’ use baseline aperture illumination functions for gridding. The quantity in the .weight image represents the square of the PB, accumulated over baselines, time and frequency. For mosaics, it includes a combination across pointing as well (although as can be seen from the equations in the mosaicing section, this is not accurate when weights between pointings differ considerably).
.gridwt
A series of temporary images for cube imaging that are stored within the parallel .workdirectory, and which accumulate binned natural weights before the calculation of imaging weights. This is not used for normalization anywhere after the initial image weighting stage.
Normalization Steps
Standard Imaging
For gridders other than ‘mosaic’ and ‘awproject’, normalization of the image formed after gridding and the FFT is just the division by the sum of weights (read from the .sumwt image). This suffices to transform the image into units of sky brightness. This is the typical flat-noise normalization (see below).
Imaging with primary beams (and mosaics)
For gridder=’mosaic’ and ‘awproject’, which use baseline aperture illumination functions during gridding, the result contains an additional instance of the PB, which needs to be divided out. Normalization involves three steps: (a) division by the sum-of-weights, (b) division by an average PB given by sqrt(weightimage), and (c) a scaling that moves the peak of PB = sqrt(weightimage) to 1.0. This ensures that fluxes in the dirty image (and therefore those seen by the minor cycle) represent true sky fluxes in regions where the primary beam is at its peak value, or where the mosaic has a relatively constant flat sensitivity pattern. The reverse operations of (b) and (c) are done before predicting a model image in the major cycle. (This description refers to flat-noise normalization; corresponding changes are made for the other options.)
Types of normalization
There are multiple ways of normalizing the residual image before beginning minor cycle iterations. One is to divide out the primary beam before deconvolution and another is to divide out the primary beam from the deconvolved image. Both approaches are valid, so it is important to clarify the difference between the two. A third option is included for completeness.
For all options, the ‘pblimit’ parameter controls the regions of the image where PB-correction is actually computed. Regions below the pblimit cannot be normalized and are set to zero. For standard imaging, this applies only to the pb-corrected output image; for gridder=’mosaic’ and ‘awproject’ it applies to the residual, restored and pb-corrected images. A small value (e.g. pblimit=0.01) can be used to increase the region of the sky actually imaged. For gridder=’standard’, there is no pb-based normalization during gridding, so the magnitude of this parameter is ignored.

The sign of the pblimit parameter is used for a different purpose. If positive, a T/F pixel mask is attached to the output residual and restored images; if negative, this T/F pixel mask is not included. Please note that this pixel mask is different from the deconvolution mask used to control the region where CLEAN-based algorithms search for source peaks; in order to set a deconvolution mask based on pb level, use the ‘pbmask’ parameter.

Based on the above, certain values of pblimit should be avoided: 1, -1, and 0. When pblimit is set to 1, the entire image is masked, since the user is specifying that no normalization or deconvolution happen where the PB gain is below 1. Setting pblimit to -1 also prevents deconvolution, as in the pblimit=1 case, but without masking the image. Finally, a pblimit of exactly zero is not feasible; to make a very large wide-field image, use a small value such as 1e-6 instead.
Flat-noise
The dirty image represents \(I^{dirty} = I^{psf} \star \left( I^{PB} \cdot I^{sky} \right)\)
Primary-beam correction is not done before the minor cycle deconvolution. The dirty image is the instrument’s response to the product of the sky and the primary beam, and therefore the model image will represent the product of the sky brightness and the average primary beam. The noise in the image is related directly to the measurement noise due to the interferometer, and is the same all across the image. The minor cycle can give equal weight to all flux components that it finds. At the end of deconvolution, the primary beam must be divided out of the restored image. This form of normalization is useful when the primary beam is the dominant direction-dependent effect because the images going into the minor cycle satisfy a convolution equation. It is also more appropriate for single-pointing fields-of-view.
Imaging with the prolate spheroidal gridder will automatically give flat noise images.
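As a sketch (image names are placeholders), the primary beam can be divided out of a flat-noise restored image with the impbcor task, or by setting pbcor=True in tclean:

from casatasks import impbcor

# Divide the restored image by the primary beam, blanking below the cutoff
impbcor(imagename='myimage.image', pbimage='myimage.pb',
        outfile='myimage.image.pbcor', cutoff=0.1)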
Flat-sky
The dirty image represents \(I^{dirty} = \frac{1}{I^{PB}} \cdot \left[I^{psf} \star \left( I^{PB} \cdot I^{sky} \right) \right]\)
An approximate primary-beam correction is done on the dirty image, before the minor cycle iterations. The amplitudes of the flux components found during deconvolution will be free of the primary beam, and will represent the true sky. However, the image going into the minor cycle will not satisfy a convolution equation, and the noise in the dirty image will be higher in regions where the primary-beam gain is low. Therefore, the minor cycle needs to account for this while searching for flux components (a signal-to-noise dependent CLEAN). This form of normalization is particularly useful for mosaic imaging where the sky brightness extends across many pointings, or if there is an uneven distribution of weights across pointings. This is because joint mosaics are usually made for sources with spatial scales larger than the field of view of each antenna, scales that are not explicitly present in the measured data. In this situation, allowing the minor cycle to use flux components that span the beams of adjacent pointings is likely to provide a better constraint on the reconstruction of these unmeasured spatial frequencies, and to produce smoother large-scale emission.
PB-square normalization
The dirty image represents \(I^{dirty} = I^{PB} \cdot \left[ I^{psf} \star \left( I^{PB} \cdot I^{sky} \right) \right]\)
This third option (not currently available for use, but supported internally) is to skip all PB-based divisions after the gridding and FFT (when using gridder=’mosaic’ or ‘awproject’), and to let the minor cycle proceed as is. The advantage of this approach is the elimination of error-inducing divisions by the primary beam (especially in low-gain regions and near PB cut-off edges).
Deconvolution Algorithms¶
Minor cycle algorithms (Hogbom, Clark, Multi-Scale, Multi-Term)
Deconvolution refers to the process of reconstructing a model of the sky brightness distribution, given a dirty/residual image and the point-spread-function of the instrument. This process is called a deconvolution because under certain conditions, the dirty/residual image can be written as the result of a convolution of the true sky brightness and the PSF of the instrument. Deconvolution forms the minor cycle of iterative image reconstruction in CASA.
The observed image (left) is the result of a convolution of the PSF (middle) and the true sky brightness distribution (right).
The image reconstruction framework is based on Cotton-Schwab major/minor cycles [15]. Within this system, the minor cycle algorithm operates largely in the image domain, starting with a PSF and a residual image (i.e. the gradient of chi-square, or the right-hand side of the normal equations). The output is an incremental model image that defines the ‘step’ taken during the chi-square minimization process. In the next major cycle, the contribution of this model image is subtracted out of the list of visibilities, and the result is regridded and transformed to produce a new residual image. This approach allows for a practical trade-off between the efficiency of operating in the image domain (or simply with gridded visibilities) and the accuracy that comes from returning to the ungridded list of visibilities after every ‘step’. It also allows for minor cycle algorithms that have their own internal optimization schemes (i.e. they need not be strict chi-square minimizations) with their own control parameters. Any minor cycle algorithm that can operate on gridded visibilities can fit into this framework. The inputs to a minor cycle algorithm are the residual image, the PSF, and perhaps a starting model; the output is a model image.
CLEAN Algorithm¶
The CLEAN algorithm forms the basis for most deconvolution algorithms used in radio interferometry. The peak of the residual image gives the location and strength of a potential point source. The effect of the PSF is removed by subtracting a scaled PSF from the residual image at the location of each point source, and updating the model. Many such iterations of finding peaks and subtracting PSFs form the minor cycle.
There are several variants of the CLEAN algorithm. Some operate with a delta-function sky model and others with a multi-scale sky model. In all cases, the sky brightness is parameterized in a sparse basis such that, in practice, the minor cycle algorithm needs only to search for the location and amplitude of peaks, which makes it efficient. For example, fields of compact sources are best represented by delta-function locations and amplitudes. Extended emission is modeled as a linear combination of components of different scale sizes and transformed into a multi-scale basis where, again, delta functions are all that are required to mark the location and amplitude of blobs of different sizes. Multi-term algorithms for wideband imaging model the sky brightness and its spectrum simultaneously, using coefficients of a Taylor polynomial as a sparse representation of a smooth spectrum. In this case, the location of each (multi-scale) component is chosen via a search, and the values of the Taylor coefficients for that component are solved for via a direct linear least squares calculation.
Hogbom
Hogbom CLEAN [16] operates with a point-source model of the sky brightness distribution. The minor cycle searches for the location and amplitude of components and then subtracts a scaled and shifted version of the full PSF to update the residual image for each point source. This algorithm is efficient in that delta functions are optimal for fields of compact sources, but susceptible to errors due to inappropriate choices of imaging weights, especially if the PSF has high sidelobes. It uses the full PSF during its update step to ensure that the next residual is as accurate as possible, but this can get compute intensive.
In its original form, the Hogbom algorithm operated just once in the image domain without new residuals being computed via a major cycle. In our CASA Imager implementation, it is treated as a minor cycle where one periodically does major cycles as well (to guard against minor cycle evolution that is not strictly constrained by the ungridded visibilities).
Since Hogbom CLEAN uses only delta functions, it is most appropriate for fields of isolated point sources. It will incur errors when imaging extended emission and this is typically seen as a mottled appearance of smooth structure and the presence of correlated residuals.
Clark
Clark CLEAN [17] also operates only in the image domain, and uses a point-source model. There are two main differences from the Hogbom algorithm. The first is that residual image updates are done using only a small patch of the PSF. This approximation results in a significant speed-up of the minor cycle, but can introduce errors in the image model if there are bright sources spread over a wide field of view, where the flux estimate at a given location is affected by sidelobes from far-out sources. The second difference is designed to compensate for the first: iterations are stopped when the brightest peak in the residual image falls below the first sidelobe level of the brightest source in the initial residual image, and the residual image is then re-computed by subtracting the sources and their responses in the gridded Fourier domain (to eliminate aliasing errors). Image-domain peak finding and approximate subtractions then resume. The algorithm alternates between these two stages until the chosen minor cycle exit criteria are satisfied (triggering the next true major cycle, which operates on ungridded visibilities).
Since Clark CLEAN also uses only delta functions, it is similar in behavior to Hogbom. The main difference is that the minor cycle is expected to be much faster (for large images) because only a small fraction of the PSF is used for image-domain updates. Typically, errors due to this truncation are controlled by transitioning to a uv-subtraction or a major cycle when the peak residual reaches the level of the highest sidelobe of the strongest feature.
For polarization imaging, Clark searches for the peak in
\(I^2 + Q^2 + U^2 + V^2\)
Clarkstokes
In the ‘clarkstokes’ algorithm, the Clark PSF is used, but for polarization imaging the Stokes planes are cleaned sequentially for components, instead of jointly as in ‘clark’. This means it is the same as ‘clark’ for Stokes I imaging alone. This option can also be combined with imagermode=’csclean’.
Multi-Scale
Cornwell-Holdaway Multi-Scale CLEAN (CH-MSCLEAN) [18] is a scale-sensitive deconvolution algorithm designed for images with complicated spatial structure. It parameterizes the image into a collection of inverted tapered paraboloids. The minor cycle iterations use a matched-filtering technique to measure the location, amplitude and scale of the dominant flux component in each iteration, and take into account the non-orthogonality of the scale basis functions while performing updates. In other words, the minor cycle iterations consider all scales together and model components are chosen in the order of decreasing integrated flux.
MS-CLEAN can be formulated as a chi-square minimization applied to a sky model that parameterizes the sky brightness as a linear combination of flux components of different scale sizes. The figure below illustrates how a source with multi-scale features is represented by two scale sizes (for example) and how the problem reduces to one of finding the location and amplitudes of delta function components (something for which a CLEAN based approach is optimal). The top left and bottom left images show flux components of two different scale sizes. The images in the middle column show sets of delta functions that mark the locations and amplitudes of the flux components for each scale. The image on the far right is the sum of the convolutions of the first two columns of images.
A pictorial representation of how a source with structure at multiple spatial scales is modeled in MS-CLEAN.
Choosing ‘scales’
In practice, the user must specify a set of scale sizes for the algorithm to use (in units of the number of pixels). As of now, this can be done only manually with the user making guesses of what the appropriate scale sizes are. This figure illustrates how the scales can be chosen, for a given structure on the sky.
An example set of multiscale ‘scale sizes’ to choose for a given source structure.
It is recommended that a ‘0’ scale always be included to model unresolved sources. Beyond that, scale sizes should approximately follow the sizes of dominant structures in the image. For structure with very bright and sharp edges, a series of nearby scale sizes works best, often in conjunction with a mask. The largest scale size should be less than or equal to the smaller dimension of large scale features. One must also take care to avoid scale sizes that correspond to the unmeasured short spacings in the central region of uv space, as the reconstruction on these scales will see no constraints from the data and can result in arbitrary values (or divergence). For mosaics of extended emission, it is sometimes possible to use large scale sizes in the minor cycle if there are enough connected structures across pointings, but since there still is no actual short spacing uv data to constrain those scales, they should be used with caution. A reasonable starting point for setting the scales (assuming the cell size samples the mainlobe of the psf by a factor of ~5) is scales=[0,5,15].
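A hedged example of such a setup (the MeasurementSet name, image name, and control values are placeholders):

from casatasks import tclean

# Multi-scale CLEAN with a zero scale for unresolved sources; scale
# sizes are in pixels and should match the dominant structures
tclean(vis='mydata.ms', imagename='extended', deconvolver='multiscale',
       scales=[0, 5, 15], smallscalebias=0.0,
       imsize=1024, cell='0.2arcsec', niter=5000, threshold='0.1mJy')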
Scale Bias
By default, the optimal choice of scale per iteration is that which produces the maximum principal solution (assuming independent scales). Given this normalization, all scales supplied via the scales parameter are treated equally.
In addition to this base normalization, a smallscalebias parameter may be used to bias the solution towards smaller or larger scales. This is especially useful when very large scale emission is coupled with weak compact features. The peak from each scale’s smoothed residual is multiplied by (1 − smallscalebias × scale/maxscale) to increase or decrease its amplitude relative to other scales, before the scale with the largest peak is chosen.
smallscalebias=0.0 (default) implies equal weight to all scales (as per the natural normalization that comes with the principal solution). Increasing it from 0.0 to 1.0 biases the reconstruction towards smaller scales in the supplied range. Decreasing it from 0.0 to -1.0 biases it towards larger scales in the supplied range. It can be useful to experiment with MS-clean in interactive=True mode. If you notice that bright regions of emission are overcleaned in the first few major cycles (i.e. negative regions will appear in the residual image), it suggests that too much cleaning is happening on the largest scales and it can help to increase the smallscalebias. Additionally, it is often necessary to clean comparatively deeply to reap the full benefit of a multi-scale CLEAN. Note also that scale bias (smallscalebias) is a fine-tuning tool that will be useful only if the list of supplied scale sizes is also appropriate to the structure being deconvolved; before turning to smallscalebias, it is advisable to first ensure that the scales parameter is set to reasonable values.
NOTE: An improved smallscalebias parameter was implemented in CASA 5.6 for both the MultiScale and MTMFS deconvolution algorithms. Details can be found in this CASA memo.
Multi-Resolution CLEAN
A related approach, called Multi-Resolution CLEAN is available in AIPS (and not in CASA). It is very similar to MS-CLEAN, although it operates on one scale size at a time. It smoothes the residual image and PSF by a particular scale size, and runs the minor cycle only on that scale. It switches scales after the next major cycle. This algorithm uses a different scale-based normalization (compared to MS-CLEAN) and has its own scalebias parameter which has its own formula.
Multi-Term (with Multi-Scale)
Multi-Scale Multi-Frequency synthesis (MSMFS) [19] is a wide-band imaging algorithm that models the wide-band sky brightness distribution as a collection of inverted, tapered paraboloids of different scale sizes, whose amplitudes follow a polynomial in frequency. A linear-least squares approach is used along with standard clean-type iterations to solve for best-fit spectral and spatial parameters. A point-source version of this algorithm can be run by specifying only one scale size corresponding to a delta-function.
A 2x2 system of equations to represent the fitting of a 2-term Taylor polynomial (Note that this is only a representative diagram using the same images shaded differently). In reality, the Hessian matrix contains different spectral PSFs.
The figure illustrates the set of normal equations that are to be solved in the image domain. What is usually a single convolution is now a joint convolution operator. The images on the left represent Taylor-weighted residual images, the 2x2 matrix contains spectral PSFs (the instrument’s responses to spectra given by different Taylor functions), and the model images on the right represent Taylor coefficients per component. (Note: this figure only illustrates the structure of the system of equations.)
More details about the algorithm and how to choose parameters such as the number of Taylor coefficients (nterms) and the reference frequency (reffreq) are given in the Wideband Imaging section.
Multiple Scales as part of the MTMFS algorithm are treated in the same way as MS-Clean (above), with the scales and smallscalebias parameters available for choosing a range of scales and fine-tuning which ones get preference during reconstruction.
Restoration¶
Standard Restoration and PSF Fitting
The final list of flux components (or an image containing just the component delta functions) is restored by smoothing it with a Gaussian that matches the resolution of the main lobe of the PSF and adding back the residual image. This step is done in order to compensate for the unphysical nature of CLEAN based component images that include delta functions, and to include residual flux (especially for extended emission) that was not picked up as part of the model image. The need for restoration varies depending on the choice of algorithm, but since all our CLEAN-based approaches include delta functions (with or without multi-scale components), this restoration step is always applied.
The Gaussian beam used in the restoration is defined by a major axis, minor axis, and position angle, as described here. This 2-dimensional Gaussian is fit to the main lobe of the PSF when the .psf image is created. During the restoration process, this Gaussian beam is used as the Clean beam.
The following algorithm is used to fit a Gaussian to the main lobe of the PSF. This algorithm was updated in CASA 6.2 to be much more stable with small (less than 5) and large (more than 10) numbers of pixels spanning the beam.
1. A region of 41 x 41 pixels around the peak of the PSF is compared against the psfcutoff (a tclean parameter; default 0.35, acceptable values 0.01 to 0.99). Sidelobes are ignored by radially searching from the PSF peak.
2. The bottom-left corner (blc) and top-right corner (trc) are calculated from these points, and blc and trc are expanded by 5 pixels.
3. A new sub-matrix is created from blc and trc.
4. The sub-matrix is interpolated to 3001 points using a CUBIC spline.
5. All the non-sidelobe points in the interpolated matrix that are above the psfcutoff are used to fit a Gaussian, using a Levenberg-Marquardt algorithm.
6. If the fitting fails, the algorithm is repeated with a decreased psfcutoff (psfcutoff = psfcutoff/1.5); a message with the new value of psfcutoff appears in the log. This is repeated up to 50 times if the fitting continues to fail.
Varying psfcutoff might be useful for producing a better fit for highly non-Gaussian PSFs, however, the resulting fits should be carefully checked. The default psfcutoff of 0.35 produces consistent results with the previous psf fitting algorithm (which did not include a sub-matrix interpolation) when the major/minor FWHM of the beam are spanned by ~5-10 pixels, as is recommended for general imaging.
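As a simplified, self-contained sketch of the core fitting step (step 5 above), the following fits an elliptical Gaussian to a synthetic main lobe with a Levenberg-Marquardt fitter; it omits CASA's radial sidelobe rejection and spline interpolation, and uses scipy rather than CASA internals:

import numpy as np
from scipy.optimize import curve_fit

def gauss2d(xy, amp, x0, y0, sx, sy, theta):
    # Elliptical 2-D Gaussian parameterized by widths and position angle
    x, y = xy
    a = np.cos(theta)**2/(2*sx**2) + np.sin(theta)**2/(2*sy**2)
    b = -np.sin(2*theta)/(4*sx**2) + np.sin(2*theta)/(4*sy**2)
    c = np.sin(theta)**2/(2*sx**2) + np.cos(theta)**2/(2*sy**2)
    return amp*np.exp(-(a*(x-x0)**2 + 2*b*(x-x0)*(y-y0) + c*(y-y0)**2))

# Synthetic stand-in for the PSF main lobe
yy, xx = np.mgrid[-20:21, -20:21].astype(float)
psf = gauss2d((xx.ravel(), yy.ravel()), 1.0, 0.0, 0.0, 4.0, 2.5, 0.3)

sel = psf > 0.35                          # keep only points above psfcutoff
popt, _ = curve_fit(gauss2d, (xx.ravel()[sel], yy.ravel()[sel]), psf[sel],
                    p0=(1, 0, 0, 3, 3, 0))   # unbounded -> Levenberg-Marquardt
fwhm_maj, fwhm_min = 2*np.sqrt(2*np.log(2))*np.abs(popt[3:5])  # in pixels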
Multi-term restoration
Multi-term (wideband) restoration differs from standard restoration in that it also modifies the residuals that are added to the smoothed model. Residuals are converted from Taylor-weighted averages of the residual data into Taylor-coefficient space, such that they represent the ‘next higher order term’ being imaged (a standard way of representing error). A practical implication is a higher-than-expected rms in the zeroth-order image, because the higher-order terms being fitted have more reconstruction error and are not strictly linearly independent of the zeroth-order term. In the outputs of the Multi-Term algorithm, the restored images contain these modified residuals, whereas the residual images contain the unmodified residuals, which conform to what astronomers typically mean by ‘residual’ images. More details about the algorithm are provided in the Wideband Imaging section.
Clean Bias¶
Clean bias, an effect noticed for decades by users of the CLEAN algorithm, is a systematic shift of reconstructed peak intensities to lower than expected values. This is usually seen in deep imaging runs with large numbers of closely-spaced weak sources, and when the PSF has sidelobes above the 0.1 level. The use of masks or clean boxes to constrain the search space alleviates the problem. A PSF with lower sidelobes (for example the PSF from MFS imaging as compared to a single channel PSF) can also prevent this type of flux bias with the CLEAN algorithm and more importantly it does so without having to invoke complicated masking procedures.
The clean bias effect can be explained by considering that the CLEAN algorithm is an L1-norm basis-pursuit method that is optimized for sparse signals that can be described with a minimal number of basis functions. For astronomical images this implies well-separated point sources whose properties can be described by single basis functions (one pixel each) and whose central peaks are minimally affected by PSF sidelobes from neighbouring sources. In a crowded field of point sources, especially with a PSF with high sidelobes, the CLEAN algorithm is more error-prone in the low SNR regime. A systematic lowering of source brightness can be explained by the algorithm constructing many artificial source components from the sidelobes of real sources.
Other Algorithms¶
Other options are present in our code base but not used much; they may be experimental, planned for the near future, or simply untested. Information on how to add external algorithms is given below.
MEM
The Maximum Entropy Method (MEM) [20] [21] is a pixel-based deconvolution algorithm that performs a rigorously constrained optimization in a basis of pixel amplitudes. It models the sky brightness distribution as a collection of point sources and uses a prior image along with an entropy-based penalty function to bias the solution of pixel amplitudes. MEM uses the Bayesian formulation of chi-square minimization and applies a penalty function based on relative image entropy. This choice of penalty function biases the estimate of the true sky brightness towards a known prior image. If a flat image is chosen as the prior, the solution is biased towards being smooth, producing a more realistic reconstruction of extended emission. Positivity and emptiness constraints can be applied to the image pixels via a penalty function.
The MEM implementation in CASA’s imager is unstable, and is unlikely to get attention as there are better methods available now. Please use multi-scale CLEAN instead.
ASP
The Adaptive Scale Pixel (ASP) [22] deconvolution algorithm parameterizes the sky brightness distribution into a collection of Gaussians and does a formal constrained optimization on their parameters. In the major cycle, visibilities are predicted analytically with high accuracy. In the minor cycle, the location of a flux component is chosen from the peak residual, and the parameters of the largest Gaussian that fits the image at that location are found. The total number of flux-components is also updated as the iterations proceed.
This algorithm is currently not available in CASA, but is on the mid-term implementation plan.
Comparison between deconvolution algorithms : One example
Because the uv-sampling is always incomplete, the result of a reconstruction algorithm can vary depending on the choice of sky model and the type of algorithm and constraints used. This figure shows a comparison between point-source CLEAN, MS-CLEAN, MEM and the ASP algorithms.
In the figure below, the top row of panels shows the component images that illustrate the different sky models being used. The middle row of panels shows restored images (used for the science). It should be noted that they are all different from each other and that they are all valid images; the main difference is the achievable angular resolution. The bottom panels show residual images (the gradient of chi-square), which radio astronomers typically use to judge whether all the signal in the data has been modeled. These images show how well the different methods handle extended emission. For example, CLEAN leaves significant correlated flux in the residuals. MEM does better, but the error pattern has significant structure outside the source too. MS-CLEAN has lower residuals than the two previous methods, but shows a characteristic pattern arising from the use of a fixed set of scale sizes to model complicated spatial structure. The ASP method shows much more noise-like residuals, owing to the fact that at each iteration it finds best-fit components. Many more recent algorithms derived from compressed-sensing theory are reported (in the literature) to produce results similar to the ASP algorithm, as they also perform fits to parameterized basis functions.
A comparison between point-source CLEAN, MS-CLEAN, MEM and the ASP algorithms.
Adding Other Deconvolution algorithms
External deconvolution algorithms can be connected to our imaging framework in order to use our data I/O and gridding routines (with parallelization), and to operate within major/minor cycle loops rather than as stand-alone methods with no direct connection to the data. The only prerequisite is that the algorithm operates in the image domain on a residual image and a PSF, and produces a model image as output.
It should be noted that although many recently developed compressed-sensing algorithms do not explicitly make this uv-domain/image-domain distinction, their practical implementations do, and in some cases the algorithm can be framed within a major/minor cycle structure (with residual visibilities computed as ‘data - model’). In other words, our software can implement the data->image and image->data transforms while an external reconstruction algorithm operates in between. The only exceptions are algorithms that require the gridding of something other than ‘data - model’ and that cannot be implemented as linear combinations in the image domain.
Attempts by external algorithm developers to connect to our framework are welcome, as are suggestions for improving this interface to be more usable.
Task Interface
tclean can be used in ‘only major cycle’ mode by setting niter=0. If calcres=False and calcpsf=False are set, then tclean can also be used to start directly with minor cycle algorithms that pick up .residual and .psf images from disk.
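For instance, a minimal sketch of this two-step usage (the dataset and image names are placeholders):

# Step 1 : major cycle only; writes try1.psf and try1.residual to disk
tclean(vis='my.ms', imagename='try1', niter=0)
# Step 2 : restart with minor cycle iterations, reusing the images on disk
tclean(vis='my.ms', imagename='try1', niter=1000, calcres=False, calcpsf=False)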
Tool interface
Python scripts can use our PySynthesisImager library to access each operational step of the tclean task, and to add or delete steps as necessary. Examples are given in the tclean task documentation (at the end of the examples page).
Within C++
For C++ programmers, it is possible to connect a new deconvolution algorithm by deriving from SDAlgorithmBase and implementing three main routines (initialization, cleanup, and a ‘takeOneStep’ method that does the series of minor cycle iterations).
Iteration Control¶
The CASA Imager implements its iterative optimization in two nested loops, Major and Minor cycles, as described in the Overview.
Controls¶
loop gain
For each component selected in the CLEAN minor cycle, only a fraction of its flux is subtracted from the residual image at each step. This fraction is the loop gain \(\gamma\), which multiplies the amplitude of the latest flux component before the residual image is updated; it acts like the step size used in steepest descent algorithms, countering the effect of imperfect update directions. For a point source, the fraction of the peak left on the dirty image after \(N_{CL}\) iterations is \((1-\gamma)^{N_{CL}}\).
Loop gain is typically set at a default of 0.1. As a general rule of thumb, if the sky model being fitted to the data is a good match for the structure being imaged, a higher loop gain is tolerated. MS-Clean with appropriate scale sizes is one example. On the other hand, point source CLEAN applied to extended emission will require a very small loop gain to make adequate progress.
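As a quick worked example of the \((1-\gamma)^{N_{CL}}\) relation above, the number of minor cycle iterations needed to reduce an isolated point source's peak residual below a given fraction can be estimated as follows (a sketch; the values are illustrative only):

import math
gamma = 0.1      # loop gain
target = 0.01    # stop when 1% of the original peak remains
niter_needed = math.ceil(math.log(target) / math.log(1.0 - gamma))
print(niter_needed)   # ~44 iterations for a single point source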
Stopping Criteria
True convergence is not easy to define or identify in practice, largely because of the logarithmic convergence typical of chi-square minimization, or because of artifacts that prevent true convergence while still allowing imaging at a quality adequate for subsequent analysis. Imaging algorithms therefore use stopping criteria to decide when to pause a set of minor cycle iterations and perform a major cycle, and when to stop iterating altogether.
Reasons for stopping
Threshold
The standard stopping criterion is a threshold on the peak brightness in the residual image. When artifacts do not dominate the residuals, such a threshold is a robust way of terminating a run. A global stopping threshold is usually related to the theoretically expected rms (typically \(5\sigma\)). A stopping threshold to trigger a major cycle is usually related to the height of the strongest sidelobe of the strongest source. The rationale behind this choice is to expect errors in the subtraction of the contributions of flux components during the minor cycle (due to approximations such as the beam patch size) and to prevent spurious new components from being added to the model.
Niter
A simple stopping criterion is the total number of iterations (individual minor cycle steps). In the presence of artifacts, it can be used to stop imaging early to prevent divergence, or to truncate iterations once they reach the point of diminishing returns. It is usually used as an override for the more natural stopping criterion of thresholding.
Non-Convergence
When the data do not exactly conform to what the algorithm is solving for, true convergence and theoretical noise estimates will not be reached. Symptoms of non-convergence include the rms value or peak of the residual image saturating, with no significant changes across minor and major cycle iterations. Persistent increases in absolute model flux could signal divergence.
Nmajor
Like niter, this is a simple stopping criterion meant primarily as an override for the more natural criteria such as threshold. Once the desired number of minor+major cleaning cycles has been performed, tclean will exit. The returned stopcode will be whichever other stopping criterion was reached first, if any, or else 9 if nmajor was reached first. For iteration control, the initial residual image calculation (chosen by toggling the calcres parameter) does not count towards this number, but the ‘nmajordone’ count in the returned dictionary will include this optional first residual image calculation step.
Implementation of stopping criteria in tclean
User Parameters : niter, cycleniter, threshold, nsigma, nmajor
Internally calculated controls : cyclethreshold
Minor Cycle Stopping Criteria
After a major cycle is complete, and before the minor cycle iterations begin, a cycleniter and cyclethreshold are computed and used as controls to terminate the minor cycle iterations and trigger the next major cycle. A major cycle is triggered (or minor cycle iterations are skipped) when any one of the following criteria is satisfied:
The mask for the current plane is all False.
Iteration limit : cycleniter = min ( niter - iterdone , cycleniter )
Threshold : cyclethreshold is internally computed and used as a major cycle trigger. It is related to the fraction of the PSF that can be reliably used during minor cycle updates of the residual image. By default, minor cycle iterations terminate once the peak residual reaches the first sidelobe level of the brightest source. cyclethreshold is computed as follows, using the settings of the parameters cyclefactor, minpsffraction, maxpsffraction, threshold and nsigma:
psf_fraction = max_psf_sidelobe_level * cyclefactor
psf_fraction = max(psf_fraction, minpsffraction)
psf_fraction = min(psf_fraction, maxpsffraction)
cyclethreshold = peak_residual * psf_fraction # The peak residual is computed within the current mask.
cyclethreshold = max( cyclethreshold, threshold )
Further, if nsigma (a multiplicative factor of the rms noise) is specified (>0.0), an n-sigma based threshold is calculated for each image plane, using the median absolute deviation (MAD):
nsigma_threshold = nsigma * robustRMS (where robustRMS = 1.4826 * MAD)
and the cyclethreshold calculated above is then further evaluated as
cyclethreshold = max(cyclethreshold, nsigma_threshold)
Zero iterations performed in the minor cycle.
Divergence : If the peak residual increases from the previous peak residual by more than 10%.
(currentPeak - prevPeak)/(prevPeak) > 0.1
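The cyclethreshold computation above can be summarized as a small function. This is a sketch only (the actual logic lives inside tclean); it assumes the peak residual within the current mask, the PSF's maximum sidelobe level, and an optional per-plane robust RMS have already been measured, and the default values mirror the tclean parameter defaults:

def compute_cyclethreshold(peak_residual, max_psf_sidelobe,
                           cyclefactor=1.0, minpsffraction=0.05,
                           maxpsffraction=0.8, threshold=0.0,
                           nsigma=0.0, robust_rms=0.0):
    # fraction of the PSF considered reliable for minor cycle updates
    psf_fraction = max_psf_sidelobe * cyclefactor
    psf_fraction = max(psf_fraction, minpsffraction)
    psf_fraction = min(psf_fraction, maxpsffraction)
    cyclethreshold = peak_residual * psf_fraction
    cyclethreshold = max(cyclethreshold, threshold)
    # optional nsigma-based per-plane threshold (robust_rms = 1.4826 * MAD)
    if nsigma > 0.0:
        cyclethreshold = max(cyclethreshold, nsigma * robust_rms)
    return cyclethreshold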
In all situations, the reason for stopping is printed in the logger, per image plane (e.g. per channel).
Global Stopping Criteria
After each major cycle, peak residuals (with and without masks) are evaluated and compared with the following criteria to decide if any more minor cycle iterations are needed or not. Any one of the following conditions will trigger termination of the imaging run.
(1) Total number of iterations >= niter. Currently, iterations are counted across all image planes (including channels); in the future this will be applied to one plane at a time.
(2) Peak residual within the mask < threshold (or the peak residual differs from the threshold value by less than one part in 100).
(3) The mask is blank for all planes (either due to user edits or automasking).
(4) No change in the peak residual from the previous major cycle, implying that the minor cycle in between did nothing.
(5) Peak residual within the mask < max(nsigma thresholds across image planes) (or the peak residual differs from the maximum nsigma threshold value by less than one part in 100).
(6) Divergence 1 : a large relative increase of the peak residual across a single major cycle, which catches sudden strong divergence: ( PeakRes - PrevPeakRes ) / PrevPeakRes > 3.0, where the peak residual is computed over the entire image, ignoring the clean mask.
(7) Divergence 2 : an increase of more than a factor of 3 in the peak residual from the minimum recorded so far, which catches slow divergence: ( PeakRes - MinPeakRes ) / MinPeakRes > 3.0, where the peak residual is computed over the entire image, ignoring the clean mask.
(8) nmajor : the user-set maximum number of major cycles has been reached (counting from the first major cycle after the first set of minor cycle iterations), and none of the above criteria are satisfied.
In all situations, the reason for stopping is printed in the logger and recorded in the return dictionary (obtained by setting interactive=1/0).
When the nsigma threshold is activated (nsigma>0.0), the nsigma threshold values vary across image planes, so the global exit condition that is satisfied can be a combination of (5) and any other valid exit criterion.
(In addition to the above, a warning message is printed in the logger if the peak residual within the clean mask increases by a factor of 2, but no actions are taken.)
Runtime editing of Iteration Controls¶
When tclean is run with interactive=True, a viewer GUI opens to allow the drawing and display of masks on residual images, and also displays and allows the modification of the following iteration control parameters : iterations left, cycleniter, cyclethreshold, threshold.
Of these parameters, iterations left, and cyclethreshold are internally updated after each major cycle and then displayed in case the user wishes to edit them.
The field iterations left is auto-calculated as niter-iterdone. If this field is hand-edited, it is taken as ‘niter’ and the next updated value is this new niter-iterdone.
The cyclethreshold field is auto-updated based on the peak residual at the end of the latest major cycle. If cyclethreshold is hand-edited, the user-set value applies to only the current set of minor cycle iterations and the auto-calculation resumes from the next major cycle.
Note: Interactive tclean only works when a region or mask is selected in the CASA Viewer. If the entire image should be cleaned, please draw a box around the entire image. There is a known bug that when a region is first selected, and then de-selected to produce an empty mask (filled with zeros), the CASA Viewer that runs interactive tclean will still allow you to proceed, and tclean will detect an empty mask and stop. Please always mark a region/mask to continue interactive tclean, and do not forget to double-click inside the green contours to select the region.
Note : In casa5, the auto-calculated cyclethreshold is always displayed as 0, but hand-edited values are still honored. In the end, the logger contains all the information about what got used, and it has been tested that iteration control and imaging proceeds as expected.
Note: In casa6, the auto-calculated cyclethreshold is correctly displayed in the GUI, but hand-edited cyclethresholds do not change in the GUI until two major cycles later. Here too, the logger contains the most accurate information about what was used, and the expected behaviour (hand-edited cyclethresholds applying only to the current minor cycles) has been tested. Iteration control and imaging will therefore proceed as expected.
Note : Threshold information via the GUI must contain units. ‘0.5Jy’ will work but ‘0.5’ on its own will not.
Returned Dictionary¶
When the tclean task is run as a python command, it can produce a return value (by setting interactive=1/0 instead of True/False). This dictionary contains a summary of the run: the number of iterations done, the number of major cycles, the peak residual at each major cycle boundary and the iteration count at which it occurred, metadata to index this information across multiple image fields, channels, and stokes planes, and stopcodes indicating the reasons for termination of the run (i.e. the global exit criterion as well as minor-cycle exit criteria per channel/stokes plane). This dictionary can be used for scripting.
Some useful keys in the return dictionary are:
Summary Key | Description
---|---
stopcode | An integer indicating the global stopping criterion that terminated the run.
stopDescription | A string describing the global stopping criterion.
iterdone | The total number of minor cycle iterations.
nmajordone | The total number of major cycles (including the initial residual calculation).
summarymajor | A list of total iteration counts at which major cycles were triggered.
summaryminor | A dictionary containing a detailed summary of minor cycle iterations, per image field, channel and stokes plane. See details below.
Minor Cycle Summary Dictionary¶
The summary of the minor cycle is a nested python dictionary/list, with information available per field/channel/stokes plane, summary type, and minor cycle. The structure for this dictionary is:
{
field id: {
channel id: {
stokes id: {
summary key: [
cycle: value
]
}
}
}
}
The innermost list [‘cycle: value’] has a length equal to the number of sets of minor cycle iterations; the ‘cycle’ key is just the 0-based index of the minor cycle set.
The outermost key ‘field id’ is the multifield imaging id. For more information on multifield imaging, see “Multiple Fields” under Types of images.
The available channels depend on the field, so indexing into the returned dictionary requires finding the available channel and stokes indices. To find the available channels, use summ_min[field].keys(), and to find the available stokes use summ_min[field][chan0].keys(). For example, the following code could be used to retrieve the number of iterations done:
# Get the iterations done on the main field, channel index 5, first stokes
# plane, during the middle set of minor cycle iterations
rec = tclean(...)
summ_min = rec['summaryminor']
field0 = summ_min[0]                          # field 0 is the main field
channels = list(field0.keys())                # available channel indices
stokes = list(field0[channels[0]].keys())     # available stokes indices
ncycles = len(field0[channels[0]][stokes[0]]['iterDone'])
itersDone = field0[channels[5]][stokes[0]]['iterDone'][int(ncycles/2)]
The full list of available summary keys can be retrieved with summ_min[field][chan0][stokes0].keys(). The full list of possible summary keys is:
Summary Key | Description
---|---
iterDone ᵃ | Number of iterations done. When running with MPI and USE_SMALL_SUMMARYMINOR is not set to FALSE (see note ᶜ), this value will be cumulative across channels, polarities, and cycles.
peakRes | Peak residual after deconvolution.
modelFlux | Model flux after deconvolution.
cycleThresh ᵇ | Stopping threshold for this cycle.
cycleStartIters ᵇ ᶜ | Cycle start iterations done (i.e. the earliest startIterDone for the entire minor cycle).
startIterDone ᶜ | Starting iterations done value before deconvolution (i.e. should be 0 at chan 0/stokes 0/cycle 0). Iteration counts in tclean run continuously across channels before moving to the next set of minor cycle iterations, so startIterDone records the absolute iteration count (summed across all channels/planes/cycles) at the start of the current minor cycle iterations for the current channel.
startPeakRes ᶜ | Starting peak residual before deconvolution.
startModelFlux ᶜ | Starting model flux before deconvolution.
startPeakResNM ᶜ | Starting peak residual before deconvolution, not limited to the user’s mask.
peakResNM ᶜ | Peak residual after deconvolution, not limited to the user’s mask.
masksum ᶜ | Sum of masked pixels at the beginning of the deconvolution.
mpiServer ᶜ | MPI server identifier this chunk was run on.
stopCode ᶜ | Stop code as used by the minor cycle (this is different from the tclean stopcode).
ᵃ ‘iterDone’ is not cumulative across major cycles (in contrast to the pre-CASA 6.4.4 ‘summaryminor’ matrix, where this value was cumulative across channels/stokes planes/cycles).
ᵇ These values are independent of channels and stokes planes. They will be the same across all channel/stokes planes for the same cycle.
ᶜ These summary keys are not available when running with MPI, by default. To keep them, set the environment variable USE_SMALL_SUMMARYMINOR=FALSE before starting CASA. However, the combination of keeping the extra keys, MPI, and a large number of channels has a chance of causing CASA to crash.
The description for the tclean task’s global stopcode can be retrieved from rec[‘stopDescription’].
The minor cycle exit criteria are recorded (per plane) in a second set of stopcode values, within the summaryminor dictionary. The minor cycle exit codes are not associated with the tclean task’s stopcode. Descriptions for the minor cycle stopcodes are:
stopcode | Description
---|---
0 | Skipped this channel/polarity. Zero mask.
1 | Reached cycleniter.
2 | Reached cyclethreshold.
3 | Zero iterations performed.
4 | Possible divergence. Peak residual increased by 10% from minimum.
5 | Exited deconvolver minor cycle without reaching any stopping criterion.
6 | Reached n-sigma threshold.
Generating Convergence Plots
The minor cycle summary values can be used to make convergence plots of the clean process over major/minor cycles. For example, plots of peak residual and model flux can be generated (for a larger niter) with the last cell of these Jupyter notebooks:
Example Notebook: static notebook, downloadable html
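As an illustration, the following sketch plots the peak residual and model flux per set of minor cycle iterations for a single-plane image (assuming a completed tclean run that returned a summary dictionary, and that field 0/channel 0/stokes 0 exist):

import matplotlib.pyplot as plt

rec = tclean(...)                          # run with interactive=0 to get the dictionary
plane = rec['summaryminor'][0][0][0]       # field 0, channel 0, stokes 0
cycles = range(len(plane['peakRes']))
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(cycles, plane['peakRes'], 'o-')   # peak residual after each minor cycle set
ax1.set_ylabel('Peak residual (Jy)')
ax2.plot(cycles, plane['modelFlux'], 'o-') # total model flux after each minor cycle set
ax2.set_ylabel('Model flux (Jy)')
ax2.set_xlabel('Minor cycle set')
plt.show()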
Masks for Deconvolution¶
For the most careful imaging, you will want to restrict the region over which you allow CLEAN components to be found by using a mask. This mask is generally referred to as a clean mask.
Creating a clean mask:¶
There are several different ways to specify a clean mask, including:
A text-based region. The CASA region text format can be used to define clean regions either by specifying the region directly in the tclean command or by using an ASCII text file containing the specifications. You can use the viewer to save a region formatted according to the CRTF specification. To do this, an image must already exist to serve as a reference or template to create the mask image or the region.
An image consisting of only 1 (valid) and 0 (invalid) pixel values. Such images can be generated or modified using tasks such as makemask. The mask has to have the same shape (number of pixels, velocity, and Stokes planes) as the output image. Exceptions are single-velocity and/or single-Stokes-plane masks, which will be expanded to cover all velocity and/or Stokes planes of the output cube.
An automatically generated mask. There are several experimental algorithms available in tclean for automatically masking emission during the deconvolution cycle. See the automasking section below for more details.
A mask created by tclean while interactively cleaning using the viewer. You can combine this method with the options above to create an initial clean mask and modify it interactively. Please be aware that when running tclean interactively, the viewer is accessible during a major cycle, and the mask can be inadvertently changed during the active clean cycle, although the new mask is not registered until the next major cycle. Also note that interactive tclean only works when a region or mask is selected in the CASA Viewer. If the entire image should be cleaned, please draw a box around the entire image. There is a known bug that when a region is first selected, and then de-selected to produce an empty mask (filled with zeros), the CASA Viewer that runs interactive tclean will still allow you to proceed, and tclean will detect an empty mask and stop. Please always mark a region/mask to continue interactive tclean, and do not forget to double-click inside the green contours to select the region.
However they are created, the masks are all converted (as necessary) and stored as CASA images consisting of the pixel values of 1 and 0. When mask files are read in and have undefined values, these values are treated as 1s by CASA. Mixed types of masks can be specified in the tclean task.
In CASA, the term ‘mask’ is used in two different contexts for images. The first, used for CASA images/image analysis, is a T/F pixel mask, which can be embedded in the parent CASA image. The ‘mask’ used in imaging normally refers to a 1/0 image that directly defines the deconvolution region(s) (the ‘clean mask’) in the tclean task.
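For illustration, the different ways of specifying a clean mask map onto the tclean mask/usemask parameters as in this sketch (the dataset, file, and region names are placeholders):

# CRTF region specified directly
tclean(vis='my.ms', imagename='img1', usemask='user',
       mask='circle[[100pix,100pix],20pix]', niter=100)
# CRTF region(s) read from an ASCII text file
tclean(vis='my.ms', imagename='img2', usemask='user',
       mask='my_regions.crtf', niter=100)
# 1/0 mask image, e.g. one made with makemask
tclean(vis='my.ms', imagename='img3', usemask='user',
       mask='my_mask.im', niter=100)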
Automasking¶
The tclean task has an option to generate clean masks automatically during the deconvolution process by applying flux density thresholds to the residual image. Currently “auto-multithresh” is the automasking algorithm available in tclean; it can be selected via the usemask parameter. For this algorithm, the mask is updated at the beginning of each minor cycle based on the current residual image. The algorithm uses multiple thresholds, based on the noise and sidelobe levels in the residual image, to determine what emission to mask. It also has functionality to remove (“prune”) small mask regions that are unlikely to be real astronomical emission. A more detailed description of the algorithm is given below and in [23].
“auto-multithresh”
This algorithm is intended to mimic what an experienced user would do when manually masking images while interactively cleaning. The parameters sidelobethreshold and noisethreshold control the initial masking of the image. The sidelobethreshold indicates the minimum sidelobe level that should be masked, while the noisethreshold indicates the minimum signal-to-noise value that should be masked. The threshold used for masking is the greater of the two values calculated for each minor cycle based on the rms noise and sidelobe levels in the current residual image.
Regions smaller than a user-specified fraction of the beam can be removed, or “pruned”, from the mask. The size of the region is defined as the number of contiguous pixels in the region. The figure below shows an example of the pruning process.
Figure 1 - An example of the pruning process. The image on the left shows the original threshold mask, while the image on the right shows the resulting mask after all regions smaller than a user-specified fraction of the beam area have been removed.
The resulting masks are all convolved with a Gaussian that is a multiple of the synthesized beam size, controlled by the parameter smoothfactor. Only values above some fraction of the smoothed Gaussian peak are retained; this fraction is set via the cutthreshold parameter. Note that cutthreshold is defined as a fraction of the smoothed Gaussian peak, not as an absolute value. This procedure ensures that sources are not masked too tightly, i.e., there is a margin between the emission and the mask. Note that smoothfactor and cutthreshold are related: a large smoothfactor with a high cutthreshold can give a region similar to a smaller smoothfactor with a lower cutthreshold. Setting cutthreshold too high (>~0.2) will tend to remove faint regions.
Figure 2 - An example of the process used to ensure that sources are not masked too tightly. The left hand image shows the initial threshold mask. The middle image shows the threshold mask convolved with a Gaussian. The right image shows the final threshold mask where only emission above some fraction of the peak in the smoothed mask is retained. The final mask is larger than the original threshold mask and better encapsulates the emission.
The initial threshold mask can be expanded down to lower signal-to-noise via binary dilation. This feature is particularly useful when there is significant faint extended emission. The lownoisethreshold parameter is used to create a mask of the low signal-to-noise emission, which we refer to as the constraint mask. The previous total positive mask is expanded (or grown) via an operation known as binary dilation, which expands each mask region using a structuring element (also known as a kernel); currently the structuring element is fixed as a 3x3 matrix with the diagonal elements being 0. We use a constraint mask based on a low signal-to-noise threshold to limit the expansion of the mask to regions within the lownoisethreshold. Only the regions in the constraint mask that touch the previous mask are retained in the final constraint mask. The final constraint mask is then pruned, smoothed, and cut using the same method as for the initial threshold mask.
The sub-parameter growiterations gives a maximum number of iterations used to “grow” the previous mask into the low signal-to-noise mask, which can speed up masking of large cubes at the expense of possibly undermasking extended emission. The sub-parameter dogrowprune can be used to turn off pruning for the constraint mask, which may also speed up this process.
Figure 3 - An example of how the masks are expanded into low signal-to-noise regions. The top row shows the binary dilation process. Left: The low signal-to-noise threshold mask used as a constraint mask. Middle: The final mask from the previous clean cycle. Right: The result of binary dilating the mask from the previous clean major cycle into the constraint mask. The bottom left image shows the binary dilated mask multiplied by the constraint mask to pick out only those regions in the constraint mask associated with the previous clean mask. The bottom middle image shows the final pruned, smoothed, and cut mask.
There is also an experimental absorption-masking feature controlled by the sub-parameter negativethreshold, which is defined analogously to noisethreshold. This feature assumes that the data have been continuum subtracted. Absorption masking can be turned off by setting the negativethreshold value to 0 (the default). Note that the negative and positive threshold masks are tracked separately, and that the negative mask is not pruned or expanded into lower signal-to-noise regions.
Finally, all the masks (initial threshold mask, negative mask, low noise threshold mask) are added together with the mask from the previous major cycle to form the final mask.
All the operations described above, including obtaining image statistics, are done per spectral plane for spectral line imaging. If a channel is masked using the noise threshold and the resulting final mask is zero, future auto-masking iterations will skip that channel. The experimental minpercentchange parameter controls whether future masks are calculated for a particular channel if the mask changed by less than n% after a major cycle in which the cyclethreshold equals the clean threshold. In general, we recommend setting minpercentchange to -1.0 (turned off).
The verbose parameter records information to the log: whether a channel is included in the masking, the image noise and peak, the threshold used and its value, the number of regions found in the initial mask and how many were pruned, the number of regions found in the low noise threshold mask and how many of those were pruned, and the number of pixels in the negative mask. This information is helpful for optimizing parameters for different imaging cases, as well as for general debugging.
Algorithm In Detail
Calculate threshold values based on the input parameters.
sidelobeThresholdValue = sidelobeThreshold * sidelobeLevel * peak in residual image
noiseThresholdValue = noiseThreshold * rms in residual image
lowNoiseThresholdValue = lowNoiseThreshold * rms in residual image
negativeThresholdValue = negativeThreshold * rms in residual image
Create the threshold mask.
maskThreshold = max(sidelobeThresholdValue,noiseThresholdValue)
Create threshold mask by masking all emission greater than maskThreshold.
Prune regions smaller than minBeamFrac times the beam area from threshold mask.
Smooth the mask image by smoothFactor * (beam size).
Mask everything above cutThreshold * the peak in the smoothed mask image.
Expand the mask to low signal-to-noise.
lowMaskThreshold = max(sidelobeThresholdValue,lowNoiseThresholdValue)
Create constraintMask by masking all emission greater than lowMaskThreshold.
Use binary dilation to expand the previous clean cycle mask into the constraintMask.
Create the low S/N mask by retaining only the regions in the constraintMask that are connected to the previous clean cycle mask.
Prune [can turn this off with dogrowprune=False], cut, and smooth the low S/N mask the same way as was done for the threshold mask.
Mask the absorption (experimental)
If negativethreshold >0.0:
negativeMaskThreshold = - max(negativeThresholdValue, sidelobeThresholdValue)
mask negative pixels with values <= negativeMaskThreshold
Cut and smooth the absorption mask the same way as was done for the threshold mask.
Add the threshold mask, the low S/N mask, the absorption mask, and the mask from previous clean cycle together.
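To make the first steps concrete, here is a minimal numpy/scipy sketch of the initial threshold-and-prune stage (an illustration only; the real implementation is inside tclean, and the default values below are assumptions mirroring the tclean auto-multithresh defaults):

import numpy as np
from scipy import ndimage

def initial_threshold_mask(residual, rms, sidelobe_level,
                           noisethreshold=5.0, sidelobethreshold=3.0,
                           minbeamfrac=0.3, beam_area_pix=25.0):
    # threshold: the greater of the sidelobe- and noise-based values
    mask_threshold = max(sidelobethreshold * sidelobe_level * residual.max(),
                         noisethreshold * rms)
    mask = residual > mask_threshold
    # prune: remove connected regions smaller than minbeamfrac * beam area
    labels, nregions = ndimage.label(mask)
    for i in range(1, nregions + 1):
        region = (labels == i)
        if region.sum() < minbeamfrac * beam_area_pix:
            mask[region] = False
    return mask

The smoothing and cutthreshold steps would follow, e.g. by convolving the mask with a Gaussian kernel and re-thresholding the result.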
Noise Estimation
As of CASA 5.5, “auto-multithresh” estimates the noise with the following procedure: if there is no mask, remove pixels from the noise distribution via Chauvenet’s criterion [24] [25] and then estimate the noise from the remaining pixels via the median absolute deviation (MAD). If there is a mask, calculate the noise from the pixels outside the clean mask and inside the primary beam mask, which we refer to as the masked MAD. All MAD values are scaled to match a Gaussian distribution.
The parameter fastnoise controls which estimate is used. It is set to True by default, which selects the simpler per-channel MAD estimate described below; fastnoise=False enables the more robust procedure above.
Prior to CASA 5.5, “auto-multithresh” estimated the noise per channel using the median absolute deviation (MAD), scaled to match a Gaussian distribution. This estimate is computationally fast, but may be less accurate when the emission covers a large fraction (nominally 50%) of the field of view. The method introduced in CASA 5.5, although more complex and computationally expensive, may yield more accurate noise estimates in that case.
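The scaled MAD estimate used by the fast method (and by the nsigma stopping threshold described under Iteration Control) is simple to reproduce; a sketch:

import numpy as np

def robust_rms(residual_plane):
    # robust per-plane noise estimate: 1.4826 * median absolute deviation
    mad = np.median(np.abs(residual_plane - np.median(residual_plane)))
    return 1.4826 * mad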
Polarization Data
As of CASA 5.6, auto-multithresh now functions with polarization data. It applies the same algorithms to the Stokes QUV images as used for the Stokes I image. This means that the full masking process is applied to the positive emission (including the prune and grow steps), but that the masking of the negative emission only includes the initial threshold mask (no prune or grow).
A Note on Input Parameters
The default “auto-multithresh” parameters have been optimized for the ALMA 12m array in its more compact configurations. The parameters may need to be modified for other input cases, e.g., ALMA 12m long baseline data, ALMA 7m array data, and VLA data. The main parameters to explore are noisethreshold, sidelobethreshold, lownoisethreshold, minbeamfrac, and negativethreshold (if you have absorption). We do not recommend changing the cutthreshold and smoothfactor parameters from their default values. The dogrowprune and growiterations parameters are primarily used to improve the speed of the algorithm for large cubes.
Spectral Line Imaging¶
Specific topics for spectral line imaging including spectral coordinates and spectral reference frames
Spectral coordinates and frame¶
In spectral line imaging, the spectral coordinates are defined by the user inputs. The relevant tclean parameters are the data selection parameters and image cube defining parameters: start, width, nchan, and outframe. The task attempts to adjust the inputs so that the data selection matches the defined cube.
The tclean task’s start and width parameters can be specified as a channel index, frequency, or velocity. In tclean, spectral mode is turned on by specmode=’cube’, and the specific mode (channel, frequency, or velocity) is determined automatically from the units of the sub-parameters (start, width). Mixed specifications (e.g. start=’5’ and width=’10km/s’) are currently not allowed in tclean.
The underlying imaging code (FTMachine) uses a fixed spectral reference frame (LSRK) internally. However, the user can specify an outframe so that the output cube image is relabeled to the user-specified frame; if outframe is unspecified, it defaults to LSRK. Continuum images are always labeled in LSRK. Note that tools like the CASA Viewer can apply on-the-fly conversions to relabel frequencies in frames other than what is in the image header. The tasks imregrid, imreframe, and exportfits can also explicitly change the reference frame and, in some cases, regrid the channels.
Spectral Reference Frame
CASA (CASACORE Measures) uses the frequency frames defined in the Reference material section “Spectral Frames”. Data is typically observed in the topocentric observatory frame (TOPO) at a fixed sky frequency. Any observed line, however, will change its topocentric frequency as a function of time. Note that for TOPO (as well as GEO) reported frequencies, the grid of output images uses the first time stamp of the observations.
When applying Doppler corrections, the data are typically regridded to the Local Standard of Rest Kinematic (LSRK, the CASA default) or the sun-earth barycentric (BARY) frame, which can be specified via the outframe parameter. Both options require the rest frequency of a spectral line (restfreq parameter).
In addition, a velocity definition (veltype parameter, sometimes referred to as Doppler type) is required. This is typically either RADIO (the CASA default) or OPTICAL. Note that the two definitions are identical at \(v=0\) but differ increasingly at larger velocity values. A full list of supported velocity definitions is given in the Reference material section “Spectral Frames”.
Mapping between Data Channels and Image Channels¶
During the gridding process, the tclean task makes a choice about how to map data channels to image channels, based on the channelization encoded in the MS and the user-specified image channelization. The mapping between data channels and image channels may vary depending on the following:
If the user-specified ‘start’ frequency is not at a data channel boundary, the visibility data and weights are interpolated and then evaluated at the centers of the shifted frequency grid.
When image channels are wider than data channels, visibilities and weights are binned and gridded via multi-frequency synthesis (MFS) within each image channel.
On-the-fly software Doppler tracking can also be done as a time-dependent frequency relabeling of the data channels, before interpolation and binning choices are made.
Usually, a combination of these three operations is performed during gridding. There are also special cases where only some, or none, of them apply.
The interp parameter chooses the interpolation scheme to be used during the gridding process. The choices are currently ‘nearest’, ‘linear’, and ‘cubic’.
‘nearest’ just picks the value of the nearest data channel.
‘linear’ interpolates the data onto a grid whose channel width divides evenly into an image channel. For example, if the image channel is 3.14x the width of the original data channel, the data and weights are interpolated onto 3 channels, each with 3.14/3.0 times the original data channel width. The linear interpolation uses the 2 adjacent original data channels.
‘cubic’ works the same as ‘linear’, but with the nearest 4 instead of 2 data channels.
Warning: in CASA versions earlier than 5.6, the interpolated channels were forced to align with the edge of the image channel. This could cause channels to be dropped at the edges of data chunks, causing different sensitivities at the chunk edges (which can be particularly problematic when chanchunks > 1 or in parallel processing). In CASA 5.6, this has been resolved, and the interpolated data channels now align with the center of the image channel.
Software Doppler Tracking Options¶
For the purpose of this document, a time independent frame is a frame in which the source observed has not changed its velocity over the period of time of the observation. A time dependent frame is one in which the observed source has changed its velocity during the observation period. If datasets from different epochs are combined during imaging, the relevant period of time to consider is the entire range spanned by all datasets.
The following descriptions are specific to the task based on new imager, tclean.
There are three software Doppler tracking options, controlled at the task level. The individual parameters are described in the parameter tab of the tclean task page.
specmode=’cube’
Converts data frequencies to the time-independent spectral frame (default: LSRK).
Output image base frame : specified frame in outframe
In this mode, data frequencies are read from the input MS(es), and compared with image channel bin frequencies (also in LSRK) to decide which data channels go into which image channel bins. This process is a time-dependent relabeling of data frequencies. The result aligns the frequencies of spectral lines from astrophysical sources taken at different times and thus with different observed frequencies. The relevant user parameters are: start, width, nchan, outframe, veltype, restfreq.
Internally, this mode converts the data to LSRK frequencies before gridding and after gridding converts them to the outframe to construct an appropriate output CASA image. Note that for TOPO and GEO, which are actually time-dependent frames, the conversion layer encodes a specific time at which the conversion is valid. The convention in ‘clean’ is the start epoch of the dataset, but this time can be changed via the imreframe task with outframe=’topo’.
specmode=’cubedata’
Produces a one-to-one mapping between data channels and image channels.
Output image base frame : REST, UNDEFINED
In this mode, no time-dependent frequency frame transformations are done during gridding/binning. In this case, data frequencies are read from the input MS(es) and compared directly with image frequencies. If the data has been observed in a time-dependent frame (e.g., TOPO), this mode will not align the frequencies of spectral lines from astrophysical sources taken at different times and thus with different observed frequencies. Only local signals at a fixed frequency such as RFI will remain aligned, for example terrestrial RFI in case of TOPO data.
The relevant user parameters are start, width, nchan, veltype, restfreq.
For this mode, outframe is not an option as start, veltype, restfreq will be interpreted literally to construct the frequency grid, with no further conversions.
(To be implemented) specmode=’cubesrc’
Convert data frequencies to the SOURCE frame.
Output image base frame : SOURCE
If the FIELD table of the source being imaged contains ephemeris information, a time-dependent relabeling of the data frequencies (software Doppler tracking) is done to make spectral lines stationary in the moving source frame. If the FIELD table of the source being imaged does not contain ephemeris information (i.e. the source is not a solar system object), the software Doppler tracking will follow a conversion to LSRK. In addition, a systemic velocity has to be specified with respect to a spectral frame, which will be recorded in the image.
The relevant user parameters are: start, width, nchan, frame, veltype, restfreq, sysvel, sysvelframe. The base frame of the output image will always be SOURCE. The sysvel and sysvelframe parameters represent the systemic velocity with respect to a specific frame that will be embedded in the coordinate system. These two parameters are ignored if the ephemeris information is available. This is the only mode that allows the start and width parameters to be specified in outframe=’SOURCE’ in addition to other standard frames.
specmode=’mfs’
Multi-frequency synthesis, where there is only one large image channel. This will always be in LSRK, with the image frequency coordinates decided by the spw data selection parameter.
Imaging a pre-Doppler-tracked Data Set¶
An MS output by cvel or mstransform will be treated the same way as any other MS observed directly in the frame that the MS is labeled with.
A dataset that has been relabeled in a time-independent frame (LSRK, BARY, etc.) using mstransform can use mode=’cube’. The base frame of the output image will be based on the input parameters. If the MS is already in a time-independent frame, the code will detect that no additional time-dependent frequency shifts are required. A similar situation holds for datasets labeled in the SOURCE frame when mode=’cubesrc’ is used.
A dataset that needs channel binning/gridding with no extra time-dependent frequency transformations should use mode=’cubedata’ and the output frame will be ‘UNDEFINED’. For example, when an MS has already been transformed into a time-dependent frame and the user wants to image the data as is.
If the MS has been relabeled ‘REST’ using mstransform, the base frame of the output image will be ‘REST’. This method is another way to generate images with no extra time-dependent frequency transformations.
Parameters for Spectral-Axis Image Definition¶
nchan
Number of channels in output cube image.
start
The first channel of the cube image. The units of start will decide whether the user means ‘channel index’, ‘frequency’ or ‘velocity’:
start=3 : channel 3 of the first spw in the selected list (irrespective of channels selected using the ‘spw’ parameter)
start=’1.2GHz’ : start frequency. The output channels are equidistant in frequency.
start=’15 km/s’ : start velocity. If veltype=’RADIO’, channels are equidistant in both frequency and velocity. If veltype=’OPTICAL’ or ‘Z’, the channels are equidistant in velocity but not in frequency. Also see the veltype section below.
width
The channel width of the resulting image cube. If width has units, it has to be the same units as start. If specified as an integer, it is taken as N times the width of a single data channel. For irregular channels in either frequency or velocity, a reasonable default for the width of a single data channel will be calculated.
If start is specified as a velocity and width is not specified (the default), the output image will have ascending velocities (descending frequencies), with the velocity specified in start as the first image channel. Also note that since the channel frequencies in the MS can be in descending or ascending order, the appropriate sign for width (“+” or “-”, where “+” can be omitted) should be used with a frequency or velocity specification to avoid confusion.
outframe
Spectral reference frame in which to interpret start. This is also the frame to which the base frame of the cube image will be set for mode=’cube’. For mode=’cubesrc’, the option of specifying start in the SOURCE frame will also be allowed.
veltype
Velocity option in which to interpret start if units are ‘km/s’ :
RADIO: velocity in ‘radio definition’: \(\frac{v_{rad}}{c} = 1 - \frac{f}{f_{0}} = \frac{z}{1+z}\)
OPTICAL: velocity in ‘optical definition’: \(\frac{v_{opt}}{c} = \frac{f_{0}}{f} - 1 = z\)
Z: the same as OPTICAL
RATIO: \(\frac{v}{c}=\frac{f}{f_{0}}\) * This is accepted but there will be no real interpretation of the velocity of this type.
BETA: relativistic definition: \(\frac{v}{c}=\frac{1-\left(\frac{f}{f_{0}}\right)^2}{1+\left(\frac{f}{f_{0}}\right)^2}\)
GAMMA: \(\frac{v}{c}=\frac{1}{\sqrt{1-BETA^2}} = \frac{1+\left(\frac{f}{f_{0}}\right)^2}{2\frac{f}{f_{0}}}\) * This is accepted but there will be no real interpretation of the velocity of this type.
restfreq
A vector of rest frequencies, which will be encoded in the output image. If this parameter is left empty, the list of rest frequencies encoded in the SOURCE table corresponding to the field being imaged will be used. The first rest frequency in the list will be used to interpret start when its units indicate a velocity specification.
sysvel
Systemic velocity of a source (only for mode=’cubesrc’)
sysvelframe
Frequency frame with respect to which the systemic velocity is specified (only for mode=’cubesrc’)
interpolation
Relevant when image channel widths > data channel widths and/or start is offset from the data start. This parameter is used regardless of whether time-dependent frame conversions happen or not. It is not used only when start and width align between the data channels and image channels and no time-dependent frequency relabeling is needed.
perchanweightdensity
When calculating the weight density for Briggs-style weighting in a cube, this parameter determines whether the weight density is calculated for each channel independently or as a common weight density for all of the selected data (the default). This parameter has no meaning for continuum (specmode=’mfs’) imaging. For cube imaging, perchanweightdensity=True is a recommended alternative that provides more uniform sensitivity per channel, but generally with larger PSFs than perchanweightdensity=False. See the tclean task pages for more information.
NOTE on data selection via ‘spw’
The user should select a range larger than what the image will need, and not try to fine-tune the list of channels. The channel mapping and binning process will pick and grid only those data channels that actually map into image channels. This process is already optimized for performance.
Note on image channel order of the output cube
The start parameter defines the spectral coordinate of the first image channel, while the sign of the width parameter controls the direction of the increment along the spectral axis. If width is unspecified and start is defined as a velocity or frequency, the image channels will be ordered such that the value in the unit specified by start always increases with channel number, regardless of whether the spectral axis of the input visibility data increases or decreases in frequency. For example, start=’-15km/s’ will result in an image with channel 0 at -15km/s, becoming more positive as the image channel number increases. If start is specified as a channel index (e.g. start=5) with width unspecified, the image channel frequency order will depend on the frequency order of the input visibility data. For full control of the spectral axis order in the output image, the user is encouraged to set width.
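Putting these parameters together, a hedged example of a velocity-defined cube (all values are placeholders):

tclean(vis='my.ms', imagename='line_cube', specmode='cube',
       nchan=100, start='-15km/s', width='1.0km/s',
       outframe='LSRK', veltype='radio',
       restfreq='1.420405752GHz',            # HI rest frequency
       interpolation='linear', perchanweightdensity=True,
       niter=1000)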
Using Output Images from tclean¶
Images from tclean will have as their base frame LSRK (or another frame specified in outframe), SOURCE, UNDEFINED, or REST. The spectral axis of the base frame is always encoded in frequency in the output images. A regularly spaced set of frequencies is represented by start/width/nchan values listed in the image header; an irregularly spaced set of frequencies is encoded as a tabular axis.
Conversion Layer of the Spectral Reference Frame
One can attach a conversion layer for a different spectral reference frame to the image, using the imreframe task or a tool script that calls cs.setconversiontype() on top of the base frame.
The CASA Viewer will, by default, display in the base frame of the image if no conversion layer is attached. If a conversion layer is attached, it will honor the frame in the conversion layer and relabel image frequencies on-the-fly when displaying the spectral coordinate. The Viewer also has options to temporarily change the frame to any frequency frame or velocity convention, with or without a conversion layer.
Note that conversion layers from LSRK to TOPO/GEO (time-independent frame to time-dependent frame) are tied to one particular time during the observation; our convention is the start time of the dataset being imaged. Tool-level scripts using the imageanalysis (ia) and coordinatesystem (cs) modules can be used to extract lists of frequencies or velocities in any spectral frame and velocity convention. Within a conversion layer, the commands csys = ia.coordsys(); csys.toworld( [0,0,0,5] ) will give the frequency of channel 5 in the frame of the conversion layer; with no conversion layer, they will give channel 5 in the base frame of the image (i.e. LSRK). Velocities can be read out using csys helper functions, e.g., csys.setrestfrequency(XXX); csys.frequencytovelocity( 1.5, ‘GHz’, ‘RADIO’, ‘km/s’ ). Several other spectral-axis relabeling options are possible in combination with the measures (me) module.
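A minimal sketch of such a tool-level script in CASA 6 (the image name is a placeholder, and the rest frequency is set explicitly here for the velocity conversion):

from casatools import image

ia = image()
ia.open('my_cube.image')
csys = ia.coordsys()
world = csys.toworld([0, 0, 0, 5])    # world coordinates of pixel [0,0] in channel 5
freq_hz = world['numeric'][3]         # spectral axis value, in Hz
csys.setrestfrequency('1.420405752GHz')
vel = csys.frequencytovelocity(freq_hz, 'Hz', 'radio', 'km/s')
csys.done()
ia.close()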
CASA Images can finally be exported to the FITS format, during which frame conversions are hard-coded.
Image channels can be regridded using the imregrid task, if the user needs an explicit regrid instead of only frequency-axis relabeling.
Notes on the Frequency Frame Conversions¶
Conversion between the different types is done with the standard MeasConvert class (MFrequency::Convert, MRadialVelocity::Convert, MDoppler::Convert). This is what is encoded in the conversion layer of CASA Images.
Some conversions are only possible if the following frame information is available:
Conversion to/from REST needs Radial Velocity information. The sysvel parameter in mode=’cubesrc’ will be used for this. For an MS already at REST, no conversions are needed.
Conversion to/from TOPO and GEO needs Epoch information. This is set in the conversion layer for mode=’cube’ as the start time of the MS (after the data selections are applied) and can be modified via the imreframe task with outframe=’TOPO’ or ‘GEO’ and subparameter epoch.
Conversion to/from TOPO needs Position information. This is read from the input MS, or Image header.
All conversions need Direction information. This is the image center from the Image header.
Alert: Conversion between the different frequencies can, due to relativistic effects, only be done approximately for very high (order c) radial velocities. Rather than convert between frequencies, a better approach would be to start from radial velocities and a rest frequency.
Alert: For large radial velocities (of order c), the conversions are not precise, and not completely reversible, due to unknown transverse velocities and the additive way in which corrections are applied. They are correct to first order with respect to relativistic effects.
Wide Band Imaging¶
Single Pointing :
Single pointing wideband imaging is available via the ‘standard’, ‘mosaic’ and ‘awproject’ gridders in the tclean task.
Joint Mosaics :
Warning : Joint-mosaic imaging with multi-term wideband imaging has been verified and validated only for imaging cases where the instrumental parameters do not change across the face of the mosaic (i.e. position-independent PSFs). A series of algorithm details related to position dependent primary beam effects and point spread functions are being worked on. Only specific modes of wideband mosaicing are currently being commissioned for the VLASS imaging pipelines.
Imaging at wideband sensitivity¶
The continuum imaging sensitivity offered by a broad band receiver is given by
\(\sigma_{continuum} = \frac{\sigma_{chan}}{\sqrt{N_{chan}}} \propto \frac{T_{sys}}{\sqrt{N_{chan}\,\Delta\nu\,\Delta\tau}}\)
where \(T_{sys}\) is the instrumental system temperature, \(\Delta\nu\) is the bandwidth of each channel, \(\Delta\tau\) is the integration time, \(N_{chan}\) is the number of frequency channels, and \(\sigma_{continuum}\) and \(\sigma_{chan}\) are the theoretical wideband and narrowband image noise levels. Note that this calculation is for an ideal system whose gain is flat across the band, with equally weighted channels (i.e. at the center of the primary beam).
To take full advantage of this broadband imaging sensitivity, image re-construction algorithms need to be sensitive to the effects of combining measurements from a large range of frequencies. These include frequency-dependent angular resolution and uv-coverage, frequency-dependent array element response functions, and the spectral structure of the sky brightness distribution.
UV-Coverage
Projected baseline lengths are measured in units of the observed wavelength. Therefore the \(uv\) coverage and the imaging properties of an interferometer change with frequency. As the observing frequency increases, the angular resolution for a given distribution of antennas increases (or, conversely the width of the point spread function given by \(\theta_{\nu} = 1/{u}_{max}\) radians, decreases). In addition, at higher observing frequencies, the sensitivity to large spatial scales for a given distribution of antennas decreases.
Bandwidth Smearing (limits of channel averaging)
The choice of frequency resolution (or channel width) at which visibilities must be measured (or can be averaged up to) for synthesis imaging depends on the \(uv\) grid cell size to be used during imaging, which in turn depends on the observed frequency and the desired field of view in the image. The following condition ensures that, within a field of view chosen as the half power beam width of the antenna primary beam, the image-domain bandwidth smearing is smaller than the angular resolution of the instrument:
\(\frac{\Delta\nu}{\nu} \lesssim \frac{\theta_{PSF}}{\theta_{HPBW}}\)
For broad-band receivers, this limit will change across the band, and the channel width should be chosen as the bandwidth-smearing limit computed for \(\nu_{min}\).
Sky Brightness
Stokes I continuum emission usually has smoothly varying, continuous spectral structure, often following a power-law functional form with curvature, steepening, and turnovers at various locations in the spectrum. Power laws and polynomials are typically used to model such sky spectra. With the MT-MFS wideband imaging algorithm, a Taylor polynomial in \(I\) vs \(\nu\) space is fitted to the data per flux component, and the resulting coefficients are used to calculate the spectral index that a power-law model would provide.
Primary Beam
At the center of the primary beam, bandpass calibration makes the gain flat across the band. Away from the pointing-direction, however, the frequency-dependence of the primary-beam introduces artificial spectral structure in the wideband flux model reconstructed from the combined measurements. This frequency dependence must be modeled and removed before or during multi-frequency synthesis imaging to recover both spatial and spectral structure of the sky brightness across a large field of view. In general, the frequency dependence of the primary beam can be approximated by a power law.
If \(\theta\) is the angular distance from the pointing center and \(\theta_0\) is the primary beam FWHM at the reference frequency, then the frequency dependence of (a Gaussian approximation to) the primary beam is equivalent to an effective spectral index of
\(\alpha_{pb} = -8 \ln 2 \left(\frac{\theta}{\theta_0}\right)^2\)
This corresponds to an effective spectral index of -1.4 at the half power point (\(\theta = \theta_0/2\)) at the reference frequency.
In order to limit computational costs for cubes with a large number of channels, the CASA mosaic gridder does not calculate a beam for each data channel frequency; it calculates beams at frequency steps of 0.5% of the maximum frequency in a spectral window. This in turn means that the frequency of a channel and the frequency of the primary beam used to process that channel may differ by at most 0.25% of the frequency. This choice of stepping in beam frequencies results in images that differ fractionally from those made using exact frequencies by less than 10\(^{-4}\). Such errors would therefore matter only for mosaics requiring a dynamic range of 1:10\(^{4}\) or more.
Options in CASA for wideband imaging¶
WARNING: Wideband mosaicing is still in its commissioning phase and not officially endorsed in CASA 5.5. With deconvolver=’mtmfs’ for multi-term imaging including wideband primary beam correction, gridder=’awproject’ has a known bug and should not be used. For gridder=’mosaic’ the uncertainties in the derived spectral index may be larger than the xxx.alpha.error images would imply, with or without the use of conjbeams, because of systematic issues that are currently being evaluated. Development/commissioning of wideband mosaicing is ongoing and will be available in a next CASA release.
(1) MFS (nterms=1)
Traditionally, multi-frequency synthesis (MFS) imaging refers to gridding visibilities from multiple frequency channels onto a single spatial-frequency grid. It assumes that the sky brightness and the primary beam are constant across the total measured bandwidth and all frequencies measure the same visibility function just at different spatial frequencies. In this case, standard imaging and deconvolution algorithms can be used to construct an accurate continuum image.
For sources with spectral structure across the observed band, this approach converts any spectral variation of the visibility function into spurious spatial structure that does not follow the standard convolution equation in the image domain and therefore will not self-correct during deconvolution. For the VLA at L-Band, for example, a 1.0 Jy source with a spectral index of -1.0 across the 1-2 GHz band will produce spectral artifacts at the \(5\times10^{-3}\) level. Therefore, fields requiring dynamic ranges (peak brightness / thermal noise) of less than a few hundred will not show these artifacts, and basic MFS imaging will suffice. Detection experiments in otherwise empty fields are a good example of when this method is most appropriate.
(2) MT-MFS (nterms>1)
To alleviate the spectral artifacts discussed above and to reconstruct the broad-band sky brightness distribution correctly, a spectral model must be folded into the reconstruction process. The advantages of such an image reconstruction are that the combined \(uv\) coverage (from all channels) is used, flux components are ‘tied’ across frequency by the use of an explicit spectral model or physically motivated constraints, and the angular resolution of the resulting intensity and spectral index images is not limited to that of the lowest frequency in the band. Under high signal-to-noise conditions, the angular resolution follows that of the highest frequency in the band. Disadvantages are that the reconstruction is tied to a specific spectral model and will work optimally only for sources whose spectral structure can be described by that model (i.e., a low-order Taylor polynomial). In low signal-to-noise situations, the unnecessary fitting of higher-order terms can increase the noise and error in the results.
The MT-MFS algorithm models the spectrum of each flux component by a Taylor series expansion about \(\nu_0\):
\(I^{sky}_{\nu} = \sum_{t=0}^{N_t - 1} w_{\nu}^{t} \, I^{sky}_{t} \quad \mathrm{with} \quad w_{\nu} = \frac{\nu - \nu_0}{\nu_0}\)
where \(I^{sky}_t\) represents a multi-scale Taylor coefficient image, and \(N_t\) is the order of the Taylor series expansion.
A Taylor expansion of a power law yields the following expressions for the first three coefficients, from which the spectral index \(I^{sky}_{\alpha}\) and curvature \(I^{sky}_{\beta}\) images can be computed algebraically:
\(I^{sky}_{0} = I^{sky}_{\nu_0} \;,\quad I^{sky}_{1} = I^{sky}_{\alpha}\, I^{sky}_{0} \;,\quad I^{sky}_{2} = \left( \frac{I^{sky}_{\alpha}\,( I^{sky}_{\alpha} - 1 )}{2} + I^{sky}_{\beta} \right) I^{sky}_{0}\)
Note that with this choice of parameterization, we are using a polynomial to model a power law given as follows
\(I^{sky}_{\nu} = I^{sky}_{\nu_0} \left( \frac{\nu}{\nu_0} \right)^{\, I^{sky}_{\alpha} + I^{sky}_{\beta} \log( \nu / \nu_0 )}\)
where \(\log\) represents the natural logarithm.
User controls
Reference Frequency
This is the frequency about which the Taylor expansion is done. The default is the center of the frequency range being imaged, but this is not required. The relative weights/flags of data on either side of this frequency should be inspected to ensure that the reconstruction is not ill-conditioned. The output intensity image represents the flux at this reference frequency. Please note that the value at a specific reference frequency is different from the integrated flux across a frequency range.
nterms
The number of Taylor coefficients to solve for is a user parameter. The optimal number of Taylor terms depends on the available signal-to-noise ratio, the bandwidth ratio, and the spectral shape of the source as seen by the telescope (sky spectrum x PB spectrum). In general, nterms=2 is a good starting point for wideband EVLA imaging and the lower-frequency bands of ALMA (when the fractional bandwidth is greater than 10%) if there is at least one bright source for which a dynamic range greater than a few hundred is desired. Spectral artifacts for the VLA often look like spokes radiating out from a bright source (i.e. in the image made with standard MFS imaging). If increasing the number of terms does not eliminate these artifacts, check the data for inadequate bandpass calibration. If the source is away from the pointing center, consider including wide-field corrections too.
The signal-to-noise ratio of the source must also be considered when choosing nterms. Note that the Taylor polynomial is in I vs \(\nu\) space. This means that even for a pure power law, one may need nterms=3 or 4 in order to properly fit the data if there is adequate signal to see more spectral variation than a straight line. One should avoid trying to fit a high-order polynomial to low signal-to-noise data.
Data Products
Taylor Coefficient Images
The basic products of the MT-MFS algorithm are a set of \(N+1\) (multi-scale) Taylor coefficient images that describe the spectrum of the sky brightness at each pixel (coefficients of an \(N^{th}\)-order polynomial). The \(0^{th}\)-order coefficient image is the Stokes I intensity image at the reference frequency.
Multi-Term Restoration
The restoration step of the MT-MFS algorithm performs an extra action in addition to the standard convolution of the model with a Gaussian beam and the adding back of the residuals: it converts the residuals into Taylor coefficient space before adding them to the smoothed model components (which are already Taylor coefficients). The residuals (or errors) will typically be higher for the higher-order terms. Since the terms are not strictly independent, errors from including higher-order terms may slightly increase the noise floor even in the zeroth-order intensity image. This arises because the concept of a ‘residual image’ is different for a multi-term algorithm. For standard narrow-band imaging, the residual or dirty image already has sky-domain fluxes. For multi-term imaging, the residual or dirty image must be further processed to calculate Taylor coefficients which represent sky-domain fluxes. It is this step that provides accurate spectral indices (for example) from undeconvolved dirty images (i.e. tclean runs with niter=0 and deconvolver=’mtmfs’).
Calculating Spectral Index
Spectral index is computed as \(I^{sky}_{\alpha} = I^m_1 / I^m_0\) for all pixels above a threshold applied to \(I^m_0\). Other pixels are zeroed out and a mask is applied. Currently this threshold is automatically calculated as 5 x max( peak residual, user threshold ). At present, the spectral index calculation can be modified in two ways: (a) perform the above division oneself in a Python script, or (b) use the widebandpbcor task with action=’calcalpha’. The ability to control this within tclean itself will be added in the future.
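As an illustration of option (a), the per-pixel division can be done with the immath task. This is a minimal sketch: the image names and the intensity cutoff are hypothetical, and the cutoff should be chosen from the noise level of the .image.tt0 map.
# Compute spectral index alpha = tt1/tt0 from MT-MFS Taylor-coefficient images.
# 'img.image.tt0' and 'img.image.tt1' are hypothetical tclean output names.
immath(imagename=['img.image.tt1', 'img.image.tt0'], mode='evalexpr',
       expr='IM0/IM1[IM1 > 0.005]',  # mask pixels where tt0 < 5 mJy/beam
       outfile='img.alpha')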
Spectral curvature (when possible) is also computed from the Taylor coefficients.
Calculating Error in Spectral Index
An estimate of the spectral index error is also provided as an output image. This is an empirical error estimate derived from error propagation through the division of two noisy numbers: alpha = tt1/tt0, where the ‘errors’ on tt1 and tt0 are just the values from the residual coefficient images at each pixel. In the limit of perfect deconvolution and noise-like residuals, this number can be accurate. In practice, however, deconvolution artifacts usually remain in the residual image (especially underneath extended emission) and they dominate the errors. In general, the spectral index error map should be used only as a guide to which regions of the image to trust relative to others; the absolute value of the error should not be used for scientific analysis. A more useful error estimate can be derived by repeating the imaging run (especially if it involves multi-scale components) with slightly different settings of scale sizes and iteration controls, to see what is true signal and what can be attributed to reconstruction uncertainty. For high signal-to-noise compact sources, error limits of \(\pm 0.05\) can be achieved. For complicated extended emission at about SNR=100 or less, typical errors are about \(\pm 0.2\). These errors are highly correlated with how appropriately the scale sizes are chosen, with errors ranging from \(\pm 0.1\) or less up to \(\pm 0.5\) in the limit of using delta functions to try to model extended emission.
Errors on spectral curvature are much higher than for spectral index. In one example where the M87 galaxy was imaged at L-Band, only the central bright inner lobes (at dynamic range of a few thousand) showed average spectral curvature that could be trusted.
(3) Cube + imcollapse
The simplest form of wideband imaging is to treat each frequency channel independently and make an image cube. A continuum image can then be formed by first smoothing all planes to a common (lowest) angular resolution and computing the mean across frequency. Spectral structure can be modeled per pixel from this smoothed cube. The main advantage of this method is its simplicity and the fact that it does not depend on any particular spectral model. The main disadvantage is that the angular resolution of all higher-frequency channels must be degraded to that of the lowest frequency before any combined analysis can be done. Also, in the case of complicated spatial structure, each frequency’s \(uv\) coverage may be insufficient to guarantee reconstructions that are consistent with each other across the band.
Comparison of different wideband imaging methods¶
| Type | Cube | MFS | MFS with a wideband model |
|---|---|---|---|
| Angular Resolution | Same angular resolution as lowest frequency data | Same angular resolution as highest frequency data | Same angular resolution as highest frequency data |
| Continuum Sensitivity | Narrow-band (for deconvolution); full (after stacking) | Full | Full |
| Weak Sources | Low SNR sources may not be deconvolved accurately in all channels, diluting the combined result | Accurate low SNR imaging, but ignores spectral variation of bright sources. Errors show up at dynamic ranges of a few hundred. | Accurate bright-source modeling to allow detection of weak sources. |
| Strong Sources | Can handle arbitrary spectra down to the single-channel sensitivity. | Ignores spectra | Models spectra. Most useful for strong sources. |
| Extended Emission | Fewer constraints per channel, so reconstructions may not match across channels. This leads to errors when computing spectral index. | Uses full spatial-frequency coverage but ignores spectra. This can cause artifacts. | Reconstructs structure and spectra accurately, but depends on the spectral model for accuracy. |
| Spectral Reconstruction | Accurate for simple bright sources and does not depend on any predefined spectral model. | Ignores spectra | Models spectra using a wideband flux model during reconstruction. |
| Primary Beam correction (and mosaics) | Per channel; can be done either during gridding or after imaging | Since an MFS image is a weighted channel average, accurate PB correction must be done per channel before combination. Post-deconvolution division by a wideband primary beam is also a reasonable approximation. | Wideband PB correction must be done either during gridding, or after imaging by dividing out the primary beam and its frequency dependence from the obtained model. |
Other uses of wideband models¶
Wideband Self Calibration
The broad-band flux model generated by the MS-MFS algorithm can be used within a self-calibration loop in exactly the same manner as standard self-calibration. The purpose of such a self-calibration would be to improve the accuracy of the bandpass calibration and maintain smoothness across spectral windows or subbands that may have been treated independently.
Continuum Subtraction
In the case of accurate deconvolution, the wideband model may be subtracted from the data to study line emission on top of the continuum. The wideband model would be made by excluding channels that contain known line emission, predicting the model over the entire frequency range, and then running uvsub to subtract it out.
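A sketch of this workflow, using real task names but an illustrative MS name, channel selection, and control values:
# (1) Image the continuum using only line-free channels (selection is illustrative).
tclean(vis='mydata.ms', imagename='cont', specmode='mfs',
       deconvolver='mtmfs', nterms=2, spw='0:5~20;40~60',
       imsize=1024, cell='1.0arcsec', niter=1000)
# (2) Restart from the same images to predict the wideband model over ALL
#     channels and save it in the MODEL column (no new deconvolution).
tclean(vis='mydata.ms', imagename='cont', specmode='mfs',
       deconvolver='mtmfs', nterms=2, spw='', niter=0,
       calcres=False, calcpsf=False, restart=True, savemodel='modelcolumn')
# (3) Subtract the model column from the corrected data.
uvsub(vis='mydata.ms')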
Example¶
The following images of 3C286 illustrate what wideband imaging artifacts look like and how they change with different values of nterms. These images were made from about 15 minutes of VLA L-Band calibrator data (1-2 GHz). Note that such clear improvements in the imaging will be visible only if there aren’t any other sources of error (e.g. calibration errors or weak residual RFI).
Wide-Band and Wide-Field Imaging¶
Wide-Band + W-term
W-Projection or faceted imaging can be combined with multi-term imaging (specmode=’mfs’, deconvolver=’mtmfs’, gridder=’widefield’ or ‘wproject’). The two algorithms are distinct enough that there are no special considerations to keep in mind when combining them.
Wide-Band + Full Beam
The frequency dependence of the primary beam introduces artificial spectral structure into the sky brightness distribution away from the pointing center. Below is an example of what this spectral structure looks like, in terms of a power-law spectral index. If nothing is done to eliminate the artificial PB spectrum, it will be visible to the minor cycle during deconvolution and will be interpreted as extra sky spectral structure. Another aspect of using a wide-band primary beam is the large shelf of continuum sensitivity outside the main lobe of the average beam. This is also a region where the PB spectrum varies by up to 100% in both positive and negative directions, and in a time-variable way. There is therefore increased sensitivity to sources outside the main lobe of the average PB, but very little hope of accurately imaging them without methods that carefully incorporate time- and frequency-dependent primary beam models.
Three methods to handle wide-band primary beams are discussed below.
Cube Imaging
The option of cube imaging is always present, where the primary beam is corrected per channel at the end of imaging, using appropriate frequency-dependent primary beam models.
Post-deconvolution Wide-band Primary Beam Correction
If primary beams are ignored during imaging (gridders other than ‘awproject’ or ‘mosaic’), the artificial spectral structure will be absorbed into the sky model (to the extent that it is possible, given that the primary beams are squinted and rotating, creating a time-varying primary beam spectrum). The output Taylor coefficient images now represent the spectral structure of (primary beam) x sky.
Wide-band primary beam correction can be done by constructing Taylor coefficients that represent the primary beam spectrum at each pixel, and applying a polynomial division to take them out of the output images (per pixel).
Steps:
(1) Compute a set of primary beams at the specified frequencies.
(2) Calculate Taylor-coefficient images that represent the primary beam spectrum.
(3) Perform a polynomial division to primary-beam-correct the output Taylor-coefficient images from the MT-MFS algorithm.
(4) Recompute the spectral index (and curvature) using the corrected Taylor-coefficient images.
Currently, the widebandpbcor task performs this function, but it is scheduled to move into tclean, where it will be implemented in C++ and use internally generated information about relative spectral weights.
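A hedged example of such a widebandpbcor call (the MS name, image prefix, and spw/channel/weight lists are illustrative):
# Wideband PB-correction of the Taylor-coefficient images produced by a
# tclean run with imagename='try' and deconvolver='mtmfs'.
widebandpbcor(vis='mydata.ms', imagename='try', nterms=2,
              action='pbcor', pbmin=0.2,
              spwlist=[0, 1, 2], chanlist=[32, 32, 32],
              weightlist=[1, 1, 1])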
Wideband AW-Projection
The use of wbawp=True with gridder=’awproject’ and conjbeams=True enables conjugate beams to be used during gridding. The goal is to remove the frequency dependence of the primary beam during the gridding step so that the minor cycle sees the spectral structure of only the sky. This reduces the number of Taylor terms required to model the spectrum and removes the need for any primary beam correction on the output spectral index maps.
Setting wbawp=True enables use of the PB evaluated at the center frequency of each spectral window. Setting conjbeams=True enables use of the PB at the “conjugate” frequency, which effectively projects out the scaling of the PB with frequency (see Bhatnagar et al. 2013, ApJ, 770, 91). The following plot shows the frequency dependence of a PB as a function of distance from the center of the PB. The red curves trace the total-power response of the antenna and the blue curves show the frequency dependence of the antenna response. The second figure below shows the effective frequency dependence when using conjugate beams during imaging. The blue curve is significantly flatter than in the first figure. When imaged with conjugate beams, the effects of frequency-dependent PBs are effectively removed from the images fed to the minor cycle algorithms. Image-plane based wide-band algorithms (like the MT-MFS algorithm) designed to model only the sky frequency dependence can therefore be used without modification.
Wideband + Mosaics
There are several ways of constructing wideband mosaics. The three main choices are spectral (cube vs. MT-MFS), spatial (linear vs. joint mosaics), and primary beam correction (post-deconvolution corrections vs. A-Projection based approaches that account for primary beams during gridding, with or without correction of their frequency dependence at that stage). This results in a large number of options for the user. It is important to note that all methods have trade-offs and are not likely to give identical results (especially since, in our software, different algorithms currently use different PB models).
It is recommended, when possible, to use specmode=’mfs’, deconvolver=’mtmfs’ with gridder=’awproject’ and wbawp=True in order to make wideband mosaics. For cube-based wideband mosaic imaging, it is recommended to use gridder=’awproject’ or ‘mosaic’ per channel, with a post-deconvolution primary beam correction per channel.
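For example, a wideband joint-mosaic run could look like the following sketch (the field selection, image size, and iteration values are illustrative):
# MT-MFS wideband mosaic with AW-Projection and conjugate beams.
tclean(vis='mydata.ms', imagename='wbmosaic', field='0~15',
       specmode='mfs', deconvolver='mtmfs', nterms=2,
       gridder='awproject', wbawp=True, conjbeams=True,
       wprojplanes=32, imsize=4096, cell='2.0arcsec', niter=5000)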
Wideband Mosaic Primary Beam
In a joint mosaic, one must keep in mind the spectral structure of the primary beam. In a single pointing, the spurious spectral structure is significant only away from the pointing center. Therefore, wideband options may not be required if the source of interest covers a small region at the center of the beam and if its own spectral structure isn’t strong enough to warrant multi-term imaging. However, in a mosaic, this primary beam spectral structure is present across the entire field of view of the mosaic, making even the imaging of flat-spectrum compact sources an exercise in wide-field and wide-band imaging.
Wide-Field Imaging¶
Wide-field imaging typically refers to fields of view over which the basic 2D Fourier transform assumption of interferometric imaging does not apply and where standard on-axis calibration will not suffice.
WARNING : Imaging modes with A-Projection and mosaics (gridder=’mosaic’ and ‘awproject’) have been validated only for a few usage modes and use cases as required by the ALMA and VLASS pipelines. Please use these modes at your own discretion, and carefully read the Known Issues for CASA 5.6 when using AWproject. Other wide-field imaging modes ( wproject, widefield, facets) work as expected.
The non-coplanar baseline effect: W-term¶
For wide-field imaging, sky curvature and non-coplanar baselines result in a non-zero w-term. Standard 2D imaging applied to such data will produce artifacts around sources away from the phase center. CASA has two methods to correct the w-term effect.
Faceting
In this method, visibilities are gridded multiple times onto the same uv-grid, each time with a different phase-reference center. One single dirty/residual image is constructed from the resulting grid and deconvolved using a single PSF (picked from the first facet). This deconvolution is not affected by emission that crosses facet boundaries, unlike in image-domain faceting, which is an older approach where small facet images are deconvolved separately before being stitched together. [27]
In tclean, faceting is available via gridder=’widefield’, where the number of desired facets per side can be specified (the facets parameter). It can be used along with W-Projection as well, for very large fields of view.
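For example (the facet count and other values are illustrative):
# Faceted imaging with 2x2 facets; setting wprojplanes>1 would add
# W-Projection within each facet.
tclean(vis='mydata.ms', imagename='faceted', gridder='widefield',
       facets=2, imsize=8192, cell='1.0arcsec', niter=1000)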
W-projection
In this method, visibilities with non-zero w-values are gridded using a gridding convolution function (GCF) given by the Fourier transform of the Fresnel EM-wave propagator across a distance of w wavelengths. In practice, GCFs are computed for a finite set of w-values (wprojplanes) and applied during gridding. W-Projection is roughly an order of magnitude faster than faceted imaging because it grids each visibility only once [28].
In tclean, W-Projection is available via gridder=’widefield’, ‘wproject’ or ‘awproject’. In all cases, the ‘wprojplanes’ parameter must be set. It represents the number of discrete w-values used to quantize the range of w-values present in the dataset being imaged. An appropriate value of wprojplanes depends on whether there is a bright source far from the phase center, the desired dynamic range of an image in the presence of such a far-out source, the maximum w-value in the measurements, and the desired trade-off between accuracy and computing cost. As a rough guide, VLA L-Band D-configuration may require a value of 128 for a source 30 arcmin away from the phase center; A-configuration may require 1024 or more. To converge to an appropriate value, try starting with 128 and then increase it if artifacts persist. W-term artifacts (for the VLA) typically look like arc-shaped smears in a synthesis image, or a shift in source position between images made at different times, and they are more pronounced the further the source is from the phase center. There is no harm in simply always choosing a large value (say, 1024), but there will be a significant performance cost to doing so, especially for gridder=’awproject’ where it is combined with A-Projection. wprojplanes=-1 may be used with gridder=’widefield’ or ‘wproject’ to automatically compute the number of planes. The formula that CASA uses to calculate the number of planes when wprojplanes=-1 is:
\(N_\mathrm{wprojplanes} = 0.5 \times \frac{W_\mathrm{max}}{\lambda} \times \mathrm{imsize\;(in\;radians)}\)
where \(W_\mathrm{max}\) is the maximum \(w\) in your \(uvw\) data and imsize is the largest linear size of your image, in radians. This formula is somewhat conservative, and it is possible to achieve good results with a smaller number of planes, which also saves runtime and memory.
WARNING: The algorithm that automatically calculates the number of w-bins (wprojplanes=-1) errs on the side of numerical accuracy and tends to over-estimate the number of w-bins needed. For very wide fields of view, this could result in a significant increase in required memory. It is therefore useful to point out that it is safe to manually choose a value to avoid problems associated with limited memory resources. One can do a few tests with different wprojplanes values to find the value at which shifts in source positions are no longer noticeable.
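A starting point following the guidance above (all values illustrative):
# W-Projection with 128 w-planes; increase this if arc-shaped artifacts persist,
# or set wprojplanes=-1 to let CASA compute the number of planes itself.
tclean(vis='mydata.ms', imagename='wproj', gridder='wproject',
       wprojplanes=128, imsize=6144, cell='8.0arcsec', niter=1000)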
Antenna Voltage/Power Patterns: Primary-Beam¶
The aperture-illumination-function (AIF) of each antenna results in a direction-dependent complex gain that can vary with time and is usually different for each antenna. The resulting antenna power pattern is called the primary beam. There are two methods to correct for the effect of the primary beam.
Image-domain PB-correction
A simple method of correcting the effect of the primary beam is a post-deconvolution, image-domain division of the model image by an estimate of the average primary beam or some other model. This method ignores primary-beam variations across baselines and time, and is therefore approximate, limiting the imaging dynamic range even within the main lobe of the beam. This approach also cannot handle heterogeneous arrays.
In tclean, this option is available by setting pbcor=True. When used with gridder=’standard’ or ‘widefield’ or ‘wproject’ which do not internally use any primary beam models, it will compute a model PB at the reference frequency per image channel, and divide it out of the output restored image. If used with gridder=’mosaic’ or ‘awproject’, it will use a weighted average of the primary beam models used by the gridders per baseline and timestep.
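A minimal sketch (names and values illustrative):
# Post-deconvolution image-domain PB correction; in addition to the usual
# products, this writes a PB-corrected restored image (<imagename>.image.pbcor).
tclean(vis='mydata.ms', imagename='img', gridder='standard',
       imsize=1024, cell='1.0arcsec', niter=500, pbcor=True)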
Primary Beam correction for wide bandwidth observations is discussed in the Wideband Imaging section.
A-Projection
Time- and baseline-dependent corrections are applied during gridding by computing GCFs for each baseline as the convolution of the complex conjugates of two antenna aperture illumination functions. An additional image-domain normalization step is required, and can result in the image being “flat-sky” (the image represents only the sky intensity) or “flat-noise” (the image represents the sky multiplied by the primary beam). The advantage of this method is that known time and baseline variability can be accounted for, both during gridding as well as de-gridding [29].
Different primary beam effects cause artifacts at different levels in the image [30]. Depending on the available sensitivity of an observation or the desired dynamic range, one can choose to leave out some corrections and save on computing time. In general, the varying dish size in a heterogeneous array is the dominant source of errors, causing a dynamic-range limit of a few hundred. Next come large pointing offsets (such as beam squint or illumination offsets), and at higher dynamic ranges (\(10^4\) and beyond) come other factors such as the details of feed leg structures. On its own, parallactic angle rotation causes artifacts only at a dynamic range of around \(10^5\), but if any of the other large effects (pointing offsets or illumination pattern errors) are not azimuthally symmetric, then parallactic angle rotation will have an effect at much lower dynamic ranges.
gridder = ‘awproject’
In tclean, gridder=’awproject’ applies the full A-Projection algorithm and uses baseline, frequency and time dependent primary beams. They are azimuthally asymmetric to account for feed leg structures. They also include beam squint, which is corrected during gridding by applying an appropriate phase gradient across the GCFs to cancel out the polarization dependent pointing offset. The frequency dependence of the primary beam within the data being imaged is included in the calculations and can optionally also be corrected for during gridding (see Wideband Imaging section for details).
The operations of the ‘awproject’ gridder are controlled by three parameters: aterm, psterm and wprojplanes. aterm and psterm control the inclusion/exclusion of the A-term (the antenna aperture function) and the Prolate Spheroidal function (the anti-aliasing function) in the convolution functions used for gridding. wprojplanes controls the inclusion/exclusion of the w-term. The following table enumerates the operations for the different possible settings of these parameters. PS and PB in the table below refer to the Prolate Spheroidal function and the Primary Beam respectively, and FT() refers to the Fourier transform operation. The last column also shows the mathematical content of the .pb image, which is one of the image products on disk in a tclean run. To generate a .pb image for image-plane PB correction, the gridder needs to be used with psterm=False and the cfcache parameter set to a fresh (non-existent) directory, so that a fresh cfcache is generated without the PS term in it. When aterm=False, the psterm parameter needs to be set to True. It can be set to False when aterm=True; however, with this setting the effects of aliasing may be present in the image, particularly near the edges.
| Operation | aterm | psterm | wprojplanes | GCF | Contents of the .pb image |
|---|---|---|---|---|---|
| AW-Projection | True | True | >1 | PS*A*W | FT(PS) x PB |
| ‘’ | True | False | >1 | A*W | PB |
| A-Projection | True | True | 1 | PS*A | FT(PS) x PB |
| ‘’ | True | False | 1 | A | PB |
| W-Projection | False | True | >1 | PS*W | FT(PS) |
| Standard | False | True | 1 | PS | FT(PS) |
Full/Hybrid Mueller-matrix support is being added to the system for full-polarization wide-field imaging. Currently, heterogeneous arrays like ALMA are not supported, but it will be suitable for VLA wide-field imaging.
Parallel execution
The computing cost of A-Projection is larger than standard imaging, and the cost of AW-Projection is higher than A-Projection. However, since the run time scales very well with parallelization, these costs can be effectively offset by the use of parallelization (using parallel=True; see the Imager Parallelization and Parallel Processing sections for details about running CASA in parallel mode). The runtime scales close to linearly with the number of nodes used. We have measured this scaling for up to 200 cores, but the scaling may continue further depending on the data size, data storage (e.g., Lustre vs. a standard file system), image size, algorithms used, etc. The plot below shows the measured scaling for a large EVLA L-band mosaic imaging experiment. The dark and light blue curves (legends “Make PSF + avgPB” and “Make Residual” respectively) show the measurement of the steady-state runtime as a function of the number of cores used. The black lines associated with both these curves show the theoretical (ideal) linear scaling curves. A memo with the details of the characterization of the runtime in parallel mode can be found here. Note that parallelization is not restricted to A-Projection and can be used with any combination of gridder and deconvolver settings.
There are a number of parameters to apply approximations that can reduce the computing load.
Note that the current code does not work correctly for non-square mosaic images or cube imaging. Fixes for these will be included in subsequent releases. VLA and ALMA data sets often carry a POINTING table with antenna pointing information, which may not be correct. Since by default the imaging module now uses the POINTING table, it may need to be disabled (delete all rows of the POINTING sub-table in the MS).
gridder=’mosaic’
In tclean, gridder=’mosaic’ applies an approximation of the A-Projection algorithm, using azimuthally symmetric beam models that can differ per baseline. It includes the diagonal of the Mueller matrix for multi-Stokes images, but ignores the off-diagonals. The frequency dependence of the primary beam is accounted for, but is not eliminated during gridding. Since time dependence is not supported by default, the computational cost is lower than for A-Projection. Since ALMA imaging typically involves small fractional bandwidths, includes data with multiple dish sizes, and needs to operate on very large cubes with many channels, this option is suitable for ALMA. It is also possible to supply external beam models to this gridder by setting up the vpmanager tool, and one can in principle assign beams separately for each antenna as a function of time, if needed. Note that gridder=’mosaic’ can be used even on a single pointing, especially to account for the effects of a heterogeneous array.
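For example, an ALMA-style joint-mosaic cube run could look like this sketch (the field selection and cube setup are illustrative):
# Joint mosaic cube imaging with the 'mosaic' gridder, with per-channel
# primary beam correction of the restored cube.
tclean(vis='mydata.ms', imagename='cubemosaic', field='3~20',
       specmode='cube', start='100.0GHz', width='0.5MHz', nchan=100,
       gridder='mosaic', imsize=2048, cell='0.2arcsec',
       niter=1000, pbcor=True)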
Mosaics
Data from multiple pointings can be combined during gridding to form one single large image. In a Linear Mosaic, data from multiple pointings are imaged (and optionally deconvolved too) before being stitched together. A Joint Mosaic is a simple extension of A-Projection in which phase gradients are applied to the gridding convolution functions to map data from each pointing to a different position on the sky. In tclean, gridder=’mosaic’ and ‘awproject’ will both create joint mosaics if data from multiple pointings are selected as the input.
Pointing Offset Corrections
When the image phase center is chosen to be different from the observation phase center, a phase gradient is applied during gridding convolution to ensure that the image-domain primary beam is centered at the phase-reference direction. This situation is encountered in all joint mosaic imaging. By default, it is assumed that the antennas point in the same direction as the observation phase center specified in the FIELD subtable of the MS. However, entries may be supplied in the POINTING subtable of the MS and used instead of the FIELD table via the ‘usepointing’ parameter available to gridder=’mosaic’ and ‘awproject’. The VLASS project, for example, has time-dependent and antenna-dependent pointing offsets that are not captured in the FIELD table and which require an additional POINTING table. Note that ‘usepointing=True’ has no meaning if there are no entries in the POINTING subtable (the default with any MS). Therefore, the default is ‘usepointing=False’.
gridder=’mosaic’ reads and uses the pointing offset per timestep and baseline, but assumes that both antennas in a baseline pair are pointed in the same direction as the ANTENNA1 listed in the MS for each baseline and timestep. This has not been officially validated for CASA 5.6.
gridder=’awproject’ reads and uses the pointing offsets for both antennas in the first baseline pair listed in the MS (per timestep) and assumes this is constant across all baselines. It applies phase gradients per timestep with the assumption that all antennas are pointed in the same direction. This has been validated on VLASS 1.2 data.
WARNING: For CASA 5.6, with ‘usepointing=True’, the gridder=’mosaic’ and ‘awproject’ implement slightly different solutions. For CASA 5.6, only gridder=’awproject’ has been validated for usepointing=True. For CASA 5.7/6.1, also the ‘pointingoffsetsigdev’ parameter has been implemented for applying accurate heterogeneous pointing corrections. A few other features are expected to be implemented post 5.7/6.1, as described in the Known Issues.
Heterogeneous Array Imaging
Mosaic images may be made using arrays with dishes of different sizes, using the option gridder=’mosaic’ in tclean. For baselines formed from antennas of different types (i.e. cross baselines), primary beams and gridding convolution functions are constructed from voltage patterns of both types of antennas. For imaging purposes, tclean has correctly handled ALMA heterogeneous array data for many releases, with primary beam model parameters derived from measured beams. For heterogeneous arrays other than ALMA, special procedures are required (as described in the tutorial, and in Appendix A of NGVLA memo 67); it is also necessary to use CASA version 5.6 or later. In short, in order to ensure correct use of heterogeneous primary beam models for the NGVLA, the observatory name must be set to an entry not already internally listed (e.g. NGVLA1 instead of NGVLA), within the MS as well as in the supplied VP-Table containing primary beam images.
This heterogeneous array simulation and imaging tutorial demonstrates the use of the CASA imager for the ALMA and ngVLA, both of which involve antennas of different diameters. It demonstrates the creation and use of antenna-dependent Airy disk primary beams derived from the antenna diameter recorded within the Measurement Set, as well as the setting of antenna dependent primary beam images using the vpmanager tool. The demo also includes examples and metrics to test the accuracy of the simulation step and the imaging of each baseline type separately as well as together.
Primary Beam Models¶
gridder=’standard’, ‘wproject’, ‘widefield’, ‘mosaic’
Default PB models :
VLA: PB polynomial fit model (Napier and Rots, 1982)[31]
EVLA: New EVLA beam models (Perley 2016) [32]
ALMA : Airy disks for a 10.7m dish (for 12m dishes) and 6.25m dish (for 7m dishes) each with 0.75m blockages (Hunter/Brogan 2011). Joint mosaic imaging supports heterogeneous arrays for ALMA (Hunter/Brogan 2011)
These are all azimuthally symmetric beams. For the EVLA, these models limit the dynamic range to \(10^5\) due to beam squint with rotation and the presence of feed leg structures. For ALMA, these models account only for differences in dish size, and not for any feed-leg structural differences between the different types of antennas.
Adding other PB models
Use the vpmanager tool, save its state, and supply it as input to tclean’s vptable parameter.
Example : For ALMA and gridder=’mosaic’, ray-traced (TICRA) beams are also available via the vpmanager tool. To use them, call the following before the tclean run:
vp.setpbimage(telescope="ALMA", compleximage='/home/casa/data/trunk/alma/responses/ALMA_0_DV__0_0_360_0_45_90_348.5_373_373_GHz_ticra2007_VP.im', antnames=['DV'+'%02d'%k for k in range(25)])
vp.saveastable('mypb.tab')
Then, supply vptable=’mypb.tab’ to tclean.
gridder = ‘awproject’
VLA / EVLA : Uses ray traced models (VLA and EVLA) including feed leg and subreflector shadows, off-axis feed location (for beam squint and other polarization effects), and a Gaussian fit for the feed beams [33].
The following figure shows an example of the ray-traced PB models. The image on the left shows the instantaneous narrow-band PB at the lowest frequency in the band, while the image on the right shows the wide-band continuum beam. Sidelobes are at the few-percent level and highly azimuthally asymmetric. This asymmetry shows up as time-varying gains across the image as the PB rotates on the sky with parallactic angle.
External Beam models for gridder= ‘awproject’
The beam models used internally in ‘awproject’ are derived from ray-traced aperture illumination functions. However, since the ‘awproject’ algorithm uses the disk CF cache mechanism, a simple way to use a different beam model is to construct the disk CF cache and supply that to ‘awproject’ during imaging. The detailed documentation for constructing the disk CF cache is being developed and will be released in a subsequent CASA Docs release. In the meantime, if you need to access this route sooner, please contact the CASA Helpdesk, who will direct you to the related (not yet released) documentation or appropriate Algorithms R&D Group (ARDG) staff.
ALMA : Similar ray-traced model as above, but since the correctness of its polarization properties remains un-verified, support for ALMA is not yet released for general users.
The current implementation of AW-Projection does not yet support heterogeneous arrays (although the version of CASA’s AWProjection used by LOFAR’s LWImager does have fully heterogeneous support). This, along with full-polarization support, is currently being worked on in ARDG branches.
Heterogeneous Pointing Corrections¶
Due to the high sensitivity of the EVLA and ALMA telescopes, imaging performance can be limited by antenna pointing errors. These pointing errors in general vary significantly across the array and with time. Corrections for the true antenna pointing directions are contained in the POINTING sub-table, and if these corrections are present and accurate, they can be used to improve imaging of both single-pointing and mosaic fields. These heterogeneous pointing corrections are controlled by two parameters in tclean:
usepointing: When set to True, the antenna pointing vectors are fetched from the POINTING sub-table. When set to False (the default), the vectors are determined from the FIELD sub-table, effectively disabling correction of antenna pointing errors.
pointingoffsetsigdev: When correcting for pointing errors, the first value given in the pointingoffsetsigdev parameter is the size, in arcsec, of the bin used to discover antenna groupings for which phase gradients are computed. A computation of a new phase gradient is triggered for a bin if the length of the mean pointing vector of the antennas in the bin changes by more than the second value. The default value of this parameter is [], due to a programmatic constraint. If run with this value, it will internally pick [600,600] and exercise the option of using large tolerances (10 arcmin) on both axes. Please choose a setting explicitly for runs that need to use this parameter.
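An explicit setting could look like the following sketch (the bin size, trigger value, and imaging parameters are illustrative, not a recommendation):
# Heterogeneous pointing corrections with awproject (VLASS-style usage).
tclean(vis='mydata.ms', imagename='ptg', gridder='awproject',
       wprojplanes=32, usepointing=True,
       pointingoffsetsigdev=[300.0, 30.0],
       imsize=4096, cell='0.6arcsec', niter=1000)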
WARNING: Heterogeneous pointing corrections have been implemented in support of the VLA Sky Survey. This option is available only for gridder=’awproject’ and has been validated primarily with VLASS on-the-fly mosaic data where POINTING subtables have been modified after the data are recorded. The use of pointing corrections is currently unverified for general VLA and ALMA data, so users should use these parameters at their discretion.
A description of the algorithm that handles the antenna pointing corrections for the AW-Projection algorithm in CASA can be found in CASA Memo 11. The implementation of heterogeneous antenna pointing corrections was driven by requirements for the VLA Sky Survey (VLASS). Additional testing of wideband mosaic imaging and pointing corrections can be found in the CASA Memo Series Knowledgebase article.
Imager Parallelization¶
The CASA Imager employs two types of parallelism, multi-threading and multi-processing. A user may choose parallelization strategies by starting CASA using ‘casa’ versus ‘mpicasa’.
“casa” : CASA will run as a single process. Parts of the imaging code (such as FFTs and some gridding and deconvolution options) will employ multi-threading via OpenMP.
“mpicasa” : CASA will use multiple processes to partition the work. For cube imaging, both the major and minor cycles are parallelized. For continuum imaging only the major cycle is parallelized. In this mode of operation, multi-threading is disabled by default by forcing the maximum number of allowed threads to 1. In its simplest form, parallel casa may be invoked as “fullpath/mpicasa -n N fullpath/casa” where N is the number of processors to use and fullpath is the fully specified path to the directory containing the casa and mpicasa executable files.
The next few sections provide more details on design and usage.
Design : How Imager modules are structured and connected together to implement multi-process parallelization for Cube and Continuum imaging.
Usage : How to control memory use and the number of threads and processes. What to expect when comparing serial vs parallel runs in terms of numerical results as well as performance.
Design (Multi-Processing)¶
The CASA Imager consists of functionally separate modules that implement different steps of the iterative image reconstruction process. Multi-process parallelization partitions the data and images along appropriate axes and runs each of these modules separately on each piece, gathering and synchronizing results as needed. The usage modes of continuum and cube imaging are parallelized differently. For cube imaging, both the major and minor cycles are parallelized. For continuum imaging, only the major cycles are parallelized. Multi-Processing (on one node or multiple nodes) will assign/use memory independently per process.
Note : This section on multi-processing applies when CASA is started via “mpicasa -n X”, where X is the number of processes to use.
Imager modules¶
This section briefly describes the functional design of the Imager module and how it is used to implement parallelization across data rows and image planes.
The SynthesisImager implements the major cycle and does gridding, degridding and the residual visibility calculation. It consists of FTMachine [FT] constructs that operate on ImageStore [IS] objects.
The Normalizer module implements normalizations by sums of weights, including the gather operation required for parallel continuum runs. It operates on an ImageStore [IS].
The SynthesisDeconvolver implements the minor cycle deconvolution algorithms. A DeconvolverAlgorithm [DA] instance operates upon an ImageStore [IS].
The IterationController [IC] manages iteration setup and convergence checks. It is a set of python dictionaries that are transmitted to/from the deconvolvers.
The ImageStore [IS] is a construct that holds the collection of images needed by a single run of tclean. Each module constructs an ImageStore instance from a collection of images on disk. This allows a functional design with each module reading and writing data products from/to disk.
Each of these modules has a separate CASA tool interface and can (in principle) be instantiated and used independently of the others. A PySynthesisImager python class provides a higher-level interface to these tools, and top-level task python code implements the sequence of major and minor cycle loops. Interactions between the blocks are shown in Figure 1.
Figure 1 : Block diagram showing how the different modules within the CASA Imager interact for serial runs.
Parallel Continuum¶
For continuum imaging, the major cycle is parallelized by partitioning the data, gridding and imaging each subset separately, and then performing a gather operation to combine the images and weights prior to normalization (Figure 2). Parallelization for continuum imaging may be triggered by starting mpicasa and specifying how many processes (nproc, or CPU cores) are to be used. Of these nproc cores, nproc-1 are assigned to the major cycles and 1 processor acts as coordinator and runs the minor cycle iterations. The visibility data (say it contains nrows rows) are divided along the row axis, and each worker processor gets a partition to grid and FFT. Afterwards the coordinator processor collects the nproc-1 images, combines them, and sends the residual/psf pair to the minor cycle (single processor).
Parallel continuum imaging runs have been shown to produce useful speedups. Test results (from 2016) are documented in CASA Memo 4.
Figure 2 : Major cycles operate on separate data partitions, and a gather step is performed prior to normalization and the minor cycle. The corresponding reverse operations are done for the model predict step.
NOTE: In addition to starting mpicasa, for continuum imaging one also has to set the parameter parallel=True in tclean. If it is set to parallel=False, it will run on only one CPU core (which may be slower than running in serial casa, if OMP_NUM_THREADS is unspecified and gets set to 1 by default).
Parallel Cube - Current¶
Parallelism is achieved by partitioning on image channels and processing each partition separately, for both major and minor cycles (Figure 3). By design, all image cubes are partitioned by reference only, preserving the structure of the image cube as produced by a serial run. The processing is either parallel (nproc-1 processors processing a partition each and writing back to a common product) or serial (a single processor processing one partition at a time). With this design the parallel and serial modes use the same code and pathways, including for iteration control.
Figure 3 : Cube parallelization with synchronized iteration control.
The driving criterion here is to minimize the number of times the visibility data have to be read when making a residual image or psf. So ideally one would maximize the number of channels that can fit into memory for the gridding/degridding stage of each available process. The amount of memory needed is a function of the size of the image and the number of copies needed by different gridders. As of this writing, it has been estimated that the standard gridder may need the equivalent of 9 copies of single-precision float images in memory to run without swapping, with wproject needing 15 and mosaic needing 20. The partitioning for gridding uses these numbers to choose the number of channels per partition such that all the required grid cubes fit in the memory available per processor. The number of available processors is also considered in this calculation, with the goal of using as many as possible.
The number of partitions can therefore range from “min(nchan,nproc)” for the situation of very few channels or very small images cubes that fit in memory, to “nchan” for the extreme case where only 1 channel can be processed at a time due to memory limits per processor.
Minor cycle partitions are chosen as the number of channels that fill an 8M-pixel limit, with the finest granularity being 1 channel (i.e., when a single channel is itself larger than 8M pixels).
The reasoning for this choice is as follows: when the minor cycle is presented with a chunk of channels, deconvolution is always performed one channel at a time, and memory use is limited to that required by each deconvolver for one channel. However, when automasking is turned on, it requires the entire visible partition of the accessed cubes to be loaded in memory at once. Therefore, always forcing a partitioning of one channel at a time would be ideal. However, if the image is too small, the runtime of a single-plane minor cycle may be comparable to the time required for locking and writing that plane back into a common image, and the parallel speed-up may be insignificant or even negative compared to serial. Experimentation on the NRAO Socorro Lustre filesystem showed that if a partition contains at least 8 million pixels, the locking time is not as significant as the processing time.
Note : Minor cycle algorithms do not respect any limit set by .casarc (and they have never done so). Internally, even for a serial run, only one channel/plane is sent for deconvolution at a time, so memory use is always governed by that required for one channel.
This diagram (Figure 4) illustrates the code flow in the current cube imager in more detail.
Figure 4 : Code flow diagram for the CASA Cube imager.
From CASA 6.2 onwards, cube imaging uses the same code for parallel and serial runs. The partitioning used is always as defined above. In a serial run, partitions are iterated over in a loop; in a parallel run they are processed in parallel.
casa : Start casa in serial mode for cube imaging
Cube imaging is run in serial, and the memory estimate (set by .casarc or cgroups) is interpreted as the amount available to a single processor. All the available threads in the system will be used by default (by the parts of the imager code that can use them), and this does not use extra memory. The user may limit the number of threads used by setting the environment variable OMP_NUM_THREADS.
mpicasa : Start casa in parallel mode for cube imaging
Cube imaging will use all the processes available by that invocation and multithreading is chosen as described in the section on usage below. Here too, the .casarc or cgroups setting is used.
Note : The tclean parameter “parallel” is ignored for cube imaging. It is relevant only for continuum imaging.
Parallel Cube - Old¶
In CASA 6.1 and earlier, parallel cube imaging was implemented as follows for all imaging modes. In CASA 6.2, only gridder=’awproject’ uses this mechanism for parallel cube imaging.
Figure 5 illustrates the old mechanism for cube parallelization. The image cube is partitioned along frequency, and the entire sequence of iterative image reconstruction is performed independently on each frequency chunk. At the end of the run, a reference-concatenated image is produced from the collection of image cubes per partition.
Figure 5 : Old implementation of Cube parallelization where iteration control runs independently per frequency partition.
In this mode of operation, the iteration control is independent per chunk. Calculations of the cyclethreshold to trigger each major cycle rely on the peak residual across all channels seen by the deconvolver, which in this case is per partition. Numerical results could therefore differ between serial runs and parallel runs with different partitions of the cube, although all results would still be scientifically equivalent. Interactive clean masking is not supported in this mode and different frequency chunking mechanisms (manual settings of channel chunks for serial runs and frequency partitioning for parallel) do not interact well with each other. The explicit partitioning of the image cubes prevents the ability to restart tclean for cubes with different numbers of processors, and writing the model column to disk was not easily possible.
In the CASA 6.2 development cycle, parallel cube imaging was refactored to address the above problems. The current implementation is depicted in Figures 3 and 4.
Current limitations¶
For CASA 6.2, support for the AWProject gridder has not been included in the recent refactored cube imaging effort. It uses the older version of the cube imaging code, for both serial and parallel execution. Based on verification tests, this mode does run, but it has failure modes that have not yet been fully evaluated, and it has not been commissioned as fully usable. The following warning messages are meant to illustrate the situation.
All cube+awproject runs : The gridder=’awproject’ has not been fully tested for ‘cube’ imaging (parallel=True or False). Formal commissioning of this mode is expected in a subsequent release, where ‘awproject’ will be aligned with recent framework changes. Until then, please report errors/crashes if seen.
When cube+awproject, started with mpicasa : Cube imaging with awproject does not use the same MPI mechanism as the other gridders (Figs 3 and 4 above). Instead it uses the previous approach for cube imaging (Fig 5 above). Additionally, when started with mpicasa, this imaging mode will produce an error at the end of the task that says ‘parallel transport layer not initialized’. Please ignore this for now as it occurs after all computations are complete and outputs are on disk.
Note : Parallel continuum imaging is supported through the mechanism illustrated in Figure 2.
Full support for AWProject with Cube imaging is planned for an upcoming release. Note that this gridder has been classified as experimental for some time and although it has been commissioned for VLASS (and VLA) continuum in a recent release, it has never been commissioned for cube imaging along with cube partitioning using the channel-chunking mechanism.
Setup and Usage¶
Choosing the number of threads¶
Multi-threading runs automatically when invoking casa and calling any imaging tasks (tclean, deconvolve). CASA uses the FFTW library, which runs any FFT calls on multiple cores using threads. Some gridders in the major cycle of tclean also run on multiple threads (standard, wproject, mosaic) using OpenMP. The multi-scale clean minor cycle also employs multi-threading across scales. Multi-threading (on one node) uses shared memory and does not consume more memory than when run on a single thread.
The maximum number of threads (or cores) used can be controlled by the user by using the environment variable OMP_NUM_THREADS.
The default number of threads depends on how one starts casa.
“casa” : The default is to use all the cores available on a node when possible. Without explicitly setting OMP_NUM_THREADS, a user may get results that differ at the 1e-8 level (i.e., well below the level at which typical scientific validation testing occurs) between runs of serial CASA on different hardware, as a function of the total number of cores. This difference is due to third-party libraries, like FFTW, which are threaded independently of how CASA is called. OpenMP multi-threading may be turned off (if needed) by launching casa in a bash shell as “OMP_NUM_THREADS=1 casa”, or by setting the OMP_NUM_THREADS environment variable to 1. This will force all FFT calls, tclean, etc. to use only 1 core.
“mpicasa” : The default setting for OMP_NUM_THREADS is 1. While it is possible to set this to a larger number, the combined use of multi-threading and multi-processing has not been thoroughly tested.
Choosing the number of processes¶
Parallelization mechanisms include python level management of jobs using mpi4py (for some situations) and C++ level MPI calls (for other situations).
For CASA 5.7/6.1 and previous versions, both continuum and cube parallelization use mpi4py, implemented as shown in Figures 2 and 5 respectively.
From CASA 5.8/6.2, cube parallelization uses MPI from C++, implemented as shown in Figures 3 and 4, for all imaging modes except the awproject gridder.
Future releases : We expect to enable awproject with the current cube imager, and also provide an option for continuum parallelization that uses cube major cycles.
To use multi-processing one has to invoke casa via the mpicasa call. Please refer to this documentation on how to use multiple nodes.
In its simplest form, multi-processing may be invoked as
path/mpicasa -n nproc path/casa
If the OMP_NUM_THREADS environment variable is not already defined, mpicasa will automatically disable multithreading by setting OMP_NUM_THREADS=1.
If the user has set OMP_NUM_THREADS to some value prior to starting mpicasa, this setting will be preserved and passed on to each process. It is thus possible to run both multi-threading and multi-processing if the user sets OMP_NUM_THREADS > 1 along with mpicasa. Please note though that this mode of operation exists but has not been thoroughly tested or exercised yet.
Choosing Memory Limits¶
The minimum amount of memory required for cube imaging is that needed to process 1 image channel. Beyond that amount, a user can specify a memory limit, which the imager code will use to choose the number of image channels per partition.
There are two ways to set limits.
Define a “system.resources.memory” variable in the $HOME/.casarc (or .casa/rc) file. This setting will override any other system-level setting for the purpose of determining how many cube channels to consider per partition. For example, setting “system.resources.memory: 10000” will limit the CASA Imager to 10 GB of RAM per partition/process during cube major and minor cycles.
If the “system.resources.memory” variable is not defined in the .casarc file, casa uses the casacore HostInfo::memoryFree() method to find out how much memory is available. It uses a hard-coded sequence of system-level checks that include /proc and cgroup/torque settings (among others). Once this estimate of total available memory has been acquired, it is multiplied by another optional .casarc variable called “system.resources.memfrac” to determine the limit to be used for the calculation of the number of image channels to include per partition. The default value of the optional “system.resources.memfrac” variable is 1.
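For example, the following .casarc lines (values illustrative) would set an explicit 48 GB limit together with an 80% system-memory fraction, matching the log excerpt shown below:
system.resources.memory: 48000
system.resources.memfrac: 0.8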
Information in the logs : When running cube imaging via tasks such as tclean (and sdintimaging), the “Available mem” listed as a log message may be used as a record of the memory limit that was seen and used by the system. In the example below, the .casarc (or .casa/rc) setting of system.resources.memory:48000 has been used, along with the information that there are 8 processes in all, to decide to split the cube into 7 partitions. Please refer to the sections on major and minor cycle partitioning for cubes for more information on the criteria used to choose the optimal partition size with respect to the number of available processes.
SynthesisImagerVi2::nSubCubeFitInMemory
Required memory: 196.1 GB.
Available mem.: 48 GB (rc, mem. fraction: 80%, memory: 48)
=> Subcubes: 7. Processes on node: 8.
Note : The ‘system.resources.memory’ and ‘system.resources.memfrac’ controls are currently available only via the .casarc file (for both casa5 and casa6), and not via config.py or setup.py as listed for other configuration and startup options for casa6.
Note : On a multi-node cluster the user or system can control resources via cgroups settings. For more info at NRAO, please refer to documentation on running cluster jobs and instructions on setting resource limits.
Note: If the memory limit seen by the CASA imager (set in the .casarc or via cgroups or other system settings) is smaller than what is needed to process one full image channel, then the usage might exceed the requested limit if the memory is physically available on the node. If the physical memory is not available, CASA will swap as needed (when it encounters a hard limit imposed by the operating system) or fail if the allocated swap space is insufficient. The reason is that the imager code uses these memory limits only to choose the number of image channels per partition (a different calculation for each imaging/gridding algorithm chosen), and not as an absolute memory limit on all operations.
Note : Continuum imaging and deconvolution have not been fitted with or tested against these settings and are generally not expected to follow these constraints yet.
When can a parallel cube run seem slower than expected?¶
Parallelization is usually expected to offer a beneficial speed-up.*
However, there are some conditions under which running in parallel is not expected to yield much speed-up. Mainly, when the I/O time is significant compared with the compute time, the gains from compute parallelization are small and the overheads of setting up the parallelization dominate. This can happen in the following situations:
The visibility data set is small : the CPU time to grid/degrid the data is only a few multiples of the time needed to read the visibilities from disk.
The output cube has only a few channels per processor available : the time a processor has to wait before it can write its results into the image becomes comparable to the time the write itself takes. It is therefore more efficient to choose a number of processors such that the number of channels per partition is large enough (approximately 8e+6 pixels in all).
Not enough memory per processor : this forces the processing to have only a few channels per partition, and it suffers from the issue mentioned in the previous point. This may be triggered by restrictions applied via .casarc and cgroups.
Slow disk : this in general drives up the I/O time.
Model visibilities to be written in the MeasurementSets (savemodel=’modelcolumn’) : model data writes are done by locking the data sets such that only one processor can write at a time. Based on one verification test, our measurements for CASA 6.2 show that saving the model column produced a 20% increase in runtime for a run with 2 major cycles, in which the model is written in the second cycle. (This was also measured to be a factor of 2 improvement over the situation for CASA 6.1.)
*A future version of this documentation will include information about measured/demonstrated performance gains under some commonly-encountered situations.
Comparing data products with different choices of parallelization¶
Numerically Identical runs
In CASA 6.2, measurements have shown identical output between runs with mpicasa that use different numbers of processors. That is, runs with mpicasa -n X, where X>1, produce results identical to mpicasa -n 1 (the equivalent serial run).
Numerically Different runs (single precision level)
In CASA 6.2, there is a measurable difference at the single-precision level between serial runs started with ‘casa’ and those started with ‘mpicasa -n 1’.
In CASA 5.8, there is no difference between serial and parallel runs (if automasking is turned off), but a difference at the single precision level appears if automasking is turned on.
CASA 6.2 and CASA 5.8 differ at the single precision level, for runs set up in the same way. This is likely due to differences in third party libraries, and should be kept in mind for all casa5 vs casa6 comparisons since CASA versions 5.6 and 6.0.
Note : In CASA 6.1 and earlier, numerical differences for cube imaging between serial and parallel runs were larger than any of the above differences, owing to the fact that iteration control was implemented differently for serial and parallel. In CASA 6.2, parallel cube imaging was synchronized (logically) with serial cube imaging by switching from the mode shown in Fig 3 to that shown in Fig 4.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/single_dish_imaging.ipynb
Single-Dish Imaging¶
The data should already be \(T_{sys}\)- and \(T_{sky}\)-calibrated (at least into antenna temperature units, \(T_A^*\) [K]), according to the process described in the single-dish calibration pages.
The CASA task used for imaging single-dish data is sdimaging. This task produces a four-dimensional image with two position axes, one frequency or velocity axis, and one polarization axis. The output of sdimaging is a CASA image that can be explored, analyzed, and manipulated in CASA, or exported to the versatile FITS image format via exportfits.
The sections below first describe the general process of gridding single-dish data followed by the actual procedures invoked with CASA.
Theoretical Description¶
A theoretical description of single-dish image generation and data gridding
For single-dish observations, ALMA uses on-the-fly mapping. The technique is described in Mangum et al. (2007) [1].
Converting single-dish observations into an image or cube is done almost entirely in the image domain. After taking and calibrating the data, the process follows three steps:
Forming the image grid
Populating the image grid
Smoothing the image data
The fundamental parameter relevant to image quality is the sampling interval. There are a number of sampling functions that need to be considered: the sky sampling function, the image grid sampling function, and the response function of the single-dish beam. These functions all convolve with one another to yield an effective image resolution somewhat poorer than the theoretical FWHM of the telescope primary beam.
The dimensions and extent of the image grid are determined by the mapped area on the sky. The gridding pixel size must be at most one half the size of the theoretical beam when convolved with the sky-sampling function. Since the sky sampling function is typically 1/3 to 1/5 of the primary beam, and the effective FWHM of the telescope and sky sampling function is close to that of the telescope anyway, it is safe to use a pixel size that is one third the width of the primary beam.
For example, a 30” telescope beam with a 6” sky sampling function has an effective FWHM of \(\sim \sqrt{(30^2+6^2)}\simeq\) 30.6”. Therefore, an image pixel size of 30”/3 = 10” appropriately oversamples the effective beam FWHM and the sampling interval.
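For readers who want to script this arithmetic, a minimal Python sketch that simply restates the example above is:
import math
beam = 30.0                                  # telescope beam FWHM [arcsec]
sampling = 6.0                               # sky sampling interval [arcsec]
eff_fwhm = math.sqrt(beam**2 + sampling**2)  # effective FWHM, ~30.6 arcsec
cell = beam / 3.0                            # image pixel size, ~10 arcsec
print(eff_fwhm, cell)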
After the coordinates of the data are transformed into sky coordinates, the image grid is formed with dimensions either consistent with the user specifications, or so that the image fully encompasses the observed sky positions.
For each pixel in the grid (e.g. in RA-Dec space), the gridding process searches through the data for measurements taken within some cutoff radius (specified by convsupport). Depending on its distance from the grid coordinate, each observation is weighted according to the kernel type, and the weighted data are added together in the spatial domain (i.e. entire spectra are added together). If the clipminmax option is invoked, the maximum AND minimum data values in the ensemble (prior to weighting) are rejected before summing. This process is repeated iteratively for each element in the grid.
Procedure for SD Imaging¶
CASA uses the sdimaging task to grid a single-dish image or cube. While the steps are described in detail here, an example of the full single-dish calibration and imaging processes can be found in the M100 Band 3 Single Dish CASAguide.
Default operation
The sdimaging task can determine and populate almost all the gridding parameters by default. Simply invoking sdimaging with the single-dish MeasurementSet and output file name will work. This will produce a single, potentially very large cube having as many channels as necessary to span the entire spectral range, with a spectral resolution equal to that of the observation.
sdimaging(infiles=sd_ms+'.bl',outfile=imagename)
The default parameter choices for imaging are selected as follows: the image pixel size is 1/3 the primary beam, the primary beam itself is computed based on the standard \(\frac{\lambda}{D}\), with some empirically-validated tapering applied. The image dimensions are determined by the spatial extent of the mapped area in the observation, and by default, all channels and all spectral windows are imaged, along with all antennas and all polarizations.
Customized operation
Of course, users can tune their data products by specifying the image size and dimensions, the frequency/velocity characteristics, the gridding and data filtering and smoothing parameters, and so on. The default image resolution for sdimaging (i.e. cellsize in arcsec) is determined from the rest frequency of the 0th spectral window so that there are three pixel elements across the beam, where the beam is calculated as \(b\times\frac{\lambda}{D}\). See information here about tapering: PrimaryBeamArcsec. The image extent is computed by default so that the sampled area is completely encompassed in a single rectangle, and the pixel dimension follows naturally from maxsize/cellsize. The default image center (the somewhat inappropriately-named phasecenter parameter) is computed simply as the center of that region.
These parameters can be left to be determined by sdimaging, or they can be determined using CASA tools.
Image dimensions and pixel interval
The image extent can be explicitly determined using aU.getTPSampling:
xSampling, ySampling, maxsize = aU.getTPSampling(refvis, showplot=True)
which, with showplot=True, also produces a plot showing the scans, their sky angles, and their positions in RA-Dec.
Note that getTPSampling MUST operate on the original MeasurementSet (i.e. one that is not split or subselected). getTPSampling also yields the sampling rates in the x and y (i.e. azimuth and elevation) axes, as well as the maximum size of the image, in arcseconds.
The beam size used by sdimaging is determined using the aU.primaryBeamArcsec function, though this can also be invoked by the user and used to compute, for example, a cellsize and image size. The default for aU.primaryBeamArcsec corresponds to a 12m antenna with normal tapering. Setting the fwhmfactor modifies the beam taper (see discussion in PrimaryBeamArcsec).
freq = 115.27e+9                     # observing frequency [Hz]
fwhmfactor = 1.13                    # beam taper factor (see PrimaryBeamArcsec)
diameter = 12                        # antenna diameter [m]
theorybeam = aU.primaryBeamArcsec(frequency=freq*1e-9, fwhmfactor=fwhmfactor, diameter=diameter)  # beam FWHM [arcsec]; frequency is in GHz
cell = theorybeam/9.0                # pixel size [arcsec]
imsize = int(round(maxsize/cell)*2)  # image size [pixels]; maxsize from aU.getTPSampling
The center of the image can be modified using the phasecenter parameter. Single-dish images actually have many phase centers, so the name is somewhat misleading. However it is preserved here for consistency with the interferometer terminology. In the context of single-dish data, phasecenter refers only to coordinates that will align with the center of the image, and this can be in J2000 or Az/El, e.g.
phasecenter='J2000 12h22m54.9 +15d49m15'
Frequency and/or velocity axis
The default rest frequency is the mean frequency of the first spectral window (i.e. that having the lowest spectral window ID). Of course it can instead be set by the user, or a different spectral window frequency can be used, extracted from the data using msmd tools:
msmd.open(vislist[0])
freq = msmd.meanfreq(spw)   # mean frequency of the selected spectral window [Hz]
msmd.close()
print("SPW %d: %.3f GHz" % (spw, freq*1e-9))
The third axis of the image cube can be specified using the veltype and outframe parameters. Many spectral-line observers will prefer to change these so the output has a velocity axis in the radio convention as follows:
veltype='radio',
outframe='lsrk',
and the rest frequency can be specified with:
restfreq='115.271204GHz'
The velocity extent of the image cube can be specified by selecting a spectral window (via the spw parameter), the channel range (via the nchan and start parameters), and the frequency/velocity resolution (via the width parameter). For example:
nchan=70,
mode='velocity',
start='1400km/s',
width='5km/s',
Gridding parameters
The gridding kernel defaults to a box shape, but it can be specified as a prolate spheroidal (‘SF’), primary beam (‘PB’), Gaussian (‘GAUSS’), or Gaussian*Jinc (‘GJINC’) function. The recommended setting for ALMA data is the spheroidal (‘SF’) kernel. The convsupport parameter defines the cut-off radius for ‘SF’ in units of pixels, defaulting to 3 pixels. However, the recommended value for ALMA data is convsupport=6 (see sdimaging and Mangum et al. 2007 [1] for more details on these parameters).
The parameter stokes specifies the Stokes product. At present, the weighting for Stokes I is computed consistently with historical usage: I=XX/2+YY/2. While this is mathematically consistent with the computation of Stokes I, the treatment is formally incomplete: the computation must, in principle, incorporate the contributions from Q and U. Ordinarily these terms cancel out of the computation of Stokes I, but their error contributions must still be propagated, and historically this has not been respected.
CASA development is seeking to make the computation of the weights consistent with a proper computation of Stokes I. This has been done in sdfit, but it is not yet completed for sdimaging. To emphasize: while the current implementation of the Stokes I computation in sdimaging is consistent with convention, the convention is formally incorrect.
Example script
Fully specified, a call to sdimaging might look like the following:
sdimaging(infiles=sd_ms+'.bl',
field='M42',
spw='%s'%(spw),
nchan=70,
mode='velocity',
start='1400km/s',
width='5km/s',
veltype='radio',
outframe='lsrk',
restfreq='%sGHz'%(freq/1e+9),
gridfunction='SF',
convsupport=6,
stokes='I',
phasecenter='J2000 12h22m54.9 +15d49m15',
ephemsrcname='',
imsize=imsize,
cell='%sarcsec'%(cell),
overwrite=True,
outfile=imagename)
The products here are the image itself, written to the file given by the imagename parameter, and a map of weights, <imagename>.weight. The weights indicate the robustness of the gridded data on a per-pixel basis, and are important when performing further computations and analysis with the image products.
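As a quick sanity check of the weight map, one may, for example, summarize it with imstat; this is a minimal sketch that assumes the imagename variable from the call above:
# Summarize the gridding weights; pixels with very low weight are poorly
# constrained and should be treated with caution in further analysis.
wstats = imstat(imagename=imagename + '.weight')
print('median weight: %s' % wstats['median'][0])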
Bibliography¶
Mangum, et al. 2007, A&A, 474, 679-687 (http://www.aanda.org/articles/aa/pdf/2007/41/aa7811-07.pdf)
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/image_combination.ipynb
Image Combination¶
Joint Single Dish and Interferometer Image Reconstruction¶
The SDINT imaging algorithm allows joint reconstruction of wideband single dish and interferometer data. This algorithm is available in the task sdintimaging and described in Rau, Naik & Braun (2019).
Joint reconstruction of wideband single dish and interferometer data in CASA is experimental. Please use it at your own discretion.
The usage modes that have been tested are documented below.
SDINT Algorithm¶
Interferometer data are gridded into an image cube (and corresponding PSF). The single dish image and PSF cubes are combined with the interferometer cubes in a feathering step. The joint image and PSF cubes then form inputs to any deconvolution algorithm (in either cube or mfs/mtmfs modes). Model images from the deconvolution algorithm are translated back to model image cubes prior to subtraction from both the single dish image cube as well as the interferometer data to form a new pair of residual image cubes to be feathered in the next iteration. In the case of mosaic imaging, primary beam corrections are performed per channel of the image cube, followed by a multiplication by a common primary beam, prior to deconvolution. Therefore, for mosaic imaging, this task always implements conjbeams=True and normtype=’flatnoise’.
The input single dish data are the single dish image and psf cubes. The input interferometer data is a MeasurementSet. In addition to imaging and deconvolution parameters from interferometric imaging (task tclean), there are controls for a feathering step to combine interferometer and single dish cubes within the imaging iterations. Note that the above diagram shows only the ‘mtmfs’ variant. Cube deconvolution proceeds directly with the cubes in the green box above, without the extra conversion back and forth to the multi-term basis. Primary beam handling is also not shown in this diagram, but full details (via pseudocode) are available in the reference publication.
The parameters used for controlling the joint deconvolution are described on the sdintimaging task pages.
Usage Modes¶
The task sdintimaging contains the algorithm for joint reconstruction of wideband single dish and interferometer data. The sdintimaging task shares a significant number of parameters with the tclean task, but also contains unique parameters. A detailed overview of these parameters, and how to use them, can be found in the CASA Docs task pages of sdintimaging.
As seen from the diagram above and described on the sdintimaging task pages, there is considerable flexibility in usage modes. One can choose between interferometer-only, singledish-only and joint interferometer-singledish imaging. Outputs are restored images and associated data products (similar to task tclean).
The following usage modes are available in the (experimental) sdintimaging task. Tested modes include all 12 combinations of:
Cube Imaging : All combinations of the following options.
specmode = ‘cube’
deconvolver = ‘multiscale’, ‘hogbom’
usedata = ‘sdint’, ‘sd’ , ‘int’
gridder = ‘standard’, ‘mosaic’
parallel = False, True
Wideband Multi-Term Imaging : All combinations of the following options.
specmode = ‘mfs’
deconvolver = ‘mtmfs’ ( nterms=1 for a single-term MFS image, and nterms>1 for multi-term MFS image. Tests use nterms=2 )
usedata = ‘sdint’, ‘sd’ , ‘int’
gridder = ‘standard’, ‘mosaic’
parallel = False, True
NOTE: When the INT and/or SD cubes have flagged (and therefore empty) channels, only those channels that have non-zero images in both the INT and SD cubes are used for the joint reconstruction.
NOTE: Single-plane joint imaging may be run with deconvolver=’mtmfs’ and nterms=1.
NOTE: All other modes allowed by the new sdintimaging task are currently untested. Tests will be added in subsequent releases.
Examples/Demos¶
Basic test results¶
The sdintimaging task was run on a pair of simulated test datasets. Both contain a flat spectrum extended emission feature plus three point sources, two of which have spectral index=-1.0 and one which is flat-spectrum (rightmost point). The scale of the top half of the extended structure was chosen to lie within the central hole in the spatial-frequency plane at the middle frequency of the band so as to generate a situation where the interferometer-only imaging is difficult.
Please refer to the publication for a more detailed analysis of the imaging quality and comparisons of images without and with SD data.
Images from a run on the ALMA M100 12m+7m+TP Science Verification Data suite are also shown below.
Single Pointing Simulation :
Wideband Multi-Term Imaging ( deconvolver=’mtmfs’, specmode=’mfs’ )
SD + INT
A joint reconstruction accurately reconstructs both intensity and spectral index for the extended emission as well as the compact sources.
INT-only
The intensity has negative bowls and the spectral index is overly steep, especially for the top half of the extended component.
SD-only
The spectral index of the extended emission is accurate (at 0.0) and the point sources are barely visible at this SD angular resolution.
Cube Imaging ( deconvolver=’multiscale’, specmode=’cube’ )
SD + INT
A joint reconstruction has lower artifacts and more accurate intensities in all three channels, compared to the int-only reconstructions below
INT-only
The intensity has negative bowls in the lower frequency channels and the extended emission is largely absent at the higher frequencies.
SD-only
A demonstration of single-dish cube imaging with deconvolution of the SD-PSF.
In this example, iterations have not been run until full convergence, which is why the sources still contain signatures of the PSF.
Mosaic Simulation
An observation of the same sky brightness was simulated with 25 pointings.
Wideband Multi-Term Mosaic Imaging ( deconvolver=’mtmfs’, specmode=’mfs’ , gridder=’mosaic’ )
SD + INT
A joint reconstruction accurately reconstructs both intensity and spectral index for the extended emission as well as the compact sources.
This is a demonstration of joint mosaicing along with wideband single-dish and interferometer combination.
INT-only
The intensity has negative bowls and the spectral index is strongly inaccurate. Note that the errors are slightly less than the situation with the single-pointing example (where there was only one pointing’s worth of uv-coverage).
Cube Mosaic Imaging ( deconvolver=’multiscale’, specmode=’cube’, gridder=’mosaic’ )
SD + INT
A joint reconstruction produces better per-channel reconstructions compared to the INT-only situation shown below.
This is a demonstration of cube mosaic imaging along with SD+INT joint reconstruction.
INT-only
Cube mosaic imaging with only interferometer data. This clearly shows negative bowls and artifacts arising from the missing flux.
ALMA M100 Spectral Cube Imaging : 12m + 7m + TP¶
The sdintimaging task was run on the ALMA M100 Science Verification Datasets.
The single dish (TP) cube was pre-processed by adding per-plane restoringbeam information.
Cube specification parameters were obtained from the SD Image as follows
from sdint_helper import *
sdintlib = SDINT_helper()
sdintlib.setup_cube_params(sdcube='M100_TP')
Output : Shape of SD cube : [90 90 1 70]
Coordinate ordering : ['Direction', 'Direction', 'Stokes', 'Spectral']
nchan = 70
start = 114732899312.0Hz
width = -1922516.74324Hz
Found 70 per-plane restoring beams
(For specmode='mfs' in sdintimaging, please remember to set 'reffreq' to a value within the freq range of the cube.)
Returned Dict : {'nchan': 70, 'start': '114732899312.0Hz', 'width': '-1922516.74324Hz'}
Task sdintimaging was run with automatic SD-PSF generation, n-sigma stopping thresholds, a pb-based mask at the 0.3 gain level, and no other deconvolution masks (interactive=False).
sdintimaging(usedata="sdint", sdimage="../M100_TP", sdpsf="",sdgain=3.0,
dishdia=12.0, vis="../M100_12m_7m", imagename="try_sdint_niter5k",
imsize=1000, cell="0.5arcsec", phasecenter="J2000 12h22m54.936s +15d48m51.848s",
stokes="I", specmode="cube", reffreq="", nchan=70, start="114732899312.0Hz",
width="-1922516.74324Hz", outframe="LSRK", veltype="radio",
restfreq="115.271201800GHz", interpolation="linear",
perchanweightdensity=True, gridder="mosaic", mosweight=True,
pblimit=0.2, deconvolver="multiscale", scales=[0, 5, 10, 15, 20],
smallscalebias=0.0, pbcor=False, weighting="briggs", robust=0.5,
niter=5000, gain=0.1, threshold=0.0, nsigma=3.0, interactive=False,
usemask="user", mask="", pbmask=0.3)
Results from two channels are shown below.
LEFT : INT only (12m+7m) and RIGHT : SD+INT (12m + 7m + TP)
Channel 23
Channel 43
Moment 0 Maps : LEFT : INT only. MIDDLE : SD + INT with sdgain=1.0 RIGHT : SD + INT with sdgain=3.0
Moment 1 Maps : LEFT : INT only. MIDDLE : SD + INT with sdgain=1.0 RIGHT : SD + INT with sdgain=3.0
A comparison (shown for one channel) with and without masking is shown below.
Notes :
In the reconstructed cubes, negative bowls have clearly been eliminated by using sdintimaging to combine interferometry + SD data. Residual images are close to noise-like too (not pictured above) suggesting a well-constrained and steadily converging imaging run.
The source structure is visibly different from the INT-only case, with high and low resolution structure appearing more well defined. However, the high-resolution peak flux in the SDINT image cube is almost a factor of 3 lower than the INT-only. While this may simply be because of deconvolution uncertainty in the ill-constrained INT-only reconstruction, it requires more investigation to evaluate absolute flux correctness. For example, it will be useful to evaluate if the INT-only reconstructed flux changes significantly with careful hand-masking.
Compare with a Feathered image : http://www.astroexplorer.org/details/apjaa60c2f1 : The reconstructed structure is consistent.
The middle and right panels compare reconstructions with different values of sdgain (1.0 and 3.0). The sdgain=3.0 run has a noticeable emphasis on the SD flux in the reconstructed moment maps, while the high resolution structures are the same between sdgain=1 and 3. This is consistent with expectations from the algorithm, but requires further investigation to evaluate robustness in general.
Except for the last panel, no deconvolution masks were used (apart from a pbmask at the 0.3 gain level). The deconvolution quality even without masking is consistent with the expectation that when supplied with better data constraints in a joint reconstruction, the native algorithms are capable of converging on their own. In this example (same niter and sdgain), iterative cleaning with interactive and auto-masks (based mostly on interferometric peaks in the images) resulted in more artifacts compared to a run that allowed multi-scale clean to proceed on its own.
The results using sdintimaging on these ALMA data can be compared with performance results when using feather, and when using tp2vis (ALMA study by J. Koda and P. Teuben).
Fitting a new restoring beam to the Feathered PSF¶
Since the deconvolution uses a joint SD+INT point spread function, the restoring beam is re-fitted after the feather step within the sdintimaging task. As a convenience feature, the corresponding tool method is also available to the user and may be used to invoke PSF refitting standalone, without needing an MS or any gridding of weights to make the PSF. This method will look for the imagename.psf (or imagename.psf.tt0), fit and set the new restoring beam. It is tied to the naming convention of tclean.
synu = casac.synthesisutils()
synu.fitPsfBeam(imagename='qq', psfcutoff=0.3) # Cubes
synu.fitPsfBeam(imagename='qq', nterms=2, psfcutoff=0.3) # Multi-term
Tested Use Cases¶
The following is a list of use cases that have simulation-based functional verification tests within CASA.
Wideband multi-term imaging (SD+Int)
Wideband data single field imaging by joint reconstruction from single dish and interferometric data, to obtain the high resolution of the interferometer while accounting for the zero-spacing information. Uses the multi-term multi-frequency synthesis (MTMFS) algorithm to properly account for the spectral information of the source.
Wideband multi-term imaging: Int only
The same as #1 except for using interferometric data only, which is useful to make a comparison with #1 (i.e. effect of missing flux). This is equivalent to running ‘mtmfs’ with specmode=’mfs’ and gridder=’standard’ in tclean
Wideband multi-term imaging: SD only
The same as #1 except for using single dish data only, which is useful to make a comparison with #1 (i.e. to see how much high resolution information is missing). Also, sometimes the SD PSF has significant sidelobes (Airy disk), and even single dish images can benefit from deconvolution. This is a use case where wideband multi-term imaging is applied to SD data alone to make images at the highest possible resolution as well as to derive spectral index information.
Single field cube imaging: SD+Int
Spectral cube single field imaging by joint reconstruction of single dish and interferometric data to obtain single field spectral cube image.
Use multi-scale clean for deconvolution
Single field cube imaging: Int only
The same as #4 except for using the interferometric data only, which is useful to make a comparison with #4 (i.e. effect of missing flux). This is equivalent to running ‘multiscale’ with specmode=’cube’ and gridder=’standard’ in tclean.
Single field cube imaging: SD only
The same as #4 except for using the single dish data only, which is useful to make a comparison with #4 (i.e. to see how much high resolution information is missing).
Also, it addresses the use case where SD PSF sidelobes are significant and where the SD images could benefit from multiscale (or point source) deconvolution per channel.
Wideband multi-term mosaic Imaging: SD+Int
Wideband data mosaic imaging by joint reconstruction from single dish and interferometric data to obtain the high resolution of the interferometer while accounting for the zero-spacing information.
Use multi-term multi-frequency synthesis (MTMFS) algorithm to properly account for spectral information of the source. Implement the concept of conjbeams (i.e. frequency dependent primary beam correction) for wideband mosaicing.
Wideband multi-term mosaic imaging: Int only
The same as #7 except for using interferometric data only, which is useful to make a comparison with #7 (i.e. effect of missing flux). Also, this is an alternate implementation of the concept of conjbeams ( frequency dependent primary beam correction) available via tclean, and which is likely to be more robust to uv-coverage variations (and sumwt) across frequency.
Wideband multi-term mosaic imaging: SD only
The same as #7 except for using single dish data only, which is useful to make a comparison with #7 (i.e. to see how much high resolution information is missing). This is the same situation as (3), but made on an image coordinate system that matches an interferometer mosaic mtmfs image.
Cube mosaic imaging: SD+Int
Spectral cube mosaic imaging by joint reconstruction of single dish and interferometric data. Use multi-scale clean for deconvolution.
Cube mosaic imaging: Int only
The same as #10 except for using the interferometric data only, which is useful to make a comparison with #10 (i.e. effect of missing flux). This is the same use case as gridder=’mosaic’ and deconvolver=’multiscale’ in tclean for specmode=’cube’.
Cube mosaic imaging: SD only
The same as #10 except for using the single dish data only, which is useful to make a comparison with #10 (i.e. to see how much high resolution information is missing). This is the same situation as (6), but made on an image coordinate system that matches an interferometer mosaic cube image.
Wideband MTMFS SD+INT with channel 2 flagged in INT
The same as #1, but with partially flagged data in the cubes. This is a practical reality with real data, where the INT and SD data are likely to have gaps due to radio frequency interference or other weight variations.
Cube SD+INT with channel 2 flagged
The same as #4, but with partially flagged data in the cubes. This is a practical reality with real data, where the INT and SD data are likely to have gaps due to radio frequency interference or other weight variations.
Wideband MTMFS SD+INT with sdpsf=””
The same as #1, but with an unspecified sdpsf. This triggers the auto-calculation of the SD PSF cube using restoring beam information from the regridded input sdimage.
INT-only cube comparison between tclean and sdintimaging
Compare cube imaging results for a functionally equivalent run.
INT-only mtmfs comparison between tclean and sdintimaging
Compare mtmfs imaging results for a functionally equivalent run. Note that the sdintimaging task implements wideband primary beam correction in the image domain on the cube residual image, whereas tclean uses the ‘conjbeams’ parameter to apply an approximation of this correction during the gridding step.
Note : Serial and parallel runs for an ALMA test dataset have been shown to be consistent at the 1e+6 dynamic-range level, in line with the differences measured for our current implementation of cube parallelization.
Feather & CASAfeather¶
Feathering is a technique used to combine a Single Dish (SD) image with an interferometric image of the same field. The goal of this process is to reconstruct the source emission on all spatial scales, ranging from the small spatial scales measured by the interferometer to the large-scale structure measured by the single dish. To do this, feather combines the images in Fourier space, weighting them by the spatial frequency response of each image. This technique assumes that the spatial frequencies of the single dish and interferometric data partially overlap. The subject of interferometric and single dish data combination has a long history. See the introduction of Koda et al. 2011 (and references therein) [1] for a concise review, and Vogel et al. 1984 [2], Stanimirovic et al. 1999 [3], Stanimirovic 2002 [4], Helfer et al. 2003 [5], and Weiss et al. 2001 [6], among other referenced papers, for other methods and discussions concerning the combination of single dish and interferometric data.
The feathering algorithm implemented in CASA is as follows:
Regrid the single dish image to match the coordinate system, image shape, and pixel size of the high resolution image.
Transform each image onto uniformly gridded spatial-frequency axes.
Scale the Fourier-transformed low-resolution image by the ratio of the volumes of the two ‘clean beams’ (high-res/low-res) to convert the single dish intensity (in Jy/beam) to that corresponding to the high resolution intensity (in Jy/beam). The volume of the beam is calculated as the volume under a two dimensional Gaussian with peak 1 and major and minor axes of the beam corresponding to the major and minor axes of the Gaussian.
Add the Fourier-transformed data from the high-resolution image, scaled by \((1-wt)\) where \(wt\) is the Fourier transform of the ‘clean beam’ defined in the low-resolution image, to the scaled low-resolution image from step 3.
Transform back to the image plane.
The input images for feather must have the following characteristics:
Both input images must have a well-defined beam shape for this task to work, which will be a ‘clean beam’ for interferometric images and a ‘primary beam’ for a single-dish image. The beam for each image should be specified in the image header. If a beam is not defined in the header, or if feather cannot guess the beam based on the telescope parameter in the header, then you will need to add the beam size to the header using imhead (see the sketch after this list).
Both input images must have the same flux density normalization scale. If necessary, the SD image should be converted from temperature units to Jy/beam. Since measuring absolute flux levels is difficult with single dishes, the single dish data is likely to be the one with the most uncertain flux calibration. The SD image flux can be scaled using the parameter sdfactor to place it on the same scale as the interferometer data. The casafeather task (see below) can be used to investigate the relative flux scales of the images.
Feather attempts to regrid the single dish image to the interferometric image. Given that the single dish image frequently originates from other data reduction packages, CASA may have trouble performing the necessary regridding steps. If that happens, one may try to regrid the single dish image manually to the interferometric image. CASA has a few tasks to perform individual steps, including imregrid for coordinate transformations, imtrans to swap and reverse coordinate axes, and the tool method ia.adddegaxes() for adding degenerate axes (e.g. a single Stokes axis). See the “Image Analysis” chapter for additional options. If you have trouble changing image projections, you can try the montage package, which also has an associated python wrapper.
If you are feathering large images together, set the number of pixels along the X and Y axes to composite (non-prime) numbers in order to improve the algorithm speed. In general, FFTs work much faster on even and composite numbers. Then use the imsubimage task (or the ia.subimage tool method) to trim the number of pixels to something desirable.
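Regarding the first requirement above, a minimal sketch of adding a beam to a single-dish image header with imhead might look like this (the file name and the 30 arcsec circular beam are purely illustrative):
# Write a restoring beam into the image header so that feather can use it.
imhead(imagename='single_dish.im', mode='put', hdkey='beammajor', hdvalue='30arcsec')
imhead(imagename='single_dish.im', mode='put', hdkey='beamminor', hdvalue='30arcsec')
imhead(imagename='single_dish.im', mode='put', hdkey='beampa', hdvalue='0deg')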
Inputs for task feather¶
The inputs for feather are:
#feather :: Combine two images using their Fourier transforms
imagename = '' #Name of output feathered image
highres = '' #Name of high resolution (interferometer) image
lowres = '' #Name of low resolution (single dish) image
sdfactor = 1.0 #Scale factor to apply to Single Dish image
effdishdiam = -1.0 #New effective SingleDish diameter to use in m
lowpassfiltersd = False #Filter out the high spatial frequencies of the SD image
The SD data cube is specified by the lowres parameter and the interferometric data cube by the highres parameter. The combined, feathered output cube name is given by the imagename parameter. The parameter sdfactor can be used to scale the flux calibration of the SD cube. The parameter effdishdiam can be used to change the weighting of the single dish image.
The weighting functions for the data are usually the Fourier transform of the single dish beam, FFT(PBSD), for the single dish data, and the inverse, 1-FFT(PBSD), for the interferometric data. It is possible, however, to change the weighting functions by pretending that the SD dish is smaller in size via the effdishdiam parameter. This tapers the high spatial frequencies of the SD data and gives more weight to the interferometric data. The lowpassfiltersd parameter can take out non-physical artifacts at very high spatial frequencies that are often present in SD data.
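For example, a hedged sketch of such a re-weighted combination (the file names and the 10 m value are purely illustrative) could be:
# Pretend the SD dish is only 10 m in diameter to taper its high spatial
# frequencies, and filter out non-physical high-frequency SD artifacts.
feather(imagename='feather_tapered.im', highres='synth.im',
        lowres='single_dish.im', effdishdiam=10.0, lowpassfiltersd=True)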
Note that the only inputs are for images; feather will attempt to regrid the images to a common shape, i.e. pixel size, pixel numbers, and spectral channels. If you are having issues with the regridding inside feather, you may consider regridding using the imregrid and specsmooth tasks.
The feather task does not perform any deconvolution but combines the single dish image with a presumably deconvolved interferometric image. The short spacings of the interferometric image that are extrapolated by the deconvolution process will be those that are down-weighted the most when combined with the single dish data. The single dish image must have a well-defined beam shape and the correct flux units for a model image (Jy/beam instead of Jy/pixel). Use the tasks imhead and immath first to convert if needed.
Starting with a cleaned synthesis image and a low resolution image from a single dish telescope, the following example shows how they can be feathered:
feather(imagename ='feather.im', #Create an image called feather.im
highres ='synth.im', #The synthesis image is called synth.im
lowres ='single_dish.im') #The SD image is called single_dish.im
Visual Interface for feather (casafeather)¶
CASA also provides a visual interface to the feather task. The interface is run from a command line outside CASA by typing casafeather in a shell. An example of the interface is shown below. To start, one needs to specify a high and a low resolution image, typically an interferometric and a single dish map. Note that the single dish map needs to be in units of Jy/beam. The output image name can be specified. The non-deconvolved (dirty) interferometric image can also be specified, to use as a diagnostic of the relative flux scaling of the single dish and interferometer images; see below for more details. At the top of the display, the effdishdiam and sdfactor parameters can be provided in the “Effective Dish Diameter” and “Low Resolution Scale Factor” input boxes. Once you have specified the images and parameters, press the “Feather” button in the center of the GUI window to start the feathering process. The feathering process here includes regridding the low resolution image to the high resolution image.
Figure 1: The panel shows the “Original Data Slice”, which are cuts through the u and v directions of the Fourier-transformed input images. Green is the single dish (low resolution) data and purple the interferometric (high resolution) data. To bring them onto the same flux scale, the low-resolution data were convolved to the high resolution beam and vice versa (selectable in the color preferences). In addition, a single dish scaling of 1.2 was applied to adjust for calibration differences. The weight functions are shown in yellow (for the low resolution data) and orange (for the high resolution data). The weighting functions were also applied to the green and purple slices. Image slices of the combined, feathered output image are shown in blue. The displays also show the location of the effective dish diameter as a vertical line. This value is kept at the original single dish diameter that is taken from the respective image header.
The initial casafeather display shows two rows of plots. The panel shows the “Original Data Slice”, which are either cuts through the u and v directions of the Fourier-transformed input images or a radial average. A vertical line shows the location of the effective dish diameter(s). The blue lines are the combined, feathered slices.
Figure 2: The casafeather “customize” window.
The ‘Customize’ button (gear icon on the top menu page) allows one to set the display parameters. Options are to show the slice plot, the scatter plot, or the legend. One can also select between logarithmic and linear axes; a good option is usually to make both axes logarithmic. You can also select whether the x-axis for the slices is in the u, or v, or both directions, or, alternatively, a radial average in the uv-plane. For data cubes, one can also select a particular velocity plane, or average the data across all velocity channels. The scatter plot can display any two data sets on the two axes, selected from the ‘Color Preferences’ menu. The data can be the unmodified, original data, or data that have been convolved with the high or low resolution beams. One can also select to display data that were weighted and scaled by the functions discussed above.
Figure 3: The scatter plot in casafeather. The low-resolution data, convolved with the high-resolution beam, weighted and scaled, are still somewhat below the equality line (plotted against the high-resolution data, convolved with the low-resolution beam, weighted). In this case one can try to adjust the “low resolution scale factor” to bring the values closer to the line of equality, i.e. to adjust the calibration scales.
Plotting the data as a scatter plot is a useful diagnostic tool for checking for differences in flux scaling between the high and low resolution data sets. The dirty interferometer image contains the actual flux measurements made by the telescope. Therefore, if the single dish scaling is correct, the flux in the dirty image convolved with the low resolution beam, with the appropriate weighting applied, should be the same as the flux of the low-resolution data convolved with the high resolution beam, once weighted and scaled. If not, the sdfactor parameter can be adjusted until they are the same. One may also use the cleaned high resolution image instead of the dirty image, if the latter is not available. However, note that the cleaned high resolution image already contains extrapolations to larger spatial scales that may bias the comparison.
Bibliography¶
Koda et al. 2011 (http://adsabs.harvard.edu/abs/2011ApJS..193...19K)
Vogel et al. 1984 (http://adsabs.harvard.edu/abs/1984ApJ...283..655V)
Stanimirovic et al. 1999 (http://adsabs.harvard.edu/abs/1999MNRAS.302..417S)
Stanimirovic et al. 2002 (http://adsabs.harvard.edu/abs/2002ASPC..278..375S)
Helfer et al. 2003 (http://adsabs.harvard.edu/abs/2003ApJS..145..259H)
Weiss et al. 2001 (http://adsabs.harvard.edu/abs/2001A%26A...365..571W)
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/image_analysis.ipynb
Image Analysis¶
The task viewer is deprecated in favor of imview and msview, which contain the same functionality. Please invoke the imview (msview) task to run the CASA Viewer for visualizing images or image cubes (visibility data).
CASA Images¶
CASA images are stored as tables and can be accessed with CASA tasks and tools. Image metadata can be listed and edited with the imhead task. Further processing includes the computation of statistics, spectral indices, and polarization properties; transformation onto different spatial coordinate systems, spatial resolutions, and spectral frames; and many other operations (see the following section on Dealing with Images for a description of tasks that operate on CASA images).
Image Headers
Image Headers contain metadata on the observation – e.g. the observing date, pointing position, object observed, etc., and the resulting image – e.g. the restoring beam size, image intensity units, spatial coordinate system, spectral parameters, stokes parameters, etc. Header metadata tells the user what is in the image, and is used by imview and other tasks to set the data array on the correct spatial and spectral coordinates, assign the intensity values correctly, and otherwise properly handle the data cube.
Image Headers can be accessed and edited via the imhead task and the msmd tool. Header data can also be inspected with the casabrowser. See the page on Image Headers for further details.
Image Axes / Velocity Systems
CASA images typically have the following axis order (python indices are zero-based): Axis 0 = RA, 1 = DEC, 2 = Stokes, 3 = Frequency. The spatial axes can alternately contain GLON/GLAT or other coordinate systems. The spectral axis of images created in CASA is always in frequency units. In addition, one or more velocity systems can be added to relabel the spectral axis. When images are imported into CASA from FITS files rather than generated within CASA itself, the above conventions may not apply. See the page on Image Import and Export for further details on importing and exporting FITS files.
The spatial and spectral axes in CASA images can be modified using CASA tasks and tools described in the Reformat Images page.
Image Masks
Internal Image Masks are stored as Boolean True/False cubes within the images themselves. There can be multiple masks stored in each data cube and one of them is defined to be the ‘default’ mask. The default mask is the one visible when the image is displayed, e.g. in the CASA Viewer, and that is applied for operations on images. All masks have labels, such as mask0 etc. and they can be selected by specifying the image name followed by the mask name, separated by a colon. For example, ‘mask1’ in ‘image.im’ is used when specifying the image as ‘image.im:mask1’. Available masks can be listed with the task makemask which can also assign any mask as the default. The same task can also be used to export masks into separate CASA zero/non-zero cubes and to import such cubes as Boolean masks inside images. In addition, makemask enables the creation of masks from image regions. More information on masks is provided on the Image Masks and LEL Masks sections.
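As an illustration, listing the masks in an image and assigning a different default could look like the following sketch (image and mask names are hypothetical):
# List the internal masks present in the image:
makemask(mode='list', inpimage='image.im')
# Make 'mask1' the default mask of image.im:
makemask(mode='setdefaultmask', inpmask='image.im:mask1')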
CASA Regions
CASA Regions can be specified through simple lists in LEL (e.g. region = 'box[[108, 108], [148, 148]]')
or through CASA Region Text Format (CRTF) files, which are text files that contain one or more regions with specific shapes (e.g. ellipses and rectangles), sizes, and other properties. These files can be used to specify the region of an image in which to operate, and they can easily be modified by the user or converted to CASA image masks (Boolean data cubes) using the makemask task.
More information on CRTF files is available on the Region Files section.
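As a rough illustration, the contents of a small CRTF file might look like the following (the header line identifies the format; the shapes and coordinates are arbitrary examples):
#CRTFv0
box[[100pix, 100pix], [200pix, 200pix]]
circle[[12h22m54.9s, +15d49m15s], 10arcsec]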
Dealing with Images¶
Image cubes in CASA can be manipulated and analyzed in various ways mainly using tasks with an ‘im’ prefix and with the image CASA tool. Frequently, the tasks and tools handle CASA, FITS, and MIRIAD images, but we recommend using images in the CASA format.
In the following pages, useful image analysis tasks are introduced that span import/export tasks, image information, reformatting, mathematical operations, and spatial and spectral fitting. Available image analysis tasks include:
imhead — summarize and manipulate the “header” information in a CASA image
imsubimage — Create a (sub)image from a region of the image
imcontsub — perform continuum subtraction on a spectral-line image cube
imfit — image plane Gaussian component fitting
immath — perform mathematical operations on or between images
immoments — compute the moments of an image cube
impv — generate a position-velocity diagram along a slit
imstat — calculate statistics on an image or part of an image
imval — extract the data and mask values from a pixel or region of an image
imtrans — reorder the axes of an image or cube
imcollapse — collapse image along one or more axes by aggregating pixel values along that axis
imregrid — regrid an image onto the coordinate system of another image
imreframe — change the frame in which the image reports its spectral values
imrebin — rebin an image
specsmooth — 1-dimensional smoothing of images, e.g. along the spectral axis
imsmooth — 2-dimensional smoothing of images in the angular (spatial) directions
specfit — fit 1-dimensional Gaussians, polynomial, and/or Lorentzians models to an image or image region
specflux — Report details of an image spectrum.
plotprofilemap — Plot spectra at their position
rmfit — Calculation of rotation measures
spxfit — Calculation of Spectral Indices and higher order polynomials
makemask — image mask handling
slsearch — query a subset of the Splatalogue spectral line catalog
splattotable — convert a file exported from Splatalogue to a CASA table
importfits — import a FITS image into a CASA image format table
exportfits — write out an image in FITS format
There are other tasks which are useful during image analysis. These include:
imview — there are useful region statistics and image cube slice and profile capabilities in the viewer
Common Task Parameters¶
Certain parameters are present in many image analysis tasks. These include:
imagename
The imagename parameter is used to specify the image(s) on which a task should operate. In most tasks, this will be a string containing the image name, but in some tasks, this can be a list of strings, as for example, in immath. Most image analysis tasks accept both CASA images and FITS images, although we recommend working with CASA images.
outfile
The outfile parameter specifies the name (in string format) of the file that the task should output. This parameter is only present in tasks that produce processed files (typically images) as output. It will therefore not be present for tasks that return python dictionaries, arrays, or other data types.
axes
The axes parameter is used to specify the image axes that the task should operate on, and the user should input a list of integers for this (e.g. “axes = [0,1]”). CASA images typically have the following axis order (python indices are zero-based): Axis 0 = RA, 1 = DEC, 2 = Stokes parameter, and 3 = Frequency. The imhead task can be used to confirm the axis specifications in the data cube of interest, and the axes may differ from the above sequence, particularly when using FITS data cubes or CASA images that were converted from FITS files. In the examples, we assume the above axis order.
To obtain statistics across RA and DEC for each velocity channel, the user would run the imstat task (imstat stands for “image statistics”) with “axes = [0,1]”. To obtain statistics over the spectral axis, one would run imstat with axes = [3].
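In code, these two calls would look like the following sketch, assuming a CASA image named 'test.image':
# Statistics per spectral channel, aggregating over the RA and DEC axes:
chan_stats = imstat(imagename='test.image', axes=[0, 1])
# Statistics along the spectral axis:
spec_stats = imstat(imagename='test.image', axes=[3])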
box, chans, stokes
The box, chans, and stokes parameters are used to select parts of an image cube for the task to operate on. If a box is applied, the task will operate only on a specific spatial region (e.g. box = ‘100,100,200,200’ will only operate on pixels in the range (100,100) <= (x,y) <= (200,200) ). If specific channels are specified through chans, the task will select that segment of the spectral axis (e.g. chans = ‘30~45’ will operate on channels 30 through 45). In the same way, stokes selects specific Stokes parameter axes, as e.g. stokes = ‘I’. Further detail is provided in the Image Selection Parameters section.
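Combining these selections, a hedged example (image name and ranges are illustrative) that extracts a sub-cube with imsubimage might be:
# Keep only the inner box, channels 30 through 45, and Stokes I:
imsubimage(imagename='ngc5921.im', outfile='ngc5921.sub.im',
           box='100,100,200,200', chans='30~45', stokes='I')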
mask
The mask parameter tells the task to operate on specific segments of the image cube, as set by a mask. The input for the mask parameter may be a conditional statement in LEL string format (e.g. mask = '"ngc5921.im" > 0.5', which selects all pixels in that image that have values larger than 0.5 and zeros out all other pixels), or may be a Boolean True/False cube or an integer zero/non-zero cube. The task will not operate on pixels that are “masked”, or zeroed out. See the Image Masks page for more detail and examples of usage.
stretch
This parameter can be True or False, with a default value of False. Set stretch = True when applying a single-plane mask to a full image cube. As an example, if you have a mask in a single spectral channel image that you wish to apply to all spectral channels in a cube, you would “stretch” the mask over all of the channels. The mask can also be stretched over all Stokes parameter planes for polarization images.
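A sketch of this usage, with hypothetical file names, is:
# Apply a single-plane mask image to every channel of a cube; stretch=True
# expands the one-channel mask along the spectral axis.
imsubimage(imagename='cube.im', outfile='cube.masked.im',
           mask='"mask_plane.im" > 0.5', stretch=True)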
Returned Python Dictionaries
Many image analysis tasks return python dictionaries with information that is also printed to the logger. The dictionaries can be assigned to a variable and then used later for other scripting purposes. In the following the output of imstat is assigned to the python dictionary ‘test_stats’:
CASA <20>: test_stats=imstat(imagename='test.image')
CASA <21>: test_stats
Out[21]:
{'blc': array([0, 0, 0, 0], dtype=int32),
'blcf': '17:45:40.899, -29.00.18.780, I, 1.62457e+10Hz',
'max': array([ 0.49454519]),
'maxpos': array([32, 32, 0, 0], dtype=int32),
'maxposf': '17:45:40.655, -29.00.15.580, I, 1.62457e+10Hz',
'mean': array([ 0.00033688]),
'medabsdevmed': array([ 0.]),
'median': array([ 0.]),
'min': array([-0.0174111]),
'minpos': array([15, 42, 0, 0], dtype=int32),
'minposf': '17:45:40.785, -29.00.14.580, I, 1.62457e+10Hz',
'npts': array([ 4096.]),
'q1': array([ 0.]),
'q3': array([ 0.]),
'quartile': array([ 0.]),
'rms': array([ 0.00906393]),
'sigma': array([ 0.00905878]),
'sum': array([ 1.37985568]),
'sumsq': array([ 0.3365063]),
'trc': array([63, 63, 0, 0], dtype=int32),
'trcf': '17:45:40.419, -29.00.12.480, I, 1.62457e+10Hz'}
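The dictionary entries can then be used for further scripting, for instance (a trivial sketch based on the dictionary captured above):
peak = test_stats['max'][0]   #entries are NumPy arrays
rms = test_stats['rms'][0]
print('peak S/N =', peak/rms)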
Image Import/Export¶
The exportfits and importfits tasks enable conversion between CASA images and FITS data. The exportfits task allows you to write your CASA image to a FITS file that other packages can read, and the importfits task converts existing FITS files into CASA images. While many image analysis tasks can operate on FITS files, we recommend converting to CASA images for processing and analysis purposes.
Export CASA Image to FITS (exportfits)
The exportfits task is used to export a CASA image to FITS format. The inputs are:
#exportfits :: Convert a CASA image to a FITS file
imagename = '' #Name of input CASA image
fitsimage = '' #Name of output image FITS
#file
velocity = False #Use velocity (rather than
#frequency) as spectral axis
optical = False #Use the optical (rather than
#radio) velocity convention
bitpix = -32 #Bits per pixel
minpix = 0 #Minimum pixel value (if
#minpix > maxpix, value is
#automatically determined)
maxpix = -1 #Maximum pixel value (if
#minpix > maxpix, value is
#automatically determined)
overwrite = False #Overwrite pre-existing
#imagename
dropstokes = False #Drop the Stokes axis?
stokeslast = True #Put Stokes axis last in
#header?
history = True #Write history to the FITS
#image?
dropdeg = False #Drop all degenerate axes (e.g.
#Stokes and/or Frequency)?
Alert: The spectral axis of CASA images is nearly always in frequency rather than velocity. Velocities are computed only as a secondary mapping of the spectral channels with respect to a rest frequency. If velocity units are desired and the user sets velocity = True, exportfits will write the spectral axis in velocity units instead of in frequency units. The exportfits task will not output a FITS file with multiple spectral coordinate systems.
As a simple example of an exportfits command, the following will write the CASA image (‘ngc5921.clean.image’) as a FITS file (‘ngc5921.clean.fits’). In this case, the default parameter values will be adopted, so that the resulting FITS file will have the same axis order, number of pixels, etc. as the original CASA image.
exportfits(imagename='ngc5921.clean.image',fitsimage='ngc5921.clean.fits')
In some cases, the user may wish to use the dropstokes, stokeslast, and/or dropdeg parameters in order for the FITS image to be compatible with certain external applications. The dropdeg parameter will remove the frequency axis if it has a length of one channel, and/or it will drop the Stokes axis if that has a length of one (i.e. only one Stokes parameter is present). This would be useful, for example, for continuum data so that other programs will interpret it as a 2-D image rather than a cube.
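As a hedged sketch (image names hypothetical), a single-plane continuum image could be exported as a 2-D FITS file with:
exportfits(imagename='mycont.image', fitsimage='mycont.fits', dropdeg=True, overwrite=True)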
See exportfits in the Global Task List for examples in which these and other parameters are specified.
FITS Image Import (importfits)
The importfits task enables the user to import a FITS image into CASA image table format. It is not essential to generate a CASA image file if you intend simply to view the image, as the CASA viewer can read FITS images; however, we recommend importing to CASA image format for analyzing images with CASA. The inputs for importfits are:
#importfits :: Convert an image FITS file into a CASA image
fitsimage = '' #Name of input image FITS file
imagename = '' #Name of output CASA image
whichrep = 0 #If fits image has multiple
#coordinate reps, choose one.
whichhdu = 0 #If fits file contains
#multiple images, choose one.
zeroblanks = True #Set blanked pixels to zero (not NaN)
overwrite = False #Overwrite pre-existing imagename
defaultaxes = False #Add the default 4D
#coordinate axes where they are missing
defaultaxesvalues = [] #List of values to assign to
#added degenerate axes when
#defaultaxes=True (ra,dec,freq,stokes)
As a simple example, the following command would create a CASA image named ‘ngc5921.clean.image’ from the FITS file ‘ngc5921.clean.fits’:
importfits(fitsimage='ngc5921.clean.fits',imagename='ngc5921.clean.image')
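As a further sketch, a 2-D FITS image could be imported with default degenerate spectral and Stokes axes added; the coordinate values below are purely illustrative:
importfits(fitsimage='twodim.fits', imagename='twodim.im', defaultaxes=True,
           defaultaxesvalues=['19h30m00', '-02d30m00', '88.5GHz', 'I'])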
See importfits in the Global Task List for more complex examples.
Extracting data from an image (imval)
The imval task will extract the values of the data and mask from a specified region of an image and place them in the task return value as a Python dictionary. The inputs are:
#imval :: Get the data value(s) and/or mask value in an image.
imagename = '' #Name of the input image
region = '' #Image Region. Use viewer
box = '' #Select one or more box regions
chans = '' #Select the channel(spectral) range
stokes = '' #Stokes params to image (I,IV,IQU,IQUV)
Area selection using the box region is detailed in the Image Selection Parameters section. By default, box='' will extract the image information at the reference pixel on the direction axes. Plane selection is controlled by chans and stokes. By default, chans='' and stokes='' will extract the image information in all channels and Stokes planes. For instance,
xval = imval('myimage', box='144,144', stokes='I' )
will extract the Stokes I value or spectrum at pixel 144,144, while
xval = imval('myimage', box='134,134,154,154', stokes='I' )
will extract a 21 by 21 pixel region. Extractions are returned in NumPy arrays in the return value dictionary, plus some extra elements describing the axes and selection:
CASA <2>: xval = imval('ngc5921.demo.moments.integrated')
CASA <3>: xval
Out[3]:
{'axes': [[0, 'Right Ascension'],
[1, 'Declination'],
[3, 'Frequency'],
[2, 'Stokes']],
'blc': [128, 128, 0, 0],
'data': array([ 0.89667124]),
'mask': array([ True], dtype=bool),
'trc': [128, 128, 0, 0],
'unit': 'Jy/beam.km/s'}
extracts the reference pixel value in this 1-plane image. Note that the ‘data’ and ‘mask’ elements are NumPy arrays, not Python lists. To extract a spectrum from a cube:
CASA <8>: xval = imval('ngc5921.demo.clean.image',box='125,125')
CASA <9>: xval
Out[9]:
{'axes': [[0, 'Right Ascension'],
[1, 'Declination'],
[3, 'Frequency'],
[2, 'Stokes']],
'blc': [125, 125, 0, 0],
'data': array([ 8.45717848e-04, 1.93370355e-03, 1.53750915e-03,
2.88399984e-03, 2.38683447e-03, 2.89159478e-04,
3.16268904e-03, 9.93389636e-03, 1.88773088e-02,
3.01138610e-02, 3.14478502e-02, 4.03211266e-02,
3.82498614e-02, 3.06552909e-02, 2.80734301e-02,
1.72479432e-02, 1.20884273e-02, 6.13593217e-03,
9.04005766e-03, 1.71429547e-03, 5.22095338e-03,
2.49114982e-03, 5.30831399e-04, 4.80734324e-03,
1.19265869e-05, 1.29435991e-03, 3.75700940e-04,
2.34788167e-03, 2.72604497e-03, 1.78467855e-03,
9.74952069e-04, 2.24676146e-03, 1.82263291e-04,
1.98463408e-06, 2.02975096e-03, 9.65532148e-04,
1.68218743e-03, 2.92119570e-03, 1.29359076e-03,
-5.11484570e-04, 1.54162932e-03, 4.68662125e-04,
-8.50282842e-04, -7.91683051e-05, 2.95954203e-04,
-1.30133145e-03]),
'mask': array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True], dtype=bool),
'trc': [125, 125, 0, 45],
'unit': 'Jy/beam'}
To extract a region from the plane of a cube:
CASA <13>: xval = imval('ngc5921.demo.clean.image',box='126,128,130,129',chans='23')
CASA <14>: xval
Out[14]:
{'axes': [[0, 'Right Ascension'],
[1, 'Declination'],
[3, 'Frequency'],
[2, 'Stokes']],
'blc': [126, 128, 0, 23],
'data': array([[ 0.00938627, 0.01487772],
[ 0.00955847, 0.01688832],
[ 0.00696965, 0.01501907],
[ 0.00460964, 0.01220793],
[ 0.00358087, 0.00990202]]),
'mask': array([[ True, True],
[ True, True],
[ True, True],
[ True, True],
[ True, True]], dtype=bool),
'trc': [130, 129, 0, 23],
'unit': 'Jy/beam'}
CASA <15>: print(xval['data'][0][1])
0.0148777160794
In this example, a rectangular box was extracted, and you can see the order in the array and how to address specific elements.
Image Headers¶
As summarized in the CASA Images page, an image header contains information on the observation – e.g. the observing date, pointing position, object observed, etc. – and the resulting image – e.g. the restoring beam size, image intensity units, spatial coordinate system, spectral parameters, Stokes parameters, etc. Header metadata can also store notes on the observation and/or calibration and image processing. The header tells the user what is in the image and is used by the CASA viewer and other tasks to set the data array on the correct spatial and spectral coordinates, assign the intensity values correctly, and otherwise properly handle the data cube.
FITS image headers can be read in CASA using the listfits task, whereas CASA image headers can be read and edited using the imhead task. Additionally, the imhistory task can be used to view the history of the image, i.e. what operations or processes have been applied to it. These three tasks are described and demonstrated below.
List the Header of a FITS image
CASA can frequently read and write image FITS files directly. Nevertheless, it is advisable to convert the images to the CASA format first with importfits for some tasks and applications.
The task listfits can be used to display the Header Data Unit (HDU) of a FITS image. The input includes only the name of the FITS file, as follows:
#listfits :: List the HDU and typical data rows of a fits file:
fitsfile = '' #Name of input fits file
The logger will output the full FITS HDU. The example below shows the logger output for a Digital Sky Survey Image, which we have truncated somewhat due to the length of the output:
##########################################
#####Begin Task: listfits #####
listfits(fitsfile="dss.test.fits")
read fitsfile=dss.test.fits
field 29: DATE-OBS= '1998-11-24T11:83:00' /Observation: Date/Time is not a valid date/time.
Primary Array HDU ------>>>
field 156: DATAMIN = 2701 /GetImage: Minimum returned pixel value has wrong data type. Converted to type double.
field 157: DATAMAX = 22189 /GetImage: Maximum returned pixel value has wrong data type. Converted to type double.
SIMPLE = T /FITS: Compliance
BITPIX = 16 /FITS: I*2 Data
NAXIS = 2 /FITS: 2-D Image Data
NAXIS1 = 891 /FITS: X Dimension
NAXIS2 = 893 /FITS: Y Dimension
EXTEND = T /FITS: File can contain extensions
DATE = '2016-11-17' /FITS: Creation Date
ORIGIN = 'STScI/MAST' /GSSS: STScI Digitized Sky Survey
SURVEY = 'POSSII-F' /GSSS: Sky Survey
REGION = 'XP061 ' /GSSS: Region Name
PLATEID = 'A2U4 ' /GSSS: Plate ID
SCANNUM = '01 ' /GSSS: Scan Number
DSCNDNUM= '00 ' /GSSS: Descendant Number
TELESCID= 3 /GSSS: Telescope ID
BANDPASS= 35 /GSSS: Bandpass Code
COPYRGHT= 'Caltech/Palomar' /GSSS: Copyright Holder
SITELAT = 33.356 /Observatory: Latitude
SITELONG= 116.863 /Observatory: Longitude
TELESCOP= 'Oschin Schmidt - D' /Observatory: Telescope
INSTRUME= 'Photographic Plate' /Detector: Photographic Plate
EMULSION= 'IIIaF ' /Detector: Emulsion
FILTER = 'RG610 ' /Detector: Filter
PLTSCALE= 67.2 /Detector: Plate Scale arcsec per mm
PLTSIZEX= 355 /Detector: Plate X Dimension mm
PLTSIZEY= 355 /Detector: Plate Y Dimension mm
PLATERA = 144.055 /Observation: Field centre RA degrees
PLATEDEC= 69.812 /Observation: Field centre Dec degrees
PLTLABEL= 'SF07740 ' /Observation: Plate Label
DATE-OBS= '1998-11-24T11:83:00' /Observation: Date/Time
EXPOSURE= 50 /Observation: Exposure Minutes
PLTGRADE= 'A ' /Observation: Plate Grade
OBSHA = 1.28333 /Observation: Hour Angle
OBSZD = 37.9539 /Observation: Zenith Distance
AIRMASS = 1.26743 /Observation: Airmass
REFBETA = 61.7761 /Observation: Refraction Coeff
REFBETAP= -0.082 /Observation: Refraction Coeff
REFK1 = -48616.4 /Observation: Refraction Coeff
REFK2 = -148442 /Observation: Refraction Coeff
CNPIX1 = 4993 /Scan: X Corner
CNPIX2 = 10823 /Scan: Y Corner
XPIXELS = 23040 /Scan: X Dimension
YPIXELS = 23040 /Scan: Y Dimension
XPIXELSZ= 15.0295 /Scan: Pixel Size microns
YPIXELSZ= 15 /Scan: Pixel Size microns
ASTRMASK= 'xp.mask ' /Astrometry: GSC2 Mask
WCSAXES = 2 /GetImage: Number WCS axes
WCSNAME = 'DSS ' /GetImage: Local WCS approximation from full plat
RADESYS = 'ICRS ' /GetImage: GSC-II calibration using ICRS system
CTYPE1 = 'RA---TAN' /GetImage: RA-Gnomic projection
CRPIX1 = 446 /GetImage: X reference pixel
CRVAL1 = 148.97 /GetImage: RA of reference pixel
CUNIT1 = 'deg ' /GetImage: degrees
CTYPE2 = 'DEC--TAN' /GetImage: Dec-Gnomic projection
CRPIX2 = 447 /GetImage: Y reference pixel
CRVAL2 = 69.6795 /GetImage: Dec of reference pixel
CUNIT2 = 'deg ' /Getimage: degrees
CD1_1 = -0.000279458 /GetImage: rotation matrix coefficient
CD1_2 = 2.15165e-05 /GetImage: rotation matrix coefficient
CD2_1 = 2.14552e-05 /GetImage: rotation matrix coefficient
CD2_2 = 0.00027889 /GetImage: rotation matrix coefficient
OBJECT = 'data ' /GetImage: Requested Object Name
DATAMIN = 2701 /GetImage: Minimum returned pixel value
DATAMAX = 22189 /GetImage: Maximum returned pixel value
OBJCTRA = '09 55 52.730' /GetImage: Requested Right Ascension (J2000)
OBJCTDEC= '+69 40 45.80' /GetImage: Requested Declination (J2000)
OBJCTX = 5438.47 /GetImage: Requested X on plate (pixels)
OBJCTY = 11269.3 /GetImage: Requested Y on plate (pixels)
END (0,0) = 4058 (0,1) = 4058
Reading and Manipulating CASA Image Headers
CASA image headers can be accessed and edited with the imhead task. The imagename and mode are the two primary parameters in the imhead task. The imhead task can be run with mode=’summary’, ‘list’, ‘get’, ‘put’, ‘add’, ‘del’, or ‘history’, and setting the mode opens up mode-specific sub-parameters. Many of these modes are described below and further documented in the imhead page of the Global Task List.
The default mode is mode=’summary’, which prints a summary of the image properties to the logger and terminal, and returns a dictionary containing header information. With mode=’summary’, imhead has the following inputs:
#imhead :: List, get and put image header parameters
imagename = '' #Name of the input image
mode = 'summary' #imhead options: add, del,
#get, history, list, put, summary
verbose = False #Give a full listing of
#beams or just a short summary?
#Only used when the image has multiple beams
#and mode='summary'.
Note that to capture the dictionary, it must be assigned as a Python variable, e.g. by running:
header_summary = imhead('ngc5921.demo.cleanimg.image',mode='summary')
Setting mode=’list’ prints all header keywords and values to the logger and terminal, and returns a dictionary containing the keywords and values. This mode does not have any sub-parameters.
The mode=’get’ setting allows the user to retrieve the value for a specified keyword hdkey:
#imhead :: List, get and put image header parameters
imagename = '' #Name of the input image
mode = 'get' #imhead options: list, summary, get, put
hdkey = '' #The FITS keyword
The mode=’put’ setting allows the user to replace the current value for a given keyword hdkey with that specified in hdvalue. There are two sub-parameters that are opened by this option:
#imhead :: List, get and put image header parameters
imagename = '' #Name of the input image
mode = 'put' #imhead options: list, summary, get, put
hdkey = '' #The FITS keyword
hdvalue = '' #Value of hdkey
Alert: Be careful when using mode=’put’. This task does not check whether the values you specify (e.g. for the axes types) are valid, and you can render your image invalid. Make sure you know what you are doing when using this option!
Examples for imhead
In the command below, we print the header summary to the logger:
CASA <51>: imhead('ngc5921.demo.cleanimg.image',mode='summary')
The logger output is the following:
#####Begin Task: imhead #####
Image name : ngc5921.demo.cleanimg.image
Object name : N5921_2
Image type : PagedImage
Image quantity : Intensity
Pixel mask(s) : None
Region(s) : None
Image units : Jy/beam
Restoring Beam : 52.3782 arcsec, 45.7319 arcsec, -165.572 deg
Direction reference : J2000
Spectral reference : LSRK
Velocity type : RADIO
Rest frequency : 1.42041e+09 Hz
Pointing center : 15:22:00.000000 +05.04.00.000000
Telescope : VLA
Observer : TEST
Date observation : 1995/04/13/00:00:00
Telescope position: [-1.60119e+06m, -5.04198e+06m, 3.55488e+06m] (ITRF)
Axis Coord Type Name Proj Shape Tile Coord value at pixel Coord incr Units
------------------------------------------------------------------------------------------------
0 0 Direction Right Ascension SIN 256 64 15:22:00.000 128.00 -1.500000e+01 arcsec
1 0 Direction Declination SIN 256 64 +05.04.00.000 128.00 1.500000e+01 arcsec
2 1 Stokes Stokes 1 1 I
3 2 Spectral Frequency 46 8 1.41279e+09 0.00 2.4414062e+04 Hz
Velocity 1607.99 0.00 -5.152860e+00 km/s
#####End Task: imhead
If the beam size per plane differs (for example, in a spectral data cube), the beam information will be displayed for the channel with the largest beam (i.e. the lowest frequency channel), the channel with the smallest beam (i.e. the highest frequency channel), and the channel closest to the median beam size. If you set verbose=True, the beam information will be provided for each spectral channel (or each plane of the image). Running imhead with mode='summary' and verbose=False for a spectral data cube would print information on the restoring beams as follows:
Restoring Beams
Pol Type Chan Freq Vel
I Max 0 9.680e+08 0 39.59 arcsec x 22.77 arcsec pa=-70.57 deg
I Min 511 1.990e+09 -316516 20.36 arcsec x 12.05 arcsec pa=-65.67 deg
I Median 255 1.478e+09 -157949 27.11 arcsec x 15.54 arcsec pa=-70.36 deg
Setting mode=’list’ prints all header keywords and values to the logger and terminal, and returns a dictionary containing the keywords and values. In the following, we capture the resulting dictionary in the variable hlist, and print the variable.
CASA <52>: hlist = imhead('ngc5921.demo.cleanimg.image',mode='list')
CASA <53>: hlist
Out[53]:
{'beammajor': 52.378242492675781,
'beamminor': 45.731891632080078,
'beampa': -165.5721435546875,
'bunit': 'Jy/beam',
'cdelt1': '-7.27220521664e-05',
'cdelt2': '7.27220521664e-05',
'cdelt3': '1.0',
'cdelt4': '24414.0625',
'crpix1': 128.0,
'crpix2': 128.0,
'crpix3': 0.0,
'crpix4': 0.0,
'crval1': '4.02298392585',
'crval2': '0.0884300154344',
'crval3': 'I',
'crval4': '1412787144.08',
'ctype1': 'Right Ascension',
'ctype2': 'Declination',
'ctype3': 'Stokes',
'ctype4': 'Frequency',
'cunit1': 'rad',
'cunit2': 'rad',
'cunit3': '',
'cunit4': 'Hz',
'datamax': ' Not Known ',
'datamin': -0.010392956435680389,
'date-obs': '1995/04/13/00:00:00',
'equinox': 'J2000',
'imtype': 'Intensity',
'masks': ' Not Known ',
'maxpixpos': array([134, 134, 0, 38], dtype=int32),
'maxpos': '15:21:53.976, +05.05.29.998, I, 1.41371e+09Hz',
'minpixpos': array([117, 0, 0, 21], dtype=int32),
'minpos': '15:22:11.035, +04.31.59.966, I, 1.4133e+09Hz',
'object': 'N5921_2',
'observer': 'TEST',
'projection': 'SIN',
'reffreqtype': 'LSRK',
'restfreq': [1420405752.0],
'telescope': 'VLA'}
The values for these keywords can be queried using mode=’get’. In the following examples, we capture the return value:
CASA <53>: mybmaj = imhead('ngc5921.demo.cleanimg.image',mode='get',hdkey='beammajor')
CASA <54>: mybmaj
Out[54]: {'unit': 'arcsec', 'value': 52.378242492699997}
CASA <55>: myobserver = imhead('ngc5921.demo.cleanimg.image',mode='get',hdkey='observer')
CASA <56>: print(myobserver)
{'value': 'TEST', 'unit': ''}
You can set the values for keywords using mode=’put’. For example:
CASA <57>: imhead('ngc5921.demo.cleanimg.image',mode='put',hdkey='observer',hdvalue='CASA')
Out[57]: 'CASA'
CASA <58>: imhead('ngc5921.demo.cleanimg.image',mode='get',hdkey='observer')
Out[58]: {'unit': '', 'value': 'CASA'}
Image History (imhistory)
Image headers contain records of the operations applied to them, as CASA tasks append the image header with a record of what they did. This information can be retrieved via the imhistory task, and new messages can be appended using the imhistory task as well. The primary inputs are imagename and mode, with sub-parameters arising from the selected mode. To view the history of the image, the inputs are:
#imhistory :: Retrieve and modify image history
imagename = '' #Name of the input image
mode = 'list' #Mode to run in, 'list' to
#retrieve history,'append'
#to append a record to history.
verbose = True #Write history to logger if
#mode='list'?
With verbose=True (default) the image history is also reported in the CASA logger. The imhistory task returns the messages in a Python list that can be captured by a variable, e.g.
myhistory=imhistory('image.im')
It is also possible to add messages to the image headers via mode='append'. See the imhistory page in the Global Task List for an example of appending messages to the image history.
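As a minimal sketch of the append mode (the message text and origin are illustrative):
imhistory(imagename='image.im', mode='append',
          message='continuum subtracted with imcontsub', origin='myscript')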
Reformat Images¶
This section contains a description of various tasks that reformat images. These include:
imsubimage: enables the user to extract a sub-image from a larger cube,
imtrans: changes the axis order in an image,
imregrid: sets the image onto a different spatial coordinate system or spectral grid,
imreframe: changes the velocity system of an image,
imrebin: rebins an image in a spatial or spectral dimension,
imcollapse: collapses an image along an axis.
Extracting sub-images
The task imsubimage provides a way to extract a smaller data cube from a bigger one. The inputs are:
#imsubimage :: Create a (sub)image from a region of the image
imagename = '' #Input image name. Default is unset.
outfile = '' #Output image name. Default is unset.
box = '' #Rectangular region to select in
#direction plane. Default is to use the
#entire direction plane.
region = '' #Region selection. Default is to use the
#full image.
chans = '' #Channels to use. Default is to use all
#channels.
stokes = '' #Stokes planes to use. Default is to use
#all Stokes planes.
mask = '' #Mask to use. Default is none.
dropdeg = True #Drop degenerate axes
[ keepaxes = [] #If dropdeg=True, these are the]
#degenerate axes to keep. Nondegenerate
#axes are implicitly always kept.
verbose = True #Post additional informative messages to
#the logger
The region keyword defines the size of the smaller cube and is specified via the CASA region CRTF syntax. E.g.
region='box [ [ 100pix , 130pix] , [120pix, 150pix ] ]'
will extract the portion of the image that is between pixel coordinates (100,130) and (120,150). dropdeg=T with selection via keepaxes is useful to remove axes in the data cube that are degenerate, i.e. axes with a single plane only. A single Stokes I axis is a common example.
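For example, a hedged sketch extracting a spatial and spectral subcube (file names hypothetical):
imsubimage(imagename='bigcube.im', outfile='subcube.im',
           box='100,130,120,150', chans='5~25', dropdeg=True)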
Reordering the Axes of an Image Cube
Sometimes data cubes can be in axis orders that are not adequate for processing. The CASA task imtrans can change the ordering of the axes:
#imtrans :: Reorder image axes
imagename = '' #Name of the input image
outfile = '' #Name of output CASA image.
order = '' #New zero-based axes order.
wantreturn = True #Return an image tool referencing the
#transposed image
The order parameter is the most important input here. It is a string of numbers that shows how axes 0, 1, 2, 3, … are mapped onto the new cube (note that the first axis has the label 0, as is typical in python). E.g. order='1032' will reorder the input axis 0 to be axis 1 in the output, input axis 1 to be output axis 0, input axis 2 to output axis 3 (the last axis), and input axis 3 to output axis 2. Alternatively, axes can be specified by their names. E.g., to reorder an image with right ascension, declination, and frequency and swap the first two, order=['declination', 'right ascension', 'frequency'] will work. The axes names can be found by typing ia.coordsys().names(). Minimum match is supported, so that order=['d', 'r', 'f'] will produce the same results. Axes can simultaneously be transposed and reversed. To reverse an axis, precede it by a '-'. For example, order='-10-32' will reverse the direction of the first and third axis of the input image (the zeroth and second axes in the output image). Example (swap the stokes and spectral axes in an RA-Dec-Stokes-Frequency image):
imagename = 'myim.im'
outfile = 'outim.im'
order = '0132'
imtrans()
or
outfile = 'myim_2.im'
order = 132
imtrans()
or
outfile = 'myim_3.im'
order = ['r', 'd', 'f', 's']
imtrans()
or
outfile = 'myim_4.im'
order = ['rig', 'declin', 'frequ', 'stok']
imtrans()
If the outfile parameter is empty, only a temporary image is created; no output image is written to disk. The temporary image can be captured in the returned value (assuming wantreturn=True).
Regridding an Image (imregrid)
Inside the Toolkit: More complex coordinate system and image regridding operations can be carried out in the toolkit. The coordsys (cs) tool and the ia.regrid method are the relevant components.
It is occasionally necessary to regrid an image onto a new coordinate system. The imregrid task will regrid one image onto the coordinate system of another, creating an output image. In this task, the user need only specify the names of the input, template, and output images. The default inputs are:
#imregrid :: regrid an image onto a template image
imagename = '' #Name of the source image
template = 'get' #A dictionary, refcode, or name of an
#image that provides the output shape
#and coordinate system
output = '' #Name for the regridded image
asvelocity = True #Regrid spectral axis in velocity space
#rather than frequency space?
axes = [-1] #The pixel axes to regrid. -1 => all.
interpolation = 'linear' #The interpolation method. One of
#'nearest', 'linear', 'cubic'.
decimate = 10 #Decimation factor for coordinate grid
#computation
replicate = False #Replicate image rather than regrid?
overwrite = False #Overwrite (unprompted) pre-existing
#output file?
The output image will have the data in imagename regridded onto the coordinate system provided by the template parameter. The template parameter accepts a range of inputs that define the grid of the output image (a few sketches follow the list):
a template image: specify an image name here and the input will be regridded to the same 3-dimensional coordinate system as the one in template. Values are filled in as blanks if they do not exist in the input. Note that the input and template images must have the same coordinate structure to begin with (like 3 or 4 axes, with the same ordering)
a coordinate system (reference code): to convert from one coordinate frame to another, e.g. from B1950 to J2000, the template parameter can be used to specify the output coordinate system. The following values are supported: 'J2000', 'B1950', 'B1950_VLA', 'GALACTIC', 'HADEC', 'AZEL', 'AZELSW', 'AZELNE', 'ECLIPTIC', 'MECLIPTIC', 'TECLIPTIC', 'SUPERGAL'
‘get’: This option returns a python dictionary in the {‘csys’: csys_record, ‘shap’: shape} format
a python dictionary: In turn, such a dictionary can be used as a template to define the final grid
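A few hedged sketches of these template options (image names hypothetical):
imregrid(imagename='input.im', template='reference.im', output='input.regrid.im')  #match another image
imregrid(imagename='b1950.im', template='J2000', output='j2000.im')                #change reference frame
mytemplate = imregrid(imagename='reference.im', template='get')                    #{'csys':..., 'shap':...}
imregrid(imagename='input.im', template=mytemplate, output='input.regrid2.im')     #reuse the dictionary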
Redefining the Velocity System of an Image
imreframe can be used to change the velocity system of an image. It does not apply a regridding (as a change from the radio to the optical velocity convention would require), but changes the labeling of the spectral axis.
#imreframe :: Change the frame in which the image reports its spectral values
imagename = '' #Name of the input image
output = '' #Name of the output image; '' => modify input image
outframe = 'lsrk' #Spectral frame in which the frequency or velocity
#values will be reported by default
restfreq = '' #restfrequency to use for velocity values (e.g.
#'1.420GHz' for the HI line)
outframe defines the velocity frame (LSRK, BARY, etc.) of the output image, and a rest frequency should be specified to relabel the spectral axis in new velocity units.
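For example, a minimal sketch (file names hypothetical):
imreframe(imagename='mycube.im', output='mycube.bary.im', outframe='bary', restfreq='1.420GHz')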
Rebin an Image
The task imrebin allows one to rebin an image in any spatial or spectral direction:
#imrebin :: Rebin an image by the specified integer factors
imagename = '' #Name of the input image
outfile = '' #Output image name.
factor = [] #Binning factors for each axis. Use
#imhead or ia.summary to determine axis
#ordering.
region = '' #Region selection. Default is to use the full
#image.
box = '' #Rectangular region to select in
#direction plane. Default is to use the entire
#direction plane.
chans = '' #Channels to use. Default is to use all
#channels.
stokes = '' #Stokes planes to use. Default is to
#use all Stokes planes. Stokes planes
#cannot be rebinned.
mask = '' #Mask to use. Default is none.
dropdeg = False #Drop degenerate axes?
crop = True #Remove pixels from the end of an axis to
#be rebinned if there are not enough to
#form an integral bin?
where factor is a list of integers that provides the numbers of pixels to be binned for each axis. The crop parameter controls how pixels at the boundaries are treated if the image dimensions are not integer multiples of the bin values. Example:
imrebin(imagename='my.im', outfile='rebinned.im', factor=[1,2,1,4], crop=T)
This leaves RA untouched, bins DEC by a factor of 2, leaves Stokes as is, and bins the spectral axis by a factor of 4. If the input image has a spectral axis with a length that is not a multiple of 4, the crop=T setting will discard the remaining 1-3 edge pixels.
Collapsing an Image Along an Axis
imcollapse allows one to apply an aggregation function along one or more axes of an image. Supported functions are 'max', 'mean', 'median', 'min', 'rms', 'stdev', 'sum', and 'variance' (minimum match supported). The relevant axes will then collapse to a single value or plane (i.e. they will result in a degenerate axis). The functions are specified in the function parameter of the imcollapse inputs:
#imcollapse :: Collapse image along one axis, aggregating pixel values along that axis.
imagename = '' #Name of the input image
function = '' #Function used to compute aggregation
#of pixel values.
axes = [0] #Zero-based axis number(s) or minimal
#match strings to collapse.
outfile = '' #Name of output CASA image.
box = '' #Optional direction plane box ('blcx,
#blcy, trcx trcy').
region = '' #Name of optional region file to use.
chans = '' #Optional zero-based contiguous
#frequency channel specification.
stokes = '' #Optional contiguous stokes planes
#specification.
mask = '' #Optional mask to use.
wantreturn = True #Should an image analysis tool
#referencing the collapsed image be
#returned?
wantreturn=True returns an image analysis tool containing the newly created collapsed image. Example: myimage.im is a 512x512x128x4 (ra, dec, freq, stokes; i.e. in the 0-based system, frequency is labeled as axis 2) image, and we want to collapse a subimage of it along its spectral axis, avoiding the 8 edge channels at each end of the band and computing the mean value of the pixels (the resulting image is 257x257x1x4 in size):
imcollapse(imagename='myimage.im', outfile='collapse_spec_mean.im',
function='mean', axes=2, box='127,127,383,383', chans='8~119')
Note that imcollapse will not smooth to a common beam for all planes if they differ. If this is desired, run imsmooth before imcollapse.
Spectral Analysis¶
Continuum Subtraction on an Image Cube (imcontsub)
One method to separate line and continuum emission in an image cube is to specify a number of line-free channels in that cube, make a polynomial fit to the pixel values along the spectral axis in those channels, and subtract the fit from the whole cube. Note that the task uvcontsub serves a similar purpose, but the subtraction is performed in visibility space (see UV Continuum Subtraction). The imcontsub task will perform a polynomial baseline fit to the specified channels from an image cube and subtract it from all channels. The default inputs are:
#imcontsub :: Continuum subtraction on images
imagename = '' #Name of the input image
linefile = '' #Output line image file name
contfile = '' #Output continuum image file name
fitorder = 0 #Polynomial order for the continuum estimation
region = '' #Image region or name to process see viewer
box = '' #Select one or more box regions
chans = '' #Select the channel(spectral) range
stokes = '' #Stokes params to image (I,IV,IQU,IQUV)
ALERT: imcontsub has issues when the image does not contain a spectral or stokes axis. Errors are generated when run on an image missing one or both of these axes. You will need to use the toolkit (e.g. the ia.adddegaxes method) to add degenerate missing axes to the image.
Examples for imcontsub
For example, we first make a clean image from data in which no uv-plane continuum subtraction has been performed:
#Now clean, keeping all the channels except first and last
default('clean')
vis = 'ngc5921.demo.src.split.ms'
imagename = 'ngc5921.demo.nouvcontsub'
mode = 'channel'
nchan = 61
start = 1
width = 1
imsize = [256,256]
psfmode = 'clark'
imagermode = ''
cell = [15.,15.]
niter = 6000
threshold='8.0mJy'
weighting = 'briggs'
robust = 0.5
mask = [108,108,148,148]
interactive=False
clean()
#It will have made the image:
#-----------------------------
#ngc5921.demo.nouvcontsub.image
#You can view this image
imview('ngc5921.demo.nouvcontsub.image')
Channels 0 through 4 and 50 through 60 are line-free. Continuum subtraction is then performed with:
default('imcontsub')
imagename = 'ngc5921.demo.nouvcontsub.image'
linefile = 'ngc5921.demo.nouvcontsub.lineimage'
contfile = 'ngc5921.demo.nouvcontsub.contimage'
fitorder = 1
chans = '0~4,50~60'
stokes = 'I'
imcontsub()
Computing the Moments of an Image Cube
For spectral line datasets, the output of the imaging process is an image cube, with a frequency or velocity channel axis in addition to the two sky coordinate axes. This can be most easily thought of as a series of image planes stacked along the spectral dimension. A useful product to compute is to collapse the cube into a moment image by taking a linear combination of the individual planes:
\(M_m(x_i,y_i) = \sum_k^N w_m(x_i,y_i,v_k)\,I(x_i,y_i,v_k)\)
for pixel i and channel k in the cube I. There are a number of choices to form the moment-m, usually approximating some polynomial expansion of the intensity distribution over velocity: mean or sum, gradient, dispersion, skew, kurtosis, etc. There are other possibilities (other than a weighted sum) for calculating the image, such as median filtering, finding minima or maxima along the spectral axis, or absolute mean deviations. And the axis along which to do these calculations need not be the spectral axis (e.g. one can compute moments along Dec for an RA-Velocity image). We will treat all of these as generalized instances of a "moment" map. The immoments task will compute basic moment images from a cube. The default inputs are:
#immoments :: Compute moments of an image cube:
imagename = '' #Input image name
moments = [0] #List of moments you would like to compute
axis = 'spectral' #The moment axis: ra, dec, lat, long, spectral, or stokes
region = '' #Image Region. Use viewer
box = '' #Select one or more box regions
chans = '' #Select the channel(spectral) range
stokes = '' #Stokes params to image (I,IV,IQU,IQUV)
mask = '' #mask used for selecting the area of the
#image to calculate the moments on
includepix = -1 #Range of pixel values to include
excludepix = -1 #Range of pixel values to exclude
outfile = '' #Output image file name (or root for multiple moments)
This task will operate on the input file given by imagename and produce a new image or set of images based on the name given in outfile. The moments parameter chooses which moments are calculated. The choices for the operation mode are:
moments=-1 - mean value of the spectrum
moments=0 - integrated value of the spectrum
moments=1 - intensity weighted coordinate; traditionally used to get ‘velocity fields’
moments=2 - intensity weighted dispersion of the coordinate; traditionally used to get ‘velocity dispersion’
moments=3 - median of I
moments=4 - median coordinate
moments=5 - standard deviation about the mean of the spectrum
moments=6 - root mean square of the spectrum
moments=7 - absolute mean deviation of the spectrum
moments=8 - maximum value of the spectrum
moments=9 - coordinate of the maximum value of the spectrum
moments=10 - minimum value of the spectrum
moments=11 - coordinate of the minimum value of the spectrum
The meaning of these is described in the CASA Toolkit Manual, which describes the associated image.moments tool. The axis parameter sets the axis along which the moment is "collapsed" or calculated. Choices are: 'ra', 'dec', 'lat', 'long', 'spectral', or 'stokes'. A standard moment-0 or moment-1 image of a spectral cube would use the default choice 'spectral'. One could make a position-velocity map by setting 'ra' or 'dec'. The includepix and excludepix parameters are used to set ranges for the inclusion and exclusion of pixels based on values. For example, includepix=[0.05,100.0] will include pixels with values from 50 mJy to 100 Jy, and excludepix=[100.0,1000.0] will exclude pixels with values from 100 to 1000 Jy. If a single moment is chosen, the outfile specifies the exact name of the output image. If multiple moments are chosen, then outfile will be used as the root of the output filenames, which will get different suffixes for each moment. For image cubes that contain different beam sizes for each plane, immoments will smooth all planes to the largest beam size first, then collapse to the desired moment.
Hints for using immoments
In order to make an unbiased moment-0 image, do not put in any thresholding using includepix or excludepix. This is so that the (presumably) zero-mean noise fluctuations in off-line parts of the image cube will cancel out. If your image has large biases, like a pronounced clean bowl due to missing large-scale flux, then your moment-0 image will be biased as well. It will be difficult to alleviate this with a threshold, but you can try.
To make a usable moment-1 (or higher) image, on the other hand, it is critical to set a reasonable threshold to exclude noise from being added to the moment maps. Something like a few times the rms noise level in the usable planes seems to work (put into includepix or excludepix as needed). Also use chans to ignore channels with bad data.
Examples using immoments
default('immoments')
imagename = 'ngc5921.demo.cleanimg'
#Do first and second spectral moments
axis = 'spectral'
chans = ''
moments = [0,1]
#Need to mask out noisy pixels, currently done
#using hard global limits
excludepix = [-100,0.009]
outfile = 'ngc5921.demo.moments'
immoments()
#It will have made the images:
#--------------------------------------
#ngc5921.demo.moments.integrated
#ngc5921.demo.moments.weighted_coord
Other examples of NGC2403 (a moment-0 image of a VLA line dataset) and NGC4826 (a moment-1 image of a BIMA CO line dataset) are shown in the Figure below
NGC2403 VLA moment zero (left) and NGC4826 BIMA moment one (right) images as shown in the viewer.
Generating Position-Velocity Diagrams (impv)
CASA can generate position-velocity (PV) diagrams via the task impv:
#impv :: Construct a position-velocity image by choosing two points in the direction plane.
imagename = '' #Name of the input image
outfile = '' #Output image name. If empty, no image is written.
mode = 'coords' #If 'coords', use start and end values. If 'length', use
#center, length, and pa values.
width = 1 #Width of slice for averaging pixels perpendicular to the
#slice. Must be an odd positive integer or valid
#quantity. See help for details.
unit = 'arcsec' #Unit for the offset axis in the resulting image. Must be
#a unit of angular measure.
chans = '' #Channels to use.
#Channels must be contiguous. Default is to use all
#channels.
region = '' #Region selection. Default is entire image. No selection
#is permitted in the direction plane.
stokes = 'I' #Stokes planes to use. Planes must be contiguous. Default
#is to use all stokes.
mask = [] #Mask to use. Default is none.
stretch = False #Stretch the mask if necessary and possible? Default False
PV diagrams are generated by "slicing" a datacube through the RA/DEC planes. The "slit" can be defined either by start/end coordinates or by a length, center coordinate, and position angle. Averaged over the width of the "slit", the image cube values are then stored in a new image with position and velocity as the two axes. The slit position is specified by a start and end pixel in the RA/DEC plane of the data cube. An angular unit can be set to define what is stored in the resulting PV image.
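As a hedged sketch with mode='coords' (the pixel coordinates and file names are illustrative):
impv(imagename='mycube.im', outfile='mycube.pv.im', mode='coords',
     start=[10,15], end=[200,300], width=3)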
1-dimensional Smoothing (specsmooth)
To increase the signal-to-noise ratio of data cubes, one can smooth the data along one dimension, typically the spectral axis. Hanning and Boxcar smoothing kernels are available in the task specsmooth:
#specsmooth :: Smooth an image region in one dimension
imagename = '' #Name of the input image
outfile = '' #Output image name.
region = '' #Region selection. Default is to use the full
#image.
box = '' #Rectangular region to select in
#direction plane. Default is to use the entire
#direction plane.
mask = '' #Mask to use. Default is none.
axis = -1 #The profile axis. Default: use the
#spectral axis if one exists, axis 0
#otherwise (<0).
function = 'hanning' #Convolution function. hanning and boxcar
#are supported functions. Minimum match
#is supported.
dmethod = 'copy' #Decimation method. '' means no
#decimation, 'copy' and 'mean' are also
#supported (minimum match).
The parameter dmethod='copy' allows one to keep only every nth channel if the smoothing kernel has a width of n. Leaving this parameter empty will return a cube of the same size as the input, and setting it to 'mean' will average planes using the kernel width.
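For example, a minimal sketch (file names hypothetical):
specsmooth(imagename='mycube.im', outfile='mycube.hanning.im',
           function='hanning', dmethod='')  #keep the full number of channels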
Spectral Line fitting
specfit is a powerful task to perform spectral line fits in data cubes. Three types of fitting functions are currently supported: polynomials, Gaussians, and Lorentzians. specfit can fit these functions in two ways: over data that were averaged across a region (multifit=False) or on a pixel by pixel basis (multifit=True).
#specfit :: Fit 1-dimensional Gaussians and/or polynomial models to an image or image region
imagename = '' #Name of the input image
box = '' #Rectangular box in direction coordinate
#blc, trc. Default: entire image ('').
region = '' #Region of interest. Default: Do
#not use a region.
chans = '' #Channels to use. Channels must be
#contiguous. Default: all channels ('').
stokes = '' #Stokes planes to use. Planes must be
#contiguous. Default: all stokes ('').
axis = -1 #The profile axis. Default: use the
#spectral axis if one exists, axis 0
#otherwise (<0).
mask = '' #Mask to use. Default is
#none.
poly = -1 #Order of polynomial element. Default: do
#not fit a polynomial (<0).
estimates = '' #Name of file containing initial estimates.
#Default: No initial estimates ('').
ngauss = 1 #Number of Gaussian elements. Default: 1.
pampest = '' #Initial estimate of PCF profile (gaussian
#or lorentzian) amplitudes.
pcenterest = '' #Initial estimate PCF profile centers, in
#pixels.
pfwhmest = '' #Initial estimate PCF profile FWHMs, in
#pixels.
pfix = '' #PCF profile parameters to fix during fit.
pfunc = '' #PCF singlet functions to fit. 'gaussian'
#or 'lorentzian' (minimal match
#supported). Unspecified means all
#gaussians.
minpts = 0 #Minimum number of unmasked points
#necessary to attempt fit.
multifit = True #If true, fit a profile along the desired
#axis at each pixel in the specified
#region. If false, average the non-fit
#axis pixels and do a single fit to that
#average profile. Default False.
amp = '' #Name of amplitude solution image. Default:
#do not write the image ('').
amperr = '' #Name of amplitude solution error image.
#Default: do not write the image ('').
center = '' #Name of center solution image. Default: do
#not write the image ('').
centererr = '' #Name of center solution error image.
#Default: do not write the image ('').
fwhm = '' #Name of fwhm solution image. Default: do
#not write the image ('').
fwhmerr = '' #Name of fwhm solution error image.
#Default: do not write the image ('').
integral = '' #Prefix of name of integral solution image.
#Name of image will have gaussian
#component number appended. Default: do
#not write the image ('').
integralerr = '' #Prefix of name of integral error solution
#image. Name of image will have gaussian
#component number appended. Default: do
#not write the image ('').
model = '' #Name of model image. Default: do not write
#the model image ('').
residual = '' #Name of residual image. Default: do not
#write the residual image ('').
wantreturn = True #Should a record summarizing the results be
#returned?
logresults = True #Output results to logger?
gmncomps = 0 #Number of components in each gaussian
#multiplet to fit
gmampcon = '' #The amplitude ratio constraints for non-
#reference components to reference
#component in gaussian multiplets.
gmcentercon = '' #The center offset constraints (in pixels)
#for non-reference components to reference
#component in gaussian multiplets.
gmfwhmcon = '' #The FWHM ratio constraints for non-
#reference components to reference
#component in gaussian multiplets.
gmampest = [0.0] #Initial estimate of individual gaussian
#amplitudes in gaussian multiplets.
gmcenterest = [0.0] #Initial estimate of individual gaussian
#centers in gaussian multiplets, in
#pixels.
gmfwhmest = [0.0] #Initial estimate of individual gaussian
#FWHMss in gaussian multiplets, in pixels.
gmfix = '' #Parameters of individual gaussians in
#gaussian multiplets to fix during fit.
logfile = '' #File in which to log results. Default is
#not to write a logfile.
goodamprange = [0.0] #Acceptable amplitude solution range. [0.0]
#=> all amplitude solutions are
#acceptable.
goodcenterrange = [0.0] #Acceptable center solution range in pixels
#relative to region start. [0.0] => all
#center solutions are acceptable.
goodfwhmrange = [0.0] #Acceptable FWHM solution range in pixels.
#[0.0] => all FWHM solutions are
#acceptable.
sigma = '' #Standard deviation array or image name.
Polynomial Fits
Polynomials can be fit by specifying the polynomial order in poly. Negative orders will not fit any polynomials.
Lorentzian and Gaussian Fits
Gaussian and Lorentzian fits are very similar; both require an amplitude, a center, and a FWHM to be fully specified. All of the following discussion is thus valid for both functions. The parameter pfunc controls whether Gaussian or Lorentzian functions are to be used. The default is all Gaussians. pfunc=['L', 'G', 'G', 'L'] would use Lorentzian, Gaussian, Gaussian, and Lorentzian components in the order they appear in the estimates file (see below).
One or more single Gaussian/Lorentzian
For Gaussian and Lorentzian fits, the task will allow multiple components and specfit will try to find the best solution. The parameter space, however, is usually not uniform and to avoid local minima in the goodness-of-fit space, one can provide initial start values for the fits. This can be done either through the parameters pampest, pcenterest, and pfwhmest for the amplitudes, center, and FWHM estimates in image coordinates. pfix can take parameters that specify fixed fit values. Any combination of the characters ‘p’ (peak), ‘c’ (center), and ‘f’ (fwhm) are permitted, e.g. ‘fc’ will hold the fwhm and the center constant during the fit. Fixed parameters will have no errors associated with them in the solution. Alternatively, a file with initial values can be supplied by the estimates parameter (one Gaussian/Lorentzian parameter set per line). The file has the following format:
[peak intensity], [center], [fwhm], [optional fixed parameter string]
The first three values are required and must be numerical values. The peak intensity must be expressed in map units, while the center and fwhm must be specified in pixels. The fourth value is optional and if present, represents the parameter(s) that should be held constant during the fit (see above). An example estimates file is:
# estimates file indicating that two Gaussians should be fit
# first gaussian estimate, peak=40, center at pixel number 10.5,
# fwhm = 5.8 pixels, all parameters allowed to vary during
# fit
40, 10.5, 5.8
# second Gaussian, peak = 4, center at pixel number 90.2,
# fwhm = 7.2 pixels, hold fwhm constant
4, 90.2, 7.2, f
# end file
and the output of a typical execution, e.g.
specfit(imagename='IRC10216_HC3N.cube_r0.5.image', region='specfit.crtf',
multifit=F, estimates='', ngauss=2)
(‘specfit.crtf’ is a CASA regions file, see Section D) will be
Fit :
RA : 09:47:57.49
Dec : 13.16.46.46
Stokes : I
Pixel : [146.002, 164.499, 0.000, *]
Attempted : YES
Converged : YES
Iterations : 28
Results for component 0:
Type : GAUSSIAN
Peak : 5.76 +/- 0.45 mJy/beam
Center : -15.96 +/- 0.32 km/s
40.78 +/- 0.31 pixel
FWHM : 7.70 +/- 0.77 km/s
7.48 +/- 0.74 pixel
Integral : 47.2 +/- 6.0 mJy/beam.km/s
Results for component 1:
Type : GAUSSIAN
Peak : 4.37 +/- 0.33 mJy/beam
Center : -33.51 +/- 0.58 km/s
23.73 +/- 0.57 pixel
FWHM : 15.1 +/- 1.5 km/s
14.7 +/- 1.5 pixel
Integral : 70.2 +/- 8.8 mJy/beam.km/s
If wantreturn=True (the default value), the task returns a python dictionary (here captured in a variable with the inventive name of ‘fitresults’) :
fitresults=specfit(imagename='IRC10216_HC3N.cube_r0.5.image', region='specfit.rgn', multifit=F, estimates='', ngauss=2)
The values can then be used by other python code for further processing.
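For instance, one might first inspect which solution entries are available (a trivial sketch; the exact keys depend on the fit that was run):
if fitresults is not None:
    print(fitresults.keys())  #list the available solution entries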
Gaussian Multiplets
It is possible to fit a number of Gaussians together, as multiplets with restrictions. All restrictions are relative to a reference Gaussian (the zero'th component of each multiplet). gmncomps specifies the number of Gaussians for each multiplet, and, in fact, a number of these multiplets can be fit simultaneously. gmncomps=[2,4,3], e.g., fits a 2-component Gaussian multiplet, a 4-component Gaussian multiplet, and a 3-component Gaussian multiplet all at once. The initial parameter estimates can be specified with the gmampest, gmcenterest, and gmfwhmest parameters, and the estimates are simply listed in the sequence of gmncomps. E.g., if gmncomps=[2,4,3] is specified with multiplet G0 consisting of 2 Gaussians a, b, multiplet G1 of 4 Gaussians c, d, e, f, and multiplet G2 of three Gaussians g, h, i, the parameter list in gm*est would be like gm*est=[a,b,c,d,e,f,g,h,i]. Restrictions can be specified via the gmampcon parameter for the amplitude ratio (non-reference to reference), gmcentercon for the offset in pixels (to a reference), and gmfwhmcon for the FWHM ratio (non-reference to reference). A value of 0 will not constrain anything. The reference is always the zero'th component in each multiplet, in our example Gaussians a, c, and g. They cannot be constrained. So gmncomps=[2, 4, 3], gmampcon=[0, 0.2, 0, 0.1, 4.5, 0], gmcentercon=[24.2, 45.6, 92.7, 0, -22.8, -33.5], and gmfwhmcon='' would constrain Gaussian b relative to a with a 24.2 pixel offset, Gaussian d to c with an amplitude ratio of 0.2 and a 45.6 pixel offset, Gaussian e to c with an offset of 92.7 pixels, etc. Restrictions will overrule any estimates. The parameters goodamprange, goodcenterrange, and goodfwhmrange can be used to limit the range of amplitude, center, and fwhm solutions for all Gaussians.
Pixel-by-pixel fits
As mentioned above, specfit can also fit spectral cubes on a pixel by pixel basis. In this case, one can choose to write none, any, or all of the solution and error images for Gaussian/Lorentzian fits via the parameters amp, amperr, center, centererr, fwhm, and fwhmerr. Each Gaussian component will produce a set of images with _0, _1, etc. suffixes. Writing analogous images for polynomial coefficients is not yet supported, although polynomial fits with multifit=True are supported. Best fit coefficients are written to the logger. Pixels for which fits were not attempted or did not converge will be masked as bad.
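A hedged sketch of a pixel-by-pixel fit that writes solution images (file names hypothetical):
specfit(imagename='mycube.im', multifit=True, ngauss=1,
        amp='mycube.amp', center='mycube.center', fwhm='mycube.fwhm')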
Spatial Spectral Line Properties
specflux calculates the flux as a function of frequency and velocity over a selected spatial region. Flux densities in Jy/beam are converted to Jy by properly integrating over the selected region. The input parameters of specflux are:
#specflux :: Report details of an image spectrum.
imagename = '' #Name of the input image
box = '' #Rectangular region to select in
#direction plane. Default is to use the entire
#direction plane.
region = '' #Region selection. Default is to use the full
#image.
chans = '' #Channels to use. Default is to use all
#channels.
stokes = '' #Stokes planes to use. Default is to
#use all Stokes planes.
mask = '' #Mask to use. Default
#is none.
unit = 'km/s' #Unit to use for the abscissa. Must be
#conformant with a typical spectral axis
#unit.
major = '' #Major axis of overriding restoring beam.
#If specified, must be a valid quantity.
minor = '' #Minor axis of overriding restoring beam.
#If specified, must be a valid quantity
logfile = '' #File in which to write details. Default is
#to not write to a file.
The results can be written into a logfile to be plotted in other packages.
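For example, a minimal sketch (file names hypothetical):
specflux(imagename='mycube.im', box='100,100,150,150', unit='km/s', logfile='mycube.specflux.log')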
Plot Spectra on a Map (plotprofilemap)
The plotprofilemap task enables plotting spectra according to their pointing directions (a.k.a. a profile map). The input should be a CASA image or a FITS format cube. Spectra within the cube are averaged into bins, the number of which is specified with the numpanels keyword. Absent or masked data are treated according to the plotmasked keyword setting.
plotprofilemap(imagename='interesting_spectralcube_casaimage.im',
figfile = 'grid_map.png',
separatepanel=F,
spectralaxis = 'velocity',
title = 'myprofilemap',
transparent = F,
showaxislabel = True,
showtick = True,
showticklabel = True,
pol=0,
numpanels='8')
Calculation of Rotation Measures
rmfit is an image domain task that accepts an input cube image containing Stokes Q and U planes and will generate the rotation measure by performing a least squares fit in the image domain to obtain the best fit to the equation \(\chi = \chi_0 + RM \times \lambda^2\).
The inputs to rmfit are:
#rmfit :: Calculate rotation measure.
imagename = '' #Name(s) of the input image(s). Must be specified.
rm = '' #Output rotation measure image name. If not specified, no
#image is written.
rmerr = '' #Output rotation measure error image name. If not
#specified, no image is written.
pa0 = '' #Output position angle (degrees) at zero wavelength image
#name. If not specified, no image is written.
pa0err = '' #Output position angle (degrees) at zero wavelength error
#image name. If not specified, no image is written.
nturns = '' #Output number of turns image name. If not specified, no
#image is written.
chisq = '' #Output reduced chi squared image name. If not specified,
#no image is written.
sigma = '' #Estimate of the thermal noise. A value less than 0 means
#auto estimate.
rmfg = 0.0 #Foreground rotation measure in rad/m/m to subtract.
rmmax = 0.0 #Maximum rotation measure in rad/m/m for which to solve.
#IMPORTANT TO SPECIFY.
maxpaerr = 1e+30 #Maximum input position angle error in degrees to allow in
#solution determination.
This task generates the rotation measure image from Stokes Q and U measurements at several different frequencies. You are required to specify the name of at least one image with a polarization axis containing Stokes Q and U planes and with a frequency axis containing more than two pixels. The frequencies do not have to be equally spaced (i.e. the frequency coordinate can be a tabular coordinate). It will work out the position angle images for you. You may also specify multiple image names, in which case these images will first be concatenated along the spectral axis using ia.imageconcat. The requirements are that for all images, the axis order must be the same and the number of pixels along each axis must be identical, except for the spectral axis, which may differ in length between images. The spectral axis need not be contiguous from one image to another. See also the imagepol.fourierrotationmeasure function for a new Fourier-based approach. Rotation measure algorithms that work robustly are few. The main problem is in trying to account for the \(n\pi\) ambiguity (see Leahy et al. 1986 - Appendix A.1) [1] and the MIRIAD manual.
But as in all these algorithms, the basic process is that for each spatial pixel, the position angle vs. frequency data are fit to determine the rotation measure and the position angle at zero wavelength (and associated errors). An image containing the number of \(n\pi\) turns that were added to the data at each spatial pixel, and for which the best fit was found, can be written. The reduced \(\chi^2\) image for the fits can also be written. Any combination of output images can be written.
NOTE: No assessment of curvature (i.e. deviation from the simple linear position angle vs. \(\lambda^2\) functional form) is made.
The parameter sigma gives the thermal noise in Stokes Q and U. By default it is determined automatically using the image data, but if it proves to be inaccurate (maybe there are not many signal-free pixels), it may be specified. This is used for calculating the error in the position angles (via propagation of Gaussian errors). The argument maxpaerr specifies the maximum allowable error in the position angle that is acceptable. The default is an infinite value. From the standard propagation of errors, the error in the linearly polarized position angle is determined from the Stokes Q and U images (at each directional pixel for each frequency). If the position angle error for any pixel exceeds the specified value, the position angle at that pixel is omitted from the fit. The process generates an error for the fit, and this is used to compute the errors in the output images.
NOTE: maxpaerr is not used to mask pixels in the output images.
The argument rmfg is used to specify a foreground RM value. For example, you may know the mean RM in some direction out of the Galaxy; including this can improve the algorithm by reducing ambiguity. The parameter rmmax specifies the maximum absolute RM value that should be solved for. This is quite an important parameter: if you leave it at the default, zero, no ambiguity handling will be used. So some a priori information should be supplied; this is the basic problem with rotation measure algorithms.
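As an illustrative sketch (the image name and limit values here are hypothetical), a typical call that supplies the a priori information discussed above might be:
rmfit(imagename='IQUV.cube.image', rm='rm.image', rmerr='rm_err.image',
      rmmax=800.0, maxpaerr=10.0)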
Calculation of Spectral Indices and Higher Order Polynomials
This application fits a power logarithmic polynomial or a logarithmic transformed polynomial to pixel values along a specified axis of an image or images. These functions are most often used for fitting the spectral index and higher order terms of a spectrum. A power logarithmic polynomial has the form
\(y = c_0 (x/D)^{c_1 + c_2 \ln(x/D) + c_3 \ln(x/D)^2 + \cdots + c_n \ln(x/D)^{n-1}}\)
and a logarithmic transformed polynomial is simply the result of this equation after taking the natural log of both sides so that it has the form
\(\ln(y) = c_0 + c_1 \ln(x/D) + c_2 \ln(x/D)^2 + \cdots + c_n \ln(x/D)^n\)
Because the logarithm of the ordinate values must be taken before fitting a logarithmic transformed polynomial, all non-positive pixel values are effectively masked for the purposes of fitting. The coefficients of the two forms are equal to each other, except that \(c_0\) in the second equation is equal to \(\ln(c_0)\) of the first. In the case of fitting a spectral index, the index, traditionally represented as \(\alpha\), is equal to \(c_1\). In both cases, \(D\) is a normalization constant used so that abscissa values are closer to unity when they are sent to the fitter; this generally improves the probability that the fit will converge. This parameter may be specified via the div parameter. A value of 0 (the default) indicates that the application should determine a reasonable value for \(D\), computed as \(D = 10^{\mathrm{int}(\log_{10}\sqrt{\min(x)\,\max(x)})}\), where min(x) and max(x) are the minimum and maximum abscissa values, respectively. The inputs are:
#spxfit :: Fit a 1-dimensional model to an image or image region
for determination of spectral index.
imagename = #Name of the input image(s)
box = '' #Rectangular box in
#direction coordinate blc, trc.
#Default: entire image ('').
region = '' #Region of interest. Default:
#Do not use a region.
chans = '' #Channels to use. Channels
#must be contiguous. Default: all channels ('').
stokes = '' #Stokes planes to
#use. Planes must be contiguous. Default:
#all stokes ('').
axis = -1 #The profile axis. Default:
#use the spectral axis if one
#exists, axis 0 otherwise (<0).
mask = '' #Mask to use. Default is none.
minpts = 1 #Minimum number of unmasked
#points necessary to attempt
#fit.
multifit = True #If true, fit a profile
#along the desired axis at each
#pixel in the specified
#region. If false, average the
#non-fit axis pixels and do
#a single fit to that average
#profile. Default False.
spxsol = '' #Name of the spectral index
#function coefficient solution
#image to write.
spxerr = '' #Name of the spectral index
#function coefficient error
#image to write.
model = '' #Name of model
#image. Default: do not write the model
#image ('').
residual = '' #Name of residual
#image. Default: do not write the
#residual image ('').
spxtype = 'plp' #Type of function to
#fit. 'plp' => power logarithmic
#polynomial, 'ltp' =>
#logarithmic transformed polynomial.
spxest = [] #Initial estimates for the
#spectral index function
#coefficients.
spxfix = [] #Fix the corresponding spectral index function
#coefficients during the fit. True=>hold fixed.
div = 0 #Divisor (numerical value or
#quantity) to use in the
#logarithmic terms of the
#plp or ltp function. 0 =>
#calculate a useful value on the fly.
wantreturn = True #Should a record summarizing
#the results be returned?
logresults = True #Output results to logger?
logfile = '' #File in which to log
#results. Default is not to write a
#logfile.
sigma = -1 #Standard deviation array or image name(s).
outsigma = '' #Name of output image used
#for standard deviation. Ignored
#if sigma is empty.
For more than a single input image or cube, all images must have the same dimensions along all axes other than the fit axis. With multifit=True the fit is performed per pixel; otherwise the pixels along the non-fit axes are averaged and a single fit is done to that average profile.
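As an illustrative sketch (the image names and starting estimates here are hypothetical), a per-pixel power logarithmic polynomial fit across two cubes might look like this, with spxest giving initial guesses for \(c_0\) and \(c_1\):
spxfit(imagename=['band3.image', 'band6.image'], multifit=True,
       spxtype='plp', spxest=[0.5, -0.8],
       spxsol='alpha.image', spxerr='alpha_err.image')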
Search for Spectral Line Rest Frequencies
The slsearch task allows the spectral line enthusiast to find their favorite spectral lines in a subset of the Splatalogue spectral line catalog, which is distributed with CASA. In addition, one can export custom catalogs from Splatalogue and import them to CASA using the task splattotable (next section) or the tool method sl.splattotable. One can even import catalogs with lines not in Splatalogue using the same file format. The inputs to slsearch are as follows:
#slsearch :: Search a spectral line table.
tablename = '' #Input spectral line table name to
#search. If not specified, use the
#default table in the system.
outfile = '' #Results table name. Blank means do not
#write the table to disk.
freqrange = [84, 90] #Frequency range in GHz.
species = [''] #Species to search for.
reconly = False #List only NRAO recommended
#frequencies.
chemnames = [''] #Chemical names to search for.
qns = [''] #Resolved quantum numbers to search
#for.
rrlinclude = True #Include RRLs in the result set?
rrlonly = False #Include only RRLs in the result set?
intensity = -1 #CDMS/JPL intensity range. -1 -> do not
#use an intensity range.
smu2 = -1 #S*mu*mu range in Debye**2. -1 -> do
#not use an S*mu*mu range.
loga = -1 #log(A) (Einstein coefficient) range.
#-1 -> do not use a loga range.
eu = -1 #Upper energy state range in Kelvin. -1
#-> do not use an eu range.
el = -1 #Lower energy state range in Kelvin. -1
#-> do not use an el range.
verbose = True #List result set to logger (and
#optionally logfile)?
logfile = '' #List result set to this logfile (only
#used if verbose=True).
append = True #If true, append to logfile if it
#already exists, if false overwrite
#logfile if it exists. Only used if
#verbose=True and logfile not blank.
wantreturn = True #If true, return the spectralline tool
#associated with the result set.
The table is provided in the tablename parameter but if it is blank (the default), the catalog which is included with CASA will be used. Searches can be made in a parameter space with large dimensionality:
freqrange Frequency range in GHz.
species Species to search for.
reconly List only NRAO recommended frequencies.
chemnames Chemical names to search for.
qns Resolved quantum numbers to search for.
intensity CDMS/JPL intensity range.
smu2 \(S\mu^{2}\) range in Debye\(^{2}\).
loga log(A) (Einstein coefficient) range.
el Lower energy state range in Kelvin.
eu Upper energy state range in Kelvin.
rrlinclude Include RRLs in the result set?
rrlonly Include only RRLs in the result set?
Notation is as found in the Splatalogue catalog.
Example: Search for all lines of the species HOCN and HOCO\(^+\) in the 200-300GHz range:
slsearch(outfile='myresults.tbl', freqrange = [200,300],
species=['HOCN', 'HOCO+'])
The task also returns a Python dictionary, which can be captured by assigning the call to a variable:
myLines = slsearch(outfile='myresults.tbl', freqrange = [200,300],
species=['HOCN', 'HOCO+'])
Convert Exported Splatalogue Catalogs to CASA Tables
In some cases, the internal spectral line catalog may not contain the lines in which one is interested. In that case, one can export a catalog from Splatalogue or even create their own ‘by hand’ (be careful to get the format exactly right though!). CASA’s task splattotable can then be used to create a CASA table that contains these lines and can be searched:
#splattotable :: Convert a downloaded Splatalogue spectral line list to a casa table.
filenames = [''] #Files containing Splatalogue lists.
table = '' #Output table name.
wantreturn = True #Do you want the task to return a spectralline tool attached to the results table?
A search in Splatalogue will return a catalog that can be saved in a file (look for the ‘Export’ section after the results on the search results page). The exported filename(s) should be entered in the filenames parameter of splattotable. The downloaded files must be in a specific format for this task to succeed. If you use the Splatalogue Export CASA fields feature, you should have no difficulties.
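A minimal sketch of the round trip, assuming a hypothetical exported list mysplat_export.txt: convert it to a CASA table, then search it via the tablename parameter of slsearch:
splattotable(filenames=['mysplat_export.txt'], table='mylines.tbl')
slsearch(tablename='mylines.tbl', freqrange=[85, 95])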
Bibliography¶
Leahy, J.P., Pooley, G.G., & Jagers, W.J. 1986, A&A, 156, 234 (http://adsabs.harvard.edu/abs/1986A%26A...156..234L)
Image Plane Analysis¶
Image-plane Component Fitting
Task imfit inputs are:
#imfit :: Fit one or more elliptical Gaussian components in an image region
imagename = '' #Name of the input image
box = '' #Specify one or more box regions for the fit.
region = '' #Region.
chans = '' #Spectral channels on which to perform fit.
stokes = '' #Stokes parameter to fit. If blank, first stokes plane is
#used.
mask = '' #Mask to use. Default is none.
includepix = [] #Range of pixel values to include for fitting.
excludepix = [] #Range of pixel values to exclude for fitting.
residual = '' #Name of output residual image.
model = '' #Name of output model image.
estimates = '' #Name of file containing initial estimates of component
#parameters.
logfile = '' #Name of file to write fit results.
newestimates = '' #File to write fit results which can be used as initial
#estimates for next run.
complist = '' #Name of output component list table.
dooff = False #Also fit a zero level offset? Default is False
rms = -1 #RMS to use in calculation of uncertainties. Numeric or
#valid quantity (record or string). If numeric, it is
#given units of the input image. If quantity, units must
#conform to image units. If not positive, the rms of the
#residual image, in the region of the fit, is used.
noisefwhm = '' #Noise correlation beam FWHM. If numeric value,
#interpreted as pixel widths. If quantity (dictionary,
#string), it must have angular units.
imfit will return (as a Python dictionary) the results of the fit, but the results can also be written into a component list table or a logfile.
NOTE: To fit more than a single component, you must provide starting estimates for each component via the estimates file. See "help imfit" for more details on this. A noise estimate will be calculated automatically or can be provided through the rms and noisefwhm keywords.
Examples for imfit
#First fit only a single component at a time
#This is OK since the components are well-separated and not blended
#Box around component A
xfit_A_res = imfit('b1608.demo.clean2.image',box='121,121,136,136',
newestimates='b1608.demo.clean2.newestimate')
#Now extract the fit part of the return value
xfit_A = xfit_A_res['results']['component0']
#xfit_A
#Out[7]:
#{'flux': {'error': array([ 6.73398035e-05, 0.00000000e+00, 0.00000000e+00,
#0.00000000e+00]),
#'polarisation': 'Stokes',
#'unit': 'Jy',
#'value': array([ 0.01753742, 0. , 0. , 0. ])},
#'label': '',
#'shape': {'direction': {'error': {'latitude': {'unit': 'arcsec',
#'value': 0.00041154866279462775},
#'longitude': {'unit': 'arcsec',
#'value': 0.00046695916589535109}},
#'m0': {'unit': 'rad', 'value': -2.0541102061078207}, NOTE: 'm0' and 'm1' are the coordinates of the peak/centroid
#'m1': {'unit': 'rad', 'value': 1.1439131060384089},
#'refer': 'J2000',
#'type': 'direction'},
#'majoraxis': {'unit': 'arcsec', 'value': 0.29100166137741568},
#'majoraxiserror': {'unit': 'arcsec',
#'value': 0.0011186420613222663},
#'minoraxis': {'unit': 'arcsec', 'value': 0.24738110059830495},
#'minoraxiserror': {'unit': 'arcsec',
#'value': 0.0013431999725066338},
#'positionangle': {'unit': 'deg', 'value': 19.369249322401796},
#'positionangleerror': {'unit': 'rad',
#'value': 0.016663189295782171},
#'type': 'Gaussian'},
#'spectrum': {'frequency': {'m0': {'unit': 'GHz', 'value': 1.0},
#'refer': 'LSRK',
#'type': 'frequency'},
#'type': 'Constant'}}
#Now the other components
xfit_B_res = imfit('b1608.demo.clean2.image',box='108,114,120,126',
newestimates='b1608.demo.clean2.newestimate',append=True)
xfit_B = xfit_B_res['results']['component0']
xfit_C_res= imfit('b1608.demo.clean2.image',box='108,84,120,96')
xfit_C = xfit_C_res['results']['component0']
xfit_D_res = imfit('b1608.demo.clean2.image',box='144,98,157,110')
xfit_D = xfit_D_res['results']['component0']
print ""
print "Imfit Results:"
print "--------------"
print "A Flux = %6.4f Bmaj = %6.4f" % (xfit_A['flux']['value'][0],xfit_A['shape']['majoraxis']['value'])
print "B Flux = %6.4f Bmaj = %6.4f" % (xfit_B['flux']['value'][0],xfit_B['shape']['majoraxis']['value'])
print "C Flux = %6.4f Bmaj = %6.4f" % (xfit_C['flux']['value'][0],xfit_C['shape']['majoraxis']['value'])
print "D Flux = %6.4f Bmaj = %6.4f" % (xfit_D['flux']['value'][0],xfit_D['shape']['majoraxis']['value'])
print ""
Now try fitting four components together. For this we will have to provide an estimate file. We will use the clean beam for the estimate of the component sizes:
estfile = open('b1608.demo.clean2.estimate', 'w')
print('#peak, x, y, bmaj, bmin, bpa', file=estfile)
print('0.017, 128, 129, 0.293arcsec, 0.238arcsec, 21.7deg', file=estfile)
print('0.008, 113, 120, 0.293arcsec, 0.238arcsec, 21.7deg', file=estfile)
print('0.008, 113, 90, 0.293arcsec, 0.238arcsec, 21.7deg', file=estfile)
print('0.002, 151, 104, 0.293arcsec, 0.238arcsec, 21.7deg', file=estfile)
estfile.close()
Then, this can be used in imfit:
xfit_all_res = imfit('b1608.demo.clean2.image',
estimates='b1608.demo.clean2.estimate',
logfile='b1608.demo.clean2.imfitall.log',
newestimates='b1608.demo.clean2.newestimate',
box='121,121,136,136,108,114,120,126,108,84,120,96,144,98,157,110')
#Now extract the fit part of the return values
xfit_allA = xfit_all_res['results']['component0']
xfit_allB = xfit_all_res['results']['component1']
xfit_allC = xfit_all_res['results']['component2']
xfit_allD = xfit_all_res['results']['component3']
These results are almost identical to those from the individual fits. You can see a nicer printout of the fit results in the logfile.
2-dimensional Smoothing; Image Convolution
A data cube can be smoothed across spatial dimensions with imsmooth. The inputs are:
#imsmooth :: Smooth an image or portion of an image
imagename = '' #Name of the input image. Must be
#specified.
kernel = 'gauss' #Type of kernel to use. Acceptable values
#are 'b', 'box', or 'boxcar' for a
#boxcar kernel, 'g', 'gauss', or
#'gaussian' for a gaussian kernel, 'c',
#'common', or 'commonbeam' to use the
#common beam of an image with multiple
#beams as the gaussian to which to
#convolve all the planes, 'i' or 'image'
#to use an image as the kernel.
beam = '' #Alternate way of describing a Gaussian.
#If specified, must be a dictionary with
#keys 'major', 'minor', and 'pa' (or
#'positionangle'). Do not specify beam
#if specifying major, minor, and pa.
#Example: {'major': '5arcsec',
#'minor': '2arcsec', 'pa': '20deg'}.
targetres = False #If gaussian kernel, specified parameters
#are to be resolution of output image
#(True) or parameters of gaussian to
#convolve with input image (False).
major = '' #Major axis for the kernels. Standard
#quantity representation. Must be
#specified for kernel='boxcar'. Example:
#'4arcsec'.
minor = '' #Minor axis. Standard quantity
#representation. Must be specified for
#kernel='boxcar'. Example: '2arcsec'.
pa = '' #Position angle used only for gaussian
#kernel. Standard quantity
#representation. Example: '40deg'.
region = '' #Region selection. Default is to use the full
#image.
box = '' #Rectangular region to select in
#direction plane. Default is to use the entire
#direction plane.
chans = '' #Channels to use. Default is to use all
#channels.
stokes = '' #Stokes planes to use. Default is to
#use all Stokes planes.
mask = '' #Mask to use. Default
#is none.
outfile = '' #Output image name. Must be specified.
overwrite = False #Overwrite (unprompted) pre-existing
#output file?
where the cube/image imagename will be convolved with a kernel defined in the kernel keyword. The 'gauss' and 'boxcar' kernels need the major and minor axis sizes as input; Gaussian kernel smoothing also requires a position angle. By default, the kernel size defines the kernel itself, i.e. the data will be smoothed with this kernel. If the targetres parameter for Gaussian kernels is set to True, major and minor axes are instead those of the output resolution, and the kernel will be adjusted for each plane to arrive at that final resolution. The 'commonbeam' kernel is to be used when the beam shape differs as a function of frequency. This option smooths all planes to a single beam, defined by the largest beam in the cube. With the 'image' kernel, one can specify an image that will serve as the convolution kernel. A scale factor can be applied, which defaults to flux conservation where units are Jy/beam or Jy/beam.km/s. For all other units, like K, the output will be scaled by the inverse of the convolution kernel; e.g., in the extreme case of a flat distribution, the values before and after smoothing will be the same.
Examples:
Smoothing with a Gaussian kernel 20” by 10”
imsmooth(imagename='my.image', kernel='gauss', major='20arcsec', minor='10arcsec', targetres=True)
Smoothing using pixel coordinates and a boxcar kernel.
imsmooth( imagename='new.image', major='20pix', minor='10pix', kernel='boxcar')
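The 'commonbeam' kernel needs no axis sizes, since the target beam is computed from the per-plane beams of the input cube. A minimal sketch, assuming a hypothetical multi-beam cube:
imsmooth(imagename='multibeam.cube.image', kernel='commonbeam',
         outfile='multibeam.cube.combeam.image')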
Image Import/Export¶
These tasks will allow you to write your CASA image to a FITS file that other packages can read, and to import existing FITS files into CASA as an image. They can also extract data to Python data structures for further analysis.
FITS Image Export
To export your images to fits format use the exportfits task. The inputs are:
#exportfits :: Convert a CASA image to a FITS file
imagename = '' #Name of input CASA image
fitsimage = '' #Name of output image FITS file
velocity = False #Use velocity (rather than frequency) as spectral axis
optical = False #Use the optical (rather than radio) velocity convention
bitpix = -32 #Bits per pixel
minpix = 0 #Minimum pixel value
maxpix = 0 #Maximum pixel value
overwrite = False #Overwrite pre-existing imagename
dropstokes = False #Drop the Stokes axis?
stokeslast = True #Put Stokes axis last in header?
The dropstokes or stokeslast parameter may be needed to make the FITS image compatible with an external application. For example,
exportfits('ngc5921.demo.cleanimg.image','ngc5921.demo.cleanimg.image.fits')
FITS Image Import
You can also use the importfits task to import a FITS image into CASA image table format. Note that the CASA Viewer can read FITS images, so you don't need to do this if you just want to look at the image. The inputs for importfits are:
#importfits :: Convert an image FITS file into a CASA image
fitsimage = '' #Name of input image FITS file
imagename = '' #Name of output CASA image
whichrep = 0 #If the FITS image has multiple
#coordinate representations, choose one.
whichhdu = 0 #If the FITS file contains
#multiple images, choose one.
zeroblanks = True #Set blanked pixels to zero (not NaN)
overwrite = False #Overwrite pre-existing imagename
defaultaxes = False #Add the default 4D
#coordinate axes where they are missing
defaultaxesvalues = [] #List of values to assign to
#added degenerate axes when
#defaultaxes=True (ra,dec,freq,stokes)
For example, we can read the above image back in
importfits('ngc5921.demo.cleanimg.image.fits','ngc5921.demo.cleanimage')
Extracting data from an image
The imval task will extract the values of the data and mask from a specified region of an image and place them in the task return value as a Python dictionary. The inputs are:
#imval :: Get the data value(s) and/or mask value in an image.
imagename = '' #Name of the input image
region = '' #Image Region. Use viewer
box = '' #Select one or more box regions
chans = '' #Select the channel(spectral) range
stokes = '' #Stokes params to image (I,IV,IQU,IQUV)
By default, box='' will extract the image information at the reference pixel on the direction axes. Plane selection is controlled by chans and stokes. By default, chans='' and stokes='' will extract the image information in all channels and Stokes planes. For instance,
xval = imval('myimage', box='144,144', stokes='I' )
will extract the Stokes I value or spectrum at pixel 144,144, while
xval = imval('myimage', box='134,134,154,154', stokes='I')
will extract a 21 by 21 pixel region. Extractions are returned in NumPy arrays in the return value dictionary, plus some extra elements describing the axes and selection:
CASA <2>: xval = imval('ngc5921.demo.moments.integrated')
CASA <3>: xval
Out[3]:
{'axes': [[0, 'Right Ascension'],
[1, 'Declination'],
[3, 'Frequency'],
[2, 'Stokes']],
'blc': [128, 128, 0, 0],
'data': array([ 0.89667124]),
'mask': array([ True], dtype=bool),
'trc': [128, 128, 0, 0],
'unit': 'Jy/beam.km/s'}
extracts the reference pixel value in this 1-plane image. Note that the ‘data’ and ‘mask’ elements are NumPy arrays, not Python lists. To extract a spectrum from a cube:
CASA <8>: xval = imval('ngc5921.demo.clean.image',box='125,125')
CASA <9>: xval
Out[9]:
{'axes': [[0, 'Right Ascension'],
[1, 'Declination'],
[3, 'Frequency'],
[2, 'Stokes']],
'blc': [125, 125, 0, 0],
'data': array([ 8.45717848e-04, 1.93370355e-03, 1.53750915e-03,
2.88399984e-03, 2.38683447e-03, 2.89159478e-04,
3.16268904e-03, 9.93389636e-03, 1.88773088e-02,
3.01138610e-02, 3.14478502e-02, 4.03211266e-02,
3.82498614e-02, 3.06552909e-02, 2.80734301e-02,
1.72479432e-02, 1.20884273e-02, 6.13593217e-03,
9.04005766e-03, 1.71429547e-03, 5.22095338e-03,
2.49114982e-03, 5.30831399e-04, 4.80734324e-03,
1.19265869e-05, 1.29435991e-03, 3.75700940e-04,
2.34788167e-03, 2.72604497e-03, 1.78467855e-03,
9.74952069e-04, 2.24676146e-03, 1.82263291e-04,
1.98463408e-06, 2.02975096e-03, 9.65532148e-04,
1.68218743e-03, 2.92119570e-03, 1.29359076e-03,
-5.11484570e-04, 1.54162932e-03, 4.68662125e-04,
-8.50282842e-04, -7.91683051e-05, 2.95954203e-04,
-1.30133145e-03]),
'mask': array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True], dtype=bool),
'trc': [125, 125, 0, 45],
'unit': 'Jy/beam'}
To extract a region from the plane of a cube:
CASA <13>: xval = imval('ngc5921.demo.clean.image',box='126,128,130,129',chans='23')
CASA <14>: xval
Out[14]:
{'axes': [[0, 'Right Ascension'],
[1, 'Declination'],
[3, 'Frequency'],
[2, 'Stokes']],
'blc': [126, 128, 0, 23],
'data': array([[ 0.00938627, 0.01487772],
[ 0.00955847, 0.01688832],
[ 0.00696965, 0.01501907],
[ 0.00460964, 0.01220793],
[ 0.00358087, 0.00990202]]),
'mask': array([[ True, True],
[ True, True],
[ True, True],
[ True, True],
[ True, True]], dtype=bool),
'trc': [130, 129, 0, 23],
'unit': 'Jy/beam'}
CASA <15>: print(xval['data'][0][1])
0.0148777160794
In this example, a rectangular box was extracted, and you can see the order in the array and how to address specific elements.
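Since the 'data' and 'mask' elements are NumPy arrays, they can be manipulated directly in Python. A short sketch, reusing the spectrum extraction above:
xval = imval('ngc5921.demo.clean.image', box='125,125')
spec = xval['data']            #spectrum at pixel (125,125)
good = spec[xval['mask']]      #keep only unmasked values
print('peak = %.4f %s' % (good.max(), xval['unit']))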
Math Operations / Statistics¶
Mathematical operations on images are typically completed using the CASA task immath, and image statistics may be derived using the CASA tasks imstat and imdev. Here, we give an overview of how these tasks are used.
Mathematical Operations on Images
The CASA task immath is useful for performing mathematical operations on images and on specific channels within images, including e.g. addition or subtraction of two cubes, squaring an image, computing a spectral index, and determining polarization angles and intensities. The inputs are:
#immath :: Perform math operations on images
imagename = '' #a list of input images
mode = 'evalexpr' #mode for math operation (evalexpr, spix, pola, poli)
expr = '' #Mathematical expression using images
varnames = '' #a list of variable names to use with the image files
outfile = 'immath_results.im' #File where the output is saved
mask = '' #Mask to use. Default is none.
region = '' #Region selection.
#Default is to use the full image.
box = '' #Rectangular region to
#select in direction plane.
#Default is to use the
#entire direction plane.
chans = '' #Channels to use.
#Default is to use all channels.
stokes = '' #Stokes planes to use.
#Default is to use all Stokes planes.
imagemd = '' #An image name from which metadata should be copied. The input
#can be either an image listed under imagename or any other
#image on disk. Leaving this parameter unset may copy header
#metadata from any of the input images;
#which one is used is not guaranteed.
Alert: immath does not convert any brightness units, e.g. from Jy/beam to K or vice versa. The user is responsible for ensuring that the brightness units and beams of the input images are consistent. It is not advisable to mix input images that are in different units or have different beam sizes.
The imagename parameter must be given the name of a single image as a string (e.g. imagename=’image1.im’) or the names of multiple images in a list of strings (e.g. imagename=[‘image1.im’, ‘image2.im’] ). The immath task outputs an image file, and the name of the output file is specified using the outfile parameter.
The mode parameter selects what immath is to do. The default, mode=’evalexpr’, allows the user to specify a mathematical operation to execute on the input images through the expr sub-parameter. The mathematical expression is specified in expr as a Lattice Expression Language (LEL) string (see the page on LEL strings). The standard usage for mode=’evalexpr’ is to input a list of images into the imagename parameter, and then refer to them in the expr subparameter in LEL by the names IM0, IM1, …. For example,
immath(imagename=['image1.im','image2.im'],expr='IM0-IM1',outfile='ImageDiff.im')
would subtract the second image given from the first.
For the special modes ‘spix’, ‘pola’, and ‘poli’, the images required for the operation may need to be listed in imagename in a particular order. See examples of usage for polarization data below, paying particular attention to posted alerts.
The mathematical expression can be computed on the entire image cube, or on selected regions and image planes, which can be specified through the mask, region, box, chans, and stokes parameters. Mask specification is done using the mask parameter which can optionally contain an on-the-fly mask expression (in LEL) or point to an image with a pixel mask. In some cases, one may like to use a flat image (e.g. a moment image) mask applied to an entire cube. The stretch=True subparameter in mask allows one to expand the mask to all planes (i.e. channels or Stokes planes) of the cube. Region selection can also be carried out through the region parameter (see the pages on Region Files and Region File Format) and box parameter, while image plane selection is controlled by chans and stokes parameters.
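As a sketch of the stretch option (the image names here are hypothetical), a mask derived from a single-plane moment image can be expanded to all channels of a cube:
immath(imagename=['line.cube.image'], expr='IM0',
       mask='"line.mom0.image" > 0.01', stretch=True,
       outfile='line.cube.masked.image')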
The image metadata in the output file is adopted from another image, which can be specified through the imagemd parameter. In imagemd, input the name of the image from which the metadata should be copied and used for the output image. If left blank, the task may pick any of the input image headers, so it is better to define this parameter. In fact, the image specified in imagemd can be any image, even an image that is not part of the calculations in immath.
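A short sketch, reusing the image names from the earlier example, that pins the output header to the first input:
immath(imagename=['image1.im', 'image2.im'], expr='IM0-IM1',
       imagemd='image1.im', outfile='ImageDiff.im')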
Detailed examples of immath usage are given below.
Examples for immath
In the following, we show examples of immath usage. Note that the image names in the expr are assumed to refer to existing image files in the current working directory.
Simple math - Select a single plane (channel 22) of the 3-D cube:
immath(imagename='ngc5921.demo.cleanimg.image',
expr='IM0',chans='22',
outfile='ngc5921.demo.chan22.image')
Double all values in our image:
immath(imagename=['ngc5921.demo.chan22.image'],
expr='IM0*2.0',
outfile='ngc5921.demo.chan22double.image' )
Square all values in our image:
immath(imagename=['ngc5921.demo.chan22.image'],
expr='IM0^2',
outfile='ngc5921.demo.chan22squared.image' )
NOTE: The units in the output image are still claimed to be “Jy/beam”, i.e. immath will not correctly scale the units in the image for non-linear cases like this. Beware!
Subtract our image containing channel 22 from the original 3-D cube. Note that in this example, the 2-D plane (channel 22) is extended into the third dimension, so that the channel 22 image is subtracted from each plane in the 3-D cube:
immath(imagename=['ngc5921.demo.cleanimg.image','ngc5921.demo.chan22.image'],
expr='IM0-IM1',
outfile='ngc5921.demo.sub22.image')
Divide an image by another, with a threshold on one of the images:
immath(imagename=['ngc5921.demo.cleanimg.image','ngc5921.demo.chan22.image'],
expr='IM0/IM1[IM1>0.008]',
outfile='ngc5921.demo.div22.image')
You can do other mathematical operations on an image (e.g. trigonometric functions), as well as use scalar results from an image (e.g. max, min, median, mean, variance) in immath. You also have access to constants such as e() and pi(). As an example, the following expression uses the sine function, square root (sqrt), a scalar function (max), and the constant pi :
immath(imagename=['ngc5921.demo.chan22.image','ngc5921.demo.chan22squared.image'],
expr='sin(float(pi())*IM0/sqrt(max(IM1)))',
outfile='ngc5921.demo.chan22sine.image')
NOTE: Once again, the units in the output image are still claimed to be “Jy/beam”, i.e. immath will not correctly scale the units in the image for non-linear cases like this. Beware!
Region and Channel Selection - Select and save a region including the inner 1/4 of an image for channels 0 through 9 (chans=’<10’) and channels 40, 42, and 44:
default('immath')
imagename=['ngc5921.demo.cleanimg.image']
expr='IM0'
region='box[[64pix,64pix],[192pix,192pix]]'
chans='<10;40,42,44'
outfile='ngc5921.demo.inner.image'
immath()
If more than one channel is specified in the chans parameter, then the output image will contain multiple channels spanning the range from the lowest channel specified to the highest. In the example above, the output image will span channels 0 through 44, for a total of 45 channels. The channels that were not selected (in this case, channels 10 through 39 and channels 41 and 43) will be masked in the output cube. If we had set chans='40,42,44' then there would be 5 output channels corresponding to channels 40, 41, 42, 43, and 44 of the input cube, with 41 and 43 masked.
Note that the chans syntax allows the operators '<', '<=', '>', and '>='. For example, the following two inputs select the same channels.
chans = '<17,>79'
chans = '<=16,>=80'
Polarization manipulation - Extract each Stokes plane from a cube into an individual image:
default('immath')
imagename = '3C129BC.clean.image'
outfile='3C129BC.I'; expr='IM0'; stokes='I'; immath();
outfile='3C129BC.Q'; expr='IM0'; stokes='Q'; immath();
outfile='3C129BC.U'; expr='IM0'; stokes='U'; immath();
outfile='3C129BC.V'; expr='IM0'; stokes='V'; immath();
Extract linearly polarized intensity and polarization position angle images:
immath(stokes='', outfile='3C129BC.P', mode='poli',
imagename=['3C129BC.Q','3C129BC.U'], sigma='0.0mJy/beam');
immath(stokes='', outfile='3C129BC.X', mode='pola',
imagename=['3C129BC.Q','3C129BC.U'], sigma='0.0mJy/beam');
ALERT: For mode='pola' you MUST call immath as a function, as in this example (giving the parameters as arguments), or immath will fail.
Create a fractional linear polarization image:
default( 'immath')
imagename = ['3C129BC.I','3C129BC.Q','3C129BC.U']
outfile='3C129BC.fractional_linpol'
expr='sqrt((IM1^2 + IM2^2)/IM0^2)'
stokes=''
immath()
Create a polarized intensity image:
default( 'immath')
imagename = ['3C129BC.Q','3C129BC.U','3C129BC.V']
outfile='3C129BC.pol_intensity'
expr='sqrt(IM0^2 + IM1^2 + IM2^2)'
stokes=''
immath()
Toolkit Tricks: The following uses the toolkit. You can make a complex linear polarization (Q+iU) image using the imagepol tool:
#Make an imagepol tool and open the clean image
from casatools import imagepol
po = imagepol()
po.open('3C129BC.clean.image')
#Use complexlinpol to make a Q+iU image
po.complexlinpol('3C129BC.cmplxlinpol')
po.close()
You can now display this in the Viewer, in particular overlay it on the intensity raster together with the intensity contours. When you load the image, use the LEL expression:
'3C129BC.cmplxlinpol'['3C129BC.P'>0.0001]
which is entered into the LEL box at the bottom of the Load Data menu.
Using Masks in immath
The mask parameter is used inside immath to apply a mask to all the images used in expr before calculations are done (if you are curious, it uses the ia.subimage tool method to make virtual images that are then input in the LEL to the ia.imagecalc method). For example, let's assume that we have made a single channel image using tclean:
default('tclean')
vis = 'ngc5921.demo.src.split.ms.contsub'
imagename = 'ngc5921.demo.chan22.cleanimg'
specmode = 'cube'
nchan = 1
start = 22
width = 1
field = ''
spw = ''
imsize = [256,256]
cell = '15.0arcsec'
deconvolver = 'clark'
gain = 0.1
niter = 6000
threshold = '8.0mJy'
weighting = 'briggs'
robust = 0.5
usemask = 'user'
mask = 'box[[108pix,108pix],[148pix,148pix]]'
tclean()
There is now a file ngc5921.demo.chan22.cleanimg.mask that is an image with values 1.0 inside the cleanbox region and 0.0 outside. We can use this to mask the clean image:
default('immath')
imagename = 'ngc5921.demo.chan22.cleanimg.image'
expr='IM0'
mask='"ngc5921.demo.chan22.cleanimg.mask">0.5'
outfile='ngc5921.demo.chan22.cleanimg.imasked'
immath()
Toolkit Tricks: Note that there are also pixel masks that can be contained in each image. These are Boolean masks, and are implicitly used in the calculation for each image in expr. If you want to use the mask in a different image not in expr, try it in mask:
#First make a pixel mask inside ngc5921.demo.chan22.cleanimg.mask
ia.open('ngc5921.demo.chan22.cleanimg.mask')
ia.calcmask('"ngc5921.demo.chan22.cleanimg.mask">0.5')
ia.summary()
ia.close()
#There is now a 'mask0' mask in this image as reported by the summary
#Now apply this pixel mask in immath
default('immath')
imagename='ngc5921.demo.chan22.cleanimg.image'
expr='IM0'
mask='mask(ngc5921.demo.chan22.cleanimg.mask)'
outfile='ngc5921.demo.chan22.cleanimg.imasked1'
immath()
Note that nominally the axes of the mask must be congruent to the axes of the images in expr. One exception, however, is that the image in mask may have fewer axes than the images in expr (though any axis it does have must match in length). In this case, immath will extend the missing axes to cover the range in the images in expr. Thus, you can apply a mask made from a single channel to a whole cube.
#drop degenerate stokes and freq axes from mask image
ia.open('ngc5921.demo.chan22.cleanimg.mask')
im2 = ia.subimage(outfile='ngc5921.demo.chan22.cleanimg.mymask',dropdeg=True)
im2.summary()
im2.close()
ia.close()
#mymask has only RA and Dec axes
#Now apply this mask to the whole cube
default('immath')
imagename='ngc5921.demo.cleanimg.image'
expr='IM0'
mask='"ngc5921.demo.chan22.cleanimg.mymask">0.5'
outfile='ngc5921.demo.cleanimg.imasked'
immath()
Computing Image Statistics
The imstat task will calculate statistics on a region of an image and return the results as a value in a Python dictionary. The inputs are:
#imstat :: Displays statistical information from an image or image region
imagename = '' #Name of the input image.
axes = -1 #List of axes to evaluate statistics over. Default is
#all axes.
region = '' #Image Region or name. Use Viewer.
box = '' #Select one or more box regions.
chans = '' #Select the channel(spectral) range.
stokes = '' #Stokes params to image (I,IV,IQU,IQUV). Default '' =>
#include all
listit = True #Print stats and bounding box to logger?
verbose = False #Print additional messages to logger?
mask = '' #Mask to use. Default is none.
logfile = '' #Name of file to write fit results.
algorithm = 'classic' #Algorithm to use. Supported values are 'chauvenet',
#'classic', 'fit-half', and 'hinges-fences'. Minimum
#match is supported.
clmethod = 'auto' #Method to use for calculating classical statistics.
#Supported methods are 'auto', 'tiled', and
#'framework'. Ignored if algorithm is not 'classic'.
Area selection can be done using region and mask parameters. Plane selection is controlled by chans and stokes. The parameter axes selects the dimensions over which the statistics are calculated. Typical data cubes have axes like: RA axis 0, DEC axis 1, Velocity axis 2. So, e.g., axes=[0,1] would be the most common setting to calculate statistics per spectral channel. A typical output of imstat on a cube with axes=[0,1] and algorithm='classic' (default) looks like:
No region specified. Using full positional plane.
Using all spectral channels.
Using polarizations ALL
Determining stats for image IRC10216_HC3N.cube_r0.5.image
Set region from supplied region record
Statistics calculated using Classic algorithm
Regions ---
-- bottom-left corner (pixel) [blc]: [0, 0, 0, 0]
-- top-right corner (pixel) [trc]: [299, 299, 0, 63]
-- bottom-left corner (world) [blcf]: 09:48:01.492, +13.15.40.658, I, 3.63994e+10Hz
-- top-right corner (world) [trcf]: 09:47:53.299, +13.17.40.258, I, 3.63915e+10Hz
No region specified. Using full positional plane.
Using all spectral channels.
Using polarizations ALL
Selected bounding box :
[0, 0, 0, 0] to [299, 299, 0, 63] (09:48:01.492, +13.15.40.658, I, 3.63994e+10Hz to 09:47:53.299, +13.17.40.258, I, 3.63915e+10Hz)
#Frequency Frequency(Plane) Npts Sum Mean Rms Std dev Minimum Maximum
3.63993552e+10 0 9.000000e+04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
3.63992302e+10 1 9.000000e+04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
3.63991052e+10 2 9.000000e+04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
3.63989802e+10 3 9.000000e+04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
3.63988551e+10 4 9.000000e+04 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
3.63987301e+10 5 9.000000e+04 6.069948e-01 6.744386e-06 1.534640e-03 1.534634e-03 -6.355108e-03 6.166496e-03
3.63986051e+10 6 9.000000e+04 2.711720e-01 3.013023e-06 1.538071e-03 1.538077e-03 -6.165663e-03 5.862981e-03
3.63984801e+10 7 9.000000e+04 2.501259e-01 2.779177e-06 1.578049e-03 1.578056e-03 -6.771976e-03 6.272645e-03
3.63983551e+10 8 9.000000e+04 -3.706732e-01 -4.118591e-06 1.607191e-03 1.607194e-03 -8.871284e-03 6.591001e-03
where the header information provides the specifications of the data that were selected, followed by the table with the frequency values of the planes, the plane numbers, Npts (the number of pixels per plane), and the Sum, Mean, Rms, Std dev, Minimum, and Maximum of the pixel values for each plane. Similar output is provided when the data are averaged over different axes. The logger output can also be written into or appended to a log file for further processing elsewhere (logfile parameter).
imstat has access to different statistics algorithms. Most of them represent different ways of treating distributions that are not Gaussian, in particular of eliminating outlier values from the statistics. Available algorithms are CLASSIC, where all unmasked pixels are used; FIT-HALF, where one (good) half of the distribution is mirrored across a central value; HINGES-FENCES, where the inner quartiles plus a 'fence' data portion are used; and CHAUVENET, which includes values based on the number of standard deviations from the mean. For more information, see the inline help of the imstat task.
Using the task return value
The contents of the return value of imstat are in a Python dictionary of key-value sets. For example,
xstat = imstat()
will assign this to the Python variable xstat. The keys for xstat are outlined on the imstat page. For example, an imstat call might be
default('imstat')
imagename = 'ngc5921.demo.cleanimg.image' #The NGC5921 image cube
box = '108,108,148,148' #20 pixels around the center
chans = '21' #channel 21
xstat = imstat()
In the terminal window, imstat reports:
Statistics on ngc5921.demo.cleanimg.image
Region ---
-- bottom-left corner (pixel) [blc]: [108, 108, 0, 21]
-- top-right corner (pixel) [trc]: [148, 148, 0, 21]
-- bottom-left corner (world) [blcf]: 15:22:20.076, +04.58.59.981, I, 1.41332e+09Hz
-- top-right corner (world) [trcf]: 15:21:39.919, +05.08.59.981, I, 1.41332e+09Hz
Values --
-- flux [flux]: 0.111799236126
-- number of points [npts]: 1681.0
-- maximum value [max]: 0.029451508075
-- minimum value [min]: -0.00612453464419
-- position of max value (pixel) [maxpos]: [124, 131, 0, 21]
-- position of min value (pixel) [minpos]: [142, 110, 0, 21]
-- position of max value (world) [maxposf]: 15:22:04.016, +05.04.44.999, I, 1.41332e+09Hz
-- position of min value (world) [minposf]: 15:21:45.947, +04.59.29.990, I, 1.41332e+09Hz
-- Sum of pixel values [sum]: 1.32267159822
-- Sum of squared pixel values [sumsq]: 0.0284534543692
Statistics ---
-- Mean of the pixel values [mean]: 0.000786836167885
-- Standard deviation of the Mean [sigma]: 0.00403944306904
-- Root mean square [rms]: 0.00411418313161
-- Median of the pixel values [median]: 0.000137259965413
-- Median of the deviations [medabsdevmed]: 0.00152346317191
-- Quartile [quartile]: 0.00305395200849
The return value in xstat is
CASA <152>: xstat
Out[152]:
{'blc': array([108, 108, 0, 21]),
'blcf': '15:22:20.076, +04.58.59.981, I, 1.41332e+09Hz',
'flux': array([ 0.11179924]),
'max': array([ 0.02945151]),
'maxpos': array([124, 131, 0, 21]),
'maxposf': '15:22:04.016, +05.04.44.999, I, 1.41332e+09Hz',
'mean': array([ 0.00078684]),
'medabsdevmed': array([ 0.00152346]),
'median': array([ 0.00013726]),
'min': array([-0.00612453]),
'minpos': array([142, 110, 0, 21]),
'minposf': '15:21:45.947, +04.59.29.990, I, 1.41332e+09Hz',
'npts': array([ 1681.]),
'quartile': array([ 0.00305395]),
'rms': array([ 0.00411418]),
'sigma': array([ 0.00403944]),
'sum': array([ 1.3226716]),
'sumsq': array([ 0.02845345]),
'trc': array([148, 148, 0, 21]),
'trcf': '15:21:39.919, +05.08.59.981, I, 1.41332e+09Hz'}
ALERT: The return dictionary currently includes NumPy array values, which have to be accessed by an array index to get the array value. To access these dictionary elements, use the standard Python dictionary syntax, e.g. xstat['key'][index].
For example, to extract the standard deviation as a number
mystddev = xstat['sigma'][0]
print('Sigma = ' + str(xstat['sigma'][0]))
Examples for imstat
To extract statistics for an image:
xstat = imstat('b1608.demo.clean2.image')
#Printing out some of these
print('Max = ' + str(xstat['max'][0]))
print('Sigma = ' + str(xstat['sigma'][0]))
#results:
#Max = 0.016796965152
#Sigma = 0.00033631979385
In a box around the brightest component:
xstat_A = imstat('b1608.demo.clean2.image',box='124,125,132,133')
#Printing out some of these
print('Comp A Max Flux = ' + str(xstat_A['max'][0]))
print('Comp A Max X,Y = (' + str(xstat_A['maxpos'][0]) + ',' + str(xstat_A['maxpos'][1]) + ')')
#results:
#Comp A Max Flux = 0.016796965152
#Comp A Max X,Y = (128,129)
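To get statistics per channel rather than over the whole region, set axes=[0,1] as described above; the return keys then hold one value per spectral plane. A short sketch, assuming the cube from the earlier output example:
xstat = imstat('IRC10216_HC3N.cube_r0.5.image', axes=[0, 1])
print(xstat['rms'])     #one rms value per spectral channel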
Computing a Deviation Image
The imdev task produces an output image whose value in each pixel represents the “error” or “deviation” in the input image at the corresponding pixel. The output image has the same dimensions and coordinate system as the input image, or as the selected region of the input image. The inputs are:
#imdev :: Create an image that can represent the statistical deviations of the input image.
imagename = '' #Input image name
outfile = '' #Output image file name. If left blank (the default), no
#image is written but a new image tool referencing
#the collapsed image is returned.
region = '' #Region selection. Default is to use the full image.
box = '' #Rectangular region(s) to select in direction plane.
#Default is to use the entire direction plane.
chans = '' #Channels to use. Default is to use all channels.
stokes = '' #Stokes planes to use. Default is to use all Stokes planes.
mask = '' #Mask to use. Default setting is none.
overwrite = False #Overwrite (unprompted) pre-existing output file? Ignored
#if "outfile" is left blank.
grid = [1, 1] #x,y grid spacing. Array of exactly two positive integers.
anchor = 'ref' #x,y anchor pixel location. Either "ref" to use the
#image reference pixel, or an array of
#exactly two integers.
xlength = '1pix' #Either x coordinate length of box, or diameter of
#circle. A circle is used if ylength is an
#empty string.
ylength = '1pix' #y coordinate length of box. A circle is used if
#ylength is an empty string.
interp = 'cubic' #Interpolation algorithm to use. Can be "nearest", "linear",
#or "cubic". Minimum match supported.
stattype = 'sigma' #Statistic to compute. See below for the list of supported
#statistics.
statalg = 'classic' #Statistics computation algorithm to use. Supported values
#are "chauvenet" and "classic". Minimum match is supported.
Area selection can be done using the region and mask parameters. Plane selection is controlled by the chans and stokes parameters. Statistics are computed spatially: a deviation image is computed independently for each channel/Stokes plane. If the outfile parameter is left blank, the task returns an image tool referencing the resulting image; otherwise the resulting image is written to disk.
The statistic to be computed is selected using the stattype parameter. Allowed statistics are:
iqr inner quartile range (q3 - q1)
max maximum
mean mean
medabsdevmed, madm median absolute deviation from the median
median median
min minimum
npts number of points
q1 first quartile
q3 third quartile
rms rms
sigma, std standard deviation
sumsq sum of squares
sum sum
var variance
xmadm median absolute deviation from the median multiplied by x, where x is the
reciprocal of \(\Phi^{-1}(3/4)\), \(\Phi^{-1}\) being the quantile function of the
standard normal distribution. Numerically, x = 1.482602218505602. See, e.g.,
https://en.wikipedia.org/wiki/Median_absolute_deviation#Relation_to_standard_deviation
The chosen statistic is calculated around a set of grid points (pixels) across the input image with grid spacing specified by the grid parameter. The size and shape of the region used to compute the statistic at each grid point is specified by the xlength and ylength parameters. If ylength is an empty string, then the region used is a circle centered on each grid point with diameter provided by xlength. Otherwise, a rectangular region with dimensions of xlength by ylength is used. These two parameters may be specified as valid quantities with recognized units (e.g., “4arcsec” or “4pix”). They may also be specified as numerical values, in which case the unit is assumed to be pixels.
The chosen statistic is calculated at every grid point in the input image, and the result is reflected at the corresponding pixel of the output image. Values at all other pixels in the output image are determined by interpolating across the grid points using the interpolation scheme given by the input parameter interp. The statalg parameter specifies the algorithm for the statistics computation. Available algorithms are CLASSIC, where all unmasked pixels are used, and CHAUVENET, which includes values based on the number of standard deviations from the mean.
Examples for imdev
Compute a “standard deviation” image using grid-spacing of 4 arcsec in the X direction and 5 arcsec in the Y direction, with linear interpolation to compute values at non-grid-pixels. Compute the standard deviation in a box of 20 x 25 arcsec.
imdev("my.image", "std.image", grid=[4,5], xlength="20arcsec", ylength="25arcsec",
stattype="sigma", interp="linear", statalg="classic")
Compute an image showing the median absolute deviation (MAD) across the image, with the MAD converted to an equivalent RMS value. Anchor the grid at a specific pixel [1000,1000] with a grid spacing of 10 pixels, and use circles of diameter 30 pixels for the statistical computation. Calculate the statistic using the z-score/Chauvenet algorithm, fixing the maximum z-score used to determine outliers to 5. Use cubic interpolation to determine the value at non-grid pixels. Have the task return a pointer to the output image.
myim = imdev("my.image", anchor=[1000,1000], grid=[10,10], xlength=30, ylength='',
stattype="xmadm", interp="cubic", statalg="chauvenet", zscore=5)
Image Selection Parameters¶
Many of the image analysis tasks contain a set of parameters that can be used to define a sub-section of an image that the task works on. This includes selection in the spatial coordinates, typically RA/DEC or GLON/GLAT (or image axes 0,1) which can be defined by the box parameter. Spectral selection can be achieved by the chans parameter (typically axis 3) and Stokes selection with the stokes parameter (typically axis 2). These parameters are described below and are a quick way to select sub-images. For more complicated selections, we recommend the usage of the CASA region file (CRTF file).
Region Selection
Direction (e.g. RA, Dec) areal selection in the image analysis tasks is controlled by the box parameter or through the region parameter. Note that one should either specify a region (recommended) or any of box/chans/stokes parameters. Specifying both at the same time will give priority to the command line inputs in ‘chans’ and ‘stokes’ but will keep the region file specification for the spatial region selection.
The box parameter selects spatial rectangular areas:
box = '' #Select one or more box regions
Boxes select rectangular areas in the direction coordinates, e.g. right ascension and declination. They are specified by their bottom-left corner (blc) and top-right corner (trc) as follows: blcx, blcy, trcx, trcy. At the moment, CASA only accepts pixel values. The default (empty string) is to select all pixels of an image.
Example:
box='0,0,50,50'
selects a square with 50 pixels on the side starting at 0.
box='10,20,30,40,100,100,150,150'
selects two regions at a time. The first and the last four values mark the two boxes, following (blcx1, blcy1, trcx1, trcy1, blcx2, blcy2, trcx2, trcy2), with b/t = bottom/top and l/r = left/right.
ALERT: Note that one should either specify a region (recommended) or any of box/chans/stokes. If both are specified, the following rules apply:
- If the region parameter is specified as a python dictionary (e.g. such as various rg tool methods return), a binary region file, or a region-in-image, it is not permissible to specify any of the box, chans, or stokes parameters.
- If the region parameter is specified to be a CRTF file name, or a CRTF region string, then the resulting region and box selection is the union of the box specification with any regions in the CRTF file/string. This is the equivalent of translating the box specification into the equivalent "box" CRTF specification and prepending that specification to the specified CRTF file/string in the region parameter.
Plane Selection (chans, stokes)
The channel, frequency, or velocity plane(s) of the image is chosen using the chans parameter:
chans = '' #Select the channel(spectral) range
Using channel numbers, it is possible to also set ranges, e.g.
chans='0,3,4,8' #select channels 0, 3, 4, 8
chans='3~20;50,51' #channels 3 to 20 and 50 and 51
chans='<10;>=55' #channels 0 to 9 and 55 and greater (inclusive)
Sometimes, as in the immoments task, the channel/plane selection is generalized to work on more than one axis type. In this case, the planes parameter is used. This behaves like chans in syntax.
chans can also be set in the CASA region format to allow settings in frequency and velocity, e.g.
chans=('range=[-50km/s,50km/s], restfreq=100GHz, frame=LSRK')
This example would even define a new velocity system independent of the one in the image itself. If the rest frequency and velocity frame within the image are being used, the latter two entries are not needed. The parentheses are needed when the call is in a single command.
A frequency selection looks like:
chans=('range=[100GHz,100.125GHz]')
The polarization plane(s) of the image is chosen with the stokes parameter:
stokes = '' #Stokes params to image (I,IV,IQU,IQUV)
The Stokes parameters to image. Best practice is to separate the Stokes parameters by commas, but this is not essential if no ambiguity exists. Options are: 'I', 'Q', 'U', 'V', 'IV', 'QU', 'IQUV', 'RR', 'RL', 'LR', 'LL', 'XX', 'YX', 'XY', 'YY', ...
Examples:
stokes='IQUV';
stokes='I,Q'
stokes='RR,RL'
ALERT: For image analysis tasks and tool methods which also accept the region parameter, the following rules apply if both the chans and region parameters are simultaneously specified:
- If the region parameter is specified as a python dictionary (e.g. such as various rg tool methods return), a binary region file, or a region-in-image, it is not permissible to specify any of the box, chans, or stokes parameters.
- If the region parameter is specified to be a CRTF file name, or a CRTF region string, it is only permissible to specify chans if that specification can be represented as a single contiguous channel range. In that case, the chans specification overrides any global or per-region range specification in the CRTF file/string, and is used as the global spectral range selection for all regions in the CRTF file/string.
Region tool and multi-dimensional selection
The region parameter only supports 2D representations of shapes (box, circle, etc.) in general astronomical coordinate systems, such as RA and Dec, or pixel coordinates. Additional axes need to be specified with different parameters, such as chans or stokes.
The region tool is less constrained, as it allows multi-dimensional selection in pixel coordinates, independent of the coordinate axes. For example:
reg = rg.box([0,0,0,0],[5,6,7,8])
The return value is a big dictionary specifying everything the software needs to know about the region. The reg variable can then be specified as the value of the region parameter in ia tool methods.
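A short sketch of passing such a dictionary to an ia tool method (the image name here is hypothetical):
reg = rg.box([0, 0, 0, 0], [5, 6, 7, 8])   #4-D pixel box
ia.open('myimage.im')
stats = ia.statistics(region=reg)          #statistics restricted to the region
ia.close()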
Region Files¶
The region parameter points to a CASA region which can be directly specified or listed in a ImageRegion file. An ImageRegion file can be created with the CASA viewer’s region manager, or directly using the CASA region file (crtf) syntax. Typically ImageRegion files will have the suffix “.crtf” for CASA Region Text Format.
For example:
region='circle[[18h12m24s, -23d11m00s], 2.3arcsec]'
or
region='myimage.im.crtf'
to specify a region file. For the most part, the region parameter in tasks only accepts strings (e.g. file names, region shape descriptions) while the region parameter in ia tool methods only accepts python region dictionaries (e.g. produced using the rg tool).
Alert: When both the region parameter and any of box/chans/stokes are specified simultaneously, the task may perform unwanted selections. While it is safest to only specify one of these (sets of) parameters, the following rules apply:
- If region is specified as a python dictionary (e.g. such as various rg tool methods return), a binary region file, or a region-in-image, then it is not permissible to specify any of the box, chans, or stokes parameters.
- If the region parameter is specified to be a CRTF file name, or a CRTF region string, the following rules apply:
  - If box is specified, the resulting selection is the union of the box specification with any regions in the CRTF file/string. This is the equivalent of translating the box specification into the equivalent "box" CRTF specification and prepending that specification to the specified CRTF file/string in the region parameter.
  - If chans is specified, it must be able to be represented as a single contiguous range of channels. In this case, the chans specification overrides any global or per-region range specification in the CRTF file/string, and is used as the global spectral range selection for all regions in the CRTF file/string.
  - If stokes is specified, this specification overrides any global or per-region corr specification in the CRTF file/string, and is used as the global correlation selection for all regions in the CRTF file/string.
NOTE: The CASA image analysis tasks will determine how a region is projected on a pixel image. The current CASA definition is that when the center of a pixel is inside the region, the full pixel is considered to be included in the region. If the center of the pixel is outside the region, the full pixel will be excluded. Note that the CASA viewer behavior is not entirely consistent and for rectangles it assumes that any fractional pixel coverage will include the entire pixel. For other supported shapes (ellipses and polygons), however, the Viewer adheres to the ‘center of pixel’ definition, consistent with the image analysis tools and tasks.
For purely single-pixel work regions may not necessarily be the best choice and alternate methods may be preferable to using regions, eg. ia.topixel, ia.toworld, ia.pixelvalue.
ALERT: Some region file specifications are not recognized by the viewer; the viewer only supports rectangles (box), ellipses, and polygons.
Region File Format¶
The CASA region file format provides a flexible, easily edited set of region definitions which are accepted across CASA tasks. Region files may be written by hand or using the CASA Viewer.
ALERT: Whereas the region format is supported by all the data processing tasks, the viewer implementation is still limited to rectangles, ellipses, and some markers.
For a file to be recognized as a valid CASA region text file, the first line must contain the string:
#CRTF
“CRTF” stands for “CASA Region Text Format”. One may also include an optional version number at the end of the string, so it reads #CRTFv0; this indicates the version of the format definition.
Region files have two different kinds of definitions, “regions” and “annotations”, each of which is one line long. To indicate an annotation, a line must begin with “ann”. Lines that begin with the comment character (#) are not considered for processing or display.
The second line of a file may define global parameters that are to be used for all regions and annotations in that file, in which case the line starts with the word “global”. The parameters set here may also be overridden by keywords in a specific line, in which case the keywords pertain only to that one line.
NOTE: All regions are considered by tasks. They will be displayed by visualization tasks as well as used to create masks, etc., as appropriate. Annotations are used by display tasks, and are for visual reference only.
Some tasks, like clean, require that a region not lie entirely outside the image.
Region Definitions
All region lines will follow this general arrangement:
{shape} {additional parameter=value pairs}
The possible parameter/value pairs are described in more detail below. Note that most parameters beyond the shape and its coordinates can be defined globally.
Possible units for coordinates are:
sexagesimal, e.g. 18h12m24s for right ascension or -03.47.27.1 for declination
decimal degrees, e.g. 140.0342deg for both RA and Dec
radians, e.g. 2.37666rad for both RA and Dec
pixels, e.g. 204pix
Possible units of length are:
degrees, e.g. 23deg
arcminutes, e.g. 23arcmin
arcseconds, e.g. 23arcsec
radians, e.g. 0.00035rad
pixels, e.g. 23pix
Units must always be included when defining a region.
NOTE: The CASA image analysis tasks will determine how a region is projected on a pixel image. The current CASA definition is that when the center of a pixel is inside the region, the full pixel is considered to be included in the region. If the center of the pixel is outside the region, the full pixel will be excluded. Note that the CASA viewer behavior is not entirely consistent and for rectangles it assumes that any fractional pixel coverage will include the entire pixel. For other supported shapes (ellipses and polygons), however, the viewer adheres to the ‘center of pixel’ definition, consistent with the image analysis tools and tasks.
For purely single-pixel work, regions may not be the best choice; alternate methods such as ia.topixel, ia.toworld, and ia.pixelvalue may be preferable.
Allowed Shapes
Rectangular box; the two coordinates are two opposite corners:
box[[x1, y1], [x2, y2]]
Center box; [x, y] define the center point of the box and [x_width, y_width] the width of the sides:
centerbox[[x, y], [x_width, y_width]]
Rotated box; [x, y] define the center point of the box; [x_width, y_width] the width of the sides; rotang the rotation angle:
rotbox[[x, y], [x_width, y_width], rotang]
Polygon; there could be many [x, y] corners. If parts of the polygon overlap, then the pixels in that overlapping region are taken into account. Note that the last point will connect with the first point to close the polygon:
poly[[x1, y1], [x2, y2], [x3, y3], ...]
Circle; center of the circle [x,y], r is the radius:
circle[[x, y], r]
Annulus; center of the circle is [x, y], [r1, r2] are inner and outer radii:
annulus[[x, y], [r1, r2]]
Ellipse; center of the ellipse is [x, y]; semi-major and semi-minor axes are [bmaj, bmin]; position angle of the major axis is pa:
ellipse[[x, y], [bmaj, bmin], pa]
Annotation Definitions
In addition to the definitions for regions, above, the following are always treated as annotations:
Line; coordinates define the end points of the line:
line[[x1, y1], [x2, y2]]
Vector; coordinates define end points; second coordinate pair is location of tip of arrow:
vector[[x1, y1], [x2, y2]]
Text; coordinates define leftmost point of text string:
text[[x, y], 'my text']
Symbol; coordinates define location of symbol:
symbol[[x, y], {symbol}]
Global Definitions
Definitions to be used throughout the region file are placed on a line beginning with ‘global’, usually at the top of the file. These definitions may also be used on any individual region or annotation line; in this case, the value defined on that line will override the predefined global (but only for that line). If a ‘global’ line occurs later in the file, subsequent lines will obey those definitions.
Coordinate reference frame:
Supported values: J2000, B1950, B1950_VLA, BMEAN, GALACTIC, ECLIPTIC, SUPERGAL, ICRS
Default: image value
coord = J2000
Frequency/velocity axis:
Possible values: REST, LSRK, LSRD, BARY, GEO, TOPO, GALACTO, LGROUP, CMB
Default: image value
frame=TOPO
Frequency/velocity range:
Possible units: GHz, MHz, kHz, km/s, Hz, channel, chan (=channel)
Default: image range
range=[min, max]
Correlation axis:
Possible values: I, Q, U, V, RR, RL, LR, LL, XX, XY, YX, YY, RX, RY, LX, LY, XR, XL, YR, YL, PP, PQ, QP, QQ, RCircular, LCircular, Linear, Ptotal, Plinear, PFtotal, PFlinear, Pangle
Default: all planes present in image
corr=[X, Y]
Velocity calculation:
Possible values: RADIO, OPTICAL, Z, BETA, GAMMA
Default: image value
veltype=RADIO
Rest frequency:
Default: image value
restfreq=1.42GHz
Line characteristics:
Possible values: any line style recognized by matplotlib: '-'=solid, '--'=dashed, ':'=dotted
Default:
linewidth=1
linestyle='-'
Symbol characteristics:
Symbol size and thickness:
symsize = 1
symthick = 1
Region, symbol, and text color:
Possible values: any color recognized by matplotlib, including hex values
Default: color=green
color=red
Text font characteristics:
Possible values: see below
‘usetex’ is a boolean parameter that determines whether or not the text line should be interpreted as LaTeX, and would require working LaTeX, dvipng, and Ghostscript installations (equivalent to the text.usetex parameter in matplotlib).
font=Helvetica
fontsize=10pt
fontstyle=bold
usetex=True/False
Label position:
Possible values: ‘left’, ‘right’, ‘top’, ‘bottom’
Default: ‘top’
labelpos='right'
Label color:
Default: color of associated region.
Allowed values: same as values for region colors.
labelcolor='green'
Label offset:
Default: [0,0].
Allowed values: any positive or negative number, in units of pixels.
labeloff=[1, 1]
Allowed Additional Parameters
These must be defined per region line:
Labels: text label for a region; should be placed so text does not overlap with region boundary
label='string'
“OR/NOT” operators: A “+” at the beginning of a line will flag it with a boolean “OR” (default), and a “-” will flag it with a boolean “NOT”. Overlapping regions will be treated according to their sequence in the file; i.e., ((((entireImage OR line1) OR line2) NOT line3) OR line4). This allows some flexibility in building “non-standard” regions. Note that a task (e.g., clean) will still consider all lines: if one wishes to remove a region from consideration, it should be commented out (“#”).
Default: OR (+)
Examples
A file with both global definitions and per-line definitions:
#CRTFv0
global coord=B1950_VLA, frame=BARY, corr=[I, Q], color=blue
# A simple circle region:
circle[[18h12m24s, -23d11m00s], 2.3arcsec]
# A box region, this one only for annotation:
ann box[[140.0342deg, -12.34243deg], [140.0360deg, -12.34320deg]]
# A rotated box region, for a particular range of velocities:
rotbox[[12h01m34.1s, 12d23m33s], [3arcmin, 1arcmin], 12deg], range=[-1240km/s, 1240km/s]
# An annular region, overriding some of the global defaults:
annulus[[17h51m03.2s, -45d17m50s], [4.12deg, 0.10deg]], corr=[I,Q,U,V], color=red, label='My label here'
# Cuts an ellipse out of the previous regions, but only for Q and a particular frequency range:
-ellipse[[17:51:03.2, -45.17.50], [1.34deg, 0.25deg], 45rad], range=[1.420GHz, 1.421GHz], corr=[Q], color=green, label='Removed this'
# A diamond marker, in J2000 coordinates:
symbol[[32.1423deg, 12.1412deg], D], linewidth=2, coord=J2000, symsize=2
Fonts and Symbols
Allowed Symbols
symbol | description
---|---
. | point marker
, | pixel marker
o | circle marker
v | triangle_down marker
^ | triangle_up marker
< | triangle_left marker
> | triangle_right marker
1 | tri_down marker
2 | tri_up marker
3 | tri_left marker
4 | tri_right marker
s | square marker
p | pentagon marker
* | star marker
h | hexagon1 marker
H | hexagon2 marker
+ | plus marker
x | x marker
D | diamond marker
d | thin_diamond marker
\| | vline marker
_ | hline marker
Allowed Fonts for Linux
“Century Schoolbook L”, “Console”, “Courier”, “Courier 10 Pitch”, “Cursor”, “David CLM”, “DejaVu LGC Sans”, “DejaVu LGC Sans Condensed”, “DejaVu LGC Sans Light”, “DejaVu LGC Sans Mono”, “DejaVu LGC Serif”, “DejaVu LGC Serif Condensed”, “Dingbats”, “Drugulin CLM”, “East Syriac Adiabene”, “Ellinia CLM”, “Estrangelo Antioch”, “Estrangelo Edessa”, “Estrangelo Nisibin”, “Estrangelo Nisibin Outline”, “Estrangelo Talada”, “Fangsong ti”, “Fixed [Sony]”, “Fixed [Eten]”, “Fixed [Misc]”, “Fixed [MNKANAME]”, “Frank Ruehl CLM”, “fxd”, “Goha-Tibeb Zemen”, “goth_p”, “Gothic [Shinonome]”, “Gothic [mplus]”, “hlv”, “hlvw”, “KacstArt”, “KacstBook”, “KacstDecorative”, “KacstDigital”, “KacstFarsi”, “KacstLetter”, “KacstPoster”, “KacstQura”, “KacstQuraFixed”, “KacstQuran”, “KacstTitle”, “KacstTitleL”, “Liberation Mono”, “Liberation Sans”, “Liberation Serif”, “LKLUG”, “Lohit Bengali”, “Lohit Gujarati”, “Lohit Hindi”, “Lohit Kannada”, “Lohit Malayalam”, “Lohit Oriya”, “Lohit Punjabi”, “Lohit Tamil”, “Lohit Telugu”, “LucidaTypewriter”, “Luxi Mono”, “Luxi Sans”, “Luxi Serif”, “Marumoji”, “Miriam CLM”, “Miriam Mono CLM”, “MiscFixed”, “Monospace”, “Nachlieli CLM”, “Nimbus Mono L”, “Nimbus Roman No9 L”, “Nimbus Sans L”, “Nimbus Sans L Condensed”, “PakTypeNaqsh”, “PakTypeTehreer”, “qub”, “Sans Serif”, “Sazanami Gothic”, “Sazanami Mincho”, “Serif”, “Serto Batnan”, “Serto Jerusalem”, “Serto Jerusalem Outline”, “Serto Mardin”, “Standard Symbols L”, “sys”, “URW Bookman L”, “URW Chancery L”, “URW Gothic L”, “URW Palladio L”, “Utopia”, “Yehuda CLM”
Allowed Fonts for MacOS X
“Abadi MT Condensed Light”, “Adobe Caslon Pro”, “Adobe Garamond Pro”, “Al Bayan”, “American Typewriter”, “Andale Mono”, “Apple Braille”, “Apple Chancery”, “Apple LiGothic”, “Apple LiSung”, “Apple Symbols”, “AppleGothic”, “AppleMyungjo”, “Arial”, “Arial Black”, “Arial Hebrew”, “Arial Narrow”, “Arial Rounded MT Bold”, “Arial Unicode MS”, “Arno Pro”, “Ayuthaya”, “Baghdad”, “Baskerville”, “Baskerville Old Face”, “Batang”, “Bauhaus 93”, “Bell Gothic Std”, “Bell MT”, “Bernard MT Condensed”, “BiauKai”, “Bickham Script Pro”, “Big Caslon”, “Birch Std”, “Blackoak Std”, “Book Antiqua”, “Bookman Old Style”, “Bookshelf Symbol 7”, “Braggadocio”, “Britannic Bold”, “Brush Script MT”, “Brush Script Std”, “Calibri”, “Calisto MT”, “Cambria”, “Candara”, “Century”, “Century Gothic”, “Century Schoolbook”, “Chalkboard”, “Chalkduster”, “Chaparral Pro”, “Charcoal CY”, “Charlemagne Std”, “Cochin”, “Colonna MT”, “Comic Sans MS”, “Consolas”, “Constantia”, “Cooper Black”, “Cooper Std”, “Copperplate”, “Copperplate Gothic Bold”, “Copperplate Gothic Light”, “Corbel”, “Corsiva Hebrew”, “Courier”, “Courier New”, “Curlz MT”, “DecoType Naskh”, “Desdemona”, “Devanagari MT”, “Didot”, “Eccentric Std”, “Edwardian Script ITC”, “Engravers MT”, “Euphemia UCAS”, “Eurostile”, “Footlight MT Light”, “Franklin Gothic Book”, “Franklin Gothic Medium”, “Futura”, “Garamond”, “Garamond Premier Pro”, “GB18030 Bitmap”, “Geeza Pro”, “Geneva”, “Geneva CY”, “Georgia”, “Giddyup Std”, “Gill Sans”, “Gill Sans MT”, “Gill Sans Ultra Bold”, “Gloucester MT Extra Condensed”, “Goudy Old Style”, “Gujarati MT”, “Gulim”, “GungSeo”, “Gurmukhi MT”, “Haettenschweiler”, “Harrington”, “HeadLineA”, “Hei”, “Heiti SC”, “Heiti TC”, “Helvetica”, “Helvetica CY”, “Helvetica Neue”, “Herculanum”, “Hiragino Kaku Gothic Pro”, “Hiragino Kaku Gothic ProN”, “Hiragino Kaku Gothic Std”, “Hiragino Kaku Gothic StdN”, “Hiragino Maru Gothic Pro”, “Hiragino Maru Gothic ProN”, “Hiragino Mincho Pro”, “Hiragino Mincho ProN”, “Hiragino Sans GB”, “Hobo Std”, “Hoefler Text”, “Impact”, “Imprint MT Shadow”, “InaiMathi”, “Kai”, “Kailasa”, “Kino MT”, “Kokonor”, “Kozuka Gothic Pro”, “Kozuka Mincho Pro”, “Krungthep”, “KufiStandardGK”, “Letter Gothic Std”, “LiHei Pro”, “LiSong Pro”, “Lithos Pro”, “Lucida Blackletter”, “Lucida Bright”, “Lucida Calligraphy”, “Lucida Console”, “Lucida Fax”, “Lucida Grande”, “Lucida Handwriting”, “Lucida Sans”, “Lucida Sans Typewriter”, “Lucida Sans Unicode”, “Marker Felt”, “Marlett”, “Matura MT Script Capitals”, “Meiryo”, “Menlo”, “Mesquite Std”, “Microsoft Sans Serif”, “Minion Pro”, “Mistral”, “Modern No. 20”, “Monaco”, “Monotype Corsiva”, “Monotype Sorts”, “MS Gothic”, “MS Mincho”, “MS PGothic”, “MS PMincho”, “MS Reference Sans Serif”, “MS Reference Specialty”, “Mshtakan”, “MT Extra”, “Myriad Pro”, “Nadeem”, “New Peninim MT”, “News Gothic MT”, “Nueva Std”, “OCR A Std”, “Onyx”, “Optima”, “Orator Std”, “Osaka”, “Papyrus”, “PCMyungjo”, “Perpetua”, “Perpetua Titling MT”, “PilGi”, “Plantagenet Cherokee”, “Playbill”, “PMingLiU”, “Poplar Std”, “Prestige Elite Std”, “Raanana”, “Rockwell”, “Rockwell Extra Bold”, “Rosewood Std”, “Sathu”, “Silom”, “SimSun”, “Skia”, “Stencil”, “Stencil Std”, “STFangsong”, “STHeiti”, “STKaiti”, “STSong”, “Symbol”, “Tahoma”, “Tekton Pro”, “Thonburi”, “Times”, “Times New Roman”, “Trajan Pro”, “Trebuchet MS”, “Tw Cen MT”, “Verdana”, “Webdings”, “Wide Latin”, “Wingdings”, “Wingdings 2”, “Wingdings 3”, “Zapf Dingbats”, “Zapfino”
Image Masks¶
A mask can be used to define whether part of an image is used or not. There are different options for masks:
a set of Boolean True (not masked) or False (masked) values: these usually live inside image cubes and are automatically applied to the data. More than one Boolean mask may exist in a cube. The task makemask can be used to access and manipulate internal Boolean masks via the image.im:mask syntax.
an image cube with zero (masked) and non-zero (not masked) values: They are their own image cubes and are applied to other image cubes when needed.
an LEL string for a condition.
Masks (mask parameter)
Using image cubes is useful to mask on a pixel-by-pixel basis, where False and zero values mark masked (excluded) pixels. The two mask formats can be converted into each other with the task makemask. Some analysis tasks offer an optional stretch parameter which is useful, e.g., to expand a single-plane mask to an entire cube along the spectral axis.
To use a different zero/non-zero mask (in this case ‘myimage.mask’), the parameter can be set like the following:
mask='mask(myimage.mask)'
The default boolean masks inside images will also be respected with the above syntax.
But remember that an image can have multiple Boolean masks; to use the mask named mask2 in an image, set the parameter as:
mask='mask(myimage.mask:mask2)'
using the syntax where the mask is specified after the colon. To see what masks are present in your image, use the makemask task.
An LEL string can be an on-the-fly (OTF) mask expression or refer to an image pixel mask.
Note that the mask file supplied in the mask parameter must have the same shape, the same number of axes, and the same axis lengths as the images supplied in the expr parameter, with one exception: the mask may be missing some of the axes, in which case it will be expanded along those axes to match the image shape.
The following example uses the mask from file ngc5921.clean.cleanbox.mask :
mask='mask(ngc5921.clean.cleanbox.mask)'
Here, the mask is calculated to be all pixels with values larger than 0.5:
mask='"ngc5921.clean.cleanbox.mask">0.5'
Because it is an LEL expression, care must be taken to properly escape characters which LEL views as special. For details, see the LEL document. As an example, specifying
mask = "3clean_mask.im" (WILL FAIL)
will cause the image analysis application to fail, because the image name begins with a number. The solution is to escape the name properly, e.g.:
mask = "'3clean_mask.im'"
Image Mask Handling
makemask facilitates the handling of image masks in CASA. As mentioned above, there are two basic mask formats: 1) one or more Boolean masks stored internally in the image file, and 2) images with zero and non-zero image values. makemask looks like:
#makemask :: Makes and manipulates image masks
mode = 'list' #Mask method (list, copy,expand,delete,setdefaultmask)
inpimage = '' #Name of input image.
To distinguish between Boolean internal masks and zero/non-zero images, makemask uses the syntax galaxy.image:mask0 for Boolean masks within an image: in this example, the Boolean mask mask0 within the image galaxy.image. Without the colon separator, the image itself will be treated as a zero/non-zero mask.

mode='list' lists all the internal Boolean masks that are present in an image. The default mask can be set with mode='setdefaultmask' and deleted with mode='delete'. The default mask is used when an image is displayed in the viewer and is used in all analysis tasks.

mode='copy' lets a user copy a Boolean mask from one image to another image, or write out the Boolean mask as a zero/non-zero image. The latter format is very useful when different masks are combined or manipulated; all the image analysis tools, in particular immath, are applicable to such zero/non-zero masks, as they act like normal images. makemask will always attempt to regrid the input mask to the output image.

In addition, mode='copy' can be used to merge masks, and it also accepts regions. E.g., to create a mask from a CASA region file, the input would look like:
#makemask :: Makes and manipulates image masks
mode = 'copy' #Mask method (list, copy,expand,delete,setdefaultmask)
inpimage = 'inputimage.im' #Name of input image.
inpmask = 'region.crtf' #mask(s) to be processed: image masks,T/F internal masks
#(Need to include parent image names),regions(for copy mode)
output = 'imagemask.im' #Name of output mask (imagename or imagename:internal_maskname)
overwrite = False #overwrite output if exists?
mode='expand' furthermore expands a mask in the spectral domain: it regrids first, then stretches the edge channels. E.g., a one-plane continuum mask would be stretched to all planes of a data cube.
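As an illustration, a minimal sketch of expanding a single-plane mask to all channels of a cube (the file names are placeholders; unspecified parameters are assumed to take their defaults):
#expand a one-plane zero/non-zero mask image to all channels of a template cube
makemask(mode='expand',
         inpimage='cube.im',         #template image that defines the output shape
         inpmask='continuum.mask',   #single-plane zero/non-zero mask image
         output='cube.mask',         #name of the expanded output mask
         overwrite=True)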
Image Analysis Tools¶
The CASA image analysis module contains an image analysis tool with numerous methods, as well as several higher level tasks. The tasks free users from the burden of resource management, and offer what many consider to be a more user-friendly interface available via the input <taskname> CASA command. In many cases, image analysis tasks are really just simple wrappers around analogous tool methods (e.g., the imcollapse task is just a relatively simple wrapper around the ia.collapse() tool method call), although in some cases, such as with the imregrid task, the mapping is not as simple, and much more goes on “under the hood” of a task.
Overview of Image Analysis Tool Functionality
At the heart of the image analysis module is the image analysis tool. An image analysis tool provides access to CASA images. Currently only single-precision, floating-point CASA images are supported by all methods in the image analysis tool and complex-valued images are supported by many, but not all, methods.
The default, global image analysis tool is named ia. New, initially-unattached image analysis tools can be created via
my_new_ia = iatool()
Image analysis tools also provide direct (native) access to FITS and Miriad images, although such access is read-only; these foreign formats are not converted to CASA format when accessed this way. For optimum processing speed, it is highly recommended to convert foreign formats to CASA images.
It is important to note that many methods return new image analysis tools that are attached to an image that method has created. Even if one does not intend to use this returned tool, it is important to capture it and run done() on it, or it will continue to use resources unnecessarily, e.g.
new_image_tool = ia.collapse("my_collapsed.im")
#do things with new_image_tool and then run done() on it
new_image_tool.done()
Tool Manipulation
ia.close(): Detach tool from image and perform required clean up.
ia.done(): Detach tool from image, perform required clean up, and optionally remove the attached image.
ia.isopen(): Determines if there is an image attached to the tool.
ia.newimage(): Create a new image analysis tool using an image.
ia.newimagefromarray(): Create a new image analysis tool from a numpy array.
ia.newimagefromfile(): Create a new image analysis tool using an image.
ia.newimagefromfits(): Create a new image analysis tool using a FITS image.
ia.newimagefromimage(): Create a new image analysis tool using an image.
ia.newimagefromshape(): Create a new image analysis tool using an image shape.
ia.open(): Attach the image analysis tool to the specified image.
ia.type(): Tool type. Always returns ‘image’.
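A minimal sketch of the tool life cycle, assuming an existing image named 'myimage.im':
#create a new, initially-unattached image analysis tool
my_ia = iatool()
#attach it to an image on disk
my_ia.open('myimage.im')
print(my_ia.isopen())   #True while an image is attached
print(my_ia.name())     #name of the attached image
#detach and release resources when finished
my_ia.done()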
FITS Conversion
There is functionality to interconvert between CASA images and FITS files. There is also native access to FITS files.
ia.fromfits(): Convert a FITS image file to a CASA image
ia.tofits(): Convert a CASA image to a FITS file.
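For example, a minimal round-trip sketch (the file names are placeholders):
#convert a FITS file to a CASA image; the tool is left attached to the new image
ia.fromfits(outfile='myimage.im', infile='myimage.fits', overwrite=True)
#write the attached CASA image back out as a FITS file
ia.tofits(outfile='myimage_copy.fits', overwrite=True)
ia.close()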
Image Creation
There are several ways to create CASA images from various data structures.
ia.fromarray(): Create a CASA image from a numpy array of pixel values.
ia.fromshape(): Create a CASA image of a specified shape.
ia.maketestimage(): Create a test image from a FITS file in the CASA data repository.
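For instance, a minimal sketch creating a small image from a numpy array (the output name is a placeholder):
import numpy as np
#create a 64x64 CASA image whose pixel values come from a numpy array
pixels = np.zeros((64, 64), dtype=float)
ia.fromarray(outfile='blank.im', pixels=pixels, overwrite=True)
print(ia.shape())   #[64, 64]
ia.close()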
Image Destruction
ia.remove(): Delete the attached image from disk.
ia.removefile(): Delete the specified image from disk.
Image Interrogation
Various metadata and pixel data can be interrogated.
ia.beamarea(): Get the image synthesized beam area.
ia.boundingbox(): Get the bounding rectangular box which circumscribes the specified region.
ia.brightnessunit(): Get the image brightness unit.
ia.commonbeam(): For an image with multiple beams, compute the size of the smallest beam that circumscribes all of the image’s beams.
ia.getchunk(): Get pixel or mask values from (a specified rectangular region of) an image.
ia.getregion(): Get pixel or mask values from a specified region of an image.
ia.haslock(): Determines if the image has a lock associated with it.
ia.history(): Get the history information from an image.
ia.miscinfo(): Retrieve “miscellaneous” metadata associated with an image.
ia.name(): Get the image name.
ia.pixelvalue(): Get the pixel and mask values at a specified location of an image.
ia.restoringbeam(): Get information about the synthesized beam(s) of an image.
ia.shape(): Get image shape.
ia.summary(): Get various metadata of an image.
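A short sketch combining a few of these methods (the image name is a placeholder):
ia.open('myimage.im')
print(ia.shape())            #image shape as a list of axis lengths
print(ia.brightnessunit())   #e.g. 'Jy/beam'
#read a small rectangular chunk of pixel values as a numpy array;
#unspecified trailing axes span their full range
chunk = ia.getchunk(blc=[0, 0], trc=[9, 9])
print(chunk.shape)
ia.close()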
Manipulation of Image Metadata
ia.lock(): Acquire a lock on the attached image.
ia.rename(): Rename the image.
ia.rotatebeam(): Rotate the synthesized beam(s) of an image through a specified angle.
ia.setbrightnessunit(): Set image brightness unit.
ia.sethistory(): Add history records to an image.
ia.setmiscinfo(): Set image miscellaneous metadata.
ia.setrestoringbeam(): Set image synthesized beam(s).
ia.unlock(): Release the image lock.
Manipulation of Image Pixel and Pixel Mask Values
ia.calc(): Replace the pixel values in the attached image with the values determined from the specified LEL expression.
ia.calcmask(): Compute a pixel mask based on an LEL expression.
ia.insert(): Insert the pixel values of another image into an image.
ia.maskhandler(): Manipulate image pixel masks.
ia.modify(): Modify an image using a model specified by a component list.
ia.putchunk(): Set pixel values (in a specified rectangular region) of an image.
ia.putregion(): Set pixel values in a specified region of an image.
ia.replacemaskedpixels(): Set masked pixels to a specified value.
ia.set(): Set pixel or mask values.
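For example, a minimal sketch of reading, modifying, and writing back pixel values (the image name is a placeholder; this modifies the image in place):
ia.open('myimage.im')
#read all pixel values, clip negative values to zero, and write the result back
chunk = ia.getchunk()
chunk[chunk < 0] = 0.0
ia.putchunk(chunk)
#alternatively, set every pixel via an LEL scalar expression:
#ia.set(pixels='0.0')
ia.close()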
Operations on Images
Various operations can be performed on images which result in new images.
ia.addnoise(): Add noise to an image.
ia.boxcar(): Boxcar smooth an image along a specified axis.
ia.decimate(): Remove planes of an image.
ia.collapse(): Collapse image along specified axis, computing aggregate function of pixels along that axis.
ia.convolve(): Convolve an image with an array or with another image.
ia.continuumsub(): Subtract continuum emission in a spectral line image.
ia.convolve2d(): Convolve an image with a two-dimensional kernel.
ia.crop(): Crop pixels from the edge of an image.
ia.fft(): Fast Fourier Transform (FFT) the image.
ia.hanning(): Hanning smooth an image along a specified axis.
ia.imagecalc(): Create an image from an LEL expression.
ia.imageconcat(): Concatenate multiple images along a specified axis.
ia.makecomplex(): Create a complex-valued image from two float-valued images representing the real and imaginary values.
ia.pad(): Pad the edges of an image with pixels.
ia.pv(): Create a position-velocity image.
ia.pbcor(): Construct a primary beam corrected image.
ia.rebin(): Rebin pixel values by specified factors.
ia.regrid(): Regrid an image to a specified coordinate system.
ia.rotate(): Rotate the direction coordinate of an image.
ia.sepconvolve(): Convolve an image with a separable kernel.
ia.subimage(): Create an image by specifying a region of an image.
ia.transpose(): Transpose an image.
Image Analysis
ia.convertflux(): Interconvert between peak intensity and flux density for a specified Gaussian source.
ia.decompose(): Decompose complex source into individual two dimensional models.
ia.deconvolvecomponentlist(): Deconvolve a component list from the restoring beam.
ia.findsources(): Find strong point sources in an image.
ia.fitcomponents(): Fit two-dimensional models to the direction plane(s) of an image.
ia.fitprofile(): Fit one-dimensional models along an axis of an image.
ia.histograms(): Compute histograms from the pixel values of an image.
ia.maxfit(): Find maximum value in the direction coordinate and do a simple parabolic fit.
ia.moments(): Compute moments of an image.
ia.statistics(): Compute image statistics using various algorithms.
ia.twopointcorrelation(): Compute two-point autocorrelation functions from an image.
Image Coordinates
The coordinate system of an image can be manipulated. Specific coordinate system values can be directly manipulated using the CASA coordinate system tool.
ia.adddegaxes(): Add degenerate axes to an image’s coordinate system.
ia.coordmeasures(): Convert from pixel to world coordinates, and return as a measure.
ia.coordsys(): Retrieve the image coordinate system as a CASA coordinate system tool.
ia.setcoordsys(): Replace the image’s coordinate system with another.
ia.topixel(): Convert from world to pixel coordinates.
ia.toworld(): Convert from pixel to world coordinates.
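A brief sketch of these conversions (the image name and the 4-axis pixel coordinate are placeholders):
ia.open('myimage.im')
#pixel -> world, returned as a measure ('m') rather than numeric values
world = ia.toworld([128, 128, 0, 0], 'm')
#world -> pixel again
print(ia.topixel(world))
#get the coordinate system as a coordsys tool, then release it
csys = ia.coordsys()
print(csys.referencevalue())
csys.done()
ia.close()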
Miscellaneous
ia.makearray(): Create a numpy array of specified shape and value.
FITS Conversion
exportfits: Convert a CASA image to a FITS image.
importfits: Convert a FITS image to a CASA image.
Interrogation and Manipulation of Image Metadata
imhead: Summarize, interrogate, and modify image metadata
imhistory: List and append records to image history.
Operations on Images
Various operations can be performed on images which result in new images.
imcollapse: Collapse image along specified axis, computing aggregate function of pixels along that axis.
imcontsub: Subtract continuum emission in a spectral line image.
immath: Perform mathematical operations upon images.
immoments: Compute image moments.
impbcor: Construct a primary beam corrected image.
impv: Create a position-velocity image.
imrebin: Rebin pixel values by specified factors.
imregrid: Regrid an image to a specified coordinate system.
imsmooth: Perform various two-dimensional convolutions.
imsubimage: Create an image by specifying a region of an image.
imtrans: Transpose an image.
specsmooth: Perform various one-dimensional convolutions.
Image Analysis
imfit: Fit two-dimensional models to the direction plane(s) of an image.
imstat: Compute image statistics using various algorithms.
imval: Interrogate pixel values.
rmfit: Compute rotation measure.
specfit: Fit one-dimensional models along a specified axis of an image.
specflux: Report spectral profile and calculate spectral flux over a user-specified region.
spxfit: Fit spectral index models along a specified axis of an image.
A persistent CASA image is stored on disk. Several files and subdirectories containing the image pixel data, mask data, and metadata are stored in a directory; the name of that directory is the name of the image. To access an existing persistent image, use the ia.open() method:
ia.open("my.im")
When you are finished with the image, it is important to close the tool so it no longer uses system resources:
ia.close()
It is also possible to create temporary images, which, if small enough, are stored completely in memory and destroyed when the user is finished with them. Creating such images is usually accomplished by running one of the image creation methods, and leaving the name of the output image blank (this is usually the default). So, for example, to create an image of a specified shape, one might run:
ia.fromshape(shape=[20,20,20])
As with persistent images, it is important to close the image analysis tool when finished with temporary images. In this case, the temporary image will be destroyed.
Persistent images can, in principle, be stored in a variety of ways. For example, the image could be stored row by row; this is the way that most older generation packages store images. It makes for very fast row-by-row access, but very slow access in other directions (e.g. extracting all the profiles along the third axis of an image). A CASA image is stored with what is called tiling. This means that small multi-dimensional chunks (tiles) are stored sequentially. Row-by-row access is then a little slower, but access speed is essentially the same in all directions.
Here are some simple examples using image tools.
#access the CASA "test" FITS image and write it to a CASA image named "zz"
ia.maketestimage('zz', overwrite=True)
#print a summary to the logger and capture the summary metadata in variable "summary"
summary = ia.summary()
#evaluate image statistics and save the stats info to a variable called "stats"
stats = ia.statistics()
#create a rectangular region using the rg tool
box = rg.box([10,10], [50,50])
#create a subimage of that region, and name the resulting image "zz2"
#capture the new image tool attached to "zz2" in the variable "im2"
im2 = ia.subimage('zz2', box, overwrite=True)
#get statistics for zz2 and store the results in the variable "stats2"
stats2 = im2.statistics()
print "CLEANING UP OLD zz2.amp/zz2.phase IF THEY EXIST. IGNORE WARNINGS!"
ia.removefile('zz2.amp')
ia.removefile('zz2.phase')
#FFT subimage and store amp and phase
im2.fft(amp='zz2.amp',phase='zz2.phase')
#close image tools
im2.close()
ia.close()
Foreign Images
The image analysis tool also provides native, read-only access to some foreign image formats. Presently, these are FITS (Float, Double, Short and Long pixel values are supported) and Miriad. This means that you don’t have to convert the file to native CASA format in order to access the image. For example:
#Assumes environment variable is set
pathname = os.environ.get("CASAPATH")
pathname = pathname.split()[0]
datapath1 = pathname + "/data/demo/Images/imagetestimage.fits"
#Access FITS image
ia.open(datapath1)
ia.close()
#Access Miriad image
ia.open('im.mir')
ia.close()
#create a new image tool attached to the FITS image
ims = ia.newimagefromimage(infile=datapath1)
#create a region record representing the inner quarter of an image
innerquarter = rg.box([0.25,0.25], [0.75,0.75], frac=True)
#create a subimage of the inner quarter of the FITS image
subim = ims.subimage(region=innerquarter)
#done with the tools, release resources
ia.close()
ims.close()
In general, any parameter to a task or a tool method which accepts an image name will support CASA, FITS, or Miriad images.
There are some performance penalties of which you should be aware. First, because CASA images are tiled (see above), performance is the same regardless of how the images are accessed. In contrast, FITS and Miriad images are not tiled, so performance when accessing these types of images will be poorer for certain operations, e.g., extracting a profile along the third axis of an image. Second, for FITS images, masked values are indicated via a "magic value". This means that the mask is worked out on the fly every time the image is accessed.
If you find performance is poor or if you want a writable image, then use appropriate tool methods to convert the foreign format image to a CASA image.
Virtual Images
It is possible to have an image analysis tool that is not associated with a single persistent image; these are called "virtual" images. For example, with ia.imagecalc(), one can create an expression which may contain many images. You can write the result of the expression to a persistent image, but if you wish, you can also just maintain the expression, evaluating it each time it is needed - nothing is ever written out to disk in this case. There are other image methods like this (the documentation for each one explains what it does). The rules are:
If you specify the outfile or equivalent parameter, then the output image is always persistent with the name specified.
If you leave the outfile or equivalent parameter unset, then if possible, a virtual image will be created. Sometimes this virtual image will be an expression as in the example above (i.e. it references other images) or a temporary image in memory, or a temporary image on disk. The ia.summary() method will list the type of image. When you ia.close() that image tool, the virtual image will be destroyed.
If you leave the outfile or equivalent parameter unset, and the called method cannot create a virtual image, it will create a persistent image with a name of its choice (sometimes input plus function name).
A virtual image can always be written to disk as a persistent image with the ia.subimage() method.
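A minimal sketch (the image names 'im1.im' and 'im2.im' are placeholders and are assumed to be conformant):
#create a virtual image from an LEL expression; no outfile, so nothing is written to disk
virt = ia.imagecalc(pixels='"im1.im" + "im2.im"')
#evaluate the expression and write the result to disk as a persistent image
persist = virt.subimage(outfile='sum.im', overwrite=True)
persist.done()
virt.done()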
Coordinate Systems
An image contains a coordinate system. A coordinate system tool is used to manipulate a coordinate system. An image tool allows you to recover the coordinate system into a coordinate system tool via the ia.coordsys() method. You can set a new image coordinate system with the ia.setcoordsys() method.
You can do some basic world to pixel and vice versa coordinate transformations via the image tool ia.topixel(), ia.toworld(), and ia.coordmeasures() methods.
Lattice Expression Language (LEL)
LEL allows you to create mathematical expressions involving images. For example, add the corresponding pixel values of two images, or multiply the minimum value of one image by the square root of the pixel values of another image. The LEL syntax is quite rich and is described in detail in the Lattice Expression Language section.
IMPORTANT NOTE: Image names which contain “special” characters (eg, “+”, “-“, etc) must be properly escaped. See the Lattice names subsection of the Expressions section in the aforementioned document for details.
To produce an image that is the result of an LEL computation, use the ia.calc() or ia.imagecalc() image analysis tool methods. Here are some examples.
In this example the image analysis tool is attached to the persistent image named “zz”. This image’s name is used in an LEL expression which adds the pixel values of that image to the sine of the pixel values of that image (for trigonometric LEL functions, pixel values are taken to be in radians). Note that the ia.calc() method overwrites the pixel values of the attached image with the values computed by the LEL expression. To create a new image without overwriting the pixel values of the image associated with the image tool, use the ia.imagecalc() method.
ia.maketestimage('zz', overwrite=True)
#add the sine of the pixel values to the pixel values
ia.calc('zz + sin(zz)')
ia.close()
This example demonstrates ways of dealing with image names which have special characters.
ia.maketestimage("test-im", overwrite=true)
#escape special characters using a backslash
im1 = ia.imagecalc(pixels='test\\-im + 5')
#or surround the entire image name with quotes
im2 = ia.imagecalc(pixels='"test-im" + 5')
#or
im3 = ia.imagecalc(pixels="'test-im' + 5")
im1.close()
im2.close()
im3.close()
ia.close()
Region Selection
A region designates a subset of pixels in the image in which one is interested. The region is selected based on coordinate information. Such a selection complements on-the-fly masks, in which pixels are selected based on a mathematical expression that is tested against their values (see below). Regions may be specified in several ways. The region manager tool (default rg) has several methods for generating regions. These methods generally return a dictionary representation of a region which can be used as input for the region parameter in various image analysis tool methods and tasks. A region can also be specified by the box/chans/stokes selection parameters in tasks and tool methods which accept them. Finally, regions can be specified in a special format known as the CASA region text format (CRTF). This format allows the specification of various region shapes as well as spectral and polarization extents. The specification can be placed in a file, in which case the region parameter can be set to the name of that file and the region information will be extracted. Alternatively, the region parameter can be set directly to the CRTF specification. The complete CRTF specification can be found in the "Region File Format" section.
Pixel Masks
A pixel mask is a set of boolean values which have a one-to-one correspondence with image pixels. A value of True indicates that pixel is “good” (i.e., should be used in computations), while a value of False indicates that pixel is “bad”. For example, blanked pixels in a FITS image are treated as “bad” by CASA. When such a file is imported into a CASA image, a pixel mask is created to reflect the badness of blanked pixels in the FITS image. For persistent CASA images, pixel masks are stored in the same directory in which other image information is stored.
If an image does not have a pixel mask associated with it, all of its pixels are treated as good by CASA.
A CASA image may contain any number of pixel masks, and these masks can be managed via the ia.maskhandler() image analysis tool method. If an image contains multiple pixel masks, at most one mask will be used during a run of a task or tool method. This pixel mask is known as the "default" pixel mask. The default pixel mask can be set by running ia.maskhandler(set="pixelmaskname"). You can also indicate that none of the image pixel masks should be applied by running ia.maskhandler(set=""). In this case, all pixels are considered to be good. Pixel masks are also listed in the output of the ia.summary() image analysis tool method and the imhead task.
The ia.putregion() image analysis tool method run with usemask=True can be used to change the values of the default pixel mask. The image analysis tool method ia.set() can also be used to set the values of the default pixel mask. The image analysis tool method ia.calcmask() can be used to create a new pixel mask based on a boolean LEL expression.
On The Fly Pixel Masks
Most image analysis tool methods and tasks accept a parameter named mask, which represents an OTF (on-the-fly) pixel mask that is computed for use by only that tool method or task (the exception being the ia.calcmask() image analysis tool method in which case a persistent pixel mask is attached to the image; see previous section). This parameter may be specified in one of two ways:
As an LEL boolean expression, or
as a single image name, in which case, pixel values >= 0.5 are treated as True (good) values, and all others are treated as False.
If the image has a default pixel mask, the mask used in the computation is the logical AND of the OTF pixel mask and default pixel mask. For example:
ia.maketestimage('zz', overwrite=True)
#create default pixel mask for which only positive valued pixels are good
ia.calcmask("zz>0")
#compute statistics by specifying an OTF mask, which gets ANDed with
#the default pixel mask, effectively making only pixels with values between 0 and 1 "good"
#for the statistics computation
stats = ia.statistics(mask="zz < 1")
ia.close()
The mask expression must in general conform in shape and coordinates with the input image.
A useful LEL function to use in conjunction with the mask parameter is indexin(). This enables the user to specify a mask based upon selected pixel coordinates or indices rather than image values. For example:
ia.fromshape(shape=[20])
#only pixels in the specified planes along the specified axis are considered good.
#prints [False False False False True True True True True True False False False False True False False False True True]
print(ia.getregion(mask='indexin(0, [4:9, 14, 18:19])', getmask=True))
ia.close()
Regions As Pixel Masks
Regions, which have previously been discussed, are just another form of an OTF pixel mask, and in fact, if one specifies the region and mask parameters simultaneously, and the associated image also has a default pixel mask, all these three types of pixel masks are just ANDed together to form the pixel mask that is used in the resulting computation. One can even convert a region specification into a persistent pixel mask by specifying the region parameter in e.g., the ia.fromimage() image analysis tool method. The created image will have a default pixel mask that is a representation of the region specified (if the initial image had a default pixel mask, then that will be ANDed with the region specification to form the default pixel mask of the resulting image).
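As a sketch of turning a region into a persistent pixel mask (the image name and the circular region are placeholders):
#open the source image to obtain its shape and coordinate system
ia.open('myimage.im')
csys = ia.coordsys()
#build a circular region from a CRTF string using the rg tool
reg = rg.fromtext('circle[[128pix, 128pix], 40pix]', shape=ia.shape(), csys=csys.torecord())
csys.done()
ia.close()
#create a new image whose default pixel mask represents the circle
ia.fromimage(outfile='circle.im', infile='myimage.im', region=reg, overwrite=True)
ia.close()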
Lattice Expression Language¶
The Lattice Expression Language (LEL) makes it possible to do arithmetic on lattices (in particular on images [which are just lattices plus coordinates]). An expression can be seen as a lattice (or image) in itself. It can be used in any operation where a normal image is used.
To summarize, the following functionality is supported:
The common mathematical, comparison, and relational operators.
An extensive list of mathematical and logical functions.
Mixed data type arithmetic and automatic data type promotion.
Support of image masks.
Masking using boolean expressions.
Handling of masks in an expression.
Support of image regions.
Interface from both Python and C++.
The first section explains the syntax. The last sections show the interface to LEL using Python or C++. At the end some examples are given. If you like, you can go straight to the examples and hopefully immediately be able to do some basic things.
LEL operates on lattices, which are a generalization of arrays. As noted above, a particular type of lattice is an image; images are what LEL will be used on most often. Because lattices can be very large and usually reside on disk, an expression is only evaluated when a chunk of its data is requested. This is similar to reading only the requested chunk of data from a disk file.
LEL is quite efficient and can therefore be used well in C++ and Python code. Note however, that it can never be as efficient as carefully constructed C++ code.
LEL Expressions
A LEL expression can be as simple or as complex as one likes using the standard arithmetic, comparison, and logical operators. Parentheses can be used to group subexpressions. The operands in an expression can be lattices, constants, functions, and condition masks.
lat1 + 10
lat1 + 2 * max(lat2,1)
amp(lat1, lat2)
lat1 + mean(img[region1])
lat1 + mean(lat2[lat2>5 && lat2<10])
The last example shows how a boolean expression can be used to form a mask on a lattice. Only the pixels fulfilling the boolean condition will be used when calculating the mean. In general the result of a LEL expression is a lattice, but it can be a scalar too. If it is a scalar, it will be handled correctly by C++ and Python functions using it as the source in, say, an assignment to another lattice. LEL fully supports masks. In most cases the mask of a subexpression is formed by AND-ing the masks of its operands. This is fully explained in a later section.
LEL supports the following data types:
Bool
Float - single precision real (which includes integers)
Double - double precision real
Complex - single precision complex
DComplex - double precision complex
All these data types can be used for scalars and lattices.
LEL will do automatic data type promotion when needed. E.g. when a Double and a Complex are used in an operation, they will be promoted to DComplex. It is also possible to promote explicitly using the conversion functions (FLOAT, DOUBLE, COMPLEX and DCOMPLEX). These functions can also be used to demote a data type (e.g. convert from Double to Float), which can sometimes be useful for better performance.
Region is a specific data type. It indicates a region of any type (in pixel or world coordinates, relative, fractional). A region can only be applied to a lattice subexpression using operator [].
Constants
Scalar constants of the various data types can be formed as follows (which is similar to Python):
A Bool constant can be given as True or False.
A Float constant can be any integer or floating-point number. For example:
3
3.14
3.14e-2
A Double constant is a floating-point number using a D for the exponent. One can also use the DOUBLE function. For example:
1d2
3.14d-2
double(2)
The imaginary part of a Complex or DComplex constant is formed by a Float or Double constant immediately followed by a lowercase i. A full complex constant is formed by adding another constant as the real part. For example:
1.5 + 2i
2i + 1.5 (identical)
Note that a full complex constant has to be enclosed in parentheses when, say, a multiplication is performed on it. For example:
2 * (1.5+2i)
The functions pi() and e() should be used to specify the constants pi and e. Note that they form a Double constant, so when using for example pi with a Float lattice, it could make a lot of sense to convert pi to a Float. Otherwise the lattice is converted to a Double, which is time-consuming. However, one may have very valid reasons to convert to Double, e.g. to ensure that the calculations are accurate enough.
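For instance, an explicit conversion keeps a Float image in single precision (the image name is a placeholder):
#without float(), pi() would promote the whole computation to Double
im1 = ia.imagecalc(pixels='float(pi()) * "myimage.im"')
im1.done()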
Operators
The following operators can be used (with their normal meaning and precedence):
Unary + and -
Cannot be used with Bool operands.
Unary !
Logical NOT operator, can only be used with Bool operands. For a region it forms the complement.
Binary ^, *, /, %, +, and -
% is the modulo operator; e.g. 3%1.4 results in 0.2 and -10%3 results in -1. ^ is the power operator.
All operators are left-associative, except ^ which is right-associative; thus 2^1^2 results in 2. Operator % can only be used for real operands, while the others can be used for real and complex operands.
Operator - can also be used for regions. It forms the difference of the left and right operand.
==, !=, >, >=, <, and <=
For Bool operands only == and != can be used. A Bool operand cannot be compared with a numeric operand. The comparison operators use the norm for complex values.
&& and ||
Logical AND and OR operators.
These operators can only be used with Bool operands. When used on a region, && forms the intersection, while || forms the union.
The precedence order is:
^
unary +, -, !
*, /, %
+, -
==, !=, >, >=, <, <=
&&
||
Note that ^ has a higher precedence than the unary operators. For example, -3^2 results in -9.
The operands of these operators can be 2 scalars, 2 lattices, or a lattice and a scalar. When 2 lattices are used, they should in principle conform; i.e. they should have the same shape and coordinates. However, LEL will try if it can extend one lattice to make it conformant with the other. It can do that if both lattices have coordinates and if one lattice is a true subset of the other (thus if one lattice has all the coordinate axes of the other lattice and if those axes have the same length or have length 1). If so, LEL will add missing axes and/or stretch axes with length 1.
Functions
In the following tables the function names are shown in uppercase, while the result and argument types are shown in lowercase. Note, however, that function names are case-insensitive. All functions can have scalar and/or lattice arguments. When a function can have multiple arguments (e.g. atan2), the operands are automatically promoted where needed.
Mathematical functions
Several functions can operate on real or complex arguments. The data types of such arguments are given as ‘numeric’.
Double PI()
Returns the value of pi.
Double E()
Returns the value of e.
numeric SIN(numeric)
numeric SINH(numeric)
real ASIN(real)
numeric COS(numeric)
numeric COSH(numeric)
real ACOS(real)
real TAN(real)
real TANH(real)
real ATAN(real)
real ATAN2(real y, real x)
Returns ATAN(y/x) in the correct quadrant.
numeric EXP(numeric)
numeric LOG(numeric)
Natural logarithm.
numeric LOG10(numeric)
numeric POW(numeric, numeric)
The same as operator ^.
numeric SQRT(numeric)
complex COMPLEX(real, real)
Create a complex number from two reals.
complex CONJ(complex)
real REAL(numeric)
Real value itself or real part of a complex number.
real IMAG(complex)
Imaginary part of a complex number.
real NORM(numeric)
real ABS(numeric), real AMPLITUDE(numeric)
Both find the amplitude of a complex number. If the numeric argument is real, imaginary part zero is assumed.
real ARG(complex), real PHASE(complex)
Both find the phase of a complex number.
numeric MIN(numeric, numeric)
numeric MAX(numeric, numeric)
Float SIGN(real)
Returns -1 for a negative value, 0 for zero, 1 for a positive value.
real ROUND(real)
Rounds the absolute value of the number. E.g. ROUND(-1.6) = -2.
real FLOOR(real)
Works towards negative infinity. E.g. FLOOR(-1.2) = -2
real CEIL(real)
Works towards positive infinity.
real FMOD(real, real)
The same as operator %.
Note that the trigonometric functions need their arguments in radians.
Scalar result functions
The result of these functions is a scalar.
double NELEMENTS(anytype)
Return number of elements in a lattice (1 for a scalar).
double NDIM(anytype)
Return dimensionality of a lattice (0 for a scalar).
double LENGTH(anytype, real axis)
Return length of a lattice axis (returns 1 for a scalar or if axis exceeds number of axes). Axis number is 1-relative.
Bool ANY(Bool)
Is any element true?
Bool ALL(Bool)
Are all elements true?
Double NTRUE(Bool)
Number of true elements.
Double NFALSE(Bool)
Number of false elements.
numeric SUM(numeric)
Return sum of all elements.
numeric MIN(numeric)
Return minimum of all elements.
numeric MAX(numeric)
Return maximum of all elements.
real MEDIAN(real)
Return median of a lattice. For smallish lattices (max. 512*512 elements) the median can be found in 1 pass. Other lattices usually require 2 passes.
real FRACTILE(real,float)
Return the fractile of a lattice at the fraction given by the second argument. A fraction of 0.5 is the same as the median. The fraction has to be between 0 and 1. For smallish lattices (max. 512*512 elements) the fractile can be found in 1 pass. Other lattices usually require 2 passes.
real FRACTILERANGE(real,float,float)
Return the range between the fractiles at the fractions given by the second and third argument. The fractions have to be between 0 and 1 and the second fraction has to be greater than the first one. The second fraction is optional and defaults to 1-fraction1. Thus:
FRACTILERANGE(lat, 0.1)
FRACTILERANGE(lat, 0.1, 0.9)
FRACTILE(lat,0.9) - FRACTILE(lat,0.1)
are equivalent, although the last one is about twice as slow. For smallish lattices (max. 512*512 elements) the fractile range can be found in 1 pass. Other lattices usually require 2 passes.
numeric MEAN(numeric)
Return mean of all elements.
numeric VARIANCE(numeric)
Return variance.
(sum((a(i) - mean(a))**2) / (nelements(a) - 1)). All calculations are done in double precision.
numeric STDDEV(numeric)
Return standard deviation (the square root of the variance).
real AVDEV(numeric)
Return average deviation (sum(abs(a(i) - mean(a))) / nelements(a)). All calculations are done in double precision.
Miscellaneous functions
numeric REBIN(numeric,[f1,f2,...])
Rebins the image using the given (integer) factors. It averages the pixels in each bin with shape [f1,f2,…]. Masked-off pixels are not taken into account. If all pixels in a bin are masked off, the resulting pixel will be masked off. The length of the factor list [f1,f2,…] has to match the dimensionality of the image. The factors do not need to divide the axes lengths evenly. Each factor can be a literal value, but it can also be any expression resulting in a real scalar value. For instance, for a 3-dimensional image:
rebin(lat,[2,2,1])
will halve the size of axis 1 and 2.
real AMP(real,real)
It returns the square root of the quadrature sum of the two arguments. Thus:
amp(lat1,lat2)
gives \(\sqrt{{lat}_1^2 + {lat}_2^2}\)
This can be used to form, for example, (biased) polarized intensity images when lat1 and lat2 are Stokes Q and U images.
real PA(real,real)
It returns a 'position angle' (in degrees) from the two lattices. That is,
pa(lat1,lat2)
gives \(180/\pi*atan2(lat1, lat2)/2\)
This can be used to form, for example, linear polarization position angle images when lat1 and lat2 are Stokes Q and U images, respectively.
real SPECTRALINDEX(real,real)
It returns the spectral index made from the two lattices. That is,
log(s1/s2) / log(f1/f2)
where s1 and s2 are the source fluxes in the lattices and f1 and f2 are the frequencies of the spectral axes of both lattices. Similar to e.g. operator + the lattices do not need to have the same shape. One can be extended/stretched as needed.
anytype VALUE(anytype)
It returns the argument without its possible mask, thus it removes the mask from the argument. The section about mask handling discusses it in more detail.
Bool MASK(anytype)
It returns the mask of the argument. The section about mask handling discusses it in more detail.
Bool ISNAN(anytype)
It tests lattice elements on a NaN value and sets the corresponding output element to T if so; otherwise to F.
anytype REPLACE(anytype, anytype)
The first argument has to be a lattice (expression). The optional second argument can be a scalar or a lattice (expression). It defaults to 0. The result of the function is a copy of the first argument, where each masked-off element in the first argument is replaced by the corresponding element in the second argument. The result’s mask is a copy of the mask of the first argument.
replace (lat1, 0)
replace (lat1, lat2)
The first example replaces each masked-off element in lat1 by 0. The second example replaces it by the corresponding element in lat2. A possible mask of lat2 is not used.
anytype IIF(Bool, anytype, anytype)
The first argument is a boolean expression. If an element in it is true, the corresponding element from the second argument is taken, otherwise from the third argument. It is similar to the ternary ?: construct in C++. E.g.
iif (lat1>0, lat1, 0) same as max(lat1,0)
iif (sum(lat1)>0, lat1, lat2)
The examples show that scalars and lattices can be freely mixed. When all arguments are scalars, the result is a scalar. Otherwise the result is a lattice. Note that the mask of the result is formed by combining the masks of the arguments in an appropriate way, as explained in the section about mask handling.
Bool INDEXIN(real axis, set indices)
The first argument is a 1-relative axis number. The second argument is a set of indices. It returns a Bool array telling for each lattice element if the index of the given axis is contained in the set of indices.
The 1-relative indices have to be given as elements with integer values enclosed in square brackets and separated by commas. Each element can be a single index, an index range as start:end, or a strided index range as start:end:stride. The elements do not need to be ordered, but in a range start must be <= end. For example:
image[indexin(2, [3,4:8,10:20:2])]
masks image such that only the pixels with an index 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 18, or 20 on the second axis are set to True.
The following special syntax exists for this function.
INDEXi IN set
where i is the axis number. So the example above can also be written as:
image[index2 in [3,4:8,10:20:2]]
Negated versions of this function exist as:
INDEXNOTIN(axis, set)
INDEXi NOT IN set
Conversion functions
Float FLOAT(real)
Convert to single precision.
Double DOUBLE(real)
Convert to double precision.
Complex COMPLEX(numeric)
Convert to single precision complex. If the argument is real, the imaginary part is set to 0.
DComplex DCOMPLEX(numeric)
Convert to double precision complex. If the argument is real, the imaginary part is set to 0.
Bool BOOLEAN(region)
Convert to boolean. This can be useful to convert a region to a boolean lattice. Only a region in pixel coordinates can be converted, so in practice only an image mask can be converted.
Note that, where necessary, up-conversions are done automatically. An explicit conversion is usually only needed for a down-conversion (e.g. Double to Float).
Lattice names
When a lattice (e.g. an image) is used in an expression, its name has to be given. The name can be given directly if it consists only of the characters -.$~ and alphanumeric characters.
If the name contains other characters or if it is a reserved word (currently only T and F are reserved), it has to be escaped. Escaping can be done by preceding the special characters with a backslash or by enclosing the string in single or double quotes. E.g.
~/myimage.data
~/myimage.data\-old
~/myimage.data-old
LEL Masks¶
A boolean mask associated with an image indicates whether a pixel is good (mask value True) or bad (mask value False). If the mask value is False, then the image pixel is not used for computation (e.g. when finding the mean of the image).
An image can have zero (all pixels are good) or more masks. One mask can be designated as the default mask. By default it will be applied to the image (from Python, designation of the default mask is handled by the ia.maskhandler method of the Image tool).
When using LEL, the basic behaviour is that the default mask is used. However, by qualifying the image name with a suffix string, it is possible to specify that no mask or another mask should be used. The suffix is a colon followed by the word nomask or the name of the alternative mask.
myimage.data
myimage.data:nomask
'myimage.data:othermask'
The first example uses the default mask (if the image has one). The second example uses no mask (thus all pixels are designated good) and the third example uses mask othermask.
Note that if the image name is enclosed in quotes, the mask name should be enclosed too. It means that a colon cannot be part of an image name.
It is also possible to use a mask from another image like
myimage.data:nomask[myotherimage::othermask]
This syntax is explained in the section describing regions.
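From Python, the mask suffix can be used wherever an image name appears in a LEL string; a sketch comparing statistics with and without the default mask ('myimage.data' is an illustrative name):
ia.open('myimage.data')
masked_stats = ia.statistics()        # honours the default mask
ia.close()
im = ia.imagecalc(pixels='myimage.data:nomask')
all_stats = im.statistics()           # every pixel treated as good
im.done()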
Lattice Condition Mask
We have seen in the previous section that lattices (in this case images) can have an associated mask. These masks are stored with the image - they are persistent.
It is also possible to create transient masks when a LEL expression is executed. This is done with the operator [] and a boolean expression. For example,
sum( lat1[lat1>5 && lat1<10] )
creates a mask for lat1 indicating that only its elements fulfilling the boolean condition should be taken into account in the sum function. Note that the mask is local to that part of the expression. So in the expression
sum( lat1[lat1>5 && lat1<10] ) + sum(lat1)
the second sum function takes all elements into account. Masking can also be applied to more complex expressions and it is recursive.
(lat1+lat2)[lat3<lat4]
sum( lat1[lat1>5][lat1<10] )
(lat1 + lat2[lat3<lat4]) [lat1<5]
The first example applies the mask generated by the [] operator to the expression lat1+lat2. The second example shows the recursion (which ANDs the masks); it is effectively a (slower) implementation of the first example in this subsection. In the last example, the expression inside the parentheses is only evaluated where the condition [lat1<5] is true, and the resulting expression has a mask associated with it.
Please note that it is possible to select pixels on an axis by means of the function INDEXIN (or by the INDEXi IN expression) as shown in the previous section about miscellaneous functions.
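For example, statistics can be computed over only the pixels selected by a condition mask; a sketch with an illustrative image name:
im = ia.imagecalc(pixels='myimage[myimage>0]')  # mask off non-positive pixels
print(im.statistics()['mean'])                  # mean of the positive pixels only
im.done()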
Mask Handling
As explained in the previous subsections, lattices can have a mask. Examples are a mask of good pixels in an image, a mask created using a boolean condition and the operator [], or a mask defining a region within its bounding box. A pixel is bad when the image has a mask and when the mask value for that pixel is False. Functions like max ignore the bad pixels.
Note that in a MeasurementSet a False mask value indicates a good visibility. Alas, this is a historically grown distinction in radio astronomy.
Image masks are combined and propagated throughout an expression. E.g. when two lattices are added, the mask of the result is formed by and-ing the masks of the two lattices. That is, the resultant mask is True where the mask of lattice 1 is True AND the mask of lattice 2 is True; otherwise the resultant mask is False.
In general the mask of a subexpression is formed by and-ing the masks of the operands. This is true for e.g. +, *, atan2, etc. However, there are a few special cases:
The mask created by operator[condition] is formed by and-ing the condition result, the mask of the result, and the mask of the subexpression the condition is applied to. For example, suppose lat1 and lat2 each have a mask. Then in
sum( lat1[lat2<5] )
the sum function will only sum those elements for which the masks of lat1 and lat2 are valid and for which the condition is true.
The logical AND operator forms the resultant mask by looking at the result and the masks of the operands.
lat1[lat1<0 && lat2>0]
Let us say both lat1 and lat2 have masks. The operand lat1<0 is true if the mask of lat1 is true and the operand evaluates to true; otherwise it is false. The same rule applies to the operand lat2>0. The AND operator gives true if the left and right operands are both true. If the left operand is false, the right operand is no longer relevant. It is, in fact, 3-valued logic with the values true, false, and undefined.
Thus, the full expression generates a lattice with a mask. The mask is true where the condition in the [] operator is true, and false otherwise. The values of the output lattice are only defined where its mask is true.
The logical OR operator works the same as the AND operator. If an operand has a true value the other operand can be ignored.
The mask of the result of the replace function is a copy of the mask of its first operand. The mask of the second operand is not used at all.
The iif function has three operands. Depending on the condition, an element from the second or third operand is taken. The resultant mask is formed by the mask of the condition and-ed with the appropriate elements from the masks of the second or third operand.
The value function returns the value without a mask, thus it removes the mask from a value. It has the same effect as the image:nomask construction discussed above. However, the value function is more general, because it can also be applied to a subexpression.
The mask function returns the mask of a value. The returned value is a boolean lattice and has no mask itself. When the value has no mask, it returns a mask consisting of all True values. When applied to an image, it returns its default mask.
Consider the following more or less equivalent examples:
value(image1)[mask(image2)]
image1:nomask[mask(image2)]
image1:nomask[image2::mask0]
The first two use the default mask of image2 as the mask for image1. The latter uses mask0 of image2 as the mask for image1; it is equivalent to the first two examples if mask0 is the default mask of image2.
It is possible that the entire mask of a subexpression is false. For example, if the mean of such a subexpression is taken, the result is undefined. This is fully supported by LEL, because a scalar value also has a mask associated with it. One can see a masked-off scalar as a lattice with an all-false mask. Hence an operation involving an undefined scalar results in an undefined scalar. The following functions act as described below on fully masked-off lattices:
MEDIAN, MEAN, VARIANCE, STDDEV, AVDEV, MIN, MAX result in an undefined scalar.
NELEMENTS, NTRUE, NFALSE, SUM result in a scalar with value 0.
ANY results in a scalar with value F.
ALL results in a scalar with value T.
LENGTH, NDIM ignore the mask because only the shape of the lattice matters.
You should also be aware that if you remove a mask from an image, the pixels that were previously masked bad may have meaningless values.
Mask Storage
In many of the expressions we have looked at in the examples, a mask has been generated. What happens to this mask and indeed the values of the expression depends upon the implementation. If for example, the function you are invoking with LEL writes out the result, then both the mask and result will be stored. On the other hand, it is possible to just use LEL expressions but never write out the results to disk. In this case, no data or mask is written to disk. You can read more about this in the interface section.
LEL Regions¶
A region-of-interest generally specifies a portion of a lattice which you are interested in for some astronomical purpose (e.g. what is the flux density of this source). Quite a rich variety of regions are supported in CASA. There are simple regions like a box or a polygon, and compound regions like unions and intersections. Regions may contain their own “region masks”. For example, with a 2-d polygon, the region is defined by the vertices, the bounding box, and a mask which says whether a pixel inside the bounding box is inside or outside of the polygon.
In addition, although masks and regions are used somewhat differently by the user, a mask is really a special kind of region; they are implemented with the same underlying code.
Like masks, regions can be persistently stored in an image. Within Python, regions can be created using the various methods of the rg tool. Regions can also be defined in plain text files (see Region File Format).
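For instance, a box region created with the rg tool can be passed directly to image methods; a sketch (image name and pixel coordinates are illustrative):
reg = rg.box(blc=[10, 10], trc=[40, 40])   # a box region in pixel coordinates
ia.open('myimage.data')
stats = ia.statistics(region=reg)          # statistics restricted to the box
ia.done()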
We saw in the previous section how the condition operator [] could be used to generate masks with logical expressions. This operator has a further talent. A region of any type can be applied to a lattice with the [] operator. You can think of the region as also effectively being a logical expression. The only difference with what we have seen before is that it results in a lattice with the shape of the region’s bounding box. If the lattice or the region (as in the polygon above) has a mask, they are and-ed to form the result’s mask.
All types of regions supported in CASA can be used, thus:
regions in pixel or world coordinates
in absolute, relative and/or fractional units
basic regions: box, ellipsoid, and polygon
compound regions: union, intersection, difference, and complement
extension of a region or group of regions to higher dimensions
masks
The documentation in the classes LCRegion, LCSlicer, and WCRegion gives you more information about the various regions.
At this moment a region cannot be defined in LEL itself. It is only possible to use regions predefined in an image or another table.
A predefined region can be used by specifying its name. There are three ways to specify a region name:
tablename::regionname
The region is looked up in the given table (which will usually be an image) in which it is stored.
::regionname
The region is looked up in the last table used in the expression.
regionname
Usually equivalent to the above. However, there is no syntactical difference between the name of a region and that of a lattice/image, so LEL first tries whether the name represents a lattice or image; if not, the name is taken to be a region name. The :: prefix in the previous form tells LEL that the name should only be looked up as a region.
Examples are
myimage.data[reg1]
(myimage.data - otherimage)[::reg1]
(myimage.data - otherimage)[myimage.data::reg1]
myimage.data:nomask[myotherimage::othermask]
In the first example, region reg1 is looked up in image myimage.data. It is assumed that reg1 is not the name of an image or lattice. It results in a lattice whose shape is the shape of the bounding box of the region. The mask of the result is the AND of the region mask and the lattice mask.
In the second example it is stated explicitly that reg1 is a region by using the :: syntax. The region is looked up in otherimage, because that is the last table used in the expression. The result is a lattice with the shape of the bounding box of the region.
In the third example the region is looked up in myimage.data. Note that this and the previous example also show that a region can be applied to a subexpression.
In the fourth example we have been very cunning. We have taken advantage of the fact that masks are special sorts of regions. We have told the image myimage.data not to apply any of its own masks. We have then used the [] operator to generate a mask from the mask stored in a different image, myotherimage. This effectively applies the mask from one image to another; apart from copying the mask, this is the only way to do so.
Unions, intersections, differences and complements of regions can be generated and stored (in C++ and Python), but they can also be formed in LEL itself. That can only be done if the regions have the same type (i.e. both in world or in pixel coordinates). The following operators can be used:
reg1 || reg2 to form the union.
reg1 && reg2 to form the intersection.
reg1 - reg2 to form the difference.
!reg1 to form the complement.
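For example, the union of two regions stored in an image could be applied directly in a LEL string; a sketch with illustrative image and region names:
# a sketch; reg1 and reg2 are assumed to be regions stored in myimage.data
im = ia.imagecalc(pixels='myimage.data[myimage.data::reg1 || myimage.data::reg2]')
im.statistics()
im.done()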
The normal CASA rules are used when a region is applied:
A region in world or relative coordinates can only be applied to an image (or a subexpression resulting in an image). Otherwise there is no way to convert it to absolute pixel coordinates.
The axes of a region in world coordinates have to be axes in the image (subexpression). However, the region can have fewer axes.
If a region has fewer axes than the image or lattice the region is automatically extended to the full image by taking the full length of the missing axes.
LEL Optimization¶
When giving a LEL expression, it is important to keep an eye on performance issues.
LEL itself will do some optimization:
As said in the introduction a LEL expression is evaluated in chunks. However, a scalar subexpression is executed only once when getting the first chunk. E.g. in
lat1 + mean(lat2)
the subexpression mean(lat2) is executed only once and not over and over again as the user gets chunks.
Often the exponent 2 is used in the pow function (or operator ^). This is optimized by using multiplication instead of calling the system pow function.
When LEL finds a masked-off scalar in a subexpression, it does not evaluate the other operand; instead it sets the result immediately to a masked-off scalar. Exceptions are the operators AND and OR and the function iif, because their masks depend on the operand values.
The user can optimize by specifying the expression carefully.
It is strongly recommended to combine scalars into a subexpression to avoid unnecessary scalar-lattice operations. E.g.
2 * lat1 * pi()
should be written as
lat1 * (2 * pi())
# or
2 * pi() * lat1
because in that way the scalars form a scalar subexpression which is calculated only once. Note that the subexpression parentheses are needed in the first case, because multiplications are done from left to right. In the future LEL will be optimized to shuffle the operands when possible and needed.
It is important to be careful with the automatic data type promotion of single precision lattices. Several scalar functions (e.g. pi) produce a double precision value, so using pi with a single precision lattice causes the lattice to be promoted to double precision. If accuracy allows it, it is much better to convert pi to single precision. E.g. assume lat1 and lat2 are single precision lattices.
atan2(lat1,lat2) + pi()/2
The result of atan2 is single precision, because both operands are single precision. However, pi is double precision, so the result of atan2 is promoted to double precision to make the addition possible. Specifying the expression as:
atan2(lat1,lat2) + float(pi())/2
avoids that (expensive) data type promotion.
POW(LAT,2) or LAT^2 is faster than LAT*LAT, because it accesses lattice LAT only once.
SQRT(LAT) is faster than LAT^0.5 or POW(LAT,0.5).
POW(U,2) + POW(V,2) < 1000^2 is considerably faster than SQRT(SQUARE(U) + SQUARE(V)) < 1000, because it avoids the SQRT function.
LEL can be used with disk-based lattices and/or memory-based lattices. When used with memory-based lattices it is better to make subexpressions the first operand in another subexpression or a function. E.g.
lat1*lat2 + lat3
is better than
lat3 + lat1*lat2
The reason is that in the first case no copy needs to be made of the lattice data, which already reside in memory. All LEL operators and functions try to reference the data of their latter operands instead of making a copy. In general this optimization does not apply to LEL expressions. However, when using the true C++ interface to classes like LatticeExprNode, one can easily use memory-based lattices. In that case it can be advantageous to pay attention to this optimization.
LEL Interface¶
There are two interfaces to LEL. One is from Python and the other from C++. It depends upon your needs which one you access. Most high level users of CASA will access LEL only via the Python interface.
Simple String Expressions
The ia.imagecalc method evaluates the expression and stores the result and mask in an output image. If you specify the output image name, it is written to a disk file of that name. If you don’t give it, the output image is never written out; it is evaluated every time an action (like ia.statistics) is requested.
im = ia.imagecalc(outfile='outimage', pixels='inimage1+inimage2');
im.statistics();
The first command creates an image file outimage, filling it with the sum of the input images. The second command computes statistics on that new image. Writing it as
im = ia.imagecalc(pixels='inimage1+inimage2');
im.statistics();
would do the same with the exception of creating the output image. Instead the created image is transient; it only lives as an expression and each time it is used the expression is evaluated.
We can use the method ia.calc on an already existing image. Thus
ia.open('ngc1213');
ia.calc('ngc1213^2');
would replace the pixels by the square of their value in the opened image.
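ia.calc accepts any LEL expression. For instance, a sketch that clips negative pixels in place using iif:
ia.open('ngc1213')
ia.calc('iif(ngc1213<0, 0, ngc1213)')  # replace negative pixels by zero
ia.done()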
Sometimes you need to double quote the file names in your expression, for example when the images reside in a different directory:
im = ia.imagecalc ('"dir1/im1" + "/nfs/data/im2"');
C++ interface
This consists of 2 parts.
The function command in Images/ImageExprParse.h can be used to execute a LEL command. The result is a LatticeExprNode object. This example does the same as the Python one shown above:
LatticeExprNode seltab1 = ImageExprParse::command("imagein1 + imagein2");
The other interface is a true C++ interface having the advantage that C++ variables can be used. Class LatticeExprNode contains functions to form an expression. The same operators and functions as in the command interface are available. For example:
Float clipValue = 10;
PagedImage<Float> image("imagein");
LatticeExpr<Float> expr(min(image,clipValue));
forms an expression to clip the image. Note that the expression is written as a normal C++ expression. The overloaded operators and functions in class LatticeExprNode ensure that the expression is formed in the correct way. A LatticeExprNode object is usually automatically converted to a templated LatticeExpr object, which makes it possible to use it as a normal Lattice. So far the expression is only formed, not evaluated. Evaluation is only done when the expression is used in an operation, e.g. as the source of the copy operation shown below.
PagedImage<Float> imout("imageout");
imout.copyData (expr);
LEL Examples¶
The following examples show some LEL expressions (equally valid from C++ or Python).
Note that LEL is read-only; i.e. it does not change any value in the images given. A method of the image tool has to be used to do something with the result (e.g. storing it in another image).
lat1+lat2 – adds 2 lattices.
mean(myimage:nomask) – results in a scalar value giving the mean of the image. No mask is used for the image, thus all pixels are used. The scalar value can be used as a lattice, e.g. as the source in the image method replacemaskedpixels to set all masked-off elements of a lattice to the mean.
complex(lat1,lat2) – results in a complex lattice formed by lat1 as the real part and lat2 as the imaginary part.
min(lat1, 2*mean(lat1)) – results in a lattice where lat1 is clipped at twice its mean value.
min(myimage, 2*mean(myimage[myregion])) – results in an image where myimage is clipped at twice the mean value of region myregion in the image.
lat1[lat1>2*min(lat1)] – results in a lattice with a mask. Only the pixels greater than twice the minimum are valid.
replace(lat1) – results in a lattice where each masked-off element in lat1 is replaced by 0.
iif(lat1<mean(lat1),lat1*2,lat1/2) – results in a lattice where the elements less than the mean are doubled and the elements greater than or equal to the mean are halved.
Here follows a sample Glish session showing some of the LEL capabilities and how Glish variables can be used in LEL.
duw01> glish -l image.g
- a := array(1:50,5,10) # make some data
- global im1 := imagefromarray('im1', a); # fill an image with it
- im1.shape()
[5 10]
- local pixels, mask
- im1.getregion(pixels, mask); # get pixels and mask
- mask[1,1] := F # set some mask elements to False
- mask[3,4] := F
- im1.putregion(mask=mask); # put new mask back
- global reg:=drm.box([1,1],[4,4]); # a box region
- im2 := imagecalc(pixels='$im1[$reg]') # read-only image applying region
- local pixels2, mask2
- im2.getregion(pixels2, mask2); # get the pixels and mask
- print pixels2
[[1:4,]
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19]
- print mask2
[[1:4,]
F T T T
T T T T
T T T F
T T T T]
- im1.replacemaskedpixels ('mean(im2)'); # replace masked-off values
- im1.getregion (pixels2, mask2); # by mean of masked-on in im2
- print pixels2
[[1:5,]
10.0714283 6 11 16 21 26 31 36 41 46
2 7 12 17 22 27 32 37 42 47
3 8 13 10.0714283 23 28 33 38 43 48
4 9 14 19 24 29 34 39 44 49
5 10 15 20 25 30 35 40 45 50]
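For reference, a rough Python analogue of the Glish session above, using the image tool and NumPy. This is a sketch only: the file name is illustrative, and the masked pixels are here replaced by the mean of the image itself rather than that of a subregion:
import numpy as np
a = np.arange(1, 51).reshape(5, 10, order='F')         # make some data (5x10, column-major like Glish)
ia.fromarray(outfile='im1', pixels=a, overwrite=True)  # fill an image with it
print(ia.shape())                                      # [5, 10]
mask = ia.getchunk(getmask=True)                       # get the pixel mask
mask[0, 0] = False                                     # set some mask elements to False
mask[2, 3] = False
ia.putregion(pixelmask=mask)                           # put the new mask back
ia.replacemaskedpixels('mean(im1)')                    # replace masked-off values by the mean
ia.done()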
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/image_visualization.ipynb
Image Cube Visualization¶
The task viewer is deprecated in favor of imview and msview, which contain the same functionality. Please invoke the imview (msview) task to run the CASA Viewer for visualizing images or image cubes (visibility data).
This documentation describes how to use the CASA Viewer to display data. The Viewer can be started as a stand-alone executable or by the tasks imview and msview inside a CASA shell. The imview task displays images, while the msview task is for visualizing MeasurementSets. These tasks offer scriptability, giving command line access to many of the viewer features.
Note: This section describes the CASA Image Viewer (imview). The Viewer is being phased out in favor of CARTA and is no longer in active development. This information is provided for reference in instances where desired capability is not yet ready in CARTA.
For help on the image viewer, please see the subpages shown in the tabs to the left. For the Visibility visualization, please see 2-D Visualization and Flagging of Visibility Data.
Viewer Basics¶
Initial Viewer Panels: The Viewer Display Panel (left) and the Data Manager Panel (right) for a regular image or data cube.
Within the CASA environment, the imview task can be used to start the CASA Viewer for displaying images or image cubes. The inputs are:
#imview :: View an image
raster = {} # (Optional) Raster filename (string) or
# complete raster config dictionary. The
# allowed dictionary keys are file (string),
# scaling (numeric), range (2 element numeric
# vector), colormap (string), and colorwedge
# (bool).
contour = {} # (Optional) Contour filename (string) or
# complete contour config dictionary. The
# allowed dictionary keys are file (string),
# levels (numeric vector), unit (float), and
# base (float).
zoom = 1 # (Optional) zoom can specify incremental zoom
# (integer), zoom region read from a file
# (string) or dictionary specifying the zoom
# region. The dictionary can have two forms.
# It can be either a simple region specified
# with blc (2 element vector) and trc (2
# element vector) [along with an optional
# coord key ("pixel" or "world"; pixel is the
# default) or a complete region rectangle e.g.
# loaded with "rg.fromfiletorecord( )". The
# dictionary can also contain a channel
# (integer) field which indicates which
# channel should be displayed.
axes = {} # (Optional) this can either be a three
# element vector (string) where each element
# describes what should be found on each of
# the x, y, and z axes or a dictionary
# containing fields "x", "y" and "z" (string).
out = '' # (Optional) Output filename or complete
# output config dictionary. If a string is
# passed, the file extension is used to
# determine the output type (jpg, pdf, eps,
# ps, png, xbm, xpm, or ppm). If a dictionary
# is passed, it can contain the fields, file
# (string), scale (float), dpi (int), or
# orient (landscape or portrait). The scale
# field is used for the bitmap formats (i.e.
# not ps or pdf) and the dpi parameter is used
# for scalable formats (pdf or ps).
Examples of starting the CASA Viewer using imview:
CASA <1>: imview()
CASA <2>: imview('ngc5921.demo.cleanimg.image')
CASA <3>: imview(contour='ngc5921.demo.cleanimg.image')
The first command creates an empty Viewer Display Panel and a Load Data Window. The second command starts the viewer with the image displayed as a raster. The third command starts the viewer with the same image displayed as a contour map.
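The dictionary forms of the raster, contour, zoom, and out parameters give finer control. The following is a sketch only; the file name, data range, contour levels, and colormap are illustrative:
imview(raster={'file': 'ngc5921.demo.cleanimg.image',
               'range': [-0.01, 0.03],
               'colormap': 'Rainbow 2',
               'colorwedge': True},
       contour={'file': 'ngc5921.demo.cleanimg.image',
                'levels': [0.2, 0.4, 0.6, 0.8]},
       zoom={'blc': [100, 100], 'trc': [150, 150], 'coord': 'pixel'},
       out='ngc5921.png')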
Note that the viewer can open FITS files, CASA image files (imview), MeasurementSets (msview), and saved viewer states. The viewer determines the type of file being opened automatically.
For additional scripting options when opening the viewer, see the discussion of the imview and msview tasks.
Running the Viewer Outside CASA
If you have CASA installed, then the CASA Viewer is available as a stand-alone application called casaviewer. From the operating system prompt, the following commands work the same as the casa task commands given in the previous section:
casaviewer &
casaviewer ms_filename &
casaviewer image_filename &
casaviewer image_filename contour &
casaviewer '"image_filename"^2' lel &
The Viewer Display Panel
The CASA Viewer consists of a number of graphical user interface (GUI) windows. The main Viewer Display Panel is used for both image and MeasurementSet viewing. It is shown in the left panels of Initial Viewer Panels and Data Display Options and appears the same whether an image or MeasurementSet is being displayed.
At the top of the Viewer Display Panel are drop down menus:
DATA
Open — open the Data Manager window.
Manage Images — open the Image Manager which contains functionality for managing the image stack.
Adjust Data Display — open the Data Display Options (‘Adjust’) window.
Save as… — save/export data to a file.
Print — print the displayed image.
Save Panel State — to a ‘restore’ file (xml format).
Restore Panel State — from a restore file.
Preferences — manually edit the viewer configuration.
Close panel — close this Viewer Display Panel. If this is the last display panel open, this will exit the viewer.
Quit Viewer — close all display panels and exit.
DISPLAY PANEL
New Panel — create a new, empty Viewer Display Panel.
Panel Options — open the Viewer Canvas Manager (see the Viewing Images and Cubes page) window to edit margins, the number of panels, and the background.
Save Panel State — save the current state of the viewer as a file that can later be reloaded.
Restore Panel State — restore the viewer to a state previously saved as a file.
Print — print displayed image.
Close Panel — close this Viewer Display Panel. If this is the last display panel open, this will exit the viewer.
TOOLS
Spectral Profile — open the Spectral Profile Browser window to look at intensity as a function of frequency for part of an image.
Collapse/Moments — open the Collapse/Moments window, which allows you to create new images from a data cube by integrating along the spectral axis.
Histogram — open the Histogram inspection window, which allows you to graphically examine the distribution of pixel values in a data cube.
Fit — open the Two-Dimensional Fitting window, which can be used to fit Gaussians to two dimensional intensity distributions.
Interactive Clean — open a window to look at ongoing interactive clean processes.
VIEW
Main Toolbar — show/hide the top row of icons.
Mouse Toolbar — show/hide the second row of mouse-button action selection icons.
Animators — show/hide tapedeck control panel attachment to the main Viewer Display Panel.
Cursors — show/hide the position tracking attachment to the main Viewer Display Panel.
Regions — show/hide the region manager attachment to the main Viewer Display Panel.
The Main Toolbar
Main Toolbar: The Display Panel’s Main Toolbar appears directly below the menus and contains ‘shortcut’ buttons for most of the frequently-used menu items.
Below the drop down menus is the Main Toolbar. This top row of icons offers fast access to these menu items:
FOLDER (Data: Open shortcut) — open the Data Manager window.
WRENCH (Data: Adjust shortcut) — open the Data Display Options (‘Adjust’) window.
MANAGE IMAGES (Data: Manage Images shortcut) — open the Image Manager window.
DELETE (Data: Close shortcut) — close (unload) the selected data file. The menu expands to the right showing all loaded data.
SAVE DATA (Data: Save as shortcut) — save the current data to a file.
NEW PANEL (Display Panel:New Panel) — create a new, empty Viewer Display Panel.
PANEL WRENCH (Display Panel:Panel Options) — open the Viewer Canvas Manager (see the Viewing Images and Cubes page) window to edit margins, the number of panels, and the background.
SAVE PANEL (Display Panel: Save Panel State) — save panel state to a ‘restore’ file.
RESTORE PANEL (Display Panel: Restore Panel State) — restore panel state from a restore file.
SPECTRAL PROFILE (Tools: Spectral Profile) — Open the Spectral Profile window to look at intensity as a function of frequency for part of an image.
COLLAPSE/MOMENTS (Tools: Collapse/Moments) — Open the Collapse/Moments window, which allows you to create new images from a data cube by integrating along the spectral axis.
HISTOGRAM (Tools:Histogram) — Open the Histogram inspection window, which allows you to graphically examine the distribution of pixel values in a data cube.
TWO-DIMENSIONAL FITTING (Tools:Fit) – Open the Two-Dimensional Fitting window, which can be used to fit Gaussians to two dimensional intensity distributions.
PRINT (Display Panel:Print) — print the current display.
MAGNIFIER BOX — zoom out all the way.
MAGNIFIER PLUS — zoom in (by a factor of 2).
MAGNIFIER MINUS — zoom out (by a factor of 2).
The Mouse Toolbar
Mouse Toolbar: The ‘Mouse Tool’ Bar allows you to assign how mouse buttons behave in the image display area. Initially, zooming, color adjustment, and rectangular regions are assigned to the left, middle and right mouse buttons, respectively. Click on a tool with a mouse button to assign that tool to that mouse button.
Below the Main Toolbar are eleven Mouse Tool buttons (see Mouse Toolbar). These allow you to assign the behavior of the three mouse buttons when clicked in the display area. Clicking a mouse tool icon will [re-]assign the mouse button that was clicked to that tool. Black and white squares beneath the icons show which mouse button is currently assigned to which tool. The mouse tools available from the toolbar are:
ZOOMING (magnifying glass icon): To zoom into a selected area, press the zoom tool’s mouse button (the left button by default) on one corner of the desired rectangle and drag to the desired opposite corner. Once the button is released, the zoom rectangle can still be moved or resized by dragging. To complete the zoom, double-click inside the selected rectangle. If you instead double-click outside the rectangle, you will zoom out.
PANNING (hand icon): Press the panning tool’s mouse button on a point you wish to move, drag it to the position where you want it moved, and release. Note: The arrow keys, Page Up, Page Down, Home and End keys can also be used to pan through your data any time you are zoomed in. (Click on the main display area first, to be sure the keyboard is ‘focused’ there).
STRETCH-SHIFT COLORMAP FIDDLING (crossed arrows): This is usually the handiest color adjustment; it is assigned to the middle mouse button by default. Hold down the tool’s mouse button and drag across the display window to adjust the stretch and color. Note that you can also adjust the color table quantitatively inside the Data Display Options window.
BRIGHTNESS-CONTRAST COLORMAP FIDDLING (light/dark sun): Another tool to adjust the color stretch.
POSITIONING (plus): This tool can place a point marker on the display to select a position. It is used to flag MeasurementSet data or to select an image position for spectral profiles. Click on the desired position with the tool’s mouse button to place the point; once placed you can drag it to other locations. You can also place multiple points on the display (e.g. for different spectral profile positions) – remove them by hovering over and hitting ESC. Double-click is not needed for this tool. See Viewer Region Positioning for more detail.
RECTANGLE, ELLIPSE, and POLYGON REGION DRAWING: The rectangle region tool is assigned to the right mouse button by default. As with the zoom tool, a rectangle region is generated by dragging with the assigned mouse button; the selection is confirmed by double-clicking within the rectangle. An ellipse region is created by dragging with the assigned mouse button; both the elliptical region and its surrounding rectangle are shown on the display, and the selection is confirmed by double-clicking within the ellipse. Polygon regions are created by clicking the assigned mouse button at the desired vertices and then clicking the final location twice to finish. Once created, a polygon can be moved by dragging from inside, or reshaped by dragging the handles at the vertices. See Viewer Region Positioning for the uses of this tool.
POLYLINE DRAWING: A polyline can be created by selecting this tool. It is manipulated similarly to the polygon region tool: create segments by clicking at the desired positions and then double-click to finish the line.
DISTANCE TOOL (ruler): After selecting the distance tool by assigning a mouse button to it, distances on the image can conveniently be measured by dragging the mouse with the assigned button pressed. The tool measures the distances along the world coordinate axes and along the hypotenuse. If the units in both axes are [deg], the distances are displayed in [arcsec].
POSITION-VELOCITY DIAGRAM: Use this mouse tool to drag out a position axis that can be used to generate a position velocity diagram in a new viewer panel from the region manager dock.
NOTE: The ‘escape’ key can be used to cancel any mouse tool operation that was begun but not completed, and to erase a region, point, or other tool showing in the display area.
The Main Display Area
The Main Display Area lies below the toolbars. This area shows the image or MeasurementSet currently loaded. Clicking the mouse inside the display area allows region or position selection according to the settings in the mouse toolbar.
The Display Area may have up to three attached panels: the Animators panel, the Cursors panel, and the Regions panel. These may be displayed or hidden from the “View” dropdown menu in the main Viewer Display Panel. If one of these is missing from your viewer, check that it is checked “on” in that menu. The panels can also be turned off by clicking the “X” in the top right corner, in which case you will need to use the View menu to get them back.
By default, the three panels appear attached to the main Viewer Display Panel on the right side of the image. They may be dragged to new positions. Each of the three panels can be attached to the left, top, right, or bottom of the main Viewer Display Panel, or they can be entirely undocked and left as free-floating panels.
NOTE: Depending on your window manager, windows without focus, including detached panels and tools like the Spectral Profile Browser may sometimes display odd behavior. As a general rule, giving the window focus by clicking on it will correct the issue. If you seem to “lose” a detached panel (like the Animators Panel), then click in the main window to get it back.
NOTE: With all three panels turned on (and especially with several images loaded), the Main Display Panel can sometimes shrink to very small sizes as the panels grow. Try detaching the panels to get the main display panel back to a useful size.
A restart of the viewer will display all docks in the state of a previous viewer session, given that it was closed normally. In addition, the viewer docking can be changed under “Preferences” in the toolbar (for Mac OS: under the “CASA Viewer” tab on the toolbar, for Linux: under “Data”). An example is given in the Preference Dialog figure below. Each item can be changed and the input box will only allow accepted input formats. A complete restart is required to apply the changes.
Preference Dialog: The Preference Dialog can be used to manually change the docking and size of the viewer panel.
The Animators Panel
Animators Panel: The Animators Panel allows you to scroll through the z-axis of a data cube (using the Channels tape deck) or cycle among open Images. The panel can be undocked from the main display panel.
The Animators Panel allows you to scroll through the channels of a data cube and to flip through loaded images. The main features of the panel are the two “tape decks,” one labeled “Channels” and one labeled “Images”.
NOTE: You will only see the Images tape deck when multiple images are loaded.
The “Channels” tape deck scrolls between planes of an individual image. By default, the channel tape deck scrolls among frequency planes when Right Ascension and Declination are the displayed axes (in this case, frequency is the z-axis). From outside to inside, the buttons cause the display to jump all the way to the beginning/end of the z-axis, step one plane forward or backward along the z-axis, or start a movie. The limits on the z-axis can be set manually using the windows at the ends of the scroll bar. The scroll bar can also be dragged, or the user can jump the display to a particular plane by entering the plane number into the text box.
When you have multiple images loaded, the Images tape deck cycles through which image is being displayed; in movie mode, it steps continuously between images. Functionally, the Images tape deck works similarly to the Channels tape deck, with the ability to step, jump, or continuously scroll through images.
NOTE: The check boxes next to the channel and images tabs enable or disable those panels. This doesn’t have much effect when the display has only a single panel, but with multiple panels (i.e., several maps at once in the main window) it changes the nature of the display. If the “Images” box is checked then interleaved maps from different cubes are displayed. Otherwise a series of maps from a single cube are shown.
The Cursors Panel
Cursors Panel: The Cursors Panel gives information about the open data cube at the current location of the cursor. Freeze the Cursors Panel using the SPACE bar.
The Cursors Panel (below the Images tape deck in Initial Viewer Panels) shows the intensity, position (e.g., RA and Dec), Stokes, frequency (or velocity), and pixel location for the point currently under the cursor. A separate box appears for each registered image or MeasurementSet and you can see the tracking information for each. Tracking can be ‘frozen’ (and unfrozen again) by hitting the space bar when the viewer’s focus is on the main display area. (To be sure that the focus is indeed the main display area, first click on the main display area.)
The Region Manager Panel
The Region Manager Panel becomes active when regions are created. It has a large amount of functionality, from displaying region statistics and histograms to creating position-velocity cuts. Like the Animators and Cursors Panels, the Region Manager Panel can be moved relative to the main Viewer Display Panel or entirely undocked.
Saving and Restoring the Display Panel State
You can save the display panel’s current state, meaning the panel settings and the data on display, or load a saved panel state from disk. To save the display panel state, select Save Panel State from the Display Panel drop-down menu or click the “Save Display Panel State to File” icon on the main toolbar (an arrow pointing from a picture to a page). It is advisable but not required to retain the file’s ‘.rstr’ (“Restore”) extension.
You can restore the display panel to the saved state by loading the saved state from the Data Manager Panel, by selecting Restore Panel State from the Display Panel drop-down menu, or by clicking the “Restore Display Panel State” icon (just to the right of the “Save Display Panel State” icon).
It is possible to restore panel states viewing MeasurementSets, images, or panel states that have multiple layers, such as contour plots over raster images. You can also save LEL displays. You can also save or restore the panel state with no data loaded, which is a convenient way to restore preferred initial settings such as overall panel size.
Data Locations: The viewer is fairly forgiving regarding data location when restoring a saved panel state. It will find files located:
in the original location recorded in the restore file
in the current working directory (where you started the viewer)
in the restore file’s directory
in the original location relative to the restore file
This means that you can generally restore a saved panel state if you move that file together with the data files. The exception to this rule is that the process is less forgiving if you save the display of an LEL expression; in this case the files must be in the locations specified in the original LEL expression. If a data file is not found, restore will attempt to proceed, but results may not be ideal.
Manually Editing Saved Display Panel States: The saved “Restore” files are in ascii (xml) format, and manual edits are possible. However, these files are long and complex. Use caution, and back up restore files before editing. If you make a mistake, the viewer may not even recognize the file as a restore file. It is easier and safer to make changes on the display panel and then save the display panel state again.
The Data Manager Panel — Saving and Loading Data
Data Manager Panel: The load tab of the Data Manager panel. This appears if you open the viewer without specifying an infile, if you select Open from the Data drop-down menu, or if you click the Open (Folder) icon. You can access the save image or save region tabs from this view or by selecting Save as… from the Data drop-down menu. The load tab shows all files in the current directory that can be loaded into the viewer: images (in task imview), MeasurementSets (in task msview), CASA region files, and Display Panel State files.
The Data Manager Panel is used to interactively load and save images, MeasurementSets, Display Panel States, and regions. An example of the loading tab in this panel is shown in the Data Manager Panel figure. This panel appears automatically if you open the viewer without specifying an input file, or it can be accessed through the Data: Open menu or the Open icon of the Viewer Display Panel.
Loading Data
The load tab of the Data Manager Panel allows you to interactively choose images or MeasurementSets to load into the viewer. The load tab automatically shows you the available images, MeasurementSets, and Display Panel States in the current directory that can be opened by the viewer. When you highlight an image in this view, the tab shows a brief summary of the image: pixel shape, extent of the image on the sky and in frequency/velocity, and restoring beam (if available). Selecting a file will bring up information about that file in the panel on the right of the Data Manager Panel and provide options for how to display the data. Images can be displayed as:
raster image
contour map
vector map
marker map
These options are each discussed in Viewing Images.
Additionally, the following data types can be loaded via the Data Manager Panel:
slice: a subselection of a data cube can be loaded; the start and end pixel in each spatial, polarization, and spectral dimension can be selected.
LEL: instead of only loading an image from disk, you may ask the viewer to evaluate a Lattice Expression Language (LEL) expression. This can be entered in the box provided after you click the “LEL” box. The images used in the LEL expression should have the same coordinates and extents.
MeasurementSets: a MeasurementSet can only be displayed as a raster. For MeasurementSets, the load tab offers options for data selection. This will reduce loading and processing times for visibility flagging.
Regridding Images on Load: optionally, you may regrid the velocity axis of an image you wish to load to match the current coordinate grid in the Display Panel. In this case, the viewer will interpolate (using the selected interpolation scheme) the cube on disk to share the same velocity gridding as the loaded coordinates. This can be used, e.g., to overlay contour maps of different spectral lines or to make synchronized movies of multiple cubes. Note that the regridding depends on the rest frequency in the image, which is used to calculate the velocities used in regridding.
Registered vs. Open Datasets
When you load data as described above, it is first opened, and then registered on all existing Display Panels. An open dataset has been prepared in memory from disk. All open datasets will have a tab in the Data Display Options window, whether currently registered or not. When a data set is registered to a Display Panel, its coordinates are aligned to the master coordinate image in the panel and it is ready for drawing. If multiple Display Panels are open, a data set may be registered on one Display Panel and not on another. Only those data sets registered on a particular Display Panel show up in its Cursors Panel.
Why Register More Than One Image? It is useful to have more than one image registered on a panel if you are displaying a contour image over a raster image, to ‘blink’ between images, or to compare images using the Cursors Panel.
Unregistering Images: A data set can be registered or unregistered using the Image Manager in the Data drop-down menu or the Image Manager icon (third from left). This icon will open the Image Manager window, and the checkboxes can be used to register or unregister an image.
Closing vs. Unregistering: You can close a data set that is no longer needed using the Close option in the Data drop-down menu, the “Close” icon (fourth from left), or the right mouse button “Close” selection in the Image Manager. If you close a dataset, you must reload it from disk (or recreate it if it’s an LEL expression, regridded image, moment, or something similar) to see it again. If you unregister a dataset, it will draw immediately if you re-register it, with its options as you have previously set them. In general, close unneeded datasets but unregister those that you intend to use again.
Image Manager
The Image Manager is used to define the master coordinate image, to set the sequence of images (e.g. for blinking), to register and unregister images, to close images, to change between raster, contour, vector, and marker displays, and to modify the properties of images. The panel can be invoked from the “Manage Images” tool, the third icon from the left (two overlapping squares).
An example is shown in the Image Manager figure. In this case, four images are loaded into the viewer. The sequence of images can be changed by dragging and dropping the images to new positions in the stack. The letter to the left indicates whether the image is a Raster, Contour, Vector, or Marker image; MC marks the coordinate master, in this case the second image. The checkboxes change the registration status. The Coordinate Master image can be defined with a right mouse click by selecting the corresponding option. The right mouse button menu also offers options for quick changes between contour and raster images and to close an image. The Options button will open a drop-down box (as shown in Image Manager for the first image), in which one can again change the image type, change to a different rest frequency (or “Reset” to the value in the image header), or open the “Display Options” panel for that specific image.
Image Manager
Saving Data or Regions
Save Data Panel: The Save Data panel that appears when selecting ‘Save as…’ from the Data drop-down menu.
The viewer can create new images by carrying out velocity regridding, evaluating an LEL expression, or collapsing a data cube. You can save these images to disk using the Data Manager Panel. Select Save As under the Data drop-down menu or click the Save As (disk) icon to bring up the Data Manager Panel set to the save tabs. These tabs are shown in the figure above.
From the Save Image tab of the Data Manager Panel, you can export images from the viewer to either a CASA image or a FITS file on disk. Select the desired file name and click “Save.” The Data Manager also allows you to save your current regions to a file, in either the CASA or ds9 format. The left side of the Save Data Panel lists all images that can be exported to disk. To save an image to a file, you can either enter the new filename in the box labeled ‘output name:’ followed by the save button (or the ‘Enter’ key), or choose a file name from the right-hand side.
Viewing Images and Cubes¶
Viewing Images
There are several options for viewing an image. These are seen at the right of the Load Data - Viewer panel after selecting an image. They are:
raster image — a greyscale or color image,
contour map — contours of intensity as a line plot,
vector map — vectors (as in polarization) as a line plot,
marker map — a line plot with symbols to mark positions.
The raster image is the default image display, and is what you get if you invoke the viewer with an image file and no other options. In this case, you will need to use the Open menu to bring up the Load Data panel to choose a different display.
This page discusses raster images and contour maps in detail; for an example of how to use a vector map, see the 3C286 Polarization CASAguide here.
Initial Viewer Panels
When the viewer is started, two dialogs appear. One is the Data Manager which presents two panels.
Data Manager
The left panel shows the files that the viewer can load while the right panel shows some statistics about the file that is selected.
The other panel is the Viewer Display Panel.
Viewer Display Panel
This panel is the main panel used to interact with the viewer.
The image is shown on the left and image information is shown on the right. The Cursors panel displays information about the pixel at the current cursor location (as the cursor is moved around the image).
Animators Panel
The Animators panel allows the planes of the image cube to be displayed, either by single-stepping plane by plane or by playing the planes of the image cube like a movie. The buttons are:
- move to the first image plane
- move back one image plane
- play image cube in reverse
- stop playing the movie
- play image cube as a movie
- move forward one image plane
- move to the last image plane
In addition to these controls for moving through the image cube, there are two other areas of animation control:
- the rate indicates how fast the movie should be played in terms of frames, and the second entry box (here with a zero) is for going to a particular plane of the image cube (enter a number and hit return) [Jump doesn’t seem to do anything]
- the slider of this control allows for moving through the image cube as the slider is moved. The dialogs at the ends of the slider allow for setting the start and end points for the movie (between zero and the number of planes in the cube)
Button Tools
These tools are designed for use with a three-button mouse. The row of boxes below each icon indicates the mouse button to which the tool is currently bound. For example, the last three icons in this table indicate that these tools are bound to the first, second, and third buttons respectively:
- zoom: select this tool (by clicking on this icon and pressing one of the three buttons), then click and drag out a rectangle, then double click inside the rectangle to zoom in
- panning: select this tool, then if the image is zoomed in, click and drag within the image to move the image
- adjust color map: select this tool, then click and drag within the image to adjust the color map
- contrast: select this tool, then click and drag within the image
- point region: select this tool, then place a point on the image; the regions panel corresponding to the dot you placed will have statistics and information about the selected point
- rectangular region: select this tool, then click and drag out a rectangle in the image and the regions panel corresponding to this region will have information about the rectangular region; double clicking in the region will display the statistics in the terminal window
- elliptical region: select this tool, then click and drag out an ellipse; the regions panel corresponding to this region will have information about the elliptical region; double clicking in the region will display the statistics in the terminal window
- polygon region: select this tool, then click multiple times within the image to mark out a region (one click at a time), double clicking when you have marked all of the points that denote the polygon; the regions panel corresponding to this region will have information about this region; double clicking in the region will display statistics for this region in the terminal window
- polyline region: select this tool, then click multiple times within the image to mark out a multi-segment line; the region panel for this region will display statistics about the region
- ruler: select this tool, then click and drag in the image to get a display of distance along two axes
- pv diagram: select this tool, then click and drag within the image to create a line to be used to generate a position/velocity diagram (the diagram is created from the region panel corresponding to the P/V line that you’ve drawn)
These tools create regions that can be used to provide information about a portion of an image.
Regions
Regions are created with the region Button Tools. For a region to be created, the Region panel (displayed on the left side of the Viewer Display Panel) must be open. If you do not see the Regions panel, it can be included in the Display Panel by selecting the Regions check box in the View menu:
Once the Regions dialog is in view, regions can be created and information about the regions can be viewed. For example, here one region (black rectangle) has been created and region statistics are displayed:
Note that in the statistics window, the brightness units are specified and the beam area is defined as the volume of the elliptical Gaussian \(\frac{\pi}{4\ln 2} \times FWHM_{major} \times FWHM_{minor}\), where \(\ln\) is the natural logarithm and \(FWHM_{major}\) and \(FWHM_{minor}\) are the major and minor FWHM axes of the beam, respectively. The flux density is the mean intensity multiplied by the number of beams included in the selected region.
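In symbols, with mean intensity \(\bar{I}\) (in Jy/beam), \(N_{pix}\) pixels of size \(\Delta x \times \Delta y\) in the region, and the beam FWHM axes in the same angular units, this reads:
\(S = \bar{I} \times N_{beams}\), where \(N_{beams} = N_{pix}\,\Delta x\,\Delta y \,/\, \left(\frac{\pi}{4\ln 2}\,FWHM_{major}\,FWHM_{minor}\right)\)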
Region Statistics
If more than one region has been created, the scroll bar can be used to move from the information about one region to the next; alternatively, the cursor (in the image panel) can be used to move from one region to the next, and the region information will be updated as the cursor moves from region to region. Here are three regions, showing histograms for the regions:
Region Histogram
The region which corresponds to the histogram that is shown is drawn with its corner adjustment handles.
Any regions that are created using these tools can be removed by moving the cursor over the region you would like to remove and, once that region is highlighted, pressing the escape key. Regions can also be deleted from the region panel that corresponds to the region you would like to remove.
Viewing a Raster Map
A raster map of an image shows pixel intensities in a two-dimensional cross-section of gridded data with colors selected from a colormap according to a scaling that can be specified by the user. Once loaded, the data display can be adjusted by the user through the Data Display Options panel, which appears when you choose the Data: Adjust Data Display menu or use the wrench icon from the Main Toolbar. The Data Display Options window is shown in the right panel of Initial Viewer Panels. It consists of a tab for each image or MS loaded, under which are a cascading series of expandable categories. For an image, these are:
display axes
hidden axes
basic settings
position tracking
axis labels
axis label properties
beam ellipse
color wedge
The basic settings category is expanded by default. To expand a category to show its options, click on it with the left mouse button.
Data Display Options — Display and Hidden Axes
In this category the physical axes (i.e. Right Ascension, Declination, Velocity, Stokes) to be displayed can be selected and assigned to the x, y, and z axes of the display. The z axis will be the axis scrolled across by the channel bar in the Animators Panel. If your image has a fourth axis (typically Stokes), then one of the axes will need to be hidden and not used in viewing. Which axis is hidden can be controlled by a slider within the hidden axes drop-down.
Data Display Options — Basic Settings
This roll-up is open by default showing some commonly-used parameters that alter the way the image is displayed. The most frequently used of these changes how the intensity value of a pixel maps to a color on the screen.
The options available are:
basic settings: aspect ratio
This option controls the horizontal-vertical size ratio of data pixels on screen. “Fixed world” (the default) means that the aspect ratio of the pixels is set according to the coordinate system of the image (i.e., true to the projected sky). “Fixed lattice” means that data pixels will always be square on the screen. Selecting “flexible” allows the map to stretch independently in each direction to fill as much of the display area as possible.
basic settings: pixel treatment
This option controls the precise alignment of the edge of the current “zoom window” with the data lattice. “Edge” (the default) means that whole data pixels are always drawn, even on the edges of the display. For most purposes, “edge” is recommended. “Center” means that data pixels on the edge of the display are drawn only from their centers inwards.
NOTE: A data pixel’s center is considered its “definitive” position, and corresponds to a whole number in “data pixel” or “lattice” coordinates.
basic settings: resampling mode
This setting controls how the data are resampled to the resolution of the screen. “Nearest” (the default) means that screen pixels are colored according to the intensity of the nearest data point, so that each data pixel is shown in a single color. “Bilinear” applies a bilinear interpolation between data pixels to produce smoother looking images when data pixels are large on the screen. “Bicubic” applies an even higher-order (and somewhat slower) interpolation.
basic settings: data range
You can use the entry box provided to set the minimum and maximum data values mapped to the available range of colors as a list [min, max]. For very high dynamic range images, you will probably want to enter a max less than the data maximum in order to see detail in lower brightness-level pixels.
NOTE: By default you edit the scaling of a single image at once and can click between the tabs at the top of the Data Display Options window to manipulate different windows. By checking the Global Color Settings box at the bottom of this window, you will manipulate the scaling for all images at once. This can be very useful, for example, when attempting a detailed comparison between multiple reductions of the same data.
basic settings: scaling power cycles
This option allows logarithmic scaling of data values to colormap cells, which can be very helpful in the case of very high dynamic range. The color for a data value is determined as follows:
The value is clipped to lie within the data range [min, max] specified above.
The clipped value is mapped to a new value depending on the selected scaling power cycles p, in the following way:
If the scaling power cycles is set to 0 (the default), the program considers a linear range from [min, max] and scales this directly onto the set of available colors.
For positive scaling values, the data value (after clipping on [min, max]) is scaled linearly to lie between 0 and p (where p is the value chosen) and 10 is raised to this power, yielding a value in the range 1 to 10^p. That value is scaled linearly to the set of available colors.
For negative scaling values, the data value (after clipping on [min, max]) is scaled linearly to lie between 1 and 10^|p|, where p is the power chosen. The base-10 logarithm is taken of this re-scaled data value, yielding a value in the range 0 to |p|. That value is scaled linearly to the set of available colors. Thus the data are treated as if they had |p| decades of range, with an equal number of colors assigned to each decade.
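The three cases can be summarized in a few lines of Python; this is a minimal sketch of the mapping, in which the function name and the 256-color assumption are illustrative and not part of the viewer's API:

import numpy as np

def scale_to_color_index(data, dmin, dmax, p, ncolors=256):
    # Map data values onto colormap indices using 'scaling power cycles' p:
    #   p = 0 : linear mapping of [dmin, dmax] onto the colors
    #   p > 0 : exponential stretch (compresses the faint end)
    #   p < 0 : logarithmic stretch over |p| decades (expands the faint end)
    x = np.clip(data, dmin, dmax)
    frac = (x - dmin) / (dmax - dmin)                     # linear fraction in [0, 1]
    if p == 0:
        y = frac
    elif p > 0:
        y = (10.0**(frac * p) - 1.0) / (10.0**p - 1.0)    # 10**(0..p), rescaled to [0, 1]
    else:
        q = abs(p)
        y = np.log10(1.0 + frac * (10.0**q - 1.0)) / q    # log10 of 1..10**q, rescaled to [0, 1]
    return (y * (ncolors - 1)).astype(int)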
The color corresponding to a number in the final range is determined by the selected colormap and its “fiddling” (shift/slope) and brightness/contrast settings (see Mouse Toolbar, above). Adding a color wedge to your image can help clarify the effect of the various color controls.
In practice, you will often manipulate the data range, bringing the max down in high dynamic range images and raising the minimum to near the noise level when using non-zero scaling cycles. It is also common to use negative power cycles when considering high dynamic range images; this lets you bring out the faint features around the bright peaks.
basic settings: colormap
You can select from a variety of colormaps here. Hot Metal, Rainbow and Greyscale colormaps are the ones most commonly used.
Graphical Specification of the Intensity Scale
A histogram icon next to the Data Range entry in the Data Display Options window opens the Image Color Mapping window, which allows visualization and graphical manipulation of the mapping of intensity to color. The window at the left shows a histogram of the data with a gray range showing the data range. You can use this window to select the data range graphically (with the mouse), manually (by typing into the Min and Max entry windows), or as a percentile of the data. On the right, you can select the scaling power cycles and see a visualization of the transfer function mapping intensity (x-axis) to color (y-axis).
The functionality here follows the other histogram tools, with the Display tab used to change the histogram plotting parameters. It will often be useful to use a logarithmic scaling of the y-axis and filled histograms when manipulating the color table.
Data Display Options — Other Settings
Many of the other settings on the Data Display Options panel for raster images are self-explanatory, such as those which affect beam ellipse drawing (only available if your image provides beam data) or the form of the axis labeling and position tracking information. You can also give your image a color wedge, a key to the current mapping from data values to colors.
Viewer Canvas Manager — Panels, Margins, and Backgrounds
The display area can also be manipulated from the Viewer Canvas Manager window. Use the wrench icon with a “P” (or the “Display Panel” menu) to show this window, which allows you to manipulate the infrastructure of the main display panel. You can set:
Margins - specify the spacing for the left, right, top, and bottom margins
Number of panels - specify the number of panels in x and y and the spacing between those panels.
Background Color - white or black (more choices to come)
Multi-Panel Display illustrates a multi-panel display along with the Viewer Canvas Manager settings which created it.
Viewing a Contour Map
Viewing a contour image is similar to viewing a raster map. A contour map shows lines of equal data value for the selected plane of gridded data. Contour maps are particularly useful for overlaying on raster images so that two different measurements of the same part of the sky can be shown simultaneously.
Several basic settings options control the contour levels used:
The contours themselves are specified by a list in the box Relative Contour Levels. These are defined relative to the two other parameters:
The Base Contour Level sets what 0 in the relative contour list corresponds to in units of intensity in the image.
The Unit Contour Level sets what 1 in the relative contour list corresponds to in units of intensity in the image.
Additionally, you have the option to manipulate the thickness and color of the contours and to have either positive or negative contours appear dashed.
For example, the following settings:
Relative Contour Levels = [0.2, 0.4, 0.6, 0.8]
Base Contour Level = 0.0
Unit Contour Level = <image max>
would map the maximum of the image to 1 in the relative contour levels and the base contour level to zero. So the contours will show 20%, 40%, 60%, and 80% of the peak.
Another approach is to set the unit contour to 1, so that the contours are given in intensity units (usually Jy/beam). So this setup:
Relative Contour Levels = [0.010, 0.020, 0.040, 0.080, 0.160, 0.320]
Base Contour Level = 0.0
Unit Contour Level = 1.0
would create contours starting at 10 mJy/beam and doubling every contour.
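Such a doubling series is easy to generate programmatically; this one-liner reproduces the list above:

levels = [0.010 * 2**i for i in range(6)]
# [0.01, 0.02, 0.04, 0.08, 0.16, 0.32]  Jy/beam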
Another useful approach is to set contours in units of the rms noise level of the image, which can be worked out from a signal free region. Then a setup like this:
Relative Contour Levels = [-3,3,5,10,15,20]
Base Contour Level = 0.0
Unit Contour Level = <image rms>
would indicate the significance of the features in the image. The first two contours show emission at ±3-sigma, and so on. You can get the image rms using the imstat task or using the Region Statistics on a region of the image.
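For example, a minimal sketch of measuring the rms with imstat over a signal-free box; the image name and box corners here are placeholders:

# Measure the rms in an emission-free portion of the image; the image name
# and box corners (blc_x, blc_y, trc_x, trc_y) are placeholders
stats = imstat(imagename='ngc5921.clean.image', box='10,10,60,60')
rms = stats['rms'][0]
print('image rms: %.4g Jy/beam' % rms)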
Not all images are of intensity; for example, a moment-1 image (immoments task) has units of velocity. In this case, absolute contours (like the last two examples) will work fine, but by default the viewer will set fractional contours referred to the min and max of the image:
Relative Contour Levels = [0.2, 0.4, 0.6, 0.8]
Base Contour Level = <image min>
Unit Contour Level = <image max>
Here we have contours spaced evenly from min to max, and this is what you get by default if you load a non-intensity image (like the moment-1 image).
Overlay Contours on a Raster Map
Contours of either a second data set or the same data set can be used for comparison or to enhance visualization of the data. The Data Display Options Panel will have multiple tabs (switch between them at the top of the window) that allow you to adjust each overlay individually.
NOTE: Axis labeling is controlled by the first-registered image overlay that has labeling turned on (whether raster or contour), so make label adjustments within that tab.
To add a Contour overlay, open the Load Data panel (Use the Data menu or click on the folder icon), select the data set and click on contour map.
Creating a Position/Velocity Diagram
With an image cube loaded, it is possible to create a position/velocity diagram using the P/V Button Tool. The first step in creating a P/V diagram is to bind the tool to the mouse button you would like to use:
Here we have bound the P/V button tool to the first mouse button. At this point, we can drag out a P/V line, by clicking and dragging on the image:
After creating the P/V line, it will appear with a circle on each end. These circles can be used to adjust the line by clicking within a circle and dragging:
After the P/V line is properly in place, the final P/V diagram can be created from the P/V Regions panel. It is just a matter of generating the P/V diagram with the Generate P/V button:
The generation of the P/V diagram may take some time, but then the final diagram is displayed:
Regions in the Viewer¶
Regions and the Region Manager
CASA regions follow the CASA ‘crtf’ standard as described in § D. CASA regions can be used in all applications, including tclean and the Image Analysis tasks.
NOTE: The CASA image analysis tasks will determine how a region is projected on a pixel image. The current CASA definition is that when the center of a pixel is inside the region, the full pixel is considered to be included in the region. If the center of the pixel is outside the region, the full pixel will be excluded. Note that the CASA viewer behavior is not entirely consistent: for rectangles it assumes that any fractional pixel coverage will include the entire pixel. For other supported shapes (ellipses and polygons), however, the viewer adheres to the ‘center of pixel’ definition, consistent with the image analysis tools and tasks.
For purely single-pixel work, regions may not be the best choice; alternate methods such as ia.topixel, ia.toworld, or ia.pixelvalue may be preferable.
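For instance, a minimal sketch of reading a single pixel with the default ia tool available in the CASA shell; the image name and pixel position are placeholders:

# Read one pixel directly with the image tool rather than a region;
# the image name and pixel position are placeholders
ia.open('ngc5921.clean.image')
val = ia.pixelvalue([128, 128, 0, 0])   # pixel coordinates along the image axes
print(val['value'])                     # e.g. {'unit': 'Jy/beam', 'value': ...}
ia.close()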
ALERT: Some region file specifications are not recognized by the viewer; the viewer only supports rectangles (box), ellipses, and polygons.
NOTE: A leading ‘ann’ (short for annotation) to a region definition indicates that it is for visual overlay purposes only.
NOTE: Whereas the region format is supported by all the data processing tasks, some aspects of the viewer implementation are still limited to rectangles, ellipses, and some markers. Full support for all region types is progressing with each CASA release.
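For illustration, a minimal CRTF file might look like the following; the coordinates are placeholders, and § D gives the full syntax:

#CRTFv0
box [[10h00m00.0s, -30d00m00.0s], [10h00m10.0s, -29d59m50.0s]] coord=J2000
ann circle [[10h00m05.0s, -30d00m05.0s], 5arcsec]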
Once one or more regions are created, the Region Manager Panel becomes active. Like the Position Tracking and Animator Panels, this can be docked or detached from the main viewer display. It contains several tabs that can be used to adjust, analyze, and save or load regions.
NOTE: Moving the mouse into a region will bring it into focus for the Spectral Profile or Histogram tools.
Region Creation, Selection, and Deletion
Within the display area, you can draw regions or select positions using the mouse. Regions can be created with the buttons marked as ‘R’ in the mouse tool bar, which can be found on the top-left (second row of buttons) in the Viewer Display Panel. The viewer currently supports the creation of rectangles, ellipses, polygons, and points. As usual, a mouse button can be assigned to each button as indicated by the small black square in each button (marking the left, middle, or right mouse button).
Regions can be selected by SHIFT+click and de-selected by pressing SHIFT+click again. The bottom of the Region Manager Panel features a slider to switch between regions in the image. Regions can be removed by hovering over them and pressing ESC, or via the buttons to the right of the slider, where the first button (trash can icon) deletes all regions and the far right button (red circle with a white X) deletes the region that is currently displayed in the panel.
Once regions are selected, they will feature little, skeletal squares in the corners of their boundary boxes. Selected regions can be moved by dragging with the mouse button and manually resized by grabbing the corners as handles. If more than one region is selected, all selected regions move together.
The Rectangle Region drawing tool currently enables the full functionality of the various Region Manager tabs (see below) as well as:
Region statistics reporting for images via double clicking (also shown in the Statistics tab of the Region Manager),
Defining a region to be averaged for the spectral profile tool (accessed via the Tools:Spectral Profile drop down menu or “Open the Spectrum Profiler” icon),
Flagging of MeasurementSets. Note that the Rectangle Region tool’s mouse button must also be double-clicked to confirm an MS flagging edit.
Selecting Clean regions interactively (§ 5.3.5)
The Polygon Region and Ellipse Region drawing tools have the same uses, except that polygon region flagging of a MeasurementSet is not supported.
Region Positioning
With at least one region drawn, the Region Manager becomes active. Using the Properties tab, one can manually adjust the position, annotation, and display style of the region. The entries labeled “frames” set which planes of the image cube the region persists through (regions can have a depth associated with them and will only appear in the frames listed in this range). One can manually adjust the width and height and the center of the box in the chosen units. The ‘selected’ check box is an alternative to SHIFT+click for selecting a region. The ‘annotation’ checkbox will place the “ann” string in front of the region ASCII output – annotation regions are not used for processing in, e.g., data analysis tasks. In the line and text tabs, one can set the style with which the region is displayed, the associated text, and the position and style of that text.
NOTE: Updating the position of a region will update the spectral profile shown if the Spectral Profile tool is open and the histogram if the Histogram tool is open. The views are linked. Dragging a region or adjusting it manually with the Properties tab is a good way to explore an image.
Region Statistics
One of the most useful features of defining a region is the ability to extract statistics characterizing the intensity distribution inside the region. You can see these in the Statistics tab of the Region Manager Panel. This displays statistics for the current region in the current plane of the current image. When more than a single region is drawn, you can select them one by one and the Region Panel will update the statistics to reflect the currently selected region. All values are updated on the fly when the region is dragged across the image.
A similar functionality can be achieved by double clicking inside of a region. This will send statistics information for this region in all registered images to the terminal, looking something like this:
(IRC10216.36GHzcont.image) image
Stokes Velocity Frame Doppler Frequency
I -2.99447e+11km/s LSRK RADIO 3.63499e+10
BrightnessUnit BeamArea Npts Sum Flux
Jy/beam 36.2521 27547 1.087686e-01 3.000336e-03
Mean Rms Std dev Minimum Maximum
3.948473e-06 3.723835e-04 3.723693e-04 -1.045624e-03 9.968892e-03
Listed parameters are Stokes, and the displayed channel Velocity with the associated Frame, Doppler, and Frequency values. Sum, Mean, Rms, Std dev, Minimum, and Maximum refer to the selected region and have the units specified in BrightnessUnit. Npts is the number of pixels in the region, and BeamArea the beam size in pixels. FluxDensity is in Jy if the image is in Jy/beam. This is an easy way to copy and paste the statistical data to a program outside of CASA for further use.
Taking the RMS of the signal-free portion of an image or cube is a good way to estimate the noise. Contrasting this number with the maximum of the image gives an estimate of the dynamic range of the image. The FluxDensity measurement gives a way to use the viewer to do very basic photometry.
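The dynamic range estimate is plain arithmetic; purely for illustration, reusing the numbers from the example statistics above:

# Dynamic range estimate from the statistics printed above
peak = 9.968892e-03    # Maximum, Jy/beam
rms = 3.723835e-04     # Rms of an (assumed signal-free) region, Jy/beam
print('dynamic range ~ %.1f' % (peak / rms))   # ~26.8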
Saving and Loading Regions
The File tab in the Region Manager allows one to save or load selected regions, either individually or en masse. You can choose between the CASA and DS9 region formats. The default is a CASA region file (saved with a ‘.crtf’ suffix, see § D). The DS9 format does not offer the full flexibility and cannot capture Stokes and spectral axes. DS9 regions will only be usable as annotations in the viewer; they cannot be used for data processing in other CASA tasks. When saving regions, one can choose to save only the current region, all regions that were selected with SHIFT+click, or all regions that are visible on the screen.
NOTE: The load functionality for this tab will only become available once at least one region exists. To load a region when no regions exist, use the Region Manager window.
The Region Fit
The Viewer can attempt to fit a two-dimensional Gaussian to the emission distribution inside the currently selected region. To attempt the fit, go to the Fit tab of the Region Manager and click the ‘gaussfit’ button in the bottom left of the panel. You can choose whether or not to fit a sky level (e.g., to account for a finite background, either astronomical, sky, or instrumental). After fitting the distribution, the Fit panel shows the results of the fit, the center, major and minor axis, and position angle of the Gaussian fit in pixels (I) and in world coordinates (W, RA and Dec). The detailed results of the fit will also appear in the terminal window, including a flag showing whether the fit converged.
The Region Histogram
Histogram Tab: The histogram tab in the Region Manager. Right click to zoom. Hit SHIFT + Right Click to adjust the details of the histogram display.
The viewer will automatically derive a histogram of the pixel values inside the selected region. This can be viewed using the Histogram tab of the Region Manager Panel. This is a pared-down version of the full Histogram Tool. You can manipulate the details of the histogram plot by:
Using the Right Click to zoom - either to the full range, a selected percentile, or a range that you have graphically selected by dragging the mouse.
Hitting SHIFT + Right Click to open the histogram options. This lets you toggle between a logarithmic and linear y-axis, choose between a line, outline, or filled histogram, and adjust the number of bins.
The histogram will update as you change the plane of the cube or shift between regions.
Spectral Profiler¶
NOTE: Make Sure That You Use the Radio Version! This section describes the ‘Radio’ version of the profiler. To be sure that you have the radio version of the tool selected (this may not be the default), click on the preferences icon (the ‘gear’ fourth from the left in the Spectral Profile tool) and make sure that the ‘Optical’ option is not checked. If you are using the Spectral Profile tool in the viewer for the very first time, you will also be prompted for a selection that will subsequently be kept for all future calls unless the preference is changed.
The Spectral Profile Tool consists of the Spectral Profile Toolbar, a main display area, and two associated tabs: Spectral-Line Fitting and Line Overlays.
Interaction With the Main Display Panel: For the Spectral Profile tool to work, a region or point must be specified in the main Viewer Display window. Use the mouse tools to specify a point, rectangle, ellipse, or polygon region. Alternatively, load a region file. The Spectral Profile tool will show a spectrum extracted from the region most recently highlighted by the mouse in the main Viewer Display Panel. The method of extraction (i.e., mean, median, sum, or flux density) can be specified by the user using a drop-down menu below the spectrum in the Spectral Profile window; the method of extraction is mean by default.
The Spectral Profile tool can also feed back to the Main Display Panel. By holding CTRL and right clicking in the spectrum, you will cause the Main Display Panel to jump to display the frequency channel corresponding to the spectral (x) coordinate of the region highlighted in the Spectral Profile tool. Holding CTRL and dragging out a spectral range while holding the right mouse button will queue a movie scrolling through images across that spectral range. You can achieve the same effect with the two-ended-arrow icon towards the right of the toolbar in the Spectral Profile window.
In both the Spectral-Line Fitting and Line Overlays tabs, it may be useful to select a range in frequency or velocity. You can do this with the parallel lines-and-arrow icon (see below) or by holding shift, left clicking, and dragging out the range of interest. A shaded gray region should appear indicating your selection.
Spectral Profile Toolbar
Spectral Profile Toolbar: The toolbar for the Spectral Profile tool allows the user to save the spectrum, print or save the tool as an image, edit preferences (general, tool, legend), apply spectral smoothing, pan or zoom around the spectrum, select a range of interest, jump to a channel, or add a label.
The Spectral Profile Toolbar is the toolbar along the top of the Spectral Profile window. From left to right, the icons allow the user to:
(disk) export the current profile to a FITS or ASCII file
(printer) print the main window to a hard copy
(writing desk) save the panel as an image (PNG, JPG, PDF, etc.)
(gear) set plot preferences
(color wheel) set color preferences for the plot
(signpost) set legend preferences
(triangle) set the spectral smoothing method and kernel width
(arrows) pan the spectrum in the indicated direction. NOTE: The arrow keys also allow one to pan using the keyboard.
(magnifying glass) zoom to the default zoom, in, and out. NOTE: the +/- keys allow one to zoom with the keyboard.
(parallel lines+arrows) drag out a range of interest in the spectrum, for use with fitting or line overlays.
(double-ended arrow) jump to a channel in the main viewer (single click) or define a range over which to play a movie in the viewer (with a drag).
NOTE: You can also jump to a channel with CTRL+Right Click and queue a movie by holding CTRL and dragging out a range while holding the right mouse button.
(notepad and pencil) Add or edit a label on the plot. Click this icon to enter a mode where you can drag out a box to create a new annotation box or drag the corners of an existing one to resize it. You can edit the contents, color, and font of an existing annotation by right clicking on it and selecting “Edit Annotation” in the main Spectral Profile window.
Spectral Profile Tool Preferences shows the setting dialogs accessible from the toolbar. The Preferences dialog, opened by the ‘gear’ icon, allows the user to:
Toggle automatic scaling of the x- and y-ranges of the plot.
Toggle the coordinate grid overlay in the background of the plot.
Toggle whether registered images other than the current one appear as overlays on the plot.
Toggle whether these profiles are plotted relative to the main profile (in development).
Toggle the display of tooltips (in development).
Toggle the plotting of a top axis.
Toggle between a histogram and simple line style for the plot.
Toggle between the radio and optical versions of the Spectral Profile tool. NOTE: We discuss only the radio version here; this mainly impacts the Spectral-Line Fitting and Collapse/Moments functionality.
Toggle the overplotting of a line showing the channel currently being displayed in the main Display Panel.
The Color Curve Preferences dialog opened by the ‘color wheel’ icon allows the user to:
Select the color of the line marking the current channel shown in the main Display Panel.
Select the color used to overlay molecular lines from Splatalogue.
Select the color to plot the initial Gaussian estimate used in spectral line fitting.
Select the color used for the zoom rectangle.
Set a queue of colors used to plot the various data sets registered in the Display Panel.
Set a queue of colors to plot the set of Gaussian fits.
Set a queue of colors to plot the synthesized curve.
Two sets of preset colors, “Traditional” or “Alternative”, are available or the user can define their own custom color palette.
The legend options opened by the ‘signpost’ icon allow the user to toggle the plotting of a legend defining the curves shown in the main Spectral Profile window. Using a drop-down dialog, the legend can be placed in the top left corner of the plot, to the right of the plot, or below the plot. Toggling the color bar causes the color of the curve to be indicated either via a short bar or using the color of the text itself. Double click the names of the files or curves to edit the text shown for that curve by hand. A legend is only available if more than a single spectrum has been loaded.
The spectral smoothing option (triangle icon in the Spectral Profile toolbar) offers two methods, “Boxcar” and “Hanning”, with odd numbers of channels selectable for the smoothing kernel width.
Main Spectral Profile Window
The main window shows the spectrum extracted from the active region of the image in the main Display Panel. The spectra from the same region in any other registered images are also plotted if overlays are enabled. Menus along the bottom of the image allow the user to select how the spectrum is displayed. From left to right:
The units for the bottom spectral axis.
The units for the top spectral axis.
NOTE: Dual axes are enabled only if a single image is registered and the top axis option is enabled. In general, dual axes are not well-defined for mixed data sets. The exception is that open data cubes with matched frequency/spectral axes will allow dual axes.
The units for the left intensity or flux axis.
NOTE: The “Fraction of Peak” option allows for easy comparison of data with disparate intensity scales.
The velocity reference frame used if a velocity axis is chosen for the top or bottom axis.
The method used to extract spectrum from the region — a mean over all pixels in the region, a median, sum, or a sum converting units to get a flux density over the region (Jy).
Toggle the calculation and overplotting of error bars calculated from scatter in the data (rmse refers to root mean square error).
In addition to these drop-down menus, the main Spectral Profile window allows the user to do the following using keyboard and mouse inputs:
jump the main Display Panel window to a specified channel (CTRL+Right click): hold CTRL and right click in the spectrum. A marker will appear and the main Viewer Display Panel will jump to display that channel.
animate the main Display Panel in a movie across a frequency range (CTRL+Right click+drag): hold CTRL, Right click, and drag. The main Viewer Display panel will respond by showing a movie scrolling across the selected spectral channels.
zoom the Spectral Profile (+/-, mouse drag): Use the +/- keys to zoom in the same way as the toolbar buttons. Alternatively, press and drag the left mouse button. A yellow box is drawn onto the panel. After releasing the mouse button, the plot will zoom to the selected range.
pan the Spectral Profile (arrows): Use the arrow keys to pan the plot.
select a spectral range for analysis: hold shift, left click, and drag. A gray area will be swept out in the display. This method can be used to select a range for spectral line fitting or collapsing a data cube (in the Collapse/Moments window).
NOTE: If the mouse input to the Spectral Profile browser becomes confused hit the ESC key several times and it will reset.
Spectral-Line Fitting
Spectral Line Fitting Tab: Using the Spectral Line Fitting Tab (found at the bottom left of the Spectral Profile Tool), the user can fit a combination of a polynomial and multiple Gaussian components. The range to be fit can be specified (gray region) manually or with a shift+click+drag. Initial estimates for each component may be entered by hand or specified via an initial estimates GUI. The results are output to a dialog and text file with the fit overplotted (here in blue) on the spectrum (with the possibility to save it to disk).
Specifying Initial Gaussian Estimates Graphically and Fitting Output: The top panel shows the graphical specification of initial estimates for Gaussian fitting. Slider bars specify the center, FWHM, and peak intensity for the initial estimate. The bottom panel shows the verbose output of the fitting.
The Spectral-Line Fitting tab allows the user to interactively fit a combination of Gaussian and polynomial profiles to the data shown in the Spectral Line Profile tool. The tool includes a number of options:
A drop-down menu labeled “Curve” at the top of the panel allows the user to pick which data set to fit.
The spectral range to fit can be specified by either holding shift+left click+dragging out a region in the main spectral profile window or by typing it manually into the boxes labeled Min and Max near the top left of the fitting panel.
Optionally, multiple fits can be carried out at once, fitting each spectrum in the region in turn. To enable this, check the ‘MultiFit’ box.
Optionally, a polynomial of the specified order may be fit. To do so, check the ‘Polynomial’ fit check box and then specify the desired order.
The results may be saved to a text file. This text file should be specified before the fit is carried out. Click ‘Save’ and then use the dialog to specify the file name. Note that the fit curve itself becomes a normal spectral profile data set and can be saved to disk using the toolbar (‘disk’ icon) after the fit.
One or more Gaussians can be fit, although results are presently most stable for one Gaussian. Specify the number of Gaussians in the box marked “Gaussian Count” and then enter initial estimates for the peak, center, and FWHM in the table below. Any of these values can be fixed for any of the Gaussians being fit. Initial estimates can also be manually specified by clicking “Estimates”. This brings up an additional GUI window, where sliders can be used to specify initial estimates for each Gaussian to be fit.
For plotting purposes, one may wish to oversample the fit (i.e., plot a smooth Gaussian); you can do so by increasing the Fit Samples/Channel to a high number to finely sample the fit when plotting.
NOTE: Currently the tool works well for specifying a single Gaussian. Fitting multiple Gaussian components can become unstable.
Line Overlays
Line Overlays Tab: The Line Overlay tab (found at the bottom left of the Spectral Profile Tool) allows users to query the CASA copy of the Splatalogue spectral line database. Enter the redshift of your source (far right of the panel), select an Astronomical Filter from the drop down menu, and use shift+click+drag to select a frequency range (or enter the range manually in the boxes marked Min and Max). The “Search” button will bring up a separate “Search Results” window, which can in turn be used to graph the candidate lines in the main Spectral Profile window (here CO v=0).
Each version of CASA includes a local version of the Splatalogue spectral line database and this can be used to identify and overplot spectral transitions. This feature, shown in Line Overlay Tab, allows the user to search Splatalogue over the range of interest.
To overlay spectral lines:
Select the Line Overlays tab in the Spectral Profile tool.
If you know it, enter the redshift or velocity of your source in the “Doppler Shift” panel. Otherwise, the lines will be overlaid assuming a redshift of 0.
Specify a minimum and maximum frequency range to search, either by typing a range or by holding shift and left click and dragging out a range in the spectrum (you will see a gray box appear). If you don’t specify a range, the tool will search over the frequency range of the spectrum.
Optionally, you may select an astronomical filter from the list (e.g., commonly used extragalactic lines or lines often found in hot cores, see Splatalogue for more information). This is usually a good idea because it pares the potentially very large list of candidate lines to a smaller set of reasonable candidates.
Click ‘Search’ and the Spectral Profile will search Splatalogue for a list of Spectral lines that fit the selected Astronomical Filter in the selected frequency range for the selected redshift. A “Molecular Line Search Results” dialog box will pop up showing the list of candidate lines.
Highlight one or more of these transitions and click ‘Graph Selected Lines’. A set of vertical markers will appear in the main Spectral Profile window at the appropriate (redshifted) frequencies for the line.
NOTE: You will want to click ‘Clear Lines’ between searches, especially if you update the redshift.
The Collapse/Moments Tool
The CASA Viewer (imview) can collapse a data cube into an image, for instance allowing one to look at the emission integrated along the z-axis or the mean velocity of emission along the line of sight. You can access this functionality via the Collapse/Moments tool (accessed via the Tools drop-down menu in the Main Display Panel or via the four inward-pointing arrows icon in the Main Toolbar), which is shown in Collapse/Moments Tool.
The tool uses the same format as the Spectral Profile tool and will show the integrated spectrum of whatever region or point is currently selected in the Main Display Panel. To create a moment map:
Select a range over which to integrate, either manually using the left part of the window (by adding an interval and typing the values into the boxes marked Min and Max) or by holding SHIFT + Left Click and dragging out the range of interest.
Pick the set of algorithms (listed in the box labeled “Moment(s)”) that you will use to collapse the image along the z-axis. Clicking an option toggles that moment method, and the collapse will create a new image for each selected moment. For details on the individual collapse methods, see the immoments task.
The moment may optionally include or exclude pixels within a certain range (for example, you might include only values with signal-to-noise of three or greater when calculating the velocity dispersion). You can enter the values to include or exclude manually in the Thresholding window on the right or you can open a histogram tool to specify this range graphically by clicking Specify Graphically (before this can work, you must click ‘Include’ or ‘Exclude’).
The results of the collapse can be saved to a file, whose name consists of a string specifying the specific moment appended to a root file name that you can specify using Select Root Output File.
When you are satisfied with your chosen options, press ‘Collapse’.
NOTE: Even if you don’t save the results of the collapse to a file, you can still save the map later using the Save as… entry in the Data pull down menu from the Main Viewer Display Panel.
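The same collapse can also be scripted with the immoments task; below is a minimal sketch in which the file names, channel range, and threshold values are placeholders:

# Non-interactive counterpart of the Collapse/Moments tool; all values are placeholders
immoments(imagename='ngc5921.clean.image',
          moments=[0, 1],               # integrated intensity and intensity-weighted velocity
          chans='5~40',                 # spectral range to collapse
          includepix=[0.003, 100.0],    # include only pixels above ~3 sigma
          outfile='ngc5921.clean.mom')  # moment-specific suffixes are appended to this root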
Interactive Position-Velocity Diagram Creation
The route to create position-velocity cuts in the viewer is illustrated in Position/Velocity Tool:
Select the ‘P/V cut’ tool from the Mouse Toolbar and use it to draw a line across a data cube along the axis you want to visualize.
Open the Region Manager Panel and go to the pV tab. Highlight the cut you just drew. You should see the end point coordinates listed, along with information on the length and position angle of the cut. You can set the averaging width (in pixels) in a window at the bottom of the tab.
When you are satisfied, hit ‘Generate P/V’. This will create a new Main Viewer Display Panel showing the position-velocity cut. The axes should be Offset and Velocity.
The new image can be saved to disk with the Data: Save as… option.
Image Analysis in the Viewer (imview)¶
Analysis Tools that are available in the Viewer (task imview).
The Brightness Profile Tool
The viewer allows the user to draw multiple line segments using the “Polyline drawing” button, and this feature can be used to display one-dimensional brightness profiles of images, as shown in the Spatial Profile Tool. After double-clicking the last line segment, the ‘Regions’ dock will display a preview of the slice in the Spatial Profile tab, and the full “Spatial Profile Tool” can be launched from there by clicking the “Spatial Profile Tool” button. This panel allows one to select the interpolation method used to connect the pixels and the number of sampled pixels between markers; ‘Automatic’ will show all pixels. The x-axis of the display can be either the distance along the slice or the X and Y coordinate projections (e.g., along RA and DEC). All segments are also listed at the bottom with their start and end coordinates, and the distance and position angle of each slice segment. The color tool can be used to give each segment a separate color.
The Histogram Tool
CASA can calculate and visualize a histogram of pixel values inside a region of interest. To examine this histogram, select Histogram from the Tools drop-down menu or the ‘Histogram’ icon in the Main Toolbar (looks like a comb). This opens the full Histogram Tool; more limited versions are accessible from the Region Manager Panel, the graphical color table manipulation tool, and the Collapse/Moments tool.
The resulting Histogram Tool should look something like Histogram Tool. The menus along the top (or the corresponding mouse clicks) allow one to:
Zoom to the full range, a selected percentile, or a graphical range.
Change the display of the histogram to show a log axis, display as either a line plot, an outline, or a filled histogram. Change the number of bins in the histogram, or clear the plot (to start over).
Configure what data are fed into the histogram. You can use this menu to tell the histogram to track the channel currently selected in the main Viewer Display Panel (click the “Track Channel” box) or to integrate across some range of channels (defaulting to the whole image). You can also switch the 2-D footprint used between the whole Image, the Selected Region, and All Regions.
Save (via the disk icon) an image of the histogram to a graphical file on disk.
The Histogram Tool also allows you to fit the distribution using either a Gaussian or a Poisson distribution, for example to estimate the noise in the image (a Gaussian will be a good choice to describe the noise in most radio data cubes). You can specify initial estimates or let the program generate initial guesses. The fit is then overplotted on the histogram (colors can be adjusted by clicking the color wheel icon in the toolbar) and details of the fit are printed to the text window below the fit button.
The Two-Dimensional Fitting Tool
CASA can fit two-dimensional Gaussians to an intensity distribution, and the Two-Dimensional Fitting Tool in the Viewer exposes this functionality interactively. This tool, accessed by the ‘blue circles’ icon in the Main Toolbar or the Tools:Fit menu item, has an interface like that shown in Two-Dimensional Fitting Tool. The interface exposes several options:
NOTE: The two dimensional fitter only operates on a single channel at a time.
You can select whether to fit only the selected region or the whole image plane and specify which channel of the cube you want to operate on.
Initial Estimates: The box in the top left corner allows the user to specify initial estimates by feeding in a file. The easiest way to make an appropriate file is to edit an existing one. Even easier, you can use the Find Sources button to automatically generate a temporary file of initial estimates.
Pixel Range: You can choose to only include a certain range of pixel intensity values in the fit. For example, you might choose to only fit Gaussians to pixels a few times above the measured noise level. You can use the Specify Graphically button to bring up an interactive histogram of the region (a reduced functionality version of the full Histogram Tool).
Output: You can choose to save the output of the fit as a file in the specified directory, and to subtract the fit from the original image, creating a Residual Image that gets stored as a CASA image and automatically loaded into the viewer. This gives a way to tell how well your fit describes the total emission.
Visualization: You can toggle whether the fit is displayed on the viewer or not and change the color of the marker.
Click Fit to start the fit. If the fit does not converge, try improving your initial estimates and fitting again.
Printing from the Viewer¶
You can select Data: Print from the drop-down menu or click the printer icon in the Main Toolbar to bring up the Viewer Print Manager. From this panel, you can “Print” the contents of the Display Panel to a hardcopy or “Save” them as an image in a format selected from the drop-down menu at the bottom left of the window.
NOTE: The save feature will overwrite the file in question without prompting.
The Viewer Print Manager allows you to adjust the DPI, orientation, and page format (Output Media) for Postscript or PDF files and to scale the image to a desired pixel size for other images.
To achieve the best output it is usually advisable to adjust the settings in the Viewer Print Manager (printer icon in the Main Toolbar), Data Display Options (Data: Adjust Data Display), and Viewer Canvas Manager (wrench+”P” icon in the Main Toolbar). For PDF and Postscript output, turning the DPI up all the way yields the best-looking results. For other images, a white background often makes for better looking images than the default black. It is often necessary to increase the Line Width in the Axis Label Properties (in the Data Display Options panel) to ensure that the labels will be visible when printed. Increasing from the default of 1.4 to a value around 2 often works well.
The following figure shows an example of printing to a file while adjusting the Data Display Options and Viewer Canvas Manager to improve the appearance of the plot.
Scripting using imview¶
The imview task offers scriptable access to many viewer options. This enables the production of customized plots without invoking the GUI and allows one to open the viewer to a carefully selected state.
imview has the following inputs:
#imview :: View an image
raster = {} #(Optional) Raster filename (string)
#or complete raster config
#dictionary. The allowed dictionary
#keys are file (string), scaling
#(numeric), range (2 element numeric
#vector), colormap (string), and
#colorwedge (bool).
contour = {} #(Optional) Contour filename (string)
#or complete contour config
#dictionary. The allowed dictionary
#keys are file (string), levels
#(numeric vector), unit (float), and
#base (float).
zoom = 1 #(Optional) zoom can specify
#integral zoom (integer), zoom
#region read from a file (string) or
#dictionary specifying the zoom
#region. The dictionary can have two
#forms. It can be either a simple
#region specified with blc (2 element
#vector) and trc (2 element vector)
#along with an optional coord key
#("pixel" or "world"; pixel is the
#default) or a complete region
#rectangle e.g. loaded with
#"rg.fromfiletorecord( )". The
#dictionary can also contain a
#channel (integer) field which
#indicates which channel should be
#displayed.
axes = -1 #(Optional) this can either be a
#three element vector (string) where
#each element describes what should
#be found on each of the x, y, and z
#axes or a dictionary containing
#fields "x", "y" and "z" (string).
out = {} #(Optional) Output filename or
#complete output config dictionary.
#If a string is passed, the file
#extension is used to determine the
#output type (jpg, pdf, eps, ps, png,
#xbm, xpm, or ppm). If a dictionary
#is passed, it can contain the
#fields, file (string), scale
#(float), dpi (int), or orient
#(landscape or portrait). The scale
#field is used for the bitmap formats
#(i.e. not ps or pdf) and the dpi
#parameter is used for scalable
#formats (pdf or ps).
The raster and contour parameters specify which images to load and how these images should be displayed. These parameters take python dictionaries as inputs. The fields in these dictionaries specify how the image will be displayed.
An example call to imview looks like this:
imview(raster={'file': 'ngc5921.clean.image','range': [-0.01,0.03],'colormap': 'Hot Metal 2','scaling': -1},
contour={'file': 'ngc5921.clean.image'},
axes={'x':'Declination'},
zoom={'channel': 7, 'blc': [75,75], 'trc': [175,175],'coord': 'pixel'},
out='myout.png')
The argument to raster is enclosed in the curly braces { }. Within these braces are a number of “key”: ”value” pairs. Each sets an option in the viewer, with the GUI parameter to set defined by the “key” and the value to set it to defined by the “value”. In the example above, file='ngc5921.clean.image' sets the file name of the raster image, and range=[-0.01,0.03] sets the range of pixel values used for the scaling.
contour works similarly to raster but can accept multiple dictionaries in order to produce multiple contour overlays on a single image. To specify multiple contour overlays, pass a list of dictionaries to the contour argument:
contour=[{'file': 'file1.image', 'levels': [1,2,3]}, {'file': 'file2.image', 'levels': [0.006, 0.008, 0.010]}]
zoom specifies the part of the image to be shown. The example above specifies a channel as well as the top right corner “trc” and the bottom left corner “blc” of the region of interest.
axes defines what axes are shown. By default, the viewer will show ‘x’:’Right Ascension’, ‘y’:’Declination’ but one may also view position-frequency images.
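For example, a position-frequency display might be requested as follows; this is a sketch that assumes the axis names of the cube match the viewer's labels:

imview(raster={'file': 'ngc5921.clean.image'},
       axes={'x': 'Right Ascension', 'y': 'Frequency', 'z': 'Declination'})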
out defines the filename of the output, with the extension setting the file type.
Currently, the following parameters are supported:
raster -- (string) image file to open
(dict) file (string) => image file to open
scaling (float) => scaling power cycles
range (float*2) => data range
colormap (string) => name of colormap
colorwedge (bool) => show color wedge?
contour -- (string) file to load as a contour
(dict) file (string) => file to load
levels (float*N) => relative levels
base (numeric) => zero in relative levels
unit (numeric) => one in the relative levels
zoom -- (int) integral zoom level
(string) region file to load as the zoom region
(dict) blc (numeric*2) => bottom left corner
trc (numeric*2) => top right corner
coord (string) => pixel or world
channel (int) => channel to display
(dict) <region record> => record loaded
e.g., rg.fromfiletorecord( )
axes -- (string*3) dimension to display on the x, y, and z axes
(dict) x => dimension for x-axes
y => dimension for y-axes
z => dimension for z-axes
out -- (string) file with a supported extension
[jpg, pdf, eps, ps, png, xbm, xpm, ppm]
(dict) file (string) => filename
format (string) => valid ext (filename ext overrides)
scale (numeric) => scale for non-eps, non-ps output
dpi (numeric) => dpi for eps or ps output
orient (string) => portrait or landscape
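For instance, a sketch passing out as a dictionary rather than a plain filename; the values are placeholders, and the fields follow the listing above:

imview(raster={'file': 'ngc5921.clean.image'},
       out={'file': 'myout.pdf', 'dpi': 300, 'orient': 'landscape'})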
Examples are also found in help imview.
Scripting using the viewer tool
The viewer tool may also be used to generate simple figures that can be directly saved to an output image file format (png, jpg, etc). Below is an example.
def dispimage(imname=''):
    # open a viewer panel and load the image
    qq = viewertool()
    qq.load(imname)
    # set the data range, colormap, and color wedge
    qq.datarange(range=[-0.01, 1.1])
    qq.colormap(map='Rainbow 3')
    qq.colorwedge(show=True)
    # zoom to a pixel region, save the display to a PNG file, and close the panel
    qq.zoom(blc=[100, 150], trc=[600, 640])
    qq.output(device='fig_trial.png', format='png')
    qq.close()
Note that only basic controls are available via the viewertool interface. For additional customization via a script, please see the following section describing “Using Viewer state files within a script”.
Using Viewer state files within a script
In order to access the full flexibility of the GUI interface in customizing the viewer settings and display options, a hand-crafted viewer state can be saved, edited, and subsequently restored/rendered via a script that then allows the saving of the figure to a file on disk.
For example:
Step 1 : Customize the viewer by hand. For example, choose to open an image, customize the display data ranges, choose a colormap, change axis label properties, change the units of the movie axis label, edit the panel background color, adjust margins, and resize the panel window.
Step 2 : Click on the “save viewer state” button on the top control panel of the viewer. This will save a .rstr file, which is an xml file containing a complete description of the current state of the viewer.
Step 3 : Edit the text xml file as required. The simplest operation is to search and replace the name of the CASA image being opened. More complex editing can be done via stand-alone editing scripts, perhaps using standard Python xml parser/editing packages.
Step 4 : Restore the state of the viewer from the edited xml .rstr file, using the viewertool as follows to subsequently save a .png figure to disk.
CASA <1>: vx = viewertool()
CASA <2>: x = vx.panel('mystate.rstr')
CASA <3>: vx.output('myfig.png',panel=x)
(There are two interactive ways to restore the viewer state as well. The first is by starting up the viewer with no image chosen, and then clicking on the “restore viewer state” button and choosing this .rstr file to open. Alternately, the casaviewer can itself be opened by supplying this .rstr file as the ‘image’ to open.)
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/carta.ipynb
CARTA¶
CARTA is the Cube Analysis and Rendering Tool for Astronomy, a new image visualization and analysis tool designed for ALMA, the VLA, and the SKA pathfinders. As image sizes continue to increase with modern telescopes, viewing an image with a local image viewer or with a remote image viewer via the ssh protocol becomes less efficient. The mission of CARTA is to provide usability and scalability for the future by utilizing modern web technologies and computing parallelization.
Download and Installation (see https://cartavis.github.io/)
CARTA is a separate application and not directly integrated with CASA. Refer to the official CARTA website for download and installation instructions as well as a proper set of documentation.
Some advantages over the CASA Viewer (tasks imview and msview):
Much better performance, able to handle very large image cubes
Modern web browser based interface allowing local and remote display options.
Can display Stokes wedges.
Proper display of image headers
Flexibility to modify and save the layout
Supports new HDF5 image format (in addition to CASA Image, MIRIAD, and FITS)
Rotation support for regions
RMS display for spectra
Better image rendering widget
Better animation control
Gzip image display
Subsequent releases of CARTA will continue to enhance CARTA’s performance. For a full overview of the current and upcoming features, see the official CARTA website.
CARTA is developed by the Academia Sinica Institute of Astronomy and Astrophysics (ASIAA), the Inter-University Institute for Data Intensive Astronomy (IDIA), the National Radio Astronomy Observatory (NRAO), and the Department of Physics, University of Alberta.
Remaining Capability Gaps¶
Note: CARTA does not yet offer an exact replacement for every feature in the old CASA Viewer (tasks imview and msview). Many things are complete, and CARTA offers a good deal of enhanced functionality over the imview task, but a small subset of things may still be missing.
Some remaining features of the CASA Viewer that are not yet in CARTA and are either under development for CARTA v.4 or planned for a future CARTA version:
Complete set of fitting tools (under development)
Image annotation (under development)
Multi-channel plots (under development)
Save and reload states (under development)
Sharing of states (under development)
Support for AIPS files with beam in HISTORY (under development)
Source finder tool (planned)
Profile annotation (planned)
Rotated cube view (planned)
Scalable output (SVG or PDF) (planned)
Regions that extend across spectral and stokes planes (planned)
Histogram fitting (planned)
Full support for CRTF (not available in CASA Viewer) (planned)
Interactive Clean
The CASA Viewer serves both as a stand-alone image analysis platform and as an interactive front-end to control CASA execution (via interactive clean).
CARTA is focused on providing image analysis to a wide variety of groups and organizations, some of which are not affiliated with CASA or only have access to the output images. Integrating CASA and CARTA together through interactive clean control would not be efficient.
Instead, the CASA team is developing a separate interactive control interface for users of CASA. This interface will include interactive clean unified with other widgets in the CASA suite related to runtime execution, status, and control. This interface will not duplicate CARTA analysis functionality and is intended to be used in conjunction with CARTA for image viewing. More details will be provided as development progresses.
MS View
The CASA Viewer is also able to open MeasurementSet format files for raster display and flagging of visibility data. CARTA will focus solely on image format files (including FITS and HDF5). The CASA team is working to migrate MS view and PlotMS to a new unified MeasurementSet plotting and analysis utility. More details will be provided as development progresses.
Using CARTA with CASA¶
The CARTA Website provides download links for CARTA on a variety of platforms and usage types. CARTA is intended for more than just CASA usage and supports installation within large data warehouses and archives.
Refer to the CARTA installation and configuration instructions for more information.
Personal laptops or workstations
Many users expect to run CARTA alongside a CASA instance to view the products produced by CASA. The Stand-alone application version of CARTA is intended for this purpose.
If you are running CASA on your own laptop or workstation that you are directly using (so the machine where the data reside is the same one that renders the visualization), then simply downloading and executing the Stand-alone application version of CARTA for your OS is sufficient.
Clusters
If you are running CASA or viewing data products on a cluster, you should have a site deployment of CARTA configured by your system administrator. This is a single CARTA instance that is always running at a fixed URL and allows users to connect with their institution login credentials.
Note that the stand-alone application version of CARTA may work on a cluster through ssh/VNC with the --no_browser option discussed in the next section, but the site deployment version is the intended distribution for this type of usage.
Running CARTA at NRAO¶
Instructions for users within the NRAO network or with an NRAO account.
Warning: The default Firefox installation on some NRAO machines may not support the WebGL version of CARTA. Try Chrome, or contact the helpdesk for an updated version of Firefox.
Using the Site Deployment version
NRAO has set up a provisional site deployment of CARTA for users running CASA on the CV cluster. This is a temporary instance on borrowed hardware, intended to allow initial access while permanent hardware is procured.
https://carta-test.cv.nrao.edu
Before starting: in your Linux home directory, create symbolic links to the directories on /lustre that contain the data you wish to visualize.
If you are outside the NRAO network, you must start a VPN connection to NRAO.
Log in with your NRAO userid/password.
Note: This provisional server is in Charlottesville and can only access CV lustre filesystems. There is no NM-based CARTA server at this time, so please use the below instructions on VPN or ssh tunneling to remotely connect to the AOC in New Mexico.
Remotely Connecting to an NRAO Workstation or Cluster
If the site deployment version is not an option for you, you can instead invoke the stand-alone installation of CARTA using the --no_browser switch:
via VPN
Connect your home computer to the NRAO VPN
(Optional) Reserve a cluster node in the usual manner
$ ssh <username>@your_cluster_node
or
$ ssh <username>@your_workstation
from your home computer, or from a terminal in a remote VNC or fastx desktop display. From the ssh or VNC/fastx terminal, (optionally) cd to the directory where the data you want to visualize live and (not optional) type one of the following:
$ carta --no_browser
$ APPIMAGE_EXTRACT_AND_RUN=1 carta --no_browser
This starts the CARTA backend on the remote machine and prints a link to the terminal.
Copy the link printed to the screen to a web browser opened on your home computer (not a web browser within your VNC or fastx display).
via ssh tunneling
To invoke the stand-alone installation of carta through ssh, use the following instructions:
(Optional) Reserve a cluster node in the usual manner
$ ssh username@ssh.aoc.nrao.edu     (for NM)
$ ssh username@polaris.cv.nrao.edu   (for CV)
$ ssh your_cluster_node              (e.g., ssh nmpost043 or cvpost002)
Then cd to the directory where the data you want to visualize live and type the following:
$ carta --no_browser
You will get a line like the following:
[info] CARTA is accessible at http://10.64.10.143:3002/?token=052680c2-b602-4d8d-9ac4-e4cb1120f56b
Pick the port number from the URL above (here: 3002), then, in a new terminal:
$ ssh -N username@ssh.aoc.nrao.edu -L 3002:nmpost043.aoc.nrao.edu:3002     (for NM, example node nmpost043)
$ ssh -N username@polaris.cv.nrao.edu -L 3002:cvpost002.cv.nrao.edu:3002   (for CV, example node cvpost002)
Then, in a local browser, open the same URL with the hostname (but not the port) replaced by 'localhost':
http://localhost:3002/?token=052680c2-b602-4d8d-9ac4-e4cb1120f56b
NOTE: Using the CARTA stand-alone application version through VNC or fastx without the --no_browser option may not work, or will lead to significantly diminished performance (since it bypasses CARTA's graphics acceleration features). While this is how one would typically use the CASA Viewer or DS9, it is not how CARTA is designed to be used.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/pipeline.ipynb
Pipeline¶
Links to the ALMA and VLA pipeline Documentation Pages
The ALMA and VLA CASA pipelines are documented externally. Please follow the links below:
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/simulation.ipynb
Simulations¶
The capability of simulating observations and data sets from VLA, ALMA, and other existing and future observatories is an important use-case for CASA. This not only allows the user to get an idea of the capabilities of these instruments for doing science, but also provides benchmarks for the performance and utility of the software to process “realistic” data sets (with atmospheric and instrumental effects). Simulations can also be used to tune parameters of the data reduction and therefore help to optimize the process. CASA can calculate visibilities (create a MeasurementSet) for any interferometric array, and calculate and apply calibration tables representing some of the most important corrupting effects.
Tasks available for simulating observations are:
simobserve - simulate and create custom synthetic MeasurementSets for an interferometric or total power observation
simanalyze - image and analyze simulated data set, including diagnostic images and plots
simalma - simulate an ALMA observation including multiple configurations of the 12-m interferometric array, the 7-m ACA, and total power measurements by streamlining the capabilities of both simobserve and simanalyze
Inside the Toolkit: The simulator methods are in the simulator tool sm. Many of the other CASA tools are helpful when constructing and analyzing simulations. Following general CASA practice, the greatest flexibility and functionality is available in the Toolkit, and the most commonly used procedures are bundled for convenience into the tasks.
Utility functions: The simutil python class contains numerous utility methods which can be used to facilitate simulations, especially when using the Toolkit.
Simulating interferometric observations using the simobserve and simanalyze tasks proceeds in the following steps:
Make a model image or component list. The model is a representation of the sky brightness distribution that you would like to simulate observing (details on model specification in the simobserve documentation).
Use the simobserve task to create a MeasurementSet (uv data) that would be measured by a telescope observing the specified input model of the sky brightness. simobserve can also introduce corruption modeling thermal noise or atmospheric effects. The task can be called multiple times to generate, e.g., observations of the same model using different array configurations. simobserve can also simulate total power observations, which can be combined with interferometric data in simanalyze.
Image (grid, invert, and deconvolve) the simulated observation(s) with the simanalyze task. simanalyze can also compare the simulated image with your input (convolved with the output clean beam) and then calculate a “fidelity image” that indicates how well the simulated output matches the convolved input image. Alternately, you can create an image yourself with the tclean task, and then use simanalyze to compare that to the sky model input.
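A minimal end-to-end sketch of this two-task workflow is shown below; the project name, model file, configuration file, and parameter values are illustrative placeholders rather than values from this document:

#Simulate an observation of a model image and then image the result
simobserve(project='sim_example',             #directory that will hold all products
           skymodel='input_model.fits',       #input model of the sky brightness
           antennalist='alma.cycle4.1.cfg',   #array configuration to simulate
           totaltime='1800s',                 #total observing time
           thermalnoise='tsys-atm')           #corrupt with atmospheric thermal noise
simanalyze(project='sim_example',             #same project as the simobserve call
           image=True,                        #image the simulated MeasurementSet
           analyze=True)                      #compare with the convolved input model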
ALMA users, especially those less familiar with CASA, are encouraged to use the simalma task. It provides additional information on the multiple simobserve and simanalyze calls required to simulate an ALMA observation (which may consist of 12-m interferometric, 7-m interferometric, and 12-m total power data), and attempts to provide useful feedback on those different observation components, to help the user better understand the observing considerations.
Note that simobserve and simalma currently only handle linear polarization. For implementing full Stokes behavior, please see these examples.
More simulation examples can be found in the CASA guides http://casaguides.nrao.edu, under “Simulating Observations in CASA”, and in the Notebook examples of Simulation in CASA in CASA Docs. It is possible to run the steps independently and optionally, as long as you follow the simobserve and simanalyze conventions about filenames.
Tip: A list of antenna configuration files known to CASA is linked from the simulation CASA guides. On Unix, Linux, and Mac computers, you can usually also find this list yourself by typing, for instance, “locate alma.cycle4.1.cfg” and looking at other files in that directory.
ALMA simulations¶
The task simalma simulates an ALMA observation by ALMA 12-m, ACA-7m and total power arrays. It takes an input model image or a list of components, plus configurations of ALMA antennas (locations and sizes), and simulates a particular ALMA observation (specified by mosaic setup and observing cycles and times). The outputs are MeasurementSets. The task optionally generates images from the MeasurementSets.
Technically speaking, simalma internally calls simobserve and simanalyze as many times as necessary to simulate and analyze an ALMA observation. Some of the simulation and imaging parameters are automatically set to values typical of ALMA observations. Thus, it has a simpler task interface compared to simobserve plus simanalyze at the cost of limited flexibility. If the user wants to have more control of simulation setup, it is available by manually running simobserve and simanalyze multiple times or by using the simulator (sm) tool.
WARNING: The task simalma is designed to only be invoked once for a simulation setup. It always sets up a skymodel and pointings. That means that simalma is not supposed to be run multiple times for a project, unlike simobserve and simanalyze. The task simalma may ignore or overwrite the old results when it is run more than once with the same project name.
There are options in simalma to simulate observations with the ACA 7-m and total power arrays, to apply thermal noise, and/or to generate images from the simulated MeasurementSets. One inputs a vector of configurations and a corresponding vector of total times to observe each component. Thermal noise is added to the visibilities when pwv > 0. The ATM atmospheric model is constructed from the characteristics of the ALMA site and a user-defined Precipitable Water Vapour (pwv) value. Set pwv = 0 to omit the thermal noise. Finally, when image = True, synthesized images are generated from the simulated MeasurementSets.
Antenna Configuration
The configurations of the ALMA 12-m and 7-m arrays are defined by the antennalist parameter, which can be a vector. Each element of the vector can be either the name of an antenna configuration file or a desired resolution, e.g., ‘alma;cycle1;5arcsec’. Some examples:
antennalist = ['alma.cycle2.5.cfg','aca.cycle2.i.cfg']; totaltime = ['20min','2h']: Will observe the 12-m array in configuration C32-5 for 20 minutes and the ACA 7-m array for 2 hours.
antennalist = ['alma;cycle2;0.5arcsec','aca.i.cfg']; totaltime = ['20min','2h']: Will observe the 12-m array in whatever Cycle 2 configuration yields a zenith synthesized beam as close as possible to 0.5 arcsec (at the center frequency of your skymodel) for 20 minutes and the ACA 7-m array for 2 hours.
antennalist = [‘alma.cycle1.2.cfg’,’aca.cycle2.i.cfg’]; totaltime = ‘20min’: Will observe the 12-m array in Cycle 1 configuration 2 for 20 minutes and the ACA 7-m array for the default of 2×(12-m time) = 1h20min. This parameter setting will also generate a warning that the user is combining configurations from different ALMA Cycles (but the simulation will run despite that).
Total power can be included either along with the interferometric configurations, e.g., antennalist = ['alma.cycle1.2.cfg','aca.cycle2.i.cfg','aca.tp.cfg'], or by using the tpnant and tptime parameters. The latter is preferred since it allows greater control (in particular over the number of total power antennas to use; if more than one is used, multiple total power observations will be generated and combined in imaging).
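As an illustration, a single simalma call combining a 12-m configuration, the ACA 7-m array, and total power observations (via tpnant/tptime) might look like the following sketch; file names and values are placeholders:

#Simulate 12-m + 7-m + total power and image the result
simalma(project='simalma_example',
        skymodel='input_model.fits',
        antennalist=['alma.cycle2.5.cfg', 'aca.cycle2.i.cfg'],
        totaltime=['20min', '2h'],
        tpnant=2,            #use two total power antennas
        tptime='4h',         #total observing time for total power
        pwv=0.6,             #pwv > 0 adds thermal noise
        image=True)          #generate images from the simulated MeasurementSets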
Field Setup
There are two ways to set up pointings: Rectangle Setup and Multi-Pointing. In the Rectangle Setup, pointings are automatically calculated from the pointing center (direction) and the map size. A rectangular map region is covered by a hexagonal grid (maptype = 'alma') with Nyquist sampling, i.e., 0.48 primary beam (PB) spacing (where PB ≡ 1.2 λ / D), in both ALMA 12-m and ACA 7-m array simulations. A slightly larger area is mapped in ACA total power simulations for later combination with interferometer visibilities: the map area is extended by 1 PB in each direction and covered by a lattice grid with 0.225 PB spacing.
In Multi-Pointing, a list of pointings is defined in the direction parameter or read from a file (when setpointings = False; note that simobserve can read ALMA OT pointing files in the old and new format but the latter only when they are saved as sexagesimal absolute positions). The ALMA 12-m and ACA 7-m arrays observe the specified directions. The ACA total power simulations map either (1) square regions of 2 PB extent centered at each of the pointings, or (2) a rectangle region that covers all the pointings. Either (1) or (2), whichever can be done with the smaller number of points, is selected. The pointing spacing in total power simulations is, again, 0.225 PB in lattice grids.
For total power simulations, it is advisable to choose a sufficiently large field, padding the map by at least 1-2 primary beams on each side.
Integration time
The total observation time of each component or configuration is defined by the totaltime parameter as noted above. A scalar will trigger use of the Cycle 2 default time multipliers, 1:0.5:2:4 for the first 12-m configuration, any additional 12-m configurations, any 7-m configuration, and any total power observation.
In general, the integration time (dump interval) of simulations is defined by the integration parameter, with one exception. Since the ACA total power array always observes larger areas than the ALMA 12-m and ACA 7-m arrays, it is possible that the ACA total power array cannot cover all pointings in the given observation time. In such a case, the integration time in the total power simulation is scaled so that all pointings are observed at least once in its observation time, i.e., integration_TP = tptime / (the number of total power pointings).
Imaging and combination of ALMA with ACA
The CLEAN algorithm is used in simalma to generate images from visibilities. The visibilities are weighted in the UV-plane using Briggs weighting. When ACA observations are simulated, the ACA 7-m visibilities are weighted by their relative sensitivity to the ALMA 12-m visibilities, and both data sets are concatenated before imaging. The relative weight of ACA 7-m visibilities is defined in proportion to the ratio of beam areas squared, i.e., \((7/12)^{4} = 0.11\). This is because simalma uses a bandwidth and an integration time common to both the ALMA 12-m and ACA 7-m simulations.
When total power observations are included, the interferometer and total power images are combined using the feather task. Before combination, the total power image is scaled by the interferometer primary beam coverage; in the final step, the output image of feather is divided by the interferometer primary beam coverage. The final image product is thus the combined image corrected for the interferometer primary beam coverage.
simutil¶
simutil contains numerous utility methods which can assist users with generic ephemeris and geodesy calculations, to aid in performing simulations and other activities in CASA, as well as some methods used internally by simobserve and simanalyze. Several of these methods directly call the simulator tool, in an attempt to lessen the amount of scripting required by the user. It is used by import and instantiation, similarly to testhelper and recipes:
from casatasks.private import simutil
mysu = simutil.simutil()
help(mysu.readantenna)
Antenna configuration files are important for several tasks in simutil and other simulator tools. Below is an example of a properly formatted configuration file.
#observatory=ALMA
#COFA=-67.75,-23.02
#coordsys=LOC (local tangent plane)
# uid___A002_Xdb6217_X55ec_target.ms
# x y z diam station ant
-5.850273514 -125.9985379 -1.590364043 12. A058 DA41
-19.90369337 52.82680653 -1.892119601 12. A023 DA42
13.45860758 -5.790196849 -2.087805181 12. A035 DA43
5.606192499 7.646657746 -2.087775605 12. A001 DA44
24.10057423 -25.95933768 -2.08466565 12. A036 DA45
Lines beginning with # are interpreted as header key=value pairs if they contain "=", and as comments otherwise. The header keys are observatory, COFA (center of array), and coordsys (coordinate system); other possible header keys are zone, datum, and hemisphere. The data columns are x, y, z, diam (diameter), station, and antenna name. If no sixth column is provided, antenna names will default to station names. If no fifth column is provided, station names will default to A0x, where x is the zero-indexed row number. To find the observatory name, one can check the known-observatories list using the measures tool command me.obslist. If an unknown observatory is specified, then one must either use absolute positions (coordsys XYZ [Cartesian coordinates] or UTM [Universal Transverse Mercator]), or specify COFA (longitude and latitude). coordsys can be XYZ (Earth-centered), UTM (easting, northing, and altitude), or LOC (xoffset, yoffset, and height). Files for many observatories can be found in the directory returned by the following command:
casatools.casadata.datapath+"/alma/simmos/"
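For example, assuming the data path expression above is valid in your CASA installation, the available configuration files can be listed directly from Python (a sketch, not from the original text):

#List the antenna configuration files shipped with CASA
import os
import casatools
cfgdir = casatools.casadata.datapath + "/alma/simmos/"   #data path expression given above
print(sorted(os.listdir(cfgdir)))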
Tsys and Noise¶
simutil.noisetemp
Noise temperature and efficiencies can be calculated for several telescopes: ALMA, ACA, EVLA, VLA, and SMA. The inputs to the simutil.noisetemp method are: telescope, e.g., "ALMA"; freq (observing frequency) as a quantity string, e.g., "300GHz"; diam (optional - the method knows the diameters for the arrays above), e.g., "12m"; and epsilon, the RMS surface accuracy in microns (also optional - the method contains the engineering specification values for each telescope). The outputs produced are: \(\eta_p\), the phase efficiency (from the Ruze formula); \(\eta_s\), the spill (main beam) efficiency; \(\eta_b\), the geometrical blockage efficiency; \(\eta_t\), the taper efficiency; \(\eta_q\), the correlator efficiency including quantization; and \(t_{rx}\), the receiver temperature. The total antenna efficiency can be calculated from these outputs as \(\epsilon = \eta_p \, \eta_s \, \eta_b \, \eta_t\).
NOTE: VLA correlator efficiency includes waveguide loss. EVLA correlator efficiency is probably optimistic at 0.88.
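A minimal sketch of calling noisetemp, assuming the return values follow the order listed above (the telescope and frequency values are illustrative):

#Query receiver temperature and efficiencies for ALMA at 300 GHz
from casatasks.private import simutil
mysu = simutil.simutil()
eta_p, eta_s, eta_b, eta_t, eta_q, t_rx = mysu.noisetemp(telescope='ALMA', freq='300GHz')
print('receiver temperature [K]:', t_rx)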
simutil.sensitivity
This method is used to calculate the noise in an observation by adding noise to visibilities in exactly the same way as sm.corrupt (if doimnoise=True), and it also creates a simulated image from which to measure the noise. The inputs to calculate sensitivity are: freq; bandwidth = channel width, e.g., "1GHz"; etime = exposure time / length of track, e.g., "500sec"; integration = scan time (units required), e.g., "10sec"; and elevation, e.g., "80deg". Either an antennalist (a simobserve-format antenna configuration filename) must be given, or the parameters telescope, diam, and nant (number of antennas) must be set. Other optional inputs include: pwv in mm; doimnoise, which uses the simulator tool (sm.corrupt) to create an MS and image it to measure the noise; debug; method, which is equivalent to the mode parameter of sm.setnoise (options: "tsys-atm" (default) or "tsys-manual"); tau0, the zenith atmospheric opacity (must be set if method="tsys-manual"); and t_sky (default = 200 K when method="tsys-manual").
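A hedged sketch of a sensitivity call using the mysu instance created earlier and the parameter names described above (all values and the configuration file are illustrative):

#Estimate the noise for a 500-second track at 300 GHz with 1 mm of pwv
noise = mysu.sensitivity(freq='300GHz', bandwidth='1GHz',
                         etime='500sec', integration='10sec',
                         elevation='80deg',
                         antennalist='alma.cycle4.1.cfg',
                         pwv=1.0)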
Geodesy and Antenna Positions¶
NOTE: More information on geodesy, pointing, and other useful helper functions is available at https://www.ngs.noaa.gov/TOOLS/program_descriptions.html.
The ITRF frame mentioned in several of the following methods is not the official ITRF (International Terrestrial Reference Frame), just a right-handed Cartesian system with X going through 0 latitude and 0 longitude, and Z going through the north pole.
simutil.readantenna
simutil.readantenna is a helper function to read antenna configuration files, using the antab parameter as an input. Outputs will be: earth-centered x,y,z, diameter, name, observatory_name, observatory_measure_dictionary.
NOTE: The observatory_measure_dictionary output was added between CASA 4.7 and 5.0.
simutil.baselineLengths
When given an antenna configfile, this method will return the zenith baseline lengths.
simutil.approxBeam
When given an antenna configfile and freq (in GHz), this method will return the approximate beam size at zenith from the 90th percentile baseline length.
simutil.long2xyz
This method returns the nominal ITRF (X, Y, Z) coordinates [m] for a point at geodetic latitude (parameter lat) and longitude (parameter lon) [radians] and elevation [m].
simutil.xyz2long
When given ITRF Earth-centered (X, Y, Z, using the parameters x, y, and z) coordinates [m] for a point, this method returns geodetic latitude and longitude [radians] and elevation [m]. Elevation is measured relative to the closest point to the (latitude, longitude) on the WGS84 (World Geodetic System 1984) reference ellipsoid.
simutil.locxyz2itrf
This method returns the nominal ITRF (X, Y, Z) coordinates [m] for a point at “local” (x, y, z, using the parameters locx, locy, and locz) [m] measured at geodetic latitude (lat) and longitude (longitude) [degrees] and altitude (alt) of the reference point. The “local” (x, y, z) are measured relative to the closest point on the WGS84 reference ellipsoid, with z normal to the ellipsoid and y pointing north.
simutil.itrf2loc
Given Earth-centered ITRF (X, Y, Z, using the parameters x, y, and z) coordinates [m] and the Earth-centered coords of the center of array (using the parameters cx, cy, and cz), this method returns local (x, y, z) [m] relative to the center of the array, oriented with x and y tangent to the closest point at the COFA (latitude, longitude) on the WGS84 reference ellipsoid, with z normal to the ellipsoid and y pointing north.
simutil.itrf2locname
Given Earth-centered ITRF (X, Y, Z) coordinates [m] and the name of a known array given by the obsname parameter (see me.obslist), the method simutil.itrf2locname returns local (x, y, z) [m] relative to the center of the array, oriented with x and y tangent to the closest point at the COFA (latitude, longitude) on the WGS84 reference ellipsoid, with z normal to the ellipsoid and y pointing north.
simutil.utm2xyz
This method returns the nominal ITRF (X, Y, Z) coordinates [m] for a point at UTM easting, northing, elevation [m], and zone of a given datum (e.g., ‘WGS84’) and north/south flag nors (“N” or “S”, denotes northern or southern hemisphere). The ITRF frame used is not the official ITRF, just a right-handed Cartesian system with X going through 0 latitude and 0 longitude, and Z going through the north pole.
simutil.utm2long
The method simutil.utm2long converts UTM coordinates to GPS longitude and latitude (in radians). This task has the following parameters: east, north, zone, datum, and nors.
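As a brief illustration of these geodesy helpers, the following sketch converts a geodetic position to nominal ITRF coordinates and back, assuming the parameter names (lon, lat; x, y, z) and return orders described above; the coordinates are illustrative, roughly the ALMA COFA:

#Geodetic longitude/latitude/elevation to nominal ITRF XYZ and back
import math
from casatasks.private import simutil
mysu = simutil.simutil()
x, y, z = mysu.long2xyz(lon=math.radians(-67.75), lat=math.radians(-23.02), elevation=5000.0)
lat, lon, elev = mysu.xyz2long(x=x, y=y, z=z)   #returns latitude, longitude [rad], elevation [m]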
Pointing and Directions¶
simutil.calc_pointings2
This method is used to calculate mosaic pointings to cover a region. It returns a hexagonally packed list of pointings determined by the size parameter (either [size[0],size[1]] or [size,size] if a single value is given), separated by the parameter spacing, and fitting inside an area specified by direction and maptype. If multiple pointings cannot be fit to the given parameters, a single pointing will be returned. If direction is a list, the task simply returns the direction and the number of pointings in it. The 3 options for maptype are: "HEX"agonal (default), "SQU"are, and "ALM"A (triangular tiling). The hexagonal packing starts with a horizontal row centered on direction, and the other rows alternate being horizontally offset by a half spacing. For hexagonal or square maptypes, the relmargin (default=0.5) parameter affects the number of pointings returned in the mosaic pattern. For triangular maptypes, the optional beam parameter is used to determine the number of pointings returned in the mosaic pattern.
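A hedged sketch of a calc_pointings2 call with the mysu instance created earlier, using the parameters described above (the direction, spacing, and size are illustrative):

#Compute a hexagonally packed mosaic covering a 2-arcmin square region
pointings = mysu.calc_pointings2(spacing='25arcsec',
                                 size=['2arcmin', '2arcmin'],
                                 maptype='HEX',
                                 direction='J2000 10h00m00.0s -30d00m00.0s')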
simutil.read_pointings
This method will read a pointing list from a file specified by the parameter filename. The input file (ASCII) should contain at least 3 fields separated by spaces, which specify positions with epoch, RA, and Dec (in degrees/minutes/seconds or hours/minutes/seconds). The optional fourth column, TIME, should be a list of decimal numbers specifying the integration time at each position (in seconds). Lines which start with '#' are ignored and can be used as comment lines. Example of a file:
#Epoch RA DEC TIME(optional)
J2000 23h59m28.10 -019d52m12.35 10.0
J2000 23h59m32.35 -019d52m12.35 10.0
J2000 23h59m36.61 -019d52m12.35 60.0
simutil.write_pointings
This method will write a list of pointings out to a file (example above), given by the parameter filename. The optional parameter time can be an array of integration times.
simutil.average_direction
This method will return the average of directions (default=None) as a string, and relative offsets.
simutil.median_direction
This method will return the median of directions (default=None) as a string, and relative offsets.
simutil.ephemeris
This method calculates the elevation of a source on a given date, in a given direction, seen from a given telescope. The date should be given in the format YEAR/MO/DY/TI:ME. The time given is referenced to International Atomic Time, or TAI (from the French temps atomique international). Other optional parameters include: usehourangle (a boolean parameter which sets or unsets the reference time at transit, essentially centering the plot), ms (uses the information from the OBSERVATION table in the given MeasurementSet and plots the entire range of the observation), and cofa (allows the user to change the center-of-array position). The cofa parameter must be set if using an unknown observatory. A list of known observatories can be found by using the measures tool command me.obslist.
Utility¶
simutil.statim
This method will plot an image and calculate its statistics. Optional parameters: plot (default True), incell, disprange (low and high values for pl.imshow), bar (show colorbar, default=True), showstats (show stats on the image, default=True).
simutil.plotants
An alternate antenna configuration plotting routine that takes arrays of x,y = local offset from the array center, z = altitude, d = diameter, and name. This routine either plots points or, if the array is compact enough to see the diameters, plots the dishes at their actual scaled size.
simutil.modifymodel
simutil.modifymodel is a method that converts a model image into a 4D-coordinate image that can be used in CASA, with axes in the space, Stokes, spectral order that the Toolkit requires (e.g., sm.predict in the simulator tool). The input parameters inimage and outimage allow the user to specify the names of the input and output images. Values that are absent in the input, or that the user wishes to override, can be given as quantity strings with the in* parameters (inbright, indirection, incell, incenter, inwidth, innchan); e.g., inbright="4Jy/pixel" will scale outimage to a peak of 4 Jy/pixel, and incell="0.2arcsec" will set the cell size in outimage to 0.2 arcsec. The flatimage parameter allows one to also generate a flat (2D, integrated intensity) image from inimage, which can be useful for display purposes.
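A sketch of a modifymodel call with the mysu instance created earlier, using the parameter names described above; file names and override values are illustrative:

#Convert a 2D model into a 4D CASA image, overriding brightness and cell size
mysu.modifymodel(inimage='input_model.fits',
                 outimage='model_4d.im',
                 inbright='4Jy/pixel',    #rescale the peak to 4 Jy/pixel
                 incell='0.2arcsec',      #override the cell size
                 flatimage=True)          #also write a flat 2D image for display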
simutil.convimage
Given a (2D) model (modelflat) image, this method will regrid it to the scale of the outflat image, and convolve it to the beam of the outflat image. This is useful to compare a skymodel with a simulated output image. The optional parameter complist allows the user to import a componentlist to add unresolved components to the outflat image. Information on creating a component list can be found in the CASA guides here.
simutil.imtclean
This wrapper function is the method by which the standard CASA imaging task tclean is called for simulated image reconstruction inside the task simanalyze. It replaces the deprecated method simutil.imclean. If dryrun=True, this method only creates a template '[imagename.config].tclean.last' file for users to reference in their custom calls to tclean. The cell parameter expects a list of qa.quantity objects. Selecting individual fields for imaging is not supported.
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/parallel-processing.ipynb
Parallel Processing¶
WARNING FOR MAC USERS: Parallel processing using mpicasa is not supported on macOS.
The parallelization approach adopted in CASA is the so-called "embarrassingly parallel" approach. An embarrassingly parallel workload or problem is one where little or no effort is needed to separate the problem into a number of parallel tasks. This is often the case when there is little or no dependency or need for communication between those parallel tasks, or for sharing results between them.
In order to run one analysis on multiple processors, one can parallelize the work by dividing the data into several parts (“partitioning”) and then:
run a CASA instance on each part, or
have non-trivially parallelized algorithms, which make use of several processors within a single CASA instance. Non-trivial parallelization is presently only implemented in a few areas of the CASA codebase, and is based on OpenMP, a shared-memory parallel programming library. For example, certain sections of the imaging code of CASA are parallelized using OpenMP.
All other parallelization is achieved by partitioning the MeasurementSet (MS) of interest using the task partition or at import time using importasdm. The resulting partitioned MS is called a “Multi-MS” or “MMS”. The parallel processing of a Multi-MS is possible using the Message Passing Interface (MPI). MPI is a standard which addresses primarily the message-passing parallel programming model in a practical, portable, efficient and flexible way.
WARNING: Parallel processing on Multi-MSs in CASA is unverified - please use it at your own discretion.
Logically, an MMS has the same structure as an MS but internally it is a group of several MSs which are virtually concatenated. Virtual concatenation of several MSs or MMSs into an MMS can also be achieved via task virtualconcat.
Due to the virtual concatenation, the main table of an MMS appears like the union of the main tables of all the member MSs such that when the MMS is accessed like a normal MS, processing can proceed sequentially as usual. Each member MS or “Sub-MS” of an MMS, however, is at the same time a valid MS on its own and can be processed as such. This is what happens when the MMS is accessed by a parallelized task. The partitioning of the MMS is recognized and work is started in parallel on the separate Sub-MSs, provided that the user has started CASA with mpicasa.
The internal structure of an MMS can be inspected using task listpartition.
Configuration and Control¶
CASA can be run in parallel on a cluster of computer nodes or on a single multi-core computer. In the multi-node case, the following requirements must be met by all nodes to be included in the cluster. Users with access to a cluster will normally not need to configure these settings themselves, but it is still useful to be aware of the configuration:
Password-less ssh access from the client (user) machine into all the nodes to be included in the cluster.
NOTE: This is not necessary when using only localhost, i.e. if the cluster is deployed only on the machine where CASA is running.
All the input files must be located in a shared file-system, accessible from all the nodes comprising the cluster, and mounted in the same path of the file-system.
A CASA installation mirroring the CASA installation on the client (user) machine, so that the following environment variables point to valid installations: PATH, LD_LIBRARY_PATH, IPYTHONDIR, CASAPATH, CASAARCH, PYTHONHOME, __CASAPY_PYTHONDIR, PGPLOT_DEV, PGPLOT_DIR, PGPLOT_FONT. This is usually achieved by having the CASA installation on a shared file-system.
Configuration and Start-Up
The main library used in CASA (4.4+) to achieve parallelization is the Message Passing Interface (MPI), in particular the OpenMPI implementation. MPI is already included in the CASA distribution, so users do not need to install it. The CASA distribution comes with a wrapper of the MPI executor, called mpicasa. This wrapper performs several settings behind the scenes in order to properly configure the environment to run CASA in parallel.
The collection of CASA processes which will run the jobs from parallelized tasks is set up via mpicasa. The simplest example is to run CASA in parallel on the localhost using the available cores in the machine. A typical example would be to run CASA on a desktop with 16 cores, as follows:
path_to_casa/mpicasa -n 16 path_to_casa/casa <casa_options>
Where:
mpicasa: Wrapper around mpirun, which can be found in the casa installation directory. Example: /home/user/casa-release-4.5.0-el6/bin
-n : MPI option specifying the number of processes to run.
16: The number of cores to be used in the localhost machine.
casa: Full path to the CASA executable, casa.
casa_options: CASA options such as: -c, --nogui, --log2term, etc.
NOTE: MPI uses one process as the MPI Client, which is where the user will see messages printed in the terminal or in the logger. The other processes are used for the parallel work and are called MPI Servers. Because of this, one usually gives number_of_processes + 1.
NOTE: when several versions of CASA are available in the PATH, there is the risk that the executable mpicasa and other executables used by CASA, such as casaplotms or asdm2MS, would be picked from one of those different versions instead of the “path_to_casa/casa” version that we want to run. This is typically the case in data reduction clusters where either the default environment or user setup scripts set the PATH to point to the latest release of CASA, for example. In such cases, it is safer to make sure in advance that the first version found in the PATH is the right one, with a command like this (bash), as explained in the CASA distribution README:
export PATH=path_to_casa/bin:$PATH
It is also possible to use other nodes, which can form a “cluster”. Following the requirements given above, replace the “-n” option of mpicasa with a “-hostfile host_file”, as shown below:
mpicasa -hostfile <host_file> path_to_casa/casa <casa_options>
Where:
host_file: a text file containing the names of the nodes forming the cluster and the number of cores to use on each node.
Example:
orion slots=5
antares slots=4
sirius slots=4
The above configuration file will set up a cluster comprised of three nodes (orion, antares and sirius), deploying the cores per node as follows: At host “orion” up to 5 cores will be deployed (including the MPI Client). If the processing requires more cores, it will take them from “antares” and once all the 4 engines in “antares” are used, it will use up to 4 cores in “sirius”.
To run CASA in interactive mode (without the "-c" option), the user needs to first log in to the desired computer node with X11 forwarding. This is achieved with the command ssh -XY <node>, where <node> is the hostname of the computer on which CASA is to be run.
mpicasa -n <number_of_processes> path_to_casa/casa
This will open an xterm window for the interactive work. To get help do:
mpicasa --help
Parallel Imaging¶
The parallelization of imaging is achieved through the task tclean. The parallelization itself is tied closely to the major-minor cycles of the imager and follows a different approach from that used by other tasks. The parallelization inside tclean does not need the MS to be partitioned into a Multi-MS; it works in the same way whether the input is an MS or an MMS. But in order to run tclean in parallel, it is necessary to launch CASA with mpicasa, in the same way as for other tasks. One extra step necessary to run tclean in parallel is to set the parameter parallel=True. Details of the parallelization are described in the section mentioned above, as well as in the synthesis-imaging chapter on "Imager Parallelization".
Parallel imaging on an MS file (rather than an MMS file) in tclean has been an official mode of operation in the ALMA pipeline since Cycle 6, and has been officially endorsed by CASA as of CASA 5.4. We recommend that users interested in parallel processing use this mode of operation, since for large data products the imaging step often dominates both the overall runtime and the gains that can be achieved with parallelization (see CASA Memo 5). Processing Multi-MS files, either for imaging or calibration, remains at the discretion of the user.
The Multi-MS¶
Parallel processing using a Multi-MS (MMS) in CASA is unverified. Please use it at your own discretion.
Please consider parallel imaging using a normal MS as an alternative.
Multi-MS Structure¶
A Multi-MS (MMS) is structured to have a reference MS on the top directory and a sub-directory called SUBMSS, which contains each partitioned Sub-MS. A Multi-MS can be handled like a normal “monolithic” MS. It can be moved and renamed like any other directory. CASA tasks that are not MMS-aware can process it like a monolithic MS.
All sub-tables of the Sub-MSs are identical, except for the SOURCE and HISTORY sub-tables. The reference MS contains links to the sub-tables of the first Sub-MS, which is identified by a "0000" index in its name. All subsequent Sub-MSs also contain links to the sub-tables of the first Sub-MS, except for the SOURCE and HISTORY sub-tables. The following is a typical view of the reference MS directory of a Multi-MS. Symbolic links have an @ at the end.
> ls uid___A002_X30a93d_X43e.ms/
ANTENNA@ CALDEVICE@ FIELD@ OBSERVATION@ PROCESSOR@ STATE@ SYSPOWER@ WEATHER@
ASDM_ANTENNA@ DATA_DESCRIPTION@ FLAG_CMD@ POINTING@ SOURCE@ SUBMSS/ table.dat
ASDM_CALWVR@ FEED@ HISTORY@ POLARIZATION@ SPECTRAL_WINDOW@ SYSCAL@ table.info
The following is a view of the Sub-MSs directory. The Sub-MS names have the MMS name followed by a 4-digit index.
> ls uid___A002_X30a93d_X43e.ms/SUBMSS/
uid___A002_X30a93d_X43e.ms.0000.ms/ uid___A002_X30a93d_X43e.ms.0002.ms/
uid___A002_X30a93d_X43e.ms.0001.ms/ uid___A002_X30a93d_X43e.ms.0003.ms/
The next example shows the second Sub-MS, which has symbolic links to all sub-tables except the SOURCE and HISTORY tables. These two tables need write access in several cases when running in parallel.
> ls -l uid___A002_X30a93d_X43e.ms/SUBMSS/uid___A002_X30a93d_X43e.ms.0001.ms/
ANTENNA -> ../uid___A002_X30a93d_X43e.ms.0000.ms/ANTENNA/
ASDM_ANTENNA -> ../uid___A002_X30a93d_X43e.ms.0000.ms/ASDM_ANTENNA/
ASDM_CALWVR -> ../uid___A002_X30a93d_X43e.ms.0000.ms/ASDM_CALWVR/
CALDEVICE -> ../uid___A002_X30a93d_X43e.ms.0000.ms/CALDEVICE/
DATA_DESCRIPTION -> ../uid___A002_X30a93d_X43e.ms.0000.ms/DATA_DESCRIPTION/
FEED -> ../uid___A002_X30a93d_X43e.ms.0000.ms/FEED/
FIELD -> ../uid___A002_X30a93d_X43e.ms.0000.ms/FIELD/
FLAG_CMD -> ../uid___A002_X30a93d_X43e.ms.0000.ms/FLAG_CMD/
HISTORY/
OBSERVATION -> ../uid___A002_X30a93d_X43e.ms.0000.ms/OBSERVATION/
POINTING -> ../uid___A002_X30a93d_X43e.ms.0000.ms/POINTING/
POLARIZATION -> ../uid___A002_X30a93d_X43e.ms.0000.ms/POLARIZATION/
PROCESSOR -> ../uid___A002_X30a93d_X43e.ms.0000.ms/PROCESSOR/
SOURCE/
SPECTRAL_WINDOW -> ../uid___A002_X30a93d_X43e.ms.0000.ms/SPECTRAL_WINDOW/
STATE -> ../uid___A002_X30a93d_X43e.ms.0000.ms/STATE/
SYSCAL -> ../uid___A002_X30a93d_X43e.ms.0000.ms/SYSCAL/
SYSPOWER -> ../uid___A002_X30a93d_X43e.ms.0000.ms/SYSPOWER/
table.dat
table.f1
table.f10
.....
WEATHER -> ../uid___A002_X30a93d_X43e.ms.0000.ms/WEATHER/
Multi-MS Creation¶
partition
The partition task is the main task to create a “Multi-MS”. It takes an input MeasurementSet and creates an output “Multi-MS” based on the data selection parameters.
The inputs to partition are:
CASA <1>: inp partition
--------> inp(partition)
#partition :: Task to produce Multi-MSs using parallelism
vis = '' #Name of input MeasurementSet
outputvis = '' #Name of output MeasurementSet
createmms = True #Should this create a multi-MS output
separationaxis = 'auto' #Axis to do parallelization across (scan, spw, baseline, auto)
numsubms = 'auto' #The number of SubMSs to create (auto or any number)
flagbackup = True #Create a backup of the FLAG column in the MMS.
datacolumn = 'all' #Which data column(s) to process.
field = '' #Select field using ID(s) or name(s).
spw = '' #Select spectral window/channels.
scan = '' #Select data by scan numbers.
antenna = '' #Select data based on antenna/baseline.
correlation = '' #Correlation: '' ==> all, correlation='XX,YY'.
timerange = '' #Select data by time range.
intent = '' #Select data by scan intent.
array = '' #Select (sub)array(s) by array ID number.
uvrange = '' #Select data by baseline length.
observation = '' #Select by observation ID(s).
feed = '' #Multi-feed numbers: Not yet implemented.
The keyword createmms is by default set to True to create an output MMS. It contains three sub-parameters: separationaxis, numsubms, and flagbackup. Partition accepts four axes to do the separation across: 'auto', 'scan', 'spw', or 'baseline'. The default separationaxis='auto' will first separate the MS by spw, then by scan, trying to balance the spw and scan content in each Sub-MS while also taking into account the available fields.
The baseline axis is mostly useful for single-dish data. This axis will partition the MS based on the available baselines. If only auto-correlations are wanted, use an antenna selection such as antenna='*&&&' together with the baseline separation axis. Note that if numsubms='auto', the task will try to create as many Sub-MSs as the number of parallel cores available when CASA was started with mpicasa. To get one Sub-MS for each baseline, set the numsubms parameter to a number higher than the number of baselines.
The user may force the number of Sub-MSs in the output MMS by setting the sub-parameter numsubms. The default 'auto' creates as many Sub-MSs as the number of engines used when starting CASA with mpicasa, in an optimized way.
The flagbackup sub-parameter will create a backup of the FLAG column and save it to the .flagversions file.
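Putting these sub-parameters together, a typical partition call might look like the following sketch (the MS names are illustrative):

#Create a Multi-MS separated by scan, backing up the FLAG column
partition(vis='my_data.ms',
          outputvis='my_data.mms',
          createmms=True,
          separationaxis='scan',   #alternatives: 'auto', 'spw', 'baseline'
          numsubms='auto',         #as many Sub-MSs as available engines
          flagbackup=True)         #save the FLAG column to .flagversions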
importasdm
Task partition has been embedded in task importasdm so that the user can create an MMS already at import time. Set the parameter createmms to True and the output of importasdm will be an MMS created with default parameters. The sub-parameters separationaxis and numsubms are also available in importasdm. From this point on in the data reduction chain, tasks that have been parallelized will automatically run in parallel when they see an MMS, and tasks that are not parallelized will work on it in the same way as they normally do on an MS.
Parallel Calibration¶
Parallel processing using a Multi-MS (MMS) in CASA is unverified - please use it at your own discretion.
Please consider parallel imaging using a normal MS as an alternative.
Some of the calibration tasks are internally parallelized and will run in parallel if the input MS is a Multi-MS. Other tasks are not and will work normally in the presence of an input MMS. A typical calibration cascade will work normally in parallel when it sees an input MMS. In order to do that, the first step is to set createmms=True inside importasdm to create a Multi-MS. Once that is done, the calibration steps will distribute the processing in parallel if CASA is started with mpicasa, or in serial otherwise.
Unlike the MS, the calibration tables created by calibration tasks are not partitioned. For instance, when gaincal is run on a Multi-MS, it creates the same output gaincal table as if the input were a normal MS.
The following calibration tasks are internally parallelized and will work on each Sub-MS in parallel:
flagdata
setjy
applycal
hanningsmooth
cvel2
uvcontsub
mstransform
split
Special considerations when running some tasks in parallel
uvcontsub
When the input is a Multi-MS and CASA is started in parallel using mpicasa, uvcontsub will try to process each Sub-MS in parallel. Depending on the parameters of uvcontsub and the separation axis of the partitioned Multi-MS, processing the input in parallel may not be possible. This will happen, for example, when the input MMS is separated using the default axis 'auto'. The 'auto' axis partitions the MMS by the scan and spw axes, in a way that balances the content of each Sub-MS.
If uvcontsub is called with combine=’spw’, the task will expect to find all selected spws in each Sub-MS, as each parallel engine will process a Sub-MS independently of the others. In such cases, task uvcontsub will issue some warnings that the process cannot be continued in parallel. The task will internally handle such cases and will continue to process the input in serial, as if the Multi-MS was a normal monolithic MS.
The following steps can be used to find out the partition axis of the MMS and the content of each Sub-MS. First, use task listpartition to obtain information on the MMS.
CASA <2>: listpartition('combspw.mms')
INFO listpartition::::@almahpc05:MPIClient
INFO listpartition::::@almahpc05:MPIClient+ ##########################################
INFO listpartition::::@almahpc05:MPIClient+ ##### Begin Task: listpartition #####
INFO listpartition::::@almahpc05:MPIClient listpartition(vis="combspw.ms",createdict=False,listfile="")
INFO listpartition::::@almahpc05:MPIClient This is a Multi-MS with separation axis = scan,spw
INFO listpartition::::@almahpc05:MPIClient Sub-MS Scan Spw Nchan Nrows Size
INFO listpartition::::@almahpc05:MPIClient+combspw.ms.0000.ms 1 [ 1 5 6 9 12 16] [128 128 128 128 128 128] 252 4.9M
INFO listpartition::::@almahpc05:MPIClient 2 [ 0 3 13 17 18 21] [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient combspw.ms.0001.ms 1 [ 0 4 8 13 17 21] [128 128 128 128 128 128] 252 4.5M
INFO listpartition::::@almahpc05:MPIClient 2 [ 2 6 7 10 14 22] [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient combspw.ms.0002.ms 1 [ 3 7 10 14 20 22] [128 128 128 128 128 128] 252 4.5M
INFO listpartition::::@almahpc05:MPIClient 2 [ 5 11 12 15 19 23] [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient combspw.ms.0003.ms 1 [ 2 11 15 18 19 23] [128 128 128 128 128 128] 252 4.5M
INFO listpartition::::@almahpc05:MPIClient 2 [ 1 4 8 9 16 20] [128 128 128 128 128 128] 378
INFO listpartition::::@almahpc05:MPIClient ##### End Task: listpartition #####
INFO listpartition::::@almahpc05:MPIClient+ ##########################################
In the above example, the MMS was partitioned using the default axis ‘auto’ (scan,spw). One can see the Sub-MSs do not contain all spws, therefore depending on the selection used in the task, it will not be possible to proceed in parallel. See the following example for the warnings given by the task in this case.
CASA <8>: uvcontsub(vis="combspw.mms",fitspw="1~10:5~122,15~22:5~122",excludechans=False,combine="spw",fitorder=0,spw="6~14",want_cont=False)
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient+ ##########################################
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient+ ##### Begin Task: uvcontsub #####
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient uvcontsub(vis="combspw.mms",field="",fitspw="1~10:5~122,15~22:5~122",excludechans=False,combine="spw",
2018-02-06 15:45:09 INFO uvcontsub::::@almahpc05:MPIClient+ solint="int",fitorder=0,spw="6~14",want_cont=False)
2018-02-06 15:45:11 WARN uvcontsub::::@almahpc05:MPIClient Cannot run with combine='spw' in parallel because the Sub-MSs do not contain all the selected spws
2018-02-06 15:45:11 WARN uvcontsub::::@almahpc05:MPIClient The Multi-MS will be processed in serial and will create an output MS
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient split is being run internally, and the selected spws
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient will be renumbered to start from 0 in the output!
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient Preparing to add scratch columns.
2018-02-06 15:45:11 INFO uvcontsub::::@almahpc05:MPIClient splitting to /data/users/scastro/work/CAS-10697/combspw.mms.contsubId4wzP with spw="1~5,6~14,15~22"
2018-02-06 15:45:11 INFO SubMS::parseColumnNames() Using DATA column.
A few options are possible at this stage. The user can let the process continue in serial, which, depending on the size of the MS, can take a long time; at the end, the continuum-subtracted output will be a normal MS. Depending on what the user wants to do next, there is the possibility to recreate the MMS using task partition. If the user only wants to run tclean and create an image, having either an MS or an MMS will work in the same way, because tclean can run in parallel regardless of whether the input is an MS or an MMS.
If the user opts to recreate the MMS before running uvcontsub, the recommended separation axis for combine='spw' is per scan. Partition will have to be called in the following way:
partition(vis='myMS.ms', outputvis='myout.ms', createmms=True, separationaxis='scan')
flagdata (with mode=’rflag’)
RFlag with action='calculate' can be used to produce the frequency and time thresholds in a first pass, which can then be applied in a second pass, using action='apply' once or several times. When this is done with the Multi-MS structure, the thresholds calculated in the first pass might differ from the thresholds that would be calculated using a single-MS structure. This is because in the Multi-MS structure the data are partitioned into Sub-MSs. The default is to produce a partition balanced with respect to the spws and scans, with the aim of getting content from all spws and scans into each of the Sub-MSs. For this reason, the statistics calculated by RFlag may differ across Sub-MSs, as they would differ for different data selections. At the moment this issue has not been assessed thoroughly for real-world datasets. A related question that is not understood in detail at the moment, and that can affect both serial and parallel runs of RFlag, is how much the thresholds can differ between the single-pass and dual-pass modes of RFlag.
Examples parallelization¶
Parallel processing using a Multi-MS (MMS) in CASA is unverified - please use it at your own discretion.
Please consider parallel imaging using a normal MS as an alternative.
Examples of running CASA in parallel
The following is a list of typical examples of how to run CASA in parallel. Once CASA is started with mpicasa and the Multi-MS is created, there is basically no difference between running CASA in serial and in parallel. You can find an example of a parallelized analysis in the alma-m100-analysis-hpc-regression.py script located in a sub-directory of your CASA distribution. For example, if CASA is untarred in /home/user/casa-release-5.0.0-el6, the alma-m100 script can be found in /home/user/casa-release-5.0.0-el6/lib/python2.7/regressions/
alma-m100-analysis-hpc-regression.py
Example 1. Run the above regression script in parallel, using 8 cores in parallel and 1 core as the MPI Client.
mpicasa -n 9 <path_to_casa>/casa --nogui --log2term -c alma-m100-analysis-hpc-regression.py
Example 2. Start CASA as described before for an interactive session, using 5 cores on the local machine.
mpicasa -n 5 <path_to_casa>/casa <casa-options>
An xterm window will open, showing rank0 in the title bar. Rank 0 is where the MPI Client runs. The other 4 processes have been started and are idle, waiting for any activity to be sent to them.
Run importasdm to create a “Multi-MS” and save the online flags to a file. The output will be automatically named uid__A002_X888a.ms, which is an MMS partitioned across spw and scan. The online flags are saved in the file uid__A002_X888a_cmd.txt.
CASA <2>: importasdm('uid__A002_X888a', createmms=True, savecmds=True)
List the contents of the MMS using listobs. In order to see how the MMS is partitioned, use listpartition.
CASA <3>: listobs('uid__A002_X888a.ms', listfile='uid__A002_X888a.listobs')
CASA <4>: listpartition('uid__A002_X888a.ms')
Apply the online flags produced by importasdm, using flagdata in list mode. flagdata is parallelized, so each engine will work on a separate Sub-MS to apply the flags from the uid__A002_X888a_cmd.txt file. You will see messages in the terminal (also saved in the casa-###.log file) containing the strings MPIServer-1, MPIServer-2, etc., for all the cores that process in parallel.
CASA <5>: flagdata('uid__A002_X888a.ms', mode='list', inpfile='uid__A002_X888a_cmd.txt')
Flag the auto-correlations and a high-Tsys antenna, also using list mode for optimization.
CASA <6>: flagdata('uid__A002_X888a.ms', mode='list',
inpfile=["autocorr=True","antenna='DA62'"])
Create all calibration tables in the same way as for a normal MS. Task gaincal is not parallelized, therefore it will work on the MMS as if it was a normal MS.
CASA <7>: gaincal('uid__A002_X888a.ms', caltable='cal-delay_uid__A002_X888a.K',
field='*Phase*',spw='1,3,5,7', solint='inf',combine='scan',
refant=therefant, gaintable='cal-antpos_uid__A002_X888a',
gaintype='K')
Apply all the calibrations to the MMS. applycal will work in parallel on each “Sub-MS” using the available cores.
CASA <8>: applycal(vis='uid__A002_X888a.ms', field='0', spw='9,11,13,15',
gaintable=['uid__A002_X888a.tsys',
'uid__A002_X888a.wvr.smooth',
'uid__A002_X888a.antpos'],
gainfield=['0', '', ''], interp='linear,linear',
spwmap=[tsysmap,[],[]], calwt=True, flagbackup=False)
Split out the science spectral windows. Task split is also parallelized; it will recognize that the input is an MMS, process it in parallel, and also create an output MMS.
CASA <9>: split(vis='uid__A002_X888a.ms', outputvis='uid__A002_X888a.ms.split',
datacolumn='corrected', spw='9,11,13,15', keepflags=True)
Run tclean normally to create your images.
Advanced: Interface Framework¶
The mpi4casa parallelization framework and advanced CASA parallel processing
The CASA parallelization framework, mpi4casa was developed as a layer on top of MPI using a client-server model. The Client is the master process, driving user interaction, and dispatching user commands to the servers. Servers are all the other processes, running in the background, waiting for commands sent from the client side.
One use-case of mpi4casa is to run CASA in parallel on a Multi-MS, as explained in previous chapters. There are other ways to process the data in parallel using mpi4casa without the need to create a Multi-MS. For instance, advanced users can benefit from the mpi4casa implementation to run multiple task commands in different cores or nodes.
Initialization
Start CASA in parallel as explained in previous chapters, using mpicasa.
Import MPICommandClient from mpi4casa module
from mpi4casa.MPICommandClient import MPICommandClient
Create an instance of MPICommandClient
client = MPICommandClient()
Set logging policy
client.set_log_mode('redirect')
Initialize command handling services
client.start_services()
Syntax to send a command request
ret = client.push_command_request(command,block,target_server,parameters)
command: String containing the Python/CASA command to be executed. The command parameters can be included within the command itself, also as strings.
block: Boolean to control whether command request is executed in blocking mode (True) or in non-blocking mode (False). Default is False (non-blocking).
target_server: List of integers corresponding to the server IDs to handle the command
target_server=None: The command will be executed by the first available server
target_server=2: The command will be executed by the server n #2 as soon as it is available
target_server=[0,1]: The command will be executed by servers #0 and #1
parameters (Optional): Alternatively, the command parameters can be specified in a separate dictionary, using their native types instead of strings.
ret (Return Variable):
In non-blocking mode: It will not block and will return an Integer (command ID) to retrieve the command response at a later stage.
In blocking mode: It will block until the list of dictionaries, containing the command response is received.
Syntax to receive a command result
ret = client.get_command_response(command_request_id_list,block)
command_request_id_list: List of Ids (integers) corresponding to the commands whose result is to be retrieved.
block: Boolean to control whether command request is executed in blocking mode (True) or in non-blocking mode (False).
ret (Return Variable): List of dictionaries, containing the response parameters. The dictionary elements are as follows:
successful (Boolean): indicates whether command execution was successful or failed
traceback (String): In case of failure contains the traceback of the exception thrown
ret: Contains the result of the command in case of successful execution
Example 1:
Run wvrgcal on 2 different MeasurementSets (for instance, each one corresponding to an Execution Block):
#Example of full command including parameters
cmd1 = "wvrgcal(vis='X54.ms',caltable='cal-wvr_X54',spw=[1,3,5,7])"
cmdId1 = client.push_command_request(cmd1,block=False)
#Example of command with separated parameter list
cmd2 = "wvrgcal()"
params2={'vis':'X54.ms','caltable':'cal-wvr_X54','spw':[1,3,5,7]}
cmdId2 = client.push_command_request(cmd2,block=False,parameters=params2)
#Retrieve results
resultList = client.get_command_response([cmdId1, cmdId2],block=True)
Note: target_server is not specified because these are monolithic state-less commands, therefore any server can process them.
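For completeness, the response dictionaries returned in resultList above can be inspected via the fields described earlier (a minimal sketch; the printed messages are illustrative):

# Check the outcome of each dispatched command
for resp in resultList:
    if resp['successful']:
        print('command result:', resp['ret'])
    else:
        print('command failed with traceback:')
        print(resp['traceback'])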
Example 2: Use the CASA table tool (tb) to get the data from 2 EBs and apply a custom median filter:
#Open MSs
client.push_command_request("tb.open('x54.ms')",target_server=1)
client.push_command_request("tb.open('x220.ms')",target_server=2)
#Apply median filter
client.push_command_request("data=ms.getcell('DATA',1)",target_server=[1,2])
client.push_command_request("from scipy import signal",target_server=[1,2])
client.push_command_request("filt_data=signal.medfilt(data)",target_server=[1,2])
#Put filter data back in the MSs
client.push_command_request("tb.putcell('DATA',1,filt_data)",target_server=[1,2])
#Close MSs
client.push_command_request("tb.close(),target_server=[1,2],block=True)
NOTE: target_server is specified because each command depends on the state generated by the previous ones; block=True takes effect only on the last command, since all the others are executed via a FIFO queue, meaning the commands are processed in the same order they were sent.
Link to first version of the CASA framework development document
Advanced: Multiprocessing and Multithreading¶
This section explains technical details aimed at advanced users and system administrators who are interested in knowing more about different forms of parallelization in CASA, customizing processes and threads in CASA sessions, and/or optimizing performance. Most users would not normally need to be aware of or modify these settings.
The parallelization approach described in this chapter, embarrassingly parallel, is based on multiprocessing. Certain areas of CASA also use a different form of parallelization, multithreading (via OpenMP). By default, mpicasa disables multithreading, to prevent competition for CPU cores between OpenMP threads and MPI processes, given that OpenMP would otherwise spawn as many threads as there are cores available. In other words, if CASA is started with mpicasa -n 8, 8 processes are spawned but no additional threads.
The mechanism used to disable multithreading is the OMP_NUM_THREADS environment variable. When CASA is run under mpicasa, the mpicasa wrapper sets OMP_NUM_THREADS to 1 before starting CASA, unless it has previously been set by the user, in which case it is not modified. This effectively disables OpenMP multithreading unless the user has explicitly set a number of threads in OMP_NUM_THREADS.
To use a hybrid parallel approach, with X processes and Y threads per process, the user can set OMP_NUM_THREADS to Y and then start CASA using mpicasa -n X. Hybrid setups are not validated and not particularly recommended. For more details on this and other related environment variables that control OpenMP multithreading, please refer to the OpenMP documentation. Imager Parallelization explains how multiprocessing and multithreading are used in imaging.
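For illustration, a hybrid session on an 8-core machine might be launched as follows (a sketch only; the process and thread counts are placeholders, and such setups are, as noted above, not validated):

# 4 MPI processes, each allowed up to 2 OpenMP threads
export OMP_NUM_THREADS=2
mpicasa -n 4 casa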
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/memo-series.ipynb
Memo Series & Knowledgebase¶
CASA Memos¶
CASA Memo Series
CASA Memo 1: MeasurementSet Definition version 2.0 - A.J. Kemball & M.H. Wieringa, v. 01/2000
CASA Memo 2: Convention for UVW calculations in CASA - U. Rau, v. 02/2013
CASA Memo 3: MeasurementSet Selection Syntax - S. Bhatnagar, v. 06/2015
CASA Memo 4: CASA Imager Parallelization: Measurement and Analysis of Runtime Performance for Continuum Imaging - S. Bhatnagar & CASA HPC team, v. 03/2016
CASA Memo 5: CASA Performance on Lustre: serial vs parallel and comparison with AIPS - B. Emonts, v. 06/2018
CASA Memo 6: User Survey and Helpdesk Statistics 2018 - B. Emonts, v. 11/2018
CASA Memo 7: ALMA Mosaicing Imaging Issues Prior to Cycle 6 - North American ALMA Science Center (NAASC) Software Support Team, v. 09/2018. Copy of NAASC Memo 117 (credit: NAASC).
CASA Memo 8: Non-Amnesic CLEAN: A boxless/maskless CLEAN style deconvolution algorithm - K. Golap, v. 12/2015
CASA Memo 9: Improvements to Multi-Scale CLEAN and Multi-Term Multi-Frequency CLEAN - J.-W. Steeb & U. Rau, v. 07/2019
CASA Memo 10: Restoring/Common Beam and Flux Scaling in tclean - J.-W. Steeb & U. Rau, v. 08/2019
CASA Memo 11: Heterogeneous Pointing Corrections in AW-Projection - P. Jagannathan & S. Bhatnagar, v. 10/2019
CASA Memo 12: The Fringefit Task in CASA - D. Small & G. Moellenbrock, v. 08/2022
CASAcore Memos¶
CASA’s underlying structure is based on casacore. The CASAcore Notes describe some of its most fundamental properties, such as the table system, the data selection syntax, and the data model definitions.
CASA Knowledgebase¶
The CASA Knowledgebase pages provide bits of wisdom on CASA that should be preserved, and that may be of use to expert users.
Correcting bad common beam¶
This script recalculates the common beam by discarding channels whose point spread function (PSF), or 'beam', deviates substantially from those of the other channels.

It can happen that an MS contains one or more channels for which the PSF deviates strongly from those of the other channels, for example as a result of substantial flagging or missing data. When making a data cube with restoringbeam='common', these "outlier" channels can produce a common beam that does not reflect the beam for the bulk of the channels.

The example_robust_common_beam.py script corrects the common beam by detecting and flagging outlier channels in the calculation of the common beam. Outlier channels are identified as those whose beam area deviates from the median beam area by more than a user-specified factor. The script does the following:
Run tclean with niter=0
Detect/flag outliers in chan beams
Use the remaining beams with ia.commonbeam() to get a new commonbeam
Specify this explicitly to tclean in the subsequent niter>0 run.
The attached script primarily demonstrates the solution of iter0 -> calc_good_beam -> specify_restoringbeam_in_iter1 along with tclean. If the new common beam is not larger than all the bad beams, then the iter0 tclean run's restoration step will throw warnings for all channels that cannot be convolved to the new common beam.
The functionality provided by this script is not yet implemented in the commonbeam method of the image analysis tool (ia.commonbeam).
Please note that the script is based on ALMA test data that was used for characterizing the problem in pipeline processing at the NAASC (JIRA ticket PIPE-375). Parameters have to be adjusted for each use case, including the heuristics used to detect outlier channels and what to substitute for the 'bad' beams.
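For illustration only, the following hedged sketch (not the attached script; the PSF image name, the axis ordering, and the outlier factor are assumptions) shows the general idea using the image-analysis (ia) tool:

# Identify outlier channel beams in a PSF cube and derive a robust common beam
import numpy as np

ia.open('my_cube.psf')                      # PSF cube from the niter=0 tclean run
nchan = ia.shape()[3]                       # assumes frequency is the 4th axis

areas, beams = np.empty(nchan), []
for c in range(nchan):
    b = ia.restoringbeam(channel=c, polarization=0)
    beams.append(b)
    areas[c] = b['major']['value'] * b['minor']['value']

factor = 2.0                                # user-tunable outlier heuristic
med = np.median(areas)
bad = np.where(np.abs(areas - med) > factor * med)[0]

# Overwrite outlier beams with a median-area beam, so that ia.commonbeam()
# reflects only the well-behaved channels
ref = beams[int(np.argsort(areas)[nchan // 2])]
for c in bad:
    ia.setrestoringbeam(major=ref['major'], minor=ref['minor'],
                        pa=ref['positionangle'], channel=int(c), polarization=0)

print('robust common beam:', ia.commonbeam())
ia.close()

The resulting beam can then be passed explicitly to the restoringbeam parameter in the subsequent niter>0 tclean run.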
ALMA polarization: XY-phase solver avoids +-45 deg solution¶
The XY-phase calibration obtained with the CASA task gaincal (gaintype='XYf+QU') avoids outputting +/- 45 degree solutions. This is a subtle consequence of the algorithm used to solve for the cross-hand phase, and should be harmless for the calibration of ALMA full-polarization data.
When using the task gaincal with gaintype='XYf+QU', the solver fits a slope to 2D data (imaginary vs. real part) that has noise in both dimensions. In such cases it is always better to fit a shallow (not steep) slope, so if the slope (for imag vs. real) comes out >1.0, the solver flips the axes (to real vs. imag), re-fits, and inverts the resulting value to account for the swap. This minimizes the effect of the real-axis noise on the slope calculation, and yields far more accurate solutions when the nominal slope is very large (>>1.0, e.g., the data nearly parallel to the imaginary axis, i.e., cross-hand phase approaching +/-90 deg).

The case of slope = 1.0 (a cross-hand phase of 45 deg) is the pivot for the axis-swap decision. When plotting the cross-hand phase solutions, a gap appears at +/-45 deg (see figure). This gap is a property of the typical spacing between values in the statistical distribution of slope values: for a sample of N points filling a distribution of finite width, there is a characteristic minimum spacing between values that is some small fraction of the width of the distribution. Smaller spacings are not forbidden, but they are rare. The axis swap reveals this property, since all of the (nominal) slopes >1.0 (cross-hand phases >45.0 deg) are fit with swapped data, yield the inverse slope (<1.0), and are then inverted to be >1.0 again. The typical slopes mostly do not get arbitrarily close to exactly 1.0, so the gap appears. This is essentially an extension of the fact that, for any slope, the precise exact value need not be realized in any instance in a sample of solutions for it; e.g., in a sample of N Gaussian-distributed values centered on some specific value, the likelihood of any sample having that precise central value is vanishingly small.
The gap should become smaller when either the noise decreases, or the number of channels (for the same noise) increases.
This feature should be harmless for the calibration of ALMA full-polarization data.
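The gap can be reproduced with a small numpy simulation (purely illustrative; the noise level, sample size, and least-squares estimator below are my assumptions, not the gaincal solver):

import numpy as np

rng = np.random.default_rng(1)
nsol, npts, noise = 4000, 64, 0.3
phases = np.empty(nsol)

for k in range(nsol):
    phi = rng.uniform(np.radians(30.0), np.radians(60.0))  # true phases around 45 deg
    x = rng.normal(size=npts)                              # "real" part
    y = np.tan(phi) * x                                    # "imag" part
    xo = x + rng.normal(scale=noise, size=npts)            # noise in both dimensions
    yo = y + rng.normal(scale=noise, size=npts)
    slope = np.sum(xo * yo) / np.sum(xo * xo)              # fit imag vs. real
    if abs(slope) > 1.0:                                   # steep: swap axes, re-fit, invert
        slope = np.sum(yo * yo) / np.sum(xo * yo)
    phases[k] = np.degrees(np.arctan(slope))

# A histogram of 'phases' shows a depleted bin around 45 deg
hist, edges = np.histogram(phases, bins=60, range=(30, 60))
print(hist)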
User tips on installing CASA on unsupported OSs¶
This document provides tips from users on how to install CASA on unsupported operating systems, such as Linux Ubuntu, Debian, and Fedora.
Disclaimer: the information in this document is provided by users and not verified by the CASA team. The information does not reflect official CASA recommendations and should be used at your own risk.
CASA officially supports certain versions of Red Hat Linux and Mac OS. See the CASA Download page for more information.
We realize that many users wish to try to run CASA on other operating systems. Below are some tips from the user community on installing CASA on unsupported platforms. CASA will not run on Windows.
Ubuntu or Debian - Please see the following PDF: Installing CASA on Ubuntu or Debian
Fedora - Fedora 32 may cause CASA to crash on startup with the error "code for hash md5 was not found". This is caused by changes to libssl.so in the compat-openssl10 package, which prevent the CASA-supplied version of this library from loading. An easy fix is to replace the CASA version of libssl.so.10 with the OS version in /lib64 (i.e., libssl.so.1.0.2o).
Wideband Mosaic Imaging and Pointing Corrections for the VLA Sky Survey¶
This Knowledgebase article describes testing results for wideband mosaic imaging and pointing corrections for VLASS.
The Very Large Array Sky Survey (VLASS) critically depends on accurate wideband mosaicking algorithms for its single-epoch and, ultimately, cumulative imaging. To a large degree, VLASS requirements have driven the development of the AW-projection and related algorithms for widefield and wideband imaging in CASA. This Knowledgebase article provides complementary reports on the testing of wideband mosaic imaging and pointing corrections for VLASS.
Calculation of Weights for Data with Varying Integration Time¶
Knowledgebase Article: Calculation of Weights for Data with Varying Integration Time
George Moellenbrock Original 18 Dec 2018, latest edited version 04 Nov 2019
When the nominal weights are expected to be uniform (because integration time, channel bandwidth, effective Tsys, collecting area, etc. are all uniform, or the visibilities are normalized), extracting weight information from the apparent statistics of otherwise stable visibility measurements is a simple matter of calculating the apparent simple variance in the visibility real and imaginary parts over a sufficient local sample of values. The real and imaginary part variances should be approximately equal, and the inverse of their mean is the correct weight to assign to each of the visibility values within the sample. Here, “stable visibility” means no systematic variation of amplitude or phase within the local sample. Noise-dominated visibilities are ideal; otherwise, well-calibrated data with no true visibility variation are desirable. These conditions are also needed for the more general case described below.
When the integration time (aka "exposure") varies within the local sample (such as can be generated by averaging of the data post-correlation, where the number of samples per averaging bin may vary, especially at the ends of scans), we expect the correct variance for each visibility to be inversely proportional to the net integration time, and this complicates the calculation. It is necessary to determine a weighted variance per unit inverse integration time, wherein the sample weights for the variance calculation are the per-visibility integration times, \(e_i\). If the only reason the underlying variance differs among samples is the variable integration time, then a uniform normalized variance estimate of the whole sample may be obtained by scaling the residual data per sample by the square root of their (known) integration times. Here, residual data means any underlying visibility signal (presumably the average of the visibility samples, using nominal weights proportional to integration time, etc.) has been subtracted. The simple variance of this rescaled sample is, in effect, the variance per unit inverse integration time.
For visibilities \(V_i\), with integration times \(e_i\):
\(<\)var\(_{norm}\)\(>\) = Sum (\(e_i\) (\(V_i\) - \(<\)\(V\)\(>\))\(^2\)/\(N\) [1]
where \(<\)\(V\)\(>\) = Sum(\(w_i\)\(V_i\))/Sum(\(w_i\)) [1a]
and \(w_i\) are the nominal data weights presumably proportional tointegration time and other relevant factors. In practice, we could probably just use \(w_i\) = \(e_i\) in equation [1a] since all of the other relevant factors witin \(w_i\) are assumed constant within the sample. Note that the units of \(<\)var\(_{norm}\)\(>\) are in squared visibility amplitude (\({\rm Jy}^{2}\), presumably) times seconds. Note also that \(<\)var\(_{norm}\)\(>\) is essentially the simple variance of the ensemble \(\sqrt{(e_i)}\).d\(V_i\) (where d\(V_i\) is (\(V_i\)-\(<\)\(V\)\(>\))), i.e., of the residual visibilities scaled so that their noise is independent of integration time.
The normalized weight-per-unit-integration time is thus the inverse of \(<\)var\(_{norm}\)\(>\):
\(W_{norm}\) = 1/\(<\)var\(_{norm}\)\(>\) [2]
and per-datum revised weights may be calculated as:
\(W_i\) = \(W_{norm}\) * \(e_i\) [3]
Another way of arriving at this result is to calculate a weighted variance:
\(<\)var\(>\) = Sum(\(e_i\) (\(V_i\) - \(<\)\(V\)\(>\))\(^2\)) / Sum(\(e_i\)) [4]
which corresponds to the (simple) mean exposure time, which is:
\(<\)\(e\)\(>\) = Sum(\(e_i\)) / N [5]
The product of these yields \(<\)var\(_{norm}\)\(>\), as above in [1]:
\(<\)var\(_{norm}\)> = \(<\)var\(>\)\(<\)\(e\)\(>\) [6]
and \(W_{norm}\) may be formed and applied as in [2] and [3] above.
This calculation should be done for both real and imaginary parts of the visibility sample and averaged, or for both parts jointly, and [3] used to form the revised weights.
NB: In calculating a sample variance, it is customary to acknowledge the loss of one degree of freedom due to the use of the mean visibility, \(\langle V \rangle\), in the calculation. Essentially, \(\langle V \rangle\) will have a finite error that tends to bias the resulting variance downward. For simple variance calculations, a factor N/(N-1) is applied to the variance to unbias it, and this factor can be significant for modest N. Since a non-trivially weighted mean is used in the (otherwise simple, non-weighted) variance calculation of eqn [1], it may be appropriate to consider a more carefully weighted calculation of the N/(N-1) factor. The required factor is:

\(D = 1 - \sum_i w_i^2 / \left(\sum_i w_i\right)^2\)   [9]

where \(w_i\) are the a priori nominal weights used in [1a] above. This factor can be shown to equal (N-1)/N for uniform weights, and should be divided into the \(\langle {\rm var}_{norm} \rangle\) result.

However, since the nominal error in the variance (and thus the weights) will be <10% for N>10 (an accuracy we are unlikely to achieve in general anyway), and will be uniform over many sample groups in the overall statwt execution, we assume that it is adequate to use the simpler N/(N-1) factor, or to omit it entirely.
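For concreteness, a minimal numpy sketch of equations [1]-[3], including the D factor of [9] (variable and function names are mine; this illustrates the algebra, not the statwt implementation):

import numpy as np

def revised_weights(vis, e):
    # vis: complex visibility sample (signal-subtracted); e: integration times (s)
    w = e                                          # nominal weights ~ e_i (eqn [1a])
    vmean = np.sum(w * vis) / np.sum(w)            # weighted mean <V> (eqn [1a])
    dv = vis - vmean                               # residuals dV_i
    # variance per unit inverse integration time, real and imag jointly (eqn [1])
    var_norm = np.sum(e * (dv.real**2 + dv.imag**2)) / (2 * vis.size)
    var_norm /= 1.0 - np.sum(w**2) / np.sum(w)**2  # unbias with D (eqn [9])
    return e / var_norm                            # W_i = W_norm * e_i (eqns [2], [3])

# Toy sample: per-part noise scales as 1/sqrt(e_i), so W_i should track e_i
rng = np.random.default_rng(0)
e = rng.choice([1.0, 2.0, 4.0], size=1000)
vis = (rng.normal(size=1000) + 1j * rng.normal(size=1000)) / np.sqrt(e)
print(revised_weights(vis, e)[:3])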
Bug affecting polarization visibility data in concatenated data¶
Knowledgebase Article: characterization of a bug that affected polarization visibility data in concatenated data in CASA versions up to 5.6.
George Moellenbrock Original 17 Apr 2020, latest edited version 12 May 2020
CASA 5.7/6.1 fixed a bug in concat and importfitsidi that affected polarization visibility data in concatenated MSs. The problem that occurs in CASA versions earlier than 5.7/6.1 is that the cross-hands may be spuriously mis-ordered on a subset of baselines in the concatenated MS. This Knowledgebase Article describes the main effects that this bug has on concat'd data in general, and in particular on the processing of ALMA and VLA data with CASA up to and including version 5.6. A careful analysis has revealed the following effects in concat'd MSs in CASA versions prior to 5.7/6.1:
In general, visibility data cross-hands may be spuriously swapped on some baselines in concat'd data when the antenna lists in the input MSs are partially different. The concat task adopts the first (in time order) MS's antenna list for the output MS, and appends unique antennas from later MS(s), typically with larger indices than in their original MSs. Depending on the original antenna indexing, baselines between these additional antennas and antennas that did occur in the first MS may sometimes require conjugation (reversal of the order of antennas in the baseline) to maintain internal indexing consistency within the output MS (for all baselines in an MS, the first antenna must have an index which is lower than, or the same as, the index of the second antenna). When baselines are conjugated in this way, the sign of the phase of each correlation and the UVWs must be reversed, and the cross-hands swapped (RL or XY will become LR or YX, respectively, and vice versa). Prior to CASA 5.7/6.1, the sign reversals were correct, but the cross-hand swap was spuriously omitted. For successfully calibrated data (i.e., calibrated prior to running concat), this means that the sign of the imaginary part of the cross-hand visibilities will be incorrect, and thus the sign of Stokes U (for circular feeds) or Stokes V (for linear feeds) will be incorrect on the affected baselines in concat'd data. Since the pathology affects only the cross-hands, polarimetry calibration of concat'd data may be adversely affected.
For reconfigurable arrays, note that an antenna's particular position (pad) makes it unique, i.e., a specific physical antenna that has moved is a new unique antenna in this context. Concatenation of entirely disjoint antenna lists is unaffected, since all additional antennas in the concatenation will have uniformly incremented indices, and no baseline conjugation will be required. Certain cases of different antenna lists are immune to this problem, e.g., new unique antennas with already-higher indices than all other common antennas, etc.
The concat task initially sorts the MSs into time order (only at whole-MS granularity), so the effect cannot be controlled merely by adjusting the order in which the input MSs are specified in the concat call (i.e., there is no easy user-directed fix).
An implicit concat happens in importfitsidi (i.e., VLBI, typically) when specifying multiple FITS-IDI files. Here the antenna re-indexing occurs upon fill, and thus most likely before calibration. Since ordinary gain calibration uses only the parallel-hands, it will not be affected by the underlying cross-hand swap error. However, polarimetry calibration will be affected to the extent that there are spuriously swapped cross-hands within the filled MS. EVN observations consisting of multiple FITS-IDI files will have the same antenna table in each file and are therefore unaffected by this bug.
Since the pathology affects only the cross-hands, purely Stokes I (total intensity) observations of any kind should not be affected (even if the cross-hands are present and some are affected, and thus technically incorrect).
For ALMA and VLA data, calibration typically occurs prior to any potentially pathological use of concat, and the impact will be as follows:
ALMA: Total intensity observations of any kind are not affected. For successfully calibrated (per session) ALMA polarization data not subject to this pathology in the concat of multiple contiguous execblocks within each session*, the pathology affects only Stokes V (circular polarization) when concat-ing multiple sessions subject to the baseline conjugation conditions described in item 1 above. This is because the spurious cross-hand swap effectively sabotages only the sign of the imaginary part of the cross-hands in some baselines, i.e., the apparent Stokes V signal sampled by linear feeds. The net effect will be to suppress the net Stokes V signal in imaging. This presumably affects only a very tiny minority of existing ALMA observations.
Note: In the course of standard scripted ALMA polarimetry calibration, the split task does not remove antennas from the ANTENNA subtable, even when they are fully flagged and keepflags=False. The data rows are not retained in this case, but the ANTENNA subtable seen by concat remains complete. Therefore, as long as the execblocks within the contiguous polarization session are uniform with regard to antenna population (as is intended), the polarization calibration within an individual session should not be subject to this pathology.
VLA: Total intensity observations of any kind are not affected. For successfully calibrated VLA polarization observations (which typically do not require a prior concat), the pathology affects only Stokes U when concat-ing multiple observations subject to the baseline conjugation conditions described in item 1 above. By the same logic as for ALMA, the Stokes U (cross-hand imaginary part for circular feeds) will be suppressed. Since this affects part of the linearly polarized net response, a larger fraction of VLA cases (cf ALMA) may be affected. The pathological condition arises when antennas are variously removed from the array (typically up to 2 or 3 at any given time) for incidental maintenance, so as to generate datasets with fewer than the full complement of 27 antennas, or when antennas move in and out of the barn (even when there are 27 antennas present in each observation), and when such disparate observations are concat’d. For concats of different VLA configurations, some baselines to antennas that did not move (typically ~12 out of 27 antennas) between the configurations will be affected, even if the total antenna lists (by name/number) have not changed. This is because antennas that did move (only) are unique new antennas in the concat by virtue of their new positions, and some baselines between them and the stationary antennas must be conjugated in the concat.
If observations are combined implicitly in imaging by specifying a list of MSs to tclean, there should be no problem, since the bug is an artifact of the mechanical combination of MSs into a single MS on disk.
Single-dish imaging bug for EBs with common antenna names and IDs but different antenna positions¶
A bug was found in the imaging of single-dish data with the tasks sdimaging and tsdimaging in CASA versions 5.6 and lower. This bug could cause an inaccurate direction conversion when more than one MS is given and these data contain the same antenna, with the same antenna ID, but at a different location (i.e., the station or pad where the antenna is placed). The bug affects the brightness distribution and flux density in the combined image, since the coordinates are not correct for some fraction of the data set.
Details of this bug can be found in this Knowledgebase Article.
The bug was fixed in CASA 5.7/6.1.
CASA Data Repository for developers building from source¶
For general users, information on the CASA Data Repository can be found on the “External Data” pages in CASA Docs.
For builds from source, such as those the developers use, the “casadata” is taken from a list of directories given in .casa/config.py. A second set of datasets is necessary if the developer wants to run the CASA tests. The new casatestdata repository can be cloned following the instructions given in Bitbucket: https://open-bitbucket.nrao.edu/projects/CASA/repos/casatestdata/browse
For example, for developers at NRAO Charlottesville, the entries in their config.py could look like:
datapath=['/home/casa/data/casatestdata/','/home/casa/data/distro/']
or just this:
datapath=['/home/casa/data/casatestdata/']
where /home/casa/data/casatestdata/ is the new CASA test data repository, containing all the data needed to run the CASA tests, and /home/casa/data/distro/ contains only the “casadata” needed to launch CASA; from here it is packaged for inclusion in the casalith tarballs.
For CASA build from source, the distro data is found in the following place:
casa6 build from source: distro is taken from .casa/config.py under:
datapath=['/some-directory/distro/']
casa5 build from source: distro is taken from:
$CASAPATH/data/
Warning to developers regarding awproject in CASA 6.1 or earlier:
In CASA 6.2, a bug was fixed in the way the awproject gridder in tclean locates the data repository in CASA 6. Previously, tclean looked in .casa/config.py for the location of the distro data, but only checked that the root data directory existed, without checking for the existence of the data it actually uses (more specifically, the antenna response data). As of CASA 6.2, the logic for finding the antenna response data with the awproject gridder in tclean is as follows:
1. It looks for the antenna response data directory by constructing the full path from the root data paths returned by dataPath(), which returns the data paths defined in config.py (for casa6). The specific data directory to look for varies by telescope (i.e., ALMA vs. VLA).
2. If 1 fails, it looks for the data in the path returned by distroDataPath(), which gives the default root data directory in the case of casalith tarballs.
3. If all of the above fail, it tries the path defined by CASAPATH (only relevant for casa5).
Cube Refactor¶
A refactor of cube imaging in tclean was implemented in CASA 6.2 with the following goals:
Parallel and serial runs should, within numerical precision, be equal (6.1 and previous gave slightly different results depending on the number of processors used).
Eliminate refconcat cubes, which could be a performance problem in image analysis.
Ability to restart tclean for cubes with a different number of processors (6.1 and previous had to be restarted with the exact same number of processors used in the first run of tclean).
Ability to save model visibilities in MODEL_DATA (or virtual) when running tclean in parallel.
Remove the clash of chanchunking and parallel numprocesses (e.g., multiple-field cubes and chanchunking).
As part of this extensive cube refactor, the following features were implemented in 6.2:
Parallel and serial runs now use the same code, which has fixed the differences previously found between serial runs and runs with different numbers of parallel processes. Serial and parallel runs now give identical results to within numerical precision.
Refconcat cubes are no longer produced (and the directory workingdir is no longer created for cube runs).
tclean can be restarted in parallel or serial, independent of how the first run was done.
For cubes, model visibilities can now be saved in parallel.
In parallel runs, a common beam can now be requested (e.g., restoringbeam='common'). Previously, tclean had to be re-run in serial to restore with a single restoring beam for all channels.
Tracking a moving source (with ephemeris tables) now works in parallel with specmode=’cubesource’.
The chanchunks parameter has been removed, as the refactor code made it redundant.
parallel=True or parallel=False is not considered for cube imaging. If CASA is launched via the mpicasa call, cube imaging runs in parallel; if it is invoked via the plain casa call, it runs in serial.
Interactive tclean for cubes now works when launched in parallel.
Using a psf to reset to another psfphasecenter for mosaic now works for serial and parallel.
The major and minor cycle states have been dissociated: when a major cycle is happening, the minor cycle code does not hold any state in memory, and vice versa. This is why a selectvis message is seen at every major cycle. This should reduce the amount of memory used.
Details of the cube refactor efforts can be found in this PDF.
Installing multiple Python versions on Mac OS 11¶
One solution for handling multiple Python installations across versions (e.g. Python 3.6 and 3.8) is to use pyenv.
Pyenv can be installed on Mac OS 11 in a number of ways, but perhaps the simplest is to use Homebrew:
brew install pyenv
Installation of a given version of Python via pyenv is normally a single simple command such as:
pyenv install 3.8.0
which can be repeated for each version of Python desired.
For Mac OS 11, however, there is currently a known issue preventing this simple command from working. As a workaround, the following command has worked in testing on Mac OS 11 with an Intel x86 64-bit architecture. See the GitHub issue, however, for the most up-to-date information on the bug.
CFLAGS="-I$(brew --prefix openssl)/include -I$(brew --prefix bzip2)/include -I$(brew --prefix readline)/include -I$(xcrun --show-sdk-path)/usr/include"
LDFLAGS="-L$(brew --prefix openssl)/lib -L$(brew --prefix readline)/lib -L$(brew --prefix zlib)/lib -L$(brew --prefix bzip2)/lib"
arch -x86_64 pyenv install --patch 3.8.0 < <(curl -sSL https://github.com/python/cpython/commit/8ea6353.patch\?full_index\=1)
In this example, 3.8.0 may be replaced by the desired installation version of Python.
Finally, the .zshrc file (or similar configuration file for your desired terminal) should be modified to include the following:
PYENV_ROOT=/Users/username/.pyenv
export PATH="$PYENV_ROOT/bin:$PATH"
export PATH="$PYENV_ROOT/shims:$PATH"
eval "$(pyenv init -)"
With this complete, the Python version can be switched by setting the local or global Python using pyenv, and then restarting the terminal. Modular CASA can then be installed within a virtual environment following the instructions on the CASA Installation page.
pyenv local 3.8.0
pyenv global 3.8.0
Reference Material¶
Collection of relevant reference material
AIPS-CASA Dictionary¶
CASA tasks equivalent to AIPS tasks
AIPS tasks and their corresponding CASA tasks. Note that this list is not complete, and there is also not always a one-to-one translation.
| AIPS Task | CASA task/tool | Description |
|---|---|---|
| APROPOS | taskhelp | List tasks with a short description of their purposes |
| BLCAL | blcal | Calculate a baseline-based gain calibration solution |
| BLCHN | blcal | Calculate a baseline-based bandpass calibration solution |
| BPASS | bandpass | Calibrate bandpasses |
| CALIB | gaincal | Calibrate gains (amplitudes and phases) |
| CLCAL | applycal | Apply calibration to data |
| COMB | immath | Combine images |
| CPASS | bandpass | Calibrate bandpasses by polynomial fitting |
| CVEL | cvel/mstransform | Regrid visibility spectra |
| DBCON | concat/virtualconcat | Concatenate u-v datasets |
| DEFAULT | default | Load a task with default parameters |
| FILLM | importvla | Import old-format VLA data |
| FITLD | importuvfits/importfitsidi | Import a u-v dataset which is in FITS format |
| FITLD | importfits | Import an image which is in FITS format |
| FITTP | exportuvfits | Write a u-v dataset to FITS format |
| FITTP | exportfits | Write an image to FITS format |
| FRING | fringefit | Calibrate group delays and phase rates |
| GETJY | fluxscale | Determine flux densities for other cals |
| GO | go | Run a task |
| HELP | help | Display the help page for a task (also use casa.nrao.edu/casadocs) |
| IMAGR | clean/tclean | Image and deconvolve |
| IMFIT | imfit | Fit gaussian components to an image |
| IMHEAD | vishead | View header for u-v data |
| IMHEAD | imhead | View header for an image |
| IMLIN | imcontsub | Subtract continuum in the image plane |
| IMLOD | importfits | Import a FITS image |
| IMSTAT | imstat | Measure statistics on an image |
| INP | inp | View task parameters |
| JMFIT | imfit | Fit gaussian components to an image |
| LISTR | listobs | Print basic data |
| MCAT | ls | List image data files |
| MOMNT | immoments | Compute moments from an image |
| OHGEO | imregrid | Regrid an image onto another image's geometry |
| PBCOR | impbcor/widebandpbcor | Correct an image for the primary beam |
| PCAL | polcal | Calibrate polarization |
| POSSM | plotms | Plot bandpass calibration tables |
| POSSM | plotms | Plot spectra |
| PRTAN | listobs | Print antenna locations |
| PRTAN | plotants | Plot antenna locations |
| QUACK | flagdata | Remove first integrations from scans |
| RENAME | mv | Rename an image or dataset |
| RFLAG | flagdata | Auto-flagging |
| SETJY | setjy | Set flux densities for flux cals |
| SMOTH | imsmooth | Smooth an image |
| SNPLT | plotms | Plot gain calibration tables |
| SPFLG | msview | Flag raster image of time vs. channel |
| SPLIT | split | Write out u-v files for individual sources |
| STATWT | statwt | Weight visibilities based on their noise |
| TASK | inp | Load a task with current parameters |
| TGET | tget | Load a task with parameters last used for that task |
| TVALL | imview | Display image |
| TVFLG | msview | Flag raster image of time vs. baseline |
| UCAT | ls | List u-v data files |
| UVFIX | fixvis | Compute u, v, and w coordinates |
| UVFLG | flagdata | Flag data |
| UVLIN | uvcontsub/mstransform | Subtract continuum from u-v data |
| UVLSF | uvcontsub/mstransform | Subtract continuum from u-v data |
| UVPLT | plotms | Plot u-v data |
| UVSUB | uvsub | Subtract model u-v data from corrected u-v data |
| WIPER | plotms | Plot and flag u-v data |
| ZAP | rmtables | Delete data files |
MIRIAD-CASA Dictionary¶
CASA tasks equivalent to MIRIAD tasks
The table below provides a list of common MIRIAD tasks and their equivalent CASA task or tool function names. The two packages differ in both their architecture and their calibration and imaging models, so there is often not a direct correspondence. However, this index provides a scientific user of CASA who is familiar with MIRIAD with a simple translation table to map their existing data reduction knowledge to the new package.
In particular, note that the procedure for imaging and cleaning visibility data differs slightly between CASA and MIRIAD. In MIRIAD the tasks invert, clean, and restor are used in order to attain "CLEAN" images, whereas in CASA the single task clean/tclean achieves the same steps.
| MIRIAD Task | CASA task/tool | Description |
|---|---|---|
| atlod | importatca | Import ATCA RPFITS files |
| blflag | plotms/msview | Interactive baseline-based editor/flagger |
| cgcurs | imview | Interactive image analysis |
| cgdisp | imview | Image display, overlays |
| clean | clean/tclean | Clean an image |
| delhd | imhead (mode='del'), clearcal | Delete values in dataset/remove calibration tables |
| fits | importmiriad, importfits, exportfits, importuvfits, exportuvfits | FITS uv/image filler |
| gethd | imhead (mode='get') | Return values in image header |
| gpboot | fluxscale | Set flux density scale |
| gpcal | gaincal, polcal | Polarization leakage and gain calibration |
| gpcopy | applycal | Copy calibration tables from one source to another |
| gpplt | plotms | Plot calibration solutions |
| imcomb | immath | Image combination |
| imfit | imfit | Image-plane component fitter |
| impol | immath (mode='poli' or 'pola') | Manipulate polarization images (see example) |
| imstat | imstat | Image statistics |
| imsub | imsubimage | Extract sub-image |
| invert | clean/tclean | Synthesis imaging/make dirty map |
| linmos | im tool (im.linearmosaic) | Linear mosaic combination of images |
| maths | immath | Calculations involving images |
| mfboot | fluxscale | Set flux density scale |
| mfcal | bandpass, gaincal | Bandpass and gain calibration |
| moment | immoments | Calculate image moments |
| prthd | imhead, listobs, vishead | Print header of image or uvdata |
| puthd | imhead (mode='put') | Add/edit values in image header |
| restor | clean | Restore a clean component model |
| selfcal | clean, gaincal, etc. | Self-calibration of visibility data |
| uvplt | plotms | Plot visibility data |
| uvspec | plotms | Plot visibility spectra |
| uvsplit | split | Split visibility dataset by source/frequency, etc. |
Dan Briggs’ Dissertation - Robust Weighting¶
Dan Briggs’ Dissertation on high fidelity imaging with interferometers, including robust (‘Briggs’) weighting.
Link to Dan Briggs’ Dissertation (pdf version)
Flux Calibrator Models¶
Descriptions of flux calibrator models for flux density scaling
There are two categories of flux calibrator models available to determine flux density scales: compact extragalactic sources and Solar System objects. The models for bright extragalactic sources take the form of polynomial expressions for the spectral flux densities, plus clean-component images for the spatial information. The flux density scales based on Solar System objects are commonly used to establish flux density scales for mm and sub-mm astronomy. These models consist of brightness temperature models and their ephemeris data.
Compact extragalactic sources¶
For the VLA, the default source models are customarily point sources defined by the 'Baars', 'Perley 90', 'Perley-Taylor 99', 'Perley-Butler 2010', 'Perley-Butler 2013' (time-variable), 'Perley-Butler 2017' (time-variable), or 'Scaife-Heald 2012' flux density scales ('Perley-Butler 2017' is the current default standard), or point sources of unit flux density if the flux density is unknown. 'Stevens-Reynolds 2016' currently contains only one source, 1934-638, which is primarily used as the flux calibrator for the ATCA.
The model images (CLEAN component images) are readily available in CASA for the subset of the sources listed below. The task setjy provides a listing of the available model images included in the CASA package's data directory. You can find the path to the directory containing the VLA Stokes I models by typing (inside CASA) print(os.getenv('CASAPATH').split(' ')[0] + '/data/nrao/VLA/CalModels/'). The setjy Description Page in CASA Docs also lists the models available in CASA. These models can be plotted in plotms.
Alternatively, the user can provide a model image at the appropriate frequency, in Jy/pixel units; this is typically the .model made by clean (which is a list of components per pixel, as required, whereas the restored .image is in Jy/beam). For unknown calibrators, however, the spectral flux distribution has to be specified explicitly in setjy. If you do not specify the correct path to a model (and have not provided your own), the default model of a point source of unit flux density is adopted.
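For example, the available models can be listed from within CASA (the MS name below is a placeholder; listmodels prints the candidate model images found in the CASA data directory and the current working directory):

# List the flux calibrator model images that setjy can find
setjy(vis='my_data.ms', listmodels=True)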
| 3C/Common Name | B1950 Name | J2000 Name | Alt. J2000 Name | Standards |
|---|---|---|---|---|
| – | – | – | J0133-3629 | 9 |
| 3C48 | 0134+329 | 0137+331 | J0137+3309 | 1,2,3,4,5,6,7,9 |
| FORNAX A | – | – | J0322-3712 | 9 |
| 3C123 | 0433+295 | 0437+296 | J0437+2940 | 2,9 |
| 3C138 | 0518+165 | 0521+166 | J0521+1638 | 1,3,4,5,6 |
| PICTOR A | – | – | J0519-4546 | 9 |
| 3C144 (TAURUS A/CRAB) | – | – | J0534+2200 | 9 |
| 3C147 | 0538+498 | 0542+498 | J0542+4951 | 1,3,4,5,6,7,9 |
| 3C196 | 0809+483 | 0813+482 | J0813+4813 | 1,2,7,9 |
| 3C218 (HYDRA A) | – | – | J0918-1205 | 9 |
| 3C274 (VIRGO A) | – | – | J1230+1223 | 9 |
| 3C286 | 1328+307 | 1331+305 | J1331+3030 | 1,2,3,4,5,6,7,9 |
| 3C295 | 1409+524 | 1411+522 | J1411+5212 | 1,2,3,4,5,6,7,9 |
| 3C348 (HERCULES A) | – | – | J1651+0459 | 9 |
| 3C353 | – | – | J1720-0059 | 9 |
| – | 1934-638 | – | J1939-6342 | 1,3,4,5,6,8 |
| 3C380 | 1828+487 | 1829+487 | J1829+4845 | 7,9 |
| 3C405 (CYGNUS A) | – | – | J1959+4044 | 9 |
| 3C444 | – | – | J2214-1701 | 9 |
| 3C461 (CASSIOPEIA A) | – | – | J2323+5848 | 9 |
Standards are: (1) Perley-Butler 2010, (2) Perley-Butler 2013, (3) Perley-Taylor 99, (4) Perley-Taylor 95, (5) Perley 90, (6) Baars, (7) Scaife-Heald 2012, (8) Stevens-Reynolds 2016, (9) Perley-Butler 2017
Known sources and their alternative names recognized by setjy task
ALMA also uses a few dozen compact QSOs as flux standards, monitored 2-4 times a month at bands 3, 6, and 7 (90-345 GHz). Due to their rapid variability, these data are not packaged with CASA, but they can be accessed via https://almascience.eso.org/alma-data/calibrator-catalogue
Baars
The only standard without a year in its name; the reference year is 1977. The models are second-order polynomials in log(ν), valid between 408 MHz and 15 GHz.
Reference: Baars et al. (1977) [1] with a commentary by Kellermann, K. I. (1999) [2]
Perley 90
This standard also includes 1934-638 from Reynolds (7/94) and 3C138 from Baars et al. (1977) [1] .
Details can be found at http://www.vla.nrao.edu/astro/calib/manual/baars.html.
Perley-Taylor 95
Perley and Taylor (1995.2); plus Reynolds (1934-638; 7/94) Details can be found at http://www.vla.nrao.edu/astro/calib/manual/baars.html.
Perley-Taylor 99
Perley and Taylor (1999.2); plus Reynolds (1934-638; 7/94) Details can be found at http://www.vla.nrao.edu/astro/calib/manual/baars.html.
Perley-Butler 2010
A preliminary version of Perley-Butler 2013. This version also has coefficients for sources that showed some degree of variability (see Perley & Butler (2013) [3]), but they are treated as steady sources (i.e., no time-dependent models are used).
Perley-Butler 2013
Flux scale for the constant-flux sources 3C123, 3C196, 3C286, and 3C295, as well as the variable sources 3C48, 3C138, and 3C147. The models for the variable sources are time-dependent. Reference: Perley & Butler (2013) [3].
Scaife-Heald 2012
Low-frequency (30-300 MHz) calibrators 3C48, 3C147, 3C196, 3C286, 3C295, and 3C380.
Reference: Scaife & Heald (2012) [4]
Stevens-Reynolds 2016
Low frequency (<11GHz) polynomial from Reynolds and updated high frequecy polynomial from Stevens.
Reference: Partridge et al. (2016) [5]
Perley-Butler 2017
The flux density scale of Perley-Butler 2013 extended downward to ~50 MHz. Twenty sources were drawn from Baars, Perley-Butler 2013, and Scaife-Heald 2012. Flux scale for the constant-flux sources Fornax A, 3C123, J0444-2809, Pictor A, 3C144 (Taurus A or Crab), 3C196, 3C218 (Hydra A), 3C274 (Virgo A or M87), 3C286, 3C295, 3C348 (Hercules A), 3C353, 3C380, 3C405 (Cygnus A), 3C444, and 3C461 (Cassiopeia A), as well as the variable sources 3C48, 3C138, and 3C147. The models for the variable sources are time-dependent. The frequency range valid for each source's model is listed below.
| Source | Valid frequency range in GHz |
|---|---|
| J0133-3629 | 0.2-4 |
| 3C48 | 0.05-50 |
| Fornax A | 0.2-0.5 |
| 3C123 | 0.06-50 |
| J0444-2809 | 0.2-2.0 |
| 3C138 | 0.2-50 |
| Pictor A | 0.2-4.0 |
| Taurus A | 0.05-4.0 |
| 3C147 | 0.05-50 |
| 3C196 | 0.050-50 |
| Hydra A | 0.05-12 |
| Virgo A | 0.05-3 |
| 3C286 | 0.05-50 |
| 3C295 | 0.05-50 |
| Hercules A | 0.2-12 |
| 3C353 | 0.2-4 |
| 3C380 | 0.05-4.0* |
| Cygnus A | 0.05-12 |
| 3C444 | 0.2-12 |
| Cassiopeia A | 0.2-4 |
* The corrected frequency range for 3C380 is noted here based on B. J. Butler 2018, private communication (CAS-9538). Reference: Perley & Butler (2017) [7]
Solar System objects¶
The usual approach in the mm and sub-mm regimes is to use models that are, to first order, thermal sources in the Solar System. Their apparent brightness varies in time with their distance from the Earth (and Sun), and with orientation if they are not perfect spheres with zero obliquity. However, most of them have almost constant surface properties, so once those properties are measured, their apparent brightness distributions can in principle be predicted for any time, given an ephemeris. Planets, in particular, have more complex spectra and effects such as atmospheric lines, magnetic fields, seasons, polar caps, and surface features that need to be taken into account where available and significant. In CASA, the Solar System objects supported by setjy are available through the 'Butler-JPL-Horizons 2010' and 'Butler-JPL-Horizons 2012' standards. It is recommended to use 'Butler-JPL-Horizons 2012', as it contains updated models. The 2012 models are described in ALMA Memo 594, available at https://science.nrao.edu/facilities/alma/~aboutALMA/Technology/ALMA_Memo_Series/alma594/abs594 . The models can be found by typing (in CASA) print(os.getenv('CASAPATH').split(' ')[0] + '/data/alma/SolarSystemModels').
The following objects are supported, based on models from Butler-JPL-Horizons 2012, updated where necessary as noted under each object. Please refer to ALMA Memo 594 for detailed comparisons with the models in Butler-JPL-Horizons 2010.
Venus
The model spans frequencies from ~300 MHz to 1 THz. No atmospheric lines (such as CO, H2O, and HDO) are included. Modeled based on Clancy et al. (2012) [6].
Mars
Full implementation of the model of Rudy et al. (1987) [7], tabulated as a function of time and frequency (30-1000 GHz). No atmospheric lines are included.
Jupiter
Model for 30-1020GHz (from Glenn Orton, private communication), does not include synchrotron emission.
Uranus
Model for 60-1800GHz (from Glenn Orton and Raphael Moreno, private communication), contains no rings or synchrotron.
Neptune
Model for 2-2000 GHz (from Glenn Orton and Raphael Moreno, private communication), contains no rings or synchrotron.
Io
Spline interpolation of data points from 15 to 29980 GHz (references: see ALMA Memo 594, Table 1). Use as the primary flux calibrator for ALMA observations is strongly discouraged.
Europa
Spline interpolation of data points from 15 to 29980 GHz (references: see ALMA Memo 594, Table 1). Use as the primary flux calibrator for ALMA observations is strongly discouraged.
Ganymede
Spline interpolation of data points from 5 to 29980 GHz (references: please refer to the ALMA memo 594 Table 1).
Callisto
Spline interpolation of data points from 5 to 29980 GHz (references: please refer to the ALMA memo 594 Table 1).
Titan
Model from Mark Gurwell, covering 53.3-1024.1 GHz. Contains surface and atmospheric emission. The atmosphere includes N2-N2 and N2-CH4 collision-induced absorption (CIA), and lines from the minor species CO, 13CO, C18O, HCN, H13CN, and HC15N. See, e.g., Gurwell & Muhleman (2000) [8]; Gurwell (2004) [9].
Asteroids
Some asteroids, namely Ceres, Pallas, Vesta, and Juno, are included in Butler-JPL-Horizons 2012. The models consist of a brightness temperature that is constant in frequency. For Ceres, Pallas, and Vesta, updated models based on thermophysical models (TPM; T. Mueller, private communication), tabulated in time and frequency, are available for observations taken after January 1st 2015, 0:00 UT. The setjy task automatically switches to the new models for observations taken on or after that date. A TPM is also available for Lutetia, but its use for absolute flux calibration for ALMA is not advised. Each of the tabulated models contains the flux density at 30, 80, 115, 150, 200, 230, 260, 300, 330, 360, 425, 650, 800, 950, and 1000 GHz. The time resolution is 1 hour for Ceres and 15 minutes for Lutetia, Pallas, and Vesta. Cubic interpolation is employed to obtain the flux densities at other frequencies.
Ceres
Model with a constant \(T_b\) = 185 K over frequency (Moullet et al. 2010 [10], Muller & Lagerros 2002 [11], Redman et al. 1998 [12], Altenhoff et al. 1996 [13]) if the observation time (\(t_{obs}\)) is before 2015.01.01, 0:00 UT, and TPM if \(t_{obs}\) \(\ge\) 2015.01.01, 0:00 UT.
Pallas
Model with a constant \(T_b\) = 189K (Chamberlain et al. 2009 [14], Altenhoff et al. 1994 [15]) for \(t_{obs}\) \(\lt\) 2015.01.01, 0:00 UT, and TPM for \(t_{obs}\) \(\ge\) 2015.01.01, 0:00 UT
Vesta
Model with a constant \(T_b\) = 155K (Leyrat et al. 2012 [16], Chamberlain et al. 2009 [14], Redman et al. 1998 [12], Altenhoff et al. 1994 [15]) for \(t_{obs}\) \(\lt\) 2015.01.01, 0:00 UT, and TPM for \(t_{obs}\) \(\ge\) 2015.01.01, 0:00 UT
Juno
Model with a constant \(T_b\) = 153K (Chamberlain et al. 2009 [14], Altenhoff et al. 1994 [15])
Bibliography¶
Baars, J. W. M., et al. 1977, A&A, 61, 99 ADS
Kellermann, K. I. 2009, A&A, 500, 143 ADS
Perley, R. A., & Butler, B. J. 2013, ApJS, 204, 19 ADS
Scaife, A. M., & Heald, G. H. 2012, MNRAS, 423, 30 ADS
Partridge, B., et al. 2016, ApJ, 821, 1 ADS
Clancy, R. T., et al. 2012, Icarus, 217, 779 ADS
Perley, R. A., & Butler, B. J. 2017, ApJS, 230, 7 ADS
Rudy, D. J., et al. 1987, Icarus, 71, 159 ADS
Gurwell, M. A., & Muhleman, D. O. 2000, Icarus, 145, 653 ADS
Gurwell, M. A. 2004, ApJ, 616, L7 ADS
Moullet, A., et al. 2010, A&A, 516, L10 ADS
Muller, T. G., & Lagerros, J. S. V. 2002, A&A, 381, 324 ADS
Redman, R. O., et al. 1998, AJ, 116, 1478 ADS
Altenhoff, W. J., et al. 1996, A&A, 309, 953 ADS
Chamberlain, M. A., et al. 2009, Icarus, 202, 487 ADS
Altenhoff, W. J., et al. 1994, A&A, 287, 641 ADS
Leyrat, C., et al. 2012, A&A, 539, A154 ADS
Flux Calibrator Models - Data Formats¶
Conventions and Data Formats
This section describes the conventions, formats, and locations of the data for the flux density calibrator models used in setjy. Detailed descriptions of the specific flux standards and the list of calibrators are found in Flux Calibrator Models in the Reference Material section.
Extragalactic flux calibrator source models¶
The spectral flux density models are expressed as a polynomial of the form

\(\log_{10} S(\nu) = a_0 + a_1 \log_{10}\nu + a_2 (\log_{10}\nu)^2 + a_3 (\log_{10}\nu)^3 + \cdots\)

where \(S\) is the flux density in Jy and \(\nu\) is the frequency, in MHz or GHz depending on the standard. In setjy, the point source model is constructed as a componentlist scaled by the spectral flux density model. For the standards Baars, Perley 90, Perley-Taylor 95, Perley-Taylor 99, Perley-Butler 2010, and Stevens-Reynolds 2016, the polynomial coefficients are hard-coded.
For Perley-Butler 2013 and Scaife-Heald 2012, the coefficients are stored in CASA tables called PerleyButler2013Coeffs and ScaifeHeald2012Coeffs, respectively, located in ~/nrao/VLA/standards/ in the CASA data directory (from the CASA prompt, you can find the data root path by typing casa['dirs']['data']). The separation of the data from the flux calibration code makes maintenance easy and enables a user to access the information directly. You can access these tables using the table tool (tb) and the browsetable task. The list of column headers for PerleyButler2013Coeffs is shown below:
CASA <8>: tb.colnames
--------> tb.colnames()
Out[8]:
['Epoch',
'3C48_coeffs',
'3C48_coefferrs',
'3C138_coeffs',
'3C138_coefferrs',
'3C147_coeffs',
'3C147_coefferrs',
'3C286_coeffs',
'3C286_coefferrs',
'3C123_coeffs',
'3C123_coefferrs',
'3C295_coeffs',
'3C295_coefferrs',
'3C196_coeffs',
'3C196_coefferrs']
The coefficients for each source are stored in a column as a vector, and the corresponding errors are stored in a separate column. For the time-variable sources, each row contains the coefficients for the corresponding epoch, while for the steady sources each row contains identical information. The frequency is assumed to be in GHz.
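As a hedged illustration (the table location follows the text above; the polynomial is the log10 expansion given earlier, and the epoch row chosen here is arbitrary), the coefficients can be read and evaluated directly:

# Read the 3C286 coefficients from the Perley-Butler 2013 table and evaluate
# log10(S[Jy]) = a0 + a1*log10(nu) + a2*log10(nu)^2 + ... at nu = 5 GHz
import numpy as np

caltab = casa['dirs']['data'] + '/nrao/VLA/standards/PerleyButler2013Coeffs'
tb.open(caltab)
coeffs = tb.getcell('3C286_coeffs', 0)   # coefficient vector for the first epoch
tb.close()

logS = sum(a * np.log10(5.0)**i for i, a in enumerate(coeffs))
print('S(5 GHz) ~ %.2f Jy' % 10**logS)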
The list of the column header for ScaifeHeald2012Coeffs is shown below:
CASA <11>: tb.colnames
---------> tb.colnames()
Out[11]:
['Epoch',
'3C48_coeffs',
'3C48_coefferrs',
'3C147_coeffs',
'3C147_coefferrs',
'3C196_coeffs',
'3C196_coefferrs',
'3C286_coeffs',
'3C286_coefferrs',
'3C295_coeffs',
'3C295_coefferrs',
'3C380_coeffs',
'3C380_coefferrs']
The reference frequency for Scaife-Heald 2012 is 150 MHz.
Solar System objects¶
For the Solar System objects used as flux calibrators, setjy constructs a model visibility of the calibrator from the (averaged) brightness temperature model and the ephemeris data of the source, as described in ALMA Memo 594. While the older Butler-JPL-Horizons 2010 standard hard-codes the brightness temperature models, the models for Butler-JPL-Horizons 2012 are tabulated in ASCII files (<object_name>_Tb.dat) located in the CASA data directory under ~/alma/SolarSystemModels. With the exception of Mars, the data for the brightness temperature models are stored in a simple format: 1st column - frequency in GHz; 2nd column - brightness temperature in Kelvin. The following example script shows how the model can be plotted for Titan.
import numpy as np
import pylab as pl

# Root of the CASA data repository (casa5-style global)
rootdatapath = casa['dirs']['data']
source = 'Titan'
datapath = rootdatapath + '/alma/SolarSystemModels/' + source + '_Tb.dat'

# Columns: frequency (GHz), brightness temperature (K)
data = np.genfromtxt(datapath)
data = data.transpose()
freq = data[0]
temp = data[1]

pl.plot(freq, temp)
pl.title(source + ' Tb model')
pl.xlabel('Frequency (GHz)')
pl.ylabel('Tb (K)')
Executing the script above produces a plot of the Titan brightness temperature model.
The Tb model for Mars (Mars_Tb.dat) is calculated as a function of time and frequency, with tabulations every hour at frequencies of 30, 80, 115, 150, 200, 230, 260, 300, 330, 360, 425, 650, 800, 950, and 1000 GHz. The first line of the file contains the frequencies in GHz. The data start at the second line, with the format: YYYY MM DD HH MJD, followed by Tb at each frequency, separated by spaces.
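A hedged sketch of reading this layout (column positions follow the description above; the nearest-time lookup and linear frequency interpolation are my choices, not necessarily what setjy does internally):

import numpy as np

path = casa['dirs']['data'] + '/alma/SolarSystemModels/Mars_Tb.dat'
freqs = np.genfromtxt(path, max_rows=1)     # line 1: frequencies in GHz
tab = np.genfromtxt(path, skip_header=1)    # YYYY MM DD HH MJD Tb(f0) Tb(f1) ...
mjd, tb_grid = tab[:, 4], tab[:, 5:]

def mars_tb(mjd_obs, freq_ghz):
    # Nearest tabulated hour, then linear interpolation in frequency
    row = tb_grid[np.argmin(np.abs(mjd - mjd_obs))]
    return np.interp(freq_ghz, freqs, row)

print(mars_tb(57000.0, 230.0))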
New Asteroid models
Ceres_fd_time.dat, Lutetia_fd_time.dat, Pallas_fd_time.dat, and Vesta_fd_time.dat contain thermophysical models by Th. Mueller (private communication). These time-variable models are already converted to flux densities and are tabulated for 30, 80, 115, 150, 200, 230, 260, 300, 330, 360, 425, 650, 800, 950, and 1000 GHz. The time intervals are 1 hour for Ceres and 15 minutes for Lutetia, Pallas, and Vesta, with data available from 2015 01 01 0UT to 2021 01 01 0UT. In the setjy task, these models are automatically selected for data whose observation dates fall within this time range.
Spectral Frames¶
Spectral Frames supported in CASA
Spectral Frames¶
CASA supported spectral frames:
| Frame | Description | Definition |
|---|---|---|
| REST | Rest frequency | Lab frame or source frame; cannot be converted to any other frame |
| LSRK | LSR as a kinematic (radio) definition (J2000), based on the average velocity of stars in the Solar neighborhood | 20 km/s in the direction of RA, Dec = [270, +30] deg (B1900.0) (Gordon 1975 [1]) |
| LSRD | Local Standard of Rest (J2000), dynamical, IAU definition. Solar peculiar velocity in the reference frame of a circular orbit about the Galactic Center, based on the average velocity of stars in the Solar neighborhood and the solar peculiar motion | U\(\odot\) = 9 km/s, V\(\odot\) = 12 km/s, W\(\odot\) = 7 km/s; or 16.552945 km/s towards l,b = 53.13, +25.02 deg (Delhaye 1965 [2]) |
| BARY | Solar System Barycenter (J2000) | |
| GEO | Geocentric, referenced to the Earth's center | |
| TOPO | Topocentric | Local observatory frame, fixed in observing frequency, no Doppler tracking |
| GALACTO | Galactocentric (J2000), referenced to the dynamical center of the Galaxy | 220 km/s in the direction l,b = 270, +0 deg (Kerr and Lynden-Bell 1986 [3]) |
| LGROUP | Mean motion of the Local Group galaxies with respect to its barycenter | 308 km/s towards l,b = 105, -7 |
| CMB | Cosmic Microwave Background, COBE measurement of the dipole anisotropy | 369.5 km/s towards l,b = 264.4, 48.4 (Kogut et al. 1993 [4]) |
| Undefined | | |
Doppler Types¶
CASA supported Doppler types (velocity conventions), where \(f_v\) is the observed frequency, \(f_0\) is the rest-frame frequency of a given line, and positive velocity \(V\) increases away from the observer:
| Name | Description |
|---|---|
| RADIO | \(V = c \frac{(f_0 - f_v)}{f_0}\) |
| Z | \(V = cz\), where \(z = \frac{(f_0 - f_v)}{f_v}\) |
| RATIO | \(V = c(\frac{f_v}{f_0})\) |
| BETA | \(V = c\frac{(1-(\frac{f_v}{f_0})^2)}{(1+(\frac{f_v}{f_0})^2)}\) |
| GAMMA | \(V = c\frac{(1 + (\frac{f_v}{f_0})^2)}{2\frac{f_v}{f_0}}\) |
| OPTICAL | \(V = c\frac{(f_0 - f_v)}{f_v}\) |
| TRUE | \(V = c\frac{(1-(\frac{f_v}{f_0})^2)}{(1+(\frac{f_v}{f_0})^2)}\) |
| RELATIVISTIC | \(V = c\frac{(1-(\frac{f_v}{f_0})^2)}{(1+(\frac{f_v}{f_0})^2)}\) |
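As a worked example of the RADIO and OPTICAL conventions in the table (an illustrative sketch; CASA's own conversions go through the measures framework):

# Radio and optical velocity conventions
C_KMS = 299792.458  # speed of light in km/s

def radio_velocity(f_obs, f_rest):
    # RADIO: V = c (f0 - fv) / f0
    return C_KMS * (f_rest - f_obs) / f_rest

def optical_velocity(f_obs, f_rest):
    # OPTICAL: V = c (f0 - fv) / fv
    return C_KMS * (f_rest - f_obs) / f_obs

# HI line (rest 1420.405751786 MHz) observed at 1419.0 MHz
print(radio_velocity(1419.0, 1420.405751786))    # ~296.7 km/s
print(optical_velocity(1419.0, 1420.405751786))  # ~297.0 km/s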
Bibliography¶
[1] Gordon 1975: "Methods of Experimental Physics: Volume 12: Astrophysics, Part C: Radio Observations", ed. M. L. Meeks, Academic Press 1976
[2] Delhaye, J. 1965, in "Galactic Structure", ed. A. Blaauw & M. Schmidt (University of Chicago Press)
[3] Kerr, F. J. & Lynden-Bell, D. 1986, MNRAS, 221, 1023
[4] Kogut, A., et al. 1993, ApJ, 419, 1
Time Reference Frames¶
CASA supported time reference frames:
| Acronym | Name | Description |
|---|---|---|
| ET | Ephemeris Time | The time scale used prior to 1984 as the independent variable in gravitational theories of the solar system. In 1984, ET was replaced by dynamical time (see TDB, TT). |
| GAST | Greenwich Apparent Sidereal Time | The Greenwich hour angle of the true equinox [1] of date. |
| GMST | Greenwich Mean Sidereal Time | The Greenwich hour angle of the mean equinox [1] of date, defined as the angular distance on the celestial sphere measured westward along the celestial equator from the Greenwich meridian to the hour circle that passes through a celestial object or point. GMST (in seconds at UT1=0) = 24110.54841 + 8640184.812866 T + 0.093104 \(T^2\) - 0.0000062 \(T^3\), where T is in Julian centuries from 2000 Jan 1 12h UT1: T = d / 36525, d = JD - 2451545.0 |
| GMST1 | GMST calculated specifically with reference to UT1 | |
| IAT | International Atomic Time (a.k.a. TAI, from the French Temps Atomique International) | The continuous time scale resulting from analysis by the Bureau International des Poids et Mesures of atomic time standards in many countries. The fundamental unit of TAI is the SI second [2] on the geoid [3], and the epoch is 1958 January 1. |
| LAST | Local Apparent Sidereal Time | LAST is derived from LMST by applying the equation of the equinoxes [1], i.e. the nutation of the mean pole of the Earth from mean to true position. |
| LMST | Local Mean Sidereal Time | Sidereal time is the hour angle of the vernal equinox, the ascending node of the ecliptic on the celestial equator. The daily motion of this point provides a measure of the rotation of the Earth with respect to the stars, rather than the Sun. It corresponds to the coordinate right ascension of a celestial body that is presently on the local meridian. LMST is computed from the current GMST plus the observer's east longitude, converted to a sidereal offset by the ratio 1.00273790935 of the mean solar day to the mean sidereal day: LMST = GMST + (observer's east longitude) |
| TAI | International Atomic Time | see IAT |
| TCB | Barycentric Coordinate Time | The coordinate time of the Barycentric Celestial Reference System (BCRS), which advances by SI seconds [2] within that system. TCB is related to TCG and TT by relativistic transformations that include a secular term. |
| TCG | Geocentric Coordinate Time | |
| TDB | Barycentric Dynamical Time | A time scale defined by the IAU (originally in 1976; named in 1979; revised in 2006) used in barycentric ephemerides and equations of motion. TDB is a linear function of TCB that on average tracks TT over long periods of time; differences between TDB and TT evaluated at the Earth's surface remain under 2 ms for several thousand years around the current epoch. TDB is functionally equivalent to Teph, the independent argument of the JPL planetary and lunar ephemerides DE405/LE405. |
| TDT | Terrestrial Dynamical Time | The time scale for apparent geocentric ephemerides defined by a 1979 IAU resolution. In 1991 it was replaced by TT. |
| TT | Terrestrial Time | An idealized form of International Atomic Time (TAI) with an epoch offset; in practice TT = TAI + 32.184 s. TT thus advances by SI seconds on the geoid [3]. |
| UT | Universal Time | Loosely, mean solar time on the Greenwich meridian (previously referred to as Greenwich Mean Time). In current usage, UT refers either to UT1 or to UTC. |
| UT1 | | UT1 is formally defined by a mathematical expression that relates it to sidereal time. Thus, UT1 is observationally determined by the apparent diurnal motions of celestial bodies, and is affected by irregularities in the Earth's rate of rotation. |
| UT2 | | Before 1972 the time broadcast services kept their time signals within 0.1 s [2] of UT2, which is UT1 with annual and semiannual variations in the Earth's rotation removed. The formal relation between UT1 and UT2 is UT2 = UT1 + 0.022 sin(2πt) - 0.012 cos(2πt) - 0.006 sin(4πt) + 0.007 cos(4πt), where t = 2000.0 + (MJD - 51544.03) / 365.2422 is the Besselian day fraction, and MJD is the Modified Julian Date (Julian Date - 2400000.5). |
| UTC | Coordinated Universal Time | UTC is based on IAT but is maintained within 0.9 s of UT1 by the introduction of leap seconds when necessary. |
Footnote(s)
[1] Mean equator and equinox v. true equator and equinox: the mean equator and equinox define the celestial coordinate system given by the orientation of the Earth's equatorial plane on some specified date together with the direction of the dynamical equinox on that date, neglecting nutation. Thus, the mean equator and equinox move in response only to precession. Positions in a star catalog have traditionally been referred to a catalog equator and equinox that approximate the mean equator and equinox of a standard epoch. The true equator and equinox are affected by both precession and nutation. The equation of the equinoxes is the difference (apparent sidereal time minus mean sidereal time), or equivalently the difference between the right ascensions of the true and mean equinoxes, expressed in time units.
[2] The Système International (SI) second is defined as the duration of 9,192,631,770 cycles of radiation corresponding to the transition between two hyperfine levels of the ground state of caesium-133.
[3] The geoid is an equipotential surface that coincides with mean sea level in the open ocean. On land it is the level surface that would be assumed by water in an imaginary network of frictionless channels connected to the ocean.
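A minimal sketch of converting an epoch between these frames with the measures tool (the date and observatory are arbitrary):
from casatools import measures, quanta
me = measures()
qa = quanta()
ep_utc = me.epoch('UTC', '2019/10/4/00:00:00')
ep_tai = me.measure(ep_utc, 'TAI')              # UTC -> TAI (accumulated leap seconds)
print(qa.time(ep_tai['m0'], form='ymd')[0])
me.doframe(me.observatory('VLA'))               # sidereal frames need an observer location
print(me.measure(ep_utc, 'LAST'))
me.done()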
Coordinate Frames¶
CASA supported spatial coordinate frames:
| Name | Description |
|---|---|
| J2000 | mean equator and equinox at J2000.0 (FK5) |
| JNAT | geocentric natural frame |
| JMEAN | mean equator and equinox at frame epoch |
| JTRUE | true equator and equinox at frame epoch |
| APP | apparent geocentric position |
| B1950 | mean epoch and ecliptic at B1950.0 |
| B1950_VLA | mean epoch (1979.9) and ecliptic at B1950.0 |
| BMEAN | mean equator and equinox at frame epoch |
| BTRUE | true equator and equinox at frame epoch |
| GALACTIC | Galactic coordinates |
| HADEC | topocentric hour angle and declination |
| AZEL | topocentric azimuth and elevation (N through E) |
| AZELSW | topocentric azimuth and elevation (S through W) |
| AZELNE | topocentric azimuth and elevation (N through E) |
| AZELGEO | geodetic azimuth and elevation (N through E) |
| AZELSWGEO | geodetic azimuth and elevation (S through W) |
| AZELNEGEO | geodetic azimuth and elevation (N through E) |
| ECLIPTIC | ecliptic for J2000 equator and equinox |
| MECLIPTIC | ecliptic for mean equator of date |
| TECLIPTIC | ecliptic for true equator of date |
| SUPERGAL | supergalactic coordinates |
| ITRF | coordinates w.r.t. the ITRF Earth frame |
| TOPO | apparent topocentric position |
| ICRS | International Celestial Reference System |
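A minimal sketch of converting a direction between these frames with the measures tool (the coordinates are arbitrary):
from casatools import measures
me = measures()
d_j2000 = me.direction('J2000', '19h59m28.5s', '+40d44m01.5s')
d_gal = me.measure(d_j2000, 'GALACTIC')         # J2000 -> Galactic longitude/latitude (radians)
print(d_gal['m0'], d_gal['m1'])
me.done()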
Physical Units¶
CASA recognizes the physical units listed in the following tables. The standard SI prefixes (e.g. n, u, m, k, M, G) may also be attached to these units.
Physical SI units:
| Unit | Name | Value |
|---|---|---|
| $ | (currency) | 1 _ |
| % | (percent) | 0.01 |
| %% | (permille) | 0.001 |
| A | (ampere) | 1 A |
| AE | (astronomical unit) | 149597870659 m |
| AU | (astronomical unit) | 149597870659 m |
| Bq | (becquerel) | 1 s^-1 |
| C | (coulomb) | 1 s A |
| F | (farad) | 1 m^-2 kg^-1 s^4 A^2 |
| Gy | (gray) | 1 m^2 s^-2 |
| H | (henry) | 1 m^2 kg s^-2 A^-2 |
| Hz | (hertz) | 1 s^-1 |
| J | (joule) | 1 m^2 kg s^-2 |
| Jy | (jansky) | 10^-26 kg s^-2 |
| K | (kelvin) | 1 K |
| L | (litre) | 0.001 m^3 |
| M0 | (solar mass) | 1.98891944407×10^30 kg |
| N | (newton) | 1 m kg s^-2 |
| Ohm | (ohm) | 1 m^2 kg s^-3 A^-2 |
| Pa | (pascal) | 1 m^-1 kg s^-2 |
| S | (siemens) | 1 m^-2 kg^-1 s^3 A^2 |
| S0 | (solar mass) | 1.98891944407×10^30 kg |
| Sv | (sievert) | 1 m^2 s^-2 |
| T | (tesla) | 1 kg s^-2 A^-1 |
| UA | (astronomical unit) | 149597870659 m |
| V | (volt) | 1 m^2 kg s^-3 A^-1 |
| W | (watt) | 1 m^2 kg s^-3 |
| Wb | (weber) | 1 m^2 kg s^-2 A^-1 |
| _ | (undimensioned) | 1 _ |
| a | (year) | 31557600 s |
| arcmin | (arcmin) | 0.000290888208666 rad |
| arcsec | (arcsec) | 4.8481368111×10^-6 rad |
| as | (arcsec) | 4.8481368111×10^-6 rad |
| cd | (candela) | 1 cd |
| cy | (century) | 3155760000 s |
| d | (day) | 86400 s |
| deg | (degree) | 0.0174532925199 rad |
| g | (gram) | 0.001 kg |
| h | (hour) | 3600 s |
| l | (litre) | 0.001 m^3 |
| lm | (lumen) | 1 cd sr |
| lx | (lux) | 1 m^-2 cd sr |
| m | (metre) | 1 m |
| min | (minute) | 60 s |
| mol | (mole) | 1 mol |
| pc | (parsec) | 3.08567758065×10^16 m |
| rad | (radian) | 1 rad |
| s | (second) | 1 s |
| sr | (steradian) | 1 sr |
| t | (tonne) | 1000 kg |
Custom units available in CASA:
| Unit | Name | Value |
|---|---|---|
| " | (arcsec) | 4.8481368111×10^-6 rad |
| "_2 | (square arcsec) | 2.35044305391×10^-11 sr |
| ' | (arcmin) | 0.000290888208666 rad |
| '_2 | (square arcmin) | 8.46159499408×10^-8 sr |
| : | (hour) | 3600 s |
| :: | (minute) | 60 s |
| Ah | (ampere hour) | 3600 s A |
| Angstrom | (angstrom) | 10^-10 m |
| Btu | (British thermal unit (Int)) | 1055.056 m^2 kg s^-2 |
| CM | (metric carat) | 0.0002 kg |
| Cal | (large calorie (Int)) | 4186.8 m^2 kg s^-2 |
| FU | (flux unit) | 10^-26 kg s^-2 |
| G | (gauss) | 0.0001 kg s^-2 A^-1 |
| Gal | (gal) | 0.01 m s^-2 |
| Gb | (gilbert) | 0.795774715459 A |
| Mx | (maxwell) | 10^-8 m^2 kg s^-2 A^-1 |
| Oe | (oersted) | 79.5774715459 m^-1 A |
| R | (roentgen) | 0.000258 kg^-1 s A |
| St | (stokes) | 0.0001 m^2 s^-1 |
| Torr | (torr) | 133.322368421 m^-1 kg s^-2 |
| USfl_oz | (fluid ounce (US)) | 2.95735295625×10^-5 m^3 |
| USgal | (gallon (US)) | 0.003785411784 m^3 |
| WU | (WSRT flux unit) | 5×10^-29 kg s^-2 |
| abA | (abampere) | 10 A |
| abC | (abcoulomb) | 10 s A |
| abF | (abfarad) | 10^9 m^-2 kg^-1 s^4 A^2 |
| abH | (abhenry) | 10^-9 m^2 kg s^-2 A^-2 |
| abOhm | (abohm) | 10^-9 m^2 kg s^-3 A^-2 |
| abV | (abvolt) | 10^-8 m^2 kg s^-3 A^-1 |
| ac | (acre) | 4046.8564224 m^2 |
| arcmin_2 | (square arcmin) | 8.46159499408×10^-8 sr |
| arcsec_2 | (square arcsec) | 2.35044305391×10^-11 sr |
| ata | (technical atmosphere) | 98066.5 m^-1 kg s^-2 |
| atm | (standard atmosphere) | 101325 m^-1 kg s^-2 |
| bar | (bar) | 100000 m^-1 kg s^-2 |
| beam | (undefined beam area) | 1 _ |
| cal | (calorie (Int)) | 4.1868 m^2 kg s^-2 |
| count | (count) | 1 _ |
| cwt | (hundredweight) | 50.80234544 kg |
| deg_2 | (square degree) | 0.000304617419787 sr |
| dyn | (dyne) | 10^-5 m kg s^-2 |
| eV | (electron volt) | 1.60217733×10^-19 m^2 kg s^-2 |
| erg | (erg) | 10^-7 m^2 kg s^-2 |
| fl_oz | (fluid ounce (Imp)) | 2.84130488996×10^-5 m^3 |
| ft | (foot) | 0.3048 m |
| fu | (flux unit) | 10^-26 kg s^-2 |
| fur | (furlong) | 201.168 m |
| gal | (gallon (Imp)) | 0.00454608782394 m^3 |
| ha | (hectare) | 10000 m^2 |
| hp | (horsepower) | 745.7 m^2 kg s^-3 |
| in | (inch) | 0.0254 m |
| kn | (knot (Imp)) | 0.514773333333 m s^-1 |
| lambda | (lambda) | 1 _ |
| lb | (pound (avoirdupois)) | 0.45359237 kg |
| ly | (light year) | 9.46073047×10^15 m |
| mHg | (metre of mercury) | 133322.387415 m^-1 kg s^-2 |
| mile | (mile) | 1609.344 m |
| n_mile | (nautical mile (Imp)) | 1853.184 m |
| oz | (ounce (avoirdupois)) | 0.028349523125 kg |
| pixel | (pixel) | 1 _ |
| sb | (stilb) | 10000 m^-2 cd |
| sq_arcmin | (square arcmin) | 8.46159499408×10^-8 sr |
| sq_arcsec | (square arcsec) | 2.35044305391×10^-11 sr |
| sq_deg | (square degree) | 0.000304617419787 sr |
| statA | (statampere) | 3.33564095198×10^-10 A |
| statC | (statcoulomb) | 3.33564095198×10^-10 s A |
| statF | (statfarad) | 1.11188031733×10^-12 m^-2 kg^-1 s^4 A^2 |
| statH | (stathenry) | 899377374000 m^2 kg s^-2 A^-2 |
| statOhm | (statohm) | 899377374000 m^2 kg s^-3 A^-2 |
| statV | (statvolt) | 299.792458 m^2 kg s^-3 A^-1 |
| u | (atomic mass unit) | 1.661×10^-27 kg |
| yd | (yard) | 0.9144 m |
| yr | (year) | 31557600 s |
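These unit strings (with optional SI prefixes) are what the quanta tool parses; a minimal sketch, assuming the casacore unit syntax with '.' for multiplication:
from casatools import quanta
qa = quanta()
print(qa.convert(qa.quantity('5Jy'), 'W/(m2.Hz)'))     # 1 Jy = 1e-26 W m^-2 Hz^-1
print(qa.convert(qa.quantity('1arcmin'), 'rad'))
print(qa.quantity('3km/s'))                            # prefixed unit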
Physical Constants¶
Physical constants recognized by CASA
The physical constants listed below are defined in CASA. They can be listed with the quanta tool:
from casatools import quanta
qa = quanta()
print(qa.map('Constants'))
For information about the constants as stored in the code, please see the Casacore documentation.
| Constant | Name | Value |
|---|---|---|
| pi | 3.14… | 3.14159 |
| ee | 2.71… | 2.71828 |
| c | speed of light | 2.99792×10^8 m s^-1 |
| G | grav. constant | 6.67259×10^-11 N m^2 kg^-2 |
| h | Planck constant | 6.62608×10^-34 J s |
| HI | HI line frequency | 1420.41 MHz |
| R | gas constant | 8.31451 J K^-1 mol^-1 |
| NA | Avogadro # | 6.02214×10^23 mol^-1 |
| e | electron charge | 1.60218×10^-19 C |
| mp | proton mass | 1.67262×10^-27 kg |
| mp_me | mp/me | 1836.15 |
| mu0 | permeability vac. | 1.25664×10^-6 H m^-1 |
| eps0 | permittivity vac. | 8.85418782×10^-12 F m^-1 |
| k | Boltzmann constant | 1.38066×10^-23 J K^-1 |
| F | Faraday constant | 96485.3 C mol^-1 |
| me | electron mass | 9.10939×10^-31 kg |
| re | electron radius | 2.8179×10^-15 m |
| a0 | Bohr radius | 5.2918×10^-11 m |
| R0 | solar radius | 6.9599×10^8 m |
| k2 | IAU grav. const^2 | 0.000295912 AU^3 d^-2 S0^-1 |
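Individual constants can also be retrieved as quantities; a minimal sketch using the quanta tool's constants method:
from casatools import quanta
qa = quanta()
print(qa.constants('c'))    # speed of light
print(qa.constants('k'))    # Boltzmann constant
print(qa.constants('HI'))   # HI line frequency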
Resources and Links¶
Useful supplementary information relevant to CASA
Tutorials¶
CASAguides contains the most comprehensive data reduction tutorials for ALMA (including simulations) and the VLA. Tutorials also exist for other telescopes, such as ATCA, SMA, and CARMA. The CASAguides wiki is also a great resource for tips and tricks on using CASA and contains user-contributed scripts.
Workshops¶
The ALMA community days webpages provide material that was used during workshops.
The 2018 VLA data reduction workshop page contains course material.
Synthesis Imaging in Radio Astronomy II (PDF book, chapters).
Python¶
Dependencies/Libraries¶
Community Examples¶
A collection of community-provided scripts covering various CASA tutorials, examples, demonstrations, tips, tricks, and general best practices.
Open in Colab: https://colab.research.google.com/github/casangi/examples/blob/master/community/casa6_demo.ipynb
Modular CASA Demo¶
Original Author: rraba@nrao.edu
Description¶
This notebook shows how to install the modular CASA 6 packages with some basic orientation:
locate the casadata folder
list the available tasks
find and print the log file
run a simple tclean command
view the output images with Astropy
view the output images with CARTA
Installation¶
First, the system must be configured with the appropriate prerequisite libraries to create a virtual display, which is necessary for the plots/images generated later.
[ ]:
# prerequisite system setup
import os
os.system('apt-get install xvfb')
os.system('pip install pyvirtualdisplay')
from pyvirtualdisplay import Display
display = Display(visible=0,size=(1024,768))
display.start( )
print('completed system setup')
completed system setup
Then we can choose from the available CASA packages to install: casatools, casatasks, casaplotms, casaviewer, almatasks, casampi, casashell, casadata, casaplotserver
The pip installer generally handles dependencies automatically (for example, casatasks needs casatools); however, casadata is the exception that must be explicitly installed and updated by the user.
[ ]:
import os
print("installing casa packages...\n")
os.system("pip install casatasks==6.3.0.48")
os.system("pip install casaviewer==1.2.14")
os.system("pip install casadata")
print("downloading MeasurementSet from CASAguide First Look at Imaging...\n")
os.system("wget https://bulk.cv.nrao.edu/almadata/public/working/sis14_twhya_calibrated_flagged.ms.tar")
os.system("tar -xvf sis14_twhya_calibrated_flagged.ms.tar")
print("make a config file for Google Colab...\n")
!mkdir ~/.casa
!echo "home = '/content/'" > ~/.casa/config.py
!echo "datapath = ['`find / -type d -name casadata`']" >> ~/.casa/config.py
!more ~/.casa/config.py
installing casa packages...
downloading MeasurementSet from CASAguide First Look at Imaging...
make a config file for Google Colab...
home = '/content/'
datapath = ['/usr/local/lib/python3.7/dist-packages/casadata']
Getting Started¶
We can inspect the contents of a package or, better yet, read its corresponding API section in CASAdocs.
[ ]:
import casatasks
casatasks.__all__
['casalog',
'version',
'version_string',
'imhead',
'immoments',
'imhistory',
'applycal',
'bandpass',
'blcal',
'calstat',
'concat',
'split',
'listobs',
'flagdata',
'flagcmd',
'setjy',
'cvel',
'cvel2',
'importuvfits',
'importfits',
'exportfits',
'exportuvfits',
'partition',
'listpartition',
'flagmanager',
'mstransform',
'tclean',
'immath',
'vishead',
'uvsub',
'spxfit',
'splattotable',
'specsmooth',
'specflux',
'smoothcal',
'specfit',
'imstat',
'slsearch',
'delmod',
'imsubimage',
'accor',
'asdmsummary',
'clearcal',
'conjugatevis',
'exportasdm',
'importasdm',
'clearstat',
'fixplanets',
'fixvis',
'phaseshift',
'fluxscale',
'ft',
'gaincal',
'gencal',
'uvcontsub3',
'testconcat',
'apparentsens',
'hanningsmooth',
'imcollapse',
'imcontsub',
'imdev',
'imfit',
'impbcor',
'importasap',
'importatca',
'importfitsidi',
'importgmrt',
'importnro',
'importvla',
'impv',
'imrebin',
'imreframe',
'imregrid',
'imsmooth',
'imtrans',
'imval',
'initweights',
'listcal',
'listfits',
'listhistory',
'listsdm',
'listvis',
'makemask',
'polcal',
'polfromgain',
'predictcomp',
'rerefant',
'rmfit',
'rmtables',
'sdatmcor',
'sdbaseline',
'sdcal',
'sdfit',
'sdfixscan',
'sdgaincal',
'sdimaging',
'sdsmooth',
'tsdimaging',
'nrobeamaverage',
'sdtimeaverage',
'simanalyze',
'simobserve',
'feather',
'simalma',
'statwt',
'virtualconcat',
'uvcontsub',
'uvmodelfit',
'visstat',
'widebandpbcor',
'importmiriad',
'plotweather',
'plotants',
'fringefit',
'plotbandpass',
'sdintimaging',
'sdpolaverage',
'sdsidebandsplit',
'plotprofilemap']
We execute tasks just like normal Python functions. Often they write information to the log or to a specified output file, which we then display.
[ ]:
from casatasks import listobs
rc = listobs(vis='sis14_twhya_calibrated_flagged.ms', listfile='obslist.txt', verbose=False, overwrite=True)
!cat obslist.txt
================================================================================
MeasurementSet Name: /content/sis14_twhya_calibrated_flagged.ms MS Version 2
================================================================================
Observer: cqi Project: uid://A002/X327408/X6f
Observation: ALMA(26 antennas)
Data records: 80563 Total elapsed time = 5647.68 seconds
Observed from 19-Nov-2012/07:36:57.0 to 19-Nov-2012/09:11:04.7 (UTC)
Fields: 5
ID Code Name RA Decl Epoch SrcId nRows
0 none J0522-364 05:22:57.984648 -36.27.30.85128 J2000 0 4200
2 none Ceres 06:10:15.950590 +23.22.06.90668 J2000 2 3800
3 none J1037-295 10:37:16.079736 -29.34.02.81316 J2000 3 16000
5 none TW Hya 11:01:51.796000 -34.42.17.36600 J2000 4 53161
6 none 3c279 12:56:11.166576 -05.47.21.52464 J2000 5 3402
Spectral Windows: (1 unique spectral windows and 1 unique polarization setups)
SpwID Name #Chans Frame Ch0(MHz) ChanWid(kHz) TotBW(kHz) CtrFreq(MHz) BBC Num Corrs
0 ALMA_RB_07#BB_2#SW-01#FULL_RES 384 TOPO 372533.086 610.352 234375.0 372649.9688 2 XX YY
Antennas: 21 'name'='station'
ID= 1-4: 'DA42'='A050', 'DA44'='A068', 'DA45'='A070', 'DA46'='A067',
ID= 5-9: 'DA48'='A046', 'DA49'='A029', 'DA50'='A045', 'DV02'='A077',
ID= 10-15: 'DV05'='A082', 'DV06'='A037', 'DV08'='A021', 'DV10'='A071',
ID= 16-19: 'DV13'='A072', 'DV15'='A074', 'DV16'='A069', 'DV17'='A138',
ID= 20-24: 'DV18'='A053', 'DV19'='A008', 'DV20'='A020', 'DV22'='A011',
ID= 25-25: 'DV23'='A007'
As another example, let's do channel averaging with mstransform. Here we need to make sure we've deleted the previous output file when running multiple times. Since this task doesn't return anything, we can look at the end of the log file to see what happened.
[ ]:
from casatasks import mstransform
os.system("rm -fr chanavg.ms")
mstransform(vis='sis14_twhya_calibrated_flagged.ms', outputvis='chanavg.ms',
datacolumn='DATA', chanaverage=True, chanbin=3)
!tail casa-202*.log
2021-10-14 17:43:24 INFO MSTransformManager::parseMsSpecParams Tile shape is [0]
2021-10-14 17:43:24 INFO MSTransformManager::parseChanAvgParams Channel average is activated
2021-10-14 17:43:24 INFO MSTransformManager::parseChanAvgParams Channel bin is [3]
2021-10-14 17:43:24 INFO MSTransformManager::colCheckInfo Adding DATA column to output MS from input DATA column
2021-10-14 17:43:24 INFO MSTransformManager::open Select data
2021-10-14 17:43:24 INFO MSTransformManager::createOutputMSStructure Create output MS structure
2021-10-14 17:43:24 INFO ParallelDataHelper::::casa Apply the transformations
2021-10-14 17:43:29 INFO mstransform::::casa Task mstransform complete. Start time: 2021-10-14 17:43:23.610120 End time: 2021-10-14 17:43:29.323998
2021-10-14 17:43:29 INFO mstransform::::casa ##### End Task: mstransform #####
2021-10-14 17:43:29 INFO mstransform::::casa ##########################################
Running tclean¶
Tclean works in non-interactive mode only (interactive=False).
[ ]:
from casatasks import tclean
print("running tclean, may take a bit...")
tclean(vis='sis14_twhya_calibrated_flagged.ms', imagename='first_image',
field='5', spw='', specmode='mfs', deconvolver='hogbom', nterms=1,
gridder='standard', imsize=[250,250], cell=['0.1arcsec'],
weighting='natural', threshold='0mJy', niter=5000,
interactive=False, savemodel='modelcolumn')
print("complete")
running tclean, may take a bit...
complete
View Images with Viewer¶
We can use the casaviewer package to view images, but we need to start the viewer manually as a separate process.
[ ]:
import subprocess as sp
sp.Popen('/usr/local/lib/python3.7/dist-packages/casaviewer/__bin__/casaviewer-x86_64.AppImage',
shell=True, preexec_fn=os.setsid, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.STDOUT)
<subprocess.Popen at 0x7fb797158190>
Now call imview and render the image to an output file, where it can then be displayed.
[ ]:
from casaviewer import imview
from IPython.display import Image
imview('first_image.image', out='test.png')
Image(filename="test.png")
(0) waiting for viewer process...
(1) waiting for viewer process...
(2) waiting for viewer process...
(3) waiting for viewer process...
(4) waiting for viewer process...
...{'id': 'casaviewer:b1fc', 'priority': 0, 'types': array(['shutdown', 'image-view', 'interactive-clean'], dtype='<U18'), 'uri': '0.0.0.0:44403'}
View Images with Astropy¶
We can use the image tool from casatools to load raw image data, then feed it to another Python package like Astropy and display it using Matplotlib.
Astropy is already installed in Google Colaboratory, but if running this on some other JupyterHub system, you'll probably need to pip install astropy. Also note that we didn't explicitly install casatools either; it was automatically installed as a dependency of casatasks.
[ ]:
from casatools import image as IA
from astropy.wcs import WCS
import matplotlib.pyplot as plt
import numpy as np
ia = IA()
ia.open('first_image.image')
pix = ia.getchunk()[:,:,0,0]
csys = ia.coordsys()
ia.close()
rad_to_deg = 180/np.pi
w = WCS(naxis=2)
w.wcs.crpix = csys.referencepixel()['numeric'][0:2]
w.wcs.cdelt = csys.increment()['numeric'][0:2]*rad_to_deg
w.wcs.crval = csys.referencevalue()['numeric'][0:2]*rad_to_deg
w.wcs.ctype = ['RA---SIN', 'DEC--SIN']
plt.subplots(1,1, figsize=(10,7))
ax = plt.subplot(1, 1, 1, projection=w)
p1 = int(pix.shape[0]*0.25)
p2 = int(pix.shape[0]*0.75)
im = ax.imshow(pix[p1:p2,p1:p2].transpose(), origin='lower', cmap=plt.cm.viridis)
plt.colorbar(im, ax=ax)
ax.set_xlabel('Right Ascension')
ax.set_ylabel('Declination')
Open in Colab: https://colab.research.google.com/github/casangi/examples/blob/master/community/jupyter_and_x11.ipynb
PlotMS in a Notebook¶
Original Author: drs@nrao.edu
Description¶
X11 is a client/server windowing system. The client communicates with the server using the X Window System Core Protocol. This architecture allows the X11 server to accept drawing commands from a remote client and render them on the hardware which the server controls, e.g. monitor, keyboard, and mouse. The X11 server renders the drawing commands onto a frame buffer. This buffer allows for caching drawing results. At the appropriate time, the buffer contents are flushed to video RAM.
This architecture allowed for the implementation of a virtual frame buffer and an accompanying virtual server, which lets X11 applications run without actual display hardware. This is currently the only way to use X11 (including Qt) applications in the context of a Jupyter notebook.
If someone implemented an X11 server designed for use with Jupyter notebooks, displaying X11 applications within the notebook itself might become possible. Such a server would behave like XQuartz (the X11 server implementation for macOS, which renders X11 widgets using the native macOS GUI environment).
This notebook shows how virtual frame buffers can be used to run X11 applications within a Jupyter notebook. The application used is casaplotms, a Qt application.
The first step is to perform some configuration:
[1]:
# Installation
import os
print("performing setup (tasks a few minutes)...")
os.system('apt-get install xvfb')
os.system('pip install casaplotms')
os.system('pip install casadata')
os.system('pip install pyvirtualdisplay')
print('complete')
performing setup (takes a few minutes)...
complete
Setup Virtual Frame Buffer¶
The pyvirtualdisplay package can be used to configure and launch a virtual frame buffer server. However, the way new processes are directed to the new frame buffer server is via a unix shell environment variable, DISPLAY. This means that this frame buffer server will be used by all X11 processes. This makes it difficult to encapsulate this pattern.
[2]:
from pyvirtualdisplay import Display
display = Display(visible=0,size=(2048,2048))
display.start( )
[2]:
<pyvirtualdisplay.display.Display at 0x7f0d0f7c2810>
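Since pyvirtualdisplay works by exporting the DISPLAY environment variable, we can verify where subsequent X11 clients will connect (the exact display number varies):
import os
print(os.environ.get('DISPLAY'))   # e.g. ':1001', set by display.start() above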
Get Data¶
Somehow data must be made available…
[3]:
print("download some data...")
os.system("wget https://bulk.cv.nrao.edu/almadata/public/working/sis14_twhya_calibrated_flagged.ms.tar")
os.system("tar -xvf sis14_twhya_calibrated_flagged.ms.tar")
print('complete')
download some data...
complete
Plot Data¶
Start X11 application and plot data…
[4]:
from casaplotms import plotms
print('making the plot...')
plotms(vis='sis14_twhya_calibrated_flagged.ms',plotfile='sis14_twhya_calibrated_flagged.jpg',showgui=False,highres=True,width=600,height=350,overwrite=True)
print('complete')
making the plot...
complete
Display the Exported Raster Image¶
[5]:
from IPython.display import Image
Image(filename="sis14_twhya_calibrated_flagged.jpg")
[5]:
Open in Colab: https://colab.research.google.com/github/casangi/examples/blob/master/community/phaseshift.ipynb
Numerical Accuracy of Task phaseshift¶
Original author: dmehring@nrao.edu
Description¶
This notebook demonstrates the determination of the numerical accuracy of task phaseshift. Much of the simulation-related code is adapted from that of rurvashi: https://gitlab.nrao.edu/rurvashi/simulation-in-casa-6/-/blob/master/Simulation_Script_Demo.ipynb
[1]:
import os
print("installing pre-requisite packages...")
os.system("pip install scipy matplotlib")
print("installing casa...")
# make sure there is a version that contains phaseshift
os.system("pip install casatasks==6.3.0.48")
os.system("pip install casadata")
print("complete")
installing pre-requisite packages...
installing casa...
complete
[2]:
# imports and set globals
from casatools import componentlist, ctsys, measures, quanta, simulator
from casatasks import flagdata, imfit, imstat, phaseshift, tclean
from casatasks.private import simutil
import glob
import os
import shutil
cl = componentlist()
me = measures()
qa = quanta()
sm = simulator()
comp_list = 'sim_onepoint.cl'
orig_ms = 'sim_data.ms'
orig_im = 'im_orig'
pshift_ms = 'sim_data_pshift.ms'
pshift_im = 'im_post_phaseshift'
tshift_im = 'im_post_tclean_shift'
# simulated source flux density
exp_flux = 5
[3]:
def cleanup():
"""clean up from any previous runs"""
for f in (comp_list, orig_ms, pshift_ms):
if os.path.exists(f):
shutil.rmtree(f)
for x in (orig_ms, orig_im, pshift_im, tshift_im):
for path in glob.glob(x + '.*'):
shutil.rmtree(path)
[4]:
def __direction_string(ra, dec, frame):
"""helper function for often needed string"""
return ' '.join([frame, ra, dec])
[5]:
def __makeMSFrame(antenna_file, spwname, freq, radir, decdir, dirframe):
"""
Construct an empty Measurement Set that has the desired
observation setup.
"""
# Open the simulator
sm.open(ms=orig_ms)
# Read/create an antenna configuration.
# Canned antenna config text files are located at
# /home/casa/data/trunk/alma/simmos/*cfg
# antennalist = os.path.join(ctsys.resolve("alma/simmos"), "vla.d.cfg")
# Fictitious telescopes can be simulated by specifying x, y, z, d,
# an, telname, antpos.
# x,y,z are locations in meters in ITRF (Earth centered)
# coordinates.
# d, an are lists of antenna diameter and name.
# telname and obspos are the name and coordinates of the
# observatory.
(x, y, z, d, an, an2, telname, obspos) = (
simutil.simutil().readantenna(antenna_file)
)
# Set the antenna configuration
sm.setconfig(
telescopename=telname, x=x, y=y, z=z, dishdiameter=d,
mount=['alt-az'], antname=an, coordsystem='global',
referencelocation=me.observatory(telname))
# Set the polarization mode (this goes to the FEED subtable)
sm.setfeed(mode='perfect R L', pol=[''])
# Set the spectral window and polarization (one
# data-description-id).
# Call multiple times with different names for multiple SPWs or
# pol setups.
sm.setspwindow(
spwname=spwname, freq=freq, deltafreq='0.1GHz',
freqresolution='0.2GHz', nchannels=1, stokes='RR LL'
)
# Setup source/field information (i.e. where the observation phase
# center is) Call multiple times for different pointings or source
# locations.
sm.setfield(
sourcename="fake", sourcedirection=me.direction(
rf=dirframe, v0=radir, v1=decdir
)
)
# Set shadow/elevation limits (if you care). These set flags.
sm.setlimits(shadowlimit=0.01, elevationlimit='1deg')
# Leave autocorrelations out of the MS.
sm.setauto(autocorrwt=0.0)
# Set the integration time, and the convention to use for timerange
# specification
# Note : It is convenient to pick the hourangle mode as all times
# specified in sm.observe() will be relative to when the source
# transits.
sm.settimes(
integrationtime='60s', usehourangle=True,
referencetime=me.epoch('UTC', '2019/10/4/00:00:00')
)
# Construct MS metadata and UVW values for one scan and ddid
# Call multiple times for multiple scans.
# Call this with different sourcenames (fields) and spw/pol
# settings as defined above.
# Timesteps will be defined in intervals of 'integrationtime',
# between starttime and stoptime.
sm.observe(
sourcename="fake", spwname=spwname, starttime='-5.0h',
stoptime='+5.0h'
)
# Close the simulator
sm.close()
# Unflag everything (unless you care about elevation/shadow flags)
flagdata(vis=orig_ms, mode='unflag')
[6]:
def __makeCompList(ra, dec, frame):
"""make a componentlist of point sources"""
# Add sources, one at a time.
# Call multiple times to add multiple sources.
# ( Change the 'dir', obviously )
cl.addcomponent(
dir=__direction_string(ra, dec, frame),
flux=exp_flux,
fluxunit='Jy', freq='1.5GHz', shape='point',
spectrumtype="constant"
)
# Save the file
cl.rename(filename=comp_list)
cl.done()
[7]:
def __predictSimFromComplist():
sm.openfromms(orig_ms)
# Predict from a component list
sm.predict(complist=comp_list, incremental=False)
sm.close()
[8]:
def __summarize(imagename, imfit_box, sim_source_dir, label, prec):
# get the image statistics, and print the world coordinates of the maxposf,
# and note that they are within one 8" pixel of the simulated source position,
# as expected.
x = imstat(imagename)
# to be even more accurate, fit a 2-D gaussian to the source to show that, to
# approximately within the fit errors, the position is coincident to the
# expected position
y = imfit(imagename, box=imfit_box)
poserr = y['deconvolved']['component0']['shape']['direction']['error']
print(label)
print('Simulated source position', sim_source_dir)
print("coordinates of max position from imstat", x['maxposf'])
cl.fromrecord(y['deconvolved'])
rd = cl.getrefdir(0)
cl.done()
ra_err = qa.div(
qa.div(qa.quantity(poserr['longitude']), 15),
qa.cos(qa.quantity(rd['m1']))
)
ra_err['unit'] = 's'
dec_err = qa.quantity(poserr['latitude'])
print(
"fitted position from imfit",
qa.time(qa.totime(qa.quantity(rd['m0'])), prec=6+prec, form='hms')[0], '\u00b1',
qa.tos(ra_err, prec=prec),
qa.angle(qa.totime(qa.quantity(rd['m1'])), prec=6+prec)[0], '\u00b1',
qa.tos(dec_err, prec=prec),
)
[9]:
def verify():
def __create_input_ms():
"""create the input MS"""
__makeMSFrame(antenna_file, spwname, freq, fra, fdec, fframe)
# Make the component list
__makeCompList(radir, decdir, dirframe)
# Predict Visibilities
__predictSimFromComplist()
for observatory in ('VLA', 'ALMA'):
cleanup()
print(observatory, 'simulation:')
# This is the source position
if observatory == 'VLA':
radir = '19h49m43'
decdir = '38d45m15'
dirframe = 'J2000'
ant_cfg = "vla.d.cfg"
spwname = 'LBand'
freq = '1.0GHz'
cell = '8.0arcsec'
imfit_box = '1870, 165, 1890, 185'
prec = 3
elif observatory == 'ALMA':
radir = '19h59m33.2'
decdir = '40d40m53.2'
dirframe = 'J2000'
ant_cfg = 'alma.cycle8.8.cfg'
spwname = 'Band4'
freq = '150GHz'
cell = '0.06arcsec'
imfit_box = '123, 1876, 143, 1896'
prec = 5
antenna_file = os.path.join(ctsys.resolve("alma/simmos"), ant_cfg)
dirstring = __direction_string(radir, decdir, dirframe)
# this is the original phase center
fra = '19h59m28.5'
fdec = '+40.40.01.5'
fframe = 'J2000'
print("Create the simulated MS.\n")
__create_input_ms()
print(
"Image simulated MS with no phase shift. The source is offset from\n"
"the phase center of the image. We use wproject and wprojplanes\n"
"to correctly account for the non-negligible values of the w\n"
"coordinate because the source is far from the phase center.\n"
)
tclean(
vis=orig_ms, imagename=orig_im, datacolumn='data',
imsize=2048, cell=cell, gridder='wproject',
niter=20, gain=0.2, wprojplanes=128, pblimit=-0.1
)
print("Summarize results:\n")
__summarize(
orig_im + '.image', imfit_box, dirstring, 'Image with no shift applied', 5
)
print(
"\nNow use phaseshift to shift the phase center of the\n"
"MS to the source position.\n"
)
phaseshift(vis=orig_ms, outputvis=pshift_ms, phasecenter=dirstring)
print(
"Image the phase shifted MS. The image can be significantly\n"
"smaller because the source will now be at the image center.\n"
)
tclean(
vis=pshift_ms, imagename=pshift_im, datacolumn='data',
imsize=256, cell=cell, gridder='wproject',
niter=20, gain=0.2, wprojplanes=128, pblimit=-0.1
)
print("Summarize the results:\n")
__summarize(
pshift_im + '.image', '118, 118, 138, 138', dirstring,
'Phase shifted image using phaseshift to set the phase center', prec + 3
)
print("\nNow image the original, unshifted MS using a phase shift in tclean\n")
tclean(
vis=orig_ms, imagename=tshift_im, datacolumn='data',
imsize=256, cell=cell, gridder='wproject',
niter=20, gain=0.2, wprojplanes=128, pblimit=-0.1,
phasecenter=__direction_string(radir, decdir, dirframe)
)
print("Summarize the results\n")
__summarize(
tshift_im + '.image', '118, 118, 138, 138', dirstring,
'Phase shifted image using tclean to set phase center', prec + 3
)
print()
[10]:
if __name__ == '__main__':
verify()
VLA simulation:
Create the simulated MS.
Image simulated MS with no phase shift. The source is offset from
the phase center of the image. We use wproject and wprojplanes
to correctly account for the non-negligible values of the w
coordinate because the source is far from the phase center.
Summarize results:
Image with no shift applied
Simulated source position J2000 19h49m43 38d45m15
coordinates of max position from imstat 19:49:42.907, +38.45.13.185, I, 999980842.28Hz
fitted position from imfit 19:49:42.99118 ± 0.01171s +038.45.15.21608 ± 0.10346arcsec
Now use phaseshift to shift the phase center of the
MS to the source position.
Image the phase shifted MS. The image can be significantly
smaller because the source will now be at the image center.
Summarize the results:
Phase shifted image using phaseshift to set the phase center
Simulated source position J2000 19h49m43 38d45m15
coordinates of max position from imstat 19:49:43.000, +38.45.15.000, I, 999983449.88Hz
fitted position from imfit 19:49:43.000008 ± 0.001796s +038.45.14.999908 ± 0.016103arcsec
Now image the original, unshifted MS using a phase shift in tclean
Summarize the results
Phase shifted image using tclean to set phase center
Simulated source position J2000 19h49m43 38d45m15
coordinates of max position from imstat 19:49:43.000, +38.45.15.000, I, 999980842.28Hz
fitted position from imfit 19:49:42.989652 ± 0.001851s +038.45.14.869487 ± 0.016600arcsec
ALMA simulation:
Create the simulated MS.
Image simulated MS with no phase shift. The source is offset from
the phase center of the image. We use wproject and wprojplanes
to correctly account for the non-negligible values of the w
coordinate because the source is far from the phase center.
Summarize results:
Image with no shift applied
Simulated source position J2000 19h59m33.2 40d40m53.2
coordinates of max position from imstat 19:59:33.200, +40.40.53.214, I, 1.49997e+11Hz
fitted position from imfit 19:59:33.20002 ± 0.00002s +040.40.53.20107 ± 0.00038arcsec
Now use phaseshift to shift the phase center of the
MS to the source position.
Image the phase shifted MS. The image can be significantly
smaller because the source will now be at the image center.
Summarize the results:
Phase shifted image using phaseshift to set the phase center
Simulated source position J2000 19h59m33.2 40d40m53.2
coordinates of max position from imstat 19:59:33.200, +40.40.53.200, I, 1.49997e+11Hz
fitted position from imfit 19:59:33.20000000 ± 0.00000483s +040.40.53.19999711 ± 0.00007946arcsec
Now image the original, unshifted MS using a phase shift in tclean
Summarize the results
Phase shifted image using tclean to set phase center
Simulated source position J2000 19h59m33.2 40d40m53.2
coordinates of max position from imstat 19:59:33.200, +40.40.53.200, I, 1.49997e+11Hz
fitted position from imfit 19:59:33.20007211 ± 0.00000624s +040.40.53.20079618 ± 0.00010252arcsec
Overview¶
The script performs two simulations, the first based on the VLA and the second based on ALMA. For each, an MS is created using the simulator tool. To make the verification process more transparent, no noise is introduced into the simulated data. The visibilities are then predicted using a 5 Jy point source that has a significant offset (so that the w coordinate is non-negligible) from the phase center. The MS is imaged by tclean using no phase center shift, to illustrate that the source is indeed significantly offset from the phase center. Because proper account of the w coordinate must be made, gridder='wproject' and wprojplanes are set in tclean. The peak pixel coordinates are computed via the task imstat, and a two-dimensional Gaussian fit using the task imfit is performed to further constrain the source position and illustrate that it is indeed located at the coordinates specified when creating the MS. The phaseshift task is then run on the original MS, shifting the phase center to the position of the source, and the resulting MS is imaged using tclean. Both imstat and imfit are run as before to verify the source coordinates. Finally, tclean is run on the original MS with the phasecenter parameter set to the source position, to illustrate that tclean can also properly shift the phase center. The results, described in the next sections, indicate that both phaseshift and tclean shift the phase center to within a small fraction of a pixel, and that phaseshift is slightly more accurate, although both are very close to the expected result.
Results¶
The output of the script, when run on an RHEL 7 machine, is reproduced below.
VLA simulation¶
Image with no shift applied:
Simulated source position: J2000 19h49m43 38d45m15
coordinates of max position from imstat: 19:49:42.907, +38.45.13.185, I, 999980842.28Hz
fitted position from imfit 19:49:42.99118 ± 0.01171s +038.45.15.21608 ± 0.10346arcsec
Phase shifted image using phaseshift to set the phase center:
Simulated source position J2000 19h49m43 38d45m15
coordinates of max position from imstat 19:49:43.000, +38.45.15.000, I, 999983449.88Hz
fitted position from imfit 19:49:43.000008 ± 0.001796s +038.45.14.999908 ± 0.016103arcsec
Phase shifted image using tclean to set phase center:
Simulated source position J2000 19h49m43 38d45m15
coordinates of max position from imstat 19:49:43.000, +38.45.15.000, I, 999980842.28Hz
fitted position from imfit 19:49:42.989652 ± 0.001851s +038.45.14.869487 ± 0.016600arcsec
ALMA simulation¶
Image with no shift applied:
Simulated source position J2000 19h59m33.2 40d40m53.2
coordinates of max position from imstat 19:59:33.200, +40.40.53.214, I, 1.49997e+11Hz
fitted position from imfit 19:59:33.20002 ± 0.00002s +040.40.53.20107 ± 0.00038arcsec
Phase shifted image using phaseshift to set the phase center:
Simulated source position J2000 19h59m33.2 40d40m53.2
coordinates of max position from imstat 19:59:33.200, +40.40.53.200, I, 1.49997e+11Hz
fitted position from imfit 19:59:33.20000000 ± 0.00000442s +040.40.53.19999711 ± 0.00007270arcsec
Phase shifted image using tclean to set phase center:
Simulated source position J2000 19h59m33.2 40d40m53.2
coordinates of max position from imstat 19:59:33.200, +40.40.53.200, I, 1.49997e+11Hz
fitted position from imfit 19:59:33.20007214 ± 0.00000593s +040.40.53.20079661 ± 0.00009744arcsec
VLA Simulation Details¶
The VLA simulation uses antenna positions in the D configuration and a frequency of 1.0 GHz. The resulting images have 8.0" pixels. The phase center and the source in the original MS were separated by about 2.7 degrees. Figure 1 shows the full image created from the original MS; the point source can be seen in the lower right corner. Figure 2 shows the image created from the MS after running phaseshift to shift the phase center to the position of the source. The contours represent the image created from the original MS, using tclean to shift the phase center to the source position via the phasecenter parameter. Figure 3 is the central portion of Figure 2. The results (see above) indicate that the source position in the image that is not phase shifted is indeed as expected. The somewhat large fit errors are likely due to the fact that the source is not located at the center of a pixel. The results for the image created from the MS produced by phaseshift show excellent agreement with the expected source position. The fit uncertainty places an upper limit of about 30 marcsec (0.003 pixels) on the offset of the source from the phase center, and the best-fit results are two orders of magnitude better than that, at about 0.1 marcsec (0.00002 pixels). The results for the image created from the unshifted MS by applying the phase shift in tclean via the phasecenter parameter are also very good, although not as good as for the image made from the MS created by phaseshift. In this case, the offset between source and phase center falls significantly outside the fit uncertainty, with a separation of about 0.2 arcsec (0.02 pixels).
Figure 1: VLA simulated data image prior to phase shift. The source is in the lower right corner.
Figure 2: VLA simulated data full image after running phaseshift. The contours represent the image created by setting the phasecenter parameter in tclean.
Figure 3: VLA simulated data image central portion after running phaseshift. The contours represent the image created by setting the phasecenter parameter in tclean.
ALMA Simulation Details¶
The ALMA simulation uses antenna positions in the eighth configuration of Cycle 8 and a frequency of 150 GHz. The resulting images have 60 marcsec pixels. The phase center and the source in the original MS were separated by about 1.2 arcmin. Figure 4 shows the full image created from the original MS; the point source can be seen in the upper left corner. Figure 5 shows the image created from the MS after running phaseshift to shift the phase center to the position of the source. The contours represent the image created from the original MS, using tclean to shift the phase center to the source position via the phasecenter parameter. Figure 6 is the central portion of Figure 5. The results (see above) indicate that the source position in the image that is not phase shifted is indeed as expected. The somewhat large fit errors are likely due to the fact that the source is not located at the center of a pixel. The results for the image created from the MS produced by phaseshift show excellent agreement with the expected source position. The fit uncertainty places an upper limit of about 90 μarcsec (0.001 pixels) on the offset of the source from the phase center, and the best-fit results are an order of magnitude better than that, at about 3 μarcsec (0.00005 pixels). The results for the image created from the unshifted MS by applying the phase shift in tclean via the phasecenter parameter are also very good, although not as good as for the image made from the MS created by phaseshift. In this case, the offset between source and phase center falls significantly outside the fit uncertainty, with a separation of about 1 marcsec (0.02 pixels).
Figure 4: ALMA simulated data image prior to phase shift. The source is in the upper left corner.
Figure 5: ALMA simulated data full image after running phaseshift. The contours represent the image created by setting the phasecenter parameter in tclean.
Figure 6: ALMA simulated data image central portion after running phaseshift. The contours represent the image created by setting the phasecenter parameter in tclean.
Open in Colab: https://colab.research.google.com/github/casangi/examples/blob/master/community/simulation_script_demo.ipynb
Simulation in CASA¶
Original Author: rurvashi@aoc.nrao.edu
Description¶
Get creative with data sets to be used for test scripts and characterization of numerical features/changes. This notebook goes beneath the simobserve task and illustrates simple ways in which developers and test writers can make full use of the flexibility offered by our tools and the imager framework. It also exercises some usage modes that our users regularly encounter and exposes some quirks of our scripting interface(s). Rudimentary image and data display routines are included below.
Topics covered below:
- Install CASA 6 and Import required libraries
- Make an empty MS with the desired sub-structure
- Make a true sky model
- Predict visibilities onto the DATA column of the MS
- Add noise and other errors
- A few example use cases
- Image one channel
- Cube imaging with a spectral line
- Continuum wideband imaging with model subtraction
- Self-calibration and imaging
- Ideas for CASA developers and test writers to do beyond these examples.
Installation¶
Option 1 : Install local python3
export PPY=`which python3`
virtualenv -p $PPY --setuptools ./local_python3
./local_python3/bin/pip install --upgrade pip
./local_python3/bin/pip install --upgrade numpy matplotlib ipython astropy
./local_python3/bin/pip install --extra-index-url https://casa-pip.nrao.edu/repository/pypi-group/simple casatools
./local_python3/bin/pip install --extra-index-url https://casa-pip.nrao.edu/repository/pypi-group/simple casatasks
./local_python3/bin/pip3 install jupyter
Option 2 : Install at runtime (for Google Colab)
[1]:
import os
print("installing pre-requisite packages...")
os.system("apt-get install libgfortran3")
print("installing casa...")
os.system("pip install --index-url https://casa-pip.nrao.edu:443/repository/pypi-group/simple casatasks==6.2.0.106")
os.system("pip install --index-url https://casa-pip.nrao.edu:443/repository/pypi-group/simple casadata")
print("complete")
installing pre-requisite packages...
installing casa...
complete
Import Libraries
[2]:
# Import required tools/tasks
from casatools import simulator, image, table, coordsys, measures, componentlist, quanta, ctsys
from casatasks import tclean, ft, imhead, listobs, exportfits, flagdata, bandpass, applycal
from casatasks.private import simutil
import os
import pylab as pl
import numpy as np
from astropy.io import fits
from astropy.wcs import WCS
# Instantiate all the required tools
sm = simulator()
ia = image()
tb = table()
cs = coordsys()
me = measures()
qa = quanta()
cl = componentlist()
mysu = simutil.simutil()
Make an empty MS with the desired uvw/scan/field/ddid setup¶
Construct an empty Measurement Set that has the desired observation setup. This includes antenna configuration, phase center direction, spectral windows, date and timerange of the observation, structure of scans/spws/obsidd/fieldids (and all other MS metadata). Evaluate UVW coordinates for the entire observation and initialize the DATA column to zero.
Methods
Make an empty MS
[3]:
def makeMSFrame(msname = 'sim_data.ms'):
"""
Construct an empty Measurement Set that has the desired observation setup.
"""
os.system('rm -rf '+msname)
## Open the simulator
sm.open(ms=msname);
## Read/create an antenna configuration.
## Canned antenna config text files are located here : /home/casa/data/trunk/alma/simmos/*cfg
antennalist = os.path.join( ctsys.resolve("alma/simmos") ,"vla.d.cfg")
## Fictitious telescopes can be simulated by specifying x, y, z, d, an, telname, antpos.
## x,y,z are locations in meters in ITRF (Earth centered) coordinates.
## d, an are lists of antenna diameter and name.
## telname and obspos are the name and coordinates of the observatory.
(x,y,z,d,an,an2,telname, obspos) = mysu.readantenna(antennalist)
## Set the antenna configuration
sm.setconfig(telescopename=telname,
x=x,
y=y,
z=z,
dishdiameter=d,
mount=['alt-az'],
antname=an,
coordsystem='global',
referencelocation=me.observatory(telname));
## Set the polarization mode (this goes to the FEED subtable)
sm.setfeed(mode='perfect R L', pol=['']);
## Set the spectral window and polarization (one data-description-id).
## Call multiple times with different names for multiple SPWs or pol setups.
sm.setspwindow(spwname="LBand",
freq='1.0GHz',
deltafreq='0.1GHz',
freqresolution='0.2GHz',
nchannels=10,
stokes='RR LL');
## Setup source/field information (i.e. where the observation phase center is)
## Call multiple times for different pointings or source locations.
sm.setfield( sourcename="fake",
sourcedirection=me.direction(rf='J2000', v0='19h59m28.5s',v1='+40d44m01.5s'));
## Set shadow/elevation limits (if you care). These set flags.
sm.setlimits(shadowlimit=0.01, elevationlimit='1deg');
## Leave autocorrelations out of the MS.
sm.setauto(autocorrwt=0.0);
## Set the integration time, and the convention to use for timerange specification
## Note : It is convenient to pick the hourangle mode as all times specified in sm.observe()
## will be relative to when the source transits.
sm.settimes(integrationtime='2000s',
usehourangle=True,
referencetime=me.epoch('UTC','2019/10/4/00:00:00'));
## Construct MS metadata and UVW values for one scan and ddid
## Call multiple times for multiple scans.
## Call this with different sourcenames (fields) and spw/pol settings as defined above.
## Timesteps will be defined in intervals of 'integrationtime', between starttime and stoptime.
sm.observe(sourcename="fake",
spwname='LBand',
starttime='-5.0h',
stoptime='+5.0h');
## Close the simulator
sm.close()
## Unflag everything (unless you care about elevation/shadow flags)
flagdata(vis=msname,mode='unflag')
Plot columns of the MS
[4]:
def plotData(msname='sim_data.ms', myplot='uv'):
"""
Options : myplot='uv'
myplot='data_spectrum'
"""
from matplotlib.collections import LineCollection
tb.open(msname)
# UV coverage plot
if myplot=='uv':
pl.figure(figsize=(4,4))
pl.clf()
uvw = tb.getcol('UVW')
pl.plot( uvw[0], uvw[1], '.')
pl.plot( -uvw[0], -uvw[1], '.')
pl.title('UV Coverage')
# Spectrum of chosen column. Make a linecollection out of each row in the MS.
if myplot=='data_spectrum' or myplot=='corr_spectrum' or myplot=='resdata_spectrum' or myplot=='rescorr_spectrum' or myplot=='model_spectrum':
dats=None
if myplot=='data_spectrum':
dats = tb.getcol('DATA')
if myplot=='corr_spectrum':
dats = tb.getcol('CORRECTED_DATA')
if myplot=='resdata_spectrum':
dats = tb.getcol('DATA') - tb.getcol('MODEL_DATA')
if myplot=='rescorr_spectrum':
dats = tb.getcol('CORRECTED_DATA') - tb.getcol('MODEL_DATA')
if myplot=='model_spectrum':
dats = tb.getcol('MODEL_DATA')
xs = np.zeros((dats.shape[2],dats.shape[1]),'int')
for chan in range(0,dats.shape[1]):
xs[:,chan] = chan
npl = dats.shape[0]
fig, ax = pl.subplots(1,npl,figsize=(10,4))
for pol in range(0,dats.shape[0]):
x = xs
y = np.abs(dats[pol,:,:]).T
data = np.stack(( x,y ), axis=2)
ax[pol].add_collection(LineCollection(data))
ax[pol].set_title(myplot + ' \n pol '+str(pol))
ax[pol].set_xlim(x.min(), x.max())
ax[pol].set_ylim(y.min(), y.max())
pl.show()
tb.close()
Examples
Make a Measurement Set and inspect it
[5]:
makeMSFrame()
[6]:
plotData(myplot='uv')
[7]:
listobs(vis='sim_data.ms', listfile='obslist.txt', verbose=False, overwrite=True)
## Note: os.popen('obslist.txt') would try to execute the file (hence "permission denied"); open and read it instead:
fp = open('obslist.txt')
for aline in fp.readlines():
print(aline.replace('\n',''))
fp.close()
================================================================================
MeasurementSet Name: /content/sim_data.ms MS Version 2
================================================================================
Observer: CASA simulator Project: CASA simulation
Observation: VLA(27 antennas)
Data records: 6318 Total elapsed time = 36000 seconds
Observed from 03-Oct-2019/21:21:40.2 to 04-Oct-2019/07:21:40.2 (UTC)
Fields: 1
ID Code Name RA Decl Epoch SrcId nRows
0 fake 19:59:28.500000 +40.44.01.50000 J2000 0 6318
Spectral Windows: (1 unique spectral windows and 1 unique polarization setups)
SpwID Name #Chans Frame Ch0(MHz) ChanWid(kHz) TotBW(kHz) CtrFreq(MHz) Corrs
0 LBand 10 TOPO 1000.000 100000.000 1000000.0 1450.0000 RR LL
Antennas: 27 'name'='station'
ID= 0-5: 'W01'='P', 'W02'='P', 'W03'='P', 'W04'='P', 'W05'='P', 'W06'='P',
ID= 6-11: 'W07'='P', 'W08'='P', 'W09'='P', 'E01'='P', 'E02'='P', 'E03'='P',
ID= 12-17: 'E04'='P', 'E05'='P', 'E06'='P', 'E07'='P', 'E08'='P', 'E09'='P',
ID= 18-23: 'N01'='P', 'N02'='P', 'N03'='P', 'N04'='P', 'N05'='P', 'N06'='P',
ID= 24-26: 'N07'='P', 'N08'='P', 'N09'='P'
Make a True Sky Model (component list and/or image)¶
Construct a true sky model for which visibilities will be simulated and stored in the DATA column. This could be a component list (with real-world positions and point or gaussian component types), or a CASA image with a real-world coordinate system and pixels containing model sky values. It is possible to also evaluate component lists onto CASA images.
Methods
Make a source list
Once made, it can be used either for direct evaluation of simulated visibilities, or first evaluated onto a CASA image before visibility prediction.
[8]:
def makeCompList(clname_true='sim_onepoint.cl'):
# Make sure the cl doesn't already exist. The tool will complain otherwise.
os.system('rm -rf '+clname_true)
cl.done()
# Add sources, one at a time.
# Call multiple times to add multiple sources. ( Change the 'dir', obviously )
cl.addcomponent(dir='J2000 19h59m28.5s +40d44m01.5s',
flux=5.0, # For a gaussian, this is the integrated area.
fluxunit='Jy',
freq='1.5GHz',
shape='point', ## Point source
# shape='gaussian', ## Gaussian
# majoraxis="5.0arcmin",
# minoraxis='2.0arcmin',
spectrumtype="spectral index",
index=-1.0)
# Print out the contents of the componentlist
#print('Contents of the component list')
#print(cl.torecord())
# Save the file
cl.rename(filename=clname_true)
cl.done()
Make an empty CASA image
[9]:
def makeEmptyImage(imname_true='sim_onepoint_true.im'):
## Define the center of the image
radir = '19h59m28.5s'
decdir = '+40d44m01.5s'
## Make the image from a shape
ia.close()
ia.fromshape(imname_true,[256,256,1,10],overwrite=True)
## Make a coordinate system
cs=ia.coordsys()
cs.setunits(['rad','rad','','Hz'])
cell_rad=qa.convert(qa.quantity('8.0arcsec'),"rad")['value']
cs.setincrement([-cell_rad,cell_rad],'direction')
cs.setreferencevalue([qa.convert(radir,'rad')['value'],qa.convert(decdir,'rad')['value']],type="direction")
cs.setreferencevalue('1.0GHz','spectral')
cs.setreferencepixel([0],'spectral')
cs.setincrement('0.1GHz','spectral')
## Set the coordinate system in the image
ia.setcoordsys(cs.torecord())
ia.setbrightnessunit("Jy/pixel")
ia.set(0.0)
ia.close()
### Note : If there is an error in this step, subsequent steps will give errors of " Invalid Table Operation : SetupNewTable.... imagename is already opened (is in the table cache)"
## The only way out of this is to restart the kernel (equivalent to exit and restart CASA).
## Any other way ?
Evaluate the component list onto the image cube
[10]:
def evalCompList(clname='sim_onepoint.cl', imname='sim_onepoint_true.im'):
    ## Evaluate a component list onto the image
    cl.open(clname)
    ia.open(imname)
    ia.modify(cl.torecord(), subtract=False)
    ia.close()
    cl.done()
Edit pixel values directly
[11]:
def editPixels(imname='sim_onepoint_true.im'):
    ## Edit pixel values directly
    ia.open(imname)
    pix = ia.getchunk()
    shp = ia.shape()
    #pix.fill(0.0)
    #pix[int(shp[0]/2), int(shp[1]/2), 0, :] = 4.0   # A flat-spectrum, unpolarized 4 Jy source at the center of the image.
    pix[int(shp[0]/2), int(shp[1]/2), 0, 6] = pix[int(shp[0]/2), int(shp[1]/2), 0, 6] + 2.0   # Add a spectral line in channel 6
    ia.putchunk(pix)
    ia.close()
View an Image Cube
Use any image viewer, or simply pull the pixels out and plot them with matplotlib.
[12]:
# Display an image using AstroPy, with coordinate system rendering.
def dispAstropy(imname='sim_onepoint_true.im'):
    exportfits(imagename=imname, fitsimage=imname+'.fits', overwrite=True)
    hdu = fits.open(imname+'.fits')[0]
    wcs = WCS(hdu.header, naxis=2)
    fig = pl.figure()
    fig.add_subplot(121, projection=wcs)
    pl.imshow(hdu.data[0,0,:,:], origin='lower', cmap=pl.cm.viridis)
    pl.xlabel('RA')
    pl.ylabel('Dec')

# Display an image cube or a single-plane image.
# For a cube, show the image at chan 0 and a spectrum at the location of the peak in chan 0.
# For a single-plane image, show only the image.
def dispImage(imname='sim_onepoint_true.im', useAstropy=False):
    ia.open(imname)
    pix = ia.getchunk()
    shp = ia.shape()
    ia.close()
    pl.figure(figsize=(10,4))
    pl.clf()
    if shp[3] > 1:
        pl.subplot(121)
    if useAstropy == False:
        pl.imshow(pix[:,:,0,0])
        pl.title('Image from channel 0')
    else:
        dispAstropy(imname)
    if shp[3] > 1:
        pl.subplot(122)
        ploc = np.where(pix == pix.max())
        pl.plot(pix[ploc[0][0], ploc[1][0], 0, :])
        pl.title('Spectrum at source peak')
        pl.xlabel('Channel')
Examples
Make a component list and evaluate it onto a CASA image
[13]:
## Make the component list
makeCompList()
## Make an empty CASA image
makeEmptyImage()
## Evaluate the component list onto the CASA image
evalCompList()
## Display
dispImage()

[14]:
## Edit the pixels of the CASA image directly (e.g. add a spectral line)
editPixels()
## Display
dispImage()

Simulate visibilities from the sky model into the DATA column of the MS¶
Simulate visibilities for the true sky model, applying a variety of instrumental effects. This step either evaluates the DFT of a component model, or uses an imaging (de)gridder. Instrumental effects can be applied either by pre-processing the sky model before ‘standard’ degridding, or by invoking one of the wide-field imaging gridders to apply W-term, A-term and mosaicing effects. Noise, extra spectral lines or RFI may be added at this point, as well as gain errors via the application of carefully constructed calibration tables.
Methods
Use the simulator tool
Visibilities are predicted and saved in the DATA column of the MS. The simulator is best used when the standard gridder is desired. Prediction can be done from an input model image or a component list.
[15]:
def predictSim(msname='sim_data.ms',
               imname='sim_onepoint_true.im',
               clname='sim_onepoint.cl',
               usemod='im',
               usepb=False):
    """
    usemod = 'im' : use the imname image
    usemod = 'cl' : use the clname component list
    usepb = True : include static primary beams in the simulation.
    """
    ## Open an existing MS frame
    sm.openfromms(msname)
    # Include primary beams
    if usepb == True:
        sm.setvp(dovp=True, usedefaultvp=True)
    if usemod == 'im':
        # Predict from a model image
        sm.predict(imagename=imname, incremental=False)
    else:
        # Predict from a component list
        sm.predict(complist=clname, incremental=False)
    # Close the tool
    sm.close()
Use imager (or ft)
Visibilities are predicted and saved in the MODEL_DATA column of the MS. The values must then be copied to the DATA column. Use this approach when non-standard gridders are required, typically when instrument-dependent effects are included, or when Taylor-coefficient wideband image models are to be used for visibility prediction.
Step 1 : Simulate visibilities into the MODEL column using tclean
tclean can be used for model prediction with all gridders (‘standard’, ‘wproject’, ‘mosaic’, ‘awproject’). Wide-field and full-beam effects along with parallactic angle rotation may be included with appropriate settings. tclean can predict model visibilities only from input images and not component lists.
[16]:
## Use an input model sky image - widefield gridders
def predictImager(msname='sim_data.ms',
                  imname_true='sim_onepoint_true.im',
                  gridder='standard'):
    os.system('rm -rf sim_predict.*')
    # Run tclean in model-prediction mode.
    tclean(vis=msname,
           startmodel=imname_true,
           imagename='sim_predict',
           savemodel='modelcolumn',
           imsize=256,
           cell='8.0arcsec',
           specmode='cube',
           interpolation='nearest',
           start='1.0GHz',
           width='0.1GHz',
           nchan=10,
           reffreq='1.5GHz',
           gridder=gridder,
           normtype='flatsky',    # sky model is flat-sky
           cfcache='sim_predict.cfcache',
           wbawp=True,            # ensure that gridders 'mosaic' and 'awproject' use freq-dep PBs
           pblimit=0.05,
           conjbeams=False,
           calcres=False,
           calcpsf=True,
           niter=0,
           wprojplanes=1)
Step 1 : Simulate visibilities into the MODEL column using ft
The ‘ft’ task implements the equivalent of gridder=’standard’ in tclean; wide-field effects cannot be simulated. In addition, it can predict visibilities from component lists (which tclean cannot).
[17]:
def predictFt(msname='sim_data.ms',
              imname='sim_onepoint_true.im',
              clname='sim_onepoint.cl',
              usemod='im'):
    if usemod == 'im':
        ## Use an image name and the ft task
        ft(vis=msname, model=imname, incremental=False, usescratch=True)
    else:
        ## Use a component list and the ft task
        ft(vis=msname, complist=clname, incremental=False, usescratch=True)
Step 2 : Copy contents of the MODEL column to the DATA column
[18]:
### Copy visibilities from the MODEL column to the data columns.
### This is required when predicting with tclean or ft, as they write only to the MODEL column.
def copyModelToData(msname='sim_data.ms'):
    tb.open(msname, nomodify=False)
    moddata = tb.getcol(columnname='MODEL_DATA')
    tb.putcol(columnname='DATA', value=moddata)
    #tb.putcol(columnname='CORRECTED_DATA', value=moddata)
    moddata.fill(0.0)
    tb.putcol(columnname='MODEL_DATA', value=moddata)
    tb.close()
Examples
If the above commands were run in order, the component list contains only a steep-spectrum continuum source, while the model image cube additionally contains a spectral line.
Option 1 : Predict using the simulator and a componentlist
[19]:
# Predict Visibilities
predictSim(usemod='cl')
# Plot
plotData(myplot='data_spectrum')

Option 2 : Predict using the simulator and an input image
[20]:
# Predict visibilities
predictSim(usemod='im')
# Plot
plotData(myplot='data_spectrum')

Option 3 : Predict using tclean and a model image with gridder=’standard’
[21]:
predictImager()
copyModelToData()
plotData(myplot='data_spectrum')

Option 4 : Predict using ft and a component list
[22]:
# Predict using ft
predictFt(usemod='cl')
copyModelToData()
# Plot
plotData(myplot='data_spectrum')

Option 5 : Predict using ft and an input image
[23]:
# Predict using ft
predictFt(usemod='im')
copyModelToData()
# Plot
plotData(myplot='data_spectrum')

Add Noise and other errors to the simulated visibilities¶
Methods
Add Visibility noise
[24]:
## Add Gaussian random noise
def addNoiseSim(msname='sim_data.ms'):
    sm.openfromms(msname)
    sm.setseed(50)
    sm.setnoise(mode='simplenoise', simplenoise='0.05Jy')
    sm.corrupt()
    sm.close()
Add random numbers
[25]:
def addNoiseRand(msname='sim_data.ms'):
    ## Add noise and other variations
    tb.open(msname, nomodify=False)
    dat = tb.getcol('DATA')
    ## Add noise to the first few channels only.
    ## (Ideally, add separately to the real and imaginary parts; see the sketch below.)
    from numpy import random
    dat[:,0:4,:] = dat[:,0:4,:] + 0.5 * random.random(dat[:,0:4,:].shape)
    ## Add some RFI in a few rows and channels....
    #dat[:,:,1] = dat[:,:,1] + 2.0
    tb.putcol('DATA', dat)
    tb.close()
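As the comment above notes, noise should ideally be added independently to the real and imaginary parts. A minimal sketch of that variant (the function name addNoiseComplex and the sigma value are illustrative, not part of the original notebook):

import numpy as np

def addNoiseComplex(msname='sim_data.ms', sigma=0.05):
    ## Add complex Gaussian noise to all channels of the DATA column.
    tb.open(msname, nomodify=False)
    dat = tb.getcol('DATA')
    ## Draw independent Gaussian noise for the real and imaginary parts.
    noise = sigma * (np.random.randn(*dat.shape) + 1j * np.random.randn(*dat.shape))
    tb.putcol('DATA', dat + noise)
    tb.close()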
Add antenna gain errors
[26]:
## Add antenna gain errors.
def addGainErrors(msname='sim_data.ms'):
    sm.openfromms(msname)
    sm.setseed(50)
    sm.setgain(mode='fbm', amplitude=0.1)
    sm.corrupt()
    sm.close()
## Note : This step sometimes produces NaN/Inf values in the visibilities, and plotData() will complain. If so, just run it again.
## (Setting the seed does not appear to make this step reproducible.)
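If plotData() does complain, a quick way to confirm the presence of bad values is to scan the DATA column directly. A minimal sketch (countBadVis is an illustrative helper, not part of the original notebook):

import numpy as np

def countBadVis(msname='sim_data.ms'):
    ## Count non-finite (NaN/Inf) visibilities in the DATA column.
    tb.open(msname)
    dat = tb.getcol('DATA')
    tb.close()
    nbad = np.count_nonzero(~np.isfinite(dat))
    print('Non-finite visibilities : {} out of {}'.format(nbad, dat.size))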
Examples
Use the simulator to add Gaussian random noise (0.05 Jy rms, matching the simplenoise setting above)
[27]:
addNoiseSim()
plotData(myplot='data_spectrum')

A few Imaging and Calibration examples¶
Image one channel¶
[28]:
# Call tclean
os.system('rm -rf try0.*')
tclean(vis='sim_data.ms',
       imagename='try0',
       datacolumn='data',
       spw='0:5',           # pick channel 5 and image it
       imsize=300,
       cell='8.0arcsec',
       specmode='mfs',
       gridder='standard',
       niter=200,
       gain=0.3,
       interactive=False,
       usemask='auto-multithresh')
[28]:
{}
[29]:
# Display the output restored image
dispImage('try0.image')

[30]:
# Display the point spread function
dispImage('try0.psf')

Cube Imaging of a spectral-line dataset¶
This is a spectral-line dataset with noise.
[31]:
# Call tclean
os.system('rm -rf try1.*')
tclean(vis='sim_data.ms',
       imagename='try1',
       datacolumn='data',
       imsize=300,
       cell='8.0arcsec',
       specmode='cube',
       interpolation='nearest',
       gridder='standard',
       niter=200,
       gain=0.3,
       savemodel='modelcolumn')
[31]:
{}
[32]:
# Display the output restored image
dispImage('try1.image')

[33]:
# Display the residual image
dispImage('try1.residual')

Continuum imaging with model subtraction¶
Pick the line-free channels (all but chan 6) and fit a 2nd order polynomial to the spectrum.
[34]:
# Call tclean
os.system('rm -rf try2.*')
tclean(vis='sim_data.ms',
       imagename='try2',
       datacolumn='data',
       spw='0:0~5,0:7~9',   # Select line-free channels
       imsize=300,
       cell='8.0arcsec',
       specmode='mfs',
       deconvolver='mtmfs',
       nterms=3,
       gridder='standard',
       niter=150,
       gain=0.3)
[34]:
{}
[35]:
# Display the output restored image
dispImage('try2.image.tt0')

[36]:
# Predict the tclean mtmfs model onto all channels.
tclean(vis='sim_data.ms',
       imagename='try2',
       datacolumn='data',
       spw='',              # Select all channels to predict onto.
       imsize=300,
       cell='8.0arcsec',
       specmode='mfs',
       deconvolver='mtmfs',
       nterms=3,
       gridder='standard',
       niter=0,
       calcres=False,
       calcpsf=False,
       savemodel='modelcolumn')
[36]:
{}
[37]:
# Plot residual data.
plotData(myplot='resdata_spectrum')

This shows the continuum power-law emission subtracted out, with only the spectral line remaining in the data (once the model is subtracted).
Running the ‘uvsub’ task would store exactly this in the CORRECTED_DATA column; this is also a form of continuum modeling and subtraction.
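For completeness, a minimal sketch of that step (uvsub is a standard CASA task; it subtracts the MODEL_DATA column from the corrected data, initialized from DATA if not yet present, and stores the result in CORRECTED_DATA):

## Subtract the saved model from the visibilities; the result goes to CORRECTED_DATA.
uvsub(vis='sim_data.ms')
## Plot the continuum-subtracted data with the helper defined earlier.
plotData(myplot='corr_spectrum')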
Imaging with Gain Errors and Self Calibration¶
First, re-simulate by starting from ideal visibilities, and adding gain errors and noise.
[38]:
# Predict visibilities
predictSim(usemod='im')
[39]:
# Simulate antenna gain errors
addGainErrors()
# Add noise on top of the gain-corrupted data
addNoiseSim()
# Display
plotData(myplot='data_spectrum')

Image the corrupted data
[40]:
# Call tclean
os.system('rm -rf try3.*')
tclean(vis='sim_data.ms',
       imagename='try3',
       datacolumn='data',
       imsize=300,
       cell='8.0arcsec',
       specmode='cube',
       interpolation='nearest',
       gridder='standard',
       niter=150,           # Don't go too deep since the data are corrupted
       gain=0.3,
       mask='circle[[150pix,150pix],3pix]',   # A mask helps; without it, the self-cal isn't as good.
       savemodel='modelcolumn')
[40]:
{}
[41]:
# Display the output restored image
dispImage('try3.image')

[42]:
# Display the new residual image
dispImage('try3.residual')

This image shows artifacts from the gain errors (in contrast to the purely noise-like errors of the previous simulation).
Calculate gain solutions using the bandpass task (possible since we have already saved the model)
[43]:
bandpass(vis='sim_data.ms',
         caltable='sc.tab', solint='int')
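Here the bandpass task is used so that each channel gets an independent solution. For purely time-dependent gain errors, the gaincal task would be the usual choice; a minimal sketch (the table name 'gc.tab' is illustrative and would replace 'sc.tab' in the applycal call below):

gaincal(vis='sim_data.ms',
        caltable='gc.tab',
        solint='int',    # one solution per integration
        calmode='ap')    # solve for both amplitude and phase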
Apply gain solutions
[44]:
applycal(vis='sim_data.ms',
         gaintable='sc.tab')
[45]:
## Plot Calibrated data
plotData(myplot='corr_spectrum')

Compare this with the uncalibrated data above, and with the visibilities simulated with noise only. Subsequent imaging should use the CORRECTED_DATA column.
[46]:
# Call tclean to image the corrected data
os.system('rm -rf try4.*')
tclean(vis='sim_data.ms',
       imagename='try4',
       datacolumn='corrected',
       imsize=300,
       cell='8.0arcsec',
       specmode='cube',
       interpolation='nearest',
       gridder='standard',
       niter=200,           # Go deeper now; no mask needed.
       gain=0.3,
       savemodel='modelcolumn')
[46]:
{}
[47]:
# Display the output restored image
dispImage('try4.image')

[48]:
# Display the residual image
dispImage('try4.residual')

The residual image is noticeably cleaner than before self-calibration.
Change Log¶
09/29/22 CAS-13930 - Update to test infrastructure.
09/27/22 CAS-13920 [6.5.3.0, 6.5.2.25] - task_tclean: peak memory and runtime profiling routines have been removed as they sometimes were the cause of a crash; the corresponding keys in the return dictionary will no longer be present.
09/26/22 CAS-13877 [6.5.2.24] - Update to test infrastructure.
09/21/22 CAS-13856 [6.5.2.23] - Fixed plotms crash when combining antenna iteration, averaging and negation.
09/19/22 CAS-13915 [6.5.2.22] - Update to test infrastructure.
09/12/22 CAS-13910 [6.5.2.21] - The getWeather function of plotbandpass now uses np.median when combining values across multiple weather stations to prevent potential issues with faulty values.
09/09/22 CAS-13194 [6.5.2.20] - Simulator tool : The simulator code underlying sm.settrop() was improved to support visibility integration times as short as 0.1 s. The new parameter "simint" (seconds) controls the time granularity of the simulation. The default value of simint is -1, which uses a granularity of 10 s (the same as in previous CASA versions).
09/08/22 CAS-12555 [6.5.2.19] - task flagdata : The mode='shadow' now uses the uvw values found in the UVW column of the MS to calculate the uv distances of baselines, for all baselines for which such values are available. For baselines not present in the MS, shadow flags are derived by calculating UVW values from antenna positions.
09/07/22 CAS-13842 [6.5.2.18] - Runtime performance of ephemeris imaging with tclean and tsdimaging was improved.
09/06/22 CAS-13654 - Updated imhead exception message in the case of mode='get' to alert the user that the expected values for hdkey in this case are often different from the keys in the dictionary returned by mode='summary'.
09/06/22 CAS-10682 [6.5.2.17] - Performance of task sdimaging has been improved by caching the spectra coordinates computed while creating the normal image and re-using them, instead of re-computing them, when creating the weight image. A new parameter, enablecache, has been added to the task, making it possible to turn this feature on or off. The performance improvement is most noticeable for fast-scan datasets with a few channels. For typical fast-scan solar datasets, re-using cached spectra coordinates is roughly 20 times faster than re-computing them, resulting in a 25-30% speed-up for the whole sdimaging task.
09/06/22 CAS-13917 - Maintenance tasks for Bamboo automated tests.
08/26/22 CAS-13144 [6.5.2.16] - deconvolve : A new task named 'deconvolve' has been added to provide stand-alone access to the image-domain deconvolver algorithms available within tclean. Options supported in this initial version are 'hogbom','clark','clarkstokes','multiscale', with support for single-plane images and spectral cubes. The 'mtmfs' and 'asp' algorithms will be enabled in a later release.
08/24/22 CAS-13867 [6.5.2.15] - Update to test infrastructure.
08/23/22 CAS-13739 [6.5.2.14] - tclean : Added a new iteration control parameter 'nmajor' to directly limit the number of minor-major cycle sets.
08/22/22 CAS-11936 [6.5.2.13] - calanalysis module : Warnings appeared in the pipeline when a baseline was flagged completely; this message was switched from a WARNING to an INFO post in the logs. task plotbandpass : Messages that earlier appeared only in the console now also appear in the CASA and pipeline logs.
08/17/22 CAS-13862 [6.5.2.12] - Update to test infrastructure.
08/17/22 CAS-13893 - Update to test infrastructure.
08/14/22 CAS-13865 [6.5.2.11] - Update to test infrastructure.
08/14/22 CAS-13859 - Update to test infrastructure.
08/12/22 CAS-13868 [6.5.2.10] - Update to test infrastructure.
08/11/22 CAS-13880 [6.5.2.9] - Update to test infrastructure.
08/11/22 CAS-13813 - casalog tool : A new method, getOrigin() has been implemented to retrieve the origin of messages to be displayed.
08/10/22 CAS-13820 [6.5.2.8] - Updated the runtest test runner XML output so that results display in Bamboo; the Bamboo results page now shows testscript testclass.testcase.
08/10/22 CAS-13631 - New task uvcontsub added for continuum subtraction in the uv-domain. The old task uvcontsub has been deprecated and renamed as uvcontsub_old. Task uvcontsub3 has been removed. The option douvcontsub of mstransform has been deprecated.
08/08/22 CAS-13861 [6.5.2.7] - Update to test infrastructure.
08/08/22 CAS-13850 - ImageAnalysis tool : (1) Added complete docs for image.beamarea(). No interface changes. (2) Added string mbret parameter to image.restoringbeam(). mbret="list" is the default, and produces backward compatible behavior. mbret="matrix" indicates that the return dictionary should be structured such that 'bmaj', 'bmin', and 'bpa' have numpy arrays as values, making it simple to utilize these values as numpy arrays rather than having to write python code to construct such structures.
08/04/22 CAS-13871 [6.5.2.6] - Build System rework : Edited includes in some files.
08/02/22 CAS-13860 [6.5.2.5] - Update to test infrastructure.
08/01/22 CAS-13864 [6.5.2.4] - Build-system rework : Unused files with experimental asp-deconvolver code have been removed from the code repository.
08/01/22 CAS-12581 - task_tclean: fixed a data selection issue that could lead to NaNs in some image planes.
07/28/22 CAS-13858 [6.5.2.3] - Update to test infrastructure.
07/25/22 CAS-13718 [6.5.2.2] - task_fringefit: This task now supports the 'uvrange' parameter provided through the 'selectdata' facility; its documentation can be found in the "Synthesis Calibration" section of the manual. Users inexperienced with this parameter are warned that the uvrange selection is made before calibration; which data are calibrated and which are flagged may not match expectations unless the consequences of the selection are carefully thought through.
07/21/22 CAS-13760 [6.5.2.1] - plotms : Fixed an int overflow bug that sometimes prevented interactive flagging/locating from working correctly.
07/20/22 CAS-750 [6.5.2.0, 6.5.1.22] - tool_table: Implemented more thorough input checking and better error handling in the getcellslice() method.
07/20/22 CAS-13674 - task_sdbaseline: Fixed the output in ascii-text and csv formats in the case of per-spectrum baseline fitting, so that unnecessary info (non-existent parameter values) is not printed.
07/19/22 CAS-13673 [6.5.1.21] - task_sdbaseline: Fixed the incorrect parameter names output in ascii-text format in case of per-spectrum baseline fitting
07/19/22 CAS-13660 - task_tclean: fixed a bug that prevented an outer UV taper to work in combination with weighting='natural'.
07/18/22 CAS-13849 [6.5.1.20] - tool_quanta: added a new parameter keepshape to qa.quantity and qa.unit to preserve the shape of N-dimensional arrays. These N-dimensional quantities are compatible with qa.convert but not with other quanta tool methods.
07/18/22 CAS-13808 - tool_table: a new row() method and tablerow class were added to facilitate reading and writing of table rows.
07/15/22 CAS-13869 [6.5.1.19] - task_tclean: the warning for non-zero edge pixels in the PB image will now only be shown for gridders 'mosaic' and 'awproject'.
07/15/22 CAS-12313 - Correction to the inline docs of the sm.setnoise tool method.
07/14/22 CAS-13668 - Fixed a bug in sdbaseline when blmode=‘apply’ is selected. This setting now properly takes mask information into account and calculates weights correctly when baseline subtraction is applied separately from the fitting.
07/14/22 CAS-13785 [6.5.1.18] - task_plotms: fixed a crash when turning calibration on without giving a cal library string.
07/12/22 CAS-13738 [6.5.1.17] - casashell now prints a summary of any Python exceptions that occur during startup (previously it would exit silently); users of the monolithic CASA should not notice any changes, as this is primarily useful to developers building the components locally. In addition, the setup.py build scripts now print information about exceptions that occur during the build (previously they were largely silent); this change is only relevant to developers.
07/11/22 CAS-13831 [6.5.1.16] - task_sdimaging: Fixed a caching issue that could lead to slightly different images from the same data selection.
07/08/22 CAS-12901 [6.5.1.15] - task_plotms: the tick labels will now switch to scientific notation when displaying small values near zero or very large values
07/06/22 CAS-13847 [6.5.1.14] - Updated casacore submodule reference to 0d871e1fca1f96abd8bf1326fb45c253192d01c2.
07/03/22 CAS-13821 [6.5.1.13] - task_immoments: fixed a rare bug where writing history to the output files could cause a crash.
06/30/22 CAS-13839 [6.5.1.12] - Updated ALMA Band 1 parameters in simulation tasks and tools.
06/29/22 CAS-13855 [6.5.1.11] - replaced instances of numpy.asscalar with numpy.ndarray.item inside the test code of sdimaging
06/28/22 CAS-13667 [6.5.1.10] - task_sdbaseline: blmode=‘fit’ will now properly account for mask information and calculate the weight of the baseline-subtracted spectral data.
06/28/22 CAS-13824 - tool ms.getdata(): fixed a bug where averaging multiple columns could yield different results than averaging a single column
06/28/22 CAS-13848 - Fixed caching issues and data cleanup in some test scripts.
06/20/22 CAS-13494 [6.5.1.9] - task_tsdimaging: Improved the documentation of the 'restfreq' parameter.
06/16/22 CAS-13724 [6.5.1.8] - task_plotms: the Help->About window of the GUI now shows more complete version information.
06/15/22 CAS-13789 [6.5.1.7] - Fix in runtest test runner to parse correct build tag from auxiliary repositories
06/09/22 CAS-13823 [6.5.1.6] - Updated logic in the Bamboo test runner to run the default list of tests if a test suite cannot be generated.
06/08/22 CAS-13722 [6.5.1.5] - task_plotbandpass: fixed a bug affecting multi-panel, multi-page output.
06/02/22 CAS-13713 [6.5.1.4] - Fixed a bug in sdatmcor that overrode the OpenMP configuration.
05/27/22 CAS-13816 [6.5.1.3] - Add new unittests for casatestutils check submodule
05/25/22 CAS-13332 [6.5.1.2] - Add unittest to jupyter notebook for runtest.py testrunner.
05/23/22 CAS-13672 [6.5.1.1] - Updated the description of the 'maskmode' parameter in the inline help of sdbaseline.
05/20/22 CAS-13830 [6.5.1.0, 6.5.0.14] - Fixed minor warning raised by Python 3.8 in task_flagdata
05/20/22 CAS-13819 - None
Open in Colab: https://colab.research.google.com/github/casangi/casadocs/blob/v6.5.2/docs/notebooks/citing-casa.ipynb
Citing CASA¶
Please cite the following paper if you use the CASA software:
The CASA Team, et al. 2022, “CASA, the Common Astronomy Software Applications for Radio Astronomy”, PASP, 134, 114501. DOI: 10.1088/1538-3873/ac9642
Other relevant CASA publications¶
van Bemmel, I., Kettenis, M., Small, D., et al. 2022, “CASA on the fringe – Development of VLBI processing capabilities for CASA”, PASP, 134, 114502. DOI: 10.1088/1538-3873/ac81ed
Emonts, B., Raba, R., Rau, U., et al. 2022, “The CASA software for radio astronomy: status update from ADASS 2020”, Astronomical Data Analysis Software and Systems XXX, ASP Conf. Ser., 532, 389
Raba, R. 2020, “CASA Next Generation Infrastructure”, Astronomical Data Analysis Software and Systems XXX, ASP Conf. Ser., 532, 67
Emonts, B., Raba, R., Moellenbrock, G., et al. 2020, “The CASA software for radio astronomy: status update from ADASS 2019”, Astronomical Data Analysis Software and Systems XXIX, ASP Conf. Ser., 527, 267
Raba, R., Schiebel, D., Emonts, B., et al. 2020, “CASA 6: Modular Integration in Python”, Astronomical Data Analysis Software and Systems XXIX, ASP Conf. Ser., 527, 271
Kepley, A., Tsutsumi, T., Brogan, C., et al. 2020, “Auto-multithresh: A General Purpose Automasking Algorithm”, PASP, 132, 024505
Rau, U., Naik, N., Braun, T. 2019, “A Joint Deconvolution Algorithm to Combine Single-dish and Interferometer Data for Wideband Multiterm and Mosaic Imaging”, AJ, 158, 3
Rau, U., Bhatnagar, S., Owen, F. N., 2016, “Deep Wideband Single Pointings and Mosaics in Radio Interferometry”, AJ, 152, 124
Petry, D., CASA Development Team, 2012, “Analysing ALMA Data with CASA”, Astronomical Data Analysis Software and Systems XXI, ASP Conf. Ser., 461, 849
McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007, “CASA Architecture and Applications”, Astronomical Data Analysis Software and Systems XVI, ASP Conf. Ser. 376, ed. R. A. Shaw, F. Hill, & D. J. Bell (San Francisco, CA: ASP), 127