visstat

visstat(vis, axis='amplitude', datacolumn='data', useflags=True, spw='', field='', selectdata=True, antenna='', uvrange='', timerange='', correlation='', scan='', array='', observation='', timeaverage=False, timebin='0s', timespan='', maxuvwdistance=0.0, disableparallel=False, ddistart=-1, taql='', monolithic_processing=False, intent='', reportingaxes='ddid', doquantiles=True)[source]

Calculates statistical information from a MeasurementSet

[Description] [Examples] [Development] [Details]

Parameters
  • vis (path) - Name of MeasurementSet or Multi-MS

  • axis (string=’amplitude’) - Values on which to compute statistics

    axis = amp
    • datacolumn (string=’data’) - Which data column to use (data, corrected, model, float_data)

    axis = amplitude
    • datacolumn (string=’data’) - Which data column to use (data, corrected, model, float_data)

    axis = phase
    • datacolumn (string=’data’) - Which data column to use (data, corrected, model, float_data)

    axis = real
    • datacolumn (string=’data’) - Which data column to use (data, corrected, model, float_data)

    axis = imag
    • datacolumn (string=’data’) - Which data column to use (data, corrected, model, float_data)

    axis = imaginary
    • datacolumn (string=’data’) - Which data column to use (data, corrected, model, float_data)

  • useflags (bool=True) - Take flagging into account?

  • spw (string=’’) - spectral-window/frequency/channel

  • field (string=’’) - Field names or field index numbers: ''==>all, field='0~2,3C286'

  • selectdata (bool=True) - More data selection parameters (antenna, timerange etc)

    selectdata = True
    • antenna (string=’’) - antenna/baselines: ''==>all, antenna = '3,VA04'

    • timerange (string=’’) - time range: ''==>all, timerange='09:14:0~09:54:0'

    • correlation (string=’’) - Select data based on correlation

    • scan (string=’’) - scan numbers: ''==>all

    • array (string=’’) - (sub)array numbers: ''==>all

    • observation ({string, int}=’’) - observation ID number(s): '' = all

    • uvrange (string=’’) - uv range: ''==>all; uvrange = '0~100klambda', default units=meters

  • timeaverage (bool=False) - Average data in time.

    timeaverage = True
    • timebin (string=’0s’) - Bin width for time averaging.

    • timespan ({string, stringVec}=’’) - Span the timebin across scan, state or both.

    • maxuvwdistance (double=0.0) - Maximum separation of start-to-end baselines that can be included in an average. (meters)

  • intent ({string, stringVec, int, intVec}=’’) - Select data by scan intent.

  • reportingaxes (string=’ddid’) - Which reporting axis to use (ddid, field, integration)

  • doquantiles (bool=True)

Returns

stats (dict) - statistics from a given MeasurementSet column or column-derived value, grouped by ddi, field or integration

Description

This task returns a dictionary with statistical information about data in a MeasurementSet or Multi-MS.

The following statistics are computed and added to the returned dictionary: mean value, minimum value, maximum value, sum of values, sum of squared values, sum of weights, median, median absolute deviation, first and third quartiles, minimum, maximum, variance, standard deviation, and root mean square. Two other fields indicate whether the data are weighted and whether they are masked. The field ‘npts’ gives the number of data points. The parameter ‘doquantiles’ can be set to False to show the statistical output excluding quantiles, which significantly decreases the run-time of visstat.

Statistics may be computed on any of the following axes: flag, antenna1, antenna2, feed1, feed2, field_id, array_id, data_desc_id, flag_row, interval, scan, scan_number, time, weight, weight_spectrum, amp, amplitude, phase, real, imag, imaginary, and uvrange (weight, amp, imag and scan are aliases for weight_spectrum, amplitude, imaginary and scan_number, respectively) Note that the statistics are computed on scalar values only; for example, the average amplitude is computed as a scalar average.

Additionally, statistics for any axis may be computed on subsets of the MeasurementSet partitioned by values of data description id, field id or integration number. The ‘reportingaxes’ argument is used to partition the sample set along an axis. For example, setting its value to ‘ddid’ will result in the statistics of the chosen sample values partitioned by unique values of the data description id. Thus setting ‘axis’ to ‘amp’ and ‘reportingaxes’ to ‘ddid’ will report statistics of visibility amplitudes for each unique value of data description id in the MeasurementSet.

When the ‘reportingaxes’ argument is used to partition the data, if one of the partitions is completely flagged and useflags=True, the returned report for that partition will have the number of points set to zero and the statistics set to ‘NaN’. For example, if partitionaxes =’field’, a list of fields is given, and some of the fields are completely flagged, the number of points reported for those fields will be 0 and their statistics ‘NaN’.

Besides returning the statistical information in a dictionary, this task prints the statistics to the CASA logger. When no valid data is found for some of the ‘reportingaxes’ selections, it prints a warning about it.

Optionally, the statistical information can be computed based only on a given subset of the MeasurementSet using selection parameters.

Note

Note: If the MS consists of inhomogeneous data, it may be necessary to use selection parameters to select a homogeneous subset of the MS. For example, if the MS contains several spectral windows, each having a different number of channels, use spw=’2’ to run visstat on homogenous data within the MS.

Examples

To create and view a dictionary called ‘mystat’ containing the visibility statistics of ngc5921.ms:

CASA <1>: mystat = visstat(vis='data/regression/unittest/setjy/ngc5921.ms',
                           axis='amp', datacolumn='data', useflags=False, spw='',
                           field='', selectdata=True, correlation='RR', timeaverage=False,
                           intent='', reportingaxes='ddid')

CASA <2>: mystat
Out[2]:
{'DATA_DESC_ID=0': {'firstquartile': 0.023732144385576248,
  'isMasked': False,
  'isWeighted': False,
  'max': 73.75,
  'maxDatasetIndex': 12,
  'maxIndex': 1204,
  'mean': 4.511831488357214,
  'medabsdevmed': 0.0432449858635664,
  'median': 0.051963627338409424,
  'min': 2.2130521756480448e-05,
  'minDatasetIndex': 54,
  'minIndex': 4346,
  'npts': 1427139.0,
  'rms': 16.42971891790897,
  'stddev': 15.798076313999745,
  'sum': 6439010.678462409,
  'sumOfWeights': 1427139.0,
  'sumsq': 385235713.187832,
  'thirdquartile': 0.3004012107849121,
  'variance': 249.57921522295976}}

To access only the standard deviation statistic:

CASA <3>: mystat['DATA_DESC_ID=0']['stddev']
Out[3]: 15.798076313999745
Development

No additional development details

Parameter Details

Detailed descriptions of each function parameter

vis (path) - Name of MeasurementSet or Multi-MS
axis (string='amplitude') - Values on which to compute statistics
datacolumn (string='data') - Which data column to use (data, corrected, model, float_data)
useflags (bool=True) - Take flagging into account?
spw (string='') - spectral-window/frequency/channel
field (string='') - Field names or field index numbers: ''==>all, field='0~2,3C286'
selectdata (bool=True) - More data selection parameters (antenna, timerange etc)
antenna (string='') - antenna/baselines: ''==>all, antenna = '3,VA04'
uvrange (string='') - uv range: ''==>all; uvrange = '0~100klambda', default units=meters
timerange (string='') - time range: ''==>all, timerange='09:14:0~09:54:0'
correlation (string='') - Select data based on correlation
scan (string='') - scan numbers: ''==>all
array (string='') - (sub)array numbers: ''==>all
observation ({string, int}='') - observation ID number(s): '' = all
timeaverage (bool=False) - Average data in time.
timebin (string='0s') - Bin width for time averaging.
timespan ({string, stringVec}='') - Span the timebin across scan, state or both.
maxuvwdistance (double=0.0) - Maximum separation of start-to-end baselines that can be included in an average. (meters)
disableparallel (bool=False) - Hidden parameter for internal use only. Do not change it!
ddistart (int=-1) - Hidden parameter for internal use only. Do not change it!
taql (string='') - Table query for nested selections
monolithic_processing (bool=False) - Hidden parameter for internal use only. Do not change it!
intent ({string, stringVec, int, intVec}='') - Select data by scan intent.
reportingaxes (string='ddid') - Which reporting axis to use (ddid, field, integration)
doquantiles (bool=True) - If False, quantile-like statistics are not computed. These include the first and third quartiles, the median, and the median of the absolute deviation from the median.