xarray-ms#

xarray-ms presents a Measurement Set v4 (MSv4) view over CASA Measurement Sets (MSv2). It provides access to MSv2 data via the xarray API, allowing MSv4-compliant applications to be developed on well-understood MSv2 data.

In [1]: import xarray_ms

In [2]: import xarray

In [3]: import xarray.testing

In [4]: from xarray_ms.testing.simulator import simulate

# Simulate a Measurement Set with 2 channel and polarisation configurations
In [5]: ms = simulate("test.ms", data_description=[
   ...:   (8, ("XX", "XY", "YX", "YY")),
   ...:   (4, ("RR", "LL"))])
   ...: 

In [6]: ms
Out[6]: '/tmp/tmpu0p11gge/test.ms'

In [7]: dt = xarray.open_datatree(ms)

In [8]: dt
Out[8]: 
<xarray.DataTree>
Group: /
└── Group: /test
    ├── Group: /test/partition_000
    │   │   Dimensions:                     (time: 5, baseline_id: 6, frequency: 8,
    │   │                                    polarization: 4, uvw_label: 3)
    │   │   Coordinates:
    │   │       baseline_antenna1_name      (baseline_id) object 48B ...
    │   │       baseline_antenna2_name      (baseline_id) object 48B ...
    │   │     * baseline_id                 (baseline_id) int64 48B 0 1 2 3 4 5
    │   │     * frequency                   (frequency) float64 64B 8.56e+08 ... 1.712e+09
    │   │     * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
    │   │     * time                        (time) float64 40B 2.09e+11 ... 2.09e+11
    │   │   Dimensions without coordinates: uvw_label
    │   │   Data variables:
    │   │       EFFECTIVE_INTEGRATION_TIME  (time, baseline_id) float64 240B ...
    │   │       FLAG                        (time, baseline_id, frequency, polarization) uint8 960B ...
    │   │       TIME_CENTROID               (time, baseline_id) float64 240B ...
    │   │       UVW                         (time, baseline_id, uvw_label) float64 720B ...
    │   │       VISIBILITY                  (time, baseline_id, frequency, polarization) complex64 8kB ...
    │   │       WEIGHT                      (time, baseline_id, frequency, polarization) float32 4kB ...
    │   │   Attributes:
    │   │       creation_date:      2025-02-26T06:35:37.283416+00:00
    │   │       partition_info:     {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
    │   │       schema_version:     4.0.0
    │   │       type:               visibility
    │   │       xarray_ms_version:  0.2.1
    │   └── Group: /test/partition_000/antenna_xds
    │           Dimensions:           (antenna_name: 3,
    │                                  cartesian_pos_label/ellipsoid_pos_label: 3)
    │           Coordinates:
    │             * antenna_name      (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
    │               mount             (antenna_name) object 24B ...
    │               station           (antenna_name) object 24B ...
    │           Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
    │           Data variables:
    │               ANTENNA_POSITION  (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...
    └── Group: /test/partition_001
        │   Dimensions:                     (time: 5, baseline_id: 6, frequency: 4,
        │                                    polarization: 2, uvw_label: 3)
        │   Coordinates:
        │       baseline_antenna1_name      (baseline_id) object 48B ...
        │       baseline_antenna2_name      (baseline_id) object 48B ...
        │     * baseline_id                 (baseline_id) int64 48B 0 1 2 3 4 5
        │     * frequency                   (frequency) float64 32B 8.56e+08 ... 1.712e+09
        │     * polarization                (polarization) <U2 16B 'RR' 'LL'
        │     * time                        (time) float64 40B 2.09e+11 ... 2.09e+11
        │   Dimensions without coordinates: uvw_label
        │   Data variables:
        │       EFFECTIVE_INTEGRATION_TIME  (time, baseline_id) float64 240B ...
        │       FLAG                        (time, baseline_id, frequency, polarization) uint8 240B ...
        │       TIME_CENTROID               (time, baseline_id) float64 240B ...
        │       UVW                         (time, baseline_id, uvw_label) float64 720B ...
        │       VISIBILITY                  (time, baseline_id, frequency, polarization) complex64 2kB ...
        │       WEIGHT                      (time, baseline_id, frequency, polarization) float32 960B ...
        │   Attributes:
        │       creation_date:      2025-02-26T06:35:37.493946+00:00
        │       partition_info:     {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
        │       schema_version:     4.0.0
        │       type:               visibility
        │       xarray_ms_version:  0.2.1
        └── Group: /test/partition_001/antenna_xds
                Dimensions:           (antenna_name: 3,
                                       cartesian_pos_label/ellipsoid_pos_label: 3)
                Coordinates:
                  * antenna_name      (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
                    mount             (antenna_name) object 24B ...
                    station           (antenna_name) object 24B ...
                Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
                Data variables:
                    ANTENNA_POSITION  (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...

Measurement Set v4#

NRAO/SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, some of the major highlights are:

  • xarray is used to define the specification.

  • MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. MSv2 data is tabular and, while in many instances the time-channel grid is regular, this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.

xarray Datasets are self-describing and are therefore easier to reason about and work with. Additionally, the regularity of the data will make MSv4-based software less complex to write.
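
To make the contrast between tabular and gridded data concrete, the following sketch reshapes a toy row-oriented table into a labelled array. The column names, values and sizes are invented for illustration and the rows are assumed to be sorted by time and then by baseline. It is not how xarray-ms or xradio perform the conversion.

```python
import numpy as np
import xarray

# Assumed toy MSv2-like table, sorted by time then baseline:
# one row per (time, baseline) pair, each row holding 4 channels
ntime, nbl, nchan = 2, 3, 4
row_time = np.repeat([10.0, 20.0], nbl)      # time of each row
row_baseline = np.tile([0, 1, 2], ntime)     # baseline index of each row
row_data = np.arange(ntime * nbl * nchan, dtype=np.float64).reshape(-1, nchan)

# Because the rows are regular, a reshape recovers the time-baseline grid
grid = row_data.reshape(ntime, nbl, nchan)

# Attaching dimension names and coordinates makes the array self-describing
vis = xarray.DataArray(
    grid,
    dims=("time", "baseline_id", "frequency"),
    coords={
        "time": row_time.reshape(ntime, nbl)[:, 0],
        "baseline_id": row_baseline.reshape(ntime, nbl)[0],
        "frequency": np.linspace(0.856e9, 1.712e9, nchan),
    },
)

# Label-based selection replaces manual row bookkeeping
sample = vis.sel(time=20.0, baseline_id=1)
```

Here `sample` holds the four channel values that were stored in the fifth row of the table, retrieved without computing a row index by hand.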

xradio#

casangi/xradio provides a reference implementation that converts CASA Measurement Sets (MSv2) to Zarr-based MSv4 Measurement Sets using the python-casacore package.

Why xarray-ms?#

  • By developing against an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and later transition to newer formats. Data can also be exported to newer formats (principally zarr) via xarray’s native I/O routines. In either case, the xarray view looks the same to the software developer.

  • xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:

    • xarray’s internal I/O routines such as open_dataset and open_datatree can dispatch to the backend to load data.

    • Similarly, xarray’s lazy loading mechanism dispatches through the backend.

    • Automatic access to any chunked array type supported by xarray, including, but not limited to, dask.

    • Arbitrary chunking along any xarray dimension.

  • xarray-ms uses arcae, a high-performance backend to CASA Tables implementing a subset of python-casacore’s interface.

  • xarray-ms provides limited support for irregular MSv2 data via padding.
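
The idea of padding irregular data can be sketched with plain NumPy. In the toy table below one (time, baseline) row is missing, so the rows are scattered into a NaN-filled grid rather than reshaped. The column names, the values and the choice of NaN as the fill value are assumptions for illustration. The source does not describe how xarray-ms pads internally.

```python
import numpy as np

# Assumed toy table with one missing row: baseline 1 is absent at time 20.0
row_time = np.array([10.0, 10.0, 10.0, 20.0, 20.0])
row_baseline = np.array([0, 1, 2, 0, 2])
row_data = np.arange(5 * 2, dtype=np.float64).reshape(5, 2)  # (row, channel)

# Map each row to its position on the (time, baseline) grid
utime, time_idx = np.unique(row_time, return_inverse=True)
ubl, bl_idx = np.unique(row_baseline, return_inverse=True)

# Allocate the full grid filled with NaN, then scatter the available rows
grid = np.full((len(utime), len(ubl), row_data.shape[1]), np.nan)
grid[time_idx, bl_idx] = row_data

# Cells that received no row remain NaN and mark the padded entries
padded = np.isnan(grid).all(axis=-1)
```

After the scatter, `grid` has the regular shape `(time, baseline, channel)` and `padded` identifies the cells that downstream code would need to treat as missing, for example by flagging them.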

Work in Progress#

Both the Measurement Set v4 specification and xarray-ms are under active development. xarray-ms does not yet have feature parity with the MSv4 specification or xradio: most measures information and many secondary sub-tables are currently missing.

However, the most important parts of the MSv2 MAIN table, as well as the ANTENNA, POLARIZATION and SPECTRAL_WINDOW sub-tables, are implemented and should be sufficient for basic algorithm development.