Tutorial#
The Measurement Set v2.0 is a tabular format that
encodes the regularity, or shape, of the data in the MAIN table.
This is accomplished through the DATA_DESC_ID column, which defines the
Spectral Window and Polarisation Configuration associated with each row.
Consequently, the shape of the visibility in each row of the DATA column can
vary per row.
By contrast, Measurement Set v4.0 specifies a collection of Datasets of ndarrays on a regular grid. To move data between the two formats, MSv2 rows must be partitioned, or grouped, so that all rows in a partition share the same shape and configuration.
In xarray-ms, this is accomplished by specifying partition_schema
when opening a Measurement Set.
Different columns may be used to define the partition.
See Partitioning Schema for more information.
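The grouping step described above can be pictured with a small, library-independent sketch. The row dictionaries, the DDID_SHAPES lookup and the partition_rows helper are hypothetical names invented for illustration. They are not part of xarray-ms, whose actual implementation may differ.

```python
from collections import defaultdict

# Hypothetical lookup: DATA_DESC_ID -> (number of channels, number of correlations).
# These shapes mirror the simulated Measurement Set used later in this tutorial.
DDID_SHAPES = {0: (8, 4), 1: (4, 2)}

# Hypothetical MAIN table rows; only the columns needed for partitioning are shown.
rows = [
    {"TIME": 0.0, "FIELD_ID": 0, "DATA_DESC_ID": 0},
    {"TIME": 0.0, "FIELD_ID": 0, "DATA_DESC_ID": 1},
    {"TIME": 1.0, "FIELD_ID": 0, "DATA_DESC_ID": 0},
    {"TIME": 1.0, "FIELD_ID": 0, "DATA_DESC_ID": 1},
]


def partition_rows(rows, columns):
    """Group row indices by the values of the given partitioning columns."""
    partitions = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple((column, row[column]) for column in columns)
        partitions[key].append(index)
    return dict(partitions)


partitions = partition_rows(rows, ["DATA_DESC_ID", "FIELD_ID"])

for key, indices in partitions.items():
    ddid = dict(key)["DATA_DESC_ID"]
    # Every row in a partition shares one DATA shape, so it can form a regular grid.
    print(key, "rows", indices, "data shape", DDID_SHAPES[ddid])
```

Because every row in a group shares one DATA_DESC_ID, the rows of each group can be stacked into a regular array, which is what the MSv4 layout requires.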
Opening a Measurement Set#
As xarray-ms implements an xarray backend,
it is possible to use the xarray.backends.api.open_datatree() function
to open multiple partitions of a Measurement Set.
In [1]: import xarray_ms
In [2]: import xarray
In [3]: import xarray.testing
In [4]: from xarray_ms.testing.simulator import simulate
# Simulate a Measurement Set with 2 channel and polarisation configurations
In [5]: ms = simulate("test.ms", data_description=[
...: (8, ("XX", "XY", "YX", "YY")),
...: (4, ("RR", "LL"))])
...:
In [6]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"])
In [7]: dt
Out[7]:
<xarray.DataTree>
Group: /
└── Group: /test
├── Group: /test/partition_000
│ │ Dimensions: (time: 5, baseline_id: 6, frequency: 8,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 48B ...
│ │ baseline_antenna2_name (baseline_id) object 48B ...
│ │ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ │ * frequency (frequency) float64 64B 8.56e+08 ... 1.712e+09
│ │ * polarization (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
│ │ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ │ Dimensions without coordinates: uvw_label
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B ...
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 960B ...
│ │ TIME_CENTROID (time, baseline_id) float64 240B ...
│ │ UVW (time, baseline_id, uvw_label) float64 720B ...
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 8kB ...
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 4kB ...
│ │ Attributes:
│ │ creation_date: 2025-02-26T06:35:39.644336+00:00
│ │ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ │ schema_version: 4.0.0
│ │ type: visibility
│ │ xarray_ms_version: 0.2.1
│ └── Group: /test/partition_000/antenna_xds
│ Dimensions: (antenna_name: 3,
│ cartesian_pos_label/ellipsoid_pos_label: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
│ mount (antenna_name) object 24B ...
│ station (antenna_name) object 24B ...
│ Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...
└── Group: /test/partition_001
│ Dimensions: (time: 5, baseline_id: 6, frequency: 4,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 48B ...
│ baseline_antenna2_name (baseline_id) object 48B ...
│ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ * frequency (frequency) float64 32B 8.56e+08 ... 1.712e+09
│ * polarization (polarization) <U2 16B 'RR' 'LL'
│ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ Dimensions without coordinates: uvw_label
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B ...
│ FLAG (time, baseline_id, frequency, polarization) uint8 240B ...
│ TIME_CENTROID (time, baseline_id) float64 240B ...
│ UVW (time, baseline_id, uvw_label) float64 720B ...
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 2kB ...
│ WEIGHT (time, baseline_id, frequency, polarization) float32 960B ...
│ Attributes:
│ creation_date: 2025-02-26T06:35:39.649031+00:00
│ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ schema_version: 4.0.0
│ type: visibility
│ xarray_ms_version: 0.2.1
└── Group: /test/partition_001/antenna_xds
Dimensions: (antenna_name: 3,
cartesian_pos_label/ellipsoid_pos_label: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
mount (antenna_name) object 24B ...
station (antenna_name) object 24B ...
Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...
Warning
The MSv4 spec is still under development and the arrangement and naming of the DataTree branches is likely to change.
Selecting a subset of the data#
By default, open_datatree() will return a datatree
with a lazy view over the data.
xarray has extensive functionality for
indexing and selecting data.
For example, one could select a subset of the data along specific dimensions:
In [8]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"])
In [9]: subdt = dt.isel(time=slice(1, 3), baseline_id=[1, 3, 5], frequency=slice(2, 4))
In [10]: subdt
Out[10]:
<xarray.DataTree>
Group: /
└── Group: /test
├── Group: /test/partition_000
│ │ Dimensions: (time: 2, baseline_id: 3, frequency: 2,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 24B ...
│ │ baseline_antenna2_name (baseline_id) object 24B ...
│ │ * baseline_id (baseline_id) int64 24B 1 3 5
│ │ * frequency (frequency) float64 16B 1.101e+09 1.223e+09
│ │ * polarization (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
│ │ * time (time) float64 16B 2.09e+11 2.09e+11
│ │ Dimensions without coordinates: uvw_label
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 48B ...
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 48B ...
│ │ TIME_CENTROID (time, baseline_id) float64 48B ...
│ │ UVW (time, baseline_id, uvw_label) float64 144B ...
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 384B ...
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 192B ...
│ │ Attributes:
│ │ creation_date: 2025-02-26T06:35:39.725218+00:00
│ │ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ │ schema_version: 4.0.0
│ │ type: visibility
│ │ xarray_ms_version: 0.2.1
│ └── Group: /test/partition_000/antenna_xds
│ Dimensions: (antenna_name: 3,
│ cartesian_pos_label/ellipsoid_pos_label: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
│ mount (antenna_name) object 24B ...
│ station (antenna_name) object 24B ...
│ Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...
└── Group: /test/partition_001
│ Dimensions: (time: 2, baseline_id: 3, frequency: 2,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 24B ...
│ baseline_antenna2_name (baseline_id) object 24B ...
│ * baseline_id (baseline_id) int64 24B 1 3 5
│ * frequency (frequency) float64 16B 1.427e+09 1.712e+09
│ * polarization (polarization) <U2 16B 'RR' 'LL'
│ * time (time) float64 16B 2.09e+11 2.09e+11
│ Dimensions without coordinates: uvw_label
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 48B ...
│ FLAG (time, baseline_id, frequency, polarization) uint8 24B ...
│ TIME_CENTROID (time, baseline_id) float64 48B ...
│ UVW (time, baseline_id, uvw_label) float64 144B ...
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 192B ...
│ WEIGHT (time, baseline_id, frequency, polarization) float32 96B ...
│ Attributes:
│ creation_date: 2025-02-26T06:35:39.729882+00:00
│ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ schema_version: 4.0.0
│ type: visibility
│ xarray_ms_version: 0.2.1
└── Group: /test/partition_001/antenna_xds
Dimensions: (antenna_name: 3,
cartesian_pos_label/ellipsoid_pos_label: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
mount (antenna_name) object 24B ...
station (antenna_name) object 24B ...
Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...
At this point, the subdt DataTree is still lazy – no Data variables have been loaded
into memory.
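The sizes reported in the output above follow from the selection arithmetic. The sketch below uses only the standard library to reproduce the shape and byte counts of VISIBILITY, WEIGHT and FLAG in partition_000 after the isel call. The selected_length and nbytes helpers are hypothetical names, and positional indexing is assumed to follow ordinary Python slice and list semantics.

```python
from math import prod

# Dimension sizes of /test/partition_000 before selection (from the first output).
full_shape = {"time": 5, "baseline_id": 6, "frequency": 8, "polarization": 4}

# The indexers passed to isel() in the example above.
indexers = {
    "time": slice(1, 3),
    "baseline_id": [1, 3, 5],
    "frequency": slice(2, 4),
}

# Bytes per element for the dtypes shown in the output.
ITEMSIZE = {"complex64": 8, "float32": 4, "uint8": 1}


def selected_length(size, indexer):
    """Length of a dimension after positional indexing with a slice or a list."""
    if isinstance(indexer, slice):
        return len(range(size)[indexer])
    return len(indexer)


def nbytes(shape, dtype):
    """Total bytes occupied by an array of the given shape and dtype."""
    return prod(shape.values()) * ITEMSIZE[dtype]


# Dimensions without an indexer (polarization here) keep their full size.
sub_shape = {
    dim: selected_length(size, indexers[dim]) if dim in indexers else size
    for dim, size in full_shape.items()
}

print(sub_shape)
print("VISIBILITY", nbytes(sub_shape, "complex64"), "bytes")
print("WEIGHT", nbytes(sub_shape, "float32"), "bytes")
print("FLAG", nbytes(sub_shape, "uint8"), "bytes")
```

The resulting 384, 192 and 48 bytes agree with the sizes shown for partition_000 in the output above.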
Loading a DataTree#
Calling load() on the lazy DataTree reads all the data variables into memory as numpy arrays.
In [11]: subdt.load()
Out[11]:
<xarray.DataTree>
Group: /
└── Group: /test
├── Group: /test/partition_000
│ │ Dimensions: (time: 2, baseline_id: 3, frequency: 2,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 24B 'ANTENNA-0' ... 'ANT...
│ │ baseline_antenna2_name (baseline_id) object 24B 'ANTENNA-1' ... 'ANT...
│ │ * baseline_id (baseline_id) int64 24B 1 3 5
│ │ * frequency (frequency) float64 16B 1.101e+09 1.223e+09
│ │ * polarization (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
│ │ * time (time) float64 16B 2.09e+11 2.09e+11
│ │ Dimensions without coordinates: uvw_label
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 48B 0.0 0.0 ... 0.0
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 48B ...
│ │ TIME_CENTROID (time, baseline_id) float64 48B -3.507e+09 .....
│ │ UVW (time, baseline_id, uvw_label) float64 144B 2...
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 384B ...
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 192B ...
│ │ Attributes:
│ │ creation_date: 2025-02-26T06:35:39.725218+00:00
│ │ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ │ schema_version: 4.0.0
│ │ type: visibility
│ │ xarray_ms_version: 0.2.1
│ └── Group: /test/partition_000/antenna_xds
│ Dimensions: (antenna_name: 3,
│ cartesian_pos_label/ellipsoid_pos_label: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
│ mount (antenna_name) object 24B 'ALT-AZ' 'ALT-AZ' 'ALT-AZ'
│ station (antenna_name) object 24B 'STATION-0' ... 'STATION-2'
│ Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...
└── Group: /test/partition_001
│ Dimensions: (time: 2, baseline_id: 3, frequency: 2,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 24B 'ANTENNA-0' ... 'ANT...
│ baseline_antenna2_name (baseline_id) object 24B 'ANTENNA-1' ... 'ANT...
│ * baseline_id (baseline_id) int64 24B 1 3 5
│ * frequency (frequency) float64 16B 1.427e+09 1.712e+09
│ * polarization (polarization) <U2 16B 'RR' 'LL'
│ * time (time) float64 16B 2.09e+11 2.09e+11
│ Dimensions without coordinates: uvw_label
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 48B 0.0 0.0 ... 0.0
│ FLAG (time, baseline_id, frequency, polarization) uint8 24B ...
│ TIME_CENTROID (time, baseline_id) float64 48B -3.507e+09 .....
│ UVW (time, baseline_id, uvw_label) float64 144B 2...
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 192B ...
│ WEIGHT (time, baseline_id, frequency, polarization) float32 96B ...
│ Attributes:
│ creation_date: 2025-02-26T06:35:39.729882+00:00
│ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ schema_version: 4.0.0
│ type: visibility
│ xarray_ms_version: 0.2.1
└── Group: /test/partition_001/antenna_xds
Dimensions: (antenna_name: 3,
cartesian_pos_label/ellipsoid_pos_label: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
mount (antenna_name) object 24B 'ALT-AZ' 'ALT-AZ' 'ALT-AZ'
station (antenna_name) object 24B 'STATION-0' ... 'STATION-2'
Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B ...
Opening a Measurement Set with dask#
Generally speaking, observational data will be too large to fit in memory. Either portions of the dataset must be selected and loaded, or it must be processed in chunks.
Chunked processing with a parallel array library such as dask
can be enabled by specifying the chunks parameter:
In [12]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"],
....: chunks={"time": 2, "frequency": 2})
....:
In [13]: dt
Out[13]:
<xarray.DataTree>
Group: /
└── Group: /test
├── Group: /test/partition_000
│ │ Dimensions: (time: 5, baseline_id: 6, frequency: 8,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ │ baseline_antenna2_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ │ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ │ * frequency (frequency) float64 64B 8.56e+08 ... 1.712e+09
│ │ * polarization (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
│ │ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ │ Dimensions without coordinates: uvw_label
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 960B dask.array<chunksize=(2, 6, 2, 4), meta=np.ndarray>
│ │ TIME_CENTROID (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ │ UVW (time, baseline_id, uvw_label) float64 720B dask.array<chunksize=(2, 6, 3), meta=np.ndarray>
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 8kB dask.array<chunksize=(2, 6, 2, 4), meta=np.ndarray>
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 4kB dask.array<chunksize=(2, 6, 2, 4), meta=np.ndarray>
│ │ Attributes:
│ │ creation_date: 2025-02-26T06:35:39.873896+00:00
│ │ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ │ schema_version: 4.0.0
│ │ type: visibility
│ │ xarray_ms_version: 0.2.1
│ └── Group: /test/partition_000/antenna_xds
│ Dimensions: (antenna_name: 3,
│ cartesian_pos_label/ellipsoid_pos_label: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
│ mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
└── Group: /test/partition_001
│ Dimensions: (time: 5, baseline_id: 6, frequency: 4,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ baseline_antenna2_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ * frequency (frequency) float64 32B 8.56e+08 ... 1.712e+09
│ * polarization (polarization) <U2 16B 'RR' 'LL'
│ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ Dimensions without coordinates: uvw_label
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ FLAG (time, baseline_id, frequency, polarization) uint8 240B dask.array<chunksize=(2, 6, 2, 2), meta=np.ndarray>
│ TIME_CENTROID (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ UVW (time, baseline_id, uvw_label) float64 720B dask.array<chunksize=(2, 6, 3), meta=np.ndarray>
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 2kB dask.array<chunksize=(2, 6, 2, 2), meta=np.ndarray>
│ WEIGHT (time, baseline_id, frequency, polarization) float32 960B dask.array<chunksize=(2, 6, 2, 2), meta=np.ndarray>
│ Attributes:
│ creation_date: 2025-02-26T06:35:39.879470+00:00
│ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ schema_version: 4.0.0
│ type: visibility
│ xarray_ms_version: 0.2.1
└── Group: /test/partition_001/antenna_xds
Dimensions: (antenna_name: 3,
cartesian_pos_label/ellipsoid_pos_label: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
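The chunksize values in the output above show only the size of the first block along each dimension. As a rough sketch of how an integer chunk size divides a dimension, the hypothetical block_sizes helper below splits each dimension into equal blocks with a smaller final block where the size is not an exact multiple. Dimensions that are not named in chunks are assumed to stay in a single block, which is consistent with the output shown.

```python
def block_sizes(dim_size, chunk_size):
    """Split a dimension into blocks of chunk_size, with a smaller final block if needed."""
    full_blocks, remainder = divmod(dim_size, chunk_size)
    return (chunk_size,) * full_blocks + ((remainder,) if remainder else ())


# Dimension sizes of the two partitions in the output above.
partitions = {
    "partition_000": {"time": 5, "baseline_id": 6, "frequency": 8, "polarization": 4},
    "partition_001": {"time": 5, "baseline_id": 6, "frequency": 4, "polarization": 2},
}

# The chunks argument from the example above; unlisted dimensions stay whole.
chunks = {"time": 2, "frequency": 2}

layout = {
    name: {dim: block_sizes(size, chunks.get(dim, size)) for dim, size in dims.items()}
    for name, dims in partitions.items()
}

for name, dims in layout.items():
    print(name, dims)
```

Here the time dimension of length 5 is split into blocks of 2, 2 and 1, so the final block of each time-dependent variable is smaller than the others.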
Per-partition chunking#
Different chunking may be desired, especially when applied to
different channelisation and polarisation configurations.
In these cases, the preferred_chunks argument can be used
to specify different chunking setups for each partition.
In [14]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"],
....: chunks={},
....: preferred_chunks={
....: (("DATA_DESC_ID", 0),): {"time": 2, "frequency": 4},
....: (("DATA_DESC_ID", 1),): {"time": 3, "frequency": 2}})
....:
See the preferred_chunks argument of
open_datatree()
for more information.
In [15]: dt
Out[15]:
<xarray.DataTree>
Group: /
└── Group: /test
├── Group: /test/partition_000
│ │ Dimensions: (time: 5, baseline_id: 6, frequency: 8,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ │ baseline_antenna2_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ │ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ │ * frequency (frequency) float64 64B 8.56e+08 ... 1.712e+09
│ │ * polarization (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
│ │ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ │ Dimensions without coordinates: uvw_label
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 960B dask.array<chunksize=(2, 6, 4, 4), meta=np.ndarray>
│ │ TIME_CENTROID (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ │ UVW (time, baseline_id, uvw_label) float64 720B dask.array<chunksize=(2, 6, 3), meta=np.ndarray>
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 8kB dask.array<chunksize=(2, 6, 4, 4), meta=np.ndarray>
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 4kB dask.array<chunksize=(2, 6, 4, 4), meta=np.ndarray>
│ │ Attributes:
│ │ creation_date: 2025-02-26T06:35:39.985161+00:00
│ │ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ │ schema_version: 4.0.0
│ │ type: visibility
│ │ xarray_ms_version: 0.2.1
│ └── Group: /test/partition_000/antenna_xds
│ Dimensions: (antenna_name: 3,
│ cartesian_pos_label/ellipsoid_pos_label: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
│ mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
└── Group: /test/partition_001
│ Dimensions: (time: 5, baseline_id: 6, frequency: 4,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ baseline_antenna2_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ * frequency (frequency) float64 32B 8.56e+08 ... 1.712e+09
│ * polarization (polarization) <U2 16B 'RR' 'LL'
│ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ Dimensions without coordinates: uvw_label
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B dask.array<chunksize=(3, 6), meta=np.ndarray>
│ FLAG (time, baseline_id, frequency, polarization) uint8 240B dask.array<chunksize=(3, 6, 2, 2), meta=np.ndarray>
│ TIME_CENTROID (time, baseline_id) float64 240B dask.array<chunksize=(3, 6), meta=np.ndarray>
│ UVW (time, baseline_id, uvw_label) float64 720B dask.array<chunksize=(3, 6, 3), meta=np.ndarray>
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 2kB dask.array<chunksize=(3, 6, 2, 2), meta=np.ndarray>
│ WEIGHT (time, baseline_id, frequency, polarization) float32 960B dask.array<chunksize=(3, 6, 2, 2), meta=np.ndarray>
│ Attributes:
│ creation_date: 2025-02-26T06:35:39.989773+00:00
│ partition_info: {'field_name': ['FIELD-0'], 'intent': ['CALIBRATE_AMP...
│ schema_version: 4.0.0
│ type: visibility
│ xarray_ms_version: 0.2.1
└── Group: /test/partition_001/antenna_xds
Dimensions: (antenna_name: 3,
cartesian_pos_label/ellipsoid_pos_label: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' ... 'ANTENNA-2'
mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
Dimensions without coordinates: cartesian_pos_label/ellipsoid_pos_label
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label/ellipsoid_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
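Each key of preferred_chunks is a tuple of (column, value) pairs. The sketch below illustrates one plausible way such keys could be matched to partitions. The matching rule, the chunks_for_partition helper and the partition keys are all assumptions made for illustration, not a description of how xarray-ms resolves the argument.

```python
# The preferred_chunks mapping from the example above.
preferred_chunks = {
    (("DATA_DESC_ID", 0),): {"time": 2, "frequency": 4},
    (("DATA_DESC_ID", 1),): {"time": 3, "frequency": 2},
}


def chunks_for_partition(partition_key, preferred_chunks):
    """Return the chunks of the first entry whose pairs all appear in partition_key.

    Assumption: a preferred_chunks key matches a partition when every
    (column, value) pair in it is also present in the partition's key.
    """
    partition_pairs = set(partition_key)
    for key, chunks in preferred_chunks.items():
        if set(key) <= partition_pairs:
            return chunks
    # No entry matched: fall back to no explicit chunking preference.
    return {}


# Hypothetical partition keys for the two partitions of the simulated Measurement Set.
partition_000 = (("DATA_DESC_ID", 0), ("FIELD_ID", 0))
partition_001 = (("DATA_DESC_ID", 1), ("FIELD_ID", 0))

print(chunks_for_partition(partition_000, preferred_chunks))
print(chunks_for_partition(partition_001, preferred_chunks))
```

Under this assumed rule, the first partition resolves to time and frequency chunks of 2 and 4 and the second to 3 and 2, which agrees with the chunksize values in the output above.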
Writing a DataTree to Zarr#
zarr is a chunked storage format designed for use with distributed file systems. Once a DataTree view of the data has been established, it is trivial to export this to a zarr store.
In [16]: import os.path
In [17]: import tempfile
In [18]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"],
....: chunks={},
....: preferred_chunks={
....: (("DATA_DESC_ID", 0),): {"time": 2, "frequency": 4},
....: (("DATA_DESC_ID", 1),): {"time": 3, "frequency": 2}})
....:
In [19]: zarr_path = f"{tempfile.mkdtemp()}{os.path.sep}test.zarr"
In [20]: dt.to_zarr(zarr_path, consolidated=True, compute=True)
It is then trivial to open this using open_datatree:
In [21]: dt2 = xarray.open_datatree(zarr_path)
In [22]: xarray.testing.assert_identical(dt, dt2)
Writing a DataTree to Cloud Storage#
xarray incorporates standard functionality for writing xarray datasets to cloud storage.
Here we will use the s3fs package to write to an S3 bucket.
import s3fs
# custom-profile in .aws/credentials
s3 = s3fs.S3FileSystem(profile="custom-profile",
client_kwargs={"region_name": "af-south-1"})
# A path in a bucket
store = s3fs.mapping.S3Map("bucket/scratch/test.zarr", s3=s3,
check=True, create=False)
dt.to_zarr(store=store, mode="w", compute=True, consolidated=True)
See the xarray documentation on Cloud Storage Buckets for information on interfacing with other cloud providers.