SKA Data Product API Overview

This API provides a list of the SKA Data Products (files) hosted at a configurable storage location, <PERSISTENT_STORAGE_PATH>.

Automatic API Documentation

Detailed interactive documentation for the API is available through Swagger UI. Access it at http://<API URL>/docs while running the application.
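To discover endpoints from a script rather than from the browser, you can read the machine-readable schema behind the Swagger UI. The sketch below assumes the schema is served at /openapi.json, which is the default for FastAPI applications that expose Swagger UI at /docs. This page does not confirm that path. The helper names fetch_openapi_spec and list_endpoints are hypothetical, and session is any object with a requests-style get() method, such as requests.Session().

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_openapi_spec(session: Any, base_url: str) -> dict:
    # Assumption: the schema lives at /openapi.json (FastAPI default), not confirmed here.
    response = session.get(f"{base_url}/openapi.json", timeout=10)
    response.raise_for_status()
    return response.json()


def list_endpoints(openapi_spec: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings for every operation in an OpenAPI spec."""
    endpoints = []
    for path, operations in openapi_spec.get("paths", {}).items():
        for method in operations:
            # Path items may also contain non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)
```

For example, `list_endpoints(fetch_openapi_spec(requests.Session(), "http://localhost:8000"))` would return entries such as "GET /status" if the assumed schema path exists.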

Basic Usage

Note

This API is typically deployed behind a secure layer that encrypts communication (TLS/SSL) and likely requires user authentication through a separate system. When the API is accessed through a browser, the browser handles both encryption and authentication, but direct access to the API from scripts or notebooks outside the cluster is currently not supported. To use the API directly, you need to access it from within the cluster where it is hosted.

Note

If a data product has been assigned a context.access_group, it will not be available or listed when the API is accessed directly from scripts or notebooks. This is because such products require the access token of an authenticated user, which is not available in this mode of operation.

Status endpoint

Verify the API’s status by sending a GET request to the /status endpoint. The response will indicate the API’s operational state.

Request

GET /status

Response

{
    "api_running": true,
    "api_version": "0.8.0",
    "startup_time": "2024-08-06T21:59:18.333369",
    "last_metadata_update_time": "2024-08-06T21:59:18.333359",
    "metadata_store_status": {
        "store_type": "Persistent PosgreSQL metadata store",
        "host": "localhost",
        "port": 5432,
        "user": "postgres",
        "running": true,
        "schema": "sdp_sdp_dataproduct_dashboard_dev",
        "table_name": "localhost_sdp_dataproduct_dashboard_dev_v1",
        "number_of_dataproducts": 10,
        "postgresql_version": "PostgreSQL 16.3 on x86_64-pc-linux-musl, compiled by gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309, 64-bit"
    },
    "search_store_status": {
        "metadata_store_in_use": "ElasticsearchMetadataStore",
        "url": "https://localhost:9200",
        "user": "elastic",
        "running": true,
        "connection_established_at": "2024-08-06T21:59:18.210017",
        "number_of_dataproducts": 10,
        "indices": "ska-dp-dataproduct-localhost-dev-v1",
        "cluster_info": {
            "name": "46f82bbc7307",
            "cluster_name": "docker-cluster",
            "cluster_uuid": "5nqaD334QZuVZjjMYAFCmQ",
            "version": {
                "number": "8.14.2",
                "build_flavor": "default",
                "build_type": "docker",
                "build_hash": "2afe7caceec8a26ff53817e5ed88235e90592a1b",
                "build_date": "2024-07-01T22:06:58.515911606Z",
                "build_snapshot": false,
                "lucene_version": "9.10.0",
                "minimum_wire_compatibility_version": "7.17.0",
                "minimum_index_compatibility_version": "7.0.0"
            },
            "tagline": "You Know, for Search"
        }
    }
}
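A script can use this endpoint as a health check before issuing other requests. The sketch below is a minimal example under stated assumptions: fetch_status and is_api_healthy are hypothetical helper names, session is any object with a requests-style get() method (for example requests.Session()), and a missing store section is treated as unhealthy. Relax that check if your deployment does not run a search store.

```python
from typing import Any


def fetch_status(session: Any, base_url: str) -> dict:
    """GET /status and return the decoded JSON body."""
    response = session.get(f"{base_url}/status", timeout=10)
    response.raise_for_status()
    return response.json()


def is_api_healthy(status: dict) -> bool:
    """True if the API and both backing stores report that they are running."""
    if not status.get("api_running", False):
        return False
    for section in ("metadata_store_status", "search_store_status"):
        # A section missing from the response counts as "not running".
        if not status.get(section, {}).get("running", False):
            return False
    return True
```

From inside the cluster, `is_api_healthy(fetch_status(requests.Session(), "http://localhost:8000"))` returns True for a response like the one above.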

Search endpoint

Use the search endpoint to query your data products. Specify a time range and key-value pairs to filter your results. The response prioritizes products within the timeframe that best match your criteria.

Request

POST /dataproductsearch

Body

{
    "start_date": "2000-12-12",
    "end_date": "2032-12-12",
    "key_value_pairs": ["execution_block:eb-m005-20231031-12345"]
}

Response

[
    {
        "execution_block": "eb-m005-20231031-12345",
        "date_created": "2023-10-31",
        "dataproduct_file": "eb-m005-20231031-12345",
        "metadata_file": "eb-m005-20231031-12345/ska-data-product.yaml",
        "config.cmdline": "-dump /product/eb-m004-20191031-12345/ska-sdp/pb-m004-20191031-12345/vis.ms",
        ...
        "obscore.instrument_name": "SKA-LOW",
        "id": 6
    }
]
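The request body can be assembled from ordinary Python values. In this sketch, build_search_body and search_dataproducts are hypothetical helpers; each filter becomes a "key:value" string, following the example body above, and dotted field names such as "obscore.instrument_name" mirror the flattened keys in the response. The list form suggests several pairs can be supplied, but how the API combines them is not described on this page. Dates are validated locally before any request is sent.

```python
from datetime import date
from typing import Any


def build_search_body(start_date: str, end_date: str, filters: dict[str, str]) -> dict:
    """Build a /dataproductsearch body from ISO dates and field filters."""
    # fromisoformat raises ValueError for malformed dates such as "2024-13-01".
    if date.fromisoformat(start_date) > date.fromisoformat(end_date):
        raise ValueError("start_date must not be after end_date")
    return {
        "start_date": start_date,
        "end_date": end_date,
        "key_value_pairs": [f"{key}:{value}" for key, value in filters.items()],
    }


def search_dataproducts(session: Any, base_url: str, body: dict) -> list[dict]:
    """POST the body to /dataproductsearch and return the decoded product list."""
    response = session.post(f"{base_url}/dataproductsearch", json=body, timeout=30)
    response.raise_for_status()
    return response.json()
```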

Re-index data products endpoint

The data product metadata store can be re-indexed by making a GET request to the /reindexdataproducts endpoint. This lets you update the metadata store if metadata has been added or changed since the previous indexing.

Request

GET /reindexdataproducts

Response

"Metadata is set to be re-indexed"
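The response only confirms that re-indexing has been scheduled. If a script needs to wait for it to finish, one option is to poll /status and watch last_metadata_update_time. This sketch assumes that field changes when a re-index completes, which this page does not state explicitly; reindex_and_wait is a hypothetical helper and session is any requests-style session.

```python
import time
from typing import Any


def reindex_and_wait(session: Any, base_url: str,
                     timeout_s: float = 300.0, poll_interval_s: float = 5.0) -> bool:
    """Trigger a re-index, then poll /status until last_metadata_update_time changes.

    Returns True if the timestamp changed within timeout_s, otherwise False.
    Assumption: the API updates last_metadata_update_time after re-indexing.
    """
    def last_update() -> str:
        response = session.get(f"{base_url}/status", timeout=10)
        response.raise_for_status()
        return response.json().get("last_metadata_update_time", "")

    before = last_update()
    response = session.get(f"{base_url}/reindexdataproducts", timeout=10)
    response.raise_for_status()

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if last_update() != before:
            return True
        time.sleep(poll_interval_s)
    return False
```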

Download data product endpoint

Sending a POST request to the /download endpoint returns a streaming response containing the specified data product as a tar archive.

The body of the POST request must identify the data product you want to download, using the values returned by the search endpoint. In the example below, the product is identified by its execution_block.

Request

POST /download

Body

{
    "execution_block": "eb-test-20200325-00001"
}

Response

A streaming response containing the specified data product as a tar archive.
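Because the archive can be large, it is worth writing it to disk chunk by chunk rather than holding it in memory. The sketch below sends the body shown above; download_dataproduct is a hypothetical helper, session is a requests-style session, and stream=True is the requests option that defers downloading the body until iter_content() is consumed.

```python
from pathlib import Path
from typing import Any


def download_dataproduct(session: Any, base_url: str, execution_block: str,
                         dest: Path, chunk_size: int = 1 << 16) -> int:
    """Stream POST /download into dest and return the number of bytes written."""
    written = 0
    # stream=True stops requests from buffering the whole archive in memory.
    with session.post(f"{base_url}/download",
                      json={"execution_block": execution_block},
                      stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(dest, "wb") as fd:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fd.write(chunk)
                written += len(chunk)
    return written
```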

Retrieve metadata of a data product endpoint

Sending a POST request to the /dataproductmetadata endpoint returns the metadata of the data product in JSON format.

The body of the POST request must identify the data product whose metadata file (ska-data-product.yaml) should be returned. In the example below, the product is identified by its execution_block.

Request

POST /dataproductmetadata

Body

{
    "execution_block": "eb-test-20200325-00001"
}

Response

{
    "interface": "http://schema.skao.int/ska-data-product-meta/0.1",
    "execution_block": "eb-m001-20191031-12345",
    "context":
    {
        "observer": "AIV_person_1",
        "intent": "Experimental run as part of XYZ-123",
        "notes": "Running that signal from XX/YY/ZZ through again, things seem a bit flaky"
    },
    "config":
    {
        "processing_block": "pb-m001-20191031-12345",
        "processing_script": "receive",
        "image": "artefact.skao.int/ska-docker/vis_receive",
        "version": "0.1.3",
        "commit": "516fb5a693f9dc9aff5d46192f4e055b582fc025",
        "cmdline": "-dump /product/eb-m001-20191031-12345/ska-sdp/pb-m001-20191031-12345/vis.ms"
    },
    "files":
    [
        {
            "path": "vis.ms",
            "status": "working",
            "description": "Raw visibility dump from receive"
        }
    ]
}
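Once retrieved, the metadata is a plain dictionary. As an illustration, the hypothetical helpers below fetch it with the body shown above and summarise the entries under "files"; session is a requests-style session, and missing keys are given placeholder values rather than raising errors.

```python
from typing import Any


def fetch_metadata(session: Any, base_url: str, execution_block: str) -> dict:
    """POST /dataproductmetadata and return the decoded metadata."""
    response = session.post(f"{base_url}/dataproductmetadata",
                            json={"execution_block": execution_block}, timeout=30)
    response.raise_for_status()
    return response.json()


def summarise_files(metadata: dict) -> list[str]:
    """Return one 'path [status]: description' line per entry in metadata['files']."""
    return [
        f"{entry.get('path', '?')} [{entry.get('status', 'unknown')}]: "
        f"{entry.get('description', '')}"
        for entry in metadata.get("files", [])
    ]
```

For the response above, summarise_files returns ["vis.ms [working]: Raw visibility dump from receive"].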

Ingest new data product

Sending a POST request to the /ingestnewdataproduct endpoint loads and parses the file at the supplied relative path (relativePathName) and adds the data product to the metadata store.

Request

POST /ingestnewdataproduct

Body

{
    "execution_block": "eb-test-20200325-00001",
    "relativePathName": "product/eb-test-20200325-00001"
}
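The same request can be sent from a script. In this sketch, ingest_dataproduct is a hypothetical helper and session is a requests-style session. It also rejects absolute paths, on the assumption (suggested by the field name but not stated here) that relativePathName is resolved relative to the configured storage location. The response body of this endpoint is not documented on this page, so the raw response is returned.

```python
from pathlib import PurePosixPath
from typing import Any


def ingest_dataproduct(session: Any, base_url: str,
                       execution_block: str, relative_path: str) -> Any:
    """POST /ingestnewdataproduct with the body fields shown above."""
    # Assumption: paths are relative to <PERSISTENT_STORAGE_PATH>.
    if PurePosixPath(relative_path).is_absolute():
        raise ValueError("relativePathName must be a relative path")
    body = {"execution_block": execution_block, "relativePathName": relative_path}
    response = session.post(f"{base_url}/ingestnewdataproduct", json=body, timeout=30)
    response.raise_for_status()
    return response
```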

Ingest new metadata endpoint

Note

In this release, ingested metadata is not persistently stored. This means any data you add will be cleared when the API restarts. This functionality will be changed in future releases.

Sending a POST request to the ingestnewmetadata endpoint will parse the supplied JSON data as data product metadata, and add the data product to the metadata store.

For example, the POST request body:

Request

POST /ingestnewmetadata

Body

{
    "interface": "http://schema.skao.int/ska-data-product-meta/0.1",
    "execution_block": "eb-test-20240806-99999",
    "context": {
        "observer": "REST ingest",
        "intent": "",
        "notes": ""
    },
    "config": {
        "processing_block": "",
        "processing_script": "",
        "image": "",
        "version": "",
        "commit": "",
        "cmdline": ""
    },
    "files": [],
    "obscore": {
        "access_estsize": 0,
        "access_format": "application/unknown",
        "access_url": "0",
        "calib_level": 0,
        "dataproduct_type": "MS",
        "facility_name": "SKA",
        "instrument_name": "SKA-LOW",
        "o_ucd": "stat.fourier",
        "obs_collection": "Unknown",
        "obs_id": "eb-test-20240806-99999",
        "obs_publisher_did": "",
        "pol_states": "XX/XY/YX/YY",
        "pol_xel": 0,
        "s_dec": 0,
        "s_ra": 0.0,
        "t_exptime": 5.0,
        "t_max": 57196.962848574476,
        "t_min": 57196.96279070411,
        "t_resolution": 0.9,
        "target_name": ""
    }
}

API User

The Data Product Dashboard (DPD) is usually used via the GUI, but for certain systems and users, direct access to the API may be useful. This guide will help users get up to speed with the Data Product Dashboard API.

DPD API documentation can be found at https://developer.skao.int/projects/ska-dataproduct-api/en/latest/overview.html#automatic-api-documentation. The DPD API is self-documenting, so the available endpoints are listed at /docs.

Searching for and Downloading Data Products

When searching for data products, it is important to ensure that the most recent data is available. The cached map for the in-memory solution periodically checks for new products, but you can also trigger this manually by calling the /reindexdataproducts endpoint:

import requests

BASE_URL = "http://localhost:8000"  # address of the API, reachable from within the cluster
response = requests.get(f"{BASE_URL}/reindexdataproducts")
print(response.status_code)  # 202

A specific product can be searched for by date range or by any other available metadata field:

data = {
    "start_date": "2001-12-12",
    "end_date": "2032-12-12",
    "key_value_pairs": ["execution_block:eb-m001-20191031-12345"]
}
response = requests.post(f"{BASE_URL}/dataproductsearch", json=data)
products = response.json()
print(products)
# [{'execution_block': 'eb-m001-20191031-12345', 'date_created': '2019-10-31', 'dataproduct_file': 'eb-m001-20221212-12345', 'metadata_file': 'eb-m001-20221212-12345/ska-data-product.yaml', 'interface': 'http://schema.skao.int/ska-data-product-meta/0.1', 'context.observer': 'AIV_person_1', 'context.intent': 'Experimental run as part of XYZ-123', 'context.notes': 'Running that signal from XX/YY/ZZ through again, things seem a bit flaky', 'config.processing_block': 'pb-m001-20191031-12345', 'config.processing_script': 'receive', 'config.image': 'artefact.skao.int/ska-docker/vis_receive', 'config.version': '0.1.3', 'config.commit': '516fb5a693f9dc9aff5d46192f4e055b582fc025', 'config.cmdline': '-dump /product/eb-m001-20191031-12345/ska-sdp/pb-m001-20191031-12345/vis.ms', 'id': 2}]

Identify the product that should be downloaded and select it. This will be one of the products in the list of returned products:

product = products[0]
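When a search returns several products, taking the first entry is not always what you want. As one possible policy, the hypothetical helper below picks the product with the most recent date_created. It relies on the ISO YYYY-MM-DD format seen in the responses above, which sorts correctly as plain strings; products without a date are treated as oldest.

```python
def select_latest(products: list[dict]) -> dict:
    """Return the product with the most recent date_created (ISO YYYY-MM-DD)."""
    if not products:
        raise ValueError("search returned no data products")
    # ISO dates compare correctly as strings; a missing date sorts first.
    return max(products, key=lambda p: p.get("date_created", ""))
```

With this helper, `product = select_latest(products)` can replace the indexing step above.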

The download endpoint returns a response that can be used to stream the data product into a tarball, which can be saved to a local file:

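data = {
    "execution_block": product["dataproduct_file"],
    "relativePathName": product["dataproduct_file"],
}
# stream=True writes the archive to disk chunk by chunk instead of
# loading the whole response into memory first
with requests.post(f"{BASE_URL}/download", json=data, stream=True) as response:
    response.raise_for_status()
    with open("product.tar", "wb") as fd:
        for chunk in response.iter_content(chunk_size=4096):
            fd.write(chunk)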

The tarball can then be extracted using standard operating system tools. On Linux, for example:

$ tar -xvf ./product.tar
eb-m001-20221212-12345/
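The archive can also be extracted from Python, which is convenient in a notebook. The hypothetical helper safe_extract below uses the standard tarfile module and refuses members whose paths would land outside the destination directory, and, in this sketch, any links. On Python 3.12 and later, tarfile's built-in extraction filters (filter="data") provide a comparable safeguard.

```python
import tarfile
from pathlib import Path


def safe_extract(archive: Path, dest: Path) -> list[str]:
    """Extract a tar archive into dest, rejecting members that would escape it."""
    dest = dest.resolve()
    with tarfile.open(archive) as tar:  # compression is auto-detected
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            # Links could point outside dest; this sketch simply rejects them.
            if member.issym() or member.islnk():
                raise ValueError(f"links are not supported: {member.name}")
        tar.extractall(dest, members=members)
    return [m.name for m in members]
```

For example, `safe_extract(Path("product.tar"), Path("."))` extracts the archive into the current directory and returns the member names, such as "eb-m001-20221212-12345/".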