Collecting feedback and lessons learned for update of APEx Interoperability and Compliance Guidelines

Dear all,

This post serves to collect feedback and lessons learned, particularly from PRR ingestions undertaken in the last few months, so as to update the APEx Interoperability and Compliance Guidelines that are input to current and future projects (SEF, SUP, etc.).

Cheers,

Paulo

This does not come from the PRR ingestion experience but from the StatEO conference: the SDMX format is an interoperable way to convey and exchange statistical metadata (see "What is SDMX?" on the SDMX – Statistical Data and Metadata eXchange site). The current ICG includes a section on metadata and even format-specific metadata recommendations, but only for STAC; maybe we can consider adding this format.

From the CA WG meeting, on Zarr chunk sizes:

The right chunk size depends on several factors, including the application (timeseries analysis, HPC, machine learning) and the underlying storage (e.g. it can be adapted to the specific object storage provider).

Having two complete copies of a dataset with different chunking strategies (e.g. one temporal, one spatial) might be justified.

Even dynamic (on-the-fly), temporary re-chunking might be feasible if the time penalty of not doing it is too large.

Consider the concept of Zarr “profiles”.
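
To make the trade-off between chunking strategies concrete, here is a minimal sketch that counts how many chunks a read has to touch under a spatial and a temporal layout. The cube size, the two chunk shapes and the helper `chunks_touched` are all made up for illustration; they are not taken from any APEx dataset or from the Zarr library itself.

```python
def chunks_touched(shape, chunks, request):
    """Count the chunks that intersect a request window.

    shape and chunks are (time, y, x) sizes; request holds one
    (start, stop) index range per axis, with stop exclusive.
    """
    touched = 1
    for size, chunk, (start, stop) in zip(shape, chunks, request):
        assert 0 <= start < stop <= size
        first = start // chunk          # index of first chunk hit on this axis
        last = (stop - 1) // chunk      # index of last chunk hit on this axis
        touched *= last - first + 1
    return touched


# Hypothetical cube: 365 daily steps on a 4096 x 4096 grid.
shape = (365, 4096, 4096)
spatial_chunks = (1, 512, 512)    # one time step per chunk, large tiles
temporal_chunks = (365, 32, 32)   # full time series per chunk, small tiles

pixel_series = ((0, 365), (1000, 1001), (2000, 2001))  # time profile at one pixel
single_map = ((100, 101), (0, 4096), (0, 4096))        # full map at one time step

for name, chunks in [("spatial", spatial_chunks), ("temporal", temporal_chunks)]:
    print(name,
          chunks_touched(shape, chunks, pixel_series),
          chunks_touched(shape, chunks, single_map))
```

With these assumed numbers, the spatial layout needs 365 chunk reads for a single-pixel time profile but only 64 for a full map, while the temporal layout needs 1 and 16384 respectively, which is the kind of gap that could justify keeping two copies.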

Sharing SUP-specific notes and requirements on behalf of Patrick:

  • Raster files with many bands (e.g. >20, time series, or hyperspectral)

    • Band interleave BSQ rather than BIP if profiles are to be plotted
    • Default bands to load - this used to be a thing in ENVI and was quite useful, we could think about capturing this somewhere in the STAC metadata
    • Could the STAC also capture the band stretch parameters (min-max)?
  • COG pyramid levels

    • Be more specific on COG formatting: compression, byte range access, pyramid levels
    • For very large collections, which levels could be skipped?
    • Provide the relevant GDAL commands
  • Zarr requirements

    • Align with EarthCODE
    • GeoZarr?
    • Include specific requirements for Zarr from the Geospatial Explorer
  • Projection EPSG/CRS support

    • Avoid exotic projections (e.g. national ones)
    • Avoid unnecessary reprojection
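
On the "relevant GDAL commands" point, a possible starting shape for the guidance is sketched below. It only assembles a `gdal_translate` call for GDAL's COG driver (available since GDAL 3.1) using the `COMPRESS`, `BLOCKSIZE` and `OVERVIEW_RESAMPLING` creation options. The helper name, file names and the chosen values are placeholders for discussion, not agreed APEx settings.

```python
import shlex


def cog_translate_command(src, dst, blocksize=512, compress="DEFLATE",
                          resampling="AVERAGE"):
    """Build a gdal_translate call that writes a COG with internal overviews.

    The defaults here are illustrative assumptions, not ICG requirements.
    """
    return [
        "gdal_translate",
        "-of", "COG",
        "-co", f"COMPRESS={compress}",
        "-co", f"BLOCKSIZE={blocksize}",
        "-co", f"OVERVIEW_RESAMPLING={resampling}",
        src,
        dst,
    ]


cmd = cog_translate_command("input.tif", "output_cog.tif")
print(shlex.join(cmd))  # copy-pasteable shell command
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where GDAL is installed, so the same snippet could serve both as documentation and as a conversion script.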

Some further comments on this:

  • Hyperspectral - to be specific, the BSQ requirement comes from being able to render specific bands or RGB composites efficiently, rather than from profile plotting. Performance of profile plots may be a trade-off in relation to tile size
  • COGs - we should also provide guidance on tile size, stressing the sweet spot (256 or 512 pixels)
  • COG metadata - use of embedded colour palettes in COGs
  • Zarr requirements - there are still various discussions around this, and we have yet to explore the latest GeoZarr capabilities in OpenLayers for use in the GE. However, recent discussions have suggested the need to consider spatial chunking of Zarr for visualisation and temporal chunking for extracting temporal profiles. This has been the approach taken in the World Peatland project.
  • Projection support - the Geospatial Explorer can now easily support multiple projections, which can be defined in config, so there is no need to be so strict on projections. There may be justifiable reasons for using a national projection (e.g. improved accuracy)
  • GeoJSON - can be rendered in the GE, but is loaded into memory at start-up, so it is fine for a small number of small GeoJSON files, but anything over a few MB would be better in a cloud-optimised format, i.e. FlatGeobuf
  • Charts in GE - GE supports CSVs and GeoJSON as chart sources now too.
  • Statistics in GE - we should elaborate on FlatGeobuf requirements and the available tooling to generate them. Some clarity on geometry resolution is also needed (we use multiple scales of geometry for the hierarchy used in the NUTS statistics, for example)
  • STAC (general) - STAC collections should be internally consistent, i.e. all items should represent equivalent content (assets) for different AOIs or TOIs. Collections with items describing different datasets should be avoided
  • Static STAC vs STAC APIs - static STAC collections are fine for collections of a modest size, but large collections require APIs
  • WMS / WMTS - GetCapabilities should declare their projection, to avoid unnecessary reprojection in the GE
  • CORS - the need for CORS policies that allow access from the GE should be stressed. This is a common blocker.
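
The STAC consistency point could be turned into a simple automated check. The sketch below compares the asset keys of every item against the first item in a collection; it works on plain dictionaries shaped like STAC items (`id` and `assets` fields), and the function name and sample items are invented for illustration. A real check would probably also compare media types, bands and roles.

```python
def inconsistent_items(items):
    """Return {item_id: (missing_keys, extra_keys)} relative to the first item.

    Only asset keys are compared; this is a deliberately narrow
    interpretation of "equivalent content".
    """
    if not items:
        return {}
    reference = set(items[0]["assets"])
    report = {}
    for item in items[1:]:
        keys = set(item["assets"])
        if keys != reference:
            report[item["id"]] = (sorted(reference - keys),
                                  sorted(keys - reference))
    return report


# Hypothetical items: the third one describes a different kind of dataset.
items = [
    {"id": "tile-a", "assets": {"data": {}, "thumbnail": {}}},
    {"id": "tile-b", "assets": {"data": {}, "thumbnail": {}}},
    {"id": "tile-c", "assets": {"stats": {}}},
]
print(inconsistent_items(items))
```

Here only `tile-c` is reported, as missing `data` and `thumbnail` and carrying an extra `stats` asset, which is the situation the guideline asks providers to avoid.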

Sharing the following SUP-specific requirements on behalf of Diego Moglioni:

  • GE log scaling support for SAR data (now available)
  • Improving the GE pixel distribution percentiles setting for RGB composite rendering (currently this has to be done manually for each raster that is added as a separate data layer, which is impractical for a large number of tiles; ideally it should be applied dynamically to all the tiles composing a layer once the RGB/percentiles option is selected)
  • Investigating the STAC standard for sharing the result URL of a search in the APEx STAC browser
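
As an illustration of the percentile request, the following sketch computes one pair of stretch limits across all tiles of a layer instead of per raster. It is not how the GE implements stretching; the nearest-rank percentile, the function names and the tiny sample tiles are assumptions chosen to keep the example self-contained.

```python
import math


def layer_stretch(tiles, low=2, high=98):
    """Return (min, max) stretch limits from percentiles over all tiles.

    Uses the nearest-rank percentile definition on the pooled pixel values.
    """
    values = sorted(v for tile in tiles for v in tile)

    def pick(p):
        rank = max(1, math.ceil(len(values) * p / 100))
        return values[rank - 1]

    return pick(low), pick(high)


def rescale(value, lo, hi):
    """Clip a value to [lo, hi] and map it linearly to 0..255."""
    if hi == lo:
        return 0
    clipped = min(max(value, lo), hi)
    return round(255 * (clipped - lo) / (hi - lo))


# Two hypothetical tiles of one band, flattened; 1000 is an outlier.
tile_a = [0, 10, 20, 30, 40]
tile_b = [50, 60, 70, 80, 1000]

lo, hi = layer_stretch([tile_a, tile_b], low=10, high=90)
print(lo, hi, rescale(1000, lo, hi), rescale(20, lo, hi))
```

Because the limits are derived once from the pooled values, every tile in the layer is rendered with the same stretch and the outlier no longer dominates the contrast.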

I think that, apart from the info collected in this post, the additional important inputs are in the PRR feedback workshop MoM that @james.wheeler prepared and distributed.

I think that a lot of what we have here are requirements for APEx/GE rather than for data/algorithm providers, who are the main targets of the APEx ICG. It is therefore important to distinguish between what will be available in the future and what is already in place now, is concrete, and can be communicated to SUP projects, also considering the SUP schedule.

Best Regards,

Paulo