This post serves to collect feedback and lessons learned, particularly from PRR ingestions undertaken in the last few months, so as to update the APEx Interoperability and Compliance Guidelines that are input to current and future projects (SEF, SUP, etc.).
Not from the PRR ingestion experience but from the StatEO conference: the SDMX (Statistical Data and Metadata eXchange) format is an interoperable way to convey and exchange statistical metadata; see the "What is SDMX?" page on the SDMX website. The current ICG includes a metadata section and even format-specific metadata recommendations, but only for STAC. Maybe we can consider adding SDMX as well.
The chunking strategy depends on several factors, including the application (time series analysis, HPC, machine learning) and the underlying storage (e.g. it can be adapted to the specific object storage provider).
Having two complete copies of a dataset with different chunking strategies (e.g. one temporal, one spatial) might be justified.
Even dynamic (on-the-fly), temporary re-chunking might be feasible if the time penalty of not doing it is too large.
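To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The cube shape and the two chunk layouts are made-up numbers chosen for illustration, not taken from any APEx dataset. It only counts how many chunks a request overlaps, and ignores compression, caching and request latency.

```python
import math

def chunks_touched(request, chunks):
    """Number of chunks a request overlaps, assuming it starts on a chunk boundary.

    Both arguments are (time, y, x) extents in array cells.
    """
    return math.prod(math.ceil(r / c) for r, c in zip(request, chunks))

# Hypothetical layouts with roughly the same number of cells per chunk.
spatial_layout = (1, 1000, 1000)   # one time step per chunk, large spatial footprint
temporal_layout = (1000, 32, 32)   # full time axis per chunk, small spatial footprint

pixel_profile = (1000, 1, 1)       # 1000-step time series at a single pixel
map_tile = (1, 1000, 1000)         # one 1000 x 1000 tile at a single time step

for name, layout in [("spatial", spatial_layout), ("temporal", temporal_layout)]:
    print(name,
          "profile:", chunks_touched(pixel_profile, layout),
          "tile:", chunks_touched(map_tile, layout))
```

With these numbers each layout needs a single chunk for the request it was designed for and about a thousand for the other one, which is the kind of gap that could justify keeping two copies.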
Hyperspectral - to be specific, the BSQ requirement comes from the need to render specific bands or RGB composites efficiently, rather than from plotting profiles. Profile plot performance may be a trade-off against tile size.
COGs - we should also provide guidance on tile size and stress the sweet spot (256 or 512 pixels).
COG metadata - use of embedded colour palettes in COGs
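On the tile size point, this is a minimal sketch of how a provider could pin the internal tile size when writing a COG with GDAL's COG driver. The file names are placeholders and the DEFLATE compression is my assumption, not an ICG requirement. The snippet only builds and prints the command; it does not run GDAL.

```python
import shlex

ALLOWED_TILE_SIZES = (256, 512)  # the "sweet spot" sizes mentioned above

def cog_translate_command(src, dst, tile_size=512):
    """Build a gdal_translate command that writes a COG with the given tile size."""
    if tile_size not in ALLOWED_TILE_SIZES:
        raise ValueError(f"tile_size should be one of {ALLOWED_TILE_SIZES}")
    return [
        "gdal_translate",
        "-of", "COG",                     # GDAL's COG output driver
        "-co", f"BLOCKSIZE={tile_size}",  # internal tile size in pixels
        "-co", "COMPRESS=DEFLATE",        # assumption: lossless compression
        src, dst,
    ]

# Placeholder file names, for illustration only.
cmd = cog_translate_command("input.tif", "output_cog.tif", tile_size=512)
print(shlex.join(cmd))
```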
Zarr requirements - there are still various discussions around this, and we have yet to explore the latest GeoZarr capabilities in OpenLayers for use in the GE. However, recent discussions have suggested the need to consider spatial chunking of Zarr for visualisation and temporal chunking for extracting temporal profiles. This is the approach taken in the World Peatland project.
Projection support - the Geospatial Explorer can now easily support multiple projections, which can be defined in config, so there is no need to be so strict on projections. There may be justifiable reasons for using a national projection (e.g. improved accuracy).
GeoJSON - can be rendered in the GE, but is loaded into memory at start-up. This is fine for a small number of small GeoJSON files, but anything over a few MB would be better in a cloud-optimised format, e.g. FlatGeobuf.
Charts in GE - GE supports CSVs and GeoJSON as chart sources now too.
Statistics in GE - we should elaborate on FlatGeobuf requirements and the available tooling to generate them. Some clarity on geometry resolution would also help (for example, we use multiple scales of geometry for the hierarchy used in the NUTS statistics).
STAC (general) - STAC collections should be internally consistent, i.e. all items should represent equivalent content (assets) for different AOIs or TOIs. Collections with items describing different datasets should be avoided.
Static STAC vs STAC APIs - static STAC collections are fine for collections of a modest size, but large collections require APIs
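For the internal consistency point, a quick check along these lines could be run over a collection before ingestion. It is a simplified sketch: it only compares asset keys, treats the most common key set as the reference, and works on plain dictionaries shaped like STAC items. The item ids and asset names are invented.

```python
from collections import Counter

def inconsistent_items(items):
    """Return ids of items whose asset keys differ from the most common key set."""
    key_sets = {item["id"]: frozenset(item.get("assets", {})) for item in items}
    if not key_sets:
        return []
    reference, _ = Counter(key_sets.values()).most_common(1)[0]
    return sorted(item_id for item_id, keys in key_sets.items() if keys != reference)

# Invented items: two tiles with the same assets, one unrelated document.
items = [
    {"id": "tile-a-2023", "assets": {"data": {}, "thumbnail": {}}},
    {"id": "tile-b-2023", "assets": {"data": {}, "thumbnail": {}}},
    {"id": "report-2023", "assets": {"pdf": {}}},
]
print(inconsistent_items(items))
```

Here the third item would be flagged, because it describes a different kind of content from the other two.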
WMS / WMTS - GetCapabilities responses should declare the projection of their layers, so that the GE can avoid unnecessary reprojection.
CORS - the need for CORS policies that allow access from the GE should be stressed. This is a common blocker.
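Since CORS is such a common blocker, providers could test it themselves before handing data over. The sketch below only inspects the `Access-Control-Allow-Origin` response header, which is a simplification (it ignores preflight requests and credentials). The GE origin shown is a placeholder, and `fetch_headers` assumes the server answers HEAD requests.

```python
import urllib.request

def allows_origin(headers, origin):
    """True if the response headers permit cross-origin reads from origin."""
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    allowed = normalised.get("access-control-allow-origin")
    return allowed in ("*", origin)

def fetch_headers(url, origin):
    """Request only the headers of url, announcing origin as the calling site."""
    request = urllib.request.Request(url, headers={"Origin": origin}, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())

GE_ORIGIN = "https://explorer.example.org"  # hypothetical origin, not the real GE URL

# Offline examples using hand-written header dictionaries.
print(allows_origin({"Access-Control-Allow-Origin": "*"}, GE_ORIGIN))
print(allows_origin({"Content-Type": "image/tiff"}, GE_ORIGIN))
```

Against a real asset this would be called as `allows_origin(fetch_headers(asset_url, GE_ORIGIN), GE_ORIGIN)`.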
Sharing the following SUP-specific requirements on behalf of Diego Moglioni:
GE log scaling support for SAR data (now available)
Improving the GE pixel distribution percentile settings for RGB composite rendering. At the moment this has to be done manually for each raster, and each raster has to be added as a separate data layer, which is impractical for a large number of tiles. Ideally it should be applied dynamically to all the tiles composing a layer once the RGB/percentiles option is selected.
Investigating the STAC standard for sharing the result URL of a search in the APEx STAC browser
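To illustrate the percentile request above: the idea is to compute one stretch for the whole layer rather than one per tile. This is a sketch of that idea only, not how the GE implements it. It uses a simple nearest-rank percentile, a 2/98 default that is my assumption, and toy tiles given as flat lists of pixel values.

```python
import math

def percentile(sorted_values, q):
    """Nearest-rank percentile of a sorted, non-empty list, with q in [0, 100]."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def layer_stretch(tiles, low=2, high=98):
    """Compute one (low, high) stretch from the pooled pixels of all tiles."""
    pooled = sorted(value for tile in tiles for value in tile)
    return percentile(pooled, low), percentile(pooled, high)

def rescale(value, lo, hi):
    """Map value to [0, 1] using the shared stretch, clipping the tails."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

# Two toy tiles with very different value ranges.
tiles = [list(range(0, 50)), list(range(50, 100))]
lo, hi = layer_stretch(tiles)
print(lo, hi, rescale(49, lo, hi))
```

Because both tiles share the same stretch, a given pixel value is rendered identically wherever it appears in the layer, which avoids visible seams between tiles.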
I think that, apart from the info collected in this post, the additional important inputs are in the PRR feedback workshop MoM that @james.wheeler prepared and distributed.
I think that a lot of what we have here are requirements for APEx/GE rather than for data/algorithm providers, who are the main targets of the APEx ICG. It is therefore important to distinguish between what will be available in the future and what is already there now, i.e. what is concrete and can be communicated to SUP projects, also considering the SUP schedule.