
ESSI2.2 - Metadata, Data Models and Semantics

We have "born digital" - now what about "born semantic"?

Adam Leadbetter

...

While much effort has been put into creating and curating these digital data, there has been little work on using semantic markup of data from the point of collection – what we term “born semantic”.

In this presentation we report on two efforts to expand this area: QARTOD-to-OGC (Q2O) and SenseOCEAN. These projects have taken a common approach to “born semantic”:

  • create or reuse appropriate controlled vocabularies, published to World Wide Web Consortium (W3C) standards
  • use standards from the Open Geospatial Consortium’s Sensor Web Enablement (SWE) initiative to describe instrument setup, deployment and/or outputs using terms from those controlled vocabularies
  • embed URLs from the controlled vocabularies within the SWE documents in a "Linked Data" conformant approach (sketched in the example after this list)
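
To make the third bullet concrete, here is a minimal Python sketch (not taken from either project's deliverables) that builds an SWE-style output description carrying a controlled-vocabulary URI in the swe:Quantity "definition" attribute. The namespaces are reduced to the two needed here, and the NERC P01 concept URI is purely illustrative of the Vocabulary Server pattern.

  # Minimal sketch: embedding a NERC Vocabulary Server URI in an SWE output
  # description, as in the third bullet above. Illustrative concept URI.
  import xml.etree.ElementTree as ET

  SML = "http://www.opengis.net/sensorml/2.0"
  SWE = "http://www.opengis.net/swe/2.0"
  ET.register_namespace("sml", SML)
  ET.register_namespace("swe", SWE)

  output = ET.Element(f"{{{SML}}}output", {"name": "sea_water_temperature"})
  quantity = ET.SubElement(
      output,
      f"{{{SWE}}}Quantity",
      # The 'definition' attribute carries the Linked Data URI of the term.
      {"definition": "http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/"},
  )
  ET.SubElement(quantity, f"{{{SWE}}}uom", {"code": "Cel"})

  print(ET.tostring(output, encoding="unicode"))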

Q2O developed best-practice examples of:

  • SensorML descriptions of Original Equipment Manufacturers’ metadata (model characteristics, capabilities, manufacturer contact, etc.)
  • SensorML files for instrument set-up and deployment, and data-centre process lineage
  • the use of registered vocabularies to describe terms (including inputs, outputs, processes, parameters and quality-control flags)

...

The sensor descriptions are being profiled in SensorML, and the controlled vocabularies are being repurposed from those used within the European Commission SeaDataNet project and published on the community-standard NERC Vocabulary Server.

Comments

The method is applied to SeaBird CTD sensors, annotating (I assume the SensorML) via RDFa. The aim is to semantically annotate the observations on the basis of the sensor description, embedded directly in the sensor's firmware. The ultimate goal is to access the sensors as Linked Data nodes, an opportunity which (connectivity and bandwidth permitting) has always seemed to me the most natural deployment of sensors.
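
Following up on that comment, a hedged sketch of what "sensors as Linked Data nodes" looks like from the client side: dereference a term URI and walk the returned RDF. It assumes rdflib, network access and the NERC server's content negotiation; the concept URI is only an example of a resolvable Linked Data resource.

  # Dereference a vocabulary URI and inspect its SKOS labels; rdflib fetches
  # the URL and negotiates an RDF serialization. Illustrative URI.
  from rdflib import Graph
  from rdflib.namespace import SKOS

  g = Graph()
  g.parse("http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/")

  for s, _, label in g.triples((None, SKOS.prefLabel, None)):
      print(s, "->", label)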

A harmonized vocabulary for soil observed properties

Bruce Simons

...

However, observed property terms are often defined during different activities and projects in isolation from one another, resulting in data with the same scope being represented with different terms, using different formats and formalisms, and published through various access methods. Significantly, many soil property vocabularies conflate multiple concepts in a single term, e.g. quantity kind, units of measure, substance being observed, and procedure.

...

We have developed a vocabulary for observed soil properties by adopting and extending a previously defined water quality vocabulary. The observed property model separates these information elements, building on the Open Geospatial Consortium (OGC) Observations & Measurements model and extending the NASA/TopQuadrant ‘Quantities, Units, Dimensions and Types’ (QUDT) ontology. The imported water quality vocabulary is formalized using the Web Ontology Language (OWL). Key elements are defined as sub-classes or sub-properties of standard Simple Knowledge Organization System (SKOS) elements, allowing the use of standard vocabulary interfaces.
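
A minimal sketch of this pattern, assuming rdflib: one observed property whose quantity kind, unit, substance and procedure are kept as separate, explicitly labelled statements rather than conflated in one term. The ex: URIs and the facet names substanceOrTaxon and procedure are hypothetical placeholders; the SKOS and QUDT terms come from the standard namespaces.

  # Sketch: an observed soil property with separated facets, typed as a
  # SKOS concept so standard vocabulary interfaces still apply.
  from rdflib import Graph, Namespace, URIRef, Literal
  from rdflib.namespace import RDF, SKOS

  EX = Namespace("http://example.org/soil-prop/")          # hypothetical
  QUDT = Namespace("http://qudt.org/schema/qudt/")

  g = Graph()
  g.bind("skos", SKOS)
  g.bind("qudt", QUDT)

  prop = EX["pH-of-soil-in-water-suspension"]
  g.add((prop, RDF.type, SKOS.Concept))
  g.add((prop, SKOS.prefLabel,
         Literal("pH of soil (1:5 water suspension)", lang="en")))
  # Separate, explicitly labelled facets instead of one conflated term:
  g.add((prop, QUDT.hasQuantityKind, EX["pH"]))
  g.add((prop, QUDT.unit, URIRef("http://qudt.org/vocab/unit/PH")))
  g.add((prop, EX.substanceOrTaxon, EX["soil"]))           # hypothetical
  g.add((prop, EX.procedure, EX["water-suspension-1-to-5"]))  # hypothetical

  print(g.serialize(format="turtle"))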

...

By formalizing the model for observable properties, and clearly labelling the separate elements, soil property observations may be more easily mapped to the OGC Observations & Measurements model for cross-domain applications.

Comments

The focus is on soil properties, but I believe the approach can be carried over to the marine observations of Ritmare. The starting point is the O&M property-type model, distinguishing the different components within observed parameters (substance, property, unit, state, ...). The SKOS ontology is extended with the constructs defined by the QUDT ontology (quantities, units, dimensions, data types). For our purposes it might be sufficient to use the links between the various SeaDataNet vocabularies (P01, P02, ...).

Evaluation Methodology for UML and GML Application Schemas Quality

Agnieszka Chojka

The implementation of the INSPIRE Directive in Poland has caused a significant increase in interest in making spatial data and services available, particularly among public administration and private institutions.

...

The process of harmonisation requires either working out new data structures or adjusting the existing structures of spatial databases to INSPIRE guidelines and recommendations. Data structures are described with UML and GML application schemas. However, working out accurate and correct application schemas is not an easy task: many issues must be considered, for instance the recommendations of the ISO 19100 series of Geographic Information standards, the regulations appropriate to a given problem or topic, and production opportunities and limitations (software, tools). In addition, a GML application schema is deeply connected with its UML application schema, of which it should be a translation. Not everything that can be expressed in UML, though, can be directly expressed in GML, and this can significantly influence the interoperability of spatial data sets and thereby the ability to exchange valid data.
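
One practical building block for such an evaluation methodology, sketched here in Python with lxml (file names are placeholders): validating a GML instance document against its GML application schema and reporting any violations.

  # Validate a GML instance against a GML application schema (XSD).
  from lxml import etree

  schema = etree.XMLSchema(etree.parse("application_schema.xsd"))
  doc = etree.parse("dataset.gml")

  if schema.validate(doc):
      print("GML instance conforms to the application schema")
  else:
      # Report each schema violation with its location in the instance.
      for error in schema.error_log:
          print(f"line {error.line}: {error.message}")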

...

The principal subject of this research is to propose a methodology for evaluating the quality of the UML and GML application schemas prepared in the Head Office of Geodesy and Cartography in Poland within the INSPIRE Directive implementation works.

QualityML: a dictionary for quality metadata encoding

Miquel Ninyerola

...

In this direction, we have developed the QualityML (http://qualityml.geoviqua.org), a dictionary that contains hierarchically structured concepts to precisely define and relate quality levels: from quality classes to quality measurements. Generically, a quality element is the path that goes from the highest level (quality class) to the lowest levels (statistics or quality metrics). This path is used to encode the quality of datasets in the corresponding metadata schemas.

...

On the one hand, the QualityML is a profile of the ISO geospatial metadata standards, providing a set of rules for precisely documenting quality indicator parameters, structured in 6 levels. On the other hand, QualityML includes semantics and vocabularies for the quality concepts. Whenever possible, it uses statistical expressions from the UncertML dictionary (http://www.uncertml.org) encoding; however, it also extends UncertML to provide a list of alternative metrics that are commonly used to quantify quality. A specific example, based on a temperature dataset, is shown below. The annual mean temperature map has been validated with independent in-situ measurements to obtain a global error of 0.5°.

  • Level 0: Quality class (e.g., Thematic accuracy)
  • Level 1: Quality indicator (e.g., Quantitative attribute correctness)
  • Level 2: Measurement field (e.g., DifferentialErrors1D)
  • Level 3: Statistic or metric (e.g., Half-lengthConfidenceInterval)
  • Level 4: Units (e.g., Celsius degrees)
  • Level 5: Value (e.g., 0.5)
  • Level 6: Specifications: additional information on how the measurement took place, a citation of the reference data, the traceability of the process, and a publication describing the validation process, encoded using new ISO 19157 elements or the GeoViQua (http://www.geoviqua.org) Quality Model (PQM-UQM) extensions to the ISO models.
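
Restating the worked temperature example as a plain data structure makes the level hierarchy explicit; the values below are exactly those from the abstract.

  # The QualityML path for the temperature example, level by level.
  quality_element = {
      0: ("Quality class", "Thematic accuracy"),
      1: ("Quality indicator", "Quantitative attribute correctness"),
      2: ("Measurement field", "DifferentialErrors1D"),
      3: ("Statistic or metric", "Half-lengthConfidenceInterval"),
      4: ("Units", "Celsius degrees"),
      5: ("Value", 0.5),
      6: ("Specifications", "validation against independent in-situ measurements"),
  }

  for level, (name, value) in sorted(quality_element.items()):
      print(f"Level {level} ({name}): {value}")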

...

Enriching the Web Processing Service

Christoph Wosniok

...

However, modern use cases and whole workflow processes demand specifications for lifecycle management and service orchestration. Orchestrating smaller sub-processes is a step towards interoperability; comprehensive documentation using appropriate metadata is also required.


...

The RichWPS ModelBuilder enables the graphics-aided design of workflow processes based on existing local and distributed processes and geospatial services. Once tested by the RichWPS Server, a composition can be deployed on the RichWPS Server for production use. The ModelBuilder obtains the necessary processes and services from a directory service, the RichWPS semantic proxy. It manages the lifecycle and is able to visualize results and debugging information. One aim is to generate reproducible results; the workflow should be documented by metadata that can be integrated into Spatial Data Infrastructures. The RichWPS Server provides a set of interfaces to the ModelBuilder for, among other things, testing composed workflow sequences, estimating their performance and publishing them as common processes. The server is therefore oriented towards the upcoming WPS 2.0 standard and its ability to transactionally deploy and undeploy processes through a WPS-T interface. In order to deal with the results of these processing workflows, a server-side extension enables the RichWPS Server and its clients to use WPS presentation directives (WPS-PD), a content-related enhancement of the standardized WPS schema.
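
For orientation, a hedged sketch of talking to a WPS endpoint such as the RichWPS Server using OWSLib's generic WPS client. The URL is a placeholder, and the WPS-T deploy/undeploy and WPS-PD extensions described above go beyond what this generic client covers.

  # List the processes a WPS server advertises and inspect one of them.
  from owslib.wps import WebProcessingService

  wps = WebProcessingService("http://example.org/richwps/wps")  # placeholder
  print(wps.identification.title)

  for process in wps.processes:
      print(process.identifier, "-", process.title)

  # Inspect one composed workflow published as a common process:
  desc = wps.describeprocess(wps.processes[0].identifier)
  for inp in desc.dataInputs:
      print("input:", inp.identifier)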

...

Comments

The RichWPS platform aims to make WPS (2.0) accessible to "non-experts" and may offer interesting ideas for building a workflow engine for Ritmare (the platform should soon become open source). The workflow is defined in BPMN (I did not catch whether 1.0 or 2.0, although I think restricting ourselves to the constructs of version 1.0 may be sufficient for our purposes). The workflow description links the inputs and outputs of the individual components. NetCDF crops up everywhere, as usual.

No More Metadata!

Peter Baumann

For well-known, technologically motivated reasons, communities have developed the distinction between data and metadata, mainly because data were too big to analyze, and often too complex as well.

...

With the advent of Big Data technology we are in a position to overcome this age-old digital divide. Utilizing NewSQL concepts, query techniques go beyond the classical set paradigm and can also handle large graphs and arrays. Access and retrieval can be accomplished on a high semantic level.

In our presentation we show, using the example of array data, how the data/metadata divide can be effectively eliminated today. Queries combining metadata and ground-truth data retrieval will be shown in both SQL and XQuery.
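
As a hedged illustration of the array half of such a query (the endpoint and coverage name are placeholders, and this uses WCPS, the OGC array query language implemented by rasdaman, rather than the authors' exact SQL/XQuery syntax):

  # Send a WCPS query that filters on the data values themselves and returns
  # the matching array content; in the combined approach described above, an
  # expression like this would sit inside a metadata query (SQL or XQuery).
  import requests

  wcps_query = (
      'for c in (AvgTemperature) '      # placeholder coverage name
      'where some(c > 30) '             # predicate on the data, not metadata
      'return encode(c, "csv")'
  )

  resp = requests.post(
      "http://example.org/rasdaman/ows",  # placeholder endpoint
      data={
          "service": "WCS",
          "version": "2.0.1",
          "request": "ProcessCoverages",
          "query": wcps_query,
      },
  )
  print(resp.text)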

Comments

Personally, I think the distinction between data and metadata is not only due to the motivations given in the abstract; it is also functional to discovery approaches based on harvesting and brokering, which become respectively difficult and impossible when data and metadata are integrated.

I also believe that merging data and metadata cannot be generalized to all types of data. Finally, I suspect that any "semantic" approach to discovery already has a preferred format (RDF) and query language (SPARQL).

I could not find information on the Unified Coverage Model (UCM), which should be the starting point of this work. The approach uses Rasdaman; coverages (WCS) are treated as a "virtual document" in XML, stored in eXist (the leading XML database) and queried via XPath and XQuery.

Queries can also be executed as SQL, and they are working on integrating SPARQL.