CIESM International Conference on East - West Cooperation in Marine Science
(Sochi, 1-3 December 2014)

Abstracts of Panel communications


Panels:

Panel [A] - Physical processes in coastal waters
Panel [B] - Geo-hazards
Panel [C] - Invasive species
Panel [D] - Contaminants & marine litter
Panel [E] - Marine biotechnology & society
Panel [F] - Data harmonization

Panel abstracts

--------------------------------------------------------------------------------------------------------------------------------------
Panel [F] - Data harmonization

co-moderators : Drs Frank Oliver Gloeckner and Nikolai Mikhailov
--------------------------------------------------------------------------------------------------------------------

Title : Data Harmonisation across Disciplines – the Ocean Sampling Day as an Example
by Frank Oliver Glöckner
Max Planck Inst. & Jacobs Univ., Bremen, Germany

Summary :
Investigations in molecular biology have transitioned from single experiments to high-throughput endeavours spearheaded by genomic science. Although the genomic revolution is rooted in medicine and biotechnology, environmental studies, most notably those of marine ecosystems, currently deliver the highest quantities of data. New sequencing technologies provide an increasingly powerful resource to investigate microbial diversity and function at the “Omics” level. But the full potential of modern “Omics” investigations in the marine realm only unfolds by contextualising sequence data with environmental/oceanographic and biodiversity data. The challenge of this approach is to harmonize and integrate heterogeneous data sources across disciplines. A common understanding as well as proper standards are prerequisites for seamless data exchange.
The EU FP7 “Ocean of Tomorrow” project Marine Microbial Biodiversity, Bioinformatics, Biotechnology (Micro B3, www.microb3.eu) unites intensive oceanographic monitoring, thorough biodiversity studies and high-throughput DNA sequencing of marine genomes, metagenomes and pan-genomes. The project addresses interdisciplinary needs in marine ecosystems biology and biotechnology by considering established best practice within the disciplines and deriving practical, least-change means to align practices. As a proof of concept, the Ocean Sampling Day initiative (OSD, www.oceansamplingday.org) was established as part of Micro B3. OSD is a simultaneous sampling campaign of the world’s oceans, which took place for the first time on the summer solstice (June 21st) of 2014. In the run-up to the first official OSD in 2014, several pilot OSD studies were conducted to help establish the co-ordination (creation of the OSD sites network), logistics (sampling, shipping and processing), bioinformatics (metadata capture, standards, storage, analysis and data exchange) and policies (data policy for OSD, ABS/MTA/DTA). This standardized procedure ensures a high level of consistency between data points across all samples and researchers.
OSD led to the development of two standards to harmonize data: The M2B3 (Marine Microbial Biodiversity, Bioinformatics and Biotechnology) Reporting Standard (1) describes minimal mandatory and recommended contextual information for a marine microbial sample obtained in the epipelagic zone, (2) includes meaningful information for researchers in the oceanographic, biodiversity and molecular disciplines, and (3) can easily be adopted by any marine laboratory with minimum sampling resources. The M2B3 Service Standard defines a software interface through which these data can be discovered and explored in data repositories.
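As an illustration of what a reporting standard of this kind enables, the sketch below checks a sample record against a set of mandatory contextual fields. The field names are purely illustrative assumptions, not the official M2B3 terms, which should be taken from the published specification.

```python
# Hypothetical sketch of an M2B3-style contextual check for one marine
# sample; the field names below are illustrative, not the official schema.
MANDATORY = {"sample_id", "event_date", "latitude", "longitude",
             "depth_m", "water_temperature_c", "salinity_psu"}

def missing_fields(record: dict) -> list:
    """Return the sorted list of mandatory contextual fields absent from a record."""
    return sorted(MANDATORY - record.keys())

sample = {
    "sample_id": "OSD-demo-001",
    "event_date": "2014-06-21",
    "latitude": 43.58, "longitude": 39.72,
    "depth_m": 2.0,
    "water_temperature_c": 22.4,
    "salinity_psu": 17.9,
}
print(missing_fields(sample))  # an empty list means the record is complete
```

A shared, machine-checkable minimum of this kind is what allows samples taken by very different laboratories to be compared directly.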
In the end, 185 marine sampling sites participated in OSD 2014. In total, more than 1000 filters with biomass from marine samples from all continents were taken and shipped to Bremen for DNA extraction, sequencing and bioinformatics analysis. A repetition of OSD is currently planned for June 2015. These cumulative samples, related in time, space and environmental parameters, will now provide insights into fundamental rules describing microbial diversity and function, and contribute to the blue economy through the identification of novel, ocean-derived biotechnologies.

More information about OSD:
OSD Movie Teaser (1:30 minutes)
http://youtu.be/hBGOkB-EImc?list=PLgacjRIHqvMC39eKYdGH0HAM68YszmbuJ
OSD Movie (9 minutes) with subtitles in several languages
http://youtu.be/yUm7SsSe-cw?list=PLgacjRIHqvMC39eKYdGH0HAM68YszmbuJ




-----------------------------

Title : Universal database and services based on the integration of information for oceans
by Alexander Kobelev
All-Russia Research Inst. of Hydrometeorological Info., Obninsk, Russia

Summary :
To ensure the integrated provision of hydrometeorological information and information on marine activities to users, the integration of distributed and heterogeneous data is needed. Integration is implemented with the help of a unified data description model based on the ISO 19115/19139 standards, a unified vocabulary of parameters, and common codes and classifiers. Data held in various sources use various types of physical storage: factographic, object (images and documents), spatial, and service. There are various types of logical data presentation: points, profiles, and grids. The same data attributes may be presented in various units of measurement and numerical systems. In addition, data management is required when data are downloaded, processed and used. For all of these processes, data harmonization is necessary. It is implemented with the help of a universal database (UDB), developed as a result of data integration, which allows data to be presented in a unified form for the further generation of information products.
The UDB provides unified access to data and metadata. The set of environmental parameters may change with time; these changes should be traced automatically, and the UDB should be adapted in due time to ensure adequate data downloading. The UDB should have a data model that makes it possible to deal with any data being integrated, thanks to a flat data structure used for all types of data. The UDB should include a set of functions for preliminary data processing (a library of processing functions), such as conversion to unified units of measurement, filtration (e.g. by specific parameters), calculation of derived characteristics, integration of data from various sources, accumulation of data in time, data indexation, etc. These functions are included in the data processing algorithm and are reflected in the data life cycle. Implementation of the algorithm fixed in the data life cycle results in one or several derived tables of the database. The UDB also manages the process of updating services.
The UDB concept has been implemented within the framework of the Unified System of Information on the Global Ocean (ESIMO).




-----------------------------

Title : An abstract scheme for the provision of harmonized data products
by Giuseppe M.R. Manzella
ETT SpA, La Spezia, Italy

Summary :
Data or data sets from different sources can satisfy the needs of a variety of users. However, different applications place different requirements on the data, and it is not possible to state fitness for use in a common way. It is therefore necessary to state the ‘adequacy’ of data objectively, and since the quality of different datasets must be comparable, the quality description has to be in a standardized form. The principles on which data infrastructures are based (e.g. collect data once and use it many times; process and validate data at different levels; develop a user-driven decision-making process for priorities) pose some fundamental issues regarding the way they can serve different users for multiple uses, e.g.:
- Data can be used many times if they are fit for purpose;
- Different applications may need different precision or accuracy;
- In some specific applications, data can be fit for purpose even if precision/accuracy is not high;
- A synergistic use of data from different monitoring and collection programmes requires information on all the sources of errors;
- Analysis of data must also consider the errors associated with them.
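The last two points can be made concrete with a standard statistical device (not specific to any infrastructure named here): when the same parameter is measured by two programmes with different accuracies, inverse-variance weighting combines the values and also yields the error of the combined estimate.

```python
import math

def combine(values, sigmas):
    """Inverse-variance weighted mean of independent measurements of the
    same quantity, with the standard error of the combined estimate."""
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    sigma = math.sqrt(1.0 / sum(weights))
    return mean, sigma

# e.g. one temperature from a research-grade sensor (sigma 0.01) and one
# from an operational platform (sigma 0.1); the values are invented.
mean, sigma = combine([15.02, 15.30], [0.01, 0.1])
print(mean, sigma)  # the combined error is smaller than either input error
```

The point of the exercise: without the per-source error information listed above, this combination (and any judgement of whether it is meaningful) is impossible.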

ISO standards introduce the necessary elements into the abstract process that aims to assess ‘how’ and ‘how much’ data meet applicable regulatory requirements, and that aims to enhance user satisfaction. Before combining or integrating data from different registries, one should consider similarities and differences in data collection methods, data quality, spatial and temporal resolution, etc. It is crucial that the data be comparable and compatible, to avoid mistakes in analysis and interpretation. Some questions need to be addressed to determine whether and when data from different sources should be combined or compared:
- What about differences in data quality: How can these be measured and evaluated?
- What factors can affect data compatibility?
- How can one assess data comparability?
- When data are combined, what issues should be considered to determine whether the combined result is meaningful?

Even when data sets are compatible, they may require some conversion. For this reason it is necessary to introduce the concept of ‘data harmonisation’. The general advice on this issue is to seek ways to harmonise existing data sets that are apparently incompatible, as this is likely to be more cost-efficient than starting afresh and collecting new data. The process will require some transformation or manipulation of the data to meet a common standard.

Data harmonisation is then an element of a management scheme based on ISO standards that proceeds from data quality specifications to data product specifications, in order to support the provision of access to interoperable spatial data, through spatial data services, in a representation that allows it to be combined with other interoperable spatial data in a coherent way. The complete abstract scheme includes an evaluation of the data that assures relevance, reliability, fitness for purpose, adequacy, comparability and compatibility. Harmonisation is the further step that transforms data into interoperable data.




-----------------------------

Title : Harmonization of marine observation data and products as a "single window" for end users
by Denis Melnikov, N. Chunjaev, A. Vorontsov
All-Russia Research Inst. of Hydrometeorological Info., Obninsk, Russia

Summary :
The integration of distributed and inhomogeneous data yields a variety of data structures that are to be visualized by a minimum number of applications, without additional programming. Therefore, visualization tools are to be configured from the metadata attributes available in the data description. This allows data presentation templates to be chosen automatically according to metadata attributes such as the data storage system, observation/generalization frequency, space and time resolution, and platform type. Six templates are selected: time series, regular grid, profile, application, geo-service, and object file. The procedure of template selection is shown below.
If a resource has the following characteristics: regular time presentation (every three hours, every day, etc.); a fixed point or region as space resolution; a rigid platform; and the surface as vertical presentation, then we have a classic time series, which allows data presentation to be constructed as a map, graph or table.
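The selection procedure can be sketched as a simple rule cascade over metadata attributes. The attribute names and the order of the rules below are assumptions for illustration, not the ESIMO rule set.

```python
# Illustrative metadata-driven template selection; returns one of the six
# templates named above based on attributes from the data description.
def select_template(meta: dict) -> str:
    if meta.get("storage") == "geoservice":
        return "geo-service"
    if meta.get("storage") == "file":
        return "object file"
    if meta.get("grid") == "regular":
        return "regular grid"
    if (meta.get("time_presentation") == "regular"
            and meta.get("platform") == "rigid"
            and meta.get("vertical") == "surface"):
        return "time series"          # the classic case described above
    if meta.get("vertical") == "levels":
        return "profile"
    return "application"              # fallback: a dedicated application

meta = {"time_presentation": "regular", "platform": "rigid",
        "vertical": "surface", "space": "fixed point"}
print(select_template(meta))  # time series
```

Because the choice is driven entirely by metadata, a new resource with familiar attributes is visualized with no additional programming.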
Data visualization templates also make it possible to obtain fundamentally new information products. These products are derived by automatically combining, on the same graph, observed and forecast marine environment parameters for coastal stations obtained from different sources. The combination of similar-in-content data for integrated data presentation is an important function of data integration (e.g., the combination of original data obtained via GTS channels with observation platform data). Transmitting data (for example, information on marine hazards) to any web device also requires interoperability standards to be used.
The data harmonization solutions proposed are used in developing the data visualization application on the portal of the Unified State System of Information on the Global Ocean (http://esimo.ru).




-----------------------------

Title : Harmonisation of ocean data - approaches and implementation in Russia
by Nikolai Mikhailov
All-Russia Research Inst. of Hydrometeorological Info., Obninsk, Russia

Summary :
Marine data standardization and availability are of great importance for ocean research and marine operations. Data are demanded by numerous scientific and design organizations, authorities and different maritime activity stakeholders. Data have different levels of processing (observations, forecasts, climatic summaries, analyses) and contain a large number of marine parameters (at least 800). Most often, data are distributed geographically. It is necessary to exchange, and provide access to, large amounts of data from distributed sources of different types (databases, data files, GIS layers, geo-services and others). Methods and formats of data presentation also differ.
Data harmonization is a powerful mechanism to increase data accessibility while taking into account data specificity and related problems. The concept of "data harmonization" is rather complex, and no precise (conventional) definition of it exists. Most often, data harmonization means the possibility of combining data from heterogeneous sources into integrated and consistent information products, in a way that is transparent to the end user. When data are not harmonized, users have to spend a lot of time and other resources on data search and conversion.
Usually, the physical, logical and organizational aspects of data harmonization are discussed, depending on the methods applied. In Russia, targeted data harmonization was implemented in the Unified System of Information on the Global Ocean (ESIMO). The system provides the information and communication infrastructure for the integration of distributed and heterogeneous data provided by multi-discipline marine systems, and access to the integrated data on the basis of the “single window” principle.
The data harmonization solution is based on a number of components, such as a unified dictionary of parameters, metadata, a data model, and exchange standards for data and services. These components are implemented in a Web-based environment.




-----------------------------

Title : Problems and solutions for the interoperability of heterogeneous and distributed data related to the marine environment and marine activities
by Evgeny Vjazilov, S. Belov
All-Russia Research Inst. of Hydrometeorological Info., Obninsk, Russia

Summary :
To harmonize heterogeneous and distributed data at the level of integrated data processing, it is necessary to develop a unified vocabulary of parameters, to bring classifiers into a unified encoding system, to develop a wide range of metadata, and to ensure software interoperability for data integration.
The unified vocabulary of parameters will allow all attributes used in data to be brought into a unified naming system.
To bring the encoding system into unified classifiers, local codes are mapped to international and national codes. This makes it possible to use any encoding notation without any impact on the information interaction between systems and applications.
To integrate data, the concept of an information resource is used. An information resource is structured (databases and files) or unstructured (a document, a set of documents) data developed and prepared for distribution to an unlimited range of persons, or used as a basis for the provision of information services. The attributes of an information resource are homogeneity of the data structure, location in the same source (on the same carrier), and the same temporal-spatial resolution.
Including in the resource description formalized data on the level of processing, the form of presentation, the data storage system, the type of measurement platform and the spatial-temporal scales will make it possible in the future to build various data visualization templates. Data sources are presented in more detail through individual descriptions of instruments, expeditions, projects, maritime organizations and observation platforms.
To harmonize data through integration, the ISO 19115 metadata interoperability standard, the ISO 19100 series of standards, Open Geospatial Consortium standards, the international NetCDF format and XML technologies for intersystem data exchange are used to the maximum extent possible.
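As an illustration of the metadata side, the sketch below writes a much-reduced ISO 19115-style record as XML using only the Python standard library. The real ISO 19139 encoding uses the gmd:/gco: namespaces and a far richer element set, so the element names here are simplified assumptions.

```python
# Write a toy ISO 19115-style metadata record as XML; element names are
# simplified stand-ins for the full ISO 19139 gmd:/gco: encoding.
import xml.etree.ElementTree as ET

md = ET.Element("MD_Metadata")
ET.SubElement(md, "fileIdentifier").text = "demo-record-0001"
ident = ET.SubElement(md, "identificationInfo")
ET.SubElement(ident, "title").text = "Black Sea surface temperature, hourly"
bbox = ET.SubElement(ident, "geographicBoundingBox")
for tag, value in [("west", "27.5"), ("east", "41.8"),
                   ("south", "40.9"), ("north", "46.6")]:
    ET.SubElement(bbox, tag).text = value

xml_text = ET.tostring(md, encoding="unicode")
print(xml_text)
```

A record of this shape (identifier, title, bounding box) is the minimum a discovery service needs in order to answer "what data exist for this region?".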
Data harmonization based on the above approaches has been used for development of the Unified State System of Information on the Global Ocean (ESIMO), http://portal.esimo.ru.




-----------------------------

Title : A pan-European infrastructure for managing ocean and marine metadata and data
by George Zodiatis
SeaDataNet consortium, Univ. of Cyprus, Cyprus

Summary :
SeaDataNet (SDN) is the leading network in Europe actively developing and operating a pan-European infrastructure for harmonizing the management of marine metadata, ocean data and products, the use of common vocabularies, QC procedures, and standards for indexing and online access. The data originate from data acquisition activities by the institutions of all engaged coastal states, of which 21 are from the Mediterranean and 3 from Russia.
SDN continues and expands previous initiatives of the consortium since 2002, in particular Sea-Search, Black Sea Scene, Upgrade Black Sea Scene and several distributed data management structures developed during other EU projects. SDN is developing common regional products focusing on five regions: the Mediterranean, Black Sea, Baltic Sea, Barents Sea and North Atlantic.
The SDN consolidates and populates an array of directories of marine data & information resources such as :
- EDMED: Marine Environment Data sets dispersed in the scientific laboratories;
- ROSCOP/CSR: Cruises Summary Reports;
- EDIOS : Initial Observing Systems;
- EDMERP : Marine Environment Research Projects;
- EDMO: Marine Organizations.

An important service developed in SDN is the Common Data Index (CDI) data discovery & access service. It provides highly detailed insight into the availability and geographical spread of a large variety of marine and ocean data sets managed by the data holders/centres connected to the SDN infrastructure. Moreover, it provides a unique interface for requesting access and, if granted, for downloading data sets from the distributed SDN data holders/centres. The CDI service provides metadata and access to more than 1,600,000 data sets originating from more than 500 organisations in Europe, covering physical, chemical, biological, geological and geophysical data acquired in European waters and the global oceans. Already, more than 105 data centres from 34 countries are connected.
The latest developments of SDN in data harmonization include, among others :
- compliance with INSPIRE, implementing ISO 19139 for metadata and information;
- SDN NetCDF format definition and implementation;
- adaptation of the Ocean Data View (ODV) format to also manage biological data;
- linking of the IOC-IODE Ocean Data Portal and the GEOSS Portal to the SDN portal;
- improvement of the QC of the delivered data (duplicate checks, format checks, quality check loops), in cooperation with regional GOOS organisations.

Besides the technological developments, SDN enhances the quality and long-term safeguarding of the data through training and capacity building, to ensure a common level of expertise and practice in overall data management, and inter-compared basic tools for all data holders/centres. The data management performed by the SDN professional structure will avoid the loss of valuable observational data and provides easy and integrated access to it. Finally, the involvement of the main marine institutes that support these data holders/centres contributes to the sustainability of the SDN system, and can be enhanced further in the frame of a foreseen joint convention.




-----------------------------