k.LAB: a semantic web platform for science

This document is a comprehensive, stand-alone technical introduction to the k.LAB platform, targeted to a technically savvy readership. It is not a substitute for the k.LAB documentation and cannot be used to learn how to use k.LAB, from either an end-user or a developer perspective.

1. Introduction

Integrated modeling is a practice meant to maximize the value of scientific information by ensuring its modularity, reusability, interoperability and traceability throughout the scientific process. The open source k.LAB software, discussed here, is a full-stack solution for integrated modeling, supporting the production, curation, linking and deployment of scientific artifacts such as datasets, data services, modular model components and distributed computational services. The purpose of k.LAB is to ensure, by design rather than just by intent, that the pool of such artifacts constitutes a seamless knowledge commons, readily actionable (by humans and machines) through a full realization of the linked data paradigm augmented with semantics and powered by artificial intelligence. This design enables automation of a wide range of modeling tasks that would normally require human experts.

The k.LAB platform directly addresses the four FAIR principles (Findable, Accessible, Interoperable and Reusable), introducing innovation particularly in the practice of semantic annotation, which is reinvented as a modern, expressive approach meant to ease adoption by both providers and users of scientific knowledge. To the four dimensions in FAIR, k.LAB adds a reactivity dimension, in line with the original vision of a semantic web. Reactivity also enables knowledge to be deployed in an "internet of observations", creating live artifacts that can interact, improve and evolve as new information appears on the network.

The central service in the k.LAB modeling API wraps the resolution algorithm, which receives as input a logical query of the form "observe concept in context" (e.g., "observe change in land cover type in Colombia, 2015-2020", only slightly paraphrased from k.LAB’s near-natural language query formalism). In response, k.LAB assembles, documents, initializes and runs a computation (called a dataflow) that produces the observation of the concept that best fits the context, based on the integration of data and model components available in the distributed k.LAB network. The observations output by the API request, along with the dataflow assembled to generate them, are themselves scientific artifacts — automatically augmented with provenance records and user-readable documentation — that can be exported and curated as needed.

Artificial intelligence, driven by both semantics (machine reasoning) and the analysis of previous outcomes (machine learning), satisfies the request using a shared, communally owned and curated knowledge base (the worldview, a set of ontologies defining a common perspective for scientific observations) and the resource pool available at any given moment on the k.LAB network — ranking, selecting, adapting, and connecting data and model components made available by independent and uncoordinated providers. We refer to the process of building the computational strategy that "observes" a concept in a context as resolution, and to its execution to produce the desired result as contextualization. We also refer to the combination of both processes when we discuss the "resolution service" provided by the API.

This document describes the main principles and architecture of k.LAB. More detailed reference documentation for k.LAB is in development and will be referenced in this document where available.

2. Architecture of the k.LAB platform

The k.LAB software stack includes client and server components that support the creation, maintenance and use of a distributed semantic web platform where scientific information can be stored, published, curated and connected. The software is licensed through the Affero General Public License (AGPL) v.3.0; the core components are available as a single project in the k.LAB Git repository.

2.1. The software stack

Server components are deployed by certified partners to publish resources and semantic content (the k.LAB Node) and/or to provide modeling services and applications to online users (the k.LAB Engine). Published resources can include both static data and dynamic computations, both of which may be hosted in source form at the node or linked to external data (e.g. WCS, WFS, OpenDAP) or computational services (e.g. OpenCPU). The k.LAB Node software is deployed in containers that can be configured to host dedicated instances of GeoServer, PostgreSQL, Hyrax or other services; these are transparently managed through server adapters inside each node, virtually eliminating the need for specific training on those components for node administrators.
Client components are used by contributors to develop, validate and publish resources and semantic content (k.Modeler, an Integrated Development Environment (IDE) for semantic modeling), and by end users (k.LAB Explorer, a web-based application environment) to access modeling services and specialized applications built for the platform and delivered through the web.

Additional server components fill specific needs on the k.LAB network and are less commonly needed at partner sites. Among them, the following are noteworthy:

The hub server, k.Hub, manages authentication and organizes node access for authenticated engines. The Integrated Modelling Partnership manages a set of nodes and a main hub, and releases site certificates that enable nodes to be connected to form the platform. Partners that need to manage users locally may also deploy and connect a hub, although this is normally only convenient in large deployments.
A semantic server collects and indexes the semantic knowledge from the worldview and all public projects, constantly compiling and revising documentation and use cases to assist users in the semantic annotation process. Users can look up annotations made by others and access hyperlinked, evolving descriptions of each concept and predicate. The semantic server can be connected to the k.Modeler editor to provide inline validation of logical expressions in models being developed, and a suggestion service that can find and propose comparisons with use cases extracted from peer-reviewed public projects. Through the use of specialized metadata added to source files, the server can be integrated with the editors so that assistance is available directly, to ease the development of semantic content as much as possible. The semantic server is in development and is not available to the general public yet.

Other, less critical server components are in development and are not discussed in detail here. One example is a statistics server that collects anonymized information from successful and unsuccessful resolutions and processes it using machine learning techniques to improve the resolution algorithm.

2.2. The k.LAB logical layers

The set of active, connected nodes and engines at any given time forms what can be seen collectively as a distributed container, where scientific knowledge is found in three layers handling information at increasing levels of abstraction: the resource, semantic and reactivity layers. The first can be seen as a data curation platform based on modern linked data concepts, using a generalized and flexible data model. Semantic and reactive content for the platform is developed in the respective layers using two specialized languages, k.IM for semantic resources and k.Actors for reactive behaviors and applications. The k.Modeler IDE provides drag-and-drop interfaces to build resources and a specialized environment for k.LAB projects, easing the editing and debugging of k.IM and k.Actors code.

The resource layer provides a protocol for conventional data and computational resources or services to interoperate at the data level, matching identifiers, data types and metadata through a uniform API. Nodes and client applications include interfaces to manage development and submission of knowledge to the resource layer, to be published and curated either on-site or through hosting providers with full control of licensing and access.
The semantic layer provides a language that enables annotation of resources through semantically explicit logical expressions, ensuring findability, interoperability and accessibility through purely logical queries, validating consistency and producing mediation strategies through machine reasoning and logical inference. The semantic layer uses the k.IM language to specify semantic knowledge (compatible with the W3C-endorsed OWL 2) and models. These specifications, collected into namespaces and projects, can be deployed to k.LAB Nodes for the k.LAB inference engines to find, rank and use.
The reactivity layer provides behaviors for the scientific artifacts produced by running queries in the semantic layer, effectively turning observations into software agents. Such reactive observations can generate and react to events either locally (within the same session) or remotely. The reactivity layer enables distributed agent-based simulations and computations that automatically adapt to changing conditions or states. The k.Actors language is used to define behaviors for the reactivity layer. As a special case, behaviors bound to users and sessions can be used to quickly develop specialized interactive applications that run on the k.LAB Explorer platform, accessed through web browsers.

The separation of concerns and APIs in the three layers maximizes their value: for example, the resource layer can be seen through different semantics, therefore serving different purposes in different networks by reinterpreting it through the logical "lens" of a differently configured semantic layer.

2.3. Accessing the system

The k.LAB system can be accessed through (1) client software (usually an application running in a browser within the k.LAB Explorer web platform, or the k.LAB integrated development environment (IDE), k.Modeler) or (2) through its API by software applications. Providers of content may use the IDE or, in the near future, a content provider web interface available on all k.LAB Nodes, including of course any nodes deployed at the provider’s end. All users must be authenticated through a valid, secure certificate, which also establishes the semantic worldview of reference and any user permissions for the certificate holder. Content in all layers may be made available in public form or be linked to specific users or groups by its owners; access permissions are a mandatory part of metadata for all informational assets.

Regular users: Non-technical k.LAB users normally interact with the system through an instance of k.LAB Explorer exposed by a networked k.LAB Engine (or cluster thereof). The basic k.LAB Explorer is usable as a generic search-and-compute interface that allows users to easily set their context of observation to locations and times of interest. Queries are cached and suggestions are given based on the user’s groups and previous queries, providing an experience similar to modern search platforms. As k.LAB Explorer can be used as an application development platform (see further in this document), specific applications can be built on top of k.LAB Explorer and given a specialized access URL. Such applications, like the recently deployed ARIES for SEEA, look and feel like typical interactive web applications and can be developed and deployed with minimal effort to assist specific groups of users.
Content providers and modelers: The k.LAB Engine, a server-side component, can also be run at the client side in a local configuration, so that new content can be developed and tested in a sandboxed environment before publishing, with full access to public resources. Such client use is supported and facilitated by a small, downloadable control center application that removes the complexities linked to installing, upgrading, starting and stopping the engine or the k.Modeler IDE. At the time of this writing, the IDE remains the endorsed toolkit to prepare both semantic and non-semantic content for distribution and publish it to the network. In the near future, more direct pathways will be enabled so that data contributors can also provide content (particularly datasets) through less technical, web-based interfaces to be developed.
Applications and software: k.LAB provides a stable API for all its server components, specifically the authenticating hub, the nodes and the engines. This API is used by all the k.LAB client software but can be used independently to enact a "modeling as a service" paradigm whose primary service is the resolution algorithm. At the time of this writing, the API is mostly used through k.LAB’s own client software, but ongoing projects and collaborations point to a more widespread integration of k.LAB API services within external platforms and applications in the coming months. In addition to direct use of REST endpoints, served by engine clusters operated by the Basque Centre for Climate Change (k.LAB’s host institution) and partner institutions, client libraries for popular languages (Python, JavaScript, R) will be made available, based on demand, to ease integration with existing applications.

In addition to uploading content to existing nodes, institutional contributors can use the k.LAB Node software to deploy sites that contribute to the k.LAB network while retaining full control of all distribution details. Nodes are deployed as containers that can be easily set up and authorized by certified partners. k.LAB’s distributed paradigm supports an approach where (1) information remains under the ownership of its authoritative sources, (2) its availability and interoperability are maximized, and (3) it remains compatible with both public and commercial services, thanks to careful attribution of ownership and state-of-the-art encryption, access control and security.

3. The resource layer

The resource layer contains or provides access to all "conventional," non-semantic informational assets available to k.LAB — from raw datasets and bridges to external data services to algorithms expressed as mathematical equations or computable code.

The aim of the resource layer is to present common conventions and a consistent API for k.LAB to access and manage pre-existing data, models and services of all kinds, as a first layer of interoperability. The resource layer provides a generic protocol that can be adapted to any existing data source or service as well as databases and external computations; by contrast, the semantic layer specifies a language for interoperability. Because resources have no associated semantics, it is possible to reinterpret any resource through different semantics, enabling complete orthogonality between the resource layer and the semantic layer.

Importantly, computations can also live in the resource layer, ranging from simple equations to large and complex models. In fact, anything that takes inputs and produces outputs in numeric or other form (with no meaning explicitly attached beyond names and metadata) can be seen as a k.LAB resource. All resources are identified by a Uniform Resource Name (URN) which can be resolved, through the k.LAB API, to an informational record that contains all original metadata along with provenance information, history, and access permissions for the requesting user. Inputs, outputs and (in the case of resources that produce multiple objects) attributes are similarly identified by a name and a data type.

In normal k.LAB usage, resources are not used directly by external clients, although the resource API is open to authorized users and can be used as the base layer of a standard linked data platform. Instead, the resource URN is used in semantic models (see the next section) that in turn populate the search space for the semantically-driven resolution algorithm at k.LAB’s core. A model that references a URN to which the requesting user has no access is automatically deactivated and does not participate in resolution, allowing k.LAB’s resolver to continue resolving through another authorized strategy. All resources have data types (number, text, boolean, or probability distributions thereof) and a geometry, which defines the original representation of space and time, if any (in contrast, semantic assets have semantics and scale).
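The deactivation rule can be illustrated with a minimal Python sketch. The Model class, the active_models function and the example URN acme:terrain:dem:private30 are hypothetical and do not reflect k.LAB's actual (Java) implementation; the sketch only assumes that a model is usable when the requesting user can read every URN it references.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a semantic model that references resources."""
    name: str
    urns: Tuple[str, ...]  # resource URNs used by the model


def active_models(models: Iterable[Model], readable_urns: Set[str]) -> List[Model]:
    """Return the models whose every referenced URN is readable by the user.

    Models referencing an inaccessible URN are dropped (deactivated), so the
    resolver can only pick among the remaining, authorized candidates.
    """
    return [m for m in models if all(u in readable_urns for u in m.urns)]


candidates = [
    Model("elevation_private", ("acme:terrain:dem:private30",)),
    Model("elevation_public", ("im.data:geography:morphology:dem90",)),
]
allowed = {"im.data:geography:morphology:dem90"}
print([m.name for m in active_models(candidates, allowed)])  # ['elevation_public']
```

Because the private model is simply filtered out, resolution falls through to the public elevation model instead of failing.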

A resource URN is a fully specified identifier that uniquely references a resource in k.IM namespaces. It consists of 4 parts, separated by colons:

A node name (the name of the node where the resource was originally published);
A catalog (a logical space handled by the node, for example a domain such as hydrology, or a name describing a large data collection);
A namespace (a secondary logical space within the catalog);
A resource identifier.

The node name may consist of the reserved word local (identifying an unpublished local resource, see below) or klab to identify a "universal" resource with no associated network storage and handled directly by the engine. The catalog part of the URN denotes a specific software adapter (e.g. klab:random: would introduce a URN pattern that produces various types of random data for testing, defined by the namespace and identifier). The core resource API, exposed by both k.LAB Engine and k.LAB Node, provides (1) a URN resolution service (URN → resource metadata), (2) standard Create/Read/Update/Delete (CRUD) operations on the resource layer, and (3) the most important operation, contextualization, which takes as input a resource URN and a geometry specification and returns the data content of the resource adapted to the passed geometry. The contextualization return value is a flexible data structure (based on Google Protobuf) that allows efficient marshalling of zero or more objects, each with an internal structure that admits scalar or distributed values along grids or tessellations, conformant to the request geometry. The result, complete with metadata and provenance information but still devoid of semantics, is passed to the k.LAB runtime to be turned into observations within the execution of a k.LAB dataflow.
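To make the four-part structure concrete, here is a minimal Python sketch of a URN parser. The Urn class and parse_urn function are hypothetical illustrations, not part of the k.LAB API; treating everything after "#" as adapter parameters is an assumption based on the literal URN example given in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Urn:
    node: str         # node where the resource was originally published
    catalog: str      # logical space on the node (selects the adapter for klab URNs)
    namespace: str    # secondary logical space within the catalog
    resource_id: str  # identifier of the resource

    @property
    def is_local(self) -> bool:
        """Unpublished resource, visible only in the local workspace."""
        return self.node == "local"

    @property
    def is_universal(self) -> bool:
        """'Universal' resource handled directly by the engine."""
        return self.node == "klab"


def parse_urn(text: str) -> Urn:
    """Split a URN of the form node:catalog:namespace:id into its four parts.

    Anything after '#' is treated as adapter parameters and ignored here.
    """
    base = text.split("#", 1)[0]
    parts = base.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected node:catalog:namespace:id, got {text!r}")
    return Urn(*parts)


dem = parse_urn("im.data:geography:morphology:dem90")
print(dem.node, dem.catalog, dem.namespace, dem.resource_id)
# im.data geography morphology dem90
print(parse_urn("klab:osm:relations:park").is_universal)  # True
```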

Not all resources occupy physical storage on k.LAB nodes. In fact, k.LAB extends the notion of the URN to also encompass literals (e.g. model 100 as geography:Elevation in m, where 100 can be seen as a shorthand form of klab:literals:values.parsed:number#value=100) and specialized computational services which may simply serve as bridges to online services or computations (e.g. the URN klab:osm:relations:park would contextualize to all the parks stored as relations (polygons) in the OpenStreetMap service in the queried geometry).
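The literal shorthand can be illustrated by a round trip: a number is expanded into its URN form and then "contextualized" to a geometry. This is a hypothetical Python sketch; the geometry is reduced to a plain cell count, and the helper names (literal_urn, contextualize_literal) are illustrative assumptions, not k.LAB functions.

```python
from typing import List, Union

Number = Union[int, float]
LITERAL_BASE = "klab:literals:values.parsed:number"


def literal_urn(value: Number) -> str:
    """Expand a numeric literal into the long URN form described in the text."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("this sketch only handles numeric literals")
    return f"{LITERAL_BASE}#value={value}"


def contextualize_literal(urn: str, n_cells: int) -> List[float]:
    """Return the literal's value for each of n_cells in the requested geometry.

    Assumption: a literal has no spatial structure, so every cell gets the same value.
    """
    base, _, params = urn.partition("#")
    key, _, raw = params.partition("=")
    if base != LITERAL_BASE or key != "value":
        raise ValueError(f"not a numeric literal URN: {urn!r}")
    return [float(raw)] * n_cells


urn = literal_urn(100)
print(urn)                            # klab:literals:values.parsed:number#value=100
print(contextualize_literal(urn, 3))  # [100.0, 100.0, 100.0]
```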

3.1. Resource adapters

Diverse, extensible sourcing of information for resources is enabled through adapters, software plug-ins that adapt a specific data or service format to the API. The adapter identifier and parameters are specified in the metadata associated with the URN and used to select the methods for contextualization, import, export and indexing. Adapters are made available as k.LAB components, installable in k.LAB Engines and k.LAB Nodes, and can be extended by developers using the Java API to support formats and services not yet available. External APIs (e.g. datacubes) can be supported by deploying a bridge adapter to one or more k.LAB Nodes.
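Adapter selection can be pictured as a registry lookup keyed by the adapter identifier stored in the resource metadata. Real adapters are Java components; this Python sketch is only an analogy, in which the Adapter protocol, the RandomAdapter class, the "adapter" metadata key and the URN klab:random:values:uniform are hypothetical, and the geometry is simplified to a (rows, columns) grid shape. Import, export and indexing are omitted.

```python
import random
from typing import Dict, List, Protocol, Tuple

Grid = List[List[float]]


class Adapter(Protocol):
    """Hypothetical adapter contract; import, export and indexing are omitted."""

    def contextualize(self, urn: str, shape: Tuple[int, int]) -> Grid: ...


class RandomAdapter:
    """Toy adapter producing random values, in the spirit of klab:random URNs."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def contextualize(self, urn: str, shape: Tuple[int, int]) -> Grid:
        rows, cols = shape
        return [[self._rng.random() for _ in range(cols)] for _ in range(rows)]


# Registry of installed adapters, keyed by the identifier stored in resource metadata.
ADAPTERS: Dict[str, Adapter] = {"random": RandomAdapter(seed=42)}


def contextualize(urn: str, metadata: Dict[str, str], shape: Tuple[int, int]) -> Grid:
    """Select the adapter named in the resource metadata and delegate to it."""
    adapter = ADAPTERS.get(metadata["adapter"])
    if adapter is None:
        raise LookupError(f"no adapter installed for {metadata['adapter']!r}")
    return adapter.contextualize(urn, shape)


grid = contextualize("klab:random:values:uniform", {"adapter": "random"}, (2, 3))
print(len(grid), len(grid[0]))  # 2 3
```

Supporting a new format then amounts to registering another object with the same contextualize method, which mirrors the plug-in role adapters play in the resource layer.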

To date, adapters for many file formats (CSV, spatial raster and vector data, NetCDF), protocols (WCS, WFS, OpenDAP, SDMX) and specialized services (OpenStreetMap, weather station data bridging to multiple databases and sources) are available and others (such as RDF/SPARQL) are in development. Other adapters enable specialized services, like scale-dependent selection of hierarchically organized datasets such as administrative regions or river basins. URN parameters can be added to the base URN to trigger specialized processing at the node’s end, such as resolution-dependent simplification of polygons, selection of interpolation methods, or any other adapter-dependent option that will best suit the desired semantics.

3.2. Lifecycle of k.LAB resources

Resources start their life as local within a user project, imported from files or through a resource editor integrated with the k.Modeler client software. Such local resources go through a validation process, meant to ensure that the k.LAB runtime can perform any operations on the resource that may be required during contextualization: for example, spatial data must have proper projections and valid polygons throughout. When a local resource is accepted, it can be used inside the project that contains it or in any other project that shares the same local workspace, but it is not visible to other k.LAB users. Local resources may be sufficient for the needs of a specific, short-term project, but the natural lifecycle of a resource continues with publication, which makes it available across the k.LAB network.

Publication of a resource is conditional on further validation; resources with incomplete metadata, licensing or ownership information are not accepted by the software. Successful publication uploads the resource to the staging area of a chosen k.LAB node, where it can be made available for general use and further edited in-place by its owner. Every edit of a published resource creates a new version of the resource and full history is kept. Published resources are independent of projects and obtain a unique, persistent URN; the hosting k.LAB Node may optimize their data content for faster serving and automatically mirror the resource to other nodes for increased availability. The resource publisher may choose to make access private (the default), available only to selected users or groups of users, or publicly accessible. URNs can be resolved by any node or engine through a distributed resolution service, and used in k.IM models without the need for any registration or download, as long as the user is connected to the k.LAB network. A model that references a URN that the running user has no access to will be automatically deactivated and not participate in resolution.
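The local-to-staging lifecycle can be modeled as a small state machine. This Python sketch is hypothetical: the required field names, the state labels and the way versions are stored are assumptions chosen for illustration. Only the overall behavior (validation before publication, private access by default, one new version per edit after publication) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical metadata fields that must be present before publication.
REQUIRED_FOR_PUBLICATION = ("license", "owner", "description")


@dataclass
class Resource:
    name: str
    metadata: Dict[str, str]
    state: str = "local"            # "local" until published, then "staging"
    access: str = "private"         # private by default after publication
    urn: Optional[str] = None
    history: List[Dict[str, str]] = field(default_factory=list)

    @property
    def version(self) -> int:
        return len(self.history)

    def publish(self, node: str, catalog: str, namespace: str) -> str:
        """Validate metadata, move to staging and assign a persistent URN."""
        if self.state != "local":
            raise RuntimeError("resource is already published")
        missing = [k for k in REQUIRED_FOR_PUBLICATION if not self.metadata.get(k)]
        if missing:
            raise ValueError(f"cannot publish, missing metadata: {missing}")
        self.state = "staging"
        self.urn = f"{node}:{catalog}:{namespace}:{self.name}"
        self.history.append(dict(self.metadata))  # version 1
        return self.urn

    def edit(self, **changes: str) -> None:
        """Update metadata; once published, every edit records a new version."""
        self.metadata.update(changes)
        if self.state == "staging":
            self.history.append(dict(self.metadata))


res = Resource("dem90", {"owner": "Example Institute"})
try:
    res.publish("im.data", "geography", "morphology")
except ValueError as err:
    print(err)  # cannot publish, missing metadata: ['license', 'description']
res.edit(license="CC-BY-4.0", description="90 m digital elevation model")
print(res.publish("im.data", "geography", "morphology"))
res.edit(description="90 m DEM, void-filled")
print(res.version)  # 2
```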

At the time of this writing, the staging "tier" of the resource layer is the only one enabled in k.LAB Node software. It is envisioned that an iterative resource review process, operated with participants drawn from the scientific community, will be used to promote resources to higher-ranking tiers, the level of which may affect the resolution algorithm (which will also incorporate user feedback and machine-learned usage statistics as resources are used in models). This process may eventually involve the attribution of a DOI to resources, resolved both through standard DOI proxy servers and directly by k.LAB, in which case the DOI may replace the URN in semantic models.

4. The semantic layer: semantic modeling

Semantic modeling involves the semantic annotation of non-semantic resources based on a shared worldview (a logically organized knowledge base containing concepts and relationships). The linking of semantics to resource URNs is done in models, i.e. semantic annotations that specify the meaning associated with resources and, when applicable, with their inputs, outputs and attributes. A non-semantic resource can represent either data or computations, so k.LAB treats data annotations and semantically annotated algorithms uniformly; we thus use the term model to refer to both. The pool of models findable within a k.LAB session, organized into projects made available on k.LAB nodes, constitutes the semantic layer, which is searched by the resolution algorithm to resolve a logical query to a result artifact.

All semantic assets, from the knowledge base itself (concepts, relationships) to semantically annotated content (data, algorithms), are specified in the k.IM language. While the underlying knowledge model for k.IM is the W3C standard OWL 2 (to which all logical k.IM specifications can be translated), k.IM’s close resemblance to the structure of the English language makes it highly readable:

model occurrence of agriculture:Pollinator biology:Insect caused by earth:Weather
	observing
		earth:AtmosphericTemperature in Celsius named air_temperature,
		earth:SolarRadiation in J named solar_radiation
	set to [0.62 + 1.027 * air_temperature + 0.006 * solar_radiation];

In a departure from other ontology platforms, k.LAB specification of semantics admits logical expressions that combine predicates, operators and nouns in a fashion modeled on the grammar of the English language. For example, the k.IM statement im:Net value of ecology:Pollination (an observable expression, or observable for short) contains a predicate (im:Net) and a semantic operator (value of) that affects the meaning of the process concept ecology:Pollination and transforms it into the concept representing its quantifiable value. This linguistic articulation is key to the usability and parsimony of the underlying knowledge base, which can remain small and learnable thanks to the ability to combine and reuse terms and operators. At the same time, it supports the machine reasoning underlying the resolution algorithm, which can reason independently on the different logical dimensions of an observable and infer computations that would otherwise require specialized, ad-hoc modeling. When specific models for a complex observable are lacking, each of its logical dimensions may be resolved to one or more models that handle that specific component. The resulting set of models can then be used to assemble the best-case computation to produce the finished observation (with component models ranked for best fit to the context before the most appropriate one is selected). If desired, the resulting dataflow (algorithm) can be saved as a non-semantic resource for future reference and reproducible reuse in k.LAB.

The specialized k.IM editor provided with k.Modeler further facilitates the use and recognition of semantics by color-coding the fundamental classes of knowledge represented by concepts (blue for predicates, such as attributes, roles, realms or identities; green for quantifiable or categorizable qualities; brown, red, green/yellow, and yellow respectively for subjects, processes, events, and relationships) ^[1]. The editor is connected to the inference engine and further assists the modeler by checking the logical consistency of each observable as the user types and reporting any inconsistency as a syntax error. k.LAB models are typically very short, simple and easily readable. With few exceptions, each model resolves one observable expression (which follows the keyword model), with any required inputs stated merely as semantics (following the keyword observing). As a result, each model, by design, can be run and tested independently. For example, the model below

model occurrence of earth:Region with im:Still earth:PrecipitationVolume
	observing
		earth:Upstream im:Area in m^2 named contributing_area,
		geography:Slope in degree_angle named slope
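	set to [
		// convert slope from degrees to radians (1.570796 / 90 = pi / 180)
		def sloperadians = (slope * 1.570796) / 90;
		// topographic wetness index ln(a / tan(beta)); the offsets avoid log(0) and division by zero
		def twi = Math.log((contributing_area + 1) / Math.tan(sloperadians + 0.001));
		return normalize(twi, -3.0, 30.0);
	];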

requires observations of geographical slope and upstream drainage area as inputs to compute its output, a commonly used hydrological quantity (topographic wetness index) reinterpreted as a probability through the semantics of "occurrence of region with retained precipitation". None of the complex calculations required to compute the inputs needs to be part of the model, as their semantics (earth:Upstream im:Area and geography:Slope) are resolved at run time to the most appropriate model for the context when the primary observable is queried. The context can consist of a single point in space or of a gridded or polygon-based spatial coverage, without any modification to the model. If the context is temporally dynamic and the underlying state of a dependency (e.g. slope) changes in time, the k.LAB runtime will automatically notice the change and recompute the output, unless a specific model of change in occurrence of earth:Region with im:Still earth:PrecipitationVolume (a process affecting the quality after the change in operator) can be resolved in the context. When the model logics require that certain dependencies are satisfied in a specific way, scoping rules in k.IM can be used to ensure that specific models (or a specified subset of models) are chosen to satisfy the desired dependencies. It is also possible to use (libraries of) non-semantic models to refer to specific computations whose semantics are not deemed worth exposing, ensuring linkage with conventionally used metrics without sacrificing modularity or requiring overly difficult semantic characterization.
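The arithmetic in the wetness model can be checked outside k.LAB with a plain Python transcription. It applies the tangent once, to the slope expressed in radians, as in the standard definition TWI = ln(a / tan β). The sketch assumes that normalize(x, lo, hi) linearly rescales the interval [lo, hi] to [0, 1] and clamps values outside it; the actual semantics of that k.IM function may differ.

```python
import math


def topographic_wetness_index(contributing_area_m2: float, slope_deg: float) -> float:
    """TWI = ln((a + 1) / tan(beta + 0.001)), with slope beta converted to radians.

    The +1 and +0.001 offsets mirror the k.IM model and avoid log(0) and
    division by zero on flat cells. Valid for slopes below about 89.9 degrees.
    """
    slope_rad = math.radians(slope_deg)
    return math.log((contributing_area_m2 + 1) / math.tan(slope_rad + 0.001))


def normalize(x: float, lo: float, hi: float) -> float:
    """Assumed behavior: rescale [lo, hi] linearly onto [0, 1], clamping outliers."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def wetness_occurrence(contributing_area_m2: float, slope_deg: float) -> float:
    """Probability-like 'occurrence of region with retained precipitation'."""
    return normalize(topographic_wetness_index(contributing_area_m2, slope_deg), -3.0, 30.0)


# Flatter cells with larger drainage areas are more likely to retain water.
print(round(wetness_occurrence(10_000, 2), 3))
print(round(wetness_occurrence(10_000, 20), 3))
```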

In many situations, models can be written independently of the specific spatial and temporal context in which they will be run, and often even in ways that are compatible with different interpretations of space and time (e.g. with regular or irregular subdivisions). When desired, language constructs can be used to lock a model or namespace so that it can only be applied to specified representations or ranges of extents and/or resolutions in both space and time, as well as to override the priorities in the resolution algorithm to handle any special need of the models or of the resources they use. Negotiation of inputs, outputs, data format, units or currencies, visualization and contextual validation are by default left to the k.LAB runtime. Writing models this way enforces a strict coding discipline and maximizes clarity, readability and parsimony: contributors only write the core of the algorithm that leads to one specific observation, leaving every other aspect (including the selection and computation of any inputs) to the resolver and the k.LAB runtime.

4.1. Semantic mediation and inference in support of modeling

In simple cases, the query "observe observable in context" is answered by locating a model annotating a data source as an observation of the specified observable. For example, setting the context to a geographical region (e.g. a country’s extent, represented by a 100 m spatial grid, with a temporal context such as the year 2010) and querying an observable such as geography:Elevation in m may retrieve, among others, the following model:

model im.data:geography:morphology:dem90 as geography:Elevation in m;

which annotates a network-available resource specified by the URN im.data:geography:morphology:dem90 as an observation of the geography:Elevation concept. The URN gives access to metadata including the original spatial and temporal coverage and resolution, through which the model, whose semantics is identical to the query’s, can be ranked for match to the context. If the model is deemed to be the best match, the k.LAB Engine will translate it into a set of processing steps (in this case simply a resource retrieval operation plus any necessary mediation) and pass the resulting dataflow to the runtime to compute and produce the resulting observation, in this case a raster map of elevation, with 100 m resolution, reflecting the boundaries and time of the context. The dataflow will include any necessary reprojection, resampling, or unit transformation to match the query and the context. Other models may compete for the choice, which is made on the basis of criteria such as resolution and extent match, specificity and semantic match, and possibly peer review results or usage feedback for the original data. If the chosen model only partially covers the context, additional models may contribute to its complete characterization, as long as they specify a compatible observable and their ranks are close enough.
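The ranking and gap-filling behavior can be sketched in Python. Everything here is a hypothetical simplification: the context is a set of grid cell IDs, only extent coverage and resolution match are scored, and the 0.6/0.4 weights and the tolerance threshold standing in for "close enough" ranks are arbitrary assumptions rather than k.LAB's actual criteria.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    name: str
    cells: FrozenSet[int]  # context cells covered by the model's resource
    resolution_m: float    # native resolution of the resource


def score(c: Candidate, context: FrozenSet[int], target_m: float) -> float:
    """Hypothetical rank: weighted extent coverage plus resolution match."""
    coverage = len(c.cells & context) / len(context)
    # 1.0 when native and requested resolutions are equal, smaller as they diverge
    res_match = min(c.resolution_m, target_m) / max(c.resolution_m, target_m)
    return 0.6 * coverage + 0.4 * res_match


def select(candidates: List[Candidate], context: FrozenSet[int],
           target_m: float, tolerance: float = 0.2) -> List[Candidate]:
    """Pick the best model, then add close-ranked models until the context is covered."""
    ranked = sorted(candidates, key=lambda c: score(c, context, target_m), reverse=True)
    if not ranked:
        return []
    best = score(ranked[0], context, target_m)
    chosen: List[Candidate] = []
    covered: set = set()
    for c in ranked:
        if best - score(c, context, target_m) > tolerance:
            break  # remaining models rank too far below the best one
        new_cells = c.cells & (context - covered)
        if new_cells:
            chosen.append(c)
            covered |= new_cells
        if covered >= context:
            break  # context fully characterized
    return chosen


context = frozenset(range(100))  # a 100-cell grid at 100 m
models = [
    Candidate("regional_100m", frozenset(range(70)), 100.0),
    Candidate("basin_90m", frozenset(range(50, 100)), 90.0),
    Candidate("global_1km", frozenset(range(100)), 1000.0),
]
print([c.name for c in select(models, context, 100.0)])  # ['regional_100m', 'basin_90m']
```

In this toy setup the finer regional model wins despite partial coverage, and the close-ranked basin model fills the remaining cells before the coarse global model is considered.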

Besides such simple and direct matches, machine reasoning backed by an observation-centered ontological framework can enable more sophisticated observation tasks that do not correspond to readily available annotations and are normally only possible through specialized, time-consuming work. In a straightforward example, attributes such as im:Normalized may be prepended to another observable to reinterpret the result. The attribute would be resolved to an independent model (model im:Normalized using <normalization function>), possibly restricted to certain classes of observables (e.g. model im:Normalized of im:Quantity … to restrict its application to numerically quantifiable observables); if a query for im:Normalized geography:Elevation cannot be resolved directly, this model can be applied to modify a plain observation of geography:Elevation. More interestingly, resolution strategies may cross inherency barriers to infer the best observation strategy when a direct match is not available. For example, a hypothetical query for (ecology:AboveGround ecology:Biomass) of biology:Eucalyptus biology:Tree ^[2], operated in the same country context, would refer, by virtue of the inherency operator of, to a quality (above-ground biomass) inherent not to the context itself but to a particular subset (Eucalyptus) of the observations of a secondary subject (Tree) located in the primary context of the query (a geographical region). It would be resolved by the following strategy:

1. Locate a model for the original observable, (ecology:AboveGround ecology:Biomass) of biology:Eucalyptus biology:Tree, that is compatible with the context of observation. If one is found, resolve using it; otherwise:
2. Locate a model of the inherent subject, biology:Eucalyptus biology:Tree. If found, accept it as the strategy to instantiate an observation for each Eucalyptus tree in the region, so that a model of (ecology:AboveGround ecology:Biomass) can later be resolved in the context of each tree. If a "Eucalyptus tree" model cannot be resolved:
3. Locate a model capable of instantiating every biology:Tree in the region. If found, locate a classifier model capable of either (a) checking whether each tree is a Eucalyptus (model biology:Eucalyptus of biology:Tree), or (b) attributing the abstract identity (biology:Species) of which biology:Eucalyptus is a subclass (model biology:Species of biology:Tree). This model is applied to classify the tree observations, keeping only those classified as Eucalyptus.
4. If Eucalyptus trees are resolved successfully through either strategy (2) or (3), locate a model of (ecology:AboveGround ecology:Biomass) for each tree to compute the biomass in its context. If successful, insert a dereifying operation to complete the observation, turning the "above ground biomass" values observed in the context of each tree into the quality "above ground biomass of Eucalyptus tree" observed in the context region.
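The four-step strategy can be summarized as a planning function. The sketch below is a hypothetical illustration, not k.LAB code: the catalog is a plain set of strings naming available models, and the plan is returned as human-readable steps. The `dereify` helper sums per-instance values into grid cells, which suits an extensive quantity such as biomass; that aggregation rule is an assumption made for this example.

```python
def plan(quality, identity, abstract_identity, subject, catalog):
    """Return hypothetical plan steps for '(quality) of identity subject', or None.

    catalog: set of strings naming the models available in the network.
    """
    inherent = f"{identity} {subject}"
    full = f"({quality}) of {inherent}"
    if full in catalog:                      # step 1: direct match
        return [f"resolve {full}"]
    if inherent in catalog:                  # step 2: instantiate the inherent subject
        steps = [f"instantiate {inherent}"]
    elif subject in catalog:                 # step 3: instantiate, classify, filter
        classifiers = [f"{identity} of {subject}", f"{abstract_identity} of {subject}"]
        found = next((c for c in classifiers if c in catalog), None)
        if found is None:
            return None
        steps = [f"instantiate {subject}", f"classify with {found}", f"keep {identity} only"]
    else:
        return None
    if quality not in catalog:               # step 4 needs a per-instance quality model
        return None
    return steps + [f"resolve ({quality}) in each instance", f"dereify into {full}"]


def dereify(per_instance, cell_of):
    """Sum an extensive per-instance value into the context cell holding each instance."""
    totals = {}
    for instance, value in per_instance.items():
        cell = cell_of[instance]
        totals[cell] = totals.get(cell, 0.0) + value
    return totals
```

If the catalog holds only a generic tree model, a species classifier and a biomass model, the plan falls through to step 3 and ends with the dereification step.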

Similar reasoning strategies can be applied to diverse situations, using semantic inference driven by the phenomenological understanding of the entities involved and of the observation process applied to them. For example, a query for presence of biology:Tree that cannot be directly resolved could be satisfied by a model of (ecology:AboveGround ecology:Biomass) of biology:Tree: because biomass (an im:Mass in a higher-level ontology) is an extensive property, a non-zero value implies the existence of its inherent subject. Presence can then be computed as a true/false value attributed to the context, true wherever the biomass of any tree is non-zero. In another commonly encountered use case, qualities that can only be correctly computed in specific contexts (for example hydrological qualities such as "upstream area", which only produce correct results when computed in a correctly delineated river basin) can be automatically computed in arbitrary contexts. To do so, k.LAB first looks up a model to delineate all the relevant contexts (river basins) intersecting the requested context, then applies the necessary models to compute the qualities within each of them, and finally redistributes the values over the desired context. Such behavior can be automated using a concept definition such as

area ContributingArea
	is earth:Upstream im:Area within hydrology:RiverBasin;

which incorporates the "natural" context RiverBasin in the semantics of a new quality; alternatively, and more correctly if the RiverBasin context is required only because of modeling constraints and not directly implied by the concept, the observable can be left unconstrained and models can be defined as

model earth:Upstream im:Area within hydrology:RiverBasin
	....;

In either case, the within operator mandates a RiverBasin context for the quality, which will trigger the distributed resolution process described previously whenever the observable is queried in any context where river basins can be observed. The same considerations hold for more complex observables such as processes, which can affect the value of qualities through time and generate events or other objects; these, in turn, can become the context for other qualities or processes. The ability to automatically negotiate mediations based on inherency and phenomenological reasoning greatly improves the capability of connecting diverse models consistently, offering integration possibilities far beyond those allowed by the mere semantic matching of observables to models. Conventional approaches to such tasks require substantial planning, technical expertise and time.
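Why such qualities need their natural context can be shown with a toy flow network. In the hypothetical sketch below, each cell points to the cell it drains into (a simplified stand-in for a flow-direction grid), and upstream area is the total area of the cells draining through each cell. Computing it over the whole basin and then keeping only the requested cells gives the correct values, while clipping the network to the requested context first silently drops every upstream contribution from outside it. The function names are illustrative and are not part of k.LAB.

```python
def upstream_area(flows_to, cell_area=1.0):
    """Area draining through each cell, the cell itself included.

    flows_to: dict mapping each cell to the cell it drains into (None at the outlet);
    the network is assumed to be acyclic.
    """
    upstream = {cell: [] for cell in flows_to}
    for cell, down in flows_to.items():
        if down is not None and down in upstream:
            upstream[down].append(cell)
    memo = {}

    def area(cell):
        if cell not in memo:
            memo[cell] = cell_area + sum(area(u) for u in upstream[cell])
        return memo[cell]

    return {cell: area(cell) for cell in flows_to}


def within_basin(flows_to, context, cell_area=1.0):
    """Correct: compute over the whole basin, then keep the requested cells."""
    full = upstream_area(flows_to, cell_area)
    return {cell: full[cell] for cell in context if cell in full}


def clipped_first(flows_to, context, cell_area=1.0):
    """Incorrect: clip the network to the context first, losing upstream inflow."""
    clipped = {c: (d if d in context else None) for c, d in flows_to.items() if c in context}
    return upstream_area(clipped, cell_area)
```

With a four-cell chain in which a drains to b, b to c and c to the outlet d, requesting only c and d yields upstream areas of 3 and 4 cells when computed within the basin, but 1 and 2 when the network is clipped first.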

Much of k.LAB’s power comes from the fact that component models pertaining to the different aspects of a larger modeling problem may be provided and shared by independent experts, with no need for any coordination beyond adherence to the same worldview. Each new model can serve multiple purposes and, by interacting with the knowledge already on the platform, does not just add to its value but multiplies it, just like words in natural language. The power of the resulting paradigm shift becomes obvious when the problem area addressed by modeling spans multiple disciplines, expertise types and languages, emphasizing the importance of a collaboratively built and endorsed worldview.

4.2. The worldview

Both annotation and inference, as described above, require a common set of ontologies that define the realm of knowledge that can be integrated and that conform to the foundational principles of k.LAB’s observational model. We refer to this set of ontologies as the worldview: a set of k.IM projects that are automatically synchronized to all users who adopt it. A worldview is linked to each user profile and to the certificate that connects each k.LAB Node to the k.LAB network; only engines and nodes that adopt the same worldview as the user’s are visible in that user’s k.LAB session.

Because a worldview is meant to describe observation of reality, not reality itself, it is naturally aware of scale; its semantics differentiates observables not only by phenomenological nature but also by the nature of the observation process applicable to them. For example, k.LAB distinguishes events from processes, a distinction that has no real ontological rationale (and does not exist in ontologies such as BFO). Yet this distinction is fundamental from an observational perspective, as events are countable entities that must be instantiated, producing zero or more independent observations, prior to resolution (while only one instance of the same process may exist within the subject that provides a context for it). The range of scales of observation is key to the compatibility of worldviews: a single worldview can easily address the wide range of problems that are "visible" at the scale of observation of a human observer, encompassing for example economic, ecological and social phenomena. However, it would be difficult to maintain meaning if that same worldview were also used to annotate problems at extremely small or large observational scales (relevant to, e.g., quantum physics or general relativity, respectively).

The development of a worldview is a large collaborative endeavor, whose success is essential to realizing the k.LAB paradigm in full. To date, only one worldview (the im worldview, for Integrated Modeling) is being developed, initially by the k.LAB team and an extended group of collaborators. This worldview currently consists of Tier 1 namespaces, covering a set of disciplinary realms with only enough detail to enable k.LAB’s current applications. As applications of k.LAB grow, a process for the collaborative development, versioning and maintenance of the Tier 1 im worldview will become increasingly important. Tier 2 namespaces will be defined to specialize and add semantic detail to the corresponding Tier 1 namespaces: for example, the Tier 1 hydrology namespace will be complemented by a project containing hydrology.xxx namespaces for each field of hydrology needed by specialized applications. Such Tier 2 projects will be tied to user groups that each user can opt into through their user profile on the k.LAB hub, so that those users can automatically access any projects and models that require Tier 2 concepts to be understood by the system. This modular approach will enable specific user groups to control the development of the needed terminology while remaining compatible with the core Tier 1 concepts, and allow a scaled and coordinated development of the knowledge base without overwhelming users who do not need highly domain-specific detail. The semantic server described in the introduction will recognize the user groups and provide suggestions for annotations matching the user’s chosen areas of expertise and level of detail.

4.2.1. Authorities

Providing semantics for identities such as taxonomic or chemical species presents a special challenge, as their number is virtually infinite. As a result, most commonly used ontologies (such as those in the OBO Foundry) resort to providing only the identities most likely needed by their communities of reference, which cannot address all use cases with full generality; even importing specialized ontologies (such as ChEBI for chemical identities) risks overwhelming the inference engine with too many (and still often not enough) concepts, or creating unnecessary incompatibility when equivalent concepts from different ontologies are used. In k.LAB, this problem is averted through authorities, a mechanism to interface with external vocabularies that enjoy broad community acceptance, fully integrating them into the k.IM language and the resulting ontologies. Contributors and users see such vocabularies as externalized namespaces. In the k.IM language, an authoritative identity is specified with a form like IUPAC:water, easily distinguished from other concepts by its uppercase namespace identifier (a regular concept has a lowercase namespace, e.g. geography:Slope). Using an authority in k.IM triggers validation of the concept identifier (water) through an online service tied to the authority (IUPAC) and advertised by nodes in the k.LAB network. Upon successful validation, an identity concept is produced whose definition is identical and stable at all points of use. This mechanism enables the externalization of large vocabularies (e.g. the IUPAC catalog of chemical species or the GBIF taxonomy identifiers) and of structured specification conventions (e.g. the World Reference Base for soil types), which are validated and turned into stable, k.LAB-aligned semantics at the moment of their use.
Another advantage of many authorities is their flexibility of specification: for example, IUPAC:water and IUPAC:H2O are valid identifiers that can be used in k.IM observables as written, and translate to the same concept (the chemical identity corresponding to water, encoded internally as the standard InChIKey) using an IUPAC-endorsed catalog service provided by the U.S. National Institutes of Health. The k.LAB stack provides content contributors with an assisted search interface and intelligent editor support with inline, "as-you-type" validation and documentation. The currently supported authorities include IUPAC, GBIF, the World Reference Base soil classification formalism, and the set of UN-endorsed statistical classifications provided through the FAO CALIPER service (the latter in development at the time of this writing).
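The validate-then-canonicalize behavior of authorities can be sketched as follows. The catalog here is a hypothetical in-memory stand-in for the online service that a node would advertise; the only real datum in it is the standard InChIKey for water. Case-insensitive matching of identifiers and the `AUTHORITY:key` output format are assumptions made for illustration.

```python
import re

# Hypothetical in-memory stand-in for an authority's online catalog service.
# Accepted spellings map to one canonical key; the value is water's standard InChIKey.
MOCK_IUPAC_CATALOG = {
    "water": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
    "h2o": "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
}

# Authority references use an uppercase namespace, e.g. IUPAC:water (geography:Slope does not match).
AUTHORITY_REF = re.compile(r"^([A-Z][A-Z0-9_]*):(\S+)$")

_resolved = {}  # cache: (authority, normalized identifier) -> identity


def resolve_identity(ref, catalogs):
    """Validate 'AUTHORITY:identifier' and return a stable identity string."""
    match = AUTHORITY_REF.match(ref)
    if match is None:
        raise ValueError(f"not an authority reference: {ref}")
    authority, identifier = match.groups()
    key = (authority, identifier.lower())  # case-insensitive matching is an assumption
    if key not in _resolved:
        catalog = catalogs.get(authority)
        if catalog is None:
            raise ValueError(f"unknown authority: {authority}")
        canonical = catalog.get(key[1])
        if canonical is None:
            raise ValueError(f"{authority} does not recognize {identifier!r}")
        _resolved[key] = f"{authority}:{canonical}"
    return _resolved[key]
```

Because results are cached per authority and normalized identifier, repeated uses of an identifier return the same identity, and different accepted spellings converge on one canonical key, mirroring the stability requirement described above.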

4.3. Learning models

An important part of modeling is the adaptation of a computation to known data, so that it can best reproduce a known output from a known set of inputs, in order to increase confidence in predicted results when the model is run with unknown inputs. The main use cases for this activity are machine learning, which iteratively trains a statistical model until it produces the best fit to known data, and model calibration or data assimilation, used to find the optimal parameterization of mechanistic models. No modeling platform would be complete without addressing these "learning" capabilities. In k.LAB, model learning exploits the separation of the resource and semantic layers and the ability to find both inputs and outputs by resolving semantics. Models introduced by the keyword learn instead of model resolve their outputs in addition to their inputs, producing a computable resource with a specified URN, independent of semantics, by means of a specialized function connected to the k.LAB runtime (a contextualizer, specified after the using keyword). The resulting resource contains the trained computation, ready for future reuse by providing inputs through a model. For example, a minimal Bayesian suitability model to inform a land cover change model could use the following specification:

learn landcover:LandCoverType
	observing
		@predictor distance to infrastructure:Highway,
		@predictor distance to earth:Waterway,
		@predictor distance to earth:Coastline,
		@predictor geography:Slope,
		@predictor geography:Elevation,
		@predictor count of demography:HumanIndividual,
		@predictor earth:AtmosphericTemperature in Celsius
	using im.weka.bayesnet(resource = luc.suitability);

The function call following the keyword using invokes a learning process from the Weka software integrated in k.LAB, in this case a Bayesian network learner. When run in a spatially distributed context, the above model will resolve both the output (land cover type) and all predictors in the context of execution, sample them to produce a training dataset, and pass the latter to Weka to build and train a Bayesian network. The trained model is, in turn, used to produce the luc.suitability local resource (using the Weka adapter) in the same project where the learning model is found. An interpolated map with the model’s prediction, along with a report including all metrics of fit, is also produced to facilitate the evaluation of results. The trained Bayesian network can be modified and retrained as needed using Weka as integrated with k.LAB. When satisfactory results are obtained, the trained model can be used for prediction through the URN of the trained resource:

model luc.suitability as landcover:LandCoverType
	observing
		distance to infrastructure:Highway,
		distance to earth:Waterway,
		distance to earth:Coastline,
		geography:Slope,
		geography:Elevation,
		count of demography:HumanIndividual,
		earth:AtmosphericTemperature in Celsius;

The above model uses the trained Bayesian classifier to produce probabilistic predictions of land cover type. With probabilistic resources such as this, an uncertainty map can also be obtained, if desired, by adding the uncertainty concept corresponding to the main output (using the uncertainty of semantic operator) as a secondary output:

model luc.suitability as landcover:LandCoverType,
		uncertainty of landcover:LandCoverType
	observing
		....
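This section does not specify how k.LAB quantifies the uncertainty of a categorical prediction. One common measure, used here purely as an assumption, is the Shannon entropy of the predicted class distribution divided by its maximum, log(k) for k classes, so that 0 means a certain prediction and 1 means all classes are equally likely. The helper names are hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a class distribution, scaled to [0, 1].

    probs: dict mapping class -> probability (summing to 1).
    0 means the prediction is certain; 1 means all classes are equally likely.
    """
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return h / math.log(k)


def uncertainty_map(prob_grid):
    """Apply normalized_entropy to every cell of a grid of class distributions."""
    return [[normalized_entropy(cell) for cell in row] for row in prob_grid]
```

Normalizing by log(k) keeps values comparable between outputs with different numbers of classes.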

Similar considerations apply to other learning algorithms, such as those found in the rest of the Weka platform or in other machine learning platforms like Google’s TensorFlow. The resource containing the trained model will link its inputs by name and data type, and can be published to a node for remote execution by any user, just like any other resource. The same approach applies to the prediction of qualities within countable entities (subjects, events, relationships) that are part of the context: a classifier is trained using each instance and its attributes as a training sample, instead of sampling a distributed dataset as in the example above.
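What a learn model does with its resolved inputs when sampling a distributed dataset can be illustrated with a small, self-contained sketch. It does not call Weka: as a plainly named substitute for the Bayesian network learner, it trains a Gaussian naive Bayes classifier (a Bayesian network with a fixed structure in which every predictor depends only on the class) on cells sampled from co-registered grids, represented here as lists of lists. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def sample_training_set(target, predictors, n, seed=0):
    """Sample up to n cells from co-registered grids as (features, label) rows."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(target)) for c in range(len(target[0]))]
    picked = rng.sample(cells, min(n, len(cells)))
    return [([grid[r][c] for grid in predictors], target[r][c]) for r, c in picked]


def train_gaussian_nb(samples):
    """Estimate class priors and per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for j in range(len(rows[0])):
            column = [row[j] for row in rows]
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9  # guard against zero variance
            stats.append((mean, var))
        model[label] = (len(rows) / len(samples), stats)
    return model


def predict_proba(model, features):
    """Posterior class probabilities for one feature vector (computed in log space)."""
    log_post = {}
    for label, (prior, stats) in model.items():
        lp = math.log(prior)
        for v, (mean, var) in zip(features, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        log_post[label] = lp
    top = max(log_post.values())
    weights = {label: math.exp(lp - top) for label, lp in log_post.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}
```

In k.LAB the grids would come from resolving the output and the predictors in the context; here they are supplied directly. Applying `predict_proba` to every cell would give the kind of probabilistic prediction map described above.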

The problem of calibration or data assimilation of numerical models can be handled in the same fashion, by linking appropriate algorithms to k.LAB. At the time of this writing, an interface to the open source OpenDA package is being investigated for future integration.

4.4. Sessions and outputs of contextualization

Within a k.LAB session, a user or application sets a context and observes as many concepts as desired. Observations that were already made in the context automatically resolve any subsequent query for compatible concepts. At any time, the user or application can set or unset one or more scenarios to affect the selection of models. A scenario in k.LAB is simply a namespace whose contained models become "visible" to the system only when it is explicitly activated: when a scenario is active, its models take priority over any others to resolve their observables, potentially using other models to complete observations in case the scenario is only defined to cover a part of the context. Using scenarios, the environment within a context may be interactively defined to reflect specific hypotheses. In interactive use (for example with k.LAB Explorer) it is possible to build observation sets that use different scenarios, incrementally defining a context that reflects any desired conditions.
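The priority rule for scenarios can be illustrated with a small, hypothetical sketch: models are (observable, name, covered cells) tuples, models from active scenarios are tried first, models from inactive scenarios are invisible, and any cell a scenario does not cover falls back to the base models. Real resolution also ranks models within each group, which this sketch omits; the scenario and model names are illustrative.

```python
def visible_models(observable, base, scenarios, active):
    """Models that may resolve `observable`: active scenarios first, then the base models.

    base: list of (observable, name, covered_cells); scenarios: scenario name -> such a list.
    Models in inactive scenarios are not visible at all.
    """
    ordered = [m for s in active for m in scenarios.get(s, [])] + list(base)
    return [(name, cells) for obs, name, cells in ordered if obs == observable]


def resolve(observable, context, base, scenarios, active=()):
    """Assign each context cell to the highest-priority visible model covering it."""
    assignment = {}
    for name, cells in visible_models(observable, base, scenarios, active):
        for cell in context:
            if cell in cells and cell not in assignment:
                assignment[cell] = name
    return assignment
```

Activating a scenario that covers only part of the context changes the result only there; the rest of the context is still resolved by the baseline models.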

A context always contains the complete history of observations made within it, including the metadata and provenance records for all resources and models used. As dataflows are resolved and contextualized, provenance records stored along with the knowledge will be extended with all logical steps followed to compute the corresponding observations, remaining available to form a complete record of how the information within the context has come into existence. All this information is available to the user in interactive, graphical form when using a k.LAB client, and becomes part of the set of downloadable artifacts accessible within a context. These include:

A complete dataflow that includes all the processing steps and resources accessed to compute every observation in the context. The k.LAB runtime uses a specialized language, k.DL, to encode dataflows in a concise and reusable way; the k.DL code can be visualized (as text or as a flowchart-like diagram) and saved to a resource to reproduce the computations as needed. When saving to a resource, the k.LAB Engine will compute the intersected spatial and temporal coverage of all resources and models involved, so that the dataflow can be saved along with the detailed geometry where the computations can be replicated.
Complete provenance information for all the resources and models used in the context. The k.LAB runtime adheres internally to the Open Provenance Model (OPM) conventions, which are central to the layout of its internal class structure. An API call to extract the OPM-compatible provenance graph for a context is expected in version 1.0.
A tree of observations, each of which can be downloaded to the file formats supported by the configured adapters according to the spatial and temporal dimensions in the context. For example, an observation of a numerical or categorical quality (state) can be downloaded to a CSV file if scalar or distributed only in time, to a raster map (e.g. GeoTIFF or ArcGIS format) if spatially distributed on a grid, or to an archive file with a map per timestep if distributed in both space and time. Observations of subjects (e.g. the lakes in the context) can be downloaded to database files, including ESRI shapefiles when the objects have a spatial coverage.
In lieu of individual observations, the user may request views that contextualize a specified concept and summarize the result in complex ways, such as tables or graphs. Such views also become part of the context along with all the observations made to compute them, and can be exported as spreadsheets or other appropriate formats. The table generation features in k.LAB refer to observables using pure semantics, enable flexible specification of aggregations and allow users or modelers to build sophisticated reports with very short specifications. Tables are prominently used, for example, in Natural Capital Accounting applications such as ARIES for SEEA.
As models are computed by the system, a user-readable, structured report is generated and incorporated within the context. The documentation features in k.LAB rely on a simple template language that can be associated with models in k.IM code and allows modelers to link documentation templates to events triggered during contextualization (for example, initialization or termination) and to report sections such as introduction, methods, results and discussion. The k.Modeler IDE contains specialized support for writing and organizing documentation in k.LAB projects. By using the Markdown language supplemented with template directives, structured text can be inserted in the generated documentation along with figures, tables, cross-references and bibliographic citations. The k.LAB engine incrementally assembles the report as new models are contextualized, producing a unified document that can be tailored to the context and to the actual results obtained using conditional template directives and context-aware text substitutions. This feature enables the production of comprehensive textual reports that can be downloaded as PDF through the clients or the API.
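The intersected coverage mentioned in the first item above can be computed by folding a pairwise intersection over all resources and models in a dataflow. This sketch simplifies the detailed geometry that k.LAB stores to axis-aligned bounding boxes and half-open time intervals; the `Coverage` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class Coverage:
    """Axis-aligned bounding box plus a half-open time interval [start, end)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    start: int  # e.g. a year
    end: int


def intersect(a: Optional[Coverage], b: Optional[Coverage]) -> Optional[Coverage]:
    """Intersection of two coverages, or None if they do not overlap."""
    if a is None or b is None:
        return None
    c = Coverage(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
                 min(a.xmax, b.xmax), min(a.ymax, b.ymax),
                 max(a.start, b.start), min(a.end, b.end))
    if c.xmin >= c.xmax or c.ymin >= c.ymax or c.start >= c.end:
        return None
    return c


def replicable_coverage(coverages):
    """Where and when every resource and model in a (non-empty) dataflow is available."""
    return reduce(intersect, coverages)
```

Once any pair fails to overlap, the result stays None, signaling that no region exists where the saved dataflow could be replicated.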

The set of outputs obtained and visualized during a k.LAB session ensures the transparency and communicability of the results to a degree not yet seen in a modeling platform. In some situations, even the paths not taken by the resolver can be documented, which may be relevant when multiple resources with close rankings are available in resolution. The possibility of producing digitally signed artifacts (including all outputs, a report, dataflow and full provenance graph, plus, if needed, verification and documentation of the provenance and peer review status of all resources and models involved) opens the door to producing and verifying endorsed artifacts when the system is used for official institutional applications, or in situations where use of the result can have critical consequences in decision-making.

4.5. Extending the runtime system

The k.LAB Engines and Nodes can be extended at the software level to provide new adapters (interfacing to new types of data and services), contextualizers (interfacing to contributed algorithms to use within models), or other functionalities such as authorities. Plug-in components are built on well-defined and documented extension points in the Java class structure, and Maven archetypes support convenient project setup, building and deployment. The design of the server components is highly modular, and each existing resource adapter, external package integration (such as the Weka machine learning software library) or functionality extension is written as a component that can be deployed to nodes and services. The contextualization runtime, which executes the resolved dataflows and can load them from a stored k.DL specification, can itself be replaced with an alternative execution runtime if desired, for example to support different runtime platforms (e.g. to run contextualizations on distributed file systems). The default k.LAB runtime is parallelized and multi-threaded, capable of handling concurrent sessions owned by different users and optimizing the use of RAM to enable large-scale simulations.

4.6. Integrating external models

Integration of k.LAB with existing models can proceed in two directions. By using the k.LAB API from within an existing model, the inputs of the model can be satisfied using semantic resolution, streamlining and simplifying data access from a largely unmodified model. By contrast, deep integration of a model into the k.LAB framework normally requires significant redesign, but can make the model and its components available to k.LAB users and other models as part of the k.LAB ecosystem, greatly enhancing its original value.

4.6.1. Integrating k.LAB into existing models

In this integration configuration, the REST API of a k.LAB Engine (or cluster of engines) can be used, after authentication, from within an independent application to enable the use of the k.LAB semantic network without integrating the application itself in k.LAB. Applications that formerly loaded their inputs from the filesystem, relying on configuration files or interactive forms, would at this point simply define the geometry of interest and the semantics for their desired inputs. This paradigm does not make the application itself or its outputs available to k.LAB users, and is therefore less valuable from an integration perspective, but it has a low adoption barrier and can constitute a first step towards a more integrative approach. At the time of this writing no language-specific client libraries have been written to ease the use of k.LAB from, e.g., Python or JavaScript applications, but the REST API can still be used directly.

4.6.2. Integrating existing models into k.LAB

There are several ways to integrate existing models so that they become part of the k.LAB environment. From a comprehensive semantic interoperability perspective, the preferred strategy is to break down the logical data flow inside a model into components that describe each individual concept within the model, then revise each of these components as independent models. This provides the greatest return in terms of integration, by ensuring the full integration of any internal feedbacks and sensitivity to changing boundary conditions. However, this approach also requires the most work to rethink each model’s internal logic, as most models have been written with specific, often unwritten conventions and even conditions of use in mind. This often mandates the generalization of the context of use of each model: for instance, generalizing a hydrologic model originally designed to run at an annual time scale to run on more flexible time steps while remaining faithful to the underlying physical processes. This may be difficult and time-consuming, particularly when the original implementation of the model is unclear, poorly documented, or logically inconsistent.

Preexisting models usually consist of highly connected networks of computations that are difficult to break into components to best fit an interoperable, semantic modeling paradigm. Yet, tightly defined and well-focused models can be used as "functions" when (1) their inputs and outputs are well-defined semantically, (2) data needs are clearly described, and (3) appropriate spatial/temporal scales for their reuse are provided. This is usually most convenient when their internal logic is complex and difficult to break up.

Three possible strategies to make pre-packaged models interoperable with k.LAB include:

1. Wrapping them into web services and connecting them to an API capable of mediating with k.LAB’s data transfer format. The model will be connected using the "remote" k.LAB adapter, which uses a REST API and can therefore be coupled to model services written in any language. This alternative requires little further work on the models themselves, but requires a "bridge" API for the host programming language to facilitate integration with the k.LAB interface. At the time of this writing bridge APIs exist only for Java, but those for other languages will be developed based on demand.
2. Isolating the core algorithms in the model and reimplementing them in code as contextualizers using the native k.LAB API. This middle-ground integration strategy neither reuses the original code as-is nor requires a full semantic annotation effort to fully integrate it, and is usually the easiest way to bring in existing logic without a major effort. As k.LAB takes care of I/O, data transformation and preparation, data flow, spatial and temporal addressing, and visualization, the rewrite usually only has to cover a small percentage of any original stand-alone model code, normally between 10 and 30%.
3. Creating a k.LAB contextualizer as an extension that gathers input from the k.LAB environment, passes it to the model for computation, and serves the outputs back. This does not require the mediation of a web service and thus entails more direct connections to the model code. The model may be connected at the code level, which is easiest in Java but can be supported by adapters for other languages. Alternatively, the model may be run as an external application, requiring no coding besides that needed to prepare inputs and gather outputs (this strategy is likely to be computationally inefficient, particularly for dynamic models that require independent runs over multiple time steps). Running as an external application may prove impossible when internal feedbacks must be connected to boundary conditions handled by the k.LAB environment and, while tempting because of the low development barrier, these kinds of solutions tend to have a limited useful life.

Overall, strategy 1 is the most generalizable solution (i.e., more bridge APIs would facilitate the integration of more external models with k.LAB). Strategy 2 is a practical solution when a smaller number of models are targeted for integration. Strategy 3 is the most ad hoc, with several key limitations; as such it can be seen as a generally less desirable strategy.

5. The reactivity layer: behaviors and applications

The semantic modeling approach discussed so far is designed to construct simulated worlds, using the logical descriptions provided for the best available networked data and models. The observations that compose these worlds can be construed as the outputs of the underlying modeling, and will incorporate any dynamic behavior that can be stated along with the logical description in k.IM models and contextualizers, typically process models whose behavior is specified in advance. While many phenomena can be described satisfactorily within this paradigm, others (namely, those where events triggered by specific conditions cause modifications in the structure of the system) cannot. Addressing these aspects of agency and reactivity is the purpose of the k.LAB reactivity layer.

The reactivity layer contains a collection of behaviors, i.e. specifications of how any agent (the observations in a context, the context itself, or even the k.LAB session or the user owning it) can react to conditions that come to pass during the course of contextualization. The reactivity layer is key to developing complex, distributed agent-based models that are fully semantically aware, and allows modelers to build interactive visualizations and applications when the behavior is applied to a session. All behaviors take the form of code specified in the k.Actors language, supported by the k.Modeler IDE and used to define behaviors for observations, test cases, batch computations, UI components and interactive applications.

The k.Actors language has a simple, minimal syntax that belies a complex and powerful model of execution. Both k.IM and k.Actors draw their syntax from the English language; while k.IM is concerned with representing what observations are and how they are computed, k.Actors is concerned with representing how they behave. For this reason, the linguistic realm of k.IM is that of nouns, adjectives and adverbs, while k.Actors deals mostly with verbs. Compared with k.IM, which is optimized to be usable at the simplest levels by modelers with minimal programming experience, k.Actors reads less like English and is more suitable for experienced programmers. An annotated example is provided below, with no in-depth discussion, to give a flavor of the language:

behavior demo.restaurant
  "Invite a friend to dinner and if accepted, choose a restaurant in the context"

// the main action will be triggered when the behavior is loaded
action main:
  invite("friend@email.com"): "OK" -> choose({infrastructure:Restaurant}): reserve($)

action invite(friend):
  email("Hi, shall we go out for dinner tonight?", address=friend):
    answer -> sentiment.classify(answer, {im:Outcome}): (
        {im:Positive} -> email("Great", address=[answer.replyAddress]), "OK"
        {im:Negative} -> email("Sorry", address=[answer.replyAddress]), "NO")

In the code above, two actions are defined, each composed of one statement that calls other actions and specifies a chain of events triggered when each of them "responds" (fires). In action main, the verb invite is called with an email address as a parameter. The invite action, defined later in the code, sends an email and processes the reply, eventually firing back a status code ("OK" or "NO") to the calling action. The "OK" code triggers the choice of a restaurant in the context and, once one is found, its booking.

In k.Actors' concurrent mode of execution, actions may fire events zero or more times, and those events can be captured by the code that called the action using the ':' and '->' operators. When the runtime executes the code, it starts each action and immediately moves on, without waiting for it to fire, unless synchronous execution is forced. If the ':' operator follows the call, the actor running the behavior readies itself to process the events fired by that action whenever they happen, which may be at any time (or never) as long as the actor is "alive". The data associated with each event are matched against the expression that precedes the '->' operator, and if the match succeeds, the code following the operator is executed.
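
This execution model can be approximated, very loosely, with Python's asyncio. The sketch below is not k.LAB code: the Actor class, its call() method and the (predicate, reaction) handler pairs are hypothetical names chosen for illustration. Each predicate stands in for the pattern before '->' and each reaction for the code after it; calling an action registers its handlers and returns at once, and the action may fire any number of times.

```python
import asyncio
from typing import Any, Callable, Sequence

# Hypothetical handler: (predicate on event data, reaction to run on a match),
# loosely mirroring "pattern -> reaction" after the ':' operator in k.Actors.
Handler = tuple[Callable[[Any], bool], Callable[[Any], None]]


class Actor:
    """Illustrative stand-in for an actor running a behavior (not the k.LAB API)."""

    def __init__(self) -> None:
        self.tasks: list[asyncio.Task] = []
        self.log: list[str] = []

    def call(self, action, *args, handlers: Sequence[Handler] = ()) -> None:
        # Start the action and return immediately, without waiting for it to fire.
        async def fire(value: Any) -> None:
            # Match the fired data against each pattern; run the reactions that match.
            for matches, react in handlers:
                if matches(value):
                    react(value)

        self.tasks.append(asyncio.create_task(action(fire, *args)))

    async def run_until_idle(self) -> None:
        # Stay "alive" until every started action has completed.
        while self.tasks:
            await self.tasks.pop(0)


async def invite(fire, friend: str) -> None:
    await asyncio.sleep(0.01)  # simulated wait for the friend's reply
    await fire("OK")


async def count_to(fire, n: int) -> None:
    # An action may fire zero or more times.
    for i in range(1, n + 1):
        await fire(i)


async def main() -> list[str]:
    actor = Actor()
    actor.call(invite, "friend@email.com", handlers=[
        (lambda v: v == "OK", lambda v: actor.log.append("choose restaurant")),
        (lambda v: v == "NO", lambda v: actor.log.append("stay home")),
    ])
    actor.call(count_to, 2, handlers=[
        (lambda v: True, lambda v: actor.log.append(f"tick {v}")),
    ])
    actor.log.append("moved on")  # runs before either action has fired
    await actor.run_until_idle()
    return actor.log


log = asyncio.run(main())
print(log)  # ['moved on', 'tick 1', 'tick 2', 'choose restaurant']
```

The printed log shows that the caller moved on before any event arrived, and that events are handled as they happen rather than in the order in which the actions were called.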

In the simplest cases, behaviors written in k.Actors can be directly bound to the observations created by models using k.IM code:

@bind(city.demo.behavior, select=[self.population > 100000])
model each klab:osm:point:city as infrastructure:City;

which will bind the city.demo.behavior behavior to any city whose population is higher than 100,000. Behaviors can also be bound to observations by actions in other behaviors, based on semantic type or other conditions, or directly from within code specified in k.IM models.
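
The effect of the select condition can be pictured as a filter applied to each observation of the annotated type. The Python sketch below is only an analogy: the Observation class, the bind() function and the population figures of the two example cities are hypothetical and do not reflect k.LAB internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Observation:
    # Hypothetical minimal observation: a name, a semantic type and one attribute.
    name: str
    semantic_type: str
    population: int
    behaviors: list[str] = field(default_factory=list)


def bind(behavior: str, observations: Iterable[Observation], observable: str,
         select: Callable[[Observation], bool]) -> list[str]:
    """Attach `behavior` to each observation of type `observable` that passes `select`."""
    bound = []
    for obs in observations:
        if obs.semantic_type == observable and select(obs):
            obs.behaviors.append(behavior)
            bound.append(obs.name)
    return bound


cities = [
    Observation("city A", "infrastructure:City", 250_000),
    Observation("city B", "infrastructure:City", 40_000),
]
# Analogue of @bind(city.demo.behavior, select=[self.population > 100000]).
print(bind("city.demo.behavior", cities, "infrastructure:City",
           lambda c: c.population > 100_000))  # ['city A']
```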

In the forthcoming version 1.0 of k.LAB, observations that are part of contexts in remote k.LAB Engines will be accessible by prepending the URL of a connected context to the identifier of each observation; this opens the door to distributed real-time simulations whose agents can affect each other remotely. Through the reactivity layer, the distributed, collaborative modeling paradigm enacted by the semantic layer can therefore extend to already initialized simulated worlds that interact with each other, forming large-scale, multi-server simulations that can track events happening on each side. Institutions with expertise in tracking and predicting a particular category of real-world phenomena can make their digital "peers" available for other models to use. In the reference k.LAB implementation, the actor facilities use open source technologies originally developed for the Internet of Things, which can support the functionalities described above to build an "internet of observations" in support of real-time, better-informed decision-making.

5.1. User-side applications

Within the k.LAB runtime, the software "agents" capable of receiving a behavior include not only the observations built within sessions, but also the sessions themselves and the users who own them. This opens the door to applying behaviors for purposes beyond the modeling of agents within simulations. In particular, when a behavior is applied to a user session, the session can be seen as an application whose actions are initiated by users through client software, and whose consequences can trigger observations or other events as required by the application logic. Coupled with the ability of k.Actors to interact with the runtime and use semantics for queries, this feature enables fast and intuitive building of user applications in k.Actors. The k.LAB Explorer web client is equipped to respond to specialized action verbs by creating user interface components (e.g. buttons, text fields, lists); users interacting with these components "fire" events that are sent back to the k.Actors runtime for processing. The resulting interactive applications are typically very quick to build. For example, the following code

app example.ui.minimal
  "A simple demo of UI definition with k.Actors."
   description "This application demonstrates some basic UI widgets and interaction with the
                k.LAB runtime environment. An 'app' is a behavior applied to a k.LAB session."
  style default with #{
    font-size: '0.85em'
  }

@left
action main:

  set outputs []

  %%%
    **Markdown** and HTML text widget between matching percent markers (\%\%\%).
    Write any *markdown* in this field to show formatted text in the UI. The :scroll
    and :collapse attributes control the appearance.
  %%% :scroll :collapse

  /*
   * Groups in parentheses become divisions in the UI and can be styled with layout
   * attributes, titles and other properties through metadata
   */
  (
    button("Set context to France and observe Elevation in it" #fr):
      context(im.countries.france):
        france -> france.observe({geography:Elevation}): (
          outputs.add($)
          fr.disable
        )
    button("Observe vegetation C storage in the current context" #veg):
      submit({ecology:Vegetation chemistry:Carbon im:Mass}): (
        outputs.add($)
        veg.disable
      )
    ) :hbox :name "Sample observations (click to observe)"

  /*
   * a final button enables downloading all the observations accumulated when pressing the
   * buttons above.
   */
  button ("Maps" #mapdownload :tooltip "Download all observations as a zip file"): (
    mapdownload.waiting
    pack(outputs): (
      url -> (
        mapdownload.reset
        download(url, filename="data.zip")
      )
      error -> mapdownload.error(:timeout 1000)
    )
  )

creates a demonstration application, not explained in detail here, that shows buttons to make observations and collects the results in an array, so that the corresponding data can be downloaded in one action. The UI will appear in k.LAB Explorer. Using modular UI components also defined in k.Actors, interfaces such as ARIES for SEEA can be built by minimally trained programmers in a short time (at the time of this writing, the code for the ARIES for SEEA application is only about 300 lines long), making sophisticated modeling services immediately available to users and decision makers with very little effort.
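
The round trip described in this section (the runtime asks the client to create widgets, the client sends user events back, and handlers accumulate results that are finally packed for download) can be mimicked with a toy Python model. Everything in it is hypothetical: the SessionApp class, the JSON message fields, pack() as a loose analogue of the k.Actors pack verb, and the placeholder file names and contents. The actual k.LAB Explorer protocol is not described in this document.

```python
import io
import json
import zipfile
from typing import Callable


class SessionApp:
    """Toy model of a behavior applied to a session (hypothetical, not k.LAB code)."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], None]] = {}
        self.outbox: list[str] = []  # messages a client would use to build the UI

    def button(self, widget_id: str, label: str, on_fire: Callable[[dict], None]) -> None:
        # Ask the client to create a button and remember what to do when it fires.
        self.handlers[widget_id] = on_fire
        self.outbox.append(json.dumps({"create": "button", "id": widget_id, "label": label}))

    def receive(self, message: str) -> None:
        # Dispatch an event sent back by the client to the handler of its widget.
        event = json.loads(message)
        self.handlers[event["id"]](event)


def pack(outputs: list[tuple[str, bytes]]) -> bytes:
    """Bundle accumulated (filename, data) outputs into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in outputs:
            archive.writestr(name, data)
    return buffer.getvalue()


outputs: list[tuple[str, bytes]] = []  # plays the role of "set outputs []"
downloads: list[bytes] = []
app = SessionApp()
# Placeholder bytes stand in for the observations each button would produce.
app.button("fr", "Observe Elevation in France",
           lambda e: outputs.append(("elevation.tif", b"placeholder elevation")))
app.button("veg", "Observe vegetation C storage",
           lambda e: outputs.append(("vegetation_carbon.tif", b"placeholder carbon")))
app.button("mapdownload", "Maps", lambda e: downloads.append(pack(outputs)))

# Simulate the client sending back three clicks, in order.
for widget in ("fr", "veg", "mapdownload"):
    app.receive(json.dumps({"id": widget, "event": "click"}))

with zipfile.ZipFile(io.BytesIO(downloads[0])) as archive:
    names = archive.namelist()
print(names)  # ['elevation.tif', 'vegetation_carbon.tif']
```

Keeping the handlers on the runtime side, with the client only rendering widgets and reporting events, is what lets the application logic live entirely in the behavior.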

In addition to these usages, k.Actors is used as a scripting language to automate repetitive tasks (for example to build global high-resolution map outputs describing a single observable, by computing it in multiple local contexts with fully customized model resolution in each) and to build test suites for all aspects of k.LAB.
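
The tiling strategy mentioned above can be sketched as follows. The Python code is illustrative only: the tiles() and compute_global() helpers, the degree-based bounding boxes and the observe callback (a stand-in for a k.LAB observation made in each local context, where per-tile model choices would be made) are assumptions, not part of the k.LAB API.

```python
import math
from typing import Callable, Iterator

BBox = tuple[float, float, float, float]  # (west, south, east, north), in degrees


def tiles(extent: BBox, size: float) -> Iterator[BBox]:
    """Split an extent into square tiles of `size` degrees, clipped at the edges."""
    west, south, east, north = extent
    for j in range(math.ceil((north - south) / size)):
        for i in range(math.ceil((east - west) / size)):
            x0, y0 = west + i * size, south + j * size
            yield (x0, y0, min(x0 + size, east), min(y0 + size, north))


def compute_global(observable: str, extent: BBox, size: float,
                   observe: Callable[[str, BBox], object]) -> dict[BBox, object]:
    # Observe the same concept in each local context; results are keyed by tile
    # so that they can later be mosaicked into a single global output.
    return {tile: observe(observable, tile) for tile in tiles(extent, size)}


def fake_observe(observable: str, tile: BBox) -> float:
    # Placeholder for a real observation: returns the tile's area in square degrees.
    west, south, east, north = tile
    return (east - west) * (north - south)


results = compute_global("geography:Elevation", (-180, -90, 180, 90), 90, fake_observe)
print(len(results), sum(results.values()))  # 8 64800
```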

6. Current status

The k.LAB software stack is currently at version 0.11; feature completion and API stability are not guaranteed until version 1.0 is reached. Depending on funding and community uptake, version 1.0 is expected to be reached by roughly 2022 to 2023. k.LAB's current status can be briefly summarized as follows:

- The software can be considered production-level for the functions that support applications such as the general k.LAB Explorer for the ARIES project and specialized applications like ARIES for SEEA. Visualization and reporting are near feature completion for current uses.
- Installable containers for k.LAB Nodes and k.LAB Engines are well developed and used regularly, although few partner nodes exist besides those run by the central team and the UN, and frequent upgrades are necessary for the time being.
- Feature completion is at about 90% relative to the 1.0 specification, which is enough for current applications. Further work remains to support full-scale agent-based modeling, real-time applications and other types of use.
- Resource adapters are available for most important data formats, services and protocols. Assisted user interfaces to contribute data and models are limited for now to the k.Modeler modeling environment, which is functional but not suitable for non-technical users. More data submission methods and interfaces are in development, with planned completion in 2021, to support use by countries and institutions involved in ongoing projects.
- The REST API is currently optimized for use "within" the system by its own clients, based on k.LAB Explorer. More discussion will be needed before a stable, independently usable API specification is published.
- Besides an initial grant from the US National Science Foundation, k.LAB has had a limited but reliable funding stream for its development, with low to mid levels of financing but relatively high stability. The current preference is for a partnership model, with partner organizations providing moderate but consistent in-kind or financial contributions over time and participating in decision-making, rather than for large individual grants, as continuity and talent retention are more important for ultimate success than large one-off investments.

Technical inquiries on k.LAB should be addressed to info@integratedmodelling.org.

1. See Villa F, Balbi S, Athanasiadis IN and Caracciolo C, "Semantics for interoperability of distributed data and models: Foundations for better-connected information", for (slightly outdated) details on the phenomenological model underlying k.LAB's semantics.

2. The biology:Eucalyptus species identity, used here for simplicity, would in reality be handled through a taxonomic authority: see the Authorities section for details.