Development of CDO Ontologies

This page houses practices followed for the development of CDO ontologies. It is the result of needs identified over the early years of CDO ontologies’ socialization and software review processes. Until the 1.0.0 release of the CDO ontologies, the practices on this page will likely continue to adapt as workflows are refined.

Review checklists

GitHub repositories for CDO ontology development use the following checklist templates to coordinate an issue’s progression with the respective ontology committee. To enable GitHub progress-tracking (based on checkbox counts), the OC Chair or Coordinator inlines these templates into the initial Issue or Pull Request description as an edit.

Due to a display issue with website font colors, the checklists are presented in the CONTRIBUTE.md file in the website’s source repository, here.

Branching

GitHub repositories in CDO follow two branching practices:

“Git-flow” branching

The “Git-flow” branching model used by CDO is based on the description by Vincent Driessen, dated 2010-01-05. Repositories following this branch model generally expect most development to be done in “Feature” branches, branching off of develop. The “Primary” branch (typically named master or main) designates releases with tags and the GitHub release-list interface.

In this branching model, pull requests should target the develop branch, not the primary branch.

The head of the primary branch is typically the current release. There may be some non-release commits on the primary branch, made when components of GitHub interface elements need to be programmed.

“Continuous-release” branching

This branching model is used for repositories that do not designate releases. The head of the primary branch (master or main) is the “Current release.”

In this branching model, pull requests should target the primary branch.
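The pull-request targets of the two branching models can be summarized in a few lines of Python. This is only an illustrative sketch: the function name `pull_request_base` and the model labels `"git-flow"` and `"continuous-release"` are hypothetical names chosen here, not part of any CDO tooling.

```python
def pull_request_base(model: str, primary: str = "main") -> str:
    """Return the branch a pull request should target.

    `model` is a hypothetical label for the repository's branching model;
    `primary` is the repository's primary branch (master or main).
    """
    if model == "git-flow":
        # Feature branches split off of develop and are merged back into it.
        return "develop"
    if model == "continuous-release":
        # No designated releases: the primary branch head is the current release.
        return primary
    raise ValueError(f"unknown branching model: {model!r}")


print(pull_request_base("git-flow"))
print(pull_request_base("continuous-release", primary="master"))
```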

Testing

Testing prereleases

The CDO ontology Git repositories (including CASE’s ontology repository and UCO’s) follow the “Git-flow” branching model. There is additional consideration put into processing the develop and feature branches:

Part of the testing process for the ontology is assessing impact of proposals, across tooling and existing example data. To assist with this review, CASE provides “Prerelease” ontology builds, available here:

These are monolithic and syntax-normalized builds of the CASE and UCO ontologies. Their states are used to review each of the CASE examples, and their SHACL validation results are stored as files alongside the examples’ source materials. For instance, here are the current validation results for the website’s Asgard example:

CASE-develop.ttl is built according to develop branch states of CASE and UCO, and thus incorporates all of the proposals that are committee-approved and staged for the next release. CASE-unstable.ttl follows an implementation practice that is, well, unstable: Most, or all, proposals under consideration are merged into one branch, before committee review or approvals that would see the proposals merged into develop.

CDO ontologies each maintain an -Archive Git repository (CASE’s, UCO’s), whose primary functions are these:

  1. Serve an unstable branch that represents every proposal under committee consideration.
  2. Store an archive of every prior state of the unstable branch.

(The -Archive repositories are not “GitHub forks”, in order to prevent interface confusion from Pull Requests. They do, however, share the master and develop branch histories.)

What makes the unstable branch worth its own archive repository is that the branch does not guarantee preservation of its own Git history. Feature branches split off of develop, but do not have a single stable joining point in the ontology development repository where all of their effects can be considered in aggregate. Yet, it is important to discover when in-flight proposals might conflict with one another, and sometimes this is only visible when considering all of them at once. Meanwhile, the order in which these branches are tested might not be the order in which they are voted upon and accepted into develop by the committee.

The unstable branch will be reset to develop by the Ontology Committee Chair, Coordinator, or Product Manager, at an unspecified frequency. To ensure access to prior states of the unstable branch, the -Archive repository will maintain named branches recording the state at the time of each reset, e.g. archive/unstable-2022-04-01. (This can benefit users who test by using Git submodules and/or Git Bisect, rather than website downloads. The CASE-Examples repository and CASE website both track with submodules.)
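As a rough illustration of how a tester might work with the archive branches, the Python sketch below builds a branch name from a reset date and picks the most recent archived state on or before a given day. The naming pattern is inferred from the single example above (archive/unstable-2022-04-01); the function names and the sample branch list are hypothetical and do not reflect the actual contents of any -Archive repository.

```python
from datetime import date
from typing import Iterable, Optional

# Prefix inferred from the example branch name archive/unstable-2022-04-01.
PREFIX = "archive/unstable-"


def archive_branch_name(reset_date: date) -> str:
    """Name of the archive branch recording the unstable state at a reset."""
    return f"{PREFIX}{reset_date.isoformat()}"


def latest_archive_on_or_before(branches: Iterable[str], day: date) -> Optional[str]:
    """Pick the newest archive branch dated on or before `day`, if any."""
    dated = []
    for name in branches:
        if not name.startswith(PREFIX):
            continue  # not an archive branch, e.g. develop or unstable
        try:
            stamp = date.fromisoformat(name[len(PREFIX):])
        except ValueError:
            continue  # suffix is not an ISO date; ignore it
        if stamp <= day:
            dated.append((stamp, name))
    return max(dated)[1] if dated else None


# Made-up branch list for illustration only.
sample_branches = [
    "develop",
    "unstable",
    "archive/unstable-2022-04-01",
    "archive/unstable-2022-06-15",
]
print(archive_branch_name(date(2022, 4, 1)))
print(latest_archive_on_or_before(sample_branches, date(2022, 5, 1)))
```

A branch name selected this way could then be checked out in a submodule, or used as an endpoint when bisecting.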

To summarize, if a developer wishes to test against some “Prerelease” state:

Profiles

The ontologies within CDO, including UCO, are designed as “mid-level” domain ontologies, generally but not entirely scoped within the cyber domain. A “mid-level” ontology is distinct from “top-level” (aka “foundational” or “upper”) ontologies. The rationale for being “mid-level” has been to avoid excluding other potential ontological alignments that exist as independent efforts modeling other domains, such as provenance. Because top-level ontologies are generally not compatible with one another (“Foundational” typically being a distinct status within a knowledge model), adopting one top-level ontology potentially forgoes interoperability with another and with all adopters of the other. Similarly, other ontologies that do not consider themselves “top-level” are not necessarily compatible with any “top-level” ontology that might be adopted.

CDO ontologies need to adopt existing efforts in other domains, especially when there is a demonstrated need for something that is adjacent to the cyber domain, such as photographing physical objects. UCO can provide a description of the camera; CASE, the photograph-subject’s relevance to an investigation; but neither CASE nor UCO has, say, the class of Motorcycles as photograph-subject in its scope, nor the photograph’s location being “near” this particular conceptualization of the Washington Monument.

UCO can explore alignment between, say, uco-location:Location and GeoNames’ Feature, but should not do so at the expense of other geospatial representations, such as GeoSPARQL 1.1’s Feature, or BFO 2.0’s spatial region. To explore alignment, CDO ontologies are using “Profile” repositories on GitHub.

Profiles serve three use cases, which have different strategic objectives:

Though the objectives for each of these use cases differ significantly, the overall implementation method remains consistent for the three, except for the mimicking profile declining to relate UCO to the external ontology with subclassing.

Each “Profile” repository follows this pattern:

These repositories can be brought together to review how well current examples adhere to the profiles’ ontological alignments, whether by confirming graph-individuals’ disjointness through RDFS expansion, or by a consistency review through OWL-DL expansion. (A GitHub repository attempting this is currently under development.) Bringing these profiles together is one reason the CDO class is a subclass of the external class, rather than an equivalent class. One of the objectives is to explore whether multiple profiles reveal an inconsistency in unrelated ontologies, when exercised in a CDO example. (The other reason equivalent-class designations are avoided is to avoid inappropriate scope-expansions of CDO rules within adopters’ knowledge graphs, such as individuals under UCO hierarchies generally being urged to end with UUIDs.)
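The difference between a subclass alignment and an equivalent-class alignment can be sketched with a toy RDFS-style type expansion in Python. This is a simplified illustration, not how any Profile repository is implemented: graphs are reduced to plain dictionaries, the `gn:` prefix for GeoNames’ Feature and the individual names are assumptions made here, and equivalence is modeled as a pair of mutual subclass statements.

```python
UCO_LOCATION = "uco-location:Location"
GN_FEATURE = "gn:Feature"  # assumed prefix for GeoNames' Feature class


def infer_types(asserted, subclass_of):
    """Expand asserted types along subclass statements until nothing changes."""
    inferred = {ind: set(classes) for ind, classes in asserted.items()}
    changed = True
    while changed:
        changed = False
        for classes in inferred.values():
            for sub, sup in subclass_of:
                if sub in classes and sup not in classes:
                    classes.add(sup)
                    changed = True
    return inferred


# Hypothetical individuals: one from CDO data, one from an adopter's own graph.
asserted = {
    "kb:location-1": {UCO_LOCATION},
    "ex:adopter-place": {GN_FEATURE},
}

# Profile-style alignment: the CDO class is a subclass of the external class.
profile_subclass = [(UCO_LOCATION, GN_FEATURE)]
# Hypothetical equivalence, modeled as subclassing in both directions.
hypothetical_equivalence = [(UCO_LOCATION, GN_FEATURE), (GN_FEATURE, UCO_LOCATION)]

with_subclass = infer_types(asserted, profile_subclass)
with_equivalence = infer_types(asserted, hypothetical_equivalence)

for label, result in (("subclass", with_subclass), ("equivalence", with_equivalence)):
    for individual in sorted(result):
        print(label, individual, sorted(result[individual]))
```

Under the subclass alignment, only the CDO individual gains the external type. Under the hypothetical equivalence, the adopter’s individual is also inferred to be a uco-location:Location, which is the kind of scope expansion the parenthetical above describes.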

These repositories are each designated as “Exploratory”. Their contents are not official, not versioned beyond Git commit mechanisms, and not subject to Ontology Committee workflows for revisions. They are expected to change as modeling needs are demonstrated through new class, property, and example development. Those wishing to adopt a Profile are encouraged to do so using a Git submodule. Contributions or requests for alignment explorations are welcome.

Shapes

Among the objectives within CDO repositories are the interchange and semantic interoperability of data. One of the steps in establishing semantic interoperability is having clear syntactic rules for the usage of various ontologies’ predicates.

Within CDO, ontologies make extensive use of SHACL to simultaneously define concepts and their requirements of usage within graphs. Many other ontologies exist based on OWL, which provides a language for knowledge expansion, but fewer mechanisms than SHACL for data validation. Some invalid data can be recognized with OWL, such as declaring an individual to be a member of two disjoint classes (which translates to a logical inconsistency that the Empty Set has something in it). But certain requirements can’t be tested with OWL due to its “Open world” model, such as the minimum cardinality of a property unrelated to other classes or properties. If an individual is specified by OWL to have a certain property used exactly once (such as an ex:Screw having exactly one integer ex:threadCount value), and the graph doesn’t have it, OWL assumes the value exists, but just isn’t specified. SHACL considers this a data error. In other words, OWL doesn’t consider 0 specified values a data error, but does consider 2 distinct values to be one; SHACL considers both cases data errors.
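The contrast in the ex:Screw example can be mimicked with a small Python sketch. It is a deliberately simplified model, not a SHACL or OWL implementation: the graph is a set of plain tuples, only literal integer values are considered, and the individual names (ex:screw-0 and so on) and function names are hypothetical.

```python
EX_THREAD_COUNT = "ex:threadCount"

# Toy graph of (subject, predicate, object) tuples; individuals are made up.
graph = {
    ("ex:screw-0", "rdf:type", "ex:Screw"),  # no thread count stated
    ("ex:screw-1", "rdf:type", "ex:Screw"),
    ("ex:screw-1", EX_THREAD_COUNT, 20),     # exactly one value
    ("ex:screw-2", "rdf:type", "ex:Screw"),
    ("ex:screw-2", EX_THREAD_COUNT, 20),     # two distinct values
    ("ex:screw-2", EX_THREAD_COUNT, 24),
}


def distinct_values(graph, subject, predicate):
    """Collect the distinct objects stated for a subject and predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def closed_world_exactly_one(graph, subject, predicate):
    """SHACL-like reading: count only what the graph actually states."""
    return len(distinct_values(graph, subject, predicate)) == 1


def open_world_exactly_one(graph, subject, predicate):
    """OWL-like reading for literal values: a missing value may exist
    unstated, so only two or more distinct values are a contradiction."""
    return len(distinct_values(graph, subject, predicate)) <= 1


for screw in ("ex:screw-0", "ex:screw-1", "ex:screw-2"):
    print(
        screw,
        "closed-world ok:", closed_world_exactly_one(graph, screw, EX_THREAD_COUNT),
        "open-world ok:", open_world_exactly_one(graph, screw, EX_THREAD_COUNT),
    )
```

The two readings disagree only for ex:screw-0, the individual with no stated value, which is the case a “closed world” rule set is needed to catch.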

Other ontologies encoded in OWL often provide exploration and coverage of concepts that are not in the scope of CDO outside of specific applications, or that might be close enough to fulfilling a CDO core need that the ontology is considered for adoption by one of the CDO ontologies. CDO applications wishing to use that external ontology often still have a desire or requirement to provide data validation capabilities, which might not be directly available if the external ontology does not provide a “closed world” rule set. Worse, if there is an error in the OWL syntax provided by the external ontology—even one inconsequential to most of the model—OWL ontology validators have the potential to fixate on that error and possibly obscure errors that are more directly relevant to end users among their knowledge graphs’ nodes.

To bridge potential understanding gaps, and to better understand external ontologies, CDO provides SHACL shapes for external ontologies when they fit some need anywhere within the CDO ecosystem. SHACL shapes are provided as a set of repositories on GitHub, each scoped to a particular external ontology. Because SHACL can provide validation rules for any RDF, the shapes are not necessarily limited to OWL ontologies: shapes are provided for the OWL language itself, having been started for UCO Issue 406; and some shapes have been explored for non-OWL RDF schemata, having been started in CASE-Corpora.

The shape repositories can be found on GitHub or on the CDO Project Release Flow diagram, searching for “-Shapes-”.

Practices

The shape repositories follow these practices:

The shape graphs follow the below practices.

Notes

At this time, no programmatic support is provided to convert an OWL ontology into a SHACL graph. Some procedures are known to be algorithmically specifiable; for instance, most of the above list is likely scriptable. However, at least one community member has found defining SHACL shapes to be a beneficial exercise in actively reading ontologies, as well as in finding challenges in defining rules pertaining to some OWL constructs.

Last, the shape graphs have a goal of not needing to be provided by CDO. Data validation is a significant, general need in workflows. Those best suited to provide validation rulesets for an ontology are the maintainers of that ontology. Should a non-CDO ontology maintainer become interested in adopting and incorporating a CDO shapes graph and/or test suite into their software, CDO welcomes this opportunity for exchange and transfer of knowledge.