Methodology for ontology design and construction

In this paper we present a methodology for ontology design and construction which incorporates the most outstanding design principles and a thorough evaluation process. An ontology provides a logical formulation of complex problems in the decision sciences, such as risk management, decision making under uncertainty, statistics and forecasting, negotiation, and financial analysis. The main stages of this methodology are: requirements specification, formal design, construction


Introduction
For more than two decades the term ontology has attracted growing interest among researchers, academics and professionals from different areas of knowledge who have seen the need to use, design or apply ontologies as a solution mechanism for their information systems requirements. The term ontology, whose origin is found in Philosophy, was adopted by the Artificial Intelligence research community to formally describe domains of knowledge. Many definitions of the term have been proposed. For example, Neches et al. (1991) stated that an ontology "defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary". Later, Gruber (1993) gave the definition "An ontology is an explicit specification of a conceptualization"; in other words, an ontology is a model of a domain (conceptualization) that is explicitly described (in the form of a specification). Lassila and McGuinness (2001) defined the mandatory properties for an explicit specification of a conceptualization to be considered an ontology: a finite and controlled vocabulary, unambiguous interpretation of classes and of relations between terms, and strict hierarchical subclass relations between classes.
In summary, to facilitate the understanding of the term ontology, two descriptions are used: one based on the constitutive elements of ontologies, and one based on the characteristics that an ontology must present.
a. An ontology defines a set of concepts or classes (vocabulary or conceptualization), the taxonomic (hierarchical) relations between those concepts, the semantic relationships between concepts, individuals or instances, and axioms.
b. Likewise, an ontology must be formal (expressed in a formal language), explicit (concepts should be made explicit through axioms) and shared (based on the consensus of a group of experts).
Another important aspect of ontologies is the possibility of executing reasoning and inference to produce new concepts or semantic relationships. This is possible because of the formal logic languages that underlie the representation of ontologies. Therefore, an additional element of ontologies is the set of inference rules for knowledge and information gathering, and the set of axioms that produce new definitions through reasoning engines.
Currently, many engineering professionals face the need to build a knowledge base based on ontologies without a good methodological guide to support them. Engineers and solution integrators therefore need an easy-to-implement methodology that allows them to build quality models that solve very specific industry problems.
The construction of ontologies covers methods, techniques and design principles that have been proposed to support the efficient design of ontologies. The implementation of a methodology for efficient ontology construction aims at producing ontologies that are usable, reusable and easy to maintain. Gómez-Pérez (1999) clarified that a methodology is composed of methods, techniques, processes and activities. Accordingly, in this work the methodology for ontology design and construction is defined as an ordered series of phases that specify the procedures used in the engineering of an ontology or ontology system.
The rest of the paper is organized as follows: the next section describes the proposed methodology and its phases; the third section presents the application of the methodology to medical diagnosis support as a case study; the fourth section discusses a comparative analysis of a collection of related methodologies. The final section summarizes the main conclusions.

Proposed Methodology
In this section, a comprehensive ontology design and construction methodology is described. Its main goal is to assist ontology developers by providing methods and techniques that support ontology construction while adhering to design principles. For the purpose of this work, an ontology system is a global ontology that imports individual ontologies, which are semantically related to each other inside the global ontology. Figure 1 shows the phases and procedures of the proposed methodology. The methodology is quality-oriented: quality requirements are proposed at the beginning of the design process and are used again in the final evaluation phase to verify that they have been met.
The proposed methodology is defined to work with teams made up of a group of knowledge domain experts, a group of computer programmers and analysts with experience implementing ontologies and applications that exploit ontologies, and one or two ontology engineers.

Ontology Requirements Specification
The objective of the requirements specification is to identify the scope of the ontology and to define its possible scenarios, its users, its competence, and the quality characteristics that it must meet. In order to specify the requirements of the ontology, the following tasks must be executed:
a. Specify the motivation of the ontology, clarifying the possible scenarios, users and applications that will benefit.

b. Specify the competency of the ontology by consensus within a group of experts in the domain of the ontology. The ontology engineer, together with the group of domain experts, produces a list of competency questions, that is, questions that the experts want the ontology to answer. These questions are generated by asking the domain experts to state direct questions that they expect the ontology system to be able to answer once it is implemented and in production. The list of competency questions will also be useful for the final ontology evaluation.

Ontology Design
The second phase of the methodology aims at producing a formal design of the ontology. This phase consists of the following procedures: term elicitation, ontology modules identification, individual ontology design and formalization using Description Logics (DL) notation. DLs are formal languages designed for knowledge representation and reasoning; they are decidable fragments of First Order Logic (FOL).
a. Term elicitation consists of producing a seminal list of terms that are relevant for the particular domain of knowledge. To achieve this goal, the ontology engineer uses the list of competency questions to identify the elementary concepts (nouns) that are required for the representation of the ontology model.

b. Modules identification consists of deciding the set of individual ontologies that will make up the ontology system. To produce this set of ontology modules, similar terms must be grouped: taking the list of terms as input, the ontology engineer and the group of domain experts work out the clustering of terms. This clustering consists of putting together all domain-related terms and assigning each group a class identifier with its name in singular. The resulting list of classes represents the set of ontology modules. Each group is then converted into a single ontology, with the aim of separating groups of different domains so that each individual ontology can be reused in other applications.
As a result of this phase a global scheme of the set of individual ontologies that will integrate the system is obtained.
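As an illustration of the modules identification step, the following Python sketch groups an elicited list of terms into candidate ontology modules. The terms, the term-to-module assignments and the function name `identify_modules` are hypothetical; the assignments stand in for the decisions that the ontology engineer and the domain experts would take by consensus.

```python
from collections import defaultdict

# Hypothetical seminal list of terms (singular, no repetitions), as it could
# result from term elicitation over the competency questions.
TERMS = ["patient", "symptom", "sign", "disease", "medicament", "laboratory test"]

# Hypothetical expert decisions: each term is assigned to one module,
# named as a class in singular.
ASSIGNMENT = {
    "patient": "Patient",
    "symptom": "SignAndSymptom",
    "sign": "SignAndSymptom",
    "disease": "Disease",
    "medicament": "Medicament",
    "laboratory test": "DiagnosticTest",
}

def identify_modules(terms, assignment):
    """Group terms by assigned module and report terms not yet placed."""
    modules = defaultdict(list)
    unassigned = []
    for term in terms:
        module = assignment.get(term)
        if module is None:
            unassigned.append(term)
        else:
            modules[module].append(term)
    return dict(modules), unassigned

modules, unassigned = identify_modules(TERMS, ASSIGNMENT)
```

Keeping the list of unassigned terms separate makes it easy to see which terms still need a decision from the expert group before the global scheme is fixed.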

Ontology Construction
The objective of the third phase of the methodology is to code all ontology modules using an ontology editor and a standard language, and to integrate all modules into an ontology system. For each ontology module the following activities are defined:
c. Modules integration consists of importing the set of individual ontologies into the ontology system. Once all individual ontologies are imported into the ontology system, a consistency check is mandatory to verify that the integration of the individual ontologies does not produce any contradiction. To define the semantic relations between concepts in the integrated ontology, the following activities are carried out:
I. Ontology system properties definitions. Using the initial list of competency questions and the list of verbs, the ontology engineer should identify and create all necessary object properties between concepts from the different ontologies.

II. Ontology system axiomatization. Once the integrated ontology system and the properties between objects are available, the axioms and specific restrictions are created.

III. Ontology system population. Finally, the ontology system is populated with the individuals that require semantic relationships between ontologies. This final step is also used for evaluation purposes: object properties and axioms are corrected during this step if necessary.
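The integration step and its mandatory consistency check can be sketched in Python as follows. This is a deliberately small illustration under stated assumptions, not a DL reasoner: modules are plain dictionaries, the only axioms considered are class disjointness axioms, and the only contradiction detected is an individual typed with two disjoint classes. All module contents and function names are hypothetical.

```python
from itertools import combinations

def integrate(*modules):
    """Merge individual ontology modules into one ontology system (a dict)."""
    system = {"disjoint": set(), "types": {}}
    for module in modules:
        system["disjoint"] |= {frozenset(pair) for pair in module.get("disjoint", [])}
        for individual, classes in module.get("types", {}).items():
            system["types"].setdefault(individual, set()).update(classes)
    return system

def find_contradictions(system):
    """List (individual, class_a, class_b) for individuals typed with disjoint classes."""
    clashes = []
    for individual, classes in sorted(system["types"].items()):
        for a, b in combinations(sorted(classes), 2):
            if frozenset((a, b)) in system["disjoint"]:
                clashes.append((individual, a, b))
    return clashes

# Hypothetical modules: each is consistent on its own.
patient_module = {"types": {"maria": {"Patient"}}}
disease_module = {
    "disjoint": [("Patient", "Disease")],
    "types": {"heart_disease": {"Disease"}},
}
# A faulty module that only becomes contradictory after integration.
faulty_module = {"types": {"maria": {"Disease"}}}

ok_system = integrate(patient_module, disease_module)
bad_system = integrate(patient_module, disease_module, faulty_module)
```

The example shows why the check is run after importing: `faulty_module` contains no contradiction in isolation, and the clash appears only once the disjointness axiom from another module is present. In practice this check would be delegated to a reasoner rather than written by hand.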

Ontology Evaluation
According to Gómez-Pérez (1994), ontology evaluation refers to the correct building of the content of the ontology, ensuring that its definitions correctly implement the ontology requirements and competency questions. The goal of ontology evaluation is to prove compliance of the world model with the world modelled formally. "Evaluation means to judge the ontologies, their associated software environments and documentation with respect to a frame of reference during each phase and between phases of their life cycle". Two important aspects are used for evaluation: the competency of the ontology and the quality requirements.
a. Competency of the Ontology. Grüninger and Fox (1994) proposed six characteristics to evaluate a Business Model. These characteristics were proposed to answer the question of "How can one determine which model is correct for a given task?" To give a guideline on the operation of these characteristics, the authors define the concept of competence of the model as follows: "given an appropriately instantiated model and a demonstrator of theorems, the competence of a model is the set of questions that the model should be able to answer". Based on this definition, it is possible to state that the competence of an ontology model is the set of questions that the ontology is capable of answering.
Evaluating the competency of an ontology system is crucial to verify that a representational model is complete with respect to a given set of competency questions. This evaluation requires the translation of all competency questions into Description Logic (DL) axioms, to ensure that the ontology system can answer the initial set of competency questions.
b. Quality Requirements. Over the years many researchers have presented and discussed ontology design principles as objective criteria which serve as guidelines for the design and evaluation of ontology models (Gruber, 1993). Therefore, the quality of an ontology model can be measured as its degree of compliance with established design criteria. Of the many ontology design principles reported in the specialized literature, the following have been selected for the design and construction of the ontology model reported in this paper:

I. Clarity. According to Gruber (1993), the ontology should communicate the intended meaning of defined terms. Definitions should be stated in formal axioms, and a complete definition (defined by necessary and sufficient conditions) is preferred over a partial definition (defined by only necessary or sufficient conditions). However, the incorporation of complete definitions has a very high cost in memory usage and computing resources. Therefore, this criterion should be carefully considered because of the trade-off between clarity and performance.

II. Coherence. This principle is also referred to as soundness or consistency. Coherence specifies that ontology definitions should be individually sound and should not contradict each other. Accordingly, the coherence of an ontology can be verified by executing the consistency check of a reasoner program.

III. Modularity. Modularization of ontologies is defined as the task of decomposing an ontology into independent disjoint skeleton taxonomies. Regarding modularization of ontologies, Rector (2003) stated the importance of modularity as a key requirement for ontologies in order to achieve reuse, maintainability, and evolution.
Quality evaluation is done by consistency checking, to verify that none of the class definitions and axioms, nor the individuals instantiated in the ontology, contain logical contradictions. This final activity consists of executing the reasoning tasks of taxonomy classification, computation of inferred types, and consistency checking.
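The competency evaluation described in this section can be illustrated with a small Python sketch in which each competency question is paired with a query, and a question counts as answered when its query returns at least one result. The questions, property names and facts below are hypothetical, and the lookup works on asserted facts only; a real evaluation would express the questions as DL queries and rely on a reasoner.

```python
# Toy knowledge base of (subject, property, object) assertions.
# All identifiers are hypothetical.
TRIPLES = {
    ("heart_disease", "hasSymptom", "chest_pain"),
    ("heart_disease", "diagnosedBy", "troponin_test"),
    ("influenza", "hasSymptom", "fever"),
}

def objects(triples, subject, prop):
    """All objects o such that (subject, prop, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == prop}

def subjects(triples, prop, obj):
    """All subjects s such that (s, prop, obj) is asserted."""
    return {s for s, p, o in triples if p == prop and o == obj}

# Each competency question is paired with a query over the knowledge base.
COMPETENCY_QUESTIONS = {
    "Which diseases present chest pain?":
        lambda t: subjects(t, "hasSymptom", "chest_pain"),
    "Which tests diagnose heart disease?":
        lambda t: objects(t, "heart_disease", "diagnosedBy"),
    "Which medicaments treat influenza?":
        lambda t: objects(t, "influenza", "treatedWith"),
}

def unanswered(questions, triples):
    """Questions whose query returns no answer, i.e. gaps in competency."""
    return sorted(q for q, query in questions.items() if not query(triples))

gaps = unanswered(COMPETENCY_QUESTIONS, TRIPLES)
```

A non-empty list of gaps points at concepts, properties or individuals that still have to be added before the model can be considered complete with respect to the competency questions.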

Case of Study: Ontological Model for Medical Diagnosis
In this section we describe a case study of the application of the proposed methodology. The objective is to design and implement an ontology system to support decision making during the diagnosis of diseases.

Ontology Requirements Specification
After a series of interviews with a group of doctors (physicians), programmers and a main knowledge engineer, the following requirements were specified:
a. This ontology model aims at supporting the general practitioner (physician) in reviewing the signs and symptoms of a patient with some disease, and in deciding which additional laboratory tests are needed to determine the diagnosis and an adequate treatment, considering the particular characteristics and profile of the patient.
b. As a result of the meetings with the group of experts, the following list of competence questions was defined:

Ontology Design
In this sub-section we describe the formalization of the main ontology concepts decided for the medical diagnostic requirement. First we introduce the set of terms.
a. Term elicitation consists of producing a seminal list of terms that are relevant for the particular domain of knowledge. The elicitation of terms is done using the list of questions and identifying nouns and verbs. Nouns represent the starting set of concepts that may be used as candidate ontology modules. During term elicitation, there is no need to observe the grammatical structure of sentences; only the nouns are selected. The list of terms should contain no repetitions and should be in singular. During the final evaluation of the resulting ontology, this seminal list of terms is used to verify concept coverage (see Table 1).
b. Modules identification consists of defining the set of individual ontologies that will make up the ontology system. To produce this set of ontology modules, similar terms must be grouped: taking the list of terms as input, the ontology engineer and the group of domain experts work out the clustering of terms. The resulting global scheme of the ontologies that will integrate the ontology system is shown in Figure 2.

3) The ontology of medical diseases is used to represent the taxonomy of diseases according to the International Classification of Diseases (ICD-10, CIE 10 in Spanish). Figure 5 shows the class hierarchy of the Disease ontology.

4) The Medicament ontology is intended to represent the full range of substances and drugs used in the treatment of diseases. Every member of the Medicament class must have an active substance, a presentation, a route of administration and a medical interaction with other medicaments.
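The restriction on the Medicament class can be written in DL notation roughly as follows. This is a sketch only: the role and concept names (hasActiveSubstance, hasPresentation, hasRouteOfAdministration, interactsWith and their fillers) are illustrative assumptions, since the exact identifiers used in the ontology are not given here, and the axiom is stated as a necessary condition (subsumption) because the text says what every member "must have".

```latex
\[
\mathit{Medicament} \sqsubseteq
  \exists\,\mathit{hasActiveSubstance}.\mathit{ActiveSubstance}
  \;\sqcap\; \exists\,\mathit{hasPresentation}.\mathit{Presentation}
  \;\sqcap\; \exists\,\mathit{hasRouteOfAdministration}.\mathit{RouteOfAdministration}
  \;\sqcap\; \exists\,\mathit{interactsWith}.\mathit{Medicament}
\]
```

Replacing the subsumption symbol with an equivalence would turn this partial definition into a complete one, in the sense of the clarity principle discussed earlier.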

Ontology Construction
In this sub-section we describe the implementation of the ontology modules and their subsequent integration into the ontology system.
a. Ontology modules implementation. The ontology modules were implemented using the Protégé ontology editor and represented using the standard Web Ontology Language (OWL). The ontology model consists of a set of imported, independent and self-contained ontologies, among which a set of semantic relations is defined for particular purposes.

Ontology Evaluation
The goal of ontology evaluation is to prove compliance of the world model with the world modelled formally.
a. Evaluation of the Competency of the Ontology. The ontological model reported in this work considers the following conceptual dimensions: patient, medicament, disease, sign and symptom, antecedents, and diagnostic tests. All these dimensions correlate and intersect at particular points. In this sub-section we describe two scenarios to evaluate the competency of the ontology system reported.
Scenario 1. Patient Maria Hernandez is 56 years old and overweight, has a blood pressure of 140/90 and a family history that includes heart disease, and visited the doctor because of a strong pain in the chest (see Figure 9). Recommended tests: lipid panel, triglyceride, and troponin. Probable diagnosis: heart disease.
In order to infer the test recommendations, the following rule is fired:

b. Evaluation of the Quality. To evaluate the quality of the ontology system, the following design principles were considered: clarity, coherence and modularity.

1. Clarity. To address this design principle, the ontology concepts were implemented using formal axioms and, when possible, complete definitions were created; complete definitions are represented as formal equivalences. However, there were various cases where the incorporation of complete definitions increased the memory demand of the ontology. Figure 12 shows the main concepts defined for the integrated ontology system. Complete definitions were defined for: Disease, Ethnicity, Family Planning Method, and Medicament.
Modularization of the ontology model reported in this paper was achieved by dividing the model into six independent ontologies. Each of these ontologies is domain independent and reusable. For instance, the Patient ontology can be reused and maintained by the social services department, where it could serve several applications, such as remote patient assistance or a system to control and supervise drug administration to patients.
The most important design principles were considered and verified through Protégé tools such as reasoners and DL Query. Ontology consistency checking was executed to verify that none of the class definitions and axioms, nor the individuals instantiated in the ontology, had logical contradictions.
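To make the competency evaluation of Scenario 1 more concrete, the following Python sketch shows how a single rule could derive the recommended tests from the patient data. The rule that is actually fired in the ontology system is not reproduced in this text, so the condition below (chest pain together with a family history of heart disease) is an assumption chosen only to reproduce the tests listed in the scenario; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Patient:
    name: str
    age: int
    symptoms: set = field(default_factory=set)
    family_history: set = field(default_factory=set)

def recommend_tests(patient):
    """Hypothetical rule: chest pain plus a family history of heart disease
    triggers the tests listed in Scenario 1."""
    tests = set()
    if "chest pain" in patient.symptoms and "heart disease" in patient.family_history:
        tests |= {"lipid panel", "triglyceride", "troponin"}
    return tests

# Patient data taken from Scenario 1.
maria = Patient("Maria Hernandez", 56, {"chest pain"}, {"heart disease"})
recommended = recommend_tests(maria)
```

In the ontology system itself, the equivalent inference would be produced by the reasoner from rules and axioms rather than by procedural code.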

Analysis of Related Methodologies
An analysis of the most cited and popular ontology design methodologies is described in this section. The analysis focuses specifically on reviewing whether the methodologies consider the following characteristics: modular design, domain-oriented design, user-centered design, the type of development process (iterative and/or incremental), the incorporation of design principles, and competency-based evaluation.
Lenat and Guha (1989) described the methodology they followed to build the Cyc ontology at the Microelectronics and Computer Technology Corporation. The Cyc ontology represents an important attempt to codify a large amount of common-sense knowledge. The authors describe a method composed of three processes to build the ontology: manual extraction and codification of knowledge, computer-human aided extraction and codification of knowledge, and fully computer-aided codification of knowledge.
The methodology of the Cyc project is not oriented to answering a fixed collection of competency questions; rather, it covers a wide spectrum of common-sense knowledge.
Grüninger and Fox (1994, 1995) described the process of building an ontology for the Toronto Virtual Enterprise (TOVE) modelling project. The TOVE ontology was constructed with the objective of representing a common-sense enterprise model. The methodology defined the following procedure: describe a motivating scenario, define the competency questions, define the terminology of the ontology (objects, attributes and relations), specify the definitions and constraints on the terminology, and evaluate the ontology by completeness theorems.
The specific ontology design principle that the authors address with this methodology is ontology completeness based on competency questions. Even though the authors describe a two-ontology example (composed of the Activity and Organization ontologies), they do not provide specific guidelines for the application of ontology design principles.
Uschold and King (1995) presented a methodology for developing and evaluating ontologies, later refined by Uschold and Grüninger (1996). This methodology included the following stages to build ontologies: identify the purpose, build the ontology (ontology capture, ontology coding and integration of existing ontologies), evaluation, and documentation. They incorporated a scoping process during the ontology capture stage which includes grouping. Grouping was defined as the task of structuring terms into work areas corresponding to naturally arising sub-groups. In this sense, grouping is the closest concept to ontology modularization. However, the methodology of Uschold and colleagues does not cover integration guidelines for networked ontologies or ontology systems. Ontology evaluation is not clearly defined, and no practical guides or techniques are provided.
Bernaras et al. (1996) presented a methodology for building ontologies under the KACTUS project. The main objective of the authors when proposing this methodology was to evaluate the feasibility of knowledge reuse in complex systems. Their methodology stated the following general processes: provide a specification of the application to know the application context and a view of the components; produce a list of terms and tasks; make a preliminary design based on the relevant top-level ontological categories, taking as input the list of terms and tasks developed during the previous phase; augment the domain concepts and relations identified during the previous step; refine and structure the ontology in order to make a definitive design. The main objective of the KACTUS methodology is very similar to that of the approach presented in this paper, which is to promote ontology reuse throughout the ontology design process. This methodology does not describe a clear evaluation process.
The METHONTOLOGY framework was presented for the first time by Gómez-Pérez (1996). It was later refined and detailed by Fernández et al. (1997, 1999) and Gómez-Pérez (1998) to facilitate the construction of ontologies. METHONTOLOGY defines a development process and life cycle which consists of the following activities: specification, conceptualization, formalization, implementation, and maintenance. Of special interest is the conceptualization activity, which sets the following tasks: to build the glossary of terms, to build concept taxonomies, to build binary relation diagrams, to build the concept dictionary, to describe in detail each binary relation, to describe in detail each instance attribute, and to describe in detail each constant. The conceptualization phase of this methodology involves the documentation of intermediate representations, which turns out to be excessive and tedious work for very large ontologies.
CommonKADS (Schreiber, 2000) is an important methodology for knowledge-based systems construction, not specifically tailored for the design and construction of ontologies. However, CommonKADS is a methodology which includes the concept of model components for reuse.
The NEON methodology (Suárez-Figueroa, 2010) is based on the use of ontology design patterns (ODP). This methodology establishes the reuse of ontologies from a public ontology repository and a set of known ontology design patterns, which are integrated by means of a re-engineering process. The general steps defined in this methodology are: 1) identify requirements, 2) identify available design patterns, 3) divide and transform the selected problem into partial problems, 4) match selected partial problems with ontology design patterns, 5) select the design pattern, 6) apply selected patterns to make a composition, 7) evaluate partial design solutions, and 8) integrate partial solutions. The NEON methodology depends on the existence of a repository of common ontology problems and a collection of ontology design patterns associated with general use cases. When the users of the ontology specify the set of competency questions at the beginning of the methodology, these questions need to be associated with the general use cases. The authors propose the use of additional tools to support the end users in the validation and correlation of competency questions.
According to Gangemi and Presutti (2009), a design pattern provides a modelling solution to a frequent ontology design problem. The authors define Content Ontology Design Patterns as small ontologies that mediate between problem types and design solutions and are used as modelling components, to the extent that a new ontology can be built from the composition of multiple components.
As can be seen in Table 2, the most complete methodologies are METHONTOLOGY and NEON. However, METHONTOLOGY generates an overload of documentation effort during the design and development of very large ontology systems. The NEON methodology depends on the existence of a repository of ODPs associated with general use cases, so its end users will require extra effort to identify the general use cases that address their particular requirements.

Conclusions
A main problem in Decision Sciences is the formal representation of complex problems with large volumes of data. The methodology presented in this paper provides ontology design principles that can be applied to construct ontological models and formulate scenarios in order to improve the decision making process.
This paper promotes the reuse of ontologies by implementing ontology modules from the beginning of the ontology design; ontologies are seen as reusable modules, not as general design patterns. The idea behind this approach is that the owners of the resulting ontology modules can reuse them inside the enterprise for more applications: because they designed the structure of these ontologies themselves, they can reuse them easily instead of searching a repository for general design patterns.
The incorporation of ontology design patterns is a good approach whenever the set of required solutions exists in a publicly available repository. However, ontology design often requires the implementation of tailored constructs for complex engineering systems and heterogeneous domains.
Moreover, ontology reuse requires the adaptation of the pre-existing ontology design solution to the specific application needs. The notion of ontology design pattern is close to the concept of modularized ontology design, in the sense of reusability of ontology modules and composition of new ontologies based on a set of initial components.

Figure 1. Phases and procedures of the proposed methodology. Source: Author's own

a. Modules implementation consists of utilizing an editor to implement the ontology modules defined during the previous phase. Although there are various ontology editors available, our recommendation is to use Protégé, an open-source ontology editor that is kept up to date and has broad support from the developer community.
b. Modules population consists of instantiating the ontology with individuals to evaluate the initial definitions, relations and axioms. For each individual ontology, consistency checking is executed to verify that none of the class definitions and axioms has logical contradictions. This final activity consists of executing the reasoning tasks of taxonomy classification, computation of inferred types, and consistency checking.

Figure 2. Phases of the diagnosis and treatment of a patient with a disease. Source: Author's own

Figure 6 shows the class hierarchy of the Medicament ontology.
Figure 7 shows the ontology metrics.

Figure 9. Data properties and object properties of patient Maria Hernandez. Source: Author's own, elaborated with Protégé

Figure 10. Recommended tests inferred for the registered patient Maria Hernandez based on her signs and symptoms. Source: Author's own, elaborated with Protégé

Figure 11. Recommended tests inferred for the registered patient Ramon Perez based on his signs and symptoms. Source: Author's own, elaborated with Protégé

Figure 12. Complete definitions incorporated in the ontology system. Source: Author's own, elaborated with Protégé

Figure 13. Execution of Pellet reasoning tasks. Source: Author's own, elaborated with Protégé

The CommonKADS methodology recognizes the need for reusing knowledge-model elements or a combination of them, considering that large parts of a given model are not specific and may re-occur in other domains and/or tasks. The CommonKADS methodology includes the following steps for knowledge-based systems: knowledge identification, which consists of becoming familiar with information sources, glossary and scenarios, and identifying model components for reuse; knowledge specification, which aims at constructing a specification of the knowledge model by choosing the task template and the initial domain conceptualization; and knowledge refinement, which validates the knowledge model and completes the knowledge bases.
Noy and McGuinness (2001) presented a methodology to build ontologies consisting of the following steps: determine the domain and scope of the ontology, reuse existing ontologies, enumerate important terms, define classes and the class hierarchy, define properties of classes, define facets of properties, and create instances. Even though this methodology is well explained, it does not include an evaluation step.