3A. Ontologies and Metadata

(JDrucker 9/2013)

Classification systems are standardized in almost every field, but the politics of their development and standardization are highly charged. An entire worldview is embodied in a classification system, and this can mean that it serves the interests of one group and not another, or that it replicates traditional patterns of exploitation or cultural domination. A sensitivity to these issues is not only important but enlightening in its own right, since a cross-cultural or cross-constituency perspective demonstrates both the power of classification systems and our own blind spots.

Classification systems reviewed

  • Describing, naming, organizing
  • Attributes in a non-hierarchical system
  • Hierarchies of information

Classification standards

Standardization is essential in classification systems. (If you call something a potato one day and a tomato the next, how is someone to pick the ingredients for a recipe? And if you list all your music by artist’s name and then list one item by title, how will you find the lost item?) Consistency is everything. The necessity for standardization increases when we are dealing with large-scale systems used by many institutional repositories to identify and/or describe their objects, such as the Library of Congress Subject Headings (LCSH), the Getty’s Art & Architecture Thesaurus (AAT), or related standards (see Getty, for instance). If institutional repositories are going to be able to share information, that information has to be structured in a consistent and standardized manner, and it has to make use of standard vocabularies.

Standardization is related to the use to which the information will be put. Objects can be organized, as you have seen, in an almost infinite number of ways. Organizing tools according to function makes sense, and organizing books by subject and/or author makes sense, but switch these around and neither scheme would work.

Classification systems are used to organize collections, to identify characteristics of objects in a system, and to name or identify those objects in a consistent way. They have a significant and substantive overlap with taxonomies and ontologies. Taxonomies are, quite literally, naming systems. They are composed of selected and controlled vocabulary for naming items or objects. Ontologies are models of knowledge. They may or may not classify things, but they organize information and concepts into a structured system. There is no need to try to pin these words (classification, taxonomy, ontology) into hard and fast definitions that are clearly distinct. They are not always distinct; they often resemble each other and are sometimes used interchangeably. In a general way, taxonomies are lists of terms/names, classification systems describe attributes and relations of objects in a system, and ontologies model knowledge systems. Confused?
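One rough way to see the difference is to write the same small domain three ways. The sketch below is only an illustration: the terms, attribute names, and relations are invented for this example and do not come from any published standard, and real systems blur these lines just as the paragraph above says.

```python
# Illustrative only: all terms, attributes, and relations are made up.

# Taxonomy: a controlled list of names.
taxonomy = ["potato", "tomato", "carrot"]

# Classification: attributes recorded for each named object.
classification = {
    "potato": {"part_eaten": "tuber", "family": "nightshade"},
    "tomato": {"part_eaten": "fruit", "family": "nightshade"},
    "carrot": {"part_eaten": "root", "family": "umbellifer"},
}

# Ontology: concepts linked by named relations (subject, relation, object).
ontology = [
    ("potato", "is_a", "vegetable"),
    ("tomato", "is_a", "vegetable"),
    ("carrot", "is_a", "vegetable"),
    ("vegetable", "used_in", "recipe"),
]


def same_family(a, b):
    """Compare two objects on one classification attribute."""
    return classification[a]["family"] == classification[b]["family"]


def related(concept, relation):
    """Return every object linked to `concept` by `relation`."""
    return [o for s, r, o in ontology if s == concept and r == relation]
```

The list can only tell you whether a name is allowed; the attribute table lets you compare objects; the relations let you ask how concepts connect, for example `related("vegetable", "used_in")`.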

Here’s a bit more to confuse you further.

Metadata is the term applied to information that describes information, objects, content, or documents. So, if I have a book on the shelf in the library, the catalogue record contains metadata about that book that helps me figure out if it is relevant and also, where to find it. Standard bibliographic metadata on library records includes title, author, publisher, place of publication, date, and some description of the contents, the physical features, and other attributes of the object. Metadata standards exist for many information fields in libraries, museums, archives, and record-keeping environments.
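As a minimal sketch, a catalogue record of the kind described above can be modeled as a set of named fields. The field names follow the list in the paragraph; the `shelf_location` field and all the values are placeholders invented for illustration, not a real record or a real library schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class BibRecord:
    """A simplified bibliographic record (hypothetical field names)."""
    title: str
    author: str
    publisher: str = ""
    place: str = ""
    date: str = ""
    description: str = ""
    shelf_location: str = ""  # assumption: stands in for "where to find it"


def missing_fields(record):
    """List the fields that have been left empty."""
    return [name for name, value in asdict(record).items() if not value]


# Placeholder values, not a real book.
record = BibRecord(
    title="An Example Title",
    author="Doe, Jane",
    date="1999",
    shelf_location="Stack 3",
)
print(missing_fields(record))  # ['publisher', 'place', 'description']
```

A check like `missing_fields` is one small way a standard gets enforced: the record is only as useful as the fields that are consistently filled in.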

One of the difficulties in using metadata is figuring out whether you are describing the object or its representation. So, if you have a photograph of a temple in Athens, taken in 1902 with a glass plate and a box camera, but it is used to teach architecture, is the metadata in the catalogue record describing the photograph’s qualities, the temple’s qualities, or both?
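One common way to handle this question is to keep two linked records, one for the object and one for its representation. The sketch below assumes that approach; the identifiers and field names (`work-1`, `image-1`, `depicts`) are hypothetical, and only the details given in the example above are filled in.

```python
# Two records: one for the temple, one for the photograph of it.
# Identifiers and field names are invented for illustration.
temple = {
    "id": "work-1",
    "type": "built work",
    "location": "Athens",
}

photograph = {
    "id": "image-1",
    "type": "photograph",
    "date": "1902",
    "medium": "glass plate",
    "camera": "box camera",
    "depicts": "work-1",  # link from the representation to the object
}

records = {r["id"]: r for r in (temple, photograph)}


def describe(record_id):
    """Return a record together with the record it depicts, if any."""
    record = records[record_id]
    depicted = records.get(record.get("depicts"))
    return record, depicted


image, work = describe("image-1")
print(image["date"], work["location"])  # 1902 Athens
```

With the two kept separate, the date 1902 clearly belongs to the photograph and not to the temple, and a student of architecture and a historian of photography can each search on the record that matters to them.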

Exercise: Take a look at the Getty AAT and at the CCO (Cataloging Cultural Objects), and figure out what would be involved in describing such an item. Also, since we use Dublin Core for DH projects in Lab, you might want to look at its fields and terms as well. These are professional standards, and very detailed.

Exercise: Characteristics of Ontologies. Take the following concepts and look at them in relation to a specific ontology (listed on the wiki page link). Describe these elements or aspects of an ontology.

  • Structural organization of information
  • Concepts in a domain
  • Knowledge model
  • Link to purpose/use

See the Wikipedia page “Ontology” (http://en.wikipedia.org/wiki/Ontology_(information_science)) and look at the many examples listed there; search on several to see how they are structured.

Alternative Exercise: Analyzing standard metadata systems. Read the organizational structure of a domain in the Getty AAT (http://www.getty.edu/research/tools/vocabularies/aat/), then create one scenario in which it works and one in which it would not. Look for an area in your project domain.

Fluid Ontologies: The Politics of Information vs. the Ideology of Information

The concept of fluid ontologies weaves through the essay by Wallack and Srinivasan. It makes clear what is at stake in the use of classification and description systems, as well as the naming conventions they use. The authors emphasize the costs (financial, cultural, human) of mismatches between official and observed approaches to the description of catastrophic events. The ways in which objects and events are classified make a difference in whether a situation involving bio-waste can be resolved or not, and in whether it would have been dealt with more effectively if the fact that dead animals were involved had been clear. These are not just differences of nomenclature, but of substance.

Wallack and Srinivasan stress that ontologies “act as objects” and “negotiate boundaries between groups.” They also state that they function as “mental maps of surroundings.” The mismatch, however, between official and experiential classification systems produces inefficiencies and even insufficiencies, caused in part by information loss in the negotiation among different stakeholders and resource managers.

Exercise: Can you think of an example from your own experience in which these tensions would be apparent?

Wallack and Srinivasan suggest the concept of fluid ontologies as a partial solution. This would allow adaptive, flexible, inclusive tags that reflect local knowledge to be joined with the official meta-ontologies managed by the State, which are self-reinforcing and exclusive. This raises the question of how folksonomies and taxonomies/ontologies can be merged.
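To make the question concrete, here is one very simple way local tags and an official vocabulary might be joined. This is a sketch under stated assumptions, not the method Wallack and Srinivasan propose: the official terms, the local tags, and the crosswalk between them are all invented for illustration, loosely echoing the bio-waste example above.

```python
# Hypothetical official vocabulary and a hypothetical crosswalk
# from local, experiential tags to official terms.
OFFICIAL_TERMS = {"bio-waste", "flooding"}
CROSSWALK = {
    "dead animals": "bio-waste",
    "animal carcasses": "bio-waste",
}


def merge_tags(local_tags):
    """Map local tags to official terms where possible, and keep
    every local tag so that no local knowledge is discarded."""
    cleaned = {tag.strip().lower() for tag in local_tags}
    official = set()
    unmapped = set()
    for tag in cleaned:
        if tag in OFFICIAL_TERMS:
            official.add(tag)
        elif tag in CROSSWALK:
            official.add(CROSSWALK[tag])
        else:
            unmapped.add(tag)
    return {
        "official": sorted(official),
        "local": sorted(cleaned),
        "unmapped": sorted(unmapped),
    }


report = merge_tags(["Dead animals", "bad smell"])
print(report["official"])  # ['bio-waste']
print(report["unmapped"])  # ['bad smell']
```

The design choice that matters here is that the local tags are stored alongside the official terms and are not replaced by them. The `unmapped` list is where the official vocabulary has no word for what people on the ground are reporting, which is exactly the kind of information loss the article describes.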

The importance of this article is the way it shows what is at stake in creating any classification system. Immediately, we see the politics of information and classification, particularly when we think of politics as instrumental action towards an agenda or outcome. But what about the ideology of information and classification? What is meant by that phrase? If we think of ideology as a set of cultural values, often rendered invisible by passing as natural, then how are classification systems enmeshed with ideological ones?

Exercise: Start creating a taxonomy and/or classification system for your project. Scaling up your projects in imagination, what terms, references, resources would you want to cross-reference repeatedly and have stable in a single entry/list, as a pick-list, so you could use them consistently, and what fields would you want to be able to fill with free text or use to generate tags? Why?

Review: So far we have gone through the exercise of analyzing the components of a Digital Humanities project: user experience/display, repository/storage/information architecture, and the suite of services/activities that are performed by the system. Where do the metadata and classification systems belong in this model? How do they relate to the structure of a project as a whole?

Takeaways:

Metadata is information about data. It describes the data in a document or project or file. Folksonomies and taxonomies can co-exist in a productive tension between crowd-sourced and user-generated metadata and standards that emerge in communities of practice.

Next: Databases. What is data, and are database structures counter to narrative conventions, or not?

Required Readings for 3B:

  • C2DH Ch. 15 Stephen Ramsay, Databases
  • ** Kroenke, Database_1, Database_2
  • Michael Christie, “Computer Databases and Aboriginal Knowledge”

Study Question for 3B:

  1. What does Michael Christie emphasize in contrasting aboriginal approaches to knowing with western approaches to representing knowledge?

Copyright © 2014 - All Rights Reserved