SGML

As its name implies, the Standard Generalized Markup Language standardizes the application of generalized markup concepts. SGML is not a markup scheme (that is, it does not prescribe a set of generic identifiers), but rather a methodology for creating such schemes. The language, which is based on IBM’s Document Composition Facility Generalized Markup Language, is being developed by the ISO, and has recently been published as an International Standard by that organization. Versions of SGML have already been implemented in several organizations or projects, including British Aerospace and the New Oxford English Dictionary Project. The generic markup scheme under development as the Electronic Manuscript Project of the Association of American Publishers is also an SGML implementation. It should be noted, too, that Joan Smith of the National Computing Centre is a strong advocate of SGML (Barnard et al. 1988, 28).

SGML is the ISO ‘Standard Generalized Markup Language.’ It defines a powerful language for describing and documenting hierarchically structured documents of arbitrary complexity with simple character stream files. It does not specify a particular set of content object types or ‘tags,’ but rather provides a way for declaring which tags are to be used, along with their permissible relationships. For documents of fixed form such as dictionaries and reference works, this can be a great help in establishing a consistent structure. Even in the case of more loosely structured material such as literary texts, the existence of a precise description of the document structure can be of use in analysis. For these texts SGML may be more an aid to the scholar than to the author.
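To make the idea of declaring tags and their permissible relationships concrete, the sketch below imitates what an SGML document type definition (DTD) does with an element declaration such as `<!ELEMENT entry - - (hw, pos?, sense+)>`. It is a minimal Python illustration under stated assumptions, not an SGML parser: the element names (`entry`, `hw`, `pos`, `sense`, `def`, `example`) are hypothetical, character data is ignored, and each content model is rewritten as a regular expression over the sequence of child element names, which mirrors how content models combine sequence, optional (`?`) and repeatable (`+`, `*`) parts.

```python
import re
from dataclasses import dataclass, field

# Hypothetical content models for a dictionary entry, loosely modelled on an
# SGML element declaration such as <!ELEMENT entry - - (hw, pos?, sense+)>.
# Each model is a regular expression over the sequence of child element
# names, where every name is followed by a comma.
CONTENT_MODELS: dict[str, str] = {
    "entry": r"hw,(pos,)?(sense,)+",
    "sense": r"def,(example,)*",
    "hw": r"",        # leaf elements: text only, no child elements allowed
    "pos": r"",
    "def": r"",
    "example": r"",
}


@dataclass
class Element:
    name: str
    children: list["Element"] = field(default_factory=list)
    text: str = ""


def validate(el: Element) -> list[str]:
    """Check el and its descendants against CONTENT_MODELS; return any errors."""
    model = CONTENT_MODELS.get(el.name)
    if model is None:
        return [f"<{el.name}> is not declared"]
    errors = []
    sequence = "".join(child.name + "," for child in el.children)
    if re.fullmatch(model, sequence) is None:
        names = [child.name for child in el.children]
        errors.append(f"<{el.name}> has invalid children {names}")
    for child in el.children:
        errors.extend(validate(child))
    return errors


if __name__ == "__main__":
    good = Element("entry", [
        Element("hw", text="markup"),
        Element("pos", text="n."),
        Element("sense", [Element("def", text="codes added to a text")]),
    ])
    bad = Element("entry", [Element("sense", [Element("def")]), Element("hw")])
    print(validate(good))  # []
    print(validate(bad))   # ["<entry> has invalid children ['sense', 'hw']"]
```

The structure check is deliberately separate from any formatting: the same declarations would serve equally for a printed dictionary or a searchable database.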

SGML defines a document in terms of its OHCO (ordered hierarchy of content objects) structure: it does not directly specify how to format or process a document, but describes a hierarchical document structure with mnemonic names for the content objects of the data. Thus, it does not prejudice whether a document is to be treated as a database, a word-processing file, or something completely different. It is important to note, however, that this independence does not prevent an SGML-using application from displaying data in any way the user desires. Many products provide tools for assigning an appearance to each content object type, and a “WYSIWYG” display for writing and editing (DeRose et al. 1990, 10).
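The OHCO view can be illustrated with a small tree in which each node records only its content object type and its ordered children, with no formatting information. In the hypothetical Python sketch below, two style maps (`PLAIN_STYLE` and `HTML_STYLE`, both invented for illustration) assign different appearances to the same content object types, and a separate function treats the same tree as data by extracting every `line` object. None of this is SGML syntax; it only models the separation of structure from processing described above.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Node:
    """A content object: a type name plus ordered children (nodes or text)."""
    kind: str
    children: list[Node | str] = field(default_factory=list)


# A hypothetical document: structure and content only, no formatting.
doc = Node("poem", [
    Node("title", ["Sonnet"]),
    Node("stanza", [
        Node("line", ["First line"]),
        Node("line", ["Second line"]),
    ]),
])

# Two hypothetical applications, each assigning its own appearance per type.
PLAIN_STYLE = {
    "title": lambda s: s.upper() + "\n",
    "line": lambda s: "  " + s + "\n",
}
HTML_STYLE = {
    "title": lambda s: f"<h1>{s}</h1>",
    "line": lambda s: f"{s}<br>",
}


def render(node: Node | str, style: dict[str, Callable[[str], str]]) -> str:
    """Format a subtree; types without a style entry pass their text through."""
    if isinstance(node, str):
        return node
    inner = "".join(render(child, style) for child in node.children)
    formatter = style.get(node.kind)
    return formatter(inner) if formatter else inner


def collect(node: Node | str, kind: str) -> list[str]:
    """Treat the same tree as data: the text of every object of one type, in order."""
    if isinstance(node, str):
        return []
    if node.kind == kind:
        return [render(node, {})]
    return [text for child in node.children for text in collect(child, kind)]


if __name__ == "__main__":
    print(render(doc, PLAIN_STYLE))
    print(render(doc, HTML_STYLE))  # <h1>Sonnet</h1>First line<br>Second line<br>
    print(collect(doc, "line"))     # ['First line', 'Second line']
```

Because the tree itself never changes, adding a new presentation or a new kind of processing only means writing another style map or traversal.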

Most discussions of SGML mention, if only in passing, that the particular characters and conventions used to represent SGML in a particular document can be redefined. This is of course a necessary consequence of the fact that SGML is not itself a markup language, but a method of describing such languages (Burnard 1991, section 4.1).
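Because SGML describes markup languages rather than being one, even the characters that delimit tags belong to a declared concrete syntax. In the reference concrete syntax, a start tag opens with `<` (the STAGO delimiter), an end tag opens with `</` (ETAGO), and both close with `>` (TAGC). The toy tokenizer below, a Python sketch and not a conforming SGML parser, takes these three delimiters as parameters; the bracket-based set `BRACKETS` is a hypothetical example chosen only to show that the same structure can be recovered under different delimiters. It ignores markup declarations, attributes, entity references and tag minimization.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Delimiters:
    stago: str = "<"    # start-tag open, as in the reference concrete syntax
    etago: str = "</"   # end-tag open
    tagc: str = ">"     # tag close


REFERENCE = Delimiters()
# Hypothetical alternative delimiter set, for illustration only.
BRACKETS = Delimiters(stago="[", etago="[/", tagc="]")


def tokenize(text: str, d: Delimiters) -> list[tuple[str, str]]:
    """Split text into ('start', name), ('end', name) and ('data', text) tokens."""
    # The end-tag alternative is tried first because ETAGO usually begins with STAGO.
    pattern = re.compile(
        rf"{re.escape(d.etago)}(\w+){re.escape(d.tagc)}"
        rf"|{re.escape(d.stago)}(\w+){re.escape(d.tagc)}"
    )
    tokens: list[tuple[str, str]] = []
    pos = 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            tokens.append(("data", text[pos:m.start()]))
        if m.group(1) is not None:
            tokens.append(("end", m.group(1)))
        else:
            tokens.append(("start", m.group(2)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens


if __name__ == "__main__":
    print(tokenize("<p>Hello</p>", REFERENCE))
    print(tokenize("[p]Hello[/p]", BRACKETS))  # same tokens as the line above
    print(tokenize("<p>x", BRACKETS))          # '<p>' is plain data here
```

Under each delimiter set the other set's tag characters are simply text, which is why a document and its declared syntax must travel together.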

Standard Generalized Markup Language (SGML). A standardized set of tags that may be used in the transcription of textual documents for computer-based use. “SGML markup” is commonly used to describe any group of tag sets that adheres to this standard (Kline 1998, 273).

A good and reliable encoding standard is SGML: Standard Generalized Markup Language, published in 1986 as an ISO standard (ISO 8879). As such, it is publicly defined, exactly and consistently documented, controlled by the ISO, and internationally accepted and supported in the form of sophisticated implementations. SGML itself is not a markup language or encoding scheme, but a metalanguage that makes it possible to define markup languages. A markup language must specify what markup is allowed, what markup is required, how markup is distinguished from text, and what the markup means. SGML provides the means to do the first three. Documentation of the specific markup languages created with SGML is needed for the last requirement, as for example in the TEI P3 Guidelines (see below). Because an SGML file is plain ASCII, it is completely independent of software and hardware and can be distributed across all networks. The power and flexibility of the mechanism ensure, among other things, that the same electronic text can be used for different purposes (Vanhoutte 1998, 111-102).

To guarantee the accessibility of digitized documents, they must be stored in a format that can be read by any system. One such format is SGML (Standard Generalized Markup Language), a metalanguage that makes it possible to define particular markup languages, such as TEI (Text Encoding Initiative). With it, the text can be described and encoded so that certain operations can be performed on it in a later representation (Van Hulle 1998, 105).

The call for reusability, interchange, system- and software-independence, portability, and collaboration in the humanities was answered by the advent of the Standard Generalized Markup Language (SGML), which became an ISO standard in 1986 (ISO 8879:1986) (Goldfarb 1990). SGML is not itself a markup scheme, but a methodology that enables the creation of such schemes. Based on IBM’s Document Composition Facility Generalized Markup Language, SGML was developed mainly by Charles Goldfarb to become a metalanguage for the description of markup schemes that satisfied at least seven requirements for an encoding standard:

  1. The requirement of comprehensiveness;
  2. The requirement of simplicity;
  3. The requirement that documents be processable by software of moderate complexity;
  4. The requirement that the standard not be dependent on any particular set or text-entry device;
  5. The requirement that the standard not be geared to any particular analytic program or printing system;
  6. The requirement that the standard should describe text in editable form; and
  7. The requirement that the standard allow the interchange of encoded texts across communication networks.

Such a markup scheme was exactly what the humanities were looking for in their quest for an encoding standard for the preparation and interchange of electronic texts for scholarly research (Vanhoutte 2004, 10).
