The Davenport Group
SUMMARY OF GENERAL MEETING

Meeting summary for May 12 and 13, 1992.
by Dale Dougherty
Meeting location: Digital Equipment Corp, Bellevue, WA

********************************************************
CONTENTS:
    ATTENDEES
    MEETING SUMMARY
    WORKING GROUPS
********************************************************

* ATTENDEES

The following people attended the meeting:

    Ann Bishop, Tandem Computers (ann@mpd.tandem.com)
    Ralph E. Ferris, Open Systems Solutions, Inc. (ralph@ossi.com)
    Paul Perrotta, Pyramid Technology (perrotta@pyramid.com)
    Paul Hammerstrom, Digital Equipment Corp. (hammerstrom@smurf.enet.dec.com)
    Jon Bosak, Novell (jb@sjb.novell.com)
    Rich Pusateri, HaL Computer Systems (rtp@hal.com)
    Michael G. Belanger, Fulcrum Technologies (mikeb@fultech.com)
    Dale Dougherty, O'Reilly & Associates (dale@ora.com)

* MEETING SUMMARY

The agenda for the two-day meeting had five items:

    1. Overview for new members of the Davenport Group and its activities.
    2. Discussion of role of DTDs in achieving an interchange format.
    3. Statement of support for SGML by Davenport Group members.
    4. Organization of the Davenport Group.
    5. Proposals for New Working Groups.

One of the functions of a general meeting is to serve as an introduction to the Davenport Group for new members, who may then get a better sense of how to participate in the group's activities. It also offers new members the opportunity to discuss issues of practical importance to them, and get feedback from others who are involved in similar work. There is also a need to educate new members on the larger issues, and help them understand and support the rationale for decisions made by the group.

We discussed the possibility of producing a document that might help educate new members. I will bring this up in a separate message.

We talked about the critical role that DTDs play in moving to SGML. There is a sense that support for SGML has been queued up behind having an industry-wide DTD.
While some are waiting for that to happen, others are going ahead and developing their own DTDs. I passed out an article by Dave Hollander of HP, taken from comp.text.sgml, that I thought summarized the situation well and covered some of the basics of information interchange:

--------------------------------------------------------------
/ hpcc01:comp.text.sgml / Markku.Savela@tel.vtt.fi (Markku Savela) / 1:30 am May 6, 1992 /

Markku Savela asks:

> I think this has been asked before, but I don't think we got clear
> answer. The question is: What is the SGML solution to document
> interchange?
>
> Assuming we have unknown set of organizations using their own
> established ways of producing documents (with FrameMaker, Interleaf,
> MS-WORD, etc), how is SGML going to help exchanging documents
> containing text, drawings and images?
>

The answer to this will change over time; in the future DSSSL and SGML support by Frame, Interleaf et al. has the potential of making interchange "automagically" happen. Today's answer is tougher.

Let's start with what SGML will not help with:

    1) drawings and images, except for a universal method of identifying the notation
    2) interchange of printer-ready materials
    3) interchange of editable text between non-SGML editing systems.

Unfortunately, I know of no standard that can address all three of these interchange needs in a reasonable manner.

SGML today is the best technology I have used in the past 3+ years for these interchange purposes. The amount of effort put into implementing the necessary system "glue" has been less than for other markup languages. The most encouraging thing about SGML is that the effort has been going down due to an ever increasing variety of SGML smart tools; and is likely to continue to decrease due to the number of publishing systems announcing SGML support.
> I do think we need to have some *standard* DTD which defines the tags
> for some basic set of features existing in current widely used text
> processing programs. (And I assume this is what Markus Kuhn was asking
> too). Before such standard DTD exists, I do have to bet on ODA as an
> interchange format. ODA at least have people working on converters
> between existing formats (though, I cannot claim that ODA is complete
> solution either).

I know it sounds reasonable to have a "standard" DTD. OSF and some of its member companies have been pursuing this approach. I believe that the results will be made public soon; when the DTD is ready, someone will publish a note here.

However, be forewarned, DTDs are designed to capture what is significant to the DTD author. This may mean that what you want will not be part of any DTD except one that you write. We run into that every day; our DTD designed for manuals is being used by people to write inter-office memos. Needless to say, the semantics are not the same and many people grumble about SGML because they do not understand that they are "trying to open a small can with a jackhammer".

I think that standard DTDs will evolve and be widely used. There will be many of them and each group of people will want their own, just like WYSIWYG users tend to use many different templates. This realization is what drives the DSSSL group (I assume, I am not a member) and why, as a recipient of many documents from many organizations, I hope SGML does catch on! I cannot imagine trying to keep up with the conversion needs otherwise.

Maybe someday, there will be enough standard technology available to allow someone to say "convert X to Y" (eg SGML to RTF) and have it happen. I am betting that it will happen using SGML before it happens using any other current format.
Dave Hollander
______________________________________________________________________

Paul Hammerstrom gave us an update on the OSF's efforts to develop a DTD for technical documentation. He believed it would be ready for OSF members' approval in the June time-frame. He also felt it would be made available publicly some time after that.

That discussion led to defining a need for what we called at first a "DTD repository." We thought there would be a need for the Davenport Group to consider how to manage the review and distribution of DTDs used for interchange. One function might be to coordinate a formal review process of a DTD, submitting review comments to the originator of the DTD, or even managing the revision if the DTD was in the public domain. We thought this process might also be applied to other "standards" established by the group, as well as publicly available technology. We decided that this was a subject for a working group to consider, and that the working group should recommend a plan of action. Ralph Ferris took responsibility for developing a mission statement for the working group, and that appears at the end of the summary.

Jon Bosak had asked that the Davenport Group develop a formal statement of support for SGML. He felt it would benefit those organizations that needed some evidence of industry-wide support for SGML before committing themselves to it. Jon presented a draft statement, which we discussed and revised. We agreed to include the statement in documents that describe the work of the Davenport Group. We also felt that it could be accepted by members as a description of what it means for a company to be a member of the Davenport Group.
The statement reads:

    To realize the benefits of standardized document interchange and encourage the development of tools supporting interchange standards, organizations participating in the Davenport Group agree to use SGML (Standard Generalized Markup Language, ISO 8879) as the default text interchange format for technical publications.

We are interested in any comments on this statement.

We discussed the organization of the Davenport Group, and the move to less frequent general meetings and separate working group meetings. I discussed the role of the Steering Committee, which consists of Rich Pusateri, Paul Hammerstrom, Garrett Long of USL and me. We agreed that the Steering Committee consisted of a reasonable, representative number of people, and that it seemed to be the best way to manage the group's activities. We agreed that the Steering Committee will decide when to call general meetings, based on having developed an agenda for one.

We established that we need to have 10 confirmed attendees to hold a general meeting, and that the confirmation should be received prior to the meeting. This helps establish a guideline for cancelling or rescheduling a meeting if not enough people can attend. We also established a requirement that voting privileges are extended to anyone after they attend one general meeting or working group session. Please let me know if you have any comments on these guidelines.

Lastly, we talked about working groups. We had two proposals prepared for the meeting. Craig Boyle, who is interested in forming a working group to establish criteria for user testing of on-line documentation systems, was unable to attend at the last minute. I will ask him to prepare a written proposal for distribution to the group. Finally, as mentioned above, we came up with a new working group during the meeting.

* WORKING GROUPS

Before presenting the mission statements for each of the working groups, I'd like to explain my understanding of what a working group is.
A working group can be proposed by anyone interested in sponsoring and developing work on a particular subject. The sponsor has the opportunity to use the Davenport Group as a forum to solicit participation and support for the working group. Members may contribute as active participants while others may submit written comments on the working group's activities. The sponsor can organize the working group as the requirements of the work demand, taking care to be an open forum that encourages the participation of all interested parties. The sponsor should also prepare reports periodically on the activities or work products of the group so as to inform a wider circle of people of these efforts.

So, what follows are three working group proposals. Please feel free to respond directly to the individual sponsors (their e-mail addresses are in the message) or to the group as a whole (davenport@ora.com).

_______________Committee For Common Man, Lar Kaufman (lark@world.std.com)

The Committee for Common Man was formed to develop a standard form of online documentation based on, and compatible with, traditional UNIX "man" tools, but using a Standard Generalized Markup Language (SGML) document form as a document interchange standard.

The CFCM was formed by me, Lar Kaufman, in 1991 in response to a discussion on comp.text.sgml wherein it was proposed to develop an SGML Document Type Definition (DTD) for manpages. I had recently completed the development of a corporate style guide for creating manpages that would be portable over the major variants of UNIX in the marketplace, and I was acutely aware that no industry standard existed for manpages. I reasoned that a standard manpage needed to be defined before undertaking to map the standard form to an SGML DTD. I began to form the CFCM by contacting people whose postings to the Internet over a period of time had demonstrated to me a sense of concern for the issues I thought the group should address.
My own agenda extended beyond the bare task of developing a manpage DTD. I had developed "an attitude" about man tools. I felt that as variants of the viewing tools and the -man macros proliferated and as UNIX was being ported under a variety of flavors by a number of vendors new to the UNIX marketplace, the usability of manpages was declining. People were creating manpages who were unfamiliar with related tools - "whatis", "apropos", "ptx", and the like. Many of these manpages could be printed, but were nearly useless as online documents. I decided we should reverse the trend toward declining usability in man tools at the same time as we defined a manpage standard. I was aware that the increased structural definition of manpages would permit the development of new, more effective ways to extract information online.

My initial contacts formed the core of the first working group of the CFCM, the SGML group. I asked Erik Naggum, a person whose knowledge of SGML was frequently demonstrated on comp.text.sgml, to lead the SGML group of the CFCM, and he agreed to do so.

I reasoned that a group of documentation professionals working together over the net could make a great deal of progress very rapidly. The orientation of this group would be to meet the needs of the end-user, and the members of the group would not be promoting proprietary tools or defending proprietary approaches to processing this fundamental form of UNIX documentation. We could issue a standard and some implementing tools, and go for grass-roots adoption of the standard; since the man-based document form would have an SGML analog, such a standard could easily allow movement of manpages into current state-of-the-art usage, with hypertext and interactive media implications.

I determined to keep the size of the CFCM small, to maximize the speed with which we resolved issues.
We needed enough expertise to ensure that we were adopting a viable approach, but we did not need a group large enough to develop factions that would cripple progress. This has proven to be an effective approach, though not without consequences. The work of the SGML group was stalled when Erik was involved in a serious accident requiring months to heal.

I had begun to receive inquiries from major "players" in the development of UNIX industry standards - employees of major corporations, and representatives of the Open Software Foundation and UNIX International. I had assumed that the CFCM was doing dirty work that the "big boys" were uninterested in addressing, but it became clear that this was not so. I had to start thinking in terms of solutions that would be acceptable from the perspective of the vendors that had thousands of manpage documents to support. I made contacts with major industry representatives, sharing my thoughts about what we were trying to do, and soliciting advice and support for our work.

Dale Dougherty asked me to make a presentation on the CFCM during this phase. I saw that manpages were an essential document form that should be seen as a subset of a larger set of document structures for document sets. I felt that we were in an excellent position to supply a core definition of a document form that could be used widely as a basis for document interchange in the computer industry.

The Davenport Group meeting led to additional contacts and discussions. I was invited to speak to the UI SGML SIG, but had to decline due to financial considerations. I was invited to make a presentation to the OSF SGML DocSIG, which I was able to accept. This seemed to be furthering our work, but I was frequently put in a position where I was told confidential things that I was unable to share even with the CFCM.
I decided that it was necessary to engage in a set of CFCM meetings, soliciting participation of UI and OSF representatives, in order to ensure that we developed a standard that would be truly "standard". I felt that this face-to-face approach could encourage the cooperation necessary to get an industry standard adopted, but once again, we kept the meetings small to prevent digression and keep ourselves focused on the immediate objective of defining the elements of a manpage in order to allow us to develop a DTD.

During this process, I have developed some contacts in industry that I discuss ideas with and seek advice from. These advisors informally keep me focused on CFCM objectives and ensure that I don't adopt a path detrimental to the acceptance of a manpage standard.

It became apparent that it would be necessary to do serious tools development if we were to accomplish widespread acceptance of a standard manpage. We needed tools that could take advantage of the additional structural definition of a manpage, to make this form of document much more usable than the current generation of manpage documents. This would give manpage developers an incentive to conform to a standard. I began to put together a tools team, and asked Peter da Silva to lead this group. We have begun consideration of the tools we want to develop, and how best to bootstrap our efforts by adoption of already available publicly-redistributable tools, and using them as a starting point for further development.

The Committee for Common Man now stands very close to having a complete manpage structural definition that accommodates most of the current generation of manpages. We anticipate enhancing and enlarging the document type definition to extend to new forms of online documents, but recognize the urgency of completing the work on the current form so DTDs can be developed that encompass the standard.
We are concerned that, in the absence of a timely offering of this standard, the current trend toward proprietary adaptations of manpage forms will accelerate and make it impossible to accomplish a standard for character-based online documentation. My own perception is that the opportunity to establish this standard will soon pass.

The work of the CFCM has gone forward with the active participation of players from both the OSF and UI camps. We hope to sponsor this summer a joint conference of the CFCM, OSF, UI, and perhaps other organizations with the specific purpose of securing an endorsement of a joint standard for manpage structures: a document description from which a DTD can readily be established. If we can resolve copyright issues, we would like to establish an actual DTD. Eventually, once this DTD has been adequately "field-tested", we hope to have it submitted as an ISO standard. We will develop a true DTD that meets the standard we set, if we cannot get a true DTD as a result of this conference.

The CFCM still has large amounts of work in front of it. After the DTD is developed, we will move to complete the development of freely redistributable tools to ensure that computer users have access to the means to develop and to use standardized manpages. We will develop a replacement for the current -man macros, that provides both a superset of all the current significant implementations of -man macros and a complete mapping of all the DTD structures, elements, and entities in a defined "canonical" manpage form to permit a direct 1:1 mapping of the SGML form of the manpage to a troff form. (We plan to call this macro package the -md macros.) We will develop a suitable formatter to handle the macro package. We will develop a browser to display the documents in whole and in part for efficient online use. (This tool will probably be called "doc" as in "document".)
We think we can do this using a number of currently existing offerings as starting points, vastly shortening tool development time. For example, we currently have cawf and BSD macro packages to select macros from, and have promises of the OSF SML (semantic markup language) macros to use. We have the less utility to use as a starting point for a browser. Both awf and cawf offer potential as formatters. Tools such as xman and hman offer extensions to provide X support. The ICA toolkit will form a basis for document conversion. There are numerous additional tools we have not yet had opportunity to examine.

Once the DTD development work is complete, we will develop a process for defining the behavior of the tools that will extract information from the manpage and use it in various ways. This means we will develop design specifications for the minimum performance requirements of the tools that will comprise the manpage toolkit, and publish them. (We may define some tools that we do not undertake to develop; only establish a standard behavior.)

We will create tools to assist in conversion of existing manpages into the new, more structured form (a sort of lint program for manpages). We will develop methods of bi-directional conversion between the DTD form and the canonical manpage form. We plan to develop an interactive program that will query a writer for the information required to complete a manpage.

The Committee for Common Man currently is structured into a general working group and two special subgroups. These groups work together by telephone and email. There is also a mailing list for people who are interested in the work that the committee is undertaking. These groups all have mailing aliases so that anyone can send a message to the entire group by posting to the alias:

    cfcm.working@world.std.com
        The general working group of the CFCM

    cfcm.tools@world.std.com
        The tools development group

    cfcm.sgml@world.std.com
        The SGML group.
        (This mailing list is actually maintained by Erik Naggum: mail sent to cfcm.sgml@world.std.com is forwarded to cfcm.sgml@ifi.uio.no for redistribution.)

    cfcm.interest@world.std.com
        The general pool of people who are interested in CFCM activities. (The Davenport Group mailing list is included in cfcm.interest - this structure may invert if our activities are brought into parallel.)

We would like to combine efforts with the Davenport Group by functioning as a working group. We feel this will help gain recognition and support for our efforts as well as reduce some of the administrative burden of maintaining a separate group. We think that the efforts of the CFCM fit in quite well with the objectives of the Davenport Group.

______________________SGML Query Language Steve De Rose (ebt-inc!sjd@uunet.UU.NET)

A query language allows users to locate or retrieve information using a keyword-oriented syntax that reads much like English. (Of course, variants could be made for other languages along similar lines). Query languages have been developed and used to extract information from relational databases, and the like. Now there are several query languages designed to extract information from SGML-encoded databases.

The key functionality is to have ways to refer to *all* the major SGML structures, such as elements, attributes, and content, and to express relations between them (among which the boolean relations are usually remembered by designers, and the critical relation of containment usually forgotten). So, a couple of quick examples for now:

    find "caliper brake" in <chapter>
    find ("frumious" within 4 words before "bandersnatch") inside "poem"
    find <p> with type=french containing "avec"

You get the idea; such syntaxes are easy to implement, and easy for users to learn. You get the advantages of natural language, but not the parsing headaches, since the syntax is strict.

There is definitely a need for a working group to look into standardizing an SGML-oriented query language.
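
To make the containment idea concrete, here is a minimal sketch in Python of two of the example queries above. It is only an illustration, not part of any proposal: the sample document, the helper names (text_of, find_in, find_with), and the use of well-formed XML with Python's standard xml.etree.ElementTree as a stand-in for a parsed SGML document are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; well-formed XML stands in for parsed SGML.
SAMPLE = """
<book>
  <chapter id="c1"><title>Brakes</title>
    <p>Adjust the caliper brake before riding.</p>
    <p type="french">Roulez avec prudence.</p>
  </chapter>
  <chapter id="c2"><title>Wheels</title>
    <p>True the wheel with a spoke wrench.</p>
  </chapter>
</book>
"""

def text_of(elem):
    """Return all character content under elem, with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())

def find_in(root, phrase, tag):
    """Sketch of: find "phrase" in <tag>. Returns the containing elements."""
    return [e for e in root.iter(tag) if phrase in text_of(e)]

def find_with(root, tag, attr, value, phrase):
    """Sketch of: find <tag> with attr=value containing "phrase"."""
    return [e for e in root.iter(tag)
            if e.get(attr) == value and phrase in text_of(e)]

root = ET.fromstring(SAMPLE)
chapters = find_in(root, "caliper brake", "chapter")
french = find_with(root, "p", "type", "french", "avec")
print([c.get("id") for c in chapters])   # ids of chapters containing the phrase
print([text_of(p) for p in french])      # text of the matching French paragraphs
```

Note that both helpers return element objects rather than raw strings, so a caller can still ask which chapter a hit belongs to. The word-proximity query ("within 4 words before") is left out of this sketch.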
Unfortunately, ISO is already in a rat's nest of conflicts over such a language. HyTime specifies a thing called "HyQ", which is a first cut at such but needs much work (by the way, HyTime just became a full ISO standard). DSSSL, which is a draft standard but could become full any time, also has a proposal on the floor for an SGML query language; it also has problems, but is further developed than HyQ. The aerospace industry has been pushing "SFQL", which is a simplistic SQL derivative that has been rejected by the SGML community (this rejection is correct -- SFQL, despite its claims, is incompatible with SGML).

The best approach for Davenport may be to push hard on ISO to force HyTime and DSSSL to come up with a unified query mechanism for SGML, possibly as a new work item for WG8 (the committee which handles SGML, HyTime, DSSSL, SPDL, and related standards). This would avoid the horror of introducing yet another competing specification, and throw considerable weight at ISO to get its act together. By providing a list of requirements, Davenport could make it very hard for ISO to provide an inadequate tool (elegance is harder to influence except by direct involvement and attendance). There is a way to set up a formal liaison with an ISO working group, which may permit a bit more leverage.

I guess what I'm suggesting is avoiding the burden and conflict of designing syntax, but specifying requirements and pushing ISO to unify its already-active efforts to meet them. This would be good for all involved, I think. It also could happen very fast, and be very widely used and implemented.

As for specific requirements, here are a few key ones as I see them (I'm sure there are many more):

* retrieval of elements, attributes, and/or content.

* ability to return element identifiers of some kind, not just the raw string from the SGML file.
  Note: this is a major disaster in SFQL: If, say, a paragraph is found, all you can get back is *the paragraph* -- "<p>...</p>" -- there is no indication or way to discover what section it is in, or anything like that. This is bad design to begin with, but a true disaster because a fragment of SGML, by definition, *cannot* be parsed with assurance of correctness when taken out of context (I can provide endless examples if you should need them).

* selection of hits based on content, elements, and/or attributes.

* provision for expressing all the natural "genetic" relationships between elements: parent-of, ancestor-of, child-of, immediately-preceding-sibling, and so on (and using these as part of search expressions).

* given an element identifier, find genetically related element identifiers

* provision for limiting search to any subtree of a document

* the usual word-level features, such as "foo within 4 words before bar"

* combination of containment requirements and return-value choices: you should be able to locate quotations which contain certain strings or patterns, but then return the sections which contain them.

* the mechanism should not depend on having a notion of "paragraph" -- some tag-sets don't have it, and even more have a million different things which might or might not be called "paragraphs" depending on the user's current purposes.

* the usual requirements regarding character set and byte-width independence

* there are subtleties of SGML to be considered, such as that formal "ID" attributes (perhaps the most important thing to be able to locate fast) are not necessarily named "ID".
* care is needed to be sure that the language doesn't make reference to information that isn't included once an SGML file is parsed -- for example, SGML defines that the order in which the attributes are specified on a single tag is not significant; therefore if a query language requires being able to tell this, no existing SGML products can support it without major overhauls (which would also make them violate SGML). Another example would be distinguishing things based on how they were minimized -- an SGML application isn't normally told by an SGML parser whether a tag was implicit or explicit.

A good set of requirements would be a boon to the community, and could get extremely wide use, especially if ISO is persuaded to fulfill it in their query mechanisms.

______________________SGML Resources Ralph Ferris (ralph@ossi.com)

The goal of the SGML Resources Subgroup is to propose a plan on how to manage:

    - DTDs submitted by Davenport members
    - Documents/Standards generated by the Davenport Group

Including:

    - right to use
    - distribution
    - revision control

The Subgroup will also discuss what role the Davenport Group might play as a clearinghouse of information with respect to publicly available SGML resources.

End of Davenport Group Summary of Meeting 5-12/13-1992.