A simple document for explaining an SGML tool

| This would require a new function character, for which I've used #, to
| mean "interpret the following a-v-l REALLY literally".

There will always be a use/mention problem. The above can be done by using entity or character references for the problem characters.

| 2) (A much more significant change, probably WAY beyond what would be
| considered evolutionary by the committee) As a programming language,
| SGML lacks a crucial innovation of the 1960's, namely scoping. I would
| like to be able to scope most if not all SGML name spaces, certainly
| entities and IDs, possibly elements as well. This requires either a
| five-page exposition, which I don't have time for, or none at all. I'd
| be interested if anybody has had a similar desire, and what they've
| done in the way of work-arounds (yes I know about SUBDOC, but that's a
| sledge-hammer to crack a walnut, in my view).

I've had some of the same thoughts and have essentially the same problem. There are a couple of ways to handle it. One is to ignore the SGML-defined ID/IDREF mechanism and set up application-managed name spaces. While this will work and it's not too hard to set up, it does move you away from the standard, which one should avoid whenever possible. HyTime does define mechanisms by which you could define application-managed name spaces while preserving the ID/IDREF semantic (essentially using a lexical type of ID or IDREF to constrain what are otherwise attributes with a declared value of NAME rather than ID).

I have spent some time puzzling on this issue and the conclusion currently holding sway in my brain is a mechanism by which each element that itself has an ID defines a new ID name space, such that the IDs of the direct children of an element that has an ID must be unique within the scope of that element, but the IDs of grandchildren need not be unique within that scope.
For example:

[tree diagram; element markup not preserved: a parent element with an ID contains elements (1) and (2); element (1) contains an element without an ID and element (3); the element without an ID contains element (4)]

Remembering that each element that has an ID defines a new name space, we see that elements (1) and (2) are in the name space of the parent element, and that element (3) is in the name space of element (1), as is element (4), because its parent does not have an ID. Elements (2) and (4) can have the same ID because they are in separate name spaces.

The main problem with this is addressing an element unambiguously. This requires specifying the ID parentage of a given element, much like a network address. To locate element (4), I might do something like REFID="parent.child1.child2", where each dot-delimited field is the ID of an ancestor element (reading from left to right like an Internet address). For ambiguous addresses (where more than one element satisfies the ID query), you could define one or more resolution algorithms, such as return all found, return first found in left-list order, return highest in the tree, etc.

This sort of addressing can be completely described using HyTime addressing mechanisms, and therefore I could implement it using standard HyTime location mechanisms. For example, to locate element (4), I could do this:

[element markup not preserved; the surviving content is the link text and the query]

  Indirect reference to element 4 via Nameloc

  select(UseQ(Name_Space_Of "CHILD1")
         EQ(proploc(CAND, ID) "CHILD2") )

Where the function "Name_Space_Of" returns a list of all the nodes within the specified node's ID name space (in other words, all descendants without IDs down to and including the closest descendants with IDs, but not the descendants of descendants with IDs). The query says (if I've coded it right) "from the set of elements within the name space of element "CHILD1" within the scope of element "parent" (defined by the qdomain= attribute of nmquery), find that element whose ID property (the value of the ID= attribute) is equal to 'CHILD2'".
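The name space rule and the dotted address described above can be sketched in a few lines of Python. This is only an illustration of the idea, not HyTime or HyQ: the Element class, the function names name_space_of and resolve, and the ID "child3" given to element (3) are all assumptions made for the example, and SGML name case folding is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    label: str                       # illustrative label such as "(4)"
    id: Optional[str] = None         # None models an element with no ID
    children: List["Element"] = field(default_factory=list)


def name_space_of(node):
    """Descendants of node down to and including the nearest ones with IDs."""
    found = []
    for child in node.children:
        found.append(child)
        if child.id is None:         # no ID: the scope continues through this child
            found.extend(name_space_of(child))
    return found


def resolve(root, address):
    """Return every element matching a dot-delimited ID path ("return all found")."""
    first, *rest = address.split(".")
    candidates = [root] if root.id == first else []
    for wanted in rest:
        candidates = [e for c in candidates
                      for e in name_space_of(c) if e.id == wanted]
    return candidates


# The tree from the example; the ID of element (3) is a made-up placeholder.
e4 = Element("(4)", "child2")
e3 = Element("(3)", "child3")
e1 = Element("(1)", "child1", [Element("no-id", None, [e4]), e3])
e2 = Element("(2)", "child2")
root = Element("parent", "parent", [e1, e2])

print([e.label for e in resolve(root, "parent.child1.child2")])  # ['(4)']
```

Returning a list rather than a single element leaves room for the other resolution policies mentioned above (first found, highest in the tree) to be layered on top.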
If you think of this in terms of network addressing, each element that has an ID defines a new subdomain within the network (imagine you could keep subdividing your local internet host into subnets, and subnets into subsubnets, etc.). Even though I can do the addressing using HyTime, I wouldn't want to have to create a query every time I wanted to make an unambiguous address. Instead, I'd want some sort of convenient syntax such as internet-style addresses to locate elements unambiguously.

One of the interesting things this sort of naming scheme does is it makes document boundaries no longer a constraint for name spaces. I see this as an advantage, especially in a re-use environment where document boundaries may be fairly fluid. Used in one context, a given SGML fragment may be a complete document, and in another it may be a component in a larger document using a different (but compatible) document type. Without a more flexible name space, I can't provide blind interchange of this sort without risk of name collision. For example, if you want to enable the blind re-use of fairly low-level constructs (say sub-sections within chapters), you must either have a scheme like the one above or have some data management system that will manage the IDs across all SGML data subject to re-use to ensure uniqueness, which is sort of like suggesting all internet user IDs must be unique -- not only is it ridiculous, it's not necessary, from a purely technical standpoint.

There are some other possible approaches to this problem. You could, for example, make your document boundaries lower, say as low as the lowest expected granularity of re-use, and then use standard HyTime location mechanisms to do addressing across these documents, tying them together into logical documents using HyTime hub documents.
But with this approach, we have a convenience problem again because HyTime defines no direct mechanism to refer to an ID in another document; you must use a Nameloc element to provide the indirection. You could, of course, define an application-level method for cross-book reference, but it would be non-standard and there's no guarantee other systems would support it (whereas they probably would support HyTime-defined methods). I don't find this approach very satisfying either.

I'm not sure what the best answer is -- I'm not particularly convinced that my proposed solution is even a good idea, much less the best approach, but it's the best I've been able to come up with. I don't think it would be difficult to implement since hierarchically-scoped structures are natural to both SGML processing and object-oriented systems in general, but I don't know what the full implication would be for SGML, HyTime, and use in practice. I do think, however, that we can learn a lot from the solutions developed in telecommunications to solve just these sorts of addressing problems in hierarchical networks.

--
Eliot Kimber                              Internet: drmacro@vnet.ibm.com
Dept E14/B500                             IBMMAIL: USIB2DK9@IBMMAIL
Network Programs Information Development  Phone: 1-919-254-5160
IBM Corporation
Research Triangle Park, NC 27709

Newsgroups: comp.text.sgml
Date: 07 Oct 1993 19:00:54 UT
From: "Eliot Kimber"
Message-ID: <19931007.120623.360@almaden.ibm.com>
References: <1993Oct7.113508.16986@vax.oxford.ac.uk>
Subject: Re: Syntax specs for SGML query languages

[Lou Burnard]
| There was some discussion on this list a while back about SGML query
| languages and I understand that several of them were presented at the
| SGML92 conference a year ago.
|
| Has a consensus now emerged? And if so, is it SGML/Search? Dynatext?
| DSSSL? HyQ or what?

Within an SGML context, HyQ at this point represents the only existing *standard* for querying SGML and non-SGML data.
If nothing else, HyQ represents the set of query *semantics* needed to do "SGML queries", and as such can function as a specification language for more specialized query languages if it is not used directly as the application query language. Systems like SGML/Search and DynaText must provide query languages tailored for end users and therefore will tend to be syntactically simpler and more explicit than HyQ. Likewise for the DSSSL query language (I'm not involved in the DSSSL committee, so I don't know what the status of the DSSSL query language is, but I understand that there is work to make sure that HyQ and the DSSSL query language are functionally if not syntactically equivalent).

Thus, from my point of view, the answer is "HyQ" from a specification standpoint, and "whatever you want to use" from an application standpoint, meaning that HyTime processors (which really means all SGML processors from here on out) should use HyQ as the "query interchange language", regardless of the concrete language exposed to the user. Because HyQ is part of the HyTime standard, I would expect, as an author, that if I put HyQ queries in my document *all* HyTime engines will be able to process (although not necessarily resolve) those queries, but I would not expect that for any proprietary query notation. I might also expect, as an author using a particular application, that I will be given a more "user-friendly" query notation that I can use within the context of that application, but that the application will export my queries for interchange as HyQ queries.

| Where is the most convenient place to locate syntax specifications for
| any of these query languages?

ISO/IEC 10744 of course has the HyQ language definition, but you can also get my HyQ Tutorial from Steve Newcomb's FTP server by anonymous FTP from mailer.cc.fsu.edu, in pub/sgml/HYTIME. This document contains a definition of the HyQ syntax in somewhat more accessible form than in the Standard.
--
Eliot Kimber                              Internet: drmacro@vnet.ibm.com
Dept E14/B500                             IBMMAIL: USIB2DK9@IBMMAIL
Network Programs Information Development  Phone: 1-919-254-5160
IBM Corporation
Research Triangle Park, NC 27709

Newsgroups: comp.text.sgml
Date: 08 Oct 1993 01:34:48 UT
From: Chet Ensign
Message-ID: <9310080138.AA06819@mail.netcom.com>
Subject: Wrong date for SGML Forum meetings

In my posting earlier today, I said that David Harkness of Wordperfect would be showing Intellitag at the general meeting on November 30th. I got my schedule backwards -- and apologies to both David and Chip Pettibone of EBT for doing so.

At the November 30th meeting, Peter Jarrum of Novell will be presenting a case study on how they adapted SGML into their tech writing department in order to deliver the documentation for Netware 4.0 online. Chip Pettibone will be there to demonstrate Dynatext -- the engine Novell chose as their delivery vehicle. You've heard about Dynatext quite a bit on this newsgroup. If you have never seen it before, this is your opportunity.

David Harkness will be showing Intellitag at the January 18th meeting. Also at that meeting Ms. Tommie Usdin will give a presentation on developing DTDs. It promises to be a very practical session.

Meetings of the Forum are held in the second floor conference area of the McGraw-Hill building in New York's Rockefeller Center area. The M.H building is on 6th Avenue between 48th and 49th Streets. The entrance is on 49th Street. Meetings generally start at 5:30. Membership is not required in order to attend. But we'd sure appreciate it.

/chet

--
Chet Ensign
Information Builders, Inc.
212-736-6250 X4349
internet: doccoe@ibivm.ibmmail.com
ibmmail: USUBUVMV@IBMMAIL
compuserve: 73163,1414

Newsgroups: comp.text.sgml
Date: 08 Oct 1993 09:09:50 UT
From: "Henry S.
Thompson"
Organization: HCRC, University of Edinburgh
References: <28s865$hqb@mailgzrz.TU-Berlin.DE> <15911@barclay.ed.ac.uk> <19931007.114849.214@almaden.ibm.com>
Subject: Re: ISO 8879 what cahnges are being considered?

Certainly Eliot's proposal for scoping of ID is the kind of thing I had in mind. To briefly motivate a further example, I should say that my primary use of SGML is in the assembly and publication of multi-lingual text corpora for (computational) linguistic research purposes. This leads fairly directly to a desire to scope elements and attributes, because I often get material from contributors containing a modest amount of low-level markup, which although internally consistent is not consistent across languages. So I'd like to share the higher levels of the element structure from my own DTD across all the contributions, with the option of introducing some alternative structuring within a declared scope.

For example, let us say that all documents are structured in terms of \ at or near the top, but that some documents have "thick" markup, with a lot of structure under \

as a scoping element, and then provide (re)declarations of \

within any particular instance of \

. Now this is clearly VERY radical, in that it involves either stating or at the very least invoking temporary declarations in the midst of what has heretofore been inviolably document instance.

ht

--
Henry Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 31 650-4440
Fax: (44) 31 650-4587
ARPA: ht@cogsci.ed.ac.uk
JANET: ht@uk.ac.ed.cogsci
UUCP: ...!uunet!mcsun!uknet!cogsci!ht

Newsgroups: comp.text.sgml
Date: 08 Oct 1993 10:13:23 UT
From: "Rick Jelliffe"
Organization: University of Technology, Sydney
References: <1993Sep29.153322.20227@titan.inmos.co.uk>
Subject: Re: should we go for SGML?

[Glenn Hill]
| 1) We would like to remain as far as possible independent from a
| proprietary markup format. Our company uses various DTP/word-
| processing packages: Interleaf, Framemaker, word-perfect, AMIpro,
| ventura etc. What we would like is for anyone to be able to edit a
| document with whatever tool they prefer or have a licence for. Ideally
| we don't want them to have to convert between formats using filters.
|
| From what I have read, it seems like a good approach would be store our
| documentation in a standard format such as SGML. In this way, any tool
| that has an SGML front end will be able to understand the data. Does
| this make sense? Is anyone else using SGML in this way?

SGML can be used to devise convenient interchange formats for lots of applications. If the applications use similar markup conventions, it can be fairly easy to write translators to and from this.

But the trouble is that many DTP/WP packages use very different output formats.
It is often easy to translate from an SGML marked-up document into a specific DTP/WP format, but more difficult to do the reverse, especially if you decide you want to try to represent the underlying structure of the document (e.g., paragraphs, sections, headings) rather than just translate format items into generic format markup (e.g., bolding, font, etc).

If you don't want to represent the underlying structure, you may as well use for the immediate future something like RTF until major PC packages start supporting SGML better.

If you want to try to recover structural information, the first thing is that the original documents must have some structure to recover in the first place. So even if you decide that you want to move to SGML in the future, it can be useful to start now without any change in technology by

  * devising the structural generic elements well,
  * devising a DTD using these elements (the general structure of your documents),
  * mapping the DTD and elements onto specific procedures for each DTP/WP package, and
  * training the operators of each WP/DTP package to use the procedures.

If you are lucky enough to have a system that uses style sheets rather than individually formatting WYSIWYG-style each item as it comes, then you should try to enforce this, and use it for your procedures.

That way your data will be more regular and ordered when it does come time to put an SGML-based conversion system in place. The problem is that there are often as many ways to do something in a DTP/WP package as there are operators: a conversion package that takes each of them into consideration gets too complex to be realistic.

-ricko

Newsgroups: comp.text.sgml
Date: 08 Oct 1993 13:59:53 UT
From: Eliot Kimber
Message-ID: <19931008.070722.992@almaden.ibm.com>
References: <28s865$hqb@mailgzrz.TU-Berlin.DE> <15911@barclay.ed.ac.uk> <19931007.114849.214@almaden.ibm.com>
Subject: Re: ISO 8879 what cahnges are being considered?

[Henry S.
Thompson]
| Certainly Eliot's proposal for scoping of ID is the kind of thing I had
| in mind. To briefly motivate a further example, I should say that my
| primary use of SGML is in the assembly and publication of multi-lingual
| text corpora for (computational) linguistic research purposes. This
| leads fairly directly to a desire to scope elements and attributes,
| because I often get material from contributors containing a modest
| amount of low-level markup, which although internally consistent is not
| consistent across languages. So I'd like to share the higher levels of
| the element structure from my own DTD across all the contributions, with
| the option of introducing some alternative structuring within a declared
| scope.

There are three different problems at work here that the current SGML standard does not address directly (but that can be solved by application-specific methods):

  1. The management of ID name spaces in a re-use environment
  2. The inheritance of properties (attribute values) down the structure tree
  3. The local definition or re-definition of element structure within document fragments.

Problem 2 is easy to solve in the application and should, I think, be part of 8879 (e.g., something like adding #INHERIT to the list of default behaviors):

[attribute declaration not preserved]

In an application, especially one implemented using object-oriented methods, it is almost trivial to provide this inheritance behavior for #IMPLIED attributes. This behavior is so natural and obvious that I consider it something every SGML application should define as part of its base processing definition until such time as either SGML or HyTime define a standard way of expressing it.

Note however that you wouldn't necessarily want *all* #IMPLIED attributes to inherit. I divide attributes into the following classes:

Identifying properties - Those properties of an element that identify it in some way, of which the standard ID attribute is the prototypical example.
Elements may have more than one identifying property. Identifying properties do not inherit.

Hyperlink references - IDREF, Linkends, and their ilk. These attributes serve to establish relationships between elements either by direct or indirect reference (which may include queries). Hyperlink references do not inherit.

Descriptive Properties - The "attributes" in entity-attribute formalism, those properties that describe an element, things like language, version, security, etc. Descriptive properties inherit but can be locally overridden. Also called "metadata" attributes.

Style properties - Attributes that contain information relating directly to the processing or presentation of elements, things like Style=, font=, color=. Where possible, style properties should be specified as LINK attributes. Style properties may or may not inherit depending on the specific application to which they apply. Style= attributes are always processor-specific and must be removable or changeable without harm to the data content itself (thus their specification as LINK attributes whenever possible).

Thus, by this formalism, only descriptive properties and some style properties are inherited, and I would only expect SGML applications to provide generic inheritance support for descriptive properties.

In the InfoMaster Architecture we have formalized this concept by providing a declaration mechanism by which you declare to the InfoMaster processor which attributes are descriptive properties, from which it can then provide generic property inheritance processing with no additional knowledge about the semantics of your application.

Problem 3 (local redefinition of element structure rules) is a much tougher nut to crack, and all the robust solutions we've* thought of either have Achilles' heels or represent radical modifications to ISO 8879.

For some cases, you can do some tricks with containers.
For example, consider the case of P where you have the need for differing types of content, from CDATA (ick) to complex low-level elements. You can get this by having different "paragraph body" elements within P. Consider this content model:

[element declarations not preserved]

You've now got a fairly flexible system that should meet the full range of needs your users may have. If you couple this with the sort of architectural-form-based process I proposed in an earlier post, you can support user-defined elements with a reasonable degree of function and reliability.

----------------------------------------------
*Those of us within IBM trying to figure out how to support our users' need for interchange of SGML fragments while enabling a reasonable degree of flexibility in the specific elements used in a document.

--
Eliot Kimber                              Internet: drmacro@vnet.ibm.com
Dept E14/B500                             IBMMAIL: USIB2DK9@IBMMAIL
Network Programs Information Development  Phone: 1-919-254-5160
IBM Corporation
Research Triangle Park, NC 27709

Newsgroups: comp.text.sgml
Date: 08 Oct 1993 16:34:45 UT
From: "Michael G. Popham"
Organization: University of Exeter, UK.
Subject: IADS installation tips

If you have had problems trying to install the IADS software that was recently made available on behalf of the International SGML Users' Group, here are a few things to check/try out:

* Ensure that you always used the binary mode when ftp-ing/copying the disk*.zip files.

* When you unzip the files, make sure that you are re-creating the original directory structure (eg. make sure the -d switch is set if you use PKUNZIP to unzip the files). If you don't do this, you'll still end up with a lot of unzipped files, but installation won't work!

* The approach most likely to be successful (on the basis of email I've received) involves the following steps:

  1) Create a directory on your hard disk (eg.
C:\disk1)
  2) Unzip _both_ the disk*.zip distribution files into this directory (ensuring that you set any switches to recreate directory structures eg. -d with pkunzip)
  3) Launch Windows, and use File Manager to run the file WINSTALL.EXE
  4) Let the installation software install IADS in the directory C:\IADS (as suggested in the dialogue box).
  5) You should get a message telling you that the IADS application has been successfully installed, and added to Program Manager.
  6) In Program Manager, you should see a new group called IADS. It should contain 8 icons labelled: IADS Author, IADS Reader, ViewImage Author, ViewImage Reader, Stylesheet, Link Verifier, RPSTL Author, and RPSTL Reader.
  7) If you can't see the new group, check that it isn't hidden behind another group. You could also try re-launching Windows, and/or using the Windows setup options to add the IADS applications by hand.
  8) When you start an IADS application, login as "guest" as this requires no password.

* If you are feeling brave/inquisitive/foolhardy, and want to recreate the original distribution disks:

  1) Get three blank floppy disks (double sided, high density)
  2) Use the appropriate DOS command to supply volume labels. Label one disk IADSAUTHONE, and another IADSAUTHTWO (the disks I received from the Users' Group had the serial numbers 3867-10F5 and 093B-10F4 respectively, if you think that will help!) The third disk is intended for the DTD, and has no label.
  3) Create some directories on your hard disk (eg. called disk1, disk2, disk3)
  4) Remembering to set any switches to recreate directory structures etc., unzip disk1.zip into directory disk1, disk2.zip into directory disk2, then move the file IADSTM.DTD from the disk2 directory into the disk3 directory (!).
  5) Maintaining the directory structures, copy all the contents of directory disk1 onto the floppy labelled IADSAUTHONE, copy all the contents of directory disk2 onto the floppy labelled IADSAUTHTWO, and the contents of directory disk3 onto your third (unlabelled) floppy.
  6) Cross your fingers.
  7) Put the first floppy (IADSAUTHONE) into your floppy disk drive (eg. A:)
  8) Make your floppy disk drive the current active drive.
  9) Type in "install" and press the enter key.
  10) With luck, this will start Windows, and a dialogue box will appear and tell you the default drive into which IADS will be installed (C:\IADS). If Windows doesn't start up, you could always try launching Windows manually, and using File Manager to run A:\WINSTALL.EXE
  11) Click on "Ok", and the installation should begin. When you are prompted to supply the second disk, insert the floppy labelled IADSAUTHTWO. If all goes well, you should then be able to proceed as from step 5 shown in the other approach to installing IADS (above).
  12) Don't be too surprised if the installation program refuses to recognize your IADSAUTHTWO disk as the disk it needs. Console yourself with the knowledge that many people have reached this point and also failed. Either
      a) adopt the first approach,
      b) contact the Users' Group for copies of the original distribution disks and documentation
      c) work out what's gone wrong, devise a solution, and post it to comp.text.sgml in the spirit of good-neighbourliness!

I hope this helps those of you who have been having problems installing IADS. (These problems almost certainly stem from the way I zipped the files or the way you unzipped them, and so the IADS developers and the International SGML Users' Group are entirely innocent!)

Assuming you do manage to install IADS, I hope you enjoy using the software. And once again I would like to thank the developers, the USAMICOM Integrated Materiel Management Center, and the International SGML Users' Group for making the software available.
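For readers unpacking the archives on a system without PKUNZIP, the key point of the first approach (both disk*.zip files extracted into one directory, with the stored directory structure recreated) can be sketched with Python's standard zipfile module. This is an assumption-laden illustration rather than part of the IADS instructions: the function name unpack_distribution and the directory arguments are made up for the example.

```python
import zipfile
from pathlib import Path


def unpack_distribution(zip_dir, target_dir):
    """Extract every disk*.zip found in zip_dir into target_dir.

    extractall() recreates the directories stored in each archive,
    which is the effect the -d switch has with pkunzip.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("disk*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted.extend(zf.namelist())
    return extracted
```

A call such as unpack_distribution("downloads", "disk1") would then leave a single directory from which WINSTALL.EXE can be run; the returned list of names is a quick way to confirm that subdirectories were preserved.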
Michael Popham

--
SGML Project - C.D.O                    Email: M.G.Popham@exeter.ac.uk
Computer Unit - Laver Building          Phone: +44-(0)392-263946
North Park Road, University of Exeter   Fax:   +44-(0)392-211630
Exeter EX4 4QE, United Kingdom

Newsgroups: comp.text.sgml
Date: 08 Oct 1993 17:37:59 UT
From: "Gerhard Brey"
Organization: Inst. f. Geschichte der Naturwiss., Muenchen (Germany)
Subject: Re: Announcing release of IADS software

[Pertti M{kitalo]
| Has anyone succeeded in the installation of IADS? I've tried, but it says
| that it needs the IADS Authoring (1.2) disk 1.
|
| If someone could help me I'd be very happy.

I didn't succeed either. Although I have 16 MB of RAM plus a swapfile, when I start WINSTALL, I get a message that there is not enough memory.

I would be grateful for any help.

Gerhard Brey

--
: Gerhard Brey                      : Institut fuer Geschichte   :
:                                   : der Naturwissenschaften    :
:                                   : der Universitaet Muenchen  :
:                                   : 80306 Muenchen             :
: ug301ab@sunmail.lrz-muenchen.de   : Germany                    :

Newsgroups: comp.text.sgml
Date: 08 Oct 1993 21:03:06 UT
From: Chet Ensign
Message-ID: <9310082107.AA02529@mail.netcom.com>
Subject: Re: SGML tools for MS Windows

[Daniel Tauber]
| Can anyone give me any pointers to MS Windows based SGML tools?

Somebody correct me if I'm wrong, please, but I heard that the GRIF SGML editor is available on Windows. The info that I have on them is:

  Grif S.A.
  Immeuble "Le Florestan"
  2, boulevard Vauban
  B.P. 266
  78053 St. Quentin en Yvelines Cedex
  FRANCE
  Jean-Charles d'Harcourt, Sales Manager
  (33) 1-30-12-14-30
  FAX (33) 1-30-64-06-46

This info comes from "The SGML Source Guide," a listing of SGML product and service providers published by the GCA. It would be worth getting if you're really going to be tracking down SGML systems. You can order it by calling 703-519-8157.

Also, Arbortext is developing a Windows version of the ADEPT SGML editor for sometime next year.

/chet

--
Chet Ensign
Information Builders, Inc.
212-736-6250 X4349
internet: doccoe@ibivm.ibmmail.com
ibmmail: USUBUVMV@IBMMAIL
compuserve: 73163,1414

Newsgroups: comp.text.sgml
Date: 09 Oct 1993 05:15:08 UT
From: "Gary Houston"
Organization: Statistics New Zealand
Message-ID: <1993Oct9.051508.20304@stats.govt.nz>
References: <1993Sep29.153322.20227@titan.inmos.co.uk> <19930930.053456.706@almaden.ibm.com>
Subject: Re: should we go for SGML?

[Eliot Kimber]
| There is also nothing preventing, in theory, the SGML encoding of the
| graphic information itself (at least for vector graphics, wouldn't make a
| lot of sense for bitmaps: [example not preserved]). About a year ago now I posted
| a DTD for just such a purpose, but never got any response back on it (not
| even a "that's the most absurd idea I've ever heard of").

I don't think it's absurd. Here's how a hexagon is represented in Xfig (a drawing program available on the net):

  #FIG 2.1
  80 2
  2 3 0 1 -1 0 0 0 0.000 0 0 0
       419 169 488 129 488 49 419 9 350 49 350 129 419 169 9999 9999

which is documented somewhere, but there's no reason in principle why an SGML format couldn't be used instead.

This kind of "embedded SGML everywhere" idea would be more practical with an interface to the parser as a shared library, I suspect.

Gary

Newsgroups: comp.text.sgml
Date: 09 Oct 1993 11:57:06 UT
From: "Pertti M{kitalo"
Organization: Helsinki University of Technology, Finland
Subject: Re: Announcing release of IADS software

[Pertti M{kitalo]
| Has anyone succeeded in the installation of IADS? I've tried, but it says
| that it needs the IADS Authoring (1.2) disk 1.
|
| If someone could help me I'd be very happy.

My problem was that I was unzipping with pkunzip without the -d switch. So if you had problems try that.

[Gerhard Brey]
| I didn't succeed either. Although I have 16 MB of RAM plus a swapfile,
| when I start WINSTALL, I get a message that there is not enough memory.

My system has similar memory and also a swapfile.
After unzipping with the -d switch everything went OK.

Pertti M{kitalo

Newsgroups: comp.text.sgml
Date: 09 Oct 1993 16:05:35 UT
From: "Brian E. Travis"
Organization: SGML Associates, Inc.
Message-ID: <750182735snx@sgml.com>
Subject: Abstracts, October, 1993

The October 1993 issue of The SGML Newsletter is in the mail. As a service to the readers of this list, the table of contents, along with a brief description of each article, is listed below.

The newsletter, published monthly, is the only regular source of information for the SGML industry: articles, reviews, product news, tutorials, standards updates, user group information, case studies, and useful technical tips and information.

The newsletter is designed to be an objective voice of the SGML community. We accept no advertising and encourage frank and open debate on the issues faced by the information publisher.

For more information about the newsletter, contact SGML Associates at +1 303-680-0875, by fax: +1 303-680-4906, or by e-mail: tag@sgml.com.

---------------------------------------------------------------------------
October 1993
---------------------------------------------------------------------------

DTD Nets
By Jordi Farres

In theory, a DTD can be used for multiple purposes (printing, retrieval, and version control, for example). Since the DTD describes the structure of the data, an application should be able to gather what it needs from the document and map it into its own internal formats. However, real-world limitations in available applications require some modification of DTDs in order to have a workable system. This article describes the use of "DTD Nets" that allow a hierarchical view of the same data using overlaid DTDs.

The SGML Tutorial by Eric van Herwijnen
Reviewed by Michael R. Hahn and Peter G. MacHarrie

The electronic version of van Herwijnen's popular book, Practical SGML, has been released for Microsoft Windows-compatible computers.
In addition to the text of the original book, the electronic version includes an interactive parser so the student can see what effect certain changes in the document or DTD have on the output. The product is reviewed in this article.

Letters to the Editors

SGML Tips and Techniques
  PUBLIC Identifiers with SGMLS
  Keeping Track of File Position in OmniMark

News
  Report from X3V1 Standards Meeting
  SGML Forum of New York Upcoming Activities
  AT&T Delivers SGML Documents on CD-ROM
  1993 Calendar
  Goudy Center Presents SGML Seminar

--
Brian E. Travis                                        brian@sgml.com
Principal Consultant            Managing Editor,       Tele: +1 303 680-0875
Information Architects, L.L.C.  The SGML Newsletter    Fax: +1 303 680-4906

Newsgroups: comp.text.sgml
Date: 09 Oct 1993 16:16:23 UT
From: "Brian E. Travis"
Organization: SGML Associates, Inc.
Message-ID: <750183383snx@sgml.com>
Subject: Goudy Center Presents SGML Seminar

Goudy Center Presents SGML Implementation Seminar

Rochester Institute of Technology's Goudy International Center for Font Technology and Aesthetics is presenting "Jump-Starting your SGML Implementation", a two-day seminar on the issues surrounding the implementation of SGML, Thursday and Friday, November 18-19, in the School of Printing Management and Sciences.

This seminar is designed to help users get a fast start on their SGML development project, covering such areas as analysis and design requirements for implementing an SGML-based system in an organization, as well as providing a framework for implementation. Emphasis will be placed on the importance of defining requirements and staffing the conversion and implementation process effectively.

The two-day seminar is designed to give managers and technical personnel a one-day overview of SGML and its business advantages in the area of traditional and database publishing, as well as the ability to define data in a standardized, non-system-dependent manner.
The second day will be devoted to more technical issues, such as creating a DTD and understanding the issues surrounding writing an SGML application. The power of SGML and database technology will also be explored.

Instructors for the seminar are Brian Travis of Information Architects, L.L.C., and Dale Waldt of Thomson Professional Publishing. They have combined experience of more than 20 years in the design and implementation of SGML-based systems and are well-known in the industry.

Cost for the event is $295 for one day and $495 for both days. For more information or to register, contact Margaret von Koschembahr, program director, at +1 716-475-2052.

The Goudy Center is the first world-wide independent resource center for dissemination of information about type -- its history, development, and its use in documents. The Center offers courses focused on the typography and design of documents, as well as courses addressing a broad range of technical, aesthetic, historic, and legal aspects of alphabets and typefaces, including those from both Western and non-Western cultures.

Newsgroups: comp.text.sgml
Date: 10 Oct 1993 13:10:12 UT
From: "Gary Benson" \
Organization: Fluke Corporation, Everett, WA
Message-ID: \
References: \ <1993Sep29.153322.20227@titan.inmos.co.uk> \
Subject: Re: should we go for SGML?

[Rick Jelliffe]
| SGML can be used to devise convenient interchange formats for lots of
| applications. If the applications use similar markup conventions, it
| can be fairly easy to write translators to and from this.
|
| But the trouble is that many DTP/WP packages use very different output
| formats. It is often easy to translate from an SGML marked-up document
| into a specific DTP/WP format, but more difficult to do the reverse:
| especially if you decide you want to try to represent the underlying
| structure of the document (e.g., paragraphs, sections, headings) rather
| than just translate format items into generic format markup (e.g.,
| bolding, font, etc).
| If you don't want to represent the underlying
| structure, you may as well use for the immediate future something like
| RTF until major PC packages start supporting SGML better.

. . . <<< some good advice omitted for brevity >>>

| If you are lucky enough to have a system that uses style sheets rather
| than individually formatting WYSIWYG-style each item as it comes, then
| you should try to enforce this, and use it for your procedures.
|
| That way your data will be more regular and ordered when it does come
| time to put an SGML-based conversion system in place. The problem is
| that there are often as many ways to do something in a DTP/WP package
| as there are operators: for a conversion package to take into
| consideration each of them gets too complex to be realistic.

This is our current situation. Our department recently took delivery of a bunch of PCs that are networked, replacing our VT220 terminals. Now we have people clamoring to put emacs behind us and move to a DTP package as our text editor. For years, we have insisted on straight ASCII for archiving purposes and because we knew that when we eventually move fully to SGML, it will be easier to incorporate the tags into ASCII than into files in some proprietary format. Yet if we have our authors all designing their own methods/style sheets/design elements, it is going to be impossible to later screw in an information management system.

I can think of two options:

1. Continue to insist on straight ASCII text for publishing and archival purposes regardless of what the author does in terms of formatting interim copies. This seems to waste the effort the writer puts into formatting, forces the duplication of that effort for the final published work, and may even cause some loss of intended markup if the writer's skill does not exactly match the capabilities of the ASCII-to-SGML converter.

2. Select a DTP package as the standard and develop style sheets for that package.
This has two major disadvantages: first, it locks us into a proprietary format that may or may not stay compatible over the years, and second, it requires a massive effort immediately to create and disseminate standardized style sheets. In addition, because commercial DTP packages by their very nature are continually changing, we also would never be certain that our style sheets and macro packages are the best possible implementation of the particular DTP program. Perhaps even more insidious is the way this effectively gives over control of our processes to the whims of the DTP programmer.

Tough questions, few answers. We now have more publishing power available at the writer's desktop than huge corporations commanded 10 years ago. The challenge is in harnessing this power and managing the changes. Someone once compared managing software engineers to herding cats. Managing information in the midst of an information explosion is like organizing cats into a Drum and Bugle Corps.

--
Gary Benson-_-_-_-_-_-_-_-_-_-inc@sisu.fluke.com_-_-_-_-_-_-_-_-_-_-_-_-_-_-

Women are the other half of the sky. -Mao Tse Tung

Newsgroups: comp.text.sgml,comp.lang.lisp
Date: 10 Oct 1993 23:04:46 UT
From: "Pilch Hartmut" \
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)
Message-ID: \
Keywords: sgml sgmls emacs lisp asp converter
Summary: simple e-lisp program that creates lisp input for poweutput transformation programs
Subject: sgmls2lisp

;; sgml2lisp -- sgml output formatting tool using SGMLS, EMACS and Lisp

;; PURPOSE
; Generate a lisp program that acts as a filter
; in converting SGML text to any user-specified format.
; The generated converter operates on the output of
; the SGML parser SGMLS (copyleft by J. Clarke) and
; performs the same task as SGMLSASP.
; But conversion algorithms need no longer conform to
; the restricted code of the Amsterdam Parser (ASP),
; but are free to draw on the vast resources of a
; leading artificial intelligence language.
;; SOFTWARE DEPENDENCY
; Both SGMLS and EMACS are needed for generating the lisp-data.
; The executable file sgmls V 1.0 must be accessible via the PATH.
; EMACS generates an optional dummy converter and performs the conversion.
; It can be in interactive, editing mode or be run on an e-lisp batch
; as a command-line interpreter.
; Any LISP interpreter should be able to output the lisp-data to the
; user-specified format. Therefore interpreters of other Lisp
; dialects than E-Lisp can be used to write the converter.

;; HOW IT WORKS
; 1. run sgm-to-lisp on your sgml-document, save the output in lisp-data.el
; 2. run dtd-to-lisp on your dtd, save the output in converter.el
; 3. do M-x load-file on converter.el and lisp-data.el in sequence.
;    Now you have performed your first dummy conversion generating
;    the empty string as output.
; 4. Make a copy of converter.el for each application for which you
;    want to write a converter, e.g. converter-LaTeX.el, converter-lout.el,
;    converter-nroff.el, converter-ps.el. Modify these files until you
;    get the wanted output.
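Step 1 can be pictured as a handful of line-oriented rewrites applied to the parser's output. The Python sketch below is only an illustration of that idea and not part of the tool: the function name `sgmls_to_lisp_text` and the sample input are made up for this note, and the rules are a simplified reading of the regexp replacements in the program text further down. It assumes that sgmls emits one event per line: `(GI` for a start-tag, `)GI` for an end-tag, `-text` for data, `Aname TOKEN value` for a token attribute, and a final `C` line.

```python
import re

def sgmls_to_lisp_text(esis: str) -> str:
    """Rewrite sgmls output lines into nested Lisp-style forms (illustrative only)."""
    text = esis
    # Protect quotation marks that occur inside data.
    text = re.sub(r'([^\\])"', r'\1\\"', text)
    # Data lines: -text  ->  "text"
    text = re.sub(r'^-(.*)$', r'"\1"', text, flags=re.M)
    # Elements holding only data: (GI / "text" / )GI  ->  (GI "text")
    text = re.sub(r'^\((\w+)\n"(.*)"\n\)\1$', r'(\1 "\2")', text, flags=re.M)
    # Remaining start-tags open a call, remaining end-tags close it.
    text = re.sub(r'^\((\w+)$', r'(\1 (concat ', text, flags=re.M)
    text = re.sub(r'^\)(\w+)$', r' )) ;\1', text, flags=re.M)
    # Token attributes become variable assignments.
    text = re.sub(r'^(\w+) TOKEN (\w+)$', r'(setq \1 "\2")', text, flags=re.M)
    # Assumption: a bare "C" line only occurs as the closing marker.
    text = re.sub(r'^C$', '(sgmls-output-end)', text, flags=re.M)
    return text

# Hypothetical parser output for a small FTPALIAS instance.
SAMPLE = "\n".join([
    "(FTPALIAS",
    "(NAME",
    "-ostasien",
    ")NAME",
    "(ADR",
    "-ftp.lrz-muenchen.de",
    ")ADR",
    ")FTPALIAS",
    "C",
])

if __name__ == "__main__":
    print(sgmls_to_lisp_text(SAMPLE))
```

Each generic identifier ends up as the name of a function call, which is why the converter files only need one defun per element type.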
;
; For a converter-LaTeX.el you may write something like this:
;
; (defun DOC (arg)
;   (insert
;    "\\documentstyle[" APTSIZE "," ALANGUAGE "]{" AFORMAT "}" (newline)
;    "\\begin{document}" (newline)
;    arg
;    (newline)
;    "\\end{document}" (newline)
;    ))
;
; or, for a converter-bourneshell.el, a syntagm such as
;
; \
; \ ostasien
; \ ftp.lrz-muenchen.de
; \ major ftp site for East-Asian software applications,
;   administered by a group of German scholars
; \
;
; may be formatted by the following e-lisp functions:
;
; (defun FTPALIAS (arg)            ; compound
;   (setq SNAME "nosite")          ; initialize components
;   (setq SADR "site.nowhere")
;   (setq SCOMMENT "")
;   (arg)                          ; read component values
;   (concat                        ; format compound
;    (concat (newline) SNAME "=\"" SADR "\";export " SNAME)
;    (if (not (equal SCOMMENT ""))
;        (concat (newline) "# " (remove-linebreaks SCOMMENT))
;      "")
;    ))
; (defun NAME (arg) (setq SNAME arg))
; (defun ADR (arg) (setq SADR arg))
; (defun COMMENT (arg) (setq SCOMMENT arg))
;
; so as to produce the shell-script entry
;
; ostasien=ftp.lrz-muenchen.de;export ostasien
; # major ftp site for East-Asian software applications, administered by a group of German scholars
;
; Some basic principles to be induced from the examples are:
;
; 1. "(insert (concat .. arg ..)"
;    is used in the topmost GI node and only there,
;    as in the above example DOC.
; 2. "(setq ..) (arg) (concat *template*)"
;    is used in complex (i.e. non-#pcdata) elements.
;    The lower level GIs are initialized, then read in, then
;    formatted according to the *template*,
;    as in the above example FTPALIAS.
; 3. "(concat arg)" can be simplified to "(arg)" in simple (i.e.
;    #pcdata) elements. The lisp functions for these elements
;    have no other form than that of NAME and ADR above.

;; BUGS / TO-DO-LIST
; The dummy converter that you have to start with is rather
; primitive. It would not be very difficult to generate a more
; sophisticated dummy converter that would already fully apply
; the above principles.
;
; The macros invoke regexp replacement commands over and over again
; rather than doing an optimized replacement at a lower
; programming level. That makes them easy to write but time-consuming
; to execute. The best way to solve this problem will be to discard
; the present tool and incorporate its functions in sgmls itself,
; i.e. to allow sgmls to be invoked with a commandline syntax like
;
;   sgmls [--lispprog] [--lispdata] [sgmlfile]
;
; where "--lispprog" would produce the output of function dtd-to-lisp,
; "--lispdata" of function sgm-to-lisp.

;; AUTHOR
; \
; \
; \Pilch\Hartmut
; \M.A., staatl.gepr. Dolmetscher f&ue;r Chinesisch
; \
; \\D\80687\Von-der-Pfordten-Str.\9
; \\49\89\5804845\567642
; \ucc02aa@lrz.lrz-muenchen.de
; \

;; PROGRAM TEXT

(setq case-replace nil)

(defun replace-regexp-all (a b)
  ;; replace every match of regexp A by B, starting from the top
  (beginning-of-buffer)
  (replace-regexp a b nil))

(defun shell-command-on-buffer (kmd)
  (interactive "scommand: ")
  ;; pass the whole buffer to the shell command KMD
  (shell-command-on-region (point-min) (point-max) kmd nil 1))

(defun convert-simple-functions ()
  (interactive)
  (replace-regexp-all "\\([^\\\\]\\)\"" "\\1\\\\\"") ;protect quotation marks
  (replace-regexp-all "^-\\(.*\\)$" "\"\\1\"")       ;convert field delimiters
  (replace-regexp-all "^(\\(\\w+\\)$
^\"\\(.*\\)\"$
^)\\1$" "(\\1 \"\\2\")")                             ;convert functions
  )

(defun convert-tokens ()
  (replace-regexp-all "^\\(\\w+\\) TOKEN \\(\\w+\\)$" "(setq \\1 \"\\2\")"))

(defun convert-endmark ()
  (end-of-buffer)
  (previous-line 3)
  (replace-regexp "^C" "(sgmls-output-end)"))

(defun convert-remaining-functions ()
  (replace-regexp-all "^(\\(\\w+\\)$" "(\\1 (concat ")
  (replace-regexp-all "^)\\(\\w+\\)$" " )) ;\\1"))

(defun sgmls-to-lisp ()
  " convert sgmls output to a series of lisp functions,
    to which application-specific meanings must be assigned
    in a series of defun-statements, before they can generate
    input for the intended application. "
  (interactive)
  (convert-simple-functions)
  (convert-remaining-functions)
  (convert-tokens)
  (convert-endmark))

(defun sgm-to-lisp ()
  " parse sgml doc using external parser sgmls and produce
    e-lisp code using e-lisp function sgmls-to-lisp "
  (interactive)
  (shell-command-on-buffer "sgmls")
  (switch-to-buffer "*Shell Command Output*")
  (sgmls-to-lisp))

(defun dtd-to-lisp ()
  " generate dummy defun statements from a dtd, which must be
    in the current buffer, and write them to the *occur* buffer "
  (interactive)
  (list-matching-lines "!element" nil)
  (switch-to-buffer "*Occur*")
  (beginning-of-buffer)
  (kill-line 1)
  (replace-regexp-all "^.*!element \\(\\w*\\) .*$" "\\1")
  (mark-whole-buffer)
  (upcase-region (region-beginning) (region-end))
  (replace-regexp-all "^\\(\\w+\\)$" "(defun \\1 (arg) (concat arg))")
  (end-of-buffer)
  (insert "(defun sgmls-output-end () (setq ok \"ok\"))"))

Newsgroups: comp.text.sgml
Date: 11 Oct 1993 08:40:59 UT
From: Bruce Duyshart \
Organization: University of Melbourne
Message-ID: <9328418.10367@mulga.cs.mu.OZ.AU>
Subject: Opinion on ODA and CDA vs. SGML ?

Hi,

I've been following the threads in this group for a few months now and I thought here was as good a place as any to ask what the current opinion of the ODA and CDA standards is.

From what I've read, ODA was initially heralded along with SGML as being the new standards for document exchange. Both are ISO standards but it seems that SGML has really taken off, helped, I suspect, by the fact that it was adopted as a CALS standard by the DoD. What is the current standing with ODA (and its associated interchange format ODIF)? I have heard little if anything of its existence in recent times.

The next standard I'd like to know about is CDA. I understand a little about this format developed by DEC (I think) and know that it is able to embed graphics into the file format itself. CDA and SGML, I believe, are both being explored by Microsoft for inclusion into Word.
Is this an each way bet on an eventual winner in the document exchange stakes or are they intended for totally different purposes?

Any opinions would be gratefully received.

Thanks
Bruce

--
Bruce Duyshart
Lecturer                                 Phone  +61 3 344 4648
IT Co-ordinator                          Fax    +61 3 344 5532
Department of Architecture & Building    E-Mail bhd@arbld.unimelb.EDU.AU
University of Melbourne
Australia

Newsgroups: comp.text.sgml
Date: 11 Oct 1993 13:32:02 UT
From: "Henrik Pettersson" \
Message-ID: \
Subject: HyTime Supplementary Material

Hello!

I'm working on my Master's thesis on the subject of HyTime and I need some supplementary material. I would be very grateful if anyone could help me with the following things:

1. Does anyone know of any example documents or DTDs using HyTime constructs? I'm reading the ISO/IEC 10744:1992(E) document and I need examples of documents and/or DTDs in order to understand the concepts better.

2. I would also appreciate references to other literature about HyTime, like articles and papers (there aren't any books yet, are there?). I have found two articles: the first in IEEE Computer vol 24, no 8, August 1991, by Charles F. Goldfarb, and the second in CACM, November 1991, by Steven R. Newcomb, Neill A. Kipp and Victoria T. Newcomb.

3. Are there any HyTime systems available? I've heard about HyMinder from TechnoTeacher, Inc, and a system from the University of Massachusetts at Lowell, but I don't know much about them.

Please mail any answers to henrikp@stdoca.ericsson.se (or to this newsgroup).

Thanks in advance to anyone who has time to respond to these questions.

Best regards,
Henrik Pettersson

--
Ericsson Telecom AB     email: henrikp@stdoca.ericsson.se
ST/ETX/TX/FD            Phone: +46 8 6812333
S-126 25 Stockholm      Fax:   +46 8 7193055
Sweden

Newsgroups: comp.text.sgml
Date: 11 Oct 1993 16:26:57 UT
From: "Wayne L. Wohler" \
Message-ID: <19931011.093835.390@almaden.ibm.com>
References: <28s865$hqb@mailgzrz.TU-Berlin.DE> <15911@barclay.ed.ac.uk> <19931007.114849.214@almaden.ibm.com> \
Subject: Re: ISO 8879 what cahnges are being considered?

[Henry S. Thompson]
| Certainly Eliot's proposal for scoping of ID is the kind of thing I had
| in mind. To briefly motivate by a further example, I should say that
| my primary use of SGML is in the assembly and publication of
| multi-lingual text corpora for (computational) linguistic research
| purposes. This leads fairly directly to a desire to scope elements and
| attributes, because I often get material from contributors containing a
| modest amount of low-level markup, which although internally consistent
| is not consistent across languages. So I'd like to share the higher
| levels of the element structure from my own DTD across all the
| contributions, with the option of introducing some alternative
| structuring within a declared scope.
|
| For example, let us say that all documents are structured in terms of
|
| \ at or near the top,
|
| but that some documents have "thick" markup, with a lot of structure
| under \, and others have "thin" markup, with
|
| \
|
| I'd like to be able to identify \ as a scoping element, and then
| provide (re)declarations of \ within any particular instance of
| \. Now this is clearly VERY radical, in that it involves either
| stating or at the very least invoking temporary declarations in the
| midst of what has heretofore been inviolably document instance.

This sounds like an application of subdoc to me, but there are a few problems, not the least of which is that SUBDOC doesn't work as well as it could. In 9.4, the standard states that a SUBDOC entity reference is treated like a data character. This makes it difficult to have subdoc references in element content.

Of course, with SUBDOC entity references (and data entity references for that matter) there is no way to restrict the context in which references occur based on element hierarchy. The only restriction is that they must occur where data characters may occur (which in the case of subdocs is not very handy).

A good rule for applications should be that subdoc and data entity references may only occur on entity attributes (which allows one to control the context by controlling the context of the element on which they are specified). If this didn't break existing SGML documents, this would be a good idea for the revision. Then we would need to have applications provide a means of processing such entity attribute subdocument references and you would be set.

A blue sky notion that I just had which also addresses this requirement without an external entity reference is to allow inline subdocument declarations ... the subdoc must conform to the same SGML declaration anyway .... humm.

--
Wayne L. Wohler                     Internet: wohler@vnet.ibm.com
Dept G82/910M                       IBMMAIL: USIB29WX@IBMMAIL
Publishing Solutions Development    Phone: 1-303-924-0470
IBM Corporation
PO Box 1900
Boulder, Colorado 80301-9191

Newsgroups: comp.text.sgml
Date: 11 Oct 1993 18:32:22 UT
From: "Jann VanOver" \
Organization: Boeing Computer Services
Message-ID: <107035@bcsaic.boeing.com>
Subject: Sudden Performance Problems with Omnimark

Hi, all!
I've got a VERY large file that has usually parsed in 2 1/2 hours on our SGI equipment with OmniMark (version 2.0). Suddenly, the latest version of the file is taking a VERY long time - 15-20 hours!

Exoterica has told us that there's a problem with memory allocation and is sending a newer version (2.2) of the parser. We are still waiting for it to see if it fixes the problem.

My question is - what could we have changed in the data that would suddenly increase the parse time so much? The number of elements and attributes is approximately the same. The only real difference is there are 20,000 new refid attributes, but since there were over 200,000 before, I can't understand why this would change the performance so drastically.

The system setup is all the same - same CPU, network, etc. The parse was run during a time when the system was NOT being used for anything else, so we've factored out a network load problem.

And - has anyone else been experiencing performance problems with this parser?

--
Jann VanOver
vanover@atc.boeing.com -- or more directly, vanover@zulu.ca.boeing.com

Newsgroups: comp.text.sgml
Date: 11 Oct 1993 19:36:59 UT
From: "Trevor Jenkins" \
Message-ID: <750368219snz@apusapus.demon.co.uk>
References: <9328418.10367@mulga.cs.mu.OZ.AU>
Subject: Re: Opinion on ODA and CDA vs. SGML ?

[Bruce Duyshart]
| From what I've read, ODA was initially heralded along with SGML as
| being the new standards for document exchange. Both are ISO standards
| but it seems that SGML has really taken off, supported I suspect from
| the fact that it was adopted as a CALS standard by the DoD. What is
| the current standing with ODA (and its associated interchange format
| ODIF)? I have heard little if anything of its existence in recent
| times.

During the standardisation process of both SGML and ODA/ODIF there were many occasions when the supporters of ODA thought that those of us working on SGML were encroaching upon their territory. This was never the case.
The two standards are aimed at different markets, viz publishing and the office (that's what the O is for :-) To compare the two is equivalent to comparing the (standardized) programming languages Cobol and Fortran.

Now it happens that there are some overlapping areas where one standard is being applied rather than the other. Personally I would rather ``bend'' SGML to an ODA-style usage than use ODA, but then I'm biased, as for several years I was part of the British delegation to WG 8 meetings, including attendance at several meetings where the philosophical differences between SGML and ODA/ODIF were thrashed out. I've gone on record with my technical objections to the formulation of ODA/ODIF (as part of the public review of ODA/ODIF) -- and it's those technical objections that make me use SGML for most ``ODA'' things.

| The next standard I'd like to know about is CDA. I understand a little
| about this format developed by DEC (I think) and know that it is able
| to embed graphics into the file format itself. CDA and SGML, I
| believe, are both being explored by Microsoft for inclusion into Word.
| Is this an each way bet on an eventual winner in the document exchange
| stakes or are they intended for totally different purposes?

CDA has its roots in ODA. Writing a front-/back-end converter is possible though tedious (I know, I've written one of each) and requires that you are good at book-keeping both in your specification and in the converter(s). CDA included SGML *-end converters -- in that light you can consider CDA to be an alternative to SDIF (the SGML document interchange format), though it is by no means the light-weight format that SDIF itself is.

| Any opinions would be gratefully received.

Opinions I have in plenty. :-) Rather than start a flame-war here I'm prepared to exchange email with interested parties.

Regards, Trevor.

--
Trevor Jenkins                                        Re: "deemed!"
134 Frankland Rd, Croxley Green, RICKMANSWORTH, WD3 3AU, England
email: tfj@apusapus.demon.co.uk    phone: +44 (0)923 776436    radio: G6AJG
"We need bigger and better books", Jimmy Tingle (Damned in the USA)

Newsgroups: comp.text.sgml
Date: 12 Oct 1993 11:00:36 UT
From: "Michael G. Popham" \
Organization: University of Exeter, UK.
Message-ID: \
Subject: Wanted: SGML Consultant/Analysts, Midlands (U.K.)

This is posted on behalf of the recruitment company I.T Midland. Contact *THEM* for any further information (details below).

*******************************************************************

============================================
SGML - Consultant Analysts - Midlands (U.K.)
============================================

I am currently recruiting on behalf of a Midlands based consultancy who have an urgent requirement for two people/consultants who are fully experienced in SGML, working preferably on Database Design.

You will need to be a confident and flexible individual with plenty of initiative as you will be representing the company on site.

The rewards are good, incorporating a salary of up to 25K (pounds sterling) and a company car if appropriate. They will help with relocation expenses and legal fees and provide rented accommodation at their expense if applicable.

* Due to the increasing demand for SGML skills, my client would also be  *
* interested in the C.V.s of people who have this expertise at any level. *

Please call Keith Jackson at I.T. Midland on 0602 484066 or fax your C.V. on 0602 483696. (If calling from outside the UK, these numbers should start +44-602).

N.B. we regret that we can only consider applications from citizens of EC member states.

Newsgroups: comp.text.sgml
Date: 12 Oct 1993 12:38:26 UT
From: "David Peterson" \
Organization: University of Vaasa, Finland
Message-ID: <1993Oct12.123826.24880@uwasa.fi>
Subject: Re: FAQ where is it?

Could somebody mail me a copy of the FAQ please?
David

Newsgroups: comp.text.sgml
Date: 12 Oct 1993 16:10:19 UT
From: "Matthias Butt" \
Organization: TUBerlin/ZRZ
Message-ID: <29ektb$78v@mailgzrz.TU-Berlin.DE>
References: <28s865$hqb@mailgzrz.TU-Berlin.DE> <19931005.112257.95@almaden.ibm.com>
Subject: Re: ISO 8879 what cahnges are being considered?

At first I'd like to thank Eliot for his rather helpful (and very practical) comments on my original posting. I do see that the HyTime extensions could be practical solutions to the two problems which I lamented about. However, because this thread seems to have drawn at least some interest, I'd like to add some comments:

1. I think the acceptance of any standard (official or quasi) will often depend on its scope being well-defined and then the standard covering the defined scope to the largest possible extent. I am not going to give an account of what the scope of ISO 8879 should cover in my opinion, but I think it is quite clear that identifiers and identifier references (for cross referencing, etc.) and attributes and their values clearly do fall into this scope. This is clear in particular as ISO 8879 does indeed try to address these issues, and my lament was about the inadequacy of what ISO 8879 offers here (as opposed to the complete lack of any solutions offered).

Indeed for both problems there are practical work-arounds which extend the syntactic restrictions applied to a document instance in some suitable form. I have implemented such work-arounds myself. I certainly do see that a standardized (or widely accepted) work-around has its advantages over a completely idiosyncratic solution. However, even if the work-around is an ISO Standard itself it stays a work-around, and as such it does spoil the acceptability of the original standard. Maybe it does so even more because by introducing a work-around as a part of another standard even the standards organization themselves seem to admit that the original standard is incomplete!

2. As for the support of features defined in HyTime to address some of the issues that are not handled by ISO 8879: I am not as optimistic as Eliot with respect to SGML parsers, editors, formatters, etc., supporting such features in the near future! After 5 years of ISO 8879 there are still very few systems that support even the core features of this standard properly (I think there is still *no* system supporting all of the constructs defined in ISO 8879 on the market - right?). The Tables example has shown how hard it is to find a consensus between developers on how additional features should be supported, with the result that now there are ISO tables, CALS tables, proprietary models such as Softquad's, and only a few systems support any of these models properly. (BTW, I do not want to suggest that a table model should be part of ISO 8879 such as scoped identifiers and arbitrary but restrictable attribute values! There seems to be good reason *not* to include application specific and formatting/processing oriented things into ISO 8879.)

3. As a result, I do still believe that some method of maintaining independent sets of IDs, or even better some general scoping mechanism as proposed in some of the subsequent postings, should be integrated into ISO 8879, as well as some method to get rid of the ridiculous fact that two attributes to the same element cannot have the same declared value.

4. Eliot has explained the second restriction to my satisfaction, making clear that it is indeed related to the parsing process in case that some markup minimization is employed. OK. In any case, with respect to commercial application this makes the thing just more ridiculous! I don't know why things like tag minimization, shortrefs, and the like have been introduced to begin with. While some of these features do indeed make it easier to create SGML document instances using a plain text editor, I simply cannot see where such a process should take place in a commercial environment (with the possible exception of software companies and their engineers creating documentation with such methods). I think that such methods of creating documents are absolutely unacceptable to today's users, and indeed I do not know a single commercial application of SGML (be it in the aircraft industry, defense, publishing or whatever) where somebody would consider even for a second to have typists enter tons of documentation with an ASCII editor!

Therefore it does not seem to be acceptable to have severe restrictions in what can be expressed in SGML based solely on mechanisms which are not used in commercial environments anyway. In fact I believe that the attribute values are not the only instance of such restrictions, and furthermore I believe that many of the performance problems that most SGML applications still have today (even on heavy-weight workstations) are another result of ISO 8879 being created with no or wrong conceptions about the environments that would eventually employ this standard.

5. Just another complaint about the current standard: The inclusion/exclusion mechanism always strikes me as being conceptually incomplete and somewhat odd.

The mechanism is incomplete because in effect it provides a method to change the permitted content of an element based on its context within the document instance. But only a very restricted set of changes can be applied. The restriction seems to be arbitrary, motivated only by the relatively easy method which can be used to express the possible changes to permitted element content (i.e., inclusion/exclusion). A complete solution would allow redefining the content model of any element locally.

This would also solve a very practical problem with inclusions and exclusions that occurs over and over again and (in my opinion) visibly spoils the quality of many DTDs: The problem stems from the (strange!) fact that the exceptions (inclusions/exclusions) apply not only to the contents of nested elements but also to the contents of the element being declared. For exclusions this is simply ridiculous as instead of \ we might as well write \. The only thing that is really interesting about exclusions is that something can be excluded from the contents of elements further down in the document tree. With inclusions one might make the point that some contents can be specified easier using an inclusion, although I believe that as long as the immediate content of the element being declared is concerned, anything that can be expressed with an inclusion can also be expressed within the content model proper.

I think everybody has experienced the problems resulting from the fact that I cannot include or exclude anything from the contents of contained elements without including or excluding it as well in the immediate content of the element that has the inclusion or exclusion specified. An example:

In your P (text paragraph) elements there are many things permitted to mix with the PCDATA content, but there is no element like EMPHASIS for generic highlighting of phrases, because all highlighting should be based on specific contents (which is expressed by appropriate elements). Now there is a QUOTATION-BLOCK element to represent one or more paragraphs of cited text. Naturally in QUOTATION-BLOCK most of the elements permitted in general paragraphs must be excluded, because (1) you never want automatic text being generated by the formatter to occur within a quotation, (2) most of your specific elements don't make sense in a quotation anyway, and (3) you want maximum control over the final formatting of the quotation; in particular you want to make sure that text which is highlighted (italics etc) in the cited text will be highlighted in your quotation as well.

Apart from the differences in allowed content it would seem quite natural to make your QUOTATION-BLOCK element's content model just a sequence of P elements. After all, a quotation consists of paragraphs just as your regular running text!

However, in most cases this won't work. First of all, you would have to exclude from QUOTATION-BLOCK all the elements that may occur in P elements outside of QUOTATION-BLOCK. This may already lead to slight disturbances because you might want to end your QUOTATION-BLOCK with a reference to the cited book, but you cannot use your appropriate element (let it be named LITREF) here while excluding it from the Ps within QUOTATION-BLOCK, i.e., you cannot say

\

While this is already too bad, further trouble is on the way when it comes to the EMPHASIS element that you want to use within quotations but nowhere else. This:

\

will not work, because you don't want to allow for emphasized text in between but only within the P(aragraphs)! So you take the hard route and put emphasis into the content model of P and exclude it in all other elements which have P in their content model (bad luck if these happen to be hundreds!). However, even that might not work if your DTD allows for quotation blocks somewhere nested within P elements.
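The behaviour complained about here, where an exception attaches to the declaring element's own content as well as to everything nested inside it, can be modelled in a few lines. The Python sketch below is a deliberate simplification and not a parser: content models are reduced to unordered sets of permitted names, the declarations in `DECLS` are hypothetical stand-ins for the ones discussed in this posting, and `permitted` is a name invented for the illustration. It assumes the usual reading of ISO 8879 exceptions: an exclusion on any open element overrides both content models and inclusions, and an inclusion on any open element is honoured at every depth below it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementDecl:
    model: frozenset                      # names the content model allows directly
    inclusions: frozenset = frozenset()   # +(...) exceptions
    exclusions: frozenset = frozenset()   # -(...) exceptions

# Hypothetical declarations, loosely following the example in the text.
DECLS = {
    "DOC": ElementDecl(frozenset({"P", "QUOTATION-BLOCK"})),
    "P": ElementDecl(frozenset({"#PCDATA", "LITREF"})),
    "QUOTATION-BLOCK": ElementDecl(
        model=frozenset({"P", "LITREF"}),
        inclusions=frozenset({"EMPHASIS"}),
        exclusions=frozenset({"LITREF"}),
    ),
    "EMPHASIS": ElementDecl(frozenset({"#PCDATA"})),
    "LITREF": ElementDecl(frozenset({"#PCDATA"})),
}

def permitted(child, open_elements, decls=DECLS):
    """Return True if `child` may start inside the innermost open element.

    `open_elements` lists the currently open elements, outermost first.
    """
    # An exclusion on any open element wins over everything else.
    if any(child in decls[name].exclusions for name in open_elements):
        return False
    # Otherwise the innermost content model may allow the child ...
    if child in decls[open_elements[-1]].model:
        return True
    # ... or an inclusion declared on any open element may.
    return any(child in decls[name].inclusions for name in open_elements)

if __name__ == "__main__":
    for child, stack in [
        ("EMPHASIS", ["DOC", "QUOTATION-BLOCK", "P"]),  # wanted
        ("EMPHASIS", ["DOC", "QUOTATION-BLOCK"]),       # side effect
        ("LITREF", ["DOC", "QUOTATION-BLOCK", "P"]),    # wanted exclusion
        ("LITREF", ["DOC", "QUOTATION-BLOCK"]),         # side effect
    ]:
        print(child, "in", "/".join(stack), "->", permitted(child, stack))
```

With these declarations the two "side effect" cases show the problem: EMPHASIS becomes legal between the paragraphs of a QUOTATION-BLOCK, and LITREF is shut out of the QUOTATION-BLOCK itself even though its content model names it.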
Admittedly allowing for block structures to mix with elements containing PCDATA immediately may be no good practice although it can be found frequently (e.g., paragraphs allowing for lists whose items in turn may contain a sequence of paragraphs). But even if we do obey the rules of good DTD design we may be stuck: Apart from QUOTATION-BLOCK we may want an element for small quotations that should be formatted in line with the surrounding text. No problem, let's declare a QUOTATION element that is put into the content model of P. But now because we had to exclude EMPHASIS from all contexts that allow P (except for QUOTATION-BLOCK) we cannot use it within QUOTATION, either ... An endless story with typically the only solution being to declare additional elements (e.g., a QP elements for paragraphs in QUOTATION-BLOCK) to the effect that users get very disturbed about the large number of different elements where they cannot see a reason to differentiate. Any comments? With regards, Matthias Butt Newsgroups: comp.text.sgml Date: 12 Oct 1993 16:32:06 UT From: "Eliot Kimber" \ Message-ID: <19931012.104707.116@almaden.ibm.com> References: <28s865$hqb@mailgzrz.TU-Berlin.DE> <19931005.112257.95@almaden.ibm.com> <29ektb$78v@mailgzrz.TU-Berlin.DE> Subject: Re: ISO 8879 what cahnges are being considered? [Matthias Butt] | 1. I think the acceptance of any standard (official or quasi) will | often depend on its scope being well defined and then the standard | covering the defined scope to the largest possible extent. I am not | going to give an account of what the scope of ISO 8879 should cover | in my opinion but I think it is quite clear that identifiers and | identifier references (for cross referencing, etc.) and attributes | and their values clearly do fall into this scope. ... However, even | if the workaround is an ISO Standard itself it stays a workaround | and as such it does spoil the acceptability of the original | standard. 
Maybe it does so even more because by introducing a | workaround as a part of another standard even the standards | organisation themselves seem to admit that the original standard is | incomplete! Because the HyTime standard is so closely linked to ISO 8879, it is difficult to not think of many of the aspects of HyTime as extensions to SGML, if for no other reason than that the logical place to implement many aspects of HyTime is in the SGML parser itself. It is also likely that those aspects of HyTime that are logically extensions to ISO 8879 may be in fact incorporated into 8879 as part of the revision process. The way I look at it, HyTime defines ways of doing things ISO 8879 doesn't account for but that most applications will need. It defines these things in an ISO standard. This gives me a firm foundation on which to base working systems that need these functions *without* having to wait for a revision to ISO 8879. Should ISO 8879 be revised to include these features, so much the better, but in the meantime, I'm still using standard mechanisms to get done what I need to get done. | 2. As for the support of features defined in HyTime to address some of | the issues that are not handled by ISO 8879: I am not as optimistic | as Eliot with respect to SGML parsers, editors, formatters, etc., | supporting such features in the near future! After 5 years of ISO | 8879 there are still very few systems that support even the core | features of this standard properly (I think there is still *no* | system supporting all of the constructs defined in ISO 8879 on the | market - right?).... The SGML applications we're designing for use within IBM require support for the LINK and SUBDOC features (which are the only unimplemented features of SGML that I'm concerned with (I have no use for CONCUR or RANK)), as well as basic HyTime features such as HyLex, reftype, namelocs, and so on. 
The applications we're designing cannot be fully implemented (and thus cannot realize their full potential) without these features. To the degree that we want to use off-the-shelf products to support and implement these applications (which we do, the ability to select from a pool of available tools being one of the compelling benefits of standards), we are effectively registering a requirement against all SGML product vendors to provide these features. If you're an SGML product vendor and we're currently evaluating or using your product, you've already gotten some sort of formal requirement along these lines. Because IBM is so big, and therefore our range of applications and sub-enterprises is so large, not to mention the simple diversity of tastes, it's likely the case that there is room in IBM for every useful SGML product that can support our applications. But note that you have to support *our* applications -- I will not compromise my design to meet a limitation of a given product, nor will I employ non-standard ways of doing anything defined in an applicable standard. Therefore, if your product does not have support for these features of SGML and HyTime, it is highly likely that I cannot use your product. Not "won't use" or "won't want to use" but *won't be able to use*. If that's not motivation to implement, I don't know what is. It is on that basis that I base my confidence.

| 3. As a result I do still believe, that some method of maintaining | independent sets of IDs or even better some general scoping | mechanism as proposed in some of the subsequent postings should be | integrated into ISO 8879 as well as some method to get rid of the | ridiculous fact that two attributes to the same element cannot have | the same declared value.

Wayne Wohler and I have started giving some serious thought to how SUBDOC could be used to solve this problem.
SUBDOCs naturally define independent name spaces and it is relatively easy to create sub-set DTDs for subdocs that are logically fragments of documents but are syntactically complete documents. It happens that this use of SUBDOC also solves the problem of interchanging locally-defined elements because a subdoc, being, syntactically, a complete document, has its DTD subset bound with it. The main problem with using SUBDOCs as though they were document fragments is that SUBDOC entity references, like data entity references, can only occur where character data is valid, so they cannot occur in element content (contexts where only elements, and not data, is valid). However, you can get around this by adding to some or all elements a CONREF entity attribute used to reference SUBDOC entities. This approach also has the advantage of serving to constrain the context within which a given subdoc entity reference can occur, ensuring that the SUBDOC content is valid at the point of reference. For example, in the IBMIDDoc language, documents consist of one or more divisions (Div), which are nestable sections. Div is the natural object of re-use. If we define a new document type, called Div, which is that subset of the IBMIDDoc DTD that defines Div's and their content, I can create documents like so, where the ContentRef= attribute takes the names of SUBDOC entities: \ \ \ ] > \ \ \

\ \ \

\ \ \ \ \First Chapter \

This is the first chapter \

\ \ \ \ \ ]> \

Texas: The \Blue Bonnet State \

The \Blue Bonnet is the state flower of Texas. \
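A rough model of what this example relies on: every SUBDOC carries its own ID name space, so a reference into one has to name both the subdocument entity and the ID. The Python sketch below is only an illustration under that assumption. The `Doc` class, the `resolve` function and the entity names `chap1` and `chap2` are invented; the `docorsub` parameter merely borrows the name of the HyTime attribute and does not implement HyTime location addressing.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """One SGML document or SUBDOC together with its own ID name space."""
    ids: dict = field(default_factory=dict)      # ID -> element description
    subdocs: dict = field(default_factory=dict)  # SUBDOC entity name -> Doc


def resolve(doc, idref, docorsub=None):
    """Look up `idref`, optionally inside the SUBDOC entity `docorsub`.

    An unqualified reference only sees the IDs of the referencing
    document itself; it never reaches into a subdocument.
    """
    target = doc if docorsub is None else doc.subdocs[docorsub]
    return target.ids.get(idref)


# Two hypothetical Div subdocuments that both use the ID "intro".
chap1 = Doc(ids={"intro": "Div in chap1"})
chap2 = Doc(ids={"intro": "Div in chap2"})
book = Doc(ids={"top": "root element"},
           subdocs={"chap1": chap1, "chap2": chap2})

print(resolve(book, "intro", docorsub="chap1"))  # Div in chap1
print(resolve(book, "intro", docorsub="chap2"))  # Div in chap2
print(resolve(book, "intro"))                    # None
```

Because the two subdocuments never share a name space, the duplicate ID is harmless, which is what makes blind interchange of such fragments plausible.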

My processing application would be defined such that the processing of SUBDOC entities would be identical to processing normal external text entities *except* that ID references would have to be qualified with a docorsub= value (i.e., via a named location form element). My IDDOCDIV entities are now completely self-contained, the references from within the main document are constrained by the content rules governing the elements that make the reference, and my ID name spaces are isolated by the SUBDOC boundaries, making blind interchange of logical document fragments reliable. The use of SUBDOCs to define the ID scoping has the advantage that the scoping is under author control rather than under some sort of algorithmic or application control.

The more I think about this use of SUBDOCs, the more I like it, although we've only just started exercising this idea and haven't thought it through completely. It does have the advantage that, except for the way the application treats the data in a subdoc, it completely uses standard mechanisms defined in ISO 8879 and ISO 10744 (for cross-document references).

--
Eliot Kimber Internet: drmacro@vnet.ibm.com
Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL
Network Programs Information Development Phone: 1-919-254-5160
IBM Corporation
Research Triangle Park, NC 27709

Newsgroups: comp.text.sgml
Date: 12 Oct 1993 22:02:43 UT
From: "Trevor Jenkins" \
Message-ID: <750463363snz@apusapus.demon.co.uk>
References: \
Subject: Re: IADS installation tips

[Michael G. Popham] | Assuming you do manage to install IADS, I hope you enjoy using the | software. And once again I would like to thank the developers, the | USAMICOM Integrated Materiel Management Center, and the International | SGML Users' Group for making the software available.

I'm very surprised that the SGML Users' Group have allowed their imprimatur to be put on this product!
I successfully FTP'ed the installation files from sgml1.ex.ac.uk to my PC and the system worked well enough, but I have a philosophical objection to the product. Throughout ISO 8879 the use of processing instructions is deprecated, but the whole intent of IADS seems to be to re-cast PIs as SGML tags. Worse still, in much of the system tags are used directly for the LAYOUT of a document rather than for its logical structure. Whilst the use of IADS might bring SGML to an even larger audience, any newcomers are going to be sadly misled about SGML by this product.

--
Regards, Trevor.
---------------------------------------------------------------------------
Trevor Jenkins Re: "deemed!"
134 Frankland Rd, Croxley Green, RICKMANSWORTH, WD3 3AU, England
email: tfj@apusapus.demon.co.uk phone: +44 (0)923 776436 radio: G6AJG
"We need bigger and better books", Jimmy Tingle (Damned in the USA)

Newsgroups: comp.text.sgml
Date: 13 Oct 1993 00:01:16 UT
From: Lee Fife \
Message-ID: <199310130001.AA08260@avalanche.com>
Subject: Re: ISO 8879 what cahnges are being considered? #3224

Thanks Matthias for the interesting comments. Right now, I want to respond to only one point:

[Matthias Butt] | 3. As a result, I do still believe that some method of maintaining | independent sets of IDs [should be added to 8879] ... as well as | some method to get rid of the ridiculous fact that two attributes to | the same element cannot have the same declared value.

Here at Avalanche, we use SGML for a lot of different purposes. One of the main uses is as a data definition language for large information structures that our programs operate on. These structures may have little resemblance to typical text documents, representing instead such things as directed graphs, rule bases, etc. We also found the restriction on duplicated tokens in attribute definition lists to be unworkable.
In this context, we're treating SGML as a programming language and the restriction that the same set of enumerated types cannot be used as the value of two different attributes for a given element is not acceptable. (Think of boolean flags.) Our solution: Add a new mode to our SGML parser where it runs with Avalanche extensions enabled. One of the extensions allows duplicated tokens and disallows omission of the attribute name for the attributes using these duplicated tokens. When parsing SGML documents intended to feed our datastore, we enable these extensions. When parsing anybody else's documents or documents intended for other (more normal) uses, we don't enable them. For our internal uses, this is no longer a problem, although I'd still like to see it addressed in 8879. -Lee -- Lee Fife lee@avalanche.com Avalanche Development Co 947 Walnut Street Boulder, CO 80302 303-449-5032 Newsgroups: comp.text.sgml Date: 13 Oct 1993 15:06:17 UT From: "Bruce Hunter" \ Message-ID: \ References: <28s865$hqb@mailgzrz.TU-Berlin.DE> Subject: Re: ISO 8879 what cahnges are being considered? Although I have sympathy (and empathy) with much of what Matthias says, I just want to add few comments about some particular points he raises. [Matthias Butt] | 4. \ | I don't know, why things like tag minimization, shortrefs, and the like | have been introduced to begin with. While some of these features do | indeed make it easier to create SGML document instances using a plain | text editor I simply cannot see where such a process should take place | in a commercial environment (with the possible exception of software | companies and their engineers creating documentation with such | methods). 
I think that such methods of creating documents are | absolutely unacceptable to nowadays users and indeed I do not know a | single commercial application of SGML (be it in the aircraft industry, | defense, publishing or whatever) where somebody would consider even for | a second to have typists enter tons of documentation with an ASCII | editor! This does indeed happen, in the field of retroconversion. Often economic factors mean that it is sometimes preferable to ship your existing documents to somewhere offshore and have them re-keyed as SGML files. Where labour is cheap enough, this is more cost-effective than investing in complex (implies a learning curve) and costly (relatively) SGML authoring tools. In all such cases I have worked on the keyboarding agency really does use a plain ASCII editor, and gets people to key in everything manually. In these cases minimization features can be a godsend to help you reduce keying costs. I'm not saying that this is an ideal situation, just that it is one which does occur in commercial projects. In the projects I have been involved in these minimal files are then passed through a parser (or, more often, also a "pre-parser") to produce canonical files. | In fact I believe that the attribute values are not the only instance | of such restrictions and furthermore I believe that many of the | performance problems that most SGML applications still have today (even | on heavyweight workstations) are another result of ISO 8879 being | created with no or wrong conceptions about the environments that would | eventually employ this standard. OK, Matthias is partly right, but no-one has a crystal ball. There are some restrictions built into the Standard which are based on what it was reasonable to expect a computer to be able to do in a reasonable time at the time the Standard was being developed (content models which need no lookahead to determine ambiguities [ie. 
a roughly LL(1) grammar from a DTD], no duplicate values for some attribute types, etc). There are some "restrictions" which are only by default (name lengths, capacities, etc). There are also some more fundamental restrictions, which relate to the current discussion about scoping. OK, I admit that these are also much more solvable now than they were 10 years ago, but at that time some of these were major obstacles (look at the restrictions on content models and content model algebra - these are there because at the time it wasn't felt reasonable to expect a parser to require even single token lookahead - never mind having to try and deal with a context-sensitive grammar!).

Introducing scoping (in its fullest sense) as a concept to the Standard needs a major rethink of the scope (no pun intended) of the Standard. To do so would change the level of the grammar that a DTD defines from context-free to context-sensitive. This would in turn require a commensurate increase in the sophistication of tools needed to process SGML files. The availability of tools (or technology) seems to have been a concern of the Standard's developers, and, as an engineer, I can't from a practical viewpoint disagree with this (although I can also share the feelings of those who do not feel that this should be a factor) [how's that for fence-sitting :-)?]

Also, as Matthias correctly points out, the tools that are out there now (some 7 years after the publication of the Standard) are all restrictive to one extent or another. To further add to the set of requirements such tools must satisfy will serve to set back even further the arrival of tools which perform all the functions the user may be led to expect (just a point of view - I can't prove any of this).

| \

I agree with the frustrations felt by Matthias, in that exceptions provide a limited half-way house between context-free and context-sensitive languages.
They are a result (IMHO) of the mixed influences bearing upon the Standard's developers at the time, along with such linear constructs as marked sections. I wouldn't presume to suggest what all these influences were, but I reckon it does appear that those which came from the publishing field have sometimes taken precedence over what the CS purists may prefer.

As an example, and to show empathy with Matthias, in a current project I am working on it would be wonderful if content models were allowed to be context-sensitive. In this particular application, large SGML files are decomposed down to a certain level, and these "chunks" are then stored in a relational database (a typical "record-oriented" -- with all the restrictions that implies -- application, I'm afraid). When editing these "chunks" (or records) it would be wonderfully easy to create a DTD to say that the content of a record may vary based on previous parentage. Unfortunately, in the way that I would like to do it, this is not possible. We have to create specific work-arounds to allow us to do something equivalent.

To summarise, I can agree with some of Matthias' frustrations, but we don't live in a perfect world, and we need to make the best of what is available. I'm not saying that we shouldn't pursue improvements, just that we shouldn't necessarily denigrate what we have got without accepting that conditions other than today's may have prevailed at the development stage.

Just my opinions, no offence intended to anyone.

Best wishes,
Bruce Hunter
SGML Systems Engineering
bruce@sgml.dircon.co.uk

Newsgroups: comp.text.sgml,comp.infosystems.www
Date: 14 Oct 1993 13:08:28 UT
From: "Ted Dunning" \
Message-ID: \
References: \ <60.2934.4893.0N185587@canrem.com> <6OCT199314012903@cns35.fnal.gov>
Subject: Re: Why We Need a New Math Notation That Is

[John Goodwin] | I am not suggesting we abandon the ideographic system of Math symbols, | but that we supplement it with another alphabetic one.
This is | precisely analogous the debate whether Chinese should be represented in | E-mail by a typesetting system like:** | | Macro::ming[void] | < | | /box /center/horizontal-stroke /*the ideograph for "sun"*/ | | /invert/U-box /center(/horizontal-stroke /horizontal-stroke) | /* ideograph for "moon"*/ | > | | ming(). | | or by the alphabetic Pin-Yin system. Math needs an alphabet of its own. Frankly, Chinese in email should be represented by one of the encoding systems which support direct encoding of ideographs. Neither pin-yin nor an ASCII-ized graphical representation are adequate for different reasons. The tradeoffs are very different for mathematics, but it can be argued that the analogy between entry methods for Asian languages and entry methods for mathematics is a good one and that the conclusion to be drawn is that the internal representation in the machine need not correspond directly to either what you type, nor to what you see. The model of a keystroke = a byte = a glyph does not hold in the real world. Newsgroups: comp.text.sgml Date: 14 Oct 1993 13:13:50 UT From: "Ted Dunning" \ Message-ID: \ References: <9310061920.AA05531@thelonius.mitre.org> Subject: Re: Quantities in SGML Declaration [John Burger] | One thing that occured to me was to DEcrease some of the quantities, so | that an application could allocate less memory for whatever internal | data structures it uses. This is analagous to using shorter variable names so that a compiler uses less memory. As such, I would call it a false economy. Newsgroups: comp.text.sgml Date: 14 Oct 1993 18:37:00 UT From: Peter Bergstrom \ Message-ID: <2CBE0D79@noak.vxo.telub.se> Subject: Question on Entity Sets Hi! I've stumbled over something strange (?) in a test of Entity Sets defined in Annex D to ISO 8879, and I have a few questions. 
In the %ISOpub set the following entities are defined (among others): \ \ \ All other entities are "resolved" onto it's own name, i.e., the name of the entity is repeated in between the square brackets in the entity data. Why are not these three entities defined in the same way? This also leads me to questioning the syntax in those entities: Have I understood the notation, and should a parser treat those entities in the way I have always believed? I'll try to explain: I thought the square brackets indicated that the entity should not be replaced by text, but rather by a new entity with the name that was inside the square brackets. Today, I'm unable to find any reference to what the square brackets mean in the standard, nor in SGML Handbook. Can someone point me to the definition, and/or explain what it all means? All parsers I've seen so far produces some sort of "output", and so far they have all treated those entities in the way described above, I mean "resolved" the entity in a new (?) entity. But is this strictly correct, or what can you expect from a parser? Is the output clearly defined somewhere, or is it never defined in any more detail than "data passed on to an application"? Anyway, to be able to use the parsers I have, I'm planning to change the definitions of the entities above into: \ \ \ This solves my problem in my system, but how shall I ensure that others who use my DTD, and therefore maybe another, and literally correct, ISO entity set, changes their entity set to be used prior to SGML export accordingly? I don't want to receive the entities \&emsp3; \&emsp4; or \&ballot; since they are not defined in any of the ISO entity sets. And if I change the definition, am I then committing an error in still referring to the ISO entity set, including the copyright clause? 
In hope of being enlightened once more, Peter Bergstrom Telub Inforum AB Sweden Newsgroups: comp.text.sgml Date: 14 Oct 1993 20:32:25 UT From: Chet Ensign \ Message-ID: <199310142057.28518.ifi@ifi.uio.no> Subject: Larry Bohn, Pres. SGML Open, talks to SGML Forum of NY This article summarizes the October general meeting of the SGML Forum of New York. The meeting was held on October 5, 1993 at the McGraw-Hill Conference Center, McGraw-Hill Inc, New York. Cesare Del Vaglio, President of the Forum began with an announcement of the schedule for the next two general meetings. On November 30, Peter Jerrum of Novell will present a case study on Novell's adoption of SGML for their technical writing department. Chip Pettibone of EBT will demonstrate Dynatext, the SGML online document viewer that Novell used. On January 18th, Tommie Usdin of ATLIS Consulting will give a presentation on developing DTDs. David Harkness of WordPerfect Corp. will present Intellitag for Windows, their tool for linking WP into SGML. PRESENTATION Larry Bohn, Senior Vice President at Interleaf, Inc. and President of SGML Open Larry Bohn spoke about the mission, objectives and goals for SGML Open, starting with a quick look at the history of open systems consortia. Alliances between manufacturers were first done by hardware manufacturers as a way to define interface standards and assure customers that products would work together smoothly. Open systems consortia for software are a more recent development, but they mirror the growing concern that organizations have in protecting their investment in data. SGML Open is a consortium of software and services providers. In addition to makers of SGML systems, members include Apple, Caterpillar, Dow Jones, Frame, Interleaf, Mead Data Central, Novell, Oracle, and Xerox.There are currently 30 voting members. The goal is to have 50 voting members and 250 subscribing members within the year. 
Its Board of Directors is:

Chairman: Yuri Rubinsky, Softquad
Chief Marketing Officer: Pam Gennussa, Database Publishing Systems
Chief Technical Officer: Paul Grosso, ArborText
President: Larry Bohn, Interleaf
Dave Seaman, InfoDesign
Bruce Brown, Oracle
Jay Cambias, Westinghouse

The mission of SGML Open is to:

** Accelerate the adoption, application and implementation of SGML.
** Establish interoperability guidelines for SGML products.
** Complement the work of the standards bodies by addressing the Standard in light of real world situations.

A primary objective for the consortium is to encourage the market to grow. Larry showed a chart illustrating how interest in SGML is growing much faster than adoption. Recognizing that one of the factors slowing down adoption is that it is still not easy to put a system in place, he said: "We have to make it easier for customers to adopt SGML."

Larry then summarized the marketing and technical agendas for SGML Open. Marketing objectives for the next year are to promote a 50% increase in customer investment, publicize the fact that vendors are busy addressing important technical issues, and play a central role in the broader open systems movements (such as OSF). SGML Open has defined their target as MIS managers and industries with "high information density and high technology adoption."

One function of SGML Open will be to amortize the marketing costs for the members by educating the marketplace about SGML. They will be producing publicity materials, "beginner's kits," case studies, a newsletter, a recognizable logo and other services. "Eventually, anyone interested in SGML can get a kit from us that will educate them about the standard. The vendors can then concentrate on educating customers about their products."

SGML Open also has a defined technical agenda. "SGML is a broad standard," Larry said, "and interoperability requires resolution of a lot of nuts-and-bolts-types of problems."
The technical agenda for the next year is to articulate a definition of the requirements for interoperability of SGML systems, organize a technical working group to begin addressing those problems, and propose a resolution for the entity management issue. "We want to address the entity management problem right away so that we can show some early successes," he said. The technical committee also intends to develop guidelines for RFP writers. This will help customers to recognize and address all the important issues before they start to talk to vendors.

To date, SGML Open has held a formal meeting of the board of directors and several telephone conferences to set the agenda for the coming year. Paul Grosso of ArborText has organized a technical issues working committee. SGML Open will give a one-day training session at the next Seybold conference and they'll hold a meeting at SGML '93. They are also actively seeking an executive director and expect to have that position filled in the next few months.

The floor was then opened for questions.

In response to a question about levels of investment in SGML, Larry referred to a study by David Goodstein that forecast 2x growth in the SGML market over the next several years. He also mentioned a Deloitte & Touche study that reported 30% of the largest U.S. companies will adopt SGML in the next few years. Interleaf itself has increased its investment in SGML threefold over the past three years and it is now the second largest R&D group in the company.

In response to a question about Microsoft, Larry said that he understands they have an effort underway to add SGML capabilities to Word. He said that he'd heard a product was already in the early stages of beta testing. Microsoft has expressed their interest in joining the consortium once they have a product on the market.
In response to a question about the relative cost of SGML products and services, Larry said that the software is around 20% of the total investment for an SGML system. His experience suggests that the highest percentage of the cost goes to data conversion and application development. David Silverman of Data Conversion Laboratory added that, in their experience, training is also a significant portion of the expense. In response to a question about the competing standards, Larry said that SGML Open really thinks there are only two and ODA (Office Document Architecture, another ISO standard), is not one of them. The Open Document Movement organized by Apple to compete with OLE is one competitor, and Microsoft is the other. "Only Microsoft has the market clout to take something like RTF and make it the standard." But he doesn't believe that Microsoft intends to do that. They already have a lot of SGML expertise in-house, they have a product in development and they have themselves recognized what SGML has to offer; their multimedia products Encarta, Cinemania and Bookshelf were all created using SGML. Nevertheless, Larry cautioned that it is important to keep evangelizing the desktop vendors, especially Microsoft, Adobe, and Apple. They need to continue to see that there is a market demand for them to embrace open systems. Larry can be reached on the internet at bohn@.ileaf.com He invited anyone interested in joining SGML Open to contact him for more information. VENDOR PRESENTATION Gerry Fischer, ArborText - ADEPT Series of SGML Products After the break, Gerry Fischer of ArborText demonstrated their ADEPT series products: ADEPT EDITOR, a native SGML authoring & editing program, PUBLISHER, a FOSI-based batch pagination and composition system, and ARCHITECT, the tool for building DTDs and incorporating them into the system. 
ARCHITECT makes it possible to use any DTD you choose, but Gerry pointed out that many of the standard industry DTDs are already included, including the AAP, J2000 and CALS DTDs and DOCBOOK.

The ADEPT EDITOR is a native SGML editor. Although it shows the text in a semi-WYSIWYG view with the element start and end tags in "collar stay" markers, the underlying file is never converted from SGML to an internal format. EDITOR includes an interactive parser that -- unless explicitly turned off -- always acts to keep the instance valid. It controls what the writer can do at any given point in the document. EDITOR can pop up windows that show what can be included at the cursor, and a writer can simply double-click to insert the element.

EDITOR has elegant tools for creating tables and equations. The table editor currently requires either the CALS or the AAP table DTDs; Gerry said that ArborText intends to support any other table DTD that looks like a de facto standard. The equation editor is a cut-and-paste environment that makes liberal use of icons and push-buttons. It uses the AAP math DTD.

EDITOR can exploit the SGML structure and hierarchy to show outlines, partial display of a document down to a certain level and hypertext links within the document. EDITOR can also take generated text from PUBLISHER back into the document. This lets you view the SGML document with tables of contents, counters, cross-references, etc, resolved and in place.

EDITOR is extremely programmable. All the menus, keys and mouse actions can be programmed and the user interface can be extensively customized. The programming features of EDITOR will enable you to attach behaviors to tags themselves. The programming language is an ArborText macro language that leans heavily towards "C."

EDITOR runs on all major UNIX platforms under Motif and Open Look. Gerry said that a practical configuration would be a SPARC II workstation with 32-34 Megs of memory. ArborText is working on a port of EDITOR to Windows.
He expected it to be in beta-testing by the 1st quarter of next year. It will not be a complete port of the UNIX product. It will not have PUBLISHER nor will it have the ability to roll generated text back into the instance. However, it will have the semi-WYS display, table and equation editors and an external entity manager. PUBLISHER is a FOSI-based pagination and composition system for SGML instances. In keeping with its history, it is geared towards CALS-type technical documents. These are relatively simple styled documents. PUBLISHER is not a system for producing elegantly styled layouts. However,ArborText intends to extend the publishing engine, including supporting DSSSL when it becomes a standard. Gerry said that the speed of PUBLISHER is 15 - 30 seconds per finished page depending on the number of tables, equations, and graphics contained in the formatted document. Gerry can be reached on the internet at gnf@arbortext.com. Anyone interested in the ADEPT series products can contact either Gerry or Norma Haakonstad (njh@arbortext.com) for more information. Their phone number is (313) 996-3566. ABOUT THE FORUM The SGML Forum of New York is a nonprofit organization devoted to the exchange of ideas and information about SGML. Organized primarily as a user group, the Forum seeks to promote an understanding of the scope and benefits of the SGML standard and to further its practical application within a variety of industries, including publishing, financial services, insurance, pharmaceuticals, and telecommunications. Membership helps support the Forum's activities, but it is not required for attending general meetings. If you would like to receive word about future Forum meetings, send your name and address to: SGML Forum of New York, Inc. Bowling Green Station P.O. Box 803 New York, NY 10274-0803 We will be happy to put you on our mailing list. Newsgroups: comp.text.sgml Date: 15 Oct 1993 12:10:17 UT From: "Jan Ekstr�m" \ Organization: L.M. 
Ericsson A/S Message-ID: <1993Oct15.121017.6534@ericsson.se> Subject: E-mail address to Zandar Corporation? Hello netters, Do you know if Zandar Corporation has an e-mail address? This company has made a program called TagWrite, which should be able to change a document written in Word for Windows (saved in RTF format) into a tagged document such as an SGML document. I would also like to hear from anybody who has experience with this program!!! Regards Jan Ekström -- Jan Ekström (Dep. LMD/T/LG)| E-mail: lmdjeg@lmd.ericsson.se L. M. Ericsson A/S | Phone.: +45 33 88 35 32 8, Sluseholmen | Fax...: +45 33 88 31 23 DK-1790 Copenhagen V, Denmark | Memo..: LMD.LMDJEG Newsgroups: comp.text.sgml Date: 15 Oct 1993 15:14:23 UT From: Christian Saucier \ Organization: \ Message-ID: \ Subject: SGML public domain parser I've heard that there is a public domain DTD parser that will certify an instance (tagged document) file. I don't have much more information about it. Has anybody heard of it, or does anyone know whether there are any other public domain SGML goodies available somewhere by FTP? Thanks for any information. Christian. Newsgroups: comp.text.sgml Date: 15 Oct 1993 15:17:18 UT From: "Wayne L. Wohler" \ Message-ID: <19931015.085258.44@almaden.ibm.com> Subject: Question on Entity Sets [Peter Bergstrom] | I've stumbled over something strange (?) in a test of Entity Sets | defined in Annex D to ISO 8879, and I have a few questions. | | In the %ISOpub set the following entities are defined (among others): | | \ | \ | \ | | All other entities are "resolved" onto their own names, i.e., the name of | the entity is repeated in between the square brackets in the entity | data. Why are these three entities not defined in the same way? | | This also leads me to question the syntax in those entities: Have I | understood the notation, and should a parser treat those entities in | the way I have always believed? 
| | I'll try to explain: I thought the square brackets indicated that the | entity should not be replaced by text, but rather by a new entity with | the name that was inside the square brackets. Today, I'm unable to | find any reference to what the square brackets mean in the standard, | nor in the SGML Handbook. Can someone point me to the definition, and/or | explain what it all means? I haven't discussed this with any of the people who actually defined the syntax used here, but our use of these entity sets has always assumed that there is nothing fixed or 'standard' about the content of the replacement text for these entities. SDATA means that the replacement text of the entity is inherently system-dependent and may therefore be changed on any system on which the entities are processed, to facilitate that processing. At IBM, we have changed them to contain DCF BookMaster symbol references, in those cases where we had symbols which applied. The writers of the standard had to adopt SOME syntax that was not process specific for use within the standard for the content of these entities and you see the result. The importance of this work is that the names of the entities are standard, and therefore, for all applications using these definitions it will not be necessary to invent new entity names. Also the transformation of document data from one application to another is enhanced. And finally the effort required to support a new application will not include defining support for a new set of character entity names. As an example of how this might work, consider two processing systems that support the same SGML application but have very different underlying processors, which support referencing characters not found in the current character set in very different ways. For each of these systems, one can change the replacement text for each of the entities in these entity sets so that the appropriate action is taken on each of these systems. 
Now take a document that refers to these entities on one system and send it (without the public entity set definitions) to the other. Everything still works because at both installations, the locally supported definitions for SDATA content are used. So it is not necessary to have a standard for SDATA content since SGML already has the level of indirection that is required built in. Now there is nothing that states that another level of indirection might not be used. In that case, each processor would have to recognize the standard SDATA notation and perform the conversion to its own primitives. This would prevent the proliferation of many different entity sets for each process (there could be several at a single site, depending on the number of different processes used at a given site). Given that such a syntax has already been codified (although not rigorously defined), it may be useful to use it but it is not necessary in order to realize the benefits of SGML. Like so many things in SGML, there are several options and the key to good design is picking one which meets your needs yet does not cause an undue burden on the processing system. Your data, the most important part of all of this, will be unaffected by your choice. | In hope of being enlightened once more, Hopefully this provided some light and won't generate too much heat :-). -- Wayne L. Wohler Internet: wohler@vnet.ibm.com Dept G82/910M IBMMAIL: USIB29WX@IBMMAIL Publishing Solutions Development Phone: 1-303-924-0470 IBM Corporation PO Box 1900 Boulder, Colorado 80301-9191 Newsgroups: comp.text.sgml Date: 15 Oct 1993 16:27:39 UT From: "Gary Benson" \ Organization: Fluke Corporation, Everett, WA Message-ID: \ References: <28s865$hqb@mailgzrz.TU-Berlin.DE> \ Subject: Re: ISO 8879 what cahnges are being considered? [Matthias Butt] | I don't know, why things like tag minimization, shortrefs, and the like | have been introduced to begin with. 
While some of these features do | indeed make it easier to create SGML document instances using a plain | text editor I simply cannot see where such a process should take place | in a commercial environment (with the possible exception of software | companies and their engineers creating documentation with such | methods). I think that such methods of creating documents are | absolutely unacceptable to today's users and indeed I do not know a | single commercial application of SGML (be it in the aircraft industry, | defense, publishing or whatever) where somebody would consider even for | a second to have typists enter tons of documentation with an ASCII | editor! [Bruce Hunter] | This does indeed happen, in the field of retroconversion. Often | economic factors mean that it is preferable to ship your | existing documents to somewhere offshore and have them re-keyed as SGML | files. Where labour is cheap enough, this is more cost-effective than | investing in complex (implies a learning curve) and costly (relatively) | SGML authoring tools. | | In all such cases I have worked on the keyboarding agency really does | use a plain ascii editor, and gets people to key in everything | manually. In these cases minimisation features can be a godsend to | help you reduce keying costs. I'm not saying that this is an ideal | situation, just that it is one which does occur in commercial projects. | In the projects I have been involved in these minimal files are then | passed through a parser (or, more often, also a "pre-parser") to | produce canonical files. We use ASCII because we are pretty sure it's going to be consistent through the years. You can't be certain when the next RTF or PDF (Adobe's new document format), or whatever is going to spring into being and become the so-called standard for 6 months or a year. ASCII has the advantage of longevity, something that you find precious little of here in the middle of the information revolution. 
Besides, we hire writers for their technical writing skills, not for page layout and typography. Any time spent doing anything other than writing diminishes that writer's effectiveness. Particularly now, with so many good tools coming out for pre-parsing text files, there is little justification for doing "desktop publishing". Why do you suppose all those DTP packages invariably include a "Save As ... ASCII" selection? In addition, ASCII has proven to be the only common element for all systems and file formats. It is the lingua franca of computer communication. Many products convert in various directions between various proprietary formats, but with ASCII, you always have a known starting point, and EVERY package will be able to read ASCII files, regardless of what formats they are able to output. In our case, there is a historical reason for our attachment to ASCII archives ... when we did typesetting, we wanted to have the writers communicate their markup with no codes other than to indicate heading levels. The general placement and location of text objects gave clues as to the typography required, a concept we referred to as "implied markup". Later, when we were able to use perl to recognize the same implications as those we used for the typesetter operator, we were able to automate typesetting. Now, we use the exact same implied markup to output Agfa CAPS generic codes, which looks to be our last stop before we go fully into SGML. One thing has remained constant: the ASCII source files. I can literally take a ten-year old file and run it through a perl program today that will generate pages exactly like the original typeset galley, hand pasted-up manual. Actually, the pages will be better, because machine pagination can work on finer bits of text than a layout artist would have been willing to do more than a couple of times in any given manual. 
No other "standard file format" even existed 10 years ago, and the RTF of today may be totally incompatible with the one we have 10 years from now, if it even persists that long. -- Gary Benson-_-_-_-_-_-_-_-_-_-inc@tc.fluke.com_-_-_-_-_-_-_-_-_-_-_-_-_-_- The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well meaning but without understanding. -Justice Louis Brandeis Newsgroups: comp.text.sgml Date: 15 Oct 1993 21:34:28 UT From: "Eric R. Skinner" \ Organization: Exoterica Corporation Message-ID: <1993Oct15.213428.29924@xgml.com> References: <2CBE0D79@noak.vxo.telub.se> Subject: Re: Question on Entity Sets [Peter Bergstrom] | In the %ISOpub set the following entities are defined (among others): | | \ | \ | \ | | This also leads me to questioning the syntax in those entities: Have I | understood the notation, and should a parser treat those entities in | the way I have always believed? Strictly speaking, a parser should only expand the entity. The expanded characters should be flagged as SDATA. | I'll try to explain: I thought the square brackets indicated that the | entity should not be replaced by text, but rather by a new entity with | the name that was inside the square brackets. Today, I'm unable to | find any reference to what the square brackets mean in the standard, | nor in SGML Handbook. Can someone point me to the definition, and/or | explain what it all means? The standard doesn't say anything about those square brackets. What you are used to here is the semi-standard behavior of certain applications wrapped around parsers. Some older applications which simply "normalized" data would grab the SDATA expansion text, strip the square brackets, and turn the expansion text into an entity reference. This would work most of the time but would break with the three entities you list above. In other words, the occurrence of "&cross;" in the instance would be normalized by the application to "&ballot;" in the output, which is clearly wrong. 
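[Editor's note: The failure mode described above can be sketched in a few lines of Python. This is an illustration only, not code from any Exoterica product. The table is hypothetical: the entity names `emsp13`, `emsp14`, `cross` and `check`, and their bracketed replacement strings, are assumed here from memory of the ISOpub set, since the declarations themselves are not reproduced in the post.

```python
# Hypothetical table: entity name -> SDATA replacement text.
# Names and strings are assumptions made for illustration.
SDATA_TEXT = {
    "check": "[check ]",    # regular case: bracket text repeats the name
    "emsp13": "[emsp3 ]",   # irregular: bracket text differs from the name
    "emsp14": "[emsp4 ]",   # irregular
    "cross": "[ballot]",    # irregular
}


def normalize_by_stripping(sdata):
    """Older approach: strip brackets and padding, emit an entity reference."""
    return "&" + sdata.strip("[]").strip() + ";"


def normalize_by_name(name):
    """Safer approach: rebuild the reference from the entity name itself."""
    return "&" + name + ";"


for name, text in SDATA_TEXT.items():
    guessed = normalize_by_stripping(text)
    correct = normalize_by_name(name)
    status = "ok" if guessed == correct else "WRONG"
    print(correct, "->", guessed, status)
```

Only the regular entry survives the bracket-stripping round trip; the three irregular ones come back as references to entities that the set never declares.]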
To be fair, our own XGML Normalizer product did this, amongst others. Normalizer hasn't been sold commercially for years. Current products which do "normalization" shouldn't grab the expansion text of the SDATA entity; they should rely on the entity name itself. You don't have to use the square bracket convention with SDATA entities. Since ISO was defining certain entity names, they also had to come up with some standard encodings for those entities. Since the application has to be on the lookout for the expansion text, you can use whatever you want as long as the application is ready for it. | But is this strictly correct, or what can you expect from a parser? Is | the output clearly defined somewhere, or is it never defined in any | more detail than "data passed on to an application"? As I said above, the parser is not responsible for this behavior. The application is. So with modern applications, you don't have a problem. In modern applications, if you expect users to use the ISO entity sets, you have to tell the application to expect SDATA text wrapped in square brackets, and you have to tell the application what to do with all the possible strings. This may seem like a lot of work but of course the point is that this mechanism provides system independence. In OmniMark, for instance, I would write "translate rules" that looked for SDATA text matching the expansion of the entities, and associate processing actions with each one. 
Like this:

    translate SDATA "[emsp3 ]"
       ; processing actions for the 1/3 em-space go here,
       ; such as this example for a TeX-based system:
       output "\\one-third-emspace"

    translate SDATA "[emsp4 ]"
       ; or perhaps if I *had* no capacity to deal with
       ; emspaces, I'd just want to output a space
       output " "

    translate SDATA "[ballot]"
       ; processing actions go here

| Anyway, to be able to use the parsers I have, I'm planning to change the | definitions of the entities above into: | | \ | \ | \ | | This solves my problem in my system, but how shall I ensure that others | who use my DTD, and therefore maybe another, and literally correct, ISO | entity set, changes their entity set to be used prior to SGML export | accordingly? I don't want to receive the entities &emsp3; &emsp4; or | &ballot; since they are not defined in any of the ISO entity sets. If you write to me telling me what software you're using and what you're trying to do, perhaps I can help. In particular, if you're using old Exoterica software we should be able to suggest something. -- Eric R. Skinner ers@exoterica.com Exoterica Corporation Tel +1 613 722 1700 Ottawa, Canada Fax +1 613 722 5706 Product information: info@exoterica.com Newsgroups: comp.text.sgml Date: 16 Oct 1993 21:06:31 UT From: "Allen Renear" \ Organization: Computing and Information Services, Brown Univ. Message-ID: \ Subject: Job: SGML/TEI Literary Textbase N.B. This is an SGML/TEI textbase project. The NEH/Brown Women Writers Project is adjusting its organizational structure to cope with the increasing scale and complexity of its projects. Part of this restructuring is hiring a new Director to replace the two part-time Co-Directors. We believe this new position is an unusual opportunity for a qualified manager of scholarly projects to participate in an influential enterprise -- one that is both recovering previously inaccessible women's writing and at the same time exploring the new technologies of scholarship. 
We would appreciate your posting this notice and bringing it to the attention of anyone you think would be qualified. Susanne Woods, Franklin and Marshall College Allen Renear, Brown University Co-Directors, Brown Women Writers Project (This notice is being cross-posted to HUMANIST, EXLIBRIS, and PACS-L.) ------------------------------------------------------------------------ Position Available Director Women Writers Project Brown University The Director administers a major NEH-funded computer textbase project that is collecting women's writing in English, 1330-1830; conducting new research on texts, cultural history, and information technology; producing print and electronic publications and curricular materials; and developing innovative approaches to scholarship and teaching. The Director reports to the Dean of the Faculty at Brown University and to the Chair of the WWP Board of Scholars and has full managerial and administrative responsibility for the Project, overseeing all WWP activities and personnel, allocating resources, coordinating workflow; determining, in concert with the Chair of the Board of Scholars, strategic directions for research, development, and fund raising; and representing the Project to the scholarly community. Qualifications: three years of experience managing scholarly publishing, editing, bibliographical, or textbase projects, or equivalent. Advanced degree in literary or library studies required, Ph.D. preferred. Desirable: experience in textual criticism, scholarly editing, pre-Victorian women's literature, or humanities computing. An adjunct faculty appointment in an appropriate academic department may be possible for qualified candidates. Salary is competitive. This position may involve considerable travel to academic conferences in North America and Europe. To apply send a current c.v. to Marilyn Netter, Search Coordinator, Box 1852, Brown University, Providence, RI 02912 by October 29. 
For further information phone 401-863-3729 or fax 401-863-7412. Please post or distribute Newsgroups: comp.text.sgml Date: 18 Oct 1993 10:13:41 UT From: "Martin Josko" \ Organization: Leibniz-Rechenzentrum, Muenchen (Germany) Message-ID: <1993Oct18.101341.6390@news.lrz-muenchen.de> Subject: storing SGML-documents in a relational and fulltext database Dear newsgroup reader, We are developing a relational and fulltext database system (UNIX) for storing SGML documents. Therefore we need a parser to fill the relational structures of this database without losing any information. Which of the freely available parsers is in your opinion the best one to do this job, ARC-SGML, ASP-SGML or another parser? -- ()=======================================================() // Martin Josko Technical University Munich //\\\\ //TU-Muenchen Department of Mathematics // \\\\ //DVS-Weihenstephan Statistics and Data Processing // \\\\ //D-85350 Freising Freising Germany // () // // // // // // // // // ()========================================================() // \\\\ email: martin@pollux.edv.agrar.tu-muenchen.de \\\\ // \\\\ phone: +49-(0)8161-71-4506 \\\\ // \\\\ fax: +49-(0)8161-71-4409 \\\\// ()=====================================================() Newsgroups: comp.text.sgml Date: 18 Oct 1993 14:14:24 UT From: "Glenn A. Adams" \ Organization: MIT Artificial Intelligence Lab Message-ID: <29u8c0INN3r3@life.ai.mit.edu> References: <28s865$hqb@mailgzrz.TU-Berlin.DE> \ \ Subject: Re: ISO 8879 what cahnges are being considered? [Gary Benson] | In addition, ASCII has proven to be the only common element for all | systems and file formats. It is the lingua franca of computer | communication. Many products convert in various directions between | various proprietary formats, but with ASCII, you always have a known | starting point, and EVERY package will be able to read ASCII files, | regardless of what formats they are able to output. 
It may be the lingua franca for computers, but it isn't the lingua franca for the users of computers. May I suggest that, though your view is sound from a historical perspective, it is quite parochial with respect to the growing demands of the world of users for whom American English is not necessarily their language of choice. You might want to read the article by Charles Petzold, "Move Over, ASCII! Unicode is Here," in the latest PC Magazine, October 26, 1993, to get a glimpse of what the future will be like. Glenn Adams Newsgroups: comp.text.sgml Date: 18 Oct 1993 15:36:20 UT From: Tony Harrison \ Message-ID: <1993Oct18.103620.2999@tower> Subject: SGMLS as DTD/Instance Validator Hi: I need advice on the correct way to validate dtds and instances from the command line using sgmls. For example, given: Given a DTD entitled tony.dtd: Given an instance entitled tony.sgm using a dtd entitled tony.dtd: Many thanks in advance for the help. Tony Harrison Newsgroups: comp.text.sgml Date: 18 Oct 1993 15:40:12 UT From: "Peter Flynn" \ Organization: University College, Cork Message-ID: \ References: <29u8c0INN3r3@life.ai.mit.edu> Subject: Re: ISO 8879 what cahnges are being conside [Gary Benson] | In addition, ASCII has proven to be the only common element for all | systems and file formats. It is the lingua franca of computer | communication. Many products convert in various directions between | various proprietary formats, but with ASCII, you always have a known | starting point, and EVERY package will be able to read ASCII files, | regardless of what formats they are able to output. [Glenn A. Adams] | It may be the lingua franca for computers, but it isn't the lingua | franca for the users of computers. May I suggest that, though your | view is sound from a historical perspective, it is quite parochial with | respect to the growing demands of the world of users for whom American | English is not necessarily their language of choice. 
| | You might want to read the article by Charles Petzold, "Move Over, | ASCII! Unicode is Here," in the latest PC Magazine, October 26, 1993, | to get a glimpse of what the future will be like. ^^^^^^^^^^^^^^^^^^^ Quite right, but Gary was talking about NOW, and right now, ASCII is all we have. One day, perhaps manufacturers will all move to Unicode, and then we'll see what kind of deep ordure we end up in. ///Peter Newsgroups: comp.text.sgml Date: 19 Oct 1993 06:19:22 UT From: Bruce Duyshart \ Organization: University of Melbourne Message-ID: <9329216.29732@mulga.cs.mu.OZ.AU> Subject: What is SDIF ? Hi, Could someone please enlighten me as to the definition of the term SDIF and where I might find some additional information about it and where it is implemented. Thanks in advance, Bruce -- Bruce Duyshart Lecturer Phone +61 3 344 4648 IT Co-ordinator Fax +61 3 344 5532 Department of Architecture & Building E-Mail bhd@arbld.unimelb.EDU.AU University of Melbourne Australia Newsgroups: comp.text.sgml Date: 19 Oct 1993 09:15:16 UT From: "Dominic Dunlop" \ Organization: British National Corpus, Oxford University, GB Message-ID: <1993Oct19.091516.1211@onionsnatcorp.ox.ac.uk> References: \ \ <29u8c0INN3r3@life.ai.mit.edu> Subject: Re: ISO 8879 what cahnges are being considered? [Glenn A. Adams] | [ASCII may] be the lingua franca for computers, but it isn't the lingua | franca for the users of computers. May I suggest that, though [this] | view is sound from a historical perspective, it is quite parochial with | respect to the growing demands of the world of users for whom American | English is not necessarily their language of choice. As somebody involved with Unicode from its inception, Glenn can be relied upon to promote it. Fair enough. 
I'm all for wider character repertoires, and am happy to know that it's ISO's policy that new standards for information technology, and existing standards under review, should accommodate the 16-cum-32 bit coded character set of ISO 10646 (of which Unicode is an easily-accessible subset), in addition to older standards such as the 8-bit 8859-x (a superset of ASCII) and the 7-bit 646 (similar to ASCII, but permitting annoying -- if understandable -- territory-dependent variations). The important phrase here is IN ADDITION TO: backward compatibility is important, although my view is that small aspects of it must sometimes be sacrificed on the altar of progress. (Not everybody agrees. Hi, Keld!) It seems reasonable to me that, as 8879 is reviewed, one aim should be to make sure that there is nothing in it that precludes the use of the coded character set of ISO 10646. I haven't examined the issue in depth, but I can't think of anything in the current version that stops one from using 10646 (or its Unicode subset) as the coded character set for SGML documents today. OK -- you have to use a pretty restricted subset of it in the SGML declaration, but, that done, you can do what you like in the privacy of your own workspace. (Provided you can find SGML-aware software that can process a document using wide characters. Anybody know if such a thing exists yet?) You can even pass the document verbatim to other people who have software to process it, and whom you can reach over links which don't stamp on any character more than 7 bits wide. But any international standard must recognise that it will take a while for all computer users to become so blessed; it must provide for a representation which, through the use of a painfully restricted character repertoire, is likely to be exchangeable between any two parties over any link without prior arrangement. 
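[Editor's note: As a rough illustration of such a restricted-repertoire representation, the Python sketch below rewrites characters outside the 7-bit range as entity references and maps them back again. It is a minimal sketch under stated assumptions, not a description of any existing SGML tool: the mapping table is a hypothetical three-entry subset using names from the ISO Latin 1 entity set, and a literal ampersand in the text is not escaped.

```python
import re

# Hypothetical, deliberately tiny mapping from characters to entity names.
ENTITY_NAMES = {
    "\u00e9": "eacute",  # e with acute accent
    "\u00e8": "egrave",  # e with grave accent
    "\u00f6": "ouml",    # o with diaeresis
}
CHARACTERS = {name: ch for ch, name in ENTITY_NAMES.items()}


def to_seven_bit(text):
    """Replace every character above 7 bits with an entity reference."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in ENTITY_NAMES:
            out.append("&" + ENTITY_NAMES[ch] + ";")
        else:
            raise ValueError("no entity name known for %r" % ch)
    return "".join(out)


def from_seven_bit(text):
    """Map known entity references back to characters; leave others alone."""
    def replace(match):
        return CHARACTERS.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(to_seven_bit("caf\u00e9"))  # prints caf&eacute;
```

Each site can process the text locally in whatever coded character set it likes, while only the 7-bit form travels over the link.]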
Another point that many seem to miss -- particularly in this newsgroup, which is something of a haunt of hackers (in the best sense) -- is that, as SGML finds an ever-wider audience, very few of the users of SGML documents should be aware of all that messy business with pointy brackets and vaguely mnemonic public entity set members. When they see a lower-case-e-with-acute-accent on their screen, they don't know or care how it is encoded in the underlying file. They especially don't care that it happens to be a 16-bit encoding while they're working with it locally, but is translated to something weird beginning with an ampersand if (for example) they mail it to somebody else who will process it locally as an 8-bit character. As Glenn and others continue to promote the attractions of wide character sets, I'm sure they'll become more widely (hah!) used. As they do, more sites will make it their business to ensure that they can exchange documents in this form without the rigmarole of conversion to and from some transfer format. But that time hasn't come yet, and, until it does, the standard's got to accept that the lowest common denominator is an ugly 7-bit character set, and accommodate it. But without precluding wider and more commodious alternatives. -- Dominic Dunlop Newsgroups: comp.text,comp.text.sgml,alt.hypertext Date: 19 Oct 1993 09:33:35 UT From: "Hiroaki Ikeda" \ Organization: Dept of Electrical and Electronics Engineering, Chiba University Message-ID: \ Keywords: IEC 417, Graphical Symbols, Hypertext, Standard Summary: IEC Standard 417, Graphical Symbols Subject: Announcing WWW Hypertext IEC Standard 417 as A Shared Electronic Publication October, 1993 Hiroaki Ikeda and Yasuhiko Higaki Department of Electrical and Electronics, Chiba University 1-33 Yayoi, Inage, Chiba 263, Japan. 
ikeda@hike.te.chiba-u.ac.jp higaki@hike.te.chiba-u.ac.jp We are very pleased to announce a shared electronic publication of the IEC Standard 417 on the Internet on a trial basis, with permission of the Central Office of the International Electrotechnical Commission (IEC). The original paper-based publication from the IEC, Geneva, has been faithfully reproduced as a hypertext with graphics in Ikeda Laboratory. The standard is for graphical symbols for use on equipment. It has been, and will continue to be, maintained and supplemented by IEC SC3C in accordance with the needs of the fields of electrotechnology. The newly announced electronic publication consists of more than 500 frames prepared originally in HTML. Major parts of the English and French texts were prepared by Mr. Bodin, former Secretary of IEC SC3C. Japanese text will also be available soon. The URL is "http://www.hihe.te.chiba-u.ac.jp/" which you can access using xmosaic. Any comments and suggestions to improve its functionality are welcome. Please contact \. Newsgroups: comp.text.sgml,comp.infosystems.wais,comp.infosystems.gopher,comp.infosystems.www Followup-To: comp.text.sgml Date: 19 Oct 1993 10:31:19 UT From: "Dominic Dunlop" \ Organization: British National Corpus, Oxford University, GB Message-ID: <1993Oct19.103119.1465@onionsnatcorp.ox.ac.uk> Subject: Sizing a system for public-access free text search [I'm requesting follow-ups to comp.text.sgml, as I read that newsgroup. Feel free to cross-post to other groups if you think your reply is of interest to their readers.] I'm investigating the options for setting up a public-access textbase of about two gigabytes offering facilities for regular expression-based free-text search over the Internet. I know how much disk will be needed (think of a number and double it), but have no feel for the amount of computer horsepower involved. Can you help, given your experience in running your own public textbase? 
We're rolling our own search software because it needs to be at least a little aware of SGML (Standard Generalized Markup Language), but you can presume it'll be comparable in speed and resource-hungriness to whatever it is that you know about -- not least because we may well have copied the freely-available code you're using. So, can you tell me how much computer I need in order to provide bearable response (let's say under ten seconds for the delivery of the first hit on a simple query requiring examination of the index only -- the cost of our indexing method is supposed to be independent of the number of hits found) under the following loads:

1. Single user making queries
2. Single user making queries plus single FTP user
3. Two simultaneous users making uncorrelated queries
4. Two simultaneous users making uncorrelated queries plus two FTP users
5. Five simultaneous users making uncorrelated queries
6. Five simultaneous users making uncorrelated queries plus five FTP users
7. Ten simultaneous users making uncorrelated queries
8. Ten simultaneous users making uncorrelated queries plus ten FTP users
9. Twenty simultaneous users making uncorrelated queries
10. Twenty simultaneous users making uncorrelated queries plus twenty FTP users

This is all strictly back-end. Presumably, the remote users' computers are looking after providing front ends of varying complexity and prettiness. It would help me if you could express things in terms of Sun products, since that's the environment I know best. I'm after replies like ``an IPC with 32 megabytes of RAM will do (3), provided the disks are local, but would be creaking with (4), groaning with (5) and dying with (6).'' But, by looking at all those ``Tested Mettle'' columns in back-issues of UNIX Review, I can scale anything you care to send. Usual thing. I'll summarize to all the groups on the Newsgroups line if there's a worthwhile response. 
-- Dominic Dunlop Newsgroups: comp.text.sgml Date: 19 Oct 1993 12:48:07 UT From: "Ingemar Allkvist" \ Organization: Ellemtel Telecom Systems Labs, Stockholm, Sweden Message-ID: <1993Oct19.124807.18312@eua.ericsson.se> Subject: Amsterdam SGML Parser Where can I get some information on the ASP? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Ingemar Allkvist, OSS-FH, Rum 363:029, Tel 4924 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Newsgroups: comp.text.sgml Date: 19 Oct 1993 13:05:37 UT From: "Larry Beck" \ Organization: Grumman Data Systems-Woodbury Message-ID: <1993Oct19.130537.25185@gdstech.grumman.com> References: \ <750463363snz@apusapus.demon.co.uk> Subject: Re: IADS installation tips I agree with your comments about IADS. I reviewed this product for a project I was working on for the US Army and was also disturbed by this use of PIs. I wonder if you noticed the restriction forced on NAMELENGTH? The system restricts you to 8 character names. This really hurts. It also seems to be limited to one specific kind of document; the DTD appears to be hardwired into the thing. I think the SGML User's Group should look carefully at what they provide as good SGML software. LAB [Editor's note: This is in reply to an article by Trevor Jenkins of 1993-10-12. \] Newsgroups: comp.text.sgml Date: 19 Oct 1993 14:10:37 UT From: Tony Harrison \ Message-ID: <1993Oct19.091037.3009@tower> Subject: Thanks for SGMLS Help Hello again: I want to thank James Clark, Donald Gignac, and Tim Ringwood for their help with sgmls. It was greatly appreciated. Tony Harrison Newsgroups: comp.text.sgml Date: 19 Oct 1993 15:17:57 UT From: Eliot Kimber \ Message-ID: <19931019.084240.72@almaden.ibm.com> References: <9329216.29732@mulga.cs.mu.OZ.AU> Subject: Re: What is SDIF ? [Bruce Duyshart] | Could someone please enlighten me as to the definition of the term SDIF | and where I might find some additional information about it and where it | is implemented. 
SDIF stands for SGML Document Interchange Format, and is completely defined in "ISO 9069:1987 Information Processing - SGML Support Facilities - SGML Document Interchange Format (SDIF)".

SDIF defines, in the abstract, the mechanism by which one or more SGML documents can be packaged into a single object for the purpose of transmitting said object between systems. The key features of an SDIF package are:

1. All entities (with the possible exception of public entities) are included in the package, such that the receiving system can reconstitute the entities.

2. Information about the original character set is preserved, enabling automatic character set remapping when the SDIF package is unpacked.

An SDIF system consists of two parts: the SDIF packer, which takes as input one or more SGML documents and produces as output an SDIF package; and an SDIF unpacker, which takes as input an SDIF package and produces as output one or more SGML documents. This is very much like the PKZIP/PKUNZIP programs, except that the packer and unpacker must have intrinsic understanding of SGML. In fact, I implemented a toy SDIF packer using PKZIP as the actual packaging mechanism by writing an SGML Translator application that processes a document to build a list of files to be packaged and then giving that list to PKZIP to do the packing. (Note that this is not a real SDIF implementation because there is no complementary unpacker, and I'm not using SGML declarations to drive the character set remapping.)

I'm not aware of a complete SDIF implementation, but it shouldn't be too hard to build one (the hardest part is probably the character set business).

SDIF is roughly analogous to Bento and other schemes for packaging "compound" documents, although SDIF is geared specifically for SGML, while Bento, for example, is optimized for the storage and retrieval of multimedia content (providing interleaving schemes and the like).
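As a rough illustration of the toy packer described above, here is a minimal Python sketch in which the standard zipfile module stands in for PKZIP. The function names, the MANIFEST.txt member, its tab-separated layout and the sample public identifiers are all assumptions made for this example; none of them come from ISO 9069. Like the toy, it performs no character set remapping, so it is not a real SDIF implementation. An unpacker is included only to show how a manifest keyed by public identifier lets the receiving side pick its own local names.

```python
import io
import zipfile

MANIFEST = "MANIFEST.txt"  # hypothetical member name, not defined by SDIF


def pack_entities(entities, out):
    """Store every entity in a zip archive together with a manifest.

    entities maps a public identifier to a (member_name, data) pair.
    Returns the manifest lines that were written.
    """
    lines = []
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
        for public_id, (member_name, data) in entities.items():
            archive.writestr(member_name, data)
            lines.append(f"{public_id}\t{member_name}")
        archive.writestr(MANIFEST, "\n".join(lines) + "\n")
    return lines


def unpack_entities(source):
    """Return {public_id: data}, looking members up through the manifest."""
    restored = {}
    with zipfile.ZipFile(source) as archive:
        manifest = archive.read(MANIFEST).decode("utf-8")
        for line in manifest.splitlines():
            public_id, member_name = line.split("\t")
            restored[public_id] = archive.read(member_name)
    return restored


# Hypothetical entities: a document instance and one external entity.
sample = {
    "-//EXAMPLE//DOCUMENT Sample Memo//EN": ("memo.sgm", b"<memo>Hello</memo>"),
    "-//EXAMPLE//ENTITIES Boilerplate//EN": ("boiler.ent", b"Standard text"),
}

package = io.BytesIO()
manifest_lines = pack_entities(sample, package)
restored = unpack_entities(package)
```

Because the receiving side works from public identifiers, the member names inside the archive never need to match the system identifiers on either platform.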
Note that with an SDIF system, there is no requirement to maintain the *system* names of entities on different platforms. As long as the entity declarations in the unpacked document are correct, the SDIF unpacker should be able to use whatever method it wants (or the user wants) to create the local versions of the contained entities. This will work best if all entities are given public identifiers by which users know the entities so that users never see system identifiers. I would suggest to SGML Open that one of their priorities should be to define an industry-standard SDIF system, at least a functional spec and API, if not the actual encoding of the SDIF package itself. Note finally that there is no direct conflict between SDIF and other packaging methods. For example, a Bento could contain an SDIF package or an SDIF package could contain a data entity that was itself a Bento package. -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 Newsgroups: comp.text.sgml Date: 19 Oct 1993 16:31:40 UT From: C. M. Sperberg-McQueen \ Organization: University of Illinois at Chicago, ADN Computer Center Message-ID: <93292.113140U35395@uicvm.uic.edu> References: <1993Oct18.103620.2999@tower> Subject: Re: SGMLS as DTD/Instance Validator [Tony Harrison] | I need advice on the correct way to validate dtds and instances from | the command line using sgmls. For example, given: | | Given a DTD entitled tony.dtd: | | Given an instance entitled tony.sgm using a dtd entitle tony.dtd: I am not sure what exactly it is that you have not been able to figure out from the documentation, so bear with me if this explains something other than what you need explained. 
When you want validation pure and simple (i.e., no output), the main point is that you want to suppress the standard output of sgmls, and have it issue only error messages, or error messages and its report on capacities used. That is, use the -s option. I combine this with the -g -s and -v options, and wrap a shell-level procedure called VALIDATE around it, which invokes: sgmls -egsv \ \ the SGML declaration is optional. The other complication you may be suffering from is figuring out how best to link the document instance with the DTD. There are effectively four approaches I have seen used, depending on where you put (1) the document type declaration (i.e., the '\". Put the document instance in another file, which starts "\" and ends with "\" (unless the tags are omissible and omitted). Invocation is then sgmls -egsv dtdfile.dtd docfile.doc This is more or less what some parsers seem to expect you to do, especially if they work with compiled DTDs. It bothers me because the document instance is not labeled with its document type, but obviously it's useful if you want to do the kind of tricks David Megginson has spoken of in this group, where you parse the same instance different ways with different DTDs. 3 put the doctype declaration in the same file as the instance; the file begins "\" or "\]]>", starts the instance with an explicit or implicit "\", and ends it with an explicit or implicit "\". Element declarations etc. are in system files named in the doc type declaration or in the DTD subset. Invocation with sgmls is sgmls -egsv file.doc This is what I do most of the time. 4 Put the instance in one file, the doc type declaration in a second, and the other declarations in a third. 
The second file of this set will read in its entirety something like \ If it is called foodecl.dtd, and the instance is call foodoc.doc, then the invocation might be sgmls -egsv foodecl.dtd foodoc.doc I haven't figured out a reason to do this beyond faking out processors which don't expect the document type declaration to be in the same file as a document instance. Since current software mostly doesn't seem to make this assumption, I haven't had to do this in a few years. If there are other possible ways of dividing the material into files, I haven't figured them out. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago Newsgroups: comp.text.sgml Date: 19 Oct 1993 16:35:56 UT From: "Peter Flynn" \ Organization: University College, Cork Message-ID: \ References: <1993Oct19.103119.1465@onionsnatcorp.ox.ac.uk> Subject: Re: Sizing a system for public-access free text I can't answer exactly, but as a comparison, this is a Sun Sparc IPX with 32MB memory and 2Gb disk. Right now, it's running the campus news service, so there are 11 nntp connections, 2 local X-windows users, 5 VT100 users logged in from various places and assorted http and ftp users, say 1 of each at any given time. This is over 20Mbps campus Ethernet and a 64Kbps link to the Internet. It seems to work OK but I wouldn't want to load it very much more. ///Peter [Editor's note: This is reply to Dominic Dunlop's question of today. \] Newsgroups: comp.text.sgml Date: 19 Oct 1993 16:38:23 UT From: "Peter Flynn" \ Organization: University College, Cork Message-ID: \ References: <1993Oct18.103620.2999@tower> Subject: Re: SGMLS as DTD/Instance Validator [Tony Harrison] | I need advice on the correct way to validate dtds and instances from | the command line using sgmls. 
| For example, given:
|
| Given a DTD entitled tony.dtd:

    sgmls -S sgml.dec tony.dtd
           ^ capital `S'

| Given an instance entitled tony.sgm using a dtd entitled tony.dtd:

    sgmls -S sgml.dec tony.dtd tony.sgm

| Many thanks in advance for the help.
|
| Tony Harrison

You'll have to get yourself the sgml.dec file from somewhere. There's our copy on curia.ucc.ie in pub/sgml if you can't find it anywhere else.

///Peter

Newsgroups: comp.text.sgml
Date: 19 Oct 1993 17:28:30 UT
From: "Terry Smith"
Organization: TANDEM Computers, Inc (Integrity Systems Division)
Message-ID: <1993Oct19.172830.23475@integrity.uucp>
Subject: Questions: ArborText Users

I'm looking for people using ArborText Publisher and Document Architect. If you're one of these people, please respond to me by email. I'd like to ask you some questions and would be happy to share my own experiences. I'm currently running a demonstration copy of 5.0 and have imported the OSFDTD and DocBook DTD.

--
------------------ UNIX Documentation ----------------------
Terry Smith                     Tandem Computers Inc
tsmith@mpd.tandem.com           14231 Tandem Blvd
(512) 244-8871                  Austin, TX 78728

Newsgroups: comp.text.sgml
Date: 19 Oct 1993 19:46:48 UT
From: "David B. O'Donnell"
Organization: Enhanced Cybernetic Logic Operating Systems Group
Message-ID:
Keywords: SGML NOVICE
Summary: Looking for guides and introductory material
Subject: Utter Beginner In Seek of Enlightenment

I am very much an SGML novice, having worked with IBM's Document Composition Facility GML a few years ago. I've recently come to a point where I feel a need for a more structured approach to documentation, and I'd like to take a look at SGML.

If anyone can point me in the direction of guides, documentation or introductory material which is available over the Internet, I would be most appreciative. Please reply directly to my e-mail address (atropos@netlab.cis.brown.edu) as time constraints make me unable to guarantee a daily review of this newsgroup.
Thank you for your time and assistance. --David B. O'Donnell atropos@netlab.cis.brown.edu (and other places) Newsgroups: comp.text.sgml Date: 20 Oct 1993 01:30:31 UT From: "Christian Saucier" \ Organization: Universite de Sherbrooke -- departement de Mathematiques et d'Informatique Message-ID: \ Subject: SGML and TeX Hello folks, I was wondering if there is any kind of work done somewhere to SGMLised TeX, or to use TeX as an SGML document formatter. TeX being (to me) a very good document formatter, I'm just thinking that it could be possible to have a program that would take a SGML document and produce a TeX output. I haven't started thinking _really hard_ on this but if it's technically possible, I might start working on such a thing. Am I just dreaming here? Is there a major 'deadend' impossibility that I'm not thinking about? Or has someone already done that and am I just not aware of it? Anyway, let me know guys. Thanx, Christian Saucier Newsgroups: comp.text.sgml Date: 20 Oct 1993 09:07:12 UT From: Pamela Gennusa \ Organization: SGML Users' Group Message-ID: <009744B0F6CEF440.206004D7@dpsl.UUCP> Subject: IADS - response from SGML Users' Group This message is related to recent mail about the SGML Users' Group's role in releasing the IADS software. The SGML Users' Group does not give an imprimatur. Neither ARCSGML nor IADS, the two pieces of software put into the public domain by the Users' Group, have received an endorsement by the Users' Group to my knowledge. Although it might be attractive to do such a thing at first glance, it has been rejected as not practicable by the Group. We regularly run product news in our Newsletter which is always preceded by a warning that "no value judgement is passed on any products or services". 
If anyone out there is interested in providing a vetting service for the Users' Group, gratis (natch), and is then willing to get a consensus from the Executive Board and/or a substantial portion of the membership, and is then willing to pay for insurance (just in case we get it wrong and get sued), and is then willing to be available to answer questions about products we have endorsed, please let me know.

Seriously, though, as a volunteer, charitable-status organisation, the Users' Group does not have the resources to check out software or products, nor do we have a standing committee to do such work.

I welcome comments on software put out there, whether positive or negative. This information will help users to see what others think. And in a market society, it will help weed out inferior products. If you don't like a product or a piece of software, fine, let us know, but could we get out of the habit of having a loaded gun and looking for somewhere to point it?

--
Pamela Gennusa, President        Phone: +44 793 512 515
SGML Users' Group                Fax:   +44 793 512 516
608 Delta Business Park          Email: dpsl!plg@visionware.co.uk
Great Western Way
Swindon, Wiltshire SN5 7XF
United Kingdom

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 10:11:33 UT
From: "Klaus Harbo"
Organization: Euromath Center, University of Copenhagen
Message-ID:
References:
Subject: An SGML translation tool...

[Christian Saucier]
| I was wondering if there is any kind of work done somewhere to SGMLised
| TeX, or to use TeX as an SGML document formatter.
|
| TeX being (to me) a very good document formatter, I'm just thinking
| that it could be possible to have a program that would take a SGML
| document and produce a TeX output.
|
| I haven't started thinking _really hard_ on this but if it's
| technically possible, I might start working on such a thing.
|
| Am I just dreaming here? Is there a major 'deadend' impossibility that
| I'm not thinking about?
|
| Or has someone already done that and am I just not aware of it?

I'm working on a tool that more or less covers what you describe, even though I'm not trying to do a translator that specifically converts SGML document instances to TeX, since that approach seems to put unnecessary constraints on the tool.

I have just started using version 2 of the Text Encoding Initiative (TEI) DTD. I have created a translation of my SGML document instances to LaTeX using this tool. (Thanks for help getting started with the TEI DTD go to Lou Burnard...)

The approach I'm taking is to change the SGMLS parser to produce events for an interpreter rather than lines of characters on stdout. The actions taken by the interpreter are specified in a translation specification. The translation spec contains declarations for actions to be taken for each element to be translated, as well as actions for #PCDATA and so on. The actions to be taken are specified (well, programmed) in TCL (Tool Command Language, created by John Ousterhout at UC Berkeley), an interpreted embeddable programming language.

The idea is most easily explained by giving an example...

Given a simple DTD like: \ \ \ \ \ \ \ the document instance: \ \ \ \A simple document for explaining an SGML tool\ \ \

First SECTION title\

\Text text text text text text text. Bla blabla bla \Moretext moretext moretext moretext moretext. \

First SUBSECTION title\

\Text text text text text text text bla blabla bla \Moretext moretext moretext moretext moretext Moretext moretext moretext moretext moretext moretext moretext moretext moretext. Moretext moretext moretext moretext moretext moretext. \

is translated into:

    \documentstyle{article}
    \author{Klaus Harbo}
    \title{A simple document for explaining an SGML tool}
    \begin{document}
    \maketitle

    \chapter{First SECTION title}

    Text text text text text text text. Bla blabla bla
    Moretext moretext moretext moretext moretext.

    \section{First SUBSECTION title}

    Text text text text text text text bla blabla bla
    Moretext moretext moretext moretext moretext
    Moretext moretext moretext moretext moretext moretext
    moretext moretext moretext. Moretext moretext moretext
    moretext moretext moretext.

    \end{document}

The translation spec to be used is given in a processing instruction. I'm not sure that this is the best way to do it (as a matter of fact I'm pretty sure that it isn't, since the processing you want to perform on a document instance should not be dependent on the instance itself). This may change later.

The above translation was reached with the following translation spec (braces and backslash are escaped, since they are used in the TCL syntax):

    element MYMEMO {
        start {
            puts "\\documentstyle\{[attrValue STYLE]\}"
            puts "\\author\{[attrValue AUTHOR]\}"
        }
        end {}
    }

    element BODY {
        start {
            puts stdout "\\begin\{document\}"
            puts "\\maketitle"
        }
        end {
            puts ""
            puts ""
            puts stdout "\\end\{document\}"
        }
    }

    element TITLE {
        start { puts stdout "\\title\{" nonewline }
        end   { puts stdout "\}" }
    }

    element HEADER {
        start {
            puts ""
            puts ""
            when {
                {in SECTION in SECTION in SECTION} {
                    puts stdout "\\subsection\{" nonewline
                }
                {in SECTION in SECTION} {
                    puts stdout "\\section\{" nonewline
                }
                {in SECTION in BODY} {
                    puts stdout "\\chapter\{" nonewline
                }
            }
        }
        end { puts stdout "\}" nonewline }
    }

    element PARA {
        start {
            puts ""
            puts ""
        }
        end {}
    }

    PCDATA { puts stdout $data nonewline }

    SDATA {
        case $entname {
            emsp    { puts stdout "---" nonewline }
            aring   { puts stdout "\{\aa\}" nonewline }
            Aring   { puts stdout "\{\AA\}" nonewline }
            default { puts stdout $data nonewline }
        }
    }

    RECORDEND { puts "" }

I know that the above is longish, but I thought that I'd bring the whole translation spec for completeness' sake (when "details" are omitted, you often cannot tell exactly the amount of work needed to work out these "details").

I find the following are notable details in the spec above:

- The translation spec is sort of a mix of declarative and imperative programming.

- You make declarations for each element, but within the declarations you use an imperative programming language.

- For each element you specify an action associated with the beginning and end of elements in the instance. The actions are differentiated by GI.

- Attribute values are accessed with [attrValue attrName]; similarly there are constructs like [isImplied attrName], [isDefined attrName] (i.e. not implied) and a few others.

- The 'when' construct lets actions depend on the hierarchical context. In addition to the 'in' construct (immediate parent in parse tree) there's also a 'within' construct (ancestor in the parse tree).

- SDATA entities are handled by the name of the entity reference (not its replacement data, even though you have access to that too).

- Record End events are handled explicitly.

At this point the translation is quite slow, since all of the interpreter is implemented in interpreted TCL. Porting it to C should not be too difficult, however, if performance is essential.

The tool isn't finished yet and - perhaps more importantly - it is as yet largely undocumented. However, if there is sufficient interest, I will consider putting the tool in the public domain. My motivation for this posting is therefore both to reply to the question ("yes, some work has been done; no, this tool is not quite ready for general use yet") and to try to find if there is sufficient interest. (?)
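For readers who do not have the tool, the mechanism can be approximated in a few lines of Python. This is only a sketch of the general idea (a handler table keyed by GI, driven by start/end/data events, with a context test in the spirit of the 'in' construct); it is not the tool described in this post. The event tuples, the Translator class, the in_context helper and the sample event stream are all assumptions made for illustration, and the placement of TITLE outside BODY is a guess based on the LaTeX output shown earlier.

```python
class Translator:
    """Holds the stack of open elements and the collected output."""

    def __init__(self):
        self.stack = []  # GIs of open elements, outermost first
        self.out = []

    def in_context(self, *parents):
        """True if the ancestors of the current element, nearest first,
        begin with the given GIs (a rough analogue of chained 'in')."""
        ancestors = self.stack[-2::-1]  # skip the current element itself
        return list(parents) == ancestors[:len(parents)]


def header_start(t, attrs):
    # Most specific context first, as in the 'when' block of the spec.
    if t.in_context("SECTION", "SECTION", "SECTION"):
        return "\n\n\\subsection{"
    if t.in_context("SECTION", "SECTION"):
        return "\n\n\\section{"
    if t.in_context("SECTION", "BODY"):
        return "\n\n\\chapter{"
    return ""


START = {
    "MYMEMO": lambda t, a: "\\documentstyle{%s}\n\\author{%s}\n"
                           % (a["STYLE"], a["AUTHOR"]),
    "BODY": lambda t, a: "\\begin{document}\n\\maketitle\n",
    "TITLE": lambda t, a: "\\title{",
    "HEADER": header_start,
    "PARA": lambda t, a: "\n\n",
}
END = {
    "BODY": lambda t: "\n\n\\end{document}\n",
    "TITLE": lambda t: "}\n",
    "HEADER": lambda t: "}",
}


def translate(events):
    """Run the handler tables over a list of parser-like events."""
    t = Translator()
    for event in events:
        if event[0] == "start":
            _, gi, attrs = event
            t.stack.append(gi)
            if gi in START:
                t.out.append(START[gi](t, attrs))
        elif event[0] == "end":
            if event[1] in END:
                t.out.append(END[event[1]](t))
            t.stack.pop()
        else:  # ("data", text)
            t.out.append(event[1])
    return "".join(t.out)


events = [
    ("start", "MYMEMO", {"STYLE": "article", "AUTHOR": "Klaus Harbo"}),
    ("start", "TITLE", {}), ("data", "A simple document"), ("end", "TITLE"),
    ("start", "BODY", {}),
    ("start", "SECTION", {}),
    ("start", "HEADER", {}), ("data", "First SECTION title"), ("end", "HEADER"),
    ("start", "SECTION", {}),
    ("start", "HEADER", {}), ("data", "First SUBSECTION title"), ("end", "HEADER"),
    ("end", "SECTION"),
    ("end", "SECTION"),
    ("end", "BODY"),
    ("end", "MYMEMO"),
]
latex = translate(events)
```

The order of the tests in header_start matters: a HEADER two sections deep also has SECTION as its parent, so the longer context has to be tried before the shorter one.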
So, to return to the original posting: even though I'm not finished, I think I have done a substantial part of the work needed to make a fairly generic translation mechanism that will - among other things - convert (many, it isn't OmniMark) SGML document instances to TeX. Regards from Copenhagen, Klaus. -- | Klaus Harbo | e-mail: Klaus.Harbo@euromath.dk | | Euromath Center (EmC) | phone (direct): +45 3532 0713 | | Universitetsparken 5 | phone (sw.board): +45 3532 1818 | | DK-2100 Copenhagen | fax: +45 3532 0719 | Newsgroups: comp.text.sgml Date: 20 Oct 1993 11:23:00 UT From: "Michael G. Popham" \ Organization: University of Exeter, UK. Message-ID: \ Subject: University/College SGML Courses -- a questionnaire I often receive requests from companies looking for sources of SGML-aware students. Whilst locality and other computing skills are important, the common theme is that the students (as potential employees) should already know something about SGML. I thought it might be useful to collect all this information, and put it in a public place (so that if someone asks *you* where they can get some SGML-aware graduates, you'll know where to look!). In the first instance, I will put all the replies in a special file/directory on our FTP server. The information will be maintained on a rolling basis, and updated when I have the time. So, if you run a course with an SGML component at your institution (anywhere in the world -- not just UK/Europe/US), please complete and return the attached questionnaire. (If you are a student on such a course, encourage your course tutor/supervisor to return the questionnaire). If your institution runs more than one course (e.g., SGML is studied as part of an undergraduate BSc in Computer Science, and as part of an MSc/PhD programme in Information Science) feel free to return more than one copy of the questionnaire giving details. Give as much or as little information as you wish. 
If you submit a revised/updated questionnaire at some future date, please make this clear so that I know to delete the previous version.

NB. This questionnaire is designed to collect information about formal taught courses or research programmes. Please do NOT submit CVs.

Return questionnaires to: M.G.Popham@exeter.ac.uk (with a suitable subject line).

** Academic institutions and courses ONLY please (commercial trainers
** should seek to put an entry in the GCA's "SGML Source Guide" or our
** forthcoming "SGML Directory")

Michael Popham
-------------------------------------------------------------------------
SGML Project - C.D.O                     Email:M.G.Popham@exeter.ac.uk
Computer Unit - Laver Building           Phone:+44-(0)392-263946
North Park Road, University of Exeter    Fax:  +44-(0)392-211630
Exeter EX4 4QE, United Kingdom
-------------------------------------------------------------------------

QUESTIONNAIRE-QUESTIONNAIRE-QUESTIONNAIRE-QUESTIONNAIRE-QUESTIONNAIRE-QUE
-------------------------------------------------------------------------

Edit the following questionnaire to give as much information as you wish. Please retain a copy to update information in future. NB. I've included some example replies (on lines starting with > ). Please *delete* them from your reply.

The idea is that potential employers will be able to scan all the replies for a location/skill that meets their requirements, and then get in touch with the named contact person.

INSTITUTION:
> University of Kent at Canterbury, U.K

CONTACT PERSON:
> Prof. John Thomas

EMAIL:
> JT@ukc.ac.uk

PHONE:
> +99-(9)999-9999 x.999

FAX:
> +99-(9)999-9999

ADDRESS:
> Computing Lab., UKC, Canterbury, Kent, KT2 7NZ, UK

DEGREE/COURSE TITLE:
> BSc. Computer Science (SGML taught in 2nd year)
> MSc. Computer Science (1 year conversion course
>      for non-Comp.Sci. graduates)

COURSE DURATION:
> 36 months full-time, or 48 months inc. one year's work-placement (BSc.)
> 12 months full-time, 24 part-time (MSc.)

SGML CONTENT/COMPONENTS:
> Introduction to SGML (6 taught hours)
> DTD design (10 taught hours)
> Comparison of SGML and ODA (4 taught hours)
>
> Assignment #1: Design a simple DTD for a text book
> Assignment #2: Write a script to map the output of an SGML parser
>                (sgmls), given a document conforming to the student's
>                text book DTD, into nroff macros or LaTeX

SGML SOFTWARE USED:
> sgmls parser
> SoftQuad's Author/Editor
> Exoterica's OmniMark

NON-SGML COURSE CONTENT/COMPONENTS:
> Students are taught to program in Pascal, C and C++
> Other modules taught: Networks and Communications (esp. the OSI
> model, X25, TCP/IP), compiler design, software engineering (using
> Pascal/C), Operating systems and environments (esp. Unix, X11,
> Motif etc), Assembly Language (680X0), systems analysis and design
> (using SSADM), database design (esp. using SQL with Ingres databases).
>
> Both the BSc. and MSc. courses cover the same modules, but BSc.
> students undertake more project work and study each module in greater
> depth.

ARE STUDENTS AVAILABLE FOR WORK PLACEMENT?:
> BSc. students may undertake a year's work placement with a company,
> during which time they are free to work on any project. A 10,000 word
> report on the project and/or their experiences must be submitted at
> the end of this year out.
>
> Part-time MSc. students can base their thesis work on a practical
> project carried out for an employer, subject to their supervisor's
> approval

WILL YOU UNDERTAKE COMMERCIAL SGML-RELATED R&D WORK?:
> Depends. Students who go on from the MSc to do PhD work may be
> available and willing. Alternatively, we may be able to set up a
> special project group of post-Doctoral researchers. Contact me to
> discuss this further.

ANY OTHER INFORMATION:
> The aim of both courses is to prepare students for a career in software
> engineering or systems analysis. MSc. students already hold degrees
> in non-computing related subjects, but must demonstrate an aptitude for
> computing before being awarded a place.
>
> The module in Electronic Publishing forms a major component in the
> taught MSc. course. Students study SGML as part of this module, but
> they also learn about EP in general, formatting languages such as
> TeX, LaTeX and nroff, page description languages (PostScript), printing,
> database publishing, and delivering information on CD-ROM.
>
> Required reading includes Eric van Herwijnen's "Practical SGML", and
> students also have access to "The SGML Tutorial" DynaText book
>
> Students are able to do an SGML-based project for their MSc. thesis.
> 50% of those who take the taught MSc. course go on to do a PhD
>
> The Department has a strong interest, and carries out extensive
> research in the field of electronic publishing, and SGML and ODA in
> particular. Prof. Thomas is an expert in the field of publishing from
> databases of SGML documents. We hope to carry out some HyTime-based
> research projects very shortly (academic year `94-`95).

DATE SUBMITTED:
> 25th October, 1993

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 11:25:21 UT
From: "Peter Flynn"
Organization: University College, Cork
Message-ID:
References: <1993Oct19.172830.23475@integrity.uucp>
Subject: Re: Questions: ArborText Users

[Terry Smith]
| I'm looking for people using ArborText Publisher and Document
| Architect. If you're one of these people,

You must be kidding. The product is fabulous, but at $6,000 per _copy_ and no academic discount last time I asked, it's _way_ beyond reach.

///Peter

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 11:29:36 UT
From: "Peter Flynn"
Organization: University College, Cork
Message-ID:
References:
Subject: Re: Utter Beginner In Seek of Enlightenment

[David B. O'Donnell]
| I am very much an SGML novice, having worked with IBM's Document
| Composition Facility GML a few years ago.
| I've recently come to a
| point where I feel a need for a more structured approach to
| documentation, and I'd like to take a look at SGML.
|
| If anyone can point me in the direction of guides, documentation or
| introductory material which is available over the Internet, I would be
| most appreciative. Please reply directly to my e-mail address
| (atropos@netlab.cis.brown.edu) as time constraints make me unable to
| guarantee a daily review of this newsgroup.

The Text Encoding Initiative has produced (as part of their version 2) a good general-level introductory text, a sort of `Gentle Guide to SGML'. You can pick this up by anonymous ftp at sgml1.ex.ac.uk in tei/p2/drafts/p2sg.ps (PostScript version) or p2sg.doc (plaintext version). Recommended.

///Peter

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 11:37:31 UT
From: "Peter Flynn"
Organization: University College, Cork
Message-ID:
References:
Subject: Re: SGML and TeX

[Christian Saucier]
| I was wondering if there is any kind of work done somewhere to SGMLised
| TeX, or to use TeX as an SGML document formatter.
|
| TeX being (to me) a very good document formatter, I'm just thinking
| that it could be possible to have a program that would take a SGML
| document and produce a TeX output.

My crude but functional attempt (PC version) is on curia.ucc.ie in pub/tex/sgml2tex.zip and the paper I wrote on it is also available.

| I haven't started thinking _really hard_ on this but if it's
| technically possible, I might start working on such a thing.

Oooh. An offer of effort?

| Am I just dreaming here? Is there a major 'deadend' impossibility that
| I'm not thinking about?

No, you're right down the right track, IMHO.

| Or has someone already done that and am I just not aware of it?

Several other people have produced bigger and better things.
I have heard of a very good sgml-->TeX engine developed in a commercial company for internal use, and I'm trying to persuade the authors to present a paper on what they did to next year's TeX conference (Santa Barbara, summer 94). I have no idea if the program will be available or not.

///Peter Flynn
Secretary, TeX Users Group

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 11:53:37 UT
From: "Larry Beck"
Organization: Grumman Data Systems-Woodbury
Message-ID: <1993Oct20.115337.8205@gdstech.grumman.com>
References:
Subject: Re: SGML and TeX

I believe that the Arbortext product (The Publisher) uses TeX as its composition engine. I suggest you speak to them.

LAB

[Editor's note: This is in response to Christian Saucier's request.]

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 13:00:35 UT
From: "Terje Holmboe"
Organization: Dept. of Informatics, University of Oslo, Norway
Message-ID: <1993Oct20.130035.13960@ifi.uio.no>
Subject: HTML, TeX and related...

Hopefully I will not be flamed too much for asking a question like this here, but I am uncertain as to which newsgroup might be more appropriate, so here goes...

Are there any tool(s) for converting LaTeX and TeX documents into HTML?

Terje Holmboe

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 13:42:36 UT
From: "Glenn A. Adams"
Organization: MIT Artificial Intelligence Lab
Message-ID: <2a3f8cINNikt@life.ai.mit.edu>
References: <29u8c0INN3r3@life.ai.mit.edu> <1993Oct19.091516.1211@onionsnatcorp.ox.ac.uk>
Subject: Re: ISO 8879 what cahnges are being considered?

[Dominic Dunlop]
| As somebody involved with Unicode from its inception, Glenn can be
| relied upon to promote it. Fair enough.

Actually, I started out as an opponent of Unicode. The reason was that I had just invested a year or so of my time building a very general purpose text subsystem which attempted to support all character sets at the same time (sound familiar, Erik?).
I had the idea then that the ONE was never as good as the MANY - I tend to polytheism by nature anyway.

Well, after I had built this marvelously general system and sold it to a few companies I discovered this rather interesting thing: performance. It seemed that some clients, such as a database company, had these things called benchmarks that their customers were very fond of using on their products. Well, how shall I say it, my wonderful generic text system wasn't known for its blazing speed. Thus began a process of optimizing the heck out of it until it got close to an acceptable performance. But, then, who could read the code anymore but me? And, after a few months went by not looking at the code, it started looking like someone else's code even to me: i.e., completely opaque; and I consider that I follow very good coding style conventions.

Well, to make a long story shorter, I rethought the idea of using one internal character set for such a system, and, behold, the scales fell from my eyes. I could get all I wanted *and* make it perform well too. No more generic mess to deal with.

So, there ends the story of my becoming a Unicode fan. Even with all the warts that have accumulated due to the process of becoming an ISO standard, I still think it's the best solution available to developers of new software today. And, given the enormous difficulty of producing 10646 (mostly political), I don't think there will be any alternatives for a long time to come, if ever. [I don't consider ISO 2022 as an internal processing format to be a viable alternative.]

| The important phrase here is IN ADDITION TO: backward compatibility is
| important, although my view is that small aspects of it must sometimes
| be sacrificed on the altar of progress. (Not everybody agrees. Hi,
| Keld!)

Yes, I certainly agree. It will take quite a while before wide character encodings become used as an interchange format (if ever).
That is why I currently use (and promote) the use of File System Safe UTF (FSS-UTF) as the preferred interchange format for 10646 encoded data. It has the nice properties that:

- it employs an 8-bit serialization of 10646 UCS-2 & UCS-4

- ASCII characters in 10646 (i.e., 00000000 - 0000007F) are encoded as ASCII codes (00 - 7F)

- it allows mixing UCS-4 and UCS-2 code forms

It does have a few drawbacks:

- it requires 8-bit communication paths, or, alternatively, it requires a subsequent transformation for 7-bit paths

- it employs 3 octets per code element for most UCS-2 characters, e.g., all CJK Ideographic characters.

- it employs 2 octets per code element for 8859-1 C1 & G1 (right half) characters (but then this would be true for UCS-2 also).

- it is not transparent on 8859-1 systems that interpret either C1 or G1 characters of 8859-1; though it is transparent on 8859-1 systems that do not specially interpret C1 or G1 octets.

Even with these drawbacks, it does provide a portable interchange format for 10646 at least on 8-bit clean systems, with the special property that ASCII characters in 10646 are compressed to one octet each. Existing software that does not interpret character data directly, or which uses only compiler or runtime string routines, can easily process FSS-UTF data directly. Software which goes beyond this would be better off converting to UCS form for processing. [This is the approach taken by my software, and also by the Plan-9 software which is the largest body of code based on FSS-UTF.]

| It seems reasonable to me that, as 8879 is reviewed, one aim should be
| to make sure that there is nothing in it that precludes the use of the
| coded character set of ISO 10646.

The only extension that I see which might be useful is to alter the syntax for specifying NAME CHARS in 8879.
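FSS-UTF was later standardized under the name UTF-8, so the octet counts given earlier in this post for ASCII, 8859-1 right-half and CJK characters can be checked with any UTF-8 encoder. The following Python sketch does so; the helper name fss_utf_octets and the sample characters are choices made for this illustration. For characters inside the BMP the present-day encoding matches what is described here, though today's UTF-8 stops at four octets per character.

```python
def fss_utf_octets(ch):
    """Return the UTF-8 (formerly FSS-UTF) octets for one character."""
    return ch.encode("utf-8")


samples = [
    ("A", "ASCII letter"),
    ("\u0085", "8859-1 C1 control"),
    ("\u00e9", "8859-1 G1 character (e with acute accent)"),
    ("\u4e2d", "CJK ideograph"),
]

for ch, label in samples:
    octets = fss_utf_octets(ch)
    print(f"U+{ord(ch):04X}  {len(octets)} octet(s)  {octets.hex()}  {label}")

# Pure ASCII text is byte-for-byte identical in both encodings.
ascii_is_transparent = "<!DOCTYPE memo>".encode("utf-8") == b"<!DOCTYPE memo>"
```

The last line illustrates why software that only passes character data through, without interpreting it, can handle such files unchanged.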
Currently one has to enumerate the code numbers of the name characters and specify both upper case and lower case correspondences; in the case of 10646, where one has over 30,000 characters -- most of which do not even employ the notion of CASE (e.g., CJK ideographs) -- the task of enumerating all CJK ideographs as possible name characters would be quite onerous. Not that it isn't possible: it would just make a DTD quite large if it were incorporated into the SGML declaration.

| as SGML finds an ever-wider audience, very few of the users of SGML
| documents should be aware of all that messy business with pointy
| brackets and vaguely mnemonic public entity set members. When they see
| a lower-case-e-with-acute-accent on their screen, they don't know or
| care how it is encoded in the underlying file.

For this reason, it is quite possible to use all 10646 characters in existing ASCII based SGML document encodings by defining entity references for every 10646 character outside of ASCII; however, this will produce some very large documents in Japan, China, etc., which are completely unreadable in this form. However, if in this case the actual SGML document instance is just viewed as an interchange format, then one could certainly accomplish all the needs addressed by 10646 by the use of entity references. I hate to see what the size of the file would be though.

| As Glenn and others continue to promote the attractions of wide
| character sets, I'm sure they'll become more widely (hah!) used. As
| they do, more sites will make it their business to ensure that they can
| exchange documents in this form without the rigmarole of conversion to
| and from some transfer format. But that time hasn't come yet, and,
| until it does, the standard's got to accept that the lowest common
| denominator is an ugly 7-bit character set, and accommodate it. But
| without precluding wider and more commodious alternatives.

I agree strongly. I'm not advocating everyone switch to Unicode or 10646 overnight.
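To put a rough number on the file-size worry about the entity-reference approach above, here is a small sketch. It makes several assumptions: numeric character references of the form &#NNNNN; stand in for the named entity references the post has in mind, Python's "utf-8" codec stands in for FSS-UTF, and "utf-16-be" stands in for UCS-2 (valid here only because every sample character is in the BMP). The helper name as_char_refs is hypothetical.

```python
# Sketch only: compares the size of the same CJK text in three forms.
# Numeric character references are an assumed stand-in for entity references.

def as_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


sample = "\u4e2d\u6587" * 500  # 1000 CJK ideographs

sizes = {
    "7-bit ASCII with character references": len(as_char_refs(sample).encode("ascii")),
    "FSS-UTF (Python utf-8)": len(sample.encode("utf-8")),
    "UCS-2 (Python utf-16-be, BMP only)": len(sample.encode("utf-16-be")),
}

for form, octets in sizes.items():
    print(f"{form}: {octets} octets")
```

For this sample the reference form needs eight octets per ideograph, against three in FSS-UTF and two in UCS-2; named entity references would usually be longer still.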
What I do think is important is considering 10646 for the future and trying not to build limitations into new systems or standards that will prevent its use when desired. For those who are interested in using it sooner rather than later, I suggest considering FSS-UTF as the preferred interchange format and using UCS-2 or UCS-4 as the processing format [I actually use both depending on whether I see a UCS-4 code outside of the BMP (UCS-2).]

My original response to this thread was prompted by the idea that ASCII is the way to go for the future because it is the way of the past. I would disagree with this opinion and offer instead that 10646 is the way of the future because ASCII was so successful in the past. After all, 10646 (and Unicode) are based on the idea that we only need "one" character set. ASCII was very successful at representing American English (at least to a strong degree); 10646 will be successful at representing the rest of the world's languages too.

What is not the way of the future is a mixed character set encoding such as that provided by ISO 2022. That way lies madness -- or at least abysmal performance!

Regards,
Glenn Adams

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 16:48:40 UT
From: "Larry Beck" \
Organization: Grumman Data Systems-Bethpage NY
Message-ID: <1993Oct20.164840.20086@gdstech.grumman.com>
References: <009744B0F6CEF440.206004D7@dpsl.UUCP>
Subject: Re: IADS - response from SGML Users' Group

[Pamela Gennusa]

| This message is related to recent mail about the SGML Users' Group's
| role in releasing the IADS software.
|
| The SGML Users' Group does not give an imprimatur. Neither ARCSGML nor
| IADS, the two pieces of software put into the public domain by the
| Users' Group, have received an endorsement by the Users' Group to my
| knowledge.
|
| Although it might be attractive to do such a thing at first glance, it
| has been rejected as not practicable by the Group.
| We regularly run
| product news in our Newsletter which is always preceded by a warning
| that "no value judgement is passed on any products or services".

I see your point and I agree with you. Unfortunately, if the SGML User's Group places something in the public domain, it gives the appearance of its placing an imprimatur on it. And, as you probably realize, appearance is everything.

Perhaps it would be better if the SGML User's Group refrained from placing _anything_ in the public domain. If a developer wants something in the public domain, let him or her do it. This will redirect the flames you've been receiving elsewhere.

LAB

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 17:10:34 UT
From: "Wayne L. Wohler" \
Message-ID: <19931020.101352.551@almaden.ibm.com>
Subject: SGML and Quark Express and/or PageMaker

I am investigating the use of Quark Express and/or PageMaker in an SGML info environment. Does anyone have any ideas, suggestions or experiences they can share either here or privately?

--
Wayne L. Wohler                    Internet: wohler@vnet.ibm.com
Dept G82/910M                      IBMMAIL:  USIB29WX@IBMMAIL
Publishing Solutions Development   Phone:    1-303-924-0470
IBM Corporation
PO Box 1900
Boulder, Colorado 80301-9191

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 18:38:06 UT
From: Quest \
Organization: Quest Systems Corp
Message-ID: <1993Oct20.183806.8827@quest.com>
Keywords: SGML/Object Oriented/C++/jobs
Summary: Openings for senior SGML engineers
Subject: Openings for SGML experts

** CAREER OPPORTUNITIES ** CAREER OPPORTUNITIES **

Join a fast paced company for an exciting adventure into the world of advanced software products. If you like an exciting and challenging environment where the competition is stiff and the opportunity is vast you will enjoy working at Quest Windows.
This is only for the competitive technical person who is a self-starter, is extremely versatile and talented, is an intelligent planner, recognizes competitive product features and deficiencies, takes initiative to improve the products, is a very strong and hard worker, is quality minded, and has a lot of internal self motivation. In addition, the ideal candidate has strong industry and competitive awareness, with a visionary ability to predict or recognize trends and implement these into the product planning and implementation process.

Quest Windows Corporation, Santa Clara, California is a leader in Graphical User Interface and Object Oriented software development tools and applications. Quest has had many firsts in the GUI and Object Oriented market and wishes to continue as a market leader by recruiting highly skilled professionals for fast paced and dynamic commercial product development and longer term research and development projects.

Quest is looking for individuals with a strong work ethic, good communication skills, and the ability to get quality work done on schedule. Positions are available to qualified candidates for Senior Product Managers, Chief Software Scientists, Senior Software Architects, Senior Software Engineers or those with 2+ years of directly related experience, Software Quality Engineers or those with 2+ years of directly related experience.

Quest Windows offers very competitive salaries, employee stock options, and health benefits. All responses are strictly confidential and all contacts with candidates are handled discreetly.
Since there are limited positions available, qualified candidates should respond as soon as possible with a complete and detailed resume, salary history and complete current references to:

    email: Internet: jobs@quest.com
           UUCP:     questsys!jobs
    FAX:   408/988-8357
    Mail:  Quest Windows Corporation
           Personnel - Technical Staffing
           4677 Old Ironsides Drive
           Santa Clara, CA 95054

Please - ABSOLUTELY NO RECRUITING FIRMS

Looking for technical staff with at least 2 years of directly related experience with the design, implementation, documentation, testing or deployment of software in one or more of the following areas. Partial experience in each area is good and may qualify you for a position. Put down all related experience on your resume or cover letter in detail:

Object Oriented GUI Software Design: Stanford InterViews internals and/or the X Consortium Fresco. This includes the C++ language, Interactors, Glyphs, WidgetKits, Unidraw, IDraw, IBuild, structured graphics, and CORBA objects.

Object Oriented GUI Builders: Next generation object oriented GUI Builder technology using object oriented design.

Reusable Object Oriented Modules: OMG CORBA compliant reusable objects and object oriented class libraries. Detailed knowledge of the OMG C++ binding and the CORBA DDL language definition a big plus.

GUI Compliance: Motif, Windows NT, Windows 3.X, and MAC look and feel and style guideline compliance. Internationalization, drag and drop, mouse-less operation.

Portability between Windowing and Operating Systems: UNIX (BSD, SVR3, SVR4), Windows NT, Windows 3.X, Macintosh System 7 for GUI toolkits, tools and portable network communication.

GUI Builders for the C language and C++ bindings to native toolkits: GUI builders for OSF/Motif, NEXT, Windows NT, Macintosh System 7.

GUI Components: Portable GUI objects and widgets.

C/C++ language: C/C++ language interpreters and development environments.
Applications: Document preparation and graphics, document management and control, SGML, spreadsheets, hypertext and hypermedia systems, network management tools, system administrator applications, programming tools, CASE tools, productivity applications, groupware, e-mail/FAX systems.

Databases: Client/Server implementations, SQL and SQL2, high-end relational database, next generation object oriented databases, portable database layers.

Multimedia and VR: Multimedia and Virtual Reality development environments, multimedia applications, VR applications, device drivers to Multimedia and VR devices.

Newsgroups: comp.text.sgml
Date: 20 Oct 1993 20:26:11 UT
From: "Trevor Jenkins" \
Message-ID: <751148771snz@apusapus.demon.co.uk>
References: <009744B0F6CEF440.206004D7@dpsl.UUCP>
Subject: Re: IADS - response from SGML Users' Group

[Pamela Gennusa]

| The SGML Users' Group does not give an imprimatur. Neither ARCSGML nor
| IADS, the two pieces of software put into the public domain by the
| Users' Group, have received an endorsement by the Users' Group to my
| knowledge.

The fact that it was the SGML Users' Group that released these two pieces of software implies some agreement upon the essential SGMLness of the offering by that group. However, there is surely a vast difference between the authority of the authors of these ``products'' (unless I've been misled about who the real author of ARCSGML is :-).

| If anyone out there is interested in providing a vetting service for
| the Users' Group, gratis (natch), and is then willing to get a
| concensus from the Executive Board and/or a substantial portion of the
| membership, and is then willing to pay for insurance (just in case we
| get it wrong and get sued), and is then willing to be available to
| answer questions about products we have endorsed, please let me know.

The text of ISO 8879 has a substantive clause concerning conformance.
Conformance labs are therefore able to issue the necessary paperwork stating whether or not a product complies with the provisions of the standard.

| Seriously, though, as a volunteer, charitable-status organisation, the
| Users' Group does not have the resources to check out software or
| products, nor do we have a standing committee to do such work. I
| welcome comments on software put out there, whether positive or
| negative. This information will help users to see what others think.
| And in a market society, it will help weed out inferior products.

I did not mean to suggest that conformance with 8879 was something that the SGML Users' Group should test for, but rather that the association of said group with a product that is only cursorily SGML does neither the product, the group, nor the SGML marketplace any good.

| If you don't like a product or a piece of software, fine, let's us
| know, but could we get out of the habit of having a loaded gun and
| looking for somewhere to point it?

My shots were well aimed. I chose a precision weapon rather than a blunderbuss. :-) Those looking to retaliate can find my writings on SGML in the literature -- it is an exercise for the reader to figure out where that text is. :-)

--
Regards, Trevor.
---------------------------------------------------------------------------
Trevor Jenkins                                              Re: "deemed!"
134 Frankland Rd, Croxley Green, RICKMANSWORTH, WD3 3AU, England
email: tfj@apusapus.demon.co.uk  phone: +44 (0)923 776436  radio: G6AJG
"We need bigger and better books", Jimmy Tingle (Damned in the USA)

Newsgroups: comp.text.sgml
Date: 21 Oct 1993 06:15:54 UT
From: Jeffrey McArthur \
Message-ID: <9310210215.memo.71059@BIX.com>
References: \
Subject: Re: SGML and TeX

Typesetting SGML via TeX can be done. I am currently doing it. There are a couple of difficulties.

There are two approaches to typesetting SGML via TeX. The first is to use a pre-processor to convert the SGML tagging into something a bit more TeX friendly.
The second is to set up TeX to run directly from the SGML coding.

TeX makes a couple of assumptions about data files. One assumption is that the data is broken up into lines. Most implementations of TeX are configured with a 1K input buffer. That is, the longest line that TeX can process is 1024 characters long. If the SGML file does not insert line breaks occasionally (and some SGML editors do not) the pre-processor needs to add the line breaks. The line breaks are one of the subtle differences between TeX and SGML.

I have spent the last couple of months writing a set of TeX macros to directly typeset some SGML files, without requiring a preprocessor. The macros required to do this are quite complex. Some SGML constructs are a pain to deal with. For example, SGML is case insensitive, by default. If possible, change the DTD to require that the tags be all one case. Short forms and a few other features are a royal pain to deal with. Not impossible, but hard. It is much easier to make the SGML input file more consistent.

I found it simplified all the TeX macros significantly if none of the end tags were optional. If the DTD requires all the end tags, and your parser checks to make sure that they are there (unlike Author/Editor, which does not complain if a required end-tag is omitted), then it is much easier to write a set of macros to typeset the data.

Tables are an interesting problem. I have a set of macros for SoftQuad's table DTD that supports about 90% of the features. The only things lacking are support for all the various types of ROWSEP and COLSEP (currently only VNONE and VSINGLE are supported) and ROWSPAN. Also COLSTART and ROWSTART are ignored. In theory, you could code up a table using the SoftQuad table model where the data was coded from top to bottom, left to right using ROWSTART and COLSTART specifications in each cell. I am not sure, but I don't think Author/Editor will process this correctly if you do this. But the DTD allows it.
Why anyone would do that is beyond me, but it is possible.

--
Jeffrey M\kern-.05em\raise.5ex\hbox{\b c}\kern-.05emArthur
a.k.a. Jeffrey McArthur
ATLIS Publishing           phone: (301) 210-6655
12001 Indian Creek Court   fax:   (301) 210-4999
Beltsville, MD 20705       email: j_mcarthur@bix.com

Newsgroups: comp.text.sgml
Date: 21 Oct 1993 08:57:38 UT
From: "Steve Heaney" \
Organization: Schlumberger Geco-Prakla
Message-ID: <2a5iu2$4dv@gorgon.gatwick.sgp.slb.com>
References: <1993Oct20.130035.13960@ifi.uio.no>
Subject: Re: HTML, TeX and related...

[Terje Holmboe]

| Hopefully I will not be flamed to much to ask a question like this
| here, but I am uncertain as to which newsgroup might be more
| appropriate, so here goes...
|
| Are there any tool(s) for converting LaTeX and TeX documents into HTML?

Terje,

Your best bet for HTML related questions would be the comp.infosystems.www newsgroup. As for your question, a (definitive?) list of HTML tools can be found at \. Under the subject of Generating HTML you will find reference to a very good LaTeX to HTML tool written by Nikos Drakos, as well as a bunch of other useful stuff.

Steve

--
Steven Heaney
Schlumberger Geco-Prakla
Internet: heaney@delft.sgp.slb.com

Newsgroups: comp.text.sgml
Date: 21 Oct 1993 10:27:52 UT
From: "N F Drakos" \
Organization: University of Leeds, U.K.
Message-ID: <1993Oct21.102752.29496@gps.leeds.ac.uk>
References: <1993Oct20.130035.13960@ifi.uio.no>
Subject: Re: HTML, TeX and related...

[Terje Holmboe]

| Hopefully I will not be flamed to much to ask a question like this
| here, but I am uncertain as to which newsgroup might be more
| appropriate, so here goes...
|
| Are there any tool(s) for converting LaTeX and TeX documents into HTML?
For information on how to retrieve, install and use the latex2html translator see:

http://cbl.leeds.ac.uk/nikos/tex2html/doc/latex2html/latex2html.html

Nikos Drakos
Computer Based Learning Unit
University of Leeds
email: nikos@cbl.leeds.ac.uk
WWW  : http://cbl.leeds.ac.uk/nikos/personal.html

Newsgroups: comp.text,comp.text.desktop,comp.text.tex,comp.text.sgml,comp.text.frame,comp.text.inter
Date: 21 Oct 1993 11:09:50 UT
From: "R. E. Jones" \
Organization: Computing Lab, University of Kent at Canterbury, UK.
Message-ID: <303@larch.ukc.ac.uk>
Subject: TEP'94 -- CALL FOR PAPERS

REVISED CALL FOR PAPERS

TEP94

TEP94 is a workshop in Darmstadt on 12-13 April 1994. This is a second call for papers on issues concerning the teaching of electronic publishing. The revised deadline for submission of papers is 1 NOVEMBER 1993.

For further information, or submission of a paper, please contact:

    Mary Dyson, TEP94 workshop
    Department of Typography & Graphic Communication
    University of Reading
    2 Earley Gate
    Whiteknights
    Reading RG6 2AU
    UK

    Telephone: 44 734 31 80 84
    Fax:       44 734 35 16 80
    e-mail:    ltsdyson@uk.ac.rdg

[DO NOT REPLY TO rej@ukc.ac.uk]

Newsgroups: comp.text.sgml
Date: 21 Oct 1993 15:03:26 UT
From: "Matthias Butt" \
Organization: TUBerlin/ZRZ
Message-ID: <2a68bu$knj@mailgzrz.TU-Berlin.DE>
References: <28s865$hqb@mailgzrz.TU-Berlin.DE> \ \
Subject: Re: ISO 8879 what cahnges are being considered?

Hi everybody:

While the discussion about character sets and their encoding in this thread surely is worthwhile I would like to point out that it is not related to my posting which started this thread. In saying

| that such methods of creating documents are absolutely unacceptable to
| nowadays users and indeed I do not know a single commercial application
| of SGML (be it in the aircraft industry, defense, publishing or
| whatever) where somebody would consider even for a second to have
| typists enter tons of documentation with an ASCII editor!
I did not mean to say anything about whether ASCII or whatever else should be used to *represent* SGML data but merely wanted to point out that in most commercial environments (as well as in a growing number of private and academic environments) a plain text editor (be it based on ASCII, EBCDIC or whatever) is not an appropriate tool for entering that data. The reason is that an appropriate tool (one which could make commercial users switch to SGML from their WYSIWYG word processors and page layout programs) should

a) support SGML in
   - letting the user enter tags, attribute values, entities, etc. via a menu or dialog box rather than having them type in all the markup,
   - allowing only appropriate markup and contents to be entered at any given point in a document,
   - allowing documents and parts thereof to be validated on the fly from within the editing system,
   - supporting the creation and maintenance of cross-references,
   - etc.

b) support the authoring, editing and correction processes by allowing documents to be displayed and output in some formatted fashion with suppressed markup characters and at least some markup mapped onto user selectable formatting attributes (inline/block, fonts and sizes, margins, tabulators, vertical spacing, etc.).

A variety of such programs (e.g., SoftQuad Author/Editor, Datalogics WriterStation, ArborText SGML Editor) addressing the requirements given above to at least some extent do exist on the market. The point that I wanted to make is that SGML (as laid out in ISO 8879) does not properly address such systems because

1) some necessary things (like cross-references and restricting attribute values) are implemented in a sub-optimal fashion by ISO 8879; and

2) the way SGML is designed it seems to be *very* hard to build such input systems with an acceptable performance.

Indeed all SGML input systems that I have been working with do have very severe performance problems when they are employed to their full potential.
The same is also true for output systems, in particular if output must be generated on the fly on the basis of the SGML data.

My point then is that one of the reasons for SGML not being suited to the requirements of "industrial strength" input and output systems lies in the fact that SGML has been designed to support input on what I call plain text editors. Undoubtedly features like markup minimization and shortrefs are intended to make life easier for people keying in SGML data with such an editor.

It is not my intention to judge the appropriateness of these design decisions in general. Also I do see that some of these decisions (including such that are not related to the use of text editors but rather to limitations of available hardware) must be regarded in their historical context as pointed out by Bruce Hunter (uad1231@dircon.co.uk):

| There are also some more fundamental restrictions, which relate to the
| current discussion about scoping. OK, I admit that these are also much
| more solvable now than they were 10 years ago, but at that time some of
| these were major obstacles (look at the restrictions on content models
| and content model algebra - these are there because at the time it
| wasn't felt reasonable to expect a parser to require even single token
| lookahead - never mind having to try and deal with a context-sensitive
| grammar!). Introducing scoping (in its fullest sense) as a concept to
| the Standard needs a major rethink of the scope (no pun intended) of
| the Standard. To do so would change the level of the grammar that a
| DTD defines from context-free to context-sensitive. This would thence
| require a commensurate increase in the sophistication of tools needed
| to process SGML files.
| The availability of tools (or technology) seems
| to have been a concern of the Standard's developers, and, as an
| engineer, I can't from a practical viewpoint disagree with this
| (although I can also share the feelings of those who do not feel that
| this should be a factor) [how's that for fence-sitting :-)?]

I suspect, however, that the appropriate historical context is not that of 1986 (when ISO 8879 was published) nor that of 10 years ago (at this point graphical user interfaces, WYSIWYG systems, object oriented programming, ..., were already very much present) but rather 15 or 20 years ago when the first steps to create a standard for generic markup were taken. Therefore one of the problems seems to be the long period between something being designed and its becoming a standard. This problem is made even worse by the restrictive policies applied to further development of the standard.

Admittedly, when saying that using ASCII (or plain text for that matter) editors was totally unacceptable for *all* commercial environments I knew that this is not quite true. Bruce is certainly right in saying that

| This does indeed happen, in the field of retroconversion. Often
| economic factors mean that it is sometimes preferable to ship your
| existing documents to somewhere offshore and have them re-keyed as SGML
| files. Where labour is cheap enough, this is more cost-effective than
| investing in complex (implies a learning curve) and costly (relatively)
| SGML authoring tools.

Some other commercial situations employing plain text editors for entering SGML data were pointed out to me in private mails and I am certainly willing to agree. However, I would still maintain that in *most* commercial environments such methods will not be considered even for a second.
I believe that for 98 percent of the typing in commercial environments today (with the notable exception of commercial typesetting, where character-based, albeit formatting-oriented, markup is widely used) WYSIWYG text processors or page layout programs with graphical user interfaces are being used. In all these environments it is extremely hard to convince people to switch to SGML even with tools like Author/Editor being available. It would be completely impossible if people had to switch 'back' to plain text editors with no support for SGML.

Furthermore I believe that some of the exceptions, in particular the offshore retro-conversion business that Bruce is talking about, often do not fall into the center of the area that is (or should be) addressed by SGML. In my opinion the most important application of SGML and the only area where there is no alternative to SGML is content-oriented markup. The concept of generic markup, which is addressed by SGML, seems to be a good deal wider than the concept of content-oriented markup. Generic markup can be format-oriented and there are certainly many applications calling for mainly format-oriented generic markup.

From what I have seen so far the retro-conversion business often proves to be one such application. For offshore typists who are not familiar with the subject matter and other specifics of the texts they are typing, there is only the formatting of the documents being retyped. This is typically mapped onto SGML markup based on a relatively generic dtd which certainly does abstract from some specifics of the formatting but is not geared towards closely capturing the content structure of the documents. Because such dtds are typically relatively simple (and because the cost of labour is so low) it is indeed possible here to do the typing with a simple text editor.

However, if the content structure of a document is to be closely captured, a very complex dtd will typically be called for.
I am not sure whether I could create a document instance validating against such a dtd, even if I had written the dtd myself, without the support of a system like Author/Editor. For people who are only partly familiar with the dtd and the intentions behind certain structures this seems to be absolutely impossible.

Gary Benson (inc@tc.fluke.COM) is absolutely right in saying that

| Besides, we hire writers for their technical writing skills, not for
| page layout and typography. Any time spent doing anything other than
| writing diminishes that writer's effectiveness. Particularly now, with
| so many good tools coming out for pre-parsing text files, there is
| little justification for doing "desktop publishing". Why do you
| suppose all those DTP packages invariably include a "Save As ... ASCII"
| selection?

However, just the plain text is not enough either. After all, the texts will finally have to be printed (or otherwise prepared for presentation) and, even more important, documents and their parts have to be maintained and retrieved based on their contents. This is where SGML comes in. The author should not deal with the formatting but should concern herself a good deal with the contents of her document and its structure. So far the only standard that allows texts to be marked up based on their contents is SGML.

This is why I believe that it is important to address the shortcomings of SGML which have been discussed here and on other occasions in comp.text.sgml. One important result of this discussion was that some of these issues are addressed by other standards based on SGML (in particular by HyTime). I have already expressed my opinion that this is not enough in the long run.

With respect to some of the points raised in this thread I still believe that the constructs in SGML which support the input of SGML data with plain text editors are of comparatively minor importance.
If it should prove that these are responsible for some of the identified problems, abandoning them should be considered. Appropriate options for backwards compatibility could still be supplied. Furthermore, because it is relatively trivial to canonicalize any given document instance with existing tools, I think that the restrictions imposed on further development of ISO 8879 could be relaxed: instead of requiring that every document valid according to the current standard be valid according to all future variants of the standard, only every canonical document would have to be.

I don't think however that such considerations would help with respect to performance. Here a completely new design might be called for. I would be interested in hearing opinions on this.

Another area which might require a complete re-design is the scoping stuff. However, I believe that it should be possible at least to substantially improve the current standard while maintaining the backwards compatibility required. A first step would be a new (additional) inclusion and exclusion mechanism which applies not directly to the contents of the element being declared but only to the contents of nested elements.

Matthias

Newsgroups: comp.text.sgml
Date: 21 Oct 1993 17:25:15 UT
From: "Eliot Kimber" \
Message-ID: <19931021.131839.716@almaden.ibm.com>
Subject: Linking Into SGML From Non-SGML Using HyTime

As many different vendors and organizations start to move their information into SGML for the purpose of providing online help, they will run into the problem of how to associate a location in the program for which help is being provided with a location in the SGML help so that the appropriate bit of help can be found and presented. There are a number of strategies that can be applied to this problem, but if you want to use HyTime methods (which I strongly suggest), you run into a problem.
The purpose of this post is to suggest a general HyTime solution to this problem, applicable to a wide range of implementations, specific applications, and levels of function and complexity.

If you look at SGML-based hyperlinking in general, and HyTime in particular, you'll notice that the point of view is typically from the SGML document looking out--in other words, linking from your SGML-encoded information to something else, either another SGML object or some non-SGML object. At first look, HyTime provides no clear mechanism for linking in the reverse direction, from non-SGML stuff into SGML content. This problem surfaces in almost all online help systems because the direction of linking is from the system object to its help, not the other way around.

How to solve the problem? Most existing help systems provide some sort of ad-hoc mapping from system-specific locations to SGML locations (IDs). For example, in the OS/2 IPF help system, each help topic is associated with a corresponding program object by coding the program object's "resource identifier" on the division element, something like this:

    :h1 id=fnfield res='AB000034FFCA'.Help For Filename

This works, but the RES= attribute represents a non-standard, non-SGML, platform-specific identifier that will only have meaning in one context (OS/2 help) and may interfere with re-use and interchange of the data (working from the assumption that the help information is probably re-used in several contexts, including the printed product documentation and other help systems on other platforms).

Therefore, a more general solution would include some sort of separate mapping from system-specific locations to SGML locations that each different help system would use to access the correct information, something like:

    # Resource number to ID map for IPF
    # Resource number    SGML ID
    AB000034FFCA         fnfield

This works, but it's an ad-hoc method that is neither standardized nor sophisticated.
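To make the ad-hoc approach concrete, here is a minimal sketch of how one help system might read such a two-column map. The file layout (whitespace-separated columns, `#` comment lines) is assumed from the example above, and the names MAP_TEXT and parse_help_map are hypothetical; this is not IPF's actual mechanism.

```python
from typing import Dict

# Assumed layout, copied from the example map above.
MAP_TEXT = """\
# Resource number to ID map for IPF
# Resource number    SGML ID
AB000034FFCA         fnfield
"""


def parse_help_map(text: str) -> Dict[str, str]:
    """Parse 'resource-number SGML-ID' lines, skipping blanks and # comments."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        resource, sgml_id = line.split()
        mapping[resource] = sgml_id
    return mapping


help_map = parse_help_map(MAP_TEXT)
print(help_map["AB000034FFCA"])
```

Every help system that invents its own layout needs its own small parser of this kind, and none of them can read another system's map.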
Each different help system will probably have a different mapping, each of which will have to be maintained and none of which can necessarily be used by any other system.

So, given that we need some sort of mapping scheme and that we'd like it to be standardized, sophisticated, and at least potentially re-usable, what can we do?

It turns out that existing HyTime constructs can be used to define just this sort of mapping if we twist our thinking about how hyperlinks get processed just a little bit. While the normal viewpoint of HyTime is from the SGML looking out, this is not the only view. If we consider the association between program objects and help for those objects to be hyperlinks (which they are), and more importantly, to be *bi-directional* hyperlinks, then we can view the linkage from either end of the link. In a HyTime context, the fact that a hyperlink is bi-directional simply means that the processing system has to be able to resolve the link from either direction, and in the case of non-SGML-to-SGML links, that means it has to be able to take a non-SGML location, determine which links have that location as an anchor, and go from there.

The actual HyTime constructs needed to create this map are:

o the independent link (ilink) form to define the association between the non-SGML location and the SGML location

o the notation-location and/or NMQuery forms to define the non-SGML locations

o the named location form to define the SGML location if indirection is needed (such as for links to multiple locations or links across documents).

The HyTime ilink form represents a multiple-anchor link, like so:

    \ Ilink content, if any. \

The Linkends= attribute is defined as IDREFs and points to the anchors. The AnchRole= attribute serves to associate named anchor roles with the IDs in Linkends (the meaning of the roles is defined by the application, just as for element names).
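The idea of resolving a link from either end can be sketched in a few lines. This is only an illustration of the lookup described above, not HyTime syntax or the interface of any HyTime engine: the Link and LinkIndex classes and the role names "object" and "help" are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Link:
    """A multiple-anchor link: each anchor pairs a role name with a location."""
    anchors: Tuple[Tuple[str, str], ...]


class LinkIndex:
    """Index links by every anchor location so they resolve from either end."""

    def __init__(self, links: List[Link]) -> None:
        self._by_location: Dict[str, List[Link]] = defaultdict(list)
        for link in links:
            for _role, location in link.anchors:
                self._by_location[location].append(link)

    def traverse(self, location: str) -> List[Tuple[str, str]]:
        """Return (role, location) for the other anchors of links anchored here."""
        found: List[Tuple[str, str]] = []
        for link in self._by_location.get(location, []):
            found.extend(a for a in link.anchors if a[1] != location)
        return found


# Role names are hypothetical; the locations come from the IPF example.
index = LinkIndex([Link((("object", "AB000034FFCA"), ("help", "fnfield")))])
print(index.traverse("AB000034FFCA"))  # starting from the program object
print(index.traverse("fnfield"))       # starting from the help topic
```

The point of the design is that the link itself carries no direction; the index is what lets a help system start from a resource number and an authoring tool start from an SGML ID.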
An ilink element can be empty or it can have whatever content makes sense for a given application, often elements describing the meaning of a given link. For the specific task of an object-to-help map, we might define an instance of the Ilink form like this: \ \ Here I've made the content of the ilink element the other HyTime location elements needed to complete the mappings. This is purely for neatness--there is no HyTime constraint on where location elements go so you are free, as a DTD designer, to define their location to be whatever makes the most sense for your application. For this application, it makes sense to keep the pieces of each map entry together in one place, so why not contain them in the map entry itself? The IPF mapping above would translate into this HelpMapEntry: \ \AB000034FFCA\ \ The Linkends= attribute points to the two anchors (the two columns in our mapping table). The ID "object1" points to the notation location element that contains the actual resource number of the program object. The ID used on the notloc element is purely arbitrary because it is only meaningful within the context of this map--thus it could be automatically generated, for example. The ID "fnfield" is the ID of the actual SGML element containing the help for the object (in this example the map is part of the same document as the help itself). If the map were a separate document, which is probably the better approach, we would have to use a little indirection to connect to the SGML element, as shown in this more complete example: \ \ &#DSC;> \\ \Program to help mapping for product X on OS/2 using IPF as the help system. \ \ \AB000034FFCA\ \ \fnfield\ \ \ \ This more complete example shows the declaration of the help document as a separate entity, in this case a complete SGML document, as indicated by the notation SGMLDocument, which has been defined to be the HyTime notation form "document", indicating that it represents a complete SGML document.
The document type HelpMap then contains one or more HelpMapEntry elements, preceded by a Desc element which simply provides a place to document the map itself. Finally, note the Nameloc element that has been added to the HelpMapEntry. The purpose of the Nameloc is to associate a local ID (local to the map document) with an ID in another document (in this case the document containing the actual help text). It does this by containing a name list element. The content of NMList is one or more element IDs, thus defining the target element, and the HyTime DocorSub= attribute takes the name of the document entity that contains those IDs. Note that I can use the same ID for the Nameloc element as is used for the help topic since the map document and the help document are different ID name spaces. At this point you're probably saying "my goodness, that's an awful lot of typing to replace a simple two-field table, why should I bother?" The reasons to bother are: 1. The map above can be processed by any HyTime system (or non-HyTime system that can at least resolve nameloc and notloc), which means it's standardized and pretty much universally interchangeable among HyTime systems. 2. The use of notations and notation locations helps to clearly identify the non-SGML location methods used so that both humans and computers can know if they can use those methods, and if not, where to look for information on how they can. At a minimum, any system set up to handle notloc will be able to look at the notation and tell you up front if it can handle it, rather than waiting until it tries to use it and fails. It also allows the reliable support of different location methods by a single system since it knows it need only look at the notation value to know what sort of location processing to use. This differentiation by notation could also be used to limit the searching done to resolve a given non-SGML location. 
For example, if you had locations in different notations in the mapping table but were given an IPF location, the system need only check those entries that use the IPF notation, rather than checking all entries. 3. Having eaten the base cost of using HyTime methods, you are prepared to use location methods more sophisticated than simple one-to-one explication--specifically queries. It is the ability to use queries to do the mapping that makes this approach potentially very powerful in a reliable and interchangeable way. One way to use queries to do the mapping would be in a situation where instead of mapping unique non-SGML identifiers to SGML IDs, you want to map probably-unique descriptive text to probably-unique SGML element content. For example, in Kent Summers' and David Durand's X-Help specification, they use the widget name and hierarchy to determine which help topic to access for a given widget. If we assume that in the help source there is an element that contains some or all of the widget name as its content, we can define the mapping as a single entry using queries: \ &#DSC;> \ \ \ select(DOMTREE UseQ(Has_Widget_Name \&widgetname;)))) \ \ select(select(DOMTREE UseQ(Has_GI widgetname)) EQ(dataloc(CAND NORM (1 -1) ) \&widgetname;))) ) \ \ This help map consists of a single entry that contains a query for each anchor. The widgetloc anchor (satisfying the "object" role) uses a query that locates a widget with a given name. The helploc anchor uses a query that finds all elements with the GI "widgetname" and then selects from that set of elements those elements whose content is equal to the widget name. The made-up HyQ function "Has_Widget_Name()" would use a notation-specific property location to retrieve the name of a given X-motif widget. (I have omitted the definition of this query for simplicity, although it might make an interesting example of how to use property sets to query the properties of non-SGML objects.)
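For the simple explicit map, resolving a link from its non-SGML end amounts to a lookup keyed by notation and then by location. The Python sketch below is only an illustration of that idea, not part of HyTime or of any existing help system: the MapEntry class, the build_index and find_help functions, and the "XWIDGET" notation name are all made up for the example. Only "IPFRES", the resource number, and the ID "fnfield" come from the examples above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    """One HelpMapEntry: a non-SGML anchor paired with an SGML anchor."""
    notation: str   # notation named on the notloc element, e.g. "IPFRES"
    location: str   # content of the notloc element, e.g. a resource number
    help_id: str    # ID of the SGML element that holds the help text


def build_index(entries):
    """Group entries by notation so a lookup only scans one notation."""
    index = defaultdict(dict)
    for entry in entries:
        index[entry.notation].setdefault(entry.location, []).append(entry.help_id)
    return index


def find_help(index, notation, location):
    """Traverse the link from the non-SGML anchor to the help IDs."""
    return index.get(notation, {}).get(location, [])


entries = [
    MapEntry("IPFRES", "AB000034FFCA", "fnfield"),
    # "XWIDGET" is a hypothetical second notation, used only for contrast.
    MapEntry("XWIDGET", "fileNameField", "fnfield"),
]
index = build_index(entries)
print(find_help(index, "IPFRES", "AB000034FFCA"))   # ['fnfield']
print(find_help(index, "XWIDGET", "AB000034FFCA"))  # []
```

Because the index is partitioned by notation first, a system that does not support a given notation can report that up front instead of failing on an individual location.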
The way a processor would actually use this mapping is to run the widgetloc query backwards, in a sense: at the start, the help server already knows the widget name because it was passed to it as part of the "get help for this widget" request. It seems a little backward because the query points out from the HelpMapEntry element to the non-SGML anchor. Remember though that these links are bi-directional, so I can also point (traverse) from the anchor to the HelpMapEntry element, which is what we do to find the help associated with a given widget. Since the processing to get help is from the non-SGML anchor to the SGML anchor, we do use the helploc query to really find the SGML element that describes the widget in question, represented by the SDATA entity "\&widgetname;", whose replacement text would be determined by the system at the time the "get help" request is made. The above example is, of course, not complete for a real system, but you get the idea. By adding more queries, it would be possible to represent the various fall-back strategies defined in the X-Help spec or used by similar systems (for example, if you can't find help for the widget, see if you can find help for the widget's parent class). Another example of this sort of query-based mapping might be to map section numbers to elements by using structural locations. In other words, if you're given as input the section number "1.3.4" and you know the elements that represent the hierarchical divisions, you can use HyTime tree locations to find the element that would have the section number 1.3.4 (assuming the assignment is completely structure-based), or, if the section number is an attribute of the division element, you could use property locations to find the element whose "number=" attribute had the value "1.3.4". The possibilities are pretty much endless, and all done with the same basic set of HyTime elements and location methods.
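The two section-number strategies just described can be sketched outside HyTime as well. The Python below is a rough illustration only: it uses the standard xml.etree.ElementTree module on a small XML stand-in for an SGML document, and the element name "div", the sample IDs, and both function names are assumptions made up for the example. It is not an implementation of HyTime tree or property locations.

```python
import xml.etree.ElementTree as ET

# A small stand-in document; "div" is an assumed GI for hierarchical divisions.
DOC = """
<doc>
  <div id="intro" number="1">
    <div id="scope" number="1.1"/>
    <div id="terms" number="1.2"/>
  </div>
  <div id="usage" number="2">
    <div id="install" number="2.1"/>
  </div>
</doc>
"""


def locate_by_structure(root, number, division_gi="div"):
    """Walk the tree: each field of the number picks the n-th child division."""
    node = root
    for part in number.split("."):
        position = int(part)
        divisions = [child for child in node if child.tag == division_gi]
        if position < 1 or position > len(divisions):
            return None
        node = divisions[position - 1]
    return node


def locate_by_attribute(root, number, division_gi="div"):
    """Find the division whose number= attribute equals the given value."""
    for element in root.iter(division_gi):
        if element.get("number") == number:
            return element
    return None


root = ET.fromstring(DOC)
print(locate_by_structure(root, "1.2").get("id"))   # terms
print(locate_by_attribute(root, "2.1").get("id"))   # install
```

The structural version works only when numbering is derived purely from position, which is the assumption stated above; the attribute version works whenever the number is recorded on the element itself.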
The only variable becomes the specific functionality provided by a given processing system--the location specifications themselves are more or less blindly interchangeable, especially if you use a standard notation for queries (e.g., HyQ). For the simple one-to-one explicit mapping table, a real system would probably use the table to load its own internal data structures with the mapping to give good performance. In the case of query-based mapping, it might "pre-resolve" the queries, if possible, pre-index the data, or apply whatever other optimization is possible. But note that a generalized HyTime system would still be able to interpret the mapping table directly and resolve it as needed with no pre-processing or optimization. Finally, note that implementing support for this method of defining mappings does *NOT* require that you have a fully-functional HyTime engine. Far from it. The location methods used in these examples can be supported easily using existing SGML systems. If you want to use queries, you might define some implementation restrictions or even define a set of "canned" queries that you've implemented directly (rather than using a generalized HyQ interpreter). The key is that we're using HyTime as a system-neutral way of defining the relationships between things so that they are interchangeable between different systems and implementations--the fact that the system that uses these definitions may not actually be a HyTime system (or may not consider itself a HyTime system) is immaterial. The complete definition of the HelpMap document type is as follows: \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Comments on DTD: (1) This notation is used to declare entities that represent complete documents (satisfying the restriction that Docorsub= must refer to SUBDOC or document entities).
The HyTime notation form "document", specified as a fixed data attribute for the notation indicates to HyTime systems that entities using this notation are in fact complete SGML documents. (2) The notation "IPFRES" is used on the notation location element to indicate that the notation location is a resource number as used to associate OS/2 Presentation Manager program objects with help panels in IPF help. The actual location of the IPFRES specification is largely unimportant since the notation name simply serves as an identifier. Presumably there is some OS/2 programming document that defines the rules for forming resource numbers, which is what the public identifier should resolve to in practice. (3) The notation HyQ is used to indicate the query notation used with the NMQuery element. In this example, only the HyQ notation is defined, but you could define other query notations if you wanted. For example, if your help system were DynaText-based, you might find it more expedient to use the DynaText query language directly. (4) These HyTime elements are all direct instantiations of the architectural forms found in ISO/IEC 10744. I've omitted attributes that are not obviously applicable to the example and for which the HyTime standard defines a default. If we gave this DTD more thought, it might make sense to define GIs for these elements that relate directly to the roles they play in establishing the links to the different anchors. -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 Newsgroups: comp.text.sgml Date: 21 Oct 1993 18:13:06 UT From: John Burger \ Message-ID: <9310211813.AA10381@thelonius.mitre.org> Subject: Instance normalizer Does anyone have a pointer to a public domain Unix application that does document instance normalization? 
This would produce a "maximized" version of an instance, acceptable to an application that supported no minimization features. If necessary, I can easily build such a thing on top of, say, SGMLS, except that this would also necessarily expand entity references, etc. I would rather avoid that. Thanks. John Burger john@mitre.org Newsgroups: comp.text.sgml Date: 21 Oct 1993 20:55:48 UT From: "Nick Carr" \ Message-ID: <9310212055.AA16346@ditsydh.syd.dit.csiro.au> Subject: Re: ISO 8879 what cahnges are being considered? [Matthias Butt] | Some other commercial situations employing plain text editors for | entering SGML data were pointed out to me in private mails and I am | certainly willing to agree. However, I would still maintain that in | *most* commercial environments such methods will not be considered even | for a second. I believe that 98 percent of the typing in commercial | environments today (with the notable exception of commercial | typesetting, where character-based, albeit formatting-oriented, markup | is widely used) WYSIWYG text processors or page layout programs with | graphical user interfaces are being used. In all these environments it | is extremely hard to convince people of switching to SGML even with | tools like Author/Editor being available. It would be completely | impossible if people would have to switch 'back' to plain text editors | with no support for SGML. The question begs asking - if you are happy using Word for Windows, why switch to SGML of any flavour? If you are not in a commercial publishing environment, why would you want to use SGML? We are heavily committed to SGML, but I still use WordPerfect 5.1 for all my own work. Text editors work fine for SGML mark-up (although we are now quite enthusiastic about WordPerfect's Intelli\, warts and all). We and some of our clients have done some large jobs with various editors.
My personal opinion is that if you haven't been trained well enough to understand what the markup means, maybe you shouldn't have your hands on the data. -- Nick Carr Allette Systems (Australia) Level 10, 91 York St Sydney NSW 2000 Australia Newsgroups: comp.text.sgml Date: 21 Oct 1993 21:13:06 UT From: Steve Pepper \ Organization: Falch Hurtigtrykk as, Oslo, Norway Message-ID: <1993Oct21.211306.27895@falch.no> References: <19931020.101352.551@almaden.ibm.com> Subject: Re: SGML and Quark Express and/or PageMaker [Wayne L. Wohler] | I am investigating the use of Quark Express and/or PageMaker in an SGML | info environment. Does anyone have any ideas, suggestions or | experiences they can share either here or privately? Our typesetting department uses Quark XPress for most of its page layout and our production people would love me (and my life would be a _lot_ easier) if I could use XPress to layout all our SGML jobs. Unfortunately, I can't. XPress has a tagging system which is very comprehensive and allows you to represent a large part of XPress' formidable array of typographic features, including obscure things like base line shifts, font sizes down to 1/1000 of a point, etc., etc. But XPress is very much an interactive program, and it is weak on batch formatting -- which is what I expect when I've gone to all the trouble of getting my document into SGML! XPress cannot generate any text itself: no automatic running headers or footers, no tables of contents, no indexes, no cross-references. There are Xtensions on the market that do some of these things, but usually in a way that isn't of much help. For example, Sonar Bookends lets you build indexes, but only on the basis of a concordance list (is that what it's called?). That is usually not good enough: in my experience you seldom want to index _every single occurrence_ of a particular word or phrase. You want to index _some_ of them (the ones in your \ element, or whatever). 
Also, you want to index words which don't actually appear in that form in your document. But because XPress has no way of identifying index words internally, an Xtension developer can't do it either. Another problem with XPress relates to multi-column text, where you want some textual elements to span more than one column: For example, a 2-column publication where chapter heads span the whole page. To create that in XPress, you have to set up a 2-column base page layout into which you pour your text, and then you must create new text boxes (spanning both columns) for each of your chapter heads, cutting and pasting the text as you go. Again, not something you ought to have to do manually, when all the information necessary to do it automatically is present in the SGML file. Also on the subject of multi-column text: there is no way to automatically balance columns (you have to resize the text box manually). Thirdly, there is no way of importing graphics (not to mention positioning them) using XPress tags. Fourthly, XPress is not very good at tables. It uses a simple tabulator (restricted to 20 tab settings) and has no concept of table cells (in which text may wrap). Again, there are Xtensions (e.g., Tableworks), but these are not (as far as I know) very suitable for batch import. [Subheads in margins] Fifthly (?), if you want your subheads in the margin, like in this paragraph, you're in deep shit again, because XPress expects you to create separate text boxes for them, which you have to do manually (as for the chapter heads mentioned above). You can't even fake them, as you can in some other applications, because of the lack of support for cell based tables. And finally, XPress doesn't handle automatic vertical justification as well as I usually need. You can specify that the vertical positioning of text inside a text box should be 'top', 'bottom', 'centered' or 'justified'.
If you choose 'justified' you can specify the maximum amount of vertical justification between paragraphs; if that is not enough, XPress will insert extra space between lines. What you can't do is specify different amounts of vertical justification for different 'styles' (i.e. kinds of paragraph). You can't, for example, say vertical justification is allowed above paragraphs with the styles "subhead1" and "subhead2", but not above paragraphs with the style "body text". Having said all that, we _have_ used XPress for a couple of simple applications, and if you don't need the things mentioned above it's dead easy to generate the tagged ASCII that XPress likes. The product we have used most for page layout from SGML files is Ventura Publisher, which doesn't have the typographical finesses of XPress, but can generate ToCs, headers, footers and indexes (not sure about xrefs), handles text spanning multiple columns (though not tables, unless you intervene manually) and is good on vertical justification. There are other things I don't like about Ventura, but more about that another time! By the way: We've also looked at FrameMaker and had to reject it (up to now) because of problems with vertical justification and column spanning. But it is certainly far more suitable than XPress, since it is very strong on generated text. It would be interesting to know of other DTP applications that are as suitable as (or better than) Ventura... Best regards, Steve -- pepper@falch.no ------------------------------------------------------------------ falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway tel +47 2216 3040 fax +47 2216 2350 Newsgroups: comp.text.sgml Date: 21 Oct 1993 22:25:59 UT From: "Eliot Kimber" \ Message-ID: <19931021.162450.760@almaden.ibm.com> References: <28s865$hqb@mailgzrz.TU-Berlin.DE> \ \ <2a68bu$knj@mailgzrz.TU-Berlin.DE> Subject: Re: ISO 8879 what cahnges are being considered? 
[Matthias Butt] | I don't think however that such considerations would help with respect | to performance. Here a completely new design might be called for. I | would be interested in hearing oppinions on this. If there is an inherent problem with the performance of SGML editing systems (and I'm not convinced there is), I suspect that that problem is because of the need for SGML editors to maintain the structure tree and element properties, something that a flat word processor does not have to do. Since most SGML editors read the SGML into internal data structures in any case, I find it difficult to believe that any re-design that kept the basic SGML concepts of structure and attributes constrained by a DTD would be able to solve that problem. I think the difference between full- function SGML editors and word processors is like the difference between painting programs and CAD systems. The complexity of the data being manipulated by SGML systems compared to word processors is very great, especially with the latest generation of content- and hypertext-focused DTDs like OSFDOC, Docbook, and IBMIDDoc. Considering what they do, I think the SGML editors out there today perform pretty well, considering their developers didn't have the luxury of optimizing their data format for speed, which is what all the word processor vendors do (and why you can't reliably interchange the data they produce). I'm probably too deep into SGML to be able to be objective, but I have a difficult time imagining what a significantly different design of SGML that continued to meet the requirements of human readability and createability would look like. Certainly we might be able to come up with some syntactic changes, but you can, by and large, do that today by defining variant syntaxes. 
-- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 Newsgroups: comp.text.sgml Date: 22 Oct 1993 04:23:27 UT From: "John Wilkins" \ Organization: Monash University, Melbourne Australia Message-ID: \ References: <19931020.101352.551@almaden.ibm.com> Subject: Re: SGML and Quark Express and/or PageMaker [Wayne L. Wohler] | I am investigating the use of Quark Express and/or PageMaker in an SGML | info environment. Does anyone have any ideas, suggestions or | experiences they can share either here or privately? We are going the other way, from Quark/FrameMaker (not PageMaker) to SGML. It is trivial to use a grep editor (eg, Nisus on the Mac) to convert SGML tags to QXP tags, or FM MML tags. Less trivial to convert to FM MIF formats. Probably very easy to convert SGML to PM tags. You'll also need to convert the SGML special character tags to the platform dependent ascii codes for those characters. Before anyone hops in, I KNOW FrameBuilder has SGML import paths using a parser. I just wish it had a direct support for an SGML tagged file. I don't mind using an SGML editor to have the stuff created, but I just HATE 2 step import processes (when you're importing 300+ files). -- John Wilkins - Manager, Publishing Monash University, Melbourne Australia Internet: john_wilkins@udev.monash.edu.au Tel: (+613) 565 6009 Monash neither knows, nor approves, of what I say Newsgroups: comp.text.sgml Date: 22 Oct 1993 06:23:23 UT From: "Daniel Tauber" \ Message-ID: \ References: <19931020.101352.551@almaden.ibm.com> Subject: Re: SGML and Quark Express and/or PageMaker [Wayne L. Wohler] | I am investigating the use of Quark Express and/or PageMaker in an SGML | info environment. Does anyone have any ideas, suggestions or | experiences they can share either here or privately? 
I just talked to two people about SGML at the Quark booth at the Seybold Publishing Conference. The first person told me that some third party developers are working on an SGML filter for Quark. The second person said "he cannot talk about unannounced features or products." I came away with the impressions a) Quark is working on a SGML filter, and b) I do not want to do business with them. Dan Tauber dat@netcom.com Newsgroups: comp.text.sgml Date: 22 Oct 1993 06:28:51 UT From: "Patrik Gustafsson" \ Organization: Abo Akademi University, Finland Message-ID: \ References: <19931020.101352.551@almaden.ibm.com> <1993Oct21.211306.27895@falch.no> Subject: Re: SGML and Quark Express and/or PageMaker [Wayne L. Wohler] | I am investigating the use of Quark Express and/or PageMaker in an SGML | info environment. Does anyone have any ideas, suggestions or | experiences they can share either here or privately? [Steve Pepper] | < LOTS OF TEXT DELETED WHY QUARK XPRESS IN NOT SUITABLE > Well, we here at the Department of Computer Science at Abo Akademi University, in cooperation with Pira International in England, are actually working on including SGML-support into Quark XPress. As this work is a part of RACE project R2037, DIDOS, I am not allowed to discuss it in any detail, but thought that you people wanted to know that work is being done on the subject. (RACE = Research and Technology Development in Advanced Communications Technologies in Europe) I am not giving any opinions on what Steve Pepper writes, except that I think that he might be partly right and partly wrong. -- Patrik Gustafsson Lemminkainengatan 14-18 A Dept. of Comp. Sci. FIN-20520 Abo Abo Akademi University Finland Newsgroups: comp.text.sgml Date: 22 Oct 1993 11:11:51 UT From: "Francis Cave" \ Organization: Pira International Message-ID: <751288311snz@pira2.demon.co.uk> References: <2a68bu$knj@mailgzrz.TU-Berlin.DE> Subject: Re: ISO 8879 what cahnges are being considered? [Matthias Butt] | ... 
The author should not deal with the formatting but he should | bother a good deal with the contents of her document and its structure. | So far the only standard that allows to mark up texts based on their | contents is SGML. Sorry to digress from the topic of this thread, but I am not sure that SGML is in fact the right tool for content-oriented markup by authors. In a recent conversation with an author/editor (who happens to be a philosopher), I discussed this very point. The author claimed to have looked at SGML for this purpose and to have rejected it on the grounds that the hierarchical structures most readily represented in SGML do not relate well to the way that authors organize their ideas. While publishers (including authors concerned over-much with putting their ideas into publishable form) are concerned with chapters, sections, lists and paragraphs, many authors, concerned purely with recording their ideas in words, find such structural constraints an impediment. Support for this view can be provided by analysing the evolution of an author's manuscript from first to final draft. The changes that the author makes do not necessarily follow structural boundaries. Ideas can start in the middle of one paragraph and finish in the middle of the next, or can be in unconnected fragments scattered about a piece of text. Maybe these problems are more acute for some authors than for others (e.g., philosophers?), but it does add weight to the oft-expressed view in some quarters that coding is not a function that naturally devolves to authors. -- Francis Cave Publishing Group Pira International Randalls Road Leatherhead Surrey KT22 7RU United Kingdom Tel 0372 376161 Fax 0372 377526 email cave@pira2.demon.co.uk Newsgroups: comp.text.sgml Date: 22 Oct 1993 14:42:33 UT From: "Michael G. Popham" \ Organization: University of Exeter, UK. Message-ID: \ Subject: Fonts for sale A colleague in the U.S. 
has started his own type foundry, making and selling PostScript fonts for slightly unusual symbol sets and alphabets. Currently, he is supplying the following: Glagolitic/Croatian Cuneiform: Ugaritic, Old Persian, Buginese/Makassar Pi Fonts: Pi Serif Three (crosses and other characters that may be found separating words in legends). and less unusual: Box drawing (single/double/thick/thin lines etc.) Seven different fonts of Math Symbols (inc. Engineering, Chemistry and Physics symbols) Math Format: two fonts, (Math) Pi Font. All the above are available in PostScript Type 1 format and can thus be used on a large variety of laser printers and phototypesetters and with many PC packages, such as FrameMaker and Word for Windows. Package contents:
* Type 1 outline font software (PFB files)
* AFM files containing character metrics
* PFM files containing character metrics for Windows
* INF files for use with install programs
* CFG file, a configuration file for the package
* a user guide giving hints on the use of these fonts.
For more information contact: Berglund Type Foundry 2409 Fifth Street Boulder, CO 80304 USA Tel: +1 303 440 4479 Fax: (as above - call before sending) Newsgroups: comp.text.sgml Date: 22 Oct 1993 16:40:32 UT From: "Michael G. Popham" \ Organization: University of Exeter, UK. Message-ID: \ Subject: CANCELLED - SGML UK AGM (London, 26 October 1993) I've *just* received a fax saying that the AGM of the UK Chapter of the International SGML User's Group, due to be held in London on 26th October 1993, has been "cancelled due to a small response in reply to a late mailing. Those who have already paid their attendance fees will not be liable for payment for the next meeting". This fax was sent to me by Gaynor West (who will presumably be trying to contact anyone else who has already sent off the reply form + cheque).
Gaynor can be reached at: SGML Users' Group PO Box 361 Swindon, Wiltshire SN5 7BF Tel: 0793 512515 Fax: 0793 512516 So if any of you were hoping to go, I'll see you at the next meeting! Michael Popham -- SGML Project - C.D.O Email:M.G.Popham@exeter.ac.uk Computer Unit - Laver Building Phone:+44-(0)392-263946 North Park Road, University of Exeter Fax: +44-(0)392-211630 Exeter EX4 4QE, United Kingdom Newsgroups: comp.text.sgml Date: 22 Oct 1993 20:36:27 UT From: Chet Ensign \ Message-ID: <199310222037.AA07524@naggum.no> References: <2a68bu$knj@mailgzrz.TU-Berlin.DE> <751288311snz@pira2.demon.co.uk> Subject: Re: ISO 8879 what cahnges are being considered? [Francis Cave] | ... but I am not sure that SGML is in fact the right tool for | content-oriented markup by authors. What exactly do you mean when you say "content-oriented?" Like SGML or not, it is the only content-oriented markup scheme you've got. | The author claimed to have looked at SGML for this purpose and to | have rejected it on the grounds that the hierarchical structures | most readily represented in SGML do not relate well to the way that | authors organize their ideas. As a writer I can identify with your point. But I'd also say that whether or not it is the right tool depends on the circumstances. A lone philosopher developing his thoughts through the vehicle of a manuscript, who faces no requirement that his work mesh with that of others, has no need for SGML. Twenty technical writers working on a body of text that must conform to required specs and come together into a single published entity seamlessly need SGML badly, whether they know it or not. (You can tell what I do for a living!) I think many of the advantages of using SGML accrue not to the individual author but rather to the team. As for the lonely thinker, let him or her choose whatever tool s/he finds least annoying (since s/he already has enough to wrestle with) and let the publisher hire an SGML tagger after the fact. 
-- Chet Ensign Information Builders, Inc. 212-736-6250 X4349 internet: doccoe@ibivm.ibmmail.com ibmmail: USUBUVMV@IBMMAIL compuserve: 73163,1414 Newsgroups: comp.text.sgml Date: 22 Oct 1993 22:31:17 UT From: Chet Ensign \ Message-ID: <199310222232.AA07940@naggum.no> References: <1993Oct15.121017.6534@ericsson.se> Subject: Re: E-mail address to Zandar Corporation? [Jan Ekström] | Do you know if Zandar Corporation have an E-mail address? Don't know -- I have release 3.0 of their product but no email address is listed in the manual. | This company have made a program called TagWrite, which should be able | to change a document written in Word for Windows (saved in RTF format) | to a tagged document like SGML-documents. | | I would also like to hear from anybody who have experience with this | program!!! We tested TagWrite 3.0 for Windows at our company about a year and a half back. TW is a GUI-type tool that you use to create text-processing rules. The rules make up a template that can convert text and markup into other text and markup. One feature that made TagWrite attractive is that they have captured a lot of RTF smarts into their rule-building tools. Each rule has three basic parts: a top line (which contains a pattern to be compared against the stream of text in the document), a bottom line (which contains a pattern describing what is to be written back out when the rule fires), and a "Next element" box to tell it where to go after executing the rule. The rules can be numbered, prioritized and clustered into routines and subroutines. The pattern in the top and bottom line gets made up of a mix of text, tokens and supertokens. Text is actual physical text to match or write out. The tokens are place holders in the rule for characters or word processing commands (such as ITAL for the italics format code or �H for a non-breaking space character). Supertokens are ... hummm, how can I explain this? They are like vacuum cleaners.
They are triggered by specific conditions in the text and, once started, they keep on sucking up everything in the file until they reach another specific condition that shuts them off. For example, the TeXt + # supertoken is started by any letter or number or format codes like bold, underline, etc. It is stopped by a hard new line code in the text. So, if you had {TeXt + #} in the top rule and \<P>{TeXt + #} in the bottom rule, as soon as the rule was triggered, it would suck up all the text until the end of the paragraph and then write it all back out to the output file preceded by \<P>
. We haven't continued to use it because we found it too difficult to create and maintain templates that could really deal with our files. It was too easy to create spaghetti code. We could accomplish the same result faster using something like REXX or AWK. The real problems, however, lay in the Word files themselves. I'm sure that you've heard it here over and over again; you can't get reliable, consistent markup from a GUI word painter. Even with a well-defined set of styles I found that authors were too variable in the way they formatted the document. It was very hard (and, by the way, it still is) to account for every permutation of formatting that they came up with. (Make that "they" "we." I got a real surprise the first time I converted one of my own documents.) Turns out, in particular, that it is not easy to figure out where something **ENDS**. Text structures embedded inside of other text structures (for instance, lists inside of other lists) are particularly hard to isolate -- at least around here. I had one case yesterday where an item in a numbered list had two paragraphs, both indented, then a paragraph on the far left margin, then the numbering picked up again where it had left off. I assumed that this was a mistake, until I talked to the writer and discovered that this was exactly what he had intended to do. So the list wasn't over, but the paragraph wasn't part of the list and ....!!! After all this yack, bottom line: simple documents and people with minimal programming experience and TagWrite will probably do everything you want. But for anything more demanding I don't think it is all that practical. /chet -- Chet Ensign Information Builders, Inc. 
212-736-6250 X4349 internet: doccoe@ibivm.ibmmail.com ibmmail: USUBUVMV@IBMMAIL compuserve: 73163,1414 Newsgroups: comp.text.sgml Date: 22 Oct 1993 23:30:14 UT From: "Matthias Butt" \ Message-ID: <2a9qe6$p3d@mailgzrz.TU-Berlin.DE> References: <28s865$hqb@mailgzrz.TU-Berlin.DE> \ <2a68bu$knj@mailgzrz.TU-Berlin.DE> <19931021.162450.760@almaden.ibm.com> Subject: Re: ISO 8879 what cahnges are being considered? [Eliot Kimber] | If there is an inherent problem with the performance of SGML editing | systems (and I'm not convinced there is), I suspect that that problem | is because of the need for SGML editors to maintain the structure tree | and element properties, something that a flat word processor does not | have to do. Since most SGML editors read the SGML into internal data | structures in any case, I find it difficult to believe that any | re-design that kept the basic SGML concepts of structure and attributes | constrained by a DTD would be able to solve that problem. I think the | difference between full-function SGML editors and word processors is | like the difference between painting programs and CAD systems. The | complexity of the data being manipulated by SGML systems compared to | word processors is very great, especially with the latest generation of | content- and hypertext-focused DTDs like OSFDOC, Docbook, and IBMIDDoc. | Considering what they do, I think the SGML editors out there today | perform pretty well, considering their developers didn't have the | luxury of optimizing their data format for speed, which is what all the | word processor vendors do (and why you can't reliably interchange the | data they produce). By and large Eliot (and other people making the same point in postings and private mail) is right here. 
The performance problems which various SGML applications do seem to have must probably be attributed to the complexity of the relations expressed in an SGML document and not to the syntax of the language used to express these relations (i.e., SGML). I am not sure how many applications actually do employ their own internal data structures to represent SGML data. I know that SoftQuad Author/Editor does this. I believe the ArborText Publisher/SGML-Publisher products don't do this but rather parse the data (represented plainly in SGML) on the fly during editing. I don't know what other systems do. My impression is that many do use SGML more or less directly. E.g., I would believe that DynaText stores the SGML data in some inverted format (index) but feeds plain SGML into the screen formatter (but I don't know that!). I am still not sure how far the syntax of SGML does cause problems (in terms of performance) for systems that do work on plain SGML data (parsers, formatters, translators, ...). It does seem to be the case that standard tools (e.g., all the UNIX utilities) do find it hard to deal with SGML, which is *not* related to the expressive power of SGML (which does by no means exceed that of typical programming languages/data structures) but solely to its very strange syntax (which was apparently designed to save keystrokes in certain editing situations). I cannot say that I have a clear idea how much these issues also influence performance. However, because Eliot is right in saying that it is always possible to translate SGML into more efficient data structures for internal handling by the application, I agree that performance should be a minor point in all discussions re possible (or rather desired) changes to ISO 8879. 
What really is important are the issues raised in this thread with respect to expressive power, namely cross referencing mechanisms, local re-definition of content models (both can be subsumed under the heading of scoping constructs) and better ways to restrict the values of attributes (and possibly character data content as well). Nick Carr \ with respect to my complaints about SGML (and SGML applications) still often not being suited to commercial environments and about the problem of convincing users of today's text processors to switch to SGML: | The question begs asking - if you are happy using Word for Windows, why | switch to SGML of any flavour? If you are not in a commercial | publishing environment, why would you want to use SGML. We are heavily | committed to SGML, but I still use WordPerfect 5.1 for all my own work. I think the answer is quite easy: For commercial users (large corporations dealing with large quantities of text data) there are many definite advantages in switching a) to a standardized data format and b) to content-oriented markup. b) in particular promises the advantage of being a good basis for document management, reuse of information, advanced retrieval methods, simultaneous publishing on various media, new media (such as HyperText), publishing on demand, and other developments. However, these advantages don't come for free. Although no one can expect these to be free, the cost certainly does not ease the introduction of SGML in such environments. But let's say you are through with all the arguments and financing and people do want to settle for the real thing: The last thing you want to see then is that people get the impression of moving backwards instead of forwards. 
This impression would be bound to occur if you told the editors who are used to programs like Word for Windows (or whatever), where they choose their style sheets from a menu, get decently formatted output on the fly, use spell checkers and whatever, that now in order to employ SGML they have to use good old VI or another plain text editor. People simply won't accept this. (It may still be true that they would be faster because all the WYSIWYG software makes them lose so much time with fancy formatting that is never going to be used in the end anyway.) This is why SGML editors which a) support SGML and b) support the user by giving them readable formatted screen output, menus to choose their elements from, etc., are so important for SGML to be accepted in commercial environments. This is also why I believe that it is of minor importance for SGML to support data capturing on plain text editors. Matthias Newsgroups: comp.text.sgml Date: 23 Oct 1993 04:09:36 UT From: "Wayne L. Wohler" \ Message-ID: <19931022.230620.370@almaden.ibm.com> References: <19931020.101352.551@almaden.ibm.com> <1993Oct21.211306.27895@falch.no> Subject: Re: SGML and Quark Express and/or PageMaker Steve, thank you for your very detailed response. Our application with Quark is really for reuse of data by folks who expect to tailor the pages to their specific purpose, not for automatic pagination so some of the problems you outline may not be a problem for us. Your response will require some study. Thanks again. -- Wayne L. Wohler Internet: wohler@vnet.ibm.com Dept G82/910M IBMMAIL: USIB29WX@IBMMAIL Publishing Solutions Development Phone: 1-303-924-0470 IBM Corporation PO Box 1900 Boulder, Colorado 80301-9191 [Editor's note: This is in response to Steve Pepper's message. 
\] Newsgroups: comp.text.sgml Date: 23 Oct 1993 08:18:10 UT From: "Matthias Butt" \ Organization: TUBerlin/ZRZ Message-ID: <2aapc2$jpd@mailgzrz.TU-Berlin.DE> References: <2a68bu$knj@mailgzrz.TU-Berlin.DE> <751288311snz@pira2.demon.co.uk> Subject: Re: ISO 8879 what cahnges are being considered? [Francis Cave] | Sorry to digress from the topic of this thread, but I am not sure that | SGML is in fact the right tool for content-oriented markup by authors. | In a recent conversation with an author/editor (who happens to be a | philosopher), I discussed this very point. The author claimed to have | looked at SGML for this purpose and to have rejected it on the grounds | that the hierarchical structures most readily represented in SGML do | not relate well to the way that authors organize their ideas. While | publishers (including authors concerned over-much with putting their | ideas into publishable form) are concerned with chapters, sections, | lists and paragraphs, many authors, concerned purely with recording | their ideas in words, find such structural constraints an impediment. I think SGML does not say too much about what the structure of your documents should be like. In particular SGML doesn't force this structure to be very hierarchical. All this is a matter of the DTD. As most documents, including philosophical texts (at least those I have been confronted with when doing my philosophy) do have some hierarchical structure, a suitable DTD will probably also have some levels of hierarchy. In addition, however, many flat elements differentiating various kinds of content can (and typically should) be provided by the DTD. For a philosophical text, e.g., we may well want a hierarchical structure providing things like chapters, sections and sub-sections. Then, within the sub-sections, however, we have a sequence of paragraphs mixed with other elements such as graphics, formulae, quotation blocks, examples, etc., in a non-hierarchical fashion. 
Within the paragraphs we find again a non-hierarchical arrangement of various elements (e.g., definition terms, greek terms, indexed passages, inline quotations, etc.). Furthermore the paragraphs can be classified according to their content, e.g., as introducing a new concept, elaborating on a concept already introduced, reproduction of an argument, refutation of an argument, etc., using attributes. Other attributes could be used to assign keywords to paragraphs and higher level sections. So what your philosopher friend probably needs is just a DTD suited to his purposes. I can well imagine that the DTDs he has found in typical books on SGML do not quite entice him too much. | Support for this view can be provided by analysing the evolution of an | author's manuscript from first to final draft. The changes that the | author makes do not necessarily follow structural boundaries. Ideas | can start in the middle of one paragraph and finish in the middle of | the next, or can be in unconnected fragments scattered about a piece of | text. Granted. Although I do believe that a good text (in particular a good philosophical text) should be structured and ordered in some rather canonical fashion. This is certainly a matter of taste (and maybe of philosophical belief). Still, what you name here points to one more shortcoming in SGML: SGML does not allow for elements to overlap and it does not provide very good mechanisms to represent overlapping structures. Overlapping structures surely do exist (and do find their justification) in a large variety of circumstances. Various examples of this as well as some solutions (which are typically not too elegant given the inherent limitations of ISO 8879) have been presented in this news group over the years. Although to me this doesn't seem to be the most important desideratum it should maybe still be discussed in the context of possible changes and amendments to ISO 8879. 
Matthias Newsgroups: comp.text.sgml Date: 23 Oct 1993 12:48:22 UT From: "Philip Thrift" \ Organization: TI Central Research Laboratories, Dallas Message-ID: \ Subject: ebook reader I am looking for any available (PD or commercial) software that will present books originally in (un-marked-up) ASCII form in a visual book style in DOS/Windows. This software would be in two steps: A. markup the ASCII source (detecting chapter headings, etc.) B. present the tagged document in a book-like graphical window (something that looks like an opened book, with buttons for turning pages, etc.) Perhaps ToolBook could be used to make such an application, but I was wondering if there were something more direct. I include an example below. Philip Thrift NET: thrift@ra.csc.ti.com TI Central Research Laboratories TEL:(214) 995-7906 P.O. Box 655936 M.S. 134 FAX:(214) 995-2836 Dallas, TX 75265 ------------------------------------------------------------ Example: There are lots of books available in ASCII form, for example: DRACULA c 1897 by Bram Stoker CHAPTER 1 Jonathan Harker's Journal 3 May. Bistritz.__Left Munich at 8:35 P.M, on 1st May, arriving at Vienna early next morning; should have arrived at 6:46, but train was an hour late. Buda-Pesth seems a won- derful place, from the glimpse which I got of it from the ... A. would produce a marked-up version of the text (something like) \ \<AUTHOR = "Bram Stoker"> \<COPYRIGHT = 1897> \<CHAPTER = "Jonathan Harker's Journal"> 3 May. Bistritz.__Left Munich at 8:35 P.M, on 1st May, arriving at Vienna early next morning; should have arrived at 6:46, but train was an hour late. Buda-Pesth seems a won- derful place, from the glimpse which I got of it from the ... \</CHAPTER> B. would display the result of A. in a visual book-like display. 
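[Editor's note: Step A above (detecting chapter headings in un-marked-up ASCII and emitting tags) can be sketched roughly as follows. This is only a minimal illustration in Python, not any of the products discussed in this thread. The heuristics are assumptions: a line reading "CHAPTER n" opens a chapter, the next non-blank line is taken as its title, and blank lines separate paragraphs. The element names CHAPTER, TITLE and P and the function name mark_up are hypothetical, not taken from any DTD.]

```python
import re
from html import escape

# Assumed heading convention: "CHAPTER 1", "CHAPTER IV", etc. alone on a line.
CHAPTER_RE = re.compile(r"^CHAPTER\s+(\d+|[IVXLC]+)\s*$")


def mark_up(text):
    """Wrap plain ASCII text in illustrative CHAPTER/TITLE/P tags."""
    out = []            # output lines
    para = []           # lines of the paragraph being collected
    in_chapter = False
    want_title = False  # True right after a CHAPTER heading line

    def flush():
        # Emit the collected paragraph, escaping &, < and > in the text.
        if para:
            out.append("<P>" + escape(" ".join(para), quote=False) + "</P>")
            para.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if CHAPTER_RE.match(line):
            flush()
            if in_chapter:
                out.append("</CHAPTER>")
            out.append("<CHAPTER>")
            in_chapter = True
            want_title = True
        elif not line:
            flush()                      # a blank line ends a paragraph
        elif want_title:
            out.append("<TITLE>" + escape(line, quote=False) + "</TITLE>")
            want_title = False
        else:
            para.append(line)
    flush()
    if in_chapter:
        out.append("</CHAPTER>")
    return "\n".join(out)
```

[Real texts vary far more than this (running heads, hyphenated line ends, title pages), so a heuristic of this kind is at best a first pass before hand correction.]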
</message> <message id="<ROBIN.93Oct23171120@utafll.utafll.uta.edu>" date="2960406680"> Newsgroups: comp.text.sgml Date: 23 Oct 1993 23:11:20 UT From: "Robin Cover" \<robin@utafll.uta.edu> Organization: UT Arlington Message-ID: \<ROBIN.93Oct23171120@utafll.utafll.uta.edu> Subject: ACM moves closer to SGML I forward this posting, possibly of interest to readers of comp.text.sgml: > From oclc-news@oclc.org Fri Oct 22 12:30:34 1993 > Date: Fri, 22 Oct 93 11:27:19 EDT > From: Marifay_Makssour@oclc.org FOR IMMEDIATE RELEASE FOR MORE INFORMATION CALL: Nita Dean, OCLC (614) 761-5002 Janet Nunn, IDI (614) 761-7262 OCLC AND IDI TO DEVELOP ELECTRONIC PUBLISHING SYSTEM FOR ACM DUBLIN, Ohio, Oct. 21, 1993--OCLC and its subsidiary, Information Dimensions, Inc. (IDI), have been selected to develop an electronic publishing system for ACM (the Association for Computing Machinery). The OCLC/IDI in-house electronic publishing system will integrate the various ACM publishing functions into a unified, automated system that will encompass the writing, editing, composition, production, archiving, and, eventually, distribution of documents and publications. ACM publishes an estimated 40,000 pages per year, including books, journals, conference proceedings, and internal publications. "High-quality print journals, magazines, and books will continue to play an important role in ACM's distribution of leading-edge thinking and knowledge about information technology," said Joseph S. DeBlasi, ACM executive director. "But having it all in electronic form will make everything more widely and readily available in a timely, selective, and even interactive manner. We believe an electronic publishing system will be especially important to ACM membership and to the association's future, maintaining as it will our position on the forefront of major developments in the field of information technology." K. Wayne Smith, president and chief executive officer, OCLC, stated: "This is an important project. 
It combines the strength of ACM's publishing program with OCLC/IDI's innovative approaches in electronic publishing. It underscores OCLC/IDI's commitment to add new, electronic dimensions to a publisher's existing program that will make it not only timely, but cost-effective and user-friendly." The OCLC/IDI approach will be based on open systems architecture, which will let ACM upgrade modules cost-effectively as technology advances. The approach will use Standard Generalized Markup Language (SGML); BASISplus, IDI's document database management system; and BASIS SGMLserver, IDI's new storage manager built to accept, query, retrieve, and manipulate SGML document components as separate objects. Using the new system, ACM editors will be able to receive documents from contributors in a variety of word-processing formats and enter them electronically into a working database where they can be edited, transmitted for review, and processed for composition and printing. The system will also enable the search of stored documents and the collection of documents on selected subjects. For example, a search on "parallel processing" could retrieve three chapters and four sections from seven different documents, which the system would then combine into a new document on parallel processing. The ACM system will be completed in 12 to 18 months. The ACM electronic publishing system will combine OCLC's experience with electronic publishing and user interfaces that can be operated without training, and IDI's database management systems that have been used in many different document management applications. IDI's BASISplus software is consistently rated the fastest, largest, most dependable, and most cost-effective document-database engine on the market. OCLC is the distributor of the _Online Journal of Current Clinical Trials_, under a joint venture with the American Association for the Advancement of Science. 
This fall OCLC is introducing the _Online Journal of Knowledge Synthesis for Nursing_ with Sigma Theta Tau, International Honor Society of Nursing, and an electronic version of _Electronics Letters_ with the Institution of Electrical Engineers. OCLC has also produced easy-to-use interfaces for The FirstSearch Catalog, an online reference service for library patrons, and DiscLit, a full-text database with bibliographic citations designed for literature students. ACM, founded in 1947, is the oldest and largest not-for-profit educational and scientific computer organization in the world. The association has upwards of 80,000 members internationally. ACM publishes refereed periodicals, newsletters, conference proceedings, books, and reference publications, including _Computing Reviews_, the _ACM Guide to Computing Literature_, and _The Graduate Assistantship Directory_. IDI is an international software company specializing in information management technology. Its software products are installed at over 2,200 sites worldwide. OCLC acquired IDI earlier this year. OCLC is a nonprofit computer library service and research organization whose computer network and services link more than 17,000 libraries in 52 countries and territories. (NC) -30- </message> <message id="<1993Oct23.233003.15408@falch.no>" date="2960407803"> Newsgroups: comp.text.sgml Date: 23 Oct 1993 23:30:03 UT From: Steve Pepper \<pepper@falch.no> Organization: Falch Hurtigtrykk as, Oslo, Norway Message-ID: <1993Oct23.233003.15408@falch.no> References: \<CFCown.91v@csc.ti.com> Subject: Re: ebook reader [Philip Thrift] | I am looking for any available (PD or commercial) software that will | present books originally in (un-marked-up) ASCII form in a visual book | style in DOS/Windows. This software would be in two steps: | | A. markup the ASCII source (detecting chapter headings, etc.) | B. 
present the tagged document in a book-like graphical window | (something that looks like an opened book, with buttons | for turning pages, etc.) Sounds like you need A. FastTAG (Avalanche) to do visual recognition of your ASCII file and produce an SGML instance, and B. DynaText (EBT) to create and present an electronic book. You'll find info on both products in Robin Cover's bibliography in the SGML archive at ftp.ifi.uio.no. Steve -- </(pepper)steve> pepper@falch.no ------------------------------------------------------------------ falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway tel +47 2216 3040 fax +47 2216 2350 </message> <message id="<2CCB0D6F@noak.vxo.telub.se>" date="2960464620"> Newsgroups: comp.text.sgml Date: 24 Oct 1993 15:17:00 UT From: Peter Bergstrom \<pebe@telub.se> Message-ID: <2CCB0D6F@noak.vxo.telub.se> Subject: SGML and STEP In the October 93 issue of SGML Users' Group Newsletter, there is a small posting in the "SGML Year in Review" regarding SGML/STEP activities going on in Working Group 14. Does anybody have any more information on the work undertaken by this group? Is there for example any ideas about connections between SGML and Express? -- Peter Bergstrom Telub Inforum AB pebe@telub.se </message> <message id="<2adm36$o4@figment.dircon.co.uk>" date="2960475282"> Newsgroups: comp.text.sgml Date: 24 Oct 1993 18:14:42 UT From: Bruce Hunter \<bruce@sgml.dircon.co.uk> Organization: The Direct Connection Ltd Message-ID: <2adm36$o4@figment.dircon.co.uk> References: \<CFCown.91v@csc.ti.com> <1993Oct23.233003.15408@falch.no> Subject: Re: ebook reader You should also take a look at Folio Views 3 from Folio Corporation : 2155 North Freedom Boulevard Suite 150 Provo UT 84604 USA Tel 1 800 543 6546 or send email to compuserve address 75060,1013 It is similar to Dynatext, though more restricted in terms of cross-platform support (it runs on a PC under DOS or Windows). It also currently lacks an API and developers tools. 
But it is also a lot less expensive (about £400 in the UK). It does not directly import SGML files; it has its own documented ASCII import/export format. But I am currently working on some tools to provide import/export utilities to/from SGML to Folio's flatfile format. I plan to post beta versions of these tools to the Folio Forum archive on Compuserve in the next couple of weeks. If anybody else is interested but does not have access to Compuserve, let me know and I'll see if I can find an FTP site also. These beta tools are the first stage in what will hopefully be a generalised SGML import utility for Folio Views. At present, they support import/export to SGML files conforming to a DTD I have created expressing the semantics of the Folio flatfile format. Truly generalised SGML import/export is the next step, and is under development. These beta tools are really just to prove the concept, but they are also useful in themselves for existing Folio users in that the export to SGML utility also cleans up existing flatfiles and allows them to be parsed with an SGML parser, picking out some errors which may otherwise be overlooked. I'm currently looking for beta testers who already have large Folio databases ("infobases" in Folio terminology) who would be willing to try these tools on as many existing files as possible and report back to me the results. Anybody out there interested? Thanks. Best wishes, Bruce Hunter SGML Systems Engineering bruce@sgml.dircon.co.uk [Editor's note: This is in response to Philip Thrift's request. 
\</E>] </message> <message id="<19931025.061614.294@almaden.ibm.com>" date="2960543293"> Newsgroups: comp.text.sgml Date: 25 Oct 1993 13:08:13 UT From: "Eliot Kimber" \<drmacro@vnet.IBM.COM> Message-ID: <19931025.061614.294@almaden.ibm.com> References: \<CFCown.91v@csc.ti.com> Subject: Re: ebook reader [Philip Thrift] | I am looking for any available (PD or commercial) software that will | present books originally in (un-marked-up) ASCII form in a visual book | style in DOS/Windows. This software would be in two steps: | | A. markup the ASCII source (detecting chapter headings, etc.) | B. present the tagged document in a book-like graphical window | (something that looks like an opened book, with buttons | for turning pages, etc.) You might also take a look at Lotus' SmarText system, which provides generic viewing function for a wide variety of word processor formats, leveraging off of Lotus Ami Pro's strong import facility. I'm not sure what the cost is, but I don't think it's too expensive. Unfortunately, SmarText does not currently provide direct support for import of SGML. -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 </message> <message id="<1993Oct25.095127.1@aa.wl.com>" date="2960545887"> Newsgroups: comp.text.sgml Date: 25 Oct 1993 13:51:27 UT From: "Dave Feryus" \<feryusd@aa.wl.com> Organization: Parke-Davis/Warner-Lambert Message-ID: <1993Oct25.095127.1@aa.wl.com> Subject: GCA SGML tutorial Does anyone have any experience/opinions with GCA SGML tutorials? A user and implementor tutorial is coming up Nov. 15-19, and I need an introductory course in SGML so that I can write and maintain DTD's. Any thoughts would be welcome. 
The course outline is:

User Tutorial
Day 1 (length-hh:mm and subject)
  2:45 introduction
  3:30 element declaration syntax and labs, Doctype and comment declaration syntax
Day 2
  1:15 Internal parameter and general entities
  1:30 attributes and markup lab
  2:00 more attributes and dtd reading
  1:30 external identifiers, notation declarations, external special entities, and entity redefinition
Day 3
  1:45 alternative element content models, processing instructions, character references and marked sections
  1:30 tag minimization, sgml declaration, review

SGML implementor tutorial
Day 3
  3:30 document analysis tips with supporting examples and lab
Day 4
  1:15 dtd writing tips
  3:30 element declaration writing exercises
  1:30 special rules for dtd writing
Day 5
  1:15 dtd correction exercise
  1:30 dtd writing exercise for tech manual
  1:45 parsing and correcting tech manual dtd's

-- | Dave Feryus | | Ann Arbor | </message> <message id="<1993Oct25.144441.13623@news.lrz-muenchen.de>" date="2960549081"> Newsgroups: comp.text.sgml Date: 25 Oct 1993 14:44:41 UT From: "Martin Josko" \<martin@pollux.edv.agrar.tu-muenchen.de> Organization: Leibniz-Rechenzentrum, Muenchen (Germany) Message-ID: <1993Oct25.144441.13623@news.lrz-muenchen.de> Subject: Please help: storing of SGML-documents in databases Hello, I am writing a diploma degree with the theme "The storage of SGML-documents in a relational full-text database". Therefore I use the relational component Transbase and the full-text component Myriad from Transaction Software in Munich. During my investigations I found the article "SGML*CASE The Storage of Documents in Databases" from Hans Schouten which is now the basis of my considerations. This article proposes a conceptual schema for storing SGML-documents in a relational database. The idea is to store the WHOLE document information (the document instance AND the DTD) in several relations without losing any information about the structure of the SGML-document. 
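[Editor's note: The general idea of keeping an element tree in relations without losing structure can be sketched as below. This is a minimal illustration using Python's standard sqlite3 module; it is NOT the schema from the Schouten article, which is not reproduced here. The table and column names (element_decl, node, attribute) and the functions store and rebuild are hypothetical, the DTD is reduced to one row per element declaration, and the document arrives as already-parsed (name, attributes, children) tuples because no SGML parser is assumed.]

```python
import sqlite3

# Hypothetical schema: one relation for DTD declarations, two for the instance.
SCHEMA = """
CREATE TABLE element_decl (gi TEXT PRIMARY KEY, content_model TEXT);
CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    position INTEGER NOT NULL,   -- order among siblings
    gi TEXT,                     -- element name; NULL for text nodes
    text TEXT                    -- character data; NULL for elements
);
CREATE TABLE attribute (node_id INTEGER REFERENCES node(id), name TEXT, value TEXT);
"""


def store(conn, tree, parent_id=None, position=0):
    """Insert a (gi, attrs, children) tuple or a text string; return its row id."""
    if isinstance(tree, str):
        cur = conn.execute(
            "INSERT INTO node (parent_id, position, gi, text) VALUES (?, ?, NULL, ?)",
            (parent_id, position, tree))
        return cur.lastrowid
    gi, attrs, children = tree
    cur = conn.execute(
        "INSERT INTO node (parent_id, position, gi, text) VALUES (?, ?, ?, NULL)",
        (parent_id, position, gi))
    node_id = cur.lastrowid
    for name, value in attrs.items():
        conn.execute("INSERT INTO attribute VALUES (?, ?, ?)", (node_id, name, value))
    for i, child in enumerate(children):
        store(conn, child, node_id, i)
    return node_id


def rebuild(conn, node_id):
    """Read a subtree back out of the relations in document order."""
    gi, text = conn.execute(
        "SELECT gi, text FROM node WHERE id = ?", (node_id,)).fetchone()
    if gi is None:
        return text
    attrs = dict(conn.execute(
        "SELECT name, value FROM attribute WHERE node_id = ?", (node_id,)))
    child_ids = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY position",
        (node_id,)).fetchall()
    return (gi, attrs, [rebuild(conn, row[0]) for row in child_ids])


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO element_decl VALUES ('sec', '(title, p+)')")
    doc = ("sec", {"id": "s1"},
           [("title", {}, ["Intro"]),
            ("p", {}, ["Hello ", ("em", {}, ["world"])])])
    root = store(conn, doc)
    return doc, rebuild(conn, root)
```

[Calling demo() stores a small tree and reads it back; the round trip returning an identical tree is the "no loss of structure" property in miniature. A real application would also have to carry entities, marked sections, and the full DTD.]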
Based on this relational schema many applications are conceivable. In order to fill this database with information a very flexible parser is needed which produces importable output. (I need the DTD in the output!) During my extensive investigations I have found information about a lot of SGML-parsers (freely available and commercial). There are already some interesting offers (e.g., Omnimark). Now to my requests: Does anybody have any experience in the field described above? Is there any freely available and REALLY conforming software for this purpose? Please help me, because otherwise my diploma becomes a mission in life for me! I thank you in advance. Best regards, -- Martin Josko Technical University Munich TU-Muenchen Department of Mathematics DVS-Weihenstephan Statistics and Data Processing D-85350 Freising Freising Germany email: martin@pollux.edv.agrar.tu-muenchen.de phone: +49-(0)8161-71-4506 fax: +49-(0)8161-71-4409 </message> <message id="<571.9310261125@brwbf.inmos.co.uk>" date="2960623552"> Newsgroups: comp.text.sgml Date: 26 Oct 1993 11:25:52 UT From: Ian Blythe \<MCUCOMM@isnet.inmos.co.uk> Message-ID: <571.9310261125@brwbf.inmos.co.uk> Subject: Emaths continued Just to add my input to the ongoing thread/war on Maths-happy E TEXT: My company is investigating the use of SGML as a potential long-term strategy for all our documentation. As a result we have been requesting information from all sources of SGML writers, readers, and parsers we can identify. One company we have received information from is GRIF S.A here in France. Their documentation shows that they can include both tables and mathematical formulae (and formulae in tables). _All_ in SGML form. For further information contact GRIF directly at: Immeuble "Le Florestan" 2, boulevard Vauban B.P. 266 St Quentin en Yvelines 78053 Cedex France Tel. +33 (1) 30 12 14 30 Fax: +33 (1) 30 64 06 46 (for non-French readers that is country code: 33, the (1) is for Paris.) 
(I'm an English ex-pat) I am in no way affiliated with GRIF S.A., other than having just read their literature; also, this is _NOT_ an endorsement by my company. Ian Blythe -- i'net: MCUCOMM@isnet.inmos.co.uk CIS: 100116,3072 X400: C=GB, Admd=ATTMAIL, Pmd=SGS-THOMSON, OrgName=HPEDP, Name =Ian Blythe </message> <message id="<1993Oct26.132432.19505@titan.inmos.co.uk>" date="2960630672"> Newsgroups: comp.text.sgml Date: 26 Oct 1993 13:24:32 UT From: "Glenn Hill" \<glenn@cheetah.inmos.co.uk> Organization: INMOS Limited, Bristol, UK Message-ID: <1993Oct26.132432.19505@titan.inmos.co.uk> Subject: public DTDs I have seen various references to public and "standard" DTDs but have never seen such a beast published. Could someone please point me in the right direction for a standard DTD for a technical reference book. Much obliged. Glenn </message> <message id="<19931026.091926.395@almaden.ibm.com>" date="2960639343"> Newsgroups: comp.text.sgml Date: 26 Oct 1993 15:49:03 UT From: "Eliot Kimber" \<drmacro@vnet.IBM.COM> Message-ID: <19931026.091926.395@almaden.ibm.com> References: <1993Oct26.132432.19505@titan.inmos.co.uk> Subject: Re: public DTDs [Glenn Hill] | I have seen various references to public and "standard" DTDs but have | never seen such a beast published. | | Could someone please point me in the right direction for a standard DTD | for a technical reference book. The following document types are either available or under development. None are ISO or national standards (that I know of), but all are either accepted industry standards or trying to be industry standards. The list I'm aware of is (I've not listed contacts for these various DTDs because I don't have the information -- I hope responsible parties will speak up): o CALS 20081 The US DoD document type for things like aircraft maintenance manuals. Defined as part of the CALS initiative. I'm not conversant with this DTD, but there are lots of people who are. 
All of the major SGML vendors have built-in support for the CALS DTD to one degree or another since this was the first major use of SGML in industry. o AAP DTDs These DTDs were developed by the Association of American Publishers for the purpose of supporting the publishing of all types of books and serials. I believe the AAP is in the process of revising the DTDs. o OSF-Book, OSF-REF These two DTDs were developed by the Open Software Foundation for interchange of OSF product documentation among OSF members. Therefore they are intended primarily for computer software and hardware information, both conceptual and reference. o Docbook DTD The Docbook DTD is intended to support the publication of mass-market technical documentation such as published by O'Reilly and Associates, Prentice Hall, and the like. It is also being applied to computer technical documentation. o IBMIDDoc This DTD has been developed by IBM to support its computer documentation, hardware and software. Its primary design focus has been on re-usability and modularization of information, as well as maximizing the descriptive power of the language to facilitate high-function retrieval of information. IBMIDDoc and its underlying architecture are both non-proprietary in the sense that we make the IBMIDDoc and InfoMaster specifications freely available and are encouraging anyone we can buttonhole to support the language. Like OSF-Book and Docbook, IBMIDDoc is intended to provide a common interchange mechanism for technical documentation. While the IBMIDDoc DTD is specifically focused on IBM's product documentation needs, it was designed from the first to be both modular and extensible. 
In addition, the InfoMaster architecture is intended to inform the development of concrete DTDs that share a common structural base and are therefore reliably transformable from one to the other, as well as processable by a common code base (in fact, I've already gone through the exercise of declaring the OSF-Book DTD as an InfoMaster-conforming application, which required only small modifications to two content models). For more information on IBMIDDoc or InfoMaster, contact either myself at the address below or Wayne Wohler, wohler@vnet.ibm.com. o J2008 DTD This is a DTD currently under development for the automotive industry to meet the new EPA requirements for delivery and interchange of automobile maintenance information. I don't know more about it than that. o ATA DTDs This is a similar effort by the airline industry to define a DTD for aircraft maintenance information (I think). Don't know the status here either. o Telecom DTD I believe the telecommunications industry is working on a DTD as well, but I haven't heard too much about it (I'd like to know more, as I am effectively the SGML expert for IBM's Networking Systems Division). Other DTDs that are available, but that are less applicable for reference documentation include the Hypertext Markup Language (HTML) defined by the World Wide Web for use with the Mosaic browser. At this time, HTML is more focused on presentation than content identification to make it easy to provide good real-time formatting from the markup directly. One of the reasons that there are so many different DTDs, and that there will continue to be so many, is that the power of SGML is the ability to create data management applications that support your data directly, much as you create relational databases that support your data directly. The difference between relational databases and SGML applications is that for most processing applied to SGML data, much of the processing is the same and many of the base constructs are the same. 
You find paragraphs, lists, phrases, and headings in just about every DTD. Thus there tends to be more commonality among differing SGML applications than there will tend to be in SQL applications, for example, which naturally leads to the thought that perhaps there can be One True Document Type that can be used by all. Of course, there can't be if you want your SGML-encoded data to relate directly to the products and services you provide or to the types of things you work with if you don't sell things for a living. My personal feeling, and the reason we developed the InfoMaster Architecture, is that we can capture the commonalities in architectures, freeing DTD designers to design precise application-specific DTDs while leveraging off of common enabling tools and with a reasonable degree of assurance that data in one DTD can be transformed into another that conforms to the same architecture with a minimum of loss (there will always be some, of course). This is the same logic that drove the development of the HyTime standard and we simply applied it to a different domain. -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 </message> <message id="<2ajqsj$nj1@server.cs.vt.edu>" date="2960649555"> Newsgroups: comp.text.sgml Date: 26 Oct 1993 18:39:15 UT From: "William Wake" \<wakew@jingluo.cs.vt.edu> Organization: Virginia Tech, Blacksburg, Virginia Message-ID: <2ajqsj$nj1@server.cs.vt.edu> References: <199310222037.AA07524@naggum.no> Subject: Re: ISO 8879 what changes are being considered? I've been following the back-and-forth exchanges over the need for "SGML Lite" for a while now. Our group has been using SGML (or SGML-like systems) for archival storage of articles and bibliographic data, and looking into using it for external representation of objects from our database. 
I'd really like to support an SGML Lite with these features: + single concrete syntax + no minimization, shortrefs, etc. + parsable by lex/yacc or its equivalent As an INTERCHANGE language, all these variants get in our way. They're certainly useful for someone doing retro-conversion, so I'm not arguing for eliminating them entirely. Last July, Eliot Kimber wrote: | I don't think it's fair to say that SGML is too complicated to | implement. The existence of many conforming SGML tools is proof of | that. From where I sit, the lack of SGML tools is very noticeable. I think at least part of the reason is the difficulty of parsing it. It's hard to "step lightly" into SGML. For a C or Pascal compiler, I can write a few pages of lex and yacc code, and have something that can syntactically verify programs. Because of all the variants, and the apparent intermixing of scanning and parsing, doing the equivalent for SGML is much more work. I would love to see a formally-recognized, easy-to-implement SGML subset, standardized so that there would be economic incentive to implement it. -- Bill Wake, Project Envision, Virginia Tech wakew@cs.vt.edu </message> <message id="<19931026.161447.271@almaden.ibm.com>" date="2960665220"> Newsgroups: comp.text.sgml Date: 26 Oct 1993 23:00:20 UT From: "Eliot Kimber" \<drmacro@vnet.IBM.COM> Message-ID: <19931026.161447.271@almaden.ibm.com> References: <199310222037.AA07524@naggum.no> <2ajqsj$nj1@server.cs.vt.edu> Subject: Re: ISO 8879 what changes are being considered? [William Wake] | I'd really like to support an SGML Lite with these features: | | + single concrete syntax | + no minimization, shortrefs, etc. | + parsable by lex/yacc or its equivalent Why not simply define your system-level SGML declaration so that it uses the RCS and doesn't use any of the optional features? 
If you don't need to validate the documents (which you shouldn't have to, since you can use SGMLS, which is free, to do any validation you need to do), then you should be able to parse the documents with lex/yacc. You won't necessarily be able to accept unnormalized documents from others, but you can certainly meet your own requirements. The only bugaboo I can think of that this won't solve is the handling of empty elements (either with a declared content of EMPTY or with a CONREF attribute specified). One way to handle this is to not allow DTDs that use empty elements and CONREF. | Last July, Eliot Kimber wrote: | >I don't think it's fair to say that SGML is too complicated to | >implement. The existence of many conforming SGML tools is proof of | >that. | | From where I sit, the lack of SGML tools is very noticeable. | I'm wondering what sort of tools you'd like to have that you don't have available today? Since SGMLS is freely available, that should solve your parsing needs, especially since its output is designed to be parsed with tools like lex and awk. -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 </message> <message id="<schwartz.751733066@lead17>" date="2960721866"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 14:44:26 UT From: "Steven R. Schwartz" \<schwartz@rtsg.mot.com> Organization: Motorola Cellular Infrastructure Group Message-ID: \<schwartz.751733066@lead17> References: \<saucc00.751080631@roselin> <9310210215.memo.71059@BIX.com> Subject: Re: SGML and TeX [Jeffrey McArthur] | There are two approaches to typesetting SGML via TeX. The first is to | use a pre-processor to convert the SGML tagging into something a bit | more TeX friendly. The second is to set up TeX to run directly from | the SGML coding. What about an almost straight-through parser to convert SGML to LaTeX? 
It seems that this could consist of some very simple LEX & YACC code. Since the SGML philosophy is very similar to the LaTeX philosophy, it should be an easy route. Also, it would allow you to separate the grammar (DTD) from the formatting (LaTeX ".sty" file) while keeping them "collected" together (with the same basename, for example). Does anyone know if this has been done? Does anyone think I'm in "tooley-land" on this one? -- Steve Schwartz s.r.schwartz@ieee.org </message> <message id="<2am1qv$4pu@cnn.MOTOWN.GE.COM>" date="2960722207"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 14:50:07 UT From: "Milt Benjamin" \<mbenjami@motown.ge.com> Organization: MM, Moorestown NJ Message-ID: <2am1qv$4pu@cnn.MOTOWN.GE.COM> Keywords: SGML CALS DTD FOSI Subject: SGML/CALS Job Market Much interest has developed in the SGML/CALS job market. In general, one can say that: a) In the past few months, openings have increased dramatically. b) Many of these new openings have been on the east coast of the US, but others have been in Europe and Australia. c) Salaries range from $50K to $100K US depending on years of experience, degree (if any) and current salary. d) The jobs are in the commercial, government, and publishing sectors. e) A large and rapidly growing literature has evolved on the subject with hundreds of new articles appearing regularly in technical journals. If you have comments or questions, call Ann at (908) 828-2155 US or Email. </message> <message id="<2am5g6$b05@server.cs.vt.edu>" date="2960725958"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 15:52:38 UT From: "William Wake" \<wakew@jingluo.cs.vt.edu> Organization: Virginia Tech, Blacksburg, Virginia Message-ID: <2am5g6$b05@server.cs.vt.edu> References: <19931026.161447.271@almaden.ibm.com> Subject: Re: ISO 8879 what changes are being considered? [Eliot Kimber] | Why not simply define your system-level SGML declaration so that it | uses the RCS and doesn't use any of the optional features? 
That's basically what we've been doing (on an ad-hoc basis). I think a lot of people could use a minimal-level version though, and that it should be standardized formally. | I'm wondering what sort of tools you'd like to have that you don't | have available today? I guess the scarceness I perceive corresponds to the relative expense. For example, we'd like to do the equivalent of word processing (ala Author/Editor). In the non-SGML word processing arena, dozens if not hundreds of WYSIWYG editors are available from $0 to $50 on up. In SGML, all I know of on the low end is Author/Editor, which gets expensive if you want to run on workstations or develop your own DTDs. Similarly for parsing or conversion tools: there are several very expensive tools, and there may be a PD tool or two, but there seems to be nothing in the middle. | Since SGMLS is freely available, that should solve your parsing needs, | especially since its output is designed to be parsed with tools like | lex and awk. We mostly want to be tool users, not tool developers. It's a lot harder to work with something like SGMLS than to fill in a formatting form as Author/Editor does. === I really believe the difficulty of the concrete syntax in SGML has led to a lot of this. The abstract syntax is not that complex: essentially entity expansion then a tree of named nodes. But the interactions of variant concrete syntaxes and intermingling of scanning with parsing obscure this. Someone spoke earlier this summer of proving SGML parsers correct. This is certainly more talked about than done in the compilers field. Yet, you get a much higher level of confidence of correctness from looking at a yacc script for something than 50 equivalent pages of code. 
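[Editor's note: The two steps named above, entity expansion and a tree of named nodes, can be sketched for fully normalized markup. What follows is a minimal illustration in Python, not a conforming SGML parser and not something posted in this thread. It assumes every element has an explicit start-tag and end-tag, with no attributes, no minimization, and no short references. The names `ENTITIES`, `expand_entities`, and `parse` are hypothetical, and the entity table is hard-coded where a real system would read declarations from the DTD. As a further simplification, entity references are expanded inside character data after tokenizing, so replacement text is never re-scanned for markup.]

```python
import re

# Hypothetical entity table; a real system would take these from the DTD.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">"}

# A token is either a start-tag/end-tag with a bare name, or a run of data.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>|([^<]+)")
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities=ENTITIES):
    """Replace each &name; reference in character data (single pass)."""
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], text)


def parse(source):
    """Return a list of (name, children) nodes for fully tagged markup.

    Children are either nested (name, children) tuples or data strings.
    Raises ValueError on stray '<', mismatched or missing end-tags.
    """
    root = ("#root", [])
    stack = [root]
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        closing, name, text = m.groups()
        if text is not None:
            # Whitespace-only data between tags is dropped in this sketch.
            if text.strip():
                stack[-1][1].append(expand_entities(text))
        elif closing:
            # Element names are folded to lower case before comparison.
            if len(stack) == 1 or stack[-1][0] != name.lower():
                raise ValueError(f"mismatched end-tag </{name}>")
            stack.pop()
        else:
            node = (name.lower(), [])
            stack[-1][1].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError(f"missing end-tag for <{stack[-1][0]}>")
    return root[1]
```

[Calling `parse("<doc><title>SGML &amp; TeX</title><p>Hello</p></doc>")` yields one `doc` node whose children are a `title` node and a `p` node. Nothing here validates against a DTD; in the workflow discussed in this thread that job would stay with a validating parser such as SGMLS.]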
-- Bill Wake, Project Envision, Virginia Tech wakew@cs.vt.edu </message> <message id="<jsuttorCFKKF4.FKJ@netcom.com>" date="2960736735"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 18:52:15 UT From: "Jeff Suttor" \<jsuttor@netcom.com> Message-ID: \<jsuttorCFKKF4.FKJ@netcom.com> Subject: FYI: ACM & SGML & OCLC [Editor's note: This is a duplicate of the article forwarded by Robin Cover on 1993-10-23, article 3292 as numbered on this list.] Note: OCLC has also done some other interesting things with electronic publishing using SGML. > From oclc-news@oclc.org Fri Oct 22 12:30:34 1993 > Date: Fri, 22 Oct 93 11:27:19 EDT > From: Marifay_Makssour@oclc.org FOR IMMEDIATE RELEASE FOR MORE INFORMATION CALL: Nita Dean, OCLC (614) 761-5002 Janet Nunn, IDI (614) 761-7262 OCLC AND IDI TO DEVELOP ELECTRONIC PUBLISHING SYSTEM FOR ACM DUBLIN, Ohio, Oct. 21, 1993--OCLC and its subsidiary, Information Dimensions, Inc. (IDI), have been selected to develop an electronic publishing system for ACM (the Association for Computing Machinery). The OCLC/IDI in-house electronic publishing system will integrate the various ACM publishing functions into a unified, automated system that will encompass the writing, editing, composition, production, archiving, and, eventually, distribution of documents and publications. ACM publishes an estimated 40,000 pages per year, including books, journals, conference proceedings, and internal publications. "High-quality print journals, magazines, and books will continue to play an important role in ACM's distribution of leading-edge thinking and knowledge about information technology," said Joseph S. DeBlasi, ACM executive director. "But having it all in electronic form will make everything more widely and readily available in a timely, selective, and even interactive manner. 
We believe an electronic publishing system will be especially important to ACM membership and to the association's future, maintaining as it will our position on the forefront of major developments in the field of information technology." K. Wayne Smith, president and chief executive officer, OCLC, stated: "This is an important project. It combines the strength of ACM's publishing program with OCLC/IDI's innovative approaches in electronic publishing. It underscores OCLC/IDI's commitment to add new, electronic dimensions to a publisher's existing program that will make it not only timely, but cost-effective and user-friendly." The OCLC/IDI approach will be based on open systems architecture, which will let ACM upgrade modules cost-effectively as technology advances. The approach will use Standard Generalized Markup Language (SGML); BASISplus, IDI's document database management system; and BASIS SGMLserver, IDI's new storage manager built to accept, query, retrieve, and manipulate SGML document components as separate objects. Using the new system, ACM editors will be able to receive documents from contributors in a variety of word-processing formats and enter them electronically into a working database where they can be edited, transmitted for review, and processed for composition and printing. The system will also enable the search of stored documents and the collection of documents on selected subjects. For example, a search on "parallel processing" could retrieve three chapters and four sections from seven different documents, which the system would then combine into a new document on parallel processing. The ACM system will be completed in 12 to 18 months. The ACM electronic publishing system will combine OCLC's experience with electronic publishing and user interfaces that can be operated without training, and IDI's database management systems that have been used in many different document management applications. 
IDI's BASISplus software is consistently rated the fastest, largest, most dependable, and most cost-effective document-database engine on the market. OCLC is the distributor of the _Online Journal of Current Clinical Trials_, under a joint venture with the American Association for the Advancement of Science. This fall OCLC is introducing the _Online Journal of Knowledge Synthesis for Nursing_ with Sigma Theta Tau, International Honor Society of Nursing, and an electronic version of _Electronics Letters_ with the Institution of Electrical Engineers. OCLC has also produced easy-to-use interfaces for The FirstSearch Catalog, an online reference service for library patrons, and DiscLit, a full-text database with bibliographic citations designed for literature students. ACM, founded in 1947, is the oldest and largest not-for-profit educational and scientific computer organization in the world. The association has upwards of 80,000 members internationally. ACM publishes refereed periodicals, newsletters, conference proceedings, books, and reference publications, including _Computing Reviews_, the _ACM Guide to Computing Literature_, and _The Graduate Assistantship Directory_. IDI is an international software company specializing in information management technology. Its software products are installed at over 2,200 sites worldwide. OCLC acquired IDI earlier this year. OCLC is a nonprofit computer library service and research organization whose computer network and services link more than 17,000 libraries in 52 countries and territories. 
(NC) </message> <message id="<acg-sgml.751748524@access>" date="2960737383"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 19:03:03 UT From: Debbie Lapeyre \<acg-sgml@access.digex.net> Message-ID: \<acg-sgml.751748524@access> Keywords: Poster, SGML, conference Summary: Call for posters for SGML '93 in Boston Subject: Call for Posters CALL FOR POSTERS SGML '93, the annual conference for the SGML technical community, will be held December 6-9 at the Sheraton Boston Hotel and Towers. Recent SGML conferences have been using a form of poster session to provide a forum for informal discussions. This year poster presentations will be a little different from previous years. Posters go up at the beginning of the conference and will be visible throughout the conference. Presenters will be asked to "man" their posters for only one of several designated poster presentation times, and to answer questions on the posters rather than give lectures. You will be informed of your presentation schedule closer to the conference. Think of this as a way to publish a very short article, and discuss it with interested people. For SGML '93 we will be providing attendees with bound conference proceedings at registration. In order to do this we must ask you to supply copies of your posters and a title and abstract of your poster presentation by November 8th. If you provide clean black-on-white copies of up to four posters in 8 1/2" by 11" format by November 8th the conference will: * Blow them up to poster size; * Transport them to the Conference site; and * Hang them for you. The title and abstract you provide will be included in the formal conference proceedings, and copies of your posters, if you provide them in time. We recommend that posters be clear, legible originals, with type no smaller than 10 point. Many people find that if they use type as small as 10 point and blow it up to poster size the posters are more attractive if they use two columns. 
Illustrations are a very effective way to communicate using the poster medium. We are providing several forums for vendors to describe and display their products, and ask that they refrain from doing product presentations during the technical program. If you feel that you must introduce your company, product, project, or problem in order to provide the context for your talk, please restrict that introduction to one poster, and concentrate on the problem, idea, or experience you wish to convey. If you have any questions about posters, the program, or your participation please feel free to call Debbie Lapeyre, Poster Chair. Thank you for your participation. Debbie Lapeyre, ATLIS Consulting Group - Poster Chair (301/816-4311) Yuri Rubinsky, SoftQuad - Conference Chair (416/239-4801) Tommie Usdin, ATLIS Consulting Group - Conference Co-chair (301/816-4307) Joy Blake, GCA - Conference Coordinator (703/519-8177) </message> <message id="<1993Oct27.230434.23747@news.cs.brandeis.edu>" date="2960751874"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 23:04:34 UT From: John Lavagnino \<lav@binah.cc.brandeis.edu> Organization: Brandeis University Message-ID: <1993Oct27.230434.23747@news.cs.brandeis.edu> References: \<saucc00.751080631@roselin> <9310210215.memo.71059@BIX.com> \<schwartz.751733066@lead17> Subject: Re: SGML and TeX [Steven R. Schwartz] | What about an almost straight-through parser to convert SGML to LaTeX? | | It seems that this could consist of some very simple LEX & YACC | code.... I don't mean to sound negative about this and other approaches that have been aired here, but, for the benefit of those who perhaps are not aware of the whole range of possibilities, I'd like to point out that converting SGML to TeX or LaTeX is really quite a simple problem: 1) Get sgmls. 2) Write some translation rules for use with the sgmlsasp program that comes with sgmls. 
You can specify conversions for start- and end-tags, and there are simple ways to incorporate attribute values. 3) Set up your entity sets to translate the entity references to appropriate TeX constructs. 4) Run sgmls and sgmlsasp to convert SGML to TeX/LaTeX. That's all you really need to do. It doesn't require any programming even at the level of lex and yacc, and all the software is free. You can devise approaches that run faster, of course: you can develop a single program that does all this, where the above requires running two. Many suggestions seem to turn on skipping the validation stage, though, which is something I'd warn against. It's helpful in eliminating mysterious errors to a degree that is hard to imagine until you've experienced it. -- John Lavagnino Department of English and American Literature, Brandeis University </message> <message id="<SVATTIKU.93Oct27191137@future.sales.gba.nyu.edu>" date="2960752297"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 23:11:37 UT From: "Shreekant Vattikuti" \<svattiku@future.sales.gba.nyu.edu> Organization: Stern School of Business, New York University Message-ID: \<SVATTIKU.93Oct27191137@future.sales.gba.nyu.edu> Subject: SGML and RTF I am currently working on a project for the U.S. Patent Office to convert patent applications created in Microsoft Word and other word processors/ page layout/formatters to the ST.32 DTD specification of SGML. Currently we are working on just RTF. A problem that we have encountered is that equations are created in Word by a mini-application called Equation Editor. When the RTF format is used, two representations of the equation are saved, since it is treated like an embedded object. One is a picture format and the other is the actual data used by the Equation Editor. We need to interpret this information somehow so that we can tag it, but the problem is that we do not have the specification for the Equation Editor format and the picture format is only an image. 
Does anyone have a suggestion on how to tackle this? We cannot package any non-shareware/freeware software with our program, since the package will be widely distributed. -- "You egomaniacal idiot... ...You can't destroy this planet. You can't even come close."- Ian Malcolm, Jurassic Park (the book) Shreekant Vattikuti-Multiclass Wild Mage/Priest of Numbers and Thought svattiku@sales.nyu.edu </message> <message id="<19931027.164201.773@almaden.ibm.com>" date="2960753333"> Newsgroups: comp.text.sgml Date: 27 Oct 1993 23:28:53 UT From: "Eliot Kimber" \<drmacro@vnet.IBM.COM> Message-ID: <19931027.164201.773@almaden.ibm.com> References: <19931026.161447.271@almaden.ibm.com> <2am5g6$b05@server.cs.vt.edu> Subject: Re: ISO 8879 what changes are being considered? [Eliot Kimber] | Why not simply define your system-level SGML declaration so that it | uses the RCS and doesn't use any of the optional features? [William Wake] | That's basically what we've been doing (on an ad-hoc basis). I think a | lot of people could use a minimal-level version though, and that it | should be standardized formally. The restrictions represented by an SGML declaration that allows no options and uses the RCS *is* a formal definition, by definition, since 1) the RCS is defined in 8879 and 2) the SGML declaration is, by definition, the formal declaration of what your system supports. I don't see how it can get more formal than this. [Eliot Kimber] | I'm wondering what sort of tools you'd like to have that you don't | have available today? [William Wake] | I guess the scarceness I perceive corresponds to the relative expense. | For example, we'd like to do the equivalent of word processing (ala | Author/Editor). In the non-SGML word processing arena, dozens if not | hundreds of WYSIWYG editors are available from $0 to $50 on up. In | SGML, all I know of on the low end is Author/Editor, which gets | expensive if you want to run on workstations or develop your own DTDs. 
I figured it was probably editors. I'm surprised nobody's provided an Emacs shell or something. Maybe one of the SGML editor vendors would consider making a low-cost version of their product available? It certainly couldn't hurt the SGML market. It is true that the SGML industry is just starting to move from the mode where all the suppliers were startups largely feeding off the defense industry where they had to maximize returns, but the market is changing rapidly and there is clearly the need for lower-cost tools of useful function and quality. I wish I could do more to help, because I'd like to see people have access to affordable tools too. I should think that the increase in the use of SGML by academia should lead to the development of some affordable tools before too long. -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 </message> <message id="<TED.93Oct28000415@lole.crl.nmsu.edu>" date="2960755455"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 00:04:15 UT From: "Ted Dunning" \<ted@crl.nmsu.edu> Message-ID: \<TED.93Oct28000415@lole.crl.nmsu.edu> References: \<jsuttorCFKKF4.FKJ@netcom.com> Subject: Re: FYI: ACM & SGML & OCLC [Jeff Suttor] | Note: OCLC has also done some other interesting things with electronic | publishing using SGML. | | > From oclc-news@oclc.org Fri Oct 22 12:30:34 1993 | > Date: Fri, 22 Oct 93 11:27:19 EDT | > From: Marifay_Makssour@oclc.org | | DUBLIN, Ohio, Oct. 21, 1993--OCLC and its subsidiary, Information | Dimensions, Inc. (IDI), have been selected to develop an electronic | publishing system for ACM (the Association for Computing Machinery). 
| | The OCLC/IDI in-house electronic publishing system will integrate the | various ACM publishing functions into a unified, automated system that | will encompass the writing, editing, composition, production, | archiving, and, eventually, distribution of documents and publications. It would be nice if this meant that some or all ACM publications would be available in electronic format. I would guess that they won't, though, due to the fact that the ACM gets more money for killing trees than burning coal to move electrons. </message> <message id="<1993Oct28.122841.26809@news.lrz-muenchen.de>" date="2960800121"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 12:28:41 UT From: "Martin Josko" \<martin@pollux.edv.agrar.tu-muenchen.de> Organization: Leibniz-Rechenzentrum, Muenchen (Germany) Message-ID: <1993Oct28.122841.26809@news.lrz-muenchen.de> Subject: problems installing ASP on AIX 3.2 When I exactly follow the instructions for installation, I get this error message: >make generator .. .. "../sgml.g", line 23 : Sgml_document : is already a token .. .. Could somebody give me an advice what to do ? -- Martin Josko Technical University Munich TU-Muenchen Department of Mathematics DVS-Weihenstephan Statistics and Data Processing D-85350 Freising Freising Germany email: martin@pollux.edv.agrar.tu-muenchen.de phone: +49-(0)8161-71-4506 fax: +49-(0)8161-71-4409 </message> <message id="<19931028.070207.210@almaden.ibm.com>" date="2960805238"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 13:53:58 UT From: "Wayne L. Wohler" \<wohler@vnet.IBM.COM> Message-ID: <19931028.070207.210@almaden.ibm.com> References: \<SVATTIKU.93Oct27191137@future.sales.gba.nyu.edu> Subject: Re: SGML and RTF [Shreekant Vattikuti] | I am currently working on a project for the U.S. Patent Office to | convert patent applications created in Microsoft Word and other word | processors/ page layout/formatters to the ST.32 DTD specification of | SGML. Currently we are working on just RTF. 
A problem that we have | encountered is that equations are created in Word by a mini-application | called Equation Editor. When the RTF format is used, two | representations of the equation are saved, since it is treated like an | embedded object. One is a picture format and the other is the actual | data used by the Equation Editor. We need to interpret this | information somehow so that we can tag it, but the problem is that we | do not have the specification for the Equation Editor format and the | picture format is only an image. Does anyone have a suggestion on how | to tackle this? We cannot package any non-shareware/freeware software | with our program, since the package will be widely distributed. Have you considered NOT converting the information at all? I don't know what you are planning to do with the information after you have it in SGML form but given that you have an image and the RTF, you might just leave it at that, at least for now. Use the image when you need to present the information, use Word to modify just the equations (admittedly this will require some filtering or other processing on the way in and out of Word). SGML certainly has the capability for referring to data encoded in other formats. Another unknown (to me) is whether the DTD you are using has any language defined for representing mathematical equations. If it does not and in your system you must convert the math to an SGML representation, there are a couple of math DTDs available. Do you know which you would like to use? -- Wayne L. 
Wohler Internet: wohler@vnet.ibm.com Dept G82/025Z IBMMAIL: USIB29WX@IBMMAIL Publishing Solutions Development Phone: 1-303-924-5943 IBM Corporation PO Box 1900 Boulder, Colorado 80301-9191 </message> <message id="<9310281400.AA10829@netcomsv.netcom.com>" date="2960805488"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 13:58:08 UT From: Chet Ensign \<DOCCOE@IBIVM.IBMMAIL.COM> Message-ID: <9310281400.AA10829@netcomsv.netcom.com> References: \<CFCown.91v@csc.ti.com> <19931025.061614.294@almaden.ibm.com> Subject: Re: ebook reader [Eliot Kimber] | You might also take a look at Lotus' SmarText system... Unfortunately, | SmarText does not currently provide direct support for import of SGML. Thomas Rearick, the developer and product manager for SmarText, was at the Online '93 conference and he said that he was thinking about ways to put SGML smarts into SmarText. However, he was wondering aloud how much reason there was to do it -- either from a technical point of view or a marketing point of view. So there is a possibility that Lotus may commit to (or may already have committed to) adding some SGML support into the product. Tom is Tom_Rearick.LOTUS@crd.lotus.com if anybody wants to give him encouragement. Another product to check out is ISYS from Odyssey Development Group. Their address is: 650 S. Cherry Street Suite 220 Denver, Colorado 80222 (303) 394-0091 or (800) 992-4797 ISYS is a text indexing, retrieval and display program that runs on UNIX, DOS and Windows. It can read all the main PC word processing formats and every graphic format I can think of. Our International Department uses it as the tool for distributing marketing literature, fact sheets, newsletters and some manuals to our overseas offices. They're very happy with it. /chet -- Chet Ensign Information Builders, Inc. 
212-736-6250 X4349 internet: doccoe@ibivm.ibmmail.com ibmmail: USUBUVMV@IBMMAIL compuserve: 73163,1414 </message> <message id="<CFM2Au.39n@dove.nist.gov>" date="2960806564"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 14:16:04 UT From: "Bob Bagwill" \<bagwill@sst.ncsl.nist.gov> Organization: NIST Message-ID: \<CFM2Au.39n@dove.nist.gov> References: \<saucc00.751080631@roselin> <9310210215.memo.71059@BIX.com> \<schwartz.751733066@lead17> <1993Oct27.230434.23747@news.cs.brandeis.edu> Subject: Re: SGML and TeX Gary Houston's \<ghouston@stats.govt.nz> gf-0.39, which uses sgmls version 1.1, is pretty nice: gf is short for "general formatter", i.e., a program capable of formatting documents which conform to the ISO "general" document type definition (DTD). It can convert SGML documents conforming to a small number of DTDs into various output formats: LaTeX, ASCII, RTF and Texinfo. However, not every output format can be generated for every DTD. It also does HTML, which is why I use it. With tkWWW, you've got a free, graphical (if not truly WYSIWYG), hypertext editing and printing system. -- Bob Bagwill rbagwill@nist.gov </message> <message id="<9310281443.AA19372@netcomsv.netcom.com>" date="2960807975"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 14:39:35 UT From: Chet Ensign \<DOCCOE@IBIVM.IBMMAIL.COM> Message-ID: <9310281443.AA19372@netcomsv.netcom.com> Subject: MS fact sheet on SGML Here is something you might find interesting: excerpts from a fact sheet titled "Microsoft Word & The SGML Standard" that Microsoft has posted on Compuserve. The file is in the MS Word forum in LIB 2 and is named SGML.EXE. (It's hot off the presses, too. It was just put up.) First, the disclaimer. In the copyright statement, Microsoft says: "The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. 
Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication." So MS hasn't committed to anything, although they do say that they plan to release the Windows version of the product in the first half of 1994. Having said that, here is the proposed MS SGML tool. They are calling it "SGML Author for Word" and their goals for it are: "* Make SGML easy, and make SGML authors more productive. * Allow end users to create SGML without knowing "the standard." * Allow MIS to configure the converter for any DTD." The product is a Word add-on. It will require Word 6.0 or later. The product consists of two parts: a converter that is used by the writers, and a separate mapping application that is used by MIS. The product works by mapping styles to SGML mark up. For the end users, MS sees it working like this: "... end users simply construct their documents in Word as they normally would, except they must use styles for all formatting. To ensure that they use the styles appropriately, the users format according to an MIS provided style guide and set of Word templates. To create SGML, the user then saves the file as SGML just as they would export to any other file format. Once the user has chosen to save an SGML representation of the file, an ASCII text file is created which contains syntactically correct (i.e., parseable) SGML." The converter parses the instance after the conversion. If it encounters errors, "the converter may modify the Word file to ensure conformity to the DTD. For example, a DTD might have a \<list> element which required that there be at least two \<items> in the list. If the user had only created one list item, the converter would create a necessary, albeit empty, second item and inform the user of this fact. 
The results of any necessary modifications are returned to the user in the form of a new Word file which has been annotated to describe in Word terminology why the file was changed." I'd assume that this also means the converter -- or the mapping scheme created by the system's administrator -- will know how to identify the end points of elements, but that is not spelled out. In my experience, you can't assume that a switch in styles identifies the end of the element. As for setting up the converter, MS says: "To ensure that the desired result is achieved, the converter has to be pre-configured to create the appropriate SGML. This is done by creating a mapping file using a provided Mapping Application. This application is geared at the SGML knowledgeable individual, and it allows this individual to build specific mappings between Word templates (i.e., styles) and the structures in the SGML DTD. Where standard DTD's do exist (i.e., CALS), Microsoft will provide pre-assembled mapping files and templates. For customers who have built their own DTD's, they will need to use the mapping application to build corresponding templates and mapping files." That's the gist of it. Again, if you want to pick up a copy of the document, it is SGML.EXE in the Microsoft Word forum on Compuserve, Library 2. It is a self-extracting compressed file. Just execute it to get the Word document. /chet -- Chet Ensign Information Builders, Inc. 212-736-6250 X4349 internet: doccoe@ibivm.ibmmail.com ibmmail: USUBUVMV@IBMMAIL compuserve: 73163,1414 </message> <message id="<23221@villars.cme.nist.gov>" date="2960814274"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 16:24:34 UT From: "Josh Lubell" \<lubell@lurch.cme.nist.gov> Organization: National Institute of Standards and Technology, Gaithersburg, MD Message-ID: <23221@villars.cme.nist.gov> Subject: Experience with Open Text Has anyone had any experience with Open Text Corporation's text database software? 
I am interested in any experiences people have had with their database engine, APIs, and/or client applications (viewers, etc.). Also, how bug-free is their software, how readable is their documentation, and how is customer support? My group is working on a project involving providing online access to technical standards groups of numerous standards documents. The documents will be stored as SGML text files. We want to provide the ability to search based on the documents' structure, as well as give users with write privileges the power to make modifications to sections of documents. An open systems architecture is very important to us. Any experiences would be helpful. </message> <message id="<2ap168$8ht@figment.dircon.co.uk>" date="2960822870"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 18:47:50 UT From: Bruce Hunter \<bruce@sgml.dircon.co.uk> Message-ID: <2ap168$8ht@figment.dircon.co.uk> References: <1993Oct28.122841.26809@news.lrz-muenchen.de> Subject: Re: problems installing ASP on AIX 3.2 [Martin Josko] | When I exactly follow the instructions for installation, | I get this error message: | | >make generator | .. | .. | "../sgml.g", line 23 : Sgml_document : is already a token | .. | .. | | Could somebody give me an advice what to do ? Get sgmls? :-) </message> <message id="<1993Oct28.185905.18660@gdstech.grumman.com>" date="2960823545"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 18:59:05 UT From: "Larry Beck" \<lab@gdstech.grumman.com> Organization: Grumman Data Systems-Bethpage Message-ID: <1993Oct28.185905.18660@gdstech.grumman.com> Subject: Intellitag I've just started to work with Intellitag to see if it has enough horsepower to handle a DoD Technical Manual. So far, results do not seem promising. Has anyone out there used this product. If so, I'd like to chat with you. Please respond to me via e-mail, not on the net. I'll be out of town next week, but I'll respond when I get back. 
Thanks; Larry Beck </message> <message id="<CFMrrG.FrE@tc.fluke.COM>" date="2960839559"> Newsgroups: comp.text.sgml Date: 28 Oct 1993 23:25:59 UT From: "Gary Benson" \<inc@tc.fluke.COM> Organization: Fluke Corporation, Everett, WA Message-ID: \<CFMrrG.FrE@tc.fluke.COM> References: <199310222037.AA07524@naggum.no> <2ajqsj$nj1@server.cs.vt.edu> Subject: Re: ISO 8879 what changes are being considered? [William Wake] | I've been following the back-and-forth exchanges over the need for | "SGML Lite" for a while now. : | I'd really like to support an SGML Lite with these features: | + single concrete syntax | + no minimization, shortrefs, etc. | + parsable by lex/yacc or its equivalent : | From where I sit, the lack of SGML tools is very noticeable. I think | at least part of the reason is the difficulty of parsing it. It's hard | to "step lightly" into SGML. For a C or Pascal compiler, I can write a | few pages of lex and yacc code, and have something that can | syntactically verify programs. Because of all the variants, and the | apparent intermixing of scanning and parsing, doing the equivalent for | SGML is much more work. | | I would love to see a formally-recognized, easy-to-implement SGML | subset, standardized so that there would be economic incentive to | implement it. Second. This is the most succinct description I have yet read that illuminates what appears to be the major reason SGML acceptance is proceeding as it is. Apparently, only if you have the resources of an IBM or a consortium of some kind can you make an entry. Little development groups are locked out by this lack of a meaningful, official, upwardly-compatible subset. Someone once said that standardization is the first step in the process of shutting off innovation. SGML needs to keep fresh ideas coming, and SGML Lite could help. -- Gary Benson-_-_-_-_-_-_-_-_-_-inc@tc.fluke.com_-_-_-_-_-_-_-_-_-_-_-_-_-_- A successful tool is one that was used to do something undreamed of by its author. -S. C. 
Johnson </message> <message id="<1993Oct29.041308.27182@stats.govt.nz>" date="2960856788"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 04:13:08 UT From: "Gary Houston" \<ghouston@stats.govt.nz> Organization: Statistics New Zealand Message-ID: <1993Oct29.041308.27182@stats.govt.nz> References: \<SVATTIKU.93Oct27191137@future.sales.gba.nyu.edu> <19931028.070207.210@almaden.ibm.com> Subject: Re: SGML and RTF [Shreekant Vattikuti] | I am currently working on a project for the U.S. Patent Office to | convert patent applications created in Microsoft Word and other word | processors/ page layout/formatters to the ST.32 DTD specification of | SGML. Currently we are working on just RTF. A problem that we have | encountered is that Equations are created in Word by a mini-application | called Equation Editor. When the RTF format is used, two | representations of the equation are saved, since it is treated like an | embedded object. One is a picture format and the other is the actual | data used by the Equation Editor. We need to interpret this | information somehow so that we can tag it, but the problem is that we | do not have the specification for the Equation Editor format and the | picture format is only an image. Does anyone have a suggestion on how | to tackle this? We cannot package any non-shareware/freeware software | with our program, since the package will be widely distributed. [Wayne L. Wohler] | Have you considered NOT converting the information at all? I don't | know what you are planning to do with the information after you have it | in SGML form but given that you have an image and the RTF, you might | just leave it at that, at least for now. Use the image when you need | to present the information, use Word to modify just the equations | (admittedly this will require some filtering or other processing on the | way in and out of Word). SGML certainly has the capability for | referring to data encoded in other formats. 
Then you would lose the obvious benefit of SGML for this application. Ideally the Patent Office would simply declare their DTD for electronic submissions, and let private enterprise supply the tools. A policy such as "submissions may be made using the ST.32 DTD or in RTF with embedded equations in Microsoft undocumented format" would be absurd. </message> <message id="<22594.199310290904@mailhub.ggr.co.uk>" date="2960873580"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 08:53:00 UT From: Sonia Illes \<shi4093@ggr.co.uk> Message-ID: <22594.199310290904@mailhub.ggr.co.uk> Subject: AWK for Windows We at Glaxo are currently investigating the adoption of SGML and in particular ways of moving RTF to SGML and vice versa. Does anyone know of, or have any details of AWK for Microsoft Windows and possibly any related AWK scripts they would be willing to let me have? Thanks very much. Sonia -- Sonia Illes Glaxo Group Research Ltd Tel: (+44) 81 966 2104 Internet:SHI4093@ggr.co.uk Fax: (+44) 81 423 4070 </message> <message id="<22831.smithn@orvb.saic.com>" date="2960878701"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 10:18:21 UT From: "Norman E. Smith" \<smithn@orvb.saic.com> Message-ID: <22831.smithn@orvb.saic.com> Subject: RAST2SGM Last night, I wrote RAST2SGM to convert the output of RAST.EXE back into normal SGML. RAST is the utility included with SGMLS that converts SGMLS output into a form closely matching SGML; I forget the official description of what RAST does. The idea is to start with a minimized SGML file, parse it with SGMLS, run the SGMLS output through RAST, then the RAST output through RAST2SGM and end up with a 'normalized' SGML file (with end tags expanded). I have wanted something like RAST2SGM for some time because I don't have any tools at home for editing SGML files. Taking advantage of minimization makes editing SGML files with a normal ASCII editor much less painful. 
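
[Editor's note: As a rough illustration of the normalizing step described here, the Python sketch below works on the SGMLS output directly rather than on RAST output. It assumes the line-oriented sgmls output format (`(` for a start tag, `)` for an end tag, `-` for data, `A` for an attribute of the next element, with `\n`, `\\`, `\|` and octal `\nnn` escapes in data). The names `normalize`, `_decode` and `_protect` are hypothetical and are not part of SGMLS or RAST2SGM. It is not a general normalizer: it always writes an end tag (which is wrong for elements declared EMPTY), drops SDATA brackets, and ignores processing instructions and entity references.]

```python
import re

# Escapes assumed in sgmls data: \n, \\, \| and three-digit octal \nnn.
_ESCAPE = re.compile(r"\\(n|\\|\||[0-7]{3})")


def _decode(text):
    """Turn sgmls escapes back into plain characters."""
    def repl(match):
        esc = match.group(1)
        if esc == "n":
            return "\n"          # record end
        if esc == "\\":
            return "\\"          # literal backslash
        if esc == "|":
            return ""            # SDATA bracket: dropped (lossy)
        return chr(int(esc, 8))  # octal character code
    return _ESCAPE.sub(repl, text)


def _protect(text, quote=False):
    """Use numeric character references so data is not read as markup."""
    text = text.replace("&", "&#38;").replace("<", "&#60;")
    if quote:
        text = text.replace('"', "&#34;")
    return text


def normalize(esis_lines):
    """Build unminimized SGML text from an iterable of sgmls output lines."""
    out = []
    pending = []  # attributes collected for the next start tag
    for raw in esis_lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            parts = rest.split(" ", 2)
            name, kind = parts[0], parts[1]
            # IMPLIED attributes (and kinds not handled here) are left out.
            if kind in ("CDATA", "TOKEN"):
                value = parts[2] if len(parts) > 2 else ""
                pending.append('%s="%s"' % (name, _protect(_decode(value), True)))
        elif code == "(":
            out.append("<%s%s>" % (rest, "".join(" " + a for a in pending)))
            pending = []
        elif code == ")":
            out.append("</%s>" % rest)
        elif code == "-":
            out.append(_protect(_decode(rest)))
        # Any other line type (e.g. ? or &) is ignored in this sketch.
    return "".join(out)
```

[Editor's note: With sgmls output saved to a file, `normalize(open(path))` would return the text of the instance with every start and end tag written out. The prolog would still have to be copied from the original document.]
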
I need end tags expanded because my SGML interpreter wants SGML files without minimization, hence RAST2SGM. I probably should have worked on the SGMLS output directly, but the RAST form seemed simpler to deal with at first glance. I also ignored attributes in this initial implementation. Bottom line is if there is interest, I'll upload the program to Eric for FTP access. Norm </message> <message id="<751896413snz@pira2.demon.co.uk>" date="2960885213"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 12:06:53 UT From: "Francis Cave" \<cave@pira2.demon.co.uk> Organization: Pira International Message-ID: <751896413snz@pira2.demon.co.uk> References: <2aqudf$ele@figment.dircon.co.uk> Subject: Re: RAST2SGM [Bruce Hunter] | And that's about all there is to it. A sample from a replacement file | I've used in the past is | | \<fd> "\<fd>\\n" | \</fd> "\</fd>\\n" | \<fen> "\<fen lp=\\"[lp]\\">" | \<fi> "\<fi>" | \<fig> "\<fig>\\n" | \<figblk> "\<figblk>\\n" | \</figblk> "\</figblk>\\n" | \<fl> "\<fl>" | \<fr> "\<fr>" | \<ftnote> "\<ftnote id=\\"[id]\\">\\n" | \<ftntref> "\<ftntref ref=\\"[ref]\\">" What if the attribute value is #IMPLIED? Can anyone suggest how to obtain valid output in this case? -- Francis Cave Pira International Randalls Road Leatherhead KT22 7RU United Kingdom Tel +44 372 376161 Fax +44 376 377526 email cave@pira2.demon.co.uk </message> <message id="<2ar1vgINNsp9@rs18.hrz.th-darmstadt.de>" date="2960886192"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 12:23:12 UT From: "Joachim Schrod" \<schrod@iti.informatik.th-darmstadt.de> Organization: TH Darmstadt, FG Systemprogrammierung Message-ID: <2ar1vgINNsp9@rs18.hrz.th-darmstadt.de> References: \<saucc00.751080631@roselin> <9310210215.memo.71059@BIX.com> \<schwartz.751733066@lead17> <1993Oct27.230434.23747@news.cs.brandeis.edu> Subject: Re: SGML and TeX [Steven R. Schwartz] | What about an almost straight-through parser to convert SGML to LaTeX? 
| | It seems that this could consist of some very simple LEX & YACC | code.... [John Lavagnino] | I don't mean to sound negative about this and other approaches that | have been aired here, but, for the benefit of those who perhaps are not | aware of the whole range of possibilities, I'd like to point out that | converting SGML to TeX or LaTeX is really quite a simple problem: | | 1) Get sgmls. | | 2) Write some translation rules for use with the sgmlsasp program | that comes with sgmls. You can specify conversions for start- and | end-tags, and there are simple ways to incorporate attribute | values. | | 3) Set up your entity sets to translate the entity references to | appropriate TeX constructs. | | 4) Run sgmls and sgmlsasp to convert SGML to TeX/LaTeX. | | That's all you really need to do. It doesn't require any programming | even at the level of lex and yacc, and all the software is free. I second that. And you might want to look at the software from the QWERTZ project, which used sgmls for the transformation. They provide DTDs for the most common LaTeX styles and the mapping software. Available from all good SGML archives under the name Format. E.g., from ftp.th-darmstadt.de [130.83.55.75] directory pub/text/sgml/Format Don't reinvent the wheel -- share and enjoy. -- Joachim =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Joachim Schrod Email: schrod@iti.informatik.th-darmstadt.de Computer Science Department Technical University of Darmstadt, Germany </message> <message id="<19931029.065225.214@almaden.ibm.com>" date="2960886525"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 12:28:45 UT From: "Eliot Kimber" \<drmacro@vnet.IBM.COM> Message-ID: <19931029.065225.214@almaden.ibm.com> References: <9310281443.AA19372@netcomsv.netcom.com> Subject: Re: MS fact sheet on SGML [Chet Ensign] | They are calling it "SGML Author for Word" and their goals for it are: | | "* Make SGML easy, and make SGML authors more productive. 
| * Allow end users to create SGML without knowing "the standard." | * Allow MIS to configure the converter for any DTD." | | The product is a Word add-on. It will require Word 6.0 or later. | | The product consists of two parts: a converter that is used by the | writers, and a separate mapping application that is used by MIS. The | product works by mapping styles to SGML mark up. | | For the end users, MS sees it working like this: | | "... end users simply construct their documents in Word as they | normally would, except they must use styles for all formatting. To | ensure that they use the styles appropriately, the users format | according to an MIS provided style guide and set of Word templates. | To create SGML, the user then saves While it's certainly encouraging to see Microsoft move toward SGML in their products, it's somewhat frustrating to see yet another word processor vendor promulgate what should now be a thoroughly discredited idea that users can "simply construct their documents ... as they normally would." Every experience I've had or seen with this sort of approach suggests that it is almost impossible for writers, on their own, to follow the SGML-imposed structural constraints sufficiently to allow reliable transform out to SGML. I think that to suggest otherwise both demonstrates a lack of understanding of the practical and potential uses of SGML and irresponsibly raises the expectation level of people beyond what the tools can reasonably provide. There is, in theory, no reason that an SGML-based authoring system cannot be just as easy to use and just as presentationally rich as any of the current word processing tools--the only difference between a flat word processor and an SGML system is the presence of structural constraints and a separation of style information from SGML-encoded content. 
I think that Frame has at least the right idea in thinking they can make a structured editor that is every bit as functional as their unstructured product -- the only mistake Frame made was they didn't put quite enough SGML knowledge into their internal data structures--otherwise they appear to have it about right--the other vendors can learn a lot from Frame's experience. It may be that taking the easy way is the only way that word processing vendors can make the move to SGML, and if so, I can accept it, but I would like to see the vendors position their tools as short-term tactical moves in a longer-term plan to provide true SGML functionality. The impression I get from Microsoft's announcement is that they see SGML as just another export format, and if that's all it ever is, my users will never have a compelling use for their product. -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 "But Ranger Doug, can't we just use some proprietary data format instead of this SGML stuff?" "Sure Slim, that would be the easy way, but it wouldn't be the Cowboy Way." </message> <message id="<JJC.93Oct29134544@jclark.jclark.com>" date="2960891144"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 13:45:44 UT From: "James Clark" \<jjc@jclark.com> Organization: None, London, England Message-ID: \<JJC.93Oct29134544@jclark.jclark.com> References: <22831.smithn@orvb.saic.com> Subject: Re: RAST2SGM [Norman E. Smith] | Last night, wrote RAST2SGM to convert the output of RAST.EXE back into | normal SGML. RAST is the utility included with SGMLS that converts | SGMLS output into a form closely matching SGML; I forget the official | description of what RAST does. 
| | The idea is to start wtih a minimized SGML file, parse it with SGMLS, | run the SGMLS output through RAST, then the RAST output through | RAST2SGM and end up with a 'normalized' SGML file (with end tags | expanded). It is not possible to write a program that - only uses the RAST output (ie the ESIS information) as input, and - works with SGML documents conforming to arbitrary DTDs, and - produces an instance that together with the original prolog will be an equivalent (in the sense of having identical ESIS information), conforming SGML document. There are several reasons for this: - the end tag for elements whose declared content is EMPTY must be omitted even if the SGML declaration specifies OMITTAG NO, but ESIS does not tell you whether the declared content of an element has been declared as EMPTY or just happens to be empty. - the end tag for elements with an explicit content reference must be omitted, but ESIS does not tell you whether an attribute was declared as CONREF. - for internal SDATA entities, ESIS only gives you the replacement text and not the entity name. - if you have a processing instruction that resulted from a reference to a PI entity, then the processing instruction could contain a PIC delimiter and thus could not be entered directly, but ESIS does not tell you the name of the entity. It would also be tricky to get record ends right. James Clark jjc@jclark.com </message> <message id="<2aqudf$ele@figment.dircon.co.uk>" date="2960908655"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 18:37:35 UT From: Bruce Hunter \<bruce@sgml.dircon.co.uk> Message-ID: <2aqudf$ele@figment.dircon.co.uk> References: <22831.smithn@orvb.saic.com> Subject: Re: RAST2SGM Hi Norm, If you just wanted a normalized SGML file, I'm puzzled why you didn't just run SGMLSASP on your SGMLS output. This would have handled all the attributes as well. The drawback of using SGMLSASP is that the SGMLS doc doesn't describe the ASP replacement file format. 
If you don't have any doc or papers on ASP you're a bit stuck. For those that don't, following this is a description of the necessary replacement file format. It can be laborious to create this replacement file manually, and in the past I've normally knocked up some quick awk scripts to create a default replacement file which just normalises the input file (i.e., outputting all start tags with attributes and all end tags). Thinking about it now, it might be useful to make this a bit more generalised and make it available as an exe file so that anyone can then use SGMLSASP as a simple normalizer. If there's any interest in this I'll have a go at it over the weekend and make it available via the ftp sites. ASP REPLACEMENT FILE FORMAT The ASP replacement file is a simple mapping file between the start and end tags found in the input file and a replacement string, such as \<para> "\<para>" \</para> "\</para>" Tags that are not defined in the replacement file are mapped to the empty string and do not appear in the output. You can also use the normal C escape mechanism to include other characters, such as \<para> "\<para>\\n" Attribute values are accessed by specifying their name in square brackets, such as \<para> "\<para id=\\"[id]\\">" And that's about all there is to it. A sample from a replacement file I've used in the past is \<fd> "\<fd>\\n" \</fd> "\</fd>\\n" \<fen> "\<fen lp=\\"[lp]\\">" \<fi> "\<fi>" \<fig> "\<fig>\\n" \<figblk> "\<figblk>\\n" \</figblk> "\</figblk>\\n" \<fl> "\<fl>" \<fr> "\<fr>" \<ftnote> "\<ftnote id=\\"[id]\\">\\n" \<ftntref> "\<ftntref ref=\\"[ref]\\">" Hope this is of some use. Best wishes, Bruce Hunter SGML Systems Engineering bruce@sgml.dircon.co.uk [Editor's note: This is in response to Norm Smith's message. 
] </message> <message id="<9310292003.AA14113@netcomsv.netcom.com>" date="2960913564"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 19:59:24 UT From: Chet Ensign \<DOCCOE@IBIVM.IBMMAIL.COM> Message-ID: <9310292003.AA14113@netcomsv.netcom.com> Subject: A way around RTF I'm posting this because of the recent discussions about converting RTF. We have played around with converting RTF files for a while, but we were always leery about putting the programs we developed into wholesale use. In particular, we feared that we might run into some unexpected situation where content was tossed out during the final RTF clean-up. Tim Benthall, the VP of my division and a real Word hacker, came up with a simple solution. He wrote a Word Basic macro that takes whatever style name is applied to the text and prints it at the start of the line in between < and >. When the Word file is saved as text only, these styles are saved out with it and we bypass RTF completely. The nugget Word Basic commands are: a$ = StyleName$() b$ = "<" + a$ + ">" Insert b$ The first command assigns the style name to variable a$. The second assigns \<stylename> to b$. The third inserts the contents of b$ into the text at the insertion point. I'm not going to say that this gives you anything close to a parsable instance. But, depending on how well styles have been used to format the Word file, it can give you some explicit tags to further process. I'm working on a template now that writers outside our dept. will use. It has an explicit set of style names, including stop-styles, and that, combined with this technique, will make the process of converting from Word into our markup much easier. /chet -- Chet Ensign Information Builders, Inc. 
212-736-6250 X4349 internet: doccoe@ibivm.ibmmail.com ibmmail: USUBUVMV@IBMMAIL compuserve: 73163,1414 </message> <message id="<19931029.150951.30@almaden.ibm.com>" date="2960920557"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 21:55:57 UT From: "Wayne L. 
Wohler" \<wohler@vnet.IBM.COM> Message-ID: <19931029.150951.30@almaden.ibm.com> References: \<SVATTIKU.93Oct27191137@future.sales.gba.nyu.edu> <19931028.070207.210@almaden.ibm.com> <1993Oct29.041308.27182@stats.govt.nz> Subject: Re: SGML and RTF [Wayne L. Wohler] | Have you considered NOT converting the information at all? I don't | know what you are planning to do with the information after you have it | in SGML form but given that you have an image and the RTF, you might | just leave it at that, at least for now. Use the image when you need | to present the information, use Word to modify just the equations | (admittedly this will require some filtering or other processing on the | way in and out of Word). SGML certainly has the capability for | referring to data encoded in other formats. [Gary Houston] | Then you would lose the obvious benefit of SGML for this application. | Ideally the Patent Office would simply declare their DTD for electronic | submissions, and let private enterprise supply the tools. A policy | such as "submissions may be made using the ST.32 DTD or in RTF with | embedded equations in Microsoft undocumented format" would be absurd. It is possible my remarks were misinterpreted ... the 'information' I was suggesting not be converted was the equation data, not all of the data found in RTF. While it is distasteful to leave the equation data in such a form (even supplemented with image), there is by no means a consensus regarding what the 'best' form for maths formats is and the selection of a format depends greatly on the specific conditions that the original article writer is confronted with. Leaving equations in a format for which tools exist to edit and print may be the 'best' solution for now. Stating my position more generally, it is not necessary to convert every aspect of incoming data to SGML to realize the benefits of using SGML in general. 
Surely you aren't saying that to have the equations from patents encoded in RTF and image negates these benefits. -- Wayne L. Wohler Internet: wohler@vnet.ibm.com Dept G82/910M IBMMAIL: USIB29WX@IBMMAIL Publishing Solutions Development Phone: 1-303-924-0470 IBM Corporation PO Box 1900 Boulder, Colorado 80301-9191 </message> <message id="<saucc00.751938273@roselin>" date="2960927073"> Newsgroups: comp.text.sgml Date: 29 Oct 1993 23:44:33 UT From: Christian Saucier \<saucc00@DMI.USherb.CA> Organization: Universite de Sherbrooke -- departement de Mathematiques et d'Informatique Message-ID: \<saucc00.751938273@roselin> References: \<saucc00.751080631@roselin> <9310210215.memo.71059@BIX.com> \<schwartz.751733066@lead17> <1993Oct27.230434.23747@news.cs.brandeis.edu> Subject: Re: SGML and TeX [John Lavagnino] | I don't mean to sound negative about this and other approaches that | have been aired here, but, for the benefit of those who perhaps are not | aware of the whole range of possibilities, I'd like to point out that | converting SGML to TeX or LaTeX is really quite a simple problem: Hmmm... I don't know if it would be that simple for a complex DTD like IBMIDDOC. The company I work for is currently using IBM BookMaster GML and is going towards IDDOC (a kind of SGMLized & updated version of BookMaster GML). This is quite a complex DTD with complex tables support, syntax diagrams & others... I was thinking that TeX would be a good formatter for this. Though I'm not sure if sgmlsasp would be powerful enough for such a process. Have any of you worked with Omnimark? It's supposed to be a powerful (commercial) product that can convert files from/to different formats. C. 
</message> <message id="<2atoou$lhv@figment.dircon.co.uk>" date="2960973377"> Newsgroups: comp.text.sgml Date: 30 Oct 1993 12:36:17 UT From: Bruce Hunter \<bruce@sgml.dircon.co.uk> Message-ID: <2atoou$lhv@figment.dircon.co.uk> References: \<SVATTIKU.93Oct27191137@future.sales.gba.nyu.edu> Subject: Re: SGML and RTF [Shreekant Vattikuti] | A problem that we have encountered is that Equations are created in | Word by a mini-application called Equation Editor. When the RTF format | is used, two representations of the equation are saved, since it is | treated like an embedded object. One is a picture format and the other | is the actual data used by the Equation Editor. We need to interpret | this information somehow so that we can tag it, but the problem is that | we do not have the specification for the Equation Editor format and the | picture format is only an image. Does anyone have a suggestion on how | to tackle this? I have been following the discussion on this thread, and I agree with Wayne's suggestion that it is not always necessary, or best, to convert *all* data into an SGML form. Equations cause problems, which have been discussed at length on this group previously, for which there is no simple or universal solution. But if you really do want to convert them into some SGML-coded form, and are willing to put in the effort required, I have a suggestion. The equation editor in Word is a cut-down version of MathType, from: Design Science Inc 4028 Broadway Long Beach Ca 90803 Tel (310) 433 0685 You will need to get hold of a copy of this. Replace the equation editor that comes with Word by MathType. Open up a document. When you come to an equation click on it to load it into MathType. Set the clipboard paste format to TeX. Paste the equation to the clipboard. Save the clipboard to a file. What you will have in this file is a plain TeX representation of the equation, preceded by a load of code which is specific to MathType. 
When MathType re-reads this equation it in fact reads this code and ignores the TeX. You now have two choices. You can use the TeX representation as input to a conversion program you'll need to create to turn this into the SGML-coded equation form of your choice. Or you could contact MathType and ask for a description of the format of their coding scheme and work on that. In a project I worked on earlier this year I investigated the latter, as a customer was looking for a bi-directional conversion between equations from MathType and the ArborText equation DTD fragment. The people at MathType were generally helpful and did offer to make available a description of their coding scheme. However, my customer dropped the idea so I didn't pursue it any more. The MathType people did say that they were looking closely at SGML, but were waiting for some consensus on what was the preferred DTD fragment to use (I fear they'll be waiting for some time!). So something can be done if you want, but it will involve a lot of effort. Best wishes, -- Bruce Hunter SGML Systems Engineering bruce@sgml.dircon.co.uk </message> <message id="<CFpsAK.IM0@ucdavis.edu>" date="2960980220"> Newsgroups: comp.text.sgml Date: 30 Oct 1993 14:30:20 UT From: "Greg Shenaut" \<fzshenau@dale.ucdavis.edu> Organization: University of California, Davis Message-ID: \<CFpsAK.IM0@ucdavis.edu> Subject: SGML FAQ sought Is there a FAQ for this group? I would appreciate an ftp pointer. 
-- Greg Shenaut -- gkshenaut@ucdavis.edu </message> <message id="<19931030.123938.101@almaden.ibm.com>" date="2961001203"> Newsgroups: comp.text.sgml Date: 30 Oct 1993 20:20:03 UT From: "Eliot Kimber" \<drmacro@vnet.IBM.COM> Message-ID: <19931030.123938.101@almaden.ibm.com> References: \<saucc00.751080631@roselin> <9310210215.memo.71059@BIX.com> \<schwartz.751733066@lead17> \<saucc00.751938273@roselin> Subject: Re: SGML and TeX [John Lavagnino] | I don't mean to sound negative about this and other approaches that | have been aired here, but, for the benefit of those who perhaps are not | aware of the whole range of possibilities, I'd like to point out that | converting SGML to TeX or LaTeX is really quite a simple problem: [Christian Saucier] | Hmmm... I don't know if it would be that simple for a complex DTD like | IBMIDDOC. The company I work for is currently using IBM BookMaster GML | and is going towards IDDOC (a kind of SGMLised & updated version of | BookMaster GML). This is a quite complex DTD with complex tables | support, syntax diagrams & others... | | I was thinking that TeX would be a good formatter for this. Though I'm | not sure if sgmlasp would be powerful enough for such a process. There are many complex aspects of IBMIDDoc that cannot be handled by a simple straight-through sort of process. This is because IBMIDDoc was designed from the point of view that an SGML document is, ultimately, a data base from which data is retrieved for the purpose of creating the output. In a pure retrieval environment, you are not constrained by things like source ordering. IBMIDDoc was thus designed to be implemented using database retrieval technologies rather than more traditional straight translators. However, the IBMIDDoc language is also designed so that it can (within some limits) be used to represent traditional document structures and you could define a relatively simple processor that would work with a subset of valid IBMIDDoc documents. 
IBMIDDoc is certainly not an isolated case -- I'm starting to see more and more DTDs that are taking a more database-oriented approach, using SGML to express real-world structures and relationships rather than composition system data structures. I'm just about to "complete" the first phase of my IBMIDDoc to BookMaster GML transform. I'm up to about 1300 lines of REXX written on top of the IBM SGML Translator. My guess is that it would be about the same volume of OmniMark code. (If anyone is ever interested in implementing an OmniMark IBMIDDoc/InfoMaster Architecture processor, I'd be happy to share my algorithms.) Note that the IBMIDDoc language, in the abstract, has nothing to say about presentation style, so it makes no inherent demands in terms of composition function, which means that TeX is certainly an appropriate tool to use for rendering printed instances of IBMIDDoc documents--it's just that you probably wouldn't want to write the database processing function in TeX as well (not that TeX isn't up to it; for example, IBMIDDoc's bibliography list generation function could probably be mapped onto BibTeX without too much trouble). -- Eliot Kimber Internet: drmacro@vnet.ibm.com Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL Network Programs Information Development Phone: 1-919-254-5160 IBM Corporation Research Triangle Park, NC 27709 "But Ranger Doug, can't we just use some proprietary data format instead of this SGML stuff?" "Sure Slim, that would be the easy way, but it wouldn't be the Cowboy Way." </message> <message id="<2auihc$5n0@bbs.pnl.gov>" date="2961001452"> Newsgroups: comp.infosystems.www,comp.text.sgml Date: 30 Oct 1993 20:24:12 UT From: "David E Bernholdt" \<gg502@fermi.pnl.gov> Organization: Battelle - Pacific Northwest Laboratories Message-ID: <2auihc$5n0@bbs.pnl.gov> Subject: Program output --> hypertext? I work in a field where we often run long calculations which produce a great deal of detailed output. 
As we run larger and larger problems, it becomes increasingly unwieldy to browse the output, or to locate specific items of data. It occurs to me that navigation through these outputs might be facilitated by setting it up as a hypertext document and producing a table of contents or index linking to appropriate sections of the output. I would like to talk to anyone who has tried things along this line who is willing to share their experiences, or provide pointers to literature on the subject. Thanks in advance. -- David E. Bernholdt, MSIN K1-90 | Email: de_bernholdt@fermi.pnl.gov Molecular Science Research Center | Phone: 509 375 4387 Pacific Northwest Laboratory, P.O.B. 999 | Fax: 509 375 6631 Richland, WA 99352 | I speak only for myself! </message> <message id="<19931031.001@sfo.naggum.no>" date="2961103472"> Newsgroups: comp.text.sgml Date: 01 Nov 1993 00:44:32 UT From: Erik Naggum \<erik@naggum.no> Message-ID: <19931031.001@sfo.naggum.no> Subject: About comp.text.sgml This message contains some information about the newsgroup, and assorted services related to the newsgroup. If you are about to ask _the_ most frequently asked question on this newsgroup, "where is the FAQ?", you will find the answer below. It would be helpful if people could give some indication of what they are looking for when they ask this question. The newsgroup's topic is "ISO 8879 SGML, structured documents, markup languages," as it was defined when the newsgroup was created by Edward Vielmetti in September of 1990. Ed sent out the Call for Votes on 1990-08-02, but I have lost track of the actual creation date; it was probably the first week of September, 1990. The earliest trace of e-mail about comp.text.sgml that I have seen dates back to 1990-01-20, so if we let history begin with the year 1990, we are not too far off. 
The Standard Generalized Markup Language (SGML) itself was created by Charles Goldfarb, and through an excruciatingly slow and painful process known as "standardization", the specification for the language was published by the International Organization for Standardization, ISO, in 1986 as ISO 8879:1986, full title "Information processing -- Standard Generalized Markup Language (SGML)". A few fixes to the specification were published in 1988, called Amendment 1, ISO 8879/A1:1988. Charles Goldfarb's work dates back to the late 1960's, with the very successful language GML, for Generalized Markup Language, or Goldfarb, Mosher, and Lorie. More on the history of the language can be found in the article "A Brief History of the Development of SGML", available in the SGML Handbook, the reference for which is near the end of this message. In the first few months of its existence, sufficiently interesting material had been posted to the newsgroup that I wanted an archive of the newsgroup. This was at the time when I was trying to figure out what the "Entity end" really meant, so if anyone can attest to the usefulness of USENET as a means of learning and open communication between interested people, it is me. The first article in the archive is dated 1991-03-20, although almost 300 articles had appeared in the newsgroup before that time. Systematic archiving did not start until early April 1991, but from then on, every article that has appeared on the newsgroup, with the exception of canceled articles, has been archived. The archive is found at FTP.IFI.UIO.NO in /pub/SGML/comp.text.sgml. There are two trees: by.date and by.msgid, so that retrieval can be by these two keys. The first article is "by.date/1991/03/20/114250.Lennestal", also known as "by.msgid/3635@lulea.telesoft.se". The time format in the by.date tree should be obvious. In addition to anonymous FTP, the archive can be accessed through WAIS source "comp.text.sgml.src". 
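As an illustration only (this is not the archive's own tooling), the two retrieval keys described above could be turned into paths roughly as follows. The sketch assumes that "114250" in the one example given is the time as HHMMSS, and that the suffix ("Lennestal") is a label such as the poster's surname; neither is stated explicitly. The function names `by_date_path` and `by_msgid_path` are hypothetical.

```python
from datetime import datetime


def by_date_path(posted: datetime, label: str) -> str:
    """Build a by.date path: year/month/day directories, then HHMMSS.label.

    Assumption: the six digits are HHMMSS and `label` is a per-article
    suffix (the surname in the single example from the text).
    """
    return f"by.date/{posted:%Y/%m/%d}/{posted:%H%M%S}.{label}"


def by_msgid_path(message_id: str) -> str:
    """Build a by.msgid path from a Message-ID, dropping the angle brackets."""
    return "by.msgid/" + message_id.strip().strip("<>")


# The first archived article, as named in the text.
print(by_date_path(datetime(1991, 3, 20, 11, 42, 50), "Lennestal"))
print(by_msgid_path("<3635@lulea.telesoft.se>"))
```

Either path would then be appended to /pub/SGML/comp.text.sgml/ on the FTP host named above.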
The SGML archive contains more than the newsgroup archive. At the time of writing, it contains the ARC SGML parser by Charles Goldfarb in the directory "ARC-SGML", the Amsterdam SGML Parser in "ASP-SGML", some information about the Computer-assisted Acquisition and Logistics Support standards from the DoD in "CALS", a list of ISO 10646-1:1993 names and character numbers for use with SGML systems in "CHARSET", proceedings of the Davenport group in "DAVENPORT", various user-supplied programs in "DEMO", some public document type definitions in "DTD", some public entity sets in "ENTITIES", a draft set of answers to Frequently Asked Questions in the file "FAQ.0.0", the FORMAT package that handles SGML-to-LaTeX mapping in the directory "FORMAT", some information about the now approved ISO standard ISO 10744 Hypermedia/Time-based Structuring Language in "HyTime", the distribution images of the Interactive Authoring and Display System in "IADS", the Integrated Chameleon Architecture version 1.5 in "ICA", an enhanced version by James Clark of the ARC SGML parser in "SGMLS", some information about the SGML Users' Group Special Interest Group on Hypertext and Multimedia in "SIGhyper", archives of the Text Encoding Initiative in "TEI", Robin Cover's excellent SGML bibliography in the file "bibliography" and some useful Emacs-lisp functions in the directory "elisp". A recent addition is the file "standards" which contains a comprehensive list of ISO standards that users of SGML might find interesting, compiled by Heather Davenport (no relation to the Davenport group). The SGML archive is liberal in what it accepts. Information on how to put things in the archive can be obtained by writing to the archive maintainer at \<SGML.archive@ifi.uio.no>. The newsgroup archive is updated manually, after articles have been edited to obtain a consistent style of presentation. Excessive quoted text (e.g., signatures and cascades) is trimmed, and the article is reformatted. 
Spelling is corrected and capitalization is applied where missing or wrong, but no other changes are made. The edited articles are also available on a mailing list that gives people who are not able to receive or send USENET news access to the newsgroup. The mailing list service consists of two parts: a news-to-mail service that distributes archived articles, and a mail-to-news service that allows non-USENET users to post articles. The mail-to-news service has _one_ restriction on posting: if the article quotes something that has not appeared on the newsgroup, the article will not be posted immediately, but held for a few days to wait for the quoted article to appear. If this does not happen, the article is returned to the author to request that the appropriate copyright and/or "reprinted by permission" note be attached. USENET effectively removes your copyright to your articles, unless you take great pains to hang on to it, so text that is already copyrighted by someone else, or, worse, is private communication, must not be distributed. Other than this rule, everything will be posted, although I do reserve the right to discourage posting. Insisting that it be posted works, though. Since you read this article either on the mailing list or in the newsgroup, keep the following for friends who ask for information: To subscribe, send a note to \<comp-text-sgml-request@naggum.no>. You will be added to the mailing list if I can talk to a system that is willing to accept mail for you and which does not send error messages to wrong places. (This is not only embarrassing for me, but I won't know of the problem before several such messages have been sent.) To post an article, send it to \<comp-text-sgml@news.naggum.no>. If you reply to an article you have received from the mailing list, this will be automatic. The only thing I ask you to do is to keep the article number that appears in the Subject header. 
If you don't, it will take me more time to post the article, and this may cause delays. There are other FTP sites and possibly mirrors of my archive. I have not made any attempt to list them, since the information I have is at least six months old and may no longer be valid. Instead, I ask anybody who has an archive or FTP site to send me a note to that effect, so it can be included in the next revision of this message. If you are looking for the definition of SGML, there is only one place to go at present: Charles F. Goldfarb: The SGML Handbook; Oxford University Press, 1990. ISBN 0-19-853737-9. This book will set you back almost $100, but will also answer all your questions about the language itself. The rest may be in the archive. If you have Internet access, you are well advised to search the archive. Several issues have been discussed a few times over, and in the absence of any strong reason to want to stop further discussion, the archive material should be regarded as background material rather than authoritative answers, even though the authoritative answer may be found therein. No attempt has been made to point those articles out. A word of encouragement as well as caution: Everybody who is something in the SGML world, in particular the creator of the language and the ISO working group that defined SGML, as well as several vendors of outstanding SGML products, is listening to what you say in this newsgroup. If you have problems with the language or an implementation of it in a product, or with implementing it, and particularly if you have a suggestion for a solution to some of them, you may find that your article is quoted at ISO meetings and taken very seriously. Know that your voice is heard, and that your input is very much appreciated. In fact, the archive for this newsgroup is a source of information for the review process now taking place in the ISO working group. As of early October, the estimated readership of this newsgroup was 32,000 people. 
79% of the USENET community receives the newsgroup on their system, and can read it if they want to. In the preceding month, 191 messages comprising 441K were posted to the newsgroup. Only 14% of the articles are crossposted to other newsgroups, which indicates a continued specific need for the newsgroup, although only 1% of the total USENET readership reads it. Through a probable coincidence, equally many readers subscribe to news.sysadmin (the guys who run this show), and slightly fewer read comp.mail.mime and comp.software.testing. I regard this as saying that our 1% of the population may be the 1% most interesting people on the Net. This newsgroup certainly has been known for the highest "signal to noise ratio" on USENET over its three years of existence, and this can only continue with the support of readers like you. Best regards, \</Erik> -- Erik Naggum \<erik@naggum.no> \<SGML@netcom.com> ISO 8879 SGML Chairman, SGML SIGhyper \<SGML.SIGhyper@naggum.no> ISO 10744 HyTime "Memento, terrigena. Memento, vita brevis." ISO 10646 UCS </message> <message id="<19931031.002@sfo.naggum.no>" date="2961118334"> Newsgroups: comp.text.sgml Date: 01 Nov 1993 04:52:14 UT From: Erik Naggum \<erik@naggum.no> Message-ID: <19931031.002@sfo.naggum.no> Subject: SGML and RELATED ISO STANDARDS The following list also exists as ftp.ifi.uio.no:/pub/SGML/standards. This file will be continually revised, and your assistance will be appreciated. Write to Heather Davenport \<davenpth@hmco.com> or me. My deepest thanks to Heather for doing this wonderful job. Best regards, \</Erik> -- Erik Naggum \<erik@naggum.no> \<SGML@netcom.com> ISO 8879 SGML Chairman, SGML SIGhyper \<SGML.SIGhyper@naggum.no> ISO 10744 HyTime "Memento, terrigena. Memento, vita brevis." 
ISO 10646 UCS =========================================================================== SGML and RELATED ISO STANDARDS 1993-10-29 Compiled by Heather Davenport \<davenpth@hmco.com> Edited by Erik Naggum \<erik@naggum.no> Introduction The following document contains a list of standards that relate to SGML, and others which may be pertinent to the application of SGML on a system. This list was compiled for personal purposes, but is posted here in hopes that it might be of use to others in the group. Errors and suggestions for entries should be reported to Heather Davenport at the above address. The prices quoted are those reported by ANSI over a period of time, and are meant to be an indication rather than an official quote from ANSI. Recently, ANSI has changed their pricing method and the standards' costs are no longer based on their page count. Therefore, the older "Page Count --> Cost" chart has not been included here. For current prices consult: American National Standards Institute 11 West 42nd St. New York, NY 10036 USA PH: 212-642-4900 FX: 212-302-1286 Acknowledgements and Sources of Information American National Standards Institute J. Smith: _SGML and Related Standards_ C. F. 
Goldfarb: _The SGML Handbook_ comp.std.internat FAQ Erik Naggum James Mason Kosta Kostis Michael Sperberg-McQueen Harry Gaylord Alan Melby Graphic Communications Association Global Engineering Documents Abbreviations ISO International Organization for Standardization IEC International Electrotechnical Commission IS International Standard TR Technical Report DIS Draft International Standard DTR Draft Technical Report CD Committee Draft PDTR Proposed Draft Technical Report DP Draft Proposal AD Addendum AM Amendment DAM Draft Amendment PDAM Proposed Draft Amendment (CD equiv for Amendments) COR Technical Corrigendum MP Multipart standard (indicates common information for all parts) NA not available Conventions A multipart standard is listed with the common information for all parts given first, with an "(MP)" identification after the reference number. Subsequent parts are listed with the title beginning with "Part", and the format of the reference number is nnnn-pp:yyyy, for standard nnnn, part pp, year yyyy. Singlepart standards use the reference number format nnnn:yyyy. Reference number Title or Part Title Edition Pages $US price-quote-date --------------------------------------------------------------------------- ISO 639 (MP) Code for the Representation of Names of Languages ISO 639:1988 Code for the Representation of Names of Languages Bilingual edition Ed. 1 17p. $46.00 1993-09-01 ISO CD 639-2 Part 2: Alpha-3 Code Ed. 1 (curr. 46 p.) NA 1993-04-30 \ ISO/IEC 646:1991 Information Technology -- ISO 7-bit Coded Character Set for Information Interchange Ed. 3 15p. $43.00 1993-09-01 ISO 2022:1986 Information Processing -- ISO 7-bit and 8-bit Coded Character Sets -- Code Extension Techniques Ed. 3 25p. $57.00 1993-09-01 ISO 3166:1988 Codes for the Representation of Names of Countries Ed. 3 54p. $84.00 1993-10-12 ISO/IEC 4873:1991 Information Technology -- ISO 8-bit Code for Information Interchange -- Structure and Rules for Implementation Ed. 2 19p. 
$51.00 1993-10-12 ISO 6937 (MP) Information Processing -- Coded Character Sets for Text Communication ISO 6937-1:1983 Part 1: General Introduction Ed. 1 12p. $35.00 1993-10-12 ISO 6937-2:1983 Part 2: Latin Alphabetic and Non-Alphabetic Graphic Characters Ed. 1 37p. $72.00 1993-10-12 AD 1:1989 Ed. 1 5p. $28.00 1993-10-12 ?? DIS 8211 \ \ ISO 8613-4:1989 Part 4: Document Profile Ed. ? ??p. $51.00 1993-10-12 ISO 8613-5:1989 Part 5: Office Document Interchange Format (ODIF) Ed. ? ??p. $105.00 1993-10-12 ISO 8613-6:1989 Part 6: Character Content Architectures Ed. ? ??p. $105.00 1993-10-12 ISO 8613-7:1989 Part 7: Raster Graphics Content Architectures Ed. ? ??p. $78.00 1993-10-12 ISO 8613-8:1989 Part 8: Geometric Graphics Content Architectures Ed. ? ??p. $84.00 1993-10-12 \ ISO/IEC 8613-10:1991 Part 10: Formal Specifications Ed. ? ??p. $95.00 1993-10-12 AM 1:1991 -- Formal Specification of the Document Profile Ed. ? ??p. $71.00 1993-10-12 AM 2:1991 -- Formal Specification of the Raster Graphics Content Architectures Ed. ? ??p. $46.00 1993-10-12 AM 3:1992 -- Formal Specification of the Character Content Architectures Ed. ? ??p. $69.00 1993-10-12 AM 4:1992 -- Formal Specification of the Geometric Graphics Content Architectures Ed. ? ??p. $62.00 1993-10-12 AM 5:1993 -- Formal Specification of the Defaulting Mechanism for Defaultable Attributes Ed. ? ??p. $95.00 1993-10-12 ISO/IEC 8632 (MP) Information Processing Systems -- Computer Graphics -- Metafile for the Storage and Transfer of Picture Description Information ISO/IEC 8632-1:1992 Part 1: Functional Specification Ed. 2 332p. $175.00 1993-10-12 DAM 1:1993 Ed. 1 93p. $116.00 1993-10-13 PDAM 2:1993 Ed. 1 ??p. $49.00 1993-10-13 DAM 3:1993 Ed. 1 ??p. $109.00 1993-10-13 ISO/IEC 8632-2:1992 Part 2: Character Encoding Ed. 2 88p. $105.00 1993-10-12 DAM 1:1993 Ed. 1 11p. $22.00 1993-10-13 PDAM 2:1993 Ed. 1 ??p. $22.00 1993-10-13 DAM 3:1993 Ed. 1 ??p. $52.00 1993-10-13 ISO/IEC 8632-3:1992 Part 3: Binary Encoding Ed. 2 71p. 
$98.00 1993-10-12 DAM 1:1993 Ed. 1 10p. $22.00 1993-10-13 PDAM 2:1993 Ed. 1 ??p. $22.00 1993-10-13 DAM 3:1993 Ed. 1 ??p. $46.00 1993-10-13 ISO/IEC 8632-4:1992 Part 4: Clear Text Encoding Ed. 2 56p. $84.00 1993-10-12 DAM 1:1993 Ed. 1 9p. $22.00 1993-10-13 PDAM 2:1993 Ed. 1 ??p. $18.00 1993-10-13 DAM 3:1993 Ed. 1 ??p. $39.00 1993-10-13 ISO 8859 (MP) Information Processing -- 8-bit Single Byte Coded Graphic Character Sets \ ISO 8859-1:1987 Part 1: Latin Alphabet No. 1 Ed. 1 7p. $31.00 1993-10-12 \ ISO 8859-2:1987 Part 2: Latin Alphabet No. 2 Ed. 1 6p. $28.00 1993-10-12 \ ISO 8859-3:1988 Part 3: Latin Alphabet No. 3 Ed. 1 5p. $28.00 1993-10-12 \ \ ISO 8859-4:1988 Part 4: Latin Alphabet No. 4 Ed. 1 5p. $28.00 1993-10-12 \ ISO/IEC 8859-5:1988 Part 5: Latin/Cyrillic Alphabet Ed. 1 5p. $28.00 1993-10-12 ISO 8859-6:1987 Part 6: Latin/Arabic Alphabet Ed. 1 5p. $28.00 1993-10-12 ISO 8859-7:1987 Part 7: Latin/Greek Alphabet Ed. 1 5p. $28.00 1993-10-12 ISO 8859-8:1988 Part 8: Latin/Hebrew Alphabet Ed. 1 5p. $28.00 1993-10-12 ISO/IEC 8859-9:1989 Part 9: Latin Alphabet No. 5 Ed. 1 5p. $28.00 1993-10-12 \ ISO/IEC 8859-10:1992 Part 10: Latin Alphabet No. 6 Ed. 1 15p. $39.00 1993-10-12 \ ISO 8879:1986 Information Processing -- Text and Office Systems -- Standard Generalized Markup Language (SGML) Ed. 1 155p. $128.00 1993-10-12 AM 1:1988 Ed. 1 15p. $43.00 1993-10-12 ISO 9069:1988 Information Processing -- SGML Support Facilities -- SGML Document Interchange Format (SDIF) Ed. 1 8p. $31.00 1993-10-12 ISO/IEC 9070:1991 Information Technology -- SGML Support Facilities -- Registration Procedures for Public Text Owner Identifiers Ed. 2 12p. $36.00 1993-10-12 PDAM 1:1993 Ed. 1 ??p. NA 1993-04-30 ISO/IEC 9281 (MP) Information Technology -- Picture Coding Methods ISO/IEC 9281-1:1990 Part 1: Identification Ed. 1 8p. $28.00 1993-10-12 ISO/IEC 9281-2:1990 Part 2: Procedure for Registration Ed. 1 4p. 
$22.00 1993-10-12 ISO/IEC 9282 (MP) Information Processing -- Coded Representation of Pictures ISO/IEC 9282-1:1988 Part 1: Encoding Principles for Picture Representation in a 7-bit or 8-bit Environment Ed. 1 23p. $48.00 1993-10-12 ISO/IEC 9282-2:1992 Part 2: Incremental Encoding of Point Lists in a 7-bit or 8-bit Environment Ed. 1 23p. $48.00 1993-10-12 ISO/IEC 9541 (MP) Information Technology -- Font Information Interchange ISO/IEC 9541-1:1991 Part 1: Architecture Ed. ? ??p. $105.00 1993-10-12 COR 1:1992 \ ISO/IEC 9541-2:1991 Part 2: Interchange Format Ed. ? ??p. $61.00 1993-10-12 ISO/IEC 9541-3:1992 Part 3: Glyph-Shape Representations Ed. 1 ??p. NA 1993-10-15 \ ISO/IEC DIS 9541-4 Part 4: Application Specific Extensions Ed. ? ??p. $22.00 1993-10-15 ISO TR 9544:1988 Information Processing -- Computer-Assisted Publishing -- Vocabulary Ed. ? ??p. $66.00 1993-04-30 \ ISO/IEC TR 9573 (MP) Information Processing -- SGML Support Facilities -- Techniques for Using SGML ISO/IEC TR 9573:1988 Information Processing -- SGML Support Facilities -- Techniques for Using SGML Ed. 1 124p. $120.00 1993-10-12 ISO/IEC TR 9573-11: Part 11: Application at ISO Central Secretariat for 1992 International Standards and Technical Reports Ed. 1 73p. $79.00 1993-10-12 ISO/IEC TR 9573-13: Part 13: Public Entity Sets for Mathematics and 1991 Science Ed. 1 82p. $87.00 1993-10-12 ISO/IEC PDTR 9573-15: Part 15: Public Entities for Non-Latin Based Alphabets 1993 Ed. 1 ??p. NA 1993-10-29 ISO/IEC 10036:1993 Information Technology -- Font Information Interchange -- Procedure for Registration of Glyph and Glyph Collection Identifiers Ed. 1 13p. $36.00 1993-10-12 ISO/IEC TR 10037:1991 Information Technology -- SGML and Text-entry Systems -- Guidelines for SGML Syntax-Directed Editing Systems Ed. 1 11p. 
$36.00 1993-10-12 ISO/IEC 10175 (MP) Information Technology -- Text and Office Systems -- Document Printing Application (DPA) [for the Representation of Print Services Incorporating an SPDL Presentation Process] ISO/IEC DIS 10175-1 Part 1: Abstract Service Definition and Procedures Ed. 1 ??p. $168.00 1993-10-12 ISO/IEC DIS 10175-2 Part 2: Protocol Specification Ed. 1 ??p. $39.00 1993-10-12 ISO DIS 10179 Information Technology -- Text and Office Systems -- Document Style Semantics and Specification Language (DSSSL) \ Ed. 1 ??p. $109.00 1993-10-29 ISO/IEC DIS 10180 Information Technology -- Text Composition -- Standard Page Description Language (SPDL) \ Ed. 1 ??p. $200.00 1993-10-29 ISO/IEC TR 10183 (MP) Information Technology -- Text and Office Systems -- Office Document Architecture and Interchange Format -- Testing Methodology and Abstract Test Cases ISO/IEC DTR 10183-1 Part 1: Implementation Testing Methodology Ed. 1 ??p. $24.00 1993-04-30 ISO/IEC PDTR 10183-2 Part 2: ?? Ed. 1 ??p. $54.00 1993-04-30 ISO/IEC 10646 (MP) Information technology -- Universal Multiple-Octet Coded Character Set (UCS) ISO/IEC 10646-1:1993 Part 1: Architecture and Basic Multilingual Plane Ed. 1 754p. $325.00 1993-08-09 \ ISO/IEC CD 10743:1991 Information Technology -- Standard Music Description Language (SMDL) Ed. 1 ??p. $69.00 1993-04-30 ISO/IEC 10744:1992 Information Technology -- Hypermedia/Time-based Structuring Language (HyTime) Ed. 1 125p. $120.00 1993-10-12 ISO/IEC DIS 10918 (MP) Information Technology -- Digital Compression and Coding of Continuous-Tone Still Images ISO/IEC DIS 10918-1 Part 1: Requirements and Guidelines Ed. 1 ??p. $125.00 1993-10-12 ISO/IEC CD 10918-2 Part 2: Compliance Testing Ed. 1 ??p. $82.00 1993-10-12 ISO/IEC 11172 (MP) Information Technology -- Coding of Moving Pictures and Associated Audio for Digital Storage Media at Up To About 1.5 MBits/S ISO/IEC 11172-1:1993 Part 1: Systems Ed. ? ??p. $76.00 1993-10-12 ISO/IEC 11172-2:1993 Part 2: Video Ed. ? ??p. 
$102.00 1993-10-12 ISO/IEC 11172-3:1993 Part 3: Audio Ed. ? ??p. $109.00 1993-10-12 ISO/IEC DIS 11544 Information Technology -- Coded Representation of Picture and Audio Information -- Progressive Bi-Level Image Compression Ed. 1 ??p. $89.00 1993-10-12 ISO PDTR 11585:1992 Information Technology -- Text and Office Systems -- Operational Model for Document Description and Processing Languages Ed. 1 ??p. NA 1993-04-30 ISO/IEC DIS 13673:1993 Information Technology -- Text and Office Systems -- Conformance Testing for Standard Generalized Markup Language (SGML) Systems Ed. 1 ??p. ?? 1993-08-18 \ </message> <message id="<2b0mlo$1qa@figment.dircon.co.uk>" date="2961157624"> Newsgroups: comp.text.sgml Date: 01 Nov 1993 15:47:04 UT From: "Bruce Hunter" \<bruce@sgml.dircon.co.uk> Organization: SGML Systems Engineering Message-ID: <2b0mlo$1qa@figment.dircon.co.uk> Subject: SGML tools for Folio Views 3.0 I have uploaded two files to ftp.ifi.uio.no in /pub/SGML/DEMO. These are fv3sgml.zip and readme.fv3. The text of readme.fv3 is included below. These are public domain tools for anyone to try. Any comments or suggestions gratefully received. Best wishes, Bruce Hunter SGML Systems Engineering bruce@sgml.dircon.co.uk \ These tools (with the noted exception of SGMLS - a superb product courtesy of James Clark) have been produced by SGML Systems Engineering to aid in the processes of importing SGML documents into Folio Views 3.0 (FV) and exporting documents from existing infobases as valid SGML instances. These tools are the first in a line of what will hopefully evolve into a suite of tools to provide generalised SGML import/export facilities to/from FV. They should not be regarded at present as offering such generalised functionality, as this was not their design aim. They are offered at present as public domain tools. There are no licence terms or requirements for the tools produced by SGML Systems Engineering. 
Basically, you may do with them what you want, but I accept no responsibility for anything! The file LICENCE contains the licence conditions for SGMLS. There is no guarantee that future tools will also be offered in the public domain, or, indeed, that they will ever actually materialise.

FILELIST

When you unzip the file fv3sgml.zip you should find the following files:

flatfile.dtd - a dtd expressing the semantics of a FV 3.0 flatfile
views30.ent - entity declarations
readme - this file
concepts - explains a little of the concepts behind these tools
instal - installation instructions and example usage
licence - licence conditions for SGMLS
sgmls.exe - DOS executable of SGMLS 1.1 (public domain SGML parser)
sgmls.doc - documentation file for SGMLS 1.1
fff2sgml.exe - produces an SGML file conforming to the flatfile DTD from a FV 3.0 flatfile
fff2sgml.doc - documentation file for fff2sgml
cleansgm.exe - cleans up SGML files produced by fff2sgml
cleansgm.doc - documentation file for cleansgm
sgml2fff.exe - produces a FV 3.0 flatfile from an SGML file conforming to the flatfile DTD
sgml2fff.doc - documentation file for sgml2fff

CURRENT STATUS

These tools are very much, at present, in the testing stage. They have been tested exhaustively on the infobases I have been able to obtain, and perform correctly on these. However, because of the seeming variability between the Folio documentation and what may actually be contained in a flatfile, no guarantees are offered. It is recommended that all data files are first backed-up and archived, and that the tools be run at present only on a standalone PC. 
Because of the very basic error trapping at present, and the extensive use of pointers, if anything unforeseen is encountered in a flatfile, the tools fff2sgml, cleansgm and sgml2fff may do one of four things:

a) if the problem is correctable, simply report it in the error file and continue
b) stop with a FATAL ERROR message
c) loop infinitely (it is recommended that the line counter is left on so that this condition may be detected by seeing the line counter stop)
d) hang the machine or, in some really terminal cases, reboot the machine

The next release will contain more extensive error-handling capabilities which should alleviate the worst of these problems. How extensive these are will depend a lot upon the feedback obtained from people trying these current tools. So please, whatever your experiences with these, let me know. All bugs, queries or suggestions (all feedback appreciated) should be directed to Bruce Hunter via either Internet (bruce@sgml.dircon.co.uk) or Compuserve (100117,1357). </message> <message id="<1993Oct31.204028.7402@stats.govt.nz>" date="2961175228"> Newsgroups: comp.text.sgml Date: 01 Nov 1993 20:40:28 UT From: "Gary Houston" \<ghouston@stats.govt.nz> Organization: Statistics New Zealand Message-ID: <1993Oct31.204028.7402@stats.govt.nz> References: \<saucc00.751080631@roselin> \<schwartz.751733066@lead17> <1993Oct27.230434.23747@news.cs.brandeis.edu> \<CFM2Au.39n@dove.nist.gov> Subject: Re: SGML and TeX [Bob Bagwill] | Gary Houston's \<ghouston@stats.govt.nz> gf-0.39, which uses sgmls | version 1.1, is pretty nice: Be careful however, it remains flawed. For example, my definition of DTD is still incorrect. I think it should be:

SGML application = document type definition = DTD
DTD = formal document type declaration + semantic and/or processing specifications.

So to refer to a DTD and its documentation is strictly wrong (sigh). p.s. 
I suggest looking at the following for the SGML->TeX problem:

C
C++
perl
Tcl
100 lisp dialects
awk
yacc
TeX
text mapping, e.g., ASP
specialized / proprietary languages

Choose the language you prefer. About the only things that haven't (I think) been tried yet are Fortran and COBOL. Perhaps the DSSSL will be useful for building converters, or is it for some other purpose entirely? In any case converting SGML to TeX will always be a complex problem, for non-trivial DTDs.
</message> <message id="<1993Oct31.211232.7644@stats.govt.nz>" date="2961177152"> Newsgroups: comp.text.sgml Date: 01 Nov 1993 21:12:32 UT From: "Gary Houston" \<ghouston@stats.govt.nz> Organization: Statistics New Zealand Message-ID: <1993Oct31.211232.7644@stats.govt.nz> References: \<saucc00.751080631@roselin> <1993Oct27.230434.23747@news.cs.brandeis.edu> \<CFM2Au.39n@dove.nist.gov> <1993Oct31.204028.7402@stats.govt.nz> Subject: Re: SGML and TeX

[Gary Houston]
| C
| C++
| perl
| Tcl
| 100 lisp dialects
| awk
| yacc
| TeX
| text mapping, e.g., ASP
| specialized / proprietary languages

Oops, forgot REXX of course. I know REXX, it runs on the CMS system here. I haven't tried to write a REXX TeX converter however (nor output to DCF).

Gary
</message> <message id="<HT.93Oct31230434@barclay.cogsci.ed.ac.uk>" date="2961183874"> Newsgroups: comp.text.sgml Date: 01 Nov 1993 23:04:34 UT From: "Henry S. Thompson" \<ht@cogsci.ed.ac.uk> Organization: HCRC, University of Edinburgh Message-ID: \<HT.93Oct31230434@barclay.cogsci.ed.ac.uk> References: <22831.smithn@orvb.saic.com> Subject: Re: RAST2SGM

I'll join the throng -- I've taken a different line, taking the PERL script schema sgml.pl included in the sgmls distribution and instantiating it as a normaliser.

Plus points: you can declare various things to control attribute value printing (e.g., default values, presentation order); you can provide explicit control over where blank lines will appear; you can provide reverse general entity definitions.
Minus points: various features of sgmls output are not (yet) handled; all begin and end tags appear on separate lines.

If anyone is interested in developing this further and turning it into a more generally useful tool, let me know and I'll gladly provide you with the code as it now stands.

ht
--
Henry Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 31 650-4440
Fax: (44) 31 650-4587
ARPA: ht@cogsci.ed.ac.uk
JANET: ht@uk.ac.ed.cogsci
UUCP: ...!uunet!mcsun!uknet!cogsci!ht
</message>
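The normaliser idea in the message above (read the line-oriented output of sgmls and print the document back out with every tag and attribute explicit) can be sketched roughly as follows. This is not Henry Thompson's Perl code, which is not shown in the thread; it is a minimal Python illustration that assumes the usual sgmls output conventions (`(` start-tag, `)` end-tag, `A` attribute, `-` data) and ignores everything else. The names `normalise`, `unescape` and `reverse_entities` are hypothetical.

```python
def unescape(data):
    """Decode the two commonest sgmls data escapes: \\n and \\\\ (others are left as-is)."""
    out, i = [], 0
    while i < len(data):
        if data[i] == "\\" and i + 1 < len(data) and data[i + 1] in "n\\":
            out.append("\n" if data[i + 1] == "n" else "\\")
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


def normalise(esis_lines, reverse_entities=None):
    """Rebuild normalised SGML text from sgmls-style output lines.

    reverse_entities is an assumed, optional mapping from a character to the
    entity reference that should be printed in its place.
    """
    reverse_entities = reverse_entities or {}
    out = []
    pending = []  # attributes are reported before the start-tag they belong to
    for line in esis_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            parts = rest.split(" ", 2)
            if parts[1] != "IMPLIED":  # skip attributes with no value
                pending.append((parts[0], parts[2] if len(parts) > 2 else ""))
        elif code == "(":
            # Presentation order here is simply alphabetical by attribute name.
            attrs = "".join(f' {name}="{value}"' for name, value in sorted(pending))
            out.append(f"<{rest}{attrs}>")
            pending = []
        elif code == ")":
            out.append(f"</{rest}>")
        elif code == "-":
            text = unescape(rest)
            out.append("".join(reverse_entities.get(ch, ch) for ch in text))
        # Any other line type is ignored in this sketch.
    return "\n".join(out)


esis = ["ATYPE CDATA note", "AID IMPLIED", "(PARA", "-Hello\\nworld", ")PARA", "C"]
result = normalise(esis)
print(result)
```

Like the tool described in the message, this sketch puts every begin and end tag on its own line and leaves several features of sgmls output unhandled.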