Scholarly Technology Group

XML Validator Frequently Asked Questions List

This page lists frequently asked questions regarding STG's XML Validator. If you have a question, check to see if it is answered here before firing off e-mail to STG.

Your Validator is Broken, Why?

Because it's a beta test version. If you run into a problem, we'd actually be very grateful if you'd send us a bug report.

How Do I Report a Possible Bug?

First, check to be sure the "bug" isn't discussed below. If it isn't, create a short XML file (preferably standalone) that illustrates the problem; then mail it to us at STG. We'll get back to you. If you can't illustrate the problem with a single XML file, feel free to send us a .zip or .tar archive.

Why am I Getting So Many Error Messages?

Although it's difficult to answer this question without seeing the actual XML document that is being validated, experience has shown that this question most often arises when someone attempts to validate a document that lacks a document type definition (DTD). Any XML document that lacks a DTD is, by definition, invalid, and may trigger a cascade of error messages.

(Of course, the other typical reason that people get a lot of error messages is that the document being validated has a lot of errors.)
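
As a minimal sketch of what a document with a DTD looks like, the standalone file below carries its declarations in an internal subset. The element names (memo, to, body) are made up for illustration. A file of this shape is also convenient for bug reports, since it depends on no external files.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- The internal subset declares every element the document uses. -->
  <!ELEMENT memo (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>STG</to>
  <body>A short test message.</body>
</memo>
```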

See also the next two FAQs.

Why am I Getting So Many Warning Messages?

STG's validator follows the XML 1.0 specification pretty closely, providing a wide assortment of warnings about problems, both potential and actual, that most other validators ignore. Most of these messages have to do with XML - SGML compatibility and interoperability issues.

Here are some sample warning messages, with explanations of what they mean, and why you may (or may not) want to pay attention to them:

built-in entity not redeclared according to the spec
The entities &lt;, &gt;, &quot;, &apos;, and &amp; are built-in; that is, they are predefined by the XML parser. If you declare them yourself, you need to be very careful (see the 1.0 spec, section 4.6). Typically it's better not to bother with them, unless you are using a lot of legacy SGML software.
discarding apparent old-style SGML comment
You're forgetting that this is XML. Comments can't be stuck inside just any markup. You must place them inside special comment delimiters, <!-- and -->.
element has more than one attlist declaration
For interoperability with SGML software, an XML processor may issue a warning when more than one attlist declaration is provided for a single element type, or more than one attribute definition is provided for a given attribute. Ignore this warning if interoperability with SGML is not a concern. Otherwise, if possible, try to gather your ATTLIST declarations together into a single declaration.
empty-tag syntax used for element not declared with EMPTY content model
To facilitate interoperability with SGML software, the XML 1.0 specification says that elements using the special XML empty-element syntax (e.g., <HR/>) should be declared explicitly as EMPTY in the DTD. If you're not using SGML software, ignore this warning.
value appears in multiple enumerations for attributes of one element
The same token should not occur more than once in the enumerated attribute types of a single element. For example, <!ATTLIST employee exempt (true | false) "false" citizen (true | false) "true"> triggers this warning, because true and false each appear in two enumerations. SGML does not allow this, so if you want to interoperate with SGML software, make sure you don't do it.
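
The declarations below sketch one way to write a DTD fragment that should, in principle, avoid the warnings described above. The five entity declarations follow section 4.6 of the XML 1.0 specification. The employee and HR declarations are illustrative only, and the yes/no tokens are an assumed substitute for the repeated true/false values.

```xml
<!-- Comments go inside the XML comment delimiters, never inside a declaration. -->

<!-- Predefined entities, redeclared in the form given in section 4.6 of the spec. -->
<!ENTITY lt   "&#38;#60;">
<!ENTITY gt   "&#62;">
<!ENTITY amp  "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">

<!-- One ATTLIST per element type; no token repeated across its enumerations. -->
<!ELEMENT employee (#PCDATA)>
<!ATTLIST employee
  exempt  (yes | no)     "no"
  citizen (true | false) "true">

<!-- An element written as <HR/> in the document is declared EMPTY. -->
<!ELEMENT HR EMPTY>
```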

Why Do I Get Error Messages about Entities I'm Not Even Using?

The short answer here is that STG's validator goes a bit above and beyond what the specification actually calls for in the way of validation.

The longer answer follows.

Most validating XML parsers validate XML documents as part of a more general process (e.g., readying them for manipulation and/or display). That is, they aren't there simply to flag errors. STG's parser/validator, on the other hand, does little else. Our validator, in other words, has as its primary purpose to flag errors, and to help you locate potential problems in your XML.

As a result, our validator can be far more aggressive than it strictly needs to be. In particular, it can resolve and/or process all entities declared in your DTD. If it finds errors that may pose problems down the road, it will flag them - even if you don't happen to use the entity in question in the document you are validating.

The idea here is to help designers avoid half-baked DTDs that seem to work fine with some documents, but suddenly start producing unexpected errors when used on documents that happen to make use of invalid entities that were lurking unused in the DTD.
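
To make the scenario concrete, here is a hypothetical document whose DTD declares an entity that is never referenced. The names memo, stg, and broken are invented for illustration, and whether the validator reports this exact case is an assumption rather than documented behavior. The document only uses &stg;, so a parser that ignores unused entities would see nothing wrong; a document that later referenced &broken; would fail, because its replacement text contains an unclosed tag.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ENTITY stg "Scholarly Technology Group">
  <!-- Declared but never referenced: the replacement text has an unclosed tag. -->
  <!ENTITY broken "<b>unclosed">
]>
<memo>Sent by &stg;.</memo>
```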

Can I Run the Validator Locally?

Yes. But to do so you may need to compile the back-end parser from source yourself and install it. The source code for the parser is available at STG's website.

Note that this software is still in beta testing, and will doubtless contain many bugs. Please let us know if you find one, preferably giving us enough information to reproduce it (e.g., your OS version, parser version, and sample XML input).

Why Am I Getting Ambiguous Content Model Errors?

You are getting ambiguous content model errors because at least one of your content models is nondeterministic (in SGML terms, "ambiguous"). In essence what this means is that the content model(s) in question can match identical XML element sequences in more than one way.

STG's XML validation system aggressively reports such ambiguities not only because the specification says it should (appendix E), but also because XML software strives for simplicity and consistency. If you give XML software an element stream that can be processed in several different ways, it will normally select just one of those ways (probably not even telling you what it's done), and then continue processing. This situation can lead to confusion, especially when you aren't aware that there were any ambiguities in the first place.

The most frequent cause of ambiguous content models is the use of patterns like ((a, b?) | a). Take, for example, the following DTD fragment:

<!ELEMENT postalcode (#PCDATA)>
<!ELEMENT postalcode_extension (#PCDATA)>

(In the United States, a postal code consists of five digits plus an optional extension.) Although old SGML hands rarely make such mistakes, one often sees XML DTDs containing expressions that, in this instance, would reduce to:

((postalcode, postalcode_extension?) | postalcode)

When an XML processor, having internalized the above content-model fragment, sees an actual document instance containing a basic five-digit postal code

<postalcode>02912</postalcode>

it has no idea whether to process this postal code as an instance of a postal code plus a null extension, or as a complete postal code in and of itself. That is, it has no idea whether to treat it as an instance of (postalcode, postalcode_extension?) or of (postalcode).

If you find you are getting ambiguous content model errors, check for situations like the above, where the same XML text could match your content model in multiple ways.
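
For the postal code example, one possible fix is to drop the redundant alternative, since the optional extension already covers the bare five-digit case. The enclosing address element is hypothetical; only postalcode and postalcode_extension come from the fragment above.

```xml
<!-- Deterministic: a lone postalcode can match this model in only one way. -->
<!ELEMENT address (postalcode, postalcode_extension?)>
<!ELEMENT postalcode (#PCDATA)>
<!ELEMENT postalcode_extension (#PCDATA)>
```

This model accepts exactly the same documents as ((postalcode, postalcode_extension?) | postalcode), so nothing is lost by the rewrite.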

If you aren't concerned about such problems, feel free to turn off warning messages altogether using the checkbox provided on the main validation form.

Why Can't I Validate XML Files with Local DTDs?

STG's validator cannot validate XML document instances against local DTDs (e.g., DTDs on your local hard drive) because it must be able to resolve and fetch over the network any external entities it needs in order to process your document. For it to resolve and fetch arbitrary local files on people's hard drives, everyone would need to offer our validation system access to their local filesystems.

Needless to say, this sort of access (if a reasonable way could be found to offer it) would present an unacceptable security risk.

If you want our validator to be able to find your DTDs, therefore, you must place them in a public directory on a webserver you have access to, and change your system identifiers to point to the relevant URIs.
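
As a rough illustration, a document prepared this way might look like the sketch below. The host www.example.org, the path, and the memo element names are placeholders, not real locations; substitute the URI of the DTD on your own webserver.

```xml
<?xml version="1.0"?>
<!-- A system identifier such as "memo.dtd" or a local file path cannot be fetched by a remote validator. -->
<!DOCTYPE memo SYSTEM "http://www.example.org/dtds/memo.dtd">
<memo>
  <to>STG</to>
  <body>A short test message.</body>
</memo>
```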

If you have no access to a webserver, or if you are working on private DTDs and files, see above on compiling the parser locally.